Having settled on your objectives for a system, the next step is to decide on an appropriate product and produce a plan for implementing it. A daunting task in the implementation of any GIS is the collection of appropriate data. Existing data within an organisation may have to be digitised, a costly and time-consuming process. A growing number of vendors now supply digital data suitable for GIS, but such data tends to be expensive, and you must be sure that it is suitable for your particular application.

What is spatial data?

Spatial data is information which is linked to a specific location, for example the population of a town or the occupant of an address. In many cases the difficult part of setting up data for a GIS is linking information to a location, a process known as geocoding. Within a particular data set there must, of course, be an element which specifies its location. Ideally this would be a map co-ordinate, but it could also be a postcode or street address. The element within the data that identifies the location is known as its geocode.
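Geocoding by postcode amounts to looking each record's geocode up in a table that maps postcodes to map co-ordinates. The sketch below is a minimal illustration under stated assumptions: the `POSTCODE_LOOKUP` table, its postcodes and its easting/northing values are hypothetical, and a real system would use a licensed gazetteer. It also shows the practical headache: records whose geocode cannot be matched must be set aside for manual checking.

```python
# Hypothetical postcode-to-co-ordinate lookup (easting, northing in metres).
# Postcodes and values are invented for illustration only.
POSTCODE_LOOKUP = {
    "AB1 2CD": (412300.0, 285600.0),
    "EF3 4GH": (415750.0, 287120.0),
}


def normalise_postcode(postcode: str) -> str:
    """Upper-case and collapse whitespace so 'ab1  2cd' matches 'AB1 2CD'."""
    return " ".join(postcode.upper().split())


def geocode(records):
    """Attach a map co-ordinate to each record using its postcode as geocode.

    Returns (matched, unmatched): records that found a location, and those
    whose postcode is not in the lookup table.
    """
    matched, unmatched = [], []
    for record in records:
        coords = POSTCODE_LOOKUP.get(normalise_postcode(record["postcode"]))
        if coords is None:
            unmatched.append(record)
        else:
            matched.append({**record, "x": coords[0], "y": coords[1]})
    return matched, unmatched


customers = [
    {"name": "Smith", "postcode": "ab1 2cd"},
    {"name": "Jones", "postcode": "ZZ9 9ZZ"},
]
matched, unmatched = geocode(customers)
print(matched)    # Smith gains x=412300.0, y=285600.0
print(unmatched)  # Jones: postcode not found in the lookup table
```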
A thorough understanding of the nature of geographical information is crucial to the data collection process and to the success of the GIS as a whole. You will need to address questions such as these: Are you aware of the consequences of bringing together datasets collected at different scales? How accurate are the locations of features such as roads on small-scale datasets? Data sets can be divided into those about people (socio-economic data) and those concerning the environment.
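The scale question can be made concrete with simple arithmetic: an error on the map grows on the ground in proportion to the scale denominator, so ground error = map error x denominator. The sketch below assumes, purely for illustration, that features are drawn to within 0.5 mm on the original map; the real figure depends on the data source.

```python
def ground_error_m(map_error_mm: float, scale_denominator: int) -> float:
    """Convert a positional error on the map (mm) to metres on the ground."""
    # Multiply first, then convert mm to m, to keep the arithmetic exact here.
    return map_error_mm * scale_denominator / 1000.0


# Assumed drawing accuracy of 0.5 mm at three common map scales.
for denom in (1250, 50000, 250000):
    print(f"1:{denom:,}: {ground_error_m(0.5, denom):g} m")
# 1:1,250: 0.625 m
# 1:50,000: 25 m
# 1:250,000: 125 m
```

On these assumptions, overlaying a 1:1,250 dataset on a 1:250,000 one mixes positional uncertainties that differ by a factor of 200, which is why a road on a small-scale dataset may not line up with the same road captured at large scale.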

Socio-economic data
Socio-economic data is widely available, often from national and local government, and is usually the product of population surveys and censuses. This data is also used by a number of commercial vendors, who combine census information with other datasets to produce neighbourhood profiles classifying particular areas for marketing purposes. This ability to identify particular markets from geographical datasets is known as geodemographics, and it is one of the fastest-growing areas within GIS.
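Geodemographic classifications are commonly built by cluster analysis of many census variables. The sketch below covers only the final step and is heavily simplified: it assumes a set of neighbourhood profiles already exists and assigns each area to the profile nearest to it. The three variables, the profile names and all the numbers are hypothetical.

```python
import math

# Hypothetical profiles: share of households that are
# (owner-occupied, have a car, aged 65 or over). Invented values.
PROFILES = {
    "Affluent suburbs": (0.85, 0.90, 0.15),
    "Inner-city renters": (0.25, 0.40, 0.10),
    "Retirement areas": (0.70, 0.60, 0.45),
}


def classify_area(census_vector):
    """Assign an area to the profile whose centre is nearest (Euclidean)."""
    return min(PROFILES, key=lambda name: math.dist(census_vector, PROFILES[name]))


areas = {
    "Area 001": (0.80, 0.88, 0.12),
    "Area 002": (0.30, 0.35, 0.08),
    "Area 003": (0.72, 0.55, 0.50),
}
for area, vector in areas.items():
    print(area, "->", classify_area(vector))
# Area 001 -> Affluent suburbs
# Area 002 -> Inner-city renters
# Area 003 -> Retirement areas
```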
Environmental data
The collection and analysis of information about the environment was one of the driving forces behind the development of GIS and continues to be an important application area. Environmental data sets tend to be large and require considerable management. Sources of environmental data include:
Environmental data often includes boundaries, for example between vegetation types, which are fuzzy, i.e. they cannot be defined by a simple line. Socio-economic data, by contrast, is usually related to administrative boundaries, which are sharp, if artificial.

Data models

Reality is too complex for even the most sophisticated GIS software, so in order to represent it in a spatial database, a simplified version of reality is created. This simplification is known as a data model. In a data model, reality is reduced to just four spatial entities, or elements, which can be used to represent the real world: points, lines, areas and surfaces.

In a telecommunications GIS application, for example, a point may represent the location of a junction box; a line might represent a section of cable; an area may represent a building; and a surface may represent the land surface through which cables are laid.
Attributes are then attached to these spatial entities, for example the type of cable, the address of the building and the height of any particular point. The linking of spatial entities with their attributes is one of the key concepts of GIS.
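The link between a spatial entity and its attributes can be pictured as a record that holds both the geometry and a set of named values. The sketch below is a minimal illustration, not how any particular GIS stores features; the `Feature` class, the co-ordinates and the attribute names are all hypothetical, and surfaces are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A spatial entity (its geometry) linked to its descriptive attributes."""
    feature_id: int
    kind: str          # "point", "line" or "area" (surfaces omitted here)
    coordinates: list  # list of (x, y) tuples
    attributes: dict = field(default_factory=dict)


junction_box = Feature(1, "point", [(1020.0, 560.0)], {"type": "junction box"})
cable = Feature(2, "line", [(1020.0, 560.0), (1100.0, 560.0), (1100.0, 640.0)],
                {"cable_type": "fibre"})
building = Feature(3, "area",
                   [(1090.0, 630.0), (1130.0, 630.0), (1130.0, 670.0), (1090.0, 670.0)],
                   {"address": "1 High Street"})

# Because geometry and attributes travel together, an attribute query
# returns features that can be located on the map.
layer = [junction_box, cable, building]
fibre = [f.feature_id for f in layer if f.attributes.get("cable_type") == "fibre"]
print(fibre)  # [2]
```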
GIS software stores spatial entities and their attributes using one of several spatial data models. It is important to understand the characteristics of each, since the data model used has considerable influence on the functionality of the GIS.
The two main spatial data models are raster and vector.

The raster data model is the simpler of the two and is based on the division of reality into a regular grid of identically shaped cells.
Each cell is assigned a single value which represents the attribute for the area covered by that cell. In a soils data set, for example, a cell may have a value of 216, which might represent clay soil. The size of the cells, which may range from a few metres to several kilometres across, is known as the resolution of the grid. The higher the resolution (that is, the smaller the cells), the more cells are required to represent a given area.
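The cost of higher resolution is easy to underestimate, because halving the cell size quadruples the number of cells. The sketch below counts the square cells needed to cover a 10 km by 10 km area at three assumed resolutions, and stores a tiny soils grid as rows of codes. The code 216 for clay comes from the text; 101 for sand is a hypothetical addition.

```python
import math


def cells_needed(width_m: float, height_m: float, resolution_m: float) -> int:
    """Number of square cells of side `resolution_m` needed to cover the area."""
    cols = math.ceil(width_m / resolution_m)
    rows = math.ceil(height_m / resolution_m)
    return cols * rows


for res in (1000, 100, 10):
    print(f"{res} m cells: {cells_needed(10_000, 10_000, res):,}")
# 1000 m cells: 100
# 100 m cells: 10,000
# 10 m cells: 1,000,000

# A tiny soils grid: each cell holds exactly one attribute code.
SOIL_CODES = {216: "clay", 101: "sand"}  # 101 = sand is hypothetical
soils = [
    [216, 216, 101],
    [216, 101, 101],
]
print(SOIL_CODES[soils[0][1]])  # clay
```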

The vector data model works much like the join-the-dots books we all used as children. An object's shape is represented by dots placed wherever the shape of the object changes, and the dots are joined by straight lines. In the vector data model the dots are known as vertices, and the straight line joining two consecutive vertices is known as a segment. A connected series of segments forms an arc, and a vertex at which arcs begin, end or meet is called a node. A series of arcs which return to the same node forms an area, or polygon.
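The vertex-and-segment idea can be sketched directly: a polygon boundary is a list of vertices, consecutive vertices form segments, and the ring is closed when the last vertex returns to the first. To show that the closed ring really defines an area, the sketch computes it with the shoelace formula, a standard technique that the text itself does not mention. Here the whole boundary is treated as a single arc starting and ending at one node; the function names are illustrative.

```python
def segments(vertices):
    """Pair consecutive vertices into the straight segments joining them."""
    return list(zip(vertices, vertices[1:]))


def is_closed(vertices) -> bool:
    """A ring is closed when its last vertex returns to the first (the node)."""
    return len(vertices) >= 4 and vertices[0] == vertices[-1]


def polygon_area(vertices) -> float:
    """Area enclosed by a closed ring, using the shoelace formula."""
    if not is_closed(vertices):
        raise ValueError("polygon ring must end at its starting vertex")
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in segments(vertices))
    return abs(twice_area) / 2.0


# A 4 x 3 rectangle digitised "join the dots" style, returning to its start.
boundary = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]
print(len(segments(boundary)))  # 4
print(polygon_area(boundary))   # 12.0
```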

At first sight, this data model appears to be similar to the data structure used by CAD systems and simple computer drawing packages. The GIS vector data model is slightly more complex as each vertex, arc, node and polygon is uniquely identified and the relationships between them are stored in the database.
The relationships between the elements of a vector data model, in terms of relative location and connections, are known as topology. Topology gives the vector data model a level of 'intelligence', which means that the GIS can recognise which arcs are joined to each other and identify those polygons which are adjacent to each other.
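One classic way to store topology is an arc-node table, in which every arc records its start node, its end node, and the polygons on its left and right. The sketch below assumes that structure with a hypothetical layout of two triangles, A and B, sharing one arc; "OUT" marks the area outside the map. Both questions in the paragraph, which arcs join and which polygons are adjacent, can then be answered from the table alone, without any geometric calculation.

```python
from collections import defaultdict

# Hypothetical arc-node table: start/end node and left/right polygon per arc.
ARCS = {
    "a1": {"from": "n1", "to": "n2", "left": "A", "right": "OUT"},
    "a2": {"from": "n2", "to": "n3", "left": "A", "right": "B"},
    "a3": {"from": "n3", "to": "n1", "left": "A", "right": "OUT"},
    "a4": {"from": "n2", "to": "n4", "left": "B", "right": "OUT"},
    "a5": {"from": "n4", "to": "n3", "left": "B", "right": "OUT"},
}


def arcs_at_node(node):
    """Arcs that start or end at a node, i.e. are joined to each other there."""
    return sorted(arc for arc, t in ARCS.items() if node in (t["from"], t["to"]))


def adjacent_polygons():
    """Polygons on either side of the same arc are adjacent."""
    neighbours = defaultdict(set)
    for t in ARCS.values():
        left, right = t["left"], t["right"]
        if "OUT" not in (left, right) and left != right:
            neighbours[left].add(right)
            neighbours[right].add(left)
    return {poly: sorted(nbrs) for poly, nbrs in neighbours.items()}


print(arcs_at_node("n2"))   # ['a1', 'a2', 'a4']
print(adjacent_polygons())  # {'A': ['B'], 'B': ['A']}
```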
The vector data model is best suited to representing linear features, such as cable networks, and the relationships between areas. Its main disadvantage shows when datasets are combined and analysed, since this requires a much higher level of processing than with raster data.
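For contrast, when two raster layers share the same grid (same extent and resolution), overlaying them is a simple cell-by-cell operation. Vector overlay, by comparison, has to find where the lines of the two layers intersect and build new polygons from the pieces. The two layers below and their meanings are hypothetical, with 1 meaning the condition holds in that cell.

```python
# Two hypothetical layers on the same 2 x 3 grid (1 = condition true).
soil_is_clay = [
    [1, 1, 0],
    [1, 0, 0],
]
slope_over_10pc = [
    [0, 1, 1],
    [1, 1, 0],
]


def overlay_and(a, b):
    """Cell-by-cell AND: 1 where both layers are 1, otherwise 0."""
    return [[x & y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


print(overlay_and(soil_is_clay, slope_over_10pc))  # [[0, 1, 0], [1, 0, 0]]
```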
In addition to these two main data models, a third, the object-based model, is becoming increasingly popular. This represents the world in the form of objects the user would recognise: a highway, for example, is represented as the whole highway rather than as the individual segments that make it up. This has a number of advantages, since the model is less abstract and easier to understand. However, its processing requirements are high.
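The difference in viewpoint can be suggested in a few lines: instead of loose segments, the model holds a `Highway` object that owns its segments and answers questions about itself as a whole. This is only a sketch of the idea; the classes, the road name and the co-ordinates are hypothetical and say nothing about how an object-based GIS is actually built.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: tuple
    end: tuple

    def length(self) -> float:
        return math.dist(self.start, self.end)


@dataclass
class Highway:
    """One user-recognisable object that owns the segments making it up."""
    name: str
    segments: list = field(default_factory=list)

    def length(self) -> float:
        return sum(seg.length() for seg in self.segments)


ring_road = Highway("Ring Road", [Segment((0, 0), (3, 4)), Segment((3, 4), (3, 10))])
print(ring_road.name, ring_road.length())  # Ring Road 11.0
```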

Converting your data

You will need to decide which data model you wish to use, raster or vector, and it is important to realise that a particular data model may be better suited to your application. However, the choice of data structure for any particular application is often an arbitrary decision, since GIS software will generally support one model as fully as the other. A data structure is the logical arrangement of your data in a format that you and your system can manage.
Whichever model and structure you choose, you will, of course, need to convert the data you already have into a format which can be used by the GIS. Converting data into digital format is a labour-intensive activity, and can account for up to 80% of the total system cost. Time spent on fact-finding and planning is time well spent.
Central to any data capture plan is a thorough internal data audit, which will help you determine the size, scope and cost of the task ahead. Since few organisations are able to redeploy staff to tackle a data capture exercise, two realistic alternatives remain: you can either hire, train and equip a dedicated team, or contract the job to a specialist bureau. The latter will almost certainly be able to undercut the in-house option, but you need to make sure that this will not be at the expense of quality control and flexibility. Data capture can also be an opportunity to improve the quality of your data by incorporating new information with the old.

Scanning or table digitising?

You also have a choice to make between two methods of converting your data: scanning and table digitising. Scanning offers ease and speed, but the resulting raster images lack the intelligence needed for vector-based GIS. A fair degree of operator expertise is also required, and compression techniques (typically run-length encoding) will need to be applied to keep the files to a manageable size. Vectorisation, applied automatically or interactively, can then turn the scanned images into intelligent vector files.
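Run-length encoding works well on scanned maps because most of each scan line is blank background: each run of identical cell values is stored once, together with its length. The sketch below encodes and decodes one hypothetical scan row using Python's `itertools.groupby`; a production scanner format would pack the runs into a binary layout rather than a list of tuples.

```python
from itertools import groupby


def rle_encode(row):
    """Compress one raster row into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original row."""
    return [value for value, count in pairs for _ in range(count)]


# A scanned line-art row: 0 = background, 1 = ink.
row = [0] * 12 + [1] * 3 + [0] * 9
encoded = rle_encode(row)
print(encoded)                     # [(0, 12), (1, 3), (0, 9)]
print(rle_decode(encoded) == row)  # True: the compression is lossless
```

Twenty-four cells shrink to three pairs here; a noisy scan with frequent value changes would compress far less well, which is one reason clean originals matter.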
Table digitising has the advantage of using inexpensive digitising equipment. However, operator training is needed to obtain good results, especially from poor-quality originals, and the procedure is laborious, time-consuming and, hence, costly.
Other possibilities, such as raster-to-vector conversion and pattern recognition, are worth considering in this trade-off between productivity, cost, quality and usability. While scanning and table digitising will accommodate the bulk of conversion needs, from text documents to line art and even video images, special techniques have been developed to enter material from other sources. These range from simple programs that facilitate the keyboard entry of survey co-ordinates to techniques that reconcile aerial photographs with base maps. Photogrammetric, remotely-sensed and CAD-generated data represent yet further potential input sources.
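A keyboard-entry program for survey co-ordinates mostly needs to parse what the operator types and catch typing mistakes before they reach the database. The sketch below assumes a hypothetical one-point-per-line format of `point_id, easting, northing`; blank lines and `#` comments are skipped, and malformed lines are reported with their line number rather than silently dropped.

```python
def parse_survey_points(text: str):
    """Parse lines of 'point_id, easting, northing' into (id, x, y) tuples.

    Returns (points, errors); errors lists (line_number, line) pairs.
    """
    points, errors = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            errors.append((line_no, line))
            continue
        try:
            points.append((parts[0], float(parts[1]), float(parts[2])))
        except ValueError:  # e.g. letter O typed instead of zero
            errors.append((line_no, line))
    return points, errors


entered = """# id, easting, northing
SP1, 412300.50, 285600.25
SP2, 412310.00, 285612.75
SP3, 41231O.00, 285620.00
"""
points, errors = parse_survey_points(entered)
print(points)  # [('SP1', 412300.5, 285600.25), ('SP2', 412310.0, 285612.75)]
print(errors)  # [(4, 'SP3, 41231O.00, 285620.00')]
```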