DATABASES, DESIGN, AND ORGANISATION


Databases

GIS Databases

Database design

Database management system


Databases

A database is a collection of information that's related to a particular subject or purpose, such as tracking residential population or maintaining a music collection. If your database isn't stored on a computer, or only parts of it are, you may be tracking information from a variety of sources that you're having to coordinate and organize yourself.

Within a database, divide your data into separate storage containers called tables; view, add, and update table data by using online forms; find and retrieve just the data you want by using queries; and analyse or print data in a specific layout by using reports. Allow users to view, update, or analyse the database's data from the Internet or an intranet by creating data access pages.

To store your data, create one table for each type of information that you track. To bring the data from multiple tables together in a query, form, report, or data access page, define relationships between the tables.

To find and retrieve just the data that meets conditions that you specify, including data from multiple tables, create a query. A query can also update or delete multiple records at the same time, and perform predefined or custom calculations on your data. To easily view, enter, and change data directly in a table, create a form.
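The ideas above (one table per type of information, relationships between tables, and queries that select, calculate or update) are not tied to a particular product. The sketch below is a minimal illustration that uses Python's built-in `sqlite3` module rather than the desktop database package the text describes. The `artists`/`albums` tables for the music-collection example are hypothetical.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One table per type of information being tracked.
cur.execute("CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("""CREATE TABLE albums (
    album_id  INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    year      INTEGER,
    artist_id INTEGER REFERENCES artists(artist_id))""")  # relationship to artists

cur.executemany("INSERT INTO artists VALUES (?, ?)", [(1, "Artist A"), (2, "Artist B")])
cur.executemany("INSERT INTO albums VALUES (?, ?, ?, ?)",
                [(10, "First", 1975, 1), (11, "Second", 1982, 1), (12, "Third", 1990, 2)])

# Select query: combine both tables through the relationship and keep only
# the records that meet a condition.
rows = cur.execute("""
    SELECT artists.name, albums.title
    FROM albums JOIN artists ON albums.artist_id = artists.artist_id
    WHERE albums.year < ?
    ORDER BY albums.year""", (1985,)).fetchall()

# Calculation: number of albums per artist.
counts = cur.execute("""
    SELECT artist_id, COUNT(*) FROM albums
    GROUP BY artist_id ORDER BY artist_id""").fetchall()

# Action query: update several records in one statement.
cur.execute("UPDATE albums SET year = year + 1 WHERE artist_id = ?", (1,))
updated = cur.rowcount
conn.commit()

print(rows)     # [('Artist A', 'First'), ('Artist A', 'Second')]
print(counts)   # [(1, 2), (2, 1)]
print(updated)  # 2
```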


GIS databases

The design and organisation of a GIS database have to be considered in their entirety and need a conceptual understanding of several disciplines: cartography and mapmaking, geography, GIS, databases, etc. Here, an overview of a design procedure that could be adopted is given and the organisational issues are addressed. The issues of updating the database and of linking the GIS database to other databases are also addressed.

The Geographical Information System (GIS) has two distinct uses: the first pertains to querying and obtaining information, and the second to targeted analytical modelling. The importance of the GIS database stems from the fact that its data elements are closely interrelated and thus need to be structured for easy integration and retrieval. The GIS database also has to cater to the different needs of applications. In general, a proper database organisation needs to ensure the following [Healey, 1991; NCGIA, 1990]:

a) Flexibility in the design to adapt to the needs of different users.

b) A controlled and standardised approach to data input and updation.

c) A system of validation checks to maintain the integrity and consistency of the data elements.

d) A level of security for minimising damage to the data.

e) Minimising redundancy in data storage.

THE DATA IN GIS

Broadly categorised, the basic data for the GIS database has two components:

a) Spatial data - consisting of maps which have been prepared either by field surveys or by the interpretation of Remotely Sensed (RS) data. Some examples of such maps are the soil survey map, geological map, landuse map from RS data, village map, etc. Many of these maps are available only in analog form, and only recently has some map information become available directly in digital format. Thus, the incorporation of these maps into a GIS depends upon whether they are in analog or digital format, each of which has to be handled differently.

b) Non-spatial data - attributes that complement the spatial data and describe what is at a point, along a line or in a polygon, as well as socio-economic characteristics from the census and other sources. The attributes of a soil category could be the soil depth, texture, erosion, drainage, etc., and those of a geological category could be the rock type, its age, major composition, etc. The socio-economic characteristics could be demographic data or occupation data for a village, or traffic volume data for roads in a city. The non-spatial data is mainly available as tabular records in analog form and needs to be converted into digital format for incorporation in a GIS. However, the 1991 census data is now available in digital form, so direct incorporation into the GIS database is possible.

2.1 MEASUREMENT OF GEOGRAPHICAL DATA

Data in a GIS generally has a geographical connotation and thus carries the normal characteristics of geographical data. Measuring the data involves describing what the data represents (a naming, legending or classification function) and calculating its quantity (a counting, scaling or measurement function). Thus, the scale of measurement of the data is important while organising a GIS database. There are four scales by which data is represented [Brien, 1992]:

a) nominal, where the data is classified into mutually exclusive sets or levels based on relevant characteristics. The landuse information on a map representing the different categories of landuse is a nominal representation of data. The nominal scale is the most commonly used measure for spatial data.

b) ordinal, which is a more sophisticated measurement, as the classes are placed into some form of rank order based on a logical property of magnitude. A groundwater prospect map showing different classes of prospects, categorised from "high prospect" to "low prospect", is an ordinal scale measurement.

c) interval, which is a continuous scale of measurement and a crude representation of numeric data on a scale. Here, the class definition is a rank order in which the differences between the ranks are quantified. The representation of population density in rank order is an example of interval data.

d) ratio, which is also a continuous scale, but one whose origin (zero point) is real rather than arbitrary. Further, a ratio scale represents the scaling between individual observations in the dataset and not just between datasets. An example of the ratio scale is when each value is normalised against a reference, generally an average, maximum or minimum.

The above four scales form a hierarchy: the ratio scale exhibits all the defining operations, while those further down the hierarchy possess fewer. Thus, ratio data may be re-expressed as interval, ordinal or nominal data, but nominal data cannot be expressed as ratios. Further, the nominal and ordinal scales are used to define categorical data, which is the usual way of representing maps or spatial data, and the interval and ratio scales are used to define continuous data. TABLE - 1 shows the characteristics of the scales.
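The hierarchy can be sketched as an ordered type in which each scale inherits the operations of the scales below it. This is a minimal illustration that assumes the conventional operations attached to each scale (equality, order, difference, ratio); the names `Scale`, `allowed_operations`, `can_reexpress`, `is_categorical` and `normalise` are hypothetical, and TABLE - 1 remains the reference for the characteristics of the scales.

```python
from enum import IntEnum

class Scale(IntEnum):
    """Scales of measurement; a higher value supports more operations."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Operation first made meaningful at each scale (assumed conventional set).
NEW_OPERATION = {
    Scale.NOMINAL: "equality",
    Scale.ORDINAL: "order",
    Scale.INTERVAL: "difference",
    Scale.RATIO: "ratio",
}

def allowed_operations(scale):
    """Operations permitted at `scale`, inherited from every scale below it."""
    return {op for level, op in NEW_OPERATION.items() if level <= scale}

def can_reexpress(source, target):
    """Data can only be re-expressed downwards in the hierarchy."""
    return target <= source

def is_categorical(scale):
    """Nominal and ordinal data are categorical; interval and ratio are continuous."""
    return scale <= Scale.ORDINAL

def normalise(values, reference):
    """Express ratio-scale values against a reference such as the mean or maximum."""
    return [v / reference for v in values]

print(sorted(allowed_operations(Scale.ORDINAL)))  # ['equality', 'order']
print(can_reexpress(Scale.RATIO, Scale.NOMINAL))   # True
print(can_reexpress(Scale.NOMINAL, Scale.RATIO))   # False
print(normalise([2, 4, 8], 8))                     # [0.25, 0.5, 1.0]
```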

 


DATABASE DESIGN

GIS database design

Just as in any normal database activity, the GIS database also needs to be designed to cater to the needs of the application that proposes to use it. Apart from this, the design should also:

a) provide a comprehensive framework of the database.

b) allow the database to be viewed in its entirety so that interaction and linkages between elements can be defined and evaluated.

c) permit identification of potential bottlenecks and problem areas so that design alternatives can be considered.

d) identify the essential and correct data and filter out irrelevant data.

e) define updation procedures so that newer data can be incorporated in future.

The design of the GIS database will include three major elements [NCGIA, 1990]:

a) Conceptual design, basically laying down the application requirements and specifying the end-utilisation of the database. The conceptual design is independent of hardware and software and could be a wish-list of utilisation goals.

b) Logical design, which is the specification of the database vis-a-vis a particular GIS package. This design sets out the logical structure of the database elements determined by the GIS package.

c) Physical design, which pertains to the hardware and software characteristics and requires consideration of file structure, memory and disk space, access and speed etc.

Each stage is interrelated with the next stage of the design and has a major impact on the organisation. For example, if the concepts are clearly defined, the logical design is easier to carry out, and if the logical design is clear, the physical design is also easy. FIGURE 1 shows a framework of the design elements and their relationship. The success or failure of a GIS project is determined by the strength of the design, and a good deal of time must be allocated to the design activity. SAC has evolved a set of design guidelines for GIS database creation [Rao et al, 1990], which have been adopted for the implementation of GIS projects for the Bombay Metropolitan Region (BMR) [SAC and BMRDA, 1992], regional planning at district level for Bharatpur [SAC and TCPO, 1992] and wasteland development for Dungarpur [SAC, 1993]. Much of what is discussed here is based on these design guidelines and on the experience gained in executing the different GIS projects. To illustrate the design aspects of a GIS database, examples from the design of the Bharatpur district database are explained and referred to.

Designing a database

Good database design is the keystone to creating a database that does what you want it to do effectively, accurately, and efficiently.

Steps in designing a database

· Determine the purpose of your database

· Determine the tables you need

· Determine the fields you need

· Identify the field or fields with unique values in each record

· Determine the relationships between tables
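The step of identifying the field or fields whose values are unique in each record can be checked mechanically against sample data. A minimal sketch, assuming records held as Python dictionaries; the function name `is_unique_key` and the sample taluk/village codes are hypothetical.

```python
def is_unique_key(records, fields):
    """Return True if the combination of `fields` identifies every record uniquely.

    A candidate key is rejected if any record lacks a value for one of the fields.
    """
    seen = set()
    for rec in records:
        key = tuple(rec.get(f) for f in fields)
        if None in key or key in seen:
            return False
        seen.add(key)
    return True

villages = [
    {"taluk": "T1", "village": "V1", "name": "Alpha"},
    {"taluk": "T1", "village": "V2", "name": "Beta"},
    {"taluk": "T2", "village": "V1", "name": "Alpha"},
]
print(is_unique_key(villages, ["village"]))           # False: V1 occurs twice
print(is_unique_key(villages, ["taluk", "village"]))  # True
```

A single field such as the village number is not enough here, but the combination of taluk and village is, which is the kind of finding that decides the key of a table.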

3.1 GIS - CORE OF THE DATABASE

The Geographical Information System (GIS) package is the core of the GIS database, as both spatial and non-spatial data have to be handled. The GIS package offers efficient utilities for handling both these datasets and allows for: spatial database organisation; non-spatial dataset organisation, mainly as attributes of the spatial elements; analysis and transformation to obtain the required information; output of information in specific formats (cartographic-quality outputs and reports); and organisation of a user-friendly query system. Different types of GIS packages are available, and the GIS database organisation depends on the GIS package that is to be used. Apart from the basic functionality of a GIS package, some of the crucial aspects that affect the GIS database organisation are as follows:

a) data structure of the GIS package. Most GIS packages adopt either a raster or a vector structure, or their variants, internally to organise spatial data and represent real-world features.

b) attribute data management. Most GIS packages have an embedded link to a Database Management System (DBMS) to manage the attribute data as tables.

c) a tiled concept of spatial data handling, which is fundamental to the way maps are represented in the real world. For example, 16 SOI map sheets at 1:50,000 scale make up one 1:250,000 sheet, and 16 sheets at 1:250,000 make up one 1:1,000,000 sheet. This map-tile graticule can also be represented in a GIS, and some GIS packages allow tile-based data handling.
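The 16-to-1 nesting of sheets can be sketched as arithmetic on sheet sizes. For illustration only, the sketch assumes square sheets aligned to whole multiples of their size from 0° latitude and longitude, with a 1:1,000,000 sheet spanning 4° × 4°, so that each 4 × 4 subdivision gives 1° sheets at 1:250,000 and 15′ sheets at 1:50,000. These sizes, the alignment and the row/column convention are assumptions to be checked against the actual SOI index; the function names are hypothetical.

```python
import math

# Assumed sheet sizes in degrees: each step is a 4 x 4 grid, i.e. 16 sheets.
SHEET_SIZE_DEG = {1_000_000: 4.0, 250_000: 1.0, 50_000: 0.25}

def sheets_per_parent(scale, parent_scale):
    """Number of `scale` sheets that make up one `parent_scale` sheet."""
    per_side = round(SHEET_SIZE_DEG[parent_scale] / SHEET_SIZE_DEG[scale])
    return per_side * per_side

def sheet_origin(lat, lon, scale):
    """South-west corner (lat, lon) of the `scale` sheet containing the point."""
    size = SHEET_SIZE_DEG[scale]
    return math.floor(lat / size) * size, math.floor(lon / size) * size

def position_in_parent(lat, lon, scale, parent_scale):
    """(row, col) of the child sheet within its parent; row 0 is the northern row."""
    size = SHEET_SIZE_DEG[scale]
    p_lat, p_lon = sheet_origin(lat, lon, parent_scale)
    per_side = round(SHEET_SIZE_DEG[parent_scale] / size)
    col = int((lon - p_lon) // size)
    row = per_side - 1 - int((lat - p_lat) // size)
    return row, col

print(sheets_per_parent(50_000, 250_000))               # 16
print(sheets_per_parent(50_000, 1_000_000))             # 256
print(sheet_origin(27.2, 77.5, 250_000))                # (27.0, 77.0)
print(position_in_parent(27.2, 77.5, 50_000, 250_000))  # (3, 2)
```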

4.0 GIS DATABASE - CONCEPTUAL DESIGN

The Conceptual Design (CD) of a GIS database defines the application needs and the end objective of the database. Generally, this is a loosely defined statement of end needs. It crystallises and evolves as the GIS database progresses, but within the framework of the broad statement of intentions. The clearer and better defined the CD, the easier the logical design of the GIS database. Some of the key issues that merit consideration for the CD are:

a) Specifying the ultimate use of the GIS database as a single statement. Some examples could be GIS DATABASE FOR URBAN PLANNING AT MICRO-LEVEL; GIS DATABASE FOR WATER SUPPLY MANAGEMENT; GIS DATABASE FOR WILDLIFE HABITAT MANAGEMENT. The important aspect here is the management of a particular resource, facility etc and thus the statement would generally include the management activity.

b) Level or detail of the GIS database, which indicates the scale or level of the data contents of the database. A database designed for MICRO-LEVEL applications would require far more detail than one designed for MACRO-LEVEL applications. TABLE 1 illustrates the relationship between level and applications, which could be used as a guideline. In most cases, the level or detail is implicit in the statement of end use.

c) Spatial elements of the GIS database, which depend upon the end use and define the spatial datasets that will populate the database. The spatial elements are application specific and are mainly made up of maps obtained from different sources.

The spatial elements could be categorised into primary elements, which are digitised or entered into the database directly, and derived elements, which are derived from the primary elements by a GIS operation. For example, the contours/elevation points could be primary elements, but the slope that is derived from the contours/elevation points is a derived element. This distinction between primary and derived elements is useful in estimating the database creation load and also in scheduling GIS operations. TABLE 2 illustrates some of the primary elements and derived elements of a GIS database for district level planning applications.

d) Non-spatial elements of GIS database which are the non-spatial datasets that would populate the GIS database. The actual definition of the non-spatial elements would depend upon the end use and is application specific. For example, non-spatial data for forest applications would include data on tree species, age, production etc and non-spatial data for urban applications would include wardwise population, services and facilities data and so on. TABLE 3 shows some of the typical non-spatial data elements for a district planning application. Much of the non-spatial data comes from sources like the Census department, municipalities, resource survey agencies etc.

e) Source of spatial and non-spatial data is an important design issue, as it brings out the details of the data collection activity and also helps identify the need for data generation. Most of the spatial data or thematic maps are available from the central and state survey agencies, and non-spatial data is available as Census records or from the survey departments.

f) Age of data is an important design issue as it, in turn, defines the age of the database - making it either useful or useless for a particular end application. For example, if the application is to study the impact of pollution in an urban area then the pollution data needs to be current and the use of past data would render the impact analysis ineffective.

g) Spatial data domain, pertaining to the basic framework of the spatial datasets. Most spatial datasets follow the Survey of India (SOI) latitude-longitude coordinate system (as given in the SOI maps), and thus the spatial database needs to follow the standards of the SOI mapsheets.

h) Impact of study area extent, defining the actual geographical area for which the GIS database is to be organised. If the SOI framework is adopted, the coverage will generally be in non-overlapping SOI map sheets, with the study area covering some mapsheets fully and others only partially. The extent definition also lays down the limits of the database and helps in the logical design of the spatial elements.

i) Spatial registration framework: it is essential to adopt a standard registration procedure for the database. This is generally done by the use of registration points, also called TIC points in GIS. These registration points could be the corners of the graticule of the spatial domain (say, the four corners of an SOI mapsheet at 1:50,000 scale) or control points that can be discerned in each spatial element that is to populate the database, such as road intersections, railway-road intersections, bridges, etc. Unique identifiers for each registration point help in locating and registering the database. FIGURE 2 shows the scheme of registration points used for the Bharatpur project. This is a "shared" scheme in which each registration point is part of more than one mapsheet. This helps in map joining/mosaicking and in the sheet-by-sheet digitisation process.

j) Non-spatial data domain, specifying the levels of non-spatial data. The non-spatial datasets are available at different levels, and it is essential to organise the non-spatial data at the lowest unit. The higher levels can then be abstracted from the lowest unit whenever required. For example, for the Bharatpur database, non-spatial data was available at different levels of administrative units (district, taluk and village). The village was the lowest unit at which the non-spatial data was available, and thus the non-spatial data domain was considered at the village level.
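Registration through TIC points, as described under item i) above, amounts to estimating a transformation from digitiser or scanner coordinates to the ground coordinates of the registration points. The sketch below fits a six-parameter affine transformation by least squares and reports the RMS residual, a common registration check. It is a minimal plain-Python illustration, not the procedure of any particular GIS package; the tic IDs and coordinates are made up.

```python
def _det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def _solve3(m, v):
    """Solve m @ x = v for a 3 x 3 system by Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:  # crude singularity test
        raise ValueError("need at least three non-collinear registration points")
    solution = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = v[i]  # replace column k by the right-hand side
        solution.append(_det3(mk) / d)
    return solution

def fit_affine(tics):
    """Least-squares affine fit from digitiser (x, y) to ground (X, Y).

    `tics` maps a tic ID to ((x, y), (X, Y)). Returns (a, b, c, d, e, f) with
    X = a*x + b*y + c and Y = d*x + e*y + f.
    """
    ata = [[0.0] * 3 for _ in range(3)]  # normal-equation matrix A^T A
    at_x = [0.0] * 3                     # A^T X
    at_y = [0.0] * 3                     # A^T Y
    for (x, y), (gx, gy) in tics.values():
        row = (x, y, 1.0)                # one row of the design matrix A
        for i in range(3):
            at_x[i] += row[i] * gx
            at_y[i] += row[i] * gy
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return tuple(_solve3(ata, at_x) + _solve3(ata, at_y))

def apply_affine(params, x, y):
    a, b, c, d, e, f = params
    return a * x + b * y + c, d * x + e * y + f

def rms_error(params, tics):
    """Root-mean-square distance between transformed tics and their ground positions."""
    total = 0.0
    for (x, y), (gx, gy) in tics.values():
        px, py = apply_affine(params, x, y)
        total += (px - gx) ** 2 + (py - gy) ** 2
    return (total / len(tics)) ** 0.5

tics = {  # tic ID: ((digitiser x, y), (ground X, Y)); values are made up
    "T1": ((0.0, 0.0), (1000.0, 5000.0)),
    "T2": ((100.0, 0.0), (1200.0, 5000.0)),
    "T3": ((100.0, 100.0), (1200.0, 5200.0)),
    "T4": ((0.0, 100.0), (1000.0, 5200.0)),
}
params = fit_affine(tics)
print(apply_affine(params, 50.0, 50.0))  # close to (1100.0, 5100.0)
print(rms_error(params, tics))           # close to 0 for these exact tics
```

With four tics and six unknowns the fit is over-determined, so a large RMS value flags a mis-placed or mis-identified registration point before digitising proceeds.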

5.0 GIS DATABASE - LOGICAL DESIGN

The Logical Design of the GIS database pertains to the logical definition of the database and is a more detailed organisation activity in a GIS. Most of the design issues are specific to GIS, and thus the scope varies with the type of GIS package to be used. However, most of these issues are common across the different GIS packages. SAC has evolved a set of guidelines for the logical design of the GIS database, which have been adopted in the organisation of GIS databases for BMR, Bharatpur, Dungarpur, etc. TABLE 4 shows some of these critical design guidelines, which could be adopted for GIS database organisation in general. Some of the key issues are:

a) Coordinate system for the database, which determines the way coordinates are to be stored in the GIS package. Most GIS packages offer a range of coordinate systems depending on the projection systems employed. The coordinate system for the GIS database needs to be in appropriate units that represent the geographic features in their true shapes and sizes. The coordinate system would generally be defined by the spatial domain of the GIS database. For example, if the SOI 1:50,000 graticule has been adopted for the database, it is essential to use the same coordinate/projection system that SOI adopts. All SOI toposheets at 1:50,000 scale adopt the Polyconic projection system. Further, the units of the polyconic projection are actual ground distances in metres. As a result, all spatial elements of the GIS database are referenced in a uniform coordinate system. This allows easy integration of spatial datasets as part of the analysis and also maintains homogeneity in the GIS database.

b) Spatial Tile design pertains to the concept of a set of map tiles composing the total extent. For example, the district of Bharatpur is organised in 19 map tiles of SOI sheets at 1:50,000 scale. Certain GIS packages allow for the organisation of tiles which facilitates the systematic data entry on a tile-by-tile basis and also the horizontal organisation of spatial data.

FIGURE 3 shows the concept of horizontal and vertical organisation of the spatial data in the database.

c) Defining the attribute data dictionary: the data dictionary is an organised collection of attribute data records containing information on the feature attribute codes and names used for the spatial database. The dictionary consists of descriptions of the attribute codes for each spatial data element. TABLE 5 shows a partial listing of the attribute data dictionary adopted for the Bharatpur database.

d) Spatial data normalisation is akin to the normalisation of relations and pertains to finding the simplest structure of the spatial data and identifying the dependencies between spatial elements. Normalisation avoids duplication of general information and also reduces redundancy. A process of normalisation of the spatial data is also essential to identify master templates and component templates. This normalisation process ensures that the coincident component features of the various elements are coordinate coincident, thus limiting overlay sliver problems. It also reduces redundancy in the digitisation process, as master templates are digitised only once and form a part of all elements. For example, in the Bharatpur database, the following features have been identified as master templates:

- district/taluka boundary
- rivers/streams
- water bodies

These elements need to occur in each spatial element, and they need to be coordinate coincident.

e) Tolerances definitions are an important aspect of the GIS database design. The tolerances specify the error-level associated with each spatial element. The different tolerances that need to be considered are:

- Coordinate Movement Tolerance (CMT), which specifies the limit up to which coordinates can move as part of a GIS operation. If the tolerance is not stringent, repeated GIS operations could move the coordinates enough to distort the size and shape of the features.

- Weed Tolerance (WT) which pertains to the minimum separation between coordinates while digitising. For example a straight line could be represented by two vertices and intermediate vertices are redundant. A proper weed tolerance would not create the intermediate vertices at all and thus not populate the database unnecessarily.

- Minimum Spatial Unit (MSU), which indicates the smallest representable area in the database. Any polygon feature with a smaller area than the MSU would be aggregated. The MSU is an indication of the resolution of the database. The concept of MSU is pertinent to vector GIS databases and is not applicable to raster GIS databases, because in a raster GIS the raster/grid cell itself becomes the MSU: no feature smaller than a cell can be resolved. The tolerances all depend on the scale or level of the database. Some general guidelines suggested for different scales are listed in TABLE 7.

f) Spatial and non-spatial data linkage, where the interlinkages of the spatial and non-spatial data are defined. These linkages and interrelationships are an important element of the GIS database organisation, as they define the user relations or user views that can be created. There are two major linkage aspects involved:

- for all spatial datasets representing resource or thematic information, i.e. all maps other than administrative maps, the linkage is achieved through the data dictionary feature code at the time of creation/digitisation itself.

- for administrative maps (village and taluk maps), the linkage is achieved through a one-to-one relation based on a unique code for each village or taluk. For example, in the Bharatpur database this code has been identified as the census code for the 1463 villages/settlements in the district. Thus the spatial village/taluk boundaries can be related to the village-wise non-spatial database on a one-to-one basis. FIGURE 4 shows the type of relation that was adopted for the Bharatpur database.
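Two of the tolerances under item e) above lend themselves to a short illustration: weeding vertices that fall closer together than the weed tolerance, and flagging polygons smaller than the MSU so that they can be aggregated. This is a minimal sketch assuming planar coordinates in ground metres; the function names and tolerance values are hypothetical (TABLE 7 gives the scale-dependent guidelines). The distance-based weeding shown removes only closely spaced vertices; removing widely spaced intermediate vertices on a straight line would need a line-generalisation method such as Douglas-Peucker, which is not shown.

```python
import math

def weed(vertices, tolerance):
    """Drop vertices closer than `tolerance` to the last kept vertex.

    The first and last vertices are always kept so the line keeps its end nodes.
    """
    if len(vertices) <= 2:
        return list(vertices)
    kept = [vertices[0]]
    for v in vertices[1:-1]:
        if math.dist(v, kept[-1]) >= tolerance:
            kept.append(v)
    kept.append(vertices[-1])
    return kept

def polygon_area(ring):
    """Planar area of a ring (list of vertices, first vertex not repeated) by the shoelace formula."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def below_msu(polygons, msu):
    """IDs of polygons smaller than the minimum spatial unit (candidates for aggregation)."""
    return [pid for pid, ring in polygons.items() if polygon_area(ring) < msu]

line = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(weed(line, tolerance=5))  # [(0, 0), (10, 0)]

polygons = {
    "p1": [(0, 0), (100, 0), (100, 100), (0, 100)],  # 10,000 square metres
    "p2": [(0, 0), (10, 0), (10, 10), (0, 10)],      # 100 square metres
}
print(below_msu(polygons, msu=500))  # ['p2']
```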

6.0 GIS DATABASE - PHYSICAL DESIGN

The Physical Design (PD) pertains to the assessment of the load, disk space requirement, memory requirement, access and speed requirements, etc. for the GIS. Most of these pertain to the hardware platform on which the GIS will operate. No standards are available for PD aspects, and much of the design has to be based on experience. However, some of the key issues are as follows:

a) Disk space requirement is a major concern for GIS database designers. The maxim THERE IS NO END TO A GIS DATABASE sums it all up, as most GIS database projects have found how quickly their disk space estimates go awry. As an illustration, the Bharatpur database takes up about 54 MB for the actual data. Any further integrated analysis that creates intermediate outputs would take anywhere between 3 and 4 times the normal space. Experience thus shows that for a district database a 300 MB disk is just sufficient, and more disk space would be appropriate. The differences in space utilisation between GIS packages are reflected in a benchmark application run on the PC-ARC/INFO and ISROGIS packages: PC-ARC/INFO used 26.84 MB while, for the same dataset, ISROGIS used 35.22 MB [Rao et al, 1993]. This illustrates the range of variation in space utilisation.

b) The load of the database is also difficult to determine, as there is no way of estimating in advance the number of points, lines and polygons in each spatial element. However, broad guidelines could be evolved and estimates made. For example, the National Capital Region Planning Board (NCRPB) has adopted a two-way categorisation of spatial elements for the 67 maps covering the NCR: a three-level qualitative categorisation by map density and a two-level categorisation by full or partial coverage. On this basis, the total number of spatial map sheets to be organised is estimated at 657.

c) Access and speed requirements are more oriented towards the ability to handle large and dense maps rather than the time involved in processing. GIS applications are not real-time applications, and thus access time or speed becomes a secondary aspect. A benchmark study on PC-ARC/INFO and ISROGIS has shown that the time taken for an overall application consisting of various steps is 11.5 hours and 7.5 hours respectively [Rao et al, 1993]. The point to be noted is that, even though there is a 4-hour difference, it does not drive the application, since the application is not real-time.

d) File and data organisation in GIS is an activity which is taken care of by the GIS package itself, and no design aspects need be considered for the physical organisation of files. Each GIS package has its own file system organisation, which could be either a single file or a set of files, and is transparent to the user.
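The Bharatpur figures under item a) can be turned into a rough sizing rule. The sketch below reads "3-4 times the normal space" as intermediate outputs in addition to the base data; that reading is an assumption, and if the 3-4 times is meant as the total, the estimate becomes 162-216 MB instead. Either way it stays within the 300 MB the text describes as just sufficient. The function name is hypothetical.

```python
def disk_estimate_mb(base_mb, working_factor):
    """Base data plus intermediate outputs taken as `working_factor` times the base."""
    return base_mb * (1 + working_factor)

BHARATPUR_BASE_MB = 54
low = disk_estimate_mb(BHARATPUR_BASE_MB, 3)
high = disk_estimate_mb(BHARATPUR_BASE_MB, 4)
print(low, high)  # 216 270

# Benchmark figures quoted in the text [Rao et al, 1993]
arc_info_mb, isrogis_mb = 26.84, 35.22
extra = isrogis_mb / arc_info_mb - 1
print(f"ISROGIS used {extra:.0%} more space")  # ISROGIS used 31% more space
```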

7.0 GIS DATABASE CREATION

7.1 Spatial database creation

Based on the design, the steps of database creation are worked out and a procedure laid down. The procedure for the spatial database creation is described below:

a) Master template creation: as discussed earlier, a master template is created as a reference layer consisting of the district boundary, rivers, etc. This template is then used for digitising the component themes.

b) Thematic map manuscript preparation - based on the spatial domain (in the Bharatpur database it was the SOI graticule at 1:50,000 scale), the different theme-oriented information is transferred from the base map to a mylar/transparent sheet. Spatial data manuscripts are mylars containing the features that are to be digitised. These manuscripts are prepared on a sheet-by-sheet basis for digitisation and contain "instructions" for digitisation or scanning, which include:

- registration point locations and identifiers
- feature codes as per the dictionary defined earlier
- feature boundaries
- tolerance specifications
- any other digitisation/scanning instructions

c) Digitisation of features: the theme features of the spatial dataset are then digitised/scanned using the GIS package. The digitisation is done for each mapsheet of the spatial reference, using the master registration points as the reference. Each theme is digitised as a component into a copy of the master template layer.

d) Coverage editing - the digitised coverage is processed for digitisation errors such as dangles (overshoots or undershoots) and polygon label errors. This involves obtaining a report of these errors and then manually editing the affected features. Finally, the coverage is processed to build topology. As with digitisation, the editing also has to be done on a mapsheet basis.

In the case of raster GIS packages, topology construction may not be relevant. However, a clumping process to identify clumps of raster cells having similar characteristics is essential.

e) Appending of mapsheet thematic features: the next step in the procedure is the appending or mosaicking of the different mapsheets into a single theme map for the whole extent. The graticule of registration points is used for this purpose.

f) Attribute coding verification: the attribute codes for the different categories then need to be verified, and additional attributes (feature name, description, etc.) are added into the feature database. Only after this procedure is the theme coverage ready for GIS analysis. FIGURE 5 shows the procedure for spatial database creation.
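The dangle check in the coverage editing step can be approximated by counting how many line end points meet at each node: an end point that no other line shares is a dangling node, the signature of an undershoot or overshoot. This is a minimal sketch, not the checking routine of any GIS package. Snapping end points to a grid is a simplifying assumption (packages typically snap within a tolerance instead), and lines are assumed to be split at every intersection, so a line that ends part-way along another line without a node is also reported.

```python
from collections import Counter

def find_dangles(lines, snap=0):
    """Return end points shared with no other line end (dangling nodes), sorted.

    `lines` is a list of vertex lists. If `snap` > 0, end points are rounded to a
    grid of that size (ground units) before comparison, and the returned points
    are the rounded ones.
    """
    def node(p):
        if snap <= 0:
            return tuple(p)
        return (round(p[0] / snap) * snap, round(p[1] / snap) * snap)

    ends = Counter()
    for line in lines:
        ends[node(line[0])] += 1
        ends[node(line[-1])] += 1
    return sorted(pt for pt, count in ends.items() if count == 1)

lines = [
    [(0, 0), (10, 0)],
    [(10, 0), (10, 10)],
    [(10, 10), (0, 0)],
    [(5, 5), (12, 5)],  # isolated line: both of its ends dangle
]
print(find_dangles(lines))  # [(5, 5), (12, 5)]

# An undershoot of about 0.22 units, closed by snapping to a 1-unit grid
gap = [[(0, 0), (10, 0)], [(10.2, 0.1), (10, 10)], [(10, 10), (0, 0)]]
print(find_dangles(gap, snap=1))  # []
```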

7.2 Non-spatial database organisation

Non-spatial data elements are listed in TABLE 3, and most of these are available in analog form, specifically the census data of 1981 and earlier. To convert them into digital form, a suitable application package could be used to configure a data entry system. At SAC, a dBASE interface module has been developed for census data capture. It is a user-friendly module for easy entry and editing of census data and its organisation into a database, and is based on the taluk-village hierarchy of the district. To this end, it makes use of a primary file containing taluka-wise village names and their census codes as listed in the census abstract. The module can be used directly for entering the census data into sector-wise databases, which are created as secondary files. These secondary files are related to the primary file of villages through the census village code as the key item [Rangwala et al, 1988]. Using this module, the census data for Bharatpur district has been organised into different sectoral databases. The Census data of 1991 is available in digital format as a set of database files. These database files could be structured into a sectoral organisation so that incorporation in the GIS is easier.
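The primary-file/secondary-file arrangement described above can be sketched in any relational database. The example below uses Python's built-in `sqlite3` module as a stand-in for the dBASE module and does not reproduce that module; the table names, columns and census codes are hypothetical. Declaring the census code as a foreign key makes the database reject sectoral records whose village is missing from the primary file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Primary file: one row per village, keyed by census code.
conn.execute("""CREATE TABLE villages (
    census_code TEXT PRIMARY KEY,
    taluk       TEXT NOT NULL,
    name        TEXT NOT NULL)""")

# Secondary (sectoral) file, related to the primary file through the census code.
conn.execute("""CREATE TABLE demography (
    census_code TEXT PRIMARY KEY REFERENCES villages(census_code),
    population  INTEGER,
    literates   INTEGER)""")

conn.executemany("INSERT INTO villages VALUES (?, ?, ?)",
                 [("0101", "T1", "Village A"), ("0102", "T1", "Village B")])
conn.execute("INSERT INTO demography VALUES ('0101', 1200, 540)")

try:
    # A record for a code that is not in the primary file is rejected.
    conn.execute("INSERT INTO demography VALUES ('9999', 300, 100)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Taluk-wise totals through the village-taluk hierarchy held in the primary file.
totals = conn.execute("""
    SELECT v.taluk, SUM(d.population)
    FROM villages v JOIN demography d ON d.census_code = v.census_code
    GROUP BY v.taluk""").fetchall()

print(rejected)  # True
print(totals)    # [('T1', 1200)]
```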

7.3 Defining relations between spatial and non-spatial data

The GIS allows the spatial and non-spatial features to be related or linked based upon a defined relationship. A relation in the GIS is a method of relating the same spatial entity to different non-spatial entities through a link key.
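A one-to-one relation on a link key can be sketched as a join that also reports what failed to match, which is worth checking before any analysis. This is a minimal plain-Python sketch, assuming village polygons and census records held as dictionaries that share a hypothetical `census_code` field; it is not the relate mechanism of any specific GIS package.

```python
def link_one_to_one(features, records, key="census_code"):
    """Join spatial features to attribute records on a unique link key.

    Returns (joined, unmatched_feature_codes, unmatched_record_codes).
    Raises ValueError if the key is duplicated on either side, since the
    relation would then not be one-to-one.
    """
    feature_codes = [f[key] for f in features]
    if len(feature_codes) != len(set(feature_codes)):
        raise ValueError("duplicate link key in spatial layer")
    table = {}
    for rec in records:
        if rec[key] in table:
            raise ValueError(f"duplicate link key {rec[key]!r} in attribute table")
        table[rec[key]] = rec
    joined = [{**f, **table[f[key]]} for f in features if f[key] in table]
    unmatched_features = sorted(set(feature_codes) - set(table))
    unmatched_records = sorted(set(table) - set(feature_codes))
    return joined, unmatched_features, unmatched_records

features = [{"census_code": "0101", "polygon_id": 1},
            {"census_code": "0102", "polygon_id": 2}]
records = [{"census_code": "0101", "population": 1200},
           {"census_code": "0103", "population": 900}]
joined, no_record, no_polygon = link_one_to_one(features, records)
print(joined)      # [{'census_code': '0101', 'polygon_id': 1, 'population': 1200}]
print(no_record)   # ['0102']
print(no_polygon)  # ['0103']
```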

The linkages are more pertinent for the village-wise data where village-boundary theme or settlement theme represents the spatial distribution of villages or the settlements and a one-to-one relationship can be defined for each of the village/settlement entity and the non-spatial data for the village/settlement. Apart from this, the village-taluk hierarchy can also be "forced" into all spatial datasets so as to be able to extract taluk-wise spatial feature information - either in spatial format or as non-spatial tabular output.
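Because the non-spatial data is held at village level, taluk-wise and district-wise figures can be derived by summing up the village-taluk-district hierarchy. A minimal sketch with hypothetical codes and field names; only additive counts such as population or households should be summed this way, while rates and densities need to be recomputed from the summed components.

```python
from collections import defaultdict

# Hypothetical village-level records keyed by census code.
villages = {
    "0101": {"taluk": "T1", "population": 1200, "households": 210},
    "0102": {"taluk": "T1", "population": 800, "households": 150},
    "0201": {"taluk": "T2", "population": 1500, "households": 260},
}
TALUK_TO_DISTRICT = {"T1": "D1", "T2": "D1"}

def aggregate(records, key_func, fields):
    """Sum the additive `fields` of the records into groups chosen by `key_func`."""
    totals = defaultdict(lambda: dict.fromkeys(fields, 0))
    for rec in records.values():
        group = totals[key_func(rec)]
        for f in fields:
            group[f] += rec[f]
    return dict(totals)

fields = ["population", "households"]
by_taluk = aggregate(villages, lambda r: r["taluk"], fields)
by_district = aggregate(villages, lambda r: TALUK_TO_DISTRICT[r["taluk"]], fields)
print(by_taluk)     # {'T1': {'population': 2000, 'households': 360}, 'T2': {'population': 1500, 'households': 260}}
print(by_district)  # {'D1': {'population': 3500, 'households': 620}}
```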

7.4 Integration of village boundaries - Issues

One of the important aspects of a GIS database for districts/regions is the combined analysis of the tabular socioeconomic data and the thematic natural resources data. These two discrete datasets have different characteristics. The socioeconomic and developmental data is mainly the data collected by the Census on a village-wise basis. This dataset is based on a village-taluk-district hierarchy and is mainly tabular. As against this, the thematic data on natural resources is based on a spatial framework. These datasets follow the SOI toposheet graticule and thus are based on the Polyconic projection system. An integrated planning exercise would require that these two datasets be combined/analysed together to derive meaningful plan inputs. The integration would be to:

a) merge the attributes of the villages and the natural resources for generating plan scenarios

b) spatial representation of the non-spatial tabular attributes of the villages.

c) amenability to aggregate and abstract the village attributes and the natural resources to the village-taluk-district and SOI graticule (for example 1:50,000 and 1:250,000 scale)

d) generate the village/taluk-wise information of natural resources for tabular updation.

A methodology for integrating the village boundary to a SOI mapbase has been developed at SAC and is based on projection of census village boundaries from a transparency to a standard SOI map base and transfer of village boundaries to the base [SAC and TCPO, 1992].

8.0 DATABASE UPDATION AND LINKAGES

Both the spatial and non-spatial databases will have to be updated frequently so as to have the latest data for further analysis/modelling. Some of the data elements are relatively static and thus could be created once and updated only when there are changes. Such elements are mainly administrative boundaries, elevation points, drainage maps etc. However, the data elements that have to be updated more frequently are as follows:

a) Spatial database: The updation of the spatial database will have to be based mainly on inputs from RS data and on the periodic surveys carried out by different agencies. Updation can be categorised as follows:

- RS data based updation: mainly landuse/cover maps (every year); forest type/density maps (once in two years); urban landuse maps (once a year for major cities and once in 3 years for towns/small cities); geological maps (once in 10 years); geomorphological/hydrogeomorphological maps (once in 3 years); GW potential maps (once in 2 years or whenever drought occurs); flood maps (pre- and post-flood season every year) etc.

- Updation based on survey agency maps: mainly soil maps, forest maps, detailed geological and mineral data, road maps etc. These maps could be acquired from the respective agency and digitised into the database. This could be taken up whenever the maps are available, ideally once in 10 years.

b) Non-spatial data: Much of the non-spatial data is based on census records and would thus be updated only once every 10 years. It would be better if some of the non-spatial data were available more frequently, say once every five years, so as to suit the planning process. A ten-year schedule does not keep pace with ongoing development, as the database needs to reflect intermediate developments more frequently. Otherwise, decade-old data would be used in the planning process, and plans might be suggested for developments that have already taken place.

Data from the GIS database can be exchanged with other computerised databases at district level so as to provide data for further use. This exchange would mean:

a) a non-spatial data exchange, as the district does not have the capability to handle data in spatial format. If the capability to handle spatial data becomes available, spatial data exchange can also be envisaged.

b) a non-spatial representation of all datasets in the GIS database, on either a taluk basis or a village basis.

 


Database management systems

DBMS

DBMS data models have their origins in computer science (Clarke, 1997).

A DBMS contains:


·        A data definition language

·        A data dictionary

·        A data entry module

·        A data update module

·        A report generator

·        A query language


 

Data definition language (DDL):

DDL is the language used to describe the contents of the database (Modarres, 1998). It is the part of the DBMS that allows the user to set up a new database, to specify how many attributes there will be, what the type and length or numerical range of each attribute will be, and how much the user is allowed to do (Clarke, 1997).

It is used to describe, for example, attribute names (field names), data types, location in the database, etc.

This establishes the data dictionary, a catalog of all of the attributes with their legal values and ranges.

The most basic management function is data entry. Since most entry of attribute data is monotonous and may be done by transcription from paper records, the DBMS's data-entry system should be able to enforce the ranges and limits entered into the data dictionary by the data definition language.

All data entry is subject to error, so the first step after entry should be verification; after that, the data must be updated to reflect changes.
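As an illustration, the sketch below uses Python's built-in sqlite3 module, chosen here only as a convenient example DBMS. The CREATE TABLE statement plays the role of the data definition language, and its CHECK constraints record legal values and ranges that the DBMS then enforces during data entry. The soil_unit table, its columns and the permitted ranges are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define attribute names, data types and legal values/ranges.
# The table and column names are hypothetical.
conn.execute("""
    CREATE TABLE soil_unit (
        unit_id   INTEGER PRIMARY KEY,
        texture   TEXT NOT NULL CHECK (texture IN ('sand', 'loam', 'clay')),
        depth_cm  REAL CHECK (depth_cm BETWEEN 0 AND 300)
    )
""")

def enter(unit_id, texture, depth_cm):
    """Data entry: the DBMS rejects values outside the declared ranges."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO soil_unit VALUES (?, ?, ?)",
                         (unit_id, texture, depth_cm))
        return True
    except sqlite3.IntegrityError:
        return False

print(enter(1, "loam", 90))    # True
print(enter(2, "gravel", 50))  # False: texture not in the legal set
print(enter(3, "clay", 450))   # False: depth out of range
```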

Then the DBMS can be used to perform functions such as sorting, reordering, subsetting, and searching. Doing so requires a query language, the part of the DBMS that allows the user to interact with the data to perform those tasks (Clarke, 1997).

Data manipulation and query language: Normally a fourth-generation language (4GL) is supported by a DBMS to form commands for input, edit, analysis, output, reformatting, etc. Some degree of standardisation has been achieved with SQL (Structured Query Language) (Modarres, 1998).

The basic DBMS query operations are sorting, reordering, subsetting, and searching.

The query language is the user interface for searching.
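A minimal sketch of these query operations in SQL, again run through Python's sqlite3 module as an example DBMS. The well table and its values are hypothetical; ORDER BY reorders the records, WHERE selects a subset, and a lookup on the identifier searches for a single record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE well (well_id TEXT PRIMARY KEY, village TEXT, depth_m REAL)")
conn.executemany("INSERT INTO well VALUES (?, ?, ?)", [
    ("W1", "Rampur", 45.0),
    ("W2", "Sonpur", 30.5),
    ("W3", "Rampur", 60.0),
])

# Sorting / reordering: all wells ordered by depth.
by_depth = conn.execute("SELECT well_id FROM well ORDER BY depth_m").fetchall()
# Subsetting: only the wells that meet a condition.
deep = conn.execute(
    "SELECT well_id FROM well WHERE depth_m > 40 ORDER BY well_id").fetchall()
# Searching: locate one record by its identifier.
found = conn.execute(
    "SELECT village, depth_m FROM well WHERE well_id = ?", ("W2",)).fetchone()

print(by_depth)  # [('W2',), ('W1',), ('W3',)]
print(deep)      # [('W1',), ('W3',)]
print(found)     # ('Sonpur', 30.5)
```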


GIS DATABASE 

INTRODUCTION

The real world is too complex for our immediate and direct understanding. We create "models" of reality that are intended to have some similarity with selected aspects of the real world. Data bases are created from these "models" as a fundamental step in coming to know the nature and status of that reality (Modarres, 1998). 

The Geographical Information System (GIS) has two distinct utilisation capabilities - the first pertaining to querying and obtaining information and the second pertaining to integrated analytical modelling. However, both these capabilities depend upon the core of the GIS - the database that has been organised. Many GIS applications have been limited by improper database organisation. The importance of the GIS database stems from the fact that the data elements of the database are closely interrelated and thus need to be structured for easy integration and retrieval. The GIS database also has to cater to the different needs of applications. In general, a proper database organisation needs to ensure the following [Healey, 1991; NCGIA, 1990]:

·        flexibility in the design to adapt to the needs of different users.

·        a controlled and standardised approach to data input and updation.

·        a system of validation checks to maintain the integrity and consistency of the data elements.

·        a level of security for minimising damage to the data.

·        minimising redundancy in data storage.

While the above is a general consideration for database organisation, in a GIS domain these considerations become more pertinent because of the different types and nature of data that need to be organised and stored.

 

What is a database?

Database: a large collection of data in a computer system, organized so that it can be expanded, updated, and retrieved rapidly for various uses. It could be a file or a set of files (Ronli, 1999). File: a collection of organized records of information. A record usually has a record number and record content. The file has a name given by the system or the user (Ronli, 1999).

 

A database is a collection of information related to a particular subject or purpose, such as tracking customer orders or maintaining a music collection (Microsoft, 1997).

 

Database: a self-describing collection of integrated records (Kroenke, 1995).

Database is self-describing: It contains, in addition to the user's source data, a description of its structure. This description is called a data dictionary (or data directory or metadata). It is the data dictionary that makes program/data independence possible.

A database is a collection of integrated records: Bits are aggregated into bytes or characters; characters are aggregated into fields; fields are aggregated into records; and records into files. Bits - characters - fields - records - files.

In addition to the user's data, a database contains metadata and indexes, which are used to represent relationships among the data and also to improve the performance of database applications. The database often also contains data about the applications that use it: the structure of a data entry form or a report is sometimes part of the database, and this is called application metadata. Thus, a database contains four types of data: files of the user's data, metadata, indexes, and application metadata.

User data files + metadata + indexes + application metadata = Database.
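The self-describing property can be seen in SQLite, used here only as an example. SQLite keeps the definitions of its tables and indexes in an internal catalogue table called sqlite_master, inside the same database as the user's data, and PRAGMA table_info lists the columns of a table. Other DBMS products keep equivalent system catalogues under different names. The road table and its index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE road (road_id INTEGER PRIMARY KEY, name TEXT, class TEXT)")
conn.execute("CREATE INDEX idx_road_class ON road (class)")

# The structure is stored inside the database itself (metadata),
# alongside the index built to speed up access.
for obj_type, name in conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY name"):
    print(obj_type, name)
# index idx_road_class
# table road

# Column-level metadata: the data dictionary entry for one table.
print([row[1] for row in conn.execute("PRAGMA table_info(road)")])
# ['road_id', 'name', 'class']
```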

 

A spatial data base is a collection of spatially referenced data that acts as a model of reality. 

Spatial database: stores GEOREFERENCED data. For example, wells with their locations, bank account holders with addresses, and property taxes with boundaries

 

 

INTRODUCTION TO DATABASE PROCESSING

A successful GIS begins with a database, so it is important to first take a look at databases.

In GIS, the database is important as its creation will often account for up to three-quarters of the time and effort involved in developing a geographic information system. (Kenneth et al, 1996).

It is important, however, to view these GIS databases as more than simple stores of information. The database is used to abstract very specific sorts of information about reality and organize it in a way that will prove useful. The database should be viewed as a representation or model of the world developed for a very specific application (Kenneth et al, 1996). Many considerations are therefore involved in the design of a database.

 

GIS has become more powerful as database products have become more powerful and database technology more accessible (Kroenke, 1995). This is because:

·        Personal computer DBMSs have become more powerful and easier to use, and their price has decreased substantially. Products such as Microsoft Access not only provide the power of a true relational DBMS on a PC, but also include facilities for developing GUI-based forms, reports, and menus.

·        New modelling methodologies and tools, especially those based on object-oriented thinking, have become available. Studies show semantic object modelling (say, with SALSA) to be far superior to older techniques such as entity-relationship modelling (say, with IEF): users are able to create better models, faster, and with greater satisfaction.

·        Client-server processing in general, and client-server database processing in particular, has emerged. This enables companies to download data from the mainframe to a server database on a PC, making it easier for personnel to access the database.


 

THE DATA IN DATABASE

Broadly categorised, the basic data for the GIS database has two components:

a) Spatial data - consisting of maps that have been prepared either by field surveys or by the interpretation of Remotely Sensed (RS) data. Some examples of such maps are the soil survey map, geological map, landuse map from RS data, village map etc. Many of these maps are available in analog form, and only recently has some map information become available directly in digital format. Thus, the incorporation of these maps into a GIS depends upon whether they are in analog or digital format, each of which has to be handled differently.

b) Non-spatial data - attributes that complement the spatial data and describe what is at a point, along a line or in a polygon, as well as socio-economic characteristics from the census and other sources. The attributes of a soil category could be the depth of soil, texture, erosion, drainage etc., and those of a geological category could be the rock type, its age, major composition etc. The socio-economic characteristics could be the demographic data or occupation data for a village, or traffic volume data for roads in a city etc. The non-spatial data is mainly available as tabular records in analog form and needs to be converted into digital format for incorporation in GIS. However, the 1991 census data is now available in digital mode, so direct incorporation into the GIS database is possible.

 

Data input in database

There are many methods of entering data into a database. These include newer technologies, such as scanning, feature recognition, raster-to-vector conversion, and image processing, along with traditional digitizing and key entry methods (Kroenke, 1995).

Digitizing on a tablet captures map data by tracing lines from a map by hand, using a cursor and an electronically-sensitive tablet. The result is a string of points with (x, y) values.

Scanning places a map on a glass plate, and passes a light beam over it measuring the reflected light intensity. The result is a grid of pixels. Image size and resolution are important to scanning. Small features on the map can drop out if the pixels are too big.
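The effect of pixel size can be illustrated with a simplified model of scanning; it is not a description of real scanner optics. The sketch assumes the map is a fine grid of inked (1) and blank (0) cells, that each scanned pixel records the fraction of ink in the block it covers, and that a pixel is classified as part of a line when that fraction reaches a threshold of 0.5, an arbitrary choice for this illustration.

```python
def scan(map_grid, pixel_size, threshold=0.5):
    """Coarsely 'scan' a fine binary map: each output pixel averages a
    pixel_size x pixel_size block and is marked 1 if the inked fraction
    reaches the threshold."""
    rows, cols = len(map_grid), len(map_grid[0])
    out = []
    for r in range(0, rows, pixel_size):
        out_row = []
        for c in range(0, cols, pixel_size):
            block = [map_grid[i][j]
                     for i in range(r, min(r + pixel_size, rows))
                     for j in range(c, min(c + pixel_size, cols))]
            out_row.append(1 if sum(block) / len(block) >= threshold else 0)
        out.append(out_row)
    return out

# An 8 x 8 map with a thin line one cell wide in column 2.
fine_map = [[1 if col == 2 else 0 for col in range(8)] for _ in range(8)]

print(sum(map(sum, scan(fine_map, 1))))  # 8: the line survives
print(sum(map(sum, scan(fine_map, 4))))  # 0: the line drops out
```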

Attribute data can be thought of as being contained in a flat file. This is a table of attributes by records, with entries called values.

 

How data is represented in database

It is important to realize that this non-spatial data can be filed away in several different forms depending on how it needs to be used and accessed (Kenneth et al, 1996). Perhaps the simplest method is the flat file or spreadsheet, where each geographic feature is matched to one row of data.

 

Flat Files and Spreadsheets

A flat file or spreadsheet is a simple method for storing data. All records in this data base have the same number of "fields". Individual records have different data in each field with one field serving as a key to locate a particular record (Kenneth et al, 1996). For a person, or a tract of land there could be hundreds of fields associated with the record. When the number of fields becomes lengthy a flat file is cumbersome to search. Also the key field is usually determined by the programmer and searching by other determinants may be difficult for the user. Although this type of database is simple in its structure, expanding the number of fields usually entails reprogramming. Additionally, adding new records is time consuming, particularly when there are numerous fields. Other methods offer more flexibility and responsiveness in GIS.
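A minimal Python sketch of a flat file, assuming a hypothetical parcel file with parcel_id as the key field. Retrieval by the key goes through a dictionary built once on that field, while a search on any other field has to scan every record, which is the weakness described above.

```python
# Hypothetical flat file of land parcels: every record has the same fields,
# and parcel_id serves as the key field.
FIELDS = ("parcel_id", "owner", "landuse", "area_ha")
records = [
    ("P001", "Shah", "agriculture", 2.5),
    ("P002", "Rao", "residential", 0.4),
    ("P003", "Das", "agriculture", 1.1),
]

# Retrieval by the key is fast: build an index from key to record once.
by_key = {rec[0]: rec for rec in records}

def find_by_key(parcel_id):
    return by_key.get(parcel_id)

def find_by_field(field, value):
    """Searching on any other field means scanning every record."""
    i = FIELDS.index(field)
    return [rec for rec in records if rec[i] == value]

print(find_by_key("P002"))  # ('P002', 'Rao', 'residential', 0.4)
print([r[0] for r in find_by_field("landuse", "agriculture")])  # ['P001', 'P003']
```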

 

 

Hierarchical Files

Hierarchical files store data in more than one type of record. This method is usually described as a "parent-child, one-to-many" relationship (Kenneth et al, 1996). One field is key to all records, but data in one record does not have to be repeated in another. This system allows records with similar attributes to be associated together. The records are linked to each other by a key field in a hierarchy of files. Each record, except for the master record, has a higher level record file linked by a key field "pointer". In other words, one record may lead to another and so on in a relatively descending pattern. An advantage is that when the relationship is clearly defined, and queries follow a standard routine, a very efficient data structure results. The database is arranged according to its use and needs. Access to different records is readily available, or easy to deny to a user by not furnishing that particular file of the database. One disadvantage is that one must access the master record, with the key field determinant, in order to link "downward" to other records.
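The parent-child structure can be sketched in Python as records that hold links to their children and a pointer to their parent. This is a hypothetical district-taluk-village hierarchy, not a real hierarchical DBMS; it shows that a search starts at the master record and follows the pointer paths downward.

```python
class Record:
    """One record in a hierarchical (parent-child, one-to-many) file."""
    def __init__(self, key, data, parent=None):
        self.key, self.data, self.parent = key, data, parent
        self.children = []
        if parent is not None:
            parent.children.append(self)  # downward link from parent to child

def find(record, key):
    """Navigate downward from a record, following the pointer paths."""
    if record.key == key:
        return record
    for child in record.children:
        hit = find(child, key)
        if hit is not None:
            return hit
    return None

def path_to_root(record):
    """Follow parent pointers back up to the master record."""
    keys = []
    while record is not None:
        keys.append(record.key)
        record = record.parent
    return keys[::-1]

master = Record("D1", {"name": "District 1"})
t1 = Record("T1", {"name": "Taluk 1"}, master)
t2 = Record("T2", {"name": "Taluk 2"}, master)
Record("V1", {"population": 1200}, t1)
Record("V3", {"population": 1500}, t2)

v3 = find(master, "V3")          # access starts at the master record
print(v3.data["population"])     # 1500
print(path_to_root(v3))          # ['D1', 'T2', 'V3']
```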

 

 

Relational Files

Relational files connect different files or tables (relations) without using internal pointers. Instead, a common data field (a key attribute) is used to join or associate records. The link is not hierarchical (Kenneth et al, 1996). A "matrix of tables" is used to store the information. As long as the tables have a common link, they may be combined by the user to form new inquiries and data output. This is the most flexible system and is particularly suited to SQL (Structured Query Language). Queries are not limited by a hierarchy of files, but instead are based on relationships from one type of record to another that the user establishes. Because of its flexibility, this system is the most popular database model for GIS. Relational systems remain the dominant form of DBMS today (Clarke, 1997).

From the user's standpoint, they are simple: an extension of the flat file model. The major difference is that a database can consist of several flat files, each of which can contain different attributes associated with a record.
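The common-link idea can be sketched with Python's sqlite3 module. Two hypothetical tables share a village_code field; nothing links them in advance, and the user combines them at query time with a join on that field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two separate flat tables that share a common field, village_code.
conn.execute("CREATE TABLE village_census (village_code TEXT, population INTEGER)")
conn.execute("CREATE TABLE village_landuse (village_code TEXT, landuse TEXT, area_ha REAL)")
conn.executemany("INSERT INTO village_census VALUES (?, ?)",
                 [("V1", 1200), ("V2", 800)])
conn.executemany("INSERT INTO village_landuse VALUES (?, ?, ?)",
                 [("V1", "agriculture", 310.0), ("V1", "forest", 45.0),
                  ("V2", "agriculture", 150.0)])

# The tables are combined at query time through the common field;
# no pointer between them is stored in advance.
rows = conn.execute("""
    SELECT c.village_code, c.population, SUM(l.area_ha)
    FROM village_census AS c
    JOIN village_landuse AS l ON l.village_code = c.village_code
    WHERE l.landuse = 'agriculture'
    GROUP BY c.village_code
    ORDER BY c.village_code
""").fetchall()
print(rows)  # [('V1', 1200, 310.0), ('V2', 800, 150.0)]
```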

 

 

 

Flat, Hierarchical, and Relational Files Compared

Flat Files

Advantages:

·        Fast data retrieval

·        Simple structure and easy to program

Disadvantages:

·        Difficult to process multiple values of a data item

·        Adding new data categories requires reprogramming

·        Slow data retrieval without the key

Hierarchical Files

Advantages:

·        Adding and deleting records is easy

·        Fast data retrieval through higher level records

·        Multiple associations with like records in different files

Disadvantages:

·        Pointer path restricts access

·        Each association requires repetitive data in other records

·        Pointers require large amount of computer storage

Relational Files

Advantages:

·        Easy access and minimal technical training for users, as data is kept in different files

·        Flexibility for unforeseen inquiries, as any combination of attributes and records can be assembled as long as they are linked by a key attribute

·        Easy modification and addition of new relationships, data, and records

·        Physical storage of data can change without affecting relationships between records

Disadvantages:

·        New relations can require considerable processing

·        Sequential access is slow

·        Method of storage on disks impacts processing time

·        Easy to make logical mistakes due to flexibility of relationships between records

 

Now, let us consider a few examples of matching applications to database structures.

·        Exploratory research: flat files are easy to organize, and storage space is not a particular problem.

·        Government agencies: hierarchical systems are particularly attractive.

·        Planning and development: a relational system might be justified for its flexibility.

 

 

Why use a database? (Problems that database processing has solved)

 


1.      Data is separated and isolated. If related data is needed, the system manager must determine which parts of which files are needed, decide how the files are related, and coordinate the processing of the files so that the correct data is extracted. In a database-processing system, data is stored in one place and the database management system accesses the stored data. So, the general structure of all database applications is: users - database application - DBMS - database (Kroenke, 1995).

2.      Data is often duplicated. This wastes file space and creates a data integrity problem: if copies of a data item differ, they will produce inconsistent results, making it difficult to determine which is true and thus reducing the credibility of the data (Kroenke, 1995).

3.      Application programs are dependent on the file format. If changes are made to the file format, the application programs must also be changed.

4.      Files are often incompatible with one another. The file format depends on the programming language or product used to generate it, e.g. COBOL versus C programs (Kroenke, 1995).

5.      It is difficult to present the data the way the users view it.


 

 

Benefits of relational model

The data is stored, at least conceptually, in a way the user can readily understand (Kroenke, 1995). Data is stored in tables, and the relationships between the rows of the tables are visible in the data. In earlier database models, the DBMS stored relationships in system data such as indexes, which hid them from the user. Because an RDBMS always stores relationships in the user-visible data, users can obtain information from the database without the assistance of a professional. RDBMSs are particularly useful in decision-support systems (DSS).

 

Microcomputer DBMS products

There has been a lot of development in microcomputer DBMS products: dBase II was not a DBMS, dBase III was a DBMS, and dBase IV was a relational DBMS. Today's DBMS products provide rich and robust user interfaces using graphical user interfaces such as Microsoft Windows. Two such products are Microsoft Access and Borland's Paradox for Windows (Kroenke, 1995).

 

Client server Database Applications

The development of local area networks (LANs) made it possible to link microcomputer CPUs so that they could work simultaneously. This brought advantages (greater performance) but also more problems, and it led to a new style of database processing called the client-server database architecture (Kroenke, 1995).

 

Distributed Database processing

Organisational database applications address the problems of file processing and allow more integrated processing of organisational data. Personal and work-group database systems bring database technology even closer to the user by allowing access to locally managed databases. Distributed database processing combines these types by allowing personal, work-group, and organisational databases to be combined into an integrated but distributed system (Kroenke, 1995).

 

Object-oriented DBMS (OODBMS)

These are DBMSs that arose from the development of a new type of programming called object-oriented programming. They are more complex to work with and use different types of data structures.

 

Data modelling

Data modelling is the process of creating a representation of the user's view of the data.

There are two data modelling tools: the entity-relationship approach and the semantic object approach.

 

Development of database

 


Components of database systems

 

 

Figure: Features and functions of a DBMS (figure from Kroenke, 1995). The recoverable labels are:

·        Database: user's data; metadata; overhead data (indexes, linked lists); application metadata

·        1. Design tools subsystem, used by the developer to produce application programs: table creation tool, form creation tool, query creation tool, report creation tool, procedural language compiler

·        2. Run-time subsystem, used by users through the application programs: form processor, query processor, report writer, procedural language run time

·        3. DBMS engine

 

General strategies

To develop a database, we build a data model that identifies the things to be stored in the database and defines their structure and the relationships among them. This requires familiarity with the users' work, which must be obtained early in the development process by interviewing the users and building the requirements.

There are two general strategies for developing a database: Top-down development and Bottom-up development (Kroenke, 1995)


Top-down development proceeds from the general to the specific. It begins with the goals of the organisation, the means by which those goals can be accomplished, and the information requirements that must be satisfied to reach them; from these, an abstract data model is constructed. Using this high-level model, the development team progressively works downwards towards more and more detailed descriptions and models. Intermediate-level models are also expanded with more detail until the particular databases and related applications can be identified. One or more applications are then selected for development. Over time, the entire high-level data model is transformed into lower-level models, and the indicated systems, databases, and applications are created.

Bottom-up development proceeds in the reverse direction, from the specific to the general. The entity-relationship approach is more effective with top-down development, and the semantic object approach is more effective with bottom-up development.


 

 

FUNDAMENTAL DATA BASE ELEMENTS 

Elements of reality modelled in a GIS data base have two identities: entity and object.

Entity 

An entity is the element in reality. An entity is something that can be identified in the user's work environment, something important to the user of the system, e.g. a particular employee (Kroenke, 1995). An entity is "a phenomenon of interest in reality that is not further subdivided into phenomena of the same kind". For example, a city could be considered an entity: it can be subdivided into component parts, but these parts would not be called cities (they could be districts, neighbourhoods, etc.). By contrast, a forest could be subdivided into smaller forests, and therefore it would not be an entity.

Similar phenomena to be stored in a data base are identified as entity types. An entity type is any grouping of similar phenomena that should eventually get represented and stored in a uniform way, e.g. roads, rivers, elevations and vegetation. Entities are grouped into entity classes, or collections of entities of the same type (Kroenke, 1995).

An entity class is the general form or description of a thing, whereas an instance of an entity class is the representation of a particular entity, e.g. car 123.

 

Attributes 

Entities have attributes or, as they are sometimes called, properties, which describe the entity's characteristics.

An attribute is a characteristic of an entity selected for representation. It is usually non-spatial though some may be related to the spatial character of the phenomena under study (e.g. area and perimeter of a region).

The actual value of the attribute that has been measured (sampled) and stored in the database is called the attribute value.

An entity type is almost always labelled and known by attributes (e.g. a road usually has a name and is identified according to its class such as freeway, state road, etc.). 

Attribute values are often conceptually organised in attribute tables, which list individual entities in the rows and attributes in the columns. Each cell of the table holds the attribute value of a specific attribute for a specific entity.

 

Identifiers

Entity instances have names that identify them. The identifier of an instance is one or more of its attributes (Kroenke, 1995). An identifier may be either unique or non-unique. If it is unique, its value will identify one, and only one, entity instance.

 

Relationships

Entities can be associated with one another in relationships. The E-R model contains both relationship classes and relationship instances. Relationship classes are associations among entity classes, and relationship instances are associations among entity instances. Relationships can have attributes (Kroenke, 1995).

A relationship can include many entities; the number of entities in a relationship is the degree of the relationship. Although the E-R model allows relationships of any degree, most applications of the model involve only relationships of degree two.

In a relational database, a relationship is an association established between common fields (columns) in two tables. Such relationships of degree two are called binary relationships. A relationship can be one-to-one, one-to-many, or many-to-many (1:1, 1:N, N:M).

In a one-to-one relationship, each record in Table A can have only one matching record in Table B, and each record in Table B can have only one matching record in Table A. This type of relationship is not common, because most information related in this way would be in one table. You might use a one-to-one relationship to divide a table with many fields, to isolate part of a table for security reasons, or to store information that applies only to a subset of the main table. For example, you might want to create a table to track employees participating in a fundraising soccer game.

A one-to-many relationship

A one-to-many relationship is the most common type of relationship. In a one-to-many relationship, a record in Table A can have many matching records in Table B, but a record in Table B has only one matching record in Table A.

A many-to-many relationship

In a many-to-many relationship between two tables, one record in either table can relate to many records in the other table. In a many-to-many relationship, a record in Table A can have many matching records in Table B, and a record in Table B can have many matching records in Table A. This type of relationship is only possible by defining a third table (called a junction table) whose primary key consists of two fields - the foreign keys from both Tables A and B. A many-to-many relationship is really two one-to-many relationships with a third table.
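A minimal sketch of a junction table in SQLite via Python, assuming a hypothetical case where a road passes through many villages and a village is served by many roads. The junction table road_village has a primary key made of the two foreign keys, so the many-to-many relationship is stored as two one-to-many relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE road    (road_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE village (village_id TEXT PRIMARY KEY, name TEXT);
    -- Junction table: its primary key is made of the two foreign keys.
    CREATE TABLE road_village (
        road_id    TEXT REFERENCES road(road_id),
        village_id TEXT REFERENCES village(village_id),
        PRIMARY KEY (road_id, village_id)
    );
    INSERT INTO road VALUES ('R1', 'State road 12'), ('R2', 'District road 4');
    INSERT INTO village VALUES ('V1', 'Rampur'), ('V2', 'Sonpur');
    INSERT INTO road_village VALUES ('R1', 'V1'), ('R1', 'V2'), ('R2', 'V1');
""")

def villages_on(road_id):
    return [r[0] for r in conn.execute(
        "SELECT village_id FROM road_village WHERE road_id = ? ORDER BY village_id",
        (road_id,))]

def roads_through(village_id):
    return [r[0] for r in conn.execute(
        "SELECT road_id FROM road_village WHERE village_id = ? ORDER BY road_id",
        (village_id,))]

print(villages_on("R1"))    # ['V1', 'V2']
print(roads_through("V1"))  # ['R1', 'R2']
```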

 

Weak entities

Weak entities are entities whose presence in the database depends on the presence of another entity, e.g. an apartment depends on its building. The E-R model includes a special type of weak entity called an ID-dependent entity, whose identifier includes the identifier of another entity.
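An ID-dependent entity can be sketched in SQLite as a table whose primary key includes the key of its parent. The building and apartment tables are hypothetical. ON DELETE CASCADE is one way, chosen here for illustration, to make apartments disappear with their building; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
conn.executescript("""
    CREATE TABLE building (building_id TEXT PRIMARY KEY, address TEXT);
    -- ID-dependent entity: the apartment's identifier includes the building's.
    CREATE TABLE apartment (
        building_id TEXT NOT NULL
            REFERENCES building(building_id) ON DELETE CASCADE,
        apt_no      INTEGER NOT NULL,
        rooms       INTEGER,
        PRIMARY KEY (building_id, apt_no)
    );
    INSERT INTO building VALUES ('B1', '12 Main Road');
    INSERT INTO apartment VALUES ('B1', 1, 3), ('B1', 2, 2);
""")

def count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

print(count("apartment"))  # 2
with conn:
    conn.execute("DELETE FROM building WHERE building_id = 'B1'")
print(count("apartment"))  # 0: the apartments cannot exist without their building
```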

 

Object 

An object is the element as it is represented in the data base. An object is "a digital representation of all or part of an entity". 

The method of digital representation of a phenomenon varies according to scale, purpose and other factors. e.g. a city could be represented geographically as a point if the area under consideration were continental in scale, while the same city could be geographically represented as an area if we are dealing with a geographical data base for a state or a country. 

The digital representation of entity types in a spatial data base requires the selection of appropriate spatial object types. The object types are listed in Table 3-1 (and illustrated in Figure 3-1) based on the following definition of spatial dimensions. 

·        0-D: an object having a position in space, but no length (e.g. a point). 

·        1-D: an object having a length and composed of two or more 0-D objects (e.g. a line).

·        2-D: an object having a length and width and is bounded by at least three 1-D line segment objects (e.g. an area).

·        3-D: an object having a length, width and height/depth and is bounded by at least four 2-D objects (e.g. a volume).

An object class is the set of objects which represent the set of entities, e.g. the set of points representing the set of wells.

 

0-dimensional object types 

·        point - specifies geometric location

·        node - a topological junction or end point, may specify location

1-dimensional object types 

·        line - a one dimensional object

·        line segment - a direct line between two points

·        string - a sequence of line segments

·        arc - a locus of points that forms a curve that is defined by a mathematical function

·        link - a connection between two nodes

·        directed link - a link with one direction specified

·        chain - a directed sequence of nonintersecting line segments and/or arcs with nodes at each end

·        ring - a sequence of non-intersecting chains, strings, links or arcs with closure

2-dimensional object types 

·        area - a bounded continuous object which may or may not include its boundary

·        interior area - an area not including its boundary

·        polygon - an area consisting of an interior area, one outer ring and zero or more non-intersecting, nonnested inner rings

·        pixel - a picture element that is the smallest nondivisible element of an image
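The minimum-size rules in the dimension definitions above can be expressed as simple Python classes. This is a hypothetical sketch, not a standard library or GIS API: the class names are illustrative, and the checks cover only the number of points and ring closure. They do not test that segments are non-intersecting or that inner rings are non-nested.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Coord = Tuple[float, float]

@dataclass
class Point:
    """0-D: a position in space with no length."""
    xy: Coord

@dataclass
class String:
    """1-D: a sequence of line segments joining two or more points."""
    coords: List[Coord]

    def __post_init__(self):
        if len(self.coords) < 2:
            raise ValueError("a 1-D object needs at least two 0-D points")

@dataclass
class Ring(String):
    """A string with closure: the last point repeats the first."""

    def __post_init__(self):
        super().__post_init__()
        # Three segments need four coordinates, with first == last.
        if len(self.coords) < 4 or self.coords[0] != self.coords[-1]:
            raise ValueError("a ring must be closed and have at least three segments")

@dataclass
class Polygon:
    """2-D: one outer ring and zero or more inner rings (holes)."""
    outer: Ring
    inner: List[Ring] = field(default_factory=list)

parcel = Polygon(Ring([(0, 0), (4, 0), (4, 3), (0, 0)]))
print(len(parcel.outer.coords) - 1)  # 3 segments in the outer ring

try:
    Ring([(0, 0), (4, 0), (4, 3)])  # not closed
except ValueError as err:
    print(err)  # a ring must be closed and have at least three segments
```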

 

Figure 3-1. Spatial object types. 

 

Data base model 

A data base model is a conceptual description of a database defining entity types and their associated attributes.

Each entity type is represented by specific spatial objects.

After the database is constructed, the data base model is a view of the database, which the system can present to the user. 

Examples of data base models can be grouped by application areas; e.g. transportation applications require different data base models than do natural resource applications. 

Layers 

Spatial objects can be grouped into layers, also called overlays, coverages or themes. 

One layer may represent a single entity type or a group of conceptually related entity types. E.g. a layer may have only stream segments or may have streams, lakes, coastline and swamps.

What kind of objects do you use to represent these entity types? 

 

SPATIAL OBJECTS AND DATA BASE MODELS 

The objects in a spatial database are representations of real-world entities with associated attributes. The power of a GIS comes from its ability to look at entities in their geographical context and examine relationships between entities. Thus a GIS data base is much more than a collection of objects and attributes. 

·        How are lines linked together to form complex hydrologic or transportation networks? 

·        How can points, lines, or areas be used to represent more complex entities like surfaces? 

 

 

Point Data 

·        Points represent the simplest type of spatial object. 

·        The choice of entities to be represented as points depends on the scale of the map or application: e.g. on a large-scale map, building structures are encoded as point locations; on a small-scale map, cities are encoded as point locations. 

·        The coordinates of each point can be stored as two additional attributes. 

·        Information on a set of points can be viewed as an extended attribute table.

Each row (or record) represents a point recording all information about the point.
Each column is an attribute (or a field), two of which are the x, y coordinates.
Each point is independent of the others and is represented as a separate row (Figure 3-2).

 

Figure 3-2. Point data attribute table. 
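Here is a minimal Python sketch of the point attribute table described above. It holds one record per point, with the x, y coordinates stored as two ordinary fields. The field names, the sample values and the `points_in_box` helper are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class PointRecord:
    point_id: int
    name: str
    kind: str
    x: float  # easting, stored as an ordinary attribute
    y: float  # northing, stored as an ordinary attribute


# Hypothetical point layer: one row (record) per point.
points = [
    PointRecord(1, "Well A", "well", 312.0, 845.5),
    PointRecord(2, "Well B", "well", 298.4, 910.2),
    PointRecord(3, "Gauge 7", "stream gauge", 405.1, 870.0),
]


def points_in_box(table, xmin, ymin, xmax, ymax):
    """Return IDs of points whose x, y fields fall inside a rectangle."""
    return [
        p.point_id
        for p in table
        if xmin <= p.x <= xmax and ymin <= p.y <= ymax
    ]


print(points_in_box(points, 300, 800, 420, 900))  # [1, 3]
```

Every row is independent, so a query like this looks at one record at a time. No relationships between points are stored.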

 

Line Data 

Lines represent network entities such as: 

·        infrastructure networks, e.g. transportation networks (highway and railway), utility networks (gas, electricity, telephone and water pipe) and airline networks (hubs and routes)

·        natural networks, e.g. river channels 

Network characteristics (Figure 3-3): 

·        A network is composed of nodes and links.

·        The valency of a node is the number of links at the node, e.g. the ends of dangling lines are "1-valent", 4-valent nodes are most common in street networks, and 3-valent nodes are most common in hydrology.

·        A tree network has only one path between any pair of nodes; no loops or circuits are possible. Most river networks are trees.

 

 


 


Figure 3-3. Nodes and links in network entities. 
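A small sketch makes the node and link terms concrete. It assumes a network stored as a list of (node, node) link pairs, which is a hypothetical layout. The code counts the valency of each node and tests whether the network is a tree by checking for loops with a union-find structure.

```python
from collections import Counter


def valency(links):
    """Number of links meeting at each node."""
    counts = Counter()
    for a, b in links:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)


def is_tree(nodes, links):
    """True if the network is connected and contains no loops."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # this link closes a loop (circuit)
        parent[root_a] = root_b
    # Loop-free with n - 1 links means a single connected piece.
    return len(links) == len(nodes) - 1


# Hypothetical river network: sources s1-s3 join at j1 and j2, then reach the mouth.
river = [("s1", "j1"), ("s2", "j1"), ("j1", "j2"), ("s3", "j2"), ("j2", "mouth")]
river_nodes = {n for link in river for n in link}

print(valency(river))               # j1 and j2 are 3-valent, the rest 1-valent
print(is_tree(river_nodes, river))  # True
```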

 

Attributes of network entities: 

·        Link attributes, e.g. transportation: direction of traffic, length, number of lanes, time to pass; pipelines: diameter of pipe, direction of gas flow; electricity: voltage of electrical transmission line, height of towers

·        Node attributes, e.g. transportation: presence of traffic lights and overpass, names of intersecting streets; electricity: presence of shutoff valves, transformers 

 

Area Data 

Area data are represented on area-class maps, or choropleth maps. Boundaries may be defined by natural phenomena, such as lakes, or by people, such as forest stands or census zones. 

Types of areas that can be represented include: 

·        Environmental/natural resource zones: land cover, forest, soil, water bodies

·        Socio-economic zones: census tracts, postcodes

·        Land records: land parcel boundaries, land ownership 

Area coverage (Figure 3-4): 

·        Type 1: entities are isolated areas, possibly overlapping. Any place can be within any number of entities, or none, so the areas do not exhaust the space.

·        Type 2: any place is within exactly one entity. The areas exhaust the space and may not overlap, and every boundary line separates two areas, except for the outer boundary.

·        Any layer of the first type can be converted to one of the second type.

·        Holes and islands (Figure 3-5): areas often have "holes", i.e. areas of different attributes wholly enclosed within them, and more than one primitive single-boundary area (islands) can be grouped into one area object.

 


Figure 3-4. Area coverage: (a) Entities are separate; (b) Entities fill the space; (c) First type represented as second type. 
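Converting a Type 1 layer into a Type 2 layer, as in Figure 3-4(c), amounts to labelling every place with the combination of entities that covers it. For simplicity, the sketch below assumes areas are given as sets of grid cells rather than as vector polygons. A vector GIS would do the same job by polygon overlay. `to_type2` is a hypothetical helper.

```python
def to_type2(width, height, areas):
    """Label every cell with the set of Type 1 entities covering it.

    `areas` maps an entity name to the set of (col, row) cells it covers.
    Each distinct label becomes one Type 2 zone. The empty label stands
    for "no entity", so the zones exhaust the space and never overlap.
    """
    zones = {}
    for row in range(height):
        for col in range(width):
            zones[(col, row)] = frozenset(
                name for name, cells in areas.items() if (col, row) in cells
            )
    return zones


# Hypothetical Type 1 layer on a 4 x 1 strip: A and B overlap in cell (1, 0).
areas = {"A": {(0, 0), (1, 0)}, "B": {(1, 0), (2, 0)}}
zones = to_type2(4, 1, areas)
print(sorted(sorted(z) for z in set(zones.values())))
# [[], ['A'], ['A', 'B'], ['B']]
```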

 




Figure 3-5. Holes and islands. 
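Holes and islands matter when you measure an area object. This minimal sketch assumes each ring is stored as a list of (x, y) vertices. A polygon's area is the area of its outer ring minus the areas of its inner rings (holes), computed here with the shoelace formula. An area object made of several islands has an area equal to the sum of its parts. All names and coordinates are hypothetical.

```python
def ring_area(ring):
    """Unsigned area of a closed ring of (x, y) vertices (shoelace formula).
    The first vertex is not repeated at the end."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(outer, holes=()):
    """Outer ring area minus the areas of any inner rings (holes)."""
    return ring_area(outer) - sum(ring_area(h) for h in holes)


def area_object_area(parts):
    """An area object made of several islands: sum of its polygons."""
    return sum(polygon_area(outer, holes) for outer, holes in parts)


lake = [(0, 0), (10, 0), (10, 10), (0, 10)]
island = [(4, 4), (6, 4), (6, 6), (4, 6)]   # a hole in the lake polygon
pond = [(20, 0), (21, 0), (21, 1), (20, 1)]  # a separate island part

print(polygon_area(lake, [island]))                     # 96.0
print(area_object_area([(lake, [island]), (pond, [])]))  # 97.0
```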

 


Representation of Continuous Surfaces 

Examples of continuous surface:

·        elevation (as part of topographic data)

·        rainfall, pressure, temperature

·        population density 

General nature of surfaces 

·        Critical points

·        peaks and pits - highest and lowest points

·        ridge lines, valley bottoms - lines across which slope reverses suddenly

·        passes - convergence of 2 ridges and 2 valleys 

·        faults - sharp discontinuity of elevation - cliffs 

·        fronts - sharp discontinuity of slope 

·        slopes and aspects can be derived from elevations 
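To show how slope and aspect can be derived from elevations, here is a sketch that works on a grid of elevations using central differences at an interior cell. It makes three assumptions: cells are square with side `cell_size`, row 0 is the northern edge, and aspect is reported as the compass bearing of the downslope direction (degrees clockwise from north). Other conventions and neighbourhood formulas are also in common use.

```python
import math


def slope_aspect(grid, row, col, cell_size):
    """Slope (degrees) and aspect (bearing of the downslope direction,
    degrees clockwise from north) at an interior grid cell.

    Assumes row 0 is the northern edge and columns increase eastward.
    Aspect is None on perfectly flat ground.
    """
    dz_dx = (grid[row][col + 1] - grid[row][col - 1]) / (2 * cell_size)  # east
    dz_dy = (grid[row - 1][col] - grid[row + 1][col]) / (2 * cell_size)  # north
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, None
    # The downslope vector is (-dz_dx, -dz_dy); bearing is measured from north.
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360
    return slope, aspect


# Hypothetical 3 x 3 grid (10 m cells) rising 10 m per cell toward the east.
dem = [[0, 10, 20],
       [0, 10, 20],
       [0, 10, 20]]
slope, aspect = slope_aspect(dem, 1, 1, 10)
print(round(slope, 6), round(aspect, 6))  # 45.0 270.0 (steep, facing west)
```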

Data structures for representing surfaces 

·        Traditional data models do not have a method for representing surfaces; therefore surfaces are represented by the use of points, lines or areas: 

·        Points - grid of elevations 

·        Lines - digitised contours 

·        Areas - TIN (Triangulated irregular network) 
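In a TIN, each triangle is treated as a flat facet. The elevation at any location inside a triangle can therefore be interpolated linearly from the triangle's three vertices. Here is a minimal sketch using barycentric weights. The function name and the (x, y, z) vertex layout are assumptions for illustration.

```python
def tin_elevation(tri, x, y):
    """Linearly interpolate elevation at (x, y) inside one TIN facet.

    `tri` holds three (x, y, z) vertices. Returns None if (x, y) lies
    outside the triangle.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate (zero-area) triangle")
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None  # the point is in some other triangle
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical facet lying on the plane z = 100 + x + 2y.
tri = [(0, 0, 100), (10, 0, 110), (0, 10, 120)]
print(round(tin_elevation(tri, 2, 3), 6))  # 108.0
print(tin_elevation(tri, 10, 10))          # None
```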

 

THE VECTOR GIS 

What is a vector data model? 

·        based on vectors (as opposed to space-occupancy raster structures) (Figure 3-6) 

·        fundamental primitive is a point 

·        objects are created by connecting points with straight lines 

·        areas are defined by sets of lines (polygons)

 



Figure 3-6. Example of vector GIS data. 

 


Arcs 

·        When planar enforcement is used, area objects in one class or layer cannot overlap and must exhaust the space of a layer. 

·        Every piece of boundary line is a common boundary between two areas. 

·        The stretch of common boundary between two junctions (nodes) may be called an edge, a chain, or an arc.

·        Arcs have attributes which identify the polygons on either side (e.g. "left" and "right" polygons) 

·        In which direction must we travel along the arc in order to define "left" and "right"? 

·        Arcs (chains/edges) are fundamental in vector GIS (Figure 3-7).

 

 

 

Figure 3-7. An arc in vector GIS. 
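Left and right only make sense once an arc has a direction, normally from its from-node to its to-node. The sketch below assumes a hypothetical arc record that holds this direction plus the left and right polygon IDs. It uses the sign of a 2-D cross product to decide which side of a straight arc a point lies on. Reversing the arc swaps both the side and the stored polygons, so the answer does not change.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    arc_id: int
    from_node: int
    to_node: int
    vertices: list  # (x, y) pairs ordered from from_node to to_node
    left_poly: str
    right_poly: str


def side_of_segment(start, end, px, py):
    """'left', 'right' or 'on' for point (px, py) seen from start toward end."""
    cross = ((end[0] - start[0]) * (py - start[1])
             - (end[1] - start[1]) * (px - start[0]))
    if cross > 0:
        return "left"
    if cross < 0:
        return "right"
    return "on"


def polygon_beside(arc, px, py):
    """Polygon on the point's side of a straight (two-vertex) arc."""
    side = side_of_segment(arc.vertices[0], arc.vertices[-1], px, py)
    return {"left": arc.left_poly, "right": arc.right_poly}.get(side)


def reversed_arc(arc):
    """The same boundary digitised the other way: nodes, vertices and sides swap."""
    return Arc(arc.arc_id, arc.to_node, arc.from_node,
               list(reversed(arc.vertices)), arc.right_poly, arc.left_poly)


# Hypothetical arc running east along y = 0, with polygon B to the north and A to the south.
arc = Arc(1, 10, 11, [(0, 0), (10, 0)], left_poly="B", right_poly="A")
print(polygon_beside(arc, 5, 3))                # B
print(polygon_beside(reversed_arc(arc), 5, 3))  # B
```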

 

Node

The start or end point of an arc, the location of a point feature, or the intersection of two or more arcs.

 

Polygon

A polygon is a closed chain of arcs; all area features are represented in this way.

 

Data Base Creation 

Data base creation involves several stages:

·        input of the spatial data

·        input of attribute data

·        linking spatial and attribute data 

Once points are entered and geometric lines are created, the topology of the spatial object must be "built" (Figure 3-8). 

Building topology involves calculating and encoding relationships between the points, lines and areas. 

This information may be automatically coded into tables of information in the data base.

Topology is recorded in three data tables, one for each type of spatial element: arcs (arc topology table), nodes (node topology table) and polygons (polygon topology table). A fourth table is used to store the coordinates of the nodes and vertices of each arc.

 



Figure 3-8. Example of "built" topology
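Building topology essentially means deriving the node and polygon tables from the arcs. The sketch below assumes the arc topology table holds (arc ID, from-node, to-node, left polygon, right polygon). It uses a tiny hypothetical dataset: two adjacent squares A and B, with O standing for the outside area. Coordinates live in a separate arc coordinate table and are never consulted when the topology is built.

```python
from collections import defaultdict

# Arc topology table: (arc ID, from-node, to-node, left polygon, right polygon).
ARC_TOPOLOGY = [
    ("a1", 2, 3, "A", "B"),  # shared boundary, digitised northward
    ("a2", 3, 2, "A", "O"),  # rest of A's boundary
    ("a3", 2, 3, "B", "O"),  # rest of B's boundary
]

# Arc coordinate table: vertices of each arc, kept apart from the topology.
ARC_COORDS = {
    "a1": [(10, 0), (10, 10)],
    "a2": [(10, 10), (0, 10), (0, 0), (10, 0)],
    "a3": [(10, 0), (20, 0), (20, 10), (10, 10)],
}


def build_topology(arc_table):
    """Derive the node and polygon topology tables from the arc table."""
    node_table = defaultdict(set)
    polygon_table = defaultdict(set)
    for arc_id, from_node, to_node, left, right in arc_table:
        node_table[from_node].add(arc_id)
        node_table[to_node].add(arc_id)
        polygon_table[left].add(arc_id)
        polygon_table[right].add(arc_id)
    return ({n: sorted(a) for n, a in node_table.items()},
            {p: sorted(a) for p, a in polygon_table.items()})


nodes, polygons = build_topology(ARC_TOPOLOGY)
print(nodes)     # {2: ['a1', 'a2', 'a3'], 3: ['a1', 'a2', 'a3']}
print(polygons)  # {'A': ['a1', 'a2'], 'B': ['a1', 'a3'], 'O': ['a2', 'a3']}
```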

 

Editing

During the topology generation process, problems such as overshoots, undershoots and spikes are either flagged for editing by the user or corrected automatically. 

Automatic editing involves the use of a tolerance value which defines the width of a buffer zone around objects within which adjacent objects should be joined. Tolerance value is related to the precision with which locations can be digitised. 
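The tolerance idea can be sketched as snapping. A digitised line end that falls within the tolerance of an existing node is moved onto that node, and any end further away is flagged for the user. This is a simplified, hypothetical helper. Production systems typically also snap ends onto nearby arcs, not just onto nodes.

```python
import math


def snap_endpoints(endpoints, nodes, tolerance):
    """Snap each line end to the nearest node within `tolerance`.

    Returns the adjusted end points and the list of ends left unsnapped,
    which would be flagged for the user to edit.
    """
    snapped, flagged = [], []
    for x, y in endpoints:
        nearest = min(nodes, key=lambda n: math.hypot(n[0] - x, n[1] - y),
                      default=None)
        if nearest is not None and math.hypot(nearest[0] - x,
                                              nearest[1] - y) <= tolerance:
            snapped.append(nearest)
        else:
            snapped.append((x, y))
            flagged.append((x, y))
    return snapped, flagged


nodes = [(0.0, 0.0), (100.0, 0.0)]
ends = [(0.3, -0.2), (99.8, 0.1), (50.0, 50.0)]  # two near misses, one stray end
print(snap_endpoints(ends, nodes, tolerance=0.5))
# ([(0.0, 0.0), (100.0, 0.0), (50.0, 50.0)], [(50.0, 50.0)])
```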

 

Adding Attributes 

·        Once the objects have been formed by building topology, attributes can be keyed in or imported from other digital data bases. 

·        Once added to the data base, attributes must be linked to the different objects. 

Attribute data is stored and manipulated in entirely separate ways from the locational data. Usually a Relational Data Base Management System (RDBMS) is used to store and manage attribute data and their links to the corresponding spatial objects.
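Here is a minimal sketch of the separation between locational and attribute data, using Python's built-in `sqlite3` module as a stand-in RDBMS. The geometry sits outside the database in a plain dictionary, and the attributes sit in a relational table. The two are linked only through a shared polygon ID. Table and field names are hypothetical.

```python
import sqlite3

# Spatial side: polygon geometry kept outside the RDBMS, keyed by polygon ID.
polygons = {
    101: [(0, 0), (10, 0), (10, 10), (0, 10)],
    102: [(10, 0), (20, 0), (20, 10), (10, 10)],
}

# Attribute side: an ordinary relational table keyed by the same ID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcel_attr "
             "(poly_id INTEGER PRIMARY KEY, owner TEXT, land_use TEXT)")
conn.executemany("INSERT INTO parcel_attr VALUES (?, ?, ?)",
                 [(101, "Smith", "residential"), (102, "Jones", "forest")])


def polygons_where(land_use):
    """Select IDs by attribute in the RDBMS, then fetch the linked geometry."""
    rows = conn.execute("SELECT poly_id FROM parcel_attr WHERE land_use = ? "
                        "ORDER BY poly_id", (land_use,))
    return {pid: polygons[pid] for (pid,) in rows}


print(polygons_where("forest"))  # {102: [(10, 0), (20, 0), (20, 10), (10, 10)]}
```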

 

 

 

 

Things to consider in a database

Essential Differences between raster and vector (Hazelton, 1999)

The essential difference between the two approaches is how they deal with representing things in space.

Raster systems are built on the premise that there is something at all points of interest, so we will record something for every location. The space under consideration is exhausted by a regular data structure. The concern is not with the boundaries between objects as much as with the objects themselves, their interiors, or the phenomena they are representing. Spatial resolution is not as important as complete coverage. The presence of something is more important than its exact extent.

Vector systems are built on the premise that we only need to record and deal with the essential points. If there isn’t something of significance at a location, don’t record anything. Not all locations in space are referenced, and many are simply referenced indirectly, as being inside a polygon. The data structure supports irregular objects and very high resolution. We are interested in the boundaries between objects at least as much as with the objects themselves, often more so. We need precise representation of linear objects, and this need overrides other needs for surface and area modeling of all but the simplest kind. Precision is the watchword in vector GIS, together with making spatial relationships explicit.

The explicit nature of the relationships in vector GIS requires ‘topology’. It also allows much easier analysis of these kinds of relationships, especially connectivity between locations (points), which is done with lines. In raster GIS, we can figure out which cells are the eight surrounding the one we are currently in, so connectivity is implicit in the data structure, and we don’t need all this extra stuff.
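The point about implicit connectivity in raster GIS takes only a few lines to show. The eight neighbours of a cell follow purely from its row and column indices, so nothing about adjacency has to be stored. The helper name is hypothetical.

```python
def eight_neighbours(row, col, nrows, ncols):
    """(row, col) indices of the up-to-eight cells around a cell.

    Nothing is looked up: adjacency follows from the grid indices alone.
    Cells on the edge of the grid simply have fewer neighbours.
    """
    return [
        (r, c)
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
        if (r, c) != (row, col) and 0 <= r < nrows and 0 <= c < ncols
    ]


print(eight_neighbours(0, 0, 3, 3))       # [(0, 1), (1, 0), (1, 1)]
print(len(eight_neighbours(1, 1, 3, 3)))  # 8
```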

 

 

Spaghetti (Hazelton, 1999)

We drew a polygon by just making lines; we never explicitly say in the database ‘this is a polygon’. We often call this kind of representation the Spaghetti data model, after the way that a plate of spaghetti looks (and is structured).

While the spaghetti looks fine, it doesn’t really satisfy our needs. As long as we just need to see a picture, it’s fine. As soon as we need to do anything beyond looking, when we need to get the machine to do some analysis rather than the user, we run into problems.

A simple question might be: “If I follow this road (a line), what other roads join it?” This is rather an important question if you want the machine to tell you how to get from A to B. If we have a spaghetti model for our data storage, we can find out the answer to the question. However, we need to look at every single line and co-ordinate pair in the database and compare it to every line segment in the line we are starting from.

Why is this? At any moment, another line might cross ours, and we won’t know it unless we test every single segment of that line against all the segments of our line, to see if they cross or meet. If there is a crossing or meeting, we can tag that line and keep going, but every such query requires a complete search and test for the entire database. As most GIS databases are fairly large, this is horribly inefficient. We can only consider it in a small system, such as MapInfo, and even there we do give the system a few hints to make things simpler. For big GIS, we use a thing called topology.

 

Topology (Hazelton, 1999)

When we look into the display of the spaghetti, we quickly see the polygons, the intersections and the like. This is because our brains are very powerful parallel-processing systems adapted to make sense of visual data very rapidly. Our lives depend on this capability, today and since the dawn of the species. But computers are painfully slow and awkward at this operation. They are good at crunching numbers, so we have to make the structure in the mess of spaghetti obvious to the machine, in a numerical form. We call this ‘topology.’

Topology is a branch of mathematics that deals with very basic geometric concepts. Way before we think about angle and distance and size, there are more fundamental properties of objects, properties which don’t change when we do a wide range of things to the object. For example, no matter how we manipulate an object, provided we don’t tear it, we don’t change the number of pieces there are of it, or the number of complete holes there are in it. If two objects are connected, the connectedness remains constant no matter how much we rotate, scale, move or otherwise manipulate them. One object is inside another until we tear open the outer one to remove the inner. So you can see that there are some very basic properties of objects that remain constant (or invariant) under a range of things that can be done to them (operations).

Extending this to vector GIS databases, we find that an object is a polygon no matter how many sides it has (beyond a basic minimum); that the holes in an object remain the same no matter how it is transformed into different map projections; that a line goes from one end to the other at all times (i.e., there is a direction associated with it). We can build these things into the GIS by making them explicit. To do this requires a more developed data structure than just the spaghetti.

 

Topological Relationships (Hazelton, 1999)

1)      Ownership and Component-ness

The most fundamental topological relationship is ‘owns’ or ‘is a component of’. (You will note that these two relationships are actually two sides of the one relationship.) This allows us to build a definite structure into all the objects in the vector GIS database. It works like this.

We still have the great line strings of points, with the nodes at the ends. But we now give the line strings a definite identifier, a unique number (to keep the computer happy). We also keep a special note of the nodes in a separate area, linked back to the original data (which includes the co-ordinates). As we build up things like polygon boundaries, we link the line strings into chains, of which we also keep a special note. So we have a series of two-way relationships being built here. The nodes are a component of the lines, which are a component of the chains. The chains own the lines, which in turn own the nodes. We make this explicit in our database, so that we can find where everything is.

One of the beauties of this system is that ownership and component-ship are not exclusive. A node can be the end-point of several lines, for example, and a line obviously forms a boundary between (and is owned by) two polygons. This means that we only need to store things once in the database, reducing redundancy.

At the next level, we take collections of chains that form a closed loop, and reference these as the boundary of a polygon. The polygon then explicitly owns this collection of chains as its boundary. A polygon may have an outer boundary and several inner ones, such as islands in a lake.

We need to make sure that the relationships are two-way, so we need to have the chains refer to the polygons that own them. Each chain can only be owned by a maximum of two polygons in a 2-D representation, so we make explicit which two these are.

You will notice that in all the work to this point, co-ordinates have not been involved. This is an essential part of topology, and it means that the relationships hold true for any map projection of the data.

 

2)      Direction

Another important topological relationship is the direction of a line. We take this to be from its ‘from-node’ to its ‘to-node’, naturally. In most cases, the from-node is simply where the line started to be entered. The user never sees this direction, as it doesn’t affect anything outside the topological structure. In many GIS, you can add a direction to a line, making it a directed line, where the direction has a special meaning, such as a one-way street. The GIS will store this information, but the direction the user wants may run either way relative to the line’s stored topological direction.

With direction, an interesting property occurs in 2-D (but not in 3-D). As we move along the line in its proper direction, we find that the two polygons that the line bounds are on either side, to the left and to the right. So we record the left and right polygon identifiers with the line data, and we use this to provide a link from the line to each polygon.

In fact, because of this role of the line being linked to all the other components, the line assumes major importance in 2-D GIS. In addition, there is a topological property called ‘duality’, which means that there are strong relationships between the components in both directions which must be made explicit for the data structure to work properly. By working with the lines as the basis of the data structure, we have a single step to get to all the other components.

 

3)      Connectivity

The topological relationships we now have as explicit in the database enable us to tell very quickly what lines meet at which nodes. We can choose a single node and find all the lines to which it belongs, and so all the polygons in whose boundary it is a component. This makes it very quick to determine how pipeline networks inter-connect. This simplifies a lot of the kinds of queries that are involved in network analysis.

 

4)      Adjacency

The other aspect of connectivity is the relationship between polygons. If two polygons share a boundary, they are adjacent. If they share just a common point, their adjacency is of a lower order. But by making the fundamental relationships explicit in the database, it is quick and easy to determine this adjacency and its degree. This helps in a number of different applications of spatial analysis.
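Once left and right polygons are recorded on every arc, adjacency falls straight out of the arc table. The sketch assumes a hypothetical (arc ID, from-node, to-node, left, right) layout. The example has four quadrant polygons (NW, NE, SW and SE) meeting at a central node C, with O as the outside. Pairs that share an arc are reported as "edge" adjacency, and pairs that share only a node are reported as the lower-order "point" adjacency.

```python
from collections import defaultdict
from itertools import combinations


def adjacency(arc_table):
    """Classify touching polygon pairs as 'edge' (they share an arc) or
    'point' (they share only a node), using the left/right polygon fields."""
    arcs_of = defaultdict(set)
    nodes_of = defaultdict(set)
    for arc_id, from_node, to_node, left, right in arc_table:
        for poly in (left, right):
            arcs_of[poly].add(arc_id)
            nodes_of[poly].update((from_node, to_node))
    result = {}
    for p, q in combinations(sorted(arcs_of), 2):
        if arcs_of[p] & arcs_of[q]:
            result[frozenset((p, q))] = "edge"
        elif nodes_of[p] & nodes_of[q]:
            result[frozenset((p, q))] = "point"
    return result


# Hypothetical 2 x 2 block of polygons around centre node C.
ARCS = [
    ("cN", "C", "N", "NW", "NE"), ("cS", "C", "S", "SE", "SW"),
    ("cW", "C", "W", "SW", "NW"), ("cE", "C", "E", "NE", "SE"),
    ("nw", "N", "W", "NW", "O"), ("sw", "W", "S", "SW", "O"),
    ("se", "S", "E", "SE", "O"), ("ne", "E", "N", "NE", "O"),
]
adj = adjacency(ARCS)
print(adj[frozenset({"NW", "NE"})])  # edge
print(adj[frozenset({"NW", "SE"})])  # point
```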

 

5) Nestedness

Another topological relationship is that of having things inside other things. The database handles this by referring to closed loops of chains as boundaries, and noting which of them are internal boundaries, i.e. inside another polygon, and which are external. It is then very simple to search for common boundaries and seek nested objects.

 

6) How Many More?

In a 2-D GIS, there are quite a few different topological relationships. There are those between polygons, between polygons and lines, between lines and lines, between lines and points, between polygons and points, and between points and points. Some are quite simple; others are more complex.

As far as polygons and lines are concerned, a topologically sound database can handle only two kinds of relationships (fundamentally). Either the two objects touch along a common boundary (polygons meet each other at lines and points, for example) or they do not touch at all. This is the state in a database when we have ‘built’ the topology.

 

Improper Relationships (Topological Division) (Hazelton, 1999)

When we enter data into a GIS, we can have ‘improper’ topological relationships occur. For instance, we may digitize two polygons in a single layer that overlap, so that at one point we will have two values for a single attribute. This presents us with problems in analysis, so we don’t want this to happen. With spaghetti we can’t control this, but if we build the topology correctly, we can ensure that for each attribute layer, we have single values at any point.

(Remember that part of the basic idea of a vector GIS is that within a polygon, the attribute value is constant, changing sharply at the boundary. This is exactly the same as with a raster GIS: sharp changes at boundaries, no change within. We have not yet got to the point of a GIS that allows continuous variability.)

A similar circumstance arises when we undertake topological overlay. Here we have two layers that, while each has ‘good’ topology, will naturally have polygons that overlap one another. It is perfectly reasonable to have a map where you have overlapping polygons representing different attributes. However, when we want to build the topology in the newly created layer (created by the overlay operation), we need to start breaking the large polygons down into small polygons, such that any one attribute has just one value within the polygon.

So in all cases of importance to us in this course, when we have the data structure in the state where all polygons are in one of the two basic topological states, and the same for lines, etc., we have eliminated all the improper relationships and can now proceed with analysis. We can be confident that we don’t have any ambiguities in the spatial relationships expressed in the database, so that analysis will work properly.

 

Scale, Accuracy, Precision, Resolution (Hazelton, 1999)

An interesting myth that has grown around vector GIS is that of the scale-less database. The argument runs that since we can represent locations to fractions of a millimeter, we can work on a 1 : 1 scale, and so avoid the problems of scale in maps and the like. While this is a nice idea, even with GPS it is still a long way off.

The question is, how good is the input data? If I digitize locations from a 1:24,000 map, a location is good to about 12 meters. If I digitize a 1:500 map, the locations are good to about 0·25 meter. Note that for maps, this data quality is for ‘well-defined points’ only. Points that aren’t well-defined, lines, polygons and the like don’t count! So only a very small part of the map will actually be to that precision. How good is the rest? There are no standards for that part of the map.
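Both figures above are consistent with treating a well-defined point as good to about 0·5 mm on the map sheet, multiplied by the scale denominator. The 0·5 mm value is our assumption, chosen to reproduce those numbers. Published map accuracy standards vary by agency and scale. The same arithmetic suggests that a 1:500,000 source would give roughly 250 m.

```python
def ground_error_m(scale_denominator, map_error_mm=0.5):
    """Ground distance in metres corresponding to an error on the map sheet.

    The 0.5 mm default is an assumption, not a published standard.
    """
    return scale_denominator * map_error_mm / 1000.0


for scale in (24_000, 500, 500_000):
    print(f"1:{scale:,}  ->  about {ground_error_m(scale)} m")
```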

I can measure objects on the Earth with GPS and get precision to 0·1 meter. With good surveying gear, I may even be able to get to 0·01 meter. We are still a long way from a millimeter, let alone a fraction of a millimeter. Yet it is easy to pull up co-ordinates to whatever number of decimal places one wishes.

The quality of the data, its accuracy if you like, is based very much on the precision of the measurements used in the database. But there is nothing in a GIS, in almost every case, to let the user know how good the data actually is while it is being used. You pick a point, and read out the co-ordinates to the fraction of a millimeter, and nothing springs up to say “Well actually, that’s only good to ± 50 meters, you know.” It is very misleading.

As GIS users, you need to be very aware of this issue. It is so easy to be led astray here, and many of your less well-educated users may fall into these pitfalls. Remember the Garbage In, Gospel Out situation. This is a very easy place to see it happen.

The resolution of the computer hardware is also an issue here. ARC/INFO works with real numbers (floating point) for co-ordinates, and these can be either single precision or double precision. Single precision is good to 6 or 7 significant figures, while double precision is good to 14 to 16 significant figures. If you are recording locations using UTM co-ordinates, you will only get meter resolution if you use single precision and the full co-ordinates. MGE, on the other hand, uses integers, so that every location is ultimately expressed as a number between 0 and 4·2 billion. There can be questions of the fine-grained nature of this, but if you are aware of the differences and what is happening, things will be OK.
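The single-precision problem is easy to demonstrate with Python's `struct` module, which can round-trip a number through a 4-byte IEEE 754 float. The northing below is a hypothetical full UTM co-ordinate in metres. Near five million metres, adjacent single-precision values are 0·5 m apart, so the stored value shifts by about 0·22 m. The last lines show a workaround that we assume here for illustration: storing an offset from a local origin keeps the numbers small, so single precision resolves them far more finely.

```python
import struct


def as_single(x):
    """Round-trip a value through a 4-byte IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


northing = 5_123_456.78            # hypothetical full UTM northing, metres
stored = as_single(northing)
print(stored)                      # 5123457.0
print(round(stored - northing, 2)) # 0.22 (metres shifted by rounding)

# Offset from an assumed local origin: a small number keeps fine resolution.
local = northing - 5_123_000
print(abs(as_single(local) - local) < 1e-4)  # True
```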

 

Problems with Vector GIS (Hazelton, 1999)

A full-blown vector GIS, especially with an associated raster component, is an awesome system. ARC/INFO sports over 3,000 commands, while MGE is not far behind it (although it has a much more mouse-windows oriented interface). It is very easy to get lost in the complexity and intricacies of these products.

We have already looked at questions of precision, accuracy, resolution and scale. Vector systems have no built-in ‘check’ of the raster cell size as a give-away about their resolution. In many cases there is no metadata or data quality information to let you know about the data in the database. You may never know that part of your database was digitized from a 1:500,000 map, while all the rest is 1:24,000, and yet that difference could play havoc with analyses performed on the data.

Another issue we haven’t touched on is the question of data conversion from raster to vector. We often need to do this to help a vector analysis. When the vectors are produced, there may be nothing (in the lineage part of data quality) to let you know that these were converted from a raster dataset of some resolution. When the vectors are smoothed and the data is included, how will we know there is anything different about those lines and polygons?

Similar problems going from vector to raster are less of an issue, as the vector looks like it should be of a higher resolution and converts easily. But was the vector as good as the raster resolution? How can you tell? It is surprising how many raster GIS have the same resolution as a vector system.

As with all science, you can avoid fooling other people if you first don’t fool yourself. If you know about the system, its capabilities, the data and what it should be able to achieve, you can do well with vector GIS.

Conclusions (Hazelton, 1999)

Vector GIS is a powerful tool for spatial representation and analysis. Yet it is open to misuse and abuse, like any other information system. Some of the potential traps have been pointed out, and you must be aware of them.

If you can focus on the application rather than the hardware and software, you will do a good job with GIS in general.

 

 

GIS database design

Before actually building the tables, forms, and other objects that will make up your database, it is important to take time to design your database. A good database design is the keystone to creating a database that does what you want it to do effectively, accurately, and efficiently.

 

Database design stages

1.      conceptual database design (focuses on the content of the database, i.e. the users' GIS data needs are studied to obtain the GIS functional and data requirements; it is done by listing the database elements)

2.      physical database design (the actual structure of the database is developed and documented based on the content (features and attributes) identified above)

3.      database implementation (the actual coding of the physical database)

 

These are the basic steps in designing a database:

1          Determine the purpose of your database.

2          Determine the tables you need in the database.

3          Determine the fields you need in the tables.

4          Identify fields with unique values.

5          Determine the relationships between tables.

6          Refine your design.

7          Add data and create other database objects (tables, queries, forms, reports, macros, and modules).

8          Use Microsoft Access analysis tools.

 

Determine the purpose of your database

The first step in designing a database is to determine the purpose of the database and how it's to be used. You need to know what information you want from the database. From that, you can determine what subjects you need to store facts about (the tables) and what facts you need to store about each subject (the fields in the tables).

 

Talk to people who will use the database. Brainstorm about the questions you'd like the database to answer. Sketch out the reports you'd like it to produce. Gather the forms you currently use to record your data. Examine well-designed databases similar to the one you are designing.

 

Determine the tables you need

Determining the tables can be the trickiest step in the database design process. That's because the results you want from your database — the reports you want to print, the forms you want to use, the questions you want answered — don't necessarily provide clues about the structure of the tables that produce them.

 

A table is the fundamental structure of a relational database management system. A table is an object that stores data in records (rows) and fields (columns). The data is usually about a particular category of things, such as employees or orders.

 

A table should not contain duplicate information, and information should not be duplicated between tables.

When each piece of information is stored in only one table, you update it in one place. This is more efficient, and also eliminates the possibility of duplicate entries that contain different information. For example, you would want to store each customer address and phone number once, in one table.

 

Each table should contain information about one subject.

When each table contains facts about only one subject, you can maintain information about each subject independently from other subjects. For example, you would store customer addresses in a different table from the customers' orders, so that you could delete one order and still maintain the customer information.
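You can try out the two rules above with Python's built-in `sqlite3` module. This sketch uses hypothetical customer and order tables. The customer's phone number is stored once, so a single update is seen by every order. Deleting an order leaves the customer record intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id  INTEGER PRIMARY KEY,
    company_name TEXT NOT NULL,
    address      TEXT,
    phone        TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  TEXT
);
INSERT INTO customers VALUES (1, 'Acme Surveys', '12 High St', '555-0100');
INSERT INTO orders VALUES (10, 1, '2024-03-01'), (11, 1, '2024-04-15');
""")

# The phone number lives in one row, so one UPDATE changes it for every order.
conn.execute("UPDATE customers SET phone = '555-0199' WHERE customer_id = 1")

# Orders are a separate subject: deleting one leaves the customer untouched.
conn.execute("DELETE FROM orders WHERE order_id = 10")

rows = conn.execute("""
    SELECT o.order_id, c.company_name, c.phone
    FROM orders AS o JOIN customers AS c ON c.customer_id = o.customer_id
""").fetchall()
print(rows)  # [(11, 'Acme Surveys', '555-0199')]
```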

 

In table Datasheet view, you can add, edit, or view the data in a table. You can also check the spelling and print your table's data, filter or sort records, change the datasheet's appearance, or change the table's structure by adding or deleting columns.

You can sort, filter, or find records in the rows of your datasheet by the data in one or more adjacent columns.

           

You use a unique tag called a primary key to identify each record in your table. Just as a license plate number identifies a car, the primary key uniquely identifies a record. A table's primary key is used to refer to a table's records in other tables.

 

Determine the fields you need

Each table contains information about one subject, and each field in a table contains individual facts about the table's subject. For example, a customer table may include company name, address, city, state, and phone number fields.

A field is an element of a table that contains a specific item of information, such as last name. A field is represented by a column or cell in a datasheet.

When sketching out the fields for each table, keep these tips in mind:

·        Relate each field directly to the subject of the table.

·        Don't include derived or calculated data (data that is the result of an expression).

·        Include all the information you need.

·        Store information in its smallest logical parts (for example, First Name and Last Name, rather than Name.)
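Here is a short sketch of the last two tips, again using `sqlite3` with hypothetical tables. Names are stored as separate first-name and last-name fields. A line total is calculated in a query from quantity and unit price rather than stored as a field of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY,
                       first_name TEXT, last_name TEXT);
CREATE TABLE order_lines (line_id INTEGER PRIMARY KEY,
                          quantity INTEGER, unit_price REAL);
INSERT INTO contacts VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
INSERT INTO order_lines VALUES (1, 3, 2.5), (2, 1, 10.0);
""")

# Smallest logical parts: full names are assembled (and sorted) on demand.
names = conn.execute("SELECT first_name || ' ' || last_name FROM contacts "
                     "ORDER BY last_name").fetchall()

# Derived data is calculated in the query, never stored in the table.
totals = conn.execute("SELECT line_id, quantity * unit_price FROM order_lines "
                      "ORDER BY line_id").fetchall()

print(names)   # [('Ada Lovelace',), ('Alan Turing',)]
print(totals)  # [(1, 7.5), (2, 10.0)]
```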

 

Identify fields with unique values

In order for the DBMS to connect information stored in separate tables (for example, to connect a customer with all of the customer's orders), each table in your database must include a field or set of fields that uniquely identifies each individual record in the table. Such a field or set of fields is called a primary key.

The power of a relational database system such as Microsoft Access comes from its ability to quickly find and bring together information stored in separate tables using queries, forms, and reports. Once you designate a primary key for a table, the DBMS will prevent any duplicate or Null values from being entered in the primary key fields, to ensure uniqueness.

 

A query is a question about the data stored in your tables, or a request to perform an action on the data. A query can bring together data from multiple tables to use as the source of data for a form or report.

 

A form is a database object on which you place controls for taking actions or for entering, displaying, and editing data in fields.

 

A report is a database object that presents information formatted and organized according to your specifications. Examples of reports are sales summaries, phone lists, and mailing labels.

There are three kinds of primary keys that can be defined in Microsoft Access: AutoNumber, single-field, and multiple-field.
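SQLite has no AutoNumber type, but the three kinds of key can be approximated as shown below. This is an analogy, not Access itself. The first kind is an `INTEGER PRIMARY KEY`, which SQLite fills in automatically. The second is a single-field key taken from the data, and the third is a multiple-field key. Table names are hypothetical. Unlike Access, SQLite allows NULL in most primary key columns unless `NOT NULL` is declared, so the sketch declares it explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Like AutoNumber: SQLite fills in the next integer when none is given.
CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, owner TEXT);
-- Single-field key taken from the data; NOT NULL declared explicitly.
CREATE TABLE zones (zone_code TEXT NOT NULL PRIMARY KEY, description TEXT);
-- Multiple-field key: only the combination must be unique.
CREATE TABLE parcel_zones (
    parcel_id INTEGER NOT NULL,
    zone_code TEXT NOT NULL,
    PRIMARY KEY (parcel_id, zone_code)
);
""")


def try_insert(sql, params=()):
    """Run an INSERT; return False if the DBMS rejects it as a key violation."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn.execute("INSERT INTO parcels (owner) VALUES ('Smith')")
conn.execute("INSERT INTO parcels (owner) VALUES ('Jones')")
ids = [r[0] for r in conn.execute("SELECT parcel_id FROM parcels ORDER BY parcel_id")]
print(ids)  # [1, 2]

first = try_insert("INSERT INTO zones VALUES (?, ?)", ("R1", "Residential"))
duplicate = try_insert("INSERT INTO zones VALUES (?, ?)", ("R1", "Duplicate"))
null_key = try_insert("INSERT INTO zones VALUES (?, ?)", (None, "No key"))
print(first, duplicate, null_key)  # True False False
```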

 

 

Determine the relationships between tables

Now that you've divided your information into tables and identified primary key fields, you need a way to tell the DBMS how to bring related information back together again in meaningful ways. To do this, you define relationships between tables.

A foreign key is one or more table fields that refer to the primary key field or fields in another table. A foreign key indicates how the tables are related: the data in the foreign key and primary key fields must match.
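You can see a foreign key enforcing this match in `sqlite3`. The tables are hypothetical. Note that SQLite only checks foreign keys after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
);
INSERT INTO customers VALUES (1, 'Acme Surveys');
INSERT INTO orders VALUES (10, 1);
""")


def add_order(order_id, customer_id):
    """Insert an order; return False if its customer_id has no match."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, customer_id))
        return True
    except sqlite3.IntegrityError:
        return False


ok_insert = add_order(11, 1)    # customer 1 exists
bad_insert = add_order(12, 99)  # no customer 99, so the DBMS rejects it
print(ok_insert, bad_insert)    # True False
```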

 

Refine the design

After you have designed the tables, fields, and relationships you need, it's time to study the design and detect any flaws that might remain. It is easier to change your database design now, rather than after you have filled the tables with data.

 

Use Microsoft Access to create your tables, specify relationships between the tables, and enter a few records of data in each table. See if you can use the database to get the answers you want. Create rough drafts of your forms and reports and see if they show the data you expect. Look for unnecessary duplications of data and eliminate them.

 

Enter data and create other database objects

When you are satisfied that the table structures meet the design goals described here, it's time to add all your existing data to the tables. You can then create any queries, forms, reports, macros, and modules that you may want.

 

Use Microsoft Access analysis tools

Microsoft Access includes two tools that can help you to refine your database design. The Table Analyzer Wizard can analyze the design of one table at a time, can propose new table structures and relationships if appropriate, and can restructure a table into new related tables if that makes sense.

 

The Performance Analyzer can analyze your entire database and make recommendations and suggestions for improving it. The wizard can also implement these recommendations and suggestions.

 

For additional ideas on designing a database, you may want to look at the Northwind sample database and the database schemas for one or more of the databases that you can create with the Database Wizard.

 

What is involved in the design of a database

·        the logic elements (they provide the positional (x,y) reference structure that holds graphic information, and are designated as nodes, links, chains, and areas)

·        the graphic elements, which are assigned to logic elements (i.e. designing the graphic elements for the features that have to be represented, maintained, and accessed graphically)

·        the attributes (alphanumeric data), which are assigned or linked to the features, together with the display rules for the attributes

·        GIS data relationships (i.e., relationships between feature classes and their attribute types, among attribute types, and among features)

·        the digital data to be included in the database, e.g. raster images, satellite imagery, existing digital land-based data (maps), and facilities/assets data

·        the logical structure of the database, which has to relate similar data types to each other, either through layering or object-based approaches
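A minimal sketch of how the logic elements listed above might be represented in code, assuming a simple topological model in which a link joins two nodes, a chain is an ordered sequence of links, and an area is bounded by a closed chain. The class names, the closure check, and the `feature_id` used to link attributes are illustrative assumptions, not the structure of any particular GIS package.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    node_id: int
    x: float
    y: float

@dataclass(frozen=True)
class Link:
    start: Node
    end: Node

@dataclass
class Chain:
    links: list

    def is_connected(self) -> bool:
        # Each link must start where the previous one ended.
        return all(a.end == b.start for a, b in zip(self.links, self.links[1:]))

    def is_closed(self) -> bool:
        # A closed chain is connected and returns to its starting node.
        return bool(self.links) and self.is_connected() and self.links[-1].end == self.links[0].start

@dataclass
class Area:
    feature_id: str                      # hypothetical key linking to attribute records
    boundary: Chain
    attributes: dict = field(default_factory=dict)

    def __post_init__(self):
        if not self.boundary.is_closed():
            raise ValueError("an area must be bounded by a closed chain")

n1, n2, n3 = Node(1, 0, 0), Node(2, 10, 0), Node(3, 10, 10)
ring = Chain([Link(n1, n2), Link(n2, n3), Link(n3, n1)])
plot = Area("PLOT-7", ring, {"land_use": "residential"})
print(plot.boundary.is_closed())  # True
```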

 

 

 

 

 

 

 

 

 

 

Should I use a macro or Visual Basic?

 

In Microsoft Access, you can accomplish many tasks with macros or through the user interface. In many other database programs, the same tasks require programming. Whether to use a macro or Visual Basic for Applications often depends on what you want to do.

When should I use a macro?

Macros are an easy way to take care of simple details such as opening and closing forms, showing and hiding toolbars, and running reports. You can quickly and easily tie together the database objects you've created because there's little syntax to remember; the arguments for each action are displayed in the lower part of the Macro window.

 

In addition to the ease of use macros provide, you must use macros to:

 

·           Make global key assignments.

·           Carry out an action or series of actions when a database first opens. However, you can use the Startup dialog box to cause certain things to occur when a database opens, such as opening a form.

 

When should I use Visual Basic?

You should use Visual Basic instead of macros if you want to:

 

·           Make your database easier to maintain. Because macros are separate objects from the forms and reports that use them, a database containing many macros that respond to events on forms and reports can be difficult to maintain. In contrast, Visual Basic event procedures are built into the form's or report's definition. If you move a form or report from one database to another, the event procedures built into the form or report move with it.

·           Create your own functions. Microsoft Access includes many built-in functions, such as the IPmt function, which calculates an interest payment. You can use these functions to perform calculations without having to create complicated expressions. Using Visual Basic, you can also create your own functions either to perform calculations that exceed the capability of an expression or to replace complex expressions. In addition, you can use the functions you create in expressions to apply a common operation to more than one object.
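The idea of a custom function that can be called from expressions has a close analogue in Python's sqlite3 module, where `Connection.create_function` registers a Python function for use inside SQL. This is a sketch of the idea, not the Access mechanism; the density function, its SQL name, and the zone table are hypothetical.

```python
import sqlite3

def persons_per_hectare(population, area_m2):
    """Hypothetical custom function replacing a longer inline expression."""
    if not area_m2:
        return None  # avoid division by zero for missing or zero areas
    return round(population / (area_m2 / 10_000), 2)

conn = sqlite3.connect(":memory:")
# Register the function under the name used in SQL, taking two arguments.
conn.create_function("persons_per_ha", 2, persons_per_hectare)
conn.execute("CREATE TABLE zone (zone_id TEXT PRIMARY KEY, population INTEGER, area_m2 REAL)")
conn.executemany("INSERT INTO zone VALUES (?, ?, ?)",
                 [("Z1", 500, 20_000.0), ("Z2", 120, 0.0)])

rows = conn.execute(
    "SELECT zone_id, persons_per_ha(population, area_m2) FROM zone ORDER BY zone_id"
).fetchall()
print(rows)  # [('Z1', 250.0), ('Z2', None)]
```

Once registered, the same function can be reused in any query on that connection, which mirrors the point about applying a common operation to more than one object.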

 

·           Mask error messages. When something unexpected happens while a user is working with your database, and Microsoft Access displays an error message, the message can be quite mysterious to the user, especially if the user isn't familiar with Microsoft Access. Using Visual Basic, you can detect the error when it occurs and either display your own message or take some action.
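The same idea expressed in Python (an analogue of VBA error handling, not Access code): catch the database's own exception and replace its message with one the user can act on. The `add_person` function, the table, and the message wording are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def add_person(person_id, name):
    """Insert a person, translating database errors into plain messages."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO person VALUES (?, ?)", (person_id, name))
        return "Saved."
    except sqlite3.IntegrityError:
        # Hide the raw constraint message; tell the user what to fix instead.
        return f"Person {person_id} could not be saved: the ID is already used or the name is missing."

print(add_person(1, "Achieng"))
print(add_person(1, "Wasswa"))
```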

·           Create or manipulate objects. In most cases, you'll find that it's easiest to create and modify an object in that object's Design view. In some situations, however, you may want to manipulate the definition of an object in code. Using Visual Basic, you can manipulate all the objects in a database, as well as the database itself.

 

·           Perform system-level actions. You can carry out the RunApp action in a macro to run another Windows-based or MS-DOS–based application from your application, but you can't use a macro to do much else outside Microsoft Access. Using Visual Basic, you can check to see if a file exists on the system, use Automation or dynamic data exchange (DDE) to communicate with other Windows-based applications such as Microsoft Excel, and call functions in Windows dynamic-link libraries (DLLs).

 

·           Manipulate records one at a time. You can use Visual Basic to step through a set of records one record at a time and perform an operation on each record. In contrast, macros work with entire sets of records at once.
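The contrast between stepping through records and working on a whole set at once can be sketched with SQLite from Python (an assumption; in Access the record-by-record approach would use a VBA recordset). Both versions below produce the same result on a hypothetical person table: the loop allows arbitrary per-record logic, while the single UPDATE is usually simpler and faster.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, age INTEGER, age_group TEXT)")
    conn.executemany("INSERT INTO person (age) VALUES (?)", [(4,), (17,), (35,), (70,)])
    return conn

def classify(age):
    """Hypothetical per-record rule."""
    if age < 15:
        return "child"
    return "adult" if age < 65 else "elderly"

# One record at a time: read each row, decide, write it back.
a = make_db()
for person_id, age in a.execute("SELECT person_id, age FROM person").fetchall():
    a.execute("UPDATE person SET age_group = ? WHERE person_id = ?", (classify(age), person_id))

# Whole set at once: one UPDATE with a CASE expression.
b = make_db()
b.execute("""
    UPDATE person SET age_group = CASE
        WHEN age < 15 THEN 'child'
        WHEN age < 65 THEN 'adult'
        ELSE 'elderly' END
""")

query = "SELECT age_group FROM person ORDER BY person_id"
print(a.execute(query).fetchall() == b.execute(query).fetchall())  # True
```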

·           Pass arguments to your Visual Basic procedures. You can set arguments for macro actions in the lower part of the Macro window when you create the macro, but you can't change them when the macro is running. With Visual Basic, however, you can pass arguments to your code at the time it is run or you can use variables for arguments — something you can't do in macros. This gives you a great deal of flexibility in how your Visual Basic procedures run.
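Passing arguments at run time has a direct counterpart in parameterised queries. The hypothetical `residents` function below receives its arguments when it is called and hands them to SQL through `?` placeholders (the sqlite3 convention), which also avoids assembling SQL strings by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, building_no TEXT, age INTEGER)")
conn.executemany("INSERT INTO person (building_no, age) VALUES (?, ?)",
                 [("B001", 30), ("B001", 8), ("B002", 45)])

def residents(building_no, min_age=0):
    """Return IDs of residents of one building, filtered by an age limit given at run time."""
    sql = "SELECT person_id FROM person WHERE building_no = ? AND age >= ? ORDER BY person_id"
    return [row[0] for row in conn.execute(sql, (building_no, min_age))]

print(residents("B001"))              # [1, 2]
print(residents("B001", min_age=18))  # [1]
```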

 

Structured Query Language (SQL)

 

A language used in querying, updating, and managing relational databases. SQL can be used to retrieve, sort, and filter specific data to be extracted from the database.
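A short example of SQL retrieving, filtering, and sorting data, run through Python's sqlite3 module; the road table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE road (road_no TEXT PRIMARY KEY, road_name TEXT, length_m REAL)")
conn.executemany("INSERT INTO road VALUES (?, ?, ?)", [
    ("R1", "Main Road", 1200.0),
    ("R2", "Market Street", 350.0),
    ("R3", "Station Road", 800.0),
])

# Retrieve two columns, keep roads longer than 500 m, longest first.
rows = conn.execute("""
    SELECT road_no, road_name
    FROM road
    WHERE length_m > 500
    ORDER BY length_m DESC
""").fetchall()
print(rows)  # [('R1', 'Main Road'), ('R3', 'Station Road')]
```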


The database design had to take care of both the basic data elements (micro data) and the aggregated data (macro data), as data were obtained in both formats. Records of individual persons and households collected in the survey are stored in their raw form as well as in their final corrected form, together with the results of processing in the form of aggregations, with a view to preserving them for the future and making access as easy as possible at all times. The main advantages of a micro-database include the possibility of retrieving data at, in theory, any level of detail and of building sampling frames. Since micro data could be used illegally in efforts to disclose sensitive information, privacy concerns must always be taken into consideration; in this case, the names were restricted and removed from the general display ( ). Aggregated census data were stored in aggregated form to preserve earlier aggregations and to provide readily usable information. Micro data were saved to allow aggregations to be made that were not programmed initially.
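The micro/macro arrangement described above can be sketched as one table of individual records, a stored aggregate table, and a view that leaves names out of general display. This uses SQLite; the table, view, and column names are hypothetical, and a view is only one of several ways such a restriction could be implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Micro data: one row per individual, including the sensitive name field.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY, name TEXT, household_id INTEGER,
        zone TEXT, sex TEXT, age INTEGER
    );
    INSERT INTO person VALUES
        (1, 'A', 10, 'Z1', 'F', 34), (2, 'B', 10, 'Z1', 'M', 36),
        (3, 'C', 11, 'Z2', 'F', 7),  (4, 'D', 12, 'Z2', 'F', 61);

    -- General display: every field except the name.
    CREATE VIEW person_public AS
        SELECT person_id, household_id, zone, sex, age FROM person;

    -- Macro data: an aggregation stored as-is for ready use.
    CREATE TABLE zone_population AS
        SELECT zone, COUNT(*) AS persons FROM person GROUP BY zone;
""")

# A new aggregation, not planned initially, derived from the micro data.
females_by_zone = conn.execute(
    "SELECT zone, COUNT(*) FROM person_public WHERE sex = 'F' GROUP BY zone ORDER BY zone"
).fetchall()
print(females_by_zone)  # [('Z1', 1), ('Z2', 2)]
```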

 

Demographic Geocoding/Georeferencing

Georeferencing is the process of assigning a geographic location (e.g. latitude and longitude) to a geographic feature based on its address; here it was carried out to convert existing addresses automatically into a GIS database. For this to be possible, the digital record for each feature must have a field that can be linked to a geographic base file with known geographic coordinates.

Considering the way population data are collected through field surveys, which are the main source of data, the data have to be georeferenced (geocoded) before they are analysed in GIS. Demographic data are usually referenced by point and by area; as highlighted by Bracken (1994), the two can be integrated in three ways. First, point address locations may be allocated to census zones, so that in effect the address data become another aggregate field of the zonal record. Second, each address location can be assigned data from its enveloping zone, so that the point takes on the attributes of its surrounding area. Third, both types of data can be re-represented geographically on a neutral base in the form of a georeferenced grid. It is for this third alternative that Bracken (1994) developed a surface model, which generates the spatial distribution of population as a fine, variable-resolution geographical grid, an approach advocated and developed further by Bracken and Martin (1995). This technique is partly employed here to derive surfaces from points to represent polygons, but using buildings as the georeferencing spatial units, since the information recorded during population data collection includes the place of residence, mostly as a building number. This was done to provide a way to disaggregate demographic analysis in a form that can easily be combined with other spatial analysis. It is accomplished by transferring the attributes of the larger feature (e.g. a road) to the smaller features (buildings), so that individuals are geocoded on the right road, as explained further below.
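The first two of Bracken's (1994) integration methods both depend on deciding which zone encloses each address point. A minimal sketch, assuming zones are simple polygons given as vertex lists and using the standard ray-casting point-in-polygon test; the test was chosen here for illustration and is not stated in the source. Zone names, coordinates, and the income attribute are hypothetical, and points lying exactly on a shared boundary would need an explicit tie rule in real use.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count edge crossings of a ray running to the right of (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

zones = {
    "Z1": {"polygon": [(0, 0), (10, 0), (10, 10), (0, 10)], "median_income": 120},
    "Z2": {"polygon": [(10, 0), (20, 0), (20, 10), (10, 10)], "median_income": 90},
}
addresses = {"H1": (2, 3), "H2": (8, 9), "H3": (15, 5)}

def enclosing_zone(point):
    for name, zone in zones.items():
        if point_in_polygon(*point, zone["polygon"]):
            return name
    return None

# Method 1: points allocated to zones become another aggregate field of the zone.
counts = {name: 0 for name in zones}
for point in addresses.values():
    zone = enclosing_zone(point)
    if zone is not None:
        counts[zone] += 1

# Method 2: each point takes on the attributes of its enveloping zone
# (assumes every address falls inside some zone).
point_attrs = {h: zones[enclosing_zone(p)]["median_income"] for h, p in addresses.items()}

print(counts)       # {'Z1': 2, 'Z2': 1}
print(point_attrs)  # {'H1': 120, 'H2': 120, 'H3': 90}
```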

There are many georeferencing techniques (Cowen, 1997); this thesis employed three: 1) assigning an entirely new unique field; 2) using database queries to read fields from tables and join them to other tables; and 3) georeferencing any set of addresses accurately by joining it to the geographic base file on the basis of common fields.

With that we are in position

Once a particular address is located on the map, its coordinates can usually be read directly from the screen.

For each building, the road number was added to the building record.

This can simply be a relational database join in which the geographic coordinates of the base map are linked to the address records, making them spatial.
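The join described above can be sketched as follows: buildings carry the road number transferred from the road they front onto, and survey records holding only a building number pick up coordinates by joining to the geographic base file on that common field. This uses SQLite; all table names, columns, and coordinate values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Geographic base file: buildings with known coordinates and the road
    -- number transferred from the larger road feature.
    CREATE TABLE building (
        building_no TEXT PRIMARY KEY, road_no TEXT, x REAL, y REAL
    );
    INSERT INTO building VALUES
        ('B001', 'R1', 451230.0, 36870.0),
        ('B002', 'R2', 451410.0, 36950.0);

    -- Survey records: place of residence recorded as a building number.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, building_no TEXT);
    INSERT INTO person VALUES (1, 'B001'), (2, 'B002'), (3, 'B777');
""")

# Join on the common field; LEFT JOIN keeps records that fail to match.
rows = conn.execute("""
    SELECT p.person_id, b.road_no, b.x, b.y
    FROM person AS p LEFT JOIN building AS b ON p.building_no = b.building_no
    ORDER BY p.person_id
""").fetchall()

geocoded = [r for r in rows if r[2] is not None]
unmatched = [r[0] for r in rows if r[2] is None]
print(len(geocoded), unmatched)  # 2 [3]
```

Keeping the unmatched records visible, rather than using an inner join, makes it easy to see which addresses still need manual locating.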

 

Computer-Based Analysis for Public Management

Relational Database Design

Thomas H. Grayson
 2000

 

Relational Database Design

In this study, a relational database was used and a relational model was designed: all data have been represented as tables. Tables are composed of rows and columns; rows and columns are unordered (i.e., the order in which rows and columns are referenced does not matter). Each table has a primary key, a unique identifier constructed from one or more columns. A table is linked to another by including the other table's primary key; such an included column is called a foreign key. The primary keys were created as follows. For individual data, each individual was given a unique identifier number, which is the first column in table   . For the building    

Qualities of a Good Database Design

Introduction to Entity-Relationship Modeling

E-R Modeling Process

From E-R Model to Database Design

Database Design Rules of Thumb


(not applicable to DBF files, which do not support NULLs)


Why is this rule often hard to practice with GIS?


Note that keeping column names short may be at odds with keeping your column names meaningful for neophytes. Be aware that you are making a tradeoff!

Example: The Parcels Database

Table     Primary Key
PARCEL    PID, WPB
OWNERS    OWNERNUM
FIRES     PID, WPB, FDATE
TAX       PID, WPB

Primary Table Columns     Foreign Table Columns     Cardinality
OWNERS.OWNERNUM           PARCEL.ONUM               One-to-many
PARCEL.PID, PARCEL.WPB    FIRES.PID, FIRES.WPB      One-to-many
PARCEL.PID, PARCEL.WPB    TAX.PID, TAX.WPB          One-to-one
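The keys and cardinalities of the Parcels example can be written as SQL table definitions. A sketch in SQLite, where only the key columns come from the example and ONAME, BLDVAL, and all row values are hypothetical. The one-to-one link between PARCEL and TAX follows from TAX reusing the full parcel key as its own primary key, while FIRES adds FDATE to the key so that one parcel can have many fires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE OWNERS (
        OWNERNUM INTEGER PRIMARY KEY,
        ONAME TEXT                                   -- hypothetical
    );
    CREATE TABLE PARCEL (
        PID INTEGER NOT NULL, WPB INTEGER NOT NULL,
        ONUM INTEGER REFERENCES OWNERS (OWNERNUM),   -- one owner, many parcels
        PRIMARY KEY (PID, WPB)
    );
    CREATE TABLE FIRES (
        PID INTEGER NOT NULL, WPB INTEGER NOT NULL, FDATE TEXT NOT NULL,
        PRIMARY KEY (PID, WPB, FDATE),               -- many fires per parcel
        FOREIGN KEY (PID, WPB) REFERENCES PARCEL (PID, WPB)
    );
    CREATE TABLE TAX (
        PID INTEGER NOT NULL, WPB INTEGER NOT NULL,
        BLDVAL REAL,                                 -- hypothetical
        PRIMARY KEY (PID, WPB),                      -- at most one row per parcel
        FOREIGN KEY (PID, WPB) REFERENCES PARCEL (PID, WPB)
    );
    INSERT INTO OWNERS VALUES (1, 'Owner A');
    INSERT INTO PARCEL VALUES (20, 1, 1), (21, 1, 1);
    INSERT INTO FIRES VALUES (20, 1, '1998-03-01'), (20, 1, '1999-07-15');
    INSERT INTO TAX VALUES (20, 1, 150000.0);
""")

try:
    conn.execute("INSERT INTO TAX VALUES (20, 1, 99000.0)")  # second tax row for one parcel
    second_tax_row_rejected = False
except sqlite3.IntegrityError:
    second_tax_row_rejected = True

fires_per_parcel = conn.execute(
    "SELECT PID, WPB, COUNT(*) FROM FIRES GROUP BY PID, WPB"
).fetchall()
print(second_tax_row_rejected, fires_per_parcel)  # True [(20, 1, 2)]
```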

Parcels Database Enhancements

 


Last modified 29 October 2000 by Wadembere, M. I.