Geographical Information Systems

Demographic Spatial Analysis and Modeling








Wadembere Mugumbu Ismail





Thesis submitted in fulfillment of the requirements for the award of the Degree of

Master of Science (Planning – Geographical Information Systems)

of Universiti Sains Malaysia



June 2001









This work is dedicated to the memory of…

my father, A. B. K. Mugumbu, who guided me to the right path;

my dear sister, Zaina Gimbo Mugumbu, who left when it had just begun …


………to the three inspiring men towards my study……

my dear uncle, Mr. Y. B. K. Wadembere, who always wants me to have the best;

my dear uncle, Prof. Mukwanason A. Hyuha, for the promises which keep me going;

my dear supervisor, Assoc. Prof. Lee, for his vision in the use of Information Technology.


……… to the four most important women in my life……

my mother, Zulaiha Mugumbu, for her love, prayers, pains and troubles;

my aunt (mummy Fati) for her guidance since my childhood and all her troubles;

my loving sister, Sarah H. Wadembere, for every good thing a sister would offer;

my friend (greenatwins_1101) for her constant long calls which gave a sense of love.



…… and my ever loving and caring extended family…






First and foremost, I would like to give Glory to God Almighty for His Grace and help in all my endeavors and for bringing me this far in my educational career.

I would like to express my gratitude to my supervisor, Assoc. Prof. Dr. Lee Lik Meng, for his guidance throughout this research; his comments and constructive criticisms greatly enhanced this thesis. I also thank him for his advice on the use of Information Technology (IT) and for helping me gain the research skills needed to complete this project.

Further thanks go to the academic and non-academic staff of the School of Housing, Building and Planning, people who stay and work in the shadows and whose praises are yet to be sung to a high crescendo. Special thanks go to the wonderful IT laboratory staff (Mr. Leow, Miss Aisha, and Miss Kam Hong) for advice on numerous technical issues about GIS, networking and the Internet, and for providing computing support.

What else can one say to his friends and brothers who, during the two years' stay in Malaysia, have shared the tears, sorrows and joy with him? People who in one way or the other have been instrumental in making one’s stay here a memorable one. Would "thank you" be enough for providing valuable assistance and a productive research environment? Aminu, Osman, Ibrahim, Abdul-Fatah, Abdul-Nouh, Chong, Ali, Ahmad and his wife, Sattam, Jennifer, Ling, Papa, Azha, Mohd, Jasuya, Ahmad, Sulieman, Aku, Shola, Shakirh, Hajara, Obby, Mustapha, Malinga, Musa, Abdullah, Joseph and his wife, Dr. Omar, Dr. Umaru, and all the numerous friends whose names have not been mentioned due to oversight.

My most sincere gratitude goes to the staff of the Penang State Planning Department, Malaysia; the Institute of Postgraduate Studies, university library, security division, and university clinic of Universiti Sains Malaysia; the food vendors; and those who commented on the research and wish to remain anonymous. Their help was greatly appreciated and they are generously thanked.

Several hours away in the Pearl of Africa (Uganda), I want to thank my beloved mother for her patience, perseverance and understanding throughout this period. I fondly remember you. Daddy's and uncles’ daughters: Hajara, Marrum, Hanifa, Fatuma, Sarah, Amin, Nakato, Sophia, Nulu. May God bless you; I love you all. I am grateful to my brothers and cousins, Asuman, Hamidu, Moses, Ibrahim, Nouh, Alumasi, and Zub, who are all fondly remembered. My mothers and aunts: mummy Fati, Petwa, Nalongo, Mugo, Yasin. My friends: Irumba, George, Nathan, Grace, Wana, Lucy, Mugisha, and all other friends too numerous to mention. Thanks for your love and support.

Finally, no words could describe my debt to my sponsors (the Islamic Development Bank) and to my family, who gave me moral support and, though overseas, lived with me through every single moment of my graduate studies.

It has been nice knowing you all; and to you all I say

'terima kasih, banyak…banyak'.



Table of Contents

Dedication

Acknowledgement

Table of Contents

List of Figures

List of Tables

Abbreviations

Acronyms

Abstrak

Abstract


1.1       Background

1.2       Problem Statement

1.3       Objectives of the Study

1.4       The Approach of the Study

1.5       Thesis Layout


2.1       Introduction

2.2       Demographic Information in Planning

2.3       Demographic Data Analysis

2.3.1      Aggregation Demographic Analysis

2.3.2      Disaggregated Demographic Data

2.3.3      Need for Demographic Spatial Analysis

2.4       Demographic Statistical Analysis

2.4.1      Spatial Statistical Analysis

              Spatial Pattern Analysis

              Nearest Neighbor Analysis

              Spatial Autocorrelation

              Descriptive Statistics

2.5       GIS Spatial Analysis

2.5.1      Need for GIS Demographic Spatial Analysis

2.5.2      Spatial Analytical and Modeling Capabilities into GIS

2.6       GIS Demographic Spatial Analysis into Planning

2.7       Summary


3.1       Introduction

3.2       Study Area

3.3       Resources used in the Study

3.4       Data used in the Study

3.5       Attempts to carry out Demographic Spatial Analysis

3.5.1      SPSS for Windows

3.5.2      S-Plus 2000 Professional

3.5.3      IDRISI for Windows

3.5.4      SpaceStat Extension for ArcView

3.6       GIS in Malaysia

3.6.1      GIS-DSA Research and Applications in Malaysia

3.7       Database for GIS Demographic Analysis

3.7.1      Uncertainty (Fuzziness) in GIS Demographic Analysis

3.7.2      Individual Modeling

3.7.3      Demographic Geocoding/Georeferencing using Buildings

3.7.4      Demographic Data Representation and Manipulation in GIS

3.7.5      Connecting DB with GIS

3.8       GIS Demographic Analysis

3.8.1      GIS Approach to Spatial Modeling

3.8.2      Evolving the Demographic Model

3.9       Visualization and Presentation Techniques

3.9.1      Choropleth Technique

3.9.2      Cartograms

3.9.3      Lorenz Curves

3.10     GIS Demographics Visualization

3.10.1    Demographics Visualization Methodology

              Cross-Variable Mapping

              Multiple Displays (Multimedia Interaction)

              Surface and Multi-Dimensional Displays

              Composite Indices (Dimension Reduction)

              Superimposition

              Dynamic Displays (Dynamic Data Visualization)

3.11     Multi-Dimensional GIS for Demographic Modeling

3.11.1    Two-Dimensional GIS

3.11.2    Two and half-Dimensional GIS

3.11.3    Three-Dimensional GIS

3.12     Summary


4.1       Introduction

4.2       Two-Dimensional GIS Demographic Analysis

4.2.1      ArcView GIS Extension

4.2.2      Demographics Analyst

              Development Strategy

              User Interface

              Functions of Demographics Analyst

              Table Document Graphical User Interface

              Scripts in the Demographics Analyst

              Further Details on Demographics Analyst

4.2.3      Other Scripts and Extensions used

4.2.4      Demographic Nearest Neighbor Analysis

4.2.5      Demographic Spatial Change Analysis

4.2.6      Spatial Progressive Similarity Clustering of DCs

4.2.7      Demographic Spatial Alternative Appraisal

4.2.8      Demographic Spatial Segregation/Integration Analysis

4.2.9      Selection of Analysis Variables among DCs

4.3       Summary


5.1       Introduction

5.2       Surface and Volume-Based Demographic Spatial Analysis

5.3       3D Spatial Object Representation of Demographics

5.3.1      3D Data Structures

5.3.2      Surface-Based Representation

5.3.3      Volume-Based Representation

5.4       Demographic Data Interpolation and Extrapolation

5.4.1      Geo-Demographic Interpolation and Extrapolation

              Demographic Holes, Breaks and Boundaries

              Generation Demographic Points at Boundaries

              MODC Extrapolation at Boundaries

              Triangular Irregular Network (TIN)

              Building a Delaunay Triangulation

              Generating the Constrained Delaunay Triangulation

5.5       Demographics Modeling by Surface GIS

5.5.1      Demographic Surface Characterization

5.5.2      Demographic Quantitative Surface Characterization

              Demographic Iso-lines and Vertical Demographics

              Quantitative Spatial Effect

              Quantitative Spatial Analysis

              Transparent Neighborhood Analysis

5.5.3      Surface Characterization of Demographics Variation

              Demographic Spatial Variation

              Demographic Directional Variation

              Demographic Outshoot

              Demographic Dropfold

              Demographic Undershed

              Demographic Overshed

              Demographic Overfold/Underfold

              Demographic Spatial Pass

5.5.4      Demographic Visibility Surface Analysis

              Demographic Viewshed

              Demographic Linear Analysis

5.5.5      Uncertainty in Demographic Surface Analysis

5.6       Volume-Based 3D Demographics Modeling

5.7       Summary

6          CONCLUSION

6.1       Introduction

6.2       Summary of Thesis

6.3       Human-Computer Interaction

6.4       GIS Demographic Database for Planning Analysis

6.5       Summary of Findings, Contributions and Applications

6.6       Suggestions for Future Work


Appendix A                  Statistical Analyses

Appendix B                  GIS Analyses

Appendix C                  True 3D GIS-DSA

Appendix D                  Field Survey Forms

Appendix E                   Data used in the Study

Appendix F                   Demographics Analyst on the Web

Appendix G                  Thesis Web site


List of Figures

Figure No.                                          Title of Figure                                                           

Figure 1.1 Model of carrying out GIS demographic spatial analysis

Figure 1.2 Three Dimensional demographic modeling tasks

Figure 1.3 Visual overview of the main portions of the thesis

Figure 2.1 GIS demographic spatial analysis into planning process

Figure 3.1 Location of the study area

Figure 3.2 Location of roads, land use and buildings in study area

Figure 3.3 Buildings in heritage area displayed in 3D

Figure 3.4 Linking tables by creating primary and secondary keys

Figure 3.5 Traditional and GIS approaches to demographic analysis

Figure 3.6 GIS demographic analysis procedure

Figure 3.7 Comparison of choropleth mapping and cartogram

Figure 3.8 Illustration of measure of concentration using Lorenz curve

Figure 3.9 GIS demographic visualization

Figure 4.1 ArcView showing position and functions of Demographics Analyst

Figure 4.2 Demonstration of the user interface and Demographics Analyst

Figure 4.3 ArcView table interface showing Demographics Analyst

Figure 4.4 Nearest neighbor analysis

Figure 4.5 Identifying demographic characteristics in progressive clustering

Figure 4.6 Combining demographic characteristics in progressive clustering

Figure 4.7 Location of the people and the area for clearing

Figure 4.8 Evaluating alternatives by editing features

Figure 4.9 Spatial distribution of racial categories

Figure 4.10 Aggregation of demographic characteristics

Figure 4.11 Spatial distribution and location of marital status

Figure 4.12 Spatial distribution and location of the various religions in study area

Figure 5.1 Surface and volume visualization

Figure 5.2 Three dimensional spatial object representations

Figure 5.3 Spatial demographic interpolation and extrapolation

Figure 5.4 Constrained Delaunay triangulation procedure

Figure 5.5 TIN showing total population per building

Figure 5.6 Persons per building as vertical demographics

Figure 5.7 Quantitative spatial location of persons per building

Figure 5.8 Quantitative spatial effect analysis

Figure 5.9 Spatial location of single persons (a) and religions (b)

Figure 5.10 Malay per building

Figure 5.11 Malay per building overlaid with persons per building

Figure 5.12 Overlay of Malay and persons per building in 2D

Figure 5.13 Slope from TIN of total population per building

Figure 5.14 Slope of surface of religion

Figure 5.15 Slope of surface of race

Figure 5.16 Aspect from TIN of persons per building

Figure 5.17 Aspect of surface of religion

Figure 5.18 Aspect of surface of race

Figure 5.19 Demographic features on surface of Malay per building

Figure 5.20 Surface of Malay per building showing demographic features

Figure 5.21 Surface of racial spatial influence

Figure 5.22 Positions of the observation points and target point

Figure 5.23 Demographic Viewshed on TIN of number of persons per building

Figure 5.24 Line of sight with observer and target at different MODC

Figure 5.25 Different surfaces generated by Spline, IDW, and TIN

Figure 5.26 Continuous spatial distribution of the Malay race

Figure 5.27 TIN of persons per building

Figure 5.28 Spatial distribution and location of the various religions in study area

Figure 5.29 Continuous spatial location of the various religions

Figure 5.30 Spatial distribution and location of marital status

Figure 5.31 Spatial distribution and location of marital status

Figure 5.32 Cut and fill between Malay and the total population

Figure C.1 Using trapezoidal rules

Figure C.2 Incremental volume calculations using polyhedron

Figure F.1 ArcScript web site interface for searching for scripts

Figure F.2 ArcScript web site interface showing search results

Figure G.1 This thesis web site interface on the Internet



List of Tables

Table No.                                            Title of Table                                                 

Table 3.1 Representation of demographic data in GIS

Table 4.1 Spearman Correlation of Analysis Variables

Table 4.2 Road * Race cross tabulation

Table 4.3 Analysis by entering DCs with road as dependent variable

Table A.1 Pearson correlation of Analysis Variables

Table A.2 ANOVA

Table A.3 Regression coefficients of demographic characteristics

Table A.4 Excluding variables analysis

Table E.1 Persons in the study area

Table E.2 Penang population according to mukim

Table E.3 Building information in the study area






Two and One Half-Dimensional (surface)


Two-Dimensional (plane surface)


Three-Dimensional (volumetric body)


Three Dimensional Demographic Model


Three Dimensional Models


Boundary Representation


Constrained Delaunay triangulation


Demographic Characteristic


Digital Elevation Models or Digital Terrain Models


Demographic Spatial Informatics


Delaunay Triangulation


Exploratory Spatial Data Analysis


Geographic Information


Geographic Information System


Geographic Information Science


Geographical Information System Demographic Analysis


GIS Demographic Model


Geographical Information System Demographic Spatial Analysis


GIS Demographic Visualization


Geographical Information System Spatial Analysis


Inverse Distance Weighting


Modifiable Areal Unit Problem


Magnitude of Demographic Characteristic


Tetrahedral Network


Triangular Irregular Network



Planning is a future-oriented, multi-disciplinary, comprehensive, and self-questioning process of analyzing problems, designing alternative solutions, evaluating the alternatives and their consequences, making formal recommendations, formulating strategies for action, and participating in their development and implementation. The intent of this process, followed in both government and business, is to improve collective decisions in the public and private sectors so that today's solutions do not become tomorrow's problems.

A Geographical Information System (GIS) is a system of hardware, software, data, and people used to collect, store, analyze, manipulate, model, visualize, and disseminate information about areas of the earth. Data analysis can be defined as the extraction of significant facts embodied in a dataset. Spatial data analysis is therefore the process of seeking out patterns and associations and extracting useful information from data that are distributed over space, to help describe, characterize, discover, understand, and predict patterns and spatial phenomena. It comprises a set of techniques for analyzing, computing, visualizing, simplifying, and theorizing about geographic data. Modeling is a key ingredient of this analytical process, but measurement, statistical summary, and visualization are also involved. In GIS, there are two common uses of the word model. One is the data model, an ideal schema for organizing data about the real world. The other is the symbolic representation of the relationships between spatial objects and their attributes, i.e. the process of putting together expressions of general principles with representations of parts of the reference system to form a replica that exhibits behavior similar to that of the reference system.

Demography is the scientific study of human populations, concerned primarily with producing new knowledge and understanding of human behavior through the measurement of the size, growth, density, distribution, and diminution of the numbers of people. Data describing a human population are referred to as demographic data, and demographics is the demographic information itself as applied in business, planning, and public administration. Geo-demographics is the classification of people or households in relation to their neighborhood, based on shared socio-economic, behavioral, or demographic characteristics. Iso-demographics means having the same magnitude or type of demographic characteristic. Demographic iso-lines are lines connecting points of the same magnitude or type of demographic characteristic. Demographic iso-surfaces are surfaces having the same magnitude or type of demographic characteristic. Demographic boundaries are the boundaries created by differences or changes in demographic characteristics.

Ethnicity refers to the cultural practices, language, cuisine, and traditions (not biological or physical differences) used to distinguish groups of people. Race is defined primarily by society, not by genetics, and there are no universally accepted categories.

Cohort: A group of individuals sharing a common demographic experience with respect to an observed period of time (e.g., individuals sharing the same birth year or years, or individuals who fall in a specified age range).

Georeferencing (or Geocoding): The process of assigning a geographic location (e.g. latitude and longitude) to a geographic feature on the basis of its address. Geocode: A code assigned to identify a geographic entity; as a verb, to assign an address (such as a housing unit, business, industry, or farm) to the full set of geographic code(s) applicable to the location of that address on the surface of the Earth.

Mukim is a word in Bahasa Malaysia used to refer to a subdivision roughly equivalent to the English parish.

Demographic Spatial Analysis and Modeling Using GIS


A Geographical Information System (GIS) can be used to carry out spatial analysis, quantitative representation, and modeling of spatial data. It suits population analysis, which uses attribute data about people to obtain the size of a population, its composition, its characteristics, and how the population is distributed over space.


This study began by focusing on the following issues: (1) Which GIS techniques are suitable for a specific population analysis? (2) Which GIS dimension is suitable for a given analysis? (3) Which terms can be used to characterize and visualize demographics? (4) How can multiple analyses of demographic characteristics be carried out in parallel? The motivation is to bring the benefits of GIS technology to the planning process in general and to demography in particular.


In developing these techniques, the study drew on the experience and knowledge of earlier efforts to apply GIS in planning and demographic analysis. It also examined how GIS spatial analysis and modeling techniques can be applied to demographic analysis by reviewing the results of conventional demographic analysis, what has been done in GIS-DSA (GIS Demographic Spatial Analysis), and what a planner expects from demographics. It georeferenced individuals using buildings as the spatial entity and combined statistical and GIS packages to evaluate both aggregated and micro data from an actual survey. This allowed demographic analysis to be carried out in 2D GIS, which involved producing a prototype module (Demographics Analyst) as an extension to ArcView to perform demographic nearest neighbor analysis, interactive density analysis, spatial alternative appraisal, spatial progressive clustering, and so on. This was followed by demographic surface characterization and modeling, which involved introducing various demographic terms, generating demographic features and quantities, and interpreting and applying them; this was done after developing the extrapolation and interpolation methodology. Finally, using 3D, an incremental solid modeling technique was introduced, which requires further research for its full development, including algorithms and code for software development.


This study shows that not all demographic analyses can be achieved well by looking at or carrying out analysis in only one GIS dimension; multi-dimensional GIS is needed to carry out GIS-DSA. It also shows how the various GIS dimensions (2D, 2.5D and 3D) can be used to perform specific tasks in aggregated and disaggregated demographic analysis with traditionally collected data. This requirement borrows many terms from other applications and fields in order to generate visualizations of demographic features and quantities that are in line with the planners' point of view. All this information can be obtained at



Geographical Information Systems (GIS) can be used to undertake spatial analysis, quantitative representation, and modeling of spatial data, making them fit for population analyses, which use attribute data about humans to obtain the size of a population, its composition, its characteristics, and how it is and will be spatially distributed.

This study was initiated by the need to address the following issues: (1) Which GIS techniques should be applied to a specific population analysis? (2) Which GIS dimension is fit for which analysis task? (3) Which surface terms can be used to characterize and visualize demographics? (4) How can multi-vertical analysis of demographic characteristics be carried out? It was motivated by the desire to bring the benefits of GIS technology to the planning process in general and to demographic analysis in particular.

In developing the techniques, this study learnt from the experiences and knowledge of previous efforts to apply GIS in planning and demographic analysis. It examined how GIS spatial analysis and modeling techniques are being used, and can be utilized, for demographic analysis by examining results from conventional demographic analyses, what has been done in GIS Demographic Spatial Analysis (GIS-DSA), and what a planner expects from demographics. It employed individual georeferencing, using buildings as the spatial entity, and used GIS and statistical packages to experiment on both aggregated and micro demographic data from a field survey. That enabled demographic analysis in 2D GIS, which involved producing a prototype module (Demographics Analyst) as an extension to ArcView GIS to accomplish spatial demographic nearest neighbor analysis, interactive density analysis, spatial alternative appraisal, spatial progressive clustering, etc. This was followed by demographic surface characterization and modeling, which involved introducing various demographic terms, generating demographic features and quantities, and interpreting and applying them; this was done after developing a Geo-demographic interpolation and extrapolation methodology. Finally, using true 3D, a technique of incremental solid demographic modeling for quantity analysis was introduced, which needs further research for its full development, including algorithms and code for software development.

This study has shown that not all demographic analyses can be achieved by looking at or carrying out analysis in only one dimensionality of GIS; multi-dimensional GIS is needed to accomplish GIS-DSA. It has shown how the various dimensionalities of GIS (2D, 2.5D, and 3D) can be used to accomplish specific tasks in demographic analysis at both aggregated and disaggregated levels with data collected by traditional methods. This necessitates borrowing many terms from other applications and fields, and even coining new ones, to be used in generating visualizable demographic features and quantities that are in line with the planner’s point of view. All these are available at (thesis web site) with an easy-to-use interface, as described in Appendix G.



Chapter I


1.1                   Background

In planning any area, the growth potential must be expressed in terms of the population it is expected to sustain – the size of the population, its composition, characteristics and spatial distribution. To achieve this, the planning process has to use many attribute data about humans. Planning analyses require information on the different Demographic Characteristics (DCs), their quantities, how they vary from one location to another, how they are and will be spatially distributed in a study area, and which DCs should be considered for which spatial planning analysis. Hence, demographics are the main inputs in the planning process.

This population data is normally collected at the point level (individuals/households), but it is always aggregated to existing spatial entities (e.g. administrative units) to allow tabulation by various data attributes and demographic analysis using statistical techniques. As a result, the human geographical dimension of the data is forgotten most of the time during demographic analysis and is used only in data collection; georeferencing information is lost or hidden, or details are difficult to extract. Openshaw (1994a) talks about "spatial analysis crime", a label applied to those agencies that hold spatial information and fail to adequately use or analyze it. The problem may not be a lack of interest in spatial analysis among those with the data, nor a total lack of suitable methods, nor being taken up with the quantitative revolution years of the 1960s, but how to go about it, i.e. how to carry out demographic spatial analysis and modeling in a GIS environment. Past research confirms that the Geographic Information (GI)-based tools developed by vendors and/or academics are, for various reasons, under-utilized (Harris, 1989; Harris & Batty, 1993; Klosterman, 1997; Lee, 1995). Among the reasons for the under-utilization of GIS in planning is the incompatibility of the mostly generic GI tools with the tasks and functions performed by planners. Murray (1999) concludes that it is one thing to have digital data, but a far more challenging issue is how this data can be analyzed and modeled so that it can be understood even by users who are not GIS specialists. This, then, is the central theme of this thesis: GIS demographic spatial analysis and modeling in 2D, 2.5D and 3D can generate visualizable features and quantities for planning analysis.
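The loss of locational detail through aggregation can be illustrated with a minimal Python sketch. The records, field layout, and unit names below are hypothetical and are not data from this study; the sketch only assumes that each person record carries a point location and the administrative unit it falls in.

```python
from collections import Counter

# Hypothetical point-level records: (person_id, x, y, admin_unit, race).
# None of these values come from the thesis survey.
persons = [
    (1, 10.0, 20.0, "Unit A", "Malay"),
    (2, 10.0, 20.0, "Unit A", "Chinese"),
    (3, 55.0, 21.0, "Unit A", "Malay"),
    (4, 80.0, 75.0, "Unit B", "Indian"),
]


def aggregate_by_unit(records):
    """Count persons per administrative unit; coordinates are discarded."""
    return Counter(unit for _, _, _, unit, _ in records)


def locations_by_unit(records):
    """Keep the distinct point locations that fall inside each unit."""
    out = {}
    for _, x, y, unit, _ in records:
        out.setdefault(unit, set()).add((x, y))
    return out


totals = aggregate_by_unit(persons)   # what a conventional tabulation keeps
points = locations_by_unit(persons)   # what the tabulation throws away

print(dict(totals))
print({unit: sorted(locs) for unit, locs in points.items()})
```

The aggregated table reports three persons in "Unit A" but no longer records that they live at two different locations, which is the kind of information a disaggregated, georeferenced analysis would retain.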

1.2                   Problem Statement

The focus of this research was demographic spatial analysis and modeling in 2D, 2.5D and 3D GIS. This study investigated how GIS spatial analysis and modeling can improve demographic data analysis at both aggregated and disaggregated levels in the planning process, where demographic information and spatial analysis are strongly interrelated. We seek new ways in which the available 2D, 2.5D and 3D GIS spatial analysis and modeling methods can be manipulated to produce a set of techniques for generating visualizable demographic features and quantities for planning analysis. This is further explained in the paragraphs that follow.

To easily understand and fully utilize demographic data, there is a need to carry out spatial analysis and modeling at both aggregate and disaggregate levels, with the data linked to their locations. Thus, the problem ranges from spatially representing micro demographic data to working with aggregated data, at whichever level demographic features and quantities are required. GIS, with its spatially referenced data and spatial analysis tools, can provide solutions, since these problems correspond directly to the two key strengths of GIS: manipulation and display of spatially referenced data (Chrisman, 1997; Chou, 1997; DeMers, 2000; Tomlin, 1990). This is further facilitated by the capability of GIS to test and manipulate variables quickly: it is less expensive to test models than reality, and the consequences of proposed activities can be predicted through simulation, which helps to pick the "best" alternative. GIS is described as an important revolution in planning practice (UCGIS), as GIS-related capabilities, techniques, and methods contribute to several skill areas of professional planners, including analytical/research, communication, and data processing skills (Godschalk & McMahon, 1992; Friedmann & Kuester, 1994; Kaufman & Simons, 1995). However, existing GIS spatial analysis and modeling techniques are not directly tailored to Demographic Spatial Analysis (DSA). We need to highlight the weak areas in current demographic analysis and look towards demographic analysis, characterization, and modeling in 2D, 2.5D and 3D GIS, or a combination of them, to be able to generate demographic features and quantities for planning analysis.

As we search for GIS-DSA, there are issues that need to be considered, as highlighted by Gerland (1996). Unlike physical data (infrastructure, land cover, and land use), demographic data have some properties that make them difficult to deal with, as they are intended to explain or manage the behavior of individuals or groups. The position and boundaries of demographic phenomena cannot be directly determined through observation or measurement, because the phenomena are linked to people and their activities. Therefore, their distribution over space is often extremely uneven and heterogeneous, and the phenomena are not permanent but transient (Gerland, 1996). Also, because the sources of population data are very heterogeneous, a variety of integration problems can occur (Gerland, 1996). These include missing positional information, inconsistent classifications and methodologies, different spatial units, different levels of aggregation ("resolution"), thematic and spatial data gaps, and different time references.

The preceding reviews bring in what the researcher terms GIS-DSA problems: (1) the problem of demographic georeferencing, i.e. how demographic data can be geocoded efficiently to represent DCs and to carry out spatial analysis and modeling; (2) which GIS techniques to apply for which demographic analysis (DA); (3) which GIS dimension (2D, 2.5D or 3D) can accomplish a given DSA task; (4) which surface terms can be used to characterize demographics; and (5) multi-vertical DC representation and modeling, to be able to deal with DCs having the same location.
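Problems (1) and (5) can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the procedure implemented in this thesis: the building identifiers, coordinates, field names, and the helper functions `georeference` and `stack_by_location` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical building table: building_id -> (x, y) centroid.
buildings = {"B001": (100.0, 200.0), "B002": (140.0, 215.0)}

# Hypothetical survey records, each keyed to a building.
survey = [
    {"person_id": 1, "building_id": "B001", "religion": "Islam"},
    {"person_id": 2, "building_id": "B001", "religion": "Buddhism"},
    {"person_id": 3, "building_id": "B002", "religion": "Hinduism"},
    {"person_id": 4, "building_id": "B999", "religion": "Islam"},  # no such building
]


def georeference(records, building_xy):
    """Attach building coordinates to each record; report unmatched ones."""
    located, unmatched = [], []
    for rec in records:
        xy = building_xy.get(rec["building_id"])
        if xy is None:
            unmatched.append(rec)  # missing positional information
        else:
            located.append({**rec, "x": xy[0], "y": xy[1]})
    return located, unmatched


def stack_by_location(located, field):
    """Group values of one characteristic that share the same location."""
    stacks = defaultdict(lambda: defaultdict(int))
    for rec in located:
        stacks[(rec["x"], rec["y"])][rec[field]] += 1
    return {loc: dict(counts) for loc, counts in stacks.items()}


located, unmatched = georeference(survey, buildings)
stacks = stack_by_location(located, "religion")
print(stacks)
print([rec["person_id"] for rec in unmatched])
```

Two persons with different religions end up at the same coordinate pair for building "B001", which is the situation that a multi-vertical representation has to handle, while the unmatched record shows how missing positional information surfaces during geocoding.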

Traditionally, population has been represented in 2D vector format, making it difficult to carry out surface analysis. To overcome the limitations of vector-based areal analysis, a raster surface has been proposed, built from discrete, irregularly located data points using the concept of ‘moving-kernel density estimation’ (Bracken & Martin, 1989; Martin, 1989). All in all, population analysis in 2D has the disadvantage that the attributes are merely linked to the x, y coordinates and cannot be used as an integral part of the location during analysis and modeling, which necessitates the move to 2.5D. The idea that population can most appropriately be mapped and modeled as a surface (2.5D) is not new. Schmid and MacCannell (1955) discussed the construction of contour-based maps of population density, while Nordbeck and Rystedt (1970) demonstrated that population density can be viewed as a continuously varying reference interval surface. Tobler (1979) presented a method for pycnophylactic (volume-preserving) interpolation of values from irregular zones into surface form, and Goodchild, et al. (1993) reviewed a number of approaches to areal interpolation, noting that the process can be viewed as involving the estimation of an underlying population surface. Bracken and Martin (1989) dealt with the generation of socioeconomic surfaces for public policymaking, to overcome the problems inherent in the analysis and presentation of such data in conventional area-based form. Martin (1996) proposed the use of surface representations, which support spatial analysis, to overcome the problem of zonal boundaries. The latest development in GIS population analysis and modeling is population geocoding using raster-based techniques, which use a regular grid for modeling (Martin, 1999). Martin (2000) has developed a FORTRAN program for surface construction, written for a Unix workstation; the technique is not currently included in any commercial GIS software, and he is now working on a Visual Basic implementation.

With these developments in population surface representation, the question is what surface terms can be used to characterize demographics. This calls for the coining of terms to facilitate the generation of visualizable demographic features and quantities.
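
As a rough illustration of the moving-kernel idea behind the population surfaces discussed above, the following Python sketch spreads the population recorded at irregular points over a regular grid. It is not Martin's program: the quartic kernel, the fixed bandwidth, the function name population_surface, and the sample points are all assumptions made for this example.

```python
import math

def population_surface(points, nrows, ncols, cell, bandwidth):
    """Spread point populations over a regular grid with a moving kernel.

    points: list of (x, y, population) tuples (hypothetical sample data).
    Each point's population is shared among the cells whose centres lie
    within `bandwidth` of it, so the grid total equals the input total
    provided every point has at least one cell centre within reach.
    """
    grid = [[0.0] * ncols for _ in range(nrows)]
    for px, py, pop in points:
        weights = {}
        for r in range(nrows):
            for c in range(ncols):
                cx, cy = (c + 0.5) * cell, (r + 0.5) * cell  # cell centre
                d = math.hypot(cx - px, cy - py)
                if d < bandwidth:
                    # Quartic kernel (an assumption): weight falls smoothly to 0
                    weights[(r, c)] = (1 - (d / bandwidth) ** 2) ** 2
        total_w = sum(weights.values())
        if total_w == 0:
            continue  # no cell centre within reach of this point
        for (r, c), w in weights.items():
            grid[r][c] += pop * w / total_w
    return grid

# Hypothetical geocoded points: (x, y, persons)
points = [(1.2, 1.1, 120), (1.8, 1.4, 80), (4.5, 3.9, 40)]
surface = population_surface(points, nrows=5, ncols=6, cell=1.0, bandwidth=1.5)
for row in surface:
    print(" ".join(f"{v:6.1f}" for v in row))
```

Because each point's weights are normalized before being added to the grid, the surface preserves the total population while showing how it varies from cell to cell; cells far from any point remain empty rather than inheriting a zonal average.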

Another issue is that some DCs are represented spatially by two entities that differentiate them: gender is either male or female, marital status is either single or married, and so on. When these entities are modeled in GIS with the latest developments in GIS surface analysis and modeling, we are able to show only their spatial locations and extents, not their spatial quantities. For a planner, who is always looking at how much there is of each DC and how DCs vary from one location to the next, this does not provide a total solution. The models lack true z values, which should be the quantities of the DCs. Techniques are therefore needed to represent DCs in such a way that their quantities are shown spatially while they are still differentiated by their traditional characteristics, i.e. a combination of surface and solid analysis and modeling.

1.3                   Objectives of the Study

The objective of this study was to investigate how GIS (2D, 2.5D, and 3D) spatial analysis and modeling improves demographic data analysis at both aggregate and disaggregate levels in the planning process, in order to document a set of GIS demographic spatial analysis (GIS-DSA) techniques for generating visualizable demographic features and quantities. The following a priori sub-objectives were formulated:

·        To assess how demographic analysis is being conducted and its current weaknesses.

·        To identify the available GIS spatial analysis and modeling techniques that can be employed.

·        To establish what a planner requires from demographics.

·        To coin new surface terms for the surface characterization of demographics.

·        To develop GIS techniques (2D, 2.5D, and 3D) for DSA.

·        To examine the utilization of GIS-DSA techniques in the planning process.

To accomplish the objectives, keep the study focused, and help set out the methodology, the following research questions were formulated:

1.4                   The Approach of the Study

The approach starts with a review of the literature on GIS and demographics in planning, followed by Demographic Statistical Spatial Analysis (DSSA). We then look at the methods of GIS data analysis and modeling, and at GIS in planning analysis. After that, we ask what GIS-DA requires and how GIS-SA can be manipulated in relation to the results from DSSA to arrive at GIS-DSA. Up to this stage, the concentration is on conventional 2D GIS, following the model given in Figure 1.1.

Figure 1.1 Model of carrying out GIS demographic spatial analysis

We then introduce 3D demographic spatial analysis and modeling, that is, modeling of the vertical dimension, which encompasses the general tasks given in Figure 1.2.

Figure 1.2 Three Dimensional demographic modeling tasks

·        Generation:  reading demographics from the database, formation of relations among the diverse observations i.e. model construction.

·        Manipulation: modification, refinement and derivation of intermediate models.

·        Interpretation: analysis and information extraction.

·        Visualization: graphical rendering and derived information; and 

·        Application: development of appropriate application models for planning purposes.

Modeling the vertical dimension is divided into surface-based (2.5D) and true 3D (solid) modeling. We start by looking at the shortcomings of 2D GIS-DA, then introduce the new demographic surface terms, their representation, the derivation of DCs from the surface, and their interpretation. This proceeds by employing conventional techniques from terrain analysis and modeling (DEM and DTM) together with new techniques. Before embarking on a detailed description of GIS demographic surface analysis and modeling, its scope is defined by addressing two underlying questions. First, what should a characterization of demographics in terms of a surface attempt to achieve? Here characterization is taken to have three specific objectives: to identify spatial patterns, to facilitate interpretation, and to allow visualization of results, the overall aim being their application. Second, how should a demographic surface be modeled? To provide an objective scheme for the development and evaluation of characterization tools, demographic surface terms are first identified, and the demographic surface is then defined in terms of its form (generation), what lies upon it (appearance), and what it is used for (planning), which involves interpretation, visualization, analysis, and application.

In accomplishing these, we contribute to answering the question of what difference it makes to carry out demographic spatial analysis in GIS in terms of generating visualizable demographic features and quantities.

Following this approach, and going by Namboodiri’s (1991) definition of the scope of demographic studies[1], this thesis is concerned with demographic analysis of size, composition, and spatial spread, and not with the conditions (deaths, migration, and births) that produce changes in them or with the implications (e.g. mortality, fertility) of changes in population structure. The terms size and composition are limited to the number of people and their make-up, respectively; composition covers age, sex, ethnicity, and marital status, the last categorized into single (unmarried, divorced, living alone, separated, widowed) and married. This is done so that we focus on the contribution of GIS to DSA and not on the various categories of DCs. Using these variables, techniques are devised and developed in the form of 2D, 2.5D, and 3D GIS; these are meant neither to replace nor to substitute for statistical analysis but to complement and enhance it, and to fill the gap an analyst may face when carrying out demographic analysis in GIS. In addition, this thesis is not about implementing statistical methods, as that has been done elsewhere; only reference will be made to such methods.

1.5                   Thesis Layout

The main body of this thesis is divided into four parts across six chapters (Figure 1.3).

Figure 1.3 Visual overview of the main portions of the thesis

First is the “Introduction”, covered in chapter one, which highlights the research background, motivation, problem statement, objectives, methodology, and list of acronyms. Second is the “Literature Review” (chapter two), which covers GIS and demographic data in planning analysis, what a planner needs in terms of DA, the information that should be derived from such data and analysis, the concerns and methods of DSSA and what is lacking in them, and then GIS-DSA; it concludes by looking at a possible structure for utilizing GIS-DSA techniques in planning analysis. Third is the “Approach”, beginning with chapter three, which covers the study area, the data used in the study, database design, the connection between DB and GIS, the evolution of the GIS-DM, and visualization of demographics, and which introduces multi-dimensional GIS by outlining 2D, 2.5D, and 3D GIS to give an insight into GIS analysis and modeling techniques. Chapter four deals with demographic spatial analysis in 2D GIS. Chapter five starts with demographic data interpolation and extrapolation, moves on to demographic surface characterization, and ends by developing a model for 3D demographic spatial analysis. Finally, the “Conclusion” in chapter six summarizes the research, findings, contributions, and applications, and highlights areas of future work.

Chapter II


2.1                   Introduction

At the core of this thesis is the perceived need for improved and new spatial techniques in current planning practice with which the planner can carry out demographic analysis. The term spatial is crucial: it is not enough for the planner to carry out aspatial demographic analysis. This chapter examines four questions that are the basis of, and key to, GIS Demographic Spatial Analysis (GIS-DSA). First, why is there a need for demographic analysis in planning? The answer is important because it places this study in the context of planning. Second, how are demographics analyzed? Answering this sets out the justification for developing GIS-DSA for planning analysis. Third, what is the nature of demographic data itself, and what are the setbacks for disaggregated spatial analysis? Finally, what problems have been encountered, and what attempts have been made, in empowering GIS for demographic spatial analysis? This lets us learn from others and avoid the likely pitfalls as we move to GIS-DSA.

2.2                   Demographic Information in Planning

Although physical planning has been the traditional focus (Solesbury, 1974), demographics are an important concern, as plans must reflect the interests and priorities of people, and the implementation of plans must take into consideration the possible conflicts among DCs. One of the main reasons for studying demography is that population structure and change are intertwined with physical, social, economic, and political structure (Namboodiri, 1991). The study of demography is essential to understanding these linkages, so that we have a grasp of what the available facts are and how to use them to examine the determinants and consequences of population trends. In addition, understanding those determinants and consequences is essential in implementing and evaluating policies and programs, which are often introduced with a view to steering population trends in specific directions.

In planning analysis, demographic data are regarded as very important (Plane, et al., 1994), and in sustainable development, which is the ultimate planning goal, they are declared the ultimate requirement (Chapin, et al., 1985). This is so because, in planning any area, the growth potential must be expressed in terms of the population the area is expected to sustain. To achieve that, a planner needs many aspects of demographic analysis; chief among them are analysis of the size, composition, characteristics, and spatial distribution of the population (Chapin, et al., 1985; Haining, 1990; Klosterman, et al., 1994; Plane, et al., 1994), generation of demographic features and quantities, and selection of the DCs to use in the different planning analyses. DCs affect priorities in health and social care, and they can be used to predict where money needs to be spent, saved, or redirected. For example, if there were a sudden rise in births, much more money would have to be spent on education. At a more national level, comparisons can be made from area to area. For example, if one area shows a particularly high rate of deaths from smoking-related cancers, extra resources might be directed into health education in that area. DCs can also identify areas of need for major projects: the quantity and location of important facilities such as housing, schools, and hospitals, and of amenities such as parking spaces, electricity supply, drainage, water supply, wastewater treatment, and solid waste capacity, depend on demographics. Demographic information is therefore useful for assessing need, prioritizing need, planning ahead, saving money, justifying expenditure, justifying action, estimating expenditure, and evaluating.
A plan formulated without serious consideration of the carrying capacities of the population (DCs) will eventually lead to man-made disasters such as traffic congestion, pollution, floods, strained infrastructure, and wholesale destruction of the delicate balance in our environment[2], as emphasized by several planners and leaders[3].

2.3                   Demographic Data Analysis

2.3.1        Aggregation Demographic Analysis

Most population data, e.g. for Penang, Malaysia, are reported according to age (5-year cohorts), sex, marital status, ethnicity, household size, type of household (such as single-person households, nuclear family households, extended family households, and non-private households), headship rate, and growth rate; they are nevertheless aggregated to mukim level (see Table E.2) for use in planning analysis[4].

Analyzing population according to such divisions and other areal geographies, such as census zones, electoral constituencies, or local government areas, leads to the modifiable areal unit problem (MAUP), discussed fully by Openshaw (1984). The MAUP gives rise to false interpretations, where results arise purely from arbitrary aggregations of the data (Fotheringham & Rogerson, 1993; Openshaw, 1984), and may even produce relationships that do not exist (Thomas & Huggett, 1980). However, Openshaw (1984, page 33) reminds us of the important fact that “the MAUP exist because of uncertainty as to what are the spatial entities that are being studied”. If this uncertainty is removed, then the MAUP disappears with it.
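
The zoning and scale effects behind the MAUP can be seen in a small sketch. The household locations and the three sets of zone boundaries below are invented for illustration; the point is only that the same households, aggregated under equally arbitrary zonings, give different pictures.

```python
# Hypothetical households along a 12 km strip: (position in km, household size)
households = [(0.5, 4), (1.5, 5), (2.5, 4), (3.5, 1), (4.5, 2), (5.5, 1),
              (6.5, 5), (7.5, 6), (8.5, 5), (9.5, 2), (10.5, 1), (11.5, 2)]

def mean_size_by_zone(data, boundaries):
    """Mean household size per zone; zones are [b0, b1), [b1, b2), ..."""
    means = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        sizes = [size for x, size in data if lo <= x < hi]
        means.append(sum(sizes) / len(sizes))
    return means

zoning_a = [0, 3, 6, 9, 12]            # four 3 km zones
zoning_b = [0, 1.5, 4.5, 7.5, 10.5, 12]  # same zone width, boundaries shifted
zoning_c = [0, 6, 12]                  # two 6 km zones (coarser scale)

means_a = mean_size_by_zone(households, zoning_a)
means_b = mean_size_by_zone(households, zoning_b)
means_c = mean_size_by_zone(households, zoning_c)

for name, means in (("A", means_a), ("B", means_b), ("C", means_c)):
    print(name, [round(m, 2) for m in means])
```

Under zoning A the strip shows a sharp alternation between large and small households; shifting the boundaries (B) blurs it, and the coarser zoning (C) almost removes it, although the underlying households never change.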

Another issue is the ecological fallacy, which is the inappropriate inference of individual- or household-level relationships from areal unit results (Wrigley, et al., 1996). Any observed pattern in the mapped data may be largely due to the particular configuration of zonal boundaries used, and a relationship between variables observed at one level of aggregation may not hold at the individual level or at any other level of aggregation (Martin, 1996, 1999). Much research has been carried out in this area, including that of Openshaw (1984), who quantified the typical range of ecological fallacy problems that might be expected in census data analysis through a study of 122,342 household census records for the city of Florence, Italy.
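
A minimal numerical sketch of the ecological fallacy follows. The three zones and the two variables recorded for each household are hypothetical and chosen to make the effect obvious; the helper pearson_r is written out here rather than taken from any package.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical household records per zone: (variable x, variable y)
zones = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Correlation computed on zone means (the aggregated, areal-unit view)
mean_x = [sum(x for x, _ in obs) / len(obs) for obs in zones.values()]
mean_y = [sum(y for _, y in obs) / len(obs) for obs in zones.values()]
r_zones = pearson_r(mean_x, mean_y)

# Correlation among households inside each zone (the disaggregated view)
r_within = {name: pearson_r([x for x, _ in obs], [y for _, y in obs])
            for name, obs in zones.items()}

# Correlation over all households pooled together
pooled = [rec for obs in zones.values() for rec in obs]
r_pooled = pearson_r([x for x, _ in pooled], [y for _, y in pooled])

print(f"zone means: {r_zones:+.2f}")
print(f"pooled households: {r_pooled:+.2f}")
print({name: round(r, 2) for name, r in r_within.items()})
```

In this constructed case the zone means are perfectly positively related, while inside every zone the household-level relationship is perfectly negative, so inferring household behavior from the zonal result would be wrong.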

With the MAUP and ecological fallacy, population aggregation becomes a critical problem when it comes to allocation in planning, including economic resource allocation, facility location, and recreation location planning, all of which require a detailed spatial distribution of population (section 2.3.3). This has led to a trend towards the application of GIS for population analysis and towards increasingly precise georeferencing. As a result, new data series are becoming available that can directly georeference individual properties to sub-meter precision, such as the UK’s ADDRESS-POINT product (Martin, 1996). It may be argued that the way to minimize the impact of the MAUP and ecological fallacy is either to use data only at the lowest possible level of aggregation or to do away with zones altogether by moving to frame-independent forms of spatial data representation and analysis. However, there are problems with both approaches, and it seems inevitable that population analysis will continue to use aggregate data. Before we look for different ways of analyzing population that avoid such problems, it is important to know why analysis is done at such levels of aggregation, so that the advantages of aggregated geographical analysis are incorporated in our developments and improvements. The justifications for aggregated analysis can be summarized as follows (Openshaw, 1991; Martin, 1996, 1999):

·        Some analyses and many social phenomena (such as the unemployment rate) cannot be measured for an individual and have meaning only in relation to aggregated data.

·        The density index is simple and so can readily be understood by policy makers, and political subdivisions (zones) are the smallest geographical units for which this index can be accurately calculated, because the zone areas are known.

·        Zone boundaries are statutorily defined and possess legal significance. This is important because public expenditure has conventionally been allocated only through processes that recognize official geographical areas. Zones have also been used for a number of other key indicators with resource implications, such as defining deprivation areas for health service resource allocation.

·        Aggregation meets the requirement to protect individuals’ identities.

In addition to the above justifications, aggregation has four main advantages: 1) Aggregated data place little burden on computing resources. 2) Aggregated data give sufficient insight into system-wide behavior by facilitating the understanding of the behavior of groups of people (Fotheringham & Rogerson, 1993). 3) Spatial aggregation is a form of simplification, which furthers our understanding of a complex problem. 4) By aggregating spatially, errors in poor-quality data will tend to cancel.

2.3.2        Disaggregated Demographic Data

Aggregation has the above justifications and advantages, but true analysis of population structure means looking at it at the disaggregated level of families, households, individuals, gender, race, age, ethnicity, etc. As Eversley, et al. (1982) state, the key to analyzing the consequences of demographic change for welfare policies lies not in the aggregate numbers but in the structure of the population. For example, a decrease in the population may be represented by a proportional reduction of households of all types, by a similar reduction in average household size, or by a combination of both. By considering the population only at the aggregate level, we work on the assumption that populations are similar in structure, composition, and distribution. Under that assumption, a policy change such as expansion of infrastructure would be applied uniformly. These assumptions are sometimes quite unrealistic: on the ground, different small areas may have different population densities and different demographic compositions, leading to different needs in each area.

Disaggregated demographics also play a big role in planning for new areas, as particular age cohorts tend to settle in the same area. This has been experienced in many cities. In the analysis carried out by Burnley, et al. (1997), the socio-demographic profiles of movers to outer suburban Sydney (Australia) showed the 25-34 age cohort to be markedly over-represented (45.1% of the movers), making it the peak home-purchasing group. From findings on two Canadian cities, Toronto and Vancouver, Skaburskis (1997) gives determinants of locational preferences and gives 30 as the age by which most children leave their parents to form households, which falls within the 25-34 cohort given by Burnley, et al. (1997).

There is a need to examine the changing pattern of demography at the micro level to monitor needs and demands, as growth in the number of households has greatly exceeded the rate of population growth in most countries, so that more dwellings would be needed even if the overall population remained stable. For example, the average household size in Great Britain fell from 3.3 to 2.5 persons between 1961 and 1991 (Chris, 1995). This change alone implies that roughly 100 more dwellings per 1000 persons may have been required in 1991 than 30 years before. Vaupel (1998), examining ageing and longevity through demographic analysis, observes that “the population of most of the world's countries are growing older. This shift is creating a new demography, demography of low fertility and long lives. The rapidly growing populations of the elderly are putting unprecedented stresses on societies, because new systems of financial support, social support, and health care have to be developed and implemented”. Thus, it is important to analyze DCs to show where change is occurring, i.e. where marriages are taking place, where fertility is constant or changing, and where separation and divorce are leading to single-parent households. The analysis should also show the distribution within metropolitan areas: where specific age cohorts are concentrated (inner city or outskirts), where pensioners are living, and how the area varies.
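
The dwelling figure quoted above can be checked with a short calculation. The sketch assumes one household per dwelling, which is a simplification introduced here and not stated in the source; the function name is likewise illustrative.

```python
def dwellings_per_1000(avg_household_size):
    """Dwellings needed to house 1000 persons, assuming one household per dwelling."""
    return 1000 / avg_household_size

need_1961 = dwellings_per_1000(3.3)  # average household size in 1961
need_1991 = dwellings_per_1000(2.5)  # average household size in 1991
extra = need_1991 - need_1961

print(f"1961: {need_1961:.0f} dwellings per 1000 persons")
print(f"1991: {need_1991:.0f} dwellings per 1000 persons")
print(f"additional: {extra:.0f}")
```

The difference comes to about 97 dwellings per 1000 persons, which is consistent with the rounded figure of 100 given in the text.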

There is a need to analyze population sparsity at the disaggregated level in planning, as sparsity may cause additional costs (staffing, fabric, additional transport, etc.) in delivering public services to the population in different areas. Because this analysis has been carried out according to political boundaries, it becomes costly to deliver services to sparsely populated areas compared with densely populated ones.

Although errors in poor-quality data tend to cancel under spatial aggregation, research evidence suggests that, for a given data set, this benefit occurs only when the data quality is poor. The issue of computing limitations is now history: modern computers can handle the large volumes of data that a highly disaggregated spatial representation involves, especially with GIS technology, which is capable of handling the finest level of resolution required. From the data collection point of view, population data are traditionally collected mainly by field survey, in which information about each individual is recorded, but the data are later aggregated, mainly by zone, for use in planning analysis. This hides important spatial information, mostly about individuals, so that data on the spatial distribution of people, which are much needed for efficient social planning, end up unused. This has to change, and resources should not be the main setback, as the same data collected traditionally can be utilized in disaggregated spatial analysis.

On confidentiality, many authorities have expressed their views. Openshaw (1994b) says of privacy that there are civil liberties, privacy, and data protection excuses that can be applied, but that these are often grossly exaggerated, misunderstood, and treated only in a highly negative manner. He further argues that in the UK the Data Protection Act of 1984 is not so much protecting sensitive data as preventing or scaring people from performing analysis, and concludes that this is clearly not right, especially if a public goods case can be made. Privacy, confidentiality, and civil rights should not prevent analysis so much as ensure that information is not released or misused in a way that impacts on the privacy of an identifiable individual. This is very important, but it is not a valid excuse for non-analysis within appropriate confidentiality envelopes and barriers (Openshaw, 1996). Even where data have not been released at the micro-data level, the planner has to analyze and take advantage of non-aggregated data; software controls can be used to protect privacy after the planner has analyzed the data at the micro-data level. In some parts of the world this was overcome long ago: Nordbeck and Rystedt (1970) cover the case in which individual people are directly observed at coordinate locations, and in Sweden data are publicly available with the geographical coordinates of individual houses (Tobler, 1979). Finally, disaggregate models enable us to learn about individual behavior, and aggregation can then be performed from these where the planner is not interested in individual behavior.

The areal aggregation approach is the most common; it includes conventional choropleth census mapping, and census-type data are generally available in this form. Martin (1991) points out the limitations of this approach, including the inherent assumption that geographic space is divided into internally homogeneous zones, with all change occurring across zone boundaries. This is a fundamental flaw, as neither the attribute characteristics (age, sex, race, etc.) nor the distribution of population can reasonably be expected to be uniform within any arbitrarily defined areal unit. Although total counts and summary statistics for the defined attributes will be correct, this information cannot be interpreted precisely down to the individual level.

2.3.3        Need for Demographic Spatial Analysis

Having looked at the way population is handled, and since planning increasingly demands both spatial and non-spatial data (Jebasingam, 2000; Nasruddin, 2000), an understanding of the spatial distribution of human activity has an increasingly important role to play in answering many of the questions raised in section 2.2. It provides valuable information for the analysis of settlement and neighborhood patterns and of the distribution of residential, commercial, and industrial activities. It has been argued that social and spatial structure “feed upon one another” to, in a literal sense, eventually become each other. Knowing where the young and old live tells us about changing social relationships and their individual geographies. It is a basis for better understanding why society is organized as it is, allowing us to learn from history so that we plan properly. Dorling (1994) argues that the spatial patterns of a population reflect the social structure. These patterns reflect a reality that impinges upon everyday lives; lives whose course is governed by that social structure, a social structure that is changing spatially.

Spatial population analysis becomes critical when it comes to allocation in planning, including economic resource allocation, facility location, and recreation location planning, all of which require a detailed spatial distribution of population (Plane, et al., 1994). Openshaw (1991) gives three basic types of locational problem of interest here:

1.      Pure location problem: this involves finding the optimum geographical or spatial location of a single facility to serve a fixed set of demand points. It requires an evaluation of the demand expected at each potential location. Taking the example of locating a youth center, the question is which site is likely to attract the most youth.

2.      Allocation problem: here it is assumed that the locations of the demand points (viz. people) and of two or more facilities are both known and fixed. The problem is to determine an optimal allocation of the demand points to facilities so as to satisfy a particular objective; for example, determining which people go to which youth center.

3.      Location-allocation problem: this combines the above two. That is, given a set of fixed demand points, determine the optimum locations for two or more facilities so that they best serve the demand points; for example, determining the locations of additional youth centers given the population in the study area.
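
The three problem types listed above can be sketched in a few lines of Python. The sketch assumes a specific objective, namely minimizing the total people-weighted straight-line distance to the nearest facility, and solves it by exhaustive search over a small set of candidate sites; both the objective and the search method are assumptions of this example and are not prescribed by Openshaw (1991). The demand points, candidate sites, and function names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical demand points: (x, y, number of youths living there)
demand = [(0, 0, 30), (1, 0, 20), (0, 1, 25), (9, 9, 40), (10, 9, 35), (9, 10, 10)]
# Hypothetical candidate sites for youth centers
candidates = [(0.5, 0.5), (5, 5), (9.5, 9.5), (0, 10)]

def dist(p, q):
    """Straight-line distance between the x, y parts of two tuples."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def allocate(demand, facilities):
    """Allocation: index of the nearest facility for each demand point."""
    return [min(range(len(facilities)), key=lambda i: dist(d, facilities[i]))
            for d in demand]

def total_cost(demand, facilities):
    """People-weighted distance from each demand point to its nearest facility."""
    return sum(d[2] * min(dist(d, f) for f in facilities) for d in demand)

def locate(demand, candidates, p):
    """Pure location (p = 1) or location-allocation (p > 1) by exhaustive search."""
    return min(combinations(candidates, p), key=lambda fs: total_cost(demand, fs))

best_one = locate(demand, candidates, 1)   # pure location problem
best_two = locate(demand, candidates, 2)   # location-allocation problem
assignment = allocate(demand, best_two)    # allocation to the chosen sites

print("single center:", best_one)
print("two centers:", best_two)
print("allocation of demand points:", assignment)
```

Exhaustive search is workable only for a handful of candidate sites; it is used here because it makes the relationship between the three problems easy to see. Note that the result depends on knowing where the youths actually live, which is the disaggregated spatial information argued for in this section.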

From this, we see the need for spatial population data at the individual level according to DCs: different age groups, sexes, and marital statuses need different facilities, located in the areas where the population with those specific DCs lives, and a given zone may have few people falling in, say, a specific youth category. Not treating population spatially at the disaggregated level also risks wrong conclusions; for example, two zones may have the same total population, yet in one of them the households may be concentrated in one corner. Thus, we should not take households and zones to be homogeneous in composition and size. There are several reasons why spatial analysis is key to an integrated demographic assessment framework.

·        There is a strong link between humans and their environment. Spatial analysis techniques and methods help to incorporate spatial elements in order to develop a clearer picture of this human/environment link.

·        In population studies, the outcomes of interest for different groups require population models of varying spatial resolution.

·        People’s actions and activities are spatial, so a spatial perspective can often add an important dimension to a study. For instance, using spatial analysis, researchers can identify geographic clusters, define service areas for facilities, and develop models to calculate the impact of changes in population.

·        Demographic spatial analysis is needed all the more in the 21st century, as the newest generation of adults, younger than most residents, is becoming the majority (Listokin, et al.), and different groups of city residents have become more sophisticated in pursuing their special interests. They are better informed, understand laws and procedures, have greater political skills, and are more militant and persistent. They have learned that planning brings order to change. Planners have to respond by directing public services and capital improvements toward upgrading the quality of life in those areas that have unique attractions for various groups.

Thus, spatial demographics are needed in the preparation and implementation of comprehensive plans, location-allocation plans and development plans, which deal with the process of growth/decline of cities and metropolitan areas.

2.4                   Demographic Statistical Analysis

Before we go on to GIS spatial analysis (section 2.5), let us look at the methods and techniques most commonly used to analyze demographics. This is necessary to guide us as we seek GIS-DSA. The methods of statistical data analysis fall into univariate and multivariate data analysis. Univariate analysis examines one variable at a time across cases, and looks at three major characteristics: the distribution, the central tendency, and the dispersion. Univariate statistical techniques represent a variety of basic descriptive statistics and include the Poisson process, nearest neighbor analysis, etc. (Haining, 1990; Cressie, 1991; Fotheringham, et al., 1994). There are situations where several possible response variables need to be dealt with simultaneously, and this brings multivariate statistical techniques into consideration. Multivariate statistics provide the ability to analyze complex sets of data in which there are many independent and possibly dependent variables, correlated with each other to varying degrees. The most commonly used methods of modeling multivariate data (Fotheringham, et al., 1994; Plane, et al., 1994; Trevor, 1994) include descriptive statistics, multivariate statistical analysis, multivariate spatial correlation, clustering, geostatistics, spatial econometric modeling, factorial ecology, and spatial general linear modeling. Since this thesis is about spatial analysis, only the spatial statistical analysis techniques will be looked at.
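
The three characteristics examined in univariate analysis can be illustrated for a single demographic variable. The ages below are hypothetical, and the use of 5-year cohorts simply mirrors the cohort width in which population data are reported; this is a sketch, not a description of how the thesis data were processed.

```python
import statistics
from collections import Counter

# Hypothetical ages of the residents of one small area
ages = [3, 7, 12, 18, 21, 24, 25, 29, 33, 38, 41, 47, 52, 60, 71]

# Distribution: number of residents in each 5-year cohort (0-4, 5-9, ...)
cohorts = Counter((age // 5) * 5 for age in ages)

# Central tendency
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)

# Dispersion (population standard deviation)
std_age = statistics.pstdev(ages)

for start in sorted(cohorts):
    print(f"{start:2d}-{start + 4:2d}: {cohorts[start]}")
print(f"mean {mean_age:.1f}, median {median_age}, std dev {std_age:.1f}")
```

None of these summaries uses location: two areas with identical age statistics may have very different spatial arrangements, which is why the spatial techniques that follow are needed.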

2.4.1        Spatial Statistical Analysis

Under spatial analysis, the focus is a spatial data set, i.e. a data set in which each observation is referenced to a site or area (geographical location). Much demographic data is collected in a spatial context, and methods of analysis of spatial data include data description, map interpolation, exploratory data analysis (including descriptive statistics), explanatory analysis, and confirmatory data analysis (statistical inference, development and testing of models) (Haining, 1990). Spatial statistics (statistics concerned with data collected at various points in space) summarize and describe numerically a variety of spatial patterns. Spatial statistics fall into three categories: point pattern analysis, spatial autocorrelation and geostatistics (including descriptive spatial statistics) (Cressie, 1993).

Spatial Pattern Analysis

Generally, analysis of spatial data makes use of either the connectivity or the similarity of spatial objects. A data set consisting of irregularly distributed points within a region is referred to as a (spatial) point pattern. In point pattern analysis the spatial properties of the points are studied rather than the individual entities (the quality of each point). Points are zero-dimensional (0D) features, thus the valid measures of a point distribution are the number of occurrences in the pattern and their respective geographic locations (Chou, 1997). The objective of analyzing a point pattern may involve a test of complete spatial randomness, estimation of intensity, stochastic model fitting (i.e. fitting models that contain one or more random variables), etc., to provide an explanation of the underlying processes. Point pattern analysis is concerned with the location of events, and with answering questions about the distribution of those locations, specifically whether they are clustered, randomly or regularly distributed (Bailey & Gatrell, 1995; Cressie, 1993), so that we are able to determine the relationships between DCs. To determine such relationships we use a number of techniques, including basic descriptive statistics (i.e. mean, standard deviation, etc.), the Poisson process using the chi-square test statistic, nearest neighbor distance using the R test statistic, quadrat analysis, and spatial autocorrelation using Geary's C and Moran's I test statistics (Chou, 1997).

Nearest Neighbor Analysis

Nearest neighbor analysis examines the distance between each point and the point closest to it (Fotheringham, et al., 1994; Wulder, 1999). It is a method of exploring pattern in locational data by comparing graphically the observed distribution functions of event-to-event or random point-to-event nearest neighbor distances, either with each other or with those that may be theoretically expected from various hypothesized models, in particular that of spatial randomness (Upton, 1985); i.e. it describes the distribution of points according to their spacing. The nearest neighbor index measures the degree of spatial dispersion in the distribution based on the minimum of the inter-feature distances (Chou, 1997), i.e. it is based on the distance between adjacent point features, such that the distance between point features in a clustered pattern will be smaller than in a scattered (uniform) distribution, with a random pattern falling between the two. Thus nearest neighbor analysis puts us in a position to determine how sparse the DCs are as we carry out planning.

Spatial Autocorrelation

Spatial autocorrelation may be defined as the relationship among values of a single variable that comes from the geographic arrangement of the areas in which these values occur. It measures the similarity of objects within an area, the degree to which a spatial phenomenon is correlated with itself in space (Cliff & Ord, 1973, 1981), the level of interdependence between the variables, and the nature and strength of that interdependence. Spatial autocorrelation is an assessment of the correlation of a variable with reference to its spatial location: it assesses whether the values are interrelated and, if so, whether there is a spatial pattern to the correlation. Spatial autocorrelation tools test whether the observed value of a variable at one locality is independent of the values of the variable at neighboring localities. Spatial autocorrelation may be classified as either positive or negative. Positive spatial autocorrelation has similar values appearing together, i.e. a map pattern in which geographic features of similar value tend to cluster. Negative spatial autocorrelation has dissimilar values appearing in close association, i.e. a map pattern in which geographic units of similar value are scattered throughout the map. When no statistically significant spatial autocorrelation exists, the pattern of spatial distribution is considered random (Chou, 1997). With this analysis, we can test whether similar DCs in the population are uniformly distributed or concentrated in one locality.

Descriptive Statistics

Descriptive statistics summarize in brief form the information contained in a distribution. They are used to describe the basic features of the data in a study: they provide simple summaries about the sample and the measures, present quantitative descriptions in a manageable form and help us to summarize large amounts of data in a sensible way. Descriptive statistics are divided into basic descriptive statistics and descriptive statistics for spatial data (Chou, 1997; Willemain, 1980). Basic descriptive statistics are aspatial and include: 1) central tendency, which shows the trend in the distribution and includes the mean, median, and mode; 2) dispersion, which shows the extent of spread about the central tendency and includes the range and the standard deviation; 3) entropy, an index of uncertainty representing in a quantitative way how well we can predict which value a random variable will take on; 4) skewness, which measures the extent to which the bulk of the values in a distribution are concentrated to one side or the other of the mean; and 5) kurtosis, which measures the extent to which values are concentrated in one part of a frequency distribution. Descriptive statistics for spatial data, unlike the first group, which deals with a simple set of numbers and does not refer to geographic location, x, y coordinates or anything spatial, lead us to understand spatial relationships among points and the distribution of point features. They employ techniques like spatial dispersion, spatial arrangement, spatial mean, geometric center and standard distance.
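To make the measures reviewed in section 2.4.1 concrete, the following is a minimal Python sketch (not the implementation used in this study) of the spatial mean, the standard distance, the nearest neighbor index R and Moran's I. The function names, the sample coordinates and the binary contiguity weights are assumptions made for illustration. The nearest neighbor index uses the usual expected mean distance under complete spatial randomness, 0.5 / sqrt(n / area), and ignores edge effects.

```python
import math


def mean_center(points):
    """Spatial mean: the average x and the average y of the point set."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance(points):
    """Root-mean-square distance of the points from their mean center."""
    cx, cy = mean_center(points)
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / len(points))


def nearest_neighbor_index(points, area):
    """R = observed mean nearest neighbor distance / expected distance.

    R < 1 suggests clustering, R near 1 randomness, R > 1 dispersion.
    `area` is the area of the study region (an input the caller supplies).
    """
    n = len(points)
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


def morans_i(values, weights):
    """Moran's I for values at n sites and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / w_sum) * cross / sum(d * d for d in dev)


# Hypothetical data: four points at the corners of a 10 x 10 square.
corners = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]

# Hypothetical data: four sites in a row, each adjacent to its neighbors.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend_values = [1, 2, 3, 4]        # similar values side by side
alternating_values = [1, 4, 1, 4]  # dissimilar values side by side
```

With these assumed inputs, `nearest_neighbor_index(corners, 100.0)` is well above 1 (a dispersed pattern), `morans_i(trend_values, chain_weights)` is positive and `morans_i(alternating_values, chain_weights)` is negative, which matches the interpretation of clustered versus scattered patterns given above.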

2.5                   GIS Spatial Analysis

GIS and spatial analysis have enjoyed a long and productive relationship over the past decades (Goodchild, et al., 1992; Fotheringham & Rogerson, 1994). The origins of spatial analysis lie in the development, in the early 1960s, of quantitative geography and regional science (Chou, 1997), and many developments have taken place since. GIS has been seen as the key to implementing methods of spatial analysis, making them more accessible to a broader range of users, and hopefully more widely used in making effective decisions and in supporting scientific research. GIS differs from other statistical analysis tools because the attribute data have established links to maps for visual analysis (Clarke, 1997). Any statistic we can think of to describe the data then automatically has geographic properties and as a result can be placed on maps for visual processing. It has been argued that in this sense the relationship between spatial analysis and GIS is analogous to that between statistics and the statistical packages. Specialized GIS packages directed specifically at spatial analysis have emerged (Bailey & Gatrell, 1995). Anselin, Chou (1997) and Fotheringham, et al. (1994) have discussed the ways in which the implementation of spatial analysis methods in GIS is leading to new exploratory evidence.

The analysis of spatial order and spatial association requires the following three elements of spatial information: 1) location, i.e. the exact location of every spatial feature must be available; 2) attribute data, which provide important information about the properties of the spatial features under consideration; and 3) topology, which defines the spatial relationships between map features (Burrough, 1986; Chou, 1997). According to Chou (1997), DeMers (1997) and Heywood, et al. (1998), GISs are indispensable for spatial analysis because of their ability to integrate all three elements of spatial information in a locally consistent manner. A database management system handling only attribute data is best used for aspatial statistical analysis. A computer system capable of handling location and attribute data but not topological elements is suitable for automated cartography, but not for spatial analysis. A typical automated cartography system provides mapping functions for the organization and presentation of spatial information, but spatial relationships among map features can be effectively processed only by using a GIS that provides the functionality to handle all three types of elements.
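The three elements can be illustrated with a small data structure. The sketch below is only an assumption about how location, attributes and topology might be held together for a zone; the class name `Zone`, its fields and the sample values are hypothetical and are not taken from any GIS package. The query at the end cannot be answered from attributes alone, since it also needs the adjacency (topology) information.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    zone_id: str
    centroid: tuple      # location: (x, y) coordinates
    attributes: dict     # attribute data, e.g. {"population": 120}
    neighbors: set = field(default_factory=set)  # topology: adjacent zone ids


def neighborhood_population(zones, zone_id):
    """Population of a zone together with all zones adjacent to it."""
    zone = zones[zone_id]
    ids = {zone_id} | zone.neighbors
    return sum(zones[i].attributes["population"] for i in ids)


# Hypothetical zones laid out in a row: A touches B, B touches C.
zones = {
    "A": Zone("A", (0.0, 0.0), {"population": 120}, {"B"}),
    "B": Zone("B", (1.0, 0.0), {"population": 80}, {"A", "C"}),
    "C": Zone("C", (2.0, 0.0), {"population": 50}, {"B"}),
}

total_around_b = neighborhood_population(zones, "B")
```

A system that stored only the `attributes` dictionaries could report the population of each zone, but not the population of a zone and its neighbors, which is the distinction drawn above between aspatial analysis and spatial analysis.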

2.5.1        Need for GIS Demographic Spatial Analysis

We need to answer questions like "what if?", an approach which is probably the most comprehensive attempt to date at simulating land use scenarios (Klosterman, 1999). This needs demographics to play a vital role in the outcomes, but demographics have yet to gain usability in planning at various levels of government (Matheny, et al., 1999) due to the lack of easy reference to spatial databases. Demographic data for planning analysis have traditionally been analyzed by statistical techniques (section 2.4), and various models have been developed, such as Population Analysis Spreadsheets (PAS) and the Spread model (Klosterman, et al., 1994). Although most of these models can account for change in demography, they lack the spatial aspect: it is not possible to view and analyze the patterns geographically, so the demographic spatial dimension is ignored. Different fields, like geology and terrain analysis, are trying to take or have taken advantage of GIS’s spatial analysis capability in order to incorporate the spatial aspect. Although the GI-based tools have proved useful for understanding physical and environmental processes, the socio-economic dynamics are still hard to model and/or simulate, so GIS is not yet fully utilized for demographic data. Nevertheless, its use is rapidly expanding: as Martin (1996) reports, the 1991 census of population was the first in the UK to be conducted in what might be called the ‘GIS era’. The 2001 census geography is designed by and for GIS (Martin, 1999), and many other areas have integrated or are integrating the spatial aspect, as in Tiger files[5] and PopMap[6].

When it comes to planning, for almost two decades in the 1960s and 1970s GIS and planning modeling developed in parallel with few interactions (Sui, 1998). This changed in the 1980s, and by the early 1990s it was, and perhaps still is, a general consensus within the GIS community and GIS applications that the lack of analytical and modeling capabilities is one of the major deficiencies in the current generation of GIS technology. This seriously limits the usefulness of GIS as a research tool to analyze spatial data and relationships (Anselin & Getis, 1993; Goodchild, 1987; Fischer & Nijkamp, 1992; Openshaw, 1991; Sui, 1998).

2.5.2        Spatial Analytical and Modeling Capabilities into GIS

Many researchers, like Goodchild, et al. (1992), Anselin and Hudak (1992), Fischer and Nijkamp (1992), Fotheringham and Rogerson (1994), and Fischer, et al. (1996), are working towards overcoming the lack of spatial analytical and modeling capabilities in GIS. To address this shortcoming, extensions toward enabling statistical analysis within the GIS environment have been attempted (Zhang & Griffith, 1997; Anselin). In each case an increasing distinction is being made between those who see the analysis functions as being driven by the user[7], with the emphasis on interactive spatial statistics (e.g. using XLisp-Stat) and various graphic devices providing a means of visualizing patterns and relationships, and others[8] who regard the analysis as being performed largely automatically by machine but with some user guidance, intuition and insight, whilst also presenting the results back for further interpretation. As Openshaw (1995) points out, the aim is to create an intelligent partnership that allows machines to do what they are best at (e.g. pattern sifting) and lets human beings do what they are good at (interpretation, application of experience, ability to think laterally, etc.). He concludes that the optimal approach is clearly a combination of the two. In addition, various approaches have been proposed, including the integration of spatial analysis methods and other models into GIS, which has led to new exploratory analysis (Goodchild, 1987; Haining, 1990; Fotheringham & Rogerson, 1993; Openshaw, 1994c; Openshaw, 1997). This followed the conclusion of Fotheringham, et al. (1994), Chou (1997) and DeMers (2000) that GIS is an incomplete set of spatial analytical tools and that in many cases we are obliged to test or combine GIS tools with statistical analysis and other tools in order to accomplish spatial analysis. Much ongoing research on the linkage and integration between GIS, spatial statistical analysis and other models (e.g. Arentze, et al., 1996; Fotheringham, et al., 1994) has suggested the following approaches.

1.    Embedding GIS-like functionalities into other spatial analysis or modeling packages (Birkin, et al., 1987), e.g. the XLisp-Stat package, which extends the geographical data handling and mapping facilities of a statistical programming package (Openshaw, et al., 1996; Tierney, 1991).

2.    Embedding urban modeling into GIS, e.g. urban data management system (UDMS), TransCAD, ArcView Spatial/Network analysts (Sui, 1998).

3.    Loose coupling, where a GIS package and an urban modeling program (e.g. TRANSPLAN, TRIPS) or statistical package (e.g. S-Plus, SAS, SPSS) are maintained as two separate packages and simply exchange data in either ASCII or binary format (Shaw, 1993; Sui, 1998; Sui & Lo, 1992). The approach writes information from the GIS into a file and reads this into the statistical package to carry out the analysis. The results are then read back by the GIS. Anselin, et al. (1993) have combined SpaceStat, a program for the analysis of spatial data, with Arc/Info using this approach.

4.    Tight coupling, where spatial analysis or urban models are fully integrated within the GIS software via either a GIS macro language or conventional programming (Anselin, et al., 1993; Ding & Fotheringham, 1992; Sui, 1998).
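The loose coupling approach in the list above can be sketched as a round trip through an exchange file. The following Python example is only an illustration under stated assumptions: the three functions stand in for a GIS export, a separate statistical package and a GIS import, the CSV text plays the role of the ASCII exchange file, and the field names `zone_id`, `population` and `z_score` are hypothetical.

```python
import csv
import io
import statistics


def export_attributes(features):
    """'GIS side': write the attribute table to CSV text (the exchange file)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["zone_id", "population"])
    writer.writeheader()
    writer.writerows(features)
    return buffer.getvalue()


def analyze(csv_text):
    """'Statistical package side': read the file, return z-scores as CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(r["population"]) for r in rows]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["zone_id", "z_score"])
    for row, value in zip(rows, values):
        writer.writerow([row["zone_id"], round((value - mean) / sd, 3)])
    return out.getvalue()


def import_results(features, csv_text):
    """'GIS side' again: join the results back to the features by zone_id."""
    reader = csv.DictReader(io.StringIO(csv_text))
    scores = {r["zone_id"]: float(r["z_score"]) for r in reader}
    return [{**f, "z_score": scores[f["zone_id"]]} for f in features]


# Hypothetical attribute table for three zones.
features = [
    {"zone_id": "A", "population": 120},
    {"zone_id": "B", "population": 80},
    {"zone_id": "C", "population": 100},
]

exchange_file = export_attributes(features)
result_file = analyze(exchange_file)
joined = import_results(features, result_file)
```

Note that the two sides share nothing except the file layout. That is what makes the approach simple, and it is also where the data structure inconsistencies mentioned in the literature can creep in, for example if one side renames a column.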

Although loose coupling is the simplest approach (Densham, 1996; Fedra, 1991; Nyerges, 1991), it has many problems (Densham, 1996), including the inability to drag and drop and inconsistency between data structures, which causes versioning problems that both introduce errors into the analysis and propagate them through it. In Carver (1998), we see that exporting spatial data from the GIS to standard statistical systems is not an adequate solution, because the nature of spatial data requires specific spatial analytical functions. In addition, it is not realistic to embed GIS functions into a spatial statistical package, although it seems to be an overwhelming preference (Carver, 1998). A full integration of spatial analysis tools into a GIS seems most promising (Hansen, 1996). Using this strategy we can utilize the interactivity between maps, charts and spatial statistics to get a good feeling for patterns and relationships within the data; examples include Arc/S-Plus (Arc/Info linked to S-Plus), the SpaceStat integration with ArcView GIS by Anselin, and Openshaw’s Geographical Analysis Machine (GAM) (Openshaw et al., 1987). Specialized GIS packages directed specifically at spatial analysis have emerged, as described by Anselin (1996, 1999), Anselin and Getis (1993), Bailey and Gatrell (1995) and Haining (1990); a good example is IDRISI for Windows.

In relation to this study, these approaches are not directed towards generating demographic features and quantities from the planner’s point of view. The issue here is not the integration but the difficulty the planner often experiences in understanding what the results mean in relation to planning analysis. The principal need is to develop, from the existing techniques and documentation of spatial analysis, a style of GIS demographic analysis that the planner (as a user of GIS) can use, rather than forcing the planner to adopt methods that were created by experts for experts (Openshaw, et al., 1996). A fully integrated system can be built by designing both GIS and analytical capabilities around a common data model and providing a single user interface, i.e. modeling inside the GIS environment. Although modeling inside GIS is one of the most frequently cited deficiencies of GIS (Harris & Batty, 1993), Longley and Batty (1996) conclude that GIS should be adapted and extended so that it is made relevant to the domains to which it is applied as well as to the ways in which it can be used to extend science. This is the approach adopted in this thesis, where demographic analysis in 2D, 2.5D and 3D is modeled in a GIS environment. It is nevertheless useful to try to identify the types of spatial analysis needs that appear to exist in the GIS era and to take advantage of several criteria that aim to distinguish between GISable and GIS-irrelevant technology (Openshaw, 1991, 1994c) as we move to a GIS-DSA. With this approach, it is easy for the planner to carry out DSA and relate it to other spatial databases, as will be discussed and experimented with (in sections 4 and 5) using the tasks and the way a planner is accustomed to using demographics. Next, let us look at the integration of GIS demographic spatial analysis into planning.

2.6                   GIS Demographic Spatial Analysis into Planning

Before we look at the study area, materials, methods, design of GIS for demographic analysis, and experimentation with GIS-DSA, it is vital to know how the developed GIS-DSA in 2D, 2.5D and 3D can be integrated into planning analysis. A planner may use different sets of techniques to support his/her activities, and Arentze, et al. (1996) differentiate between those techniques or models that are external to the planning activities and those that support these activities. External techniques and models typically provide information on which planners can base their decisions, e.g. population analysis, input/output analysis, spatial choice models, etc.; other techniques and methods support these activities directly, e.g. design methods, multi-criteria evaluation, scheduling and time management algorithms. Here we are concerned with population; but before we go into detail, and to better understand its integration, let us look at the developments of GIS in planning.

Planning theory is based on two types of rationality that are relevant for understanding the role of GISci and GIS technology in urban and regional planning: instrumental and communicative rationality. Instrumental (functional) rationality is based on a positivist idea, which puts information gathering and scientific analysis at the core of planning. It assumes a direct relationship between the information available and the quality of decisions based on this information. Communicative (substantive or procedural) rationality focuses on an open and inclusive planning process, public participation, dialogue, consensus building, and conflict resolution (Godschalk, et al., 1994; Innes, 1996). While the two theoretical stances are often viewed as competing (Sager, 1990; Yifachel, 1999), the role of information (in this case demographics) is relevant to both of them (and not restricted to instrumental rationality, as the more traditional view would hold). Participants in the planning process rely on many types of "information", including both the formal analytic reports and quantitative measures and the understandings and meanings attached to planning issues and activities (Innes, 1998). Indeed, GISci, and GIS in particular, have contributed and will continue to contribute to planning practice in this information age, with the communicative (expression with words) transcending the quantitative dichotomy. Many researchers (Arentze, 1996; Lee, 1995; Longley & Batty, 1996) have dealt with the importance and application of GIS in planning, and many planning models have been integrated into GIS (section 2.5). This is further evident from GI research, e.g. the UCGIS research priorities (UCGIS), all of which are applicable to the field of planning, which will certainly benefit from that research.

From this we see that the most critical areas of GI research that have benefited planning practice, and that carry the potential of being the most useful for it, are: 1) GIS database developments for planning-related analysis; 2) integration of GI technologies with urban models; 3) building of planning support systems; 4) facilitating discourse and participation in the planning process; and 5) evaluation of planning practice and technological impact. GIS-based research in planning spans all five contribution areas and a variety of planning subfields, including urban growth management, land use planning, zoning, housing, community and economic development, transportation planning, environmental issues, provision of community parks and open space, and supply of public utilities and amenities (Harris & Batty, 1993; Webster, 1993, 1994; Wellar, et al., 1994). In addition, Wellar, et al. (1994) and Webster (1993, 1994) match the scientific input required to the various stages of the planning process: a) problem identification requires description and prediction; b) goal setting, plan generation, evaluation of alternatives, and choice of solution require prescription; c) implementation requires description, prediction, and prescription; and finally d) monitoring requires description and prediction.

With such GIS applications in planning, GIS-DSA integration is possible, as the developed 2D, 2.5D, and 3D GIS-DSA techniques are within the GIS environment and GIS aids the planning process by incorporating one or more of the following features: modeling procedures (Kammeier, 1999); expert systems (Edamura & Tsuchida, 1999; Shi & Yeh, 1999); databases, decision trees, computer aided design or CAD (Ranzinger & Gleixner, 1997; Schuur, 1994); mapping (Singh, 1999); user interfaces for public participation (Shiffer, 1992); virtual reality and World Wide Web (Doyle, et al., 1998; Heikkila, 1998); development of planning support systems and integration of GI with other technologies like hypertext, groupware, audio/visuals, multimedia, models, simulations, expert systems, etc. (Varkki, 1997; Hopkins, 1999); evaluation of planning practice and technological impact (Knaap et al., 1998; Nedovic-Budic, 1998, 1999; Sawicki & Flynn, 1996; Talen, 1996, 1998); and facilitating discourse and participation in the planning process (Craig, 1998; Harvey & Chrisman, 1998; Sawicki & Craig, 1996; Sarjakoski, 1998; Schon, et al., 1999; Talen, 1999).

For a clear picture of the contribution of GIS-DSA and its integration into the planning process, let us take into consideration the different scales and different stages of planning (a combination of those proposed by Arentze, et al., 1996 and Yeh, 1999), i.e. problem identification (determination of planning objectives), goal analysis and specification (the analysis of existing situations), generating alternatives (development of planning options), information collection (modeling and projection), evaluation (selection of planning options), plan implementation, monitoring and feedback/early warning. Different functions, scales, and stages of planning need different demographic inputs and also make different use of GIS (Figure 2.1); quite different types of decisions are made at each planning level, and the methods of planning used will differ (Kerr, 1992).

Figure 2.1 GIS demographic spatial analysis into planning process

The first step in the planning process is problem identification. GIS are well suited for this task, and GIS-DSA comes directly into play as the problem touches human beings. The search, retrieval and overlay options allow the planner to test whether certain areas meet the conditions s/he thinks are indicative of the problem according to the demographics.
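As a rough illustration of such a search, the sketch below filters zones by a set of demographic conditions. It is a plain Python stand-in for an attribute query, not a GIS operation from any particular package; the zone identifiers, the indicator names and the thresholds are all hypothetical.

```python
def zones_matching(zones, conditions):
    """Return the ids of zones whose demographics satisfy every condition."""
    return [
        zone_id
        for zone_id, demographics in zones.items()
        if all(test(demographics[key]) for key, test in conditions.items())
    ]


# Hypothetical demographic summaries per zone.
zones = {
    "Z1": {"share_over_60": 0.32, "persons_per_building": 14},
    "Z2": {"share_over_60": 0.10, "persons_per_building": 6},
    "Z3": {"share_over_60": 0.28, "persons_per_building": 5},
}

# Hypothetical conditions the planner considers indicative of the problem.
problem_conditions = {
    "share_over_60": lambda share: share >= 0.25,
    "persons_per_building": lambda count: count >= 10,
}

flagged = zones_matching(zones, problem_conditions)
```

Here only zone Z1 meets both conditions. In a GIS the same selection would be displayed on the map, so the planner can see where the flagged areas lie in relation to other features.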

The second phase concerns goal analysis and specification. Although GIS do not allow one to perform such analysis (Arentze, et al., 1996), the GIS environment helps one to define goals immediately on seeing the results of analysis, and demographics play a role in goal compatibility analysis. GIS-DSA can be used in analysis of the existing situation by running spatial queries, mapping, and integrating the generated demographics with other features and databases to identify areas of conflict.

The next step involves generating alternatives. Although it is still largely a manual exercise (Arentze, et al., 1996), there is a lot of research to improve it (Lee, 1995). GIS provides a suitable environment for performing this task if the problem is to find suitable locations for developing or reorganizing activities. Map overlay techniques combined with GIS-DSA and generated demographics are useful, and demographics are vital as a component of overlay in developing planning options, for example in identifying the solution space for future development or narrowing down the space to be searched.

Once the alternatives are defined, one needs to collect information on how these alternatives perform against the articulated goals and objectives. This involves prediction and projection, where GIS-DSA plays an imperative role by making it possible to carry out spatial modeling of demographic distribution, making it easy to estimate the widest range of impacts of existing population trends and to measure system performance.

The next step of the planning process involves evaluating alternatives based on their evaluation scores. If one adopts the viewpoint that the planning style is subjective, in that the planner generates the alternatives and sets the weights in the evaluation process, then techniques such as decision analysis and multi-criteria evaluation support this stage (Arentze, et al., 1996). Alternatively, one can choose a more rational style; in this case, the task of the planner would more typically be to formulate objectives subject to a set of conditions, employing techniques like location-allocation, which GIS-DSA directly supports. For example, service provision varies by needs and neighborhood because of differences in local area characteristics. It therefore becomes important to establish a consistent standard against which to evaluate different zones. GIS-DSA can assist in this task by considering the profile of the potential beneficiaries in a catchment area and relating it to the characteristics of the actual planning area considered, with GIS assisting in location analysis to evaluate different planning scenarios.
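The subjective style described above, in which the planner sets the weights, can be sketched as a simple weighted-sum multi-criteria evaluation. This is one common form of multi-criteria evaluation and is shown here only as an assumption; it is not claimed to be the method of Arentze, et al. (1996). The criteria names, site names, raw scores and weights are hypothetical.

```python
def normalize(scores):
    """Rescale raw scores to the range 0..1, where higher is better."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {name: 1.0 for name in scores}
    return {name: (value - low) / (high - low) for name, value in scores.items()}


def weighted_scores(criteria, weights):
    """Weighted sum of normalized criterion scores for each alternative."""
    total_weight = sum(weights.values())
    normalized = {name: normalize(scores) for name, scores in criteria.items()}
    alternatives = next(iter(criteria.values())).keys()
    return {
        alt: sum(weights[c] * normalized[c][alt] for c in criteria) / total_weight
        for alt in alternatives
    }


# Hypothetical raw scores of three candidate sites on two criteria.
criteria = {
    "population_served": {"site_1": 900, "site_2": 600, "site_3": 300},
    "accessibility": {"site_1": 0.2, "site_2": 0.8, "site_3": 0.5},
}

# Hypothetical weights chosen by the planner.
weights = {"population_served": 0.6, "accessibility": 0.4}

scores = weighted_scores(criteria, weights)
best = max(scores, key=scores.get)
```

With these assumed inputs the second site ranks first even though the first site serves the most people, because its accessibility score outweighs the difference. Changing the weights changes the ranking, which is exactly the subjectivity the planner brings to this stage.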

Once a decision is made, the plan is implemented. However, in planning for any new project there is a need for population support before the decision can be taken to implement the project; thus it is important to know how people of different characteristics feel about geographical locations. Using GIS-DSA, we are able to direct mail to specific persons, which may result in proper project location. In addition, the results of GIS-DSA can be used in the implementation of urban plans by carrying out demographic impact assessment of proposed projects, thus evaluating and minimizing the impact of development on the population.

Finally, we come to monitoring the process. GIS-DSA helps to provide information on spatial change that allows the planner to assess whether the actual evolution of the system is consistent with the predictions underlying the plan. Closely related to monitoring is early warning. In this case, the system warns in advance that the evolution of the system will not be according to plan if the present trend continues. Models of early warning are often the same as the ones used for predictive and evaluation purposes. Hence, GIS-DSA directly contributes to the process by examining whether development is following the needs of the population, evaluating the impact of development on the population and showing whether adjustments of the plan are needed.
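The idea of early warning can be sketched as follows: extrapolate the observed trend to a target year and compare it with the planned figure. The linear trend, the 10 percent tolerance and all the numbers are assumptions chosen for illustration; the thesis does not prescribe a particular early warning model.

```python
def linear_trend(years, values):
    """Least-squares slope and intercept of values against years."""
    n = len(years)
    mean_year = sum(years) / n
    mean_value = sum(values) / n
    slope = sum(
        (y - mean_year) * (v - mean_value) for y, v in zip(years, values)
    ) / sum((y - mean_year) ** 2 for y in years)
    return slope, mean_value - slope * mean_year


def early_warning(years, observed, target_year, planned, tolerance=0.10):
    """Project the present trend and flag a relative deviation from the plan."""
    slope, intercept = linear_trend(years, observed)
    projected = slope * target_year + intercept
    deviation = (projected - planned) / planned
    return projected, abs(deviation) > tolerance


# Hypothetical monitored population of one zone and the planned figure.
years = [1996, 1997, 1998, 1999]
observed = [1000, 1040, 1080, 1120]
projected, warn = early_warning(years, observed, target_year=2005, planned=1200)
```

In this assumed case the population grows by 40 persons a year, so the projection for 2005 exceeds the planned figure by more than the tolerance and the warning is raised. Running the same check per zone would show where, and not only whether, the plan is drifting.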

2.7                   Summary

In this chapter we have looked at the need for demographic analysis in planning and the related issues of data aggregation, why disaggregation has not been fully practiced, and demographic statistical analysis and presentation techniques, in order to identify the things to be taken into consideration or incorporated in demographic spatial analysis, including the need for GIS Demographic Spatial Analysis (GIS-DSA). Finally, treating GIS-DSA as GIS demographic spatial analysis and modeling that generates visualizable features and quantities for planning analysis, we looked at a model for integrating GIS-DSA into planning, documenting the pros and cons and showing where and how demographic analysis can be employed in planning, making the whole process feasible. This clears the way for analysis, but before that we have to look at the study area and the design of GIS-DSA in the following chapter.


Chapter III


3.1                   Introduction

A review of GIS and demographics in planning and of methods of analysis was undertaken so that we get to grips with what has been done, how it has been done, the outcomes, and their application and integration, to help in GIS demographic spatial analysis (GIS-DSA). This chapter covers the study area, the resources used in the experimentation, how the data were obtained and stored in a database (DB), including the DB design, geocoding, usage of data in GIS, carrying out GIS data analysis and modeling, evolving a demographic model (DM) in GIS, and ways of visualizing GIS demographics. As these are examined, we will also outline the uncertainty that may be involved in GIS demographic analysis. The chapter ends by looking at the dimensionalities of GIS that can be used to accomplish GIS-DSA.

3.2                   Study Area

The study area was a heritage area located in a high-density residential and commercial part of Georgetown city, Penang, Malaysia (Figure 3.1).

Figure 3.1 Location of the study area

Target (red area on the left) indicates Penang (a state of Malaysia) and box A on the right shows the location of the heritage area in Georgetown (a city in Penang). See Figure 3.2 for an expanded view of A.

The study area is bounded by Kampong Kolam road on the north and northeast and Lebuh Pantai on the south, with Lebuh Armenian and Lebuh Acheen running through it north to south, and Lorong Lumut, Lebuh Cannon, and Jalan Masjid running east to west (Figure 3.2).

Figure 3.2 Location of roads, land use and buildings in study area

Brown represents social activities, pink commercial, blue open space, gray utilities and light green buildings.


The area has many historical and important features, which include: the Khoo Kongsi, the headquarters of the Chinese clan called Khoo, which was first built in 1835, destroyed by fire in 1884 and rebuilt in 1902; the Malay mosque called Lebuh Acheh mosque, the oldest mosque on the island, built in 1808; the Syed Mohammed Al-Atas residence, built in the 1860s and a landmark for the early settlers on the island; Cannon Square, one of the traditional Chinese settlements; and others. The area has a layer of urban form built from the 17th century; this delineates the heritage city of Georgetown. The quality of its urban form lies in the unity of elements which are related to individual buildings, clusters of buildings and spaces. This unity expresses the melting-pot quality of the city's multi-racial, multi-cultural, multi-influence and multi-institution society. In addition, the landscape and architectural forms play an important role in the spatial arrangement of the buildings, which are almost equally spaced and mostly have two floors (Figure 3.3); this makes them ideal for use as spatial units of analysis, as discussed in section 3.7.3.

Figure 3.3 Buildings in heritage area displayed in 3D

Dark brown shows buildings with one floor, chocolate two floors, blue three floors, and brown four floors.


3.3                   Resources used in the Study

The following software was used for GIS analysis and modeling, and to provide a graphical user interface (GUI) for direct interaction to view and edit geo-feature objects: ArcView GIS version 3.1, the ArcView Avenue programming environment (customization and application development for ArcView), ArcView Spatial Analyst, ArcView 3D Analyst, the SpaceStat Extension, IDRISI for Windows, and a customized extension (Demographics Analyst) developed in this study (section 4.2.2). Microsoft Access 2000 served as the relational database management system (RDBMS) to store the data, so that we are able to compare the compatibilities and take advantage of other analysis outside GIS. Data was retrieved from the RDBMS for use in GIS, statistical analysis and visualization; other packages used included S-Plus 2000 Professional and SPSS 9.0 for Windows for statistical analysis. These software packages were linked using Microsoft Open Database Connectivity (ODBC) and the import and export functions within the packages [9]. The references[10] used include books and periodicals, lectures, seminars and discussions, software packages, and Internet sites about GIS, planning, demography, computer science and population, all on the www[11].

3.4                   Data used in the Study

To accomplish the objectives using the outlined methodology, the main information requirements were:

1)      The cadastral GIS of the study area, which already existed in GIS format and was obtained from Assoc. Prof. Dr. Lee Lik Meng[12]. This was included because most data collected by local governments are associated with properties and recorded by address. Having it as part of the DB makes it easy to relate to other records such as physical features, buildings, housing codes, and many other aspects taken into consideration during planning.

2)      Land use: shopping points, housing, recreation, etc., obtained in GIS format from the Penang state planning office (Jabatan Perancangan Bandar dan Desa), Penang, Malaysia.

3)      Population: two data sets were used. The first was the Penang population census data (Table E.2), obtained at mukim level from the Malaysia population and housing census of 1991 collected by the Statistics Department of Malaysia. The other was micro level data (Table E.1) collected in the study area using a questionnaire (field survey form in Appendix D) in a survey carried out in October 1999 with two other master's students, Jennifer Tiong[13] and Chong Chee Kit[14]. During the survey, a sample of 343 persons was interviewed. The sample was selected based on location (building) with the aim of obtaining data on every household. The response was good in that we managed to get the targeted information; for household members who were absent, the information was obtained from those present. The following information was extracted from the survey: name, age, education level, religion, building number, household member relationship to household head, occupation, origin of ancestors, marital status (grouped into single or married), number of children, gender, race, and households. With such information and means of collecting the data, it is enough to be termed micro data (Ma, et al., 1997) and has all the necessary details for a disaggregated GIS-DSA. However, details like names are hidden from the display tables for privacy reasons but included in the database for analysis purposes.

4)      Buildings: floor space, number of floors, ownership, building location, area, etc. (Table E.3 and Figure 3.3). This was collected together with the population data, as we divided ourselves into two groups: one for personal data and the other for the physical structures, infrastructure and buildings.

3.5                   Attempts to carry out Demographic Spatial Analysis

Before reviewing the GIS trend in Malaysia and how the GIS-DSA was prepared, let us try to carry out DSA with other techniques and software packages. The packages tried were SPSS for Windows, S-Plus, IDRISI for Windows and the SpaceStat Extension for ArcView.

3.5.1        SPSS for Windows

SPSS for Windows is a computer program for statistical analysis. It has several analytical functions that can be used to accomplish demographic analysis. Descriptives gives univariate summary statistics (sample size, mean, minimum, maximum, standard deviation, variance, range, sum, standard error of the mean, and kurtosis and skewness with their standard errors) for several demographic variables in a single table and calculates standardized values (z scores). Another set of functions computes statistics measuring either similarities or dissimilarities (distances), either between pairs of variables or between pairs of cases. These similarity or distance measures can then be used with other procedures, such as factor analysis, cluster analysis or multidimensional scaling, to help analyze complex data sets. For example, it is possible to measure similarities between population sets based on certain characteristics, such as age, sex, race, size, etc.; the result helps to gain a sense of which population sets are similar to each other and which are different. With SPSS, we are able to carry out Linear Regression, which estimates the coefficients of the linear equation, involving one or more independent variables, that best predict the value of the dependent variable; for example, to determine whether a person's location of residence is related to ethnic group. A scatterplot indicates whether the variables are positively or negatively linearly related.
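To make the descriptive statistics, z scores and linear regression described above concrete, the following is a minimal sketch in Python (a language not used in this study; SPSS itself was used through its menus). The function names and the sample values are hypothetical and are not taken from the survey data.

```python
from statistics import mean, stdev


def describe(values):
    """Basic univariate summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "std_dev": stdev(values),  # sample standard deviation (n - 1)
    }


def z_scores(values):
    """Standardized values: distance from the mean in standard deviations."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def linear_fit(x, y):
    """Least squares fit of y = intercept + slope * x for one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical values for illustration only.
ages = [25, 30, 35, 40, 45]
children = [0, 1, 2, 3, 4]

print(describe(ages))
print(z_scores(ages))
print(linear_fit(ages, children))
```

Note that nothing in these results refers to where each person lives, which is the limitation discussed in this section.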

The above SPSS functions and several others, such as Bivariate Correlations, Cox Regression Analysis, Crosstabs, Canonical Correlation, Curve Estimation, Analysis of Variance (ANOVA), Discriminant Analysis, Factor Analysis, Frequencies, General Loglinear Analysis, GLM Multivariate, Hierarchical Cluster Analysis, Kaplan-Meier Survival Analysis, K-Means Cluster Analysis, Life Tables, Logistic Regression, Logit Loglinear Analysis, Spearman Correlation Coefficient, Variance, etc., are all explained under SPSS for Windows[15] on the thesis web site. They give aspatial demographics that cannot be directly related to individual locations, so on their own they cannot be applied in spatial analysis to generate visualizable demographic features and quantities for planning analysis.

3.5.2        S-Plus 2000 Professional

S-Plus offers basic statistics, linear and nonlinear regression and minimization, mixed-effects models, classification, nonparametric regression, ANOVA, smoothing and interpolation, multivariate analysis, cluster analysis, time series, generalized linear models, generalized additive models, tree models, smoothing splines, survival analysis, multiple comparisons, quality control, discriminant analysis and much more. In addition to classical statistical techniques, S-PLUS offers exploratory graphing techniques in 2D and 3D (from histograms to bar charts to scatterplots) to help you dig deeper into your data and discover hidden trends and relationships that get lost in summary statistics and text output. For example, by selecting demographic data points on graphs you are able to see them highlighted across all graphs and in the data set. It is also possible to redraw graphs without selected outliers, explode graph panels to view subsets of data, and easily view pop-up descriptions of data values. All of these are explained under S-Plus 2000 Professional[16] on the thesis web site. As can be noted, the analysis makes no reference to geographical locations, which are very important for DSA.

3.5.3        IDRISI for Windows

IDRISI is produced by the Clark Labs[17], a non-profit research organization within the Graduate School of Geography at Clark University. Activities undertaken by the Labs include: the development, distribution and support of the raster GIS and Image processing software IDRISI and the vector digitizing and editing software CartaLinx.

IDRISI is primarily a raster system with vector capabilities for display, conversion and linking to a database management system. All analysis, except for some simple querying of the database, is performed in raster. Facilities to translate data from vector to raster and back again are included in the Reformat/Raster-Vector conversion menu of IDRISI. As explained under IDRISI for Windows[18] on the thesis web site, IDRISI has rich surface analysis capabilities that can be used for GIS demographic surface analysis, as will be dealt with in section 5.5, but they are not oriented to generating visualizable demographic features and quantities for planning analysis.

3.5.4        SpaceStat Extension for ArcView

As explained in section 2.5.2, various approaches have been proposed, including the integration of spatial analysis methods and other models in GIS, which has led to a new exploratory analysis (Goodchild, 1987; Haining, 1990; Fotheringham & Rogerson, 1993; Openshaw, 1994c; Openshaw, 1997). The SpaceStat Extension for ArcView was developed by Anselin, et al. (1998) and is currently distributed by BioMedware Inc[19]. It aims at incorporating statistical functions in ArcView GIS and is based on the tight coupling approach. This differs from the loose coupling approach of Anselin, et al. (1993), who combined Arc/Info with the stand-alone SpaceStat, a program for the analysis of spatial data.

With SpaceStat using Moran scatter plot, we are able to carry out Moran analysis where the result comes in a new view with a unique value map with four colors corresponding to the four quadrants of the Moran Scatterplot of a selected variable. Using Box Map, we are able to create a new View with a quartile map for a selected variable with the outliers highlighted (a box map). LISA Local Moran Map is responsible for creating a new View with a unique value map for those locations with a significant Local Moran statistic. Moran Significance Map is a combination of a Moran Scatterplot Map and a Local Moran map, showing the quadrant of the Moran Scatterplot only for those locations with a significant Local Moran statistic. G-Stat Map gives the same as LISA Local Moran Map but for the Gi or Gi* statistic.
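The Moran Scatterplot quadrants described above can be illustrated with a short sketch. This is not SpaceStat's implementation: it assumes row-standardized neighbor weights, omits the significance testing behind the Local Moran maps, and uses hypothetical values for four buildings along a street. The function names are illustrative only.

```python
def spatial_lag(z, neighbors):
    """Average of the neighbors' deviations (row-standardized weights)."""
    return [sum(z[j] for j in neighbors[i]) / len(neighbors[i])
            for i in range(len(z))]


def morans_i(values, neighbors):
    """Global Moran's I; with row-standardized weights the n/S0 factor is 1."""
    m = sum(values) / len(values)
    z = [v - m for v in values]
    lag = spatial_lag(z, neighbors)
    return sum(zi * li for zi, li in zip(z, lag)) / sum(zi * zi for zi in z)


def moran_quadrants(values, neighbors):
    """Label each location by its Moran Scatterplot quadrant.

    The first letter is the location itself (High/Low relative to the mean),
    the second letter is the average of its neighbors.
    """
    m = sum(values) / len(values)
    z = [v - m for v in values]
    lag = spatial_lag(z, neighbors)
    return [("H" if zi >= 0 else "L") + ("H" if li >= 0 else "L")
            for zi, li in zip(z, lag)]


# Hypothetical household counts for four buildings in a row;
# each building neighbors the ones next to it.
counts = [10, 12, 2, 1]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(morans_i(counts, neighbors))
print(moran_quadrants(counts, neighbors))
```

The four labels correspond to the four colors of the unique value map: HH and LL indicate clusters of similar values, while HL and LH indicate locations that differ from their neighbors.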

As can be noted, these functions are directed at accomplishing statistical analysis; they are not directed towards generating demographic features and quantities from the planner’s point of view, and the planner often experiences difficulty in understanding what the results mean in relation to planning analysis.

3.6                   GIS in Malaysia

Having looked at the study area (Georgetown city, Penang, Malaysia), the resources and data used in this study, and the attempts to carry out DSA with other techniques and packages, and before turning to how the GIS-DSA was prepared, let us look at the GIS developments, activities and research that have taken place or are taking place in the country. This shows this research’s possible contributions and applications, in addition to the general planning applications discussed in section 2.

Because GIS are designed as a generic system for handling any kind of spatial data, they have a wide range of applications in urban and natural environments, such as urban planning, natural resources administration, agriculture, public utility network management, route optimization, demography, cartography, coastal monitoring, fire and epidemics control. In most domains, GIS play a major role as a decision support tool for planning activities. All those are applied in Malaysia, but here we are mainly concerned with GIS research and applications to demographic analysis. But before that, let us take a look at the general GIS trend in Malaysia.

To start with, let us look at Malaysian GIS resources. Many web sites are dedicated to the use of GIS in Malaysia. GIS Malaysia[20] gives GIS news, hosts GIS articles, provides links to outstanding GIS web sites, etc. GISNET MALAYSIA[21] is a public service site dedicated to information and news on GIS and related technologies and professions in Malaysia. There is a web site for property listing using Web-based GIS at Malaysia real net[22]. Lot parcels and base maps can be bought online from the Department of Surveying and Mapping (JUPEM) web site[23]. There are maps and geographical information from the Geography Site at About[24]. Free GIS data about Malaysia can be downloaded from the GIS Data Depot[25]. Penang GIS Maps[26] (Penang Island Road Network, Georgetown, Penang and Universiti Sains Malaysia Main Campus), provided by Assoc. Prof. Dr. Lee Lik Meng and Universiti Sains Malaysia, can be downloaded free.

In addition to the GIS web sites, big GIS software vendors like ESRI have set up offices in Malaysia to provide users with technical assistance, which has further facilitated GIS development and usage. Many companies, such as Landsoft Sdn Bhd[27], are helping with and developing GIS packages. NaLIS[28], which is coordinated by the Ministry of Land and Cooperative Development, is now trying to resolve issues and policies related to data infrastructure, data standards and data sharing between agencies.

GIS has been applied in Malaysia by many companies and organizations, such as CH2M Hill, whose GIS work there started during the company's involvement in the telecommunications arena to install a fiber optic system in Malaysia. They employed GIS to identify the optimum system configuration for a hybrid fiber optics cable (HFC) system. They also set up ARC/INFO NT to maintain the Malaysian GIS center as a long-term installation. Now CH2M Hill's Penang office supports other company projects that require data conversion. Maps are scanned, placed on the company's FTP server and pulled off in Penang. There (Colorado, USA), the GIS center handles heads-up digitizing, cleanup and attribution. Then, the Penang center places the polished ARC/INFO files back on the server. These files are then downloaded by CH2M Hill's other GIS centers.

GIS is being used especially in the planning processes and the application of best practices for land management and habitat security for elephants and rhinos in Malaysia. The project is the Sabah component of AREAS (Asian Rhino and Elephant Action Strategy), a WWF initiative to coordinate Asian elephant and rhino work in their range states through a strategic approach (WWF in Malaysia[29]). GIS has been applied in the National Land Information System (NaLIS[30]) to assist in planning and development in Malaysia. Using GIS, the Government of Malaysia developed the River Basin Information System[31] (RBIS) in 1998, through the technical cooperation of the Japan International Cooperation Agency (JICA[32]), to support river basin management for the Perak river basin. The RBIS gives access to various records and statistics related to river basin management. GIS was applied by Perunding Utama[33] in collaboration with Iris Environmental Systems, which were commissioned by ESSO Malaysia Berhad in 1993 to prepare a report and ESI maps for the coastline of Negeri Sembilan, part of Selangor and most of Pulau Pinang. The project covered a coastline distance of about 137 km. The study highlighted the need to modify the classification system to take into account differences in coastal geomorphology as dictated by changes in tide conditions. GIS has also been used by the same company in urban planning studies, such as the Structure Plan for the District of Yan. The study involved the digitizing of cadastral, coastline, natural and man-made drainage systems, infrastructure and other information for the production of maps. Maps showing land use, river systems, coastline, drainage and infrastructure features were produced and used in planning future development in Yan District. GIS has also been applied in “The Ecosystem Approach To Environmental Management” under LESTARI’s Langat Basin study. 
The Langat Basin lies adjacent to and south of the Klang Valley, Malaysia’s most highly developed urban centre, where the nation’s capital, Kuala Lumpur, is situated. It has an area of approximately 2200 km² and a population of around 725,000 people. The new administrative capital, Putrajaya, Cyberjaya, the Kuala Lumpur International Airport (KLIA) in Sepang, the multimedia super corridor (MSC) and a high technology park are all located in the Basin. GIS has been employed in utility provision, such as the SAT NLSA for the Largest GIS-Substation in Malaysia[34], a project for Tenaga Nasional Berhad (TNB) by VA TECH SAT Malaysia. Other GIS projects in Malaysia include the Pulau Langkawi GIS hydrology database[35], GIS Applications for Dumping Site Selection in Pulau Langkawi-Malaysia (Yagoub, et al), Perak Town map[36], etc.

Many presentations have taken place, such as at the ESRI South Asia Users' Conference, and different researchers have demonstrated and laid the foundation for the use of GIS. A good example is the Penang Experience by Lee (1997) in Creating Large Digital Maps for Municipal Planning Applications Using Desktop GIS.

3.6.1        GIS-DSA Research and Applications in Malaysia

Various researchers in Malaysia have looked or are looking at demographic analysis using GIS. Ruslan, et al. (2000a) have looked at the use of GIS to assist in understanding the spatial variation of racial segregation. They have also employed GIS to examine the trend and spatial pattern of population density and growth in peninsular Malaysia between 1980 and 2000 (Ruslan, et al., 2000b). Yaakup, et al. (1994) have described a GIS approach to spatial modeling for squatter settlement planning in Kuala Lumpur, Malaysia.

There are research groups, like the GeoData Program[37] under the School of Humanities, Universiti Sains Malaysia, for which demographic analysis in GIS is among the areas of application research. There are other GIS demographic analysis projects, like the “Internet GIS for Malaysian Population Analysis”, which provides ways to understand population characteristics in Malaysia. It is an Internet GIS (Web GIS or Web-based GIS) created to analyze demographic statistics. This system creates choropleth maps with the functions of “Zoom In”, “Zoom Out”, graph displays, attribute query and basic spatial analysis, etc., and it is accessible over the Internet[38]. Other Internet GIS developments include Penang population[39] analysis on the web. Also, the Department of Statistics Malaysia (DOSM), being the leading government agency in the collection, compilation and dissemination of national and state level statistics, has accordingly taken appropriate measures to use GIS. As part of the implementation of GIS under the 1991 Population and Housing Census Project, a Cartographic Database was created, which involved the capture of geographic data pertaining to census geostatistical and administrative units and was completed in 1995. With the database complete, it was then possible to generate thematic maps as well as produce spatial data according to the requirements of users. The Census Atlas, released in 1996, represented one of the main products arising from the application of GIS. The maps presented in the Census Atlas covered a variety of topics, namely population size and composition, marital status, migration, education, economic household and housing. These maps were produced by the ArcView Version 2.1 GIS software on a workstation using an inkjet plotter. Apart from the Census Atlas, the Department has also found GIS to be extremely useful in meeting the special needs of data users. 
In this respect, GIS applications have been put to good use in cases where data is required to cover certain ad hoc areas by radiating from a point or specified distance (band) away from a selected feature. For example, the Department has been able to meet requests speedily for data on population, households, housing and other related characteristics for areas within a certain radius from given geographical locations as specified by the data users. Population data in terms of parliamentary constituencies have also been generated using GIS.
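The radius-based request described above can be sketched as follows. This is only an illustration of the idea, not the Department's procedure: it assumes point records in a projected coordinate system (so that straight-line distance is meaningful), and the field names and values are hypothetical.

```python
from math import hypot


def within_radius(records, center, radius):
    """Return the records lying within `radius` of `center`.

    Coordinates are assumed to be in a projected system (e.g. metres),
    so plain Euclidean distance is used.
    """
    cx, cy = center
    return [r for r in records
            if hypot(r["x"] - cx, r["y"] - cy) <= radius]


def total_population(records):
    """Sum the population attribute of the selected records."""
    return sum(r["population"] for r in records)


# Hypothetical enumeration points for illustration only.
records = [
    {"id": "A", "x": 3.0, "y": 4.0, "population": 120},
    {"id": "B", "x": 6.0, "y": 0.0, "population": 80},
    {"id": "C", "x": 1.0, "y": 1.0, "population": 45},
]

selected = within_radius(records, center=(0.0, 0.0), radius=5.0)
print([r["id"] for r in selected], total_population(selected))
```

A band (distance between an inner and an outer radius) could be handled in the same way by adding a lower bound to the distance test.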

From the above, it can be seen that GIS has been applied by both government and private organizations in different fields, all requiring demographics. This justifies and necessitates the development of techniques to generate visualizable demographic features and quantities, which can be employed in different ways and applications as outlined in section 6.5 under concluding remarks.

3.7                   Database for GIS Demographic Analysis

After all the above data had been collected, it was put into a database (DB). This was organized at the level of individuals to maintain the micro data, while at the same time providing for the storage of aggregate data. This study used a relational database; to avoid repeating research on designing a relational model and DB structure, see the thesis by Chong Chee Kit, available on the Internet[40], which dealt with the DB using the same data. For information about the steps in design, the qualities of a good DB design, and the E-R modeling process, see Cowen (1997). The sections below discuss connecting the DB with GIS layers and geocoding; before that, let us look at the factors taken into consideration to help in disaggregated modeling and analysis, including uncertainty in GIS-DSA.

3.7.1        Uncertainty (Fuzziness) in GIS Demographic Analysis

As the DB is the basis for experimenting with the techniques of GIS-DSA in 2D, 2.5D and 3D, it is important that we know the uncertainties involved. Uncertainty is the degree to which an information source does not fully inform, i.e. imperfect knowledge regarding aspects of a model; here, it is the discrepancy between geographic data in GIS and the geographic reality that the data are intended to represent (Chrisman, UCGIS). Uncertainty as a general topic concerns many disciplines, from statistics to philosophy, but the focus here is to mobilize the results of these more generic efforts for the specific topic of GIS-DSA. A large amount of the uncertainty in GI comes from fundamental choices in measuring and representing (Heuvelink, 1998), i.e. at the level of inputs, data management, analyses, model formulation, representation, and output of the model in a GIS environment. Hence, spatial analysis in GIS, as Isaaks and Srivastava (1989) write, is subject to uncertainty due to: 1) the datasets upon which it operates, which are the outcome of a process of discretization and generalization; 2) the actual value at any particular location on the continuous surface, which is a factor of the distance of that location from the nearest data points; 3) the variation of the surface between the data points; and 4) the accuracy of the data measures in the dataset. Uncertainty can be categorized as containing both horizontal (positional accuracy) and vertical (attribute accuracy) components, i.e. it may be due to an incorrect magnitude of demographic characteristic (MODC) at the correct location, or a correct MODC at an incorrect location, or some combination of these. The MODC is determined at data collection; thus, uncertainty due to factor 4 was outside the scope of this thesis. Uncertainty due to factors 1 and 2 has been carefully considered by using actual data values at discrete level in geocoding and representation (section 3.7.3) and basing interpretation only on data points. 
Uncertainty due to factor 3, the variation of the surface between the data points, has been dealt with by using two data structures (grid and TIN), making it possible to deal with irregular and regular data points in both raster and vector GIS formats. However, uncertainty exists in the location of individual members of the same household. This is because they are initially assumed to be staying at the household's location, but to provide better visualization, their locations are slightly displaced, though kept within the boundary of the household. This was employed because, as is evident from section 3.10, visualization is an effective way of understanding spatial data (Hearnshaw & Unwin, 1994) and of examining spatial data error (Beard, 1994), as it is a powerful mechanism for identifying the spatial distribution and possible causes of uncertainty. The uncertainty in the various techniques will be explained independently when discussing each technique, together with its possible applications and limitations.
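The slight displacement of household members can be pictured with the sketch below. The thesis does not state how the displacement was computed, so the approach here (spreading members evenly on a small circle around the household point) is purely an assumption for illustration, and `max_offset` is a hypothetical parameter that would have to be chosen smaller than the distance from the point to the household boundary.

```python
from math import cos, sin, pi, hypot


def displace_members(x, y, n_members, max_offset):
    """Spread n_members points evenly on a circle around (x, y).

    Every returned point lies at most max_offset from the original
    location, so the positional uncertainty introduced is bounded.
    A single member keeps the original location.
    """
    if n_members <= 1:
        return [(x, y)] * n_members
    step = 2 * pi / n_members
    return [(x + max_offset * cos(k * step),
             y + max_offset * sin(k * step))
            for k in range(n_members)]


# Hypothetical household of four persons geocoded to one point.
points = displace_members(100.0, 200.0, n_members=4, max_offset=1.5)
for px, py in points:
    print(round(px, 2), round(py, 2), round(hypot(px - 100.0, py - 200.0), 2))
```

Because the offset is bounded, the displacement affects only the display of individuals and not the building to which they are geocoded.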

3.7.2        Individual Modeling

The main concern of this thesis was disaggregated modeling, where individuals were represented as points in the DB and manipulated in GIS at the same level. This was to cater for future demands as the use of GIS for socioeconomic applications continues to widen in the 21st century. This has also been advocated in the last decade (Martin, 1991, 1999), and various commentators have suggested that being in a position to carry out GIS spatial analysis and modeling at different scales is one of the solutions for overcoming the limiting factors that prevent effective use of GIS for socioeconomic applications. For socioeconomic applications, the objects of ultimate interest are usually individual persons or households (Martin & Higgs, 1995); and in their commentary about scale and generalization in geographical analysis with regard to the measurement of geographical events, Longley and Batty (1996) support the use of point data as it offers the most precise and accurate representation of spatial phenomena. In addition, spatial modeling of individuals can help us understand dynamic population level processes such as spatial similarity and habitat availability and preferences. Individual-level databases also facilitate ad hoc aggregation, allowing the design of areal units to suit analytical requirements. Goodchild, et al. (1992) have emphasized that the nature of a GIS data model determines the range of analytical processes that can be undertaken and is not a sub-issue relating only to representation or display. Since the DB determines the GIS data model, using individuals clearly lays the ground for the success of GIS-DSA. The question at this point is the location chosen for the georeferencing of individuals. Should the data relate to home address, workplace or both? This study used the building of residence as the spatial unit for georeferencing individuals, for the reasons and advantages discussed in section 3.7.3, which follows.

3.7.3        Demographic Geocoding/Georeferencing using Buildings

Georeferencing is the process of assigning or associating a data point with a geographic location (e.g. latitude and longitude) based on some form of address. It involves inputting spatial data into the GIS by assigning geographical coordinates to each point, line, and area entity (DeMers, 1997), which allows addresses to be used to identify locations on a map (Chrisman, 1997). This address need not necessarily be a street or mailing address, but can be any key identifier of a particular location, such as the name of a place or the lot and block number of a property parcel (see geocoding in Appendix B).

Buildings were used for geocoding because this allowed the single value geocoding method to be used, and because the use of buildings is in line with the seven tests formulated for effective analysis in GIS (Coombes, 1995), which have been interpreted in this thesis in terms of buildings as spatial units:

1)      Are the buildings the smallest units that the confidentiality restrictions deem to be possible (to allow for maximum flexibility of aggregation)? Yes. They are the smallest units that can be used for georeferencing: people are mobile and permanent coordinates cannot be assigned to them, whereas buildings are the smallest permanent spatial entity that can be employed in analysis. Buildings can also be similar in terms of size, area, etc., which helps in aggregation analysis. The use of buildings for location (which can be manipulated or aggregated in various fashions) may also pose a lesser threat to personal privacy; for example, saying that in a housing project (buildings) there are 145 people within the age cohort 7-14 years does not point to any specific person. But for planning purposes this is vital information, e.g. when planning the location of a high school, we get the number of people within a certain age cohort who are possible users based on their proximity to the proposed site.

2)      Is each of the areas in a set of buildings, or other set of areas, defined on a consistent basis? Yes. In this century people do not normally live in the open air; they stay in buildings.

3)      Does this set of areas represent (part of) ‘real-world’ entities, such as settlements, which can thus be recognized using these boundaries? Yes, buildings are real entities and they are the basis for human settlement. As Okpala (1980) concludes, a residential building is a realistic base for analyzing population, as buildings are occupied by people.

4)      Does the set of areas allow comparison with previous data at all levels, or for some minimal grouping of areas to create consistent boundaries? Yes, buildings allow comparison, as information about the development of people's welfare and improvement in settlement is obtained from the building level upward: economic situation, demographic analysis, social characteristics, basic needs, housing needs, etc. (Yaakup, et al., 1994). In addition, a building can serve both as a definitive linkage point between two address-bearing databases and as a categorical or continuous 2D variable along which imputation of data is possible, and it can be employed in vertical dimension analysis to aid in density (section and spatial influence analysis (section in planning.

5)      Does the set of areas cover the whole of the study area without leaving any locations whose data are too sparse to allow them to be published? Yes. For population analyses, buildings cover the whole area, as each person lives in at least one.

6)      Are the boundaries of the areas available in digital form? Yes, building data is always available, as complete data about buildings can be obtained from the planning office, which approves and keeps records to monitor developments. In addition, it is almost universally collected in census and development care documents, and is associated with a broad range of both socio-economic and environmental factors.

7)      Can these areas be readily and accurately linked by their location coding to all the areas used in the many non-census datasets? Yes, buildings can be used for all areas by creating a unique identifier for census and non-census datasets.

Using buildings, we have been able to answer yes to all the questions. For this specific study area, the buildings have almost the same planimetric area and most of them have two floors (Table E.3 and Figure 3.3), so they do not bias spatial and vertical analyses.
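The aggregation argument in test 1 (reporting how many people of a given age cohort live in each building without identifying anyone) can be sketched as below. The record layout and the sample persons are hypothetical; only the idea of counting individuals per building within an age cohort comes from the text.

```python
from collections import Counter


def count_in_cohort(persons, min_age, max_age):
    """Count persons per building whose age lies in [min_age, max_age].

    Only the building identifier and a count are returned, so the result
    does not point to any specific person.
    """
    counts = Counter()
    for p in persons:
        if min_age <= p["age"] <= max_age:
            counts[p["building"]] += 1
    return counts


# Hypothetical individual-level records for illustration only.
persons = [
    {"person_id": 1, "building": "B1", "age": 8},
    {"person_id": 2, "building": "B1", "age": 13},
    {"person_id": 3, "building": "B1", "age": 35},
    {"person_id": 4, "building": "B2", "age": 7},
    {"person_id": 5, "building": "B2", "age": 15},
]

print(count_in_cohort(persons, 7, 14))
```

The same individual-level table can be re-aggregated for any other cohort or grouping of buildings, which is the flexibility that individual modeling is meant to provide.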

3.7.4        Demographic Data Representation and Manipulation in GIS

A GIS geared towards the analysis and representation of the DCs should adopt four approaches. The first is the individual level approach, in which data are held relating to every individual person in the population. The second is the areal aggregation approach, where data is grouped according to predefined zones. The third begins with the assumption that the demographic phenomena of interest to the analysis are essentially continuous over space, and attempts to reconstruct this continuity. The last is representation using solids, where demographics are taken as occupying 3D space. To put demographics in GIS, there is a need to look at how the different entities will be represented in order to be able to manipulate them. Table 3.1 highlights how the DCs can be represented using the GIS primitives (points, lines, areas, and surfaces).

Table 3.1 Representation of demographic data in GIS

Stage | Point | Line | Area | Surface
Real world entity | Individual persons | Street/road residence | Building/zone of living | Population density
(Data collection and entry)
Digital object | Personal coordinates | Street coordinates | Building/zone boundary | TIN or 3D-DM
(Data manipulation)
Manipulation technique | Nearest neighbor analysis; boundary generation; surface generation | Topological analysis | Areal interpolation; centroid generation; surface generation | Slope analysis; TIN/TEN creation; analysis of surface/solid form
(Data output transformations)
Visualization technique | Point mapping; multivariate display; convert to 3D | Line mapping; line cartograms; convert to 3D | Choropleth mapping; areal cartograms; convert to 3D | TIN mapping; grid mapping; convert to 3D

The columns illustrate four classes of geographical phenomena, namely points, lines, areas, and surfaces. The rows illustrate four stages in the representation process: DCs exist in the real world; georeferencing them provides the link with digital objects, which may be used to represent their locations; GIS then provides the manipulation tools for the creation of new objects; and visualization techniques are applied to each case. Between these stages are the data collection and entry, data manipulation, and data output transformations.

3.7.5        Connecting DB with GIS

All data were represented as tables made up of rows and columns, both of which are unordered (i.e., the order in which rows and columns are referenced does not matter). Each table has a primary key, a unique identifier constructed from one or more columns. A table is linked to another by including the other table's primary key; such an included column is called a foreign key. For individual data, each person was given a unique identifier number, which is the first column in Table E.1 and the second column in Figure 3.4a (a screen capture from ArcView GIS). To link persons to buildings, another column called Rd_bldg (fourth column in Table E.1 or fifth column in Figure 3.4a) was created by combining a person's building of residence number (third column in Table E.1/fourth column in Figure 3.4a, the same as the second column in Table E.3/third column in Figure 3.4c) and the road/street number (fifth column in Table E.1/sixth column in Figure 3.4a, the same as the second column in Figure 3.4b). The result was used as a foreign key when geocoding persons to buildings. For buildings, the primary key was created by joining the building number (second column in Table E.3/third column in Figure 3.4c) and the road/street number (fourth column in Table E.3/fourth column in Figure 3.4c). The relationship between buildings, land use, and cadastral lots was established using the coordinates of the polygons.

Figure 3.4 Linking tables by creating primary and secondary keys

For the aggregated population from the census, the mukim number (second column in Table E.2/fourth column in Figure 3.4d) and the district number (first column in Table E.2/second column in Figure 3.4) were combined to get the primary key, which was used to geocode the population to the mukim.
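The key construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the field names other than Rd_bldg, and the underscore separator in the composite key are hypothetical and are not taken from Tables E.1 or E.3.

```python
# Hypothetical sample rows; only the Rd_bldg name comes from the thesis.
persons = [
    {"person_id": 1, "bldg_no": 12, "road_no": 3},
    {"person_id": 2, "bldg_no": 12, "road_no": 3},
    {"person_id": 3, "bldg_no": 7, "road_no": 5},
]
buildings = [
    {"bldg_no": 12, "road_no": 3, "floors": 2},
    {"bldg_no": 7, "road_no": 5, "floors": 2},
]


def make_key(road_no, bldg_no):
    """Combine road and building numbers into one composite key.

    The separator (an assumption) keeps road 3/building 12 distinct
    from road 31/building 2, which would both read "312" without it.
    """
    return f"{road_no}_{bldg_no}"


# Primary key on buildings: every composite key must be unique.
buildings_by_key = {}
for b in buildings:
    key = make_key(b["road_no"], b["bldg_no"])
    if key in buildings_by_key:
        raise ValueError(f"duplicate building key: {key}")
    buildings_by_key[key] = b

# Foreign key on persons, then geocode each person to a building.
residents = {key: [] for key in buildings_by_key}
unmatched = []
for p in persons:
    p["Rd_bldg"] = make_key(p["road_no"], p["bldg_no"])
    if p["Rd_bldg"] in residents:
        residents[p["Rd_bldg"]].append(p["person_id"])
    else:
        unmatched.append(p["person_id"])
```

Keeping a list of unmatched persons makes it easy to spot records whose building or road number was entered incorrectly before they silently drop out of the analysis.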

3.8                   GIS Demographic Analysis

3.8.1        GIS Approach to Spatial Modeling

GIS possess the means to capture, store and manipulate spatially referenced data (Burrough, 1986; Huxhold, 1991); thus the use of GIS for modeling mainly aims at incorporating geographic space as a factor in the model. This has been handled by many researchers, including Lee (1995), who discusses the details of GIS and spatial interaction modeling in chapter four of his PhD thesis. Spatial modeling and data manipulation in GIS, together with the effect of GIS on spatial statistical analysis, have broadened the process of hypothesis testing and make GIS-DSA much more flexible, as can be noted from Figure 3.5, modified from Getis (1999). A step has been added to the traditional approach of hypothesis-guided inquiry, and most steps have been expanded to include more opportunities to access data from different vantage points. The added step, data manipulation, presents planners with opportunities to use larger samples, view data over a series of map scales, and generally be in a stronger position to carry out spatial analysis of demographic data.

Figure 3.5 Traditional and GIS approaches to demographic analysis

To accomplish GIS-DSA, the components shown in Figure 3.6 have been identified.

Figure 3.6 GIS demographic analysis procedure

Geocoding/referencing was dealt with in section 3.7.3. GeoProcessing is a way to create new data based on themes in a view; in most cases, it alters the geometric properties of the features in a dataset while controlling some aspects of how the attribute data are handled (ESRI, 1998), as explained under Geoprocessing in Appendix B. Linking attributes utilizes the linkage with tables, letting us work with data from a tabular data source in GIS. The data from the tables are then added to maps and can be symbolized, queried and analyzed geographically (see linking tables in Appendix B). During GIS-SA, the inputs and results can be visualized using the methodology in section 3.10.

GIS, modeling, and visualization need to operate together in an interactive way, i.e. there are interrelated stages of analysis (determination of features), visualization (checking the results of analysis and allowing interaction), interpretation, and modeling (which requires a choice of points as a basis for the model and allows measurements, studies of change, simulation, and derivation of features). For example, a 3D-DM may be modified by model manipulation procedures. It might then be displayed by visualization procedures, or analyzed through interpretation functions. Visualization and interpretation in turn may require or support further modification or adaptation of the original 3D-DM. Thus, the results of individual modeling steps may feed back into previously run procedures. Whereas the methods of analysis, visualization and modeling are generally used independently of each other, here we have used them cooperatively to improve the results and to make the process of deriving the demographic features and quantities easier and faster. Ideally, the modeling will feed the visualization, which in turn influences the human operator, who can then change the modeling parameters; during such a process, however, there may be a need to bring in external data, which necessitates evolving the model.

3.8.2        Evolving the Demographic Model

Like any good GIS term, data integration can mean many things to many people. In this thesis, it refers to the process of adding more data and combining data from different sources into an existing GIS-DM. To do so, the new data must be related in some way to features in the developed GIS-DM. The relationship will be either explicit, where the data further describe the features being modeled by providing additional attributes to the feature, or implicit, where the data further describe some attribute of the features. For example, an individual may have an attribute called ‘male’ with values ‘active’, ‘healthy’, etc. If additional information is obtained on one of those values, e.g. that a ‘healthy’ person does not have viral diseases, this implicitly becomes an attribute of the male.

To achieve the above, the new data must have a common key with the existing data. A key is defined generically as any piece of data that identifies a particular feature or row of data; keys can be categorized as spatial or non-spatial. A spatial key is one where two features are considered related by virtue of occupying the same location in space. This could be a point location (x, y in 2D or x, y, z in 3D), a linear segment (having the same start/end point with respect to some reference location on a line) like demographic boundaries, an area (occupying the same polygonal extent), or a volume (occupying the same solid extent). A non-spatial key is one where the common attribute can be thought of as a database column, i.e. some common textual or numeric attribute value is used to identify which rows from different tables to join.

Combining the keys (spatial or non-spatial) with the type of relationship (explicit or implicit), we get four key types with different properties:

1.      An explicit non-spatial key: this will be the primary key or some other unique key of the feature and a corresponding key in the new data. This may be as straightforward as building numbers, as in this thesis, where a geocoding ID had to be created by combining the building number with the road/street number to create a unique ID (see Table E.3), or it may involve some work, like performing address matching when the common keys are addresses.

2.      An implicit non-spatial key: the additional data describes some attribute of a feature rather than the feature itself. In this case, the additional data would be a lookup table with the key field in the existing data acting as a foreign key into the new data table. The previous example of male illustrates an implicit non-spatial key.

3.      An explicit spatial key: the additional data includes a location (1, 2, or 3D) to be matched with the existing data by proximity e.g. relating land use to buildings.

4.      An implicit spatial key: the additional data spatially surrounds the existing data. The attributes of the new data become associated with existing features by virtue of the fact that the existing features are spatially ‘inside’ the new dataset.
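As an illustration of the implicit spatial key (item 4), the following minimal sketch attaches a zone attribute to buildings that fall inside that zone. It uses the standard ray-casting point-in-polygon test; the zone polygons, building centroids and key values are hypothetical, and a GIS package would normally perform this overlay itself.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices. Points lying exactly on an
    edge are not treated specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical land-use zones and building centroids.
zones = {
    "residential": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "commercial": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
building_centroids = {"3_12": (4.0, 5.0), "5_7": (15.0, 2.5)}

# Each building inherits the attribute of the zone that contains it.
land_use = {}
for key, (bx, by) in building_centroids.items():
    for name, polygon in zones.items():
        if point_in_polygon(bx, by, polygon):
            land_use[key] = name
            break
```

No shared column is needed here: the relationship between the two datasets exists only because one feature lies spatially inside the other.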

Data integration as described above offers the following advantages and affects the model in the following ways: 1) It provides additional dimensionality to the data: for example, adding data describing male to a DM of ethnicity allows for visualizing, analyzing, and modeling a variable in terms of these characteristics. 2) It enables the analysis of relationships between additional features: for example, bringing household sizes and locations into a DM of ethnicity makes it possible to analyze how households vary with ethnic groupings. Inter-related data sets from different sources, like administrative records, censuses, field surveys, and survey rounds, can also be generated and analyzed. 3) It facilitates application: for instance, evolving a DC with planning features and then integrating it puts us in a position to analyze relationships between additional features and correlations between different data sets. The result is that the analysis of demographics is enriched with the information available, and gaps in any particular data set are filled in.

3.9                   Visualization and Presentation Techniques

Presentation techniques cover different approaches to the visualization and presentation of demographic datasets. These approaches are designed around the concept of data spaces; they present the user with a series of tables, pictures, or graphics, each describing a data space, together with tools to explore the synergy between data spaces. In most cases, this means mapping the distribution of people in terms of their place of residence. This has been accomplished in many ways, using data tables, data pictures and graphics (symbols for location, e.g. circles, cubes) (Hornby, et al., 1984; Witherick, 1990; Indiana State University web site[41]). Before we go on to GIS demographics visualization, let us look at the traditionally used techniques of choropleth maps, cartograms, and Lorenz curves, since demographics have commonly been presented using these techniques.

3.9.1        Choropleth Technique

This involves mapping the distribution in terms of the relationship between the number of people and the area, a measure usually referred to as population density (Figure 3.7a). The disadvantage of this technique is that the zones do not necessarily relate to the actual areal units to be used in the analysis, and the maps represent average values for the chosen areal units (Witherick, 1990). The spatial aggregations imposed by the arbitrary zones tend to persist and, indeed, often dominate subsequent analyses (Hearnshaw, et al., 1994). In addition, their interpretation may give a false impression, particularly where adjacent areas show significantly different density values: the map indicates, quite erroneously, a marked break in the continuity of population distribution along the boundary between two areas when in fact there is a smooth transition in densities. There are also technical problems in the compilation of population density maps, i.e. how many different classes should be recognized and how should these classes be delimited? In addition, as Witherick (1990) put it, population density per unit area is a very crude measure, as it does not take into consideration the inhabitable land and other physical considerations. Another shortcoming of this method, as it is based on areal units (zones), is the variable size of the spatial units and the implied assumption that all attributes of a zone are uniformly distributed throughout the zone (Wegener, 1999); it does not account for topographical relationships and ignores the fact that DCs and activities are continuous in space.

3.9.2        Cartograms

Cartograms are special-purpose maps or diagrams showing geographical statistical information (Dorling, 1994), used to illustrate some feature other than area. The cartographer (map-maker) tries to keep the features in the same relative position and shape as they would be on a real map, but the size of each country, for example, is distorted according to how large or small the statistic is for that particular category of information. For example, on a cartogram for oil production, Saudi Arabia would appear to be the largest nation on the face of the earth, and tiny countries such as Bahrain and Kuwait would appear to be very large as well. In the visualization of demographic data by cartograms, a particular exaggeration is deliberately chosen in the maps (Figure 3.7b). In this context, the term cartogram should be taken to mean an equal population cartogram. This distinction needs to be made because ordinary maps are in fact a form of cartogram based on equal land area.

A population cartogram is an appropriate basis for seeing how something is distributed spatially across groups of people; such a cartogram is not a distortion of the world but a representation of some particular aspect of it (Dorling, 1993, 1994). However, it makes spatial analysis difficult, as some details cannot be retrieved, which makes it inappropriate for disaggregated population spatial analysis. In addition, it is difficult to judge density, as the appearance of a cartogram of the same population will vary from one person to another. When the scale of visualization becomes local, the quality of representation of the demographic data and its spatial distribution begins to break down (Bracken, 1994), making the interpretation of results dependent on the scale of visualization.

Figure 3.7 Comparison of choropleth mapping and cartogram

The subdivisions represent mukim in Penang, Malaysia. Red represents 0-5662, green 5663-18339, blue 18340-45822, purple 45823-107263, and yellow 107264-207287 persons per mukim according to the 1991 population of Malaysia (Table E.2).
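The class delimitation used for Figure 3.7 can be expressed as a simple lookup. The sketch below is illustrative only: the upper class bounds and colors are taken from the caption above, while the function name and the use of a binary search are assumptions, not the procedure used in ArcView GIS.

```python
from bisect import bisect_left

# Upper bounds (persons per mukim) of the five classes in Figure 3.7.
UPPER_BOUNDS = [5662, 18339, 45822, 107263, 207287]
COLORS = ["red", "green", "blue", "purple", "yellow"]


def mukim_class(population):
    """Return the legend color for a mukim population count."""
    if population < 0 or population > UPPER_BOUNDS[-1]:
        raise ValueError(f"population outside legend range: {population}")
    # bisect_left finds the first class whose upper bound is >= population.
    return COLORS[bisect_left(UPPER_BOUNDS, population)]
```

Changing the number of classes or their bounds changes the map's appearance without any change in the underlying data, which is the class-delimitation problem noted for choropleth maps.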


Cartograms differ from traditional maps in that they use a variable other than area to derive the size of areal units on the map. A major drawback of using traditional map-based cartographic representations for portraying human socio-economic information is that areas with high populations and high population densities, e.g. cities, are displayed very small on maps. Traditional maps therefore tend to highlight patterns in the least important areas, i.e. where few people live. In contrast, cartograms represent areas in relation to their population size. As a result, patterns are displayed in relation to the number of people involved instead of the size of the area involved. For example, Figure 3.7 clearly shows how using a cartogram can give a vastly different impression of overall trends.
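The sizing rule behind an equal population cartogram can be sketched as follows. This only computes the target area of each unit and the linear factor by which it would have to grow or shrink; it is not a cartogram algorithm, which must also keep neighboring units in contact. The unit names and figures are hypothetical.

```python
from math import sqrt


def cartogram_targets(units):
    """Compute target areas for an equal population cartogram.

    units maps a name to (land_area, population). Returns a dict of
    name -> (target_area, linear_scale), where target areas share out
    the total land area in proportion to population.
    """
    total_area = sum(area for area, _ in units.values())
    total_pop = sum(pop for _, pop in units.values())
    targets = {}
    for name, (area, pop) in units.items():
        target_area = total_area * pop / total_pop
        # Areas scale with the square of lengths, hence the square root.
        targets[name] = (target_area, sqrt(target_area / area))
    return targets


# Hypothetical example: a small dense city and a large sparse district.
units = {"city": (10.0, 9000), "rural": (90.0, 1000)}
targets = cartogram_targets(units)
```

In this made-up case the city, which occupies a tenth of the land but holds nine tenths of the people, is drawn three times larger in each direction, while the rural district shrinks to a third of its linear size.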

3.9.3        Lorenz Curves

The Lorenz curve is a graphical method widely used to show variations in demographic concentration (Figure 3.8). The drawing of the Lorenz curve proceeds as described by Lingner (1974): the cumulative percentages of the area of the zones are plotted against the cumulative percentages of population on a graph, and the points are joined together to form a 'curve'. If the zones have similar densities, this curve will follow the diagonal, indicating an even distribution of population (no concentration) throughout the total zone concerned. Normally, of course, there is considerable deviation from this, and the more 'bowed' the Lorenz curve, the greater the unevenness of the distribution of population. If the population were distributed with complete inequality, the curve would coincide with the x-axis (Lingner, 1974).

Figure 3.8 Illustration of measure of concentration using Lorenz curve

(From Lingner, 1974. A handbook for population analysts, Part A: Basic methods and measures. Population concentration in USA 1950.)

Gini coefficients (henceforth, Ginis) are often used to summarize Lorenz curves. Consider the Lorenz curve in which the cumulative percentage of tracts is plotted against the cumulative percentage of the elderly. The Gini compares the area L, between the diagonal (signifying equal distribution) and the Lorenz curve, to the entire area under the diagonal, T. As L/T approaches zero (one), the population under study is more (less) equally distributed (Goodman, 1986). The Lorenz curve has the limitation that it can only be used when the information about the areas is available in equivalent zones/units (Hornby, et al., 1984; Witherick, 1990).
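The construction of the Lorenz curve and the ratio L/T can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: zones are given as hypothetical (area, population) pairs with non-zero area, they are ordered by increasing density so that the curve lies below the diagonal, and the area under the curve is approximated with the trapezoid rule.

```python
def lorenz_points(zones):
    """Return Lorenz curve points for a list of (area, population) zones.

    Each point is (cumulative share of area, cumulative share of
    population), with zones ordered from least to most dense.
    """
    ordered = sorted(zones, key=lambda z: z[1] / z[0])
    total_area = sum(area for area, _ in ordered)
    total_pop = sum(pop for _, pop in ordered)
    points = [(0.0, 0.0)]
    cum_area = cum_pop = 0.0
    for area, pop in ordered:
        cum_area += area
        cum_pop += pop
        points.append((cum_area / total_area, cum_pop / total_pop))
    return points


def gini(points):
    """Gini coefficient L/T from Lorenz points (trapezoid rule)."""
    under_curve = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        under_curve += (x1 - x0) * (y0 + y1) / 2
    t = 0.5                      # area under the diagonal
    return (t - under_curve) / t


# Hypothetical zones: identical densities, then a highly uneven case.
even = gini(lorenz_points([(1, 10), (2, 20), (1, 10)]))
uneven = gini(lorenz_points([(9, 0), (1, 100)]))
```

With identical densities the curve follows the diagonal and the Gini is zero; when all the population sits in one tenth of the area, the Gini rises to 0.9.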

3.10               GIS Demographics Visualization

Visualization means making visible what was obscure, what could not easily be imagined or seen; GIS models contain a lot of information that remains locked up inside without proper rendering. User expectations concerning ease of use and clarity of interpretation have increased to the point where it is expected that analysis tools convey the impacts of various management plans[42] using techniques conducive to instant understanding and meaning. Visualization techniques have proven valuable in the presentation of analysis results (Church, et al., 1994). These techniques can be crucial in supporting users in gaining new insights into the structure of their problems by generating different views of the situation and by exploiting their own visual skills so that they can recognize meaningful alternatives and strategies during the problem-solving process (Angehrn & Luthi, 1990).

GIS-related visualization is described as the interface of three processes: 1) computer analysis (data collection, organization, modeling and representation); 2) human cognition (perception, pattern identification and mental imaging); and 3) graphic design principles (construction of visual displays). MacEachren (1994) further describes the concept of "geographic visualization" as stressing map use that can be conceptualized as a 3D space. Both descriptions of visualization imply that maps and associated images can now be constructed that incorporate 3D perspectives. This ideally suits GIS Demographic Visualization (GIS-DV) to contribute to the techniques of GIS-DSA. In so doing, it provides a way to overcome some of the problems inherent in DSSA (section 2.4) that require the analyst to know beforehand precisely what s/he is looking for (Dorling, 1994). Although we can use dot maps, in which persons are represented spatially by dots, these have the limitation that the dots do not carry their aspatial aspects with them; other limitations are deciding how many people should be represented by a single dot, where it should be placed, and what the physical size of the dots should be (Hornby, et al., 1984). This calls for GIS-DV, where we are able to see the detail and the whole dataset simultaneously.

The purpose is to highlight the potential for exploring a demographic database by visualizing spatial patterns, in order to detect spatial clusters or clearly delimited subsets that may point to substantive processes that have generated the pattern, or that may indicate alternative variables that could provide insight into the issue of interest. We are concerned with questions like how demographics vary as we change localities and how neighborhoods interact spatially with each other. These questions cannot be answered by conventional quantitative techniques because the answers are unlikely to be simple enough to be presentable in tables or equations. Pictures are needed to show how places differ, and holistic patterns also need to be seen without generalizing out the detail (Dorling, 1994). Here the visualization should be such that analysts choose what they wish to see and how they wish to view it.

3.10.1    Demographics Visualization Methodology

As geographers develop and embrace visualization and couple it with GIS, in what is called geographic visualization (Buckley, 1999), many techniques are utilized; these have been developed and presented in the literature by several authors, including Cleveland (1993), DiBiase, et al. (1994), Dorling (1994), Robinson, et al. (1995), and Buckley (1999). In this thesis, we are interested in those that involve some form of georeferencing as a way of representing the spatial aspects of demographics. Methods to enhance visualization include the ability to rotate, set lighting, zoom, and use cross-sections, linkages and various transparency values.

The methodology is to initially map demographics color-coded according to the level and type of characteristics and the ways of representation (section 3.7.4), and then to employ the techniques of cross-variable mapping, dynamic displays (dynamic data visualization), multiple displays and multimedia interaction, surface and multi-dimensional displays, superimposition, composite indices (dimension reduction), and symbol segmentation. These are employed in combination (see Figure 3.9), adding new perspectives that take into account the nature of demographic data, the advantages of each technique, and the balance between readability and utilization.

Figure 3.9 GIS demographic visualization

Cross-Variable Mapping

This method is limited to either two or three variables, as the number of classes the human eye can distinguish is limited (Buckley, 1999). Bivariate mapping is used to simultaneously depict the magnitude of variables within a homogeneous area for two map themes (Robinson et al., 1995), and trivariate mapping is used to show three variables in the same way. This will be utilized with the results of demographics ESDA in 2D (section 4.2).

Multiple Displays (Multimedia Interaction)

Multiple displays can be generated in either constant or complementary formats (Buckley, 1999). A constant format is a series of displays with the same graphic design structure that depicts changes in a DC from one display to the next. For example, it can show how age composition varies from race to race and according to location; the consistency of design ensures that attention is directed towards changes in the characteristic, which helps to analyze DCs according to their structural make-up and in relation to their location. The results of multiple displays are taken either to dynamic displays or to multi-dimensional displays, both described later in this section.

Complementary formats help in combining the DC displays with other formats; this is where multimedia comes in. As the name suggests, multimedia is a term describing a computer system that employs multiple media; it has been applied with different meanings in different contexts. It has entered common language and is used when describing publications, presentations, television, communication, computer games, and information systems. Multimedia is sometimes restricted to systems that employ a wide range of technology. Buckley (1999) notes that computer manufacturers in particular, as well as some writers in the popular computer press, have a tendency to claim that a system must offer live video and audio, in synchronization, in order to qualify as a multimedia system. Although video and audio are powerful media, which are natural components of many user interfaces, their presence should not be the defining characteristic of a multimedia system. Such a technologically driven definition only serves to confuse users and designers alike (Buckley, 1999), and the potential for exploiting the technology to the best possible advantage for the end users may suffer as a result. Such restrictions will therefore not be implicit in the definition of multimedia in this thesis. It follows that a word processor, for example, which allows drawings and text to be integrated in a document is a multimedia system. This study uses GIS packages like ArcView GIS as multimedia systems to integrate demographics with photographs, text, plots, tables, images, graphics, and other formats for displaying data. This multimedia approach is extended to hypermedia by linking the multiple channels of information either transparently (Buttenfield, 1996, p.466) or in different windows, which do not obstruct the main window and whose display is controlled by the main window.

Surface and Multi-Dimensional Displays

Portraying the demographic surface in a single 2D view (whereby a real world phenomenon is projected in Euclidean space in either vector or raster format) always leads to some ambiguity or incompleteness. The use of multi-dimensional displays (2.5D and 3D), where each dimension is used to depict one (or more) DCs, helps to overcome this problem, as the perception of objects on a flat map or computer screen is a key component of the full realization of their form. For maps, cartographic symbols (for 2D map legends) have been proposed to show volumetric features on a map (Kraak, 1992); multi-dimensional displays also offer interesting opportunities for exploring abstract data that are not available in two dimensions.

Here MODC is used as the height, i.e. expressed as volume; the aim is to show how it varies across the study area and to use the 3D facilities of pan, zoom, interactive tilt and rotation to change the viewing perspective. There are a number of factors to keep in mind when using multi-dimensional displays. A common rule of thumb is that the dimensionality of the display should not exceed the dimensionality of the data. Because the elevation of the surface in one location may obscure another location, the perspective should be varied when more than one dimension is displayed (Buckley, 1999). If a change in perspective cannot be achieved using multiple or dynamic displays, the use of the technique may be limited. MacEachren (1995) cautions against inappropriate use of realism in multi-dimensional displays, reminding us that realistic representations tend to convince the user that the information on the map is “real” when, in fact, all maps are abstractions of reality. For example, it would be misleading to interpret the MODC of the developed 3D images in relation to the spatial extent, as all heights have been exaggerated by a factor of four.

To aid multi-dimensional visualization, color illumination or shading of objects is a powerful cue to the 3D structure of an object (Kraak, 1992). Further, users of multi-dimensional data have to be able to peel back layers, slice edges, and zoom in and out of the scene. Another common function required in multi-dimensional systems is the ability to scale data along an axis. Typically, this is performed on the z-axis (Kraak, 1993); by doing this, the user can exaggerate MODC.
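Scaling along the z-axis amounts to multiplying every height by a constant while leaving x and y untouched. The sketch below uses the factor of four mentioned for the 3D images in this thesis; the sample points and the function name are hypothetical.

```python
def exaggerate(points, factor):
    """Scale only the z value of each (x, y, z) point."""
    return [(x, y, z * factor) for x, y, z in points]


# Hypothetical surface points: x, y in map units, z is the MODC value.
surface = [(0.0, 0.0, 2.0), (10.0, 0.0, 5.0), (10.0, 10.0, 1.0)]
tall = exaggerate(surface, 4)

# Ratios between heights are preserved by the exaggeration ...
ratio_before = surface[1][2] / surface[0][2]
ratio_after = tall[1][2] / tall[0][2]
# ... but the ratio of height to horizontal extent is not.
```

This is why heights in an exaggerated display can be compared with each other but should not be read against the horizontal scale.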

Demographic data can be visualized as a surface (2.5D and 3D) in a variety of ways, where the surface is elevated to depict the MODC in relation to its geographical location. A single data layer can be viewed as a colored or gray-scale flat raster image. Two or three such data layers can be combined by the user and displayed as a single flat raster, or flat raster images can be draped over a 2.5D surface representing demographic data, taking advantage of transparency adjustments. 3D representations allow the user to shift viewpoint and to 'fly through' the data, which can be useful in gaining an overview of data patterns (Shepherd, 1995), and data sets can be combined for comparison purposes.

Van Driel (1989) recognized that the advantage of 3D lies in the way we see the information. It is estimated that 50 percent of the brain's neurons are involved in vision. What is more, it is believed that 3D displays stimulate more neurons, involving a larger portion of the brain in the problem-solving process. With 2D contour maps, for example, the mind must first build a conceptual model of the relief before any analysis can be made.

Composite Indices (Dimension Reduction)

Demographic data sets are always large, and a visualization that attempts to display too many data dimensions will become incomprehensible to the user (DiBiase, et al., 1994). Composite indices, also called cartographic modeling or composite mapping (Buckley, 1999), are created when several data variables are combined into one. Using some techniques, spatially referenced data can be condensed into fewer layers without losing too much useful information. Multiple variables can be generalized by statistically collapsing spatial data into fewer variables, using a combination of arithmetic links (+, -, *, /) or multivariate techniques in which summary statistics are calculated from a combination of data layers.

Superimposition

Several layers of data are often combined, for example in a weighted overlay or analytical hierarchy process. Simple operations such as layer addition, subtraction and multiplication are standard options in GIS (Chrisman, 1997; DeMers, 1997, 2000). More complex combinations will require an interface where the user can specify mathematical weightings and possibly fuzzy rules for the combination of data layers, in something approaching a rule-based system.

Dynamic Displays (Dynamic Data Visualization)

Dynamic displays introduce an element of change in time, space or display parameters, and are described in terms of interaction and animation (Buckley, 1999). To visualize data, a static display is not enough. Being able to rotate an object depends upon the view plane normal or viewing plane. This plane should be rotatable in the x, y, and z directions, and rotation should be dynamic.

Animation is a useful technique for viewing data layers that represent change as a result of a stochastic simulation. It is considered here because it provides multiple simultaneous views of the same event and thus allows different perspectives (vantage point, orientation, illumination), generalization parameters (classification, simplification, exaggeration), scale (extent, resolution), level of measurement, and number of dimensions (Robinson et al., 1995) of the same aspect of execution to be related to each other, thereby allowing the observer to gain an instantaneous insight into the variation of DCs.

It can be used for dynamic graphs, where data can be represented by means of multiple, simultaneously available views such as tables, a list of labels, a bar chart, pie chart, histogram, stem-and-leaf plot, or scatterplot. These views are shown in different windows on a computer screen. They are linked in the sense that when a location in any one of the windows is selected by means of a pointing device, the corresponding locations in the other windows are highlighted as well; GIS adds the map as another view.
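The linking of views can be sketched as a shared selection that notifies every registered view. This is a minimal illustration of the idea only; the class names, the feature identifiers and the notification mechanism are assumptions and do not describe how ArcView GIS implements linked windows.

```python
class View:
    """One window (table, chart or map) showing a set of feature ids."""

    def __init__(self, name, ids):
        self.name = name
        self.ids = set(ids)          # features this view can display
        self.highlighted = set()

    def highlight(self, selected):
        # Highlight only the selected features that this view contains.
        self.highlighted = self.ids & selected


class LinkedSelection:
    """Shared selection that updates all registered views together."""

    def __init__(self):
        self._views = []
        self.selected = set()

    def register(self, view):
        self._views.append(view)

    def select(self, ids):
        self.selected = set(ids)
        for view in self._views:
            view.highlight(self.selected)


# Hypothetical views keyed by building identifier.
selection = LinkedSelection()
table = View("table", ["3_12", "5_7", "5_9"])
map_view = View("map", ["3_12", "5_7"])
selection.register(table)
selection.register(map_view)
selection.select(["5_7", "5_9"])
```

A selection made in any one window is passed through the shared object, so the table, the charts and the map always highlight the same features.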

3.11               Multi-Dimensional GIS for Demographic Modeling

For GIS analysis and modeling, data is handled in different ways according to dimensionality. Each dimensionality is dealt with separately below, to assess its shortcomings and the tasks it can accomplish; first, let us look at what each involves. 2D is based on a Cartesian (x, y) coordinate system that is usually tied to a mapping datum. 2D representations have been part of mapping since its inception and form the fundamental base on which to present geographical data analysis. 2.5D mapping and analysis uses these Cartesian coordinates but adds an attribute such as height to achieve the extra half of a dimension. This is sometimes incorrectly referred to as 3D mapping and analysis; this type of model can only describe a surface and cannot handle more than one Z value at the same point (De Floriani, et al., 1998), hence the name 2.5D. 3D uses the concept of volumetric objects. If surfaces can be animated (surface fly-through or time lapse), they are called three and one half-dimensional (3.5D). 4D GIS adds time to volumetric analysis and mapping, and 5D GIS is defined as 4D plus attributes. The focus in this thesis remains on the purely spatial dimensions: 2D, 2.5D and 3D GIS.

3.11.1    Two-Dimensional GIS

2D mapping is limited to the representation of data on planar surfaces, and elevation is represented using attribute values. 2D mapping of demographical phenomena has been used with significant results. There are advantages and disadvantages to using 2D GIS for representing data that are inherently 2.5D or 3D. Map components include vector objects (points, lines, polygons) or raster grids, which are used to display phenomena.

3.11.2    Two and half-Dimensional GIS

Mapping of demographical phenomena in 2.5D deals primarily with surfaces. The resulting surface interpretations can represent the distribution of DCs. The use of 2.5D representation of surfaces has some disadvantages for modeling, the most obvious being the lack of volumetric capabilities (Bernhardsen, 1999). Nonetheless, simple and efficient surface generation can be valuable for investigations that do not require 3D capabilities. Here z values can be used in a perspective plot to create the appearance of 3D. These are actually 2.5D plots, which are attractive for displaying continuous surfaces; for example, perspective plots can be computed from any viewpoint.

3.11.3    Three-Dimensional GIS

3D GIS is a technology that is increasingly being used for display and analysis of data containing horizontal and vertical spatial coordinates. A wide range of applications of 3D analysis is available to users in general (Raper, 1989). The 2D point, line and polygon vector representation of objects can be extended to include a volume element in 3D space (Kraak, 1993) and 3D raster grids can be used in analysis and display (Abdul-Rahman, et al., 1998). These volumetric vector and raster systems evolved from early solid modeling (Mäntylä, 1988).

3.12               Summary

This chapter has looked at the study area, the resources to be used in the experimentation, how the data was obtained and stored in a database (DB), including the DB design, geocoding, the usage of data in GIS, and the evolving demographic model (DM) in GIS. It has shown how various DCs can be added to a developed model so that it consists of any DCs the analyst chooses. The advantage is that the analyst can first develop a DM consisting of the DCs s/he foresees as necessary at that time and, as work continues, add or remove DCs for simulation purposes or because they prove significant for a specific analysis. The chapter also showed the requirements for GIS-DSA and the GIS demographics visualization methodology needed to generate visualizable features. It ended by outlining the dimensions (2D, 2.5D and 3D) that will be utilized in GIS-DSA in the chapters to follow.


Chapter IV


4.1                   Introduction

A review of GIS and demographics in planning and of the methods of analysis was undertaken to establish what has been done, how it has been done, and with what outcomes, to help in GIS demographic spatial analysis (GIS-DSA). This chapter deals with 2D GIS demographic analysis; chapter five then asks what other dimensionalities in GIS can be used to accomplish other GIS-DSA tasks.

Before we start to experiment with GIS-DSA, let us summarize what GIS-DSA is supposed to achieve: show the spatial distribution and variation of DCs, show the spatial relation between DCs and other infrastructures, show how DCs can be represented as continuous variables, represent and visualize the spatial influence of DCs, show how the demographic quantities vary spatially, and provide multi-vertical representation of DCs. To accomplish that, the following will be looked at: spatial progressive similarity clustering, demographic spatial alternative appraisal, selection of DCs, demographic nearest neighbor analysis, demographic spatial change analysis and demographic spatial segregation/integration. Others include demographic iso-lines and vertical demographics, quantitative spatial effect, quantitative spatial analysis, transparent neighborhood analysis, demographic undershed, demographic overshed, demographic shrinkage points, demographic escalation points, demographic spatial pass, demographic overfold, demographic underfold, demographic spatial variation, demographic directional variation, demographic visibility analysis, and demographic solid analysis. These will be based on the many GIS spatial analysis techniques (Bailey, 1994; Chou, 1997; Volusia, 1997). Single layer operations are GIS procedures that correspond to attribute queries, spatial queries, and alterations of data that operate on a single data layer. Multiple-layer operations are useful for the manipulation of spatial data on multiple data layers. Spatial modeling involves the construction of explanatory and predictive models for statistical testing. Point pattern analysis deals with the examination and evaluation of spatial patterns and the processes of point features. Network analysis is designed specifically for line features organized in connected networks, as in location analysis. Surface analysis deals with the spatial distribution of surface information; it involves the processing of spatial data in a continuous spatial form. Other techniques include spatial overlay, boundary analysis, proximity analysis, buffer analysis, clustering, georeferencing and solid modeling.

4.2                   Two-Dimensional GIS Demographic Analysis

As a move to carry out GIS-DSA in 2D GIS, using ArcView GIS as an example, let us look at demographic analysis preparation in ArcView. This involves getting a base map; in this case, the base map used is that of the cadastral GIS. The building layer, on which all the spatial analyses are based, is overlaid on the base map. The population analyses that are first carried out in database or statistical software are imported into ArcView GIS using ArcView's capabilities for accessing tabular data, or linked using the SQL connection (see Appendix B). After the population files are imported, they are georeferenced to the buildings as the reference spatial units using ArcView’s single field geocoding style[43]. As most GISs are not tooled for demographic analysis, the following scripts and extensions have been used to aid the analysis.

4.2.1        ArcView GIS Extension

Much research has been done and many extensions have been developed to incorporate different functions in ArcView, and many scripts and extensions exist on the ESRI ArcScript web site[44]. Some developments have focused on customizing the interface to reduce the functions so that non-GIS experts can easily accomplish tasks; a good example is the work of Yagoub, et al., who developed a user interface for selecting a dumping site in Langkawi, Malaysia. Others have focused on incorporating new functions, like SpaceStat by Anselin; the module developed in the next section follows this trend because of the lack of functions to accomplish GIS-DSA. The main intention here is to create new functions, although reducing the circle (access to existing functions) through which a non-GIS user has to move around ArcView to carry out analysis and modeling has also been taken into consideration, as the ability to describe complex geographic modeling with straightforward options for a greater number of users is well recommended by many researchers (Wilson, 1990).

4.2.2        Demographics Analyst

This module was developed using Avenue, the object-oriented scripting language of ArcView, to assist in demographic analysis and to serve as a prototype for GIS-DSA and human-computer interaction within existing geographic information systems software, i.e. ArcView GIS. It comprises different scripts implementing several features of DSA that are unavailable from the standard menu choices of ArcView GIS.

Development Strategy

The design and development of the new module were constantly guided by various principles of human-computer interface design and software development. The quality and ease with which computer applications are developed depend greatly on the effectiveness and efficiency of the developer (Ganter, 1996). Hence, the module was developed in a style with shared consistency of appearance and organization. This style and consistency help the programmer, code readers and future programmers to focus on the meaning of the code and ignore the “noise”. Emphasis was put on program documentation (headers and comments) and on naming variables, scripts and files in ways that impose a useful structure. For example, all names of scripts in the module start with Demographic followed by the action carried out; a script that displaces demographic points is named Demographic.DisplacePoints. This makes it easy to identify them in the ArcView directory when the module is loaded. The Hungarian notation was followed, which is popular in C and Windows programming (McConnell, 1993) and is considered the best by many programmers (Ganter, 1996). Another key aspect of software development is respect for cognition, or human information management: software is compiled and executed by computers, but it is written, debugged, reviewed, edited and maintained by people. The advice to "think first of people" (Davis 1995, Principle 92) led to a few simple procedures that help compensate for the passage of time and the limitations of human memory, so that the code adds value for others who may inherit or acquire it, as I added value to others’ code.

The development of this prototype followed a hierarchical interface design approach. This approach has three levels at which interface goals must be addressed. The first is the conceptual level of computer interface design, which should pinpoint what the system is for, who the expected system users are, what needs are met by the system, and what the results of working with the system should be. This is followed by the development of operational level goals, outlining the specific tasks that must be accomplished to achieve the conceptual level goals. In addition, this level requires the formalization of these tasks as operations on information. Lastly, at the implementational level, decisions are made about how to implement those operations and represent them to the user. The module that we set out to develop was designed for planners to carry out DSA. It was built by taking advantage of the ArcView user interface, which allows users to write their own scripts in a language called Avenue and then incorporate those scripts directly into the interface using pull down menus, buttons, or tools.

User Interface

Designing an interface involves combining the available tools with a set of user requirements and good design principles (ESRI, 1993). The goal of an application programmer should be to create an interface that performs the desired tasks easily and efficiently, and that requires minimal time for training. Whether the design is for a small or a large application, attention should be paid to organization, logical flow, visual appearance, ease of use, error checking, and on line help. The general principles applicable to all user interface designs are given by Sommerville (1996): user familiarity, consistency, minimal surprise, recoverability and user guidance.

The increasing level of complexity of GIS systems requires special consideration to the user-friendliness of the system. This may be discussed from two points of view: from the viewpoint of specialized users, calling for sophisticated capabilities; and from the viewpoint of wide audiences, requiring the ease of use for non-GIS users. Raper and Rhind (1990) argue that ease of use is a vital criterion for the selection of an appropriate GIS. It is generally accepted that a system, which is easy to use, can help cut recruitment and training costs and help retain staff.

Functions of Demographics Analyst

This module was required to accomplish the following tasks, which after its development became the functions of Demographics Analyst:

·        Adding x y coordinates to analysis tables

·        Updating coordinates

·        Calculating area and perimeter

·        Transferring attributes from one feature to another

·        Linking analysis features with other sources of information like pictures

·        Converting polygons to points

·        Spreading (displacing) points within a set boundary or spatial extent

·        Carrying out interactive nearest neighbor analysis

·        Carrying out progressive spatial clustering

Each function can be accessed independently as an item from a pull down menu. This makes things easy for the user by providing a set of capabilities that can be combined in innovative ways. When the Demographic Analyst is loaded, an additional menu appears between the Window and Help menus in the traditional ArcView GIS view interface (Figure 4.1), and it also appears in the ArcView GIS table interface (Figure 4.3).

Figure 4.1 ArcView showing position and functions of Demographics Analyst

Spreading (displacing) points within a set boundary or spatial extent: this function spreads the geocoded population randomly within the boundaries of polygons, e.g. buildings, so that the points are not on top of each other. This does not introduce errors, as the building was used as the unit of spatial analysis. It was done using the “displace points” function. The result is that each point appears with a unique spatial reference, which aids spatial planning and gives a good visualization of density. It can also be employed in evaluating alternatives by spatially varying points, which changes the spatial location of DCs.
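The displacement idea can be illustrated with a minimal Python sketch. It is not the Avenue script itself; it assumes the radial spreading given in the description of Demographic.DisplacePoints (coincident points are spread around one central point that stays in place), and the function name and the distance parameter are hypothetical. The sketch does not check that the displaced points remain inside the building polygon.

```python
import math
from collections import defaultdict

def displace_points(points, distance):
    """Spread coincident points radially around their shared location.

    points   -- list of (x, y) tuples, e.g. people geocoded to one building
    distance -- displacement radius in map units (hypothetical parameter)
    The first point found at each location stays where it is.
    """
    # Group the indices of points that share exactly the same coordinates.
    groups = defaultdict(list)
    for index, xy in enumerate(points):
        groups[xy].append(index)

    result = list(points)
    for (x, y), members in groups.items():
        movers = members[1:]  # the first point remains the central point
        for k, index in enumerate(movers):
            # Place the remaining points at equal angles on a circle.
            angle = 2 * math.pi * k / len(movers)
            result[index] = (x + distance * math.cos(angle),
                             y + distance * math.sin(angle))
    return result
```

Equal angular spacing is used here instead of random placement so that the result is reproducible; a random angle and radius could be substituted where a random spread is preferred.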

Transferring attributes from one feature to another: in point analyses, points in the display may end up with the same constant attributes after alternatives have been evaluated by spatially varying points and introducing new ones. To avoid this, we use the “one to many link” function, which links the personal attributes to the randomly displaced points and other features. This was employed in demographic analysis based on the buildings as the unit of analysis, where the individuals were linked to the building.

Converting polygons to points: sometimes it became important to carry out progressive polygon-to-point analysis. This was done using “convert polygon to point”, which adds another point theme; the analysis is then based on the points.

Add x y coordinates assists in evaluating alternatives by varying the numbers, where we can delete or add some DCs. In all cases, we are able to add coordinates using the add x y coordinates function, or update the new locations in the table using the function calculate and update coordinates. The function calculate area and perimeter comes into play when the area and perimeter are needed in analysis.
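How area and perimeter can be derived from polygon vertices is sketched below in Python. This is an illustration using the standard shoelace formula, not the Avenue code of the module; the function name is hypothetical, and the polygon is assumed to be a simple (non-self-intersecting) ring without holes.

```python
import math

def area_and_perimeter(ring):
    """Return (area, perimeter) of a simple polygon.

    ring -- list of (x, y) vertices in order; the last vertex is joined
            back to the first automatically.
    """
    twice_area = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1          # shoelace cross term
        perimeter += math.hypot(x2 - x1, y2 - y1)  # edge length
    return abs(twice_area) / 2.0, perimeter
```

For a building footprint given as a 4 by 3 rectangle, the function returns an area of 12 and a perimeter of 14 in the squared and linear map units respectively.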

Distance between points: It calculates distance from points in one theme to points in another. This function will prompt the user for two point themes in the active view. The first is the point theme containing the selected points that you wish to calculate the distance from. The second is the point theme containing points that you wish to calculate the distance to. Its role is in spatial analysis of DCs that are in different themes.
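The calculation behind this function can be sketched in Python as follows. The sketch assumes planar coordinates and straight-line (Euclidean) distance; the function name and the dictionary layout for the two point themes are hypothetical and stand in for ArcView themes.

```python
import math

def distances_between_themes(from_points, to_points):
    """List the distance from every 'from' point to every 'to' point.

    from_points, to_points -- dicts mapping a feature id to (x, y)
    Returns a list of (from_id, to_id, distance) rows.
    """
    rows = []
    for from_id, (fx, fy) in from_points.items():
        for to_id, (tx, ty) in to_points.items():
            rows.append((from_id, to_id, math.hypot(tx - fx, ty - fy)))
    return rows

# Usage: selected residents in one theme, community facilities in another.
residents = {"p1": (0.0, 0.0)}
facilities = {"clinic": (3.0, 4.0), "school": (0.0, 2.0)}
table = distances_between_themes(residents, facilities)
```

Each row of the resulting table pairs one selected point with one target point, which is the form needed when DCs held in different themes are compared.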

Spatial nearest neighbor: this function is attached to a button on the ArcView GUI (see red arrow labeled “A” on Figure 4.1 or “a”, Figure 4.2). When you click this button, a cross cursor appears and you can drag any rectangular area; a message box then tells you the R-value and how many features were accounted for in the analysis. R-values indicate how clustered or dispersed points are within the rectangle. This gives the analyst the opportunity to carry out interactive DSA by specifying a variable spatial extent. It can be applied in many tasks like progressive density analysis, progressive clustering, etc., to be detailed later (sections 4.2.4, 4.2.6, 4.2.7).

Find Nearest Feature: finds the feature nearest to a point. It works by creating a new tool in a ViewDocGUI whose apply function is a script called Demographic.FindNearestFeature, which is automatically put into the ArcView directory when Demographic Analyst is loaded. It is attached to a tool in the ArcView GUI (see red arrow labeled “B”, Figure 4.2). After the user activates the cursor by clicking on the tool and enters a point interactively, it finds the nearest feature in the active theme to that point, reports the distance and selects the feature.

Summarize theme: summarizes selected fields from the active theme and uses the summary dialog to allow the specification of fields and rules for aggregation. The summarization occurs directly from the view document, using the current active theme. The result of the summarization is a new .dbf file (newtable.dbf), which is opened. If the Merge option is chosen, a new theme is created and can be added to a view. This script provides similar functionality to existing controls in the table DocGUI; however, it is executed directly from a view.
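The kind of aggregation performed by a summarize operation can be illustrated with a short Python sketch. It is only an analogy to the ArcView summary dialog: the function name is hypothetical, the aggregation rules are limited to count and sum, and the field names Building and Age are taken from the table fields used in this module.

```python
from collections import defaultdict

def summarize(records, group_field, value_field):
    """Group records by one field and report count and sum of another.

    records -- list of dicts, one per row of the attribute table
    """
    summary = defaultdict(lambda: {"count": 0, "sum": 0})
    for record in records:
        entry = summary[record[group_field]]
        entry["count"] += 1
        entry["sum"] += record[value_field]
    return dict(summary)

# Usage: number of residents and total age per building.
rows = [
    {"Building": "B1", "Age": 30},
    {"Building": "B1", "Age": 40},
    {"Building": "B2", "Age": 25},
]
per_building = summarize(rows, "Building", "Age")
```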

Figure 4.2 Demonstration of the user interface and Demographics Analyst

Since the main concern was not to develop the module but to carry out DSA, only a brief description of its functions has been given in this subsection; the various applications will be dealt with in the sections to follow, as the need arises to accomplish certain tasks of DSA. For a detailed description, other functions, and the tasks that can be accomplished by this module, click the help item under the Demographic Analyst[45] menu when the module is loaded. This opens a popup menu listing all topics of the module; clicking on any topic, such as “displace points”, opens another popup window containing the description, as shown in Figure 4.2.

Table Document Graphical User Interface

Demographic Analyst also adds a new menu to the table DocGUI (Figure 4.3) with the following functions:


Figure 4.3 ArcView table interface showing Demographics Analyst

Create New Table: this creates a new table using a multi-input dialog box. An empty table is created and a message box is used to prompt the user for information to add to the table. After the user clicks cancel, the new table document is added to the project and opened. The default ID in the dialog box is incremented for each record, as is the record number displayed in the dialog box title. The fields that are created are ID, Name, Age, Building, Road, Gender, Race, Religion, Marital. This allows new data to be added to a project and database.

Combine fields: this combines two fields in a table; it allows two existing fields within a shapefile to be concatenated and written to a user-specified field. To concatenate more than two fields, run the function in succession. It can be run from any interface (i.e. Views, Tables, etc.), as it collects all documents from the project and pulls out all the table documents.

Modify table: this renames or modifies fields in a table document. It presents a dialog box requesting you to select the field to modify; after a field is selected, another dialog box appears for you to enter the modification parameters, i.e. name, type, width, precision.

Scripts in the Demographics Analyst

There are many scripts that make up the Demographic Analyst; the main scripts are given below. Minor scripts, like those which install, uninstall, add a menu, remove a menu or create a dialog, and the scripts which describe and give help on the functions, are not given below; they can be found as explained in section

The scripts include:

·        Demographic.AboutDemgraphicsAnalyst: gives a brief description of the extension; it is under the function About Demographic Analyst.

·        Demographic.FindNearestFeature: finds the feature nearest to a point; it is under the function Find Nearest Feature.

·        Demographic.SpatialNearestNeighbor: performs spatial nearest neighbor analysis; it is under the function Spatial Nearest Neighbor.

·        MakeExtension: the script responsible for generating the Demographics Analyst when it is compiled; it writes the extension with the file name Demographics1.avx.

·        Demographic.DistanceBtnThemes: calculates distances from points in one theme to points in another; it is under the function Point Distance Between Themes.

·        Demographic.AddXYCoordinates: adds the X and Y coordinates of features to the attribute table; it is under the function Add XY Coordinates and was derived from a script called addxycoo.ave.

·        Demographic.Areas: returns the area of a shape; it is under the function Calculate Area, Acres and Perimeter.

·        Demographic.AttributeTransfer: overlays one polygon theme (the "edit" theme) on another polygon theme (the "source" theme) with similar or larger features and transfers attributes of the source theme to the features of the edit theme; it is under the function Attribute Transfer and was modified from a script called overlayatts.ave[46].

·        Demographic.UpdateCoordinates: updates the latitude and longitude fields of a table to accurately reflect the positions that are assigned to the actual feature; it is under the function Calculate and Update Coordinates and was derived from fixtabxy.ave.

·        Demographic.Summarize: summarizes selected fields from the active theme; it is under the function Summarize Theme and was derived from thmsumm.ave.

·        Demographic.DisplacePoints: displaces points that fall on top of one another. It spreads the points in a radial pattern from one central point that remains the same, and prompts the user to enter a disperse distance in decimal degrees; it is under the function Displace Points.

·        Demographic.PolygonToPoints: converts selected polygons/polylines to points to create a new shapefile; it is under the function Convert Polygon to Point and was derived from a script called poly2point.ave[47].

·        Demographic.CombineFileds: allows two existing fields within a shapefile to be concatenated and written to a user-specified field; it was generated from a script called Fields.Concatenate[48].

·        Demographic.WriteInfo: used by “Write all Document Locations” to write the name and path of all themes and tables in a project to a text file; it was modified from a script called writeinfo.ave[49].

All the source scripts are obtainable from the ESRI ArcScript web site[50].

Further Details on Demographics Analyst

This module is available on this thesis web site[51]. It can also be downloaded from the ESRI ArcScript web site[52], as explained in Appendix F. It comprises a compressed file containing both the Demographics Analyst extension file (demographics1.avx) and the ArcView 3.1 project file (demographics.apr) used to create the extension. Demographics.apr contains all of the Avenue scripts used in the Demographic Analyst extension as well as the script that is used to create the extension, i.e. MakeExtension. The project file has been included in this package to simplify viewing of the Avenue scripts, as a way of promoting sharing and to help those who would like to make further changes or want to learn how to do so. Also included in the zip file is readme.htm, which introduces the module and explains how to load and use it. By making the scripts available, I hope to encourage further enhancements to its functionalities.

To use the Demographics Analyst extension, copy the file demographics1.avx to ArcView's 32-bit extension directory. On Windows platforms, this directory is usually: C:\ESRI\AV_GIS30\ARCVIEW\EXT32. After demographics1.avx has been copied to ArcView's 32-bit extension directory, it can be loaded from ArcView's File menu by selecting the Extensions... menu choice and then Demographics Analyst. When it is loaded, a menu labeled Demographics Analyst is added to the menu bar for Views between the Window and the Help menus. The Demographic Analyst menu contains its functions and the help choice, with instructions on how to use it. Emphasis was put on program documentation (headers and comments) and on naming variables, scripts and files in ways that impose a useful structure. For example, all names of scripts start with Demographic followed by the action carried out; a script that displaces demographic points is named Demographic.DisplacePoints, and the script which provides more details on the function and help on how to use it is named Demographic.HelpDisplacePoints. This makes it easy to identify them in the ArcView directory when the module is loaded. To get the scripts, open the project file (demographics.apr) and click on scripts under the main menu. Alternatively, invoke the script manager by clicking on the system script button. You can also bring up the script manager by double-clicking the Apply property, the Click property or the Update property in the Properties list of the Customize dialog box.

4.2.3        Other Scripts and Extensions used

Other extensions and scripts used in this thesis included Link2InternetExplorer[53]; once loaded, this extension adds a button with the Internet Explorer icon to the View GUI button bar (red arrow labeled “e”, Figure 4.2), which is used to link html information during analysis. The Hot potato extension[54], once loaded, adds buttons to the View GUI (see red arrows labeled “b” and “c”, Figure 4.2). It is used for linking images and pictures to a view. For example, after the link between the buildings and an active theme has been established using the link button (red arrow labeled “b”, Figure 4.2), clicking on the hot potato link button (red arrow labeled “c”, Figure 4.2) and then on a feature like the heritage center office (the highlighted polygon to which the red arrow labeled “d”, Figure 4.2 is pointing) pops up a picture of the building, which adds to the analysis by providing visualization. Others include the random point extension[55], Nearest neighbor analyst extensions[56], Xtools extension, Analysis extension, edit tool extension, Bivariate mapping extensions, Database access extensions, Geoprocessing extensions, Coordinate extension, grid analyst extensions, etc., all available on the ESRI ArcScript web site[57] or the thesis web site[58]; their contribution to GIS-DSA will be highlighted in the subsequent analyses.

4.2.4        Demographic Nearest Neighbor Analysis

GIS makes it easy to perform spatial nearest neighbor analysis in order to understand how close individuals or different DCs are. It is one way of analyzing the locations of DCs/individuals by measuring the distance between them. This technique has been used to determine spatial relationships in demographic data, e.g. to assess the spatial relationship between the members of the same ethnic group and other ethnic groups. This is demonstrated in ArcView GIS using the nearest neighbor analysis function under Demographics Analyst (section 4.2.2). It is done by dragging a rectangle around the features on which you wish to conduct spatial nearest neighbor analysis. The wait cursor appears, and then a message box tells you the R-value and how many features were accounted for in the analysis. R-values indicate how clustered or dispersed points (or centroids of polygons and polylines) are within the rectangle you specified. An R-value of 0 (zero) indicates an intensely clustered pattern, an R-value of 1 indicates a random distribution, and an R-value of 2 (or higher) indicates a strongly dispersed pattern. This was tested by running the nearest neighbor analysis on two racial groups (Chinese and Indians) in the study area to find out the differences in their location. Using the whole study area as the spatial extent gave an R-value of 0.373126 for the Chinese and an R-value of 0.130641 for the Indians. Changing the spatial extent to a smaller area (see red rectangle called spatialborder on Figure 4.4) gave a totally different set of R-values, i.e. an R-value of 0.32398 for the Chinese and an R-value of 0.207086 for the Indians (Figure 4.4).

Figure 4.4 Nearest neighbor analysis

This shows that the two racial groups do not have the same distribution through the study area. The Chinese are almost uniformly distributed in the two tested areas, with 12 persons in the small area out of 241 in the study area. The Indians have a higher concentration in the small area, which accounts for 38 out of 52 in the study area. With such analysis, we are in a position to carry out micro/disaggregated spatial analysis and draw conclusions about the spatial composition and location of the population according to the DCs upon which we are basing our spatial analysis.
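The way an R-value of this kind can be computed for a dragged rectangle is sketched below in Python. The text does not give the formula used by Demographic.SpatialNearestNeighbor, so the sketch assumes the commonly used nearest neighbor ratio: the observed mean distance to the nearest neighbor divided by the distance expected for a random pattern of the same density, 0.5 * sqrt(area / n). The function name and the extent tuple are hypothetical, and no edge correction is applied.

```python
import math

def nearest_neighbor_r(points, extent):
    """Nearest neighbor ratio R for the points inside a rectangle.

    points -- list of (x, y) tuples
    extent -- (xmin, ymin, xmax, ymax), the rectangle dragged by the analyst
    Returns (R, n) where n is the number of points used; R is None when
    fewer than two points fall inside or the rectangle has no area.
    """
    xmin, ymin, xmax, ymax = extent
    inside = [(x, y) for x, y in points
              if xmin <= x <= xmax and ymin <= y <= ymax]
    n = len(inside)
    area = (xmax - xmin) * (ymax - ymin)
    if n < 2 or area <= 0:
        return None, n

    # Observed: mean distance from each point to its nearest neighbor.
    total = 0.0
    for i, (x1, y1) in enumerate(inside):
        total += min(math.hypot(x2 - x1, y2 - y1)
                     for j, (x2, y2) in enumerate(inside) if j != i)
    observed = total / n

    # Expected mean distance for a random pattern of the same density.
    expected = 0.5 * math.sqrt(area / n)
    return observed / expected, n
```

Because both the point count and the area come from the rectangle, changing the spatial extent changes the R-value, which is the behavior seen when the analysis is rerun on the smaller area.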

4.2.5        Demographic Spatial Change Analysis

In conventional analysis, the percentage change in population is derived by dividing the most recent population by the earlier population, subtracting one from the result, and multiplying by 100 to convert to a percentage (Plane, et al., 1994). This model does not take spatial referencing into consideration, which is needed so that the analyst knows where the change is occurring. In the conventional model, these percentage changes are computed according to known zonal subdivisions. This is where the spatial analysis capacity of GIS has to be brought in, to combine the above model with the spatial dimension and come up with a spatial percentage change. There are several options for the analyst: either to set a spatial extent using an interactive tool (as demonstrated in section 4.2.4) or to use a polygon that can be overlaid on the population. With this, the analyst does not have to know the zones in advance, and the result has the spatial component that is needed (section 2.3.3).
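A minimal Python sketch of the spatial percentage change is given below. It combines the conventional formula, (recent / earlier - 1) * 100, with a rectangular spatial extent chosen by the analyst. The function names and the representation of the two population layers as lists of point coordinates are assumptions made for illustration.

```python
def count_in_extent(points, extent):
    """Count the points that fall inside (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = extent
    return sum(1 for x, y in points
               if xmin <= x <= xmax and ymin <= y <= ymax)

def spatial_percentage_change(earlier_points, recent_points, extent):
    """Percentage change in population within one spatial extent.

    Returns None when nobody lived in the extent at the earlier date,
    because the ratio is undefined in that case.
    """
    earlier = count_in_extent(earlier_points, extent)
    recent = count_in_extent(recent_points, extent)
    if earlier == 0:
        return None
    return (recent / earlier - 1) * 100
```

The same two point layers can be queried with any number of extents, so the zones need not be fixed before the analysis starts.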

4.2.6        Spatial Progressive Similarity Clustering of DCs

Cluster analysis is about determining homogeneous areas; there exist many methods of clustering, such as measures of similarity, iterative clustering, and agglomerative clustering (Plane, et al., 1994). All these methods share one weakness: they consider only clustering between areas or regions, and so cannot accomplish the classification needed at micro level, where DCs have to be grouped by adding them incrementally. Hence the need for an improved approach, namely cluster analysis using a method developed for this thesis and named spatial progressive similarity. This method identifies households and combines them incrementally into clusters of similar or identical characteristics. It can be used for household analysis and for other analyses that involve classification. For example, it can be employed to carry out clustering according to ethnicity, using the householder as the centroid of each household. Here the householders are identified together with their DCs, and the distance between every two householders is measured using the Euclidean method for calculating the distance between two points. These distances are compared for every pair of householders: if the two householders in a pair share the characteristic under investigation, and the distance between them is smaller than the distance to dissimilar householders, then the two householders are combined and the combined entity is used in the next round of progressive similarity clustering. In the initial steps the study area is clustered into many polygons with holes between them, as in Figure 4.5, where a polygon is drawn around every DC whose distance to dissimilar DCs is greater than its distance to similar DCs. As the process continues, these are refined by combining polygons having similar DCs and adjusting the boundaries so that they cover the holes and occupy the whole of the study area, as in Figure 4.6.
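The pairwise comparison step can be sketched in Python as follows. This is only one possible reading of the rule described above, not the implementation used in the thesis: two householders are merged when they share the characteristic and the distance between them is smaller than the distance from either of them to the nearest dissimilar householder. The function name, the coordinates and the labels are hypothetical, and polygon drawing and boundary refinement are not covered.

```python
from math import dist, inf


def progressive_clusters(points, labels):
    """Group householders by one pass of pairwise similarity merging."""
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Distance from each householder to the nearest dissimilar householder.
    nearest_other = [
        min((dist(points[i], points[j]) for j in range(n)
             if labels[j] != labels[i]), default=inf)
        for i in range(n)
    ]

    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:
                continue
            d = dist(points[i], points[j])  # Euclidean distance
            if d < nearest_other[i] and d < nearest_other[j]:
                parent[find(j)] = find(i)  # combine the two householders

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical householders along a street, labeled by ethnicity.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 0)]
labels = ["C", "C", "I", "I", "C"]
print(progressive_clusters(points, labels))  # [[0, 1, 4], [2, 3]]
```

In this sketch householder 4 joins the first cluster through householder 1, which shows the incremental nature of the grouping: a member is added because it is close to an existing member, not to every member.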

Figure 4.5 Identifying demographic characteristics in progressive clustering

Figure 4.6 Combining demographic characteristics in progressive clustering

4.2.7        Demographic Spatial Alternative Appraisal

Planners are always faced with the task of looking for spatial locations where the DCs do not conflict. This can be achieved by evaluating alternative spatial distributions of DCs in the study area. One way to evaluate alternatives is by editing features, which involves moving points (people) to different or better positions in order to assess the alternative for planning purposes. For example, suppose we want to move all people staying in the area demarcated with the red boundary (see the area for clearing in Figure 4.7) so that the area is set aside for open space and utilities.

Figure 4.7 Location of the people and the area for clearing

Green polygons represent open space, brown social activities, pink commercial use, and gray utilities and infrastructure.

One way is to use the add x y coordinates function in Demographic Analyst to add coordinates to the attribute table, then open the table (Figure 4.8) and change the coordinates so that the points move to new locations, and then geocode them to the buildings or add them as an event theme under the view menu. We used the easier way of simply moving points from the identified spatial extent to other locations. With this, we are able to visualize the effect of the change before making a decision. After making the changes, the area in our example has been cleared (Figure 4.8); we then employed the functions of Demographic Analyst to accomplish the analysis and assess the effect.

Figure 4.8 Evaluating alternatives by editing features

Using the calculate and update coordinates function under Demographic Analyst, we updated the new locations in the table. It was also possible to evaluate alternatives by varying the numbers, deleting or adding some DCs. Being able to evaluate alternatives in GIS is vital because planners are very often faced with the problem of comparing DCs spatially for different locations or zones, and want to do so in such a way that the locations or zones can be varied at any time in the process. GIS provides the capability to do such analyses for spatial extents of different sizes. With the population at different geographical locations geocoded, these locations can be displayed either in the same view or in different views to provide visual comparison in ArcView GIS; this is done by setting a spatial extent using an interactive tool as described in section 4.2.4. This can be done for the different DCs for any spatial extent. Thus, the planner does not have to know the geographical size over which to carry out spatial demographic comparison before beginning the analysis. In addition, Demographic Analyst offers more analysis power than the traditional techniques, in that it is also possible to carry out such analyses by setting the same spatial sizes in different zones. This gives the planner the capacity to judge whether a demographic difference is due to geographical location, to spatial dimension, or to both.

4.2.8        Demographic Spatial Segregation/Integration Analysis

Segregation indexes (index of dissimilarity, the Gini index of diversity, exposure index, entropy index of segregation) reflect the extent to which the various subgroups of the total population are clustered within certain geographical sub areas (Plane, et al., 1994, pp. 303-307). Such an index is usually standardized so that it ranges from a minimum of zero, found when each geographic sub area has an equal share of the subgroup's total population, to a maximum of one, obtained only when there is no mixing of subgroups in any geographic sub area. These indexes have real-world application to redistricting and zoning in the planning process, and can be used to evaluate the fairness of alternative proposed plans for carving up a region into units or zones (Morrison, et al, 1992; White, 1986). Applied alone, these measures may produce misleading results. A case in point (Morrison & Clark, 1992) is the Hispanic district drawn up in Los Angeles (USA) under a special census tabulation: the so-called Hispanic district was, in fact, 53 percent non-Hispanic. This resulted from not incorporating the spatial aspect to determine the extent over which to apply the indices. GIS can improve them by incorporating the spatial dimension, making it possible to select the areas on which to apply the segregation measures and to locate areas where the index is low or high as the need varies. This is explored in this thesis for “residential racial segregation against integration”, with residential racial segregation defined as the tendency for individuals with different racial backgrounds to inhabit different parts of metropolitan areas in greater concentrations due to various reasons[59]. Since we want to determine how the concentration of races varies spatially, we start by running correlation in SPSS on the micro data (Table E.1). This gives -0.167 (Table 4.1) using Spearman and Pearson correlation, as given in Table A.1.
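As an illustration of the standardized range described at the start of this section, the index of dissimilarity can be sketched in Python in its common two-group form, D = 0.5 * sum(|a_i/A - b_i/B|), where a_i and b_i are the counts of the two subgroups in sub area i and A and B are their totals. The function name and the counts below are hypothetical and are not taken from the study data.

```python
def dissimilarity_index(group_a, group_b):
    """Two-group index of dissimilarity over matching lists of sub-area counts."""
    total_a = sum(group_a)
    total_b = sum(group_b)
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


# Hypothetical counts of two subgroups in two sub areas.
print(dissimilarity_index([10, 0], [0, 10]))   # no mixing: 1.0
print(dissimilarity_index([5, 5], [20, 20]))   # equal shares: 0.0
```

The sub areas over which the counts are taken are an input to the index, which is why the choice of spatial extent affects the result.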

Table 4.1 Spearman Correlation of Analysis Variables

[Table body not preserved in the source: for each analysis variable the table reports a Cor. Coefficient row and a Sig. (2-tailed) row.]

** Correlation is significant at the .01 level (2-tailed) and N= 343


Spearman correlation indicates the relationship between the various racial groups and the streets in the study area (Figure 4.9). The result shows a negative correlation between race and location. Tabulating race against streets (Table 4.2), we start to observe a trend that is not totally in agreement with the above.

Table 4.2 Road * Race cross tabulation

Columns (race): 1 (Chinese), 2 (Indian), 3 (Malay)
Rows (road): 1 (Jalan Masjid Kapit), 2 (Lebuh Acheh), 3 (Lebuh Armenian), 4 (Lebuh Cannon), 5 (Medan Cannon), 6 (Lebuh Pantai), 7 (Lorong Lumut)
[Cell counts not preserved in the source.]

Continuing the investigation in GIS, the results obtained in ArcView GIS disagree with the statistical result. The first step in this process was to geocode all racial groupings so that we could relate them to locations. This provides insight into the changing residential patterns of racial and ethnic groups in the study area. The results show that there is a spatial relationship between race and location: Chinese are dominant on Jalan Masjid Kapit and Indians are the majority on Lorong Lumut (Figure 4.9). Such results require less prior knowledge to interpret, as they can be visualized. The variation in results can be explained by the fact that statistical correlation does not take into consideration spatial relationship and proximity, which GISs handle well; in addition, GIS provides interactive analyses, giving the analyst the opportunity to compare the attributes with the visual results. A question then arises: “what if the relationship is not obvious enough for the eye to see easily?” This is where GIS is very important, as it uses computing technology to expose the smallest detail in the data. All GISs have pan and zoom functions; with these, the analyst can select a section where the pattern is not clear, zoom in to see the details, and use the pan function to move to the next section. Because of such GIS capabilities, the researcher saw no need to develop a corresponding spatial index, as such an index would limit the capabilities of disaggregated spatial analysis in GIS. It should also be noted that individual persons have been used in the analyses, with their locations referenced to buildings.

Figure 4.9 Spatial distribution of racial categories

Black dots represent Chinese, yellow dots Indian, and blue dots Malay.


Carrying out aggregation of such data according to the streets, this time looking only at race (Table 4.2), we see that much information is no longer available compared to Table E.1. Analyzing such aggregated data in GIS, even with only this slight aggregation of buildings to streets, we are immediately unable to take full advantage of GIS spatial analysis capabilities, as we can show only the aggregated number of individuals on each street (Figure 4.10). We lose the details about the location of individuals, since we cannot determine who belongs to which building, unlike in the earlier disaggregated analyses.
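The loss of detail under aggregation can be illustrated with a small Python sketch. The records are hypothetical (the building identifiers and counts are invented; only the street and race names follow Table 4.2), and the cross tabulation is done with a plain counter rather than with SPSS or ArcView.

```python
from collections import Counter

# Hypothetical micro data: (building, road, race) for each person.
records = [
    ("B1", "Lebuh Acheh", "Malay"),
    ("B1", "Lebuh Acheh", "Malay"),
    ("B2", "Lebuh Acheh", "Chinese"),
    ("B3", "Lorong Lumut", "Indian"),
    ("B4", "Lorong Lumut", "Indian"),
]

# Disaggregated view: persons counted per building and race.
by_building = Counter((building, race) for building, _, race in records)

# Aggregated view: the same persons counted per road and race.
by_road = Counter((road, race) for _, road, race in records)

print(by_building[("B3", "Indian")])        # 1
print(by_road[("Lorong Lumut", "Indian")])  # 2
```

The road-level count of two Indians on Lorong Lumut cannot be traced back to buildings B3 and B4, which is the information lost when only the aggregated table is available.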


Figure 4.10 Aggregation of demographic characteristics

4.2.9        Selection of Analysis Variables among DCs

Here, the problem addressed is that of having many DCs and deciding which ones to consider for a specific planning analysis. For example, “which DCs contribute to determining population location, and must therefore be included in the location analysis?” according to data from the study area (Table E.1). We first run the analysis in SPSS, giving the results in Table 4.1, with Spearman correlation between location (road) and gender = -0.019, age = -0.043, race = -0.167, religion = 0.055, and marital status = 0.072. These results indicate the degree to which each DC shows how the population composition varies with location. Together, all these variables give an R (correlation between the predictor variables combined and the dependent variable) of 0.194 and an R square of 0.038 (Table 4.3), using multiple regression by entering variables (SPSS, 1998; Foster, 1998).
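For readers without SPSS, the Spearman coefficient can be sketched in plain Python as the Pearson correlation of the ranked values, with tied values given their average rank. This is a generic illustration, not the SPSS procedure itself; the function names and the sample values are assumptions, and the coded road and race values are hypothetical rather than the study data.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical coded values (road code, race code) for six persons.
road = [1, 1, 2, 3, 3, 7]
race = [1, 1, 3, 3, 2, 2]
print(round(spearman(road, race), 3))
```

As a check on the figures quoted above, squaring R gives 0.194 * 0.194 = 0.0376, which rounds to the reported R square of 0.038.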

Table 4.3 Analysis by entering DCs with road as dependent variable

Columns: R Square, Adjusted R Square, Std. Error of the Estimate, reported for models a to e. [Cell values not preserved in the source.]


a Predictors: (Constant), GENDER

b Predictors: (Constant), GENDER, AGE

c Predictors: (Constant), GENDER, AGE, RACE

d Predictors: (Constant), GENDER, AGE, RACE, RELIGION

e Predictors: (Constant), GENDER, AGE, RACE, RELIGION, MARITAL


Looking at Table 4.3, we can see that marital status, which has the highest positive correlation with location in Table 4.1 (0.072), makes no contribution to the combined model; this is further explained by running ANOVA as given in Table A.2, the coefficients of correlation in Table A.3, and the excluded-variables analysis in Table A.4. In addition, carrying out spatial analysis in ArcView GIS, the results for marital status in the Heritage area (Figure 4.11) show that marital status (which in this study was divided only into single and married) is spread throughout the whole area and cannot be used for population allocation. For population data from other areas, marital status may provide insights for location determination, but for this study area that is not the case. That is why it is important for each analysis to determine the DCs that will give reliable results. From the statistical results alone we might be prompted to conclude that none of the variables can be depended upon for location determination, which is not right in this specific case of the heritage area. When we compare the other factors spatially, we find that race has the strongest relationship with location (Figure 4.9), followed by religion (Figure 4.12). This is the importance of GIS analysis: it can be used to find spatial relationships between variables. If statistical results are then needed, they should be obtained after knowing which variables to include; in the above example, we should not have included marital status. This eases the work of the planner and guards against unrealistic analysis that would give false results.

Figure 4.11 Spatial distribution and location of marital status

Figure 4.12 Spatial distribution and location of the various religions in study area

Brown dots represent Buddhist, dark blue Hindu, green Islam, yellow Christian, and purple others.



4.3                   Summary

This chapter dealt with demographic spatial analysis in 2D GIS. As modules for all the techniques were not available in ArcView GIS, various extensions and scripts, including a prototype module (Demographic Analyst) produced as an extension to ArcView, were employed to meet the extra demands of demographic analysis. These made it possible to compare demographic analysis results between statistical analysis techniques and GIS, and to accomplish GIS demographic spatial analysis in 2D, e.g. spatial demographic nearest neighbor analysis, demographic spatial analysis, spatial alternative analysis, spatial progressive clustering, selection of analysis variables, etc.


Chapter V


5.1                   Introduction

Demographic analysis and modeling in 2D was undertaken on the premise that there are techniques that can accomplish the tasks of GIS demographic spatial analysis (GIS-DSA) to generate visualizable features and quantities for planning; this involved the development of Demographic Analyst. With respect to the problem statement in section 1.2, using 2D GIS (section 4.2) we have been able to locate DCs and show the relationships between them, represent DCs according to their proximity, and show how they vary spatially. However, we have not been able to represent and visualize the influence of DCs or how demographic quantities vary spatially, nor to spatially analyze and represent multi-vertical demographics. In addition, we have been unable to spatially represent DCs like marital status, which have only two categories, in such a way that their quantities can be depicted in addition to their traditional representation (single or married). Hence the need to incorporate a third (vertical) dimension in the spatial analysis in addition to the x, y plane. This chapter deals with modeling demographics as the third dimension in order to carry out GIS-DSA in 2.5D and 3D. We start by looking at demographic data interpolation and extrapolation in order to carry out modeling of the third (vertical) dimension, which is divided into 1) demographic surface characterization and modeling in 2.5D (surface-based), where the relationship between vertical and horizontal position is one to one, and 2) 3D solid demographic modeling (volume-based), where it is many to one.

5.2                   Surface and Volume-Based Demographic Spatial Analysis

The following visualization may be helpful. Imagine that Figure 5.1 is a box of clay, with the vertical amount at every location shown in gray and the top surface of the clay in blue-green. Cutting the clay with a sharp edge at equal intervals in all directions, without removing any slice, makes it appear gridded, as shown by the thin black lines running parallel to each other. This is the surface-based view, in which the surface (the blue-green and gray parts) is represented by surface primitives; in other words, 3D objects are described in terms of their external observable surface, using augmented 2D modeling primitives.

Figure 5.1 Surface and volume visualization

Taking slice DD (red) away and viewing the block from side DD, we observe a hollow opening and are able to see the inside; the same happens if slice CC (light green) is removed. This is the volume-based view, in which an object’s interior is described by solid information, i.e. 3D objects are described in terms of the volume they occupy, using 3D model primitives. With the volume-based view, in addition to what the surface-based view offers, we are able to visualize the interior, which puts us in a position to draw conclusions about complex situations. For instance, a DC may start on the base and stop anywhere before reaching the top, or it may reach the surface, or it may touch neither; imagine removing slice DD, cutting a piece out of its side, taking that piece away, and replacing DD in its position. We have to be able to determine the mass and visualize that situation. Thus, unlike volumetric 3D object models, surface representations ignore the invisible internal volume of the objects. In such cases, the only available information is on the observable surface boundary, and nothing is known about the space enclosed by this boundary. Since the surface boundaries are the primitives of these models, most of the analysis is facilitated by appropriate mappings of the curved surfaces onto planar patches while preserving the surface topology. Each will be handled separately, but before we carry out surface-based (section 5.5) and volume-based (section 5.6) analysis, let us look at ways by which demographics can be modeled incorporating the vertical dimension.

5.3                   3D Spatial Object Representation of Demographics

Once the demographic data is collected, the task is to represent it in a 3D-DM, of which the final form is in grid or triangular format i.e. a raster or vector 3D data structure must be chosen to describe demographics as geo-objects. Figure 5.2 shows object representations adapted from Li (1994), but modified to fit this research’s requirement.

Figure 5.2 Three dimensional spatial object representations

5.3.1        3D Data Structures

3D Raster Data Structures: In the raster approach, voxels serve as the building blocks for geo-objects. The topology of voxel data is inherent in the data structure (Worboys, 1995), with cell values in the grid accessible by row, column and level (z). There are three methods to store voxels. The simplest form of storage is a binary raster, where voxels are indexed as on or off depending on whether they make up a particular geo-object. Understandably, this approach often requires large amounts of storage. Searching for attribute data within a 3D grid can be enhanced with a second alternative, indexing by octree, the 3D equivalent of the 2D quadtree. The third, constructive solid geometry (CSG), a common CAD/CAM solid modeling technique (Samet, 1990a, 1990b), combines occurrences of 3D primitives (objects) such as cubes, spheres, and cylinders into a binary tree using geometrical transformations and regularized set operations. The 3D raster data structure is a series of rows and columns of cubes, and this relative simplicity is important; however, raster volumetric environments do have some limitations (Samet, 1990a). For example, the boundary of an object can be somewhat jagged if voxel resolution is low. Grid spacing (resolution) is an important feature of raster systems. A grid with large spacing (low resolution) may not represent elements effectively, and a grid with small spacing (high resolution) may store too much data about an object to permit efficient processing.

3D Vector Data Structures: Vector data structures represent features as a series of discrete linearly connected points within a database. Building upon the points, connections between points are made to represent lines (vectors). A series of lines can then connect to form polygons. Point, line and polygon features also contain associated data such as attributes that are used to describe the characteristics of features they represent. These basic features are used to build topology between objects and, therefore, define their spatial relationship. This spatial foundation for storing objects allows for query of information and attributes attached to those objects (Worboys, 1995).

5.3.2        Surface-Based Representation

There are many methods of surface-based representation; the most commonly used are grid, shape model, facet model and B-rep (Figure 5.2). Each has its strengths, weaknesses and fields of appropriate application (Abdul-Rahman, et al., 1998; Li, 1994).

Grid: A grid is a popular method of surface representation in GIS, digital mapping and DTM. Many DTM and terrain surface packages are based on this representation as discussed in Petrie and Kennie (1990). It uses points as the basic element and it has several advantages, e.g. the structure is simple to generate, topology information is implicitly defined (Peucker, et al., 1978). The grid structure could be regular or irregular. In both cases (regular or irregular), every grid point has height (single value). Excellent surface maps can be derived with this structure, but it is unable to represent objects with multiple heights.

Shape model: A shape model describes an object surface by using surface point derivatives (e.g. slopes). With known slopes of each grid point, a normal vector of a grid point can be defined and used to determine the shape of the surface. Li (1994) reported that this structure has an important application in 2.5D (surface) model reconstruction but not in 3D.

Boundary representation (B-rep): B-rep models represent a solid indirectly by a representation of its bounding surface; an object is represented by a combination of primitives of type point, edge, face and volume. Faces, edges, and vertices, together with the related geometric information, form the basic components of B-rep models (Li, 1994; Reda, 1996). Because of computational complexity and inefficient Boolean operations (Cambray, 1993), it has been suggested that B-rep is suitable for regular objects (Li 1994; Mäntylä 1988). B-rep models can represent a wide class of objects, but the data structure is complex and requires a large memory space.

Facet model: A facet model describes an object's surface by surface cells, which can be of different shapes and sizes. One of the most popular facet models is the triangle facet, of which a TIN is an example. A surface can be described by a network of triangle facets; each facet consists of three nodes, each with x, y, z coordinates. This structure is widely used in DTM and other terrain surface software, mainly because of its structural stability (Midtbø, 1996), simplicity of processing (Abdul-Rahman, et al., 1998), and suitability for object visualization (Kraak, 1992). Triangles may be generated in a raster or in a vector domain, and most techniques of triangulation are based on Delaunay triangulation.
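How a single triangle facet yields a surface value can be sketched in Python using linear (barycentric) interpolation over the three nodes. This is a generic illustration under the assumption of a planar facet, not the routine used by any particular TIN package; the function name and the node coordinates are hypothetical, with z standing for a demographic quantity.

```python
def interpolate_on_facet(p, a, b, c):
    """Linearly interpolate z at p = (x, y) inside the facet with nodes a, b, c.

    Each node is (x, y, z). Returns None when p lies outside the facet.
    """
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate facet: nodes are collinear")
    # Barycentric weights of p with respect to the three nodes.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None  # p is outside this facet
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical facet lying on the plane z = x + 2y.
a, b, c = (0, 0, 0), (4, 0, 4), (0, 4, 8)
print(interpolate_on_facet((1, 1), a, b, c))  # 3.0
```

At a node the weights reduce to 1, 0, 0, so the facet returns the node's own value.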

5.3.3        Volume-Based Representation

There are many methods of volume-based representation; the most commonly used are 3D array, octree, Constructive Solid Geometry (CSG) and 3D TIN (Tetrahedral network, TEN) (Figure 5.2). Each has its strengths, weaknesses and fields of appropriate application (Abdul-Rahman, et al., 1998; Li, 1994). They are employed in solid modeling, where all primitives are guaranteed to occupy 3D space. Solid modeling is good for a variety of purposes beyond guaranteeing physically realizable objects; for example, it is easy to derive properties such as length and volume from solids.

Constructive Solid Geometry (CSG) represents an object by a combination of predefined simple primitives called geometric primitives. The primitives are, for example, spheres, cubes, cylinders, cones, or rectangular solids, and they are combined using Boolean set operators and linear transformations. As discussed by Abdul-Rahman and Drummond (1998), the storage space increases as the number of primitives increases (Samet, 1990a), and some research works have suggested that CSG is suitable for describing regularly shaped objects (Cambray, 1993; Li, 1994).

Octree (Voxel) Representation: An octree is an established hierarchical data structure that specifies the occupancy of cubic regions of object space. Here a 3D object is represented by the volume it occupies in a Cartesian 3D space; the space is subdivided into regular volume units, namely unit 3D cubes, also referred to as voxels (Reda, 1996). It is simply a 3D generalization of a quadtree. In this technique, a volume of space is recursively divided into 8 parts until each part of the subdivision is homogeneous. Octree subdivision is a method of data compression applied to avoid storing the same data repeatedly for adjacent cells of identical value. If all the cells within one octant are similar, the whole octant is stored as one element. If features do not fill an entire octant, that octant is subdivided into eight again, until the highest resolution of storage is reached. The octree structure allows for efficient Boolean operations on geo-objects, but it is time-consuming to build. Compressing similar voxels can reduce the data efficiently, yet when objects are complex, little or no data reduction may occur. Octrees are also an approximate representation, so a very detailed representation of objects is hard to achieve; in an octree, storage space increases rapidly as resolution increases.
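The recursive subdivision can be sketched in Python as follows. The sketch assumes a cubic voxel grid whose side is a power of two, stored as a nested list indexed grid[x][y][z] with values 0 or 1; the function names and the sample grid are hypothetical. A homogeneous octant is stored as a single value, otherwise it is split into eight children.

```python
def build_octree(grid, x=0, y=0, z=0, size=None):
    """Return a voxel value for a homogeneous cube, else a list of 8 subtrees."""
    if size is None:
        size = len(grid)
    first = grid[x][y][z]
    homogeneous = all(
        grid[i][j][k] == first
        for i in range(x, x + size)
        for j in range(y, y + size)
        for k in range(z, z + size)
    )
    if homogeneous:
        return first  # whole octant stored as one element
    half = size // 2
    return [
        build_octree(grid, x + dx, y + dy, z + dz, half)
        for dx in (0, half)
        for dy in (0, half)
        for dz in (0, half)
    ]


def count_leaves(node):
    """Number of stored elements in the octree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


# Hypothetical 4 x 4 x 4 grid with a single occupied voxel at the origin.
n = 4
grid = [[[0] * n for _ in range(n)] for _ in range(n)]
grid[0][0][0] = 1

tree = build_octree(grid)
print(count_leaves(tree), "elements instead of", n ** 3, "voxels")
```

For this grid only the octant containing the occupied voxel is split further, giving 7 + 8 = 15 stored elements instead of 64; a grid with occupied voxels scattered through every octant would show little or no reduction.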

3D TIN (Tetrahedral Network, TEN): 3D TIN is an extension of a 2D TIN. An object is described by connected but non-overlapping tetrahedral networks (TEN) (Midtbø, 1996; Pilouk, 1996). One can estimate a surface value anywhere in the triangulation by averaging node values of nearby triangles, giving more weight and influence to those that are closer. The resolution of TINs can vary; that is, they can be more detailed in areas where the surface is more complex and less detailed in areas where the surface is simpler. The coordinates of the source data are maintained as part of the triangulation, so subsequent analysis such as interpolation will honor the source data precisely. Here no information is lost, and TINs are well suited for simulating the dynamics of surface change, especially using a finite-difference rather than a finite-element approach (Tucker, et al. 1999), in which MODC can be computed at the nodes rather than within the triangles. TIN has the ability to adjust its resolution based on the complexity of the surface being modeled (Abdelguerfi, et al., 1997), and it can incorporate surface constraints such as pre-specified linear and area features. Similar to 2D TIN, TEN has many advantages in manipulation, display and analysis. Its basic element is the tetrahedron, which is made of four vertices, six edges and four faces. This representation has been considered a useful data structure in 3D GIS by many researchers, including Raper and Kelk (1991). It can be generated using the same technique as for 2D TIN (Abdul-Rahman, et al., 1998): if the 2D TIN is built from 2D Voronoi polygons, then the 2D Voronoi is extended to 3D, and the 3D TIN can be derived from 3D Voronoi polyhedrons (Qingquan & Deren, 1996). These two authors pointed out that the tetrahedron has several advantages compared to other solid structures: it is the simplest data structure and can be reduced to point, line, area, and volume (solid) representations; it allows fast topological processing; and it is convenient for rapid visualization.
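The tetrahedron's primitives and its volume can be illustrated with a short Python sketch. The function name and the vertex coordinates are hypothetical; the volume is computed with the standard determinant formula, one sixth of the absolute scalar triple product of the three edge vectors leaving one vertex.

```python
from itertools import combinations


def tetrahedron_volume(a, b, c, d):
    """Volume of a tetrahedron from its four (x, y, z) vertices."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (
        u[0] * (v[1] * w[2] - v[2] * w[1])
        - u[1] * (v[0] * w[2] - v[2] * w[0])
        + u[2] * (v[0] * w[1] - v[1] * w[0])
    )
    return abs(det) / 6


# Hypothetical tetrahedron; z could carry a demographic quantity.
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

edges = list(combinations(vertices, 2))  # every pair of vertices
faces = list(combinations(vertices, 3))  # every triple of vertices

print(len(vertices), len(edges), len(faces))  # 4 6 4
print(tetrahedron_volume(*vertices))          # 1/6, about 0.1667
```

The counts confirm the four vertices, six edges and four faces, and show how the solid reduces to point, line and area primitives.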

5.4                   Demographic Data Interpolation and Extrapolation

In order to present population data continuously, there is a need to interpolate and extrapolate the MODC at locations that do not fall exactly on the data points, and to demarcate demographic boundaries, so as to derive and generate continuous phenomena in the form of a surface (section 5.5.1). In the case of population data aggregated to polygons, the data must be spread from the centroids to cover the whole polygon to meet the needs of prediction and simulation. There are many interpolation techniques, each with its weaknesses and strengths. Additional conditions imposed on the general formulation of the spatial interpolation problem (Mitas, et al., 1999) define the character of the various interpolation techniques, which are classified as point or areal.

For population data aggregated to polygons, areal interpolation is always employed. It is the problem of transferring data from one set of areas (source reporting zones) to another (target reporting zones). This is easy if the target set is an aggregation of the source set, but more difficult if the boundaries of the target set are independent of the source set (Bracken & Martin, 1995). Areal interpolation implicitly assumes that the data are uniformly distributed throughout the polygon, which is usually not the case for population-related data. Much research is taking place into better population interpolation. This includes improvements to areal interpolation, by Shepard and others, which weight the data value for a partial polygon in proportion to the ratio of the partial polygon area to the complete polygon area. Rase (1998) also dealt with the problem of population in his “Interpolation and display of statistical surfaces” and “Volume-Preserving Interpolation of a Smooth Surface from Polygon-Related Data”, where the volume-preserving properties of Tobler’s pycnophylactic interpolation and the advantages of the triangular irregular network for preserving the geometry of lines are combined.
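Area-ratio weighting can be illustrated with a minimal Python sketch. The function name, the zone identifiers, the populations and the areas are hypothetical, and the overlap areas are assumed to have been obtained beforehand from a polygon overlay; the sketch only applies the ratio of partial polygon area to complete polygon area.

```python
def areal_weighting(source_values, source_areas, overlap_areas):
    """Estimate a target-zone value from source zones by area-ratio weights.

    source_values: {zone: population of the source zone}
    source_areas:  {zone: complete area of the source zone}
    overlap_areas: {zone: area of the source zone lying inside the target zone}
    """
    return sum(
        source_values[zone] * overlap_areas.get(zone, 0.0) / source_areas[zone]
        for zone in source_values
    )


# Hypothetical source zones and their overlap with one target zone.
population = {"A": 100, "B": 60, "C": 40}
area = {"A": 10.0, "B": 20.0, "C": 8.0}
overlap = {"A": 5.0, "B": 20.0}  # zone C does not touch the target zone

print(areal_weighting(population, area, overlap))  # 110.0
```

Half of zone A and all of zone B fall in the target zone, giving 50 + 60 = 110 persons; the estimate is only as good as the assumption that people are spread evenly within each source zone.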

The demographic data used in this thesis are not uniformly distributed in the study area and have been geocoded as discrete points (section 3.7.3), so point-based interpolation is employed. It works on the principle that, given a number of points whose locations and values are known, the values of other points at predetermined locations can be determined. MacEachran and Davidson (1987) identify five factors significant to the accuracy of continuous surface representation: 1) data measurement accuracy, 2) control point density, 3) spatial distribution of data collection points, 4) intermediate value estimation, and 5) spatial variability of the surface represented. For the first factor, the individual micro data were collected as discussed in section 3.4; the second and third were dealt with in the geocoding section 3.7.3, where each person’s place of residence was used for georeferencing, with the building as the spatial unit. Here we look at the remaining factors (4 and 5), referred to as spatial interpolation. Spatial interpolation is the procedure of estimating the value of properties at unsampled sites within the area covered by existing observations; it can be thought of as the reverse of the process used to select the few points from a DM that accurately represent the surface. The rationale behind spatial interpolation is that points close together in space are more likely to have similar values than points far apart (Tobler's Law of Geography; Martin, 1991).

For point-based interpolation, typical examples are conditions based on geostatistical concepts (Kriging), locality (nearest neighbor and finite element methods, IDW, TIN), smoothness and tension (spline), or ad hoc functional forms (polynomials, multi-quadrics). Several researchers, including Burrough (1986), Davis (1986), Hearn and Baker (1986), Isaaks and Srivastava (1989), Cressie (1993), and Wingle and Poster (1996), have discussed these techniques. Kriging is based on the concept of random functions, where the surface or volume is assumed to be one realization of a random function with a certain spatial covariance; variants include co-kriging (which includes information about correlations of two or more attributes to improve the quality of interpolation), disjunctive Kriging, and zonal Kriging. Kriging has been less successful for applications where local geometry is the key issue (which is the case in this study), and other methods prove to be competitive or even better (Hardy, et al.). Local neighborhood methods are based on the assumption that each point influences the resulting surface only up to a certain finite distance (Martin, 1999); among these methods are IDW, natural neighbor, TIN and rectangle-based methods[60].
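Of the local methods listed, inverse distance weighting (IDW) is the simplest to illustrate. The following Python sketch is a generic version, not the routine of any particular GIS package; the function name, the power of 2, the optional search radius and the sample values are assumptions made for illustration. Each known point contributes in proportion to 1 / distance^power, and points beyond the radius are ignored.

```python
from math import dist


def idw(samples, target, power=2.0, radius=None):
    """Inverse distance weighted estimate at target = (x, y).

    samples: list of ((x, y), value) pairs with known values.
    radius:  optional finite distance beyond which points have no influence.
    """
    numerator = denominator = 0.0
    for location, value in samples:
        d = dist(location, target)
        if d == 0:
            return value  # the surface passes through the data point
        if radius is not None and d > radius:
            continue
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    if denominator == 0:
        return None  # no data point within the search radius
    return numerator / denominator


# Hypothetical geocoded points carrying a demographic quantity.
samples = [((0, 0), 10.0), ((2, 0), 20.0), ((50, 50), 99.0)]
print(idw(samples, (1, 0), radius=5))  # 15.0
print(idw(samples, (0, 0)))            # 10.0
```

The second call returns the stored value unchanged because the target coincides with a data point.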

The point-based interpolation can be exact interpolators providing the true value at data point in that the surface passes through all those points. This concept is being used in this study in TIN interpolation (section