
Transactions in GIS, 2003, 7(4): 447–466

Research Article

Building Web-Based Spatial Information Solutions around Open Specifications and Open Source Software
Geoffrey Anderson
Cloudshadow Consulting, Inc., Boulder, Colorado

Rafael Moreno-Sanchez
Department of Geography, University of Colorado at Denver

Abstract
Geographic Information Systems (GIS) are moving from isolated, standalone, monolithic, proprietary systems working in a client-server architecture to smaller web-based applications and components that offer specific geo-processing functionality and transparently exchange data among themselves. Interoperability is at the core of this new web services model. Compliance with Open Specifications (OS) enables interoperability. The high costs, complexity and special requirements of web-GIS software have prevented many organizations from deploying their data and geo-processing capabilities over the World Wide Web. There are no-cost Open Source Software (OSS) alternatives to proprietary software for operating systems, web servers, and Relational Database Management Systems. We tested the potential of the combined use of OS and OSS to create web-based spatial information solutions. We present in detail the steps taken in creating a prototype system to support land use planning in Mexico with web-based geo-processing capabilities currently not present in commercial web-GIS products. We show that the process is straightforward and accessible to a broad audience of geographic information scientists and developers. We conclude that OS and OSS allow the development of web-based spatial information solutions that are low-cost, simple to implement, compatible with existing information technology infrastructure, and have the potential of interoperating with other systems and applications in the future.

Address for correspondence: Rafael Moreno-Sanchez, Department of Geography, University of Colorado at Denver, Campus Box 172, P.O. Box 173364, Denver, CO 80217-3364, USA. E-mail: rmoreno@carbon.cudenver.edu
© 2003 Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden MA 02148, USA.

1 Introduction
With a few exceptions (e.g. the Geographic Resources Analysis Support System, GRASS; http://www.cecer.army.mil/grass/GRASS.main.html), geographic information technology has developed as isolated, standalone, monolithic, proprietary systems. This is rapidly changing as geo-processing principles and functionality move out of a tightly defined niche into the information technology (IT) mainstream. Isolated, standalone systems are being replaced by integrated components, and large applications are being replaced by smaller, more versatile applications that work together transparently across networks. Of these networks, the World Wide Web (WWW or the web) is becoming the core medium for distributed computing in IT generally and in the geo-processing domain specifically (Hecht 2002b). In other words, Geographic Information Systems (GIS), once focused on data and tools implemented with a client-server architecture, are now evolving toward a web services model (Dangermond 2002). In this new architecture the web is used for delivering not just data, but also geo-processing functionality that can be wrapped in interoperable software components called web services. These components can be plugged together to build larger, more comprehensive services and/or applications (Hecht 2002c). Interoperability between heterogeneous environments, systems and data is fundamental for the implementation of this web services model. With respect to their IT infrastructure, organizations aim to: (1) maximize productivity and efficiency; (2) protect critical information infrastructure; and (3) overcome problems related to data sharing, security and data maintenance, as well as special software requirements and steep learning curves. The WWW offers the potential benefits of flexibility, ubiquity, and reduced costs and risks of obsolescence and isolation.
However, when organizations try to use the web as a platform to deliver geographic data and provide geo-processing functionality to their end users, they commonly find that commercial web-GIS software raises the following issues: (1) it does not currently offer out-of-the-box geo-processing functionality to perform many of the analyses demanded by their end users; (2) it is expensive; (3) it has a steep learning curve; (4) it requires that some of their IT personnel become specialists in the operation and maintenance of the software; and (5) it is difficult to integrate with existing IT infrastructure (personnel skills, software and applications). The use of Open Specifications (OS) and Open Source Software (OSS) offers the potential to overcome these issues and to facilitate the deployment of geographic data and geo-processing functionality on the WWW. There is a growing interest in the use of OS. For example, the British Ordnance Survey is using the Geography Markup Language (GML) OS to deliver the Digital National Framework on the web and to mobile devices (Holland 2001; http://www.ordinancesurvey.co.uk/dnf/home.htm). According to Lowe (2002), there is a growing market for OSS products fed by small organizations and regional government agencies that cannot afford the costs, complexity, steep learning curves, training costs, and special requirements of proprietary software (web-GIS, DBMS and web servers). There are already several successful examples of the use of OSS to create basic web-mapping functionality (see cases described by Lowe 2002 and Ramsey 2002). In spite of this growing interest, little has been published about the combined use of OS and OSS for the creation of web-based geo-processing applications. Even less is available in the form of detailed explanations of how they can interact and complement each other to create these applications.
This article aims to contribute to the knowledge base about OS and OSS, and how they can be used to create web-based spatial information solutions. We demonstrate the process through a case study in which we created a prototype system to support land use planning in central Mexico. The system implements geo-processing functionality currently not available out of the box in commercial web-GIS software. The article is organized as follows: section 2 defines OS, OSS and interoperability, and provides the necessary background regarding the organizations and efforts to create OS; section 3 defines and provides a brief background of the specific OS and OSS technologies used to create the web-based spatial information system described in this paper; section 4 presents a brief background about the need for the system and a detailed explanation of the process followed to create a prototype web-based spatial information system with querying and Boolean-intersect overlay geo-processing capabilities to support land use planning in central Mexico; section 5 presents a discussion of the implications and difficulties in applying these technologies; and finally, section 6 presents conclusions and suggestions for future research and implementations.

2 Defining Open Specifications (OS), Interoperability, and Open Source Software (OSS)
Open Specifications provide software engineers and developers with information about a given specification as well as specific programming rules and advice for implementing the interfaces and/or protocols that enable interoperability between systems. The OpenGIS Consortium Inc. (OGC) (http://www.opengis.org) defines interoperability as the ability of a system or components of a system to provide information portability and interapplication cooperative process control. In the context of the OGC specifications this means software components operating reciprocally (working with each other) to overcome tedious batch conversion tasks, import/export obstacles, and distributed resource access barriers imposed by heterogeneous processing environments and heterogeneous data. Herring (1999) and Kottman (1999) present an in-depth discussion of the OpenGIS Data Model and the OGC process for the creation of OS, respectively. Software products can be submitted to have their interfaces tested for compliance with OGC OpenGIS Implementation Specifications (see http://www.opengis.org/techno/implementation.htm for the most recent approved and in-process specifications). Initially, the only OpenGIS Specifications that products could conform to were the OpenGIS Simple Features Specifications for CORBA, OLE/COM and SQL (McKee 1998), but there are now 11 different specifications. Within computer environments there are many different aspects of interoperability (Vckovski 1998): (1) independent applications running on the same machine and operating system, i.e. interoperability through a common hardware interface; (2) application A reading data written by another application B, i.e. interoperability through a common data format; and (3) application A communicating with application B by means of interprocess communication or network infrastructure, i.e. interoperability through a common communication protocol.
Besides technical issues, there are also interoperability topics at higher levels of abstraction such as semantic barriers (Harvey 1999, Seth 1999). A system based on the OS described in a later section of this article would be able to achieve interoperability of the second type listed above. According to Hecht (2002b) interoperability is desirable for the following reasons: (1) it allows for communication between information providers and end users without requiring that both have the same geo-processing or viewer software; (2) no single Geographic Information System (GIS), mapping tool, imaging solution or database answers every need; (3) there are large numbers of database records with a description of location that have the potential to become spatial data, and advances in several technologies (e.g. GPS integrated into mobile devices) are increasing the number of database records with location information; (4) the number of software companies offering components to deal with geographic information is growing; (5) it is more efficient to collect data once and maintain them in one place (this is particularly cost effective if communities of users can find, access and use the information online, so they do not need to access, retrieve and maintain whole files and databases of information for which others are responsible); (6) the ability to seamlessly combine accurate, up-to-date data from multiple sources opens new possibilities for improved decision making and makes data more valuable; and (7) the ability of multiple users, including non-GIS experts, to use a particular set of data (perhaps at different levels with different permissions) also makes the data more valuable. Gardels (1997) discusses how compliance with OGC's OpenGIS specifications and the resulting interoperability can contribute to integrating distributed heterogeneous environments into on-line environmental information systems (EIS). He points to three technical strategies (federation, catalogs and data mining) for the integration of these systems, and how they are heavily dependent on interoperability among diverse data sources, formats and models. He concludes that properly designed geodata access and analysis tools, combined with open environmental information systems, can provide sophisticated decision support to the users of geographic information. Two organizations have been coordinating the development of the open specifications used in this paper: the OpenGIS Consortium Inc. (OGC) (http://www.opengis.org) and the World Wide Web Consortium (W3C) (http://www.w3.org).
The W3C has created more than forty technical specifications (http://www.w3.org/TR/) and, as of January 2002, the OGC has adopted nine OpenGIS Implementation Specifications and 11 candidate specifications are in the works (Hecht 2002a; a roadmap to the specifications work is presented at http://www.opengis.org/roadmap/index.htm). Briefly, Open Source Software (OSS) consists of programs whose licenses give users the freedom to run the program for any purpose, to modify the program, and to freely redistribute either the original or modified program without further limitations or royalty payments (http://www.opensource.org/docs/definition.php). Among the best-known OSS projects are the Linux operating system and the Apache web server. Sometimes the term Open Technologies is used to refer to these projects and others such as XML, HTML, TCP/IP, and Java technology. A comprehensive list of GIS-related OSS can be found at http://opensourcegis.org/. According to Wheeler (2002), the reliability, performance, scalability, security and total cost of ownership of OSS are at least as good as, or better than, those of its proprietary competition, and under certain circumstances OSS is a superior alternative to its proprietary counterparts.

3 Background on the Specific OS and OSS Used to Create a Web-based Spatial Information System
This section provides background information about the origin of, and relationships among, the OS and OSS we used. We also point to their relevance for the creation of web-based geo-processing functionality. The Extensible Markup Language (XML) is a subset of the Standard Generalized Markup Language (SGML) [ISO 8879] (http://www.w3.org/TR/1998/). XML uses pairs of text-based tags, enclosed in angle brackets, to describe the data. These tags make the information passed across the Internet self-describing (Waters 1999). Part of its success comes from: (1) the fact that it can be read and written by humans (in contrast with binary formats) and thus provides a single way of representing structure regardless of whether the information is intended for human or machine consumption; and (2) its similarity to the widely used Hyper Text Markup Language (HTML). XML satisfies two compelling requirements: first, it separates data from presentation; second, it transmits data between applications. XML is a metalanguage, i.e. a language that describes other languages (Boumphrey et al. 1998; http://www.xml.com). These languages are called XML schemas (for a detailed definition of what constitutes a schema, and how new schemas can be created, see Ducket et al. 2001). There are schemas for over 40 different areas of expertise (http://www.xml.org/xml/registry.jsp presents a registry of XML schemas). In the web-based geo-processing arena, XML is being used to exchange metadata and control information between computers, and between them and humans. According to Aloisio et al. (1999), XML will play a major role in enabling computers to communicate universally with other computers, and in creating a new generation of web services designed to interact with other services. They also reaffirm that XML is simple and powerful and that its similarity to HTML ensures universal adoption. Scalable Vector Graphics (SVG) and the Geography Markup Language (GML) are XML schemas. The first is a vector graphics language written in XML to describe two-dimensional graphics. The second is an XML encoding for the transport and storage of geographic information, including both the spatial and non-spatial properties of geographic features. SVG is a W3C open specification (http://www.w3.org/TR/SVG/). GML is an OGC open specification (http://www.opengis.net/gml/02-069/GML2-12.html). In SVG the graphical elements are represented within XML tags, hence SVG offers all the advantages of XML's openness, transportability, and interoperability (Eisenberg 2002).
SVG drawings can be dynamic and support embedded interactivity, animation, embedded fonts, XML code, Cascading Style Sheets and scripting languages. A rich set of event handlers such as onmouseover and onclick can be assigned to any SVG graphical object. For example, we used the onmouseover event to show real-world coordinates as the user moves the mouse over the SVG map. SVG is capable of using real-world coordinate systems, in contrast to other popular vector graphics formats such as Macromedia Flash (Neumann 2002 compares the capabilities of SVG and Flash to handle vector graphics in web applications). All these features make SVG appealing for the graphical representation of geographic data on the web (Gould and Ribalaygua 1999). Puhretmair and Woss (2001) used dynamically generated SVG maps as an intuitive interface to present tourist information contained in distributed data sources. The information is distributed among several servers and websites and is structured in different ways. XML is used to create query tools and integrate the data by communicating with the different services. Then the SVG capabilities to support embedded interactivity, animation, embedded fonts, XML, Cascading Style Sheets and scripting languages are used to create on-the-fly maps in response to queries. The SVGOpen Conference is an excellent source for the growing field of SVG applications (http://www.svgopen.org). Lake (2001a) briefly presents the organization of the GML specification. In GML the geometries and attributes of geographic layers are represented within XML tags; again, this brings forth all the advantages of XML's openness, transportability and interoperability.
GML is designed to support interoperability and does so through the provision of basic geometry tags (all systems that support GML use the same geometry tags), a common data model (features/properties), and a mechanism for creating and sharing application schemas (see the GML 2.1.2 specification at http://www.opengis.net/gml/02-069/GML2-12.html). GML conforms to the OGC's Simple Features specifications and it is not concerned with the visualization of geographic features such as the drawing of maps. Hence, we used SVG for the graphical representation of the data in GML format. GML is as critical to the evolution of the geospatial infrastructure on the web as HTML was to the development of the conventional Internet (Lake 2001b). GML supports geospatial interoperability in various ways (Lake 2001a): first, it provides a common schema framework for the expression of geospatial features; second, it provides a common set of GML geometry types, which allows authors of different schemas to share the same mechanisms for geometry description and hence to interpret the correspondence between the schemas when they refer to the same feature in the real world; and third, the definition and publication of GML schemas that can be shared across communities of interest (such as transportation, environmental issues and petroleum exploration) facilitates interoperability on the semantic level. XSLT (Extensible Stylesheet Language: Transformations) is one of three parts that compose a bigger language called XSL (Extensible Stylesheet Language). XSLT is a W3C open specification (http://www.w3.org/Style/XSL/). Essentially it is an XML-based language for transforming the structure of XML documents for display on screen, on paper, or as spoken word (Kay 2001). In addition, XSLT is commonly used to transform data from the data model used in one application (e.g. text) to the data model used in another (e.g. SQL statements to create a table in a Relational Database Management System). The XSLT formatting code contained in a text file is known as a Style Sheet. XML documents are commonly processed through parsing. Geographic data in GML format tend to be huge text files (Sahay 2002), therefore it is critical to use the most efficient parsing method to process them.
The SAX (Simple API for XML) parsing method has been shown to be more efficient than its alternative, the DOM (Document Object Model) method, for processing GML documents (Sahay 2002). The use of SAX results in reduced memory overhead compared to DOM, which requires the retention of the complete document as a tree in memory. In our application, we used the SAX method to extract the geometries from the geographic layers in GML format and convert them to a format more amenable to spatial analysis, namely Java2D objects. The Java2D Application Programming Interface (API) is part of the Java Development Kit (JDK). It is used for the manipulation of two-dimensional objects. The Java2D API includes the Constructive Area Geometry methods for the Boolean overlay operations intersection, union, subtraction and exclusive-OR (http://java.sun.com/products/javamedia/2D/). The JDK is free and includes a Java2D demonstration. PHP (an acronym derived from its origin as Personal Home Page Tools) is a server-side, HTML-embedded, cross-platform scripting language (Rasmus 2000; http://www.php.net/). It borrows concepts from other common languages such as C and Perl. PHP provides a way to put instructions into HTML files to create dynamic content. The developer can embed PHP structured code (e.g. loops, conditionals, rich data structures) inside HTML tags. PHP is an OSS. We used it on the server side for process control, for processing the user's input, and to invoke and pass parameters to applications. PostgreSQL is a sophisticated Object-Relational Database Management System (RDBMS), supporting almost all SQL constructs, including subselects, transactions, and user-defined types and functions. It is the most advanced OSS database available today (Stinson 2001; Stones and Matthew 2001; http://www.postgreSQL.org/). PostGIS, which is also an OSS, is an extension of the PostgreSQL RDBMS that adds support for geographic objects (http://postgis.refractions.net/).
In effect, PostGIS spatially enables the PostgreSQL server, allowing it to be used as a backend spatial database for Geographic Information Systems (GIS), much like ESRI's Spatial Database Engine (SDE) or Oracle's Spatial extension. PostGIS follows the OGC Simple Features Specification for SQL (http://www.opengis.org/techno/implementation.htm). MapServer is an OSS development environment for building basic web-mapping applications (http://mapserver.gis.umn.edu/). It facilitates the display and browsing of geographic data in commonly used vector and raster formats. It is not designed to be a full-featured GIS and hence it does not offer geo-processing functions. Linux is a free Unix-type operating system. Specifically, we used the Red Hat Linux distribution (http://www.redhat.com). The Apache web server is a Hyper Text Transfer Protocol (HTTP) compliant web server. It is an OSS maintained by the Apache HTTP Server Project (http://www.apache.org). As of August 2002, 63% of the web sites on the Internet ran on the Apache web server (Netcraft Web Server Survey; http://www.netcraft.com/survey/).
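The streaming behaviour that makes SAX attractive for large GML files can be illustrated with a minimal Java sketch. It is not the code used in the prototype: the class name GmlSaxSketch, the handler and the sample polygon are hypothetical, and the sketch assumes GML 2 style geometries whose vertices are stored as text inside gml:coordinates elements. The handler keeps only the coordinate string it is currently reading, so no document tree is retained in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; names are assumptions, not the prototype's code.
class GmlSaxSketch {

    /** Collects the text of every coordinates element as the document streams past. */
    static class CoordinatesHandler extends DefaultHandler {
        final List<String> rings = new ArrayList<>();
        private StringBuilder current; // non-null only while inside a coordinates element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("coordinates".equals(localName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver the text of one element in several chunks.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("coordinates".equals(localName) && current != null) {
                rings.add(current.toString().trim());
                current = null;
            }
        }
    }

    /** Returns one coordinate string per ring found in the GML text. */
    static List<String> extractRings(String gml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so that localName is populated
            SAXParser parser = factory.newSAXParser();
            CoordinatesHandler handler = new CoordinatesHandler();
            parser.parse(new ByteArrayInputStream(gml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.rings;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse GML", e);
        }
    }

    public static void main(String[] args) {
        String gml = "<gml:Polygon xmlns:gml=\"http://www.opengis.net/gml\">"
                + "<gml:outerBoundaryIs><gml:LinearRing>"
                + "<gml:coordinates>0,0 10,0 10,10 0,10 0,0</gml:coordinates>"
                + "</gml:LinearRing></gml:outerBoundaryIs></gml:Polygon>";
        System.out.println(extractRings(gml));
    }
}
```

A DOM-based equivalent would first build the whole tree and then search it for coordinates nodes, which is the memory cost that the streaming approach avoids.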

4 The Case Study: Creating a Prototype Web-based Spatial Information System to Support Land Use Planning in Central Mexico
4.1 The need for the web-based system
During the 1990s, the National Institute for Forest, Agriculture and Livestock Research (INIFAP) in Mexico started to use GIS as part of its land suitability studies. The largest of these studies was a strategic-level national land use planning project to identify the areas with potential to grow specific crops, forage and forestry species considered of economic relevance for the country. A spatial database of national coverage was created to support this study. The database is organized in the following layers: soils (digitized from 1:50,000 scale maps with information about primary and secondary soil type, and presence of chemical or physical phases), a digital elevation model at 30-meter resolution (elevation and slope are derived from it) and several climate layers (30-year monthly and annual averages for minimum and maximum temperatures, precipitation and evaporation). These layers of information can be combined to derive other parameters that help to estimate the suitability of an area for a specific crop, such as the evaporation/precipitation coefficient. For each agricultural, forage and forestry species considered of strategic importance for the country, INIFAP's researchers compiled a list of the values of the environmental factors (soil type; slope; precipitation; maximum and minimum temperatures; etc.) that are considered ideal for the growth of the particular species. Together with this information they compiled what is called a technological package, a handbook of best practices for the production of the species in question. State and federal government agencies, researchers, land owners, seed companies, entrepreneurs, and agricultural insurance companies, among others, can request the identification of the areas that fulfill the environmental factors for the production of the species of their interest. These areas are identified by querying the spatial database layers for the specific range of values considered ideal for the species in question.
These query results are overlaid using a Boolean intersect operation to find the areas where all the desired environmental parameters are present. After the completion of the first studies, INIFAP created specialized units to provide service to users of this information. This service is peripheral to INIFAP's research responsibilities. However, soon after starting this service the units were overwhelmed by requests to identify areas that meet specific environmental requirements. INIFAP is in need of an alternative way to fulfill these demands in a more timely, efficient and economical way. In the past INIFAP had considered using the web as a platform to serve its geographic data or selected pieces of it, and to perform the previously described geo-processing analyses. However, several factors had deterred it from further pursuing this option: (1) the required Boolean overlay geo-processing capabilities currently do not exist out of the box in commercial web-GIS systems; (2) the high costs of web-GIS software; (3) the special requirements of this software in terms of dedicated personnel and lengthy training; and (4) concerns about the compatibility of the web-GIS software with existing IT infrastructure (personnel skills, software and applications). We decided to test the viability of using OS and OSS technologies to create a web-based spatial information system for non-expert users that overcomes these issues. The aim of the system is to allow end users to perform queries for desired values of environmental factors and do Boolean overlays of the results of these queries to identify the areas where all the desired environmental factors are present. By changing the values in their queries end users can get a quick idea of the effects of these changes on the areas selected and their intersection.

4.2 Creating the prototype system


The state of Guanajuato in central Mexico was selected as a pilot area to create a prototype system. INIFAP stated the following preferences regarding the design and development strategy for the system. It would: (1) easily integrate with the remainder of its existing IT infrastructure (personnel skills, software and applications); (2) be scalable, with low initial costs and low total costs of ownership over the long run; (3) minimize special requirements; (4) not imply steep learning curves; and (5) eventually improve the efficiency of the expansion, maintenance and quality control of the national database by centralizing these functions in one place. Next we present the step-by-step process to create a prototype web-based spatial information system around OS and OSS that is capable of processing attribute queries and Boolean intersection overlays. The prototype system is based on a PC with average technical specifications and a fast (T1) Internet connection. It runs the Linux (specifically Red Hat Linux) operating system and the Apache web server. For illustration purposes in this article, we have taken a subset of the data and translated the interface prompts. This demo system and all the referenced source code and relevant web links can be found at http://206.168.217.254/guanajuato/. The demo on this website has detailed instructions on how to perform queries, intersect overlays and display the results. The prototype system continues to evolve. Some improvements to the first version will be described in the discussion section. The interface of the prototype version presented on this website still contains interaction steps that eventually will be hidden from the end users. However, at this point the exposure of these steps serves to closely illustrate each of the processes that occur in the system when processing a query and overlay request.
On the interface, the user is presented with a screen divided into two areas: on the left, an area to input queries and present instructions to the user, and on the right, a map display area (see Figure 4). To input queries the user uses drop-down boxes to choose values (or a range of values) for each of the environmental factors (e.g. soils, elevation, temperature) contained in the spatial database. After selecting the desired values for each parameter, the user clicks the submit query button. The map display window is built using MapServer (http://mapserver.gis.umn.edu/). In it the user selects which layers he or she wants to display, including the features in each layer that were selected in the query. The map must be refreshed every time the type or number of layers displayed is changed. It can be determined visually whether there is an area where all the selected features intersect. If this is the case, the next step is to write GML documents that represent the selected features for each layer. Finally, the intersection of these selected areas is calculated and the resulting area is output as an SVG file and as a GML file. The SVG file can be displayed by itself or with other layers using the SVGeoprocessor application. The SVGeoprocessor application takes the image generated by MapServer as background and then overlays the intersection area in SVG format. The SVG interactivity capabilities are used to respond to the user's actions as described in STEP 7. All of the processing described below takes place on the server side of the system. The client side is only used to present the user with input forms and the graphical output resulting from his or her requests. Figure 1 presents a flow diagram of the steps required to respond to a user's request for queries on the thematic layers and overlay intersect geo-processing. In the description of the steps that follows we explain this diagram in detail.
STEP 1: In this step (see Figure 1), the layers for the pilot area that were originally in ESRI's (Environmental Systems Research Institute, Redlands, California) shapefile format are converted to tables in the PostgreSQL RDBMS. This conversion is achieved using the shp2pgsql utility included as part of the PostGIS extension. This utility takes a shapefile and outputs a series of SQL statements (e.g. CREATE TABLE and INSERT) to create a table in the PostgreSQL RDBMS (Figure 2).
These SQL statements are then executed in PostgreSQL to create a table that represents the shapefile (Figure 3). The resulting table contains all the attributes of the shapefile, including the coordinates that define each feature. The shp2pgsql utility allows for the selection of a projection for the data in the resulting table. PostGIS contains a file with close to 1800 projection definitions to choose from. This process was repeated for each of the layers provided (soils, elevation, and the climate layers).
STEP 2: In the query building interface (an HTML form; see Figure 4) the user queries the database for the parameters of interest for each layer. When the system finishes processing the intersect query, the area where all the requested environmental parameters intersect is displayed on an SVG map. If the requested environmental parameters do not intersect, a message is sent to the user. The user also has the option to invoke the overlay processor interface (Figure 5). This HTML form informs the user first about the number of features in each layer that meet the requested parameters, and second about the number of selected features whose bounding boxes intersect the bounding boxes of the selected features in a second layer. By analyzing these numbers, the user should be able to: (1) see how many features satisfy the specified parameters for each layer; (2) identify which is the most limiting environmental factor in his or her query based on the number of selected features; and (3) identify which layers do not intersect. With this information the user can perform sensitivity analyses by changing the requested parameters for one or more layers.
STEPS 3 and 4: The parameters entered by the user are sent to the server using the HTML form. On the server a PHP script converts the user's input for each layer into a SQL statement (e.g. a SELECT on the elevation table restricted to elevation values between 1500 and 3000) that is fed to the PostgreSQL-PostGIS RDBMS.
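The conversion of form input into a query can be sketched as follows. The prototype performs this step in a PHP script; the sketch below uses Java only to stay consistent with the overlay program of STEP 5. The class name, the list of layers, the inclusive bounds, and the assumption that each table holds a value column named after the layer are all hypothetical. Values are left as ? placeholders so that they can be bound through a prepared statement rather than concatenated into the query text.

```java
import java.util.Set;

// Hypothetical illustration of STEPS 3 and 4; not the prototype's PHP script.
class LayerQuerySketch {

    // Assumed layer names. Table and column names cannot be bound as parameters,
    // so they are checked against a fixed list before being placed in the SQL text.
    static final Set<String> LAYERS = Set.of("soils", "elevation", "precipitation");

    /**
     * Builds a range query for one layer, assuming (for illustration only) that the
     * table has a value column with the same name as the layer and that both bounds
     * are inclusive.
     */
    static String rangeQuery(String layer, double min, double max) {
        if (!LAYERS.contains(layer)) {
            throw new IllegalArgumentException("Unknown layer: " + layer);
        }
        if (min > max) {
            throw new IllegalArgumentException("Lower bound exceeds upper bound");
        }
        return "SELECT * FROM " + layer
                + " WHERE " + layer + " >= ? AND " + layer + " <= ?";
    }

    public static void main(String[] args) {
        // The two placeholders would later be bound to 1500 and 3000.
        System.out.println(rangeQuery("elevation", 1500, 3000));
    }
}
```

Checking the layer name against a fixed list matters because the form values arrive from the client and should not be trusted as SQL identifiers.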
PostgreSQL-PostGIS is invoked from the PHP script to execute the SQL statement, which returns a string of coordinates describing the features selected in the layer (together with any requested attributes). This string is parsed in another PHP script to create a GML polygon and/or multipolygon entity in a GML document that now represents the selected features and attributes (Figure 6).

Figure 1 Flow diagram of the processes taking place in the prototype web-based spatial information system

Figure 2 SQL statements that are output from the shp2pgsql utility in the PostGIS extension to convert shapefiles to tables in the PostgreSQL RDBMS

Figure 3 This SQL SELECT command displays all the records (that represent features) in the table simpleshape (that represents a layer). For illustration purposes we are showing a layer with a single feature

STEP 5: The resulting GML documents corresponding to the features selected in each layer (e.g. SoilsSelected.xml, ElevationSelected.xml, etc.) are input into a Java program that computes the intersection of the features (one pair at a time). This program defines a Java class (GMLoverlay.class) that uses the SAX parsing method to search each GML document for the geometries and writes each polygon to an array of Java2D Area objects. These objects are then intersected using the area intersect function contained in the Java2D API (Figure 7). The intersection result is then output as a GML document (intersection.xml).
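The core of STEP 5 can be sketched as follows. This is an illustration under stated assumptions and not the GMLoverlay.class code of Figure 7. It skips the SAX parsing and starts from coordinate strings in the "x,y x,y" form used by GML 2, and the class and method names are hypothetical. The calls to `Path2D`, `Area` and `Area.intersect` are standard Java2D.

```java
import java.awt.geom.Area;
import java.awt.geom.Path2D;

/** Sketch: intersect two polygons with the Java2D constructive area geometry. */
final class AreaIntersectSketch {
    /** Builds an Area from a coordinate string such as "0,0 4,0 4,4 0,4 0,0". */
    static Area areaFromCoordinates(String coordinates) {
        Path2D.Double path = new Path2D.Double();
        boolean first = true;
        for (String pair : coordinates.trim().split("\\s+")) {
            String[] xy = pair.split(",");
            double x = Double.parseDouble(xy[0]);
            double y = Double.parseDouble(xy[1]);
            if (first) {
                path.moveTo(x, y);
                first = false;
            } else {
                path.lineTo(x, y);
            }
        }
        path.closePath();
        return new Area(path);
    }

    /** Returns the intersection; Area.intersect mutates its receiver, so work on a copy. */
    static Area intersect(Area a, Area b) {
        Area result = new Area(a);
        result.intersect(b);
        return result;
    }

    public static void main(String[] args) {
        // Two hypothetical selected features, one from each layer.
        Area soils = areaFromCoordinates("0,0 4,0 4,4 0,4 0,0");
        Area elevation = areaFromCoordinates("2,2 6,2 6,6 2,6 2,2");
        Area overlap = intersect(soils, elevation);
        System.out.println(overlap.isEmpty()
                ? "no intersection"
                : "intersection bounds: " + overlap.getBounds2D());
    }
}
```

Replacing `intersect` with `add`, `subtract` or `exclusiveOr` on the copied `Area` gives the other constructive area geometry operations that the API provides.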

Figure 4 Sample HTML form

STEP 6: An XSLT Style Sheet (svg.xsl; see Figure 8) was created to transform the GML code contained in the intersection.xml file to SVG graphics for display as a map. Again, Style Sheets are custom-made XML instructions for formatting XML files. By creating other XSLT Style Sheets, the GML output (intersection.xml) can potentially be formatted into any text-based format, for example, any of the existing XML schemas for over forty different areas of expertise, HTML, delimited text, or the Arc/Info UNGENERATE format.

STEP 7: For convenience and due to the short development cycle for the prototype, we used MapServer to provide the map layout interface (frames, legend, scale bar and zoom levels) and to render the individual layers contained in the spatial database. The graphics generated by MapServer are then used as background for the display of the SVG map representing the intersection result. We used the SVG interactivity capabilities as follows: (1) the onmouseover event is handled to report the real-world coordinates of the mouse position on the SVG map, and (2) the onclick event is handled to send a query to the PostgreSQL-PostGIS database that returns the values for each of the layers at the point where the mouse is clicked (the equivalent of spearing through the layers and pulling out the values for each layer). Plate 1 shows the SVG map that the user gets in response to his or her query and geo-processing request.
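The GML-to-SVG transformation of STEP 6 can be sketched with the XSLT processor that ships with Java. The stylesheet below is a hypothetical, much reduced stand-in for svg.xsl (Figure 8), and the class name and sample polygon are also assumptions. It relies on the fact that GML 2 coordinate text ("x,y x,y") is already a valid value for the `points` attribute of an SVG polygon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: apply an XSLT Style Sheet that rewrites GML polygons as SVG polygons. */
final class GmlToSvgSketch {
    // Hypothetical stylesheet (not the article's svg.xsl): the outer ring of each
    // GML 2 polygon becomes an SVG <polygon> that reuses the coordinate text.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:gml='http://www.opengis.net/gml'"
        + " xmlns='http://www.w3.org/2000/svg'"
        + " exclude-result-prefixes='gml'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<svg><xsl:for-each select='//gml:Polygon'>"
        + "<polygon points='{normalize-space(gml:outerBoundaryIs/gml:LinearRing/gml:coordinates)}'/>"
        + "</xsl:for-each></svg>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet over a GML document and returns the SVG markup. */
    static String transform(String gml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(gml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical intersection result containing one square polygon.
        String gml =
              "<gml:Polygon xmlns:gml='http://www.opengis.net/gml'>"
            + "<gml:outerBoundaryIs><gml:LinearRing>"
            + "<gml:coordinates>0,0 10,0 10,10 0,10 0,0</gml:coordinates>"
            + "</gml:LinearRing></gml:outerBoundaryIs></gml:Polygon>";
        System.out.println(transform(gml));
    }
}
```

A stylesheet for real map data would also need to handle the SVG y axis, which increases downward, for example by flipping the coordinates or applying a transform. That step is omitted here.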

5 Discussion
We demonstrated how spatial information in a proprietary GIS format can be converted to tables and managed in an RDBMS environment. This simple transformation makes the information contained in GIS layers easier to combine and process with information contained in other DBMS applications (such as accounting and inventory systems) that might be part of the organization's IT infrastructure. In the case of INIFAP, after this transformation, geographic information (such as the extent of a feature of interest, or the distance between two features) can be combined and analyzed in an RDBMS environment (without having to link to external GIS systems) with information about crop yields, rural census data, or results of fertilization experiments contained in other RDBMS applications.

Of course, existing RDBMSs do not (and probably never will) have all the spatial analytical capabilities of a GIS. However, they are constantly evolving and in the near future they will have enough capabilities to satisfy many simple geo-processing requirements. For example, PostgreSQL-PostGIS is currently capable of performing over sixty spatially related operations such as finding the extent of a feature or group of features, the distance in projected units between two features, the selection of features to the left or right of a feature, and the intersection of two feature extents. There are plans in the near future to add topological operators to the PostGIS module including: touches, contains, overlaps, buffer, union and difference. Hence, in the future the Boolean intersect overlay operation we implemented in the prototype system could be performed directly in the PostgreSQL-PostGIS RDBMS using database records.

Figure 5 The overlay processor interface

Figure 6 Example of GML code representing a single feature (see Figure 9) selected from one of the layers in the spatial database

In STEP 1 we chose to transform the geographic layers from ESRI's shapefile format to tables in an RDBMS, where SQL queries were then executed to extract the desired features and attributes from each layer. Then, using a PHP script, the strings of text returned by these queries were converted to GML documents that represent the selected spatial features and their attributes. Once the GIS layers are in GML format, they can be passed to any system, application or geo-processing service that is able to read this Open Specification. This is how the use of this OS enables a certain degree of interoperability between applications. These applications or services can reside on a single machine, on a local-area network, or on any server connected to the Internet. A packet of information (e.g. a layer or pieces of it in GML format) can be passed from application to application (and these could be of a very different nature), adding or extracting information to or from the original packet until the desired end result is obtained. This is how interoperability facilitates distributed processing. In other words, compliance with Open Specifications enables interoperability between heterogeneous environments and systems, facilitates distributed processing and opens new possibilities for the combination and processing of geographic data.

Figure 7 This piece of Java code links to the Java2D API and invokes the Constructive Area Geometry methods contained in this API to generate the intersection, subtraction, addition, and exclusiveOr of two Java2D Area objects. We are currently using only CASE 3, for the intersect method

In STEP 4 we used a PHP script to convert the string returned by the query for the desired features (records) in the PostgreSQL-PostGIS RDBMS to a GML document. This process worked well for small data sets; however, when we started to process larger areas the processing time was unacceptable (several minutes). To minimize the number of selected features that have to be converted to GML we used the bounding box intersect function within PostgreSQL-PostGIS. This function returns the records (features) from one table (layer) whose bounding boxes intersect the bounding boxes of features in a second layer. In this way only the records that fulfill the query and whose bounding boxes intersect the bounding boxes of features from a second layer are returned for conversion to GML documents. This preprocessing greatly reduces the number of records that must be converted to GML for performing the actual intersection of features in the GMLoverlay.class program, and allowed us to process larger geographic areas.

Figure 8 XSLT Style Sheet to transform the GML document (intersection.xml), which represents the area where the selected environmental parameters intersect, to SVG for display on the end user's browser

In the latest version of the prototype system we were able to significantly improve the response times by: (1) replacing the PHP scripts that control the process flow with Perl CGI scripts; and (2) consolidating the conversion of the overlay result (intersect.xml) to SVG and GML formats into the GMLoverlay.class Java class. The full code for this implementation can be found at the demonstration website (http://206.168.217.254/guanajuato/).

In STEP 5 we created a Java class (GMLoverlay.class) that implements the intersect function that is part of the Constructive Area Geometry methods included in the Java2D API. We could just as easily have implemented any of the other operators included in the Java2D API (union, subtraction, and exclusive-OR) to provide these geo-processing capabilities in the system. In fact, the Java code in Figure 7 already invokes these methods from the Java2D API; we are just not currently using them.

In STEP 6 we demonstrated the use of XSLT Style Sheets to convert a layer in GML format (intersection.xml) to SVG (for graphical display). If geographic data in GML format gains popularity, Style Sheets could be developed to transform GML documents to a wide array of formats (e.g. any text format, any XML schema, or GIS proprietary formats) used by other systems and applications. In addition, through the use of XSL Formatting Objects (XSL-FO) these data could be converted to different printing formats for high quality output. Eventually, libraries of Style Sheets could be posted on websites, downloaded to reside locally, or invoked directly from remote servers to perform a transformation. This possibility would greatly increase the speed and ease with which geographic data is made available to a broader array of IT applications.
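The bounding-box pre-filter discussed earlier in this section can be illustrated with a short in-memory sketch. In the prototype this test runs inside PostgreSQL-PostGIS, so the code below is only an analogue of the idea and not the database function. The class name and sample rectangles are hypothetical, and `Rectangle2D.intersects` is the standard Java2D call.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

/** Sketch: keep only features whose bounding box overlaps some box in a second layer. */
final class BoundingBoxPrefilter {
    /** Returns the boxes of layerA that overlap at least one box of layerB. */
    static List<Rectangle2D> candidates(List<Rectangle2D> layerA, List<Rectangle2D> layerB) {
        List<Rectangle2D> kept = new ArrayList<>();
        for (Rectangle2D a : layerA) {
            for (Rectangle2D b : layerB) {
                // intersects() tests interiors, so boxes that only share an edge are dropped.
                if (a.intersects(b)) {
                    kept.add(a);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical bounding boxes of the features selected in two layers.
        List<Rectangle2D> soils = List.of(
                new Rectangle2D.Double(0, 0, 2, 2),
                new Rectangle2D.Double(10, 10, 1, 1));
        List<Rectangle2D> elevation = List.of(new Rectangle2D.Double(1, 1, 3, 3));
        System.out.println(candidates(soils, elevation).size() + " of " + soils.size()
                + " features need conversion to GML");
    }
}
```

Only the features that survive this cheap test need to go through the more expensive polygon intersection, which is where the reduction in processing time comes from.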

6 Conclusions
State-of-the-art web-based geo-processing solutions can be implemented using currently available OSS and OS. The required technology for solving spatial problems in an Internet-based computing environment is available from at least one mature OS project. We used the OSS we consider to be the most powerful, widespread, accessible and easy to learn, and to have a good level of user support in the form of software documentation, books and user-group forums. Typical OSS installation involves downloading the source code for the target operating system (e.g. Linux, Windows XP, NT or 2000), identifying and downloading other required software components, configuring the desired features and compiling the application. This process is straightforward and routine for most personnel with general IT backgrounds, but it could be intimidating for casual users with little programming experience. However, most mature OSS are well supported with thorough installation instructions, and any motivated GIS user should be able to install and start using the OSS presented in this article.

Given that the purpose of the project described here was exploratory, we ended up using a wider array of OS and OSS than was probably required to create the functionality present in the prototype system. For the same reason, we took more steps than strictly necessary to produce the desired results. For the development of the prototype system we ended up using more than 10 Open Source technologies. Even when using the minimum required number of OSS and OS, one of the issues faced when implementing advanced OSS web-based GIS solutions is the breadth of technical skills required and the logistics of orchestrating the interaction of many applications. Designing web-based GIS solutions requires a thorough understanding of core WWW technologies (such as the configuration and management of web servers), spatial information management expertise, and the ability to choreograph the geo-processing steps required to solve spatial problems in a distributed environment. It is not difficult to find these necessary skills in an IT department or in a highly motivated power user.

In developing the prototype system, we learned that: (1) SVG has the potential to support highly interactive mapping applications on the web; (2) PostgreSQL-PostGIS is a robust database management system that offers a considerable, and continuously increasing, number of geo-processing functions; (3) Java2D can be effectively used for basic 2D vector overlay (a specifically geo-spatially oriented Java API such as the OS Java Topology Suite (http://www.vividsolutions.com/jts/jtshome.htm) is clearly preferable for more advanced geo-processing, but Java2D is a good starting point for performing vector analysis with Java); and (4) MapServer proved to be an easy-to-use, production-quality Internet map server.

Figure 9 SVG graphic of a simple feature. Figure 6 shows the GML code from which this image is generated

From this experience, we conclude that for organizations with scarce resources wanting to distribute their geographic data and geo-processing services over the WWW, the OS and OSS we used offer the following advantages: (1) no software costs; (2) software tools that are easily learned by personnel with a general IT background (UNIX, programming, database design and management); (3) small software footprints; (4) no need to commit to proprietary web-GIS, DBMS or web software with their associated costs; (5) ease of compatibility with existing IT infrastructure (personnel with basic database and programming skills, existing DBMS software and DBMS applications); (6) flexibility to implement geo-processing capabilities currently non-existent in commercial web-GIS software (e.g. Boolean intersect overlays); (7) principles of implementation that are straightforward and accessible to a broad audience of geographic information scientists and developers; and (8) the potential for the resulting system to interoperate with other systems and applications that use the same OS.

Our experience also showed that the resulting text files tend to be large when geographic data are converted to GML format, as illustrated by the GML code (Figure 6) that is required to represent a single small and geometrically simple polygon (Figure 9). The size of the GML files depends on the number of features and the number of points per feature contained in a layer (Sahay 1999 provides a formula for calculating the storage size of a GML document based on these two parameters). As an example, a layer in ESRI's shapefile format of size 342,708 bytes would occupy 599,473 bytes in its corresponding GML representation, roughly 1.75 times as much. In addition, because GML up to version 2.1.2 does not support topology, common boundaries between features must be stored twice (once for each feature). The size of the GML files could affect secondary storage (e.g. hard drive or tape) requirements, as well as the time required to parse the file to extract desired information. A large implementation would greatly benefit from file compression algorithms and highly efficient parsing methods. More research is required in both of these areas.

Next, we will test the scalability of these technologies in order to create the full implementation for INIFAP's web-based spatial information needs. In addition, we are planning to make a full evaluation of the reliability, performance, security, and total cost of ownership of the system. We also need to make the overlay processor interface more user friendly and improve its capacity to explain the intersection results obtained. Finally, so far we have dealt only with geographic data in vector format, and we are currently working on developing web-based geo-processing capabilities for raster data.

Acknowledgements
The authors would like to thank the National Institute for Forest, Agriculture and Livestock Research (INIFAP) Central Region and especially Dr. Hilario Garcia-Nieto for their support and cooperation in the development of this project. We would also like to thank the anonymous reviewers for their helpful comments and suggestions for improvement of an earlier draft of this manuscript.

References
Aloisio G, Milillo G, and Williams R D 1999 An XML architecture for high-performance web-based analysis of remote-sensing archives. Future Generation Computer Systems 16: 91–100
Boumphrey F, Direnzo O, Duckett J, Graf J, Houle P, Hollander D, Jenkins T, Jones P, Kingsley-Hughes A, Kingsley-Hughes K, McQueen C, and Mohr S 1998 XML Applications. Birmingham, Wrox Press
Dangermond J 2002 Web services and GIS. Geospatial Solutions 12(7): 56
Ducket J, Griffin O, Mohr S, Norton F, Stokes-Rees I, Williams K, Kurt Cagle, Nikola O, and Tennison J 2001 Professional XML Schemas. Birmingham, Wrox Press
Eisenberg J D 2002 SVG Essentials. Sebastopol, CA, O'Reilly & Associates
Gardels K 1997 Open GIS and on-line environmental libraries. SIGMOD Record 26: 32–8
Gould M and Ribalaygua A 1999 A new breed of web-enabled graphics. GeoWorld 12(3): 46–9
Harvey F 1999 Designing for interoperability: Overcoming semantic differences. In Goodchild M F, Egenhofer M, Fegeas R and Kottman C (eds) Interoperating Geographic Information Systems. Boston, MA, Kluwer: 85–97
Hecht L 2002a Get your free interoperability roadmap. GeoWorld 15(2): 22–3
Hecht L 2002b Insist on interoperability. GeoWorld 15(4): 22–3
Hecht L 2002c Web services are the future of geo-processing. GeoWorld 15(6): 23–4
Herring J 1999 The OpenGIS data model. Photogrammetric Engineering and Remote Sensing 65: 585–8
Holland D 2001 Delivering the digital national framework in GML. GeoEurope 10(8): 29–30
Kay M 2001 XSLT Programmer's Reference (Second edition). Birmingham, Wrox Press
Kottman C 1999 The Open GIS Consortium and progress toward interoperability in GIS. In Goodchild M F, Egenhofer M, Fegeas R and Kottman C (eds) Interoperating Geographic Information Systems. Boston, MA, Kluwer: 39–54
Lake R 2001a GML 2.0 enabling the geospatial web. Geospatial Solutions 11(7): 38–41
Lake R 2001b GML lays the foundation for the geospatial web. GeoWorld 14(10): 42–5
Lowe J 2002 Spatial on a shoestring: Leveraging free Open Source Software. Geospatial Solutions 12(6): 42–5
McKee L 1998 What does OpenGIS Specification conformance mean? GeoWorld 11(8): 38
Neumann A 2002 Comparing .SWF (Shockwave Flash) and .svg (Scalable Vector Graphics) file format specifications. In Proceedings of the SVG Open Developers Conference, 15–17 July 2002, Zurich, Switzerland (available at http://www.carto.net/papers/svg/comparison_flash_svg.html)
Puhretmair F and Woss W 2001 XML-Based integration of GIS and heterogeneous tourism information. In Name (ed) Title. Berlin, Springer-Verlag Lecture Notes in Computer Science No 2068: 346–58
Ramsey P 2002 Open source GIS fights the three-horned monster. GeoWorld 15(8): 23–5
Rasmus L 2000 PHP Pocket Reference. Sebastopol, CA, O'Reilly & Associates
Sahay N 2002 GMLView: A GML Map Renderer. Unpublished M.S. Technical Paper, Department of Computer Engineering, University of Minnesota (available at http://www-users.cs.umn.edu/~sahay/8701/planb_1_10.htm)
Seth A P 1999 Changing focus on interoperability in information systems: From system, syntax, structure to semantics. In Goodchild M F, Egenhofer M, Fegeas R and Kottman C (eds) Interoperating Geographic Information Systems. Boston, MA, Kluwer: 5–30
Stinson B 2001 PostgreSQL Essential Reference. Indianapolis, IN, New Riders
Stones R and Matthew N 2001 Beginning Databases with PostgreSQL. Chicago, IL, Wrox Press
Vckovski A 1998 Interoperable and Distributed Processing in GIS. Bristol, PA, Taylor and Francis
Wheeler D A 2002 Why Open Source Software/Free Software (OSS/FS)? Look at the Numbers! WWW document, http://www.dwheeler.com/oss_fs_why.html
Waters N 1999 Is XML the answer to internet-based GIS? GeoWorld 12(7): 32–3

Blackwell Publishing Ltd. 2003
