
OpenGeo : Introduction to an Open Source Geostack : PostGIS http://workshops.opengeo.org/stack-intro/postgis.html


Introduction to an Open Source Geostack


License: This work is licensed under a Creative Commons Attribution-Share Alike 3.0
United States License. Feel free to use this material, but we ask that you please retain
the OpenGeo branding, logos and style.

PostGIS

PostGIS is an extension to the PostgreSQL relational database that provides spatial types,
indexes and functions, following the OGC “Simple Features for SQL” (SFSQL) specification.

Starting the Suite
You can start and stop the OpenGeo Suite, and access components like PostGIS and
GeoServer, via the “Dashboard”.

Start the Dashboard from the Start Menu > OpenGeo (Windows) or Applications >
OpenGeo (OS X).

When you first start the dashboard, it provides a reminder about the default password for
accessing GeoServer.

Note

The PostGIS database has been installed with unrestricted access for local users (users
connecting from the same machine as the database is running). That means that it will
accept any password you provide. If you need to connect from a remote computer, the
password for the postgres user has been set to postgres.

1. First, we need to start up the Suite (which will start both PostGIS and GeoServer). Click
the green Start button at the top right corner of the Dashboard.

2. The first time the Suite starts, it initializes a data area and sets up template databases.
This can take a couple of minutes. Once the Suite has started, you can click the Manage
option under the PostGIS component to start the pgAdmin utility.


Note

PostgreSQL has a number of administrative front-ends. The primary one is psql, a
command-line tool for entering SQL queries. Another popular PostgreSQL front-end is
the free and open source graphical tool pgAdmin. All queries done in pgAdmin can also
be done on the command line with psql.

3. If this is the first time you have run pgAdmin, you should have a server entry for PostGIS
(localhost:54321) already configured in pgAdmin. Double click the entry, and enter
anything you like at the password prompt to connect to the database.

Note

If you have a previous installation of PgAdmin on your computer, you will not have an
entry for (localhost:54321). You will need to create a new connection. Go to File >
Add Server, and register a new server at localhost and port 54321 (note the
non-standard port number) in order to connect to the PostGIS bundled with the
OpenGeo Suite.

Creating a Database
PostgreSQL has the notion of a template database that can be used to initialize a new
database – the new database automatically gets a copy of everything from the template. When
you installed PostGIS, a spatially enabled database called template_postgis was created.
If we use template_postgis as a template when creating our new database, the new
database will be spatially enabled.
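
The same template mechanism is also available directly from SQL. As a rough sketch, assuming you are connected as the postgres user (from psql or the pgAdmin query tool) and want the same name, owner and encoding that the form in the steps below uses, the statement would look something like this:

```sql
-- Create a new database as a copy of the spatially enabled template.
-- The name, owner and encoding mirror the values used in this workshop.
CREATE DATABASE postgis
    WITH OWNER = postgres
         ENCODING = 'UTF8'
         TEMPLATE = template_postgis;
```

As with the pgAdmin form, this is expected to fail if another session is still connected to template_postgis.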

1. Open the Databases tree item and have a look at the available databases. The
postgres database is the user database for the default postgres user and is not too
interesting to us. The template_postgis database is what we are going to use to
create spatial databases.

2. Right-click on the Databases item and select New Database.


Note

If you receive an error indicating that the source database (template_postgis) is
being accessed by other users, this is likely because you still have it selected.
Right-click on the PostGIS (localhost:54321) item and select Disconnect.
You can then double-click the same item to reconnect and try again.

3. Fill in the New Database form as shown below and click OK.

Name       postgis
Owner      postgres
Encoding   UTF8
Template   template_postgis

4. Select the new postgis database and open it up to display the tree of objects. You’ll
see the public schema, and under that a couple of PostGIS-specific metadata tables –
geometry_columns and spatial_ref_sys – which we will discuss later.


5. Click on the SQL query button indicated below (or go to Tools > Query Tool).

6. Enter the following query into the query text field:

SELECT postgis_full_version();

Note

This is our first SQL query. postgis_full_version() is a management function
that returns version and build configuration.

7. Click the Play button in the toolbar (or press F5) to “Execute the query”. The query will
return the following string, confirming that PostGIS is properly enabled in the database.

8. You have successfully created a PostGIS spatial database! Now run a spatial calculation
just to make sure. Copy the following into the SQL window:

SELECT ST_Length('LINESTRING(0 0, 1 1)');

Our first spatial query constructs a diagonal line across a one-unit square. The length of
that line is sqrt(2), or 1.4142.
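
To check that figure, the PostGIS result can be placed next to the value from plain arithmetic. The sketch below is only an illustration; the column aliases are made up for readability, and the second query is an extra case showing that ST_Length() adds up every segment of a line.

```sql
-- Compare the PostGIS result with the value from plain arithmetic.
SELECT
    ST_Length(ST_GeomFromText('LINESTRING(0 0, 1 1)')) AS line_length,
    sqrt(2) AS expected_length;

-- A path along two sides of the unit square: 1 + 1 = 2 units long.
SELECT ST_Length(ST_GeomFromText('LINESTRING(0 0, 1 0, 1 1)')) AS path_length;
```

The result is expressed in the units of the coordinates themselves, which matters later when the workshop data (in feet) is measured.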

Loading Shapes into PostGIS


The workshop data files are public domain data from the City of Medford, Oregon. The files are
located in the data/ directory of the workshop. The projection of the data is NAD83 State Plane
(Oregon South) in feet, more succinctly and opaquely known as EPSG:2270. The files are:

school_pt.shp    a small point file of school locations
road_ln.shp      a large line file of street centerlines
taxlot_ply.shp   a large polygon file of taxable property parcels

We will load our example data into PostGIS using the pgShapeLoader tool, which converts
shape files to PostGIS tables.

1. From the PgAdmin Plugins menu, select PostGIS Shapefile and DBF loader.

The loader will start with the connection information for your current PgAdmin database.
Click the “Test connection...” button to ensure you can connect to the database.

2. Now, click on the button in the “Shape File” area, and browse to the data directory.
Select the “school_pt.shp” file, and click “Open”.

3. Next, change the value of the SRID field to 2270.

4. Finally, click the “Import” button to start the process.

5. Repeat the process for “road_ln.shp” and “taxlot_ply.shp”. These are much larger files. To
make the load process go faster, open the “Options...” dialogue and turn on the “Load using
COPY rather than INSERT” option before running the import.

Loading Shapes into PostGIS... Using the Command Line


PostGIS ships with a command-line utility for loading shape files into the database, called
shp2pgsql, as well as a utility for exporting tables to shape files, called pgsql2shp.

If you completed the process with PostGIS Shapefile and DBF loader above, you do not
need to run these commands – the data is already loaded into your database.

Enter the workshop data directory, set the PATH environment variable to include the
PostgreSQL executables directory, and then run the data loading commands. shp2pgsql
converts the shape file into a SQL text file suitable for loading into the database. psql loads
the text file into the target database.

# set PATH=%PATH%;C:\Program Files\OpenGeo\OpenGeo Suite\pgsql\8.4\bin

# shp2pgsql -I -s 2270 -D road_ln.shp road_ln > road_ln.sql
# psql -p 54321 -f road_ln.sql -d postgis
# shp2pgsql -I -s 2270 -D taxlot_ply.shp taxlot_ply > taxlot_ply.sql
# psql -p 54321 -f taxlot_ply.sql -d postgis
# shp2pgsql -I -s 2270 -D school_pt.shp school_pt > school_pt.sql
# psql -p 54321 -f school_pt.sql -d postgis

PostGIS System Tables


PostGIS follows the OGC SFSQL (Simple Features for SQL) specification, which means it
includes two standard system tables of metadata: SPATIAL_REF_SYS and
GEOMETRY_COLUMNS.

SPATIAL_REF_SYS
The SPATIAL_REF_SYS table contains information about “spatial reference systems” –
combinations of geographic systems (ellipsoids, datum) and projected systems (projections,
parameters) that are used for real-world mapping. “Transverse mercator” is an example of a
projection, and WGS84 is an example of a spheroid, but “UTM Zone 10 North, NAD 83” is an
example of a full spatial reference system.

        Table "public.spatial_ref_sys"
  Column   |          Type           | Modifiers
-----------+-------------------------+-----------
 srid      | integer                 | not null
 auth_name | character varying(256)  |
 auth_srid | integer                 |
 srtext    | character varying(2048) |
 proj4text | character varying(2048) |
Indexes:
    "spatial_ref_sys_pkey" PRIMARY KEY, btree (srid)

Each row in the SPATIAL_REF_SYS table corresponds to one spatial reference system. The
srid column is the unique identifier, and is considered “internal” to the database. The
auth_name and auth_srid are the external authority and authority number. The authority is
usually “EPSG” and the table that ships with PostGIS matches the srid to the auth_srid for
convenience.

The srtext is the OGC “well-known text” representation of the spatial reference system. The
proj4text is the representation consumed by the Proj.4 reprojection library PostGIS uses to
provide on-the-fly reprojection. Because only the proj4text is used internally by PostGIS, it
is usually safe to omit the srtext when adding new entries, but be aware that external
programs may use the srtext to determine the projection of a particular table.
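
As an illustration, the entry for the workshop data can be looked up by its SRID, and the proj4text definitions can be exercised with ST_Transform(). This is a sketch that assumes the school_pt table was loaded with SRID 2270 as described earlier; EPSG:4326 (longitude/latitude on WGS84) is chosen here only as a familiar target system.

```sql
-- Look up the workshop's spatial reference system by its SRID.
SELECT srid, auth_name, auth_srid, proj4text
FROM spatial_ref_sys
WHERE srid = 2270;

-- Reproject the first school from EPSG:2270 (feet) to EPSG:4326 (degrees).
SELECT ST_AsText(ST_Transform(the_geom, 4326)) AS lon_lat
FROM school_pt
WHERE gid = 1;
```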

GEOMETRY_COLUMNS
The GEOMETRY_COLUMNS table contains information about the spatial columns in a database.

         Table "public.geometry_columns"
      Column       |          Type          | Modifiers
-------------------+------------------------+-----------
 f_table_catalog   | character varying(256) | not null
 f_table_schema    | character varying(256) | not null
 f_table_name      | character varying(256) | not null
 f_geometry_column | character varying(256) | not null
 coord_dimension   | integer                | not null
 srid              | integer                | not null
 type              | character varying(30)  | not null

Each row in the table corresponds to one spatial column. Tables may have multiple spatial
columns. Client software such as QGIS and uDig often use the GEOMETRY_COLUMNS table to
figure out which columns to display to the end user as “layers” suitable for viewing on a map.

The first four columns (f_table_catalog, f_table_schema, f_table_name,
f_geometry_column) serve to uniquely locate the geometry column. The next three
describe the spatial metadata:

coord_dimension provides the dimensionality (2, 3, or 4 dimensions are supported in
PostGIS);
srid provides the spatial reference system and must refer to a valid row in the
SPATIAL_REF_SYS table;
type provides the geometry type (point, linestring, polygon, etc.).

Note that the GEOMETRY_COLUMNS table is not automatically updated as you create and drop
tables. You must manually keep it up to date.
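
A quick way to see what is currently registered, and so to spot a table that is missing or stale, is to query the metadata table directly. The query below is a minimal sketch using only the columns listed above.

```sql
-- List the spatial columns currently registered in the metadata table.
SELECT f_table_schema, f_table_name, f_geometry_column,
       coord_dimension, srid, type
FROM geometry_columns
ORDER BY f_table_name;
```

After loading the workshop data you would expect to see one row for each of the three loaded tables.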

One way to keep the table up-to-date is to religiously use the AddGeometryColumn()
function when managing DDL in spatial tables. This function takes in all the information
necessary to create a new column, performs the creation, and adds a metadata record:

-- Arguments: schema, table, column, SRID, geometry type, dimension
SELECT AddGeometryColumn(
    'public',
    'mytable',
    'mygeocolumn',
    4326,
    'POLYGON',
    2
);

Another way to keep the table up-to-date is to use helper functions. PostGIS 1.4 and higher
provide the Populate_Geometry_Columns() function, which checks for validity and also
fills in missing entries.

-- PostGIS 1.4
SELECT Populate_Geometry_Columns();

populate_geometry_columns
-------------------------------------------
probed:3 inserted:3 conflicts:0 deleted:0
(1 row)

Spatial Queries
We will now construct some queries of our spatial database, using “spatial SQL” functions
provided by PostGIS (and any other SFSQL spatial database). For a reference list of functions
we will be using, see the PostGIS Functions section.

Measuring
The taxlot_ply table contains 91,343 parcel polygons. It also includes a large number of
attributes about each parcel, including:

impvalue (improvement value)
landvalue (land value)
acreage (reported acreage)
yearblt (year built)
feeowner (name of the owner)
state (state of residence of the owner)

We can use the ST_Area() function in combination with these attributes to ask some
questions of the taxlot_ply table. Open the PgAdmin SQL window and run the following
queries against the database.

What is the area in acres of all parcels in the database?

SELECT Sum(ST_Area(the_geom)) / 43560
FROM taxlot_ply;

Answer: 1772888

What is the area in acres of parcels built on since 2000?

SELECT Sum(ST_Area(the_geom)) / 43560
FROM taxlot_ply
WHERE yearblt >= 2000;

Answer: 27176

What is the value per square foot of all parcels?

SELECT Sum(landvalue + impvalue) / Sum(ST_Area(the_geom))
FROM taxlot_ply;

Answer: 0.41

What is the value per square foot of all parcels held by out-of-state owners?

SELECT Sum(landvalue + impvalue) / Sum(ST_Area(the_geom))
FROM taxlot_ply
WHERE state != 'OR';

Answer: 0.38
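
The constant 43560 in the first two queries is the number of square feet in an acre: the data is projected in feet, so ST_Area() returns square feet. As a rough cross-check, and assuming the acreage attribute is numeric and that the loader created its usual gid key column on taxlot_ply, the computed and reported acreages can be put side by side:

```sql
-- Compare the acreage computed from the geometry with the reported attribute.
-- ST_Area() is in square feet here; 43560 square feet = 1 acre.
SELECT
    gid,
    acreage AS reported_acres,
    ST_Area(the_geom) / 43560 AS computed_acres
FROM taxlot_ply
ORDER BY gid
LIMIT 10;
```

The two columns will not necessarily agree exactly, since the reported acreage comes from the attribute data rather than from the parcel geometry.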

Measurement is not limited to areas. We can also use linear measurements to characterize the
roads in the county.

What is the breakdown of road types in the county?

SELECT
    Sum(ST_Length(the_geom)) / 5280 AS miles,
    Count(*) AS nsegments,
    cfcc
FROM road_ln
GROUP BY cfcc
ORDER BY cfcc;

Sub-setting
So far, our queries have calculated one metric or a summary against every record in the
database. Databases are commonly used to store very large tables – larger than can be stored
in memory – and efficiently access sub-sets of those tables.

First, let’s find out the coordinates of the first school in our school_pt table:

SELECT ST_AsText(the_geom) FROM school_pt WHERE gid = 1;

Answer: POINT(4387009 402407)

Now, let’s take that point, and find the average property value in a one-mile (5280 foot) radius.

SELECT Sum(landvalue + impvalue) / Count(*) AS avg_value
FROM taxlot_ply
WHERE
    ST_DWithin(
        the_geom,
        ST_GeomFromText('POINT(4387009 402407)', 2270),
        5280
    );

Answer: 161,094


There are a number of things going on in this query:

The ST_GeomFromText() function is used to build a geometry object from the text
representation of a point. Note that the SRID is also set to 2270 at the same time, to
match the SRID of our data tables.
The ST_DWithin() function is then used to test every geometry against the query
point, and return true only if the geometry was within 5280 units (feet).
Finally, only those records that passed the distance test were fed into the calculation of
the average property value: total value divided by number of properties.
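
Copying the coordinates by hand is not strictly necessary. As a sketch of an alternative, the query point can be fetched from the school_pt table with a scalar subquery, so the geometry already carries the right SRID. This assumes, as in the earlier query, that the school with gid = 1 is the one of interest.

```sql
-- Same calculation, but the query point is read straight from school_pt.
SELECT Sum(landvalue + impvalue) / Count(*) AS avg_value
FROM taxlot_ply
WHERE
    ST_DWithin(
        the_geom,
        (SELECT the_geom FROM school_pt WHERE gid = 1),
        5280
    );
```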

Spatial Indexes
The PostGIS spatial index is an r-tree index, implemented on top of PostgreSQL’s GiST access
method infrastructure.

An “r-tree” (and any other spatial index) works by sorting the bounding boxes of features into a
quickly searchable tree. Because the features themselves are not indexed, just the bounding
boxes, all queries that use spatial indexes must proceed in two phases. First, the spatial index
is used to generate a subset of records that might match a spatial condition; then, an exact test
is used on just that subset to produce the final output set.
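
ST_DWithin() carries out both phases for you. Purely as an illustration, the two phases can be spelled out by hand using the PostGIS bounding-box overlap operator && and the ST_Expand() function, followed by an exact ST_Distance() test. This sketch should give the same answer as the ST_DWithin() query in the previous section.

```sql
-- Phase 1: && compares bounding boxes only and can use the spatial index.
--          ST_Expand grows the point's box by 5280 feet in every direction.
-- Phase 2: ST_Distance runs the exact test on the surviving candidates.
SELECT Sum(landvalue + impvalue) / Count(*) AS avg_value
FROM taxlot_ply
WHERE the_geom && ST_Expand(ST_GeomFromText('POINT(4387009 402407)', 2270), 5280)
  AND ST_Distance(the_geom, ST_GeomFromText('POINT(4387009 402407)', 2270)) <= 5280;
```

The box test alone would also admit parcels near the corners of the square that lie more than a mile from the point, which is why the exact test is still needed.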

The “r-tree” index uses nested rectangles (in the two-dimensional case; cubes and hypercubes
in higher dimensions) to sort the features into a quickly searchable tree.

To create a spatial index in PostGIS, use the CREATE INDEX [indexname] ON
[tablename] USING GIST ( [geometry] ) command. The commands for our three
example tables appear in step 3 below.

Let’s compare an unindexed and indexed query for speed.

1. First, drop the spatial indexes on your tables.

DROP INDEX school_pt_the_geom_gist;
DROP INDEX taxlot_ply_the_geom_gist;
DROP INDEX road_ln_the_geom_gist;

2. Run the average property query, and see how fast it executes:

SELECT Sum(landvalue + impvalue) / Count(*) AS avg_value
FROM taxlot_ply
WHERE
    ST_DWithin(
        the_geom,
        ST_GeomFromText('POINT(4387009 402407)', 2270),
        5280
    );

3. Now, add the spatial indexes back onto your tables, and run the query again.

CREATE INDEX school_pt_the_geom_gist ON school_pt USING GIST (the_geom);
CREATE INDEX taxlot_ply_the_geom_gist ON taxlot_ply USING GIST (the_geom);
CREATE INDEX road_ln_the_geom_gist ON road_ln USING GIST (the_geom);

The unindexed query logs an execution time of over 1000ms, while with the indexes, a time of
less than 50ms is achieved.
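
One way to observe the difference yourself is PostgreSQL's EXPLAIN ANALYZE, which runs the query and reports the plan and the measured run time. The sketch below simply prefixes the average property query; the exact plan text and timings will vary with your machine and PostgreSQL version.

```sql
-- Run once without the indexes and once with them, then compare the plans.
-- A sequential scan on taxlot_ply suggests that no spatial index was used.
EXPLAIN ANALYZE
SELECT Sum(landvalue + impvalue) / Count(*) AS avg_value
FROM taxlot_ply
WHERE
    ST_DWithin(
        the_geom,
        ST_GeomFromText('POINT(4387009 402407)', 2270),
        5280
    );
```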

Spatial Joins
With spatial indexes in place, we can perform spatial joins quickly – taking information from two
distinct tables and joining it together on the basis of spatial relationships.

Our last query determined the average property value within a one-mile radius of a single
school. We can use a spatial join to determine the property value within a one-mile radius
for all schools. Or, to keep the result set smaller, just the high schools.

SELECT
    s.name AS school_name,
    Sum(t.landvalue + t.impvalue) / Count(*) AS avg_property_value
FROM taxlot_ply t, school_pt s
WHERE
    ST_DWithin(t.the_geom, s.the_geom, 5280)
    AND
    s.type = 'High School'
GROUP BY s.name
ORDER BY avg_property_value DESC;

And now we know where to send our kids to school in Medford.
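
The same join can also be written with explicit JOIN ... ON syntax, which keeps the spatial relationship separate from the attribute filter. The variant below is a sketch of that style; the parcel_count column is an addition for illustration, showing how many parcels contribute to each average.

```sql
-- Spatial join with the spatial predicate in the ON clause.
SELECT
    s.name AS school_name,
    Count(*) AS parcel_count,
    Sum(t.landvalue + t.impvalue) / Count(*) AS avg_property_value
FROM school_pt s
JOIN taxlot_ply t
  ON ST_DWithin(t.the_geom, s.the_geom, 5280)
WHERE s.type = 'High School'
GROUP BY s.name
ORDER BY avg_property_value DESC;
```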

Conclusion
These are just a few examples of using spatial SQL to query a database. In the
remaining sections of the workshop, most of the querying will happen behind the scenes, as
tools like GeoServer pull data from the database.

However, the power of the spatial database for analysis and querying remains easily available
via scripting languages and direct user tools like PgAdmin to quickly analyze or automate
geospatial tasks.
