Revisiting the first implementation of Spatial Analysis
with the Spatial Statistics Toolbox
Joshua Sisskind
josh.sisskind@gmail.com

On August 31, 1854 – after several
outbreaks had already occurred
elsewhere in the city – a major
outbreak of cholera struck Soho.
Over the next ten days, over 500
people on or near Broad Street died.
Dr. John Snow wanted to prove his
hypothesis that the cause of the
disease was contaminated water
sources. He created a map, plotting
related deaths and water pumps to
illustrate how cases of cholera were
centered around the Broad Street
water pump.

Snow’s map was unique as it was the first to use cartographic
methods not only to depict a geographic area, but also to
analyze clusters of geographically dependent phenomena.
Spatial statistics are tools that help analyze the
distribution and relationship of features spatially.

Unlike traditional statistics, spatial statistics
treat distance, area and space as an integral
part of the analysis.

As Dr. Snow’s GIS analysts, we hope to identify the cause
of the cholera outbreak spatially.

Measuring Geographic Distributions
•Mean Center
•Directional Distribution

Analyzing Patterns
•Average Nearest Neighbor
•Spatial Autocorrelation
•High/Low Clustering tool

Mapping Clusters
•Cluster and Outlier Analysis
•Hot Spot Analysis

• The first step in our analysis is to determine the center of our cholera deaths. This will be a clue as to the location of the contaminated water source.
• We will weight the features so that the mean center is more a measure of concentration than a measure of purely geographic distribution. In this case, we want to use the number of deaths at each point as the weight.
• We will also create a standard distance circle: a circle centered on the mean center, with a radius equal to one standard deviation.
• What the mean center doesn’t tell us is whether the data is concentrated or dispersed, or whether it has a directional trend.
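The weighted mean center and standard distance described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the coordinates and death counts are hypothetical, the function names are made up for this example, and the toolbox itself may handle details (projections, case fields) that are ignored here.

```python
import math

def weighted_mean_center(points, weights):
    """Return the (x, y) mean center, weighting each point by its count."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return x, y

def weighted_standard_distance(points, weights):
    """Return the radius of the one-standard-deviation circle around the center."""
    cx, cy = weighted_mean_center(points, weights)
    total = sum(weights)
    variance = sum(
        w * ((px - cx) ** 2 + (py - cy) ** 2)
        for (px, py), w in zip(points, weights)
    ) / total
    return math.sqrt(variance)

# Hypothetical building locations and death counts (not Snow's data).
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
deaths = [1, 1, 1, 5]

center = weighted_mean_center(points, deaths)
radius = weighted_standard_distance(points, deaths)
print(center, radius)
```

Because the building at (2, 2) carries five deaths, the center is pulled toward it rather than sitting at the geometric middle of the four points.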

The Average Nearest Neighbor tool calculates the average distance
between each feature and its nearest neighbor, and compares it with
the distance expected for a random pattern over the same area.

We will now determine if the deaths are clustered or dispersed.

As is evident from our output, the pattern of deaths exhibits clustering.
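A rough sketch of the nearest neighbor calculation follows, using the common formulation in which the expected distance for a random pattern is 0.5 / sqrt(n / A) and the standard error is 0.26136 / sqrt(n² / A). The points and the study area are hypothetical, and the function name is an assumption made for this example.

```python
import math

def average_nearest_neighbor(points, area):
    """Return (ratio, z_score) comparing observed and expected neighbor distance."""
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point.
        nearest.append(min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i
        ))
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)           # random-pattern expectation
    std_error = 0.26136 / math.sqrt(n * n / area)
    return observed / expected, (observed - expected) / std_error

# Hypothetical data: four points packed into one corner of a 100 x 100 area.
sample_points = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
ann_ratio, ann_z = average_nearest_neighbor(sample_points, area=100.0 * 100.0)
print(ann_ratio, ann_z)
```

A ratio below 1 with a strongly negative z-score points to clustering; a ratio above 1 points to dispersion. The result is sensitive to the area value, which is why the tool is described as being based on area.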

Since our Z-score is significant, we can test whether
the data is spatially auto-correlated.
Spatial auto-correlation indicates whether
certain values are likely to occur in one
location, or are equally likely to occur at any
location.
Global Moran’s I measures the similarity of
nearby features.
Our results show that the pattern is somewhat
clustered, but the clustering may also be due to random
chance.
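To make "similarity of nearby features" concrete, here is a minimal sketch of Global Moran's I using binary weights (1 for pairs within a distance threshold, 0 otherwise). The weighting scheme, the threshold, the sample values and the function name are all assumptions for illustration; the toolbox offers other ways of defining spatial relationships.

```python
import math

def morans_i(points, values, threshold):
    """Global Moran's I with binary distance-band weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross_products = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(points[i][0] - points[j][0],
                              points[i][1] - points[j][1])
            if dist <= threshold:          # neighbors get weight 1
                cross_products += dev[i] * dev[j]
                weight_sum += 1.0
    if weight_sum == 0 or all(d == 0 for d in dev):
        raise ValueError("no neighbors or no variation in values")
    return (n / weight_sum) * (cross_products / sum(d * d for d in dev))

# Hypothetical locations along a street, with two arrangements of values.
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
i_clustered = morans_i(line, [10, 10, 1, 1], threshold=1.0)
i_alternating = morans_i(line, [10, 1, 10, 1], threshold=1.0)
print(i_clustered, i_alternating)
```

Positive values indicate that similar values sit near each other, negative values indicate that neighbors tend to differ, and values near zero are consistent with a random arrangement.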
The High/Low Clustering tool measures concentrations of high or low
values for an entire study area.
We will use this tool to see if buildings with numerous cholera-related
deaths are clustered. If so, the location of the contaminated water
source could be within that cluster.
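The High/Low Clustering statistic is commonly computed as the Getis-Ord General G: the sum of value products over neighboring pairs divided by the sum over all pairs. The sketch below assumes binary distance-band weights and uses hypothetical values; the function name is made up for this example.

```python
import math

def general_g(points, values, threshold):
    """Getis-Ord General G with binary distance-band weights."""
    n = len(values)
    neighbor_sum = 0.0
    all_pairs_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            product = values[i] * values[j]
            all_pairs_sum += product
            dist = math.hypot(points[i][0] - points[j][0],
                              points[i][1] - points[j][1])
            if dist <= threshold:
                neighbor_sum += product
    return neighbor_sum / all_pairs_sum

# Hypothetical death counts at four buildings along a street.
street = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
g_high_together = general_g(street, [10, 10, 1, 1], threshold=1.0)
g_high_apart = general_g(street, [10, 1, 1, 10], threshold=1.0)
print(g_high_together, g_high_apart)
```

G is larger when high values are neighbors of other high values. In practice the observed G is compared with its expected value under randomness to obtain a z-score, a step that this sketch leaves out.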

Looking for ‘hot’ and ‘cool’ spots in the data will help determine where there is a high
concentration of cholera-related deaths.

In other words, we want to look for clusters of features with high values and clusters of features
with low values.
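Hot spot analysis is typically based on the Getis-Ord Gi* statistic, which gives every feature a z-score by comparing the sum of values in its neighborhood (including the feature itself) with what the overall mean would predict. The following is a sketch under assumptions: binary distance-band weights, hypothetical values, and an illustrative function name.

```python
import math

def gi_star(points, values, threshold):
    """Return a Getis-Ord Gi* z-score for every point (binary weights)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for xi, yi in points:
        # Weight 1 for every point within the threshold, the point itself included.
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= threshold else 0.0
             for xj, yj in points]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * v for wj, v in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

# Hypothetical street: high counts at one end, low counts at the other.
locations = [(float(k), 0.0) for k in range(6)]
counts = [9, 8, 7, 1, 1, 1]
z_scores = gi_star(locations, counts, threshold=1.0)
print([round(z, 2) for z in z_scores])
```

Large positive scores mark hot spots (high values surrounded by high values) and large negative scores mark cool spots. The sketch fails with a division by zero if a neighborhood covers every point, so the threshold should be smaller than the full extent of the data.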

Legend: < -2.0 | -2.0 to -1.0 | -1.0 to 1.0 | 1.0 to 2.0 | > 2.0

Outlier
High Value Clusters

Legend: < -2.0 | -2.0 to -1.0 | -1.0 to 1.0 | 1.0 to 2.0 | > 2.0

We can also determine which clusters are statistically significant.

Statistically speaking, that means a confidence of greater than 95% that the cluster of high (or low)
values is not a random occurrence.
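The 95% threshold can be checked directly from a z-score. The sketch below assumes a two-sided test against the standard normal distribution; the function names are illustrative only.

```python
import math

def two_sided_p_value(z):
    """Probability of a z-score at least this extreme under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def is_significant(z, confidence=0.95):
    """True when the z-score is significant at the given confidence level."""
    return two_sided_p_value(z) < 1.0 - confidence

print(two_sided_p_value(1.96))   # close to 0.05
print(is_significant(2.5), is_significant(1.0))
```

Under this assumption, a z-score beyond roughly ±1.96 corresponds to more than 95% confidence that the cluster is not a random occurrence.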
Deaths Per Water Pump (chart; axis ticks from 50 to 300)
Pumps: Broad Street, Little Marlborough, Rupert, Bridle, Newman, Warwick, Beamers, Marlborough, Vigo, Piccadilly, Dean
