University of Washington
GWR
Definitions, basic concepts
Running GWR
A straightforward implementation in ArcGIS
Basics of OLS
$y = X\beta + \varepsilon$
Assumes a stationary process: the same stimulus provokes the same response anywhere in the study area
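A minimal sketch of what the stationarity assumption hides, using made-up data (the two regions, their slopes of 1 and 3, and the helper name `fit_ols` are all assumptions for illustration). A single global OLS fit reports a slope near the average of the two regional responses, which describes neither region.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares estimate of beta in y = X beta + error."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
x = np.tile(np.linspace(-1.0, 1.0, 100), 2)   # same x values in both regions
region = np.repeat([0, 1], 100)               # hypothetical west (0) / east (1) split
true_slope = np.where(region == 0, 1.0, 3.0)  # the response differs by region
y = true_slope * x + rng.normal(0.0, 0.1, x.size)

X = np.column_stack([np.ones_like(x), x])     # intercept column plus x
global_beta = fit_ols(X, y)                   # one coefficient set for everything
west_beta = fit_ols(X[region == 0], y[region == 0])
east_beta = fit_ols(X[region == 1], y[region == 1])
print("global:", global_beta[1], "west:", west_beta[1], "east:", east_beta[1])
```

The global slope lands close to 2, while the regional fits recover slopes close to 1 and 3.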
Sampling variation
Relationships intrinsically different across space (attitudes, preferences, contextual effects)
Model misspecification
Applications: Ecology
Could have been differentiated sampling pattern creates predictable and changing levels of interaction among observations
The relationship between mortality and occupational segregation and between mortality and unemployment varies across Tokyo
The link between multifamily housing and residential burglaries varies widely even when controlling for numerous socioeconomic and neighborhood factors
Map residuals and test them for spatial autocorrelation. If our model errs systematically with a spatial pattern, then we may be on to something.
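One common test for spatial autocorrelation in residuals is Moran's I. The sketch below is only an illustration: it assumes binary distance-threshold weights, a hypothetical 10 x 10 grid of locations, and made-up "residuals" that trend from west to east. The function name `morans_i` is ours, not a library call.

```python
import numpy as np

def morans_i(values, coords, threshold):
    """Moran's I with binary weights: 1 for pairs within `threshold`, else 0."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = ((dist > 0) & (dist <= threshold)).astype(float)  # no self-neighbors
    return (z.size / w.sum()) * (z @ w @ z) / (z @ z)

# Hypothetical 10 x 10 grid of observation locations.
gx, gy = np.meshgrid(np.arange(10.0), np.arange(10.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])

# Residuals with a west-to-east trend, and the same values scattered at random.
patterned = coords[:, 0] - coords[:, 0].mean()
rng = np.random.default_rng(1)
shuffled = rng.permutation(patterned)

i_patterned = morans_i(patterned, coords, threshold=1.01)
i_shuffled = morans_i(shuffled, coords, threshold=1.01)
print(i_patterned, i_shuffled)
```

A value well above zero for the patterned residuals, against a value near zero for the shuffled ones, is the kind of signal that suggests the model errs systematically in space.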
Lab Part 1
So what now?
Add more missing variables and try again
Repeat the steps from the lab
Accept that there is something about certain places that makes them different (spatial heterogeneity)
Try GWR
Test variables meant to explore interactions taking place at short distances (spatial dependence)
Try Spatial Regression (Likely a spatial lag model)
Assume that the correlation is a nuisance and control for it in the error term
Try Spatial Regression (Likely a spatial error model)
Local statistical technique to analyze spatial variations in relationships
We are not content with global averages of spatial data (climate, for example)
Why should we be satisfied with global averages in a statistical analysis?
If we think of these points as our data, grouped into colors by region, we can see that the global and local models differ significantly
Source: Rücker and Schumacher, BMC Medical Research Methodology 2008, 8:34, doi:10.1186/1471-2288-8-34
Basic definitions
Spatial nonstationarity exists when the same stimulus provokes a different response in different parts of the study region.
Global models are statements about processes that are assumed to be stationary and, as such, are location independent.
Local models are spatial disaggregations of global models, the results of which are location specific.
Spatial heterogeneity refers to spatial patterns resulting from broad similarities, usually over time.
Spatial dependence refers to spatial patterns that result from interactions among observations.
GWR in a nutshell
Global model
$y = X\beta + \varepsilon$
becomes
$y_i = X_i\beta_i + \varepsilon_i$
where the subscript $i$ indicates that a set of coefficients is estimated for every observation in our data set
To do so we weight near observations more heavily than more distant ones. We may also estimate coefficients based on some local subset of observations
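The idea of one weighted regression per observation can be sketched as follows. This is an illustration under stated assumptions, not the ArcGIS implementation: it uses a Gaussian distance-decay weight, a hypothetical function name `gwr_coefficients`, and synthetic data in which the true slope rises from 1 in the west to 3 in the east.

```python
import numpy as np

def gwr_coefficients(X, y, coords, bandwidth):
    """Fit one weighted least-squares regression per observation location."""
    n, p = X.shape
    betas = np.empty((n, p))
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)   # near points count more
        XtW = X.T * w                                # same as X.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas

# Synthetic transect: 300 locations from west (0) to east (10).
rng = np.random.default_rng(2)
n = 300
east = np.linspace(0.0, 10.0, n)
coords = np.column_stack([east, np.zeros(n)])
x = rng.normal(0.0, 1.0, n)
y = (1.0 + 0.2 * east) * x + rng.normal(0.0, 0.1, n)  # slope drifts with location
X = np.column_stack([np.ones(n), x])

betas = gwr_coefficients(X, y, coords, bandwidth=1.0)
print("west:", betas[0, 1], "middle:", betas[n // 2, 1], "east:", betas[-1, 1])
```

Each row of `betas` is the coefficient set for one location, so the slope column can be mapped directly to see how the relationship changes across the study area.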
[Figure: 5 x 5 grid of cells, each labeled .5]
Improved model fit (R², AIC, etc.)
Reduced spatial autocorrelation
Represent context
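$W_{ij} = \exp\left[-\frac{1}{2}\left(\frac{d_{ij}}{h}\right)^2\right]$, where $d_{ij}$ is the distance between observations $i$ and $j$ and $h$ is the bandwidth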
Number of observations will vary, but area they represent will remain constant
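To get a feel for the fixed Gaussian kernel, the short sketch below evaluates the weight at a few distances. The bandwidth of 10 and the function name `gaussian_weight` are arbitrary choices for illustration.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight: exp(-((d / h) ** 2) / 2)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

h = 10.0  # hypothetical bandwidth, in the same units as the distances
for d in (0.0, 5.0, 10.0, 20.0, 30.0):
    print(f"d = {d:5.1f}  weight = {gaussian_weight(d, h):.3f}")
```

An observation at the regression point gets weight 1, one bandwidth away gets about 0.61, and two bandwidths away about 0.14, so distant observations still contribute but only slightly.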
Weighting option 2
Number of observations will remain fixed, but area will not be the same
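One way to realize an adaptive kernel, sketched here as an assumption rather than as the ArcGIS method, is to set each location's bandwidth to the distance of its k-th nearest neighbor. The function name `adaptive_bandwidths` and the toy coordinates are hypothetical.

```python
import numpy as np

def adaptive_bandwidths(coords, k):
    """Distance from each point to its k-th nearest neighbor."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # After sorting each row, column 0 is the point itself (distance 0).
    return np.sort(dist, axis=1)[:, k]

# Toy transect: four clustered points in the west, three sparse ones in the east.
positions = np.array([0.0, 1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
coords = np.column_stack([positions, np.zeros_like(positions)])

h = adaptive_bandwidths(coords, k=2)
print(h)
```

The clustered points get small bandwidths and the isolated points get large ones, so every local regression draws on the same number of neighbors while covering a different area.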
A tradeoff between:
Bias: we include observations that are not part of the same spatial group
Variance: we don't have enough points in our model to say anything with conviction
[Figure: AIC plotted against bandwidth, with the variance and bias regions and the optimum bandwidth marked]
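Bandwidth optimization can be sketched as a search over candidate bandwidths for the lowest corrected AIC. The version below assumes a Gaussian kernel and the AICc form commonly given for GWR, $2n\ln\hat\sigma + n\ln 2\pi + n\,(n + \mathrm{tr}(S))/(n - 2 - \mathrm{tr}(S))$, where $S$ is the hat matrix. The function name, candidate list, and synthetic data are assumptions for illustration.

```python
import numpy as np

def gwr_aicc(X, y, coords, bandwidth):
    """AICc of a Gaussian-kernel GWR fit at the given bandwidth."""
    n = len(y)
    S = np.empty((n, n))                      # hat matrix: fitted = S @ y
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)
        XtW = X.T * w
        S[i] = X[i] @ np.linalg.solve(XtW @ X, XtW)
    resid = y - S @ y
    sigma2 = resid @ resid / n
    tr = np.trace(S)                          # effective number of parameters
    return n * np.log(sigma2) + n * np.log(2 * np.pi) + n * (n + tr) / (n - 2 - tr)

# Synthetic transect whose true slope drifts from 1 (west) to 3 (east).
rng = np.random.default_rng(3)
n = 150
east = np.linspace(0.0, 10.0, n)
coords = np.column_stack([east, np.zeros(n)])
x = rng.normal(0.0, 1.0, n)
y = (1.0 + 0.2 * east) * x + rng.normal(0.0, 0.1, n)
X = np.column_stack([np.ones(n), x])

candidates = [0.5, 1.0, 2.0, 4.0, 8.0]       # hypothetical search grid
scores = {h: gwr_aicc(X, y, coords, h) for h in candidates}
best_h = min(scores, key=scores.get)
print(scores, "best:", best_h)
```

Very large bandwidths push the fit toward the global model (more bias), while very small ones inflate the trace of the hat matrix (more variance); the AICc penalizes both.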
To sum
Weighting assumptions are very important to outcomes in GWR.
A fixed distance kernel is more appropriate when the distribution of your observations is relatively stable across space (e.g. size, number of neighbors).
An adaptive kernel is appropriate when the distribution varies across space (e.g. events are clustered or polygons are heterogeneous).
Once a kernel type is selected, optimization takes some of the guesswork out of it, but robustness checks are still needed.
Lab
Run GWR model
Check residuals
Check variation in coefficients
General Troubleshooting
Significance Testing
How do I know if the variation I see in my coefficients is meaningful?
Could do a t-test, but you will run into problems with multiple (1,387) tests
Results in lots of false positives
Standard correction (Bonferroni) will make any significance finding nearly impossible
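The scale of the multiple-testing problem is easy to work out. The sketch below assumes a nominal significance level of .05 applied to each of the 1,387 tests, and (for the false-positive count) that no coefficient truly differs from zero.

```python
n_tests = 1387   # one test per local regression
alpha = 0.05     # nominal per-test significance level (assumed)

# Expected number of false positives if every null hypothesis were true.
expected_false_positives = n_tests * alpha

# Bonferroni: divide the significance level by the number of tests.
bonferroni_alpha = alpha / n_tests

print(f"expected false positives: {expected_false_positives:.2f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.3e}")
```

Roughly 69 tests would look significant by chance alone, while the Bonferroni threshold of about 0.000036 per test is so strict that very little survives it.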
Randomly reassign all observation values (dependent and independent variables travel together) to different observation locations
Each county's data gets assigned randomly to a different county
Re-run GWR and record coefficients
Repeat lots of times (at least 100)
Define a distribution for coefficient values and compare your coefficients to this distribution
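The randomization procedure can be sketched as below. Several choices here are assumptions for illustration: a Gaussian-kernel GWR, the standard deviation of the local slopes as the statistic being compared, synthetic data, and the function names. Rows of `X` and `y` are shuffled with the same ordering, so dependent and independent variables travel together while the locations stay put.

```python
import numpy as np

def gwr_coefficients(X, y, coords, bandwidth):
    """One Gaussian-weighted least-squares fit per location."""
    betas = np.empty(X.shape)
    for i in range(len(y)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        XtW = X.T * np.exp(-0.5 * (dist / bandwidth) ** 2)
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas

def permutation_test(X, y, coords, bandwidth, column, n_perm=100, seed=0):
    """Compare the spread of local coefficients with its permutation distribution."""
    rng = np.random.default_rng(seed)
    observed = gwr_coefficients(X, y, coords, bandwidth)[:, column].std()
    null = np.empty(n_perm)
    for r in range(n_perm):
        order = rng.permutation(len(y))   # reassign whole observations to locations
        shuffled = gwr_coefficients(X[order], y[order], coords, bandwidth)
        null[r] = shuffled[:, column].std()
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value

# Synthetic transect whose true slope drifts from 1 (west) to 3 (east).
rng = np.random.default_rng(5)
n = 80
east = np.linspace(0.0, 10.0, n)
coords = np.column_stack([east, np.zeros(n)])
x = rng.normal(0.0, 1.0, n)
y = (1.0 + 0.2 * east) * x + rng.normal(0.0, 0.1, n)
X = np.column_stack([np.ones(n), x])

observed, null, p_value = permutation_test(X, y, coords, bandwidth=1.5, column=1)
print(observed, null.max(), p_value)
```

If the observed spread sits far in the tail of the permutation distribution, the spatial variation in that coefficient is unlikely to be an artifact of sampling.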
$\alpha = \dfrac{\xi}{1 + p_e - \dfrac{p_e}{np}}$
$\alpha = \dfrac{.05}{1 + 37.97 - \dfrac{37.97}{1387 \times 8}} = .001283$
Fotheringham
In Excel we can find the critical t-statistic using: TINV(.001283,1379)
In R we use: qt(1-(.001283/2),1379)
Either way we get a value of ~3.23
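The arithmetic behind the adjusted significance level can be checked in a few lines. This sketch reads the slide's numbers as a desired level of .05, an effective number of parameters of 37.97, 1,387 observations, and 8 parameters per regression, combined as level / (1 + pe - pe / (n * p)); that reading is an assumption. Because the Python standard library has no Student t quantile function, the critical value below uses a normal approximation, which is reasonable at 1,379 degrees of freedom but not identical to the t value.

```python
from statistics import NormalDist

def corrected_alpha(level, pe, n, p):
    """Adjusted per-test significance level: level / (1 + pe - pe / (n * p))."""
    return level / (1 + pe - pe / (n * p))

alpha = corrected_alpha(level=0.05, pe=37.97, n=1387, p=8)

# Two-sided critical value, normal approximation to t with 1379 df.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(f"adjusted alpha: {alpha:.6f}")
print(f"approximate critical value: {z_crit:.2f}")
```

The adjusted level comes out at about .001283, and the approximate critical value is slightly below the ~3.23 obtained from the t distribution.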
Outlier problems
Outliers cause problems for everybody, but their impact is greater for local regressions, particularly when the bandwidth keeps the number of observations low.
In standard OLS:
Run model and identify observations with high or low residuals (~ +/- 4)
Weight these observations less than 1
Re-run until none of the observations have extreme residuals
Now do your GWR with weights assigned
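The reweighting loop above can be sketched as follows. The details are assumptions, since the slide does not give them: residuals are standardized using the current weights, flagged observations have their weight halved on each pass, and the cutoff of 4 follows the "~ +/- 4" rule of thumb. Function names and data are hypothetical.

```python
import numpy as np

def weighted_ols(X, y, w):
    """Weighted least squares with observation weights w."""
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

def downweight_outliers(X, y, cutoff=4.0, shrink=0.5, max_iter=50):
    """Shrink weights of observations with extreme residuals until none remain."""
    n, p = X.shape
    w = np.ones(n)
    for _ in range(max_iter):
        beta = weighted_ols(X, y, w)
        resid = y - X @ beta
        scale = np.sqrt(np.sum(w * resid ** 2) / (n - p))
        standardized = np.sqrt(w) * resid / scale   # weighted standardized residuals
        extreme = np.abs(standardized) > cutoff
        if not extreme.any():
            break
        w[extreme] *= shrink                        # weight these less than 1
    return w, beta

# Synthetic data with one gross outlier at index 10.
rng = np.random.default_rng(4)
n = 100
x = rng.uniform(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, n)
y[10] += 5.0
X = np.column_stack([np.ones(n), x])

w, beta = downweight_outliers(X, y)
print("outlier weight:", w[10], "slope:", beta[1])
```

The resulting weights could then be multiplied into the kernel weights of each local regression, so that the outlier carries little influence in the GWR as well.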
Mixed-form models
What if some of your variables are stationary and others have variation?
Mixed-form models allow you to hold some coefficients constant while allowing others to vary
Not yet implemented in any statistical package, but not that difficult from a technical standpoint
Concluding comments