
Bruno Sansó

Department of Applied Mathematics and Statistics


University of California Santa Cruz
http://www.ams.ucsc.edu/bruno
General Description

This course is focused on models for data that are spatially referenced. The course will have a strong emphasis on model-based geostatistical methods with Bayesian inference.

Geostatistics refers to models for random processes that are indexed by fixed locations that are irregularly scattered. We will look into the theoretical properties of those models as well as into the computational issues involved in the estimation of their parameters. Familiarity with R, linear models, Bayesian methods and MCMC methods will be assumed.

Outline

1. Introduction. Taxonomy of spatial models. Basic properties of Gaussian random fields.
2. Inference for spatial Gaussian random fields. Traditional approaches, maximum likelihood and Bayesian methods.
3. Reduced rank models. Models for multivariate random fields.

References

- Hierarchical Modeling and Analysis for Spatial Data, Second Edition, S. Banerjee, B.P. Carlin and A.E. Gelfand. Chapman and Hall.
- Model-based Geostatistics, P.J. Diggle and P.J. Ribeiro. Springer.
- Statistical Methods for Spatial Data Analysis, O. Schabenberger and C.A. Gotway. Chapman and Hall/CRC.

References

- Statistics for Spatial Data, N.A.C. Cressie. Wiley.
- Statistics for Spatio-Temporal Data, N.A.C. Cressie and C.K. Wikle. Wiley.
- Handbook of Spatial Statistics, edited by A.E. Gelfand, P.J. Diggle, M. Fuentes and P. Guttorp. CRC Press.
- Interpolation of Spatial Data, M.L. Stein. Springer.
- Statistical Analysis of Environmental Space-Time Processes, N.D. Le and J.V. Zidek. Springer.
- Multivariate Geostatistics, H. Wackernagel. Springer.
- Correlation Theory of Stationary and Related Random Functions, A.M. Yaglom. Springer.

Introduction

We are interested in spatial processes $X(s)$, $s \in S \subset \mathbb{R}^n$, where $n$ is usually small, say 2 or 3. We start by considering the univariate case where $X(s) \in \mathbb{R}$.

We are interested in dependencies in the distribution of $X(s)$ that are induced by the indexing variable $s$.

As in the case of time series, we often have substantial amounts of data, but only one realization. We need to overcome this problem by assuming either strong structural or prior information, or some repeatability in the form of symmetry or stationarity.

Types of Models

A possible classification of spatial models is as follows:

- Models for continuous random surfaces, where $S$ is a continuum. It is common to assume that $X(s)$ is Gaussian or (almost equivalently) to model only the first two moments of the process. This area is known as geostatistics.
- Models for mosaic phenomena, where $S$ is countable (usually finite) and often a lattice. Markov random fields are the most popular models of this type.
- Point and attribute processes, where $s \in S$ is a random variable, so the inference focuses, at least initially, on the locations where the process is observed. These are point processes. When some attributes of the process are also considered, we refer to marked point processes.

Graphical Exploration

As for any statistical modeling exercise, the first step of a spatial data analysis is to perform a careful exploration of the data. The figure shows a scatterplot of the elevation data in geoR.

[Figure: elevation data; axes X Coord (x 50 feet) and Y Coord (x 50 feet), legend Elevation (x 10 feet).]

Graphical Exploration

Top left panel: location of the points. Colors and symbols identify the quartiles of the data. Top right and bottom left panels: scatterplot of elevation versus each of the coordinates, with lowess curve. Bottom right panel: histogram of the data.

[Figure: four-panel exploratory display of the elevation data (locations, data versus X Coord, data versus Y Coord, density histogram).]
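A minimal geoR sketch that produces this kind of display, assuming the elevation dataset distributed with geoR:

# Exploratory plots for the elevation data (a sketch; assumes the
# 'elevation' geodata object shipped with geoR).
library(geoR)
data(elevation)
points(elevation)               # locations, symbols/colors by data quartiles
plot(elevation, lowess = TRUE)  # four panels: locations, data vs. each
                                # coordinate with a lowess curve, histogram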

Lattice Data

Neighbor structure for the data on sudden infant deaths in North Carolina. The counties that are directly connected are neighbors. Statistical models are based on modeling the distribution of the attributes of a given county conditional on its neighbors. This has the advantage of producing sparse structures.

Lattice Data

[Figure: probability map of North Carolina SIDS cases, 1974-78.]

For count data, such as the SIDS counts, we are interested in rates or probabilities
of observing an event of interest in a given cell. The purpose of the
modeling is to obtain a smooth version of the raw data that accounts for
the neighbor structure.

Point Processes

These data represent the locations of major crimes in Cincinnati in 2006. In this case we are interested in a description of the intensity of the process. We want to associate non-homogeneities to spatially varying covariates.

[Figure: Cincinnati crime data in 2006, locations plotted as Longitude versus Latitude.]

Point Processes or Lattice Models?

Lattice data usually correspond to point processes that are discretized over predefined spatial units. From the properties of Poisson processes, the number of counts per unit is a Poisson random variable.

[Figure: map of Cincinnati in longitude-latitude coordinates.]
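A small simulation sketch of the Poisson-counts fact just mentioned, assuming a homogeneous process on the unit square with an illustrative intensity:

# Counts of a homogeneous Poisson process over grid cells are Poisson.
set.seed(1)
lambda <- 200                    # intensity over the unit square (illustrative)
N <- rpois(1, lambda)            # total number of points
x <- runif(N); y <- runif(N)     # locations are uniform given N
# Discretize over a 5 x 5 lattice and tabulate counts per cell
counts <- as.vector(table(cut(x, seq(0, 1, by = 0.2)),
                          cut(y, seq(0, 1, by = 0.2))))
mean(counts); var(counts)        # both close to lambda / 25 = 8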

Point Processes or Lattice Models?

- Using Poisson processes retains the spatial continuity. Likelihood-based methods are complicated by the need to deal with non-trivial normalizing constants.
- Using a discretized version of the model allows for the use of generalized linear models. Overdispersion may be artificially induced by the discretization.

Marked Point Processes

Poisson processes can have associated marks. For example, crime data could include the location as well as the type of crime. The latter is a random variable associated with the process. This is usually referred to as a mark. The full process is a marked point process.

Coordinates and Distances

Spatially referenced data need a reference coordinate system to index the location of the observations. A useful system of coordinates is the Universal Transverse Mercator (UTM). This is based on superimposing a grid on the geographical area and then measuring the distance in meters between the point of interest and the nearest grid lines to the south and west. The coordinates are accordingly referred to as Easting and Northing.

Coordinates and Distances

When calculating distances between points on the surface of the earth that are far apart, one has to account for the curvature. Consider two points on the surface given in latitude and longitude, say $P_1 = (\phi_1, \lambda_1)$ and $P_2 = (\phi_2, \lambda_2)$. The distance is given by
$$D = R\,\theta,$$
where
$$\cos\theta = \sin\phi_1 \sin\phi_2 + \cos\phi_1 \cos\phi_2 \cos(\lambda_1 - \lambda_2),$$
and $R$ is the radius of the earth. Thus, $\theta$ is the angular distance between $P_1$ and $P_2$.
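A short R function implementing this spherical distance; a sketch, with the mean earth radius of 6371 km as an assumed constant and illustrative coordinates in the example:

# Great-circle distance via the formula above (inputs in decimal degrees,
# output in kilometers; R = 6371 km is an assumed mean earth radius).
great_circle <- function(lat1, lon1, lat2, lon2, R = 6371) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad; phi2 <- lat2 * to_rad
  dlambda <- (lon2 - lon1) * to_rad
  cos_theta <- sin(phi1) * sin(phi2) + cos(phi1) * cos(phi2) * cos(dlambda)
  R * acos(pmin(pmax(cos_theta, -1), 1))   # clamp to guard against rounding
}
# Example: Santa Cruz, CA to Cincinnati, OH (approximate coordinates)
great_circle(36.97, -122.03, 39.10, -84.51)   # about 3300 km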

R Packages

There are a number of R packages that are relevant for the analysis
of spatial data. A comprehensive discussion of the available
packages is presented in the CRAN Task View: Analysis of Spatial
Data http://cran.r-project.org/web/views/Spatial.html.
The ones that I am most familiar with are geoR, fields and spBayes.

Basic Definitions

Definition: A random field, random function or stochastic process $X(s)$, defined on $S = \mathbb{R}^n$, is a function whose values are random variables, for any value of $s$.

Definition: A Gaussian random field is a random field where all the finite-dimensional distributions, say $F(s_1, \ldots, s_n)$, are multivariate normal distributions, for any choice of $n$ and $s_1, \ldots, s_n$.

In order to specify a Gaussian random field we need a mean function,
$$m(s) = E(X(s)), \quad s \in S,$$
and a covariance function,
$$C(s, s') = \mathrm{cov}(X(s), X(s')), \quad s, s' \in S.$$

Basic Definitions

Definition: $C(s, s')$ is positive definite if, for any positive integer $n$, $s_j \in S$ and $c_j \in \mathbb{R}$ for $j = 1, \ldots, n$,
$$\sum_{i,j} c_i c_j C(s_i, s_j) \ge 0,$$
and the expression above is equal to 0 if and only if $c_i = 0$ for all $i$.

Definition: The correlation function is defined as
$$\rho(s, s') = \frac{C(s, s')}{\sqrt{C(s, s)\, C(s', s')}}.$$

Definition: The variance function is defined as
$$\sigma^2(s) = C(s, s).$$
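A quick numerical illustration of positive definiteness, using an exponential covariance with illustrative parameter values:

# Covariance matrix of an exponential model at a few sites, checked
# numerically for positive definiteness (sigma2 = 1, phi = 0.5 are
# illustrative values, not taken from the notes).
set.seed(2)
s <- cbind(runif(20), runif(20))     # 20 random sites in the unit square
d <- as.matrix(dist(s))              # pairwise distances
sigma2 <- 1; phi <- 0.5
C <- sigma2 * exp(-d / phi)          # exponential covariance matrix
min(eigen(C, symmetric = TRUE, only.values = TRUE)$values)   # positive
cov2cor(C)[1:3, 1:3]                 # the corresponding correlations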

Stationarity

Definition: A random field is strictly stationary if, for any finite collection of sites $s_1, \ldots, s_n$ and any $u \in S$, the joint distributions of $(X(s_1), \ldots, X(s_n))$ and $(X(s_1 + u), \ldots, X(s_n + u))$ are the same.

Definition: A random field is weakly stationary if $m(s) = m$ for all $s \in S$ and $C(s, s + u) = C(u)$. Notice that any stationary process must have constant variance, so $C(u) = \sigma^2 \rho(u)$.

Remark: Strict stationarity implies weak stationarity. For a Gaussian process the opposite is also true.
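A simulation sketch of a weakly stationary Gaussian random field on a one-dimensional transect, using MASS::mvrnorm and an assumed exponential covariance:

# Simulate a weakly stationary Gaussian random field on a line
# (constant mean m = 0; sigma2 and phi are illustrative values).
library(MASS)
s <- seq(0, 10, length.out = 200)              # locations on a transect
sigma2 <- 1; phi <- 1
C <- sigma2 * exp(-as.matrix(dist(s)) / phi)   # C(u) depends only on |u|
x <- mvrnorm(1, mu = rep(0, length(s)), Sigma = C)
plot(s, x, type = "l", xlab = "s", ylab = "X(s)")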

Isotropy

Definition: Assuming that $E(X(s + u) - X(s)) = 0$, the variogram is defined as $E(X(s + u) - X(s))^2 = \mathrm{var}(X(s + u) - X(s))$.

If the variogram depends only on $u$, then we write it as $2\gamma(u)$, where $\gamma$ is denoted the semi-variogram. If the covariance function exists, then we have that $\gamma(u) = C(0) - C(u)$.

Definition: A stationary random field is isotropic if the covariance function depends on distance alone, i.e. $C(s, s') = C(\tau)$, where $\tau = \|s - s'\|$. This is a very strong condition on the radial symmetry of the covariance.
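An empirical semi-variogram can be computed directly with geoR; a sketch on the elevation data (the max.dist value is an illustrative choice):

# Empirical (binned) semi-variogram of the geoR elevation data.
library(geoR)
data(elevation)
ev <- variog(elevation, max.dist = 5)   # gamma(u) estimated in distance bins
plot(ev)                                # semi-variogram against distance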

Covariance Functions

Model and covariance function $C(\tau)$:

- Spherical:
$$C(\tau) = \begin{cases} \sigma^2\left(1 - \tfrac{3}{2}\,\tau/\phi + \tfrac{1}{2}\,(\tau/\phi)^3\right) & \text{if } 0 \le \tau \le \phi \\ 0 & \text{if } \tau > \phi \end{cases}$$
- Powered exponential:
$$C(\tau) = \sigma^2 \exp\left(-|\tau/\phi|^{\kappa}\right), \quad \phi > 0,\ 0 < \kappa \le 2$$
- Rational quadratic:
$$C(\tau) = \sigma^2 \left(1 - \frac{\tau^2}{\phi^2 + \tau^2}\right), \quad \phi > 0$$
- Wave:
$$C(\tau) = \sigma^2\, \frac{\sin(\tau/\phi)}{\tau/\phi}, \quad \phi > 0$$
- Matérn:
$$C(\tau) = \frac{\sigma^2}{2^{\nu - 1}\Gamma(\nu)}\, (\tau/\phi)^{\nu} K_{\nu}(\tau/\phi), \quad \phi > 0,\ \nu > 0$$
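These formulas translate directly into R; a sketch with three of them (base R's besselK provides $K_\nu$):

# Isotropic covariance functions as R functions of distance tau.
cov_sph <- function(tau, sigma2, phi)
  ifelse(tau <= phi,
         sigma2 * (1 - 1.5 * tau / phi + 0.5 * (tau / phi)^3), 0)
cov_powexp <- function(tau, sigma2, phi, kappa)
  sigma2 * exp(-abs(tau / phi)^kappa)
cov_matern <- function(tau, sigma2, phi, nu) {
  u <- tau / phi
  out <- sigma2 * u^nu * besselK(u, nu) / (2^(nu - 1) * gamma(nu))
  out[tau == 0] <- sigma2          # limit as tau -> 0
  out
}
curve(cov_matern(x, 1, 0.25, 0.5), 0, 1.5, xlab = "tau", ylab = "C(tau)")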

Covariance Functions

The isotropic covariances considered are defined by at least two parameters: the scale $\sigma^2$ and the range $\phi$. $\sigma^2$ corresponds to the variance of the process.

The range determines the decay of the covariance function. So, for example, consider an exponential covariance. If the distance at which the correlation is equal to 5% is $\tau_0$, then the range is equal to $\tau_0/3$. Note that, in the parameterizations used in the table, the range is measured in the same units used to obtain distances.
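A one-line check of this rule of thumb for the exponential model:

# exp(-tau0 / phi) = 0.05  implies  phi = tau0 / (-log(0.05)) ~ tau0 / 3
-log(0.05)             # approximately 3.0
3 * 0.25               # practical range of about 0.75 when phi = 0.25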

Matérn Correlation Functions

[Figure: Matérn correlation functions $\rho(h)$ against distance. Left panel: varying $\nu$ (0.5, 1.5, 2, 3) with $\phi$ fixed. Right panel: models with equivalent "practical" range: ($\nu$ = 0.5, $\phi$ = 0.25), ($\nu$ = 1, $\phi$ = 0.188), ($\nu$ = 2, $\phi$ = 0.14), ($\nu$ = 3, $\phi$ = 0.117).]
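A sketch reproducing the left panel with geoR's matern correlation function ($\phi$ = 0.25 is an illustrative fixed range):

# Matern correlations for several smoothness values.
library(geoR)
h <- seq(0, 1.5, length.out = 200)
nus <- c(0.5, 1.5, 2, 3)
plot(h, matern(h, phi = 0.25, kappa = nus[1]), type = "l",
     xlab = "distance", ylab = "rho(h)", ylim = c(0, 1))
for (k in 2:length(nus))
  lines(h, matern(h, phi = 0.25, kappa = nus[k]), lty = k)
legend("topright", legend = paste("nu =", nus), lty = seq_along(nus))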

Geometric Properties

Characterizing the smoothness and differentiability of a random function is key when choosing a family of models that is most suited for a problem.

Since a random field is a collection of random variables, there are technical subtleties in the definition of continuity. Intuitively, continuity corresponds to any realization of $X(s)$ being continuous as a function of $s$.

Mean Square Properties

Definition: A random field $X$ is mean square continuous in $B$ if, for every sequence $s_n$ such that $\|s_n - s\| \to 0$ as $n \to \infty$,
$$E\left(|X(s_n) - X(s)|^2\right) \to 0, \text{ as } n \to \infty, \quad s \in B,$$
provided the expectation exists.

Mean square continuity of Gaussian processes is controlled by the smoothness of the covariance function. For a stationary random field all we need is to look at one point.

Mean Square Properties

Theorem: Assume that $E(X(s))$ is continuous. Then, a random field $X(s)$ is mean square continuous at $t$ if and only if its covariance function $C(s, s')$ is continuous at $s = s' = t$.

Corollary: A stationary random field $X(s)$ is mean square continuous at $s \in S$ if and only if its correlation function $\rho(h)$ is continuous at 0.

Mean Square Differentiability

Additional smoothness of the random field depends on the differentiability of the covariance function. $C(\cdot, \cdot)$ needs to be twice differentiable for $X(s)$ to be differentiable.

Theorem: Let $\alpha = \sum_i \alpha_i$. Then, if the derivative
$$\frac{\partial^{2\alpha} C(s, t)}{\partial s_1^{\alpha_1} \cdots \partial s_n^{\alpha_n}\, \partial t_1^{\alpha_1} \cdots \partial t_n^{\alpha_n}} \qquad (1)$$
exists and is finite for all $\alpha_i$, $i = 1, \ldots, n$, at $(s, s)$, then $X(s)$ is $\alpha$ times differentiable at $s$. Moreover, the covariance function of
$$\frac{\partial^{\alpha} X(s)}{\partial s_1^{\alpha_1} \cdots \partial s_n^{\alpha_n}}$$
is given by (1).

Smoothness

Consider a Gaussian correlation function $\rho(\tau) = \exp\{-(\tau/\phi)^2\}$. This is an analytic function at $\tau = 0$, so the corresponding random field is infinitely smooth. This is an unrealistic assumption for many natural phenomena.

Consider the powered exponential correlation $\rho(\tau) = \exp\{-\tau^{\kappa}\}$, with $0 < \kappa \le 2$. Then $\rho'(\tau) = -\kappa \tau^{\kappa - 1} \exp\{-\tau^{\kappa}\}$, so that
$$\rho'(0) = \begin{cases} -\infty & 0 < \kappa < 1 \\ -1 & \kappa = 1 \\ 0 & 1 < \kappa \le 2. \end{cases}$$
So there is no differentiability for $0 < \kappa < 1$.

Smoothness

The second derivative is
$$\rho''(\tau) = \left(\kappa^2 \tau^{2\kappa - 2} - \kappa(\kappa - 1)\tau^{\kappa - 2}\right) \exp\{-\tau^{\kappa}\},$$
and we have that
$$\lim_{\tau \to 0} \rho''(\tau) = \begin{cases} -\infty & 1 < \kappa < 2 \\ -2 & \kappa = 2, \end{cases}$$
which implies that the only case where the resulting process is differentiable is $\kappa = 2$. In such case the process is infinitely smooth. This lack of continuity in the smoothness of the powered exponential family of correlations is undesirable.

Smoothness

The Matérn family is indexed by a parameter $\nu$ that provides a gradual transition from non-differentiability ($\nu \le 1$) to increasingly smooth sample paths ($\nu > 1$). This flexibility makes it very desirable as a modeling choice.

For small values of $\tau$ we have that $K_{\nu}(\tau) \approx \Gamma(\nu)\, 2^{\nu - 1} \tau^{-\nu}$. Thus
$$\lim_{\tau \to 0} \frac{\tau^{\nu} K_{\nu}(\tau)}{\Gamma(\nu)\, 2^{\nu - 1}} = 1, \quad \nu > 0,$$
so that continuity holds. For the derivatives we have that
$$\frac{d}{d\tau}\left(\tau^{\nu} K_{\nu}(\tau)\right) = -\tau^{\nu} K_{\nu - 1}(\tau).$$

Smoothness

Using the results on the previous slide we have that:

- For $0 < \nu < 1/2$, $\rho'(0) = -\infty$. These cases correspond to extremely erratic processes.
- For $1/2 \le \nu \le 1$, $\rho'(0) \in (-\infty, 0)$, which produces a range of erratic processes.
- For $\nu > d$ we have that $\rho^{(2d-1)}(0) = 0$ and $\rho^{(2d)}(0) \in (-\infty, 0)$. This implies that the process is $d$ times mean square differentiable.

Specific Matérn Correlations

 

$\nu = 1/2$: $\quad \rho(\tau) = \exp\left(-\dfrac{\tau}{\phi}\right)$

$\nu = 3/2$: $\quad \rho(\tau) = \left(1 + \dfrac{\tau}{\phi}\right) \exp\left(-\dfrac{\tau}{\phi}\right)$

$\nu = 5/2$: $\quad \rho(\tau) = \left(1 + \dfrac{\tau}{\phi} + \dfrac{\tau^2}{3\phi^2}\right) \exp\left(-\dfrac{\tau}{\phi}\right)$

The case $\nu = 1$ is known as the Whittle correlation function.
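A quick numerical check that the $\nu = 3/2$ closed form matches the general Matérn expression (a sketch using base R's besselK; $\phi$ = 0.5 is an arbitrary choice):

# Compare the nu = 3/2 closed form with the general Matern correlation.
matern_rho <- function(tau, phi, nu) {
  u <- tau / phi
  out <- u^nu * besselK(u, nu) / (2^(nu - 1) * gamma(nu))
  out[tau == 0] <- 1
  out
}
tau <- seq(0.1, 2, by = 0.1); phi <- 0.5
closed_form <- (1 + tau / phi) * exp(-tau / phi)
max(abs(matern_rho(tau, phi, nu = 3/2) - closed_form))   # essentially zero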

