Geostatistics: Kriging

9.11.2017
Jaakko Madetoja
Slides also by Rangsima Sunila & Kirsi Virrantaus
Materials
• Geographic Information Analysis. O’Sullivan, D. & Unwin, D., 2010.
E-book available from the library; see chapter 10 (and section 9.3
if you’re not familiar with IDW)
• Principles of Geographical Information Systems. Burrough, P. &
McDonnell, R., 1999
• (Spatial Data Modelling Using Fuzzy and Geostatistical
Applications. Sunila, R., doctoral thesis, 2009.)

Today
• Introduction to interpolation
• Background in Geostatistics
• Concepts of kriging
• Types of kriging
• Further notes

Interpolation
• How to estimate unknown values at specific
locations.

(O’Sullivan & Unwin, 2010)


Interpolation

Spatial Interpolation
• Methods
– Nearest neighbours: Thiessen polygons (i.e.
Voronoi diagram)
– Triangulated Irregular Network (TIN)
– Trend surfaces
– Inverse Distance Weighting (IDW)
– Kriging
• The methods vary in the way they regard
spatial autocorrelation

Thiessen Polygons

(O’Sullivan & Unwin, 2010)

Triangulated Irregular Network (TIN)

Example:

Site  x  y  z  Distance to (5,5)
1     2  2  3  4.2426
2     3  7  4  2.8284
3     9  9  2  5.6569
4     6  5  4  1.0000
5     5  3  6  2.0000

[Figure: scatter plot of the five sites on x and y axes running from 0 to 10.]

We would like to estimate the variable value at (5,5).

Inverse Distance Weighting (IDW)
• Value of z(x) is estimated from all known
values of z at all n points. (Weighted Moving
Average technique)

$$z(x) = \sum_{i=1}^{n} w_i z_i$$

• Weights usually add to 1: $\sum_{i=1}^{n} w_i = 1$

IDW
• In IDW, the weights are based on the distance
from each of the known points (i) to the point
we are trying to estimate (k): $d_{ik}$. In IDW, we
consider the inverse distance, $1/d_{ik}$:

$$w_i = \frac{1/d_{ik}}{\sum_{j=1}^{n} 1/d_{jk}}$$

Example: IDW
Location (x,y)  z  Distance to (5,5)  Inverse distance  Weight
(2,2)           3  4.2426             0.2357            0.1040
(3,7)           4  2.8284             0.3536            0.1560
(9,9)           2  5.6569             0.1768            0.0780
(6,5)           4  1.0000             1.0000            0.4413
(5,3)           6  2.0000             0.5000            0.2207
Sum                                   2.2661            1.0000

z(5,5) = 0.1040(3) + 0.1560(4) + 0.0780(2) + 0.4413(4) + 0.2207(6)
       = 4.1814
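
The IDW example can be reproduced in a few lines. The following is a minimal sketch in Python, assuming plain 1/d weighting as in the table; the function name `idw` and the `power` parameter are illustrative additions, not part of the slides.

```python
import math

# Control points from the example: (x, y, z)
samples = [(2, 2, 3), (3, 7, 4), (9, 9, 2), (6, 5, 4), (5, 3, 6)]

def idw(samples, target, power=1):
    """Inverse distance weighted estimate at target = (x, y).

    power=1 gives the plain 1/d weighting of the example; the
    parameter is an assumption added for illustration.
    """
    tx, ty = target
    inverse = []
    for x, y, z in samples:
        d = math.hypot(x - tx, y - ty)
        if d == 0:
            return z  # target coincides with a control point
        inverse.append(1.0 / d ** power)
    total = sum(inverse)
    weights = [v / total for v in inverse]  # normalised so they sum to 1
    return sum(w * z for w, (_, _, z) in zip(weights, samples))

estimate = idw(samples, (5, 5))
print(round(estimate, 2))  # 4.18
```

The result agrees with the hand calculation; the small difference in the later decimals comes from the weights being rounded to four decimals in the table.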

Introduction to statistical analysis of
fields
• The previously introduced methods are based
on mathematical, deterministic models
• Disadvantages:
– The control points are assumed to be error free
– The methods assume that nothing is known about
the phenomenon being interpolated
• Two statistical interpolation methods: trend
surface analysis and kriging

Trend surface analysis
• Approximates trends in the control points by
fitting a (polynomial) function to the data
• Idea the same as in regression modeling

(O’Sullivan & Unwin, 2010)


Trend surface analysis
• Rarely used in interpolation
– Phenomenon rarely matches a simple
mathematical function
• Used for example with gravity measurements
(don’t quote me on this one!)

Introduction to kriging
• IDW takes spatial autocorrelation into account,
but the weighting is arbitrary
• In trend surface analysis a function is fitted to the
data, i.e. the method lets the “data speak for
themselves”
• Kriging combines the approaches conceptually: it
uses a distance weighting approach, but lets the
data speak for themselves to define the weights

Historical background
• Geostatistics was first developed by the French geomathematician
Georges Matheron (1930-2000). The major concepts and theory were
developed during 1954-1963 while he was working with the French
Geological Survey in Algeria and France.
• In 1963, he defined linear geostatistics and the concepts of
variography, variances of estimation and kriging in the Traité de
géostatistique appliquée. “Principles of geostatistics” was published
in Economic Geology Vol. 58, 1246-1266.
• Kriging was named in honor of Danie Krige (1919-2013), the
South African mining engineer who pioneered these methods
of interpolation.

What is Geostatistics
• Techniques which are used for mapping of surfaces
from limited sample data and the estimation of values
at unsampled locations
• Geostatistics is used for:
– spatial data modelling
– characterizing the spatial variation
– spatial interpolation
– simulation
– optimization of sampling
– characterizing the uncertainty
• The idea of geostatistics is that points which are close to
each other in space are likely to have similar values.

Application areas
• Mining
• Geography
• Geology
• Geophysics
• Oceanography
• Hydrography
• Meteorology
• Biotechnology
• Environmental studies
• Agriculture

Geostatistical methods provide
• A way to deal with the limitations of
deterministic interpolation
• Optimal prediction of attribute values at
unvisited points
• BLUE (Best Linear Unbiased Estimate)

Geostatistical method for interpolation
• Recognition that the spatial variation of any
continuous attribute is often too irregular to
be modelled by a simple mathematical
function.
• The variation can be described better by a
stochastic surface.
• The interpolation with geostatistics is known
as kriging.

The steps in kriging
1. Describe the spatial variation with a variogram
2. Summarize the variation with a mathematical
function
3. Use the function to determine interpolation
weights

Step 1: Variogram cloud

(O’Sullivan & Unwin, 2010)


Step 1: Square root differences cloud

(O’Sullivan & Unwin, 2010)


Step 1: Variogram cloud

(O’Sullivan & Unwin, 2010)


Variogram/Semivariogram
• Used to examine the spatial continuity of a
regionalized variable and how this continuity
changes as a function of distance
• Computing a variogram involves plotting the
semivariance against the lag distance
• Measures the strength of correlation as a
function of distance
• Quantifies the spatial autocorrelation

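
To make the link between the variogram cloud and the experimental variogram concrete, here is a minimal sketch in Python. It assumes the usual definition of semivariance for a pair of points, half the squared difference of their values, and averages the cloud within equal-width lag bins. The five sample points are reused from the IDW example; the function names and the `lag_width` value are hypothetical choices for illustration.

```python
import math
from itertools import combinations

# Control points reused from the IDW example: (x, y, z)
samples = [(2, 2, 3), (3, 7, 4), (9, 9, 2), (6, 5, 4), (5, 3, 6)]

def variogram_cloud(samples):
    """Return (distance, semivariance) for every pair of points."""
    cloud = []
    for (x1, y1, z1), (x2, y2, z2) in combinations(samples, 2):
        h = math.hypot(x1 - x2, y1 - y2)
        cloud.append((h, 0.5 * (z1 - z2) ** 2))
    return cloud

def experimental_variogram(cloud, lag_width):
    """Average the cloud within lag bins of equal width.

    Returns (bin centre, mean semivariance, number of pairs) per bin.
    """
    bins = {}
    for h, g in cloud:
        bins.setdefault(int(h // lag_width), []).append(g)
    return [((k + 0.5) * lag_width, sum(v) / len(v), len(v))
            for k, v in sorted(bins.items())]

cloud = variogram_cloud(samples)
for lag, gamma, pairs in experimental_variogram(cloud, lag_width=3.0):
    print(f"lag ~{lag:.1f}: gamma = {gamma:.2f} ({pairs} pairs)")
```

Five points give only ten pairs, far too few for a reliable variogram; the data set is used here only to keep the example small.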
Variogram

[Figure: semivariogram γ(h) plotted against lag distance (h); variability increases with lag distance.]

Variogram
• Experimental variogram (sample or observed
variogram):
– The variogram computed from the sampled data.
– The first step towards a quantitative description of
the regionalized variation.
• Theoretical variogram or variogram model:
– A function modelled to fit the experimental
variogram.

Experimental and theoretical
variogram

from Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W., 2001, Geographic Information Systems and Science

Describing the variogram
• Lag – The distance between sampling pairs.
• Sill – The value at which the semivariogram first
flattens off; the maximum level of semivariance.
• Range – The lag distance at which the semivariogram
reaches the sill. Sample points that are farther apart
than the range are not spatially autocorrelated.
• Nugget – The value of the variogram as the lag
approaches 0; it reflects errors in measurements and
variation at distances shorter than the sampling interval.

Different variogram models

[Figure: four variogram model panels (spherical, exponential, linear, Gaussian), each plotting γ(h) against lag (h) with the nugget marked; the spherical, exponential and Gaussian panels also mark the range and sill.]

The significance of the terms
describing the variogram model
• Sill, range and nugget define the model
• For example, spherical model:
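
The equation itself is not reproduced in this text. As a hedged illustration, the sketch below implements the form of the spherical model commonly given in textbooks: the semivariance rises from the nugget as 1.5(h/a) - 0.5(h/a)^3 scaled by (sill - nugget), and stays at the sill once the lag h reaches the range a. The parameter names, and the choice to treat `sill` as the total sill (nugget included), are assumptions made here.

```python
def spherical(h, nugget, sill, rng):
    """Spherical semivariogram model.

    nugget: semivariance as the lag approaches 0
    sill:   total sill (nugget included), reached at the range
    rng:    range, the lag at which the model flattens off
    """
    if h == 0:
        return 0.0  # by definition gamma(0) = 0
    if h >= rng:
        return sill
    ratio = h / rng
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)

# Hypothetical parameter values, chosen only to show the shape of the curve
for h in (0, 1, 3, 6, 9):
    print(h, round(spherical(h, nugget=0.5, sill=3.0, rng=6.0), 3))
```

Changing any one of the three parameters changes the curve, which is why sill, range and nugget together define the model.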

Step 3: Determine interpolation
weights
• The variogram model is used to determine the
weights for estimating values at unknown points.
• The calculation is rather complex; see O’Sullivan
and Unwin (2010) page 302 for details.
• With the weights calculated, the estimate is a
weighted sum of the known values, just as in IDW
• Kriging also produces a kriging variance, which can
be used for estimating the uncertainty of the
interpolation
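
For readers who want to see the weight calculation, the following is a minimal sketch of ordinary kriging in Python, not the procedure of any particular software package. It assumes the standard ordinary kriging system: semivariances between all pairs of control points, bordered by a row and column of ones so that the weights sum to 1, solved against the semivariances between each control point and the target. The spherical model parameters are hypothetical values, not fitted to the data, and the small `solve` helper only keeps the example dependency-free.

```python
import math

# Control points reused from the IDW example: (x, y, z)
samples = [(2, 2, 3), (3, 7, 4), (9, 9, 2), (6, 5, 4), (5, 3, 6)]

def spherical(h, nugget=0.0, sill=2.5, rng=8.0):
    """Spherical model; the default parameters are hypothetical."""
    if h == 0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging(samples, target, gamma):
    """Return (estimate, kriging variance, weights) at target."""
    n = len(samples)
    tx, ty = target
    # Semivariances between control points, bordered by ones
    a = [[gamma(math.hypot(xi - xj, yi - yj)) for xj, yj, _ in samples] + [1.0]
         for xi, yi, _ in samples]
    a.append([1.0] * n + [0.0])
    # Semivariances between each control point and the target
    b = [gamma(math.hypot(x - tx, y - ty)) for x, y, _ in samples] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]  # mu is the Lagrange multiplier
    estimate = sum(w * z for w, (_, _, z) in zip(weights, samples))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights

estimate, variance, weights = ordinary_kriging(samples, (5, 5), spherical)
print(round(estimate, 3), round(variance, 3), round(sum(weights), 6))
```

Unlike IDW weights, kriging weights depend on the variogram model and on how the control points are arranged relative to each other, not only on their distance to the target.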
Example
• Semivariogram modeling in ArcMap

34
Further notes in kriging
• Anisotropy
• Different types of kriging
– Ordinary
– Simple
– Universal
– Block
– Indicator
– Co-kriging
• How to evaluate the quality of the interpolation
– Cross-validation (CV)
– Using training and testing data sets

Anisotropy
• Spatial variation is not the same in all
directions
• The variogram is computed for specific
directions
• If the process is anisotropic (as opposed to
isotropic), then so is the variogram

Example: Anisotropy

Ordinary, simple and universal kriging
• The difference between the methods is what they
assume about the mean z-value
• In ordinary kriging, the mean is an unknown value
estimated locally
• In simple kriging, the mean is a known constant,
for example the average of the entire data set
• In universal kriging, drift in the data is modeled using
trend surface analysis and the semivariogram is
calculated using residual values from the surface

Example of trend removal

from Introduction to the ArcGIS Geostatistical Analyst Tutorial

Block kriging
• Estimates the average value over a block
• Used for example in mining, where blocks of
rock are extracted

Indicator kriging
• Used when the interpolated value is binary

• Can be utilized with nominal data

Co-kriging
• It is an extension of ordinary kriging where
two or more variables are interdependent
• The information contained in the associated
variable is used to enable better estimations
of the other variable
• Useful when the associated variable is easier
to acquire

Evaluating the quality of the
interpolation
• Leave-one-out cross-validation:
1. Drop one input point out of the model
2. Interpolate the surface with kriging using the
remaining points
3. Compare measured (i.e. real) value and
predicted (i.e. from kriging) value
4. Repeat for every input point
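
The leave-one-out procedure can be sketched as follows. To keep the example short, it uses IDW as a stand-in interpolator rather than kriging; any function that takes the remaining points and a target location could be passed in instead. The function names and the use of RMSE as the summary statistic are assumptions made for illustration.

```python
import math

# Control points reused from the IDW example: (x, y, z)
samples = [(2, 2, 3), (3, 7, 4), (9, 9, 2), (6, 5, 4), (5, 3, 6)]

def idw(points, target):
    """Stand-in interpolator: inverse distance weighting with 1/d."""
    tx, ty = target
    inverse = []
    for x, y, z in points:
        d = math.hypot(x - tx, y - ty)
        if d == 0:
            return z
        inverse.append(1.0 / d)
    total = sum(inverse)
    return sum(v / total * z for v, (_, _, z) in zip(inverse, points))

def leave_one_out(samples, interpolate):
    """Return (measured, predicted) pairs, one per control point."""
    pairs = []
    for i, (x, y, z) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]   # drop one point
        predicted = interpolate(rest, (x, y))  # interpolate at its location
        pairs.append((z, predicted))           # keep both for comparison
    return pairs

pairs = leave_one_out(samples, idw)
rmse = math.sqrt(sum((m - p) ** 2 for m, p in pairs) / len(pairs))
for measured, predicted in pairs:
    print(measured, round(predicted, 3))
print("RMSE:", round(rmse, 3))
```

A lower RMSE indicates that the interpolator reproduces the withheld values more closely, which makes it a convenient number for comparing methods or variogram models.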

Cross-validation in ArcMap

Evaluating the quality of the
interpolation
• Training and testing data sets:
– Use two different data sets (or divide one data set
in two)
– The interpolation is done with training data set
– The testing data set is used to check how well the
interpolated surface predicts the values at the testing points
