Comparison of Ordinary Kriging and Artificial Neural Network

Comparison of ordinary kriging and artificial neural network for spatial mapping of arsenic contamination
Presented by: Pejman Tahmasebi
Supervisor: Dr. Katibeh
August, 2010
Outline
• Role of ANNs (artificial neural networks) and geostatistics in environmental sciences
• What is an ANN?
• What are the most prevalent geostatistical methods?
• What is the difference between simulation and estimation?
• A case study applying and comparing ordinary kriging (OK) and ANN
Outline of ANN
Biological inspiration
Artificial neurons and neural networks
Learning processes
Learning with artificial neural networks
Methodology
Biological inspiration
• Animals are able to react adaptively to changes in their external and internal environment, and they use their nervous system to perform these behaviours.
• An appropriate model or simulation of the nervous system should be able to produce similar responses and behaviours in artificial systems.
• The nervous system is built from relatively simple units, the neurons, so copying their behaviour and functionality should be the solution.
How does it work?
NN
[Diagram: ANNs at the centre, linked to six aspects: Learning, Mathematics, Algorithms, Architectures, Methodology, Problems]
Neural Networks Mathematics
Inputs x_1, ..., x_4 pass through a first layer of four neurons and a second layer of three neurons to a single output neuron:
\[
y^1_k = f(x_k, w^1_k), \quad k = 1, \dots, 4, \qquad y^1 = (y^1_1, y^1_2, y^1_3, y^1_4)^T
\]
\[
y^2_k = f(y^1, w^2_k), \quad k = 1, 2, 3, \qquad y^2 = (y^2_1, y^2_2, y^2_3)^T
\]
\[
y_{\mathrm{out}} = f(y^2, w^3_1)
\]
Neural Networks Architectures
[Figures: an MLP neural network and an RBF neural network, each mapping the input x to the output y_out]
Learning to approximate
Error measure:
\[
E = \frac{1}{N} \sum_{t=1}^{N} \left( F(x_t; W) - y_t \right)^2
\]
Rule for changing the synaptic weights:
\[
\Delta w_i^j = -c \cdot \frac{\partial E}{\partial w_i^j}(W), \qquad w_i^{j,\mathrm{new}} = w_i^j + \Delta w_i^j
\]
c is the learning parameter (usually a constant)
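The update rule can be tried on a toy problem. The sketch below is only an illustration: it assumes a hypothetical two-weight model F(x; W) = w0 + w1·x and made-up data, neither of which comes from the slides, and applies the rule with a constant learning parameter c.

```python
# Minimal sketch of the weight update rule on the mean squared error
# E = (1/N) * sum_t (F(x_t; W) - y_t)^2.
# ASSUMPTION: F(x; W) = w0 + w1 * x is a hypothetical model chosen for illustration.

def mse(w, xs, ys):
    """Error measure E for the current weights."""
    return sum((w[0] + w[1] * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w, xs, ys):
    """Analytic partial derivatives dE/dw0 and dE/dw1."""
    n = len(xs)
    g0 = sum(2.0 * (w[0] + w[1] * x - y) for x, y in zip(xs, ys)) / n
    g1 = sum(2.0 * (w[0] + w[1] * x - y) * x for x, y in zip(xs, ys)) / n
    return [g0, g1]

def train(xs, ys, c=0.05, steps=2000):
    """Repeat w_new = w + delta_w, with delta_w = -c * dE/dw."""
    w = [0.0, 0.0]
    for _ in range(steps):
        g = gradient(w, xs, ys)
        w = [wi - c * gi for wi, gi in zip(w, g)]
    return w

# Illustrative data generated from y = 1 + 2x (not from the study).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w = train(xs, ys)
final_error = mse(w, xs, ys)
```

With a learning parameter that is too large the iteration diverges, which is one reason c is usually kept small and constant.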
Learning with MLP neural networks
MLP neural network with p layers
[Figure: input x passing through layers 1, 2, …, p-1, p to the output y_out]
\[
y^1_k = \frac{1}{1 + e^{-w^{1T}_k x - a^1_k}}, \quad k = 1, \dots, M_1, \qquad y^1 = (y^1_1, \dots, y^1_{M_1})^T
\]
\[
y^2_k = \frac{1}{1 + e^{-w^{2T}_k y^1 - a^2_k}}, \quad k = 1, \dots, M_2, \qquad y^2 = (y^2_1, \dots, y^2_{M_2})^T
\]
...
\[
y_{\mathrm{out}} = F(x; W) = w^{pT} y^{p-1}
\]
Data: (x_1, y_1), (x_2, y_2), ..., (x_N, y_N)
Error:
\[
E(t) = \left( y(t)_{\mathrm{out}} - y_t \right)^2 = \left( F(x_t; W) - y_t \right)^2
\]
It is very complicated to calculate the weight changes.
Learning with backpropagation (BP)
Solution of the complicated learning:
• calculate first the changes for the synaptic weights of the output neuron;
• calculate the changes backward starting from layer p-1, and propagate backward the local error terms.
The method is still relatively complicated, but it is much simpler than the original optimization problem.
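The two steps can be sketched for the smallest interesting case. This is an illustrative sketch, not the network used in the case study: it assumes one sigmoid hidden layer with three neurons, a linear output neuron, random starting weights and a single made-up training sample, and relies on NumPy.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, a1, w2):
    """Hidden layer y1 = sigmoid(W1 x + a1); linear output y_out = w2^T y1."""
    y1 = sigmoid(W1 @ x + a1)
    return y1, w2 @ y1

def backprop(x, target, W1, a1, w2):
    """Gradients of E = (y_out - target)^2 for one training sample."""
    y1, y_out = forward(x, W1, a1, w2)
    d_out = 2.0 * (y_out - target)             # local error term of the output neuron
    grad_w2 = d_out * y1                       # step 1: output weights first
    d_hidden = d_out * w2 * y1 * (1.0 - y1)    # step 2: error propagated backward
    return np.outer(d_hidden, x), d_hidden, grad_w2

# ASSUMPTION: sizes, weights and the sample below are arbitrary illustrations.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))
a1 = rng.normal(size=3)
w2 = rng.normal(size=3)
x = np.array([0.5, -1.0])
target = 0.3

def error(W1_, a1_, w2_):
    return (forward(x, W1_, a1_, w2_)[1] - target) ** 2

# One update with learning parameter c.
c = 0.01
gW1, ga1, gw2 = backprop(x, target, W1, a1, w2)
E_before = error(W1, a1, w2)
E_after = error(W1 - c * gW1, a1 - c * ga1, w2 - c * gw2)
```

Comparing the analytic gradients with finite differences of `error` is a cheap way to check a hand-written backward pass before training with it.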
Methodology
[Flow diagram. Stages: Why NN?, Developing Network, Application, Results. Box labels: Complex Geological Data Analysis; Network Architecture Setting; Learning Algorithm; Validation; Testing; Estimation for new locations]
Geostatistics
• Geostatistical analysis is distinct from other spatial models in the statistics literature in that it assumes the region of study is continuous
  • Observations could be taken at any point within the study area
  • Interpolation at points in between observed locations makes sense
[Figure: plot of Z over the study area; axis values not recoverable]
Spatial Autocorrelation
• Spatial modeling is based on the assumption that observations close in space tend to co-vary more strongly than those far from each other
  • Positively co-vary: nearby values tend to be similar
    ▪ E.g. elevation (or depth) tends to be similar for locations close together
  • Negatively co-vary: nearby values tend to be opposite
    ▪ E.g. density of an organism that is highly spatially clustered, where observations in between clusters are low and values within clusters are high
Covariance
• Definition: two variables are said to co-vary if their correlation coefficient is not zero
\[
\sigma_{x,y} = \sigma(x, y) = \operatorname{cov}(x, y) = E[(x - \mu_x)(y - \mu_y)] = \rho\,\sigma_x \sigma_y
\]
where ρ is the correlation coefficient between X and Y, and σ_X (σ_Y) is the standard deviation of X (Y)
• Consider this in the context of a single variable
  • E.g. do nearest neighbors have non-zero covariance?
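The nearest-neighbour question can be checked numerically. The sketch below is an illustration under assumptions: the values are made-up elevations at equally spaced points along a transect, and `lag_covariance` is a hypothetical helper that pairs each value with its neighbour.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def covariance(x, y):
    """Population covariance E[(x - mu_x)(y - mu_y)]."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Correlation coefficient rho = cov(x, y) / (sigma_x * sigma_y)."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

def lag_covariance(z, lag=1):
    """Covariance between each value and its neighbour `lag` steps away."""
    return covariance(z[:-lag], z[lag:])

# ASSUMPTION: hypothetical elevations sampled at equal spacing along a transect.
z = [10.0, 11.0, 12.5, 13.0, 12.0, 10.5, 9.0, 8.5]
neighbour_cov = lag_covariance(z, lag=1)
neighbour_rho = correlation(z[:-1], z[1:])
```

A clearly positive `neighbour_cov` is the single-variable version of "values close together are similar"; repeating the calculation for larger lags is the idea behind the variogram.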
Continuous Data – Geostatistics
Notation
Z(s) is the random process at location s = (x, y)
z(s) is the observed value of the process at location s = (x, y)
D is the study region
The sample is the set {z(s) : s ∈ D}. We say that it is a partial realization of the random spatial process {Z(s) : s ∈ D}
\[
Z(s) = \mu(s) + W(s) + \eta(s) + \varepsilon(s)
\]
Simpler Conceptual Model
\[
Z(s) = \mu(s) + \delta(s) + \varepsilon(s)
\]
where
μ(s) is the mean structure, called the large-scale non-spatial trend
δ(s) = W(s) + η(s) is a zero-mean, stationary process with autocorrelation, which combines the smooth small-scale and micro-scale variation
ε(s) is the random noise term with zero mean and constant variance, which is independent of W(s) and η(s)
Ordinary Kriging
• The theory of regionalised variables leads to an “optimal” interpolation method, in the sense that the prediction variance is minimized.
• This is based on the theory of random functions, and requires certain assumptions.
• The result is a “Best Linear Unbiased Predictor” (BLUP) that satisfies certain criteria for optimality.
Prediction with Ordinary Kriging (OK)
• In OK, we model the value of variable z at location s_i as the sum of a regional mean m and a spatially correlated random component e(s_i):
  • Z(s_i) = m + e(s_i)
• The regional mean m is estimated from the sample, but not as the simple average, because there is spatial dependence. It is implicit in the OK system.
Prediction with Ordinary Kriging (OK)
• Predict at points, with unknown mean (which must also be estimated) and no trend
• Each point x_0 is predicted as the weighted average of the values at all samples, with weights λ_i still to be determined:
\[
Z^*_{x_0} = \sum_{i=1}^{n} \lambda_i Z(x_i)
\]
• The weights assigned to each sample point sum to 1
• Therefore, the prediction is unbiased
• “Ordinary”: no trend or strata; regional mean must be estimated from sample
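Why weights that sum to 1 give an unbiased prediction can be written out in one short derivation. It assumes the model stated earlier, Z(s) = m + e(s), with the same unknown mean m at every location and a zero-mean random component, E[e(s)] = 0.

```latex
\begin{align*}
E\left[ Z^*_{x_0} - Z(x_0) \right]
  &= \sum_{i=1}^{n} \lambda_i \, E[Z(x_i)] - E[Z(x_0)] \\
  &= m \sum_{i=1}^{n} \lambda_i - m \\
  &= m \left( \sum_{i=1}^{n} \lambda_i - 1 \right) = 0
  \quad \text{when } \sum_{i=1}^{n} \lambda_i = 1 .
\end{align*}
```

The constraint removes the unknown mean m from the expected error, which is why m never has to be estimated explicitly.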
Simple and Ordinary Kriging
• Linear combination of nearest neighbours
[Figure: target point x0 with neighbouring samples x1, x2, x3, x4]
Local means, inverse distance weights and kriging all use the same estimator
\[
Z^*_{x_0} = \sum_{i=1}^{n} \lambda_i Z(x_i)
\]
and differ only in the weights:
Local Means: λ_i = 1/n
Inverse Distance Weights: λ_i = 1/d_i²
Kriging: λ_i = ?
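The two simpler weighting schemes are easy to compute directly. The sketch below is illustrative only: the coordinates and values are made up, and the inverse-distance weights are rescaled so that they sum to 1, a normalisation assumed here because the slide shows only the raw 1/d² form.

```python
from math import hypot

def local_mean_weights(n):
    """Every sample gets the same weight 1/n."""
    return [1.0 / n] * n

def idw_weights(target, samples, power=2):
    """Inverse-distance weights 1/d^power, rescaled to sum to 1.

    ASSUMPTION: no sample coincides with the target (d > 0).
    """
    raw = [1.0 / hypot(sx - target[0], sy - target[1]) ** power
           for sx, sy in samples]
    total = sum(raw)
    return [r / total for r in raw]

def predict(weights, values):
    """Linear estimator Z* = sum_i lambda_i * Z(x_i)."""
    return sum(w * v for w, v in zip(weights, values))

# ASSUMPTION: hypothetical sample layout and measured values.
target = (0.0, 0.0)
samples = [(0.0, 50.0), (50.0, 100.0), (150.0, 0.0), (-50.0, -50.0)]
values = [12.0, 8.0, 3.0, 10.0]

z_local = predict(local_mean_weights(len(values)), values)
lam_idw = idw_weights(target, samples)
z_idw = predict(lam_idw, values)
```

Neither scheme looks at how the samples are arranged relative to each other; kriging differs in that its weights come from the spatial covariance structure.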
Ordinary Kriging
[Figure: target point x0 with neighbouring samples x1, x2, x3, x4]
1. Variogram analysis
2. Variogram adjustment
3. Semivariogram model fitting (spherical model):
\[
\gamma(h) = C_0 + C_1 \left[ \frac{3}{2} \frac{h}{a} - \frac{1}{2} \left( \frac{h}{a} \right)^3 \right] = C_0 + C_1\,\mathrm{Sph}(h)
\]
Kriging estimator
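The spherical model in step 3 can be written as a small function. This is a sketch under stated assumptions: γ(0) = 0 and γ(h) = C0 + C1 for h beyond the range a are the usual conventions for this model, and the parameter values C0 = 2, C1 = 20, a = 200 are the ones that appear in the worked kriging example in these slides.

```python
def spherical_variogram(h, nugget, partial_sill, a):
    """gamma(h) = C0 + C1 * (1.5 * h/a - 0.5 * (h/a)^3) for 0 < h <= a.

    ASSUMPTION: gamma(0) = 0 and gamma(h) = C0 + C1 (the sill) for h > a.
    """
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + partial_sill
    r = h / a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def covariance(h, nugget, partial_sill, a):
    """C(h) = C(0) - gamma(h), with C(0) = C0 + C1."""
    return nugget + partial_sill - spherical_variogram(h, nugget, partial_sill, a)

# Parameter values taken from the worked example: C0 = 2, C1 = 20, a = 200.
C0, C1, A = 2.0, 20.0, 200.0
c_at_0 = covariance(0.0, C0, C1, A)
c_at_50 = covariance(50.0, C0, C1, A)
c_at_150 = covariance(150.0, C0, C1, A)
c_beyond_range = covariance(250.0, C0, C1, A)
```

The covariance drops from C0 + C1 at zero distance to zero at the range, so samples farther than a from each other are treated as uncorrelated.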
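Ordinary Kriging
\[
\begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \\ \alpha \end{bmatrix}
=
\begin{bmatrix}
C_{11} & C_{12} & \cdots & C_{1n} & 1 \\
C_{21} & C_{22} & \cdots & C_{2n} & 1 \\
\vdots & \vdots & & \vdots & \vdots \\
C_{n1} & C_{n2} & \cdots & C_{nn} & 1 \\
1 & 1 & \cdots & 1 & 0
\end{bmatrix}^{-1}
\begin{bmatrix} C_{10} \\ C_{20} \\ \vdots \\ C_{n0} \\ 1 \end{bmatrix}
\]
• Covariance matrix elements: C_ij = C(0) − γ(h) = C_0 + C_1 − γ(h)
• Substituting the values, we find the weights
• Kriging estimator:
\[
Z^*_{x_0} = \sum_{i=1}^{n} \lambda_i Z(x_i)
\]
• Variance:
\[
\sigma^2_{ko} = (C_0 + C_1) - \lambda^T k
\]
where λ = (λ_1, ..., λ_n, α)^T is the solution vector, including the Lagrange parameter α, and k = (C_10, ..., C_n0, 1)^T is the right-hand side of the system.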
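Kriging example
[Figure: target point x0 and samples x1, x2, x3, x4 on a grid with spacing 50]
Estimator:
\[
\begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \lambda_4 \\ \alpha \end{bmatrix}
=
\begin{bmatrix}
C_{11} & C_{12} & C_{13} & C_{14} & 1 \\
C_{21} & C_{22} & C_{23} & C_{24} & 1 \\
C_{31} & C_{32} & C_{33} & C_{34} & 1 \\
C_{41} & C_{42} & C_{43} & C_{44} & 1 \\
1 & 1 & 1 & 1 & 0
\end{bmatrix}^{-1}
\begin{bmatrix} C_{01} \\ C_{02} \\ C_{03} \\ C_{04} \\ 1 \end{bmatrix}
\]
• Matrix elements: C_ij = C_0 + C_1 − γ(h) (theoretical model)
\[
C_{12} = C_{21} = C_{04} = C_0 + C_1 - \gamma(50\sqrt{2})
= (2 + 20) - \left[ 2 + 20 \left( 1.5\,\frac{50\sqrt{2}}{200} - 0.5\,\frac{(50\sqrt{2})^3}{200^3} \right) \right] = 9.84
\]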
Kriging example
\begin{align*}
C_{13} = C_{31} &= (C_0 + C_1) - \gamma\left( \sqrt{150^2 + 50^2} \right) = 1.23 \\
C_{14} = C_{41} = C_{02} &= (C_0 + C_1) - \gamma\left( \sqrt{100^2 + 50^2} \right) = 4.98 \\
C_{23} = C_{32} &= (C_0 + C_1) - \gamma\left( \sqrt{100^2 + 100^2} \right) = 2.33 \\
C_{24} = C_{42} &= (C_0 + C_1) - \gamma\left( \sqrt{100^2 + 150^2} \right) = 0.29 \\
C_{34} = C_{43} &= (C_0 + C_1) - \gamma\left( \sqrt{200^2 + 50^2} \right) = 0 \\
C_{01} &= (C_0 + C_1) - \gamma(50) = 12.66 \\
C_{03} &= (C_0 + C_1) - \gamma(150) = 1.72 \\
C_{11} = C_{22} = C_{33} = C_{44} &= (C_0 + C_1) - \gamma(0) = 22
\end{align*}
Kriging example
Substituting the values C_ij, we find the following weights:
λ_1 = 0.518, λ_2 = 0.022, λ_3 = 0.089, λ_4 = 0.371
The estimator is
\[
Z^*_{x_0} = 0.518\, z(x_1) + 0.022\, z(x_2) + 0.089\, z(x_3) + 0.371\, z(x_4)
\]
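The weights above can be reproduced numerically. The sketch below makes one loud assumption: the figure with the sample layout is not recoverable, so the coordinates are one arrangement chosen to match every distance used in the example (50, 150, 50√2 and so on). The variogram parameters C0 = 2, C1 = 20, a = 200 are the ones in the example, and NumPy is used to solve the system.

```python
import numpy as np

C0, C1, A = 2.0, 20.0, 200.0   # nugget, partial sill and range from the example

def cov(h):
    """C(h) = C0 + C1 - gamma(h) for the spherical model; C(0) = C0 + C1."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / A, 1.0)                      # beyond the range gamma stays at the sill
    gamma = np.where(h > 0, C0 + C1 * (1.5 * r - 0.5 * r ** 3), 0.0)
    return C0 + C1 - gamma

# ASSUMPTION: hypothetical coordinates consistent with the distances in the example.
x0 = np.array([0.0, 0.0])
pts = np.array([[0.0, 50.0],      # x1
                [50.0, 100.0],    # x2
                [150.0, 0.0],     # x3
                [-50.0, -50.0]])  # x4

n = len(pts)
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)   # sample-to-sample distances
d0 = np.linalg.norm(pts - x0, axis=1)                           # sample-to-target distances

# Ordinary kriging system: covariances bordered by the sum-to-one constraint.
K = np.ones((n + 1, n + 1))
K[:n, :n] = cov(d)
K[n, n] = 0.0
k = np.append(cov(d0), 1.0)

sol = np.linalg.solve(K, k)
weights, alpha = sol[:n], sol[n]
variance = C0 + C1 - sol @ k      # kriging variance (C0 + C1) - lambda^T k
```

Samples x1 and x4, the two closest to x0, take almost 90% of the total weight, while x2 receives very little because x1 lies between it and the target.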
Case Study
Abstract
• The study investigated the hypothesis that ‘non-linearity matters in the spatial mapping of complex patterns of groundwater arsenic contamination’.
• One ANN and a variogram model were used to represent the spatial structure of arsenic contamination.
• The probability of successfully detecting a well as safe or unsafe was found to be at least 15% larger with the ANN than with kriging under the country-wide scenario.
Introduction
• Extensive groundwater contamination by arsenic is observed in many alluvial aquifers of the world today.
• Soluble arsenic compounds are generally rapidly absorbed into the body from the gastrointestinal tract.
• Studies have shown that twenty years of sustained consumption of contaminated water exceeding 50 µg/l of arsenic can cause internal cancers and affect 10% of all those exposed.
• Detection of groundwater arsenic contamination can prevent widespread diseases which could otherwise be very costly to treat.
• Spatial mapping of arsenic contamination on the basis of sparse in situ sampling data can be considered a cost-effective and non-structural method of contamination detection at non-sampled locations.
• Conventional methods for spatial mapping of groundwater contamination based on linear geostatistical theory (such as kriging) can, however, have high uncertainty at non-sampled locations.
• The objective of this study is to explore the validity of the hypothesis that ‘non-linearity matters in the spatial mapping of complex patterns of groundwater arsenic contamination’.
Study region, data and mapping tools
• Arsenic data were obtained from the British Geological Survey (BGS) which, in collaboration with local authorities in Bangladesh, surveyed randomly selected wells from 1998 to 2000.
• Arsenic was measured at a single depth close to the screen for each well; the depths varied from 10 to 300 ft below the surface.
• Arsenic measurements in the BGS-DPHE (2001) survey were based on the Atomic Absorption Spectrophotometric (AAS) method, which can be considered a very reliable method for arsenic testing.
• The weights were trained using the backpropagation (BP) algorithm.
• Predictions fall into three classes: Class (1), predicted concentration is less than 10 parts per billion (ppb, or µg/l); Class (2), predicted concentration is between 10 and 50 ppb; and Class (3), predicted concentration is higher than 50 ppb. Note that 10 and 50 ppb are the safe limits prescribed by the World Health Organization (WHO) and the Bangladesh Government, respectively.
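The three classes translate into a simple threshold function. This is a sketch: the function name is hypothetical, and placing concentrations of exactly 10 or 50 ppb in Class 2 is an assumption, since the slides only say "between 10 and 50".

```python
WHO_LIMIT_PPB = 10.0          # safe limit prescribed by the WHO
BANGLADESH_LIMIT_PPB = 50.0   # safe limit prescribed by the Bangladesh Government

def arsenic_class(concentration_ppb):
    """Return 1 (< 10 ppb), 2 (10 to 50 ppb) or 3 (> 50 ppb).

    ASSUMPTION: values exactly at 10 or 50 ppb are assigned to Class 2.
    """
    if concentration_ppb < WHO_LIMIT_PPB:
        return 1
    if concentration_ppb <= BANGLADESH_LIMIT_PPB:
        return 2
    return 3

# Illustrative concentrations (not from the survey).
classes = [arsenic_class(c) for c in (5.0, 10.0, 35.0, 50.0, 120.0)]
```

Working with classes rather than raw concentrations is what allows the later comparison to count correct, falsely safe and falsely unsafe predictions.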
• In this study, the Levenberg–Marquardt (LM) algorithm was used to train the ANN.
• This algorithm is a trust-region-based method with a hyperspherical trust region that has proved effective in searching for the minima.
• To elicit the essential features of the spatial pattern of the arsenic data, and thereby facilitate the modeling equally for each mapping tool, the data were preprocessed.
• Without preprocessing, the spatial pattern of the arsenic data is known to be highly irregular in the southern and south-central regions of Bangladesh.
• Data from the wells were grouped into 5 × 5 km grid cells.
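Grouping wells into grid cells might look like the sketch below. Several things here are assumptions rather than the study's procedure: well positions are taken to be projected coordinates in kilometres, the values within a cell are combined by a simple mean (the slides do not say how they were combined), and the wells listed are invented.

```python
from collections import defaultdict
from math import floor

def grid_cell(x_km, y_km, cell_km=5.0):
    """Index (column, row) of the cell_km x cell_km cell containing the point."""
    return (floor(x_km / cell_km), floor(y_km / cell_km))

def grid_means(wells, cell_km=5.0):
    """Group (x_km, y_km, arsenic_ppb) records by cell and average each group.

    ASSUMPTION: the per-cell statistic is the arithmetic mean.
    """
    groups = defaultdict(list)
    for x, y, arsenic in wells:
        groups[grid_cell(x, y, cell_km)].append(arsenic)
    return {cell: sum(v) / len(v) for cell, v in groups.items()}

# Hypothetical wells: (x in km, y in km, arsenic in ppb).
wells = [(1.0, 2.0, 8.0), (4.5, 0.5, 12.0), (6.0, 2.0, 60.0), (12.0, 11.0, 30.0)]
cells = grid_means(wells)
```

Aggregating to cells smooths out well-to-well irregularity, at the cost of hiding variation at scales below 5 km.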
Comparison of ANN versus ordinary kriging
• To assess the accuracy of each method for spatial interpolation of arsenic concentration at non-sampled locations, the following three metrics were used:
  1. Probability of successful detection: the probability that the predicted class value matches the in-situ class value of a non-sampled well.
  2. Probability of false hope: the probability that the predicted class value is significantly underestimated, so that an unsafe non-sampled well is wrongly predicted as safe.
  3. Probability of false alarm: the probability that the predicted class value is significantly overestimated, so that a safe non-sampled well is wrongly predicted as unsafe.
• One can clearly observe that the ANN, by virtue of its ability to generalize the spatial pattern using a highly nonlinear network, is considerably more accurate than ordinary kriging subject to the same breadth and constraints in data.
• The probability of successful detection is at least 15% higher than that of kriging for the country as a whole.
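The three metrics can be estimated from paired class labels at validation wells. The sketch below is one possible reading, not the study's code: each probability is taken as a fraction of all validation wells, "unsafe" is assumed to mean Class 3 (above the 50 ppb limit), and the labels are invented. The slides do not quantify "significantly", so any unsafe-to-safe or safe-to-unsafe mismatch is counted.

```python
def mapping_metrics(predicted, observed, unsafe_class=3):
    """Estimate detection, false-hope and false-alarm probabilities.

    ASSUMPTION: a well is 'unsafe' when its class is >= unsafe_class.
    """
    n = len(observed)
    pairs = list(zip(predicted, observed))
    hits = sum(p == o for p, o in pairs)
    false_hope = sum(o >= unsafe_class and p < unsafe_class for p, o in pairs)
    false_alarm = sum(o < unsafe_class and p >= unsafe_class for p, o in pairs)
    return {
        "successful_detection": hits / n,
        "false_hope": false_hope / n,
        "false_alarm": false_alarm / n,
    }

# Hypothetical class labels at eight validation wells.
observed = [1, 2, 3, 3, 1, 2, 3, 1]
predicted = [1, 2, 3, 1, 3, 2, 2, 1]
metrics = mapping_metrics(predicted, observed)
```

False hope is the costlier error in this setting, since it labels contaminated water as safe to drink.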
Conclusion
• The mapping (spatial interpolation) is made relatively easier by a technique.
• The study demonstrated that ANNs can also be used to map the complex and seemingly erratic spatial pattern of groundwater contamination with noticeably higher accuracy than kriging, provided that reasonable data preprocessing and exploratory data analysis are performed.
• The challenge now is to find practical ways to leverage the information gained from chaos analysis towards the robust design of ANN-type mapping schemes that can build upon conventional kriging methods.
