Spatial Estimation The Geostatistical Point of View PDF

SPATIAL ESTIMATION: T H E GEOSTATISTICAL POINT O F VIEW
by
SIMON M C F A D Z E A N - F E R G U S O N
B.Sc. Bristol University, United K i n g d o m , 1991
A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F
THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
in
T H E F A C U L T Y O F G R A D U A T E STUDIES
(Institute of A p p l i e d Mathematics)
(Department of Mathematics)
We accept this thesis as conforming

to the required standard
T H E UNIVERSITY OF BRITISH C O L U M B I A
M a r c h 1995
S i m o n McFadzean-Ferguson
In
presenting
this
degree at the
thesis
in partial
University of
fulfilment of
the
requirements
for
an advanced
British Columbia, I agree that the Library shall make it
freely available for reference and study. I further agree that permission for extensive
copying of this thesis for
department
or
by
his
or
scholarly purposes may be granted by the head of

her
representatives.
It
is
understood
that
copying
my
or
publication of this thesis for financial gain shall not be allowed without my written
permission.
Department of
The University of British Columbia
Vancouver, Canada
-Ik
Date
DE-6 (2/88)
\ 2.
'vSfrML '
Abstract
Geostatistics involves the statistical estimation of erratic surfaces, similar to those found
in geology, using sample data. It has been my experience that there are few texts i n geostatistics written for people who are new to the subject, and who have not been immersed
in it since its inception i n the 1960's. To prevent other people becoming confused by the
changing notation, and unspoken assumptions, I provide an overview of this subject, w i t h
the a i m of providing a clearer understanding of the concepts involved, the assumptions
made, and the motivation behind each type of estimator. I then concentrate on the more
general form of estimation assuming a nonhomogeneous trend, called
Universal Kriging.
I explain i n detail how this estimator can be found i n an accurate and computationally
efficient way. Using the information gained from robustness studies of this estimator, I
then attempt to apply it to real surfaces, for different methods of covariance estimation
and trend orders.
ii
Table of Contents
Abstract
ii
Table of Contents
iii
List of Figures
vi
List of Tables
vii
Acknowledgements
viii
General Notation
ix
Introduction
T h e E v o l u t i o n of Geostatistics
2.1
Derivation of Matheron's Ordinary K r i g i n g
2.2
T h e Histogram
13
2.3
(Ordinary) Lognormal K r i g i n g (LK)
16
2.4
Nonlinear estimators
18
(OK)
estimator, ZOK(^)
2.4.1
Disjunctive K r i g i n g (DK)
19
2.4.2
Indicator K r i g i n g (IK)
25
K r i g i n g w i t h a nonhomogeneous trend
3.1
27
O r i g i n a l derivation of Matheron's Universal K r i g i n g estimator, Z {x)

UK
3.1.1
Discussion
30
34
iii
3.2
3.3
R i p l e y ' s form of the Universal K r i g i n g estimator
36
3.2.1
Cholesky factorisation
36
3.2.2
Q R Decomposition
36
3.2.3
Singular Value Decomposition ( S V D )
37
3.2.4
Step 1: E s t i m a t i n g the trend
37
3.2.5
Step 2: Modelling the residuals, R(x)
44
T h e equivalency of the two forms of estimation
46
3.3.1
F i n d i n g $ explicitly
46
3.3.2
F i n d i n g R K(*)
46
explicitly
3.4
C a l c u l a t i n g o"g(x), the error variance of ZUK(X)
47
3.5
E s t i m a t i o n of the covariance function
48
3.5.1
Intrinsic R a n d o m Functions of Order p (IRF-p)
51
3.5.2
Other forms of unbiased covariance estimation
57
59
Robustness of the Universal Kriging estimator, Z (x.)

UK
4.1
Step 1: T h e trend component functions
4.2
Step 2: Nonparametric estimation of the experimental (co)variogram
4.3
Step 3: F i t t i n g an appropriate parametric model, C ( x , y ; 6)
4.4
Step 4: F i n a l estimation using C ( x , y ; 0 ) and the sample data z

4.4.1
60
. .
63
64
67
Robust K r i g i n g
68
A Numerical Application
71
5.1
Reasons for choosing the biased form of (co)variogram estimation
5.2
Detailed description of (co)variogram estimation
74
5.3
Results
79
5.4
Detecting the optimal trend order
87
iv
. . . .
72
5.5
Conclusions
90
94
Conclusion
97
Bibliography
List of Figures
1.1
Idealised (co)variogram under second order stationarity assumption
2.2
Illustration of how distribution of Z(x)
is transformed to N(0,1)
. . .
using
CDF's
23
5.3
A n experimental semivariogram
75
5.4
Comparison of semivariogram models
76
5.5
A comparison between L S and Cressie's weighted L S method
79
5.6
H o w weighted L S alters the sill of the spherical fit
80
5.7
Plots of residuals versus fitted values for the A D and S D data sets . . . .
82
5.8
Histograms and normal q-q plots for the A D and S D ^ residuals
83
5.9
Dimension-.methodinteraction plots for both the A D and S D residuals .
85
5.10 Density and normal q-q plots for L S - W L S and L S - L I N differences for
stationary trend surface
87
5.11 M e a n and median factor plots for the A D data set
92
5.12 N o r m a l q-q plots for the cross validation data
93
vi
List of Tables
5.1
A N O V A tables for A D and S D ^ data sets
5.2
M e a n and standard deviations of A D data for the different m e t h o d / t r e n d

combinations
84
85
vii
Acknowledgements
First and foremost I would like to express my gratitude to D r . R u b e n Zamar for taking a
particular interest i n my work i n his sampling theory course, and for offering to supervise
me for the duration of my Masters thesis. I thank h i m for his continuous support and
guidance during the course of this research.
I am also extremely grateful to Professor D o n L u d w i g for serving on my advisory committee, for carefully reading drafts of this thesis, and for his many helpful comments
during various stages of this work. I also thank h i m for his generous support as my
course advisor.
viii
General Notation
Roughly i n order of appearance:

n
the number of sample values
a general (usually unsampled) point i n !!ft
Xj
the sample points, i = 1 . . . n
Z(x)
the theoretical regionalised variable for the ore grade at the point x
z(x)
the ore grade measured at the point x , seen as a realisation of the variable Z ( x )
the (n x 1) vector of ore grade random variables at the sample points, x i . . . x
the (n x 1) vector of ore grades measured at the sample points, x i . . . x
Z(x)
an estimator of Z{x)
(x)
the actual estimate of z{x)
LI(X.)
the expected value of Z{x) at point x
the expected value of Z(x) when it is assumed to be homogeneous
the (n x 1) vector of residual variables at the sample points, X i . . . x
C ( x , y) shortened notation for C o v ( Z ( x ) ,

C(h)
Z(y))
notation for C o v ( x , y) when the covariance function is assumed to be

homogeneous, where h = x y
C(h)
notation for C o v ( x , y) when the covariance function is assumed to be

homogeneous and isotropic, where h = \x y\
7(x, y) the semivariogram for Z(x) with same notation changes as for C ( x , y)
cr^(x)
the variance of Z(x), which depends on x
<7g(x)
the variance of the error, Z ( x ) Z(x), which depends on x

ix
A,-(x)
the kriging estimator weights, i = 1 . . . n , which depend on x
A(x)
the (n x 1) vector of kriging weights A (x)
<p
a Lagrange multiplier
<p
the (n x 1) vector of Lagrange multipliers, fa,..., (f>
//(x)
the trend component functions, I = 0 ... L
f(x)
the ((L + 1) x 1) vector composed of the trend component functions
the (n x (L + 1)) m a t r i x of trend component functions evaluated at the sample
points
p
the degree of the polynomial trend
(3
the ((L + 1) x 1) vector of trend constants, /?/, such that: u(x) = / 3 f (x)
the (n x n) m a t r i x of covariances between the ore grade variables, Z(x;), at the
sample points
KQ
an i n i t i a l estimate of K
k(x)
the (n x 1) vector of covariances between the ore grade variable Z(x) at x and
those at the sample points, Z ( x i ) . . . Z ( x )
n
m(x)
the generalised least squares estimator of the trend: m(x) = (3 f(x)
the Cholesky decomposition lower triangular m a t r i x of K such that: K
the uncorrelated sample data transformed from Z
the ( n x ( I + 1)) transformed m a t r i x of F such that: G =
k^x)
the ( n x l ) transformed vector of k(x) such that: k^x) = M k ( x )
the uncorrelated sample residuals transformed from e such that: = A f e
such that: Y
= M
MM
M~ F
l
_ 1
_ 1
i?
R
the Q R decomposition of G where Q is (n x n), i? is (n x (X + 1)) and G =

an ((1/ + 1) x (L + 1)) upper triangular m a t r i x : R =
V
i?(x)
the residual estimator : i?(x) = e(x) = Z(x) p(x)
the actual residual estimate at the sample point x*, seen as a realisation of
the variable i?(x,) such that: r(x,) = Z(XJ) m ( x j )
the (n x 1) vector of estimated residuals r ( x j ) at the sample points x i . . . x,
the covariance parameters for a chosen covariance model C ( x , y ; 6)
xi
Chapter 1
Introduction
Geostatistics was initially created to provide better estimates of ore reserves. To understand the concepts of geostatistics, it is necessary to look more closely at the types of
surfaces of the ore deposits involved.
T h e quality of ore, or
ore grade, can be distributed i n a variety of ways depending upon
the type of ore deposit. Bedded sedimentary ore deposits can have quite regular spatial
ore grade distributions. For these deposits, the ore reserve can be adequately estimated
using conventional techniques such as the fitting of a deterministic function by least
squares.
Alternatively, deposits of valuable minerals and precious metals such as the
gold mines i n South Africa can have extremely erratic spatial ore grade distributions,
even w i t h i n a very small area.
A l t h o u g h these ore grade distributions are continuous
surfaces, they are so erratic i n their spatial behaviour that points on the surface that are
relatively far apart appear to be statistically independent. T h i s idea of ore grade being
somewhere between a deterministic function and a random variable led to the concept
of the
regionalised variable. A s the types of surfaces that are found i n geology could not
be realistically fitted by any form of deterministic function without having to solve for a
huge number of parameters, geologists were forced to look at statistical models for these
surfaces.
A s for traditional statistical models, geostatistics considers surface values .z(x) and
z(y),
Chapter 1. Introduction
from different locations x and

and Z(y).
y, to be realisations of the surface random variables, Z(x)
A basic statistical model assumes that, if chosen randomly, Z(x)
and
Z(y)
are independent and identically distributed random variables with:
E(Z(x))
p, V x
(1.1)
Var(Z(x))
o\
(1.2)
Var(Z(x) - Z(y)) =
Vx
V a r ( Z ( x ) ) + Var(Z(y)) = 2a .
(1.3)
Equations (1.1) and (1.2) state that the surface variables at a l l points have the same
expected value and variance, and are commonly referred to as the
stationarity under
translation or homogeneity assumptions i n both the mean and the variance. These assumptions allow the estimation of statistical properties which can be assumed representative of the whole surface or population over which stationarity is assumed, rather than
of a particular location. Under these assumptions we can view each sample value as an
outcome of the
same variable, allowing us to find its mean and variance.
Therefore, under this basic model, the geological surface z ( x ) is modelled as a realisation
of an infinite set of independent (normal) random variables, Z(x),
x.
one for each point
W h e n talking about this set of random variables, Z(x) is referred to as a
random
function, as it assigns a random variable to each point x . Even though the sample values
are used to estimate the common mean,
\x, and variance, a\, the positional information
of the sampled points is not used i n estimating the surface value at a specifically placed
unsampled point. Bearing i n m i n d that the geological surface z ( x ) that we are t r y i n g to
model as a random function is actually
non-random and continuous, then for any points
x and y close together, it is unrealistic to treat variables Z ( x ) and Z(y)

independent.
as entirely
Instead, the closer x and y are to each other, the more likely Z(x)
and
Z(y)
w i l l be similar i n value. Therefore, a more realistic statistical model is to assume
that however randomly the points x and y are chosen, they are not independent, but
spatially correlated.
This additional information is translated into the geostatistical
theory i n the following two ways:
E(Z(x)) =
Vx
V a r ( Z ( x ) - Z{y))
intrinsic stationarity (IS)
(1-4)
= 2 (h) J
E ( Z ( x ) ) = /* V x
> second order stationarity (SOS)
(1-5)
Cov(Z(x),Z(y)) = C(h) J
where h = x y .
The function, 27(h), is called the
variogram. For a particular direction, this is usually
an increasing function of | h | . The idea behind this is that the closer the points x and y
are, the more alike the variables will be, and so the smaller w i l l be the variance of their
difference.
Thus, under this geostatistical theory, the geological surface, z(x),
realisation of an infinite set of random variables, Z(x),
is modelled as a
which are independent over long
distances, and are correlated over short distances, dependent on their variogram 27(h) or
covariance function C ( h ) , otherwise known as the
is referred to as a
covariogram. Under this model, Z(x)
regionalised variable.
C o m p a r i n g assumptions (1.4) and (1.5), S O S is the stronger of the two as it i m p l i c i t l y

implies stationarity i n the variance, whereas IS does not.
It can be shown that the
class of IS models strictly contains those satisfying S O S . In practise, I have found that
this distinction is lost because one must assume stationarity in the variance i n order to
Chapter 1.
Introduction
estimate either the variogram or the covariance function. Under this unifying assumption,
the variogram and covariance function can be directly related i n the following way:
2 (h)
7
( Z ( x ) - Z{y)) = V a r ( Z ( x ) - Z ( y ) )
Var(Z(x)) +
2a -2C(h).
Var(Z(y))-2Cov(Z(x),Z(y))
(1.6)
F r o m now on, I use the abbreviation (co)variogram to refer to both the variogram and
covariance function. In geostatistics, people refer to the semivariance or semivariogram,
7(h), rather than the variogram where
(h) = a
2
2
-C(h) = C(0)-C(h),
(1.7)
A s the covariance function is so central to spatial estimation i n geostatistics, estimation

of this function is key to any estimator's performance.
Unless there is information to
suggest that the covariance function should indicate more correlation i n one direction
than another, i t may also be assumed to be stationary under rotation, or isotropic, to
make estimation of the covariance function easier:
C ( h ) -> C{h)
where h = | h |
Having outlined the philosophy and assumptions behind basic geostatistical theory, I now
concentrate on more specific developments within this subject of spatial estimation.
In Chapter 2, I describe the evolution of ideas i n mainstream geostatistics. Throughout
this chapter, the random function Z(x) is assumed to have a homogeneous trend. Firstly,
Figure 1.1: A n idealised (co)variogram under the second order stationarity assumption.
C(h) is the covariogram and 'y(h) is the variogram. Sample values whose
inter-location distance is greater than a, the range of influence, are considered to be uncorrelated and thus independent.
I derive Matheron's original
linear estimator, the Ordinary K r i g i n g estimator
B y considering the optimality of
ZOK(X),
ZOK{*)-
we become interested i n the underlying dis-
tribution of the sample variables which can be estimated by forming a histogram of the
data.
Methods of forming this histogram from correlated data are discussed.
Under
the assumption that the data is jointly gaussian, the optimal estimator can be shown to
be
linear. U p o n the failure of this assumption, I then outline the alternative methods
used, leading to the onset of
nonlinear estimators. I summarise a selection of nonlinear
estimators, outlining their differing motivations and assumptions, and pointing out their
practical failings.
In Chapter 3, I then outline the attempts that have been made i n geostatistics to estimate random functions Z(x) w i t h a
nonhomogeneous trend. I describe i n detail one
of the early geostatistical estimators, the
Universal Kriging estimator
ZUK^X)-
Pointing
out the weaknesses i n applying the original form of the estimator, I describe i n detail an
alternative derivation by Ripley (1981) that is practically superior, and computationally

more efficient. T h e two forms of estimation are shown to be equivalent. I discuss the
problem of estimating the covariance function unbiasedly from nonhomogeneous sample
variables, and how this led to the use of
tion of the
generalised increments and the unbiased estima-
generalised covariance function. Finally, I outline other proposed methods of
unbiased estimation of the covariance function.

In Chapter 4, I outline the studies that have been done on the robustness of Universal
K r i g i n g to outliers i n the data, and to errors i n the estimation process, such as misspecification of the trend order, inappropriate choice of (co)variogram model and misspecification of the model parameters. I also describe various methods of detecting and editing
these outliers.
In Chapter 5, I attempt to apply the improved form of the Universal K r i g i n g estimator
derived i n Chapter 3 to real surface data of a mountain range, assumed to be representative of an ore surface, using the practical information gained from Chapter 4. I
discuss the various methods by which I estimate the (co)variogram of the surface from
the data. I then divide the surface area into smaller areas and estimate a selection of
these smaller surfaces for different combinations of (co)variogram estimation method and
trend order. I analyse the results i n detail, and come to some general conclusions. Lastly
I discuss whether the optimal trend order could have been identified for a particular
sample configuration and surface.
Chapter 2
T h e E v o l u t i o n of Geostatistics
Even though the theory of regionalised variables and the original kriging estimators were
developed i n the early to late 60's, much of this work was written i n French and it was not
until the publications i n English of Michel David's
and A n d r e JournePs and C . Huijbregts'
Geostatistical Ore Reserve Estimation
Mining Geostatistics i n the late 70's that use
of these geostatistical ideas became widespread. Since this time, numerous papers and
books have been published on a l l aspects of geostatistics.
It is impossible to summarise adequately this wealth of ideas, recent estimators, improved
versions of old estimators, and differences of opinion, i n a single chapter. Therefore I outline the 'backbone' of modern geostatistics, highlighting the motivations behind each type
of estimator. I begin by deriving Matheron's
linear Ordinary K r i g i n g estimator Z o ^ ( x )
which was originally thought to be distribution-free.
However, when considering the
optimality of an estimator, the underlying distribution of the sample variables becomes

vitally important, and can be estimated by forming a histogram of the data. T h e problem
of using correlated data to form a histogram is discussed. Depending on whether this
distribution can be assumed to be jointly gaussian or jointly lognormal, or whether it can
be transformed to an isofactorial model, a variety of estimators have been developed, and
applied w i t h varying success. I describe the theory behind these estimators i n enough
detail to provide the reader w i t h a clear understanding of the concepts and assumptions
involved.
For a l l the estimators discussed i n this chapter, the random function
Z(x),
Chapter 2. The Evolution of Geostatistics
from which the surface originated, is assumed to have a homogeneous trend.

K r i g i n g is a statistical form of estimation developed by Georges M a t h e r o n i n the 1960's
for use i n the mining industry as a method for estimating ore reserves. It is named after
D . G . Krige, a South African mining engineer, who was a forerunner i n using statistical
techniques i n ore estimation. It is this same G . Matheron that developed the Theory of
Regionalised Variables, introduced i n the previous chapter, which together w i t h kriging,
led to an explosion of work i n this area. T h i s work, and other related subjects i n applied
statistics, has become collectively known as Geostatistics.
T h e original kriging estimators, Z(x),
are estimators of the regionalised variable
Z(x)
at an unsampled point x , and are chosen as a weighted sum of the regionalised variables
Z ( x i ) , . . . , Z(x. ) at the sample points, x i , . . . , x . These sampled points do not have to
n
be of any specific configuration. The weights are chosen so that Z(x) is unbiased, and the
variance of the difference between Z(x)
and -2T(x) is minimised. The kriging estimator,
Z ( x ) , is therefore a B L U E estimator (Best Linear Unbiased Estimator) i n that no other

linear combination of the sample variables w i t h the same model assumptions can form
an estimator w i t h a smaller variance about the true surface value at a general point x .
A l t h o u g h i n the beginning the term kriging specifically referred to this particular method
of unbiased m i n i m u m error variance estimation, it now refers more to the collection of
geostatistical estimators, both linear and non-linear, that have developed from this idea.
Before going any further, I shall outline the basic assumptions made by a l l the kriging
estimators discussed i n this chapter.
Basic kriging assumptions

1. The unknown ore grade, z(x), and the n sample values, z ( x i ) , . . . , . z ( x ) , are realin
sations of the respective regionalised variables, Z ( x ) , and Z ( x i ) , . . . , Z ( x ) . T h e y

contain no measurement or positional errors.
2. The covariance C o v ( Z ( x ) , Z(y))
of regionalised variables Z(x)
and Z ( y ) , is known
for any two points x and y i n the area over which we want to estimate
z(x).
3. T h e non-negative definite matrix K of covariances between the ore grade variables

at the sample points, is positive definite, where
C(x!,Xi)
....
C(x!,X )
n
(2.8)
\ C(x ,xi)
n
and C ( x , y ) = C o v ( Z ( x ) , Z(y))
4. The mean of Z(x)
....
C(x ,x ) J
n
for ease of notation.
is the same for all points x i n the area over which we want to
estimate -z(x). In geostatistical terms, the trend is homogeneous. T h i s assumption

w i l l be relaxed i n Chapter 3.
2.1
D e r i v a t i o n o f M a t h e r o n ' s O r d i n a r y K r i g i n g (OK) e s t i m a t o r ,
Matheron considers the original Ordinary K r i g i n g (OK) estimator
ZOK(*)
Z K(^)
0
of the surface
variable Z(x.) at point x as a linear function of the surface variables Z ( i ) > > Z(yin) at
x
the sample points:

Z (x)
OK
= >(x)Z(xO = A ( x ) Z ,
r
(2.9)
10
where
( Z(
X L
( Ax(x)
) ^
A(x) =
Z
n
A (x)
2
(2.10)
and the weights, A;(x), are fixed but unknown.

Z(xi),...,
For
T h e mean of each variable Z ( x ) ,
Z ( x ) is unknown and denoted by /i.

n
ZOK(X)
to be unbiased, we need the expected value of the
error, the difference
between the ore grade z ( x ) that would be measured at x and its estimate
Z K{*),
0
to be
zero:
E(Z(x) - Z {*))
OK
(Z(x)) - (Z *(x))
G
1=1
(i - E M * ) ) = o.
(2.11)
Therefore, for equation (2.11) to be true, independent of the value of the homogeneous
mean LL, we require:
] P A j ( x ) = 1,
otherwise written as
A ( x ) l = 1.
T
i=i
T h e error variance, cr (x), is defined as:

2
<r (x)
2
Var(Z(x) - Z ^ ( x ) )
Var(Z(x)) + V a r ( Z
O K
( x ) ) - 2 C o v ( Z ( x ) , Z0AT(X))
(2.12)
Chapter 2.
The Evolution
of
Geostatistics
11
Var(Z(x)) + V a r ( A ( x ) Z ) - 2Cov(Z(x), A ( x ) Z )
r
o (x) + A ( x ) i T A ( x ) - 2 A ( x f k ( x ) ,
2
(2.13)
where
C(x,
X l
) ^
(2.14)
k(x)
V G(x,x) j
n
We now minimise the error variance
crf(x) subject
to the unbiasedness constraint i n
equation (2.12) by introducing a Lagrange multiplier 0. Therefore we minimise:
<r (x) + 2 ( l - A ( x ) l ) 0
<x (x) + X(x) KX(x)
- 2 A ( x f k ( x ) + 2(1 - A(x) l)</>,

r
(2.15)
w i t h respect to A ( x ) and 4>. The factor of two i n the Lagrange component of (2.15) is
only for convenience i n the working, and has no effect on the final solution.
A s H is quadratic i n A ( x ) and K is non-negative definite, when we differentiate with
respect to A ( x ) , the stationary point solution w i l l necessarily be at a m i n i m u m :
<9A(x)
-
-- 2 K A ( x ) - 2k(x) - 201 = 0
=
(1-A(xfl) = 0
which imply respectively that
KX(x)
k ( x ) + 01
(2-16)
A(x) l
1.
(2.17)
12
These are referred to as the kriging equations. B y first solving for c/>, and using the
assumption that K is positive definite, it can be shown that a unique solution for A(x)
exists.
T h i s is described i n full detail i n Section 3.1 for Universal K r i g i n g , of which
O r d i n a r y K r i g i n g is a special case. Therefore, by substituting the solution for A(x) into

equations (2.9) and (2.13), we have the original OK estimator and error variance as
derived by Matheron.
If the homogeneous mean ji is somehow known, then a slightly different kriging estimator
is used:
n
z {y) = v- + 5 > ( x ) ( Z ( x i ) ~ A*),
(2-18)
OK
i=l
which is already unbiased. Therefore an unbiasedness constraint is not required, and the
kriging equation for A(x) is simply:
(2.19)
tfA(x) =k(x).
A s mentioned, ZOK(X)
and ZOK(X)
are B L U E estimators and were originally consid-
ered to be distribution-free, but as soon as one considers the optimality of an estimator,

the assumed distribution of the data is all important.
Z(x)
A s s u m i n g E(Z(x) )
2
< oo, (for
to have a finite variance), the optimal estimator of Z(x) by any measurable func-
tion g(Z(xi),...,
Z ( x ) ) , whether linear or nonlinear, and irrespective of the underlying

n
distribution of Z{x), is the conditional expectation (CE) estimator:
Z (x)
CE
= E(Z(x)|Z(
=
/
J
z(x)
X l
),...,Z(x ))
f e ) , Z ( x ) . - , Z ( x ) {Z{*)X*1),
1
(2.20)
*W)
/z(xi),...,Z(x )(2(Xi),...,2(x ))
n
d z ( x )
Chapter 2.
CE
The Evolution of Geostatistics
13
is optimal i n that it minimises the error variance over all estimators which can
be computed from the sample values.
In order to calculate this estimator, the joint
distribution of ( Z ( x ) , Z ( x x ) , . . . , Z ( x ) ) would be required, which is rarely known i n

n
practise. For this reason, we are forced to restrict our choice of estimators to those that
require information that we know, or can at least estimate from the known data values
z(Xi), . . . , 2 ( x ) .
n
If ( Z ( x ) , Z ( x i ) , . . . , Z(x ))
n
has a joint gaussian distribution, then it can be easily shown
that CE is a linear function of the sample variables Z(x), Z ( x i ) , . . . , Z ( x ) , and more

n
importantly, that it is identical to ZOK^)
(Rendu 1979). Therefore, under this distribu-
tional assumption, only linear functions of the sample variables need be considered. O f
course, if pi is unknown, then CE cannot be found exactly. In this case, Zo {x)
K
be used, which is the best (linear) unbiased approximation to
should
CE.
The first problem is how to verify this gaussian assumption using the correlated sample
values. U p to now this has been done using the simplest of tools: the histogram.
2.2
The Histogram
It is an impossible task to verify that (Z(x), Z(xi),...,
Z(~x )) has a joint gaussian disn
tribution from only the sample values ( z ( ) , . . . , ;z(x )), seen as a single realisation
X l
of ( Z ( x i ) , . . . , Z(x )).
F r o m only one realisation of a distribution, there is no form
of inference possible. Therefore, we can only verify that the marginal distributions of
( Z ( x ) , Z ( x i ) , . . . , Z(x ))
n
are gaussian, assuming them to be identically distributed. T h i s
can be done by forming a histogram of the sample values.

A l t h o u g h a histogram is easy to produce, finding the correct histogram becomes very
14
difficult due to the correlation i n the sample values. In order for it to represent
true distribution of Z(x),
the
the sample values must be seen as realisations of independent
random variables Z(x.i), i
= l,...,n
identically distributed to Z(x).
T h e problem is
that, i n reality, samples are far from independent. Miners drill where they t h i n k the ore
grade is high, and then cluster their drilling about this high grade region. So treating
these samples as independent realisations of Z(x)
being over-represented, leading to a
would result i n the high ore grades
biased histogram and an overestimated average ore
grade.
A very simple technique called
declustering involves dividing the area into a regular grid
and choosing one sample at random from each grid square.
Using this reduced set of
sample values to form the histogram has proved to much reduce this bias, and is far more
practical than the other methods devised to resolve this problem (David, 1988). One
drawback to this technique is that, dependent on the grid size, different histograms may
be found. T h i s leads to confusion over which histogram to use, demanding a subjective
benefit function over which to optimise.
M y objection to this technique is that it attempts to remove the bias by choosing the
sample points randomly within each square. This does not agree w i t h the theory. A s
described i n Chapter 1, geostatistics is based on the premise that however randomly
you choose your samples, they will not be independent but correlated as dictated by the
covariance function. For this reason we must ensure independence by m a k i n g sure that
the samples used i n the histogram are further apart than the
range of influence, a, of the
covariance function. Using the declustering technique, we could still end up w i t h samples
close together despite being i n separate squares, which would then bias the histogram.
A method more i n line w i t h the theory would be to assume a gaussian distribution,
15
estimate the (co)variogram from the data, and from this, estimate a. T h e n by o m i t t i n g
the samples that are within distance a of each other, we could form a histogram and
see if our original assumption is valid. The (co)variogram estimators, discussed later i n
Chapter 4, are again only optimal under the gaussian assumption.
A n o t h e r source of bias i n the histogram is the occurrence of
outliers. A g a i n , numerous
solutions have been put forward, but an even more basic problem still lies i n defining an
outlier. A common method to correct this bias is to decide on a particular model for
the distribution from the histogram and then correct the sample values accordingly to fit
the model. T h e problem w i t h this approach is that i n practise, relatively small samples
are taken, and so the sample values could very well be accurate but just not numerous
enough to provide a good representation of the distribution. In this situation it would be
very unwise to alter the sample values, and thus risk biasing the results, so as to fit an
estimated histogram. Instead, estimation techniques that are robust to outliers should
be used.
It is a well known fact that linear estimators are extremely sensitive to outliers. For
this reason, histogram estimation, and outlier detection i n particular, is an essential
and skilled job and takes great experience to be done well. If the corrected histogram is
skewed from normal, it could indicate either the existence of a nonhomogeneous trend, or
a different distribution. The first case w i l l be discussed in Chapter 3. In the second case,
if the distribution is
lognormal, then a transformed version of the OK estimator, called
Lognormal Kriging, can be used due to its close connection to the gaussian distribution.
Otherwise, CE
estimators.
is not necessarily linear or loglinear and we must consider nonlinear
Chapter
2.3
2. The Evolution
of
Geostatistics
16
(Ordinary) Lognormal Kriging (LK)
If Z ( x ) , Z ( x i ) , . . . , Z ( x ) have a joint lognormal distribution then \n(CE)
is a linear
function of the transformed sample variables l n ( Z ( x ) ) , m ( Z ( x i ) ) , . . . , l n ( Z ( x ) ) . A g a i n ,

n
this can easily be shown using multivariate gaussian theory (Rendu 1979).
T h i s estimator was developed by A n d r e Journel, and a complete derivation can be found

in (Journel 1980). W e first perform the transformation:
F ( x ) = ln(Z(x)),
(2.21)
such that the new variables F ( x ) , F ( x i ) , . . . , Y(x )

n
use the O r d i n a r y K r i g i n g estimator YOK(^)
are normally distributed. W e then
of ^ ( x ) , using the transformed data set
l n ( z ( x i ) ) , . . . , l n ( 2 ; ( x ) , assuming the homogeneous mean \n(p) of Y(x) to be known. O f

n
course this must be transformed back i n order to find the estimator for Z(x):
Z (x)
LK
= De "&\
(2.22)
where D is a constant derived i n (Journel 1980) to ensure that the resulting estimator is
unbiased. In deriving D, Journel uses the fact that YOK(X) is normally distributed and
so e
Y
^ is lognormally distributed.
A g a i n it can be easily shown that under these distributional assumptions, ZLK(X)

equal to the CE estimator and is thus optimal.
then ZLJ<-(X), the LK estimator using Y K(*)
0
is
A g a i n , i f the mean p is unknown,
instead of Y K(^-),
0
is the best (log-linear)
unbiased approximation to C E .
T h e use of LK has been somewhat controversial, due to its lack of robustness compared
to n o r m a l kriging. OK is only dependent on the relative values of the covariance function
Chapter 2.
The Evolution
of
Geostatistics
17
C(h) at different h, i.e. its shape, whereas LK is very sensitive to local variations i n the
value of C(h) of the transformed data. This can be seen by looking at the OK kriging
equations (2.16) and (2.17) which can be written as
EAj(x)C(xj,Xi) = C(x,xO
i=i
i = l,...,n,
>i(x) = l.
(2.23)
(2-24)
B y substituting the covariance function C ( x , y ) for a C ( x , y) + P i n equation (2.23) where

a and j3 are constants, and then expanding, we get
n
a ^ j ( ) ( j . i ) + / ? E i ( ) = a C ( x , X i ) + /? + <>
/
x
z= l,...,n.
(2.25)
Therefore, by using equation (2.24), we can cancel P from both sides of the equation.
T h e n by dividing throughout by a, equation (2.25) becomes
d>
A ( x ) C ( x , x ) = C(x,Xi) + j
j=i
i = l,...,n.
01
The constant 0 i n equation (2.23) only has the effect of equating

C ( x , j ) } for each i.
X
of Aj(x).
(2.26)
{YIj=\ A J ( X ) C ( X J , x,)
Therefore, dividing <f> by a has no effect on the final solutions
O f course, a can not be zero, as there would be no solution to the kriging
equations. Therefore, the covariance function can not be horizontal. So, it is clear that
any constant nonzero factor or constant term i n the covariance function is filtered out
by the kriging equations, and thus has no effect on the kriging parameters A ( x ) and the
final estimator
ZOK{*)-
However, for LK, the unbiasedness constant D is dependent upon the exact covariance
function, and thus the estimator loses this filtering property. This makes LK far more
18
dependent on the estimation of the covariance function, which is already a very difficult
task. It is demonstrated very clearly i n (David 1988) using a couple of examples, that by
slightly altering the covariance function you can get a great variety of estimates. For this
reason, LK is a very risky estimator to use unless the practitioner is very experienced
and some form of cross-validation technique is used, such as David's method ( K i m and
Knudsen, 1978).
This method involves the simple technique of removing one sample
value at a time and using the other sample values and the proposed covariance function
to estimate it. B y repeating this for every sample value, we get a collection of estimation
errors which we can compare w i t h the model predictions to see if the proposed covariance
is appropriate.
2.4
Nonlinear estimators
T h e advantage of using
linear estimators as compared w i t h nonlinear estimators, is that
they require fewer (although stronger) assumptions about Z(x),

faster and easier to use, and usually provide unique solutions.
are computationally
For these reasons the
original linear kriging estimators such as ZOK(X) are still i n great use i n the mining
industry, despite the recent development of nonlinear estimators. For the same reasons,
it is also worthwhile to use linear estimators, even when the variable Z ( x ) does not
depend on the data i n a linear way.
In this case, it is advisable to divide the area
into subsections, where the kriging assumptions of a homogeneous trend and a jointly
gaussian or lognormal sample distribution are more likely to be satisfied, and use a
different estimator for each subsection. A s linear estimators are extremely sensitive to
outliers, dividing the estimator i n this way would also limit the effect of the outliers to
the sections i n which they lie. O f course, the extent of this division would be limited by
the number of samples, considering that a covariance function and trend function must
19
be estimated for each section.

W h e n Z ( x ) , Z ( x i ) , . . . , Z ( x ) are neither jointly gaussian nor jointly lognormal, then
CE is neither linear nor log-linear. Therefore, i n this case, when the number of samples
does not allow ample division of the area as described above, we must look to
nonlinear
estimators of Z(~x).
T h e first nonlinear estimator to be developed was Disjunctive Kriging. Devised by M a t h eron (1976), it is still one of the most popular non-linear estimators, at least i n research
if not i n practise, and for this reason I shall describe it i n more detail.
2.4.1
Disjunctive Kriging (DK)
Matheron's motivation behind Disjunctive K r i g i n g is to increase the space of possible

estimators of Z(x), rather than restricting them to linear ones, but i n such a way that
the computations involved do not require information that could not be estimated from
the data.
T h e a i m is to find an estimator somewhere between OK and CE, again
assuming a parametric distribution for Z(x).

Therefore, instead of choosing a linear function of the sample variables as for the O r d i n a r y
K r i g i n g estimator, Matheron estimates the variable Z{x) by a sum of n Immeasurable
functions
each a function of one sample variable Z ( x ; ) :

n
(2.27)
i=i
which satisfy the following unbiasedness constraints:
E(Z (x)
DK
| Z(x,-)) =
E(Z(x) | Z(x.j))
j = 1,... ,n.
(2.28)
20
These unbiasedness constraints of ZDK{~X) are chosen to be as close to the conditionally

unbiased property, E(Z K(X)\Z(X-I)
D
Z ( x ) ) = CE, without destroying the

n
disjunctive
nature of the estimator. The reason for this will become clear later on i n this section.
T h e functions fi must be Immeasurable for E[fi (Z(x.i)]
< oo for each x ; , and thus for
ZDK{*)
to have a finite variance. A s the system of equations defined i n (2.28) cannot be
solved i n general, more assumptions must be made to simplify the problem. F r o m now
on, for ease of notation, x shall be denoted by Xo.
If we can assume that Zfa),
Z ( x i ) = to(Y(xi)),
i = 0 , . . . , n, are transformations from a process such that
w i t h to assumed known and Immeasurable, and the process F ( x ; ) ,
denoted by Yi i = 0 , . . . , n, forms an isofactorial model such that:

1. Yi, i = 0 , . . . , n, have the same distribution function F
2. T h e joint distribution functions Fij for all pairs of variables (Yi, Yj) can be factorised
such that:
= (f^TSWWtyi)^))
Fi^ViAVj)
\fc=0
F(d )F(d )
yi
yj
Vi, j
(2.29)
where v , k 0,1,..., are orthogonal functions i n that Vz, E[ri (Yi)ri (Yi)]
k
kl
k2
=0
for k ^ k , and T (i, j) = E [rik( i)Vk(Yj)],

Y
then the system of equations i n (2.28) can be greatly simplified, m a k i n g it possible to

estimate the functions fi, i = 0 , . . . , n, or more precisely hi, i = 0 , . . . , n, where
hi(-) = fM-)).
(2.30)
For a general isofactorial model, understanding of the following theory can become unnecessarily confused by notation.
Therefore, as only one isofactorial model seems to
be used i n most practical applications of DK,
21
I shall derive the DK
estimator for this
specific model. The following steps are very similar to those i n the general case:
If the sample variables Zfc)
are transformations from the random variables Yi whose
distribution function F is iV(0,1), and whose joint distribution functions Fij are bivariate
normal w i t h marginal means 0, marginal variances 1, and correlation coefficient pij =
E(YiYj) (which i n this case is also the covariance), then the Yi can be shown to form an
isofactorial model w i t h the Tjk(Yi) being the normalised Hermite polynomials:
V k ( Y i )
=k\
> >~>
w h e r e
;'
( 2
'
3 1 )
These polynomials form an orthogonal basis of Immeasurable functions of a single JV(0,1)

distributed random variable. Therefore if we rewrite (2.27) i n terms oi hi, i 0 , . . . , n:
Z (x)
DK
J2h(Z(xi))
i=l
= j^hiiYi),
= JZhîXi))
i=l
(2.32)
i=l
then as Yi, % = 0 , . . . , n, are distributed N(0,1),
and the functions hi, i = 0 , . . . , n,
are Immeasurable, each element in the sum can be written as an expansion of Hermite
polynomials i n the following way:
oo
h (Yi) = W Y )
fc=0
t
Similarly Z{x )
0
oo
=
fc=o
TJ
(V\
A.fc^^.
-
(2.33)
lc
can be expanded in this way:
Z(xo)=u,(Yo) = f > ^ T ^
( - )
2
3 4
Under the assumption that we know the transformation cu, we also also know the coefficients bk, k = 0 , 1 , . . Now, i n order to solve for hi, i = 0 , . . . , n, or more specifically, for
22
the coefficients A ^ , i = 0 , . . . ,n, k = 0 , 1 , . . , we must go back to (2.28) which can be

rewritten as:
it ( i(Yi)
E
i) = E(Z(x )
I Yj)
j = 1 , . . . , n.
(2.35)
i=l
B y inserting (2.33) a n d (2.34) into (2.35), we get:
oo
oo
n \
Y.Y.Ê{H {Y )\Y )
k
fc=0 i=l
= Y.T-^(H {Y )\Y )
k=0
which, using the fact that

oo
j = l,...,n,
(2.36)
E(H (Yi) \ Yj) = pij H (Yj) for all triples (i,j,k), becomes:
k
oo
>
fc=0 i=l
E hp^VkiYj)
j = 1 , . . . , n.
(2.37)
k=0
T h i s is where the advantage of expressing the hi, i 0 , . . . , n, as expansions of Hermite

polynomials is realised. B y multiplying both sides of (2.37) by rjk(Yj) and t a k i n g expectations, we can use the orthogonal properties of the polynomials to divide each equation
for j i n (2.37) into infinitely many smaller equations, one for each k:
it *Pn
x
= *A>;*
j = l,...,n,
A; = 0 , 1 , . .
(2.38)
i=i
Thus, for each k, we now have n equations and n unknowns allowing us to solve for each
set of Ajfc, i = 1 , . . . , n, separately - thus the title Disjunctive K r i g i n g .
In practise, the Hermite polynomial expansions must be truncated to form a finite sum,
such that our DK estimator becomes:
n
ZDKM
= E E
oo
^VkiYi) ~ E E W ( ^ ) = E W ) ,
i=l
fc=0
i=l
fc=0
(2-39)
i=l
where the coefficients Xi are found by solving K sets of n equations as defined i n (2.38).
k
23
A t this point it must also be noted that for F ( x ) to have an invariant distribution w i t h
respect to x , as required for an isofactorial model, so must zT(x), under the common transformation u>. A s a consequence, Z(x) is required to be stationary i n both its mean and
variance. Another major drawback is that this model has no allowance for an anisotropic
covariance function.
DK
has been highly criticised for these very strict assumptions
innate to the model, which i n many cases render DK an inapplicable form of estimation.
DK
also poses some difficult problems when put to practical use: the most important
being the estimation of to. The only documented method of estimation seems to be
in (David, 1988).
T h e C D F is estimated from the sample values, and the C D F of a
N(0,1) random variable is transposed onto the same axes. Each sample value 2(x;) is
transformed to the corresponding value y{x.i) on the N(0,1) curve, chosen as the value
below which the same proportion of the distribution lies. T h i s is illustrated i n Figure
2.2.
F(z) _
'
F(y)
F(yO=F(Zi)
-0.6
-0.4
-0.2
yi
0.2
0.4
Zi
0.6
0.8
Ore Grade (z)
Figure 2.2: This is an illustration of how the sample values . z ( x i ) , . . . , z(x ) are transformed to corresponding values y ( x i ) , . . . , y ( x ) from a iV(0,1) distribution.
F(z) is the C D F of the sample values, and F(y) is the C D F of a N(0,1)
random variable.
n
24
B y then plotting z(x{) against t/(x;), we can fit a function to of the form:
Z(x) = u(Y(x)) = b (Y(x)),
(2.40)
kVk
fc=0
estimating the coefficients b by least squares or some other similar method.

k
T h i s method only attempts to estimate to that transforms Z(x)

variable Y(x),
to a N(0,1)
random
w i t h no consideration of the bivariate distributions between Y^s at different
locations. Due to there being only one realisation of each random variable Yi, no inference
can be made about these bivariate distributions, so an assumption that the estimated
to also provides normal bivariate distributions Fij, V i, j must be made. Also, accurate
estimation of the C D F from correlated data is directly related to histogram estimation.
A s discussed earlier, this is a difficult task.. Disjunctive K r i g i n g is also computationally
heavy i n comparison w i t h the previous linear estimators, and as Hawkins and Cressie
(1984) point out, the procedure for estimating the functions fi is highly nonrobust.
E s t i m a t i o n of the correlation (covariance) function p and the coefficients b are completely
k
dependent upon the transformed data y(xi), i = 1 , . . . , n. A bias i n the estimation of to,
from misspecification of the C D F say, could lead to a compounded bias i n the resulting
estimator.
{
Most other non-linear estimators have been developed especially to estimate recoverable
reserves rather than total reserves, where instead of estimating the reserves directly from
the sample variables Z{xi), indicator variables of the following type are defined:
I(x;z)
if Z(x) < z
w.p.
p(z)
otherwise
w.p.
1 p(z)
(2.41)
one for each sample variable, where z is a cut-off grade below which the ore is not
recoverable.
Stationarity i n the mean and variance of Z{x)
assumed throughout.
and thus I(x; z) are b o t h
One such estimator is
2.4.2
25
Indicator Kriging, as derived i n (Journel 1983).
Indicator K r i g i n g
Indicator K r i g i n g , or IK,
(IK)
is principally designed as a non-parametric method of esti-
mation arguing that i n nature, variables are rarely gaussian, especially the uncontrolled
spatial distributions found i n the earth sciences. Therefore, Journel proposes to use the
data to model uncertainty i n the value at an unknown point x i n the form of a conditional
distribution function F(x;z\n) of Z(x).
This is as opposed to assuming the underlying
distribution and using the data to find an optimal estimator. T h i s conditional distribution function is modelled as a weighted linear function of the indicator functions at the
data points:
n
F(x; z\n) = P(Z(x) < z\Z(x )...
Z(x )) = ^ ( x ; z)I(x ; z).
The unknown coefficients
(2.42)
aj(x;z) are estimated using OK w i t h a non-parametric dis-
tance related covariance function, e.g.^^p, just as i n distance related moving average
estimators, where D is a constant. T h e n by repeating this for z , k = 1 , . . . , K, a discrete
k
approximation of F ( x ; z\n) can be found. This is of course w i t h the condition that the
resulting estimate is a valid distribution function i.e. F G [0,1] and F(z)
z > z'. In order to provide an estimate of Z(x)
a
loss function L. B y using F ( x ; , z | n ) ,
ZJK,
> F(z')
when
given the sample values, Journel defines

the estimate at x , is chosen as that which
minimises:
E[L(z
IK
- Z(x))\Z(x )...
Z(x )} = / f ^ L(z
- f=i
L(z
IK
IK
- z)dF(x; z\n)
- 4 ) [ F ( x ; z \n) - F ( x ; z \n))
k+1
(2.43)
where z' =
k
26
^+ ,
Zfc+
Other non-linear estimators include:

which is very similar to IK
Probability Kriging as derived i n (Sullivan 1984)
but uses instead a linear K r i g i n g estimator involving b o t h
the sample variables and the indicator variables for F(x;zfc|n);

as derived i n (Verly, 1984); and
Multigaussian Kriging
Bigaussian Kriging as derived i n (Marcotte and D a v i d ,
1985), b o t h of which go back to various gaussian assumptions about Z(x)

distributions of the sample variables.
and the joint
Chapter 3
Kriging with a nonhomogeneous trend
In practise, most ore surfaces are not homogeneous i n their mean, but instead the mean
follows a systematic trend. This can sometimes be diagnosed by looking at the estimated
covariance function. If the correct covariance function is fitted, and is significantly nonzero for large values of h, this implies that surface variables that are far apart are still
significantly correlated.
This correlation over large distances can be described by a
deterministic function, the trend surface /i(x), relating the two variables. T h e way i n
which geostatistical theory deals with this nonhomogeneity i n the mean is to model the
surface values relative to this trend, rather than the surface values themselves. T h i s leads
to the model:
Z ( x ) = ( x ) + e(x),
(3.44)
where
î(x) = trend surface, assumed non-random,
e(x) = small scale regionalised variable incorporating the randomness i n Z ( x ) ,
and
( Z ( x ) ) = n(x),
Vx
=>
( e ( x ) ) = 0, V x .
(3.45)
So now, as long as the trend surface is known or can be well-estimated, the new regionalised variable e(x) is stationary, and so can be modelled using regular geostatistical
27
Chapter 3. Kriging with a nonhomogeneous trend
28
theory. T h e covariance function for e(x) is of course the same as for Z(x), as the trend
surface is assumed to be non-random.
The problem w i t h the above model, which will be discussed i n greater detail later, is i n
its practical application. The surface value, z(x),
at each location x is viewed as one
realisation of the surface variable Z(x). W h e n the mean is stationary, we can use every
sample value to estimate the mean ft, whereas now we have only one value w i t h which
to estimate t(x), for each point x. Even if we were to know z(x) for the whole surface,
which would then make estimation unnecessary, we still could not estimate yu(x) exactly.
So really, this trend surface only exists as a theoretical tool to make the regionalised
variables stationary, but i n practise it is impossible to find.
Despite these practical drawbacks, Universal Kriging, as devised by M a t h e r o n (1969),
is still the only form of estimation i n geostatistical theory that incorporates a nonhomogeneous trend.
For this reason, much has been written on this estimator, and i n
particular on unbiased estimation of the covariance function using sample values from
nonhomogeneous sample variables.
In this chapter, I first outline the original derivation of the Universal K r i g i n g
(UK)
estimator by M a t h e r o n . A s UK is also a B L U E estimator, its derivation is very similar

to that for OK i n Chapter 2. Pointing out the weaknesses i n applying the explicit form
of the estimator, I then describe a second form of the UK estimator proposed by Ripley
(1981). I verify that the two forms of estimation are equivalent. I show how Ripley's
form of estimation avoids the various practical drawbacks of using the explicit estimator,
and I describe exactly how the final estimator and error variance can be found accurately
and efficiently. T h i s is very important when multiple estimates have to be found as i n
29
Chapter 5.
Finally, I discuss the problem of estimating the covariance function unbiasedly from
nonhomogeneous sample variables. I describe how this led to the theory of intrinsic ran-
dom functions and a proposed form of unbiased estimation of the generalised covariance
function. I also describe other forms of unbiased covariance estimation that have been
developed since then.
Before deriving the Universal K r i g i n g estimator Z (x),
I shall list the assumptions
UK
required, i n addition to those outlined i n Chapter 2.
U K m o d e l assumptions
1. T h e basic model for each regionalised variable Z ( x ) at a point x is as defined i n
equations (3.44) and (3.45).
2. T h e trend surface p(x) is unknown, yet we know its shape, i n that it is of the form:
/x(x) = / ? / / * ( x ) = / 3 f ( x ) ,
r
(3.46)
where
/
and
f(x) =
/o(x)
/i(x)
V hW
T h e trend component functions, / ; ( x ) , defining the shape of the trend, are known,
but the parameters B are unknown.
t
30
3. T h e matrix F, defined as
/o(xi)
/L(XI)
F =
(3.47)
^ /o(x )
n
.... / ( x ) J
L
is of rank L + 1, where x i , . . . , x are the sample points.
T h i s means that the
L + l vectors ( / ( x i ) , . . . , / ( x ) ) , . - . . , (/L(X ), . . . , /L(X)), must be linearly inde0
pendent. It is worth noting at this point that although F is full rank, F F

T
becomes
\ illconditioned for relatively small L and will cause roundoff errors to occur when
calculated. Numerical solutions to this problem will be discussed later.
3.1
Original derivation of Matheron's Universal K r i g i n g estimator,
A s for O r d i n a r y K r i g i n g , Matheron considers the Universal K r i g i n g estimator
ZUK^)
ZJJK^)
to be a linear function of the surface variables Z ( x i ) , . . . , Z ( x ) at the sample points

n
X i , . . . , x , such that:
n
Z (x) =
UK
5Xx)Z(
)
i=l
Xl
(3.48)
= ZÂ(x),
where Z and A ( x ) are as defined i n equation (2.10), and the weights Aj(x) are unknown.
n
For ZIJK(*)
to be unbiased, we again need the expected value of the error to be zero,
such that:
E(Z(x) -
Z (x)) =
UK
E(Z(x)) n
E(Z (x))
UK
= ^( ) - E *( M i)
X
i=l
(3.49)
31
Therefore, for (3.49) to be true, independent of the values of Pi, we require:
f(x) = f > ( x ) f ( ) ,
l = 0,...,L,
X i
i=i
written simply as
f(x)
= A(x) F.
(3.50)
We now want to find the optimal weights which minimise the error variance, <r (x),
2
under the L+ 1 unbiasedness constraints defined i n (3.50). So again, by introducing the

Lagrange multipliers
<b = (<f>o,..., 4>L) for the L + 1 unbiasedness constraints, we need

T
to minimise:
\{x) F)cf>
T
a (x) + 2 ( f ( x ) -
a (x) + A ( x ) K A ( x ) - 2A(x) k(x) + 2(f(x) - A ( x ) F ) 0 ,
(3.51)
w i t h respect to A ( x ) and <p. Therefore,
-- 2K\(x) - 2k(x) - 2F<p = 0
d\{.x
H
2(f ( x ) - A ( x ) F ) = 0,
r
which i m p l y respectively that

KX(x)
f(x)
= k ( x ) + Feb
T
(3.52)
= A(x) F.
(3.53)
F i r s t we solve for (p. A s K is the covariance matrix, it is symmetric and non-negative

definite by construction. We further assume that it is positive definite, and thus
exists. B y multiplying (3.52) by F K~
T
and transposing (3.53), we get:
K~
F A(x)
F /C k(x) +
F A(x)
f(x).
32
F K~ F(f)
_ 1
(3.54)
(3.55)
So, by equating (3.54) and (3.55), we get:
F ^ ^ x ) + F K~ Fcb
T
= f(x).
(3.56)
A s K is assumed positive definite and F is assumed to be of rank L + 1, it follows that

A = (F K~ F)
T
is of full rank L + 1 and thus its inverse exists. Therefore:
<t> = A~H(x) - A- F K~ \i{x).

1
(3.57)
B y substituting this solution for <f> back into (3.52), and multiplying by K ,
-1
we have
the following solution for A(x):

A(x) = K - k ( x ) + i T - F v l - ( f ( x ) - F K - k ( x ) ) .
X
(3.58)
Thus, by substituting (3.58) back into (3.48), the Universal Kriging estimator,
of Z(x),
ZUK(X),
as defined by Matheron, can be written explicitly as:
Z (x)
VK
= ZÂ(x) = T7 (K~ \{X) + K " F ^ - ( f ( x ) - F K-^{x)))

X
(3.59)
In order to calculate the kriging estimate, we need to define the trend component functions, fi(x), i n equation (3.46).
A s discussed earlier, this trend surface p(x) is only
constructed i n theory so as to make the residuals of the random function, Z(x),
homo-
geneous. Therefore we have little idea of what the trend component functions should be.
In most cases, the trend surface is assumed to be a polynomial of some general order p,
where
Chapter 3.
Kriging with a nonhomogeneous trend
33
'
/o(x)
/i(x)
f(x) =
( x\
= 1
if
p = 0,
L =
Hp = 2,
if p = 1,
V /L(X) ;
and
1 ^
xy
n=i
If p = 0, the C/K
kriging equations (3.52) and (3.53) become identical to the OK kriging
equations, as one would expect.

Also, as
Z (x.) is a (linear) unbiased estimator that minimises the variance of the error,
UK
it must necessarily estimate the exact values at the the sample points w i t h zero error
variance. Therefore, assuming there to be no measurement error i n the sample values,
UK is an interpolator. This can be very difficult to see by looking at the UK estimator.
We
must go back to the original UK kriging equations defined i n (3.52) and (3.53). B y
letting x be a general sampled point X j , and noticing that (3.52) and (3.53) can be written
as:
[k(x )...k(x )...k(x )]A(x ) = k(xi) + F 0

1
f(x,) =
r
f(
X l
(3.60)
f ^
A(x^
(3.61)
f(x )
n
it is easy to see that a solution to (3.60) and (3.61) is
34
AJ(XJ) =
1,
AJ(XJ) =
0 V7 ^ i,
and (p = 0. A s we know that this system of equations has a unique solution then this
must be i t . Therefore
zuxi^-i) = zfai)
VXJ. B y substituting
this solution for
A(XJ) into
equation (2.13) for the error variance, it is easy to see that the terms cancel to leave a
zero error variance as expected.
ZUK(X)
is always an interpolator, to the extent that if
C{h) is discontinuous at zero,
then the predicted surface becomes discontinuous at the data points. A s an example, if
we assume that the trend is homogeneous and the surface variables at a l l points on the
surface are completely uncorrelated, such that p = 0, and:
if / i = 0
otherwise
C(h) =
(3.62)
then it is easy to show that ZUK{X) predicts a horizontal surface of height ^
J2?=i z (
the mean of the sample values, which becomes discontinuous at the sample points.
3.1.1
Discussion
It is a well-known numerical fact that though products and transposes of large matrices
can be calculated fairly easily and accurately, inverses are a completely different proposition. A s K is an n x n matrix and A an (.L + l ) x (L + l) matrix, calculating K"
and A"
w i l l become extremely heavy i n computation i f either n or L become large. Additionally,

in order to estimate the covariance function, a relatively large sample would need to be
taken, naturally forcing n to be large, unless a covariance function is assumed known.
For this reason, it is very difficult to find a computer package that can cope w i t h the
magnitude of the calculations. B o t h M a t h e m a t i c a and Maple are symbolically-oriented
packages and so are very inefficient i n dealing w i t h numerical problems. I suggest that
i)>
35
for any extensive data analysis that one uses one of the more numerically-based packages
such as M a t l a b .
I wrote a program i n M a t l a b to estimate a surface from a set of samples using the
above explicit form of the UK estimator. A l t h o u g h it was no problem to calculate
accurately, A~
became ill-defined once the trend p(x)
K~
was assumed to be of order 3
or above. T h e reason for this lies behind the construction of A. A s K is a covariance

matrix, it can be very close to the identity matrix depending on the range of influence of
the covariance, and therefore so can its inverse. So ultimately A is composed of the inner
products of the columns of F, which is where our problem arises. Under our assumption
of a polynomial trend, the first column of F is composed of ones, the second column of
the x coordinates of the n samples, the t h i r d column of the y coordinates, and so on,
w i t h increasing order of exponent. Therefore as p, the dimension of the trend, increases,
we have a great difference i n the order of magnitude of the columns i n F. As A has the
square of this order of magnitude difference i n its components, this w i l l result i n it having
a near-zero determinant and thus an inaccurate calculation of its inverse.
A n o t h e r form of the Universal K r i g i n g estimator can be found i n Ripley (1981). A l t h o u g h

it seems very different i n its derivation, it will be shown to be identical to Matheron's
estimator i n equation (3.59), and it avoids finding the inverse of b o t h K and A. It also
provides, I feel, a clearer understanding of how Universal K r i g i n g estimates the trend î(x)
and how the kriging weights Aj(x) are related to the covariance function. For situations
where multiple estimates must be found, such as i n Chapter 5, this form of estimation
also proves to be more computationally efficient.
3.2
36
Ripley's form of the Universal Kriging estimator
Instead of estimating z ( x ) directly from the sample values
-Z(XJ), Ripley
approaches the
problem from a point of view analogous to that i n time series analysis. He estimates the
trend fj,(x) directly and then models the residuals, i?(x) = Z(x) / i ( x ) , just as we d i d
for Z ( x ) before, but now assuming they are stationary w i t h mean zero. T h e same basic
model assumptions and notation as defined for Matheron's estimator still hold.
Before proceeding w i t h the derivation, I define some well known numerical decompositions
which are used i n the analysis:
3.2.1
Cholesky factorisation
Cholesky factorisation states that if an (JV x
N) matrix X is non-negative definite, then it
can be uniquely factorised into the product of a
lower triangular m a t r i x and its transpose.
X = MM ,
T
(3.63)
where M is also (N x N).
3.2.2
Q R Decomposition
Q R decomposition states that any (N x (L + 1)) matrix X can be decomposed into the
product of an
(N x N) orthogonal matrix Q, and an (JV x ( L + 1)) m a t r i x R, such that:
X = QR,
(3.64)
where
QQ = Q Q = 1,
T
and
R=^
J,
R is an ((L + 1) x ( L + 1)) upper triangular matrix.
(3.65)
3.2.3
37
Singular Value Decomposition (SVD)
Singular Value Decomposition states that any real (N x (L + 1)) m a t r i x X can be

decomposed into the product of an (N x (L + 1)) m a t r i x U, a ((L + 1) x (L + 1)) diagonal
m a t r i x E = diag(v\,..., ox+i), and an ((L + 1) x (L + 1)) m a t r i x V such that:
X = C/V
where U U
T
(3.66)
= VV
= VV
= I.
T h e columns of U are the L + 1 orthonormalised eigenvectors associated w i t h the L + 1

largest eigenvalues of X l ' , the columns of V are the L +1 orthonormalised eigenvectors
1
associated w i t h the Z- + 1 eigenvalues of X X,
and (01,... ,CTL+I) are the non-negative
square roots of the eigenvalues of X X,

T
called the singular values. T h e proof of this can
be found, for instance, i n Moler (1967). The advantages of using S V D are discussed i n
Section 3.2.4.
3.2.4
Step 1: Estimating the trend
A s we assume the trend to be of the form:
(3.67)
0(x) = /3 f(x),
T
w i t h the components of f (x) assumed known, then under the basic model, we arrive at
the following vector equation for the ore grade variables at the sample points:
1
= Ff3 + e ,
n
*(x0
e(x )
2
where e
(3.68)
*(Xn)
38
where Z, F, and f (x) are as defined i n the last section, and the residuals, e , are assumed
n
to be correlated random variables with mean zero and covariance matrix K.

We then find the generalised least squares ( G L S ) estimator, (3, which is the solution of:
min (Z - Fb) K- (Z
- Fh)
= min
^K~ e
l
n n
= ,b
m
V(*i)>
i n
(3-69)
i=lj = l
= ( e ( x i ) , . . . , e ( x ) ) is the vector of errors. T h e idea behind this is that i f the
where
ore grade variables Z(XJ) and Z(x.j) at the sampled points j and X j are highly correlated,
X
this reduces the worth of their combined information. Therefore, the multiple of their
associated errors, (XJ) and ( j)> will have a reduced weighting by being multiplied
x
by
a factor inversely proportional to their covariance
COV(Z(XJ), Z(x.j))
= Kij.
T h i s w i l l reduce the effect that a cluster of points has on the polynomial fit, thus providing
a better fit to the more isolated points.
A s K is a covariance matrix, it is non-negative definite by construction. Therefore we
can use Cholesky factorisation as defined earlier to write K~
i n terms of M
K'
= (MM )~
T
= M~ M-\
,
(3.70)
where M is a lower triangular matrix, and ( M
_ 1
= (M )~
T
= M~ for ease of notation.

T
Therefore equation (3.69) can be written as
min (Z - Fb) M- M~ (Z
T
- Fh)
= min ( M - Z
1
- M- Fb) (M- Z 1
M~ Fb).
l
(3.71)
Chapter 3. Kriging with.a nonhomogeneous trend
39
B y transforming the problem i n this way, we have effectively uncorrelated the
-Z'(XJ),
and thus the residuals e(xj). T h e new residuals of the transformed problem, ( X J ) , are
independent random variables with variances equal to one:
Cov(0
Cov(Y ) = C o v ( M
M~ KM~
_ 1
Z ) = M
n
= M~ MM
M"
_ 1
Cov(Z )M~
= I.
(3.72)
It then becomes clear that the generalised least squares estimator J3 is the standard least
squares estimator of the transformed problem:
M~ Z
l
= M~ Ff3 + M
l
_ 1
e ,
(3.73)
which can be written as

Y = G/3 + ,
(3.74)
where
= M~ Z ,
1
G = M~ F
If we assume the residuals e
and
= M~ e .
(3.75)
to be jointly gaussian, then the new residuals
w i l l be
jointly gaussian, and the G L S estimator 0 w i l l be the maximum likelihood estimator of
BA l t h o u g h the transformed variables F ( x ^ ) are uncorrelated among themselves, they rem a i n correlated w i t h the unknown ore grade variable Z(x) that we are t r y i n g to estimate,
w i t h new covariance
k (x)
y
40
Cov(Z(x),Y ) = Cov(Z(x),M- Z )
M- Cov(Z(x),Z ) = M~ k(x).
(3.76)
Notice that since M is a lower triangular matrix, the transformations as denned i n (3.75)
can be carried out using a sequential substitution method. For example, Y
can be written as
M Y
= Z .
(3.77)
which expanded, becomes
MnY,
MY
= Z( )
X l
+ M Y
M Y
+ M r
MY
+ M Y
2l
Z1
nl
22
3 2
n2
= Z(x )
+ M y
33
+ M Y
n3
= Z(x )
3
+ ...
+ M Y
nn
= Z(x )
n
B y solving the first equation for Y\ and substituting it into the second equation, it
becomes obvious how Z
can be easily transformed to Y without having to find
Similarly we can find G, the transformed matrix of F, by letting MG = F and applying

the same method to each column of G, as if it were Y
i n the previous example.
A s stated earlier, we can now use standard least squares methods to estimate j3 from the
transformed data.
41
S t a n d a r d Least Squares A n a l y s i s of the transformed
problem
Following from equation (3.71), we now want to find the standard least squares (LS)
estimator of f3 that satisfies
min H = min ( Y - G b ) ( Y
- Gb).
(3.78)
To find the usual form of this estimator, we expand H to get
H = YlY
- 2b G Y
T
A s G G = F K~ F
T
+ b G Gh.
T
(3.79)
= A , as defined earlier, G G is symmetric and positive definite.
Therefore, as H is quadratic i n b , when we differentiate H w i t h respect to b , the unique

stationary point solution must necessarily be at a minimum:
^- 2G Y
ah
<
As F F
T
+ 2G Gh = 0
T
=>
J3 = (G G)~ G Y .
T
(3.80)
is an ill-conditioned matrix for relatively small values of L, then G G

T
likely to be ill-conditioned, especially if K is close to the identity matrix.
is also
Therefore
in calculating /3, as defined above, we run into exactly the same problem as before, of
finding the inverse of an ill-conditioned matrix. In this
method of L S has been
developed to avoid this problem where the columns of G are orthogonalised using the QR
decomposition. A s G is an (N x ( L + 1)) matrix, it can be decomposed into an (N x TV)

orthogonal m a t r i x Q, and an (N x ( L + l ) ) matrix R , such that
G = QR,
where
and R is an ( ( L + 1) x (L + 1)) upper triangular matrix.

equation (3.78),
(3.81)
Therefore, following from
m i n ( Y - Gb) (Y
- Gb)
42
m i n ( Y - Gb) QQ (Y
T
- Gb)
min ( Q ( Y
- Gb)) (Q (Y
- Gb))
=
Now if we define the vector Q Y
T
m i n (Q Y
T
- Rb) (Q Y
T
- Rb).
(3.82)
as two sub vectors i n the following way:
QY
(3.83)
\ J2
where J i is a ((L + 1) x 1) vector, and J is thus an ((n L + 1) x 1) vector, equation

2
(3.82) becomes
-Rb V
Ji
( J - Rb \
mm
b
m m ((Jx - Pib) (3 - Rb) + j j ) .

T
(3.84)
F r o m looking at equation (3.84), it finally becomes obvious that i t is minimised for b

satisfying
= R/3.
(3.85)
Since R is an upper triangular matrix, b can be found by applying the sequential substitution method as before.
We can summarise the above i n the following steps:
1)
F i n d K, either by assuming a covariance function, or by some form of estimation.
2)
Perform Cholesky decomposition on K to find M.
3)
F o r m F by inserting the sample coordinates into the polynomial trend component
Chapter 3.
Kriging
with a nonhomogeneous
trend
43
functions.
Transform the sample data Z
5)
A p p l y similar substitution method to find G, where MG = F
6)
Perform Q R decomposition on G to find R
7)
Calculate Q Y
8)
Use reversed substitution method to find 0 where Rf3 = J i
to find J
to Y
by solving MY
= Z
4)
for Y(xi).
~ T
Thus we have found the G L S estimator (3 and have an estimate /2(x) = j3 f (x) of the
trend.
A t this point it is worth noting that the above method of avoiding roundoff error for the
illdefined L S problem is only one of many. A n alternative method called singular value
decomposition (SVD), as defined i n Section 3.2.3, is probably the most widely used, as
it is not only very stable, but it also decomposes G i n such a way that it reveals a great
deal about its structure and stability. B y allowing the practitioner to keep an eye on
the condition number of G G,
S V D provides a solution that can be easily altered to
avoid roundoff error when G G
is unconditioned. None of the other methods, including
Ripley's, allow this versatility.

T h e algorithm for this decomposition is based on the original procedure by G o l u b and
Reinsch (1970), using two finite sequences of Householder transformations to reduce G
to a bidiagonal matrix, and then using a QR algorithm to find the singular values. Other
references are G o l u b and V a n L o a n (1983) and Press et al. (1989). After decomposing
G i n this way, we look at the singular values. A l l the singular values whose ratio to
the largest singular value (the reciprocal of the condition number) is less than a certain
multiple of the machine precision must be set to zero. B y doing this, we throw away the
linear combinations of the set of equations associated w i t h these singular values. T h e y
44
are so corrupted by roundoff error, that they have the effect of pulling (3 away from the
L S solution. T h e final S V D solution of (3 for the linear system G(3 = Y
(3* =
where E
_ 1
is:
VE~ U nY,
U
* is E
with
(3.86)
1/aj replaced by zero liuj = 0 for j 1 , . . . , L +1. T h i s solution
is also extremely versatile in its application. If G is square and non-singular, then (3* is
the exact solution. If G is singular, whether it is square or not, then if Y is i n the range
of G , there are an infinite number of solutions, and (3* is the one w i t h the smallest length
(3 f3, for increased stability. Lastly, and most importantly, if Y
T
is not i n the range of
G, then \3* is the L S solution that minimises {G(3 Y ) (G(3 Y ) . T h i s is proved i n

T
Press et al. (1989).
Despite the versatility and stability of this method, Ripley's method is more than adequate for the L S problems that I encounter i n Chapter 5.
3.2.5
S t e p 2: M o d e l l i n g t h e r e s i d u a l s , i ? ( x )
Once the trend has been estimated, we then find the residuals, r ( x ; ) , such that
r(xi) = z(xi)
- p,(xi) = z(xi)
- J3 f ( x ^ , Vx;
(3.87)
and
r(xi)
r(x )
2
\
(3.88)
T h e residual r ( x ; ) for each sample is seen as a realisation of the residual variable -R(x,) =
Z(XJ)
/X(XJ). Under the assumption that /i(x) is the actual trend, i ? ( x ) is equal to the
45
theoretical residual random function e(x) defined earlier. Therefore, we assume that R(x)
is a homogeneous random function such that
E(R(x)) = 0, V x ,
Var(i2(x)) = Var(Z(x)) = o (x), V x ,
2
Cov(fl(x),.R(y)) = C o v ( Z ( x ) , Z ( y ) ) = C(x,y), Vx,y.

Under this assumption, we can estimate R(x) using the O r d i n a r y K r i g i n g estimator
ROK(X)
derived i n Chapter 2 for u 0:

n
ÔAT(X) =
5Xx)i2(xi) =
RTJ(X),
(3.89)
where
Kr){x) = k ( x ) .
(3.90)
A g a i n , by rewriting equation (3.90) as M(M r](x.)) = k(x) and letting v ( x ) = M r / ( x ) ,

T
we can solve Mv{x) = k(x) for u ( x ) , and then solve M rj(x) = i>(x) for rj(x), using the
T
substitution method described earlier.

We can summarise the above i n the following steps:
-T
1)
F o r m the vector of sampling residuals r
2)
F o r m the covariance vector k(x) of the n sample points Xj w i t h the estimation
where r(xj) = zfc) (3 f (XJ)
point x , using the estimated or assumed covariance function, C ( x , y)

3)
Solve Mv(x) = k(x) for v(x) using the substitution method
4)
Solve M 7](x) = v(x) for 77(x) using the reversed substitution method
T
Therefore, Ripley's form of the Universal K r i g i n g estimator is:

Z (x)
UK
= m(x) + Ro {y) = p f
T
(x) + R ^ ( x )
(3.91)
3.3
46
The equivalency of the two forms of estimation
It is far from obvious that Ripley's estimator, as defined i n equation (3.91), is equivalent
to Matheron's estimator, as defined i n equation (3.59). Therefore I now show that the
two forms oi UK estimation are equivalent.
3.3.1
Finding ft explicitly
RR
= R R = R Q QR
= GG = A
= A~ R
l
(3.92)
Therefore, using (3.92), (3.85), and (3.83), we find that
(3 =
R J
= (A- R )J
A- R
X
'
V J
=
A-'R {Q Y )
T
A- R (Q M- Z ).
1
(3.93)
B y noting that R = Q G = Q M~ F,
T
and substituting for R
i n equation (3.93), we
can find an explicit expression for 0:

f3 = A~ (F M~
l
3.3.2
Finding
Q)Q
ROK(^)
M~ Z
l
= A~ F M~
x
M~ 7i
l
= A~ F K~ Z .
1
(3.94)
explicitly
A s ROK^) is defined as
i?(x) = Rl(K- k(x)),
(3.95)
we need to find an explicit function for R . B y first expressing the components of R i n

n
terms of the trend:

R(*i) = Z(xi) - ji{xi) = Z(xi) - Z K- FA-H(x),
T
(3.96)
and noticing that F
47
= (f(xi), f ( x ) , . . . , f(x )), we can write R
explicitly as
Kn = Z - (ZK- FA- F ) .
1
(3.97)
T T
Finally, by substituting J3 from equation (3.94) and R
from equation (3.97) into equa-
tion (3.91), we get an explicit expression for Ripley's UK estimator that is identical to
Matheron's estimator:
Z {x)
UK
3.4
m ( x ) + i?(x)
ZlK^FA'H(x)
+ Z {I l
( i f - ^ x ) + K- FA~
K- FA- F )K- k(x)
(f(x) -
F K~ k(x)))
T
C a l c u l a t i n g cr (x), t h e e r r o r v a r i a n c e o f Z (-x)
2
UK
The advantage of having a statistical rather than deterministic estimator of the ore grade
such as Z (-x), is that one can estimate its error variance, cr (x) = V a r ( Z ( x ) Z (x)),
2
UK
UK
and thus have some idea of the magnitude of the estimation error.
T h e UK error variance is as derived i n equation (2.13):
(T (x) = a ( x ) + A ( x ) K A ( x ) - 2 A ( x ) k ( x )
2
(3.98)
where A ( x ) is as defined i n equation (3.58). B y substituting A ( x ) into (3.98), and simplifying the expanded equation using the fact that F K~ F
T
= A, we arrive at the following
equation for cr (x):

2
a (x)
2
<r (x) 2
kfaf/T^x)
+ (f (x) - F K- k(pc))
T
A'
(f(x) - F ^ - k ( x ) )
T
(3.99)
48
For ease of calculation, we can write a (x) in terms of G, k y ( x ) and R which have already
2
been calculated when finding
<T(X) =
2
ZTJK{X)'-
a (x) + k ( x ) M - M - k ( x )
2
: r
+ ( f ( x ) - F M- M- k(x))
T
(f(x) -
F M- M- k(x))
T
^ W+ k^xfk^x)
(f (X) - GTky(x)))
+ (lf
(f (x) - G k ( x ) )
a (x)+ k (x) k (x)+ h h,
(3.100)
(f(x) - G k ( x ) ) .
(3.101)
where
h = R
Using the fact that R
is a lower triangular matrix, we can again write (3.101) as R h =
( f ( x ) G k ( x ) ^ and solve for h using the substitution method described earlier.

T
2 /
In practise, I am somewhat dubious of the effectiveness of this error variance, considering

the assumptions made about the surface and the random function i t apparently originated
from. I would not expect the estimated error variance from the sample data to be accurate
enough to be of any measure of the magnitude of the error.
3.5
E s t i m a t i o n o f the covariance function
A l t h o u g h UK may have a theoretical superiority over the other estimators, involving

less restrictive assumptions about the surface to be estimated, i n practise this has not
proved to be true. The essential reason for C / K ' s demise as an effective estimator lies i n
the estimation of its covariance function. A l t h o u g h the covariance function is assumed
known for a l l geostatistical estimators, i n general i t must be estimated from the sample
Chapter 3. Kriging
trend
49
data. For this reason, the UK estimator stated earlier is somewhat misleading. Before
going any further, I shall define C ( x , y ; 0) as a parametric class of models assumed to
contain the true covariance function, and I shall restate Ripley's version of the estimator
in equation (3.91) so as to include this information:
Z (x)
= P(0) f (x) + ( Z - Ff3(6)) K(0)- k(x;

T
UK
where J3(0) = (F K(ey F)- F K(6)- Z

T
0),
(3.102)
is the optimal unbiased G L S estimator of
the trend coefficients, and 6 is an estimator for 0.

W i t h a homogeneous trend, the experimental variogram can be unbiasedly
estimated
directly from the sample data using, for example, Matheron's estimator (Matheron 1969):
where N(h)
jivM ^
[z(x
"
z(Xj)]2
'
'
(3 103)
= {(XJ,XJ) : X j X j = h} and |A~(h)| is the number of elements i n
N(h).
There are other more robust estimators which w i l l be discussed i n Chapter 5, but the
important fact is that the estimation is unbiased.
Under the second-order stationary
assumptions, and using the filtering property of the kriging equations, the covariance
function is equivalent to the negative of the variogram for the purposes of finding the
optimal kriging weights. Therefore, assuming an appropriate parametric model is fitted
to the experimental variogram, estimation with a homogeneous trend is unbiased.
O n the other hand, when the trend is non-homogeneous, as for UK, estimation of the
covariance function poses a larger problem. The process of estimating the trend, calculating the residual estimates at the sample points, and from these estimating
C(h),
introduces a substantial bias (Starks and Fang 1982a). Even if we were to know the correct covariance m a t r i x K, the process of estimating the trend by the o p t i m a l unbiased
50
G L S estimator /t(x), and estimating the covariance function from the resulting residuals
R is biased. T h i s is demonstrated by the following:
(x)
f(x) 3 =
Z - FJ3 = (J - FA~ F K~ )Z
f(x) A- F K- Z
nt
BZ .
Therefore,
E(R )
= F(3 - F(3 = 0 = E(e )
= BE(Z ) = (/ - FA- F K~ )Ff3

1
(3.104)
but,
Cov(R )
n
BCov(Z )B
= (I-
FA- F K~ )K{I
l
K - {KK~ FA~ F
1
K~ FA~ F )
1
FA~ F K~ K
1
FA~ F K~ KK~ FA~ F )

l
K-FA^F
K = Cov{Z )
(3.105)
Therefore, by using an estimator of the trend, we have introduced a bias, such that the
variance of the new residual process i?(x) is different from that of the original data
Z.
n
We cannot calculate this bias either as it depends on K.

A back-substitution method has been proposed as a way of resolving this problem, (Ripley
1983); (Neuman and Jacobsen, 1984). B y starting w i t h an ordinary L S estimator of the
trend using covariance matrix K
= la ,
2
the sample residuals are estimated, a covariance
model fitted, a new covariance matrix K
trend found using the new K .

0
calculated, and then a G L S estimate of the
This process is to be continued until the covariance
function converges. It has been demonstrated (Cressie, 1987) that this back-substitution
51
w i l l not remove the bias i n the covariance function, concluding that the best any iteration
method can do is converge to a
biased estimator.
T h i s problem w i t h UK was recognised soon after its development i n 1969, and by 1973
M a t h e r o n had already come up w i t h an apparent solution to this problem i n the form
of
Intrinsic Random Functions of order p (Matheron 1973), where p is the order of the
trend as defined on page 32.
3.5.1
Intrinsic Random Functions of Order p (IRF-p)
Even today, UK and IRF-p are referred to as two different methods of estimation, but
in fact, as best linear unbiased estimators, they are identical (Christensen 1990). It is
best to think of IRF-p as just a different way of looking at the UK problem, resulting i n
the
unbiased estimation of the generalised covariance function (GCF) K (h), rather than
p
the usual covariance function C{h), to be used i n the UK system of equations. A l t h o u g h

this new method provides a better understanding of the UK problem and the effect of
the unbiasedness constraints, there is still much debate as to whether the GCF is easier
to estimate and interpret than the original covariance function.
A s much of the theory of IRF-p is buried i n the previous working for UK, i t can be
very difficult to see what new insight it brings. Being a subtle concept, many texts have
failed to provide a clear description of IRF-p and the GCF, and of their relevance to
the previously discussed problem. T h e most popular detailed account of the theory is
(Delfmer 1976), whereas a brief summary can be found i n (Journel 1989).
T h e idea behind this method is analogous to A R I M A models i n time series, where linear
combinations or
increments of the sample variables are found such that the resulting
Chapter 3. Kriging
trend
52
series or surface is stationary with respect to the mean and variance.

For a general configuration of points
xi,..., x ,
m
a generalised increment
(GI) of order p
is any linear combination of the corresponding random variables:

m
J2"iZ{xi)
=v Z
5></'(xi)
such that
i=l
where
= 0
l = 0,...,L,
(3.106)
i=l
/i(x),
I = 0 , . . . , L are the monomial trend component functions defined earlier,
and p is the order of the trend. Some Vi, but not a l l , can be zero valued.
These G T s have the property of filtering out any linear combination of the trend component functions,
J2t=o dififc),
added or taken away from the random function Z(x):
^2v (z(x )j2dji(x ))

i=l
1=0
/m
E ^ W E ^ 5>/<(xi)
i=l
\i=l
1=0
A n intrinsic
tion
Z(x)
random function
for which zZTLi
Y,ViZ{xi).
of order p (IRF-p)
ViZ{*-i
(3.107)
is then defined as any random func-
+ x) is stationary (i.e. independent of x) i n its mean
and variance, for any configuration of sample points x . . . , x , and any GI vector v,
1 ;
irrespective of its covariance structure.

Therefore, i f our surface Z(x) is thought to have a trend of the form zZi=o A / / ( ) P
x
stationary residual e(x), then:
E ^ Z ( * i + x)
i=i
r L
Yl i E/ '/'(xt + x) + e(x + x)
u
i=i
Vl=0
i u s
Chapter
3.
Kriging
trend
E '( )/a(Xi)
T
i=l
(=0
Ls=0
E E f e ( x ) E^/*( *
x
1=0s=0
m
.1=1
53
+E ^ ( + )
e
i=i
m
Vit^Xi
x)
1=1
Eê(xi + x),
(3.108)
i=l
which is thus stationary i n both its mean and variance, for any configuration of sample
points X i , . . . , x
and any GI vector u. T h i s is using the fact that the monomial trend
component functions fi are closed under translation, so that //(x^ + x ) can be expanded
as above. Thus Z(x)
is an IRF-p, or more exactly a member of an IRF-p,
the class or
group of random functions with a trend of the form zZiLo A / t ( ) x
M a t h e r o n proves, using the theory of Generalised Functions (Gel'fand and V i l e n k i n 1964),

that an IRF-p is characterised by a generalised covariance function
for any GI of the
(GCF) K , such that,

p
IRF-p:
V a r ( E ViZifr)
] = V a r ( E ê(x*)
\i=l
\i=l
= E E WiKpifr
(3.109)
j)-
i=l j= l
In a recent paper by K i t a n i d i s (1993), the relationship between the normal covariance

function C and the G C F K
is finally clarified.
He states that an IRF-p is a class of
equivalent functions whose covariance functions could be used by UK to give the same
results. He points out that i n forming the IRF-p, the unbiasedness conditions essentially
divide the covariance function C ( x , y ) into an effective part, the GCF
A T ( x , y ) , and a
p
redundant part i ? ( x , y ) :
C ( x , y ) = K ( x , y ) + i?(x,y),
p
where K (x,
p
(3.110)
y ) is the same for a l l the functions i n the IRF-p, and i ? ( x , y ) changes w i t h
each function. B y filtering out the polynomial trend, the GPs essentially filter out this
54
redundant variance 7?(x, y) associated w i t h this trend, leaving the characteristic G C F of

the IRF-p. K i t a n i d i s shows that the higher the order of the trend, the more variability
i n the surface that can be described by the trend, and thus the more complex i2(x, y)
becomes. Therefore, contrary to popular belief, the higher the order of the trend that
can be justified, the more freedom one has to estimate C ( x , y).
If we now consider the configuration of sample points x
1 ;
..., x
for UK, w i t h x
being
any point of estimation, then the UK error can be written as:

n
Z(xo) -
Z {xo)
UK
Z(x ) - A (xo)Z( ) =
0
UiZ(xi),
X i
i=l
(3.111)
i=0
where
' -Ai(xo)
^: = <
1
if i 0.
(3.112)
ifi= 0
n
and the unbiasedness conditions, E i ( o ) / / ( i ) = / ; ( o )

A
0 , . . . , L , can be written
as:
n
2>/i(xi) = 0 Z= 0,...,L.
(3.113)
i=0
T h i s implies that the UK error is a GI of order p, and as Z ( x ) is an IRF-p, the error

variance
<Jg(x)
is characterised by the G C F as i n (3.109). This can be minimised w i t h
respect to the unbiasedness constraints, as for UK, producing the same set of equations
for the kriging weights A ; ( x ) and the Lagrangian parameters <pi, but w i t h the covariance
0
function replaced by the GCF. Thus, for K r i g i n g purposes, instead of finding the specific
covariance function C of the random function Z(x)\
characterising the class of random functions IRF-p.
we only have to find the GCF
Theoretically, we can estimate K
55
unbiasedly without estimating the trend by finding a
function which satisfies (3.109) for any GI satisfying (3.111) and (3.112). A s s u m i n g N
such G T s , h,.. .,IN can be found, where Ii = "
= 1
K (h) is estimated by a
VijZfc),
L S regression, minimising the function R:
R = E [i - E(I )
2
i=i
N
VijZ(%)
Vi=i
i=i
E "ijVikKpdxj
(3.114)
x |)
fc
j=i fc=i
T h i s uses the fact that the GTs now have zero mean. K is assumed t o be stationary
p
and isotropic i.e. K (h).
K is also chosen as a polynomial which depends linearly on
its parameters so that the regression simplifies to the unique solution of a set of linear
equations, otherwise the regression would be very difficult to solve. A n o t h e r restriction
is that i t must be positive definite for it to be a valid covariance function. It has been
proven (Matheron 1973) that a valid model for a positive definite polynomial GCF is:
K (h) =
p
for p 0
ai\h\
\h\ + a | / i |
a - osi\h\ + a \h\ - a \h\

3
where a , a i , a 3 > 0 and a

0
>
(3.115)
for p = 1
for p = 2
- j^/aitt3.
A more recent improvement on the form of these functions can be found i n (Delfiner et
al., 1978) where the Green's function | / i | l o g | / i | is added to K (h) defined above for b o t h
2
p = 1 and p = 2, and | / i | l o g | / i | for p = 2. Also a is replaced w i t h a 8(\h\), where 6(\h\)

4
is the D i r a c delta function, allowing for a discontinuity at h = 0, the nugget effect. This
can arise from measurement error, incurring variance i n the value measured at a point.
56
A s for finding the GFs of the random variables, Z ( x , ) , i = 0 , . . . , n , various methods

have been proposed, the most practical probably being that proposed i n (David 1988).
Essentially it uses the above fact that the UK error is a generalised increment, and
thus by removing one sample value Z(xi)
i n turn, and estimating it by UK using the
other n 1 sample values, n such G / ' s can be found. T h i s is of course assuming an
K (h), e.g. \h\. B y using these GFs i n (3.114), a better estimate of
i n i t i a l estimate of
Kp(h) can be found, and the whole process is repeated, similarly to the iterative process
described earlier for C(h).
O n l y this time, it supposedly tends to an unbiased estimate
of K (h).
p
A s stated earlier, it is still debated as to whether the GCF

to estimate than the original covariance function C(h).
K (h)
p
is i n practise easier
One drawback to the GCF
is
its highly restricted choice of models. Another problem lies i n the form of estimation of
its parameters.
T h e success of fitting covariance functions has proven to depend much
upon the skill of the practitioner at cleaning the data of outliers.
T h e fact that L S
estimation allows very little interaction by the practitioner can be a major disadvantage,
as it is highly sensitive to outliers. Journel (1989) also states that the L S estimation
often results i n a , the nugget effect, being dominant, resulting i n little or no spatial
0
correlation. So effectively, the surface is estimated by a trend surface w i t h white noise.

T h i s is little gain for so much work.
F i n a l l y Cressie (1986) states that the estimation of K
i n conclusion, it is still not clear whether K
equations.
still appears to be biased.
So
or C should be used i n the UK system of
3.5.2
57
Other forms of unbiased covariance estimation
Since 1973, when the above form of unbiased covariance estimation was first proposed,
other methods have been developed.
One example is the maximum likelihood estimator 0 ( M a r d i a and M a r s h a l l 1984) which,
assuming ( Z ( x i ) , . . . , Z ( x ) ) to be jointly gaussian, maximises the likelihood:
n
exp{-\{z-Ft3{0)) K{0y\z-Fp{0))\
(3.116)
over a l l viable 0, for a chosen parametric class of covariance models C ( x , y ; 0).

A n o t h e r example is minimum variance quadratic unbiased estimation ( M I V Q U ) developed by K i t a n i d i s (1985). The covariance matrix K is assumed to be a linear combination
of m known symmetric nxn matrices K{ , and the covariance parameters, 8j, are assumed
to be quadratic functions of the sample values:
= Y,Ki6i
9j =
Z BZ
T
(3.117)
j = l,...,m.
T h e matrices Bj are n x n
(3.118)
matrices chosen so that the estimators, 9j, are invariant to
the addition of any linear combination of the trend component functions to the surface,
and are unbiased. T h e optimal matrices Bj are found by minimising the error variance
of each 9j under these constraints, just as for UK.
Despite the apparent completeness of the theory, many problems arise when t r y i n g to
apply it i n practise. Firstly, as the error variance involves the fourth moments of the
sample values, a distributional assumption for ( Z ( x i ) , . . . , Z ( x ) ) is required. Secondly,
n
an initial estimate K
58
of K is required. A l t h o u g h unbiasedness and invariance to the
addition of a trend is withheld by using Ko, the resulting estimators 9j may not necessarily
be m i n i m u m variance. Also if this method is used iteratively to find 0 by substituting
the estimate K for Ko, it may become biased, as the matrices Bj become functions of
the sample values. Similarly, if constraints for K to be a positive definite m a t r i x are
included, then the matrices Bj will be functions of the sample values and may again
interfere w i t h the unbiasedness of 0.
Other examples of this form of unbiased covariance estimation are
restricted maximum
likelihood ( R E M L ) by K i t a n i d i s (1983), and minimum norm quadratic unbiased ( M I N Q U )

by M a r d i a and M a r s h a l l (1985), and are along the same lines as those described above.
Chapter 4
Robustness of the Universal Kriging estimator,
ZTJK{X)
A l t h o u g h much effort and thought has been put into the development of the various
forms of kriging estimator, relatively little has been put into analysing the robustness
of these estimators to changes i n the assumptions, and providing robust estimators to
cope w i t h these changes.
W i t h respect to the
UK
estimator,
ZUK(X),
these changes
could take the form of outliers i n the data; misspecification of the trend component
functions; misrepresentation of the sampling configuration, and misspecification of the
(co)variogram model and associated parameters.
A s extremely little has been written on the robustness of the supposedly unbiased forms
of covariance estimation discussed i n section 3.5.2, I shall concentrate on the other forms
of covariance estimation where the trend function is estimated to find estimates of the
residuals. T h e estimation process, for a specific sample and sample configuration, can be
divided into four distinct steps:
1. choosing the trend component functions f(x) appropriate to the surface of estimation, and finding initial estimates of the trend coefficients (3.
2. calculating the estimated residuals R , and estimating the experimental (co)varion
gram nonparametrically.
3. choosing an appropriate parametric (co)variogram model to be fitted to the experimental (co)variogram, and estimating the model parameters 0.
59
Chapter 4. Robustness of the Universal Kriging estimator,
ZUK{X)
4. final estimation using the estimate of 6 and the sample data z
60
E a c h step i n the estimation process could be affected by the above changes. A s each step
is also dependent on the previous steps, the UK estimator could potentially be extremely
unstable to these changes.
In this chapter I describe how each step of the estimation process is affected by some of
these changes, and how this may have an effect on the final estimator. In some cases,
robust estimators have been devised to lessen these effects.
4.1
S t e p 1: T h e t r e n d c o m p o n e n t f u n c t i o n s
The problem of estimating the trend coefficients j3 unbiasedly, given that the appropriate
trend component functions are known, has been discussed at length, yet very little is
known about how to choose these functions, or more specifically, the order of the trend,
assuming it to be polynomial.
Starks and Fang (1982b) suggest first seeing if a trend is evident i n the geology of the
surrounding area, or i n the data.
T h e n after
fitting
a trend model, and estimating
the (co)variogram, they suggest using the cross-validation technique discussed earlier to
obtain errors and predicted error variances. O n looking at the normal quantile-quantile
plot of the standardised errors, an alternative model should be fitted if it is far from a
N(0,1) distribution. The idea is that if the correct trend and regionalised variable models
are fitted then the errors will be independent N(0,1) random variables. T h e problem w i t h
this approach is that, assuming first that the kriging model is appropriate for the surface,
it is extremely unlikely that the correct one will be chosen. Therefore, these standardised
errors w i l l be correlated and lead to a biased estimation of the distribution. A second
Chapter 4. Robustness of the Universal Kriging estimator, Z (x)

UK
61
problem is that if the points are evenly spaced then if a point is taken out and estimated
from the rest, the nearest sample point w i l l be about twice the normal distance from
the point of estimation. T h i s will lead to more inaccurate estimation than usual of the
surface values and error variances, and could again bias the error distribution. Also, due
to the large number of steps i n the estimation process, it is very difficult to pinpoint
misspecification of the trend as being the sole cause.
Despite this, the application of
different models to the data and the comparison of their errors and predicted error
variances at the samples points is very sensible, though computationally expensive.
Another possible way of choosing the order of the polynomial trend model from the sample
data is to use stepwise regression and other commonly used methods i n linear regression
for model choice. A good reference for these techniques is Weisberg (1985). Forward
selection, one particular form of stepwise regression, begins w i t h a simple regression
model and at each step, one variable (or group of variables) is added to the model. T h i s
variable must provide the largest reduction i n the sum of squares of all the possible
variables not i n the model, when the new regression model is fitted to the sample data.
Equivalently, the variable w i t h the largest F-value is chosen. T h i s adding of variables
is continued until a stopping rule is met.
Possible stopping rules are that the number
of variables i n the model have reached a chosen limit; the F-values for the rest of the
possible variables to be added are below a chosen minimum; or the addition of the next
variable would make the matrix F F
T
unconditioned and thus may cause roundoff errors
in the L S fit of the model. In this particular case, we must add all the monomials of each
order all at once, i n order for the trend model to retain its invariance to rigid motions. A s
the coordinate system is only chosen for convenience, the trend model should not depend
on it. For this reason, this stepwise method reduces to adding groups of monomials of

UK
62
successively higher order until either the m a x i m u m number of terms is reached or the F value is below the m i n i m u m . A s only one group of monomials is considered at each step,
it is only sensible to choose the m i n i m u m for the F-value to be the point of significance
for the specific F statistic. A s long as the order of the trend does not get too high, the
previously discussed methods i n Section 3.2 should solve the problem of roundoff error.
One drawback to this method is that the sample variables must be uncorrelated such
that the residuals can be assumed to be independent N(0, o ) random variables. Under
2
geostatistical theory, the sample values are correlated, so they must first be uncorrelated
by using a technique such as declustering.
Alternatively, various studies have been done on the effects of choosing the wrong order
of the trend, or
misspecifing the trend. D i a m o n d and A r m s t r o n g (1984) suggest, using
a form of perturbation analysis, that owerspecification of the trend would increase the
potential instability of the UK estimator by increasing the bound on the relative error i n
the kriging weights. Cressie and Zimmerman (1992) correctly state that overspecification
of the trend would not cause any bias i n the final estimator, but due to the sample values
being used to estimate the extra trend coefficients, the
predicted error variance w i l l be an
overestimate of the actual error variance. Theoretically, underepecification of the trend

poses a more serious problem by introducing a bias i n the final estimator, but as Cressie
and Z i m m e r m a n point out, estimation of the (co)variogram acts to some extent as a
self-correcting mechanism i n the estimation process. A s long as the UK model sensibly
divides the regionalised variable Z(x)
into the large-scale variation of the trend, and the
small-scale variation of the residuals, then if the trend is underspecified, the resulting
residuals should contain the extra variance unaccounted for by the trend. Therefore the
variogram of the residuals should naturally be overfit and thus correct the bias to some
degree.
Chapter 4. Robustness of the Universal Kriging estimator, Z (-x.)
63
UK
4.2
Step 2: Nonparametric estimation of the experimental
(co)variogram
Depending on whether the stationarity assumption of the residuals i?(x) is intrinsic or

second-order, two estimators devised by Matheron (1969) can be used to estimate the
experimental variogram and covariance function:
C(h)
JFHhJl ] ^
{ R { X i )
R ( x
'
) ) 2
( 4
'
U 9 )
where N(h) = {(x;,Xj)
Xj
x,- =.h} and |AT(h)| is the number of elements i n N(h).
For second-order stationarity, both the variogram estimator 27(h), and the covariogram
estimator C(h) can be used to estimate the variogram and covariance function respectively, but for intrinsic stationarity, only the variogram estimator can be used, due to
nonstationarity i n the variance. B o t h of these estimators are m i n i m u m variance a n d
unbiased under the gaussian assumption of -R(x), but Cressie and G r o n d o n a (1992) show
that i n terms of the bias, the variogram estimator is more stable to misspecification of
the trend than the covariogram estimator. Therefore I advise using 27(h) rather than
C(h) when assuming second-order stationarity.
If the residual data R
is contaminated with outliers then i t is advised to use a more
robust variogram estimator. One proposed by A r m s t r o n g and Delfiner (1980) uses a scale
estimator devised by Huber (1964) rather than the sample variance. Cressie and Hawkins
(1980) have devised a similar estimator to Matheron's but instead take the fourth power
of the mean rooted difference:
= h jm)
Dl/
64
ZUK(X)
7+
<
4121)
T h e justification for this robust estimator is that the rooted differences |i?(xj)
i?(xj)|2
are very close to normal even when the differences i?(xj) Rfaj) follow a symmetric
distribution about zero with heavy tails contaminated by outliers. Another robust estimator, 270(h), is proposed by Cressie and Hawkins, replacing the mean of the rooted
differences D i n equation (4.121) by the median D, which is a more robust statistic to
outliers than the mean. Hawkins and Cressie (1984) found that for data containing outliers, all of the above estimators had a positive bias, but that 275(h) had a smaller bias
than 275(h) which again had a smaller bias than Matheron's i n equation (4.119). W i t h
respect to their variances, the order of 275(h) and 275(h) was reversed, but Matheron's
estimator proved to have the smallest variance as the outlier effect became negligible,
confirming its m i n i m u m variance property under gaussian conditions.
Therefore one
must use either 275(h) or 275(h) as an estimator of the experimental variogram when
the data is thought to contain outliers.
4.3
S t e p 3: F i t t i n g a n a p p r o p r i a t e p a r a m e t r i c m o d e l , C ( x , y ; 0)
T h i s step has been given the most attention with respect to robustness as it is considered
to be the most sensitive part of the estimation process. M a n y studies have been conducted
on robustness of variogram models to misspecification, or more exactly, to inappropriate
choice of parametric model or wrong estimation of the parameters.
A l t h o u g h kriging predictions for linear unbiased estimators have m i n i m u m error variance,
Chapter 4. Robustness of the Universal Kriging estimator, Z (x.)
65
UK
given that the underlying assumptions hold, if the (co)variogram is misspecified, the
error variance can become very large. D i a m o n d and A r m s t r o n g (1984) have investigated
the effect of small perturbations of the variogram on the kriging predictions for a fixed
set of observations.
B y defining a neighbourhood 6 of variograms close to the true
variogram, D i a m o n d and A r m s t r o n g provide a bound on the relative error i n the kriging

weights w i t h i n this neighbourhood. They find that this bound ultimately depends on
the condition number K(T) of the true variogram matrix V, which is equal to the largest
absolute eigenvalue of T divided by the smallest absolute eigenvalue. The larger K(T), the
larger the bound and thus the larger the potential instability. Therefore they propose K(T)
to be used as a measure of how stable a particular variogram model is to misspedification
for a fixed set of observations.
B y applying this to b o t h the gaussian and spherical
variogram models, they found that the gaussian model gave very unstable behaviour for
small perturbations i n the variogram whereas the spherical model was far more stable i n
comparison. Bardossy (1988) proposes a different measure to D i a m o n d and Armstrong's
which is also sensitive to the location of the estimation point x , a factor which Bardossy
shows is a major influence on the kriging weights. Using this new measure, Bardossy
also found the gaussian model to give more unstable behaviour to small changes i n the
parameters than either the spherical or exponential models. For this reason, Stein (1989)
recommends that if a model must be fitted that is quadratic at the origin, then the
gaussian should be replaced by a more robust model such as C ( h ; 0) = c ? -
0 2 h
i e
(l +
6 h).
2
A s the number of observations increases, Stein and Handcock (1989) show that as long as
the variogram model used is
compatible w i t h the true variogram, then the differences i n
their final prediction and their kriging variances become negligible. The term
compatible
is strictly defined by Stein and Handcock, but it can be loosely defined as having the
same behaviour near the origin. A l t h o u g h this is an encouraging fact, I think that it is
66
ZTJK( )
X
not necessarily relevant, as i n most applications observations must be kept to a m i n i m u m

in order to cut costs.
Z i m m e r m a n and Harville (1989) prove that, assuming the correct covariance model
C ( x , y ; 0) and trend component functions have been chosen, the estimate of the trend
coefficients (3(0) is unbiased under the weak conditions that z be symmetric about its
mean, and that 0 be an even and translation invariant estimator. Using this fact, Z i m merman and Cressie (1991) show that under the same conditions the UK estimator is
unbiased. Therefore, contrary to previous belief, the estimator of the covariance function
parameters 0 need not be unbiased for the resulting estimate of the trend and the
final
kriging estimator to be unbiased.

Finally, i n fitting a (co)variogram model to the experimental (co)variogram, w i t h Stein
and Handcock's definition of compatibility i n m i n d , maybe more emphasis should be put
on providing a better fit to the smaller values of h i n preference to the larger values. T h i s
is achieved using the weighted LS method (Cressie 1985), described i n detail i n Chapter 5.
T h e smaller h is, the closer an observation is to the unobserved point and thus the larger
its weight i n predicting the value at the unobserved point. Therefore more care should
be put into estimating these weights correctly at the expense of poorly estimating the
weights for observations further away. The only problem w i t h this argument is that the
larger h is, the larger the probable number of observations at that distance.
Therefore
the total of these negligible weights could sum to a weight that has a significant effect
on the final prediction. Alternatively, if the process is truly stochastic, then only the
closer observations should be included i n the estimator anyway, (i.e we should see the
observation weights for the further points as exceeding a confidence interval i n some way
and therefore being ignored.)
4.4
67
ZUK{X)
Step 4: F i n a l estimation using C(x,y;0) and the sample data z
A l t h o u g h much has been written about the effects of misspecification of the trend and
the (co)variogram on the final UK estimator ZUK{x),
very little has been written about
how it is affected by errors i n the sample configuration. D i a m o n d and A r m s t r o n g (1984)

tried to analyse this by perturbing the coordinates of the sample slightly while keeping
the variogram unchanged. They again derived a bound on the relative error i n the kriging
weights, and found that the condition number K(T) of the variogram m a t r i x T for the
original sample configuration was again the most influential factor. Therefore, K(T) could
be used to give some idea of how robust different kriging systems are to a slight change i n
the sample configuration, and thus to measurement errors i n the sample coordinates. O f
course, calculating K(T) assumes knowledge of the true variogram and so this could not
be used as a measure of instability when applied to real situations where the variogram
is needed to be estimated. The same applies to the previous step when K(T) was used as
a measure of instability i n the variogram.
In conclusion, the UK estimator has an inherent stability i n its estimation process due
to the model's decomposition of Z(x) into the large scale variation of the trend, and the
small scale variation of the residuals. A s long as estimates of the trend coefficients /3 and
the covariance parameters 0 are relatively good then, as discussed earlier, an error i n one
is naturally corrected by an opposite error i n the other. Therefore, as long as each step
in the estimation process is carried out carefully, and an appropriate covariance model
is chosen that is compatible and relatively stable to misspecification of its parameters,
then the UK estimator should be fairly robust to errors i n the estimation process.
A l t h o u g h every effort has been made to make each step as robust as possible to outliers,

UK
this does not guarantee that
ZUK{X)
68
w i l l be robust to outliers, as it is a linear function
of the data. Therefore i n some way these outliers must be isolated a n d edited so as not
to overinfluence the predictions made be Z (x).
UK
way of doing this called Robust Kriging.
Hawkins and Cressie (1984) propose a
The method is based on the idea that outliers
can only be identified by comparison w i t h the values of the immediately surrounding

observations, rather than w i t h the sample mean w i t h which i t may agree.
4.4.1
Robust Kriging
1. estimate the variogram using robust methods as discussed earlier.

2. use this robust variogram i n the cross validation technique to estimate the kriging
weights Oiij and the error variance s} for each sample point x based on the rest of
the data:
5(x ) = c ^ (
t
(4.122)
3. use these weights ctij and the neighbouring sample values to Xj to get a robust
prediction 5(x;) of .z(xj) for each sample point x, using a weighted median method:
if gridded data, then choose the nearest eight points, requiring only two different kriging weights to be used throughout.
solve
iZjî
ocijSgn(z(xi) - z(xi)) = 0 for z(xi).
if there are multiple roots and/or intervals of roots to the equation then there
w i l l always be an odd number of them, so choose z(xi) to be the middle one,
and if i t is an interval, choose
-Z(XJ)
to be the middle of this interval.
4. Winsorise z(xi) by replacing i t w i t h z(x{) + cs, if z(xi) > z(x{) + CSJ, or w i t h

z(xi) csi if -z(xj) < z(xi) csi, but otherwise keep it unchanged.
5. the final Robust K r i g i n g estimate
-ZRK-(XJ)
ZUK(^)
69
of z ( x ) at an unobserved point x is found
by using the robust estimate of the variogram to calculate the kriging weights A;
as usual, but by replacing the sample values z
z {x)
RK
= ^w(xi)
J
w i t h the Winsorised sample values
(4.123)
T h i s method of smoothing the data is very similar to Huber's smoothing process used i n
time series (Huber 1977). The crux of the method is i n the fourth step when the data
is edited by Winsorizing.
The larger the estimated error variance at X;, the more erratic
the surface is assumed to be, and therefore the larger the bound about the weighted
median of its surrounding values within which zfa)
remains unedited. Obviously, the
constant c also controls the w i d t h of this bound, and is thus an important parameter i n
the estimation process. A relatively high value of 2-2.5 is usually chosen so as not to
alter the data unless a sample value is extremely different from the weighted median of
its surrounding sample values.
Therefore, the essential difference between this form of robust kriging and the previous
methods, is that the edited sample values are used i n the final estimate instead of the
original sample values. Considering that the error variances sj are not estimated robustly,
and that they may be extremely inaccurate as discussed i n section 4.1, this method may
not be an effective way of identifying outliers. Instead, the editing of the data may only
succeed i n biasing the final estimate.
A n alternative method is to measure i n some way the influence that a particular sample
value has on the final estimates, and then to decide whether the highly influential sample
70
ZJJK{^)
values are outliers or important points that should be left untouched. Christensen, Johnson, and Pearson (1992) do this by using
case deletion diagnostics. B y using a version
of the UK estimator which allows for measurement errors at the sample points, the ac-
tual values at the sample points are estimated ( z ) using the n measured sample values,
n
y . T h e n , w i t h one sample value removed, they are estimated again ( z _ i ) . A measure

n
similar to
Cook's distance (Cook 1977), quadratic i n the error ( z
z _i),
n
is used to
measure the relative influence a sample has on the resulting estimates. In this way, the
most influential sample values can be found and edited if thought to be outliers.
Chapter 5
A Numerical Application
In this final chapter, I attempt to apply Ordinary K r i g i n g and Universal K r i g i n g w i t h

a first order and second order trend to real data, using Ripley's form of the estimator
and the practical information gained from the robustness studies discussed i n the previous chapter. T h e data consists of elevations of a mountain range i n northern B r i t i s h
C o l u m b i a , near St. M a r y ' s Lake, measured by satellite, and was obtained courtesy of
Dave M o u l t o n (1993). It is i n the form of a (359 x 504) grid of data points, w i t h a
grid spacing of 60 metres. I assume the surface to be somewhat representative of an ore
surface, to which these techniques are commonly applied. F r o m now on, to avoid any
confusion, I shall refer to Ordinary K r i g i n g as Universal K r i g i n g w i t h a homogeneous
trend.
F i r s t l y I state my reasons for choosing the form of (co)variogram estimation associated

w i t h estimating the trend and forming an experimental semivariogram, as opposed to the
other supposedly unbiased methods described i n section 3.5.2. I then describe i n detail
the method by which I estimate the (co)variogram from the data.
Secondly, I divide the above surface into 84 smaller surfaces of (51 x 42) grid points, and
choose 20 of them at random. I assume all of these surfaces to be isotropic and stationary
in their variance. I predict each of these 20 surfaces for the same sample configuration
of 50 randomly chosen points for different trend assumptions and 3 different methods of
71
Chapter 5. A Numerical Application
72
fit to the experimental semivariogram: linear, spherical w i t h L S fit, and spherical w i t h

weighted L S fit. I repeat this for 20 different sample configurations.
A s a measure of prediction performance, I calculate both the squared difference and the
absolute difference between the predicted value and the actual value at each grid point.
M y two measures of prediction performance or goodness-of-fit are then the mean squared
difference, and the mean absolute difference over the grid of points. I investigate how they
differ i n ranking the various method combinations. Using these performance measures, I
come to some conclusions as to which trend assumption and method of variogram fit to
use when applying UK to surfaces of this type.
Finally, I outline the few methods which have been proposed to detect the order of the
trend that would give the best estimator of the surface.
I provide an example of how
these methods can be very misleading.
5.1
Reasons for choosing the biased form of (co)variogram estimation
A s previously mentioned i n Chapter 4, there are two general types of method by which
the (co)variogram can be estimated from the data. T h e first requires i n i t i a l estimation
of the trend.
F r o m the residuals, the experimental (co)variogram is estimated, and a
positive definite function is fitted, usually of linear, spherical, exponential or gaussian

form. T h e second type of method supposedly avoids going through this biased process of
estimating the trend by estimating the parameters of an assumed (co)variogram model
straight from the data.
A l t h o u g h , i n theory, these (co)variogram estimators of the second type may be unbiased,
in practise their performance ultimately depends on the appropriateness of the chosen
73
(co)variogram model and the validity of the many assumptions made about the data and
their distribution. A s pointed out i n section 3.5.2, if these assumptions are not satisfied,
it could lead to biased estimators just as for the first method. Therefore, it would be
unwise to rely on these apparent unbiasedness properties. Also, as stated i n section 4.3,
a biased (co)variogram estimator does not necessarily lead to a biased kriging estimator.
T h e methods of the second type also require a highly restricted choice of (co)variogram
model, usually linear i n the parameters. These parameters, 9, are very difficult to interpret, and as a result, it is difficult to evaluate whether the estimates, 9, are reasonable
for the considered data set. T h i s property, and the very nature of the estimation process,
allows little interaction by the practitioner. Also, the final parameter estimates can often
lead to an invalid (co)variogram.
O n the other hand, the (co)variogram models for the first type of method are very
versatile and naturally produce positive definite (co)variograms. T h e individual parameters are also very easy to interpret as, say, the variance of Z(x),
or the gradient of the
(co)variogram at h 0. T h i s allows the practitioner to use his or her skill and experience
i n choosing the (co)variogram model and the method of fitting it to the experimental
(co)variogram.
Lastly, and most importantly, there seems to be absolutely nothing documented on the robustness of these 'unbiased' estimators. W i t h respect to Delfiner's L S method, discussed
i n section 3.5.1, it is well known that L S estimation is highly sensitive to outliers. A l t e r natively, much has been written about the sensitivity of estimators of the first method to
outliers and misspecification of the trend and (co)variogram, as previously summarised
i n Chapter 4. For this major reason, and the others discussed above, I choose to use
74
estimators of the first type to estimate the (co)variogram.
5.2
Detailed description of (co)variogram estimation
For a stationary trend, the experimental (co)variogram can be directly estimated from
the data, whereas for a first and second order trend, we face the problem of estimating the
trend from correlated data without previous knowledge of the (co)variogram. A s there
is still no generally accepted way of doing this, I simply decluster the sample points, and
then assuming them to be independent, I fit a trend surface of first or second order by L S .
T h i s method should be fairly stable, as the sample data is exact with no measurement or
positional errors. I use the declustering technique discussed i n section 2.2, using a grid
spacing of 4 metres. This generally reduces the sample from 50 to 43-44 values.
I then calculate the residuals between the sample values and the fitted trend surface.
Assuming these residuals to have originated from normally distributed random variables
w i t h zero mean, I use them to estimate the experimental (co)variogram by the exact
same methods as for a homogeneous trend, but using the estimated residuals
-R(XJ), i =
1 , . . . , n, i n place of the original data.

A s the data is free of measurement errors, I use Matheron's estimator for the variogram,
as defined i n equation (4.119), instead of Cressie and Hawkins' robust estimators 2^p(h)
and 2jf)(h), to make use of its m i n i m u m variance property for gaussian data.
I also
choose Matheron's variogram estimator over the covariogram estimator for its stability
to misspecification of the trend. A s \N(h)\ is extremely small or zero-valued for most
values of h, it is standard practise to divide the h scale into intervals or discrete lags,
providing only one semivariogram estimate, ^(hj), per lag, where N(hj)
Xi Xj = hj, and hj is i n j
th
= {(XJ,X.,) :
interval} (Ripley 1981). There is no standard interval
75
length but it seems common to divide the h scale into the order of 10-20 intervals. I use
15 intervals of equal length, w i t h the estimators evaluated at the midpoints. A n example
of an experimental semivariogram of this form is shown i n Figure 5.3.
Figure 5.3: A n experimental semivariogram after dividing the h scale into 15 discrete
lags, providing only one estimator per lag.
T h e next step is to fit a valid semivariogram model to the experimental semivariogram.
T h e common models considered are linear, exponential, spherical, and gaussian, although
there are many other variations of these which could be applied. A comparison of these
models can be found i n Figure 5.4. A s mentioned i n Section 3.1.1, the [/If equations are
constructed i n such a way that any constant term or factor i n the covariance function is
filtered out. Therefore, any linear model of the semivariogram would be equivalent under
UK, so there would be no point fitting one to the experimental semivariogram under
these methods.
O f the nonlinear models i n Figure 5.4, the most appropriate one to fit to the experimental
semivariogram seems to be the spherical model, for every pairing of surface and sample
76
Linear
S = Spherical
E = Exponential
Gaussian
Figure 5.4: A comparison of semivariogram models with the same sill or variance (except
for the linear model), and the same gradient at h = 0 (except for the gaussian
model which always has a zero gradient at h 0).
configuration that I choose. Also, as mentioned i n section 4.3, the spherical model is
relatively stable to misspecification of 0. So, i n order to automate the estimation process,
I assume that the spherical model is the most appropriate nonlinear model to fit i n all
cases. A s most of the experimental semivariograms indicate negligible discontinuity at
h = 0, which is expected as there is no measurement error i n the data, I also decide not
to allow for a nugget effect i n the model. Therefore the final form of the spherical model
that I fit to the experimental semivariograms is as follows:
h< R
V
(5.124)
h> R
T h e final step of fitting the semivariogram model to the experimental semivariogram is

where the practitioner's skill and experience comes into effect. The common methods of
fitting the curve are by eye, by various techniques of estimating the parameters specific
to the model fitted, or by some form of L S . A s the curve is nonlinear, explicit equations
77
for the parameters cannot be found. Instead some form of Newton-Rhaphson iteration
technique must be applied to the L S statistic to find a local minimum. A s there may be
many local minima, initial values for the parameters must be stated. For the spherical
model, the parameters are S and R.
S is the horizontal asymptote or sill of the model, and i n theory should be equal to the
variance of Z(x), assuming the variance to be stationary. Therefore, the sill S can be
estimated by an estimate of cr , such as the sample variance a ,
2
i=i
O f course, if a trend is fitted to the surface, then the estimated residuals i2(xj) should
be used instead of the data i n the above formula. Cressie (1985) shows that <r w i l l
2
always have a negative bias due to the estimation of LL by Z, just as i n Section 3.1 when
estimating the trend for higher dimensions. Cressie and Glonek (1984) propose that by
using the median of the sample values, rather than the sample mean Z, as an estimate
of p, this bias can be much reduced. Also <r is based on the assumption that the data
2
are uncorrelated, so I apply it to the declustered data.

The gradient of the spherical semivariogram model at h = 0 is f | . B y estimating | | by
the gradient of the line of best fit to the first three lags of the experimental semivariogram,
and using the variance estimator for S, we can provide initial values of S and R for the
L S iteration method.
A s the experimental semivariogram is estimated for a different number of pairs, \N(hj)\,
for each discrete lag hj, minimisation of the L S statistic may not be the most appropriate
way of fitting the semivariogram model. Cressie (1985) proposes a weighted L S statistic
to be minimised, based on \N(hj)\, which gives weight to early lags, and down weights
78
lags w i t h a small number of pairs. He proposes finding the model parameters 0 that
minimise
L
rather than
where L is the number of lags, j(hj)
is the experimental semivariogram estimator, and
j(hj \ 0) is the semivariogram model. Cressie states that this is a vast improvement over
L S . Cressie also proposes that this estimator of 6 could be used as the starting value of
an iterative generalised L S approach.
W h e n comparing Cressie's weighted L S method w i t h normal L S for fitting the spherical
model, I find that there is practically no difference between them when the model is a
good fit to the experimental semivariogram. Otherwise, I find that Cressie's method does
fit the early lags far better, which are the most important lags, as discussed i n section
4.3. A n example of this is shown i n Figure 5.5.
A l s o Cressie's method gives more weight to the middle lags, which have the highest values
of
\N(hj) |. T h i s has the effect of bending the normal L S fit towards these weighted values.
In Figure 5.5 this effect is shown to t u r n an almost linear L S fit into a more appropriate
spherical fit. In general though, this weighting of the middle lags has the effect of changing
the sill S and the range R without altering the general L S fit too much. A n example of
this is shown i n Figure 5.6.
Lastly, as the values of
\N(hj)\ for lags hj = 13,14, and 15 are generally far below those
of the other lags, I do not include them i n the L S and weighted L S fits. A similar 'rule'
has been devised by Journel and Huijbregts (1978) for fitting semivariograms.
To summarise, I apply UK w i t h a homogeneous, first order and second order trend
Chapter
5. A Numerical
Application
79
Figure 5.5: A comparison between LS and Cressie's weighted L S method for fitting a
spherical model to the first 12 lags of the experimental semivariogram in
Figure 5.3.
surface, using both a linear semivariogram, and a spherical semivariogram fitted to the
experimental semivariogram using L S and weighted L S . Finally, I use the relation C{h) =
cr j(h)
2
to estimate the covariance function
C(h).
A s UK filters out any constant terms or factors i n the covariance function, o and S could
2
be any constants that make C(h) a valid positive definite function, i.e. any numbers large
enough to ensure that C(h) is non-negative i n the required range of h. However, it is
best to estimate these parameters as accurately as possible so that C(h) can be used to
estimate the error variance.
5.3
Results
20 randomly chosen geological surfaces are estimated by UKiox 20 different sample configurations of 50 randomly chosen points; 3 different methods of (co)variogram estimation
Chapter 5. A Numerical
Application
80
WLS
LS
10
20
30
40
50
60
Figure 5.6: A n example of how Cressie's weighted L S method alters the sill of the L S fit
without altering the general L S fit too much.
- linear ( L I N ) , spherical with a least squares fit ( L S ) , spherical w i t h a weighted least

squares fit ( W L S ) ; and 3 different orders of the trend surface - stationary (O), first order
(I), second order (II). For the response variable, I use both the mean squared difference
(SD) and the mean absolute difference ( A D ) between the real and estimated surface over
the (51 x 42) grid of points.
W i t h o u t analysing the data statistically, it is very difficult to come to any real conclusions
about which method of (co)variogram estimation is better, and for which order of the
trend surface. It is also very unwise to compare the different methods and trend orders
without first investigating whether there is any interaction between the 3 factors of the
experiment: method, trend, and sample. Interactions between these factors can lead to
very misleading conclusions if they are not dealt w i t h properly. In order to investigate
these effects, and their comparative magnitudes, I attempt to model the response variable
Vijkn
s a
ijkn
81
overall mean p plus the sum of these effects and their interactions:
= + M + T + S + (MT)ij + (TS)
f l
jk
+ (MS)
ik
+ (MTS)
ijk
+e
ljkn
(5.127)
where i = 1,2,3, j = 0,1, 2, k, n = 1 , . . . , 20, and M ; is the effect due t o (co)variogram

method i, Tj is the effect due to trend order j, and S
is the effect due to sample
configuration k. n is the surface coefficient. The rest of the terms i n equation (5.127) are
the interaction effects between the different factors. tij
kn
using the
is the residual variable when
{ijkn} combination of factors and surface.
B y modelling the response variable i n this way, I can carry out an analysis of variance
( A N O V A ) study on the data to compare the relative effects. G o o d references for A N O V A
tests and related diagnostic techniques are Montgomery(1991) and Hicks (1993). A s the
sample configurations are randomly chosen, a mixed A N O V A model must be used. T h e
method and trend effects, M and T, and their interactions, MT, are called fixed effects.
The sample effects, S, and their associated interactions, TS, MS, and MTS, are called
random effects. For each random class, S,TS,MS, and MTS, the effects are assumed to
be normally distributed w i t h zero mean and constant variance for a l l i,j, and k. For each
fixed class, M,T, and MT, the null hypothesis is that a l l the effects are zero, and the
alternative hypothesis is that there is least one non-zero effect. For each random class,
the null hypothesis is that the common variance is zero, and the alternative hypothesis
is that i t is non-zero. If we assume that the model is appropriate, and the residuals are
independent normally distributed random variables w i t h constant variance a , then the
2
F-values for each class follow an F distribution w i t h their respective degrees of freedom.
In each case, the alternative hypothesis only replaces the null hypothesis if the F-value
is significantly large. Therefore the upper t a i l is the critical region, requiring a one-sided
test.
82
Before analysing the A N O V A tables for the A D and S D data sets closely, the residuals
of their respective fits to the above model should be analysed to ensure that they satisfy
the above assumptions. Plots of the residuals versus the fitted values for b o t h the A D
and S D data sets are shown i n Figure 5.7.
AD Data
35
40
Fitted Value
SD Data
45
50
3000 4000 5000 6000 7000 8000

Fined Value
Figure 5.7: Plots of residuals versus fitted values for the mean absolute difference (AD)
and the mean squared difference (SD) data sets.
A l t h o u g h the variance of the residuals for A D shows only a slight increase w i t h i n creasing magnitude of the fitted values, the variance of the residuals for S D shows a
marked increase indicating the need for a transformation of the data.
L o g ( S D ) pro-
vides the most stable residual variance of a l l the transformations i n the natural set
{ S D , S D ^ , log(SD), S D ^ } but at the expense of an asymmetric histogram of the resid- 1
uals, whereas S D ^ provides a slightly increasing residual variance similar to that for A D ,
but a near perfect histogram and normal quantile-quantile plot. T h e quantile-quantile
plot for A D is also relatively straight except for a slight kink i n the negative t a i l between
83
-10 and -30. T h i s is reflected i n the histogram where there is a slight dip around -20.
B o t h the histograms and normal quantile-quantile plots for A D and S D ^ can be found
in Figure 5.8.
Figure 5.8: Histograms and normal quantile-quantile plots for both the A D and SD2
residuals. A straight normal quantile-quantile plot indicates that the residuals follow a normal distribution.
These results, and further plots of residuals versus method, trend, and sample, support
my conclusion that neither the A D or S D ^ model fits are i n significant violation of
the above assumptions as to seriously affect any conclusions made from the respective
A N O V A tables.
B y comparing the A N O V A tables for A D and SD5 i n Table 5.1, it is clear that they
provide exactly the same information. For both response variables, the two fixed factor
effects, the variance of the sample effect and the
trend:method interaction effect have
AD
84
Effect
trend
method
sample
trend:method
trend:sample
trend:sample
trend:method:sample
Residuals
Df
2
2
19
4
38
38
76
3420
Sum of Sq
3629
1225
42887
996
1319
433
312
534749
M e a n Sq
Effect
Df
2
2
19
4
38
38
76
3420
Sum of Sq
18286
4575
157588
3359
7838
1663
1235
1080426
M e a n Sq
9143.20
2287.66
8294.08
839.73
206.27
43.76
16.24
315.91
SD*
trend
method
sample
trend: method
trend:sample
method:sample
trend:method: sample
Residuals
1814.49
612.56
2257.24
249.12
34.72
11.40
4.11
156.36
F Value
52.261
53.733
14.436
60.613
0.222
0.073
0.026
p Value
0.000
0.000
0.000
0.000
1.000
1.000
1.000
F Value
44.326
52.277
26.255
51.708
0.653
0.139
0.051
p Value
0.000
0.000
0.000
0.000
0.951
1.000
1.000
Table 5.1: A N O V A tables for the A D and SD2 data sets
extremely high probability of being non-zero, whereas the other interaction effects are
almost definitely zero.
T h e trend:method interaction plots for A D and SD2 i n Figure 5.9 are also very similar.
T h e y b o t h indicate that by increasing the order of the trend, the differences i n the
methods are only magnified, without changing the actual ordering i n any way.
Only
when the trend is stationary does this order seem to change.

Paired t-tests and paired W i l c o x o n signed rank nonparametric tests are performed to test
the significance of these differences between the A D means for the various method/trend
combinations. In Table 5.3, these method/trend combinations are ordered w i t h increasing
Figure 5.9:
85
Dimension:method interaction plots for both the A D and SD2 residuals
mean A D .
mean
LS
WLS
lin
0
36.00
36.08
36.11
I
37.72
37.06
36.35
II
39.85
38.81
36.86
s.d.
LS
WLS
lin
0
11.84
11.84
11.82
I
12.80
12.92
12.00
II
14.50
13.82
12.51
Table 5.2: Mean and standard deviations of the A D data for the different method/trend
combinations estimated over all the combinations of surface and sample configuration.
Firstly, I test the differences between the A D means of the different (co)variogram methods, keeping the trend order fixed. For both the first and second order trend surfaces (I
and II), b o t h the paired t-test and paired W i l c o x o n signed rank test give extremely small
86
p-values for each difference. Therefore, despite the fact that these differences seem i n significant, they are actually highly significant. This implies that for estimating a general
surface of this type from 50 randomly chosen sample values, L I N is better than W L S ,
which is itself better than L S , when using a polynomial trend surface of order one or
two. For a stationary trend (O), the two tests agree that the difference between W L S
and L I N is insignificant, but for the other two differences, W L S - L S and L I N - L S , there
is some disagreement. The respective p-values for the paired t-test are 0.263 and 0.314,
suggesting that the two differences are insignificant, whereas the values for the W i l c o x o n
test are 0.016 and 0.000, suggesting completely the opposite conclusions. O n looking
at the density plots and normal quantile-quantile plots of the two differences i n Figure
5.10,1 feel that despite the heavy tails the distibutions are nearly normal and suggest an
insignificant difference from zero for the respective means. Therefore I conclude that the
two differences are insignificant, i n agreement w i t h the paired t-tests. A s these differences
are so small, this conclusion is rather less important than for the first and second order
trends.
Secondly, I test the differences between the A D means for the different trend orders,
keeping the (co)variogram method fixed. For each (co)variogram method, the two tests
give extremely small p-values for every difference. T h i s implies that for any (co)variogram
method, when estimating a general surface of this type from 50 randomly chosen sample
values, the lower the trend order the better. T h e exact same conclusions are found by
analysing the S D ^ data i n this way. These conclusions are also reflected i n the mean and
median factor plots for A D i n Figure 5.11.
These factor plots also point out that however significant the differences may be between
the (co)variogram methods and trend orders, the differences i n the means for the various
87
Figure 5.10: Density and normal quantile-quantile plots for LS - W L S and LS - L I N

differences for a stationary trend surface over all combinations of surface
and sample configuration. The density plots are smoothed histograms.
sample configurations are far greater. T h i s strongly indicates that the choice of sample
configuration is very important i n the estimation of a surface, and suggests that more
effort should be put into choosing the best sampling scheme. T h e problem w i t h choosing
the best sampling scheme is that it is entirely dependent on the surface, of which we know
extremely little at this stage i n the estimation process. Based on this little knowledge,
various sampling schemes such as clustered, stratified, systematic or random sampling
can be applied (Ripley 1981).
5.4
Detecting the optimal trend order
In this final section, the problem of choosing the order of the trend surface is considered.
88
T h e results of my numerical example indicate that if the practitioner has no idea of

which trend order to choose then he or she should choose the trend to be stationary as
in general, for randomly chosen samples and surfaces, a stationary trend surface seems
to perform the best. However, if there was any way that the o p t i m a l trend order could
be detected from the sample data, this would obviously be better.
A s discussed i n Section 4.1, there has been very little written on this problem of choosing
the appropriate trend component functions, or i n this case, the order of the trend. A s
mentioned, Starks and Fang (1982) suggest using the cross validation method to obtain
kriging errors and predicted kriging error variances at the sample points. B y dividing
the errors by their predicted variances, a set of standardised errors can be found. T h e y
suggest choosing the model whose distribution of the standardised errors compares best
w i t h a N(0,1) distribution. The problems w i t h this method have been discussed i n section
4.1. For these reasons, I feel that goodness-of-fit of the standardised errors to a N(0,1)
distribution is very unlikely to be related to the optimality of a particular trend model.
A simpler measure could be the mean squared difference or mean absolute difference of
these kriging errors.
A t h i r d method, also discussed i n Section 4.1, could be to fit all three trend models to the
declustered data by L S and calculate the residual sum of squares at the sample points,
as i n stepwise regression. F-tests can be performed to test whether the reduction i n the
sum of squares, when moving from a stationary to a first order model, and from a first
order to second order model, are significant.
T h e results of these methods when applied to surface 13 for the first sample configuration
are as follows:
89
T h e cross validation kriging errors and associated error variances are estimated for a
homogeneous, first order, and second order trend surface, using a linear covariance function. Note that the parameters of the linear model are now required to be estimated i n
order to estimate the kriging error variances. The normal quantile-quantile plots of the
standardised errors for each trend model are i n Figure 5.12. It is clear that the best fit to
a N(0,1) distribution is the second order trend model, w i t h the stationary model being
the worst. A s expected, none of them are very good fits.
Next, I calculate the mean squared difference and mean absolute difference of the kriging
errors. I find the ordering of the trend models to be the same for b o t h measures, w i t h
the second order model having the lowest value, and the first order model the highest.
Lastly, the three trend models are fitted to the 43 declustered sample values by L S . The
F statistic for moving from the stationary to the first order trend model is 29.71 which
is highly significant for an F(2,40) distribution. The F statistic for moving from the first
order trend to the second order trend model is 35.10 which is also highly significant for an
F(3,37) distribution. Therefore again it is concluded that the second order trend model
is the most appropriate, then first order, and lastly the stationary model.
In fact, the trend model that provides the best fit to surface 13 for the first sample
configuration is the first order model, w i t h the second order model coming i n last place.
Therefore, these methods can clearly be misleading and result i n choosing the worst
rather than the best trend model for a particular surface and sample configuration.
5.5
90
Conclusions
T h e following conclusions are obviously limited to applications of UK to surfaces of the

type studied i n this example. B o t h the surfaces, and the low sample size were chosen so
as to direct the resulting conclusions as much towards ore mining applications as possible.
A s prediction performance measures, A D and S D give slightly different orderings of
the method/trend combinations for specific surfaces and sampling configurations. W i t h
respect to the general analysis, A D and S D ^ give almost identical results. Therefore, I
conclude that there is little difference between A D and S D .
I conclude that, i n general, UK w i t h a homogeneous trend performs better than UK w i t h
a first order trend, which performs better than UK w i t h a second order trend, whatever
the form of variogram estimation.
I conclude that, i n general, a linear variogram model performs better than a spherical
variogram model fitted by weighted L S , which performs better than a spherical variogram
model fitted by L S , for UK w i t h a first and second order trend. For UK w i t h a homo-
geneous trend, there is no significant difference between the three forms of variogram
estimation.
Finally, I conclude that the methods for detecting the optimal trend order for a specific
sample configuration and surface can be seriously misleading.
Therefore, for a sample configuration of 50 randomly chosen sample points and a general
surface of the type estimated i n this example, it is best to choose the simplest trend
surface and the simplest method of variogram estimation. I suggest that this is as a result
of my inexperience i n finding an initial estimate for the trend, and i n fitting a variogram
model.
91
For these reasons, I conclude that unless the practitioner has experience i n
applying this type of estimator and i n using the above methods for detecting the o p t i m a l
trend order, UK should always be applied w i t h a homogeneous trend i.e.
K r i g i n g , for which the choice of variogram estimation is unimportant.
Ordinary
92
20'
20 n
8-
8-
97-
79-
6-
6-
1611
Q
<
Q
<
"l
15-
lOO
ll
16-
II-
LSWLS
102-
LS
WLST
LINT
''
LIN4-
15O-
0-
18-
I
17ISM-
1751
19-
S'
14
19-
12-
18-
sample
18'
trend
Factors
method
sample
trend
method
Factors
Figure 5.11: Mean and median factor plots for the A D data set. The horizontal lines in
the mean and median plots mark the overall mean and median respectively.
93
Homogeneous trend
Quantiles of Standard Normal
First order trend
Second order trend
Figure 5.12: Normal quantile-quantile plots for the cross validation standardised residuals. The standardised residuals for the second order trend provide the best
fit to a N(0,1) distribution.
Chapter 6
Conclusion
Since the original linear estimators of Ordinary and Universal K r i g i n g were first developed
i n the 1960's, a multitude of alternative estimators have been put forward, based on far
more complex mathematical and statistical theory, yet few, if any, have proved to be
practically superior. T h i s is supported by the fact that O r d i n a r y K r i g i n g is still being
practiced i n many ore mines around the world, including the gold mines i n South Africa,
where statistical techniques for ore estimation were first applied. So, as is often the case,
the simplest model proves to be the best.
A l t h o u g h Universal K r i g i n g (UK) is the most general of the linear estimators, it fails
i n its attempt to model the surface as having a
nonhomogeneous trend. T h e problem
lies i n estimating the (co)variogram unbiasedly from sample variables w i t h unknown

but different means.
M u c h research has gone into solving this problem, resulting i n
the development of supposedly unbiased forms of (co)variogram estimation which avoid

prior estimation of the trend.
Also, much research has gone into making UK more
robust to errors at each stage i n its estimation process. Despite all this work, UK w i t h a
nonhomogeneous trend is still considered to be too unreliable to be of any real practical
use. Another problem is that, being a linear estimator, it is also very sensitive to outliers
in the data. T h i s problem is difficult to avoid. Therefore, UK is very dependent upon the
prior detection and editing of these outliers, which is itself a very difficult and unreliable
procedure. In situations where outliers are known to exist, UK should be replaced by
94
Chapter 6. Conclusion
95
estimators that are inherently robust to outliers.

After attempting to apply UK, using the (biased) techniques associated w i t h prior estimation of the trend, and estimation of the experimental semivariogram, I come to the
conclusion that, for an inexperienced practitioner such as I, a homogeneous trend model
should always be chosen when applying UK to a general unknown surface. M y reasoning
for why the homogeneous model performs better than the higher order trend models i n
my numerical example is that the estimation performance for the homogeneous model
seems to be far less dependent upon the choice of (co)variogram model and method of
fit, and therefore relies less upon the skill and experience of the practitioner. For nonhomogeneous trends, the estimation of the (co)variogram is also dependent upon prior
estimation of the trend from correlated sample values, and this increases the potential
instability of the estimation process.
For both first and second order trend models, I
find that the linear (co)variogram model, which demands no interaction at a l l by the
practitioner, performs better than the spherical model fitted by weighted L S , which itself
performs better than the spherical model fitted by L S . T h i s again indicates that the less
interaction by an inexperienced practitioner the better, as it is unlikely that a linear
model is more appropriate than a spherical model for a (co)variogram when b o t h fitted
correctly. T h i s observation leads me to some suggestions for future research.
Firstly, as the other unbiased forms of (co)variogram estimation demand little to no

interaction by the practitioner, it would be of interest to see how these (co)variogram
methods compare when applied to the same surfaces and sample configurations as i n my
numerical example.
Secondly, it would be of interest to compare UK with other non-statistical estimators
Chapter
6.
Conclusion
96
for different surfaces of varying irregularity and different numbers of sample values, and
try to find some definition of the type of surface and number of sample values that UK
is best suited for. It is known that the interpolating spline is equivalent to UK w i t h a
first order trend and a fixed covariance function equal to the Green's function, |/i| Zo(?|/i|
2
(Dubrule 1983). The theoretical advantage that UK has over spline interpolation is that
it is more versatile, allowing the trend order and the covariance function to change to
suit the surface being fitted. It would be of interest to see whether this translates to an
advantage i n practise.
Thirdly, I think less attention should be put towards finding unbiased forms of estimation
and more towards finding estimators that are robust to outliers. A s these estimators are
generally applied to ore deposits, where there are likely to be errors i n measurement, I
feel that the effort i n this field of research has been somewhat misplaced.
Lastly, my numerical example strongly indicates that the choice of sample configuration
is extremely important i n the estimation of a surface, far outweighing the effects of the
different trend orders and forms of (co)variogram estimation. Therefore, I feel that more
effort should be put towards finding methods of choosing an optimal sampling scheme.
Bibliography
A R M S T R O N G , M . AND D E L F I N E R , P . , 1980, Towards a more robust variogram: A

case study on coal, Internal note N-671: Centre de Geostatistique, Fontainebleau,
France.
B A R D O S S Y , A . , 1988, Notes on the robustness of the kriging system: M a t h . Geol.,
20, 189-203.
CHRISTENSEN, R . , 1990, The equivalence of predictions from univeral kriging and
intrinsic random function kriging: M a t h . Geol., 22, 655-664.
C H R I S T E N S E N , R . , W E L S E Y , J . , AND P E A R S O N , L . M . , 1992, P r e d i c t i o n diagnos-
tics for spatial linear models: Biometrika, 79, 583-591.

C O O K , R . D . , 1977, Detection of influential observations i n linear regression: Technometrics, 19, 15-18.
CRESSIE, N . , 1985, F i t t i n g variogram models by weighted least squares: J . Int.
Assoc. M a t h . G e o l , 17, 563-586.
CRESSIE, N . 1986, K r i g i n g nonstationary data, J . A m e r . Stat. Assoc. 81, 625-634
CRESSIE, N . 1987, A nonparametric view of generalised covariances for kriging:
M a t h . G e o l , 19, 425-449.
CRESSIE, N . , AND G L O N E K , G . , 1984, M e d i a n based covariogram estimators reduce
bias: Stat. P r o b . Lett., 2, 299-304.
CRESSIE, N . , A N D GRONDONA, M . O . , 1992, A comparison of variogram estimation w i t h covariogram estimation, in M a r d i a , K . V . (Ed.), The Art of Statistical
Science: Wiley, New York.
CRESSIE, N . , AND HAWKINS, D . M . , 1980, Robust estimation of the variogram, I:
J . Int. Assoc. M a t h . G e o l , 12, 115-125.
CRESSIE, N . , A N D ZIMMERMAN, D . L.,1992, O n the stability of the geostatistical
method: M a t h . G e o l , 24, 45-59.
DAVID, M . , 1977, Geostatistical
Ore Reserve Estimation:
97
Elsevier, New Y o r k , 364p.
Bibliography
D A V I D , M . , 1988,
98
Handbook of Applied Advanced Geostatistical Ore Reserve Esti-
mation: Elsevier, Amsterdam.

D E L F I N E R , P . , 1976, Linear estimation of nonstationary spatial phenomena, in
Guarascio, M . , et al. (Eds.), Advanced Geostatistics in the Mining Industry: Reidel,
Dordrecht, 49-68.
D E L F I N E R , P . , R E N A R D , D . , A N D CHILES, J . P . , 1978,
BLUEPAK-3D Manual:
Centre de Geostatistique, Fontainebleau, France.

D I A M O N D , P . , AND A R M S T R O N G , M . , 1984, Robustness of variograms and conditioning of kriging matrices: M a t h . Geol., 16, 809-822.
D U B R U L E , O . , 1983, T w o methods w i t h different objectives: splines and kriging:
M a t h . G e o l , 15, 245-257.
G E L ' F A N D , I. M . , A N D V l L E N K l N , N . Y . , 1964,
Generalised Functions: Applica-
tions of Harmonic Analysis, vol. 4: Academic Press, New York, 384p.

G O L U B , G . H . , A N D REINSCH, C , 1970, Singular value decomposition and least
squares solutions: Numer. M a t h . , 14, 403-420.
G O L U B , G . H . , A N D V A N L O A N , C . F . , 1983,
Matrix Computations: Johns Hopkins
U n i v . Press, Baltimore.
HAWKINS, D . M . , AND CRESSIE, N . , 1984, Robust kriging - a proposal:
G e o l , 16, 3-18.
Math.
HICKS, C . R . , 1993, Fundamental concepts i n the design of experiments (4th ed.):

Saunders College P u b . , New York, 509p.
H U B E R , P . J . , 1964, Robust estimation of a location parameter: A n n . M a t h . Stat.,
35, 73-101.
H U B E R , P . , 1979, Robust smoothing, in Launer R . L . , and W i l k i n s o n , G . N . (Eds.),
Robustness in Statistics: Academic Press, New York, 33-47.
J O U R N E L , A . G . , 1980, The lognormal approach to predicting local distributions
of selective mining unit grades: M a t h . Geol., 12, 285-303.
J O U R N E L , A . G . , 1983, Nonparametric estimation of spatial distributions: M a t h .
Geol. 15, 445-468.
A . G . , 1989, Fundamentals of Geostatistics in Five Lessons: Short
Course i n Geology, 8: American Geophysical Union, Washington, D . C . , 40p.
JOURNEL,
Bibliography
99
J O U R N E L , A . G . , A N D HUIJBREGTS, C . , 1978, Mining

Academic Press, 600p.
Geostatistics:
London,
KITANIDIS, P . K . , 1983, Statistical estimation of polynomial generalised covariance

functions and hydrologic applications: Water Resour. Res., 19, 909-921.
KITANIDIS, P . K . , 1985, Minimum-variance unbiased quadratic estimation of covariances of regionalised variables: M a t h . Geol., 17, 195-208.
KITANIDIS, P . K . , 1991, Orthonormal residuals i n geostatistics:
and parameter estimation: M a t h . Geol. 23, 741-758.
M o d e l criticism
KITANIDIS, P . K . , 1993, Generalised covariance functions i n estimation:

G e o l , 25, 525-540.
M A R C O T T E , D . , A N D DAVID, M . , 1985, The Bi-Gaussian approach:
method for recovery estimation: M a t h . Geol., 17, 625-644.
Math.
A simple
M A R D I A , K . V . , AND M A R S H A L L , R . J . , 1984, M a x i m u m likelihood estimation of

models for residual covariance i n spatial regression: Biometrika, 7 1 , 135-146.
M A R D I A , K . V . , AND M A R S H A L L , R . J . , 1985, M i n i m u m norm quadratic estimation of components of spatial covariances: M a t h . Geol., 17, 517-525.
M A T H E R O N , G . , 1969, Le Krigeage Universel: Les Cahiers d u Centre de M o r p h o l o gie Mathematique de Fontainebleau, Fascicule 1.
M A T H E R O N , G . 1971, L a theorie des variables regionalisees et ses applications: Les
Cahiers du Centre de Morphologie Mathematique de Fontainebleau, Fascicule 5,
211p.
M A T H E R O N , G . , 1973, The intrinsic random function: A d v . A p p l . P r o b . , 5, 438-468.
M A T H E R O N , G . , 1976, A simple substitute to conditional expectation: T h e Disjunctive K r i g i n g , in Guarascio, M . , et al. (Eds.), Advanced Geostatistics in the Mining
Industry: Reidel, Dordrecht, 221-236.
M O L E R , C . B . , 1967, Computer Solutions
H a l l , New Jersey.
of Linear Algebraic
Systems:
Prentice-
M O N T G O M E R Y , D . C , 1991, Design and analysis of experiments: Wiley, New York,

649p.
Bibliography
100
Nonconforming Finite Element Solution

for the Plate Bending problem with application to Visual Surface Reconstruction:
M O U L T O N , J . D . , A N D SPITERI, R . , 1993,
F i n a l project, C S 542a, U n i v . of B . C .
N E U M A N , S. P . , A N D J A C O B S E N , E . A . , 1984, Analysis of nonintrinsic spatial
variability by residual kriging w i t h application to regional groundwater levels: M a t h .
G e o l , 16, 499-521.
PRESS, W . H . , FLANNERY, B . P . , TEUKOLSKY, S . A . , AND V E T T E R L I N G
1989,
702p.
W.T.,
Numerical Recipes: The Art of Scientific Computing: C a m b . U n i v . Press,
R E N D U , J - M . M . , 1979, N o r m a l and lognormal estimation: M a t h . Geol., 1 1 , 407422.

R I P L E Y , B . D . , 1981,
Spatial Statistics: Wiley, New York.
STARKS, T . , A N D F A N G , J . , 1982a, The effect of drift on the experimental semivariogram: J . Int. Assoc. M a t h . Geol., 14, 309-319.
STARKS, T . , AND F A N G , J . , 1982b, O n the estimation of the generalised covariance
function: J . Int. Assoc. M a t h . Geol., 14, 57-64.
STEIN, M . L . , 1989, The loss of efficiency i n kriging prediction caused by misspecification of the covariance structure, in Armstrong, M . et al. (Eds.), Geostatistics
vol. 1: Kluwer, 273-282.
STEIN, M . L . , A N D H A N D C O C K , M . S., 1989, Some asymptotic properties of
kriging when the covariance function is misspecified: M a t h . Geol., 2 1 , 171-190.
SULLIVAN, J . , 1984, Conditional recovery estimation through P r o b a b i l i t y K r i g i n g , in
Verly, G . et al. (Eds.), Geostatistics for Natural Resources Characterisation: Reidel,
Dordrecht, 365-384.
V E R L Y , G . , 1983, T h e Multigaussian approach and its application to the estimation
of local reserves: M a t h . G e o l , 15, 259-286.
W E I S B E R G , S., 1985,
Applied Linear Regression: Wiley, New York, 324p.
Z I M M E R M A N , D . L . , AND CRESSIE, N . , 1991, M e a n squared prediction error i n the

spatial linear model w i t h estimated covariance parameters: A n n . Inst. Stat. M a t h . ,
43
Bibliography
101
Z I M M E R M A N , D . L . , A N D H A R V I L L E , D . A . , 1989, O n the unbiasedness of the
Papadakis estimator and other nonlinear estimators of treatment contrasts i n fieldplot experiments: Biometrika, 76, 253-259.

Spatial Estimation The Geostatistical Point of View PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Spatial Estimation The Geostatistical Point of View PDF

Uploaded by

Copyright:

Available Formats

SPATIAL ESTIMATION: T H E GEOSTATISTICAL POINT O F VIEW

B.Sc. Bristol University, United K i n g d o m , 1991

We accept this thesis as conforming

British Columbia, I agree that the Library shall make it

scholarly purposes may be granted by the head of

Derivation of Matheron's Ordinary K r i g i n g

(Ordinary) Lognormal K r i g i n g (LK)

O r i g i n a l derivation of Matheron's Universal K r i g i n g estimator, Z {x)

R i p l e y ' s form of the Universal K r i g i n g estimator

Singular Value Decomposition ( S V D )

Step 1: E s t i m a t i n g the trend

Step 2: Modelling the residuals, R(x)

T h e equivalency of the two forms of estimation

C a l c u l a t i n g o"g(x), the error variance of ZUK(X)

E s t i m a t i o n of the covariance function

Intrinsic R a n d o m Functions of Order p (IRF-p)

Other forms of unbiased covariance estimation

Robustness of the Universal Kriging estimator, Z (x.)

Step 1: T h e trend component functions

Step 2: Nonparametric estimation of the experimental (co)variogram

Step 3: F i t t i n g an appropriate parametric model, C ( x , y ; 6)

Step 4: F i n a l estimation using C ( x , y ; 0 ) and the sample data z

Reasons for choosing the biased form of (co)variogram estimation

Detailed description of (co)variogram estimation

Detecting the optimal trend order

Idealised (co)variogram under second order stationarity assumption

Illustration of how distribution of Z(x)

Comparison of semivariogram models

A comparison between L S and Cressie's weighted L S method

H o w weighted L S alters the sill of the spherical fit

Histograms and normal q-q plots for the A D and S D ^ residuals

Dimension-.methodinteraction plots for both the A D and S D residuals .

5.11 M e a n and median factor plots for the A D data set

5.12 N o r m a l q-q plots for the cross validation data

A N O V A tables for A D and S D ^ data sets

M e a n and standard deviations of A D data for the different m e t h o d / t r e n d

Roughly i n order of appearance:

the number of sample values

a general (usually unsampled) point i n !!ft

the sample points, i = 1 . . . n

the (n x 1) vector of ore grade random variables at the sample points, x i . . . x

the (n x 1) vector of ore grades measured at the sample points, x i . . . x

the actual estimate of z{x)

the expected value of Z{x) at point x

the expected value of Z(x) when it is assumed to be homogeneous

the (n x 1) vector of residual variables at the sample points, X i . . . x

C ( x , y) shortened notation for C o v ( Z ( x ) ,

notation for C o v ( x , y) when the covariance function is assumed to be

notation for C o v ( x , y) when the covariance function is assumed to be

the variance of Z(x), which depends on x

the variance of the error, Z ( x ) Z(x), which depends on x

the kriging estimator weights, i = 1 . . . n , which depend on x

the (n x 1) vector of kriging weights A (x)

the (n x 1) vector of Lagrange multipliers, fa,..., (f>

the trend component functions, I = 0 ... L

the ((L + 1) x 1) vector composed of the trend component functions

the (n x (L + 1)) m a t r i x of trend component functions evaluated at the sample

the degree of the polynomial trend

the (n x n) m a t r i x of covariances between the ore grade variables, Z(x;), at the

the generalised least squares estimator of the trend: m(x) = (3 f(x)

the Cholesky decomposition lower triangular m a t r i x of K such that: K

the uncorrelated sample data transformed from Z

the ( n x ( I + 1)) transformed m a t r i x of F such that: G =