Introduction
Regionalized Variables
Kriging
Data Integration
Conditional Simulation
Case Studies
Selected Readings
Geostatistics Glossary
STATISTICAL NOTATION
Statistical notation uses Roman or Greek letters in equations to represent similar
concepts, with the distinction being that:
Greek notation describes Populations: measures of a population are
called parameters
Roman notation describes Samples: measures of a sample are called
statistics
Now might be a good time to review the Greek alphabet. Following is a list of
Greek letters and their significance within the realm of statistics.
Name       Upper & Lower Case    Statistical Significance
alpha      Α, α
beta       Β, β
gamma      Γ, γ
delta      Δ, δ
epsilon    Ε, ε
zeta       Ζ, ζ
eta        Η, η
theta      Θ, θ
iota       Ι, ι
kappa      Κ, κ
lambda     Λ, λ
mu         Μ, μ                  Mean of a Population
nu         Ν, ν
xi         Ξ, ξ
omicron    Ο, ο
pi         Π, π
rho        Ρ, ρ                  Correlation Coefficient
sigma      Σ, σ                  Summation (upper case Σ)
MEASUREMENT SYSTEMS
Because the conclusions of a quantitative study are based in part on inferences
drawn from measurements, it is important to consider the nature of the
measurement systems from which data are collected. Measurements are
numerical values that reflect the amount or magnitude of some property. The
manner in which numerical values are assigned determines the measurement
scale, and thereby determines the type of data analysis (Davis, 1986).
There are four measurement scales, each more rigorously defined than its
predecessor, and thus containing more information. The first two are the nominal
and ordinal scales, in which we classify observations into exclusive categories.
The other two scales, interval and ratio, are the ones we normally think of as
“measurements,” because they involve determinations of the magnitude of an
observation (Davis, 1986).
Nominal Scale
This measurement classifies observations into mutually exclusive categories of
equal rank, such as “red,” “green,” or “blue.” Symbols like “A,” “B,” “C,” or
numbers are also often used. In geostatistics, we may wish to predict facies
occurrence, and may therefore code the facies as 1, 2 and 3, for sand, siltstone,
and shale, respectively. Using this scale, there is no connotation that 2 is “twice
as much” as 1, or that 3 is “greater than” 2.
Ordinal Scale
Observations are sometimes ranked hierarchically. A classic example taken from
geology is Mohs' scale of hardness, in which mineral rankings extend from one to
ten, with higher ranks signifying increased hardness. The step between
successive states is not equal in this scale. In the petroleum industry, kerogen
types are based on an ordinal scale, indicative of stages of organic diagenesis.
Interval Scale
This scale is so named because the width of successive intervals is constant.
The most commonly cited example of an interval scale is temperature. A change
from 10 to 20 degrees C is the same as the change from 110 to 120 degrees C.
This scale is commonly used for many measurements. An interval scale does not
have a natural zero, or a point where the magnitude is nonexistent. Thus, it is
possible to have negative values. Within the petroleum industry, reservoir
properties are measured along a continuum, but there are practical limits for the
measurements. (It would be hard to conceive of negative porosity, permeability,
or thickness, or of porosity greater than 100%.)
Ratio Scale
Ratios not only have equal increments between steps, but also have a zero point.
Ratio scales represent the highest forms of measurement. All types of
mathematical and statistical operations are performed with them. Many
geological measurements are based on a ratio scale, because they have units of
length, volume, mass, and so forth.
For most of our geostatistical studies, we will be primarily concerned with the
analysis of interval and ratio data. Typically, no distinction is made between the
two, and they may occur intermixed in the same problem. For example, in trend
surface analysis, the independent variable may be measured on a ratio scale,
whereas the geographical coordinates are on an interval scale.
INTRODUCTION
Statistical analysis is built around the concepts of “populations” and “samples.”
A population consists of a well-defined set of elements (either finite or infinite).
More specifically, a population is the entire collection of those elements.
Commonly, such elements are measurements or observations made on items of
a specific type (porosity or permeability, for example). A finite population might
consist of all the wells drilled in the Gulf of Mexico in 1999, whereas, the infinite
population might be all wells drilled in the Gulf of Mexico, past, present, and
future.
A sample is a subset of elements drawn from the population (Davis, 1986).
Samples are studied in order to make inferences about the population itself.
Random Sampling
Samples should be acquired from the population in a random manner. Random
sampling is defined by two properties.
First, a random sample must be unbiased, so that each item in the population
has the same chance of being chosen as any other item.
Second, the random sample must be independent, so that selecting one
item from the population has no influence on the selection of other
items in the population.
Random sampling produces an unbiased and independent result, so that, as the
sample size increases, we have a better chance of understanding the true nature
(distribution) of the population.
One way to determine whether random samples are being drawn is to analyze
sampling combinations. The number of different samples of n measurements that
can be drawn from a population of N elements is given by the equation:

CNn = N! / [n!(N − n)!]

Where:
CNn = the number of combinations of samples
N = the number of elements in the population
n = the number of elements in the sample
If the sampling is conducted in a manner such that each of the CNn samples has
an equal chance of being selected, the sampling program is said to be random
and the result is a random sample (Mendenhall, 1971).
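As an illustration, the number of possible sample combinations can be computed directly with a few lines of Python (a minimal sketch; the population and sample sizes used in the example are assumed for illustration only):

    from math import comb, factorial

    def n_combinations(N, n):
        """Number of distinct samples of size n that can be drawn from a population of N elements."""
        return factorial(N) // (factorial(n) * factorial(N - n))

    # Example: how many distinct samples of 5 wells can be drawn from a population of 20 wells?
    print(n_combinations(20, 5))   # 15504
    print(comb(20, 5))             # same result, using the built-in math.comb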
Sampling Methods
The method of sampling affects our ability to draw inferences about our data
(such as estimation of values at unsampled locations) because we must know the
probability of an observation in order to arrive at a statistical inference.
Replacement
The issue of replacement plays an important role in our sampling strategy. For
example, if we were to draw samples of cards from a population consisting of a
deck, we could either:
Draw a card from the deck, and add its value to our hand, then draw
another card
Or
Draw a card from the deck, note its value, and put it back in the deck, then
draw a card from the deck again.
In the first case, we sample without replacement; in the second case we sample
with replacement. Sampling without replacement prevents us from sampling that
value again, while sampling with replacement allows us the chance to pick that
same value again in our sample.
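The two card-drawing schemes can be sketched with Python's standard random module (a minimal illustration; the numeric representation of the deck is an assumption made for the example):

    import random

    deck = list(range(1, 53))   # a 52-card deck, with cards identified simply as 1 through 52

    # Sampling WITHOUT replacement: a card, once drawn, cannot be drawn again
    hand_without = random.sample(deck, 5)

    # Sampling WITH replacement: each card is returned before the next draw,
    # so the same card may appear more than once in the sample
    hand_with = [random.choice(deck) for _ in range(5)]

    print(hand_without)
    print(hand_with)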
INTRODUCTION
In statistical parlance, a trial is an experiment that produces an outcome which
consists of either a success or a failure. An event is a collection of possible
outcomes of a trial. Probability is a measure of the likelihood that an event will
occur, or a measure of that event's relative frequency. The following discussion
introduces events and their relation to one another, then provides an overview on
probability.
EVENTS
An event is a collection of possible outcomes, and this collection may contain
zero or more outcomes, depending on how many trials are conducted. Events
can be classified by their relationship to one another:
Independent Events
Events are classified as Independent if the occurrence of event A has no bearing
on the occurrence of event B, and vice versa.
Dependent Events
Events are classified as Dependent if the occurrence of event A influences the
occurrence of event B.
PROBABILITY
Probability is a measure of the likelihood that an event will occur, or a measure of
that event's relative frequency. The measure of probability is scaled from 0 to 1,
where:
0 represents no chance of occurrence, and
1 represents certainty that the event will occur.
Probability is just one tool that enables the statistician to use information from
samples to make inferences or describe the population from which the samples
were obtained (Mendenhall, 1971). In this discussion, we will review discrete and
conditional probabilities.
Discrete Probability
All of us have an intuitive concept of probability. For example, if asked to guess
whether it will rain tomorrow, most of us would reply with some confidence that
rain is either likely or unlikely. Another way of expressing the estimate is to use a
numerical scale, such as a percentage scale. Thus, you might say that there is a
30% chance of rain tomorrow, and imply that there is a 70% chance it will not
rain.
The chance of rain is an example of discrete probability; it either will or it will not
rain. The probability distribution for a discrete random variable is a formula, table,
or graph providing the probability associated with each value of the random
variable (Mendenhall, 1971; Davis, 1986). For a discrete distribution, probability
can be defined by the following:
P(E) = (number of outcomes corresponding to event E) / (total number of possible outcomes)
Where:
P = the probability of a particular outcome, and
E = the event
Consider the following classic example of discrete probability, used almost
universally in statistics texts.
Sample Point    Coin 1    Coin 2    P(Ei)    y
E1              H         H         ¼        2
E2              H         T         ¼        1
E3              T         H         ¼        1
E4              T         T         ¼        0
Let y equal the number of heads observed. We assign the value y = 2 to sample
point E1, y = 1 to sample point E2, etc. The probability of each value of y may be
calculated by adding the probabilities of the sample points in the numerical event.
The numerical event y = 0 contains one sample point, E4; y =1 contains two
sample points, E2 and E3; while
y =2 contains one sample point, E1.
Thus, for this experiment there is a 25% chance of observing two heads from a
single toss of the two coins. The histogram contains three classes for the random
variable y, corresponding to y = 0, y = 1, and y = 2. Because p(0) = ¼, the
theoretical relative frequency for y = 0 is ¼; p(1) = ½, hence the theoretical
relative probability for y = 1 is ½, etc. The histogram is shown in Figure 1
(Probability Histogram for p(y) (modified from Davis, 1986)).
Figure 1
If you were to draw a sample from this population, by throwing two balanced
coins, say 100 times, and recorded the number of heads observed each time to
construct a histogram for the 100 measurements, your histogram would appear
very similar to that of Figure 1. If you repeated the experiment with 1000 coin
tosses, the similarity would be even more pronounced.
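The coin-toss experiment can also be reproduced numerically. The sketch below (illustrative only) simulates 100 and 1,000 throws of two balanced coins and reports the relative frequency of each value of y, which should approach the theoretical values p(0) = ¼, p(1) = ½, and p(2) = ¼:

    import random
    from collections import Counter

    def relative_frequencies(n_throws):
        """Observed relative frequency of y (number of heads) for n_throws of two balanced coins."""
        counts = Counter(random.randint(0, 1) + random.randint(0, 1) for _ in range(n_throws))
        return {y: counts[y] / n_throws for y in (0, 1, 2)}

    print("theoretical :", {0: 0.25, 1: 0.50, 2: 0.25})
    print("100 throws  :", relative_frequencies(100))
    print("1000 throws :", relative_frequencies(1000))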
Conditional Probability
The concept of conditional probability is key to oil and gas exploration, because
once a well is drilled, it makes more information available, and allows us to revise
our estimates of the probability of further outcomes or events. Two events are
often related in such a way that the probability of occurrence of one event
depends upon whether the other event has or has not occurred. Such a
dependence on a prior event describes the concept of Conditional Probability: the
chance that a particular event will occur depends on whether another event
occurred previously.
For example, suppose an experiment consists of observing weather on a specific
day. Let event A = 'snow' and B = 'temperature below freezing'. Obviously,
events A and B are related, but the probability of snow, P(A), is not the same as
the probability of snow given the prior information that the temperature is below
freezing. The probability of snow, P(A), is the fraction of the entire population of
observations which result in snow. Now examine the sub-population of
observations resulting in B, temperature below freezing, and the fraction of these
resulting in snow, A. This fraction, called the conditional probability of A given B,
may equal P(A), but we would expect the chance of snow, given freezing
temperatures, to be larger.
In statistical notation, the conditional probability that event A will occur given that
event B has occurred already is written as:
P(A|B)
where the vertical bar in the parentheses means “given” and events appearing to
the right of the bar have occurred (Mendenhall, 1971).
Thus, we define the conditional probability of A given B as:

P(A|B) = P(AB) / P(B)

Bayes' Theorem expands this definition in terms of quantities that can be estimated separately:

P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|A') P(A')]

Where:
P(A|B) = the probability that event A will occur, given that event B has
already occurred
P(B|A) = the probability that event B will occur, given that event A has
already occurred
P(A) = the probability that event A will occur
P(B|A') = the probability that event B will occur, given that event A has not
occurred
P(A') = the probability that event A will not occur
A practical geostatistical application using Bayes' Theorem is described in an
article by Doyen, et al. (1994) entitled Bayesian Sequential Indicator Simulation
of Channel Sands in the Oseberg Field, Norwegian North Sea.
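The definition of conditional probability can be illustrated with a small numerical sketch. The counts below are hypothetical and are not taken from the source; they simply show how P(A|B) is estimated from relative frequencies:

    # Hypothetical record of 1000 daily weather observations
    n_total = 1000     # all observed days
    n_A     = 150      # days with snow (event A)
    n_B     = 200      # days with temperature below freezing (event B)
    n_AB    = 120      # days with both snow and freezing temperature

    P_A         = n_A / n_total    # unconditional probability of snow: 0.15
    P_B         = n_B / n_total    # probability of freezing temperature: 0.20
    P_A_given_B = n_AB / n_B       # P(A|B) = P(AB) / P(B) = 0.60

    print(P_A, P_A_given_B)        # the chance of snow is much larger once we know it is freezing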
INTRODUCTION
Geoscientists are often tasked with estimating the value of a reservoir property at
a location where that property has not been previously measured. The estimation
procedure must rely upon a model describing how the phenomenon behaves at
unsampled locations. Without a model, there is only sample data, and no
inference can be made about the values at locations that were not sampled. The
underlying model and its behavior is one of the essential elements of the
geostatistical framework.
Random variables and their probability distributions form the foundation of the
geostatistical method. Unlike many other estimation methods (such as linear
regression, inverse distance, or least squares) that do not state the nature of their
model, geostatistical estimation methods clearly identify the basis of the models
used (Isaaks and Srivastava, 1989). In this section, we define the random
variable and briefly review the essential concepts of important probability
distributions. The random variable is further explained later, in Spatial Correlation
Analysis and Modeling.
Class    Frequency    Frequency (%)    Cumulative    Cumulative (%)
10-11    13           13               62            62
11-12    17           17               79            79
12-13    13           13               92            92
13-14    4            4                94            94
>14      4            4                100           100
(Only the upper classes of this frequency table were recovered; its caption and lower classes are missing.)
(Sometimes, the histograms are converted to continuous curves by running a line
from the midpoint of each bar in the histogram. This process may be convenient
for comparing continuous and discrete random variables, but may tend to
confuse the presentation.)
Figure 3 (modified from Davis, 1986) shows the probabilities associated with all possible
outcomes of the five-well drilling program.
Figure 3
Negative Binomial Probability Distribution
Other discrete distributions can be developed for experimental situations with
different basic assumptions. We can develop a Negative Binomial Probability
Distribution to find the probability that x dry holes will be drilled before r
discoveries are made.
Problem: Drill as many holes as needed to discover two new fields in a virgin
basin.
Assumption: The same conditions that govern the binomial distribution are
assumed, except that the number of “trials” is not fixed.
The probability distribution governing such an experiment is the negative
binomial. Thus we can investigate the probability that it will require, 2, 3, 4, …, up
to n exploratory wells before two discoveries are made.
The expanded form of the negative binomial equation is:

P = [(r + x − 1)! / ((r − 1)! x!)] (1 − p)^x p^r

Where:
P = the probability that exactly x dry holes will be drilled before the r-th discovery
r = the number of discovery wells
x = the number of dry holes
p = regional success ratio
If the regional success ratio is 10%, the probability that a two-hole exploration
program will meet the company's goal of two discoveries can be calculated:
r = 2
x = 0
p = 0.10
P = (0.10)² = 0.01
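The calculation above, and the cumulative form discussed next, can be checked with a short Python sketch (function and variable names are illustrative, not from the source):

    from math import comb

    def neg_binomial(x, r, p):
        """Probability of exactly x dry holes before the r-th discovery, given success ratio p."""
        return comb(r + x - 1, x) * (1 - p) ** x * p ** r

    p = 0.10
    print(neg_binomial(0, 2, p))    # two-hole program with two discoveries: 0.01

    # Cumulative probability that two discoveries are made in (x + r) or fewer holes
    cumulative = 0.0
    for x in range(0, 9):
        cumulative += neg_binomial(x, 2, p)
        print(f"{x + 2:2d} holes or fewer: {cumulative:.3f}")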
The calculated probabilities are low because they relate to the likelihood of
obtaining two successes and exactly x dry holes (in this case: x = zero). It may be
more appropriate to consider the probability distribution that more than x dry
holes must be drilled before the goal of r discoveries is achieved. We do this by
first calculating the cumulative form of the negative binomial. This gives the
probability that the goal of two successes will be achieved in (x + r) or fewer
holes, as shown in Figure 4 (Discrete distribution giving the cumulative
probability that two discoveries will be made by or before a specified hole is
drilled, when the success ratio is 10% (modified from Davis, 1986)).
Figure 4
Each of these probabilities is then subtracted from 1.0 to yield the desired
probability distribution illustrated in Figure 5 (Discrete distribution giving the
probability that more than a specified number of holes must be drilled to make
two discoveries, when the success ratio is 10% (modified from Davis, 1986)).
Figure 5
Figure 6
Frequency Distributions Of Continuous Random Variables
Frequency distributions of continuous random variables follow a theoretical
probability distribution or probability density function that can be represented by a
continuous curve. These functions can take on a variety of shapes. Rather than
displaying the functions as a curve, the distributions may be displayed as a
histogram, as shown in Figures 7a, 7b, 7c, and 7d.
Figure 7a
Figure 7b
Figure 7c
Figure 7d
One of the most important of these functions is the normal probability density function, which is defined by the expression:

Z = [1 / (σ√(2π))] e^(−(Y − μ)² / (2σ²))
Where
Z is the height of the ordinate (y-axis) of the curve and represents the
density of the function. It is the dependent variable in the expression, being
a function of the variable Y.
There are two constants in the equation: π, well known to be approximately
3.14159, making 1/√(2π) equal to 0.39894, and e, the base of the Naperian or
natural logarithms, whose value is approximately 2.71828.
There are two parameters in the normal probability density function. These
are the parametric mean, μ, and the standard deviation, σ, which determine
the location and shape of the distribution (these parameters are discussed
under Summary Statistics). Thus, there is not just one normal distribution,
rather there is an infinity of such curves, because the parameters can
assume an infinity of values (Sokol and Rohlf, 1969).
Figure 8a
Figure 8b
The normal curve is centered on its mean (μ), with a point of inflection on either side.
The distance between μ and one of the points of inflection represents the standard deviation (σ),
sometimes referred to as the mean variation. The square of the mean variation is
the variance.
In a normal frequency distribution, the standard deviation may be used to
characterize the sample distribution under the bell curve. According to Sokol and
Rohlf (1969): 68.3% of all sample values fall within −1σ to +1σ of the mean,
while 95.4% of the sample values fall within −2σ and +2σ of the mean, and
99.7% of the values are contained within −3σ and +3σ of the mean. This bears
repeating, in a different format this time:
±1σ (1 standard deviation) contains 68.3% of the data
±2σ (2 standard deviations) contains 95.46% of the data
±3σ (3 standard deviations) contains 99.73% of the data
How are the percentages calculated? The direct calculation of any portion of the
area under the normal curve requires an integration of the function shown as the
above expression. Fortunately, for those who have forgotten their calculus, the
integration has been recorded in tabular form (Sokol and Rohlf, 1969). These tables
can be found in most standard statistical books, for example, see Statistical
Tables and Formulas, Table 1 (Hald, 1952).
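For readers who prefer computation to tables, the same areas can be obtained from the error function in Python's standard library (a minimal sketch; no statistics package is assumed):

    from math import erf, sqrt

    def area_within(k):
        """Fraction of a normal distribution lying within +/- k standard deviations of the mean."""
        return erf(k / sqrt(2.0))

    for k in (1, 2, 3):
        print(f"+/-{k} sigma: {area_within(k):.4f}")   # approximately 0.6827, 0.9545, 0.9973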
Application of the Normal Distribution
The normal frequency distribution is the most widely used distribution in statistics.
There are three important applications of the density function (Sokol and Rohlf,
1969).
1. Sometimes we need to know whether a given sample is normally
distributed before we can apply certain tests. To test whether a sample
comes from a normal distribution we must calculate the expected
frequencies for a normal curve of the same mean and standard deviation,
then compare the two curves.
2. Knowing whether a sample comes from a normal distribution may confirm or
reject underlying hypotheses about the nature of the phenomenon studied.
3. Finally, if we assume a normal distribution, we may make predictions
based upon this assumption. For the geosciences, this means a better and
unbiased estimation of reservoir parameters between the well data.
Normal Approximation to the Binomial Distribution
Recall that approximately 95% of the measurements associated with a normal
distribution lie within two standard deviations of the mean and almost all lie within
three standard deviations. The binomial probability distribution would nearly be
symmetrical if the distribution were able to spread out a distance equal to two
standard deviations on either side of the mean, which in fact is the case.
Therefore, to determine the normal approximation we calculate the following
when the outcome of a trial (n) results in a 0 or 1 success with probabilities q and
p, respectively:
μ = np
σ = √(npq)
If the interval μ ± 2σ lies within the binomial bounds, 0 and n, the approximation
will be reasonably good (Mendenhall, 1971).
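The rule of thumb can be checked directly. The sketch below (the values of n and p are assumed for illustration) computes μ and σ and tests whether μ ± 2σ lies within the binomial bounds:

    from math import sqrt

    def normal_approximation_ok(n, p):
        """True if the interval mu +/- 2*sigma lies within the binomial bounds 0 and n."""
        q = 1.0 - p
        mu = n * p
        sigma = sqrt(n * p * q)
        return (mu - 2.0 * sigma) >= 0.0 and (mu + 2.0 * sigma) <= n

    print(normal_approximation_ok(100, 0.10))   # True: the approximation should be reasonable
    print(normal_approximation_ok(10, 0.02))    # False: too few trials for so small a success ratio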
Lognormal Distribution
Many variables in the geosciences do not follow a normal distribution, but are
highly skewed, such as the distribution in Figure 7b, and as shown below in
Figure 9 (Schematic histogram of sizes and numbers of oil field discoveries, in
hundred-thousand-barrel equivalents).
Figure 9
The histogram illustrates that most fields are small, with decreasing numbers of
larger fields, and a few rare giants that exceed all others in volume. If the
histograms of Figure 7b and Figure 9 are converted to logarithmic forms (that is,
we use Yi = log Xi instead of Yi =Xi for each observation), the distribution
becomes nearly normal. Such variables are said to be lognormal.
Transformation of Lognormal data to Normal
The data can be converted into logarithmic form by a process known as
transformation. Transforming the data to a standardized normal distribution (i.e.,
zero mean and unit variance) simplifies data handling and eases comparison to
different data sets.
Data which display a lognormal distribution, for example, can be transformed to
resemble a normal distribution by applying the formula ln(z) to each z variate in
the data set prior to conducting statistical analysis. The success of the
transformation can be judged by observing its frequency distribution before and
after transformation. The distribution of the transformed data should be markedly
less skewed than the lognormal data. The transformed values may be back-
transformed prior to reporting results.
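A minimal sketch of the transformation, assuming numpy is available and using illustrative, strongly skewed values (not data from the source):

    import numpy as np

    values = np.array([12.0, 3.5, 150.0, 40.0, 7.2, 620.0, 25.0])   # illustrative, strongly skewed

    log_values = np.log(values)          # transform: z -> ln(z)
    mean_log   = log_values.mean()       # mean in log space
    var_log    = log_values.var(ddof=1)  # sample variance in log space

    geometric_mean = np.exp(mean_log)    # back-transform of the log-space mean
    print(mean_log, var_log, geometric_mean)

The back-transformed mean of the logarithms is the geometric mean discussed below.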
Because of its frequent use in geology, the lognormal distribution is extremely
important. If we look at the transformed variable Yi rather than Xi itself, the
properties of the lognormal distribution can be explained simply by reference to
the normal distribution.
In terms of the original variable Xi, the antilog of the mean of Y corresponds to the
geometric mean, which is the nth root of the product of the Xi:

GM = (X1 · X2 · … · Xn)^(1/n) = (∏ Xi)^(1/n)

Where:
GM is the geometric mean
∏ is analogous to Σ, except that all the elements in the series are multiplied
rather than added together (Davis, 1986).
In practice, it is simpler to convert the measurements into logarithms and
compute the mean and variance of the logarithms. If you want the geometric mean and
variance, compute the antilogs of Ȳ and s²Y. If you work with the data in the transformed
state, all of the statistical procedures that are appropriate for ordinary variables
are applicable to the log-transformed variables (Davis, 1986).
The characteristics of the lognormal distribution are discussed in a monograph by
Aitchison and Brown (1969) and in the geological context by Koch and Link
(1981).
Random Error
Random errors for normal distributions are additive, which means that errors of
opposite sign tend to cancel one another, and the final measurement is near the
true value. For lognormal distributions, random errors are multiplicative rather than
additive, and their combined effect produces a result near the geometric mean.
INTRODUCTION
There are several ways in which to summarize a univariate (single attribute)
distribution. Quite often we will simply compute the mean and the variance, or
plot its histogram. However, these statistics are very sensitive to extreme values
(outliers) and do not provide any spatial information, which is the heart of a
geostatistical study. In this section, we will describe a number of different
methods that can be used to analyse data for a single variable.
SUMMARY STATISTICS
The summary statistics represented by a histogram can be grouped into three
categories:
measures of location,
measures of spread, and
measures of shape.
Measures of Location
Measures of location provide information about where the various parts of the
data distribution lie, and are represented by the following:
Minimum: Smallest value.
Maximum: Largest value.
Median: Midpoint of all observed data values, when arranged in
ascending order. Half the values are above the median, and half are
below. This statistic represents the 50th percentile of the cumulative
frequency histogram and is not generally affected by an occasional
erratic data point.
Mode: The most frequently occurring value in the data set. This value falls
within the tallest bar on the histogram.
Quartiles: In the same way that the median splits the data into halves, the
quartiles split the data in quarters. Quartiles represent the 25th, 50th
and 75th percentiles on the cumulative frequency histogram.
Mean: The arithmetic average of all data values. (This statistic is quite
sensitive to extreme high or low values. A single erratic value or outlier
can significantly bias the mean.) We use the following formula to
determine the mean of a Population:
Mean = μ = (Σ Zi) / N
where:
μ = population mean
N = number of observations (population size)
Σ Zi = the sum of the individual observations Zi
Measures of Spread
Measures of spread describe the variability of the data values, and are
represented by the following:
Variance: Average squared difference of the observed values from the
mean. Because the variance involves squared differences, this statistic
is very sensitive to abnormally high/low values.
Variance = σ² = Σ (Zi − μ)² / N
Kachigan (1986) notes that the above formula is only appropriate for
defining variance of a population of observations. If this same formula was
applied to a sample for the purpose of estimating the variance of the
parent population from which the sample was drawn, then the formula
above will tend to underestimate the population variance. This
underestimation occurs as repeated samples are drawn from the
population and the variance is calculated from each, using the sample
mean (x̄), rather than the population mean (μ). The resulting average of
these variances would be lower than the true value of the population
variance (assuming we were able to measure every single member of the
population).
We can avoid this bias by taking the sum of squared deviations and
dividing that sum by the number of observations – less one. Thus, the
sample estimate of population variance is obtained using the following
formula:
Variance = s² = Σ (Zi − x̄)² / (n − 1)
Standard Deviation: The square root of the variance. This measure is used to show the extent to which the data is spread
around the vicinity of the mean, such that a small value of standard
deviation would indicate that the data was clustered near to the mean. For
example, if we had a mean equal to 10, and a standard deviation of 1.3,
then we could predict that most of our data would fall somewhere between
(10 - 1.3) and (10 + 1.3), or between 8.7 to 11.3. The standard deviation is
often used instead of the variance, because the units are the same as the
units of the attribute being described.
Interquartile Range: Difference between the upper (75th percentile) and
the lower (25th percentile) quartile. Because this measure does not use
the mean as the center of distribution, it is less sensitive to abnormally
high/low values.
Figure 1a and 1b illustrate histograms of porosity with a mean of about 15%, but
different variances.
Figure 1a
Figure 1b
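These location and spread measures are straightforward to compute with Python's standard statistics module; the porosity values below are illustrative only:

    import statistics

    porosity = [12.1, 14.3, 15.0, 15.2, 13.8, 16.4, 15.1, 14.9, 17.0, 15.6]   # illustrative values (%)

    mean   = statistics.mean(porosity)
    median = statistics.median(porosity)
    pvar   = statistics.pvariance(porosity)   # population variance: divide by N
    svar   = statistics.variance(porosity)    # sample variance: divide by n - 1
    stdev  = statistics.stdev(porosity)       # sample standard deviation

    q1, q2, q3 = statistics.quantiles(porosity, n=4)   # quartiles (25th, 50th, 75th percentiles)
    iqr = q3 - q1                                      # interquartile range

    print(mean, median, pvar, svar, stdev, iqr)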
Measures of Shape
Measures of shape describe the appearance of the histogram and are
represented by the following:
Coefficient of Skewness: Averaged cubed difference between the data
values and the mean, divided by the cube of the standard
deviation. This measure is very sensitive to abnormally high/low
values:
CS = [(1/n) Σ (Zi − μ)³] / σ³
where:
μ is the mean
σ is the standard deviation
n is the number of data values
The coefficient of skewness allows us to quantify the symmetry of the data
distribution, and tells us when a few exceptional values (possibly outliers?)
exert a disproportionate effect upon the mean.
positive: long tail of high values (median < mean)
negative: long tail of low values (median > mean)
zero: a symmetrical distribution
Figure 2a, 2b, and 2c
Figure 2a
Figure 2b
Figure 2c
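A direct computation of the coefficient of skewness, written as a minimal sketch (the data values are illustrative):

    def coefficient_of_skewness(values):
        """CS = (1/n) * sum((z - mean)^3) / stdev^3, using the population standard deviation."""
        n = len(values)
        mean = sum(values) / n
        stdev = (sum((z - mean) ** 2 for z in values) / n) ** 0.5
        return sum((z - mean) ** 3 for z in values) / n / stdev ** 3

    print(coefficient_of_skewness([5, 6, 6, 7, 7, 7, 8, 8, 9, 25]))    # positive: long tail of high values
    print(coefficient_of_skewness([5, 6, 7, 7, 8, 8, 8, 9, 9, 9]))     # negative: long tail of low values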
Advantages
Easy to calculate.
Provides information in a very condensed form.
Can be used as parameters of a distribution model (e.g., normal
distribution defined by sample mean and variance).
Limitations
Summary statistics are too condensed, and do not carry enough
information about the shape of the distribution.
Certain statistics are sensitive to abnormally high/low values that properly
belong to the data set (e.g., the mean, the variance, and CS).
Offers only a limited description, especially if our real interest is in a
multivariate data set (attributes are correlated).
INTRODUCTION
Methods for bivariate description not only provide a means to describe the
relationship between two variables, but are also the basis for tools used to
analyze the spatial content of a random function (to be described in the Spatial
Correlation and Modeling Analysis section). The bivariate summary methods
described in this section only measure the linear relationship between the
variables - not their spatial features.
SCATTERPLOTS
The most common bivariate plot is the Scatterplot, Figure 1 (Scatterplot of
Porosity (dependent variable) versus Acoustic Impedance (independent
variable)).
Figure 1
This plot follows a common convention, in which the dependent variable (e.g.,
porosity) is plotted on the Y-axis (ordinate) and the independent variable (e.g.,
acoustic impedance) is plotted on the X-axis (abscissa). This type of plot serves
several purposes:
detects a linear relationship,
detects a positive or inverse relationship,
identifies potential outliers,
provides an overall data quality control check.
This plot displays an inverse relationship between porosity and acoustic
impedance, that is, as porosity increases, acoustic impedance decreases. This
display should be generated before calculating bivariate summary statistics, like
the covariance or correlation coefficient, because many factors affect these
statistical measures. Thus, a high or low value has no real meaning until verified
visually.
A common geostatistical application of the scatterplot is the h-scatterplot. (In
geostatistics, h commonly refers to the lag distance between sample points.)
These plots are used to show how continuous the data values are over a certain
distance in a particular direction. If the data values at locations separated by h
are identical, they will fall on a line x = y, a 45-degree line of perfect correlation.
As the data becomes less and less similar, the cloud of points on the h-
Scatterplot becomes fatter and more diffuse. A later section will present more
detail on the h-scatterplot.
COVARIANCE
Covariance is a statistic that measures the correlation between all points of two
variables (e.g., porosity and acoustic impedance). This statistic is a very
important tool used in Geostatistics to measure spatial correlation or dissimilarity
between variables, and forms the basis for the correlogram and variogram
(detailed later).
The magnitude of the covariance statistic is dependent upon the magnitude of the
two variables. For example, if the Xi values are multiplied by the factor k, a scalar,
then the covariance increases by a factor of k. If both variables are multiplied by
k, then the covariance increases by k². This is illustrated in the table below.
VARIABLES          COVARIANCE
X and Y            3035.63
X*10 and Y*10      303563
Figure 2
The numerator for the correlation coefficient is the covariance. This value is
divided by the product of the standard deviations for variables X and Y. This
normalizes the covariance, thus removing the impact of the magnitude of the data
values. Like the covariance, outliers adversely affect the correlation coefficient.
The Correlation Coefficient formula (for a population) is:

ρx,y = [ Σ (Xi − μx)(Yi − μy) / n ] / (σx σy)

where:
Xi is the X variable
Yi is the Y variable
μx is the mean of X
μy is the mean of Y
σx is the standard deviation of X
σy is the standard deviation of Y
n is the number of X and Y data pairs
As with other statistical formulas, Greek notation (ρ) is used to signify the measure of a
population, while Roman notation (r) is used for samples.
Rho Squared
The square of the correlation coefficient, ρ² (also referred to as r²), is a measure of
the variance accounted for in a linear relation. This measure tells us about the
extent to which two variables covary. That is, it tells us how much of the variance
seen in one variable can be predicted by the variance found in the other variable.
Thus, a value of ρ = −0.83 between porosity and acoustic impedance tells us that
as porosity increases in value, acoustic impedance decreases, which has a real physical
meaning. However, only about 70% (actually, (−0.83)², or 68.89%) of the
variability in porosity is explained by its relationship with acoustic impedance.
In keeping with statistical notation, the Greek symbol ρ² is used to denote the
squared correlation coefficient of a population, while the algebraic equivalent r² is used to
refer to that of a sample.
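The covariance, the correlation coefficient, and ρ² can be computed with numpy; the porosity and acoustic impedance values below are illustrative and simply mimic an inverse relationship:

    import numpy as np

    porosity  = np.array([ 6.0,  7.5,  8.2,  9.0, 10.1, 11.4, 12.0, 13.2])   # illustrative %
    impedance = np.array([ 9.8,  9.1,  9.3,  8.6,  8.2,  7.9,  7.1,  6.8])   # illustrative scaled AI

    cov = np.cov(porosity, impedance, bias=True)[0, 1]    # population covariance
    rho = np.corrcoef(porosity, impedance)[0, 1]          # correlation coefficient (negative here)
    print(cov, rho, rho ** 2)

    # Rescaling a variable changes the covariance but not the correlation coefficient
    print(np.cov(porosity * 10, impedance, bias=True)[0, 1])   # covariance grows by a factor of 10
    print(np.corrcoef(porosity * 10, impedance)[0, 1])         # rho is unchanged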
Linear Regression
Linear regression is another method we use to indicate whether a linear
relationship exists between two variables. This is a useful tool, because once we
establish a linear relationship, we may later be able to interpolate values between
points, extrapolate values beyond the data points, detect trends, and detect
points that deviate away from the trend.
Figure 3 (Scatterplot of inverse linear relationship between porosity and acoustic
impedance, with a correlation coefficient of -0.83), shows a simple display of
regression.
Figure 3
When two variables have a high covariance (strong correlation), we can predict a
linear relationship between the two. A regression line drawn through the points
of the scatterplot helps us to recognize the relationship between the variables. A
positive slope (from lower left to upper right) indicates a positive or direct
relationship between variables. A negative slope (from upper left to lower right)
indicates a negative or inverse relationship. In the example illustrated in the
above figure, the porosity clearly tends to decrease as acoustic impedance
increases.
The regression equation has the following general form:
Y = a + bXi
where:
Y is the dependent variable, or the variable to be estimated (e.g., porosity)
Xi is the independent variable, or the estimator (e.g., velocity)
b is the slope, defined as b = ρ (σy / σx), and
ρ is the correlation coefficient between X and Y
σx is the standard deviation of X
σy is the standard deviation of Y
a is a constant, which defines the ordinate (Y-axis) intercept
and:
a = ȳ − b x̄
x̄ is the mean of X
ȳ is the mean of Y
With this equation, we can plot a regression line that will cross the Y-axis at the
point “a”, and will have a slope equal to “b”.
Linear equations can include polynomials of any degree, and may include
combinations of logarithmic, exponential or any other non-linear variables.
The terms in the equation for which coefficients are computed are independent
terms, and can be simple (a single variable) or compound (several variables
multiplied together). It is also common to use cross terms (the interaction
between X and Y), or use power terms.
Z = a + bX +cY: uses X and Y as predictors and a constant
Z = a + bX + cY + dXY: adds the cross term
Z = a + bX + cY + dXY + eX² + fY²: adds the power terms
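The slope and intercept defined above can be computed directly; the sketch below reuses illustrative porosity and acoustic impedance values and checks the result against an ordinary least-squares fit:

    import numpy as np

    x = np.array([9.8, 9.1, 9.3, 8.6, 8.2, 7.9, 7.1, 6.8])       # acoustic impedance (illustrative)
    y = np.array([6.0, 7.5, 8.2, 9.0, 10.1, 11.4, 12.0, 13.2])   # porosity (illustrative)

    rho = np.corrcoef(x, y)[0, 1]
    b = rho * y.std() / x.std()      # slope: b = rho * (sigma_y / sigma_x)
    a = y.mean() - b * x.mean()      # intercept: a = mean(Y) - b * mean(X)
    print(a, b)

    print(np.polyfit(x, y, 1))       # least-squares fit returns [slope, intercept], matching b and a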
Advantages
Easy to calculate.
Provides information in a very condensed form.
Can be used to estimate one variable from another variable or from
multiple variables.
Limitations
Summary statistics sometimes can be too condensed, and do not carry
enough information about the shape of the distribution.
Certain statistics are sensitive to abnormally high/low values that properly
belong to the data set (e.g., covariance, correlation coefficient).
Outliers can highly bias a regression prediction equation.
No spatial information.
EDA PROCESS
Note that there is no one set of prescribed steps in EDA. Often, the process will
include a number of the following tasks, depending on the amount and type of
data involved:
data preprocessing
univariate and multivariate statistical analysis
identification and probable removal of outliers
identification of sub-populations
data posting
quick maps
sampling of seismic attributes at well locations
At the very least, you should plot the distribution of attribute values within your
data set. Look for anomalies in your data, and then look for possible explanations
for those anomalies. By employing classical statistical methods to analyze your
data, you will not only gain a clearer understanding of your data, but will also
discover possible sources of errors and outliers.
Geoscientists tasked with making predictions about the reservoir will always face
these limitations:
Most prospects provide only a very few direct “hard” observations (well
data)
“Soft” data (seismic) is only indirectly related to the “hard” well data
A scarcity of observations can often lead to a higher degree of uncertainty
These problems can be compounded when errors in the data are overlooked.
This is especially troublesome with large data sets, and when computers are
involved; we simply become detached from our data. A thorough EDA will foster
an intimate knowledge of the data to help you flag bogus results. Always take the
time to explore your data.
INTRODUCTION
All interpolation algorithms require a standard for selecting data, referred to as
the search neighborhood. The parameters that define a search neighborhood
include:
Search radius
Neighborhood shape
Number of sectors ( 4 or 8 are common)
Number of data points per sector
Azimuth of major axis of anisotropy
When designing a search neighborhood, we should remember the following
points:
Each sector should have enough points ( 4) to avoid directional sampling
bias.
CPU time and memory requirements grow rapidly as a function of the
number of data points in a neighborhood.
We will see a further example of the search neighborhood in our later discussion
on kriging.
SEARCH STRATEGIES
Two common search procedures are the Nearest Neighbor and the Radial
Search methods. These strategies calculate the value of a grid node based on
data points in the vicinity of the node.
Nearest Neighbor
One simple search strategy looks for data points that are closest to the grid node,
regardless of their angular distribution around the node. The nearest neighbor
search routine is quick, and works well as long as samples are spread about
evenly. However, it provides poor estimates when sample points are
concentrated too closely along widely spaced traverses.
Another drawback to the nearest neighbor method occurs when all nearby points
are concentrated in a narrow strip along one side of the grid node (such as might
be seen when wells are drilled along the edge of a fault or pinchout). When this
occurs, the selection of points produces an estimate of the node that is
essentially unconstrained, except in one direction. This problem may be avoided
by specifying search parameters which select control points that are evenly
distributed around the grid node.
Radial Searches
Two common radial search procedures are the quadrant search, and its close
relative, the octant search. Each is based on a circular or elliptical area, sliced
into four or eight equal sections. These methods require a minimum number of
control points for each of the four or eight sections surrounding the grid node.
These constrained search procedures test more neighboring control points than
the nearest neighbor search, which increases the time required. Such constraints
on searching for nearest control points will expand the size of the search
neighborhood surrounding the grid node because a number of nearby control
points will be passed over in favor of more distant points that satisfy the
requirement for a specific number of points being selected from a single sector.
In choosing between the simple nearest neighbor approach and the constrained
quadrant or octant searches, remember that the autocorrelation of a surface
decreases with increasing distance, so the remote data points sometimes used
by the constrained searches are less closely related to the location being
estimated. This may result in a grid node estimate that is less realistic than that
produced by the simpler nearest neighbor search.
SPATIAL DESCRIPTION
One of the distinguishing characteristics of earth science data is that these data
sets are assigned to some particular location in space. Spatial features of the
data sets, such as the degree of continuity, directional trends and location of
extreme values, are of considerable interest in developing a reservoir description.
The statistical descriptive tools presented earlier are not able to capture these
spatial features. In this section, we will use a data set from West Texas to
demonstrate tools that describe spatial aspects of the data.
DATA POSTING
Data posting is an important initial step in any study (Figure 1: Posted porosity
data for 55 wells from North Cowden Field in West Texas).
Figure 1
Not only do these displays reveal obvious errors in data location, but they often
also highlight data values that may be suspect. Lone high values surrounded by
low values (or vice versa) are worth investigating. Data posting may provide clues
as to how the data were acquired. Blank areas may indicate inaccessibility
(another company's acreage, perhaps); heavily sampled areas indicate some
initial interest. Locating the highest and lowest values may reveal trends in the
data.
In this example, the lower values are generally found on the west side of the
area, with the larger values in the upper right quadrant. The data are sampled on
a nearly regular grid, with only a few holes in the data locations. The empty spots
in the lower right corner are on acreage belonging to another oil company. Other
missing points are the result of poor data, and thus are not included in the final
data set. More information is available about this data set in an article by
Chambers, et al. (1994). This data set and acoustic impedance data from a high-
resolution 3D seismic survey will be used to illustrate many of the geostatistical
concepts throughout the remainder of this presentation.
DATA DISTRIBUTION
A reservoir property must be mapped on the basis of a relatively small number of
discrete sample points (most often consisting of well data). When constructing
maps, either by hand or by computer, attention must be paid to the distribution of
those discrete sample points. Figure 2 (Typical distribution of data points within the
map area) illustrates the distribution of points on maps.
Figure 2
INTRODUCTION
One of our many tasks as geoscientists is to create contour maps. Although
contouring is still performed by hand, the computer is used more and more to
map data, especially for large data sets. Unfortunately, data are often fed into the
computer without any special treatment or exploratory data analysis. Quite often,
defaults are used exclusively in the mapping program, and the resulting maps are
accepted without question, even though the maps might violate sound geological
principles.
Before using a computer to create a contour map it is necessary to create a grid
and then use the gridding process to create the contours. This discussion will
introduce the basic concepts of grids, gridding and interpolation for making
contour maps.
WHAT IS A GRID?
Taken to extremes, every map contains an infinite number of points within its
map area. Because it is impractical to sample or estimate the value of any
variable at an infinite number of points within the map area, we define a grid to
describe locations where estimates will be calculated for use in the contouring
process.
A grid is formed by arranging a set of values into a regularly spaced array,
commonly a square or rectangle, although other grid forms may also be used. The
locations of the values represent the geographic locations in the area to be
mapped and contoured (Jones, et al., 1986). For example, well spacing and
known geology might influence your decision to calculate porosity every 450 feet
in the north-south direction, and every 300 feet in the east-west direction. By
specifying a regular interval of columns (every 450 feet in the north-south
direction) and rows (every 300 feet in the east-west direction), you have, in effect,
created a grid.
Grid nodes are formed by the intersection of each column with a row. The area
enclosed by adjacent grid nodes is called a grid cell (three nodes for a triangular
arrangement, or more commonly, four nodes for a square arrangement).
Because the sample data represent discrete points, a grid should be designed to
reflect the average spacing between the wells, and designed such that the
individual data points lie as closely as possible to a grid node.
GRID SPACING
The grid interval controls the detail that can be seen in the map. No features
smaller than the interval are retained. To accurately define a feature, it must
cover two to three grid intervals; thus the cell should be small enough to show the
required detail of the feature. However, there is a trade-off involving grid size.
Large grid cells produce quick maps with low resolution, and a coarse
appearance. While small grid cells may produce a finer appearance with better
resolution, they also tend to increase the size of the data set, thus leading to
longer computer processing time; furthermore a fine grid often imparts gridding
artifacts that show up in the resulting map (Jones, et al., 1986).
A rule of thumb says that the grid interval should be specified so that a given grid
cell contains no more than one sample point. A useful approach is to estimate, by
eye, the average well spacing, and use it as the grid interval, rounded to an even
increment (e.g., 200 rather than 196.7).
2. Designing the grid over the area (Figure 2 , Grid design superimposed on
the control points);
Figure 2
Figure 4
To illustrate these steps, we will use porosity measurements from the previously
mentioned West Texas data set.
TRADITIONAL INTERPOLATION METHODS
INTRODUCTION
The point-estimation methods described in this section consist of common
methods used to make contour maps. These methods use non-geostatistical
interpolation algorithms and do not require a spatial model. They provide a way to
create an initial “quick look” map of the attributes of interest. This section is not
meant to provide an exhaustive dissertation of the subject, but will introduce
certain concepts needed to understand the principles of geostatistical
interpolation and simulation methods discussed in later sections.
Most interpolation methods use a weighted average of values from control points
in the vicinity of the grid node in order to estimate the value of the attribute
assigned to that node. With this approach, the attribute values of the nearest
control points are weighted according to their distance from the grid node, with
the heavier weights assigned to the closest points. The attribute values of grid
nodes that lie beyond the outermost control points must be extrapolated from
values assigned to the nearest control points.
Many of the following methods require the definition of Neighborhood parameters
to characterize the set of sample points used during the estimation process,
given the location of the grid node. For the upcoming examples, we've specified
the following neighborhood parameters:
Isotropic ellipse with a radius = 5000 feet
4 quadrants
A minimum of 7 sample points, with an optimum of 3 sample points per
quadrant
These examples use porosity measurements, located on a nearly regular grid.
See Figure 1 (Location and values of control points within the mapping area at
North Cowden Field, West Texas) for the sample locations and porosity values.
Figure 1
The following seven estimation methods will be discussed in turn:
Inverse Distance
Closest Point
Moving Average
Least Squares Polynomial
Spline
Polygons of Influence
Triangulation
The first five estimation methods are accompanied by images that illustrate the
patterns and relative magnitude of the porosity values created by each method.
All images have the same color scale. The lowest value of porosity is dark blue
(5%) and the highest value is red (13%), with a 0.5% color interval. However, for
the purpose of this illustration, the actual values are not important at this time.
(No porosity mapping images were produced for the polygons of influence and
triangulation methods.)
INVERSE DISTANCE
This estimation method uses a linear combination of attribute values from
neighboring control points. The weights assigned to the measured values used in
the interpolation process are based on distance from the grid node, and are
inversely proportional to that distance raised to a given power (p). If the smallest distance is smaller
than a given threshold, the value of the corresponding sample is copied to the
grid node. Large values of p ( 5 or greater) create maps similar to the closest
point method (Isaaks and Srivastava, 1989; Jones, et al., 1986), which will be
described next. The equation for the inverse distance method has the following
form:
Z* = Σ (λi Zi)     where     λi = (1 / di^p) / Σ (1 / dj^p)

And:
Z* = the estimate at the target grid node location
λi = the weights
Zi = the data points
di^p = the distance from data point Zi to the grid node, raised to the power p
The Inverse Distance method is recommended as a “first pass” through the data
because it:
is simple to use and understand.
produces a “quick map.”
is an excellent QC tool.
locates “bulls-eye” effect (lone high or low values).
spots erroneous sample locations.
gives a first indication of trends.
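A minimal inverse-distance estimator might look like the sketch below (numpy is assumed; the neighborhood search and sector logic described earlier are omitted for brevity, and the control points are illustrative):

    import numpy as np

    def inverse_distance_estimate(xy_samples, values, xy_target, power=2.0, eps=1e-10):
        """Weighted average of neighboring values, with weights proportional to 1 / distance^power."""
        d = np.sqrt(((xy_samples - xy_target) ** 2).sum(axis=1))
        if d.min() < eps:                   # target coincides with a sample: copy its value
            return values[d.argmin()]
        w = 1.0 / d ** power
        return (w * values).sum() / w.sum()

    # Illustrative control point coordinates (x, y) and porosity values
    pts = np.array([[100.0, 200.0], [400.0, 250.0], [150.0, 600.0], [500.0, 550.0]])
    phi = np.array([8.5, 11.2, 9.7, 12.4])
    print(inverse_distance_estimate(pts, phi, np.array([300.0, 400.0]), power=2.0))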
CLOSEST POINT
The closest point (Figure 3) or nearest neighbor methods consist of copying
the value of the closest sample point to the target grid node.
Figure 3
This method can be viewed as a linear combination of the neighboring points with
all the weights equal to 0, except the weight attached to the closest point which is
equal to 100% (Henley, 1981; Jones, et al., 1986).
Z* = Z(closest point)
Where
Z* = the target grid node location
Z = the data points
MOVING AVERAGE
The moving average method (Figure 4) is the most frequently used estimation
method.
Figure 4
Each neighboring sample point is given the same weight. The weight is
calculated so that the sum of the weights of all the neighboring sample points
sum to unity (Henley, 1981; Jones, et al., 1986).
So, if we assume that there are N neighboring data points:
Z* = (Σ Z) / N
Where
Z* = the target grid node location
Z = the data points
N = the number of neighboring data points
The moving average takes its name from the process of estimating the attribute
value at each grid node based on the weighted average of nearby control points
in the search neighborhood, and then moving the neighborhood from grid node to
node.
SPLINE
Spline fitting is a commonly used quantitative method. The method ignores
geologic trends and allows sample location geometry to dictate the range of
influence of the samples. The bicubic spline (Figure 6) is a two-dimensional
gridding algorithm.
Figure 6
In one-dimension, the function has the form of a flexible rod between the sample
points. In two dimensions, the function has the form of a flexible sheet. The
objective of the method is to fit the smoothest possible surface through all the
samples using a least squares polynomial approach (Jones, et al., 1986).
POLYGONS OF INFLUENCE
This is a very simple method, and is often used in the mining industry to estimate
average ore grade within blocks. Often, the value estimated at any location is
simply the value of the closest point. The method is similar to the closest point
approach. Polygonal patterns are created, based on sample location. Polygon
boundaries represent the distance midway between adjacent sample locations.
As long as the points we are estimating fall within the same polygon of influence,
the polygonal estimate does not change. As soon as we encounter a grid node in
a different polygon, the estimate changes to a different value. This method
causes abrupt discontinuities in the surface, and may create unrealistic maps
(Isaaks and Srivastava, 1989; Henley, 1981).
TRIANGULATION
The triangulation method is used to calculate the value of a variable (such as
depth, or porosity for instance) in an area of a map located between 3 known
control points. Triangulation overcomes the problem of the polygonal method,
removing possible discontinuities between adjacent points by fitting a plane
through three sample points that surround the grid node being estimated (Isaaks
and Srivastava, 1989). The equation of the plane is generally expressed as:
Z* = ax + by + c
This method starts by connecting lines between the 3 known control points to
form a triangle (denoted as rst in Figure 7).
Figure 7
Next, join a line from the unknown point (point O in the figure), to each of the
corners of the triangle, thereby forming 3 new triangles within the original triangle.
The value of any point located within the triangle (point O in this example) can be
determined through the following steps:
1. Compute the areas of the resulting new triangles
2. Use the areas of each triangle to establish a weight for each corner point
3. Multiply the values of the three corner points by their respective weights,
and
4. Add the resulting products.
The formula to find the area of a triangle is:
A = bh/2
Where
A is the area of the triangle,
b is the length of the base, and
h is the length of the height, taken perpendicular to the base.
Weights are assigned to each value in proportion to the area of the triangle
opposite the known value, as shown by the example in the Figure 7. This
example shows how the values from the three closest locations are weighted by
triangular areas to form an estimated value at point O. The control values (r,s,t)
are located at the corners of the triangle.
The value at point r is weighted by the triangular area A Ost,
point s is weighted by the area A Ort, and
point t is weighted by the area A Ors.
The weights are taken as a percentage, where the sum of all 3 weights equals 1.
Now multiply the weight times its associated control value to arrive at a weighted
control value. Do this for each of the three points. Then add up the 3 weighted
control values to triangulate an interpolated depth for point O.
Be aware, however, that choosing different meshes of triangles or entering the
data in a different sequence may result in a different set of contours for your map.
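The area-weighting steps can be written out directly. The sketch below (coordinates and depth values are illustrative) computes the weights from the three sub-triangle areas and returns the interpolated value at point O:

    def tri_area(p, q, r):
        """Area of a triangle given its three corner coordinates."""
        return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0

    def triangulation_estimate(r, s, t, vr, vs, vt, o):
        """Estimate the value at point o inside triangle rst from the corner values vr, vs, vt."""
        a_ost = tri_area(o, s, t)      # weight for the value at r (area opposite r)
        a_ort = tri_area(o, r, t)      # weight for the value at s
        a_ors = tri_area(o, r, s)      # weight for the value at t
        total = a_ost + a_ort + a_ors
        return (vr * a_ost + vs * a_ort + vt * a_ors) / total

    # Illustrative corners (x, y), depths at the corners, and an interior point O
    print(triangulation_estimate((0, 0), (10, 0), (0, 10), 1000.0, 1040.0, 1020.0, (3, 3)))   # 1018.0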
MAP DISPLAY TYPES
CONTOUR MAPS
Contour maps reveal overall trends in the data values. Hand contouring the data
is an excellent way to become familiar with the data set. Unfortunately, many
data sets are too large to hand contour, so computer contouring is often the only
alternative.
At this preliminary stage of spatial description, the details of the contouring
algorithm need not concern us as long as the contour map is a good visual
display of the data. Gridding and contouring of the data requires values to be
interpolated onto a regular grid. For a first pass through the data, the inverse
distance algorithm is a good choice. The inverse distance parameters are easy to
set; then choose an isotropic octant search neighborhood with about two
data points per octant, if possible. Design the grid interval to be about the
average spacing of the wells, or half that size. Figure 1 (Grid mesh and data
location of 55 porosity data points from North Cowden Field in West Texas)
Figure 1
shows the grid design with respect to the data locations and Figure 2 is the
resulting contour map using an inverse distance approach with a distance power
equal to 1.
Figure 2
In this example, the high porosity area is located in the upper right quadrant,
extending down the right side of the mapped area. There is a second region of
high porosity in the southern, central portion of the area. We can see that low
values are generally trending north-south, with a zone of lower porosity trending
east to west through the central portion of the area. Displays such as this will aid
in designing the spatial analysis strategy and help to highlight directional
continuity trends.
SYMBOL MAPS
For many large data sets (for example, seismic), posting individual sample values
may not be feasible and contouring may mask interesting local details (Isaaks
and Srivastava, 1989). An alternative approach is a symbol map. Different colors
for ranges of data values can be used to reveal trends in high and low values.
Figure 3 is a five-color symbol map of 33,800 acoustic impedance values, scaled
between 0 and 1, from a high resolution 3D seismic survey.
Figure 3
Previous studies with this data set and its accompanying porosity data set
(Chambers, et al., 1994) show that acoustic impedance has a -0.83 correlation
with porosity. Therefore, observations from this map may indicate zones of high
porosity associated with the red and orange areas (see contour and data posting
maps, Figure 1 and Figure 2). Low porosity is located in the blue and green
areas. If this relationship holds, the seismic data can be used to infer porosity in
the inter-well regions using a geostatistical data integration technique commonly
referred to as cokriging (which we will describe later in the section on Data
Integration).
INDICATOR MAPS
An indicator map is a special type of symbol map with only two symbols, for
example, black and white. With these two symbols, each data point is assigned to
one of two classes. Indicator maps simply record when a data value is above or
below a certain threshold. Four indicator maps (Figure 4 ) were created from the
acoustic impedance data shown in Figure 3. The threshold values are 0.2, 0.4,
0.6, and 0.8 acoustic impedance scaled units.
Figure 4
These maps show definite trends in high and low values, which relate to trends in
porosity values.
Figure 4 parts B and C are perhaps the most revealing. Zones of lowest scaled
impedance are located in the upper right quadrant of the study area (Figure 4
part B). Zones of highest impedance are on the western side of the study area,
trending generally north to south. There is also a zone of high impedance cutting
east to west across the southern, central portion of the area, with another north to
south trend in the lower right corner of the map area.
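The indicator coding itself is simple to reproduce. The sketch below classifies a gridded attribute against the four thresholds mentioned above; the impedance grid is a random stand-in, not the survey data.

```python
# Minimal sketch of indicator coding: each value is assigned 0 or 1 depending on
# whether it exceeds a threshold. Thresholds follow the 0.2/0.4/0.6/0.8 values in
# the text; the data array is a hypothetical stand-in for scaled impedance.
import numpy as np

impedance = np.random.default_rng(0).random((100, 100))
indicator_maps = {thr: (impedance > thr).astype(int) for thr in (0.2, 0.4, 0.6, 0.8)}
for thr, ind in indicator_maps.items():
    print(thr, ind.mean())   # fraction of cells above each threshold
```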
Data posting, contour, symbol and indicator maps provide us with a lot of
information about the spatial arrangement and pattern in our data sets. These are
also excellent quality control displays and provide clues about potential data
problems.
INTRODUCTION
Moving window statistics provide a way to look for local anomalies in the data set,
heteroscedasticity in statistical jargon. In earth science data it is quite common to
find data values in some regions that display more variability than in other
regions. For example, local high-permeability zones (thief zones) or low-permeability
zones (barriers) hamper the effective recovery of hydrocarbons and create many
more problems with secondary and tertiary recovery operations.
The calculation of a few summary statistics (mean and standard deviation) within
moving windows is often useful for investigating anomalies in the data (Isaaks
and Srivastava, 1989). The method is quite simple, and consists of :
dividing the area into local neighborhoods of equal size,
then computing summary statistics within each local area.
The window size depends on the average data spacing, dimensions of the study
area, and amount of data. We are looking for possible trends in the local mean
and standard deviation. It is also important to see if the magnitude of the local
standard deviation tracks (correlates) with the magnitude of the local mean,
known as the proportionality effect. See Isaaks and Srivastava (1989) for more
details on moving windows and the proportionality effect.
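A minimal sketch of the moving-window calculation is shown below, assuming a gridded attribute and non-overlapping windows; the window size and the input grid are hypothetical choices, not values from the text.

```python
# Minimal sketch of moving-window statistics: split a gridded attribute into
# non-overlapping windows and compute the mean and standard deviation in each.
import numpy as np

def moving_window_stats(grid, win):
    """Return arrays of local means and standard deviations for win-by-win blocks."""
    ny, nx = (grid.shape[0] // win) * win, (grid.shape[1] // win) * win
    blocks = grid[:ny, :nx].reshape(ny // win, win, nx // win, win)
    return blocks.mean(axis=(1, 3)), blocks.std(axis=(1, 3))

porosity = np.random.default_rng(1).normal(10.0, 2.0, size=(60, 60))  # stand-in data
local_mean, local_std = moving_window_stats(porosity, win=10)
# A strong correlation between local mean and local std suggests a proportionality effect.
print(np.corrcoef(local_mean.ravel(), local_std.ravel())[0, 1])
```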
PROPORTIONALITY EFFECT
The proportionality effect concerns the relationship of the local summary statistics
computed from moving windows. There are four relationships between the local
mean and the local variability (e.g., standard deviation or variance). According to
Isaaks and Srivastava (1989), these relationships are:
There is no trend in the local mean or the variability. The variability is
independent of the magnitude of the local mean. This is the ideal case,
but is rarely seen.
There is a trend in the local mean, but the variability is independent of the
local mean and has no trend.
There is no trend in the local mean, but there is a trend in the local
variability.
There is a trend in both the local mean and local variability. The magnitude
of the local variability correlates with the magnitude of the local mean.
For estimation purposes, the first two cases are the most favorable. If the local
variability is roughly constant, then estimates anywhere in the mapped area will
be as good as estimates anywhere else. If the local mean shows a trend, then we
need to examine our data for signs of non-stationarity.
CONCEPT OF STATIONARITY
A stationary property is stable throughout the area measured. Stationarity may be
considered the statistical equivalent of homogeneity, in which statistical
parameters such as mean and standard deviation are not seen to change.
Stationarity requires that values in the data set represent the statistical
population.
Ideally, we would like our data to be independent of sample location. However,
data often show a regular increase (or decrease) in value over large distances,
and the data are then said to be non-stationary, or to show a trend (Hohn, 1988;
Isaaks and Srivastava, 1989; Henley, 1981; Wackernagel, 1995).
The concept of stationarity is used in every day practice. For example, consider
the following:
The top of Formation A occurs at a depth of about 975 feet TVDss.
This statement, however, does not preclude the possibility that Formation A
varies in depth from well to well. Thus, if Z (top) is a stationary random function,
and
At location Z(xi), Formation A occurs at 975 feet TVDss, then
At location Z(xi + ½ mile), Formation A should also occur at about 975 feet
TVDss.
However, if Formation A is known to be non-stationary, then predicting the depth
to the top of Formation A in the new well is more difficult, and requires a more
sophisticated model. We will discuss such models and how stationarity influences
them in the section on regionalized variables.
INTRODUCTION
In the reservoir, the variables of interest (e.g., porosity, permeability, sand/shale
volumes, etc.) are products of a variety of complex physical and chemical
processes. These processes superimpose a spatial pattern on the reservoir rock
properties, so it is important to understand the scales and directional aspects of
these features to gain efficient hydrocarbon production. The spatial component
adds a degree of complexity to these variables, and serves to increase the
uncertainty about the behavior of attributes at locations between sample points
(sample points are usually wells). Deterministic models cannot handle the
uncertainties associated with such variables, so a geostatistical approach has
been developed because it is based on probabilistic models that account for
these inevitable uncertainties (Isaaks and Srivastava, 1989).
Consider Figure 1a and Figure 1b (Comparison of porosity data measured over 50 units of
distance with an equal sampling interval):
Figure 1b
This graphic shows a plot of porosity measures along two transects. The sample
spacing is 1 unit of distance. Sequence A, on the left (Regionalized Variable)
shows spatial continuity in porosity, whereas the Sequence B on the right
(Random Variable) shows a random distribution of porosity. However, the mean,
variance and histogram for both porosity sequences are identical.
Autocorrelation
Let us further investigate the concept of spatial continuity by plotting Sequences
A and B in a different way. When any of these data sequences is plotted against
itself, it will yield a slope of 45 degrees, thus indicating perfect correlation.
However, if the data are translated by the sampling interval, then plotted against
itself, we will begin to see the impact of spatial correlation, or the lack of spatial
correlation.
Figure 2a through 2f and Figure 3a through 3c
Observations
Sequences A and B (above) are presented as h-Scatterplots, where h represents
lag, or separation distance. Recall that the concept of the h-Scatterplot was
discussed in the section on Bivariate Statistical Measures and Displays. The h-
Scatterplot forms the basis for describing a model of spatial correlation. The
shape of the cloud on these plots tells us how continuous the data values are
over a certain distance in a particular direction. For this case, h-Scatterplots were
computed along two different transects. If the data values at locations separated
by h are identical, then they will fall on a line x = y, a 45-degree line of perfect
correlation. As the data becomes less and less similar, the cloud of points on the
h-Scatterplot becomes fatter and more diffuse (Hohn, 1988; Isaaks and
Srivastava, 1989).
The following observations are readily apparent from the previous two figures:
Sequence A is a Regionalized Variable and shows spatial continuity over about 3
units of distance.
Good correlation after three units of translation is shown
Correlation is 0.21 after five units of translation
Sequence B is a Random Variable with no spatial continuity.
Poor correlation after one unit of translation is shown
Correlation approaches 0 after two units of translation
SPATIAL AUTO-CORRELATION
Spatial auto-correlation describes the relationship between regionalized variables
sampled at different locations. Samples that are auto-correlated are not
independent with regard to distance. The closer two variables are to each other in
space the more likely they are to be related. In fact, the value of a variable at one
location can be predicted from values sampled at other (nearby) locations.
The two common measures of spatial continuity are the variogram and its close
relative, the correlogram, which allow us to quantify the continuity, anisotropy and
azimuthal properties of our measured data set.
THE VARIOGRAM
Regionalized variable theory uses the concept of semivariance to express the
relationship between different points on a surface. Semivariance is defined as:
γ(h) = [1/(2N(h))] Σ [z(xi) − z(xi+h)]²
Where:
γ(h) = semivariance
h = lag (separation distance)
z(xi) = value of the sample located at point xi
z(xi+h) = value of the sample located at point xi+h
N(h) = total number of sample pairs for the lag interval h.
Semivariance is used to describe the rate of change of a regionalized variable as
a function of distance. We know intuitively that there should be no change in
values (semivariance = 0) between points located at a lag distance h = 0,
because there are no differences between points that are compared to
themselves. However, when we compare points that are spaced farther apart, we
see a corresponding increase in semivariance (the higher the average
semivariance, the more dissimilar the values of the attribute being examined). As
the distance increases further, the semivariance eventually becomes
approximately equal to the variance of the surface itself. This distance is the
greatest distance over which a variable measured at one point on the surface is
related to that variable at another point.
Semivariance is evaluated by calculating γ(h) for all pairs of points in the data set
and assigning each pair to a lag interval h. If we plot a graph of semivariance
versus lag distance, we create a variogram (also known as a semivariogram).
The variogram measures dissimilarity, or increasing variance between points
(decreasing correlation) as a function of distance. In addition to helping us assess
how values at different locations vary over distance, the variogram provides a
way to study the influence of other geologic factors which may affect whether the
spatial correlation varies only with distance (the isotropic case) or with direction
and distance (the anisotropic case).
Because the variogram is the sum of the squared differences of all data pairs
falling within a certain lag distance, divided by twice the number of pairs found for
that lag, we use the variogram to infer the correlation between points. That is,
rather than showing how two points are alike, or predicting attribute value at each
point, we actually plot the difference between each value over a given lag
distance.
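The sketch below computes an experimental variogram directly from the definition above, restricted to a one-dimensional, omni-directional case for clarity. The sample locations and values are hypothetical.

```python
# Minimal sketch of the experimental semivariogram:
# gamma(h) = [1/(2 N(h))] * sum of squared differences over all pairs in lag bin h.
import numpy as np

def experimental_variogram(x, z, lag, nlags):
    """Return lag-bin centers and semivariance for equally spaced lag bins."""
    gamma = np.zeros(nlags)
    npairs = np.zeros(nlags, dtype=int)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            k = int(abs(x[j] - x[i]) // lag)
            if k < nlags:
                gamma[k] += (z[i] - z[j]) ** 2
                npairs[k] += 1
    valid = npairs > 0
    gamma[valid] /= 2.0 * npairs[valid]
    centers = (np.arange(nlags) + 0.5) * lag
    return centers[valid], gamma[valid]

x = np.arange(50, dtype=float)                                       # samples 1 unit apart
z = 10 + np.cumsum(np.random.default_rng(2).normal(0, 0.5, 50))      # spatially continuous values
print(experimental_variogram(x, z, lag=1.0, nlags=10))
```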
THE CORRELOGRAM
The correlogram is another measure of spatial dependence. Rather than
measuring dissimilarity, the correlogram is a measure of similarity, or correlation,
versus separation distance.
C(h) = [1/n(h)] Σ [Z(xi) − m] [Z(xi+h) − m]
Where:
m is the sample mean over all paired points, n(h), separated by distance h.
Computing the covariance for increasing lags (double, triple, etc.) allows us to
generate a plot showing decreasing covariance with distance, as shown in
Figure 1a and 1b (omni-directional variogram (A) and correlogram (B)).
Figure 1a
Figure 1b
In this graphic, we see that while the variogram in Frame A measures increasing
variability (dissimilarity) with increasing distance, the correlogram in Frame B
measures decreasing correlation with distance.
Figure 2
Figure 3b
Figure 1a, 1b, and 1c
In this graphic, the solid squares on the figures represent the average of porosity
or acoustic impedance data pairs for each 500-unit lag interval. The numbers of
data pairs are displayed next to the average experimental data point. The first
point contains only one data pair and should not be taken into consideration
during the modeling step.
Below are the general variogram equations for the primary attribute (porosity), the
secondary attribute (acoustic impedance), and their cross variogram.
Consider the following:
Z(xi) = the primary attribute measured at location xi
Z(xi + h) = the primary attribute measured at xi + some separation distance
(lag), h
T(xi) = the secondary attribute measured at location xi
T(xi + h) = the secondary attribute measured at xi + some separation
distance (lag), h
N = the number of data points
then
The variogram of the primary attribute is calculated as (Figure 1a):
γZ(h) = [1/(2N)] Σ [Z(xi) − Z(xi+h)]²
The variogram of the secondary attribute is calculated as (Figure 1b):
γT(h) = [1/(2N)] Σ [T(xi) − T(xi+h)]²
The cross variogram between the primary and secondary attributes is
calculated as (Figure 1c):
γZT(h) = [1/(2N)] Σ [Z(xi) − Z(xi+h)] [T(xi) − T(xi+h)]
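The cross variogram can be computed with only a small change to the variogram sketch shown earlier: the squared difference of one attribute is replaced by the product of the differences of the two attributes. The data below are hypothetical, collocated primary and secondary samples along a transect.

```python
# Minimal sketch of the cross variogram for collocated primary (Z) and secondary (T) data.
import numpy as np

def cross_variogram(x, z, t, lag, nlags):
    gamma = np.zeros(nlags)
    npairs = np.zeros(nlags, dtype=int)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            k = int(abs(x[j] - x[i]) // lag)
            if k < nlags:
                gamma[k] += (z[i] - z[j]) * (t[i] - t[j])
                npairs[k] += 1
    valid = npairs > 0
    return (np.arange(nlags)[valid] + 0.5) * lag, gamma[valid] / (2.0 * npairs[valid])

rng = np.random.default_rng(3)
x = np.arange(40, dtype=float)
z = np.cumsum(rng.normal(0, 1, 40))        # primary attribute (e.g., porosity)
t = -0.8 * z + rng.normal(0, 0.3, 40)      # negatively correlated secondary attribute
print(cross_variogram(x, z, t, lag=1.0, nlags=8))
```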
SUPPORT EFFECT
Most reservoir studies are concerned with physical rock samples, with
observations corresponding to a portion of rock of finite volume. It is obvious that
once a piece of rock is collected (e.g. cores, hand samples, etc.) from a location,
it is impossible to collect it again from the same location (Henley, 1981).
The shape and volume of the rock are collectively termed the support of the
observation. If the dimensions of the support are very small in comparison to the
sampling area or volume (e.g. the reservoir), the sample can be considered as
point data (Henley, 1981). For much of our work the support does not influence
our mapping, until we compare samples of different sizes or volumes (e.g. core
plugs and whole core, or wireline data and core).
The support effect becomes significant when combining well data and seismic
data. Seismic data cannot be considered point data. The difference in measurement
volume between well logs and cores versus seismic data is very large and should
not be ignored. Geostatistical methods can account for the
support effect using a variogram approach. As the support size increases, the
variance decreases until it remains constant after reaching a certain area or
volume (Wackernagel, 1995).
HANDLING TRENDS
The following approach is often used to “detrend” the data. However, this
approach also has its problems.
MODEL VARIOGRAMS
The experimental variogram and correlogram described in the previous section
are calculated only along specific inter-distance vectors, corresponding to
angular/distance classes. After computing the experimental variogram, the next
step is to define a model variogram. This variogram is a simple mathematical
function that models the trend in the experimental variogram. In turn, this
mathematical model of the variogram is used in kriging computations.
The kriging and conditional simulation processes require a model of spatial
dependency, because:
Kriging requires knowledge of the correlation function for all possible
distances and azimuths.
The model smoothes the experimental statistics and introduces geological
information.
Kriging cannot fit experimental directional covariance models
independently, but depends upon a model from a limited class of
acceptable functions.
Consider a random function Z(x) with an auto-covariance C(h):
Define an estimator, Z = Σi λi Z(xi)
The variance of Z is given by: σZ² = Σi Σj λi λj C(xi − xj) ≥ 0
The variance must be positive (positive definiteness criterion)
for any choice of weights (λi and λj),
and any choice of locations (xi and xj)
To honor the above inequality, the experimental covariance model must be
fit with a positive definite C(h).
Spatial modeling is not curve fitting, in the least squares sense. A least squares
model does not satisfy the positive definiteness criterion. The shape of the
experimental variogram usually constrains the type of model selected, although any
acceptable model can be applied; the choice of model affects the final kriged results.
Figure 1a, 1b, 1c, and 1d
Frame A shows purely random behavior. Frame B is linear, with some degree of
random component. Frame C is highly continuous, while Frame D exhibits linear
behavior.
Figure 2a and 2b
In this example, we note that the bounded variogram (Frame A) reaches a sill and remains
at the sill value for an infinite distance. This behavior is typical of the classic variogram.
Meanwhile, the unbounded variogram (Frame B) never plateaus at the sill, but shows a
continuous increase in variance with increasing distance. Variograms displaying this
characteristic are typical of data that possess a trend.
Figure 3a through 3g
The Spherical model is the most commonly used model, followed by the
Exponential. The Gaussian and Exponential functions reach the sill
asymptotically. For such functions, the range is arbitrarily defined as the distance
at which the covariance decreases to 5% of the sill (i.e., the variogram reaches
95% of the sill). The Nested model is a linear
combination of two spherical structures; having short and long scale components.
The “Hole” model is used for variograms computed from data which has a
repeating pattern. The Hole model can be dangerous because the periodicity will
show up in a map, although it is not present in the data. There does not appear to
be any relationship between depositional environment and variogram shape.
Below are equations for four common variogram models.
LINEAR MODEL
The linear model describes a straight line variogram. This model has no sill, so
the range is defined arbitrarily to be the distance interval for the last lag class in
the variogram. (Since the range is an arbitrary value it should not be compared
directly with ranges of other models.) This model is described by the following
formula:
γ(h) = Co + h(C/Ao)
where
h = lag interval,
Co = nugget variance > 0,
C = structural variance > Co, and
Ao = range parameter
SPHERICAL MODEL
The spherical model is a modified quadratic function where the range marks the
distance at which pairs of points are no longer autocorrelated and the
semivariogram reaches the sill. This model is described by the following
formula:
γ(h) = Co + C[1.5(h/Ao) − 0.5(h/Ao)³] for h < Ao
γ(h) = Co + C for h ≥ Ao
where
h = the lag distance interval,
Co = nugget variance > 0,
C = structural variance > Co, and
Ao = range
EXPONENTIAL MODEL
This model is similar to the spherical variogram in that it approaches the sill
gradually, but differs in the rate at which the sill is approached and in the fact that
the model and the sill never actually converge. This model is described by the
following formula:
γ(h) = Co + C[1 − exp(−h/Ao)]
where
h = lag interval,
Co = nugget variance > 0,
C = structural variance > Co, and
Ao = range parameter
In the exponential model, Ao is a parameter used to provide range, which, in the
exponential model, is usually assumed to be the point at which the model
approaches 95% of the sill (C+Co). Range is estimated as 3Ao.
GAUSSIAN MODEL
The Gaussian or hyperbolic model is similar to the exponential model, but rises
gradually from the y-intercept (it is parabolic near the origin). This model is described by the
following formula:
γ(h) = Co + C[1 − exp(−h²/Ao²)]
where
h = lag interval,
Co = nugget variance > 0,
C = structural variance > Co, and
Ao = range parameter
The range parameter in this model is simply a constant defined as that point at
which 95% of the sill is approached. The range can be estimated as 1.73Ao (1.73
is the square root of 3).
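The four model equations above translate directly into code. The sketch below implements them with nugget Co, structural variance C, and range parameter Ao; the parameter values in the example call are hypothetical.

```python
# Minimal sketch of the linear, spherical, exponential, and Gaussian model variograms.
import numpy as np

def linear(h, c0, c, a0):
    return c0 + h * (c / a0)

def spherical(h, c0, c, a0):
    h = np.asarray(h, dtype=float)
    return np.where(h < a0, c0 + c * (1.5 * h / a0 - 0.5 * (h / a0) ** 3), c0 + c)

def exponential(h, c0, c, a0):
    return c0 + c * (1.0 - np.exp(-np.asarray(h, dtype=float) / a0))

def gaussian(h, c0, c, a0):
    h = np.asarray(h, dtype=float)
    return c0 + c * (1.0 - np.exp(-(h ** 2) / a0 ** 2))

h = np.linspace(0, 3000, 7)
print(spherical(h, c0=0.1, c=0.9, a0=1500.0))   # sill (Co + C) is reached at the range
```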
KRIGING OVERVIEW
INTRODUCTION
Contouring maps by hand or by computer requires the use of some type of
interpolation procedure. As previously shown in the section on Gridding and
Interpolation, there are a number of algorithms for computer-based interpolation.
At the other end of the spectrum is the geologist who maps by hand, interpolating
between data points (or extrapolating beyond the control points), drawing
contours, smoothing the map to make it look “real” and perhaps biasing the map
with a trend based on geological experience (Hohn, 1988). This section provides
a broad overview of the computer-intensive interpolation process which lies at the
heart of geostatistical modeling.
LINEAR ESTIMATION
Kriging is a geostatistical technique for estimating attribute values at a point, over
an area, or within a volume. It is often used to interpolate grid node values in
mapping and contouring applications. In theory, no other interpolation process
can produce better estimates (being unbiased, with minimum error); though the
effectiveness of the technique actually depends on accurately modeling the
variogram. The accuracy of kriging estimates is driven by the use of variogram
models to express autocorrelation relationships between control points in the
data set. Kriging also produces a variance estimate for its interpolation values.
The technique was first used for the estimation of gold ore grade and reserves in
South Africa (hence the origin of the term Nugget Effect), and it is named in
honor of a South African mining engineer, Danie Krige. The mathematical validity
and foundation was developed by Georges Matheron, who later founded the
Centre de Géostatistique, as part of the École des Mines in Paris, France.
(Henley, 1981; Hohn, 1988; Journel, 1989; Isaaks and Srivastava, 1989; Deutsch
and Journel, 1992; Wackernagel, 1995).
KRIGING FEATURES
Kriging is a highly accurate estimation process which:
minimizes estimation error (the difference between the measured value and the
re-estimated value)
honors “hard” data
does not introduce an estimation bias
does not reproduce inter-well variability
produces a “smoothed” result; like all interpolators
is a univariate estimator; requiring only one covariance model
weighs control points according to a spatial model (variogram)
tends to the mean value when control data are sparse
uses a spatial correlation model to determine the weights (λ)
assigns negative or null weights to control points outside the correlation
range of the spatial model
indicates the global relative reliability of the estimate through RMS error
(kriging variance), as a by-product of kriging
has a general and easily reformulated kriging matrix, making it a very
flexible technique to use more than one variable
declusters data before the estimation
Types Of Kriging
There are a number of kriging algorithms, and each is distinguished by how the
mean value is determined and used during the estimation process. The four most
commonly used methods are:
Simple Kriging: The global mean is known (or supplied by the user), and
is held constant over the entire area of interpolation.
Ordinary Kriging: The local mean varies, and is re-estimated based on
the control points in the current search neighborhood ellipse.
Kriging with an External Drift: Although this method uses two variables,
only one covariance model is required, and the shape of the map is
related to a 2-D attribute which guides the interpolation of the primary
attribute known only at discrete locations. A typical application is time-
to-depth conversion, where the primary attribute (such as depth at the
wells) acquires its shape from the secondary attribute, referred to as
external drift (such as two-way travel time known on a 2-D grid).
Indicator Kriging: estimates the probability of an attribute at each grid
node (e.g., lithology, productivity). The technique requires the following
parameters:
Coding of the attribute in binary form, as 0 or 1.
Prior Probabilities of both classes.
Spatial covariance model of the indicator variable.
Figure 1
Given samples located at Zα, where α = 1, 2, 3, find the most likely value of the
variable Z at the target point (grid node: Z0*, Figure 1). In this graphic, we see the
geometrical arrangement of three data points Zα, the location of the point whose
value we wish to estimate, Z0*, and the unknown weights, λα.
Consider Z0* as a linear combination of the data Zα:
Z0* = λ0 + Σα λα Zα
Where: Σα λα = 1 and λ0 = mz − Σα λα mz
Determine λα so that:
Z0* is unbiased: E[Z0* − Z] = 0
Z0* has minimum mean square error (MSE):
E[Z0* − Z]² is minimum
Recall that the unknown value Z0* is estimated by a linear combination of n data
points plus a shift parameter λ0:
Z0* = λ0 + Σα λα Zα (1)
By transforming the above equation into a set of linear normal equations, we
solve the following to obtain the weights λα. The set of linear equations takes the
following form:
Σj λj C(xα, xj) − μ = c(xα, x0) for all α = 1, …, n (2)
or in matrix shorthand notation:
Cλ = c (3)
All three terms are matrices where:
C(xα, xj) represents the covariance between sample points xα and xj
c(xα, x0) represents the covariance between a sample located at xα and the
target point x0, the estimated point
λ are the unknown weights, λj
μ is a Lagrange multiplier that converts a constrained minimization problem
into an unconstrained minimization.
Determine the matrix of unknown weights λ by solving the matrix equation for λ as
follows:
Cλ = c (4)
Where
λ = C⁻¹ c (5)
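The sketch below solves this system for a single target node, assuming (hypothetically) a spherical covariance model and three well locations; the sign convention used for the Lagrange multiplier follows one common textbook form and may differ slightly from the equations above.

```python
# Minimal sketch of ordinary kriging at one grid node: build the covariance matrix C
# between control points and the vector c to the target, append the unbiasedness
# constraint (Lagrange multiplier), and solve for the weights. The covariance model,
# coordinates, and values are hypothetical; C(h) is taken as sill minus the spherical variogram.
import numpy as np

def spherical_cov(h, sill=1.0, rng=1500.0):
    h = np.asarray(h, dtype=float)
    gamma = np.where(h < rng, sill * (1.5 * h / rng - 0.5 * (h / rng) ** 3), sill)
    return sill - gamma

def ordinary_kriging(xy, z, target, cov=spherical_cov):
    n = len(z)
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1))  # pairwise distances
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(d)
    A[n, n] = 0.0                                    # constraint row/column: sum of weights = 1
    b = np.ones(n + 1)
    b[:n] = cov(np.sqrt(((xy - target) ** 2).sum(1)))
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    estimate = float(w @ z)
    variance = float(cov(0.0) - w @ b[:n] - mu)      # kriging variance
    return estimate, variance

xy = np.array([[0.0, 0.0], [800.0, 0.0], [0.0, 600.0]])
z = np.array([8.0, 11.0, 9.5])                       # porosity (%) at three hypothetical wells
print(ordinary_kriging(xy, z, target=np.array([300.0, 300.0])))
```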
Note that equation 3 is written in terms of covariance values; however, we modeled
either a variogram or a correlogram, not the covariance. We use covariance values
because they are computationally more efficient.
The covariance equals the sill minus the variogram, C(h) = sill − γ(h) (Figure 2:
Relationship between a spherical variogram and its covariance equivalent):
Figure 2
Figure 3a and 3b: search neighborhood designs (the number of sectors is commonly 4 or 8)
Figure 4a through 4i: Kriging results from a common data set, based on different variogram
models.
In this graphic, Frames A-H use the isotropic neighborhood design shown in
Figure 3b. The nested model (Frame F) used two spherical variograms, with a
short range = 1000 meters and a long range of 10,000 meters. Nested models
are additive. The anisotropic model (Frame I) used the anisotropic neighborhood
design shown in Figure 3. The minor axis of the variogram model = 1000 meters,
with a major axis = 5000 meters (5:1 anisotropy ratio), rotated to N15E. The color
scale is equivalent for all figures. Purple is 5% porosity and red is 13%. All these
illustrations were created using the same input data set.
Advantages Of Kriging
Kriging is an exact interpolator (if the control point coincides with a grid
node).
Kriging variance:
Relative index of the reliability of estimation in different regions.
Good indicator of data geometry.
Smaller nugget (or sill) gives a smaller kriging variance.
Minimizes the Mean Square Error.
Can use a spatial model to control the interpolation process.
A robust technique (i.e., small changes in kriging parameters equals
small changes in the results).
Disadvantages Of Kriging
Kriging tends to produce smooth images of reality (like all interpolation
techniques). In doing so, short scale variability is poorly reproduced, while it
underestimates extremes (high or low values). It also requires the specification of
a spatial covariance model, which may be difficult to infer from sparse data.
Kriging consumes much more computing time than conventional gridding
techniques, requiring numerous simultaneous equations to be solved for each
grid node estimated. The preliminary processes of generating variograms and
designing search neighborhoods in support of the kriging effort also require much
effort. Therefore, kriging is not normally performed on a routine basis;
rather it is best used on projects that can justify the need for the highest quality
estimate of a structural surface (or other reservoir attribute), and which are
supported by plenty of good data.
CROSS-VALIDATION
Cross-validation is a process for checking the compatibility between a set of data,
the spatial model, and the neighborhood design. In cross-validation, each data point
is individually removed from the data set, and its value is then re-estimated from the
remaining points using the covariance model. In this way, it is possible to compare
estimated versus actual values.
The procedure consists of the following steps:
1. Consider each control point in turn.
2. Temporarily suppress each control point from the data set.
3. Re-estimate each point from the surrounding data using the covariance
model.
4. Compare the estimated values, Zest, to the true values, Ztrue.
This also provides a re-estimation error (kriging variance is also calculated
at the same time):
RE = Zest -Ztrue
5. Calculate a standardized error:
SE = RE/σkrig
Ideally, it should have a zero mean and a variance equal to 1.
The numerator is affected by the range
The denominator is affected by the sill
6. Average the errors for a large number of target points to obtain:
Mean error
Mean standard error
Mean squared error
Mean squared standardized error
7. Distribution of errors (in map view) can provide useful criterion for:
Selecting a search region
Selecting a covariance model
8. Any data point whose absolute Standardized Error is ≥ 2.5 is considered an
outlier, based on the fact that the data point falls outside the 95%
confidence limit of a normal distribution.
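A minimal sketch of this leave-one-out procedure is shown below. It re-uses the hypothetical ordinary_kriging() function and data arrays from the kriging sketch earlier in this document; both are illustrative assumptions, not a specific package API.

```python
# Minimal sketch of leave-one-out cross-validation.
import numpy as np

def cross_validate(xy, z, krige):
    """Remove each control point in turn, re-estimate it, and summarize RE and SE."""
    re_list, se_list = [], []
    for i in range(len(z)):
        keep = np.arange(len(z)) != i
        z_est, krig_var = krige(xy[keep], z[keep], xy[i])
        re = z_est - z[i]                       # RE = Zest - Ztrue
        re_list.append(re)
        se_list.append(re / np.sqrt(krig_var))  # SE = RE / sigma_krig
    re_arr, se_arr = np.array(re_list), np.array(se_list)
    # Ideally SE has mean ~0 and variance ~1; |SE| >= 2.5 flags potential outliers.
    outliers = np.where(np.abs(se_arr) >= 2.5)[0]
    return re_arr.mean(), se_arr.mean(), se_arr.var(), outliers

# Example (using the arrays from the ordinary kriging sketch):
# print(cross_validate(xy, z, ordinary_kriging))
```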
Figure 1a, 1b, 1c, and 1d
Figure 1a shows a map view of the magnitude of the Re-estimation Error (RE).
Open circles are over-estimations; solid circles are under-estimations. The solid
red circle falls outside 2.5 standard deviations from a mean = 0. Also, look for
intermixing of the RE as an additional indication of biasing. A good model has an
equally likely chance of over or under estimating any location.
Figure 1b is a cross plot of the measured attribute of porosity at the wells versus
porosity re-estimated at the well locations during the cross-validation test. Again,
open circles are over-estimates. The red, solid circle is the sample from
Figure 1a.
The two most important plots are in Figure 1c and 1d because they help identify
model bias. If the histogram of standardized error (SE) in Figure 1c is skewed, or if
there is a correlation between SE and the estimated values in Figure 1d, then
the model is biased. Such is not the case in this example; however, it is the case in
Figure 2a, 2b, 2c, and 2d.
Figure 2a, 2b, 2c, and 2d
In Figure 2a, the over-estimated RE values are clustered in the center of the map.
The histogram of SE (Figure 2c) is slightly skewed towards over-estimation.
Finally, there is a positive correlation between SE and estimated porosity. These
indicate poor model design, poor neighborhood design, or both.
COKRIGING
INTRODUCTION
In the previous section, we described kriging with a single attribute. Rather than
only considering the spatial correlation between a set of sparse control points, we
will now describe the use of a secondary variable in the kriging process.
In this section, you will learn about multivariate geostatistical data integration
techniques, which fall into the general category called Cokriging and you will
learn more about External Drift.
There are many situations when it is possible to study the covariance between
two or more regionalized variables. The techniques introduced in this section are
appropriate for instances when the primary attribute of interest (such as well
data) is sparse, but there is an abundance of related secondary information (such
as seismic data). The mutual spatial behavior of regionalized variables is called
coregionalization.
The estimation of a primary regionalized variable (e.g., porosity) from two or more
variables (such as acoustic impedance) is known as Cokriging.
TYPES OF COKRIGING
Cokriging is a general multivariate regression technique which has three basic
variations:
Simple Cokriging uses a multivariate spatial model and a related
secondary 2-D attribute to guide the interpolation of a primary attribute
known only at control points (such as well locations). The mean is
specified explicitly and assumed to be a global constant. The method
uses all primary and secondary data according to search criterion.
Ordinary Cokriging is similar to Simple Cokriging in that the mean is still
assumed to be constant, but it is estimated using the neighborhood
control points rather than specified globally.
Collocated Cokriging is a reduced form of Cokriging, which requires
knowledge of only the hard data covariance model, the Product-
Moment Correlation coefficient between the hard and soft data, and the
variances of the two attributes. There is also a modified search criterion
used in Collocated Cokriging. This method uses all the primary data,
but, in its simplest form, uses only one secondary data value, the value
at the target grid node.
PROPERTIES OF COKRIGING
This method is a powerful extension of kriging, which:
must satisfy the same conditions as kriging:
it minimizes the estimation error.
is an unbiased estimator
honors the “hard” data
control points are weighed according to a model of coregionalization
is more demanding than kriging:
requires a simple covariance model for the primary and all
secondary attributes
requires cross-covariance models for all attributes
must be modeled with a single coregionalized model
Requires neighborhood searches that are more demanding
requires more computation time
DATA INTEGRATION
Besides being able to use a spatial model for determining weights during
estimation, one of the more powerful aspects of the geostatistical method is
quantitative data integration. We know from classical multivariate statistics that
models developed from two or more variables often produce better estimates. We
can extend classical multivariate techniques into the geostatistical realm and use
two or more regionalized variables in this geostatistical estimation process.
Basic Concept
We'll illustrate this concept by way of example. From our exploratory data
analysis we might find a good correlation between a property measured at well
locations and a certain seismic attribute. In such a case, we might want to use
the seismic information to provide better inter-well estimates than could be
obtained from the well data alone. Even when the number of primary (well) data
(e.g., porosity) are sparse, it is possible to use a densely sampled secondary
attribute (e.g., seismic acoustic impedance), in the interpolation process.
Well data have excellent vertical resolution of reservoir properties, but poor
lateral resolution. Seismic data, on the other hand, have poorer vertical resolution
than well data, but provide densely sampled lateral information. Geostatistical
data integration methods allow us to capitalize on the strengths of both data
types, to yield higher quality reservoir models.
This graphic shows the geometrical arrangement of three data control points Zα
(where α = 1, 2, 3), a grid of seismic data, the unknown weights, λ, and the target
grid node, Z0*.
Figure 2
Estimator
Z0* = λZ1Z1 + λZ2Z2 + λZ3Z3 + λT1T1 + λT2T2 + λT3T3
Estimated Error
σ² = CZ00 − λZ1CZ01 − λZ2CZ02 − λZ3CZ03 − λT1CT01 − λT2CT02 − λT3CT03 − μ
Collocated Cokriging
Collocated cokriging is a modification of the general cokriging case:
It requires only the simple covariance model of the primary attribute in its
simplest form. In the case of sparse primary data, the covariance
model is often derived from the covariance model of the densely
sampled secondary attribute.
It uses all primary data according to search criterion.
It uses secondary data attribute located only at the target grid (simplest
form) node during estimation.
If the secondary attribute covariance model is assumed proportional to the
primary attribute covariance model, then:
the correlation coefficient is the constant of proportionality.
we can use the correlation coefficient and the ratio of the secondary to
primary variances to transform a univariate covariance model into a
multivariate covariance model. This assumption is termed the Markov-Bayes
assumption (Deutsch and Journel, 1992).
Example Using 3 Data Points
Figure 3 illustrates a typical data configuration for collocated cokriging.
Figure 3
In this graphic, the secondary data at the estimation grid node is the only bit of
seismic data used in this form of the algorithm. Forms that are more complex
combine this data configuration with the one shown in Figure 1, which also
increases the computation time substantially.
Estimator
Z0* = λZ1Z1 + λZ2Z2 + λZ3Z3 + λT0T0
Estimated Error
σ² = CZ00 − λZ1CZ01 − λZ2CZ02 − λZ3CZ03 − λT0CT00 − μ
General Cokriging Versus Collocated Cokriging
Cokriging
A secondary variable is not required at all nodes of the estimation grid.
The traditional method does not incorporate secondary information from
non-collocated data points; it uses only secondary data located at the primary
sample locations.
Cokriging requires more modeling effort: CZZ(h), CYY(h) and CZY(h) must be
specified. No assumption regarding the relationship between the cross-
covariance and the auto-covariance of the primary variable is required.
It is impractical to incorporate more than two to three secondary variables
into the cokriging matrix because of increased modeling assumptions
and computational time.
The system of normal equations may be ill-conditioned; that is, it is often
difficult to find a common model of coregionalization.
Collocated Cokriging
Collocated cokriging assumes that the secondary variable is known at all
nodes of the estimation grid and uses all secondary information during
the estimation process.
The simplest form of collocated cokriging ignores the influence of non-
collocated secondary data points, because it uses the secondary data
located only at the target grid node.
Collocated cokriging only requires knowledge of the primary
covariance model (CZZ(h)), the variances of the primary and secondary
attributes (σ²Z, σ²T), and the correlation coefficient between the primary
and secondary attributes (ρZY).
The Markov-Bayes approach to collocated cokriging assumes that the
cross-covariance is a scaled version of the primary variable auto-
covariance.
In general, the system of normal equations is well conditioned.
Kriging With An External Drift (KED)
The objective of kriging with an external drift (KED) is to use the seismic data as a correlated shaping function,
a true regression approach, to construct the final depth map. Four wells
intersected the top of a reservoir. Kriging was used to map the solid curved
surface through the data points. This surface is a second or third order
polynomial. The seismic two-way time data (lightweight line) is the External Drift.
The seismic travel times correlate with the measured depth at the wells and
suggest a much more complex surface than the surface created using only the
well data.
KED is appropriate when shape is an important aspect of the study. The
approach assumes a perfect correlation between the well and seismic data. KED
is not an appropriate approach for mapping reservoir rock properties; collocated
cokriging is the better choice.
Figure 2 shows a three data point KED example.
Figure 2
The primary data are located at Zα and the secondary data at Sα. Note that KED
also uses the secondary information at the target grid node. This data
configuration can also be used for a more rigorous application of the collocated
cokriging method.
Estimator
Z0* = λ1Z1 + λ2Z2 + λ3Z3
Estimated Error
σ² = K00 − λ1K01 − λ2K02 − λ3K03 − μ0 − μ1S0
Advantages of KED
Allows direct integration of a secondary attribute during estimation of the
primary data.
Easier to implement than cokriging or collocated cokriging because it does
not require any secondary attribute modeling.
Neighborhood search is identical to kriging.
Computation time is similar to kriging a single variable.
Limitations of KED
May be difficult to infer the covariance of the residuals (local features).
There is no means to calibrate and control the influence of the secondary
variable because the method is a true regression model and assumes
a perfect correlation between the two data types.
KED system may be unstable if the drift is not a smoothly varying function.
MEASUREMENT ERRORS
Kriging, cokriging and conditional simulation algorithms are flexible enough to
take measurement errors in the primary variable into account. At a data point i,
the measured value is Zi = Si + εi, where Si is the true value and εi is the
unknown measurement error. If true, then assume that:
The errors are random; that is, not spatially correlated.
The expected mean value of the errors is equal to zero.
The errors are independent of the true values.
The errors have a Gaussian distribution.
Using these assumptions, decompose the data values into:
A signal component with a constant variance, σS².
Zero-mean Gaussian white noise uncorrelated with the signal. It is also
assumed that the variance of the noise, σi², is known at every primary
data location.
Figure 1a, 1b, 1c, and 1d
Figure 2a, 2b, 2c, and 2d
Figure 3a, 3b, 3c, and 3d
Figure 1a shows porosity data points from well log information. Figure 1b shows
variograms derived from the well data, with the experimental variogram (thin line
labeled D1) superimposed on the model variogram. In Figure 1c, the porosity
data were kriged to the seismic grid using the omni-directional, spherical
variogram model having a range of 1500 meters (shown in Figure 1b). The
isotropic search neighborhood used an octant search with 2 points per sector.
Figure 1d shows the seismic acoustic impedance data. The seismic data resides
on a grid of approximately 12 by 24 meters in X and Y, respectively. This is the
grid mesh used for all the following examples, including Figure 1c.
Figure 2a, 2b, 2c, and 2d illustrate an example of traditional cokriging. Porosity
(Figure 2a) and acoustic impedance (Figure 2b) were modeled with an omni-
directional, spherical variogram with a range of 1500 meters. The lines labeled
D1 represent the experimental variograms upon which the model variograms are
based. The cross variogram (Figure 2c) uses the same spherical model. The
cross variogram shows an inverse relationship between porosity and acoustic
impedance (the correlation is -0.83). The curved, dashed lines show the bounds
of perfect positive or inverse correlation. The sill of the cross variogram reflects
the magnitude of the -0.83 correlation between the data. Figure 2d shows the
results of cokriging using the cross-variogram model from Figure 2c.
Figure 3a, 3b, 3c, and 3d illustrate collocated cokriging (Figure 3a-c) and kriging
with external drift (Figure 3d). The model for the collocated kriging was derived
from analysis and modeling of the seismic acoustic impedance data from the
West Texas data set. Lines D1 and D2 represent experimental variograms taken
from two different directions, based on the anisotropic search neighborhood. The
well porosity data is sparse (55 data points) in comparison to the densely
sampled seismic data (33,800 data points). We are justified in using the nested,
anisotropic seismic data variogram model (Figure 3a) as a model of porosity
based on the high correlation coefficient (-0.83). Thus, we can use the Markov-
Bayes assumption to create the porosity variogram from the seismic variogram,
calibrate them using the correlation coefficient, and scale them based on their
individual variances. Figure 3b is a result of a Markov-Bayes collocated cokriging
using a correlation of -0.83. Figure 3c is also a collocated cokriging using the
Markov-Bayes assumption, except the correlation coefficient in this case was set
to -0.1. Figure 3c illustrates the condition of self-krigability, when the secondary
attribute has no correlation to the primary attribute, thus reverting to a simple
kriging solution. Although not a totally appropriate use of KED (Figure 3d), the
porosity map using KED shows a slightly wider range of porosity values. This
approach would be similar to a Markov-Bayes assumption using a -1.0
correlation.
INTRODUCTION
Stochastic modeling, also known as conditional simulation, is a variation of
conventional kriging or cokriging. An important advantage of the geostatistical
approach to mapping is the ability to model the spatial covariance before
interpolation. The covariance models make the final estimates sensitive to the
directional anisotropies present in the data. If the mapping objective is reserve
estimation, then the smoothing properties of kriging in the presence of a large
nugget may be the best approach. However, if the objective is to map directional
reservoir heterogeneity (continuity) and assess model uncertainty, then a method
other than interpolation is required (Hohn, 1988).
Once thought of as stochastic “artwork”, useful only for decorating the walls of
research centers (Srivastava, 1994a), conditional simulation models are
becoming more accepted into our day-to-day reservoir characterization-modeling
efforts because the results contain higher frequency content, and lend a more
realistic appearance to our maps when compared to kriging.
Srivastava (1994a) notes that, in an industry that has become too familiar with
layer-cake stratigraphy, with lithologic units either connected from well-to-well or
that conveniently pinch out halfway, and contour maps that show gracefully
curving undulations, it is often difficult to get people to understand that there is
much more inter-well heterogeneity than depicted by traditional reservoir models.
Because stochastic modeling produces many, equi-probable reservoir images,
the thought of needing to analyze more than one result, let alone flow simulate all
of them, changes the paradigm of the traditional reservoir characterization
approach. Some of the realizations may even challenge the prevailing geological
wisdom, and will almost certainly provide a range of predictions from optimistic to
pessimistic (Yarus, 1994).
Most of us are willing to admit that there is uncertainty in our reservoir models,
but it is often difficult to assess the amount of uncertainty. One of the biggest
benefits of geostatistical stochastic modeling is the assessment of risk or
uncertainty in our model. To paraphrase Professor Andre Journel “… it is better
to have a model of uncertainty, than an illusion of reality.”
Before reviewing various conditional simulation methods, it is useful to ask what
is it that we want from a stochastic modeling effort. We really need to consider
the goal of the reservoir modeling exercise itself, because the simulation method
we choose depends, in large part, on the goal of the study and the types of data
available. Not all conditional simulation studies need the Cadillac approach, when
a Volkswagen technique will do fine (Srivastava, 1994a).
Uncertainty Estimation
Once all of these simulated images have been generated, how do you determine
which one is correct? Technically speaking, any one of the simulated images is a
possible realization of the reservoir, because each image is equally likely, based
on the data and the spatial model. However, just because the image is
statistically equally probable does not mean it is geologically acceptable. You
must look at each simulated image to determine if it is a reasonable
representation of what you know about the reservoir -if not, discard it, and run
more simulations if necessary.
Some of the possible maps generated from a suite of simulated images include:
Mean: This map is the average of n conditional simulations. At each cell,
the program computes the average value, based on the values from all
simulations at the same location. When the number of input simulations
is large, the resultant map converges to the kriged solution.
Minimum: Each cell displays the smallest value from all input simulations.
Maximum: Each cell displays the largest value from all input simulations.
Standard Deviation: A map of the standard deviation at each grid cell,
computed from all input maps. This map is used as a measure of the
standard error and is used to analyze uncertainty.
Uncertainty or Risk: This map displays the probability of meeting or
exceeding a user specified threshold value at each grid cell. The grid
cell values range between 0 and 100 percent.
Iso-Probability: These maps are displayed in terms of the attribute value at
a constant probability threshold.
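The post-processing maps listed above are simple cell-by-cell summaries of a stack of realizations, as sketched below. The realizations array and the porosity cutoff are hypothetical stand-ins.

```python
# Minimal sketch of realization post-processing maps (mean, min, max, std, risk, iso-probability).
import numpy as np

realizations = np.random.default_rng(4).normal(10.0, 2.0, size=(100, 50, 50))  # n x ny x nx stand-in
threshold = 12.0   # user-specified porosity cutoff (%)

mean_map = realizations.mean(axis=0)           # converges toward the kriged solution as n grows
min_map = realizations.min(axis=0)
max_map = realizations.max(axis=0)
std_map = realizations.std(axis=0)             # measure of local uncertainty (standard error)
risk_map = 100.0 * (realizations >= threshold).mean(axis=0)   # % chance of meeting/exceeding the cutoff
iso_p90 = np.percentile(realizations, 10, axis=0)             # attribute value exceeded with 90% probability
print(mean_map.shape, risk_map.max())
```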
Figure 1a and 1b
Figure 3a, 3b, and 3c
Figure 4a, 4b, 4c, and 4d illustrate a risk map for two different porosity cutoffs, and minimum
and maximum value maps.
Figure 5a
One commonly used transformation that transforms any data set into a Normal
Distribution is the Hermite polynomial method (Wackernagel, 1995; Hohn, 1988).
This method fits a polynomial with n terms to the histogram and maps the data
from one domain to another. Variogram modeling, kriging and simulation are
performed on the transformed variable. Then, back-transform the gridded results
using the stored Hermite coefficients. A Hermite polynomial transform on 55
porosity data is shown in Figure 5a, 5b, 5c, and 5d.
The shape in Figure 5a shows a truncated porosity distribution (no values lower
than 6 %) because a cut-off was used for pay estimation. This approach to pay
estimation creates the skewed distribution. If we want to honor this histogram
(Figure 5a) in the simulation process, we must transform the original data into a
Gaussian (normal) distribution (Figure 5b). Figure 5c and Figure 5d show the
results of a Hermite polynomial modeling approach to transform the data. The
modeled histogram (blue) is superimposed on the raw histogram (black) in
Figure 5c. The cumulative histogram (Figure 5d) shows a reasonably good match
between the model (blue) and the original (black) data. The purpose of the
transformation is to model the overall shape and not every nuance of the raw
data, which may only be an approximation.
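The text above uses a Hermite polynomial transform. As a simpler, purely illustrative alternative (not the Hermite method itself), the sketch below applies a rank-based normal score transform: the data are mapped to a standard normal distribution through their ranks, and gridded results are back-transformed by interpolating the original quantiles. The porosity sample is hypothetical.

```python
# Minimal sketch of a rank-based normal score transform and its back-transform.
import numpy as np
from scipy.stats import norm

def normal_score(z):
    ranks = np.argsort(np.argsort(z))               # 0 .. n-1
    p = (ranks + 0.5) / len(z)                      # plotting positions
    return norm.ppf(p)                              # map to standard normal quantiles

def back_transform(y, z_original):
    z_sorted = np.sort(z_original)
    p_sorted = (np.arange(len(z_sorted)) + 0.5) / len(z_sorted)
    return np.interp(norm.cdf(y), p_sorted, z_sorted)

phi = np.random.default_rng(5).gamma(4.0, 2.0, size=55) + 6.0   # skewed, truncated-looking sample
y = normal_score(phi)          # variograms, kriging, and simulation done in Gaussian space
print(np.allclose(np.sort(back_transform(y, phi)), np.sort(phi)))
```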
OVERVIEW
INTRODUCTION
Several very good public domain geostatistical mapping and modeling packages
are available to anyone with access to a personal computer. In this section, five
software packages are reviewed, with information on how to obtain them. For a
more complete review, see the article by Clayton (1994).
The geostatistical packages STATPAC, Geo-EAS, GEOPACK, Geostatistical
Toolbox, and GSLIB are reviewed according to their approximate chronological
order of appearance in the public domain. These packages are fairly
sophisticated, reflecting the evolution in personal computer graphics, interfaces,
and advances in geostatistical technology.
These programs are placed into the public domain with the understanding that
the user is ultimately responsible for their proper use. Geostatistical algorithms are
complicated to program and debug, considering all the possible combinations of
hardware and operating systems.
Another word of caution is that the different authors use different nomenclature
and mathematical conventions, which just confuses the issue further. The
International Association of Mathematical Geologists has attempted to
standardize geostatistical jargon through the publication of Geostatistical
Glossary and Multilingual Dictionary, edited by Ricardo Olea.
STATPAC
STATPAC (STATistical PACkage) is a collection of general-purpose statistical
and geostatistical programs developed by the U. S. Geological Survey. The
programs were compiled in their current form by David Grundy and A. T. Miesch.
It was released as USGS Open-File Report 87-411-A, 87-411-B, and 87-411-C,
and was last updated in May 1988.
The programs were originally developed for use in applied geochemistry and
petrology within the USGS. The geostatistical program only works for 2-
dimensional spatial data analysis. This early program was developed for the older
XT PCs, and thus does not take advantage of the quality graphical routines now
available. The limited graphic capabilities may discourage beginning practitioners
from using this software, even though STATPAC may have some advantages
over other public domain software (Clayton, 1994).
Order STATPAC from the following sources:
Books and Open-File Reports
U. S. Geological Survey
Federal Center
P. O. Box 25425
Denver, CO 80225
Telephone: (303) 236-7476
Order Reports: OF 87-411-A, B, C
Cost: About $100
GeoApplications
P. O. Box 41082
Tucson, AZ 85717-1082
Telephone: (602) 323-9170
Fax: (602) 327-7752
Cost: call or fax for current pricing
GEO-EAS
Evan Englund (USGS) and Allen Sparks (Computer Sciences Corporation)
developed Geo-EAS (Geostatistical Environmental Assessment Software) for the
U.S. Environmental Protection Agency for environmental site assessment and
monitoring of data collected on a spatial network. Version 1.2.1 was compiled in
July 1990.
Geo-EAS provides practical geostatistical applications for individuals with a
working knowledge of geostatistical concepts. The integrated program layout,
interface design, and excellent user's manual make this a very good
instructional or self-study tool for learning geostatistical analysis (Clayton, 1994).
Order Geo-EAS from the following sources:
Computer Oriented Geological Survey
P. O. Box 370246
Denver, CO 80237
Telephone: (303) 751-8553
Cost: call for current pricing
National Technical Information Service
Springfield, VA 22161
Telephone: (707) 487-4650
Fax: (703) 321-8547
Cost: about $100
IGWMC USA
Institute for Ground-Water Research and Education
Colorado School of Mines
Golden, CO 80401-1887
Telephone: (303) 273-3103
Fax: (303) 272-3278
Cost: call for current pricing
GeoApplications
P. O. Box 41082
Tucson, AZ 85717-1082
Telephone: (602) 323-9170
Fax: (602) 327-7752
Cost: call or fax for current pricing
GEOPACK
This is a geostatistical package suitable for teaching, research and project work
released by the EPA. S. R. Yates (U. S. Department of Agriculture) and M. V.
Yates (University of California-Riverside) developed GEOPACK. Version 1.0 was
released in January 1990.
GEOPACK is useful for mining, petroleum, environmental, and research projects
for individuals who do not have access to a powerful workstation or mainframe
computer. It is designed for both novice and experienced geostatistical
practitioners (Clayton, 1994).
Order GEOPACK from:
Computer Oriented Geological Survey
P. O. Box 370246
Denver, CO 80237
Telephone: (303) 751-8553
Cost: call for current pricing
“GEOPACK”
Robert S. Kerr Environmental Research Laboratory
Office of Research and Development
U. S. EPA
Ada, OK 74820
IGWMC USA
GEOSTATISTICAL TOOLBOX
FSS International, a consulting company specializing in natural resources and
risk assessment, makes Geostatistical Toolbox available to the public. The
program was developed and written by Roland Froidevaux, with Version 1.30
released in December 1990.
Geostatistical Toolbox provides a PC based interactive, user-friendly
geostatistical toolbox for workers in mining, petroleum, and environmental
industries. It is also suitable for teaching and academic applications. The program
has been rigorously tested and is recommended for anyone wanting an excellent
2-dimensional geostatistical package.
Order Geostatistical Toolbox from the following sources:
Computer Oriented Geological Survey
P. O. Box 370246
Denver, CO 80237
Telephone: (303) 751-8553
Cost: call for current pricing
GSLIB
GSLIB is a library of geostatistical programs developed at Stanford
University under the direction of Andre Journel, director of the Stanford Center for
Reservoir Forecasting. Oxford University Press published the user's guide and
FORTRAN programs authored by Clayton Deutsch and Andre Journel (1992).
GSLIB addresses the needs of graduate students and advanced geostatistical
practitioners, but is also a useful resource for the novice. GSLIB is the most
advanced public domain geostatistical software available, offering full 2-D and 3-
D applications. The program library does not contain executable code, but rather
uncompiled ASCII FORTRAN program listings. These programs will run on any
computer platform that can compile FORTRAN. Although the user's guide is well
written and documents the program in an organized text-like fashion, with
theoretical background, the novice may find introductory texts, such as Hohn
(1988) or Isaaks and Srivastava (1989), useful supplementary reading.
Order GSLIB from the following sources:
Oxford University Press
Business and Customer Service
2001 Evans Road
Cary, NC 27513
Telephone: 1-800-451-7756
Order: GSLIB: Geostatistical Software Library and User’s Guide by Clayton V.
Deutsch and Andre Journel (ISBN 0-19-507392-4)
Cost: $49.95 plus $2.50 postage
You can also order this book through most bookstores.
REFERENCES
Books
Clark, I., 1979, Practical Geostatistics, Applied Science Publishers, London, 129 p.
Cressie, N., 1991, Statistics for Spatial Data, Wiley, New York, 900 p.
Davis, J. C., 1986, Statistics and Data Analysis in Geology, Second Edition, John
Wiley & Sons, New York, 646 p.
Deutsch, C. V., and A. G. Journel, 1992, GSLIB: Geostatistical Software Library and
User's Guide, Oxford University Press, New York, Oxford, with software diskettes,
340 pp.
Hohn, M. E., 1988, Geostatistics and Petroleum Geology, Van Nostrand Reinhold,
NY, 264 pp.
Papers
Chu, J., Xu, W., Zhu, H., and Journel, A.G., 1991, The Amoco case study: Stanford
Center for Reservoir Forecasting, Report 4 (73 pages).
Clark, I., 1979, The Semivariogram - Part 1: Engineering and Mining Journal, Vol.
180, No. 7, pp. 90-94.
Clark, I., 1979, The Semivariogram - Part 2: Engineering and Mining Journal, Vol.
180, No. 8, pp. 90-97.
Clayton, C. M. (1994) "Public Domain Geostatistics Programs: STATPAC, Geo-
EAS, GEOPAC, Geostatistical Toolbox, and GSLIB," in Stochastic Modeling and
Geostatistics, (1994), J. M. Yarus and R. L. Chambers, Eds., AAPG Computer
Applications in Geology, No. 3, pp. 340-367.
Cressie, N., 1990, The Origins of Kriging, Mathematical Geology, Vol. 22, No. 3, pp.
239-252.
Davis, J. M., Phillips, F. M., Wilson, J. L., Lohmann, R. C., and Love, D. W., 1992, A
sedimentological-geostatistical model of aquifer heterogeneity based on outcrop
studies (Abstract): EOS, American Geophysical Union, v. 73, p. 122.
Fogg, G. E., Lucia, F. J., and Senger, R. K., 1991, "Stochastic simulation of interwell-
scale heterogeneity for improved prediction of sweep efficiency in a carbonate
reservoir," in Reservoir Characterization II, L. W. Lake, H. B. Carroll, and T. C. Wesson,
eds., Orlando, Florida, Academic Press, p. 355-381.
Journel, A. G., 1988, Non-parametric Geostatistics for Risk and Additional Sampling
Assessment, in Principles of Environmental Sampling, L. Keith, Ed., American
Chemical Society, pp. 45-72.
Krige, D. G., 1951, A Statistical Approach to Some Basic Mine Evaluation Problems
on the Witwatersrand, J. Chem. Metall. Min. Soc. South Africa, vol. 52, pp. 119-39.
Lund, H. J., Ates, H., Kasap, E., and Tillman, R. W., 1995, Comparison of single and
multi-facies variograms of Newcastle Sandstone: measures for the distribution of
barriers to flow: SPE paper 29596, p. 507-522.
Matheron, G., 1963, Principles of Geostatistics, Economic Geology, vol. 58, pp.
1246-66.
Olea, R. A., 1977, Measuring Spatial Dependence with Semivariograms, Lawrence,
Kansas, Kansas Geological Survey, Series on Spatial Analysis, No. 3, 29 p.
Rehfeldt, K. R., Boggs, J. M., and Gelhar, L. W., 1992, Field study of dispersion in a
heterogeneous aquifer, 3: geostatistical analysis of hydraulic conductivity: Water
Resources Research, v. 28, p. 3309-3324.
Royle, A. G., 1979, Why Geostatistics?, Engineering and Mining Journal, Vol. 180,
pp. 92-101.
Websites
Easton, V. J., and McColl, J. H., Statistics Glossary v1.1 (HTML editing by Ian
Jackson), http://www.cas.lancs.ac.uk/glossary_v1.1/main.html - a helpful and
authoritative glossary, with definitions explained in plain language, accompanied by
equations.
ADDITIONAL READING
McDowell, R. R., Matchen, D. L., Hohn, M. E., and Vargo, A. G. (1994). "An
Innovative Geostatistical Approach to Oil Volumetric Calculations: Rock Creek Field,
West Virginia: Abstract." AAPG Bulletin, v. 78, p. 1332.
Smyth, M. and Buckley, M. J. (1993). "Statistical Analysis of the Microlithotype
Sequences in the Bulli Seam, Australia, and Relevance to Permeability for Coal
Gas." International Journal of Coal Geology, v. 22, p. 167-187.
TERMINOLOGY
In compiling this list of geostatistical terminology, only the most commonly
encountered terms were selected. No attempt was made to duplicate the more
extensive glossary by Ricardo Olea (1991). Some definitions may differ slightly
from those of Olea.
Admissibility (of semivariogram models): for a given covariance model, the
kriging variance must be greater than or equal to zero; this condition is also known
as positive definiteness.
Anisotropy: refers to changes in a property when measured along different axes.
In geostatistics, anisotropy refers to covariance models that have major and
minor ranges of different distances (correlation scales or lengths). This condition is
most easily seen when a variogram shows a longer range in one direction than in
another. In this module, we discuss two types of anisotropy:
Geometric anisotropic covariance models have the same sill, but different
ranges;
Zonal anisotropic covariance models have the same range, but different
sills.
Auto-correlation: a method of computing a spatial covariance model for a
regionalized variable. It measures a change in variance (variogram) or correlation
(correlogram) with distance and/or azimuth.
Biased estimates: indicated when there is a correlation between standardized errors
and estimated values (see Cross-validation). A skewed histogram of the standardized
errors also suggests a bias in the estimates, so that one area of a map may
consistently show estimates higher (or lower) than expected.
Block kriging: kriging with nearby sample values to estimate the average value of
an attribute over an area, for example a grid cell. The grid cell is divided into a specified
number of sub-cells, a value is kriged to each sub-cell, and then the average
value is placed at the grid node.
Cokriging: the process of estimating a regionalized variable from two or more
variables, using a linear combination of weights obtained from models of spatial
auto-correlation and cross-correlation. The multivariate version of kriging.
Conditional bias: a problem arising from insufficient smoothing which causes
high values of an attribute to be overstated, while low values are understated.
Conditional simulation: a geostatistical method to create multiple (and equally
probable) realizations of a regionalized variable based on a spatial model. It is
conditional only when the actual control data are honored. Conditional simulation
is a variation of conventional kriging or cokriging, and can be considered as an
extrapolation of data, as opposed to the interpolations produced by kriging. By
relaxing some of the kriging constraints (e.g., minimized squared error), conditional
simulation is able to reproduce the variance of the control data. Simulations are
not estimations; their goal is to characterize variability or risk. The final "map"
captures the heterogeneity and connectivity most likely present in the reservoir.
Post-processing the conditional simulations produces a measure of error (standard
deviation) and other measures of uncertainty, such as iso-probability and
uncertainty maps.
Correlogram: a measure of spatial dependence (correlation) of a regionalized
variable over some distance. The correlogram can also be calculated with an
azimuthal preference.
Covariance: a measure of correlation between two variables. The kriging system
uses covariance, rather than variogram or correlogram values, to determine the
kriging weights, λ. The covariance can be considered the inverse of the
variogram, equal to the value of the sill minus the variogram model (or zero
minus the correlogram).
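As a rough illustration of that relationship (a minimal Python sketch, not part of any standard package; the spherical model and the unit sill are assumptions made for the example), the covariance at a given lag is the sill minus the modeled variogram value:

    import numpy as np

    def spherical_variogram(h, sill=1.0, rng=1000.0):
        """Spherical variogram model: rises from zero to the sill at the range."""
        h = np.asarray(h, dtype=float)
        inside = 1.5 * (h / rng) - 0.5 * (h / rng) ** 3
        return np.where(h < rng, sill * inside, sill)

    lags = np.array([0.0, 250.0, 500.0, 1000.0, 1500.0])
    cov = 1.0 - spherical_variogram(lags)    # covariance = sill - variogram (sill assumed = 1.0)
    print(cov)                               # falls from the sill toward zero at and beyond the range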
Coregionalization: the mutual spatial behavior between two or more
regionalized variables.
Cross-correlation: a technique used to compute a spatial cross-covariance
model between two regionalized variables. This provides a measure of spatial
correlation between the two variables. It produces a bivariate analogue of the
variogram.
Cross-validation: a procedure to check the compatibility between a data set, its
spatial model and neighborhood design. First, each sampled location is kriged
with all other samples in the search neighborhood. The estimates are then
compared against the true sample values. Significant differences between
estimated values and true values may be influenced by outliers or other
anomalies. This technique is also used to check for biased estimates produced
by poor model and/or neighborhood design.
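A minimal leave-one-out sketch of this procedure is given below; krige_estimate is a hypothetical stand-in for whatever kriging routine and neighborhood design are being checked, not the function of any particular package.

    import numpy as np

    def cross_validate(coords, values, krige_estimate):
        """Leave each sample out, re-estimate it from the rest, and return the errors."""
        errors = []
        for i in range(len(values)):
            keep = np.arange(len(values)) != i
            est = krige_estimate(coords[keep], values[keep], coords[i])
            errors.append(est - values[i])
        return np.asarray(errors)

    # A correlation between errors and estimated values, or a strongly skewed error
    # histogram, points to biased estimates from a poor model or neighborhood design.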
Drift: often used to describe data containing a trend. Drift usually refers to short-
scale trends at the scale of the neighborhood.
Estimation variance: the kriging variance at each grid node. This is a measure
of global reliability, not a local estimation of error.
Experimental variogram: a measure of spatial dependence (dissimilarity or
increasing variability) of a regionalized variable over some distance and/or
direction. This is the variogram that is based upon the sample data; upon which
the model variogram will be fitted.
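For illustration, a minimal omnidirectional sketch (plain numpy, with no azimuth preference or lag-tolerance refinements) of how the experimental variogram might be computed from scattered samples:

    import numpy as np

    def experimental_variogram(coords, values, lag, n_lags):
        """Average 0.5 * (Z(x) - Z(x+h))**2 for pairs whose separation falls in each lag bin."""
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        sq = 0.5 * (values[:, None] - values[None, :]) ** 2
        gammas = []
        for k in range(n_lags):
            mask = (d > k * lag) & (d <= (k + 1) * lag)
            gammas.append(sq[mask].mean() if mask.any() else np.nan)
        return np.arange(1, n_lags + 1) * lag, np.array(gammas)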
External drift: a geostatistical linear regression technique that uses a spatial
model of covariance when a secondary regionalized variable (e.g. seismic
attribute) is used to control the shape of the final map created by kriging or
simulation.
Geostatistics: the statistical method used to analyze spatially (or temporally)
correlated data and to predict the values of such variables distributed over
distance or time.
h-Scatterplot: a plot obtained by selecting a value for the separation distance, h,
then plotting the pairs Z(x) and Z(x+h) on the two axes of a bivariate plot. The shape
and correlation of the cloud are related to the value of the variogram for distance
h.
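A minimal sketch of how such a plot might be built (matplotlib assumed; the lag tolerance tol is an arbitrary choice for the example):

    import numpy as np
    import matplotlib.pyplot as plt

    def h_scatterplot(coords, values, h, tol):
        """Plot Z(x) against Z(x+h) for all pairs separated by h, within a tolerance tol."""
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        i, j = np.where((d > h - tol) & (d <= h + tol))
        plt.scatter(values[i], values[j], s=10)
        plt.xlabel("Z(x)")
        plt.ylabel("Z(x+h)")
        plt.title("h-scatterplot, h = %g" % h)
        plt.show()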
Histogram: a plot that shows the frequency or number of occurrences (Y-axis) of
data falling into size classes of equal width (X-axis).
Indicator variable: a binary transformation of data to either 1 or 0, depending on
whether the value of the data point surpasses or falls short of a specified cut-off
value.
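In code, the transformation is a single thresholding step; the porosity values and the 8% cut-off below are only illustrative:

    import numpy as np

    porosity = np.array([0.04, 0.12, 0.07, 0.15, 0.09])   # hypothetical sample values
    cutoff = 0.08                                          # assumed cut-off value
    indicator = (porosity >= cutoff).astype(int)           # 1 where the cut-off is met, 0 otherwise
    print(indicator)                                       # [0 1 0 1 1]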
Interpolation: estimation technique in which samples located within a certain
search neighborhood are weighted to form an estimate, such as the kriging
technique.
Inverse distance weighting: Non-geostatistical interpolation technique that
assumes that attributes vary according to the inverse of their separation (raised
to some power).
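A minimal sketch of the idea (a power of 2 is assumed; a target that coincides with a sample simply returns that sample's value):

    import numpy as np

    def idw_estimate(coords, values, target, power=2.0):
        """Weight each sample by 1/distance**power and return the weighted average."""
        d = np.linalg.norm(coords - target, axis=1)
        if np.any(d == 0):                      # target falls exactly on a sample location
            return values[np.argmin(d)]
        w = 1.0 / d ** power
        return np.sum(w * values) / np.sum(w)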
Iso-probability map: maps created by post processing conditional simulations to
show the value of the regionalized variable at a constant probability threshold.
For example, at the 10th, 50th (median), or the 90th percentiles. These maps
provide a level of confidence in the mapped results.
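Given a stack of simulated grids, such maps amount to percentiles computed node by node; in the sketch below the realizations array is a random stand-in for actual simulation output:

    import numpy as np

    # Hypothetical stack of simulated porosity grids, shape (n_realizations, ny, nx).
    realizations = np.random.default_rng(0).normal(0.10, 0.02, size=(100, 50, 50))

    p10 = np.percentile(realizations, 10, axis=0)   # pessimistic map
    p50 = np.percentile(realizations, 50, axis=0)   # median map
    p90 = np.percentile(realizations, 90, axis=0)   # optimistic map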
Kriging: a method of calculating estimates of a regionalized variable using a
linear combination of weights obtained from a model of spatial correlation. It
assigns weights to samples to minimize estimation variance. The univariate
version of cokriging.
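For readers who want to see the mechanics, the sketch below sets up and solves the ordinary kriging system for a single target location. The spherical variogram parameters and sample values are hypothetical, and search-neighborhood logic is omitted:

    import numpy as np

    def spherical(h, sill=1.0, rng=500.0):
        """Spherical variogram model used to fill the kriging system."""
        h = np.asarray(h, dtype=float)
        g = sill * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
        return np.where(h < rng, g, sill)

    def ordinary_kriging(coords, values, target):
        """Solve the ordinary kriging system (variogram form) for one target location."""
        n = len(values)
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        A = np.ones((n + 1, n + 1))
        A[:n, :n] = spherical(d)
        A[n, n] = 0.0                           # Lagrange multiplier row/column
        b = np.ones(n + 1)
        b[:n] = spherical(np.linalg.norm(coords - target, axis=1))
        sol = np.linalg.solve(A, b)
        weights = sol[:n]                       # these weights sum to one
        return weights @ values, weights

    coords = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
    values = np.array([0.12, 0.10, 0.15])
    estimate, w = ordinary_kriging(coords, values, np.array([50.0, 50.0]))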
Kriging variance: see estimation variance.
Lag: a distance parameter (h) used during computation of the experimental
covariance model. The lag distance typically has a tolerance of one-half the
initial lag distance.
Linear estimation method: a technique for making estimates based on a linear
weighted average of values, such as seen in kriging.
Model variogram: a function fitted to the experimental variogram as the basis for
kriging.
Moving neighborhood: a search neighborhood designed to use only a portion of
the control data points during kriging or conditional simulation.
Nested variogram model: a linear combination of two or more variogram
(correlogram) models. It has more than one range showing different scales of
spatial variability; for example, a short-range exponential model combined with a
longer-range spherical model. Often, it involves adding a nugget component to
one of the other models.
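A sketch of one such nested model (the nugget, sill, and range values are arbitrary examples):

    import numpy as np

    def nested_variogram(h, nugget=0.1,
                         exp_sill=0.4, exp_range=200.0,
                         sph_sill=0.5, sph_range=1500.0):
        """Nugget + short-range exponential + longer-range spherical structures."""
        h = np.asarray(h, dtype=float)
        exp_part = exp_sill * (1.0 - np.exp(-3.0 * h / exp_range))
        sph_part = np.where(h < sph_range,
                            sph_sill * (1.5 * h / sph_range - 0.5 * (h / sph_range) ** 3),
                            sph_sill)
        return np.where(h > 0, nugget, 0.0) + exp_part + sph_part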
Nonconditional simulation: a method that does not use the control data during
the simulation process; quite often used to observe the behavior of a spatial
model and neighborhood design.
Nugget effect: a feature of the covariance model in which the experimental points
defining the model do not appear to intersect the y-axis at the origin. The
nugget represents a chaotic or random component of attribute variability. The
nugget model shows constant variance at all ranges, but is often modeled as
zero variance at the control point (well location). Abbreviated as C0 by
convention.
Ordinary (co-)kriging: a technique in which the local mean varies and is re-
estimated based on the control points in the search neighborhood ellipse (moving
neighborhood).
Outliers: data points falling more than about 2.5 standard deviations from the mean
value of the sample population; possibly the result of bad data values or local
anomalies.
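A quick screening sketch for such points, using the 2.5 standard-deviation cut-off quoted above:

    import numpy as np

    def flag_outliers(values, n_std=2.5):
        """Flag samples more than n_std standard deviations from the sample mean."""
        values = np.asarray(values, dtype=float)
        z = (values - values.mean()) / values.std()
        return np.abs(z) > n_std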
Point kriging: making a kriging estimate at a specific point, for example at a grid
node, or a well location.
Positive definite: see admissibility.
Random function: the random function has two components: (1) a regional
structure component manifesting some degree of spatial auto-correlation
(regionalized variable) and lack of independence in the proximal values of Z(x),
and (2) a local, random component (random variable).
Random variable: a variable created by some random process, whose values
follow a probability distribution, such as a normal distribution.
Range: the distance where the variogram reaches the sill, or when the
correlogram reaches zero correlation. Also known as the correlation range or
correlation scale, it represents the distance at which correlation ceases. It is
abbreviated as a by convention.
Regionalized variable: a variable that has some degree of spatial auto-
correlation and lack of independence in the proximal values of Z(x).
Risk map: see Uncertainty Map
Simple kriging: the global mean is constant over the entire area of interpolation
and is based on all the control points used in a unique neighborhood (or is
supplied by the user).
Semivariogram: a measure of spatial dependence (dissimilarity or increasing
variability) of a regionalized variable over some distance; a plot of similarity
between points as a function of distance between the points. The variogram can
also be calculated with an azimuthal preference. The semivariogram is commonly
called a variogram. See also correlogram.
Sill: the upper level of variance, reached by the variogram at its correlation
range. The variance of the sample population is the theoretical sill of the
variogram.
Smearing: a condition produced by the interpolation process where high-grade
attributes are allowed to influence the estimation of nearby lower grades.
Stationarity: the simplest definition is that the data do not exhibit a trend; spatial
statistical homogeneity. This implies that a moving window average shows
homogeneity in the mean and variance over the study area.
Stochastic modeling: used interchangeably with conditional simulation,
although not all stochastic modeling applications necessarily use control data.
Support: the size, shape, and geometry of the volumes upon which we estimate a
variable. The effect is that attributes measured on a small support are more variable
than those measured on a larger support.
Transformation: a mathematical process used to convert the frequency
distribution of a data set from one form to another, for example from lognormal to
normal.
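For the lognormal-to-normal case, the forward and back transformations are simply the natural logarithm and its exponential (a minimal sketch with made-up permeability values):

    import numpy as np

    permeability = np.array([12.0, 85.0, 3.0, 240.0, 40.0])   # hypothetical, roughly lognormal data

    log_perm = np.log(permeability)   # work in log space, where the distribution is closer to normal
    # ... variography, kriging, or simulation would be carried out on log_perm ...
    back = np.exp(log_perm)           # back-transform the results to the original units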
Unique neighborhood: a neighborhood search ellipse that uses all available
data control points. The practical limit is 100 control points. A unique
neighborhood is used with simple kriging.
Uncertainty map: a map created by post-processing conditional simulations. A
threshold value is selected (for example, 8% porosity), and the uncertainty map
shows, at each grid node, the probability that the attribute is above or below the
chosen threshold.
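Such a map can be sketched as the fraction of realizations exceeding the threshold at each grid node; here the realizations array is again a random stand-in for simulation output:

    import numpy as np

    # Hypothetical stack of simulated porosity grids, shape (n_realizations, ny, nx).
    realizations = np.random.default_rng(1).normal(0.08, 0.02, size=(100, 50, 50))

    threshold = 0.08                                        # the chosen cut-off (8% porosity)
    prob_above = (realizations > threshold).mean(axis=0)    # probability map, values between 0 and 1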
Variogram: geostatistical measure used to characterize the spatial variability of
an attribute.
Weights: values determined during an interpolation or simulation that are
multiplied by the control data points in the determination of the final estimated or
simulated value at a grid node. To create a condition of unbiasedness, the weights,
λ, sum to unity for geostatistical applications.
SUGGESTED REFERENCE
Olea, R. A., 1991, Geostatistical Glossary and Multilingual Dictionary, New York,
Oxford University Press, 177 pages.