Basic Geostatistics

BASIC GEOSTATISTICS
BACKGROUND
Many geoscientists, particularly geologists, do not like to deal with mathematics, except perhaps at a rudimentary
level, if that! Traditional geology is qualitative in nature, based soundly on classification schemes and descriptions
associated with physical phenomena. In his book, Stochastic Modeling and Geostatistics (1994), Yarus discusses
what he believes to be the central problem with many geological reservoir models. The terminology, although
meaningful to the geologist, is largely qualitative.
This qualitative terminology has value in allowing the geoscientist to describe structural surfaces such as anticlines
and synclines. Depositional environments are described in terms of deltas, reefs, or deep sea fans, along with
onlapping or downlapping surfaces. Finally, lithologies are sandstones, limestones, and shales. Although
descriptive, these terms allow great latitude in their meaning and are subject to individual interpretation. The
interpretation quality largely depends upon the experience of the interpreter, although we would expect other
geologists to produce similar yet slightly different results, all of which may be valid reservoir models.
This presentation describes both classical statistical and geostatistical methodologies designed to quantify
qualitative information and to transform a traditional geological reservoir description (a surface or 3D volume) into
a mathematical reservoir model.
The material presented herein is not heavily mathematical. Although mathematical formalism is kept to a minimum,
the presentation is not simplistic. General equations and matrix notation are used in some sections when
appropriate. The equations are presented mostly for informational purposes, because calculations used in
geostatistics tend to be involved and tedious. These are not the kind of calculations that you would carry out on a
hand-held calculator; rather, they are very computer-intensive.
GEOSTATISTICS DEFINED
Geostatistics may be defined as a study of spatial correlation between variables. This rapidly evolving branch of
applied statistics and mathematics offers a collection of tools aimed at understanding and modeling spatial
variability.
Spatial variability includes scales of connectivity (heterogeneity) as well as directionality within the data sets.
Reservoir data exhibit spatial connectivity to greater or lesser degrees, because as the distance between two data
points increases, the similarity between the two measurements decreases. And because reservoirs tend to exhibit
anisotropy, we will also see that similarity between two measurements will change with direction. By understanding
how data values vary with distance and direction, we can interpolate values at unsampled locations throughout our
study area.
Origins of Geostatistics
The origins of geostatistics are found exclusively in the mining industry. D.G. Krige, a South African mining
engineer, and H.S. Sichel, a statistician developed this estimation method in the early 1950s, when classical
statistics were found unsuitable for estimating disseminated ore reserves.
Georges Matheron, a French engineer, developed Kriges innovative concepts and formalized them within a single
framework with his Theory of Regionalized Variables (Matheron, 1970, in French). The term kriging was coined in
recognition of D.G. Krige.
Although originally developed for solving ore reserve estimation problems, the advent of high-speed computers in
the 1970s facilitated the spread of geostatistics from mining and geology to a broad range of other disciplines
(including biology, zoology, ecology, psychology and medicine). It was not until the mid-to-late 1980s that
geostatistical techniques were used to any extent in the petroleum industry, and its popularity has grown every
year since.
In the petroleum industry, geostatistics is used primarily for quantitative reservoir characterization. Geostatistics
provides the conceptual framework for defining the internal architecture of the reservoir by combining subjective
and objective knowledge of the reservoir to produce detailed models of its properties.
Role of Geostatistics in Reservoir Characterization
The enormous expense of developing heterogeneous offshore fields and the desire to increase ultimate recovery
has spurred oil companies to use innovative reservoir characterization techniques to determine how various
properties are distributed throughout a reservoir. Geostatistics is one of many new technologies often incorporated
into the reservoir characterization process. It is used as a means of calculating the values of properties between
the actual measured data points (interpolation), thereby creating a grid of values which can be used to create
maps, cross-sections, and flow models.
Over the past decade, geostatistical techniques, especially those incorporating 3D seismic data, have gained
acceptance as a way to characterize petroleum reservoirs, with the resulting numerical descriptions often used as
an input to fluid flow simulators. As a key component of Integrated Reservoir Characterization, Geostatistics
requires cooperation between geologists, geophysicists, geochemists, and petroleum engineers; enabling each
discipline to contribute fully in the process of building the reservoir model.
This approach is quite different from the past, where the mathematical formalization was often left to the reservoir
engineers who have been more comfortable with math and computers. Thus, part of the geostatistical philosophy
is to ensure that geological reality does not get lost in this process (Yarus, 1994).
The process of reservoir modeling involves the transformation of qualitative geological models into a numerical
model, often by someone other than a geologist. If the original model is precise, the transformation should not
present a problem. Unfortunately, all too often the final numerical model bears little resemblance to the original
model. Yarus (1994) states that the differences are often not a result of the professional interpretation, but are
typically based on pragmatic reasons. Reservoir models are expensive to produce, requiring a great deal of
computer time (and human time), and many file manipulations. To simulate a reservoir at a very fine resolution is
unreasonable and financially unacceptable. Thus, to reduce computer simulation time (and cost), the reservoir
engineer coarsens the fine grid of the original geological model to a more manageable number of grid nodes.
There were ramifications to drastically reducing the number of cells in a reservoir model. First, the heterogeneity,
or complexity of the geology was over-simplified. If the reservoir was not very complex, then a coarser
representation might be appropriate; however, coarse simulation results were often misleading for complex
reservoirs. To prevent this, history matching techniques were used to fine-tune the coarser engineering model.
Porosity, permeability and other parameters were adjusted until the fluid flow simulation matched the observed
well performance or pressures and flow rates from production tests. Once any of these conditions were met, it was
assumed that the model was correct. However, such was not always the case (Yarus, 1994).
Although reservoir simulation brings a certain closure to a study by providing the economics and development
plan, the future production performance is often disappointing. The original production predictions and recoverable
reserves often prove to be in error. Individual wells do not perform as anticipated, infill-drilling patterns may be
inappropriate and enhanced recovery methods are frequently required.
Geostatistics attempts to improve predictions by developing a different type of quantitative model. The goal is to
construct a more realistic model of reservoir heterogeneity by using methods that do not average-out the important
reservoir properties. Like the traditional deterministic approach, it preserves indisputable hard data where it is
known, while honoring interpretative soft data where it is desired. However, unlike the deterministic approach,
geostatistics provides scientists with numerous plausible results. The degree to which the various models differ is a
reflection of the unknown, or a measurement of the uncertainty. Yarus (1994) states that some of the outcomes
may challenge the prevailing geological wisdom, and will almost certainly provide a range of economic scenarios
from optimistic to pessimistic.
BASIC ELEMENTS
OF A GEOSTATISTICAL STUDY
The typical geostatistical study will involve the following steps:
1. Data gathering and preparation (including initial quality control)
2. Data loading
3. Exploratory Data Analysis
4. Spatial Continuity Analysis
a.
calculation of experimental covariance model
b.
interpretation and modeling
5. Search Neighborhood Design

6. Model Cross-Validation
7. Spatial Interpolation of Reservoir Properties
8. Conditional Simulation of Reservoir Properties
9. Model Uncertainty Assessment
LIMITATIONS OF A GEOSTATISTICAL APPROACH

Geostatistics will never be a substitute for obtaining good data (and there is never enough good
data). Nor is it an alternative to conducting a thorough analysis of the data. It is not a magic
technique or panacea. Models must be interpreted and validated in light of all available information,
and with regard to the principles of reservoir geology, rock physics and reservoir engineering.
In closing, remember the humorous claim that 84.2% of all statistics are simply fabricated. In later
sections of this presentation, you will see a glimpse of how the very same set of statistics can be
used to support or condemn a prospect. Lastly, never forget the immortal words of the late
Benjamin Disraeli, Prime Minister of Great Britain, There are three kinds of lies: Lies, Damn Lies,
and Statistics.
(Or did he mean Geostatistics?)
CLASSICAL STATISTICS
INTRODUCTION
Before undertaking any study of Geostatistics, it is necessary to become familiar with certain key concepts drawn
from Classical Statistics, which form the basic building blocks of Geostatistics. Because the study of Statistics
generally deals with quantities of data, rather than a single datum, we need some means to deal with that data in a
manageable form. Much of Statistics deals with the organization, presentation, and summary of data. Isaaks and
Srivastava (1989) remind us that Data speaks most clearly when organized.
This section reviews a number of classic statistical concepts that are frequently used during the course of
geostatistical analysis. By understanding these concepts, we will gain the tools needed to analyze and describe
data, and to understand the relationships between different variables.
STATISTICAL NOTATION
Statistical notation uses Roman or Greek letters in equations to represent similar concepts, with the distinction
being that:
Greek notation describes Populations: measures of a population are called parameters
Roman notation describes Samples: measures of a sample are called statistics
Now might be a good time to review the list of Greek letters. Following is a list of Greek letters and their
significance within the realm of statistics.
Letter
Name
alpha
beta
gamma
Upper &
Lower Case
delta
epsilon
zeta
eta
theta
iota
kappa
lambda
mu
nu
xi
omicron
pi
rho
sigma
tau
upsilon
phi
chi
psi
omega
Statistical Notation: Mean of a Population
Statistical Notation: Correlation Coefficient

Statistical Notation: Summation
Statistical Notation: Standard Deviation of a Population
Statistical Notation: Mean of a Sample ( )

It is important to note that in some cases, a letter may take on a different meaning, depending on whether the letter
is upper case or lower case. Certain Roman letters take on additional importance as part of the standard notation
of Statistics or Geostatistics.
Letter
Name
Statistical
Notation
Event
F
f
f
Distribution
Frequency,
Probability function for a random variable
Lag distance (distance between two sample points)
Sample mean
N
n
O
o
Population size
Sample size (or number of observations in a data set)
Observed frequencies
Outcomes
P
p
Probability
Proportion
Standard deviation of a sample
Variance
X
x
Random variable
A single value of a random variable
MEASUREMENT SYSTEMS
Because the conclusions of a quantitative study are based in part on inferences drawn from measurements, it is
important to consider the nature of the measurement systems from which data are collected. Measurements are
numerical values that reflect the amount or magnitude of some property. The manner in which numerical values
are assigned determines the measurement scale, and thereby determines the type of data analysis (Davis, 1986).
There are four measurement scales, each more rigorously defined than its predecessor; and thus containing more
information. The first two are the nominal and ordinal scales, in which we classify observations into exclusive
categories. The other two scales, interval and ratio, are the ones we normally think of as measurements,
because they involve determinations of the magnitude of an observation (Davis, 1986).
Nominal Scale
This measurement classifies observations into mutually exclusive categories of equal rank, such as red, green,
or blue. Symbols like A, B, C, or numbers are also often used. In geostatistics, we may wish to predict facies
occurrence, and may therefore code the facies as 1, 2 and 3, for sand, siltstone, and shale, respectively. Using this
scale, there is no connotation that 2 is twice as much as 1, or that 3 is greater than 2.
Ordinal Scale
Observations are sometimes ranked hierarchically. A classic example taken from geology is Mohs scale of
hardness, in which mineral rankings extend from one to ten, with higher ranks signifying increased hardness. The
step between successive states is not equal in this scale. In the petroleum industry, kerogen types are based on
an ordinal scale, indicative of stages of organic diagenesis.
Interval Scale
This scale is so named because the width of successive intervals is constant. The most commonly cited example
of an interval scale is temperature. A change from 10 to 20 degrees C is the same as the change from 110 to 120
degrees C. This scale is commonly used for many measurements. An interval scale does not have a natural zero,
or a point where the magnitude is nonexistent. Thus, it is possible to have negative values. Within the petroleum
industry, reservoir properties are measured along a continuum, but there are practical limits for the measurements.
(It would be hard to conceive of negative porosity, permeability, or thickness, or of porosity greater than 100%.)
Ratio Scale
Ratios not only have equal increments between steps, but also have a zero point. Ratio scales represent the
highest forms of measurement. All types of mathematical and statistical operations are performed with them. Many
geological measurements are based on a ratio scale, because they have units of length, volume, mass, and so
forth.
For most of our geostatistical studies, we will be primarily concerned with the analysis of interval and ratio data.
Typically, no distinction is made between the two, and they may occur intermixed in the same problem. For
example, in trend surface analysis, the independent variable may be measured on a ratio scale, whereas the
geographical coordinates are on an interval scale.
POPULATIONS AND SAMPLES

INTRODUCTION
Statistical analysis is built around the concepts of populations and samples.
A population consists of a well-defined set of elements (either finite or infinite). More specifically, a population is the
entire collection of those elements. Commonly, such elements are measurements or observations made on items
of a specific type (porosity or permeability, for example). A finite population might consist of all the wells drilled in
the Gulf of Mexico in 1999, whereas, the infinite population might be all wells drilled in the Gulf of Mexico, past,
present, and future.
A sample is a subset of elements drawn from the population (Davis, 1986). Samples are studied in order to make
inferences about the population itself.
Parameters, Data, and Statistics

Populations possess certain numerical characteristics (such as the population mean) which are known as
parameters. Data are measured or observed values obtained by sampling the population. A statistic is similar to a
parameter, but it applies to numerical characteristics of the sample data.
Within the population, a parameter consists of a fixed value, which does not change. Statistics are used to
estimate parameters or test hypotheses about the parent population (Davis, 1986). Unlike the parameter, the value
of a statistic is not fixed, and may change by drawing more than one sample from the same population.
Remember that values from Populations (parameters) are often assigned Greek letters, while the values from
Samples (statistics) are assigned Roman letters.
Random Sampling
Samples should be acquired from the population in a random manner. Random sampling is defined by two
properties.
First, a random sample must be unbiased, so that each item in the sample has the same chance of
being chosen as any other item in the sample.
Second, the random sample must be independent, so that selecting one item from the population has
no influence on the selection of other items in the population.
Random sampling produces an unbiased and independent result, so that, as the sample size increases, we have a
better chance of understanding the true nature (distribution) of the population.
One way to determine whether random samples are being drawn is to analyze sampling combinations. The
number of different samples of n measurements that can be drawn for the population, N, is given by the equation:
N!
Cn n!( N n)!
Where:
C Nn
= the number of combinations of samples
= the number of elements in the population
= the number of elements in the sample
If the sampling is conducted in a manner such that each of the C Nn samples has an equal chance of being selected,
the sampling program is said to be random and the result is a random sample (Mendenhall, 1971).
Sampling Methods
The method of sampling affects our ability to draw inferences about our data (such as estimation of values at
unsampled locations) because we must know the probability of an observation in order to arrive at a statistical
inference.
Replacement
The issue of replacement plays an important role in our sampling strategy. For example, if we were to draw
samples of cards from a population consisting of a deck, we could either:
Draw a card from the deck, and add its value to our hand, then draw another card
Or
Draw a card from the deck, note its value, and put it back in the deck, then draw a card from the deck
again.
In the first case, we sample without replacement; in the second case we sample with replacement. Sampling
without replacement prevents us from sampling that value again, while sampling with replacement allows us the
chance to pick that same value again in our sample.
Oilfield Applications to Sampling

When observations having certain characteristics are systematically excluded from the sample, whether
deliberately or inadvertently, the sampling is considered biased. In the oil industry, we face this situation quite
frequently. Suppose, for example, we may be interested in the pore volume of a particular reservoir unit for pay
estimation. Typically, we use a threshold or porosity cutoff when making the calculation, thus deliberately biasing
the true pore volume to a larger value.
Similarly, the process of drilling wells in a reservoir necessarily involves sampling without replacement.
Furthermore, any sample data set will provide only a sparse and incomplete picture of the entire reservoir. The
sampling routine (also known as the drilling program) is highly biased and dependent, and rightly so -any drilling
program will be biased toward high porosity, high permeability, high structural position, and ultimately, high
production. And the success or failure of nearby wells will influence further drilling. Because the sample data set
represents a minuscule subset of the population, we will never really know that actual population distribution
functions of the reservoir. (We will discuss bias in more detail in our discussion of summary statistics.)
However, despite these limitations, our task is to infer properties about the entire reservoir from our sample data
set. To accomplish this, we need to use various statistical tools to understand and summarize the properties of the
samples to make inferences about the population (reservoir).
TRIALS, EVENTS, AND PROBABILITY

INTRODUCTION
In statistical parlance, a trial is an experiment that produces an outcome which consists of either a success or a
failure. An event is a collection of possible outcomes of a trial. Probability is a measure of the likelihood that an
event will occur, or a measure of that events relative frequency. The following discussion introduces events and
their relation to one another, and then provides an overview on probability.
EVENTS
An event is a collection of possible outcomes, and this collection may contain zero or more outcomes, depending
on how many trials are conducted. Events can be classified by there relationship to one another:
Independent Events
Events are classified as Independent if the occurrence of event A has no bearing on the occurrence of event B,
and vice versa.
Dependent Events
Events are classified as Dependent if the occurrence of event A influences the occurrence of event B.
Mutually Exclusive Events

Events are Mutually Exclusive if the occurrence of either event precludes the occurrence of the other. Two events
that are independent events cannot be mutually exclusive.
PROBABILITY
Probability is a measure of the likelihood that an event will occur, or a measure of that events relative frequency.
The measure of probability is scaled from 0 to 1, where:
0 represents no chance of occurrence, and
1 represents certainty that the event will occur.
Probability is just one tool that enables the statistician to use information from samples to make inferences or
describe the population from which the samples were obtained (Mendenhall, 1971). In this discussion, we will
review discrete and conditional probabilities.
Discrete Probability
All of us have an intuitive concept of probability. For example, if asked to guess whether it will rain tomorrow, most
of us would reply with some confidence that rain is either likely or unlikely. Another way of expressing the estimate
is to use a numerical scale, such as a percentage scale. Thus, you might say that there is a 30% chance of rain
tomorrow, and imply that there is a 70% chance it will not rain.
The chance of rain is an example of discrete probability; it either will or it will not rain. The probability distribution
for a discrete random variable is a formula, table, or graph providing the probability associated with each value of
the random variable (Mendenhall, 1971; Davis, 1986). For a discrete distribution, probability can be defined by the
following:
P(E) = number of outcomes corresponding to event E
total number of possible outcomes
Where:
P = the probability of a particular outcome, and
E = the event
Consider the following classic example of discrete probability, used almost universally in statistics texts.
Coin Toss Experiment

Coin tossing is a clear-cut example of discrete probability. The event has two states and must occupy one or the
other; except for the vanishingly small possibility that the coin will land precisely on edge, it must come up either
heads or tails (Davis, 1986: Mendenhall, 1971).
The experiment is conducted by tossing two unbiased coins. When a single coin is tossed, it has two possible
outcomes: heads or tails. Because each outcome is equally likely, the probability of obtaining a head is . This
does not imply that every other toss results in a head, but given enough tosses, heads will appear one-half the
time.
Now let us look at the two-coin example. The sample points for this experiment with their respective probabilities
are given below (taken from Mendenhall, 1971).
Sample Point
Coin 1
Coin 2
P(EI)
E1
E2
E3
E4
Let y equal the number of heads observed. We assign the value y = 2 to sample point E1, y = 1 to
sample point E2, etc. The probability of each value of y may be calculated by adding the
probabilities of the sample points in the numerical event.
The numerical event y = 0 contains one sample point, E4; y =1 contains two sample points, E2 and E3; while
y =2 contains one sample point, E1.
The Probability Distribution Function for y, where y = Number of Heads
y
Sample Points in y
p(y)
E4
E2, E3
E1
Thus, for this experiment there is a 25% chance of observing two heads from a single toss of the two coins. The
histogram contains three classes for the random variable y, corresponding to y = 0, y = 1, and y = 2. Because p(0)
= , the theoretical relative frequency for y = 0 is ; p(1) = , hence the theoretical relative probability for y = 1 is
, etc. The histogram is shown in Figure 1 (Probability Histogram for p(y) (modified from Davis, 1986)).
Figure 1
If you were to draw a sample from this population, by throwing two balanced coins, say 100 times, and recorded
the number of heads observed each time to construct a histogram for the 100 measurements, your histogram
would appear very similar to that of Figure 1. If you repeated the experiment with 1000 coin tosses, the similarity
would be even more pronounced.
Conditional Probability
The concept of conditional probability is key to oil and gas exploration, because once a well is drilled, it makes
more information available, and allows us to revise our estimates of the probability of further outcomes or events.
Two events are often related in such a way that the probability of occurrence of one event depends upon whether
the other event has or has not occurred. Such a dependence on a prior event describes the concept of Conditional
Probability: the chance that a particular event will occur depends on whether another event occurred previously.
For example, suppose an experiment consists of observing weather on a specific day. Let event A = snow and
B = temperature below freezing. Obviously, events A and B are related, but the probability of snow, P(A), is not
the same as the probability of snow given the prior information that the temperature is below freezing. The
probability of snow, P(A), is the fraction of the entire population of observations which result in snow. Now examine
the sub-population of observations resulting in B, temperature below freezing, and the fraction of these resulting in
snow, A. This fraction, called the conditional probability of A given B, may equal P(A), but we would expect the
chance of snow, given freezing temperatures, to be larger.
In statistical notation, the conditional probability that event A will occur given that event B has occurred already is
written as:
P(A|B)
where the vertical bar in the parentheses means given and events appearing to the right of the bar have occurred
(Mendenhall, 1971).
Thus, we define the conditional probabilities of A given B as:
P(A|B) = P(AB)
P(B)
and we define the conditional probabilities of B given A as follows:
P(B|A) = P(AB)
P(A)
Bayes Theorem on Conditional Probability

Bayes Theorem allows the conditional probability of an event to be updated as newer information becomes
available. Quite often, we wish to find the conditional probability of an event, A, given that event B occurred at
some time in the past. Bayes Theorem for the probability of causes follows easily from the definition of conditional
probability:
P(A | B)
P( B | A ) P ( A )
P ( B | A ) P( A ) P ( B | A ' ) P ( A ' )
Where:
P(A | B) = the probability that event A will occur, given that event B has already occurred
P(B | A) = the probability that event B will occur, given that event A has already occurred
P(A)
= the probability that event A will occur
P(B | A') = the probability that event B will occur, given that event A has not already occurred
P(A')
= the probability that event A will not occur
A practical geostatistical application using Bayes Theorem is described in an article by Doyen; et al. (1994)
entitled Bayesian Sequential Indicator Simulation of Channel Sands in the Oseberg Field, Norwegian North Sea.
Additive Law of Probability

Another approach to probability problems is based upon the classification of compound events, event relations,
and two probability laws. The first is the Additive Law of Probability, which applies to unions.
The probability of the union (A B) is equal to:
P(A B) = P(A) + P(B) -P(AB)
If A and B are mutually exclusive, P(AB) = 0 and
P(A B) = P(A) + P(B)
Multiplicative Law of Probability

The second law of probability is called the Multiplicative Law of Probability, which applies to intersections.
Given two events, A and B, the probability of the intersection, AB, is equal to
P(AB) = P(A)P(B|A)
= P(B)P(A|B)
If A and B are independent, then P(AB) = P(A)P(B)
RANDOM VARIABLES AND THEIR PROBABILITY DISTRIBUTIONS

INTRODUCTION
Geoscientists are often tasked with estimating the value of a reservoir property at a location where that property
has not been previously measured. The estimation procedure must rely upon a model describing how the
phenomenon behaves at unsampled locations. Without a model, there is only sample data, and no inference can
be made about the values at locations that were not sampled. The underlying model and its behavior is one of the
essential elements of the geostatistical framework.
Random variables and their probability distributions form the foundation of the geostatistical method. Unlike many
other estimation methods (such as linear regression, inverse distance, or least squares) that do not state the
nature of their model, geostatistical estimation methods clearly identify the basis of the models used (Isaaks and
Srivastava, 1989). In this section, we define the random variable and briefly review the essential concepts of
important probability distributions. The random variable is further explained later, in Spatial Correlation Analysis
and Modeling.
THE PROBABILISTIC APPROACH

Deterministic models are applicable only when the process that generated the data is known in sufficient detail to
enable an accurate description of the entire population to be made from only a few sample values. Unfortunately,
few reservoir processes are understood well enough to permit application of deterministic models. Although we
know the physics or chemistry of the fundamental processes, the variables we study in reservoir data sets are
often the product of complex interactions that are not fully quantifiable. These processes include, for example,
depositional mechanisms, tectonic processes, and diagenetic alterations.
For most reservoir data sets, we must accept that there is an unavoidable degree of uncertainty about how the
attribute behaves between sample locations (Isaaks and Srivastava, 1989). Thus, a probabilistic approach is
required, and the following random function models introduced herein recognize this fundamental uncertainty,
providing us with tools to estimate values at unsampled locations.
The following discussion describes the two kinds of random variables. Next, well discuss the probability
distributions or functions associated with each type random variable.
RANDOM VARIABLE DEFINED

A random variable can be defined as a numerical outcome of an experiment whose values are generated randomly
according to some probabilistic mechanism. A random variable associates a unique numerical value with every
outcome, so the value of the random variable will vary with each trial as the experiment is repeated.
The throwing of a die, for example, produces values randomly from the set 1, 2, 3, 4, 5, 6. The coin toss is another
experiment that produces numbers randomly. (In the case of a coin toss, however, we need to designate a
numerical value to heads as 0 and tails as 1; then we can draw randomly from the set 0, 1.)
TWO CLASSES OF RANDOM VARIABLES

There are two different classes of random variables, with the distinction based on the sample interval associated
with the measurement. The two classes are the discrete and the continuous random variable. We will discuss each
in turn.
Discrete Random Variables

A discrete random variable may be identified by the number and nature of the values it assumes; it may assume
only a finite range of distinct values (distinct values being the operative phrase here, e.g.: 0,1,2,3,4,5 -as opposed
to each and every number between 0 and 1 -which would produce an infinite number of values).
In most practical problems, discrete random variables represent count (or enumerated) data, such as point counts
of minerals in a thin section. The die and coin toss experiments also generate discrete random variables.
Discrete random variables are characterized by a probability distribution, which may be described by a formula,
table or graph that provides the probability associated with each value of the discrete random variable. The
probability distribution function of discrete random variables may be plotted as a histogram. Refer to Figure 1
(Probability histogram) as an example histogram for a discrete random variable.
Figure 1
Frequency Tables and Histograms

Discrete random variables are often recorded in a frequency table, and displayed as a histogram. A frequency
table records how often data values fall within certain intervals or classes. A histogram is a graphical
representation of the frequency table.
It is common to use a constant class width for a histogram, so that the height of each bar is proportional to the
number of values within that class. Data is conventionally ranked in ascending order, and thus can be represented
as a cumulative frequency histogram, where the total number of values below certain cutoffs are shown, rather
than the total number of values in each class.
Table 1 Frequency and Cumulative Frequency tables of 100 values, X, with a class width of one (modified from
Isaaks and Srivastava, 1989).
Class
Frequency
Frequency
Cumulative
Cumulative
Interval
Occurrences
Percentage
Number
Percentage
0-1
1-2
2-3
3-4
4-5
5-6
6-7
7-8
13
13
22
22
8-9
16
16
38
38
9-10
11
11
49
49
10-11
13
13
62
62
11-12
17
17
79
79
12-13
13
13
92
92
13-14
94
94
>14
100
100
Figure 2a and 2b display Frequency and Cumulative frequency histograms of data in Table 1 (modified from Isaaks
and Srivastava, 1989).
Figure 2a
2b
(Sometimes, the histograms are converted to continuous curves by running a line from the midpoint of each bar in
the histogram. This process may be convenient for comparing continuous and discrete random variables, but may
tend to confuse the presentation.)
Continuous Random Variables

These variables are defined by an infinitely large number of possible values (much like a segment of a numberline, which can be repeatedly subdivided into smaller and smaller intervals to create an infinite number of
increments).
In most practical problems, continuous random variables represent measurement data, such as the length of a
line, or the thickness of a pay zone.
The probability density function of the continuous random variable may be plotted as a continuous curve. Although
such curves may assume a variety of shapes, it is interesting to note that a very large number of random variables
observed in nature approximate a bell-shaped curve. A statistician would say that such a curve approximates a
normal distribution (Mendenhall, 1971).
Probability Distributions Of The Discrete Random Variable

The probability distribution of a discrete random variable consists of the relative frequencies with which a random
variable takes each of its possible values. Four common probability distributions for discrete random variables are:
Binomial, Negative Binomial, Poisson, and Hypergeometric. Each of these distributions is discussed using
practical geological examples taken from Davis (1986).
Binomial Probability Distribution
Binomial distributions only apply to a special type of discrete random variable, called a binary variable. Binary
variables can only have two values: such as ON or OFF, SUCCESS or FAILURE, 0 or 1. (Often times, values such
as ON or OFF, and SUCCESS or FAILURE will be assigned the numerical values of 1 or 0 respectively.) Similarly,
binomial distributions are only valid for trials in which there are only two possible outcomes for each trial.
Furthermore, the total number of trials must be fixed beforehand, all of the trials must have the same probability of
success, and the outcomes of all the trials must not be influenced by the outcomes of previous trials. The
probability distribution governing a coin toss or die throwing experiment is a binomial distribution.
Well consider how the binomial distribution can be applied to the following oilfield example.
Problem: Forecast the probability of success of a drilling program.
Assumptions: Each wildcat is classified as either:
0 = Failure (dry hole)
1 = Success (discovery)
The binomial distribution is appropriate when a fixed number of wells will be drilled during an exploratory program
or during a single period (budget cycle) for which the forecast is made.
In this case, each well that is drilled in turn is presumed to be independent; this means that the success or failure
of one hole does not influence the outcome of the next. Thus, the probability of discovery remains unchanged as
successive wildcats are drilled (true initially -as Davis pointed out in 1986, this assumption is difficult to justify in
most cases, because a discovery or failure influences the selection of subsequent drilling locations).
The probability p that a wildcat well will discover gas or oil is estimated using an industry-wide success ratio for
drilling in similar areas, or based on the companys own success ratio. Sometimes the success ratio is a subjective
guess. From p, the binomial model can be developed for exploratory drilling as follows:
P The probability that a hole will be successful.
-p The probability of failure.
P = (1 -p)n
The probability that n successive wells will be dry.
P = (1 -p)n-1 p
The probability that the nth hole will be a discovery, but the preceding (n -1) holes will be dry.
P = n(1 -p)n-1 p
The probability of drilling one discovery well in a series of n wildcat holes, where the discovery can occur in any of
the n wildcats.
P = (1 -p)n-r pr
The probability that (n -r) dry holes will be drilled, followed by r discoveries.
However, the (n -r) dry holes and the r discoveries may be arranged in
n
r
combinations, or equivalently, in
n! / (n -r)!r! different ways, resulting in the equation:

P = [n! / (n -r)!r!][(1 -p)n-r pr]
The probability that r discoveries will be made in a drilling program of n wildcats.
This is an expression of the binomial distribution, and gives the probability that r successes will occur in n trials,
when the probability of success in a single trial is p.
For example, suppose we want to find the probability of success associated with a 5-well exploration program in a
virgin basin where the success ratio is anticipated to be about 10%. What is the probability that the entire
exploration program will be a total failure, with no discoveries?
The terms of the equation are:
N =5
r =0
p = 0.10
P = [(5!/5!0!] [1] [0.95]

= 0.59
Where:
P = the probability of success
r = the number of discovery wells
p = anticipated success ratio
n = the number of holes drilled in the exploration program
The probability of no discoveries resulting from exploratory effort is almost 60%. Using either the binomial equation
or a table for the binomial distribution, Figure 3 (Discrete distribution giving the probability of making n discoveries
in a five-well drilling program when the success ratio (probability of discovery) is 10% (modified from Davis,
Figure 3
1986) shows the probabilities associated with all possible outcomes of the five-well drilling program.
Negative Binomial Probability Distribution
Other discrete distributions can be developed for experimental situations with different basic assumptions. We can
develop a Negative Binomial Probability Distribution to find the probability that x dry holes will be drilled before r
discoveries are made.
Problem: Drill as many holes as needed to discover two new fields in a virgin basin.
Assumption: The same conditions that govern the binomial distribution are assumed, except that the number of
trials is not fixed.
The probability distribution governing such an experiment is the negative binomial. Thus we can investigate the
probability that it will require, 2, 3, 4, , up to n exploratory wells before two discoveries are made.
The expanded form of the negative binomial equation is
P = [(r + x -1)!/(r -1)!x!][(1 -p)x pr

Where:
P = the probability of success
r= the number of discovery wells
x = the number of dry holes
p = regional success ratio
If the regional success ratio is 10 %, the probability that a two-hole exploration program will meet the companys
goal of two discoveries can be calculated:
r=2
x=0
p = 0.10
P = 0.029
The calculated probabilities are low because they relate to the likelihood of obtaining two successes and exactly x
dry holes (in this case: x = zero). It may be more appropriate to consider the probability distribution that more than
x dry holes must be drilled before the goal of r discoveries is achieved. We do this by first calculating the
cumulative form of the negative binomial. This gives the probability that the goal of two successes will be achieved
in (x + r) or fewer holes, as shown in Figure 4 (Discrete distribution giving the cumulative probability that two
discoveries will be made by or before a specified hole is drilled, when the success ratio is 10% (modified from
Davis, 1986)).
Figure 4
Each of these probabilities is then subtracted from 1.0 to yield the desired probability distribution illustrated in
Figure 5 (Discrete distribution giving the probability that more than a specified number of holes must be drilled to
make two discoveries, when the success ratio is 10% (modified from Davis, 1986)).
Figure 5
Poisson Probability Distribution

A Poisson random variable is typically a count of the number of events that occur within a certain time interval or
spatial area. The Poisson probability distribution seems to be a reasonable approach to apply to a series of
geological events. For example, the historical record of earthquakes in California, the record of volcanic eruptions
in the Mediterranean, or the incidence of landslides related to El Nino along the California coast can be
characterized by Poisson distributions.
The Poisson probability model assumes that:
events occur independently,
the probability that an event will occur does not change with time,
the length of the observation period is fixed in advance,
the probability that an event will occur in an interval is proportional to the length of the interval, and
the probability of more than one event occurring at the same time is vanishingly small.
When the probability of success becomes very small, the Poisson Distribution can be used to approximate the
binomial distribution with parameters n and p. This is a discrete probability distribution regarded as the limiting
case of the binomial when:
n, the number of trials becomes very large, and
p, the probability of success on any one trial becomes very small.
The equation in this case is

p(X) = e- x/X!
Where
p(X)
= probability of occurrence of the discrete random variable X
= rate of occurrence
Note that the rate of occurrence, , is the only parameter of the distribution.
The Poisson distribution does not require either n or p directly, because we use the product np = instead, which
is given by the rate of occurrence of events.
Hypergeometric Probability Distributions
The binomial distribution would not be appropriate for calculating the probability of discovery because the chance
of success changes with each wildcat well. For example, we can use Statistics to argue two distinctly contradictory
cases:
Discovery of one reservoir increases the odds against finding another (fewer fields remaining).
Drilling a dry hole increases the probability that the remaining untested features will prove productive.
What we need is to find all possible combinations of producing and dry features within the population, then
enumerate those combinations that yield the desired number of discoveries.
The probability distribution generated by sampling without replacement, is called a hypergeometric distribution.
Consider the following:
Problem: An offshore concession contains 10 seismic anomalies, with a historical success ratio of 40%. Our
limited budget will permit only six anomalies to be drilled. Assume that if four structures are productive, the
discovery of one reservoir increases the odds against finding another. What will be the number of discoveries?
The probability of making x discoveries in a drilling program consisting of n holes, when sampling from a
population of N prospects of which S are believed to contain commercial reservoirs, is
S N -S
x n -x
N
P=
n
Where:
x
= the number of discoveries
= the number of prospects in the population
= the number of holes drilled
= the number commercial reservoirs
This expression represents the number of combinations of reservoirs, taken by the number of discoveries, times
the number of combinations of barren anomalies, taken by the number of dry holes, all divided by the number of
combinations of all prospects taken by the total number of holes in the drilling program (Davis, 1989).
Applying this to our offshore concession example containing ten seismic anomalies, from which four are likely to
be reservoirs, what are the probabilities associated with a three-well drilling program?
The probability of total failure, with no discoveries among the three structures is about 17%.
The probability of one discovery is about 50%.
A histogram of all possible outcomes of this exploration strategy is shown in Figure 6 (Discrete distribution giving
the probability of n discoveries in three holes drilled on ten prospects, when four of the ten contain reservoirs
(modified from Davis, 1986)). Note that some probability of success is (1.00 -0.17), or 83%.
Figure 6
Frequency Distributions Of Continuous Random Variables

Frequency distributions of continuous random variables follow a theoretical probability distribution or probability
density function that can be represented by a continuous curve. These functions can take on a variety of shapes.
Rather than displaying the functions as a curve, the distributions may be displayed as a histogram, as shown in
Figure 7a,7b,
Figure 7a
7c,
7b
7c
and 7d (Examples of some continuous variable probability distributions).
7d
In this section, we will discuss the following common distribution functions:
Normal Probability Distribution
Lognormal Distribution
Normal Probability Distribution

It is often assumed that random variables follow a normal probability density function, and many statistical (and
geostatistical) methods are based on this supposition. The Central Limit Theorem is the foundation of the normal
probability distribution.
Central Limit Theorem
The Central Limit Theorem (CLT) states that under rather general conditions, as the sample size increases, the
sums and means of samples drawn from a population of any distribution will approximate a normal distribution
(Sokol and Rohlf, 1969; Mendenhall, 1971). The Central Limit Theorem is defined below:
Central Limit Theorem:
If random samples of n observations are drawn from a population with finite mean, , and a standard
deviation, , then, as n grows larger, the sample mean, y, will be approximately normally distributed with mean
equal to and standard deviation n. The approximation will become more and more accurate as n
becomes large (Mendenhall, 1971).
The Central Limit Theorem consists of three statements:
1.
The mean of the sampling distribution of means is equal to the mean of the population from which the
samples were drawn.
2.
The variance of the sampling distribution of means is equal to the variance of the population from
which the samples were drawn, divided by the size of the samples.
3.
If the original population is distributed normally (i.e. it is bell shaped), the sampling distribution of
means will also be normal. If the original population is not normally distributed, the sampling
distribution of means will increasingly approximate a normal distribution as sample size increases (i.e.
when increasingly large samples are drawn).
The significance of the Central Limit Theorem is twofold:

1.It explains why some measurements tend to possess (approximately) a normal distribution.
2.The most important contribution of the CLT is in statistical inference. Many algorithms that are used to make
estimations or simulations require knowledge about the population density function. If we can accurately predict its
behavior using only a few parameters, then our predictions should be more reliable. If the CLT applies, then
knowing the sample mean and sample standard deviation, the density distribution can be recreated
precisely.
However, the disturbing feature of the CLT, and most approximation procedures, is that we must have some idea
as to how large the sample size, n, must be in order for the approximation to yield useful results. Unfortunately,
there is no clear-cut answer to this question, because the appropriate value of n depends upon the population
probability distribution as well as the use we make of the approximation. Fortunately, the CLT tends to work very
well, even for small samples, but this is not always true.
Properties of the Normal Distribution
Formally, the Normal Probability Density Function is represented by the following expression:
1 Y
e
2 2
1
Where
Z is the height of the ordinate (y-axis) of the curve and represents the density of the function. It is the
dependent variable in the expression, being a function of the variable Y.
There are two constants in the equation: , well-known to be approximately 3.14159, making 1/2 equal 0.39894,
and e, the base of the Naperian or natural logarithms, whose value is approximately 2.71828.
There are two parameters in the normal probability density function. These are the parametric mean, , and the
standard deviation, , which determine the location and shape of the distribution (these parameters are discussed
under Summary Statistics). Thus, there is not just one normal distribution, rather there is an infinity of such
curves, because the parameters can assume an infinity of values (Sokol and Rohlf, 1969).
Figure 8a
Figure 8a (Illustration of how changes in the two parameters of the normal distribution affect the shape and
position of histograms. Left ( = 4, = 1). Right( = 8, = 0.5)) illustrates the impact of parameters on the shape
of a probability distribution histogram.
The histogram (or curve) is symmetrical about the mean. Therefore the mean, median and mode (described later
under this subtopic) of the normal distribution occur at the same point. Figure 8b (Bell curve) shows that the curve
of a Gaussian normal distribution can be described by the position of its maximum,
Figure 8b
which corresponds to its mean () and its points of inflection. The distance between and one of the points of
inflection represents the standard deviation, sometimes referred to as the mean variation. The square of the mean
variation is the variance.
In a normal frequency distribution, the standard deviation may be used to characterize the sample distribution
under the bell curve. According to Sokol and Rohlf, (1969): 68.3% of all sample values fall within -1 to +1 from
the mean, while 95.4% of the sample values fall within -2and +2 from the mean, and 99.7% of the values are
contained within -3 and +3 of the mean. This bears repeating, in a different format this time:
(1 standard deviation) contains 68.3% of the data
(2 standard deviations) contain 95.46% of the data
(3 standard deviations) contain 99.73% of the data
How are the percentages calculated? The direct calculation of any portion of the area under the normal curve
requires an integration of the function shown as the above expression. Fortunately, for those who have forgotten
their calculus, the integration has recorded in tabular form (Sokol and Rohlf, 1969). These tables can be found in
most standard statistical books, for example, see Statistical Tables and Formulas, Table 1 (Hald, 1952).
Application of the Normal Distribution
The normal frequency distribution is the most widely used distribution in statistics. There are three important
applications of the density function (Sokol and Rohlf, 1969).
1.
Sometimes we need to know whether a given sample is normally distributed before we can apply
certain tests. To test whether a sample comes from a normal distribution we must calculate the
expected frequencies for a normal curve of the same mean and standard deviation, then compare the
two curves.
2.
Knowing when a sample comes from a normal distribution may confirm or reject underlying
hypotheses about the nature of the phenomenon studied.
3.
Finally, if we assume a normal distribution, we may make predictions based upon this assumption.
For the geosciences, this means a better and unbiased estimation of reservoir parameters between
the well data.
Normal Approximation to the Binomial Distribution

Recall that approximately 95% of the measurements associated with a normal distribution lie within two standard
deviations of the mean and almost all lie within three standard deviations. The binomial probability distribution
would nearly be symmetrical if the distribution were able to spread out a distance equal to two standard deviations
on either side of the mean, which in fact is the case. Therefore, to determine the normal approximation we
calculate the following when the outcome of a trial (n) results in a 0 or 1 success with probabilities q and p,
respectively:
= np
= npq
If the interval 2 lies within the binomial bounds, 0 and n, the approximation will be reasonably good
(Mendenhall, 1971).
Lognormal Distribution
Many variables in the geosciences do not follow a normal distribution, but are highly skewed, such as the
distribution in Figure 7b, and as shown below.
Figure 9 Schematic histogram of sizes and numbers of oil field discoveries of hundred thousand-barrel equivalent.
Figure 9
The histogram illustrates that most fields are small, with decreasing numbers of larger fields, and a few rare giants
that exceed all others in volume. If the histograms of Figure 7b and Figure 9 are converted to logarithmic forms
(that is, we use Yi = log Xi instead of Yi =Xi for each observation), the distribution becomes nearly normal. Such
variables are said to be lognormal.
Transformation of Lognormal data to Normal
The data can be converted into logarithmic form by a process known as transformation. Transforming the data to a
standardized normal distribution (i.e., zero mean and unit variance) simplifies data handling and eases comparison
to different data sets.
Data which display a lognormal distribution, for example, can be transformed to resemble a normal distribution by
applying the formula ln(z) to each z variate in the data set prior to conducting statistical analysis. The success of
the transformation can be judged by observing its frequency distribution before and after transformation. The
distribution of the transformed data should be markedly less skewed than the lognormal data. The transformed
values may be back-transformed prior to reporting results.
Because of its frequent use in geology, the lognormal distribution is extremely important. If we look at the
transformed variable Yi rather than Xi itself, the properties of the lognormal distribution can be explained simply by
reference to the normal distribution.
In terms of the original transformed variable Xi, the mean of Y corresponds to the nth root of the products of Xi,
Y GM n Xi
Where:
GM is the geometric mean
is analogous to , except that all the elements in the series are multiplied rather than added
together (Davis, 1986).
In practice, it is simpler to convert the measurements into logarithms and compute the mean and variance. If you
want, the geometric mean and variance compute the antilog of Y and s2y. If you work with the data in the
transformed state, all of the statistical procedures that are appropriate for ordinary variables are applicable to the
log transformed variables (Davis, 1986).
The characteristics of the lognormal distribution are discussed in a monograph by Aitchison and Brown (1969) and
in the geological context by Kock and Link (1981).
Random Error
Random errors for normal distributions are additive, which means that errors of opposite sign tend to cancel one
another, and the final measurement is near the true value. Lognormal distribution random errors are multiplicative,
rather than additive, thus produce an intermediate product near the geometric mean.
UNIVARIATE DATA ANALYSIS

INTRODUCTION
There are several ways in which to summarize a univariate (single attribute) distribution. Quite often we will simply
compute the mean and the variance, or plot its histogram. However, these statistics are very sensitive to extreme
values (outliers) and do not provide any spatial information, which is the heart of a geostatistical study. In this
section, we will describe a number of different methods that can be used to analyse data for a single variable.
SUMMARY STATISTICS
The summary statistics represented by a histogram can be grouped into three categories:
measures of location,
measures of spread, and
measures of shape.
Measures of Location
Measures of location provide information about where the various parts of the data distribution lie, and are
represented by the following:
Minimum: Smallest value.
Maximum: Largest value.
Median: Midpoint of all observed data values, when arranged in ascending order. Half the values are
above the median, and half are below. This statistic represents the 50th percentile of the cumulative
frequency histogram and is not generally affected by an occasional erratic data point.
Mode: The most frequently occurring value in the data set. This value falls within the tallest bar on the
histogram.
Quartiles: In the same way that the median splits the data into halves, the quartiles split the data in
quarters. Quartiles represent the 25th, 50th and 75th percentiles on the cumulative frequency histogram.
Mean: The arithmetic average of all data values. (This statistic is quite sensitive to extreme high or
low values. A single erratic value or outlier can significantly bias the mean.) We use the following formula
to determine the mean of a Population:
Mean = =
where:
= population mean
N = number of observations (population size)
ZI = sum of individual observations
We can determine the mean of a Sample in a similar manner. The below formula for the
sample mean is comparable to the above formula, except that population notations have
been replaced with those for samples.
Mean =
where:
x = sample mean
n = number of observations (sample size)

ZI = sum of individual observations
Measures of Spread
Measures of spread describe the variability of the data values, and are represented by the following:
Variance: Average squared difference of the observed values from the mean. Because the
variance involves squared differences, this statistic is very sensitive to abnormally high/low values.
Variance =
i
N
Kachigan (1986) notes that the above formula is only appropriate for defining variance of a population of
observations. If this same formula was applied to a sample for the purpose of estimating the variance of
the parent population from which the sample was drawn, then the formula above will tend to
underestimate the population variance. This underestimation occurs as repeated samples are drawn
from the population and the variance is calculated from each, using the sample mean ( x , rather than
the population mean (). The resulting average of these variances would be lower than the true value of the
population variance (assuming we were able to measure every single member of the population).
We can avoid this bias by taking the sum of squared deviations and dividing that sum by the number of
observations less one. Thus, the sample estimate of population variance is obtained using the following
formula:
i x
Variance = s
n 1
Standard Deviation: Square root of the variance.
Standard Deviation =
This measure is used to show the extent to which the data is spread around the vicinity of the
mean, such that a small value of standard deviation would indicate that the data was clustered
near to the mean. For example, if we had a mean equal to 10, and a standard deviation of 1.3,
then we could predict that most of our data would fall somewhere between (10 - 1.3) and (10 +
1.3), or between 8.7 to 11.3. The standard deviation is often used instead of the variance,
because the units are the same as the units of the attribute being described.
Interquartile Range: Difference between the upper (75th percentile) and the lower (25th percentile)
quartile. Because this measure does not use the mean as the center of distribution, it is less sensitive to
abnormally high/low values.
Figure 1a and 1b illustrate histograms of porosity with a mean of about 15 %, but different variances.
1b
Outliers or Spurious Data
Figure 1a
Another statistic to consider is the Z-score; a summary statistic in terms of standard deviation. Data which
appear to be anomalous based on its Z-score which have absolute values are greater than a specified cutoff are
termed outliers. The typical cutoff is 2.5 standard deviations from the mean. The formula is the ratio of the data
value minus the sample mean to the sample variance.
Zscore = (Zi -) /
This statistic serves as a caution, signifying either bad data, or a true local anomaly, which must be taken into
account in the final analysis.
Note: The Z-score transform does not change the shape of the histogram. The transform re-scales the histogram
with a mean equal 0 and a variance equal 1. If the histogram is skewed before being transformed, it retains the
same shape after the transform. The X-axis is now in terms of standard deviation units about the mean of zero.
Measures of Shape
Measures of shape describe the appearance of the histogram and are represented by the following:
Coefficient of Skewness: Averaged cubed difference between the data values and the mean,
divided by the cubed root of the standard deviation. This measure is very sensitive to abnormally
high/low values:
CS1/nZi -)3/
where:
is the mean
is the standard deviation

n is the number of X and Y data pairs
The coefficient of skewness allows us to quantify the symmetry of the data distribution, and tells us
when a few exceptional values (possibly outliers?) exert a disproportionate effect upon the mean.
positive: long tail of high values (median < mean)
negative: long tail of low values (median > mean)
zero: a symmetrical distribution
Figure 2a, 2b,
Figure 2a
and 2c
2c
illustrate histograms with negative, symmetrical and positive skewness.
2b
Coefficient of Variation: Often used as an alternative to skewness as a measure of asymmetry for

positively skewed distributions with a minimum at zero. It is defined as the ratio of the standard deviation
to the mean. A value of CV > 1 probably indicates the presence of some high erratic values (outliers).
CV =
where:
is the standard deviation
is the mean
SUMMARY OF UNIVARIATE STATISTICAL MEASURES AND DISPLAYS
Advantages
Easy to calculate.
Provides information in a very condensed form.
Can be used as parameters of a distribution model (e.g., normal distribution defined by sample mean
and variance).
Limitations
Summary statistics are too condensed, and do not carry enough information about the shape of the
distribution.
Certain statistics are sensitive to abnormally high/low values that properly belong to the data set
(eg.,,CS).
Offers only a limited description, especially if our real interest is in a multivariate data set (attributes
are correlated).
BIVARIATE STATISTICAL MEASURES AND DISPLAYS

INTRODUCTION
Methods for bivariate description not only provide a means to describe the relationship between two variables, but
are also the basis for tools used to analyze the spatial content of a random function (to be described in the Spatial
Correlation and Modeling Analysis section). The bivariate summary methods described in this section only
measure the linear relationship between the variables - not their spatial features.
THE RELATIONSHIP BETWEEN VARIABLES

Bivariate analysis seeks to determine the extent to which one variable is related to another variable. We can
reason that if one variable is indeed related to another, then information about the first variable might help us to
predict the behavior of the second. If, on the other hand, our analysis of these two variables shows absolutely no
relationship between the two, then we might need to discard one from the pair in favor of a different variable which
will be more predictive the other variable's behavior.
The relationship between two variables can be described as complementary, parallel, or reciprocal. Thus, we might
observe a simultaneous increase in value between two variables, or a simultaneous decrease. We might even see
a simultaneous decrease in the value of one variable while the other increases. An alternative way of
characterizing the relationship between two variables would be to describe their behaviors in terms of variance. In
this case, we observe how the value of one variable may change (or vary) in a manner that leaves the relationship
with the second variable unchanged. (For instance, if the relationship was defined by a 1:10 ratio, then as the
value of one variable changed, the other would vary by 10 times that amount - thus preserving the relationship.)
Dependent and Independent Variables

Where a relationship between variables does exist, we can characterize each variable as being either dependent
or independent. We use the behavior of the independent (or predictor) variable to determine how the dependent
(or criterion) variable will react. For instance, we might expect that an increase in the value of the independent
variable would result in a corresponding increase in the value of the dependent variable.
COMMON BIVARIATE METHODS

The most commonly used bivariate statistical methods include:
Scatterplots
Covariance
Product Moment Correlation Coefficient
Linear Regression
We will discuss each of these methods in turn, below.
SCATTERPLOTS
The most common bivariate plot is the Scatterplot, Figure 1 (Scatterplot of Porosity (dependent variable) versus
Acoustic Impedance (independent variable)).
Figure 1
This plot follows a common convention, in which the dependent variable (e.g., porosity) is plotted on the Y-axis
(ordinate) and the independent variable (e.g., acoustic impedance) is plotted on the X-axis (abscissa). This type of
plot serves several purposes:
detects a linear relationship,
detects a positive or inverse relationship,
identifies potential outliers,
provides an overall data quality control check.
This plot displays an inverse relationship between porosity and acoustic impedance, that is, as porosity increases,
acoustic impedance decreases. This display should be generated before calculating bivariate summary statistics,
like the covariance or correlation coefficient, because many factors affect these statistical measures. Thus, a high
or low value has no real meaning until verified visually.
A common geostatistical application of the scatterplot is the h-scatterplot. (In geostatistics, h commonly refers to
the lag distance between sample points.) These plots are used to show how continuous the data values are over a
certain distance in a particular direction. If the data values at locations separated by h are identical, they will fall on
a line x = y, a 45-degree line of perfect correlation. As the data becomes less and less similar, the cloud of points
on the h-Scatterplot becomes fatter and more diffuse. A later section will present more detail on the h-scatterplot.
COVARIANCE
Covariance is a statistic that measures the correlation between all points of two variables (e.g., porosity and
acoustic impedance). This statistic is a very important tool used in Geostatistics to measure spatial correlation or
dissimilarity between variables, and forms the basis for the correlogram and variogram (detailed later).
The magnitude of the covariance statistic is dependent upon the magnitude of the two variables. For example, if
the Xi values are multiplied by the factor k, a scalar, then the covariance increases by a factor of k. If both variables
are multiplied by k, then the covariance increases by k2. This is illustrated in the table below.
VARIABLES
COVARIANCE
X and Y
3035.63
X*10 and Y
30356.3
X*10 and Y*10
303563
The covariance formula is:

COVx,y =
i x i y
n
where:
Xi is the X variable
Yi is the Y variable
x is the mean of X
y is the mean of Y
It should be emphasized that the covariance is strongly affected by extreme pairs (outliers).
Product Moment Correlation Coefficient

The product moment correlation coefficient ( ) is more commonly called simply the correlation coefficient, and is
a statistic that measures the linear relation between all points of two variables (e.g., porosity and velocity). This
linear relationship is assigned a value that ranges between +1 to -1, depending on the degree of correlation:
+1 = perfect, positive correlation
0 = no correlation -a totally random relation
-1 = perfect inverse correlation.
Figure 2 illustrates scatterplots showing positive correlation, no correlation, and inverse correlation between two
variables.
Figure 2
The numerator for the correlation coefficient is the covariance. This value is divided by the product of the standard
deviations for variables X and Y. This normalizes the covariance, thus removing the impact of the magnitude of the
data values. Like the covariance, outliers adversely affect the correlation coefficient.
The Correlation Coefficient formula (for a population) is:
Corr. Coeff.x,y =
X i x Yi y
x, y
n
xy
where:
Xi is the X variable
Yi is the Y variable
x is the mean of X
y is the mean of Y
x is the standard deviation of X

y is the standard deviation of Y
As with other statistical formulas, Greek is used to signify the measure of a population, while algebraic notation ( r )
is used for samples.
Rho Squared
The square of the correlation coefficient 2 (also referred to as r2) is a measure of the variance accounted for in a
linear relation. This measure tells us about the extent to which two variables covary. That is, it tells us how much of
the variance seen in one variable can be predicted by the variance found in the other variable. Thus, a value of =
-0.83 between porosity and acoustic impedance tells us that as porosity increases in value, velocity decreases,
which has a real physical meaning. However, only about 70% (actually, it is -0.83 2, or 68.89%)of the variability in
porosity is explained by its relationship with acoustic impedance.
In keeping with statistical notation, the Greek symbol 2 is used to denote the correlation coefficient of a
population, while the algebraic equivalent is used to r2 refer to the correlation coefficient of a sample.
Linear Regression
Linear regression is another method we use to indicate whether a linear relationship exists between two variables.
This is a useful tool, because once we establish a linear relationship, we may later be able to interpolate values
between points, extrapolate values beyond the data points, detect trends, and detect points that deviate away from
the trend.
Figure 3 (Scatterplot of inverse linear relationship between porosity and acoustic impedance, with a correlation
coefficient of -0.83), shows a simple display of regression.
Figure 3
When two variables have a high covariance (strong correlation), we can predict a linear relationship between the
two. A regression line drawn through the points of the scatterplot helps us to recognize the relationship between
the variables. A positive slope (from lower left to upper right) indicates a positive or direct relationship between
variables. A negative slope (from upper left to lower right) indicates a negative or inverse relationship. In the
example illustrated in the above figure, the porosity clearly tends to decrease as acoustic impedance increases.
The regression equation has the following general form:
Y = a + bXi,
where:
Y is the dependent variable, or the variable to be estimated (e.g., porosity)
Xi is the independent variable, or the estimator (e.g., velocity)
b is the slope; defined as b = ( y/ x), and
is the correlation coefficient between X and Y
x is the standard deviation of X

y is the standard deviation of Y
a is a Constant, which defines the ordinate (Y-axis) intercept

and:
a = x -by
x is the mean of X
y is the mean of Y
With this equation, we can plot a regression line that will cross the Y-axis at the point a, and will have a slope
equal to b
Linear equations can include polynomials of any degree, and may include combinations of logarithmic, exponential
or any other non-linear variables.
The terms in the equation for which coefficients are computed are independent terms, and can be simple (a single
variable) or compound (several variables multiplied together). It is also common to use cross terms (the interaction
between X and Y), or use power terms.
Z = a + bX +cY: uses X and Y as predictors and a constant

Z = a + bX + cY + dXY: adds the cross term
Z = a + bX + cY + dXY +eX2 + fY2: adds the power terms
SUMMARY OF BIVARIATE STATISTICAL MEASURES AND DISPLAYS

Advantages
Easy to calculate.
Provides information in a very condensed form.
Can be used to estimate one variable from another variable or from multiple variables.
Limitations
Summary statistics sometimes can be too condensed, and do not carry enough information about the
shape of the distribution.
Certain statistics are sensitive to abnormally high/low values that properly belong to the data set (e.g.,
covariance, correlation coefficient). Outliers can highly bias a regression predication equation.
No spatial information.
EXPLORATORY DATA ANALYSIS

The early phase of a geostatistics project often employs classical statistical tools in a general analysis and
description of the data set.
This process is commonly referred to as Exploratory Data Analysis, or simply EDA. It is conducted as a way of
validating the data itself -you need to be sure each value that you plug into the geostatistical model is valid.
(Remember: garbage in -garbage out!) By analyzing the data itself, you can determine which points represent
anomalous values of an attribute (outliers) that should either be disregarded or should be scrutinized more closely.
EDA is an important precursor to the final goal of a geostatistical study, which may be interpolation, or simulation
and assessment of uncertainty. Unfortunately, in many studies, including routine mapping of attributes, EDA is
often overlooked. However, it is absolutely necessary to have a good understanding of your data, so taking the
time in EDA to check the quality of the data, as well as exploring and describing the data set, will reward you with
improved results.
The classical statistical tools described in previous sections, along with the tools that we will introduce in the
sections under the Data Validation heading, will help you to conduct a thorough analysis of your data.
EDA PROCESS
Note that there is no one set of prescribed steps in EDA. Often, the process will include a number of the following
tasks, depending on the amount and type of data involved:
data preprocessing
univariate and multivariate statistical analysis
identification and probable removal of outliers
identification of sub-populations
data posting
quick maps
sampling of seismic attributes at well locations
At the very least, you should plot the distribution of attribute values within your data set. Look for anomalies in your
data, and then look for possible explanations for those anomalies. By employing classical statistical methods to
analyze your data, you will not only gain a clearer understanding of your data, but will also discover possible
sources of errors and outliers.
Geoscientists tasked with making predictions about the reservoir will always face these limitations:
Most prospects provide only a very few direct hard observations (well data)
Soft data (seismic) is only indirectly related to the hard well data
A scarcity of observations can often lead to a higher degree of uncertainty
These problems can be compounded when errors in the data are overlooked. This is especially troublesome with
large data sets, and when computers are involved; we simply become detached from our data. A thorough EDA will
foster an intimate knowledge of the data to help you flag bogus results. Always take the time to explore your data.
SEARCH NEIGHBORHOOD CRITERIA

INTRODUCTION
All interpolation algorithms require a standard for selecting data, referred to as the search neighborhood. The
parameters that define a search neighborhood include:
Search radius
Neighborhood shape
Number of sectors ( 4 or 8 are common)
Number of data points per sector
Azimuth of major axis of anisotropy
When designing a search neighborhood, we should remember the following points:
Each sector should have enough points ( 4) to avoid directional sampling bias.
CPU time and memory requirements grow rapidly as a function of the number of data points in a
neighborhood.
We will see a further example of the search neighborhood in our later discussion on kriging.
SEARCH STRATEGIES
Two common search procedures are the Nearest Neighbor and the Radial Search methods. These strategies
calculate the value of a grid node based on data points in the vicinity of the node.
Nearest Neighbor
One simple search strategy looks for data points that are closest to the grid node, regardless of their angular
distribution around the node. The nearest neighbor search routine is quick, and works well as long as samples are
spread about evenly. However, it provides poor estimates when sample points are concentrated too closely along
widely spaced traverses.
Another drawback to the nearest neighbor method occurs when all nearby points are concentrated in a narrow
strip along one side of the grid node (such as might be seen when wells are drilled along the edge of a fault or
pinchout). When this occurs, the selection of points produces an estimate of the node that is essentially
unconstrained, except in one direction. This problem may be avoided by specifying search parameters which
select control points that are evenly distributed around the grid node.
Radial Searches
Two common radial search procedures are the quadrant search, and its close relative, the octant search. Each is
based on a circular or elliptical area, sliced into four or eight equal sections. These methods require a minimum
number of control points for each of the four or eight sections surrounding the grid node.
These constrained search procedures test more neighboring control points than the nearest neighbor search,
which increases the time required. Such constraints on searching for nearest control points will expand the size of
the search neighborhood surrounding the grid node because a number of nearby control points will be passed over
in favor of more distant points that satisfy the requirement for a specific number of points being selected from a
single sector.
In choosing between the simple nearest neighbor approach and the constrained quadrant or octant searches,
remember that the autocorrelation of a surface decreases with increasing distance, so the remote data points
sometimes used by the constrained searches are less closely related to the location being estimated. This may
result in a grid node estimate that is less realistic than that produced by the simpler nearest neighbor search.
SPATIAL DESCRIPTION
One of the distinguishing characteristics of earth science data is that these data sets are assigned to some
particular location in space. Spatial features of the data sets, such as the degree of continuity, directional trends
and location of extreme values, are of considerable interest in developing a reservoir description. The statistical
descriptive tools presented earlier are not able to capture these spatial features. In this section, we will use a data
set from West Texas to demonstrate tools that describe spatial aspects of the data.
DATA POSTING
Data posting is an important initial step in any study (Figure 1: Posted porosity data for 55 wells from North
Cowden Field in West Texas).
Figure 1
Not only do these displays reveal obvious errors in data location, but they often also highlight data values that may
be suspect. Lone high values surrounded by low values (or visa versa) are worth investigating. Data posting may
provide clues as to how the data were acquired. Blank areas may indicate inaccessibility (another companys
acreage, perhaps); heavily sampled areas indicate some initial interest. Locating the highest and lowest values
may reveal trends in the data.
In this example, the lower values are generally found on the west side of the area, with the larger values in the
upper right quadrant. The data are sampled on a nearly regular grid, with only a few holes in the data locations.
The empty spots in the lower right corner are on acreage belonging to another oil company. Other missing points
are the result of poor data, and thus are not included in the final data set. More information is available about this
data set in an article by Chambers, et al. (1994). This data set and acoustic impedance data from a high-resolution
3D seismic survey will be used to illustrate many of the geostatistical concepts throughout the remainder of this
presentation.
DATA DISTRIBUTION
A reservoir property must be mapped on the basis of a relatively small number of discrete sample points (most
often consisting of well data). When constructing maps, either by hand or by computer, attention must be paid to
the distribution of those discrete sample points. The distribution of points on maps (Figure 2, Typical distribution of
data points within the map area) may be classified into three categories: regular, random, or clustered (Davis,
1986).
Figure 2
Regular: The pattern is regular (Figure 2 -part a) if the points are located on some sort of grid pattern,
for example, a 5-spot well pattern. The patterns of points are considered uniform in density if the points
in any sub-area are equal to the density of points in any other sub-area.
Random: When points are distributed at random (Figure 2 -part b) across the map area, the coverage
may be uniform, however, we do not expect to see the same number of points within each sub-area.
Clustered: Many of the data sets we work with show a natural clustering (Figure 2 -part c) of points
(wells). This is especially true when working on a more regional scale.
GRIDS AND GRIDDING

INTRODUCTION
One of our many tasks as geoscientists is to create contour maps. Although contouring is still performed by hand,
the computer is used more and more to map data, especially for large data sets. Unfortunately, data are often fed
into the computer without any special treatment or exploratory data analysis. Quite often, defaults are used
exclusively in the mapping program, and the resulting maps are accepted without question, even though the maps
might violate sound geological principles.
Before using a computer to create a contour map it is necessary to create a grid and then use the gridding process
to create the contours. This discussion will introduce the basic concepts of grids, gridding and interpolation for
making contour maps.
WHAT IS A GRID?
Taken to extremes, every map contains an infinite number of points within its map area. Because it is impractical to
sample or estimate the value of any variable at an infinite number of points within the map area, we define a grid to
describe locations where estimates will be calculated for use in the contouring process.
A grid is formed by arranging a set of values into a regularly spaced array, commonly a square or rectangle,
although other grid forms may also used. The locations of the values represent the geographic locations in the
area to be mapped and contoured (Jones, et al., 1986). For example, well spacing and known geology might
influence your decision to calculate porosity every 450 feet in the north-south direction, and every 300 feet in the
east-west direction. By specifying a regular interval of columns (every 450 feet in the north-south direction) and
rows (every 300 feet in the east-west direction), you have, in effect, created a grid.
Grid nodes are formed by the intersection of each column with a row. The area enclosed by adjacent grid nodes is
called a grid cell (three nodes for a triangular arrangement, or more commonly, four nodes for a square
arrangement). Because the sample data represent discrete points, a grid should be designed to reflect the average
spacing between the wells, and designed such that the individual data points lie as closely as possible to a grid
node.
GRID SPACING
The grid interval controls the detail that can be seen in the map. No features smaller than the interval are retained.
To accurately define a feature, it must cover two to three grid intervals; thus the cell should be small enough to
show the required detail of the feature. However, there is a trade-off involving grid size. Large grid cells produce
quick maps with low resolution, and a course appearance. While small grid cells may produce a finer appearance
with better resolution, they also tend to increase the size of the data set, thus leading to longer computer
processing time; furthermore a fine grid often imparts gridding artifacts that show up in the resulting map (Jones,
et al., 1986).
A rule of thumb says that the grid interval should be specified so that a given grid cell contains no more than one
sample point. A useful approach is to estimate, by eye, the average well spacing, and use it as the grid interval,
rounded to an even increment (e.g., 200 rather than 196.7).
GRIDS AND GRIDDING

Within the realm of geostatistics, you will often discover that seemingly similar words have quite different
meanings. In this case, the word gridding should not be considered as just a grammatical variation on the word
grid.
Gridding is the process of estimating the value of an attribute from isolated points onto a regularly spaced mesh,
called a grid (as described above). The attributes values are estimated at each grid node.
Interpolation and Contouring

The objective of contouring is to visually describe or delineate the form of a surface. The surface may represent a
structural surface, such as depth to the top of a reservoir, or may represent the magnitude of a petrophysical
property, such as porosity. Contour lines, strictly speaking, are isolines of elevation. However, geologists are rather
casual about their use of terminology, and usually call any isoline a contour, whether it depicts elevation, porosity,
thickness, composition, or other property.
Contour maps are a type of three-dimensional graph or diagram, compressed onto a flat, two-dimensional
representation. The X-and Y-axes usually correspond to the geographical coordinates east-west and north-south.
The Z-axis typically represents the value of the attribute, for example: elevation with respect to sea level, or
porosity, thickness, or some other quantity (Davis, 1986).
Contour lines connect points of equal value on a map, and the space between two successive contour lines
contains only points falling within the interval defined by the contour lines. It is not possible to know the value of the
surface at every possible location, nor can we measure its value at every point we might wish to choose. Thus, the
purpose of contouring is to summarize large volumes of data and to depict its three-dimensional spatial distribution
on a 2-D paper surface. We use contour maps to represent the value of the property at unsampled locations
(Davis, 1986; Jones, et al., 1986).
The Interpolation Process
The mapping (interpolation) and contouring process involves four basic steps. According to Jones, et al. (1986),
the four mapping and contouring steps are:
1.
Identifying the area and the attribute to be mapped (Figure 1 , Location and values of control points
within the mapping area, North Cowden Field, West Texas);
Figure 1
2.
Designing the grid over the area (Figure 2 , Grid design superimposed on the control points);
Figure 2
3.
Calculating the values to be assigned at each grid node (Figure 3, Upper left quadrant of the grid
shown in Figure 2.
Figure 3
The values represent interpolated values at the grid nodes.These values are used to create the
contours shown in Figure 4);
4.
Using the estimated grid node values to draw contours (Figure 4 , Contour map of porosity, created
from the contol points in Figure 1 and the grid mesh values shown in Figure 3).
Figure 4
To illustrate these steps, we will use porosity measurements from the previously mentioned West
Texas data set.
TRADITIONAL INTERPOLATION METHODS

INTRODUCTION
The point-estimation methods described in this section consist of common methods used to make contour maps.
These methods use non-geostatistical interpolation algorithms and do not require a spatial model. They provide a
way to create an initial quick look map of the attributes of interest. This section is not meant to provide an
exhaustive dissertation of the subject, but will introduce certain concepts needed to understand the principles of
geostatistical interpolation and simulation methods discussed in later sections.
Most interpolation methods use a weighted average of values from control points in the vicinity of the grid node in
order to estimate the value of the attribute assigned to that node. With this approach, the attribute values of the
nearest control points are weighted according to their distance from the grid node, with the heavier weights
assigned to the closest points. The attribute values of grid nodes that lie beyond the outermost control points must
be extrapolated from values assigned to the nearest control points.
Many of the following methods require the definition of Neighborhood parameters to characterize the set of sample
points used during the estimation process, given the location of the grid node. For the upcoming examples, weve
specified the following neighborhood parameters:
Isotropic ellipse with a radius = 5000 feet
4 quadrants
A minimum of 7 sample points, with an optimum of 3 sample points per quadrant
These examples use porosity measurements, located on a nearly regular grid. See Figure 1 (Location and values
of control points within the mapping area at North Cowden Field, West Texas) for the sample locations and
porosity values.
Figure 1
The following seven estimation methods will be discussed in turn:
Inverse Distance
Closest Point
Moving Average
Least Squares Polynomial
Spline
Polygons of Influence
Triangulation
The first five estimation methods are accompanied by images that illustrate the patterns and relative magnitude of
the porosity values created by each method. All images have the same color scale. The lowest value of porosity is
dark blue (5%) and the highest value is red (13%), with a 0.5% color interval. However, for the purpose of this
illustration, the actual values are not important at this time. (No porosity mapping images were produced for the
polygons of influence and triangulation methods.)
INVERSE DISTANCE
This estimation method uses a linear combination of attribute values from neighboring control points. The weights
assigned to the measured values used in the interpolation process are based on distance from the grid node, and
are inversely proportional, at a given power (p). If the smallest distance is smaller than a given threshold, the value
of the corresponding sample is copied to the grid node. Large values of p ( 5 or greater) create maps similar to
the closest point method (Isaaks and Srivastava, 1989; Jones, et al., 1986), which will be described next. The
equation for the inverse distance method has the following form:

Where:
1 / d
p
(1 / d )
And:
Z* = the target grid node location
= the weights
Z = the data points
dp = power of distance from Z to Z*
Figure 2 displays Inverse Distance gridding using a power of 1.
Figure 2
The Inverse Distance method is recommended as a first pass through the data because it:
is simple to use and understand.
produces a quick map.
is an excellent QC tool.
locates bulls-eye effect (lone high or low values).
spots erroneous sample locations.
gives a first indication of trends.
CLOSEST POINT
The closest point (Figure 3) or nearest neighbor methods consist of copying the value of the closest sample
point to the target grid node.
Figure 3
This method can be viewed as a linear combination of the neighboring points with all the weights equal to 0,
except the weight attached to the closest point which is equal to 100% (Henley, 1981; Jones, et al., 1986).
Z* = Z (closest
Where
Z = the data points
MOVING AVERAGE
The moving average method (Figure 4) is the most frequently used estimation method.
Figure 4
Each neighboring sample point is given the same weight. The weight is calculated so that the sum of the weights
of all the neighboring sample points sum to unity (Henley, 1981; Jones, et al., 1986).
So, if we assume that there are N neighboring data,
Z* = Z /N
Where
Z = the data points
N = the neighboring data
The moving average takes its name from the process of estimating the attribute value at each grid node based on
the weighted average of nearby control points in the search neighborhood, and then moving the neighborhood
from grid node to node.
LEAST SQUARES POLYNOMIAL

The least squares polynomial method (Figure 5) is commonly used for trend surface analysis.
Figure 5
The neighboring points are used to fit a polynomial expression of a degree specified by the user. The polynomial
form is a logical choice for surface approximation, as any function that is continuous and possesses all derivatives
can be reproduced by an infinite power series. The polynomial surface is a mathematical function involving powers
of X and Y. The complexity of the surface (Table 3.1) is controlled by the user through the number of terms used,
which is dependent upon its degree, N, a positive integer (Jones, et al., 1986; Davis, 1986; Krumbein and Graybill,
1965, Henley, 1981).
Z* = aij Xi Yj
Where
Table 1: General form of polynomial functions (after (Jones, ET al., 1986).
Degree
Function
Z = a00 + a10X + a01Y
Z = a00 + a10X + a01Y + a20X2 + a11XY + a02Y2
Z = a00 + a10X + a01Y + . . . + a0NYN
SPLINE
Spline fitting is a commonly used quantitative method. The method ignores geologic trends and allows sample
location geometry to dictate the range of influence of the samples. The bicubic spline (Figure 6) is a twodimensional gridding algorithm.
Figure 6
In one-dimension, the function has the form of a flexible rod between the sample points. In two dimensions, the
function has the form of a flexible sheet. The objective of the method is to fit the smoothest possible surface
through all the samples using a least squares polynomial approach (Jones, et al., 1986).
POLYGONS OF INFLUENCE
This is a very simple method, and is often used in the mining industry to estimate average ore grade within blocks.
Often, the value estimated at any location is simply the value of the closest point. The method is similar to the
closest point approach. Polygonal patterns are created, based on sample location. Polygon boundaries represent
the distance midway between adjacent sample locations. As long as the points we are estimating fall within the
same polygon of influence, the polygonal estimate does not change. As soon as we encounter a grid node in a
different polygon, the estimate changes to a different value. This method causes abrupt discontinuities in the
surface, and may create unrealistic maps (Isaaks and Srivastava, 1989; Henley, 1981).
TRIANGULATION
The triangulation method is used to calculate the value of a variable (such as depth, or porosity for instance) in an
area of a map located between 3 known control points. Triangulation overcomes the problem of the polygonal
method, removing possible discontinuities between adjacent points by fitting a plane through three sample points
that surround the grid node being estimated (Isaaks and Srivastava, 1989). The equation of the plane is generally
expressed as:
Z* = ax + by + c
This method starts by connecting lines between the 3 known control points to form a triangle (denoted as rst in
Figure 7).
Figure 7
Next, join a line from the unknown point (point O in the figure), to each of the corners of the triangle, thereby
forming 3 new triangles within the original triangle.
The value of any point located within the triangle (point O in this example) can be determined through the following
steps:
1. Compute the areas of the resulting new triangles
2. Use the areas of each triangle to establish a weight for each corner point
3. Multiply the values of the three corner points by their respective weights, and
4. Add the resulting products.
The formula to find the area of a triangle is:
A = bh/2
Where
A is the area of the triangle,
b is the length of the base, and
h is the length of the height, taken perpendicular to the base.
Weights are assigned to each value in proportion to the area of the triangle opposite the known value, as shown by
the example in the Figure 7. This example shows how the values from the three closest locations are weighted by
triangular areas to form an estimated value at point O. The control values (r,s,t) are located at the corners of the
triangle.
The value at point r is weighted by the triangular area AOst,
point s is weighted by the area AOrt, and
point t is weighted by the area AOrs.
The weights are taken as a percentage, where the sum of all 3 weights equals 1. Now multiply the weight times its
associated control value to arrive at a weighted control value. Do this for each of the three points. Then add up the
3 weighted control values to triangulate an interpolated depth for point O.
Be aware, however, that choosing different meshes of triangles or entering the data in a different sequence may
result in a different set of contours for your map.
MAP DISPLAY TYPES

CONTOUR MAPS
Contour maps reveal overall trends in the data values. Hand contouring the data is an excellent way to become
familiar with the data set. Unfortunately, many data sets are too large to hand contour, so computer contouring is
often the only alternative.
At this preliminary stage of spatial description, the details of the contouring algorithm need not concern us as long
as the contour map is a good visual display of the data. Gridding and contouring of the data requires values to be
interpolated onto a regular grid. For a first pass through the data, the inverse distance algorithm is a good choice.
The inverse distance parameters are easy to set; then choose an octant isotropic octant search neighborhood with
about two data points per octant, if possible. Design the grid interval to be about the average spacing of the wells,
or half that size. Figure 1 (Grid mesh and data location of 55 porosity data points from North Cowden Field in West
Texas)
Figure 1
shows the grid design with respect to the data locations and Figure 2 is the resulting contour map using an inverse
distance approach with a distance power equal to 1.
Figure 2
In this example, the high porosity area is located in the upper right quadrant, extending down the right side of the
mapped area. There is a second region of high porosity in the southern, central portion of the area. We can see
that low values are generally trending north-south, with a zone of lower porosity trending east to west through the
central portion of the area. Displays such as this will aid in designing the spatial analysis strategy and help to
highlight directional continuity trends.
SYMBOL MAPS
For many large data sets (for example, seismic), posting individual sample values may not be feasible and
contouring may mask interesting local details (Isaaks and Srivastava, 1989). An alternative approach is a symbol
map. Different colors for ranges of data values can be used to reveal trends in high and low values. Figure 3 is a
five-color symbol map of 33,800 acoustic impedance values, scaled between 0 and 1, from a high resolution 3D
seismic survey.
Figure 3
Previous studies with this data set and its accompanying porosity data set (Chambers, et al., 1994) show that
acoustic impedance has a -0.83 correlation with porosity. Therefore, observation from this map may indicate zones
of high porosity associated with the red and orange areas (see contour and data posting maps, Figure 1 and
Figure 2). Low porosity is located in the blue and green areas. If this relationship holds, the seismic data can be
used to infer porosity in the inter-well regions using a geostatistical data integration technique commonly referred
to as cokriging (which we will describe later in the section on Data Integration).
INDICATOR MAPS
An indicator map is a special type of symbol map with only two symbols, for example, black and white. With these
two symbols, each data point is assigned to one of two classes. Indicator maps simply record when a data value is
above or below a certain threshold. Four indicator maps (Figure 4 ) were created from the acoustic impedance
data shown in Figure 3. The threshold values are 0.2, 0.4, 0.6, and 0.8 acoustic impedance scaled units.
Figure 4
These maps show definite trends in high and low values, which relate to trends in porosity values.
Figure 4 parts B and C are perhaps the most revealing. Zones of lowest scaled impedance are located in the
upper right quadrant of the study area (Figure 4 part B). Zones of highest impedance are on the western side of
the study area, trending generally north to south. There is also a zone of high impedance cutting east to west
across the southern, central portion of the area, with another north to south trend in the lower right corner of the
map area.
Data posting, contour, symbol and indicator maps provide us with a lot of information about the spatial
arrangement and pattern in our data sets. These are also excellent quality control displays and provide clues about
potential data problems.
MOVING WINDOW STATISTICS

INTRODUCTION
Moving window statistics provide a way to look for local anomalies in the data set, heteroscedasticity in statistical
jargon. In earth science data it is quite common to find data values in some regions that display more variability
than in other regions. For example, local high (thief zones) or low (barriers) zones of permeability hamper the
effective recovery of hydrocarbons and creates many more problems with secondary and tertiary recovery
operations.
The calculation of a few summary statistics (mean and standard deviation) within moving windows is often useful
for investigating anomalies in the data (Isaaks and Srivastava, 1989). The method is quite simple, and consists of :
dividing the area into local neighborhoods of equal size,
then computing summary statistics within each local area.
The window size depends on the average data spacing, dimensions of the study area, and amount of data. We are
looking for possible trends in the local mean and standard deviation. It is also important to see if the magnitude of
the local standard deviation tracks (correlates) with the magnitude of the local mean, known as the proportionality
effect. See Isaaks and Srivastava (1989) for more details on moving windows and the proportionality effect.
PROPORTIONALITY EFFECT
The proportionality effect concerns the relationship of the local summary statistics computed from moving
windows. There are four relationships between the local mean and the local variability (e.g., standard deviation or
variance). According to Isaaks and Srivastava (1989), these relationships are:
There is no trend in the local mean or the variability. The variability is independent of the magnitude of
the local mean. This is the ideal case, but is rarely seen.
There is a trend in the local mean, but the variability is independent of the local mean and has no
trend.
There is no trend in the local mean, but there is a trend in the local variability.
There is a trend in both the local mean and local variability. The magnitude of the local variability
correlates with the magnitude of the local mean.
For estimation purposes, the first two cases are the most favorable. If the local variability is roughly constant, then
estimates anywhere in the mapped area will be as good as estimates anywhere else. If the local mean shows a
trend, then we need to examine our data for signs of stationarity.
CONCEPT OF STATIONARITY
A stationary property is stable throughout the area measured. Stationarity may be considered the statistical
equivalent of homogeneity, in which statistical parameters such as mean and standard deviation are not seen to
change. Stationarity requires that values in the data set represent the statistical population.
Ideally, we would like our data to be independent of sample location. However, data often show a regular increase
(or decrease) in value over large distances, and the data are then said to be non-stationary, or to show a trend
(Hohn, 1988; Isaaks and Srivastava, 1989; Henley, 1981; Wackernagel, 1995).
The concept of stationarity is used in every day practice. For example, consider the following:
The top of Formation A occurs at a depth of about 975 feet TVDss.
This statement, however, does not preclude the possibility that Formation A varies in depth from well to well. Thus,
if Z(top) is a stationary random function, and
At location Z(xi), Formation A occurs at 975 feet TVDss, then
At location Z(xi + mile), Formation A should also occur at about 975 feet TVDss.
However, if Formation A is known to be non-stationary, then predicting the depth to the top of Formation A in the
new well is more difficult, and requires a more sophisticated model. We will discuss such models and how
stationarity influences them in the section on regionalized variables.
REGIONALISED VARIABLES
INTRODUCTION
In the reservoir, the variables of interest (e.g., porosity, permeability, sand/shale volumes, etc.) are products of a
variety of complex physical and chemical processes. These processes superimpose a spatial pattern on the
reservoir rock properties, so it is important to understand the scales and directional aspects of these features to
gain efficient hydrocarbon production. The spatial component adds a degree of complexity to these variables, and
serves to increase the uncertainty about the behavior of attributes at locations between sample points (sample
points are usually wells). Deterministic models cannot handle the uncertainties associated with such variables, so
a geostatistical approach has been developed because it is based on probabilistic models that account for these
inevitable uncertainties (Isaaks and Srivastava, 1989).
THE BASIS OF THE REGIONALIZED VARIABLE AND SPATIAL CORRELATION

Matheron, in his definitive work on Geostatistics entitled Traite de Geostatistique Appliquee (1963), laid the
foundation for regionalized variable theory, a body of theoretical statistics in which the location was for the first time
considered an important factor in the estimation procedure. Regionalized variable theory pertains to the statistics
of a special type of variable that differs from the ordinary scalar random variable. Although the regionalized
variable possesses the usual distribution parameters (mean, variance, etc.), it also has a defined spatial location.
Thus, two realizations of a regionalized variable that differ in spatial location will display in general a non-zero
correlation. However, successive realizations of an ordinary scalar random variable are uncorrelated (Henley,
1981).
Therefore, the premise of regionalized variables and spatial correlation analysis is to quantify the continuity of
sample properties with distance and direction. For example, we can intuitively surmise that two wells in close
proximity are more likely to have similar reservoir properties than two wells which are further apart. But just exactly
how far can wells be separated yet still yield similar results? We need a new statistical measure, because the
classical univariate or bivariate statistics cannot capture spatial correlation information. Spatial correlation analysis
is one of the most important steps in a geostatistical reservoir study, because it conditions subsequent processes,
such as kriging and conditional simulation results, and their associated uncertainties.
PROPERTIES OF REGIONALIZED AND RANDOM VARIABLES

One data set can have exactly the same univariate statistics as another set, and yet exhibit very different spatial
continuity. Consider the following two sequences (Figure 1a and Figure 1b: Comparison of porosity data measured
over 50 units of distance with an equal sampling interval):
Figure 1a
Figure 1b
This graphic shows a plot of porosity measures along two transects. The sample spacing is 1 unit of
distance. Sequence A, on the left (Regionalized Variable) shows spatial continuity in porosity,
whereas the Sequence B on the right (Random Variable) shows a random distribution of porosity.
However, the mean, variance and histogram for both porosity sequences are identical.
Statistical Properties Of The Porosity Profiles
The porosity profiles in the above graphic are purely hypothetical, but serve to illustrate the concepts behind the
regionalized variable and spatial correlation.
Similarities:
Same mean = 8.4%
Same standard deviation = 2.7 %
Same frequency distribution (histogram)
Differences:
Sequence A has SPATIAL CONTINUITY
Sequence B is RANDOM
Sequence A exhibits a structured or spatial correlation component, and hence is a REGIONALIZED VARIABLE.
Sequence B does not show any spatial continuity, and so is classified as a RANDOM VARIABLE.
These variables will come into play later, during the mapping process. The process of hand-contouring data points
on a map is a form of geostatistical modeling, where the geoscientist has a certain model in mind before he
attempts the contouring exercise. However, those who contour data usually assume the presence of the spatial
component (regionalized variable), but typically ignore the second component (the random variable).
Autocorrelation
Let us further investigate the concept of spatial continuity by plotting Sequences A and B in a different way. When
any of these data sequences is plotted against itself, it will yield a slope of 45 degrees, thus indicating perfect
correlation. However, if the data are translated by the sampling interval, then plotted against itself, we will begin to
see the impact of spatial correlation, or the lack of spatial correlation.
Sequence A Correlation Plots

Figure 2a, 2b,
Figure 2a
2c,
2b
2d,
2c
2e,
2d
and 2f
2f
(Correlation plots) will help to illustrate the concept of spatial autocorrelation. Here we see a reasonably good
correlation even with 3 units of translation of Sequence A.
2e
Sequence B Correlation Plots

When we compare Figure 3a, 3b,
Figure 3a
and 3c
3c
(Correlation cross-plots) with that of the previous graphic, we see a distinct difference in autocorrelation
characteristics. There is a poor correlation after one unit of translation of Sequence B.
3b
Observations
Sequences A and B (above) are presented as h-Scatterplots, where h represents lag, or separation distance.
Recall that the concept of the h-Scatterplot was discussed in the section on Bivariate Statistical Measures and
Displays. The h-Scatterplot forms the basis for describing a model of spatial correlation. The shape of the cloud on
these plots tells us how continuous the data values are over a certain distance in a particular direction. For this
case, h-Scatterplots were computed along two different transects. If the data values at locations separated by h
are identical, then they will fall on a line x = y, a 45-degree line of perfect correlation. As the data becomes less
and less similar, the cloud of points on the h-Scatterplot becomes fatter and more diffuse (Hohn, 1988; Isaaks and
Srivastava, 1989).
The following observations are readily apparent from the previous two figures:
Sequence A is Regionalized Variable and shows spatial continuity over about 3 units of distance.
Good correlation after three units of translation is shown
Correlation is 0.21 after five units of translation
Sequence B is a Random Variable with no spatial continuity.
Poor correlation after one unit of translation is shown
Correlation approaches 0 after two units of translation
The Random Function

The complex attributes we study in the reservoir are random functions, which are combinations of Regionalized
and Random Variables. Thus, the random function has two components:
Structured Component, consisting of the regionalized variable, which exhibits some degree of spatial
auto-correlation
Local Random Component, consisting of the random variable (also referred to as the nugget effect),
showing little or no correlation
The random function model assumes that:
The single measurement at location z(xi) is one possible outcome from a random variable located at
point Z(xi).
The set of collected samples, z(xi), i = 1, n, are interpreted as a particular realization of dependent
random variables, Z(xi), i = 1, n, known as a random function.
The process of quantifying spatial information involves the comparison of attribute values measured at one
location with values of the same attribute measured at other locations. This method is analogous to the hScatterplot. By studying the spatial dependency between any two measurements of the same attribute sampled at
z(xi) and z(xi+h), where h is some measurement of distance, we are essentially studying the spatial correlation
between two corresponding random functions Z(x i) and Z(xi+h).
For more information on the random function model, see: Hohn, (1988); Isaaks and Srivastava, (1989) Deutsch
and Journel (1992), Wackernagel, (1995) and Henley (1981).
SPATIAL CONTINUITY ANALYSIS

In previous sections, we discussed classical methods for analyzing single variables or multiple variables. Those
methods, however, could not properly address the spatial continuity and directionality that is inherent in earth
science data. Traditional interpolation procedures work on the assumption that spatial correlation within a data set
may be modeled by a linear function, based on the premise that as the distance between sampled locations
increases, the variability between data values increases proportionally. In practice, such a linear correlation
amounts to gross oversimplification. In this section, we will introduce the concept of auto-correlation, and the tools
used to measure this property.
SPATIAL AUTO-CORRELATION
Spatial auto-correlation describes the relationship between regionalized variables sampled at different locations.
Samples that are auto-correlated are not independent with regard to distance. The closer two variables are to each
other in space the more likely they are to be related. In fact, the value of a variable at one location can be
predicted from values sampled at other (nearby) locations.
The two common measures of spatial continuity are the variogram and its close relative, the correlogram, which
allow us to quantify the continuity, anisotropy and azimuthal properties of our measured data set.
THE VARIOGRAM
Regionalized variable theory uses the concept of semivariance to express the relationship between different points
on a surface. Semivariance is defined as:
(h) = [1/2N(h)] [(zxi) -( zxi+h)]2
Where:
h
zxi
= semivariance
= lag (separation distance)
= value of sample located at point xi
zxi+h = value of sample located at point xi+h

N(h) = total number of sample pairs for the lag interval h.
Semivariance is used to describe the rate of change of a regionalized variable as a function of distance. We know
intuitively that there should be no change in values (semivariance = 0) between points located at a lag distance h =
0, because there are no differences between points that are compared to themselves. However, when we compare
points that are spaced farther apart, we see a corresponding increase in semivariance (the higher the average
semivariance, the more dissimilar the values of the attribute being examined). As the distance increases further,
the semivariance eventually becomes approximately equal to the variance of the surface itself. This distance is the
greatest distance over which a variable measured at one point on the surface is related to that variable at another
point.
Semivariance is evaluated by calculating (h) for all pairs of points in the data set and assigning each pair to a lag
interval h. If we plot a graph of semivariance versus lag distance, we create a variogram (also known as a
semivariogram).
The variogram measures dissimilarity, or increasing variance between points (decreasing correlation) as a function
of distance. In addition to helping us assess how values at different locations vary over distance, the variogram
provides a way to study the influence of other geologic factors which may affect whether the spatial correlation
varies only with distance (the isotropic case) or with direction and distance (the anisotropic case).
Because the variogram is the sum of the squared differences of all data pairs falling within a certain lag distance,
divided by twice the number of pairs found for that lag, we use the variogram to infer the correlation between
points. That is, rather than showing how two points are alike, or predicting attribute value at each point, we actually
plot the difference between each value over a given lag distance.
The Experimental Variogram

The variogram described above is known as an experimental variogram. The experimental variogram is based on
the values contained in the data set, and is computed as a preliminary step in the kriging process. The
experimental variogram serves as a template for the model variogram, which is used to guide the kriging process.
THE CORRELOGRAM
The correlogram is another measure of spatial dependence. Rather than measuring dissimilarity, the correlogram
is a measure of similarity, or correlation, versus separation distance.
C(h) = 1/n Z(xi) -m] [Z(xi+h) -m]
Where:
m is the sample mean over all paired points, n(h), separated by distance h.
Computing the covariance for increasing lags (double, triple, etc.) allows us to generate a plot showing decreasing
covariance with distance, as shown in Figure 1a and 1b: omni-directional variogram (A)
Figure 1a
and correlogram (B) computed from the same data set.
1b
In this graphic, we see that while the variogram in Frame A measures increasing variability (dissimilarity) with
increasing distance, the correlogram in Frame B measures decreasing correlation with distance.
Anatomy of the correlation model

Notice that the variogram and correlogram plots in Figure 1a and 1b curve in opposite directions. Thus, the origin
represents zero variance for the variogram and perfect correlation for the correlogram; a measure of self-similarity.
The variogram model tends to reach a plateau called a sill (the dashed horizontal line at the top of Figure 1a). The
sill represents the maximum variance of the measured spatial process being modeled. The lag distance at
which the sill is reached by the variogram is called the range, which represents the maximum separation distance
at which one data point will be able to correlate with any other point in the data set. The range plays a role in
determining the maximum separation distance between grids.
The correlogram reaches its range when C(h) = 0. The correlation scale length is determined when the covariance
value reaches zero (no correlation).
Working from the same data set, the range for a variogram and correlogram should be the same for a given set of
search parameters. The sill and range are useful properties when comparing directional trends in the data.
Notice that the correlogram intersects the Y-axis at 1.0, but there is a discontinuity near zero for the variogram
model. Often the variogram or correlogram show discontinuity near the origin. Such a discontinuity is called the
nugget effect in geostatistical terminology. The nugget effect is generally interpreted as a residual variance or
spatially independent variability that occurs at spatial scales below the observational threshold of the sampling
-smaller than the resolution of the sample grid. It can also be caused by random noise at all scales, or
measurement error.
Variogram Search Strategies

If the sample data have a regular sampling interval, the search strategy is simple. Unfortunately, point data (well
data) rarely form a neat regular array, therefore to extract as much information out of the data as possible, rather
than searching along a simple vector, we search for data within a bin.
Search Parameters
When computing the experimental variogram (covariance model), the following search parameters must be taken
into account (Figure 2: Search strategy along azimuths 45 and 135 degrees.):
Figure 2
Lag: The lag distance is the separation distance, h, between sample points used in calculating the
experimental model.
In a producing field, a good starting distance might be the average well spacing, rather than the closest well
pair spacing. For example, West Texas fields are commonly drilled on 1340-foot (1/4-mile) spacing. However,
because all wells will not be spaced exactly 1340 feet apart, we should set the lag at 1400 feet. Then we will
program a lag tolerance to one-half the lag interval. Thus, for this example, the first lag bin is from 700 to 2100
feet. (Some programs may set the first bin centered around 350 feet to account for wells spaced closer than
700 feet.) The second lag bin is from 2100 to 3500 feet, and so forth until the maximum lag distance specified
is reached.
An important consideration in designing a lag strategy involves how we specify the size of the lag bin. If we
decrease the size of the bin, then we increase the number of bins, and hence the number points that we plot
on the variogram. This will increase the resolution of the variogram. However, by decreasing the size of each
bin, we also decrease the number of data pairs within each class. This has the effect of decreasing the
likelihood that the average semivariance for that class is accurately estimated.
Search Azimuth: Because reservoir data often exhibit directional properties, we may wish to specify
a certain direction for the search strategy. Such is the case when the continuity of a reservoir property is
more prevalent in one direction than in another direction. We say this property exhibits an anisotropic
behavior. The search azimuth also has an azimuth tolerance. For example, we may wish to calculate two
directions, at 045 and 135 degrees. By using a 45-degree tolerance about each search direction, we
will be able to cover all sample locations in the neighborhood.
Bandwidth: The bandwidth restricts the limits (width) of the azimuth tolerance at large lag distances.
In the above graphic, Point A is compared to Point B. The Bandwidth is indicated by a light dashed line about the
azimuthal direction (heavy dashed line) of 45 degrees. Point B lies within one of the search bins designated by the
lag tolerance.
Omni-directional Experimental Models

There are many ways to design a search strategy, each dictated by the data configuration and the number of
sample points. Quite often, it is necessary to conduct an omni-directional search, simply because of a sparse data
set. (By calculating an omni-directional variogram, we do not necessarily imply a belief that the spatial continuity is
the same in all directions.) An omni-directional search is designed by selecting a single azimuthal direction (does
not matter what angle is selected) and setting a tolerance of 90 degrees.
The omni-direction model is a good choice for an initial variogram, which can subsequently be refined by
calculating a directional model:
It is the average of all possible directional variograms.
It can serve as an early warning for erratic data points.
If the omni-directional variogram is not able to clearly define the spatial continuity, then it is unlikely
that spatial continuity will be established by directional variograms.
Anisotropic Experimental Models
Because earth science data is often more continuous in one direction than in another, we need to design a
variogram search strategy to model the maximum and minimum directions of continuity. In the Figure 2, the
minimum direction of continuity might align along the 45-degree azimuth. By definition, the maximum direction of
continuity is orthogonal (90 degrees) to the minimum axis. Note, however, that this assumption does not always
hold true, as the axes may change direction across the study area (e.g. meandering channel system), and it is
preferable to have a variogram that conforms to the major axis of anisotropy.
Accounting for Anisotropy
If we base our variogram search along two different azimuths, we often see the influence of anisotropy. By plotting
the results onto a common graph, we produce two variograms on the same chart. A study of each variogram
allows us to further characterize the nature of the anisotropy. Figure 3a and 3b
3b
(Variograms computed along different azimuths) shows two types of anisotropy:
Figure 3a
Geometric anisotropy (Frame A) is indicated by directional variograms that have the same sill, but
different ranges.
Zonal anisotropy (Frame B) is seen when variograms have different sills but the same range.
Practical Considerations for Computing the Experimental Spatial Variogram

The omni-directional variogram considers all azimuths simultaneously.
An omni-directional variogram contains more sample pairs per lag than any directional variogram, and
therefore is more likely to show structure.
An omni-directional variogram is the average of all directional variograms.
The Nugget Effect is more easily determined from the omni-directional variogram.
May need to clean-up the data prior to calculating the variogram; variograms are very sensitive to
outliers.
Do not consider variogram values for distances greater than about the size of the study area.
Interpret a variogram only if the corresponding number of pairs is sufficient (e.g., 15 to 20 pairs).
A saw-toothed pattern may indicate a poor choice of lag increment.
If data are skewed, consider transforming the data (e.g. Gaussian distribution).
Consider data clustering.
The variogram computation involves a decision of stationarity.
Non-stationary variograms do not reach a sill and are considered unbounded (have a characteristic
parabolic upward shape).
Advantages And Limitations of Variograms And Correlograms

Given a sufficient number of data points, variograms and correlograms provide useful tools to:
measure linear spatial dependence
quantify spatial scales
identify and quantify anisotropy
test multiple geological scenarios
However, variograms and correlograms are subject to limitations:
They are measures of linear spatial interdependence, and may not be appropriate for non-linear
processes.
Spatial correlation analysis is difficult to perform when data are sparse.
It is often difficult to select a domain of stationarity (constant mean) for computation.
SPATIAL CROSS-CORRELATION ANALYSIS

The previous discussion focused on spatial analysis of only a single variable (e.g., comparing porosity values to
other nearby porosity values), a process known as auto-correlation. To study the spatial relationships between two
or more variables we use the process of cross-correlation.
The cross correlation model is useful when performing cokriging or conditional cosimulation (e.g., matching well
data with seismic data). The cross correlogram or cross variogram describes spatial correlation in which the paired
points represent different variables. In this case, known values of one variable are compared to known values of a
different variable.
If for example, we need to estimate porosity based on measurements of acoustic impedance, then it is first
necessary to compute the auto-correlation models for both attributes and then compute the cross-correlation
model, as seen in Figure 1a, 1b,
Figure 1a
and 1c
1c
(Omni-directional variograms of Porosity (A), Acoustic Impedance (B), and their cross variogram (C)).
1b
In this graphic, the solid squares on the figures represent the average of porosity or acoustic impedance data pairs
for each 500-unit lag interval. The numbers of data pairs are displayed next to the average experimental data
point. The first point contains only one data pair and should not be taken into consideration during the modeling
step.
Below are the general variogram equations for the primary attribute (porosity), the secondary attribute (acoustic
impedance), and their cross variogram.
Consider the following:
Z(xi) = the primary attribute measured at location xi
Z(xi + h) = the primary attribute measured at xi + some separation distance (lag), h
T(xi) = the secondary attribute measured at location xi
T(xi + h) = the secondary attribute measured at xi + some separation distance (lag), h
N = the number of data points
The variogram of the primary attribute is calculated as(Figure 1a):
then
(h) = 1/2N Z(xi)] -[Z(xi+h)]2
The variogram of the secondary attribute is calculated as(Figure 1b):
(h) = 1/2N T(xi)] -[T(xi+h)]2
The cross variogram between the primary and secondary attribute is calculated as(Figure 1c):
(h) = 1/2N Z(xi) -Z(xi+h)] [T(xi) -T(xi+h)]

SUPPORT EFFECT
Most reservoir studies are concerned with physical rock samples, with observations corresponding to a portion of
rock of finite volume. It is obvious that once a piece of rock is collected (e.g. cores, hand samples, etc.) from a
location, it is impossible to collect it again from the same location (Henley, 1981).
The shape and volume of the rock are collectively termed the support of the observation. If the dimensions of the
support are very small in comparison to the sampling area or volume (e.g. the reservoir), the sample can be
considered as point data (Henley, 1981). For much of our work the support does not influence our mapping, until
we compare samples of different sizes or volumes (e.g. core plugs and whole core, or wireline data and core).
The support effect becomes significant when combining well data and seismic data. Seismic data can not be
considered point data. The volumes of measurement differences between well logs and cores versus seismic data
is very large and should not be ignored. Geostatistical methods can account for the support effect using a
variogram approach. As the support size increases, the variance decreases until it remains constant after reaching
a certain area or volume (Wackernagel, 1995).
STATIONARITY IN REGIONALIZED VARIABLES

Stationarity ensures that the spatial correlation can be modeled with a positive definite function and states that the
expected value which may be considered the mean of the data values is not dependent upon the distance
separating the data points. Mathematically, this assumption means that the expected value of the difference
between two random variables is zero. This is denoted by the following equation:
E[Z(xi+h)-Z(xi)] = 0 for all xi, h
Where:
Z(xi), Z(xi +h) = random variables
E = expected value
xi = sampled location
h = distance between sampled locations
Stationarity is defined through the first-order (mean) and second-order (variability) moments of the observed
random function, and degrees of stationarity correspond to the particular moments that remain invariant across the
study area (Hohn, 1988). For a random variable, Z(x), observed at location x, the distribution function of Z(x) has the
expectation
E Z(x) = m(x)
which can depend upon x. This is the first-order moment.
Three second-order moments are useful in geostatistics:
1. The variance of the random variable Z(x):
VAR Z(x) = E [Z(x) - m(x) ]2
2. The covariance:
C(x1 -x2) = E [Z(x1) - m(x1) ] [Z(x2) - m(x2],

where
Z(x1) and Z(x2) are two random variables observed at locations x1 and x2.
3. The semivariogram function:
(x1, x2) = VAR [Z(x1) -Z(x2)] / 2
Under conditions of second-order stationarity, the semivariogram and covariance are alternative measures of
spatial autocorrelation (Hohn, 1988; Isaaks and Srivastava, 1989; Henley, 1981; Wackernagel, 1995).
HANDLING NON-STATIONARY DATA

An ongoing debate centers on how to handle non-stationary data. Some argue that stationarity is a matter of scale
(Hohn, 1988; Isaaks and Srivastava, 1989; Henley, 1981; Wackernagel, 1995). A variable of interest may have a
trend across an area, and so it would be deemed non-stationary. However, others contend that they can define
quasi-stationarity as local stationarity, if the maximum distance h used in computing the semivariogram or
covariance is much smaller than the scale of the trend (Hohn, 1988).
The impact of non-stationarity depends in part on the sampling scale in relation to the scale of the trend. With
sufficient sampling, stationarity can be achieved. Unfortunately, the geoscientist seldom has total control over the
sample distribution.
If a regionalized variable is non-stationary, it can be regarded as a composite of two parts, the residual and the
trend.
Z(x) = Y(x) + m(x)
Where:
Y(x) has an underlying variogram (residual)
m(x) can be approximated by a polynomial (trend)
Note: The univariate probability law [the probability of remaining at the same value at location Z (xi) and at
Z(xi+h)] does not depend on the location of x, but only on the separation distance, h (Isaaks and
Srivastava, 1989).
In practice, it is possible to ignore the trend if the data set contains a large number of data points. However, if the
data are sparse, the variogram of the residuals should be computed, which means detrending the data.
HANDLING TRENDS
The following approach is often used to detrend the data. However, this approach also has its problems.
1. Stationarize the data.

a.
determine the trend on the sample data (trend surface analysis)
b.
subtract the trend from the data (usually from the well data)
2. Compute the variogram (correlogram) on the residuals.

3. Obtain kriging or conditional simulation of the residuals on the grid.
4. Krig the trend to the grid.
5. Calculate the final gridded results by adding residuals to the trend.
Although this is a reasonable approach, the main hurdle is the correct determination of the trend to remove from
the raw data. In addition, the trend removed by this method is the global trend, but perhaps we should be working
at a local scale (e.g., neighborhood scale).
MODEL VARIOGRAMS
The experimental variogram and correlogram described in the previous section are calculated only along specific
inter-distance vectors, corresponding to angular/distance classes. After computing the experimental variogram, the
next step is to define a model variogram. This variogram is a simple mathematical function that models the trend in
the experimental variogram. In turn, this mathematical model of the variogram is used in kriging computations.
The kriging and conditional simulation processes require a model of spatial dependency, because:
Kriging requires knowledge of the correlation function for all-possible distances and azimuths.
The model smoothes the experimental statistics and introduces geological information.
Kriging cannot fit experimental directional covariance models independently, but depends upon a
model from a limited class of acceptable functions.
Consider a random function Z(x) with an auto-covariance C(h):
Define an estimator, Z = iZ(xi)
The variance of Z is given by: z2 = ijC(xi
-xj)
The variance must be positive (positive definiteness criterion)

for any choice of weights (i and j),
and any choice of locations (xi and xj)
To honor the above inequality, the experimental covariance model must be fit with a positive definite
C(h).
Spatial modeling is not curve fitting, in the least squares sense. A least squares model does not satisfy the positive
definiteness criterion. The shape of the experimental model usually constrains the type of model selected,
although any model can be applied, affecting the final kriged results.
THE MODEL CHOICE

The important characteristics of the spatial model are its behavior near the origin and behavior at large distances
from the origin.
BEHAVIOR NEAR THE ORIGIN

The behavior near the origin affects the short scale variability of the final map plot.
Figure 1a, 1b,
Figure 1a
1c,
1b
and 1d:
1d
Variogram behavior near the origin.
1c
Frame A shows purely random behavior. Frame B is linear, with some degree of random component. Frame C is
highly continuous, while Frame D exhibits linear behavior.
BEHAVIOR AT LARGE DISTANCES

After the variogram reaches its variance or sill, we can describe its behavior as either bounded, or unbounded.
Figure 2a and 2b: (Variogram behavior at large distances after reaching the variance of the data) shows an
example of each type of behavior.
Figure 2a
In this example, we note that the bounded variogram (Frame A) reaches a sill and remains at the sill value for an
infinite distance. This behavior is typical of the classic variogram. Meanwhile, the unbounded variogram (Frame B)
never plateaus at the sill, but shows a continuous increase in variance with increasing distance. Variograms
displaying this characteristic are typical of data that possess a trend.
2b
BASIC COVARIANCE FUNCTIONS

Basic covariance functions for modeling variograms have the following characteristics:
simple, isotropic functions
independent of direction: correlograms are equal to 1 at h = 0,

while variograms are equal to 0 at h = 0
variance reaches or approaches the sill beyond a certain distance

(the range or correlation length), a
Figure 3a, 3b,
Figure 3a
3c,
3b
3d,
3c
3e,
3d
3f,
3e
and 3g
3g
(Common variogram models) shows a variety of models.
3f
The Spherical model is the most commonly used model, followed by the Exponential. The Gaussian and Exponential
functions reach the sill asymptotically. For such functions, the range is arbitrarily defined as the distance at which the
value of the function decreases to 5%. The Nested model is a linear combination of two spherical structures; having short
and long scale components. The Hole model is used for variograms computed from data which has a repeating pattern.
The Hole model can be dangerous because the periodicity will show up in a map, although it is not present in the data.
There does not appear to be any relationship between depositional environment and variogram shape.
Below are equations for four common variogram models.
LINEAR MODEL
The linear model describes a straight line variogram. This model has no sill, so the range is defined arbitrarily to be
the distance interval for the last lag class in the variogram. (Since the range is an arbitrary value it should not be
compared directly with ranges of other models.) This model is described by the following formula:
(h) = Co + [h(C/Ao)]
where
h = lag interval,
Co = nugget variance > 0,
C = structural variance > Co, and
Ao = range parameter
SPHERICAL MODEL
The spherical model is a modified quadratic function where the range marks the distance at which pairs of points
are no longer autocorrelated and the semivariogram reaches an asymptote. This model is described by the
following formula:
(h) = Co + C [1.5(h/Ao) -0.5(h/Ao)3]
(h) = Co + C
for h < Ao
for h > Ao
where
h = the lag distance interval,
Ao = range
EXPONENTIAL MODEL
This model is similar to the spherical variogram in that it approaches the sill gradually, but differs in the rate at
which the sill is approached and in the fact that the model and the sill never actually converge. This model is
described by the following formula:
(h) = Co + C[1-exp(-h/Ao)]
where
h = lag interval,
In the exponential model, Ao is a parameter used to provide range, which, in the exponential model, is usually
assumed to be the point at which the model approaches 95% of the sill (C+Co). Range is estimated as 3Ao.
GAUSSIAN MODEL
The gaussian or hyperbolic model is similar to the exponential model but assumes a gradual rise for the yintercept. This model is described by the following formula:
(h) = Co + C[1-exp(-h2/Ao2)]
where
h = lag interval,
The range parameter in this model is simply a constant defined as that point at which 95% of the sill is
approached. The range can be estimated as 1.73A o (1.73 is the square root of 3).
Practical Guidelines For Variogram Modeling

Do not over fit; use the simplest model (fewest number of structures).
Do not fit the covariance for distances greater than the study area.
Pay special attention to the fit for small distances and the size of the nugget effect.
The nugget acts as a smoothing function during kriging.
The nugget adds variability during conditional simulation.
Beware of features that may relate to non-stationarity.
Beware of periodic oscillations.
Is hole effect related to true structure?
Is hole effect due to sparse data at given lag intervals?
Compute and fit the covariance along the direction of maximum and minimum continuity using a
single structure if possible.
KRIGING OVERVIEW
INTRODUCTION
Contouring maps by hand or by computer requires the use of some type of interpolation procedure. As previously
shown in the section on Gridding and Interpolation, there are a number of algorithms for computer-based
interpolation. At the other end of the spectrum is the geologist who maps by hand, interpolating between data
points (or extrapolating beyond the control points), drawing contours, smoothing the map to make it look real and
perhaps biasing the map with a trend based on geological experience (Hohn, 1988). This section provides a broad
overview of the computer-intensive interpolation process which lies at the heart of geostatistical modeling.
LINEAR ESTIMATION
Kriging is a geostatistical technique for estimating attribute values at a point, over an area, or within a volume. It is
often used to interpolate grid node values in mapping and contouring applications. In theory, no other interpolation
process can produce better estimates (being unbiased, with minimum error); though the effectiveness of the
technique actually depends on accurately modeling the variogram. The accuracy of kriging estimates is driven by
the use of variogram models to express autocorrelation relationships between control points in the data set.
Kriging also produces a variance estimate for its interpolation values.
The technique was first used for the estimation of gold ore grade and reserves in South Africa (hence the origin of
the term Nugget Effect), and it is named in honor of a South African mining engineer, Danny Krige. The
mathematical validity and foundation was developed by Georges Matheron, who later founded the Centre de
Geostatistiques, as part of the Ecole des Mines in Paris, France. (Henley, 1981; Hohn, 1988; Journel, 1989;
Isaaks and Srivastava, 1989; Deutsch and Journel, 1992; Wackernagel, 1995).
KRIGING FEATURES
Kriging is a highly accurate estimation process which:
minimizes estimation error (the difference between measured value - the re-estimated value)
honors hard data
does not introduce an estimation bias
does not reproduce inter-well variability
produces a smoothed result; like all interpolators
is a univariate estimator; requiring only one covariance model
weighs control points according to a spatial model (variogram)
tends to the mean value when control data are sparse
uses a spatial correlation model to determine the weights ()
assigns negative or null weights to control points outside the correlation range of the spatial model
indicates the global relative reliability of the estimate through RMS error (kriging variance), as a byproduct of kriging
has a general and easily reformulated kriging matrix, making it a very flexible technique to use more
than one variable
declusters data before the estimation
Types of Kriging
There are a number of kriging algorithms, and each is distinguished by how the mean value is determined and
used during the estimation process. The four most commonly used methods are:
Simple Kriging: The global mean is known (or supplied by the user), and is held constant over the
entire area of interpolation.
Ordinary Kriging: The local mean varies, and is re-estimated based on the control points in the
current search neighborhood ellipse.
Kriging with an External Drift: Although this method uses two variables, only one covariance model
is required, and the shape of the map is related to a 2-D attribute which guides the interpolation of the
primary attribute known only at discrete locations. A typical application is time-to-depth conversion,
where the primary attribute (such as depth at the wells) acquires its shape from the secondary attribute,
referred to as external drift (such as two-way travel time known on a 2-D grid).
Indicator Kriging: estimates the probability of an attribute at each grid node (e.g., lithology,
productivity). The technique requires the following parameters:
Coding of the attribute in binary form, as 0 or 1.
Prior Probabilities of both classes.
Spatial covariance model of the indicator variable.
The Kriging Process

We will illustrate the estimation process with an example problem, as shown in Figure 1: Arrangement of three
data points.
Figure 1
Given samples located at (Z), where = 1, 2, 3 Find the most likely value of the variable Z at the target point (grid
node: Z0*, Figure 1). In this graphic, we see the geometrical arrangement of three data points Z , the location of the
point whose value we wish to estimate Z0*, and the unknown weights, .
Consider Z0* as a linear combination of the data Z
Z0* = 0 + Z
Where: = 1 and 0 = mz -
Determine so that:
Z0* is unbiased: E [Z0* -Z] = 0
Z0* has minimum mean square error (MSE)
E [Z0* -Z]2 is minimum
Recall that the unknown value Z0* is estimated by a linear combination of n data points plus a shift parameter 0:
Z0* = 0 + Z
(1)
By transforming the above equation into a set of linear normal equations, we solve the following to obtain the
weights . The set of linear equations takes the following form:
j C (x, xj) - = c (x, x0) for all j = 1,n
(2)
or in matrix shorthand notation:

C=c
(3)
All three terms are matrices where:
C (x, xj) represents a covariance between sample points x and xj
c (x, x0) represents a covariance between a sample located at x and the target point x0; the
estimated point
are the unknown weights, j
is a Lagrange multiplier that converts a constrained minimization problem into an unconstrained

minimization.
Determine the matrix of unknown weights by solving the matrix equation for as follows:
C=c
(4)
Where
= C-1 c
(5)
Note that equation 3 is written in terms of covariance values; however we either modeled a variogram or
correlogram, not the covariance. We use the covariance values because it is computationally more efficient.
The covariance equals the sill minus the variogram (Figure 2 : Relationship between a spherical variogram and its
covariance equivalent):
Figure 2
C(h) = 2 (sill) -(h)
(6)
Kriging Variance
In addition to estimating the value of a variable at an unsampled location, the kriging technique also provides an
estimation of the likely error (in the form of error variance) at every grid node. These error estimates can be
mapped to give a direct assessment of the reliability of the contoured surface. Because the kriging variances are
determined independent of the data values, the kriging error is not a measure of local reliability (Deutsch and
Journel, 1992). Do not attempt to use the kriging standard deviation like the true classical standard deviation
statistic.
The kriging variance equation is:
2k = C(x0, x0) i(x, x0) -
(7)
Search Neighborhood Criterion

All interpolation algorithms require good data selection criteria, specified by a search neighborhood. The model
variogram plays a role in controlling extent of the neighborhood. The variogram range defines the maximum size of
the neighborhood from which control points should be selected for estimating a grid node, in order to take
advantage of the statistical correlation among the observations.
A typical geostatistical routine might interpolate values for a specific location using nearest neighbor values
weighted by distance and the degree of autocorrelation present for that distance (as defined by the variogram
model). The neighborhood searches would be limited to a specified number of nearest neighbors, and might also
be restricted to a particular direction.
Search neighborhood parameters include:
Search radius
Neighborhood shape (isotropic or anisotropic: Figure 3a and 3b)
3b
Number of sectors ( 4 or 8 are common)
Figure 3a
Number of data points per sector
Unique Neighborhoods use all data points (practical limit is 100)
Moving Neighborhoods use a limited number of points per sector (e.g., 4)
Azimuth of major axis of anisotropy
In this graphic, note the elliptical shape of the anisotropic search neighborhood and the circular shape for the
isotropic neighborhood. Both neighborhoods are divided into octants, with a maximum of two data points per
sector.
These graphic shows the radii for the anisotropic neighborhood are: minor axis = 1000 meters and major axis =
4000 meters, aligned at N15E. The isotropic model has a 1500-meter radius length. The center of the
neighborhood is the target grid node for estimation. There are 55-sample points (x) within the study area. Weights
are shown for data control points used for the estimation at the target point.
Practical Considerations in Designing the Search Neighborhood

Align the search axis with the direction of maximum anisotropy.
Search radii (if anisotropic) should be to the correlation ranges.
Each quadrant should have enough points ( 4) to avoid directional sampling bias.
CPU time and memory requirements grow rapidly as a function of the number of data points in
neighborhood.
In theory, more data in the kriging system reduces the mean square error.
In practice, the covariance is poorly known for distances exceeding about to 2/3 the size of the
field. Including points that are more distant may actually increase the error.
The kriging estimator is built from data within the search neighborhood centered at the target location
Zo*.
Practical Considerations: Unique versus Moving Search Neighborhoods

In a moving search neighborhood, a new simple kriging (SK) or ordinary kriging (OK) system of
equations is solved at each grid node.
A unique search neighborhood uses all the data, so the left side of the kriging matrix, C, is solved only
once and used at each grid node.
If sufficient wells are available for ordinary kriging, then a moving neighborhood is preferable to the
unique neighborhood.
Unique neighborhoods tend to prevent artifacts from abrupt changes in the number and values of the
data points.
Unique neighborhoods smooth the data more than moving neighborhoods.
Practical Considerations: Ordinary (OK) versus Simple Kriging (SK)

Simple kriging does not adapt to local trends; rather it relies on a constant, global mean.
Ordinary kriging uses a local mean (mz), which amounts to re-estimating mz at each grid node from
the data within the search neighborhood.
When all data points are used (unique neighborhood), ordinary kriging and simple kriging yield similar
results.
If only a few data points are available in the local search neighborhood, ordinary kriging may produce
spurious weights because of the constraint that the weights must sum to 1.
If the wells are known to provide a biased sampling, it may be better to impose your own mz with
simple kriging rather than use ordinary kriging.
Effects of Variogram Parameters on Kriging

Kriging applies weighting functions according to a mathematical model of the variogram. The resulting kriging
estimates are best linear unbiased estimates of the surface, provided that the surface is stationary and the correct
form of the variogram has been determined. As the shape of the model variogram changes, so do the kriging
results.
Rescaling the variogram or correlogram (to create a larger or smaller sill):
Has no affect on kriging estimate
Changes the kriging variance
Increasing the nugget component:
Acts as a smoothing term during kriging (makes weights more similar)
Increasing the range tends to increase the influence of more distance data points and leads to
smoother maps.
The shape of the variogram or correlogram near the origin influences the continuity of the
interpolation process (e.g., the gentler the slope, the smoother the interpolation). See Figure 4a, 4b,
Figure 4a
4c,
4b
4d,
4c
4e,
4d
4f,
4e
4g,
4f
4h,
4g
and 4i:
4i
Kriging results from a common data set, based on different variogram models.
4h
In this graphic, Frames A-H use the isotropic neighborhood design shown in Figure 3b. The nested model (Frame
F) used two spherical variograms, with a short range = 1000 meters and a long range of 10,000 meters. Nested
models are additive. The anisotropic model (Frame I) used the anisotropic neighborhood design shown in Figure 3.
The minor axes of the variogram model = 1000 meters, with a major axis = 5000 meters (5:1 anisotropy ratio),
rotated to N15E. The color scale is equivalent for all figures. Purple is 5% porosity and red is 13%. All these
illustrations were created using the same input data set.
Advantages of Kriging
Kriging is an exact interpolator (if the control point coincides with a grid node).
Kriging variance:
Relative index of the reliability of estimation in different regions.
Good indicator of data geometry.
Smaller nugget (or sill) gives a smaller kriging variance.
Minimizes the Mean Square Error.
Can use a spatial model to control the interpolation process.
A robust technique (i.e., small changes in kriging parameters equals small changes in the
results).
Disadvantages Of Kriging
Kriging tends to produce smooth images of reality (like all interpolation techniques). In doing so, short scale
variability is poorly reproduced, while it underestimates extremes (high or low values). It also requires the
specification of a spatial covariance model, which may be difficult to infer from sparse data.
Kriging consumes much more computing time than conventional gridding techniques, requiring numerous
simultaneous equations to be solved for each grid node estimated. The preliminary processes of generating
variograms and designing search neighborhoods in support of the kriging effort also require much effort. Therefore,
kriging probably is not normally performed on a routine basis; rather it is best used on projects that can justify the
need for the highest quality estimate of a structural surface (or other reservoir attribute), and which are supported
by plenty of good data.
CROSS-VALIDATION
Cross-validation is a process for checking the compatibility between a set of data, the spatial model and
neighborhood design. In cross-validation, each point in the spatial model is individually removed from the model,
and then its value is estimated by a covariance model. In this way, it is possible to compare estimated versus
actual values.
The procedure consists of the following steps:
1. Consider each control point in turn.
2. Temporarily suppress each control point from the data set.
3. Re-estimate each point from the surrounding data using the covariance model.
4.
Compare the estimated values, Zest, to the true values, Ztrue.

This also provides a re-estimation error (kriging variance is also calculated at the same time):
RE = Zest -Ztrue
5. Calculate a standardized error:

SE = RE/krig
Ideally, it should have a zero mean and a variance equal to 1.
The numerator is affected by the range
The denominator is affected by the sill
6. Average the errors for a large number of target points to obtain:
Mean error
Mean standard error
Mean squared error
Mean squared standardized error
7. Distribution of errors (in map view) can provide useful criterion for:
Selecting a search region
Selecting a covariance model
8. Any data point whose absolute Standardized Error 2.5 is considered an outlier, based on the fact
that the data point falls outside the 95% confidence limit of a normal distribution.
USEFUL CROSS-VALIDATION PLOTS

Figure 1a, 1b,
Figure 1a
1c,
1b
and 1d
1d
(Cross-validation plots) show output from a cross-validation test.
1c
Figure 1a shows a map view of the magnitude of the Re-estimation Error (RE). Open circles are over-estimations;
solid circles are under-estimations. The solid red circle falls outside 2.5 standard deviation from a mean = 0. Also,
look for intermixing of the RE as an additional indication of biasing. A good model has an equally likely chance of
over or under estimating any location.
Figure 1b is a cross plot of the measured attribute of porosity at the wells versus porosity re-estimated at the well
locations during the cross validation test. Again, open circles are under estimates. The red, solid circle is the
sample from Figure 1a.
The two most important plots are in Figure 1c and 1d because they help identify model bias. If the histogram of
standardized error (SE) in Figure 1c is skewed, if or there is a correlation between SE and the estimated values in
Figure 1d, then the model is biased. Such is not the case in this example; however, Figure 2a, 2b,
Figure 2a
2c,
2b
2c
and 2d (Cross-validation results from a biased model) is.
2d
In Figure 1a, the over estimated RE values are clustered in the center of the map. The histogram of SE (Figure 1c)
is slightly skewed towards over estimation. Finally, there is a positive correlation between SE and estimated
porosity. These indicate poor model design, poor neighborhood design, or both.
COKRIGING
INTRODUCTION
In the previous section, we described kriging with a single attribute. Rather than only consider the spatial
correlation between a set of sparse control points, we will now describe the use of a secondary variable in the
kriging process.
In this section, you will learn about multivariate geostatistical data integration techniques, which fall into the
general category called Cokriging and you will learn more about External Drift.
There are many situations when it is possible to study the covariance between two or more regionalized variables.
The techniques introduced in this section are appropriate for instances when the primary attribute of interest (such
as well data) is sparse, but there is an abundance of related secondary information (such as seismic data). The
mutual spatial behavior of regionalized variables is called coregionalization.
The estimation of a primary regionalized variable (e.g., porosity) from two or more variables (such as acoustic
impedance) is known as Cokriging.
TYPES OF COKRIGING
Cokriging is a general multivariate regression technique which has three basic variations:
Simple Cokriging uses a multivariate spatial model and a related secondary 2-D attribute to guide
the interpolation of a primary attribute known only at control points (such as well locations). The mean is
specified explicitly and assumed to be a global constant. The method uses all primary and secondary
data according to search criterion.
Ordinary Cokriging is similar to Simple Cokriging in that the mean is still assumed to be constant,
but it is estimated using the neighborhood control points rather than specified globally.
Collocated Cokriging is a reduced form of Cokriging, which requires knowledge of only the hard
data covariance model, the Product-Moment Correlation coefficient between the hard and soft data, and
the variances of the two attributes. There is also a modified search criterion used in Collocated
Cokriging. This method uses all the primary data, but, in its simplest form, uses only one secondary data
value, the value at the target grid node.
PROPERTIES OF COKRIGING
This method is a powerful extension of kriging, which:
must satisfy the same conditions as kriging:
it minimizes the estimation error.
is an unbiased estimator
honors the hard data
control points are weighed according to a model of coregionalization
is more demanding than kriging:
requires a simple covariance model the for the primary and all secondary attributes
requires cross-covariance models for all attributes
must be modeled with a single coregionalized model
Requires neighborhood searches that are more demanding
requires more computation time
DATA INTEGRATION
Besides being able to use a spatial model for determining weights during estimation, one of the more powerful
aspects of the geostatistical method is quantitative data integration. We know from classical multivariate statistics
that models developed from two or more variables often produce better estimates. We can extend classical
multivariate techniques into the geostatistical realm and use two or more regionalized variables in this
geostatistical estimation process.
Basic Concept
Well illustrate this concept by way of example. From our exploratory data analysis we might find a good correlation
between a property measured at well locations and a certain seismic attribute. In such a case, we might want to
use the seismic information to provide better inter-well estimates than could be obtained from the well data alone.
Even when the number of primary (well) data (e.g., porosity) are sparse, it is possible to use a densely sampled
secondary attribute (e.g., seismic acoustic impedance), in the interpolation process.
Well data have excellent vertical resolution of reservoir properties, but poor lateral resolution. Seismic data, on the
other hand, have poorer vertical resolution than well data, but provide densely sampled lateral information.
Geostatistical data integration methods allow us to capitalize on the strengths of both data types, to yield higher
quality reservoir models.
The Cokriging Process

We can illustrate the cokriging process by way of the following problem:
Given samples located at Z, and seismic data located on a grid, find the most likely value of the variable Z 0* at
the target grid node (Figure 1: Arrangement of control points, seismic grid, weights, and target grid node).
Figure 1
This graphic shows the geometrical arrangement of three data control points Z (where = 1,2,3), a grid of
seismic data, the unknown weights, , and the target grid node, Z0*.
Traditional Cokriging Estimator

The general Cokriging estimator is expressed as a weighted linear combination of the primary (well) data
Z1, . , Zn and the secondary (seismic) data T1, . , Tm. The traditional cokriging approach uses only secondary
data at the well locations, so if there is a very high correlation between the primary and secondary data, it is the
same as kriging the primary data only. This condition is known as self-krigability.
Zo * =
Z + T + c
Where:
= 1 (primary weights)
j = 0 (secondary weights)
c = mZ [1-
] -mT
to ensure unbiasness.
Requirements
CZZ (h) is the spatial covariance model of the primary attribute (well data).
CTT (h) is the spatial covariance model of the secondary attribute (seismic data).
CZT (h) is the spatial cross-covariance model of well and seismic data.
Self-Krigability in the Isotropic Case

A variable is defined as self-krigable when the cokriging estimate (in the isotropic case) is identical to its kriging
estimate.
Different Cases
No Correlation:
Perfect Correlation:
Z1Z2 (h) = 0
b ij b ii b jj
i,j
Intrinsic Correlation occurs when the simple and cross-variograms are proportional, which is always
the case if there is only one basic structure.
Cokriging: Example Using 3 Data Points

Data Configuration
Figure 2 shows a typical data configuration for traditional cokriging.
Figure 2
Estimator
Z0* = Z1Z1 + Z2Z2+ Z3Z3 +T1T1 +T2T2 +T3T3
Estimated Error
2 = CZ00 -Z1CZ01 -Z2CZ02- Z3CZ03- T1CT01- T21CT02- T3CT03 -
Collocated Cokriging
Collocated cokriging is a modification of the general cokriging case:
It requires only the simple covariance model of the primary attribute in its simplest form. In the case of
sparse primary data, the covariance model is often derived from the covariance model of the densely
sampled secondary attribute.
It uses all primary data according to search criterion.
It uses secondary data attribute located only at the target grid (simplest form) node during estimation.
If the secondary attribute covariance model is assumed proportional to the primary attribute
covariance model, then:
the correlation coefficient is the constant of proportionality.
we can use the correlation coefficient and the ratio of the secondary to primary variances to
transforms a univariate covariance model into a multivariate covariance model. This assumption
is termed Markov-Bayes assumption (Deutsch and Journel, 1992)
Example Using 3 Data Points

Figure 3 illustrates a typical data configuration for collocated cokriging.
Figure 3
In this graphic, the secondary data at the estimation grid node is the only bit of seismic data used in this forma of
the algorithm. Forms that are more complex combine this data configuration with the one shown in Figure 1, which
also increases the computation time substantially.
Estimator
Z0* = Z1Z1 + Z2Z2+ Z3Z3 +T0T0
Estimated Error
2 = CZ00 -Z1CZ01 -Z2CZ02-Z3CZ03-T0CT0 -
General Cokriging Versus Collocated Cokriging

Cokriging
A secondary variable is not required at all nodes of the estimation grid.
The traditional method does not incorporate secondary information from non-collocated data points, it
uses only data located at the primary sample locations.
Cokriging requires more modeling effort: CZZ(h), CYY(h) and CZY(h) must be specified. No assumption
regarding the relationship between the cross-covariance and the auto-covariance of the primary variable
is required.
It is impractical to incorporate more than two to three secondary variables into the cokriging matrix
because of increased modeling assumptions and computational time.
System of normal equations may be ill conditioned; that is, it is often difficult to find a common model
of coregionalization.
Collocated Cokriging
Collocated cokriging assumes that the secondary variable is known at all nodes of the estimation grid
and uses all secondary information during the estimation process.
The simplest form of collocated cokriging ignores the influence of non-collocated secondary data
points, because it uses the secondary data located only at the target grid node.
Collocated cokriging only requires the knowledge of the primary covariance model (C ZZ(h)), the
variances of the primary and secondary attributes (2Z, 2T) and the correlation coefficient between the
primary and secondary attributes (ZY).
The Markov-Bayes approach to collocated cokriging assumes that the cross-covariance is a scaled
version of the primary variable auto-covariance.
In general, the system of normal equations is well conditioned.
Practical Considerations: Cokriging and Collocated Cokriging

In general, increasing the number of data points of the secondary variable does not improve the
cokriging performance. In fact, as the number increases, the cokriging system can become unstable.
During extrapolation situations (i.e., no well data within the search radius), collocated cokriging
reduces to a traditional least-squares regression problem.
Careful determination of the correlation coefficient, ZT(0), is critical when applying collocated
cokriging, because it controls the scaling between the primary and secondary data when using the
Markov-Bayes assumption.
Remove outliers when computing ZT(0).
Analyze and understand the physical meaning of the correlation, especially if the well data are
sparse.
Make sure that the estimator yields a meaningful range of estimates, Z0*, for the minimum and
maximum values of the secondary data (e.g., the well data probably do not calibrate the full range
of the secondary data).
With a correlation coefficient of < 0.5, the secondary attributes has less influence during the
estimation process.
Do not use more than one or two seismic attributes, because it is often difficult to understand the
physical meaning of the multivariate correlation.
Advantages of Cokriging and Collocated Cokriging
Allows incorporation of correlated, secondary data into the mapping process.
Can calibrate and control the influence of the secondary data via a cross-covariance model
(cokriging) or through the correlation coefficient (collocated cokriging).
When compared to traditional least-squares regression, the cokriging technique honors the primary
data and accounts for spatial correlation in the variations of the secondary data.
Yields more accurate estimates than simple single variable kriging.
Limitations of Cokriging and Collocated Cokriging

Requires more modeling effort than kriging or kriging with an external drift.
Cokriging system may sometimes be ill conditioned.
Cokriging tends to produce a smoothed image, but not as smooth as kriging.
Inferring a correct linear correlation model is difficult for sparse well data.
EXTERNAL DRIFT KRIGING

This technique allows us to krig, or simulate, in the presence of a trend. Although technically a univariate problem
requiring only the primary attribute covariance model, external drift kriging (KED) is generally included in the
multivariate data integration discussion of most geostatistical presentations. KED is a true regression technique,
which uses a secondary attribute to define a trend to guide the estimation of the primary variable (Deutsch and
Journel, 1992).
Regionalized variables are made up of two parts:
Drift - expected value, analogous to a local trend surface, representing regional features
Residual - deviation from the drift, represents local features
The basic hypothesis is that the expectation of the variable is a function, denoted S(x), which is
completely defined:
E [Z(x)] = S(x)
To provide greater flexibility, we often express the model as:

E [Z(x)] = a0 + a1 S(x)
where the coefficients a0 and a1 are unknown.
The a0 and a1 coefficients in the above equation are a linear combination of the error term used to
filter the local secondary data trend (or drift). This is analogous to trend surface analysis for removing
a trend based on a polynomial equation.
Once the local secondary data drift is known, the residual is estimated at the target grid node using
traditional kriging methods, then the drift value is added back to produce the final estimated value.
Before applying the kriging conditions, the mean from the kriging neighborhood must be known. Like
traditional kriging, external drift kriging must use an authorized variogram model to ensure the
computation of a positive kriging variance (meeting the positive definiteness criterion).
The sum of the weights must equal 1:
i = 1
The weight times the drift value is equal to the drift value at the target location (which is the area we want to
investigate):
iSi = S0
These equations ensure that the system is unbiased. This optimality constraint leads to the traditional error
equation:
2 = K00 -iKi0 -0 -1 S0
KED is a multi-step process:
Compute the coefficients a0 and a1 from a local least-squares regression using the primary and
secondary variables measured at the wells.
Compute residuals of the well data.
Compute residuals at all data points.
Compute the estimated attribute at all grid nodes.
KRIGING WITH EXTERNAL DRIFT:

EXAMPLE USING 3 DATA POINTS
Figure 1 illustrates a typical example (time to depth conversion) for the KED method.
Figure 1
The objective of KED is to use the seismic data as a correlated shaping function, a true regression approach, to
construct the final depth map. Four wells intersected the top of a reservoir. Kriging was used to map the solid
curved surface through the data points. This surface is a second or third order polynomial. The seismic two-way
time data (lightweight line) is the External Drift. The seismic travel times correlate with the measured depth at the
wells and suggest a much more complex surface than the surface created using only the well data.
KED is appropriate when shape is an important aspect of the study. The approach assumes a perfect correlation
between the well and seismic data. KED is not an appropriate approach for mapping reservoir rock properties; the
collocated cokriging is the better choice.
Figure 2 shows a three data point KED example.
Figure 2
The primary data are located at Z and the secondary data at S. Note that KED also uses the secondary
information at the target grid node. This data configuration can also be used for a more rigorous application of
the collocated cokriging method.
Estimator
Z0* = 1Z1 + 2Z2+ 3Z3
Estimated Error
2 = K00 -1K01 -2K02- Z3K03 -0 -1S0
PRACTICAL CONSIDERATIONS: KRIGING WITH EXTERNAL DRIFT
The external drift must be known at the locations of all primary data and at all nodes of the estimation
grid.
In theory, the covariance model of the residuals, K ZZ(h), cannot be inferred from the Z data. In practice,
KZZ(h) CZZ(h) for small distances of h or along directions not strongly affected by the trend.
In a moving neighborhood, the coefficients a0 and a1 in the regression Z*(x) = a0 + a1 S(x) are reestimated at each grid node. Sufficient data must fall within the search neighborhood to ensure proper
definition of the regression (to filter the trend).
Use a unique neighborhood in cases with sparse data.
Consider a KED approach only when shape is important, or a very high correlation exists between the
primary and secondary attributes. Use a cross-plot to investigate the relationship:
Is there a linear relation?
Is it well defined (i.e., 0.9)?
Is the correlation physically meaningful?
The external drift should be a smoothly varying function, otherwise the KED system may be unstable
(produce extremely high or low values).
Advantages of KED
Allows direct integration of a secondary attribute during estimation of the primary data.
Easier to implement than cokriging or collocated cokriging because it does not require any secondary
attribute modeling.
Neighborhood search is identical to kriging.
Computation time is similar to kriging a single variable.
Limitations of KED
May be difficult to infer the covariance of the residuals (local features).
There is no means to calibrate and control the influence of the secondary variable because the
method is a true regression model and assumes a perfect correlation between the two data types.
KED system may be unstable if the drift is not a smoothly varying function.
MEASUREMENT ERRORS
Kriging, cokriging and conditional simulation algorithms are flexible enough to take measurement errors in the
primary variable into account. At a data point i, the measured value of Zi = Si + i, where Si is the true value
and i is the unknown measurement error. If true, then assume that:
The errors are random; that is not spatially correlated.
The expected mean value of the errors is equal to zero.
The errors are independent of the true values.
The errors have a Gaussian distribution.
Using these assumptions, decompose the data values into:
A signal component with a constant variance, S2.
Zero-mean Gaussian white noise uncorrelated with the signal. It is also assumes that the variance of
the noise, i2, is known at every primary data location.
PRACTICAL CONSIDERATIONS: MEASUREMENT ERROR
For (co)-kriging, the noise variance acts as a smoothing parameter (as does the Nugget Effect), which
determines how closely the primary data are honored. When n2 is equal to zero, the data values are
honored exactly. When n2 at one point is large compared to the signal, the data point receives a much
lower weight in the interpolation process.
In the simulation mode, n2 introduces more variability into the final simulated result.
When modeling the experimental covariance of the data, the user must remove the contribution of the
nugget to extract the signal variance.
It is often useful to allow the noise parameter to vary from one data location to another. Examples
include:
interpolating zone average data from wireline logs with differing accuracy.
mixing core-derived and log-derived measurements.
building velocity maps from well-derived and NMO-derived average velocities.
Advantages of Using Measurement Error

Integration of data of varying quality.
Account for spatially varying measurement error.
Errors are accounted for in the final result.
Limitations of Measurement Error

Only implemented for the primary data, therefore, the secondary data are assumed noise free.
Amount of smoothing is not proportional to the noise variance.
Data Integration Examples

The following examples illustrate the three data integration methods just described. Figure 1a, 1b,
Figure 1a
1c,
1b
and 1d
1d
illustrate the basic data configuration for the three examples.
1c
Figure 2a, 2b,
Figure 2a
2c,
2b
and 2d
2d
illustrate cokriging, and Figure 3a,
2c
3b,
Figure 3a
3c,
3b
and 3d
3d
show collocated cokriging and kriging with external drift.
3c
Figure 1a shows porosity data points from well log information. Figure 1b shows variograms derived from the
well data, with the experimental variogram (thin line labeled D1) superimposed on the model variogram. In
Figure 1c, the porosity data were kriged to the seismic grid using the omni-directional, spherical variogram
model having a range of 1500 meters (shown in Figure 1b). The isotropic search neighborhood used an octant
search with 2 points per sector. Figure 1d shows the seismic acoustic impedance data. The seismic data
resides on a grid of approximately 12 by 24 meters in X and Y, respectively. This is the grid mesh used for all
the following examples, including Figure 1c.
Figure 2a, 2b, 2c, and 2d illustrate an example of traditional cokriging. Porosity (Figure 2a) and acoustic
impedance (Figure 2b) were modeled with an omni-directional, spherical variogram with a range of 1500
meters. The lines labeled D1 represent the experimental variograms upon which the model variograms are
based. The cross variogram (Figure 2c) uses the same spherical model. The cross variogram shows an
inverse relationship between porosity and acoustic impedance (the correlation is -0.83). The curved, dashed
lines show the bounds of perfect positive or inverse correlation. The sill of the cross variogram reflects the
magnitude of the -0.83 correlation between the data. Figure 2d shows the results of cokriging using the crossvariogram model from Figure 2c.
Figure 3a, 3b, 3c, and 3d illustrate collocated cokriging (Figure 3a-c) and kriging with external drift (Figure 3d).
The model for the collocated kriging was derived from analysis and modeling of the seismic acoustic
impedance data from the West Texas data set. Lines D1 and D2 represent experimental variograms taken
from two different directions, based on the anisotropic search neighborhood. The well porosity data is sparse
(55 data points) in comparison to the densely sampled seismic data (33,800 data points). We are justified in
using the nested, anisotropic seismic data variogram model (Figure 3a) as a model of porosity based on the
high correlation coefficient (-0.83). Thus, we can use the Markov-Bayes assumption to create the seismic
variogram from the porosity variogram, calibrate them using the correlation coefficient, and scale them based
on their individual variances. Figure 3b is a result of a Markov-Bayes collocated cokriging using a correlation
of -0.83. Figure 3c is also a collocated cokriging using the Markov-Bayes assumption, except the correlation
coefficient in this case was set to -0.1. Figure 3c illustrates the condition of self-krigability, when the secondary
attribute has no correlation to the primary attribute, thus reverting to a simple kriging solution. Although not a
totally appropriate use of KED (Figure 3d), the porosity map using KED shows a slightly wider range of
porosity values. This approach would be similar to a Markov-Bayes assumption using a -1.0 correlation.
CONDITIONAL SIMULATION AND UNCERTAINTY ESTIMATION

INTRODUCTION
Stochastic modeling, also known as conditional simulation, is a variation of conventional kriging or cokriging.
An important advantage of the geostatistical approach to mapping is the ability to model the spatial covariance
before interpolation. The covariance models make the final estimates sensitive to the directional anisotropies
present in the data. If the mapping objective is reserve estimation, then the smoothing properties of kriging in
the presence of a large nugget may be the best approach. However, if the objective is to map directional
reservoir heterogeneity (continuity) and assess model uncertainty, then a method other than interpolation is
required (Hohn, 1988).
Once thought of as stochastic artwork, useful only for decorating the walls of research centers (Srivastava,
1994a), conditional simulation models are becoming more accepted into our day-to-day reservoir
characterization-modeling efforts because the results contain higher frequency content, and lend a more
realistic appearance to our maps when compared to kriging.
Srivastava (1994a) notes that, in an industry that has become too familiar with layer-cake stratigraphy, with
lithologic units either connected from well-to-well or that conveniently pinch out halfway, and contour maps that
show gracefully curving undulations, it is often difficult to get people to understand that there is much more
inter-well heterogeneity than depicted by traditional reservoir models.
Because stochastic modeling produces many, equi-probable reservoir images, the thought of needing to
analyze more than one result, let alone flow simulate all of them, changes the paradigm of the traditional
reservoir characterization approach. Some of the realizations may even challenge the prevailing geological
wisdom, and will almost certainly provide a range of predictions from optimistic to pessimistic (Yarus, 1994).
Most of us are willing to admit that there is uncertainty in our reservoir models, but it is often difficult to assess
the amount of uncertainty. One of the biggest benefits of geostatistical stochastic modeling is the assessment
of risk or uncertainty in our model. To paraphrase Professor Andre Journel it is better to have a model of
uncertainty, than an illusion of reality.
Before reviewing various conditional simulation methods, it is useful to ask what is it that we want from a
stochastic modeling effort. We really need to consider the goal of the reservoir modeling exercise itself,
because the simulation method we choose depends, in large part, on the goal of the study and the types of
data available. Not all conditional simulation studies need the Cadillac approach, when a Volkswagen
technique will do fine (Srivastava, 1994a).
WHAT DO WE WANT FROM A CONDITIONAL SIMULATION METHOD?

Srivastava (1994a), in an excellent review of stochastic methods for reservoir characterization, identifies five
major types of stochastic simulation model approaches:
Assessing the impact of uncertainty.
Monte Carlo risk analysis.
Honoring heterogeneity.
Facies or rock properties (or both)
Honoring complex information.
The interested reader should refer to the original article for details, which is only summarized in this
presentation.
Assessing the Impact of Uncertainty

Anyone who forecasts reservoir performance understands that there is always uncertainty in the reservoir
model. Performance forecasts or volumetric predictions are often based on a best case model. However, the
reservoir engineer is also interested in other models, such as, the pessimistic and optimistic case. These
models allow the engineer to assess whether the field development plan, based on the best case scenario, is
flexible enough to handle the uncertainty. When used for this kind of study, stochastic models offer many
models consistent with the input data. We could then sort through the many realizations, select one that looks
like a downside scenario, and find another that looks like an up-side model.
Monte Carlo Risk Analysis

A critical aspect for the use of stochastic modeling is the belief in some space of uncertainty and that the
stochastic simulations are outcomes which sample this space fairly and adequately. We believe that we can
generate a fair representation of the whole spectrum of possibilities and hope that they do not have any
systematic tendencies to show pessimistic or optimistic scenarios. This type of study involves the idea of a
probability distribution, rather than simply sorting through a large set of outcomes and selecting two that seem
plausible. In Monte Carlo risk analysis, we depend on the notation of a complete probability distribution of
possible outcomes, and that the simulation realizations fairly represent the entire population.
Honoring Heterogeneity
Although stochastic techniques are capable of producing many plausible outcomes, many studies only use a
single outcome as the basis of performance prediction. Over the past decade, it has become increasing
apparent that reservoir performance predictions are more accurate when based on models that reflect
possible reservoir heterogeneity. We are painfully aware of the countless examples of failed predictions due to
the use of overly simplistic models. The thought of using only a single outcome from a stochastic modeling
effort is often viewed with disdain by those who like to generate hundreds of realizations.
Srivastava (1994a) argues that even a single outcome from a stochastic approach is a better basis for
performance prediction than a single outcome from a traditional technique that does not honor reservoir
heterogeneity.
Granted, many people will argue with this statement, because that one simulation may be the pessimistic (or
optimistic) realization just by the luck of the draw, probabilistically speaking.
Facies or Rock Properties (or both)
Reservoir modelers recognize two fundamentally different aspects of stochastic reservoir models. The
reservoir architecture is usually the first priority, consisting of the overall structural elements (e.g. faults, top
and base of reservoir, etc.), then defining the geobodies based on the depositional environment (e.g., eolian,
deep-water fan, channels, etc.). Once the spatial arrangement of the different flow units are modeled, we must
then decide how to populate them with rock and fluid properties. The important difference between modeling
facies versus modeling rock properties is that the former is a categorical variable, whereas the latter are
continuous variables. Articles by Tyler, et al. (1994), MacDonald and Aasen (1994), and Hatloy (1994) provide
excellent overviews of these methods.
Though it is conventionally assumed that a lithofacies model is an appropriate model of reservoir architecture,
we should ask ourselves whether this is a good assumption. Just because the original depositional facies are
easily recognized and described, they may not be the most important control on fluid flow. For example,
permeability variations might be due to later diagenesis or tectonic events (Srivastava, 1994 ).
Honoring Complex Information
Stochastic methods allow us to incorporate a broad range of information that most conventional methods can
not accommodate. Many individuals are not so much interested in the stochastic simulation because it
generates a range of plausible outcomes, but because they want to integrate seismic data with petrophysical
data while obtaining some measure of reliability.
Properties of Conditional Simulation

Conditional simulation is a Monte Carlo technique designed to:
honor measured data values
approximately, reproduce the data histogram
honor the spatial covariance model
be consistent with secondary data
assess uncertainty in the reservoir model
Conditional Simulation Methods

The following section is but a very brief review of stochastic simulation methods in common use, followed by a
discussion of important practical advantages and limitations of each method.
The terms stochastic and conditional are sometimes used interchangeably. Technically, they each mean
something different. Stochastic typically connotes randomness to most people. In geostatistics, we define
stochastic simulation as the process of drawing equally probable, joint realizations of the component Random
Variables from a Random Function model. These are usually gridded realizations, and represent a subset of
all possible outcomes of the spatial distribution of the attribute values. Each realization is as called a
stochastic image (Deutsch and Journel, 1992). If the image represents a random drawing from a population of
mean = 0 and variance = 1, based on some spatial model, we would call this type of realization a nonconditional simulation. However, a simulation is said to be conditional when it honors the measured values of a
regionalized variable (Hohn, 1988). For the remainder of this discussion, stochastic and conditional will be
used as equivalent processes.
Non-conditional simulations are often used to assess the influence of the spatial model parameters, such as
the nugget and sill values, in the absence of control data. Each of these parameters has a direct affect on the
amount of variability in the final simulation. Increasing either the sill or nugget increases the amount of
variability in a simulated realization.
Srivastava (1994a) lists the following types of stochastic simulation methods:
Turning Bands
Sequential Simulation -
Gaussian
Indicator
Bayesian
Simulated Annealing
Boolean, Marked-Point Process and Object Based
Probability Field
Matrix Decomposition Methods
We will describe each of these methods in turn.

Turning Bands
This is one of the earliest simulation methods, tackling the simulation problem by first creating a smooth model
by kriging, then adding an appropriate level of noise. The noise is added through a non-conditional simulation
step using the same histogram and spatial model as in the kriging step, but does not use the actual data
values at the well locations. The final model still honors the original data and the spatial model, but now also
has an appropriate level of spatial heterogeneity (Srivastava, 1994a; Deutsch and Journel, 1992).
Sequential Simulation
Three sequential simulation (Gaussian, Indicator, and Bayesian) procedures make use of the same basic
algorithm for different data types. The general process is
1.
Select at random grid node GNi, a point not yet simulated in the grid.
2.
Use kriging to estimate the mean, mi, and variance, i2 at location GNi from the local Gaussian
conditional probability distribution (lGcpd), with zero mean and unit variance.
P (ZSi ZS1 , . . ., ZSi-1) exp [(ZSi -mi)2 / 2i2]
where:
mi is estimated by any of the kriging methods, including kriging with external drift (KED)
i2 is the error variance of mi,
3. Draw at random a single value, zi from the lGcpd, whose maximum spread is 2 around mi
4. Create a newly simulated value ZSi* = mi + zi.
5. Include the newly simulated value ZSi* in the set of conditioning data. This ensures that closely spaced
values have the correct short scale correlation.
6. Repeat the process until all grid nodes have a simulated value.
Selection of the Simulated Grid Node
The first step in sequential simulation is the random selection of a location GN i, then GNi +1, until all grid nodes
contain a simulated value. The order in which grid nodes are randomly simulated influences the cumulative
feedback effect on the outcome. The selection process is random, but repeatable:
For each simulation, shuffle the grid nodes into an order defined by a random seed value.
Each random seed corresponds to a unique grid order.
Different random seed values produce a different path through the grid.
Although the total possible number of orderings is very large, each random path is uniquely identified
and repeatable.
Sequential Gaussian Simulation (SGS) is a method for the simulation of continuous variables, such as
petrophysical properties. In SGS, the procedure is essentially the same as (co) kriging, with the addition of a
bias.
Sequential Indicator Simulation (SIS) is a method used to simulate discrete variables. By creating a grid of
0s and 1s, it uses the same methodology as SGS, which represent lithofacies (pay/non-pay, or sand/shale).
SIS requires the following input parameters:
The a priori probabilities (proportions) of two data classes (Indicators -denoted as I) coded as 0 or 1,
for example:
I(zx) = 1 if zx is shale
I(zx) = 0 if zx is sand
Indicator histogram
The Indicator spatial correlation model
Bayesian Sequential Indicator Simulation is a later form of SIS (Doyen, ET al., 1994). This technique allows
direct integration of seismic attributes with well data using a combination of classification and indicator
methods.
Bayesian SIS input parameter requirements:
Code well data as 0 or 1, as in SIS.
Classify the seismic attribute into two classes (0, 1):
Assuming the two data classes are Normal Distributions, we need the:
Mean and standard deviation of the seismic attribute that is 0.
Mean and standard deviation of the seismic attribute that is 1.
The a priori probabilities of the two classes of seismic data.
The Indicator spatial correlation models.
Simulated Annealing
Annealing is the process where a metallic alloy is heated so that the molecules move around and reorder
themselves into a low-energy grain structure. The probability that any two molecules will follow each other is
known as the Boltzmann probability distribution. Simulated annealing is the application of the annealing
mechanism of swapping the attributes assigned to two different grid node locations, using the Boltzmann
probability distribution for accepting the perturbations (Deutsch, 1994). The process continues until the desired
model conditions are satisfied.
Simulated annealing constructs the reservoir model via an iterative trial and error process, and does not use
an explicit random function model. Rather, the simulated image is formulated as an optimization process. The
first requirement is an objective (or energy) function, which is some measure of difference between the desired
spatial characteristics and those of the candidate realization (Deutsch, 1994). For example, we might want to
produce an image of a sand/shale model with a 70% net-to-gross ratio, an average shale length of 60 m, with
average shale thickness of 10 m.
The image starts with pixels arranged randomly, having sand and shale in the correct global proportion. The
net-to-gross is incorrect because of the random assignment of the sand and shale. The average shale length
and width are too short also. Next, the annealing mechanism swaps attributes at different grid node locations,
applies the Boltzmann probability distribution for accepting the perturbations, and continues until the model
conditions are satisfied.
At first glance, this approach seems terribly inefficient, because millions of perturbations may be required to
arrive at the desired image. However, these methods are more efficient than they appear in theory (Deutsch,
1994).
Boolean, Marked-Point Process and Object Based
Theses methods constitute a family of techniques that create reservoir models based on objects of some
genetic significance, rather than being built up from one elementary node or pixel at a time. To use such
methods, you need to select a basic shape for each lithofacies that describes its geometry. For example, you
might want to model sand channels that look like half ellipses in cross section, or deltas as triangular wedges
in map view. You must also specify the proportions of the shapes in the final model and choose a distribution
for the parameters that describe the shapes. There are algorithms that describe how the geobodies are
positioned relative to each other (that is, can they overlap, and how, or must there be a minimum distance
between the shapes).
After the distribution of parameters and position rules are chosen, follow the remaining steps in the procedure
(Srivastava, 1994a):
1. Fill the reservoir model background with some lithofacies (e.g., shale).
2. Randomly select a starting point in the model.
3. Randomly select one of the lithofacies shapes, and draw an appropriate size, anisotropy and
orientation.
4. Check to see if the shape conflicts with any conditioning data (e.g., well data) or with other previously
simulated shapes. If not, keep the shape, otherwise reject it and go back to the previous step.
5. Check to see if the global proportions are correct, if not, return to step 2.
6. Simulate petrophysical properties within the geobodies using the more classical geostatistical
methods. If control data must be honored, this step is typically completed first, and then the inter-well
region is simulated. Be sure that there are no conflicts with known stratigraphic and lithologic sequences
in the wells.
Boolean or object-based techniques are of current interest in the petroleum industry with a number of
research, academic, and commercial vendors working on new implementation algorithms. In the past,
Boolean-type algorithms could not always honor all of the conditioning data, because the algorithms were not
strict simulators of shape. The number of input parameters made this almost a deterministic method, requiring
much upfront knowledge of the depositional system you wanted to model. Articles by Tyler, et al. (1994) and
Hatloy (1994) provide excellent case studies using Boolean-type methods to simulate fluvial systems.
Probability Field Simulation
This method is an enhancement of the sequential simulation methods described earlier. In sequential
simulation, the value drawn from the local cumulative probability distribution at a particular grid node is treated
as if it was hard data, and is included as local conditioning data. This ensures that closely spaced values have
the correct short scale correlation. Otherwise, the simulated image would contain too much short scale (high
frequency noise) variability.
The idea behind probability field, or P-field, simulation is to increase the efficiency of computing the local
conditional probability distribution (lcpd) on the original well data only. P-field simulation gets around the
problem of too much short scale variability by controlling the sampling of the distributions rather than
controlling the distributions as in sequential simulation (Srivastava, 1994a).
Srivastava (1994b) shows how P-field simulation improves the ability to visualize uncertainty and the article by
Bashore, et al, (1994) illustrates a P-field application for establishing an appropriate degree of correlation
between porosity and permeability.
Matrix Decomposition Methods
Some simulation techniques involve matrix decomposition; L-U decomposition is one such example, using a
matrix represented as the product of a lower triangular matrix, L, and an upper triangular matrix U. This
decomposition can be made unique either by stipulating that the diagonal elements of L be unity, or that the
diagonal elements of L and U be correspondingly identical. In this approach, different outcomes are created by
multiplying vectors of random numbers by a precalculated matrix created from spatial continuity information
supplied by the user, typically as a variogram or correlogram. Matrix methods can be viewed as a form of
sequential simulation because the multiplication across the rows of the precalculated matrix and down the
column vector of the random numbers can be construed as a sequential process in which the value of the
successive node depends upon the value of the previously simulated nodes (Srivastava, 1994a; Deutsch and
Journel, 1992).
Uncertainty Estimation
Once all of these simulated images have been generated, how do you determine which one is correct?
Technically speaking, any one of the simulated images is a possible realization of the reservoir, because each
image is equally likely, based on the data and the spatial model. However, just because the image is
statistically equally probable does not mean it is geologically acceptable. You must look at each simulated
image to determine if it is a reasonable representation of what you know about the reservoir -if not, discard it,
and run more simulations if necessary.
Some of the possible maps generated from a suite of simulated images include:
Mean: This map is the average of n conditional simulations. At each cell, the program computes the
average value, based on the values from all simulations at the same location. When the number of input
simulations is large, the resultant map converges to the kriged solution.
Minimum: Each cell displays the smallest value from all input simulations.
Maximum: Each cell displays the largest value from all input simulations.
Standard Deviation: A map of the standard deviation at each grid cell, computed from all input maps.
This map is used as a measure of the standard error and is used to analyze uncertainty.
Uncertainty or Risk: This map displays the probability of meeting or exceeding a user specified
threshold value at each grid cell. The grid cell values range between 0 and 100 percent.
Iso-Probability: These maps are displayed in terms of the attribute value at a constant probability
threshold.
Practical Considerations for Conditional Simulation

In theory, the amount of conditioning data increases as the number of points simulated, using a
sequential simulation approach. However, only measured data and previously simulated points that fall
within the search radius are used at any given time.
The spatial correlation function is reproduced only for distances within the search radius. Therefore,
the search region must extend at least to distances for which the covariance function is to be
reproduced.
Do not expect exact reproduction of the spatial model, because of uncertainty in the model
parameters.
In sequential simulation, locations are visited according to a random path to avoid artifacts and
maximize simulation variability.
Correct determination of the correlation coefficient between the primary and secondary variable is
crucial for a Markov-Bayes collocated simulation. Over-estimating the correlation may result in overconstrained simulations and a narrow range of outcomes.
Simulation with the KED method may yield an unrealistically wide range of simulated values unless
the external drift is a smoothly varying function (e.g., seismic velocity).
Sparse data sets will produce a wide range of outcomes.
How many simulations should we create and use?

This is often a difficult question to answer, because it depends, in part, on the number of conditioning
data points and the quality of the correlation between the primary and secondary data (if performing a
co-simulation). Only about 100 simulations are required to produce reasonable probability maps or
confidence margins on global parameters.
Only a small number of simulated models, representing minimum, most likely and maximum cases
need to be retained for fluid flow simulations.
Discard geologically unrealistic simulations and recompute more simulations to ensure adequate
summary maps.
The density and quality of the conditioning data control the amount of variability.
Advantages of Conditional Simulation

By approximately reproducing the data histogram and the spatial correlation structure, conditional
simulations provide more realistic reservoir images than (co)-kriging.
Because simulations can reproduce extreme values (tails of histograms) and their pattern of
connectivity, they are useful for simulating hydrocarbon production volumes and rates.
Conditional simulations provide alternative models, which are consistent with the data.
Simulations generate different, but equally probable geological scenarios for use in risk assessment.
Limitations Of Conditional Simulation

CPU and memory intensive.
Large numbers of simulations may create data management problems.
Interpret confidence limits calculated from post processing simulations with caution, because
uncertainty in the conditioning data may be large.
Simulations are very sensitive to covariance model parameters, like the sill and nugget, or correlation
coefficient if using collocated cosimulation.
Sparse conditioning data generally produces a wide range of variability between the simulations.
Although statistically equally probable, not all images may be geologically realistic.
A Conditional Simulation Example

The following figures illustrate a Markov-Bayes, collocated cosimulation of porosity with seismic acoustic
impedance for the North Cowden Field data set. The variogram mode, neighborhood configurations, and
Markov-Bayes assumption are identical to those used in the collocated kriging example described in an earlier
section (Figure 1a and 1b
1b
Figure 1a
Fifty simulations (Figure 2 ) were generated and post-processed to create the mean and standard deviation
maps of the simulations shown in Figure 3a,
Figure 2
3b,
Figure 3a
and 3c
3c
3b
Figure 4a,
Figure 4a
4b, 4c,
4b
and 4d
4d
illustrate a risk map for two different porosity cutoffs and minimum and maximum value maps.
4c
Figure 2 shows eleven of 50 simulations created by a Markov-Bayes collocated co-simulation approach. The
mean of the simulated values is displayed in the lower right corner. Each image is a reasonable representation
of porosity based on the input data. What we see are repeating global patterns with local variability. When you
see repeating features from image to image, you should have more confidence that the values are real.
Figure 3a, 3b,, 3c and Figure 4a, 4b, 4c, 4d show the results of post processing the 50 simulations.
Figure 3a is the mean of the 50 simulations compared to the collocated cokriging result (Figure 3b). The
standard deviation or standard error map (Figure 3c) of the 50 simulations ranges from 0 to about 0.8 porosity
percentage units. This map provides a measure of uncertainty based on the input data and spatial model.
Figure 4 shows the maximum (Figure 4a) and minimum (Figure 4b) values simulated at each grid node. Do
not use these as the pessimistic and optimistic cases. These are computed maps, not simulated results.
These displays only show the range of simulated values. Figure 4c shows the probability that the porosity is
8 %, and Figure 4d shows the probability that porosity is 10 %. These displays are very useful for risk
analysis.
Other Points To Consider When Performing A Simulation

Stochastic simulation methods assume that the data follow a Normal Distribution. If the sample data are
reasonably normal, then it may not be necessary to transform them. This assumption is easily checked using
q-q plots, for example. (A graphical approach that plots ordered data values against the expected values of
those observations. If the values follow a particular reference distribution, the points in such a plot will follow a
straight line. Departures from the line show how the data differ from the assumed distribution.) However, if the
data are skewed (Figure 5a ), it may be necessary to transform the data.
Figure 5a
One commonly used transformation that transforms any data set into a Normal Distribution is the Hermite
polynomial method (Wackernagel, 1995: Hohn, 1998). This method fits a polynomial with n terms to the
histogram and maps the data from one domain to another. Variogram modeling, kriging and simulation are
performed on the transformed variable. Then, back-transform the gridded results using the stored Hermite
coefficients. A Hermite polynomial transform on 55 porosity data is shown in Figure 5a, 5b, 5c,
5b
and 5d
5d
5c
The shape in Figure 5a shows a truncated porosity distribution (no values lower than 6 %) because a cut-off
was used for pay estimation. This approach to pay estimation creates the skewed distribution. If we want to
honor this histogram (Figure 5a) in the simulation process, we must transform the original data into a
Gaussian (normal) distribution (Figure 5b). Figure 5c and Figure 5d show the results of a Hermite polynomial
modeling approach to transform the data. The modeled histogram (blue) is superimposed on the raw
histogram (black) in Figure 5c. The cumulative histogram (Figure 5d) shows a reasonably good match
between the model (blue) and the original (black) data. The purpose of the transformation is to model the
overall shape and not every nuance of the raw data, which may only be an approximation.
PUBLIC DOMAIN GEOSTATISTICS PROGRAMS

OVERVIEW
INTRODUCTION
Several very good public domain geostatistical mapping and modeling packages are available to anyone with
access to a personal computer. In this section, five software packages are reviewed, with information on how to
obtain them. For a more complete review, see the article by Clayton (1994).
The geostatistical packages STATPAC, Geo-EAS, GEOPACK, Geostatistical Toolbox, and GSLIB are
reviewed according to their approximate chronological order of appearance in the public domain. These
packages are fairly sophisticated, reflecting the evolution in personal computer graphics, interfaces, and
advances in geostatistical technology.
These programs are placed into the public domain with the understanding that the user is ultimately
responsible for its proper use. Geostatistical algorithms are complicated to program and debug, considering all
the possible combinations of hardware and operating systems.
Another word of caution is that the different authors use different nomenclature and mathematical conventions,
which just confuses the issue further. The International Association of Mathematical Geologists has attempted
to standardize geostatistical jargon through the publication of Geostatistical Glossary and Multilingual
Dictionary, edited by Richardo Olea.
STATPAC
STATPAC (STATistical PACkage) is a collection of general-purpose statistical and geostatistical programs

developed by the U. S. Geological Survey. The programs were complied in their current form by David Grundy
and A. T. Miesch. It was released as USGS Open-File Report 87-411-A, 87-411-B, and 87-411-C, and was
lasted updated in May 1988.
The programs were originally developed for use in applied geochemistry and petrology within the USGS. The
geostatistical program only works for 2-dimensional spatial data analysis. This early program was developed
for the older XT PCs, and thus does not take advantage of the quality graphical routines now available. The
limited graphic capabilities may discourage beginning practitioners from using this software, even though
STATPAC may have some advantages over other public domain software (Clayton, 1994).
Order STATPAC from the following sources:
Books and Open-File Reports
U. S. Geological Survey
Federal Center
P. O. Box 25425
Denver, CO 80225
Telephone: (303) 236-7476
Order Reports: OF 87-411-A, B, C
Cost: About $100
GeoApplications
P. O. Box 41082
Tucson, AZ 85717-1082
Telephone: (602) 323-9170
Fax: (602) 327-7752
Cost: call or fax for current pricing
GEO-EAS
Evan Englund (USGS) and Allen Sparks (Computer Sciences Corporation) developed Geo-EAS
(Geostatistical Environmental Assessment Software) for the U.S. Environmental Protection Agency for
environmental site assessment and monitoring of data collected on a spatial network. Version 1.2.1 was
compiled in July 1990.
Geo-EAS provides practical geostatistical applications for individuals with a working knowledge of
geostatistical concepts. The integrated program layout, interface design, and excellent user's manual makes
this an excellent instructional or self-study tool for learning geostatistical analysis (Clayton, 1994)
Order Geo-EAS from the following sources:
Computer Oriented Geological Survey
P. O. Box 370246
Denver, CO 80237
Telephone: (303) 751-8553
Cost: call for current pricing
National Technical Information Service
Springfield, VA 22161
Telephone: (707) 487-4650
Fax: (703) 321-8547
Cost: about $100
IGWMC USA
Institute for Ground-Water Research and Education
Colorado School of Mines
Golden, CO 80401-1887
Telephone: (303) 273-3103
Fax: (303) 272-3278
GeoApplications
P. O. Box 41082
Tucson, AZ 85717-1082
Telephone: (602) 323-9170
Fax: (602) 327-7752
Cost: call or fax for current pricing
GEOPAC
This is a geostatistical package suitable for teaching, research and project work released by the EPA. S. R.
Yates (U. S. Department of Agriculture) and M. V. Yates (University of California-Riverside) developed
GEOPACK. Version 1.0 was released in January 1990.
GEOPACK is useful for mining, petroleum, environmental, and research projects for individuals who do not
have access to a powerful workstation or mainframe computer. It is designed for both novice and experienced
geostatistical practitioners (Clayton, 1994)
Order GEOPACK from:
P. O. Box 370246
Denver, CO 80237
Telephone: (303) 751-8553
GEOPACK
Robert S. Kerr Environmental Research Laboratory
Office of Research and Development
U. S. EPA
Ada, OK 74820
IGWMC USA
Institute for Ground-Water Research and Education
Colorado School of Mines
Golden, CO 80401-1887
Telephone: (303) 273-3103
Fax: (303) 272-3278
GEOSTATISTICAL TOOLBOX
FSS International, a consulting company specializing in natural resources and risk assessment, makes
Geostatistical Toolbox available to the public. The program was developed and written by Roland Froidevaux,
with Version 1.30 released in December 1990.
Geostatistical Toolbox provides a PC based interactive, user-friendly geostatistical toolbox for workers in
mining, petroleum, and environmental industries. It is also suitable for teaching and academic applications.
The program has been rigorously tested and is recommended for anyone wanting an excellent 2-dimensional
geostatistical package.
Order Geostatistical Toolbox from the following sources:
P. O. Box 370246
Denver, CO 80237
Telephone: (303) 751-8553
FSS International Offices at:
800 Millbank
245 Moonshine Circle
False Creek South

Vancouver, BC
Canada V5Z 3Z4
Reno, NV 89523
USA
10 Chemin de Drize
1256 Troinex
Switzerland
P. O. Box 657
Eppling 2121
NSW, Australia
GSLIB
The GSLIB is a library of geostatistical programs developed at Stanford University under the direction of Andre
Journel, director of the Stanford Center for Reservoir Forecasting. Oxford University Press published the
users guide and FORTRAN programs authored by Clayton Deutsch and Andre Journel (1992).
GSLIB addresses the needs of graduate students and advanced geostatistical practitioners, but is also a
useful resource for the novice. GSLIB is the most advanced public domain geostatistical software available,
offering full 2-D and 3-D applications. The program library does not contain executable code, but rather
uncompiled ASCII FORTRAN program listings. These programs will run on any computer platform that can
compile FORTRAN. Although the users guide is well written and documents the program in an organized textlike fashion, with theoretical background, the novice may find introductory texts, such as Hohn (1988) or
Isaaks and Srivastava (1989), useful supplementary reading.
Order GSLIB from the following sources:
Oxford University Press
Business and Customer Service
2001 Evans Road
Cary, NJ 27513
Telephone: 1-800-451-7756
Order: GSLIB: Geostatistical Software Library and Users Guide by Clayton V. Deutsch and Andre Journel
(ISBN 0-19-507392-4)
Cost: $49.95 plus $2.50 postage
You can also order this book through most bookstores.
TERMINOLOGY
In compiling this list of geostatistical terminology, only the most commonly encountered terms were selected.
No attempt was made to duplicate the more extensive glossary by Richardo Olea (1991). Some definitions
may differ slightly from those of Olea.
Admissibility (of semivariogram models): for a given covariance model, the kriging variance must be 0,
this condition is also known as positive definite.
Anisotropy: refers to changes in a property when measured along different axes. In geostatistics, anisotropy
refers to covariance models that have major and minor ranges of different distances (correlation scale or
lengths). This condition is easiest seen when a variogram shows a longer range in one direction than in
another. In this module, we discuss two types of anisotropy:
Geometric anisotropic covariance models have the same sill, but different ranges;
Zonal anisotropic covariance models have the same range, but different sills.
Auto-correlation: a method of computing a spatial covariance model for a regionalized variable. It measures
a change in variance (variogram) or correlation (correlogram) with distance and/or azimuth.
Biased estimates: seen when there is a correlation between standardized errors and estimated values (see
Cross-Validation). A histogram of the standardized errors is skewed, suggesting a bias in the estimates, so
that there is a chance that one area of a map with always show estimates higher (or lower) than expected.
Block kriging: Kriging with nearby sample values to make an estimated value for an area; making a kriging
estimate over an area, for example estimating the average value at the size of the grid cell. The grid cell is
divided into a specified number of sub-cells, a value is kriged to each sub-cell, and then the average value is
placed at the grid node.
Cokriging: the process of estimating a regionalized variable from two or more variables, using a linear
combination of weights obtained from models of spatial auto-correlation and cross-correlation. The multivariate
version of kriging.
Conditional bias: a problem arising from insufficient smoothing which causes high values of an attribute to be
overstated, while low values are understated.
Conditional simulation: a geostatistical method to create multiple (and equally probable) realizations of a
regionalized variable based on a spatial model. It is conditional only when the actual control data are honored.
Conditional simulation is a variation of conventional kriging or cokriging, and can be considered as an
extrapolation of data, as opposed to the interpolations produced by kriging. By relaxing some of the kriging
constraints (e.g. minimized square error), conditional simulation is able to reproduce the variance of the
control data. Simulations are not estimations; their goal is to characterize variability or risk. The final map
captures the heterogeneity and connectivity mostly likely present in the reservoir. Post processing conditional
simulation produces a measure of error (standard deviation) and other measures of uncertainty, such as isoprobability and uncertainty maps.
Correlogram: a measure of spatial dependence (correlation) of a regionalized variable over some distance.
The correlogram can also be calculated with an azimuthal preference.
Covariance: a measure of correlation between two variables. The kriging system uses covariance, rather than
variogram or correlogram values, to determine the kriging weights, . The covariance can be considered as
the inverse of the variogram, and equal to the value of the sill minus the variogram model (or zero minus the
correlogram).
Coregionalization: the mutual spatial behavior between two or more regionalized variables.
Cross-correlation: a technique used to compute a spatial cross-covariance model between two regionalized
variables. This provides a measure of spatial correlation between the two variables. It produces a bivariate
analogue of the variogram.
Cross-validation: a procedure to check the compatibility between a data set, its spatial model and
neighborhood design. First, each sampled location is kriged with all other samples in the search
neighborhood. The estimates are then compared against the true sample values. Significant differences
between estimated values and true values may be influenced by outliers or other anomalies. This technique is
also used to check for biased estimates produced by poor model and/or neighborhood design.
Drift: often used to describe data containing a trend. Drift usually refers to short scale trends at the size of the
neighborhood.
Estimation variance: the kriging variance at each grid node. This is a measure of global reliability, not a local
estimation of error.
Experimental variogram: a measure of spatial dependence (dissimilarity or increasing variability) of a
regionalized variable over some distance and/or direction. This is the variogram that is based upon the sample
data; upon which the model variogram will be fitted.
External drift: a geostatistical linear regression technique that uses a spatial model of covariance when a
secondary regionalized variable (e.g. seismic attribute) is used to control the shape of the final map created by
kriging or simulation.
Geostatistics: the statistical method used to analyze spatially (or temporally) correlated data and to predict
the values of such variables distributed over distance or time.
h-Scatterplot: a plot obtained by selecting a value for separation distance, h, then plotting the pairs Z(x) and
Z(x+h) as the two axes of a bivariate plot. The shape and correlation of the cloud is related to the value of the
variogram for distance, h.
Histogram: a plot, which shows the frequency or number of occurrences (Y-axis) of data, falling into size
classes of equal width (X-axis).
Indicator variable: a binary transformation of data to either 1 of 0, depending on whether the value of the data
point surpasses or falls short of a specified cut-off value.
Interpolation: estimation technique in which samples located within a certain search neighborhood are
weighted to form an estimate, such as the kriging technique.
Inverse distance weighting: Non-geostatistical interpolation technique that assumes that attributes vary
according to the inverse of their separation (raised to some power).
Iso-probability map: maps created by post processing conditional simulations to show the value of the
regionalized variable at a constant probability threshold. For example, at the 10 th, 50th (median), or the 90th
percentiles. These maps provide a level of confidence in the mapped results.
Kriging: a method of calculating estimates of a regionalized variable using a linear combination of weights
obtained from a model of spatial correlation. It assigns weights to samples to minimize estimation variance.
The univariate version of cokriging.
Kriging variance: see estimation variance.
Lag: a distance parameter (h) used during computation of the experimental covariance model. The lag
distance typically has a tolerance of one-half the initial lag distance.
Linear estimation method: a technique for making estimates based on a linear weighted average of values,
such as seen in kriging.
Model variogram: a function fitted to the experimental variogram as the basis for kriging.
Moving neighborhood: a search neighborhood designed to use only a portion of the control data point during
kriging or conditional simulation.
Nested variogram model: a linear combination of two or more variogram (correlogram) models. It has more
than one range showing different scales of spatial variability; for example, a short-range exponential model
combined with a longer-range spherical model. Often, it involves adding a nugget component to one of the
other models.
Nonconditional simulation: a method that does not use the control data during the simulation process; quite
often used to observe the behavior of a spatial model and neighborhood design.
Nugget effect: a feature of the covariance model where the experimental points defining the model does not
appear to intersect the y-axis at the origin. The nugget represents a chaotic or random component of attribute
variability. The nugget model shows constant variance at all ranges, but is often modeled as zero variance at
the control point (well location). Abbreviated as Co by convention.
Ordinary (co-)kriging: a technique in which the local mean varies and is re-estimated based on the control
points in the search neighborhood ellipse (moving neighborhood).
Outliers: data points falling outside about 2.5 standard deviation of the mean value of the sample population
possibly the result of bad data values or local anomalies.
Point kriging: making a kriging estimate at a specific point, for example at a grid node, or a well location.
Positive definite: see admissibility.
Random function: the random function has two components: (1) a regional structure component manifesting
some degree of spatial auto-correlation (regionalized variable) and lack of independence in the proximal
values of Z(x), and (2) a local, random component (random variable).
Random variable: a variable created by some random process, whose values follow a probability distribution,
such as a normal distribution.
Range: the distance where the variogram reaches the sill, or when the correlogram reaches zero correlation.
Also known as the correlation range or correlation scale, it represents the distance at which correlation
ceases. It is abbreviated as a by convention.
Regionalized variable: a variable that has some degree of spatial auto-correlation and lack of independence
in the proximal values of Z(x).
Risk map: see Uncertainty Map
Simple kriging: the global mean is constant over the entire area of interpolation and is based on all the
control points used in a unique neighborhood (or is supplied by the user).
Semivariogram: a measure of spatial dependence (dissimilarity or increasing variability) of a regionalized
variable over some distance; a plot of similarity between points as a function of distance between the points.
The variogram can also be calculated with an azimuthal preference. The semivariogram is commonly called a
variogram. See also correlogram.
Sill: the upper level of variance, where the variogram reaches its correlation range. The variance of the
sample population is the theoretical sill of the variogram.
Smearing: a condition produced by the interpolation process where high-grade attributes are allowed to
influence the estimation of nearby lower grades.
Stationarity: the simplest definition is that the data do not exhibit a trend; spatial statistical homogeneity. This
implies that a moving window average shows homogeneity in the mean and variance over the study area.
Stochastic modeling: used interchangeably with conditional simulation, although not all stochastic modeling
applications necessarily use control data.
Support: the size, shape, and geometry of volumes upon which we estimate a variable. The effect of which is
that attributes of small support are more variable than those having a larger support.
Transformation: a mathematical process used to convert the frequency distribution of a data set from
Lognormal to Normal.
Unique neighborhood: a neighborhood search ellipse that uses all available data control points. The
practical limit is 100 control points. A unique neighborhood is used with simple kriging.
Uncertainty map: these are maps created by post processing conditional simulations. A threshold value is
selected, for example, 8 % porosity, an uncertainty map shows at each grid node, the probability that porosity
is either above or below the chosen threshold.
Variogram: geostatistical measure used to characterize the spatial variability of an attribute.
Weights: values determined during an interpolation or simulation, that are multiplied by the control data points
in the determination of the final estimated or simulated value at a grid node. To create a condition of
unbiasness, the weights, , sum to unity for geostatistical applications.
SUGGESTED REFERENCE
Olea, R. A., 1991, Geostatistical Glossary and Multilingual Dictionary, New York, Oxford University Press, 177
pages.

Basic Geostatistics

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Basic Geostatistics

Uploaded by

Copyright:

Available Formats

BASIC GEOSTATISTICS

Role of Geostatistics in Reservoir Characterization

calculation of experimental covariance model

interpretation and modeling

5. Search Neighborhood Design

LIMITATIONS OF A GEOSTATISTICAL APPROACH

Greek notation describes Populations: measures of a population are called parameters

Roman notation describes Samples: measures of a sample are called statistics

Statistical Notation: Mean of a Population

Statistical Notation: Correlation Coefficient

Statistical Notation: Standard Deviation of a Population

Statistical Notation: Mean of a Sample ( )

Lag distance (distance between two sample points)

Standard deviation of a sample

POPULATIONS AND SAMPLES

Parameters, Data, and Statistics

= the number of combinations of samples

= the number of elements in the population

= the number of elements in the sample

Oilfield Applications to Sampling

TRIALS, EVENTS, AND PROBABILITY

Mutually Exclusive Events

0 represents no chance of occurrence, and

1 represents certainty that the event will occur.

Coin Toss Experiment

and we define the conditional probabilities of B given A as follows:

Bayes Theorem on Conditional Probability

Additive Law of Probability

Multiplicative Law of Probability

RANDOM VARIABLES AND THEIR PROBABILITY DISTRIBUTIONS

THE PROBABILISTIC APPROACH

RANDOM VARIABLE DEFINED

TWO CLASSES OF RANDOM VARIABLES

Discrete Random Variables

Frequency Tables and Histograms

Continuous Random Variables

Probability Distributions Of The Discrete Random Variable

n! / (n -r)!r! different ways, resulting in the equation:

P = [(5!/5!0!] [1] [0.95]

P = [(r + x -1)!/(r -1)!x!][(1 -p)x pr

Poisson Probability Distribution

events occur independently,

the length of the observation period is fixed in advance,

n, the number of trials becomes very large, and

p, the probability of success on any one trial becomes very small.

The equation in this case is

= probability of occurrence of the discrete random variable X

= the number of discoveries

= the number of prospects in the population

= the number of holes drilled

= the number commercial reservoirs

The probability of one discovery is about 50%.

Frequency Distributions Of Continuous Random Variables

In this section, we will discuss the following common distribution functions:

Normal Probability Distribution

Normal Probability Distribution

The significance of the Central Limit Theorem is twofold:

(1 standard deviation) contains 68.3% of the data

(2 standard deviations) contain 95.46% of the data

(3 standard deviations) contain 99.73% of the data

Normal Approximation to the Binomial Distribution

UNIVARIATE DATA ANALYSIS

measures of spread, and

Minimum: Smallest value.

X10 and Y10