4, 1984
Douglas M. Hawkins
The borehole slotting method previously proposed by Gordon and Reyment for two boreholes may be extended to three or more boreholes, but soon becomes computationally intractable. We propose another method based on the mixture model of cluster analysis, the computational labor of which increases linearly with the number of boreholes. This method produces profiles of the unknown strata to which the individual boreholes are easily matched by the two-borehole slotting method.
KEY WORDS: geophysical logs, multivariate statistics, cluster analysis, linear programming, dynamic programming.
INTRODUCTION
0020-5958/84/0500-0393$03.50/0 © 1984 Plenum Publishing Corporation
394 Hawkins
tics of the different strata are. These assumptions put the problem in the "cluster
analysis" class. If detailed prior information on the strata were available, the
problem would be in the "discriminant analysis" class.
What distinguishes this problem from straightforward cluster or discriminant analysis problems are the constraints on the grouping imposed by the fact that the strata common to different boreholes appear in the same order at each borehole.
Despite the considerable work on mathematical formulations of this strati-
graphic correlation problem (see for example the entire contents of Computers
and Geosciences, vol. 4, no. 3, 1978), Mann (1981) avers that "a fully opera-
tional system is yet to be established."
Later we propose two general formulations. One, an extension of the
"slotting" method (Gordon and Reyment, 1979), is based on dynamic program-
ming (DP) and produces a stratification by means of a guaranteed optimal (by
one of several criteria) allocation of each layer to a single stratum. This method,
however, is computationally intractable if there are more than about four bore-
holes. For such larger data sets, we propose a different mixture model which
concentrates on inferences about the strata, as a by-product of which it pro-
duces good, but not necessarily optimal, stratigraphic correlations of the bore-
holes. This method is tractable for data sets with an arbitrary number of bore-
holes, provided each is not too deep.
(a) The readings taken at each borehole are made at regular or arbitrary spacings, so that several or perhaps even all successive layers at a borehole might belong to the same stratum. In this case, when carrying out the DP search, all values of $D_i$ between 0 and $n_i$ must be investigated.
(b) The borehole data have already been segmented, so that each layer in the borehole is known to come from a different stratum. In this case, it is not permissible to allocate more than one layer from a particular borehole to the same stratum, and so the only permissible values of $D_i$ are 0 and 1.
(c) If the boreholes form a sequence, or more generally, if their spatial arrangement is taken into account, the user might wish to require that if a stratum is represented at both boreholes $i$ and $i'$, then it must also be represented at all boreholes $i''$ with $i < i'' < i'$. This means that when considering the trial bottom-layer allocations $D_1, \ldots, D_k$ one must exclude the choices $D_{i''} = 0$ whenever $D_i > 0$ and $D_{i'} > 0$.
(d) There may be reason to believe that all strata are present at all boreholes, with thicknesses at the various boreholes that do not differ by more than some ratio $R$, say. If this is the case then one must consider only $D_i$ satisfying
$$\max_i D_i \le R \min_i D_i$$
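The four cases (a) to (d) amount to different rules for generating the trial allocations examined at each DP step. The following Python sketch enumerates the permissible vectors $(D_1, \ldots, D_k)$ for a given state $(n_1, \ldots, n_k)$. The function name and its flags are hypothetical, and skipping the all-zero vector (a step that allocates no layer at all) is an assumption of this sketch rather than something stated above.

```python
from itertools import product


def permissible_d(n, segmented=False, contiguous=False, ratio=None):
    """Yield trial bottom-layer allocations (D_1, ..., D_k) for one DP step.

    n:          layers still unallocated at each borehole (n_1, ..., n_k)
    segmented:  case (b), each D_i may only be 0 or 1
    contiguous: case (c), boreholes form a sequence, so no gaps are allowed
    ratio:      case (d), the ratio R; every D_i >= 1 and max <= R * min
    With all options off this is case (a): every D_i from 0 to n_i.
    """
    ranges = [range(min(ni, 1) + 1) if segmented else range(ni + 1) for ni in n]
    for d in product(*ranges):
        if not any(d):
            continue  # assumption: a step must allocate at least one layer
        if contiguous:
            present = [i for i, di in enumerate(d) if di > 0]
            # case (c): no borehole between two "present" ones may have D = 0
            if any(d[i] == 0 for i in range(present[0], present[-1] + 1)):
                continue
        if ratio is not None and (min(d) == 0 or max(d) > ratio * min(d)):
            continue  # case (d): stratum everywhere, thicknesses within ratio R
        yield d
```

For example, `list(permissible_d([1, 1, 1], segmented=True, contiguous=True))` yields six vectors; `(1, 0, 1)` is the one rejected by case (c). As the text notes, each added restriction only shrinks the set the DP has to search.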
Note that, as is usual in DP but unlike most other optimization methods, these constraints on the decision variables $D_i$ do not make the calculations more difficult; in fact they simplify them by reducing the number of options to be investigated at each step.
and
$$S'_m = \max_{i,j \text{ s.t. } s(i,j)=m} \left[ \max_{\substack{i',j' \text{ s.t. } s(i',j')=m \\ (i',j') \neq (i,j)}} d_{ij,i'j'} \right]$$
$$S' = \max_m S'_m \qquad (2)$$
These two stress functions correspond to the single linkage and complete
linkage criteria of cluster analysis, and, along with other related linkage criteria,
have the necessary separability properties for the Principle of Optimality to hold.
They are, thus, amenable to optimization using DP.
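As an illustration of the complete-linkage criterion (2), the sketch below computes $S'_m$ for every stratum and their maximum $S'$ for a proposed stratification $s(i,j)$. The Euclidean choice for $d_{ij,i'j'}$, the dictionary layout, the function name, and the convention that a single-layer stratum has stress 0 are all assumptions; within a DP search such a quantity would be evaluated stratum by stratum for each trial allocation.

```python
import math
from collections import defaultdict


def complete_linkage_stress(x, s):
    """Return (S', {m: S'_m}) for the complete-linkage stress of equation (2).

    x: dict mapping (i, j) to the reading vector of layer j at borehole i
    s: dict mapping (i, j) to its stratum s(i, j)
    Distances are Euclidean here (an assumption; any dissimilarity could be used).
    """
    members = defaultdict(list)
    for layer, m in s.items():
        members[m].append(layer)
    per_stratum = {}
    for m, layers in members.items():
        worst = 0.0  # a stratum with one layer has no pairs; taken as 0 here
        for a in range(len(layers)):
            for b in range(a + 1, len(layers)):
                worst = max(worst, math.dist(x[layers[a]], x[layers[b]]))
        per_stratum[m] = worst
    return max(per_stratum.values()), per_stratum
```

Because $S'$ is a maximum over strata, the stress of a partial stratification can be combined with that of the next stratum by a single `max`, which is the separability the Principle of Optimality needs.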
Stratigraphic Correlation of Several Boreholes 397
Note that T is fixed for the data, while A depends on the proposed stratification.
Standard Manova measures of stress include:
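$$\bar{x}^* = \frac{1}{\sum_{i} D_i} \sum_{i=1}^{k} \sum_{j=n_i-D_i+1}^{n_i} x_{ij}, \qquad A^* = \sum_{i=1}^{k} \sum_{j=n_i-D_i+1}^{n_i} (x_{ij} - \bar{x}^*)(x_{ij} - \bar{x}^*)'$$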
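The MANOVA route needs the within-stratum matrix $A$ and the total matrix $T$. As one concrete example, the sketch below uses the determinant ratio $|A|/|T|$ (Wilks' Lambda, a standard MANOVA statistic); whether it is among the measures intended here is an assumption, and the helper names are hypothetical. Smaller values indicate strata that are tighter relative to the overall scatter of the data.

```python
def sscp(rows):
    """Sums of squares and cross-products of vectors about their mean."""
    p, n = len(rows[0]), len(rows)
    mean = [sum(r[c] for r in rows) / n for c in range(p)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows)
             for b in range(p)] for a in range(p)]


def det(mat):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in mat]
    n, result = len(a), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for col in range(c, n):
                a[r][col] -= f * a[c][col]
    return result


def wilks_stress(x, s):
    """|A| / |T| for stratification s; x and s map (i, j) to vector / stratum.

    Every layer in x is assumed to be allocated in s.
    """
    groups = {}
    for layer, m in s.items():
        groups.setdefault(m, []).append(x[layer])
    p = len(next(iter(x.values())))
    A = [[0.0] * p for _ in range(p)]
    for rows in groups.values():  # A: pooled within-stratum SSCP
        w = sscp(rows)
        for a in range(p):
            for b in range(p):
                A[a][b] += w[a][b]
    T = sscp(list(x.values()))  # T: total SSCP, fixed for the data
    return det(A) / det(T)
```

Since $T$ is fixed for the data, minimizing this ratio is the same as minimizing $|A|$.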
A MIXTURE MODEL
To set the stage for the actual model, let us return to some general cluster
analysis concepts (e.g., Hawkins, 1982). Suppose we have $M$ source populations, with observations $X_j$ coming with a prior probability of $\pi_m$ from the $m$th, which is $N(\xi_m, \Sigma)$. The marginal distribution of an $X_j$ is
$$\sum_{m=1}^{M} \pi_m N(\xi_m, \Sigma)$$
This is a mixture model. Given a set of data, one may estimate the parameters $\pi_m$, $\xi_m$, and $\Sigma$, and also the set of posterior probabilities $\pi_{jm}$ that $X_j$ came from source $m$ given $X_j$. Availability of these parameter estimates implies that, if desired, the $X_j$ could subsequently be classified back to their sources as a discriminant rather than a clustering problem.
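For the normal mixture just described, the posterior probabilities follow from Bayes' theorem: $\pi_{jm}$ is proportional to $\pi_m f_m(X_j)$, where $f_m$ is the $N(\xi_m, \Sigma)$ density. The sketch below computes them for given parameter values, which is the E-step of the EM algorithm, one common way of fitting such a mixture. It is univariate for brevity, and the function name is hypothetical.

```python
import math


def posterior_probs(x, pi, xi, sigma2):
    """Posterior probabilities pi_jm that x_j came from source m.

    Univariate sketch: source m is N(xi[m], sigma2) with one common variance,
    the scalar analogue of the common Sigma. Because the variance is shared,
    the normal density's constant cancels and only the exponent matters.
    """
    post = []
    for xj in x:
        logs = [math.log(p) - (xj - mu) ** 2 / (2.0 * sigma2)
                for p, mu in zip(pi, xi)]
        top = max(logs)  # subtract the largest term to avoid underflow
        w = [math.exp(v - top) for v in logs]
        total = sum(w)
        post.append([v / total for v in w])
    return post
```

Taking, for each row, the source with the largest probability gives the discriminant-style classification mentioned above.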
A slightly different derivative algorithm is the allocation method. This begins with the observation that, whatever the prior probabilities, each $X_j$ in fact came from one of the $M$ sources. By defining $W_{jm} = 1$ if $X_j$ came from source $m$ and 0 otherwise, we may then seek a set of $W_{jm}$ to optimize some criterion, for example, likelihood. From these $W_{jm}$, estimates of the $\xi_m$ and $\Sigma$ may be computed.
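One simple way to search over the $W_{jm}$ is to alternate between estimating the source means from the current allocation and reallocating each observation to its nearest mean. The sketch below does this for univariate data, assuming equal priors and a common variance, under which maximizing the classification likelihood reduces to minimizing the pooled within-source sum of squares. It is a local search (essentially $k$-means), so it need not reach the global optimum, and the function names are hypothetical.

```python
def _means(x, alloc, M, old):
    """Source means from the current allocation; an empty source keeps its old mean."""
    xi = list(old)
    for m in range(M):
        members = [xj for xj, a in zip(x, alloc) if a == m]
        if members:
            xi[m] = sum(members) / len(members)
    return xi


def allocation_method(x, alloc, max_iter=100):
    """Alternate estimation and reallocation for hard allocations W_jm.

    x:     univariate observations x_1, ..., x_n
    alloc: initial source (0-based) of each x_j, so W_jm = 1 iff alloc[j] == m;
           every source should receive at least one observation initially
    Returns (allocation, source means xi_m, pooled variance estimate).
    """
    alloc = list(alloc)
    M = max(alloc) + 1
    xi = _means(x, alloc, M, [0.0] * M)
    for _ in range(max_iter):
        new = [min(range(M), key=lambda m: (xj - xi[m]) ** 2) for xj in x]
        if new == alloc:
            break  # no observation changes source: a local optimum
        alloc = new
        xi = _means(x, alloc, M, xi)
    sigma2 = sum((xj - xi[a]) ** 2 for xj, a in zip(x, alloc)) / len(x)
    return alloc, xi, sigma2
```

For instance, starting `allocation_method([0.0, 0.2, 5.0, 5.2], [0, 1, 1, 1])` moves the second observation to the first source and stops after one reallocation.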
Parameter estimates $\hat{\xi}_m$ and $\hat{\Sigma}$ are common to the two methods of implementing the mixture model. The quantities $\pi_{jm}$ of the mixture model are analogous to the $W_{jm}$ of the allocation method, and, in fact, in the standard mixture analysis, the optimal posterior classification is to allocate $X_j$ to that source $m$ for which $\pi_{jm}$ is a maximum.
$$W_{ijm} = \begin{cases} 1 & \text{if } s(i,j) = m \\ 0 & \text{otherwise} \end{cases}$$
$$s(i,j) \ge s(i,j-1) + R$$
that is, each successive layer must belong to a stratum at least as deep as the
layer above it, or strictly deeper if R = 1. This implies that
$$\sum_{m=1}^{M} m\,(W_{ijm} - W_{i,j-1,m}) \ge R \qquad (7)$$
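$$\hat{\xi}_m = \sum_i \sum_j W_{ijm} X_{ij} / n_m$$
$$n_m = \sum_i \sum_j W_{ijm}$$
$$\hat{\Sigma} = \frac{1}{N}\left[\sum_i \sum_j X_{ij} X_{ij}' - \sum_m n_m \hat{\xi}_m \hat{\xi}_m'\right]$$
$$N = \sum_m n_m \qquad (8)$$
and the likelihood is then maximized by minimizing $|\hat{\Sigma}|$.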
The meanings of these quantities are: $\hat{\xi}_m$ is the estimated mean vector of stratum $m$; $n_m$ the total number of layers from stratum $m$ intersected by the different boreholes; $\hat{\Sigma}$ the within-stratum covariance matrix.
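For a given hard allocation $s(i,j)$, with $W_{ijm} = 1$ exactly when $s(i,j) = m$, the ordering constraint (7) and the estimates (8) are easy to evaluate. Because $\sum_m m W_{ijm} = s(i,j)$, constraint (7) simply says $s(i,j) - s(i,j-1) \ge R$. The sketch below checks this and computes $\hat{\xi}_m$, $n_m$, and the pooled variance for univariate readings, where $|\hat{\Sigma}|$ reduces to $\hat{\sigma}^2$. The divisor $N$ and the function name are assumptions of this sketch.

```python
def stratum_estimates(readings, strata, R=1):
    """Check constraint (7) and compute the estimates of (8), univariate sketch.

    readings[i][j]: reading of layer j (top to bottom) at borehole i
    strata[i][j]:   stratum s(i, j) of that layer, so W_ijm = 1 iff it equals m
    Returns ({m: xi_m}, {m: n_m}, pooled variance sigma2).
    """
    for i, s_i in enumerate(strata):
        for j in range(1, len(s_i)):
            # (7): sum_m m (W_ijm - W_i,j-1,m) = s(i,j) - s(i,j-1) must be >= R
            if s_i[j] - s_i[j - 1] < R:
                raise ValueError(f"borehole {i}, layer {j} violates (7)")
    sums, n, total_sq = {}, {}, 0.0
    for x_i, s_i in zip(readings, strata):
        for xij, m in zip(x_i, s_i):
            sums[m] = sums.get(m, 0.0) + xij
            n[m] = n.get(m, 0) + 1
            total_sq += xij * xij
    xi = {m: sums[m] / n[m] for m in sums}
    N = sum(n.values())
    # (8), scalar version: [sum of x^2 - sum_m n_m xi_m^2] / N
    sigma2 = (total_sq - sum(n[m] * xi[m] ** 2 for m in xi)) / N
    return xi, n, sigma2
```

Note that strata are pooled across boreholes: stratum 1 in the call `stratum_estimates([[1.0, 5.0], [1.2, 3.0, 5.2]], [[1, 3], [1, 2, 3]])` collects one layer from each borehole, while stratum 2 is present only at the second.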
$$w_{ijm} \ge 0$$
$$\sum_{m=1}^{M} w_{ijm} = 1, \qquad i = 1, \ldots, k;\; j = 1, \ldots, N_i$$
$$\sum_{m=1}^{M} m\,(w_{ijm} - w_{i,j-1,m}) \ge R, \qquad i = 1, \ldots, k;\; j = 2, \ldots, N_i$$
then the problem is one in linear programming (LP). More particularly, (ii) becomes the solution of the following $k$ problems:
$$\text{s.t.}\quad w_{ijm} \ge 0$$
$$\sum_m w_{ijm} = 1, \qquad j = 1, \ldots, N_i$$
$$\sum_m m\,(w_{ijm} - w_{i,j-1,m}) \ge R, \qquad j = 2, \ldots, N_i$$
Some reductions of this problem lead to the following sparse and computationally much more efficient two-phase LP formulation, to be solved for each $i$:
(Phase 1) Max $f_1$
(Phase 2) Max $f_2$
subject to
$$f_1 + \sum_{j=1}^{N_i} v_j + \sum_{j=2}^{N_i} t_j = 0$$
$$f_2 + \sum_{j=1}^{N_i} \sum_{m=1}^{M} (d_{jm} \ldots)$$
$$v_j + \sum_{m=1}^{M} (d^+_{jm} - d^-_{jm}) = 0, \qquad j = 1, \ldots, N_i$$
$$\sum_{m=1}^{M} m\,(d^+_{jm} - d^-_{jm} - d^+_{j-1,m} + d^-_{j-1,m}) - u_j + t_j = R + \sum_{m=1}^{M} m\,(r_{j-1,m} - r_{jm}), \qquad j = 2, \ldots, N_i$$
$0 \le d^+_{jm} \le 1 - r_{jm}$; $0 \le d^-_{jm} \le r_{jm}$; $0 \le v_j, u_j, t_j$; and $f_1, f_2$ unbounded, a formulation having $2N_i - 1$ constraints and $N_i(2M + 3)$ variables. The bounds on $d^+$ and $d^-$ need not be regarded as constraints if the whole problem is solved using a bounded-variable linear programming code.
Note the distinction between the computational load of this formulation and that of the DP formulation. The mixture formulation has effectively decomposed the problem so that a separate LP is solved for each borehole. The computational labor is, thus, effectively proportional to the number of boreholes, while with DP it rises exponentially with the number of boreholes. On the other hand, the LP's solution time varies at least cubically with the number of layers per borehole. Thus, while the DP formulation is effective for two or three boreholes of many layers, the mixture formulation is suitable for almost arbitrarily many boreholes, but with not too many layers each: say, fewer than 20 layers per borehole.

Table 1. Mixture-model $w_{ijm}$ for one of the shorter boreholes of the example (strata 1 to 20 by layers 1 to 6), with the DP allocation of each layer in the last row. Blank (zero) entries are shown as ".".

                          Layer
Stratum             1      2      3      4      5      6
1                0.24      .      .      .      .      .
2                   .      .      .      .      .      .
3                   .      .      .      .      .      .
4                   .      .      .      .      .      .
5                0.76   0.97      .      .      .      .
6                   .   0.03   1.00   0.92   0.85   0.78
7                   .      .      .      .      .      .
8                   .      .      .      .      .      .
9                   .      .      .      .      .      .
10                  .      .      .      .      .      .
11                  .      .      .      .      .      .
12                  .      .      .      .      .      .
13                  .      .      .      .      .      .
14                  .      .      .      .      .      .
15                  .      .      .      .      .      .
16                  .      .      .      .      .      .
17                  .      .      .      .      .      .
18                  .      .      .      .      .      .
19                  .      .      .      .      .      .
20                  .      .      .   0.08   0.15   0.22
DP allocation       1      5      6     12     14     20
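The contrast in computational load can be made concrete with a rough count. In the multi-borehole DP a state is the vector $(n_1, \ldots, n_k)$ of layers still to be allocated, so there are $\prod_i (N_i + 1)$ states (and each state may examine up to that many trial vectors $D$), whereas the mixture route solves one LP per borehole. The sketch below compares the DP state count with a cubic-per-borehole LP cost; the cubic exponent only echoes the "at least cubically" remark, the function names are hypothetical, and both numbers are order-of-magnitude indicators rather than timings.

```python
from math import prod


def dp_state_count(layers):
    """States (n_1, ..., n_k) of the multi-borehole DP, each n_i in 0..N_i."""
    return prod(n + 1 for n in layers)


def mixture_lp_work(layers, exponent=3):
    """Rough work for the mixture route: one LP per borehole, ~N_i**exponent each."""
    return sum(n ** exponent for n in layers)


# Hypothetical sizes: eleven boreholes of 19 layers each.
print(dp_state_count([19] * 11))   # 204800000000000 (that is, 20**11)
print(mixture_lp_work([19] * 11))  # 75449 (that is, 11 * 19**3)
```

Adding a twelfth borehole multiplies the DP count by 20 but adds only one more LP to the mixture route.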
A final step is involved. The mixture formalism does not solve the stratigraphic correlation problem directly, since the allocation of layers to strata is a probabilistic one and, as is well illustrated by Table 1, it is not clear what is really implied by saying that a layer comes from stratum 1 with probability 0.24, or from stratum 5 with probability 0.76. The mixture formulation does, however, make it easy to complete the final allocation step using the $\hat{\xi}_m$ and $\hat{\Sigma}$. These estimates of the true stratigraphy may be treated as if they were the results from a single hypothetical borehole, and each of the actual boreholes can be matched to it individually using the straightforward two-borehole DP algorithm. Matching each borehole to the estimated stratum sequence then also implicitly matches the borehole sequences to one another, thus providing a solution to the stratigraphic correlation problem.
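The final matching step can be sketched as a small DP that slots one borehole's layers into the estimated stratum profile. In the sketch below the cost of placing layer $j$ in stratum $m$ is $(x_j - \hat{\xi}_m)^2/\hat{\sigma}^2$, a univariate stand-in for the Mahalanobis distance based on $\hat{\Sigma}$; with $R = 1$ successive layers must go to strictly deeper strata, and strata with no layer at this borehole are simply skipped. The cost function and the name `match_to_profile` are assumptions for illustration, not the paper's exact algorithm.

```python
def match_to_profile(layers, xi, sigma2, R=1):
    """Allocate a borehole's layers (top to bottom) to strata 1..M of a profile.

    layers: readings x_1, ..., x_n of one borehole (univariate sketch)
    xi:     estimated stratum means xi_1, ..., xi_M, top to bottom
    sigma2: estimated within-stratum variance
    R:      1 means each layer goes to a strictly deeper stratum; 0 allows repeats
    Returns (total cost, list of 1-based strata, one per layer).
    """
    n, M = len(layers), len(xi)
    inf = float("inf")
    cost = [[(x - mu) ** 2 / sigma2 for mu in xi] for x in layers]
    best = [cost[0][:]] + [[inf] * M for _ in range(n - 1)]
    back = [[-1] * M for _ in range(n)]
    for j in range(1, n):
        for m in range(M):
            for prev in range(m - R + 1):  # strata above m (or equal, if R = 0)
                cand = best[j - 1][prev] + cost[j][m]
                if cand < best[j][m]:
                    best[j][m], back[j][m] = cand, prev
    m = min(range(M), key=lambda k: best[n - 1][k])
    if best[n - 1][m] == inf:
        raise ValueError("no allocation satisfies the ordering constraint")
    total, path = best[n - 1][m], [m]
    for j in range(n - 1, 0, -1):  # trace the optimal allocation back upward
        m = back[j][m]
        path.append(m)
    return total, [k + 1 for k in reversed(path)]
```

For example, `match_to_profile([1.0, 5.0], [1.0, 3.0, 5.0], 1.0)` places the two layers in strata 1 and 3, leaving stratum 2 unrepresented at that borehole. Running the same function on every borehole against the same profile is what correlates the boreholes with one another.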
EXAMPLE
The procedure has been applied to some data obtained from the Libanon
gold-mine. A total of 11 diamond drill holes were sunk from stopes along the
normal to the auriferous reef. The boreholes covered a lateral area of approximately 1.5 km² and were up to 30 m deep. The cores so obtained were divided
on the basis of their appearance (e.g., pebble packing and mineralization) into
distinct layers and these were assayed for their minor and trace element com-
position, 32 elements being assayed.
Up to 19 horizons were intersected by the boreholes. There is good reason
to believe that these represent some of the distinct strata of the Witwatersrand
sedimentary succession, but that because of variations in the depositional environments, these strata may appear or disappear at the different boreholes. As there were no visual markers to correlate the strata, it was hoped that trace element geochemistry would provide suitable, reliable chemical ones. A log transformation of the grades (after addition of suitable constants) was found to give acceptable marginal normality, and these transformed data were used in the stratification. As the major elements (Si and O2) were not among the 32 measured, it was not necessary to take any account of constant-sum constraints.
Of the 32 elements, 15 were below detection limits in many of the samples and
were dropped from the analysis; the remaining 17 were used.
As the layers were believed to be distinct, the R = 1 version of the procedures was used. From 19 up to 25 strata were fitted, the entire analysis requiring 1000 CP seconds on a Cyber 170/750 computer.
The full output, consisting of the $\hat{\xi}_m$, $\hat{\Sigma}$, $w_{ijm}$, and DP allocations for each $M$, is too voluminous for inclusion here, but Tables 1 and 2 illustrate some features of the method. Table 1 shows the $w_{ijm}$ resulting from the mixture model in one of the shorter boreholes. They show quite clearly the problems that arise in interpreting these mixture posterior probabilities, since the mixing probabilities all involve only strata 1, 5, 6, and 20.
Following the mixture estimation, the DP allocation was run, and converged in two cycles. The allocations for this borehole are given as the last row of Table 1, and show some interesting features: layer 1 is allocated to stratum 1 rather than 5 (which is needed for layer 2), and the indeterminacy of layers 3 to 6, mixing over strata 6 and 20 in various fractions, is resolved by the allocation of layers 4 and 5 to strata 12 and 14 (strata that did not figure in the $w_{ijm}$ at all).
19 -29.6 -26.0
20 -31.0 -26.4
21 -31.4 -27.4
22 -31.5 -27.4
23 -31.7 -26.8
24 -32.8 -28.5
25 -33.3 -28.9
26 -33.5 -29.2
ACKNOWLEDGMENTS
REFERENCES
Brower, J. C., Millendorf, S. A., and Dyman, T. S., 1978, Methods for the quantification of
assemblage zones based on multivariate analysis of weighted and unweighted data:
Comput. Geosci., v. 4, p. 221-227.
Cormack, R. M., 1971, A review of classification: Jour. Roy. Stat. Soc., v. 134A, p. 321-353.
Delcoigne, A. and Hansen, P., 1975, Sequence comparison by dynamic programming:
Biometrika, v. 62, p. 661-664.
Dempster, A. P., Laird, N. M., and Rubin, D. B., 1977, Maximum likelihood from incomplete data via the EM algorithm: Jour. Roy. Stat. Soc., v. 39B, p. 1-22.
Gordon, A. D. and Reyment, R. A., 1979, Slotting of borehole sequences: Jour. Math.
Geol., v. 11, p. 309-327.
Hawkins, D. M., 1974, Computing mean vectors and dispersion matrices in multivariate
analysis of variance: Appl. Stat., v. 23, p. 234-238.
Hawkins, D. M., Ed., 1982, Topics in applied multivariate analysis: Cambridge Press, Cambridge, England.
Hawkins, D. M. and Merriam, D. F., 1974, Zonation of multivariate sequences of digitized
geologic data: Jour. Math. Geol., v. 6, p. 263-269.
Mann, C. J., 1981, Stratigraphic analysis: decades of revolution (1970-1979) and refinement (1980-1989), in D. F. Merriam (Ed.), Computer applications in the Earth Sciences: Plenum Press, New York.
Millendorf, S. A., Brower, J. C., and Dyman, T. S., 1978, A comparison of methods for the
quantification of assemblage zones: Comput. Geosci., v. 4, p. 229-242.
Nemhauser, G. L., 1967, Introduction to dynamic programming: John Wiley & Sons, New
York.