The Bayesian Estimation of Common Parameters from Several Responses
Author(s): George E. P. Box and Norman R. Draper
Source: Biometrika, Vol. 52, No. 3/4 (Dec., 1965), pp. 355-365
Published by: Biometrika Trust
Stable URL: http://www.jstor.org/stable/2333689
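The $k$ responses are supposed to follow the models

$$y_{iu} = f_i(x_{iu}, \theta) + e_{iu} \quad (i = 1, 2, \ldots, k;\ u = 1, 2, \ldots, n), \tag{2.1}$$

where the $\theta_h$ $(h = 1, 2, \ldots, m)$ are the common parameters and

$$E(e_{iu}) = 0; \qquad E(e_{iu}e_{jv}) = 0, \ \text{all } i, j,\ u \neq v; \qquad E(e_{iu}e_{ju}) = \sigma_{ij}, \ \text{all } i, j, \text{ all } u.$$

(These are very general models. In cases met in practice, considerable simplification may occur. For example, it may happen that $x_{iu} = x_{ju}$. In the examples given, $x_{iu} = t_u$, the time at which the $u$th experiment was run, and is in fact independent of $i$. Another possible simplification is that not all of the $\theta$'s may appear in all of the models.)

This implies that if we write $y_u' = (y_{1u}, y_{2u}, \ldots, y_{ku})$ to represent the vector of $u$th observations $(u = 1, 2, \ldots, n)$ on each of the $k$ variables, then the variance-covariance matrix of $y_u$ is $\Sigma$, where

$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1k} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2k} \\ \vdots & \vdots & & \vdots \\ \sigma_{k1} & \sigma_{k2} & \cdots & \sigma_{kk} \end{bmatrix} = \{\sigma_{ij}\}.$$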
† Research supported by the United States Navy through the Office of Naval Research.
Here $\sigma_{ij} = \sigma_{ji}$, and the elements of $y_u$ and $y_v$, $u \neq v$, are uncorrelated. Let $\Lambda = \Sigma^{-1} = \{\sigma^{ij}\}$ and let $\theta' = (\theta_1, \theta_2, \ldots, \theta_m)$ represent the vector of the $m$ parameters in equations (2.1). Suppose $y_{1u}, y_{2u}, \ldots, y_{ku}$ follow a multivariate normal distribution. What can be deduced about the possible values of the elements of $\theta$? We define the quantities

$$v_{ij} = \sum_{u=1}^{n} \{y_{iu} - f_i(x_{iu}, \theta)\}\{y_{ju} - f_j(x_{ju}, \theta)\},$$

which are the sums of squares and sums of products of the deviations of the observed $y_{iu}$'s from their respective models.

Now if the variances and covariances $\sigma_{ij}$ were known, the likelihood would be a monotonic function of the quadratic form

$$z = \sum_{i=1}^{k}\sum_{j=1}^{k} \sigma^{ij} v_{ij}.$$
Minimization of this quadratic form provides the generalization of the method of least squares due to Aitken (1935). A further simplification would result if it were known that the elements of the vector observations were uncorrelated ($\sigma_{ij} = 0$, $i \neq j$). In this case, minimization of $z$ would correspond to standard weighted least squares, using the reciprocals of the variances as the weights. In most practical problems the $\sigma_{ij}$ are unknown and it is this case we shall discuss.

3. BAYESIAN SOLUTION

Since the $n$ sets of observations $y_u' = (y_{1u}, y_{2u}, \ldots, y_{ku})$ are independently and normally distributed, the likelihood is

$$p(y \mid \theta, \sigma^{ij}) = (2\pi)^{-\frac{1}{2}nk}\, |\Lambda|^{\frac{1}{2}n} \exp\Big\{-\tfrac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k} \sigma^{ij} v_{ij}\Big\}, \tag{3.1}$$
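where we denote the data by $y' = (y_1', y_2', \ldots, y_n')$.

We shall apply the Bayes theorem using 'noninformative' prior distributions for $\theta$ and the $\sigma^{ij}$. We suppose that little is known ab initio about the values of the constants $\theta$. Specifically we suppose that, when suitably parametrized, the prior distribution of the $\theta$'s will not change very much over a region in which the likelihood is appreciable, i.e. we take a locally uniform prior distribution $p(\theta) \propto d\theta$. The invariance theory of Jeffreys (1961) leads to $p(\sigma^{ij}) \propto J^{\frac{1}{2}}$, where $J$ is the Jacobian $|D(\sigma_{ij})/D(\sigma^{ij})|$. As shown in the Appendix, $J = |\Lambda|^{-(k+1)}$, so that (see also Savage, 1961; Geisser & Cornfield, 1963; Tiao & Zellner, 1964)

$$p(\sigma^{ij}) \propto |\Lambda|^{-\frac{1}{2}(k+1)}\, d\sigma^{ij}.$$

The joint posterior distribution is therefore

$$p(\theta, \sigma^{ij} \mid y) \propto |\Lambda|^{\frac{1}{2}(n-k-1)} \exp\Big\{-\tfrac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k} \sigma^{ij} v_{ij}\Big\}\, d\theta\, d\sigma^{ij}. \tag{3.2}$$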
To find the marginal distribution of $\theta$ we can integrate out the $\sigma^{ij}$ by comparing the right-hand side of the joint posterior distribution with the Wishart distribution (Wilks, 1962, page 551, equation 18.2.27), noting that, while Wilks's variables are the $v_{ij}$, ours are the $\sigma^{ij}$.
It follows that

$$p(\theta \mid y) \propto |v_{ij}|^{-\frac{1}{2}n}\, d\theta.$$
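As a minimal sketch of how this posterior can be evaluated numerically, the Python function below forms the matrix $\{v_{ij}\}$ from the residuals and returns the log posterior density of $\theta$ up to an additive constant. The function name, the argument layout and the two toy models are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def log_posterior(theta, y, x, models):
    """Log of |v_ij|^(-n/2), up to an additive constant.

    y is an (n, k) array of observations; models is a list of k
    callables f_i(x, theta) returning the n fitted values of response i.
    """
    n = y.shape[0]
    resid = y - np.column_stack([f(x, theta) for f in models])
    v = resid.T @ resid                  # k x k matrix of the v_ij
    _, logdet = np.linalg.slogdet(v)     # log |det v|, computed stably
    return -0.5 * n * logdet

# Hypothetical toy data: two responses sharing a single slope theta.
x_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_demo = np.array([[1.1, 1.9], [2.0, 4.2], [2.9, 6.1], [4.2, 7.8]])
models_demo = [lambda x, th: th * x, lambda x, th: 2.0 * th * x]
lp_demo = log_posterior(1.0, y_demo, x_demo, models_demo)
```

Point estimates then correspond to the value of $\theta$ maximizing this function, for instance over a grid or with a general-purpose optimizer.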
We find, therefore, a remarkably simple form for the posterior density involving, apart from a constant, only a power of the determinant of the dispersion matrix which is, of course, a function of $\theta$ only. In particular, we notice that an appropriate generalization of least squares is immediately provided. If we wish to obtain point estimates $\hat\theta_h$ for the $\theta_h$, we should minimize the value of the determinant $|v_{ij}|$.

It is of interest to note the relation between this criterion and the one that would be appropriate if the variance-covariance matrix were known. If $V_{ij}$ is the cofactor of $v_{ij}$ in $|v_{ij}|$, we can write

$$z_0 = k\,|v_{ij}| = \sum_{i=1}^{k}\sum_{j=1}^{k} V_{ij} v_{ij},$$

which has the form of $z$ with quantities proportional to the maximum likelihood estimates of the $\sigma^{ij}$ replacing the unknown weights $\sigma^{ij}$.

We now consider this general result in the light of some familiar special cases.
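The link between the determinant criterion and a weighted sum of the $v_{ij}$ rests on the cofactor expansion of a determinant. The short Python check below is a sketch under assumed, made-up residuals: it verifies that $\sum\sum V_{ij}v_{ij} = k|v_{ij}|$ and that the cofactors divided by the determinant give the inverse matrix, which is why they behave like estimated weights. The helper name `cofactors` and the demonstration data are hypothetical.

```python
import numpy as np

def cofactors(v):
    """Matrix of cofactors V_ij of a square matrix v."""
    k = v.shape[0]
    out = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            minor = np.delete(np.delete(v, i, axis=0), j, axis=1)
            out[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return out

# Hypothetical residuals for n = 5 runs and k = 3 responses.
resid_demo = np.array([
    [0.10, -0.20, 0.05],
    [0.00, 0.10, -0.10],
    [-0.10, 0.30, 0.20],
    [0.20, -0.10, 0.00],
    [0.05, 0.05, -0.15],
])
v_demo = resid_demo.T @ resid_demo      # sums of squares and products
V_demo = cofactors(v_demo)
z0_demo = np.sum(V_demo * v_demo)       # sum over i, j of V_ij * v_ij
```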
4. SPECIAL CASES
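4.1. Univariate linear regression model

When $k = 1$ and $f_1(x_u, \theta) = \sum_{q=1}^{m}\theta_q x_{qu}$, the posterior distribution becomes

$$p(\theta) \propto \Big\{1 + \frac{(\theta - \hat\theta)'X'X(\theta - \hat\theta)}{\nu s^2}\Big\}^{-\frac{1}{2}(\nu + m)},$$

where $\nu = n - m$, $\hat\theta$ is the least squares estimate of $\theta$, $X$ is the $n \times m$ matrix with elements $x_{qu}$, and

$$\nu s^2 = \sum_{u=1}^{n}\Big(y_u - \sum_{q=1}^{m}\hat\theta_q x_{qu}\Big)^2,$$

i.e. the well-known multivariate t-distribution (Cornish, 1954; Dunnett & Sobel, 1954).

4.2. Multivariate linear regression models with independent observations

Suppose that for the $i$th element of the observation vector there is a linear model, so that

$$f_i(x_{iu}, \theta) = \sum_{q=1}^{m}\theta_q x_{qiu} \quad (i = 1, 2, \ldots, k),$$

and that $y_{1u}, y_{2u}, \ldots, y_{ku}$ are independently distributed so that $\sigma_{ij} = 0$, $i \neq j$. Then the posterior distribution of the common $\theta$ is given by the product

$$p(\theta) = \text{constant} \prod_{i=1}^{k}\Big\{1 + \frac{(\theta - \hat\theta_i)'X_i'X_i(\theta - \hat\theta_i)}{\nu s_i^2}\Big\}^{-\frac{1}{2}(\nu + m)},$$

where $\nu = n - m$,

$$\nu s_i^2 = \sum_{u=1}^{n}\Big(y_{iu} - \sum_{q=1}^{m}\hat\theta_{iq} x_{qiu}\Big)^2,$$

$\hat\theta_i$ is the maximum posterior estimate of $\theta$ from the $i$th set of observations, $i = 1, 2, \ldots, k$, and $(x_{1iu}, x_{2iu}, \ldots, x_{miu})$ is the $u$th row $(u = 1, 2, \ldots, n)$ of $X_i$, i.e. by the product of $k$ independent multivariate t-distributions.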
Suppose in the previous case $k = 2$, $m = 1$ and the models are given by $f_1 = f_2 = \theta$. The distribution of $\theta$ is now the product of two univariate t-distributions

$$p(\theta) = \text{constant} \prod_{i=1}^{2}\Big\{1 + \frac{n(\theta - \hat\theta_i)^2}{(n-1)s_i^2}\Big\}^{-\frac{1}{2}n},$$

where $\hat\theta_i$ represents the maximum posterior estimate of $\theta$ from the $i$th set of observations, $i = 1, 2$, and $(n-1)s_i^2 = \sum_u (y_{iu} - \hat\theta_i)^2$. This corresponds to the fiducial distribution of the weighted mean given by Fisher (1961). The weighted mean problem is the simplest example of the more general problem of combining information from several responses. (See also Sukhatme, 1938; Yates, 1939; James, 1959.)
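The weighted mean case is easy to explore numerically. The sketch below evaluates the log of a product of univariate t kernels for a common mean and locates its mode on a grid. It assumes that each kernel has the form $\{1 + n(\theta - \bar y_i)^2/((n-1)s_i^2)\}^{-n/2}$; the two samples are invented purely for illustration, and the function name is hypothetical.

```python
import numpy as np

def common_mean_log_posterior(theta, samples):
    """Sum of log t-kernels, one per independent sample, for a common mean."""
    total = 0.0
    for sample in samples:
        obs = np.asarray(sample, dtype=float)
        n = obs.size
        mean = obs.mean()
        ss = np.sum((obs - mean) ** 2)          # equals (n - 1) * s^2
        total += -0.5 * n * np.log1p(n * (theta - mean) ** 2 / ss)
    return total

# Hypothetical samples: the first is much more precise than the second.
sample_a = [9.8, 10.1, 10.4, 9.9, 10.3]         # mean 10.1
sample_b = [10.9, 11.6, 10.2, 12.1, 10.7]       # mean 11.1
grid_demo = np.linspace(9.0, 12.0, 3001)
log_post_demo = np.array(
    [common_mean_log_posterior(th, [sample_a, sample_b]) for th in grid_demo]
)
mode_demo = grid_demo[np.argmax(log_post_demo)]
```

With these made-up numbers the mode lies between the two sample means and much closer to the mean of the more precise sample, which is the behaviour expected of a weighted mean.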
5. AN EXAMPLE OF THE MULTIVARIATE ESTIMATION OF A SINGLE PARAMETER $\theta$ IN A NONLINEAR SYSTEM
Consider a chemical reaction in which a product A is decomposing to form B in such a way that the rate of reaction is proportional to the proportion $\eta_1$ of A left unchanged. With $\eta_2$ representing the proportion of A which has reacted to form B, the system is described by the differential equations

$$\dot\eta_1 = -\phi\eta_1, \qquad \dot\eta_2 = \phi\eta_1 \qquad (\phi > 0), \tag{5.1}$$

where the dot denotes differentiation with respect to time $t$, with boundary conditions $\eta_1 = 1$, $\eta_2 = 0$ at $t = 0$, and $\eta_1 = 0$, $\eta_2 = 1$ at $t = \infty$. The solution is $\eta_1 = e^{-\phi t}$, $\eta_2 = 1 - e^{-\phi t}$.
We suppose that observations $y_{iu} = \eta_{iu} + e_{iu}$ $(u = 1, 2, \ldots, n;\ i = 1, 2)$ are available, where $E(e_{iu}) = 0$, $E(e_{iu}^2) = \sigma_i^2 = \sigma_{ii}$, $E(e_{1u}e_{2u}) = \rho\sigma_1\sigma_2 = \sigma_{12}$, and where $\sigma_1^2$, $\sigma_2^2$ and $\rho$ are unknown. It should be noted that $y_1 + y_2 \neq 1$ when the responses are observed separately.

In considering a parameter like the specific rate $\phi$ which is essentially positive, it is probably most realistic to take $\theta = \ln\phi$, $-\infty < \theta < \infty$, as locally uniform a priori. This would mean, for example, that having guessed a value for $\phi$, an experimenter would be about equally prepared to accept a value twice as big as he would to accept one half as big.

Suppose two observations are taken at each of five distinct values of the time $t$. If a single response, $y_1$ alone or $y_2$ alone, is available, the posterior distribution of $\theta$ will be given by $p(\theta \mid y_i) \propto v_{ii}^{-5}$ $(i = 1, 2)$. If, however, both $y_1$ and $y_2$ are observed, the posterior distribution which now makes use of information from both sources will be

$$p(\theta \mid y) \propto (v_{11}v_{22} - v_{12}^2)^{-5}.$$
A set of manufactured data for this type of example is given in the second and third columns of Table 1, labelled Example 1. These data were obtained by adding random normal deviates to calculated values and taking $\sigma_{11} = 0.0004$, $\sigma_{22} = 0.0016$ and $\sigma_{12} = 0.0004$ so that the correlation coefficient between the errors was $\rho_{12} = 0.5$. Fig. 1 shows the posterior distributions for $y_1$ alone, $y_2$ alone and for $y_1$ and $y_2$ taken together.

When we consider the posterior distributions from $y_1$ and $y_2$ separately, we see that they present consistent evidence concerning $\theta$. As is to be expected, the precision of the estimate from $y_1$ (which has the smaller variance) is greater than that from $y_2$. Also, even though $y_1$ and $y_2$ are correlated to some extent, the two responses taken together provide a distribution which is sharper than either of those from the individual responses.

Table 1. Data for examples, k = 2, m = 1

           Example 1          Example 2
 t_u     y_1u     y_2u      y_1u     y_2u
 1/2    0.907    0.127     0.907    0.142
 1/2    0.915    0.064     0.915    0.079
 1      0.801    0.134     0.801    0.160
 1      0.825    0.200     0.825    0.188
 2      0.649    0.274     0.649    0.315
 2      0.675    0.375     0.675    0.416
 4      0.446    0.570     0.446    0.624
 4      0.468    0.535     0.468    0.589
 8      0.233    0.792     0.233    0.838
 8      0.187    0.803     0.187    0.849
Fig. 1. Posterior distributions. Example 1. (Curves shown: $y_1$ and $y_2$; $y_1$ alone; $y_2$ alone. Horizontal axis: $\theta = \ln\phi$.)
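The three posterior curves for Example 1 can be approximated on a grid of $\theta$ values. The following Python sketch does so under two stated assumptions: the ten run times are taken to be 1/2, 1/2, 1, 1, 2, 2, 4, 4, 8, 8, and the expected responses are taken as $\eta_1 = e^{-\phi t}$ and $\eta_2 = 1 - e^{-\phi t}$, the solution of (5.1). The grid limits, helper names and the use of a simple Riemann sum for normalisation are illustrative choices, not the authors' computation.

```python
import numpy as np

# Example 1 of Table 1; run times assumed to be 1/2, 1/2, 1, 1, 2, 2, 4, 4, 8, 8.
t = np.array([0.5, 0.5, 1.0, 1.0, 2.0, 2.0, 4.0, 4.0, 8.0, 8.0])
y1 = np.array([0.907, 0.915, 0.801, 0.825, 0.649, 0.675, 0.446, 0.468, 0.233, 0.187])
y2 = np.array([0.127, 0.064, 0.134, 0.200, 0.274, 0.375, 0.570, 0.535, 0.792, 0.803])
n = t.size

def v_elements(theta):
    """Return v11, v22, v12 at theta = ln(phi)."""
    decay = np.exp(-np.exp(theta) * t)       # eta_1 = exp(-phi * t)
    r1 = y1 - decay
    r2 = y2 - (1.0 - decay)                  # eta_2 = 1 - exp(-phi * t)
    return r1 @ r1, r2 @ r2, r1 @ r2

def normalise(log_density, grid):
    """Exponentiate and scale so that the density integrates to one on the grid."""
    density = np.exp(log_density - log_density.max())
    return density / (density.sum() * (grid[1] - grid[0]))

def mode_and_sd(density, grid):
    """Grid mode and standard deviation of a normalised density."""
    h = grid[1] - grid[0]
    mean = np.sum(grid * density) * h
    sd = np.sqrt(np.sum((grid - mean) ** 2 * density) * h)
    return grid[np.argmax(density)], sd

theta_grid = np.linspace(-2.2, -1.0, 1201)
v11, v22, v12 = np.array([v_elements(th) for th in theta_grid]).T
post_y1 = normalise(-0.5 * n * np.log(v11), theta_grid)
post_y2 = normalise(-0.5 * n * np.log(v22), theta_grid)
post_joint = normalise(-0.5 * n * np.log(v11 * v22 - v12 ** 2), theta_grid)

summary = {name: mode_and_sd(p, theta_grid)
           for name, p in [("y1", post_y1), ("y2", post_y2), ("y1,y2", post_joint)]}
```

Plotting `post_y1`, `post_y2` and `post_joint` against `theta_grid` gives curves that can be compared with Fig. 1.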
Lack of fit

It is important to notice how the overall criterion may be upset by lack of fit in a particular response. Suppose the $t$th function fits badly. Then the residual quantities $y_{tu} - f_t(x_{tu}, \theta)$ may become excessively large in magnitude even for the indicated 'best' value of $\theta$. Thus the $v_{tj}$ $(= v_{jt})$ $(j = 1, 2, \ldots, k)$ will be affected and so will the cofactors $V_{tj}$ $(= V_{jt})$ $(j = 1, 2, \ldots, k)$. We recall that

$$z_0 = \sum_{i=1}^{k}\sum_{j=1}^{k} V_{ij}v_{ij}.$$

Since the cofactors act as estimated weights we see that lack of fit of one factor can affect the weight given to another in the overall criterion.
A second example (Example 2, Table 1) will serve to demonstrate the situation which may arise when lack of fit exists. In generating these data, the $y_1$ column was taken as before but a different value of $\theta$ was used to generate the $y_2$ column. On inspection of Fig. 2, we notice immediately that the evidence about $\theta$ from $y_1$ alone contradicts the evidence from $y_2$ alone. This of itself clearly indicates that either one or both of the models are inappropriate. We notice that lack of fit makes it appear that $y_1$ and $y_2$ together provide less precise evidence about $\theta$ than does $y_1$ alone.
Fig. 2. Posterior distributions. Example 2. (Curves shown: $y_1$ alone; $y_1$ and $y_2$. Horizontal axis: $\theta = \ln\phi$.)
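The consistency check suggested by this example, comparing the posterior distributions from the separate responses before combining them, can be sketched in a few lines of Python. The code below uses the Example 2 columns of Table 1 and again assumes run times 1/2, 1/2, 1, 1, 2, 2, 4, 4, 8, 8 with $\eta_1 = e^{-\phi t}$ and $\eta_2 = 1 - e^{-\phi t}$; the grid and the variable names are hypothetical choices.

```python
import numpy as np

# Example 2 of Table 1; run times assumed to be 1/2, 1/2, 1, 1, 2, 2, 4, 4, 8, 8.
t = np.array([0.5, 0.5, 1.0, 1.0, 2.0, 2.0, 4.0, 4.0, 8.0, 8.0])
y1 = np.array([0.907, 0.915, 0.801, 0.825, 0.649, 0.675, 0.446, 0.468, 0.233, 0.187])
y2 = np.array([0.142, 0.079, 0.160, 0.188, 0.315, 0.416, 0.624, 0.589, 0.838, 0.849])
n = t.size

def log_posteriors(theta):
    """Unnormalised log posteriors from y1 alone, y2 alone, and both together."""
    decay = np.exp(-np.exp(theta) * t)
    r1 = y1 - decay
    r2 = y2 - (1.0 - decay)
    v11, v22, v12 = r1 @ r1, r2 @ r2, r1 @ r2
    return (-0.5 * n * np.log(v11),
            -0.5 * n * np.log(v22),
            -0.5 * n * np.log(v11 * v22 - v12 ** 2))

theta_grid = np.linspace(-2.2, -1.0, 1201)
curves = np.array([log_posteriors(th) for th in theta_grid])   # shape (1201, 3)
mode_y1, mode_y2, mode_joint = theta_grid[np.argmax(curves, axis=0)]
```

A clear gap between `mode_y1` and `mode_y2`, relative to the spread of the individual curves, is the kind of warning sign discussed in the text.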
We are rarely in the position that we can safely assume a mathematical model to be adequate. Rather our attitude to the model should be that it is to be 'tentatively entertained' and provision should be made to check it. In multivariate problems it is not only necessary to check the adequacy of each response model individually but also to check their mutual consistency. In the practical analysis of problems of this sort it is important that the investigator should not resort immediately to the joint analysis of responses. Rather he should:
(i) Check the individual fit of each response by analysing residuals,
(ii) Consider the consistency of the information from the various responses by comparing posterior distributions.
In those cases where he is satisfied with the overall fit he can then proceed with the joint analysis. A formal multivariate lack of fit test (which we hope to discuss in more detail at a later time) could also be used in conjunction with the less formal analysis mentioned above.
6. AN EXAMPLE OF MULTIVARIATE ESTIMATION OF MORE THAN ONE PARAMETER IN A NONLINEAR SYSTEM
The informative when thereis more power of this multivariateapproach is most striking than one parameter.This will be illustratedby a further example. In the study of kinetic mechanisms,a chemistwill oftenbe concernedwith estimating more than one parameter,forexample, the rate constants forthe system,and will wish to
361
do so using measurementsof several chemical products. Perhaps the simplest example which can illustratethis situation is a reaction of the type A  B > C. If VI, Y2 and 93 representthe proportionsof reactants A, B and C present at a particular time t, then the systemmay be describedby the differential equations
81
Y2 =
 01VP,
VI  02V21
V2Y2
(6.1)
Y3
with boundary conditions VI = I Y2 = V3= 0 at t = 0, and where the dot denotes difwithrespectto time t,and 01, 02 are unknownrate constants. Equations (61) ferentiation solution have t, j = e1L (eOl e02t) (62) 01/(021)i 1 + (02 eO1t+ b1 e02t)/(W2A 01). As before,it is probably most natural for the experimenterto think in terms of the logarithmsofthe rate constants,Oi = ln Xi (i = 1, 2), and to regardthese as locally uniformly distributeda priori. Suppose observationsYl, Y2 and y3 of all threeproducts were available, two independent multivariateobservationsbeing made at each of six distinctvalues of time t, as shown in Table 2. We shall suppose these observationshave arisenfrom12 independent experimental runs, as would be the case if each run were carried out in a sealed tube, reaction being terminatedat the appropriatetime by sudden cooling. There are now threevariances and three covariances, all unknown.
Y2 = V3 =
Yiu
Y2U
Y3U
1 1 2 2 4 4 8 8 16 16
0.959 *914 *855 *785 *628 *617 0480 *423 *166 *205 *034 *054
0025 *061 *152 *197 *130 *249 0.184 *298 *147 *050 *000 *047
0028 *000 *068 *096 *090 *118 0.374 *358 *651 *684 *899 *991
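For the two-parameter example the determinant criterion can be evaluated over a grid in $(\theta_1, \theta_2)$. The Python sketch below does this for all three responses jointly, assuming the run times are 1/2, 1, 2, 4, 8 and 16 (each used twice) and using the closed-form solution (6.2). The grid ranges are chosen so that $\phi_1 \neq \phi_2$ everywhere, which avoids the removable singularity in (6.2); names and grid limits are illustrative assumptions.

```python
import numpy as np

# Table 2 data; run times assumed to be 1/2, 1, 2, 4, 8, 16, each used twice.
t = np.repeat([0.5, 1.0, 2.0, 4.0, 8.0, 16.0], 2)
y = np.array([
    [0.959, 0.025, 0.028], [0.914, 0.061, 0.000],
    [0.855, 0.152, 0.068], [0.785, 0.197, 0.096],
    [0.628, 0.130, 0.090], [0.617, 0.249, 0.118],
    [0.480, 0.184, 0.374], [0.423, 0.298, 0.358],
    [0.166, 0.147, 0.651], [0.205, 0.050, 0.684],
    [0.034, 0.000, 0.899], [0.054, 0.047, 0.991],
])

def eta(theta1, theta2):
    """Solution (6.2) at the observed times; requires phi_1 != phi_2."""
    p1, p2 = np.exp(theta1), np.exp(theta2)
    e1, e2 = np.exp(-p1 * t), np.exp(-p2 * t)
    return np.column_stack([
        e1,
        (e1 - e2) * p1 / (p2 - p1),
        1.0 + (-p2 * e1 + p1 * e2) / (p2 - p1),
    ])

def log_posterior(theta1, theta2, responses=(0, 1, 2)):
    """Log of |v_ij|^(-n/2) using only the listed response columns."""
    resid = (y - eta(theta1, theta2))[:, list(responses)]
    v = resid.T @ resid
    return -0.5 * len(t) * np.linalg.slogdet(v)[1]

grid1 = np.linspace(-2.0, -1.2, 81)      # theta_1 = ln(phi_1)
grid2 = np.linspace(-1.1, -0.3, 81)      # theta_2 = ln(phi_2)
surface = np.array([[log_posterior(a, b) for b in grid2] for a in grid1])
i, j = np.unravel_index(np.argmax(surface), surface.shape)
theta1_hat, theta2_hat = grid1[i], grid2[j]
```

Passing `responses=(1,)`, `(2,)` or `(1, 2)` gives the surfaces for $y_2$ alone, $y_3$ alone, or $y_2$ and $y_3$ jointly, which can then be contoured and superimposed.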
The estimation situation is completely portrayed by the posterior distributions. Analysis of $y_1$ supplies information only on the parameter $\theta_1$. However, joint posterior distributions for $\theta_1$ and $\theta_2$ may be calculated which reflect the information about $\theta_1$ and $\theta_2$ coming from (a) $y_2$ alone, (b) $y_3$ alone, (c) $y_2$ and $y_3$ jointly, and (d) $y_1$, $y_2$ and $y_3$ jointly. It is particularly instructive to compare these in the form of superimposed contour diagrams. For clarity, we show in Fig. 3 only a single contour for each distribution. In each case, this is the contour which includes approximately 99.75% of the posterior distribution. This approximate Bayesian region is given by that contour for which (Box & Cox, 1964)

$$\log p(\hat\theta) - \log p(\theta) = \tfrac{1}{2}\chi^2_2(1 - \alpha), \quad \text{where } \alpha = 0.0025.$$
It is easy to see that this would give the precise region if the joint distribution were multivariate normal. Fig. 3 provides a very good illustration of the value of considering the whole posterior distribution rather than a single point estimate. Essentially the same point has been repeatedly made by Barnard in discussing the likelihood principle.
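The contour level implied by this rule is simple to compute when there are two parameters, because the chi-square distribution with 2 degrees of freedom has distribution function $1 - e^{-x/2}$. The sketch below, whose function name is hypothetical, returns the drop in log density from the mode that defines the approximate region; it assumes the region is defined through the upper $\alpha$ point of $\chi^2_2$.

```python
import math

def log_density_drop(alpha):
    """Half the upper-alpha point of chi-square with 2 degrees of freedom.

    For 2 degrees of freedom P(X > x) = exp(-x / 2), so the upper-alpha
    point is -2 * ln(alpha) and half of it is simply -ln(alpha).
    """
    chi2_point = -2.0 * math.log(alpha)
    return 0.5 * chi2_point

drop = log_density_drop(0.0025)          # contour for the 99.75% region
density_ratio = math.exp(-drop)          # p(theta) / p(theta_hat) on the contour
```

On this contour the posterior density has fallen to a fraction $\alpha$ of its maximum, so with a gridded log posterior the region is the set of grid points whose log density lies within `drop` of the largest value.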
Fig. 3. (Contours shown: $y_3$ alone; $y_2$ alone; $y_1$, $y_2$ and $y_3$. Vertical axis: $\theta_2 = \ln\phi_2$.)
In any sequential reaction $A \to B \to C \to \ldots$, etc., we should expect that observation of only the end product (C in our example) would provide accurate information only on the sum, or some such function of the specific rates. That this is the case for this example is shown by the oblique 'north-west to south-east' orientation of the crescent-shaped contour for $p(\theta_1, \theta_2 \mid y_3)$. In this particular example we see, by inspection of the solution for $\eta_3$ in equations (6.2), that $\eta_3$ is symmetric in $\phi_1$ and $\phi_2$ and hence in $\theta_1$ and $\theta_2$. It follows that, when $y_3$ alone is considered, if any point $(\theta_1, \theta_2)$ is included in the Bayesian region, so must be its mirror image $(\theta_2, \theta_1)$ in the $\theta_1 = \theta_2$ axis. (This can lead, as in this example, to a double maximum.)

The information supplied by the intermediate product $y_2$ is of essentially different character. From it we obtain information on both specific rates but principally on the difference or ratio of the rates. It is noticeable that this region is obliquely oriented in a direction approximately at right angles to that of $p(\theta_1, \theta_2 \mid y_3)$. When information from $y_2$ and $y_3$ is combined, we find, as would be expected, a much smaller region in the neighbourhood of the intersection of the regions for $y_2$ alone and $y_3$ alone. Finally, we can consider the effect of adding $y_1$ which provides information on $\theta_1$ only. Study of the probability contour from $p(\theta_1, \theta_2 \mid y_1, y_2, y_3)$ shows, again as would be expected, that the region is changed principally in being narrowed in the $\theta_1$ direction. Point estimates may be obtained corresponding to the maximum posterior densities. The estimates available from the various sources are shown in Table 3.

This example perhaps serves to illustrate the caution and common sense which ought to be applied in analysing this type of problem and the necessity for considering each problem individually and not hoping for 'automatic' answers from a computer program. The authors know of one case where an elaborate consecutive mechanism involving ten constants was postulated, but observations were taken on the single end product only and on no intermediate product. An iterative nonlinear estimation routine run on a computer converged only very slowly on to what appeared to be nonsensical answers. This was undoubtedly due to the peculiarities of the ten-dimensional likelihood surface, which probably contained multidimensional ridges and multiple maxima. Observations on additional responses and application of the theory of this paper would have eliminated many, if not all, of the ambiguities.

In the complete set of data just used the reaction is almost complete when the last observation is taken. In practice, for one reason or another, data of this kind occur in which the available observations trace only part of the course of the reaction. We have illustrated the effect of this by omitting the last four observations on $y_1$, $y_2$ and $y_3$ and repeating the analysis. As shown in Fig. 4, over the ranges studied, the contours for $p(\theta \mid y_2)$ and $p(\theta \mid y_3)$ do not close. Nevertheless quite precise estimation is possible using $y_2$ and $y_3$ together and the addition of $y_1$ improves the estimation further. Table 3 shows the final estimates remarkably close to those obtained before.

Table 3. Estimates for $\theta_1$ and $\theta_2$ from various sources
When all 12 observationsare used Responses used
Y2
Y3*
01
1561 1619
1585 1.570
U2
1715 1273
1561 1.565
0.942 1022
0701
Yl, Y2'
*
Y2' Y3
0.695
Y3
O0707
provides alternativeestimates;
7. APPENDIX (see § 3)

We use results due to Hsu, given by Deemer & Olkin (1951); the full notation of that paper is not required however.

Result. If $\Sigma = \{\sigma_{ij}\}$ and $\Lambda = \Sigma^{-1} = \{\sigma^{ij}\}$ then the Jacobian $|D(\sigma_{ij})/D(\sigma^{ij})|$ has value $|\Lambda|^{-(k+1)}$.

Proof. By Property 5B.3, page 366 of Deemer & Olkin, $D(\Lambda, \Sigma) = D(\Lambda^*, \Sigma^*)$, where $\Lambda^* = d\Lambda$ and $\Sigma^* = d\Sigma$ are the matrices of differentials. Taking differentials of $\Sigma = \Lambda^{-1}$ provides

$$\Sigma^* = -\Lambda^{-1}\Lambda^*\Lambda^{-1}$$

by Theorem 4.4, page 357, of Deemer & Olkin. By Theorem 3.7 and Corollary 3.7, page 348 of Deemer & Olkin, and noting that, since $\Lambda$ is symmetric, $\Lambda^{-1} = (\Lambda^{-1})'$, we see that $D(\Lambda^*, \Sigma^*) = |\Lambda|^{-(k+1)}$, whence the result follows.
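The Jacobian result can be checked numerically for small $k$. The following Python sketch differentiates the distinct elements of $\Sigma = \Lambda^{-1}$ with respect to the distinct elements of a symmetric $\Lambda$ by central differences and compares the absolute determinant of the resulting matrix with $|\Lambda|^{-(k+1)}$. The helper names, step size and test matrices are assumptions made for illustration only.

```python
import numpy as np

def vech(m):
    """Distinct (upper-triangular) elements of a symmetric matrix."""
    i, j = np.triu_indices(m.shape[0])
    return m[i, j]

def unvech(values, k):
    """Rebuild the symmetric k x k matrix from its distinct elements."""
    m = np.zeros((k, k))
    i, j = np.triu_indices(k)
    m[i, j] = values
    m[j, i] = values
    return m

def jacobian_abs_det(lam, h=1e-6):
    """|det| of d vech(inverse(Lambda)) / d vech(Lambda) by central differences."""
    k = lam.shape[0]
    base = vech(lam)
    p = base.size
    jac = np.empty((p, p))
    for c in range(p):
        step = np.zeros(p)
        step[c] = h
        plus = vech(np.linalg.inv(unvech(base + step, k)))
        minus = vech(np.linalg.inv(unvech(base - step, k)))
        jac[:, c] = (plus - minus) / (2.0 * h)
    return abs(np.linalg.det(jac))

lam2 = np.array([[2.0, 0.5], [0.5, 1.0]])
lam3 = np.array([[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]])
check2 = (jacobian_abs_det(lam2), np.linalg.det(lam2) ** -3)
check3 = (jacobian_abs_det(lam3), np.linalg.det(lam3) ** -4)
```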
Fig. 4. (Contours shown: $y_3$ alone; $y_2$ alone; $y_1$, $y_2$ and $y_3$. Vertical axis: $\theta_2 = \ln\phi_2$.)
REFERENCES

AITKEN, A. C. (1935). On least squares and linear combination of observations. Proc. Roy. Soc. Edin. 55, 42.
BOX, G. E. P. & COX, D. R. (1964). An analysis of transformations. J. R. Statist. Soc. B, 26, 211-43.
CORNISH, E. A. (1954). The multivariate t-distribution associated with a set of normal sample deviates. Austr. J. Phys. 7, 531-42.
DEEMER, W. L. & OLKIN, I. (1951). The Jacobians of certain matrix transformations useful in multivariate analysis, based on lectures by P. L. Hsu. Biometrika, 38, 345-67.
DUNNETT, C. W. & SOBEL, M. (1954). A bivariate generalization of Student's t-distribution, with tables for special cases. Biometrika, 41, 153-69.
FISHER, R. A. (1961). The weighted mean of two Normal samples with unknown variance ratio. Sankhya, 23 A, 103-14.
GEISSER & CORNFIELD (1963).
JEFFREYS (1961).
SAVAGE, L. J. (1961). The Subjective Basis of Statistical Practice. (Mimeographed manuscript.) University of Michigan.
SUKHATME, P. V. (1938). On Fisher and Behrens' test of significance for the difference in means of two normal samples. Sankhya, 4, 39-48.
TIAO, G. C. & ZELLNER, A. (1964). On the Bayesian estimation of multivariate regression. J. R. Statist. Soc. B, 26, 277-85.
WILKS, S. S. (1962). Mathematical Statistics. New York: John Wiley and Sons, Inc.
YATES, F. (1939). An apparent inconsistency arising from tests of significance based on fiducial distributions of unknown parameters. Proc. Camb. Phil. Soc. 35, 579-91.