Box, Wetz Technical Report PDF

DEPARTMENT OF STATISTICS
University of Wisconsin
Madison, Wisconsin 53706
TECHNICAL REPORT NO. 9
Work done 1964
Report issued 1973
CRITERIA FOR JUDGING ADEQUACY OF
ESTIMATION BY AN APPROXIMATING RESPONSE
FUNCTION
by
George E. P. Box and John Wetz
This research was supported by the U.S. Navy through the Office of Naval Research under Contract No. ONR-N00014-63A-0128-0017.
CHAPTER 1
INTRODUCTION
It was realized by Sir Ronald Fisher in the early 1920's that conclusions to be drawn from routinely recorded data were of limited validity. To overcome the ambiguities and uncertainties connected with such analysis, he introduced the concept of designed experiments in which randomization, replication, blocking, and orthogonality of design were central features (Fisher, 1958, 1959, 1960; Fisher and Yates, 1963).
To understand some of the difficulties and how Fisher's ideas overcame them, it is convenient to think in terms of a specific example. Suppose one were interested in studying the dependence of a response y, which might be, for example, the yield of wheat, upon a number of variables $x_1, x_2, \ldots, x_k$ such as the amount of potassium, nitrogen, and phosphorus in the soil, the rainfall, or temperatures at various times of year. Imagine that $x_1, x_2, \ldots, x_N$, which includes the subset $x_1, x_2, \ldots, x_k$, represents a completely comprehensive listing of all underlying variables which might affect y. Denote the functional relationship connecting the levels of the $x_i$ with the response y by
    $y = f(x_1, \ldots, x_N)$    (1.1)
Further suppose that over the range in which the x's vary, a linear functional relationship adequately represents the dependence. That is, for the range of variation concerned we can write in place of (1.1)
    $y = \sum_{i=1}^{N} \theta_i x_i$
1.1 Nonsense Correlations Eliminated by Randomization
In practice all the N factors affecting y are not known, and our study is confined to a certain subset. Let this subset consist of the first k factors. Then we can write
    $y = \sum_{i=1}^{k} \theta_i x_i + \varepsilon$
where $\varepsilon$ may be called an error term that includes the influence of all the unknown variables $x_{k+1}, \ldots, x_N$. Suppose there are n observations from this system so that in matrix notation the n-dimensional vector $y$ of observations may be written
    $y = X_1\theta_1 + X_2\theta_2$    (1.5)
where $X_2\theta_2$ is the "error term" contributed by the unknown variables. Now if we write a model
    $y = X_1\theta_1 + \varepsilon$
noting that the components of $\varepsilon$ are linear functions of the variables $x_{k+1}, \ldots, x_N$, then from the Central Limit Theorem with the usual provisos, the distribution of $\varepsilon$ will tend to normality as the number of components $x_{k+1}, \ldots, x_N$ becomes large. If, however, the parameters $\theta_1$ in which we are specifically interested are estimated by least squares we will not in general obtain unbiased estimates unless the variables in $X_2$ are uncorrelated with those in $X_1$. The nature of the biases can be seen if the model is rewritten
    $y = X_1(\theta_1 + A\theta_2) + (X_2 - X_1A)\theta_2$
where
    $A = (X_1'X_1)^{-1}X_1'X_2$    (1.8)
The elements of the vector $\theta_1 + A\theta_2$ are in fact the expected values of the least squares estimates $\hat{\theta}_1$, for
    $E(\hat{\theta}_1) = (X_1'X_1)^{-1}X_1'(X_1\theta_1 + X_2\theta_2) = \theta_1 + A\theta_2$    (1.10)
To illustrate by means of a topical example, let $y$ be an n-dimensional vector of 0's and 1's and consider the u-th element $y_u$. Let $y_u = 1$ indicate that the u-th subject had, by a particular date, died of lung cancer; and let $y_u = 0$ indicate that he had not died of lung cancer. Let the n x 2 matrix $X_1$ contain a column of 1's corresponding to a location parameter and a second column whose u-th element is the average number of packs of cigarettes smoked daily by the u-th subject. Let $X_2$ be an n-dimensional vector whose u-th element is "the strength of a genetic factor in the u-th subject which predisposes the person to lung cancer and also produces a desire to smoke." Then $A$ would contain the regression coefficient of the genetic factor on the number of packs smoked. If this regression were large and positive, the finding of a large positive regression of cancer on smoking could occur even when the true regression coefficient was zero, due to the influence of $X_2\theta_2$.
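The bias formula (1.10) can be checked numerically. The following is only a minimal sketch: the data are hypothetical, the response is taken as the noise-free expected value rather than a 0/1 outcome so that the identity holds exactly, and all variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration data (not from the report).
rng = np.random.default_rng(0)
n = 200
genetic = rng.normal(size=n)                       # X2: unobserved genetic factor
packs = 1.0 + 0.8 * genetic + rng.normal(scale=0.3, size=n)  # smoking depends on X2

X1 = np.column_stack([np.ones(n), packs])          # column of 1's and packs smoked
X2 = genetic.reshape(-1, 1)
theta1 = np.array([0.1, 0.0])                      # true smoking coefficient is zero
theta2 = np.array([0.05])                          # the genetic factor does affect y

# Expected response; noise is omitted so that (1.10) holds exactly.
y = X1 @ theta1 + X2 @ theta2

A = np.linalg.solve(X1.T @ X1, X1.T @ X2)          # equation (1.8)
theta1_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)  # least squares using X1 only
expected = theta1 + A @ theta2                     # equation (1.10)

print(theta1_hat, expected)
```

In this sketch the fitted smoking coefficient is positive even though its true value is zero, and it agrees with $\theta_1 + A\theta_2$.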

Nonsense correlation, of which this is an example, can be eliminated only by ensuring in some way that $X_1$ and $X_2$ are independent. Fisher achieved this by introducing the concept of randomized design into experiments. He emphasized that wherever possible, information should be obtained not from records which had just happened but from a carefully and deliberately staged trial at which the levels of the variables in $X_1$ were at the choice of the experimenter.
In general, randomization would consist of the following. Suppose that for a particular experiment the actual matrix $X_1$ is chosen from a set of M matrices $\{X_{1i}\}$ for which $X_{1i}'X_{1i} = C$ where $i = 1, 2, \ldots, M$, and $C$ is a constant matrix. Associated with each element $X_{1i}$ of the set is a probability $p_i$ which is the chance that $X_{1i}$ will be selected for this particular experiment. Now suppose we choose the set $\{X_{1i}\}$ and an associated set $\{p_i\}$ so that
    $E_r(X_1) = \sum_{i=1}^{M} p_i X_{1i} = 0$
where $E_r$ represents the expectation over the randomization set. Since $X_1$ is chosen quite independently of $X_2$, then $E_r(X_1'X_2) = 0$ and if $X_1$ is selected from a set for which $X_{1i}'X_{1i} = C$, then $E_r(\hat{\theta}_1) = \theta_1$. The effect of unknown variables $X_2$ in inducing nonsense correlations would thus be eliminated.
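As a small illustration of this argument, the sketch below enumerates a hypothetical randomization set: all 24 equally likely run orders of a two-level variable in four runs. The drift variable `x2` and the coefficient values are assumptions made only for the example.

```python
import itertools
import numpy as np

levels = (-1.0, -1.0, 1.0, 1.0)            # levels of the single design variable
x2 = np.array([0.0, 1.0, 2.0, 3.0])        # hypothetical unknown variable (a drift)
theta1, theta2 = 2.0, 0.5                  # hypothetical true coefficients

estimates = []
for perm in itertools.permutations(levels):    # all 24 equally likely run orders
    x1 = np.array(perm)
    y = theta1 * x1 + theta2 * x2              # noise-free response
    estimates.append((x1 @ y) / (x1 @ x1))     # least squares, with x1'x1 = C = 4

mean_estimate = float(np.mean(estimates))      # expectation over the randomization set
worst_bias = max(abs(e - theta1) for e in estimates)
print(mean_estimate, worst_bias)
```

Any single run order can give a biased estimate, but the average over the randomization set recovers the assumed `theta1`.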
1.2 Replication to Supply a Valid Estimate of Error
The fact that the estimates $\hat{\theta}_1$ in the previous example were unbiased would be, by itself, of little value if we could not measure them against some reasonably precise estimate of error. Now the u-th component of the error term $\varepsilon = X_2\theta_2$ is
    $\varepsilon_u = \sum_{j=k+1}^{N} \theta_j x_{uj}$
in which $x_{u,k+1}, \ldots, x_{uN}$ are regarded as random variables. This has variance
    $V(\varepsilon_u) = \sum_{j} \theta_j^2 \sigma_j^2 + \sum_{j \neq h} \theta_j \theta_h \rho_{jh} \sigma_j \sigma_h$
where $\sigma_j$ is the standard deviation of $x_j$ and $\rho_{jh}$ is the coefficient of correlation between the j-th and h-th variables. We can obtain a reliable estimate of $V(\varepsilon_u)$ by making a series of runs in which $X_1$ is kept fixed but $X_2$ is allowed to vary freely. However, if any of the x's in $X_2$ are constrained in any manner we shall not have a true estimate of $V(\varepsilon_u)$.
For example, suppose in a chemical experiment runs are made at m distinct levels of $x_1, \ldots, x_k$. An estimate of error might be obtained by making r observations at each set of m reaction conditions and computing the variance estimate $s^2$ based on m(r-1) degrees of freedom. Clearly, however, $s^2$ will not be an unbiased estimate of $V(\varepsilon_u)$ unless all the sources of uncontrolled variation, $x_{k+1}, \ldots, x_N$, are allowed to vary in the same unrestricted manner between the r replicates made at the same set of conditions as between observations made at different sets of conditions.
For example, suppose uncontrollable changes occurring from day to day in external variables, such as ambient temperature and humidity, might be important elements in $X_2$, and that a series of duplicated experiments were being made over a period of several weeks. Suppose that two experimental runs could be made on any given day. If duplicate runs were always made on the same day, the effect of the external variables would not be taken into account in the estimate of error. However, their effect would be accounted for if the duplicates were made randomly in time.
Thus, as Fisher pointed out, a valid estimate of error could be obtained by deliberately arranging circumstances so that runs made at the same experimental conditions were subjected to the same error influences as were runs made at different experimental conditions.
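The point about duplicates can be illustrated by simulation. This is a rough sketch under assumed conditions: day effects and within-day errors are taken to be independent normal variables with hypothetical standard deviations, and "randomized in time" is modelled by giving each run its own day effect.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5000                      # number of duplicated conditions (hypothetical)
sd_day, sd_run = 3.0, 1.0     # hypothetical day-to-day and within-day std devs

def pooled_variance(first, second):
    """Variance estimate from duplicate pairs (one degree of freedom per pair)."""
    d = first - second
    return float(np.sum(d ** 2) / (2 * len(d)))

# Duplicates run on the same day share one day effect, which cancels.
day = rng.normal(scale=sd_day, size=m)
same_day = pooled_variance(day + rng.normal(scale=sd_run, size=m),
                           day + rng.normal(scale=sd_run, size=m))

# Duplicates randomized in time: each run gets its own day effect.
randomized = pooled_variance(
    rng.normal(scale=sd_day, size=m) + rng.normal(scale=sd_run, size=m),
    rng.normal(scale=sd_day, size=m) + rng.normal(scale=sd_run, size=m))

print(same_day, randomized)   # roughly sd_run**2 versus sd_day**2 + sd_run**2
```

Under these assumptions the same-day duplicates estimate only the within-day variance, while the randomized duplicates estimate the full error variance.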
1.3 Blocking
As we have explained, unbiased estimates with valid standard errors may be obtained by the use of randomization and suitable replication, and it becomes appropriate to analyze the experiment as if an adequate model were
    $y = X_1\theta_1 + \varepsilon$    (1.14)
with the elements of the $\varepsilon$ vector independent of each other and of the elements of $X_1$. However, it may be that although the estimates are valid, they are also very imprecise. Techniques are needed, therefore, which reduce the errors of the estimates to a minimum while preserving validity. It frequently happens that some of the sources of error in $\varepsilon = X_2\theta_2$ are recognizable although not controllable.
For example, in chemical experiments it is often necessary to make experimental runs using different pieces of equipment. It may be known that slight differences between, say, one reactor and another, will cause differences which will inflate the error term in a completely randomized design. (In our model the use of different pieces of equipment could be represented by making an appropriate number of the x's in $X_2$ so-called indicator variables. If $x_h$ is an indicator variable in $X_2$, it might take the value 1 when the h-th reactor was in operation, which would contribute an increment $\theta_h$ to y, and the value 0 when this reactor was not in operation.) Since it is known which reactor is in operation at any given time, allowance for the reactor effects in the analysis can be made by removing this part of the model from the error term and putting it in the part to be estimated. Thus if
    $X_2\theta_2 = Z\theta + W\phi$
where $Z\theta$ represents the effect of "block" variables, such as reactors or days, whose influence can be recognized but not controlled, we may consider the model
    $y = X\beta + Z\theta + e$
where
    $e = W\phi$    (1.18)
is that part of the error which remains after removing the systematic effects. The influence of $e$ will be smaller than that of the original error term. Furthermore, if it is arranged in advance that $X$ and $Z$ are orthogonal so that $X'Z = 0$ then the $\beta$'s can be estimated as though the block variables did not exist. Since the block variables do not affect the estimation, they no longer contribute to the error, and as Fisher indicated, their influence is thus eliminated. An example of a randomized block design employing this principle would be one in which the m sets of experimental conditions were run in each of r reactors.
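The effect of orthogonal blocking can be sketched as follows, assuming a hypothetical layout in which m = 4 factorial conditions are run in each of r = 2 reactors. The coefficient values and noise level are illustrative assumptions only.

```python
import numpy as np

factorial = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
X = np.vstack([factorial, factorial])         # m = 4 conditions in each of r = 2 reactors
Z = np.kron(np.eye(2), np.ones((4, 1)))       # one indicator column per reactor

rng = np.random.default_rng(2)
beta = np.array([1.5, -0.7])                  # hypothetical treatment effects
theta = np.array([10.0, 13.0])                # hypothetical reactor levels
y = X @ beta + Z @ theta + rng.normal(scale=0.2, size=8)

# Estimate beta ignoring the blocks, then jointly with the blocks.
beta_ignoring_blocks, *_ = np.linalg.lstsq(X, y, rcond=None)
joint, *_ = np.linalg.lstsq(np.hstack([X, Z]), y, rcond=None)

resid_ignoring = y - X @ beta_ignoring_blocks
resid_blocked = y - np.hstack([X, Z]) @ joint
ss_ignoring = float(resid_ignoring @ resid_ignoring)
ss_blocked = float(resid_blocked @ resid_blocked)

print(X.T @ Z)                                # all zeros: X and Z are orthogonal
print(beta_ignoring_blocks, joint[:2])        # the same estimates of beta
print(ss_ignoring, ss_blocked)                # blocking removes reactor variation
```

Because $X'Z = 0$ in this layout, the estimates of $\beta$ are unchanged by including the block columns, while the residual sum of squares is much smaller once the reactor effects are removed.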
1.4 Orthogonality
In general, the least squares estimates $\hat{\beta}$ will be correlated. If $\beta_i$ and $\beta_j$ are two quantities in the group $\beta$ to be estimated, this will mean in particular that the estimate of $\beta_i$ for given $\beta_j$, which we may write $\hat{\beta}_i \mid \beta_j$, is a function of $\beta_j$, that is,
    $\hat{\beta}_i \mid \beta_j = f(\beta_j)$
This dependence of the estimate $\hat{\beta}_i$ on the values of the other parameters is undesirable. Furthermore, other things being equal, such correlation is associated with an increase in the error variance. For example, it has been shown by Hotelling (1944), Tocher (1952), and Box (1952) that for fixed diagonal elements of $X'X$ the minimum variance estimates of the elements of $\beta$ are obtained only when all off-diagonal elements are zero. Fisher therefore recommended the use of orthogonal designs. That is to say, the set of matrices $\{X\}$ from which a random selection is to be made will normally be a set of matrices for which the columns are orthogonal. For example, if
    $\eta_u = \beta_0 x_{0u} + \beta_1 x_{1u} + \beta_2 x_{2u} + \beta_3 x_{3u}$    (1.20)
where $x_{0u} = 1$, a $2^3$ factorial design might be employed in which case the set $\{X\}$ could be the 8! matrices of dimension 8 x 4 whose rows are (1 -1 -1 -1), ..., (1 1 1 1) in all possible permutations. Thus $X'X = 8I_4$.
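The orthogonality of the $2^3$ factorial can be verified directly. This is a minimal sketch; the seed and the single shuffled run order are arbitrary choices for illustration.

```python
import itertools
import numpy as np

# Rows (1, x1, x2, x3) with every combination of x_i = -1 or +1.
X = np.array([(1, a, b, c) for a, b, c in itertools.product((-1, 1), repeat=3)])

gram = X.T @ X
print(gram)                        # 8 times the 4 x 4 identity matrix

# Randomizing the run order permutes the rows and leaves X'X unchanged.
rng = np.random.default_rng(3)
X_shuffled = X[rng.permutation(8)]
gram_shuffled = X_shuffled.T @ X_shuffled
```

Every one of the 8! row orders gives the same $X'X$, which is why the randomization set can consist of all permutations of the design rows.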
1.5 The Analysis of Variance
Associated with Fisher's methods for designing experiments was his analysis of variance (ANOVA) technique for analyzing results. In the case of the design employing orthogonal blocks mentioned above, consider the model
    $y = X\beta + Z\theta + e$
The observations $y$ can be thought of as a vector in the n-dimensional sample space and each of the columns of $X$ and $Z$ provides another such vector. The complete set of such vectors in $X$ defines what we shall call a "treatment" hyperplane. The complete set of vectors in $Z$ defines what we shall call a "block" hyperplane. The values $\hat{y}$ estimated by the fitted least squares model are the coordinates of the projection of $y$ into the hyperplane defined by the vectors of $X$ and $Z$. The least squares estimates $\hat{\beta}$ and $\hat{\theta}$ are the coordinates of the projection $\hat{y}$ of the vector $y$ into the hyperplane where these coordinates are measured with respect to the basis vectors which are the columns of $X$ and $Z$.
The analysis of variance concerns the squared length $l_y^2$ of the observation vector in relation to the squared lengths $l_x^2$ and $l_z^2$ of the projections of this vector in the "treatment" and "block" hyperplanes, and the squared length $l_r^2$ of the residual vector. Since the block and treatment hyperplanes are themselves orthogonal and the residual vector is orthogonal to both, then
    $l_y^2 = l_x^2 + l_z^2 + l_r^2$
or equivalently in matrix notation
    $y'y = \hat{\beta}'X'X\hat{\beta} + \hat{\theta}'Z'Z\hat{\theta} + (y - \hat{y})'(y - \hat{y})$    (1.23)
This is essentially an application of the Pythagorean Theorem and is illustrated diagrammatically in Figure 1.1 where the lines $X$ and $Z$ each represent multidimensional spaces. Specifically, in terms of the observations $y$ the analysis amounts to
If the hyperplane $X$ had dimensionality $\nu_x$ and the hyperplane $Z$ had dimensionality $\nu_z$ and if
    $n - \nu_x - \nu_z = \nu_r$,    (1.25)
then $\nu_x$, $\nu_z$, and $\nu_r$ are called the degrees of freedom associated with treatments, blocks, and residual.
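The Pythagorean decomposition and the degrees of freedom in (1.25) can be checked on a small orthogonal layout. The sketch below assumes a hypothetical design (a $2^2$ factorial run in two blocks) and arbitrary simulated observations.

```python
import numpy as np

factorial = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
X = np.vstack([factorial, factorial])         # treatment columns, nu_x = 2
Z = np.kron(np.eye(2), np.ones((4, 1)))       # block columns, nu_z = 2
rng = np.random.default_rng(4)
y = rng.normal(loc=10.0, size=8)              # hypothetical observations

def projection(M, v):
    """Orthogonal projection of v onto the column space of M."""
    coef, *_ = np.linalg.lstsq(M, v, rcond=None)
    return M @ coef

y_x, y_z = projection(X, y), projection(Z, y)
residual = y - y_x - y_z                      # valid because X'Z = 0

l2_y, l2_x, l2_z, l2_r = (float(v @ v) for v in (y, y_x, y_z, residual))
nu_r = len(y) - np.linalg.matrix_rank(X) - np.linalg.matrix_rank(Z)   # (1.25)
print(l2_y, l2_x + l2_z + l2_r, nu_r)
```

The squared length of the observation vector equals the sum of the three squared lengths, and the residual here has 8 - 2 - 2 = 4 degrees of freedom.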
If the y's could be regarded as being normally distributed, Fisher showed that $l_r^2$ was distributed as $\chi^2_{\nu_r}\sigma^2$, where $\chi^2_{\nu_r}$ is the chi-square variable with $\nu_r$ degrees of freedom. In the absence of any real treatment effects, that is, if the $\beta$'s were all zero, then $l_x^2$ would be distributed as $\chi^2_{\nu_x}\sigma^2$, and in the absence of any real block effects $l_z^2$ would be distributed as $\chi^2_{\nu_z}\sigma^2$. A null hypothesis that, for example, there were no treatment effects could be tested by referring the statistic $(l_x^2/\nu_x)/(l_r^2/\nu_r)$ to the F-distribution. The analysis of variance not only made such overall tests possible, but also provided a convenient way of separating out systematic components of variation. This facilitated the isolation of an appropriate error term from which joint or individual confidence intervals for the elements of $\beta$ could be obtained.
1.6 Response Surface Methodology
Fisher's designs were worked out specifically to satisfy the needs of agriculture. However, they have been used with great success in many other areas, particularly in biology. The intention was usually to test for the existence of some "effect" and sometimes to provide an estimate of its magnitude. Thus, in an agricultural experiment the question might be, "Is it better to plant seeds 4 inches apart or 6 inches apart?" If $\beta$ represented the change in yield produced by changing the spacing from 4 inches to 6 inches then an experiment would be regarded as successful if it established the sign of the effect. Thus, one may see reported that a change of seed spacing from 4 inches to 6 inches produced a positive effect significant at the 5% level. This could mean, for example, that the estimated effect in bushels per acre on yield was $\hat{\beta} = 2.2$ with a 95% confidence interval extending from 0.2 to 4.2. Considerable progress might result by gaining this knowledge even though the estimate $\hat{\beta} = 2.2$ had such a large uncertainty.
Fisher's randomized block, Latin Square, and incomplete block designs were designed to provide only simple comparisons between treatments. Often the establishment of the sign of the result was as much as could be hoped for because of the large experimental error which existed. With the introduction of factorial designs, it was sometimes possible to detect more complex effects associated with interactions of variables. Interaction refers to a situation where the effect on the response of a change in level of variable $x_1$ depends on the level of variable $x_2$. Again, however, usually only gross effects were being sought.
When Fisher's technique of planned experimentation was applied in other areas, particularly in the physical sciences, where the experimental error was frequently much smaller, it was possible and appropriate to seek more detailed knowledge of the relation between a response and a set of variables. Suppose the expected value of a response, $E(y) = \eta$, is related to a set of variables $x_1, \ldots, x_k$ by the equation
    $\eta = E(y) = f(x_1, \ldots, x_k)$.    (1.26)
In some problems in the physical sciences the nature of the function f is known a priori. For example, in a chemical problem, y might be the yield of product which is related to the concentrations of reactants by a set of differential equations derived from reaction kinetics. The problem of checking the adequacy of fit of such form and of estimating the constants has been discussed elsewhere (Box and W. G. Hunter, 1962; Hartley, 1961). Here we shall concentrate on the situation where the precise nature of f is unknown and it is hoped to find and fit an empirical graduating function $g(x_1, \ldots, x_k)$ which adequately represents $f(x_1, \ldots, x_k)$ over some specific region of interest R.
In some cases an adequate representation over R might be obtained from a linear approximation
    $g_u = \beta_0 + \sum_{i=1}^{k} \beta_i x_{iu}$    (1.27)
When such a linear approximation was not adequate initially, it sometimes might be made so by suitable transformations of the observed response y or of the x's. The appropriate transformations might be suggested a priori. For example, if the response were the rate of a chemical reaction, the logarithm of the rate might be expected to be linearly related to the logarithm of the concentration and to the reciprocal of the absolute temperature. In other cases the appropriate transformations might be deduced from the data (Bartlett, 1947; Box and Cox, 1964; Box and Tidwell, 1962). In those cases where the response appeared to be monotonic in the x's there would be a possibility of adequate representation by a function linear in the x's. In other cases, and in particular where the function was passing through a maximum or minimum, the linear approximation would be wholly inadequate and a quadratic approximation might be tried. Thus, if there were two variables, we might tentatively entertain the second order model
    $g = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2$
Again, transformations in the response or in the variables $x_1$ and $x_2$ might be needed to achieve an adequate representation.
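The failure of a linear approximation near a maximum can be seen in a one-variable sketch. The response function below is a hypothetical example chosen for illustration; a single variable is used instead of two to keep the example short.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 9)
eta = 10.0 - x ** 2                       # hypothetical response with a maximum at x = 0

linear = np.polyfit(x, eta, 1)            # coefficients [slope, intercept]
quadratic = np.polyfit(x, eta, 2)         # coefficients [b11, b1, b0]

ss_linear = float(np.sum((eta - np.polyval(linear, x)) ** 2))
ss_quadratic = float(np.sum((eta - np.polyval(quadratic, x)) ** 2))
print(linear, ss_linear, ss_quadratic)
```

Over a region centered on the maximum, the fitted straight line is flat and misses the curvature entirely, while the second order fit reproduces this particular response.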
The technique of representing a functional relationship by means of empirical graduating functions of the kinds discussed has come to be called response surface methodology. In this area Fisher's principles of randomization, replication, and blocking are, of course, as important as they are in any other context. In practice there will be not only the k variables which are currently being considered but a larger number, N variables, which can affect the response. However, after applying adequate randomization it will become appropriate, as before, to apply the model
    $y = X\beta + Z\theta + e$
and to treat the $e$ as if its elements are distributed independently of each other and of the elements of $X$ and $Z$. In response surface methodology, however, additional problems arise and the present work is concerned with some of these problems.

To allow the fitting of empirical functions in the ways discussed, special designs have been developed (Box and Wilson, 1951; Box and J. S. Hunter, 1957; Box and Draper, 1959; Davies, 1960) which are called response surface designs. A response surface design of order d is specifically arranged for the estimation of a polynomial of d-th degree. As might be expected, particular importance has been attached to designs of first and second order. It is advantageous to incorporate certain properties in a response surface design of order d. The design should allow 1) the efficient estimation of a polynomial of degree d, and 2) a sensitive test of fit of the model to be made. When a polynomial of degree d is being tentatively entertained as a model, discrepancies are often caused by the existence of terms of order (d + 1) in the true model. Designs would normally be selected so that this type of lack of fit would readily announce itself.
As with the Fisherian designs, an analysis of variance is associated with the use of the design and is of great value in analyzing the data. Some of the problems which are associated with the use of response surface designs can be made clear by the use of an example. The following table shows a set of results which might be obtained by performing an octagonal second order design in two variables (Box and J. S. Hunter, 1957).
Table 1.1 Design Matrix and Observations
In practice the variables would be quantitative. For example, $x_1$ might refer to temperature T and $x_2$ to pressure P and the variables would be scaled and centered so that the design would cover a range of interest. That is, we could set
    $x_1 = (T - T_0)/S_t, \quad x_2 = (P - P_0)/S_p$
and choose $T_0$, $S_t$, $P_0$, and $S_p$ so that appropriate ranges are covered. The design is set out diagrammatically in Figure 1.2 in which it is assumed that $T_0 = 165^\circ$ C., $S_t = 10^\circ$ C., $P_0 = 30$ psi., and $S_p = 5$ psi. The best fitting second order equation
    $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_{11} x_1^2 + b_{22} x_2^2 + b_{12} x_1 x_2$
can be estimated by least squares and the following analysis of variance table can be obtained.
Figure 1.2 Octagonal Design with Observed Responses and Fitted Contours (horizontal axis: Temperature T)
Table 1.2 Analysis of Variance
Source of Variation
Polynomial
Residual
If for the moment the adequacy of some second degree equation is assumed, then the estimates of the coefficients and their standard errors (calculated using the residual mean square in the analysis of variance) are
The contours of the equation
    $\hat{y} = 89.6 + 11.4x_1 + 17.6x_2 - 7.9x_1^2 - 19.8x_2^2 - 14.9x_1x_2$    (1.32)
are also shown in Figure 1.2.
It is clear from the analysis of variance that the hypothesis that the regression coefficients $b_0$, $b_1$, $b_2$, $b_{11}$, $b_{22}$, and $b_{12}$ are all zero is disproved, for the ratio (regression mean square)/(residual mean square) = 1402.6/83.6 = 16.8 is significant at well beyond the 1% level. It is also clear that a linear polynomial cannot adequately represent the functional relationship, for the ratio of mean squares 1164.9/83.6 = 13.9 exceeds by a wide margin the 1% level of the appropriate F-distribution.
It is one thing, however, to show that coefficients are non-zero and it is quite another to conclude that y is accurately estimated in the sense that the fitted polynomial $\hat{y}(x)$ represented by the contours in Figure 1.2 provides a reasonable idea of the underlying function $f(x)$. There are basically two kinds of uncertainty: 1) uncertainty arising from the sampling error of $\hat{y}$, and 2) uncertainty arising from bias in $\hat{y}$ due to the use of an inadequate model.
1.7 Uncertainty Arising from the Sampling Error in $\hat{y}$
Suppose at first that the form of model is adequate. Consider Figure 1.3 in which it is presumed that a quadratic function in one variable $x_1$
    $\hat{y} = b_0 + b_1 x_1 + b_{11} x_1^2$    (1.33)
has been fitted by least squares and a confidence band calculated as shown. Then it seems reasonable to say that the function
    $\eta = f(x_1)$    (1.34)
is adequately estimated over the region R only if the average error in $\hat{y}$ is not large compared with the total change in response C predicted by the equation.
1.8 Uncertainty Arising from Lack of Fit
Now let us reconsider the example. If the second degree equation is adequate, the expected values for the response at the n sets of conditions may be written in matrix notation as
    $E(y) = X_1\beta_1$    (1.35)
Suppose that in fact additional terms are needed, so that actually
    $E(y) = X_1\beta_1 + X_2\beta_2$    (1.36)
Then the expected value of the residual sum of squares $S_r$ is not $\nu_r\sigma^2$ as it would be if the model (1.35) were adequate, but
    $E(S_r) = \Lambda^2 + \nu_r\sigma^2$    (1.37)
where
    $\Lambda^2 = \beta_2'X_2'[I - X_1(X_1'X_1)^{-1}X_1']X_2\beta_2$    (1.38)
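The bias term in (1.37) and (1.38) can be illustrated with a small noise-free calculation. In this sketch a first order model is fitted when a quadratic term is actually present; the design points and coefficient values are hypothetical, and setting $\sigma^2 = 0$ makes the residual sum of squares equal to $\Lambda^2$ exactly.

```python
import numpy as np

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
X1 = np.column_stack([np.ones_like(x), x])     # fitted (first order) model
X2 = (x ** 2).reshape(-1, 1)                   # term omitted from the fit
beta1 = np.array([2.0, 1.0])                   # hypothetical coefficients
beta2 = np.array([3.0])

# Residual-forming matrix I - X1 (X1'X1)^{-1} X1'
R = np.eye(len(x)) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
lambda_sq = float(beta2 @ X2.T @ R @ X2 @ beta2)    # equation (1.38)

# With sigma^2 = 0 the residual sum of squares equals Lambda^2 exactly.
eta = X1 @ beta1 + X2 @ beta2
residual = R @ eta
residual_ss = float(residual @ residual)
print(lambda_sq, residual_ss)
```

The part of $X_2\beta_2$ that cannot be absorbed by the columns of $X_1$ is what inflates the residual sum of squares.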
Now suppose the design contains certain points which are replicated and the total sum of squares within replicates is $S_e$ having $\nu_e$ degrees of freedom. Then, provided the experiment has been properly conducted so that the differences among replicates reflect all the relevant sources of variation,
    $E(S_e) = \nu_e\sigma^2$    (1.39)
and a lack of fit sum of squares
    $S_l = S_r - S_e$    (1.40)
having $\nu_l = \nu_r - \nu_e$ degrees of freedom can be obtained in which the effect of lack of fit is concentrated. For clearly
    $E(S_l) = \Lambda^2 + \nu_l\sigma^2$    (1.41)
where $\Lambda^2$ is defined as before. In this particular example the center point has been replicated four times so that an independent estimate of error based on three degrees of freedom can be calculated and the residual sum of squares can be further analyzed as follows.
Table 1.3 Analysis of Variance

Source of Variation    Degrees of Freedom    Sums of Squares    Mean Squares
Lack of Fit                    3                  458.3             152.8
Error                          3                   43.2              14.4
Residual                       6                  501.5              83.6

The lack of fit mean square is significantly greater than that due to error at the 5% level and it is clear that we may be misled in the nature of the response surface because of the inadequacy of the functional form.
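The arithmetic behind Table 1.3 can be reproduced in a few lines. The sums of squares and degrees of freedom are those quoted in the table; the critical value is the upper 5% point of F with (3, 3) degrees of freedom taken from standard tables, and the variable names are illustrative.

```python
# Sums of squares and degrees of freedom quoted in Table 1.3.
ss_residual, df_residual = 501.5, 6
ss_error, df_error = 43.2, 3

ss_lack_of_fit = ss_residual - ss_error        # S_l = S_r - S_e
df_lack_of_fit = df_residual - df_error        # nu_l = nu_r - nu_e

ms_lack_of_fit = ss_lack_of_fit / df_lack_of_fit
ms_error = ss_error / df_error
f_ratio = ms_lack_of_fit / ms_error

# Upper 5% point of F with (3, 3) degrees of freedom, from standard tables.
F_CRIT_5_PERCENT_3_3 = 9.28
significant = f_ratio > F_CRIT_5_PERCENT_3_3
print(round(ms_lack_of_fit, 1), round(ms_error, 1), round(f_ratio, 1), significant)
```

The ratio of about 10.6 exceeds the tabulated 5% point, which agrees with the statement that lack of fit is significant at that level.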
Now it is clear that one could have a situation where a very precise experiment has been run in which the change in response over the region of interest was many times greater than the experimental error and the graduating polynomial was representing the function quite well. Yet, nevertheless, because of the high precision of the experiment, it was possible to detect significant (although unimportant) lack of fit. The question arises, "Under what circumstances would lack of fit indicate that a misleading impression of the true response surface was probably being given?"
This report is concerned with the problem of deciding when an adequate estimate of the response function has been obtained. This can be divided naturally into two parts from which the following questions arise. First, "Given that the graduating function is adequate, how can we tell when a sufficiently close estimate of the response function has been obtained?" Second, "How can we know that the graduating function provides an adequate fit over the region of interest?" The first question is discussed in Chapter 2 and the second in Chapter 3. In Chapter 4, approximations which are needed in the distribution theory discussed in Chapters 2 and 3 are examined. Chapter 5 deals with the application of the methods derived in the earlier chapters, and suggestions are made for further work.
CHAPTER 2
UNCERTAINTY ARISING FROM SAMPLING ERROR IN $\hat{y}$
In this section suppose that the form of the model is adequate. The discussion of inadequate models will appear in the next chapter. Suppose that the model written in matrix notation is
    $y = X\beta + Z\theta + e$    (2.1)
where, as before, the part $X\beta$ is the component in which we are chiefly interested, and the part $Z\theta$ represents extraneous effects, such as block variables and time trends, which we wish to eliminate. Suppose also the design has been chosen so that $X$ and $Z$ are orthogonal. For our purpose, it will also be convenient to include the overall location parameter in the set $\theta$. Thus, in the simplest possible case there will be no block variables, $Z$ will consist of a single vector of 1's and $\theta$ will be the overall mean which is to be "eliminated" in further analysis.
Suppose the model $X\beta$ contains $\nu_m$ parameters. For example, if we were concerned with fitting a second degree polynomial in two variables $x_1$ and $x_2$ and there were no block effects, then the model would be written
where the variables after elimination of the mean are indicated by a dot notation, thus
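In general, the analysis of variance table is as follows:

Table 2.1 Analysis of Variance

Source of Variation     Degrees of Freedom    Sums of Squares                                                              Expected Sums of Squares
Eliminated Variables    $\nu_b$               $\hat{\theta}'Z'Z\hat{\theta} = \sum_{u=1}^{n} \tilde{y}_u^2$                 $\sum_{u=1}^{n} \tilde{\eta}_u^2 + \nu_b\sigma^2$
Model                   $\nu_m$               $\hat{\beta}'X'X\hat{\beta} = \sum_{u=1}^{n} (\hat{y}_u - \tilde{y}_u)^2$     $\sum_{u=1}^{n} (\eta_u - \tilde{\eta}_u)^2 + \nu_m\sigma^2$
Residual (Error)        $\nu_r$               $(y - \hat{y})'(y - \hat{y}) = \sum_{u=1}^{n} (y_u - \hat{y}_u)^2$            $\nu_r\sigma^2$
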
where
For instance, i f only the mean had been eliminated, then
In general, the mean squares and their expectations would be as follows:

Table 2.2 Mean Squares and Expected Mean Squares in the Analysis of Variance
To test for the necessity of including terms $X\beta$ in the model, that is, to test the null hypothesis $\beta = 0$ against the alternative $\beta \neq 0$, the ratio $M_m/M_r$ is referred to the table of the F-distribution with $\nu_m$ and $\nu_r$ degrees of freedom. A significant F-ratio contradicts the hypothesis that the $\beta$'s are all zero. As noted previously, this would not allow us to conclude further that the fitted regression function is adequately estimated in the sense of giving a general picture of the nature of the dependence of $\eta$ on $x$. The least squares estimate $\hat{y} - \tilde{y}$ of $\eta - \tilde{\eta}$ measures the level of the response after extraneous variables have been eliminated. Assurance is needed that the error in this estimate $\hat{y} - \tilde{y}$ is not too large compared with the real changes in $\eta - \tilde{\eta}$ which occur over the experimental region.
2.1 An Overall Measure of the Error in $\hat{y} - \tilde{y}$
For most practical designs the n experimental points supply a reasonable coverage of the region of interest. In order to measure the true changes in response over the region, we may study the changes in the quantities $\eta_u - \tilde{\eta}_u$ as the index u goes from 1 to n. A convenient measure of the extent of the changes is
This measure can be compared with the errors committed in estimating the quantities $\eta_u - \tilde{\eta}_u$. The least squares estimate of $\eta_u - \tilde{\eta}_u$ is $\hat{y}_u - \tilde{y}_u$ and
    $\hat{y} - \tilde{y} = X\hat{\beta} = X(X'X)^{-1}X'y$
The variance-covariance matrix of $\hat{y} - \tilde{y}$ is $X(X'X)^{-1}X'\sigma^2$. The variance of $\hat{y}_u - \tilde{y}_u$ is therefore the u-th diagonal element of
    $M\sigma^2 = X(X'X)^{-1}X'\sigma^2$.    (2.9)
In general, the variance of $\hat{y}_u - \tilde{y}_u$ will change from one design point to the next. We may use as an overall measure of variance the average variance of the $(\hat{y}_u - \tilde{y}_u)$, denoted by $\bar{\sigma}^2_{(\hat{y}-\tilde{y})}$, which is
    $\bar{\sigma}^2_{(\hat{y}-\tilde{y})} = (\nu_m/n)\sigma^2$.    (2.10)
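Equation (2.10) follows because the diagonal elements of $X(X'X)^{-1}X'$ sum to the number of fitted parameters. The sketch below checks this on a hypothetical design; the sizes, the random design matrix, and the value of $\sigma^2$ are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
n, nu_m, sigma2 = 12, 5, 2.0                   # hypothetical sizes and error variance
X = rng.normal(size=(n, nu_m))
X -= X.mean(axis=0)                            # columns after eliminating the mean

M = X @ np.linalg.solve(X.T @ X, X.T)          # X (X'X)^{-1} X'
variances = np.diag(M) * sigma2                # variance at each design point, as in (2.9)
average_variance = float(variances.mean())     # overall measure in (2.10)
print(variances.round(3), average_variance, nu_m / n * sigma2)
```

The individual variances differ from point to point, but their average is $(\nu_m/n)\sigma^2$ whatever the design.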
Thus, a reasonable measure sf the magnitude of t h e changes in q -q yu
compared with tJlejr errors of estimate is provided by the criterion
The situation is illustrated in one dimension in Pigure 2.1.
X1 Xz x3 x4 x5 Xh
Figure 2.1 - Deviations o f True q Values about their M e a n
Compared with the Spread of the Estimates $ -y.
Suppose observations have been made at the values indicated on the x-axis, and that the true response is indicated by the curve $\eta$. Suppose furthermore that the assumption preserved in this chapter is true, that is, the true response function can be represented by the assumed form of the equation. Then the criterion $\gamma_m$ essentially would compare the spread of the true values about their mean $\tilde{\eta}$, indicated by the heavy lines in Figure 2.1, with the average spread of the estimates $\hat{y} - \tilde{y}$ of these quantities.
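As a rough numerical illustration of this comparison, the following sketch computes $\gamma_m$ for a set of true values and a known $\sigma$. It assumes the simplest case, in which the only extraneous variable is the overall mean, so that $\tilde{\eta}$ is the mean of the $\eta_u$; the function name `gamma_m` and the numbers are hypothetical and serve only as an example.

```python
import math

def gamma_m(eta, nu_m, sigma):
    """Spread of the true values about their mean relative to the
    average standard error of the fitted values (illustrative sketch).

    Assumes the extraneous variables reduce to the overall mean, so that
    eta_tilde is simply the mean of the eta_u.
    """
    n = len(eta)
    eta_tilde = sum(eta) / n
    # Mean squared deviation of the true values about their mean.
    var_true = sum((e - eta_tilde) ** 2 for e in eta) / n
    # Average variance of the estimates, (nu_m / n) * sigma^2, as in (2.10).
    var_est = (nu_m / n) * sigma ** 2
    return math.sqrt(var_true / var_est)

# Hypothetical true responses at six design points, a model with
# nu_m = 2 degrees of freedom, and sigma = 1.
eta = [10.0, 12.0, 15.0, 15.0, 12.0, 10.0]
print(gamma_m(eta, nu_m=2, sigma=1.0))
```

In practice the $\eta_u$ and $\sigma$ are unknown, which is why the later sections turn to inference about $\gamma_m$ from the analysis of variance.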
It seems reasonable that the adequacy of the estimate of the response function should be measured by the size of $\gamma_m$. This criterion is arbitrary to some extent. It is, however, a sensible one and, as shown below, arises naturally out of the analysis of variance. The expected value of the mean square associated with the regression after eliminating extraneous variables in the analysis of variance (Table 2.2) is in fact

$$E(M_m) = \sigma^2 + \sum_{u=1}^{n}(\eta_u - \tilde{\eta}_u)^2/\nu_m = \sigma^2(1 + \gamma_m^2) .$$
The quantity $\gamma_m^2$ is the non-centrality parameter which arises in the F-test based on the mean square ratio $M_m/M_r$.
Note that there seems to be no accepted convention of how this non-centrality parameter should be defined. As defined here

$$\gamma_m^2 = \lambda/\nu_m = \delta^2/\nu_m = (\nu_m + 1)\phi^2/\nu_m \qquad (2.13)$$

where $\lambda$ is the non-centrality parameter defined by Patnaik (1949) and by Pearson and Hartley (1951), $\delta$ is that defined by Scheffé (1961), and $\phi$ is that defined by Tang (1938).
2.2 Inferences Concerning $\gamma_m$
In the present state of statistical development there seems to be no universal agreement concerning the basis on which statistical inferences should be drawn. One school of thought seems to prefer statements expressed in terms of hypothesis testing and confidence intervals. A rival school prefers to express inferences in terms of posterior distributions derived from Bayes' Theorem. Here, we shall derive the distribution theory needed to make statements of both kinds.
2.3 Approach from the Direction of Hypothesis Testing
Using this approach we choose a level $F_0$ and accept that $\gamma_m^2$ is sufficiently large only if $M_m/M_r$ is greater than $F_0$. The value $F_0$ can be chosen so that when the value of $\gamma_m^2 = \sigma^2_{\eta-\tilde{\eta}}/\sigma^2_{\hat{y}-\tilde{y}}$ is as small as $\gamma_0^2$ there will be only an acceptably small chance that $F$ is greater than $F_0$. To proceed in this way, all that is needed is the probability that $F > F_0$ for any fixed value of the non-centrality parameter $\gamma_m^2$. This can be determined using the non-central F-distribution.
One approach is to decide in advance a minimal acceptable value $\gamma_0$ of $\gamma_m$. For example, it might be decided that it will be unsafe to use the fitted function unless it can be demonstrated that the spread of the true response is greater than, say, 4 times the average spread of the sampling error, in which case $\gamma_0$ would be equal to 4. To gain the necessary assurance, the null hypothesis $\gamma_m = \gamma_0$ can be tested against the alternative hypothesis $\gamma_m > \gamma_0$. An appropriate test criterion would be $F = M_m/M_r$. To obtain a test for which the null hypothesis would be wrongly rejected with a small probability $\alpha$, we could then choose a value $F_0$ so that when $\gamma_m = \gamma_0$, the probability that $F > F_0$ is equal to $\alpha$. The probability that $F > F_0$ for any fixed value of the non-centrality parameter $\gamma_m^2$ is provided by the non-central F-distribution.
Now with non-centrality parameter $\gamma_m^2$, the probability that $F > F_0$ is given by the following integral of the non-central F-distribution
where
and
Note that the factor $w_j$ is a "Poisson weighting factor" where the quantity $\tfrac{1}{2}\nu_m\gamma_m^2$ is the parameter of the Poisson weighting distribution.
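The probability described above can be evaluated numerically. The sketch below assumes the standard representation of the non-central F-distribution as a Poisson-weighted sum of central components, with Poisson parameter $\tfrac{1}{2}\nu_m\gamma_m^2$ as stated in the text. The function names (`betainc`, `noncentral_f_tail`) and the truncation of the sum are choices made here for illustration and are not part of the report.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the recurrence.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def noncentral_f_tail(f0, nu_m, nu_r, gamma_sq):
    """P(F' > f0) for a non-central F variate whose non-centrality per
    numerator degree of freedom is gamma_sq (Poisson mean nu_m*gamma_sq/2)."""
    mu = 0.5 * nu_m * gamma_sq
    x = nu_m * f0 / (nu_r + nu_m * f0)
    # Sum only over Poisson indices that carry non-negligible weight.
    spread = 12.0 * math.sqrt(mu) + 30.0
    j_lo = max(0, int(mu - spread))
    j_hi = int(mu + spread)
    total = 0.0
    for j in range(j_lo, j_hi + 1):
        if mu > 0.0:
            w = math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1.0))
        else:
            w = 1.0 if j == 0 else 0.0
        total += w * (1.0 - betainc(0.5 * nu_m + j, 0.5 * nu_r, x))
    return total

# Chance of exceeding F0 = 41.99 when gamma_m = 4, nu_m = 9, nu_r = 11.
print(noncentral_f_tail(41.99, 9, 11, 16.0))
```

Setting `gamma_sq` to zero recovers the ordinary central F tail probability, which gives a convenient check of the implementation.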
The non-central F-distribution was studied by P. C. Tang (1938) who produced tables for values of $P_{II}$ at significance levels of $\alpha = 0.01$ and 0.05. $P_{II}$ is the chance of failing to reject the null hypothesis when the alternative hypothesis is true (error of the second kind). Pearson and Hartley (1951) also studied this distribution and produced charts, each of which gives two families of power curves for $\alpha = 0.01$ and 0.05.

The usual application of the non-central F-distribution is in the calculation of the power of the analysis of variance tests. For the present purpose a wider coverage is needed, and required values for the probability integral can be obtained to sufficient accuracy by using an approximation due to Patnaik (1949). In this approximation the non-central F-distribution is approximated by a central F-distribution in the following way. The non-central F-distribution can be written as the ratio of a non-central $\chi^2$-distribution and a $\chi^2$-distribution. The non-central $\chi^2$-distribution can then be approximated using a variate $a\chi^2_b$ where $a$ and $b$ are chosen such that the mean and the variance of the approximating distribution are the same as for the non-central $\chi^2$-distribution.
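Then

$$F'_{\nu_m,\nu_r,\gamma_m^2} = \frac{\chi'^2_{\nu_m,\gamma_m^2}/\nu_m}{\chi^2_{\nu_r}/\nu_r} \approx \frac{a\chi^2_b/\nu_m}{\chi^2_{\nu_r}/\nu_r} = \frac{ab}{\nu_m}F_{b,\nu_r}$$

where $F'_{\nu_m,\nu_r,\gamma_m^2}$ is the non-central F-variate with $\nu_m$ and $\nu_r$ degrees of freedom and non-centrality parameter $\gamma_m^2$; $\chi'^2_{\nu_m,\gamma_m^2}$ is the non-central $\chi^2$-variate with $\nu_m$ degrees of freedom and non-centrality parameter $\gamma_m^2$; $\chi^2_{\nu_r}$ is the central $\chi^2$-variate with $\nu_r$ degrees of freedom; and $F_{b,\nu_r}$ is the central F-variate with $b$ and $\nu_r$ degrees of freedom. By equating the means and the variances respectively of the distributions of $\chi'^2$ and $a\chi^2_b$ it can be shown that

$$a = \frac{1 + 2\gamma_m^2}{1 + \gamma_m^2} \quad\text{and}\quad b = \frac{\nu_m(1 + \gamma_m^2)^2}{1 + 2\gamma_m^2} .$$

Therefore, the constant $ab/\nu_m$ is equal to $(1 + \gamma_m^2)$, and

$$F'_{\nu_m,\nu_r,\gamma_m^2} \approx (1 + \gamma_m^2)F_{b,\nu_r} . \qquad (2.19)$$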
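The moment-matching step can be written out explicitly. The following is a sketch under the assumption that the non-central $\chi^2$-variate has $\nu_m$ degrees of freedom and total non-centrality $\nu_m\gamma_m^2$, so that its mean and variance take the standard forms shown; the symbol $\lambda$ is used here only as shorthand for that total.

```latex
\begin{aligned}
\lambda &= \nu_m\gamma_m^2, \\
E(\chi'^2) &= \nu_m + \lambda, &\quad \operatorname{Var}(\chi'^2) &= 2(\nu_m + 2\lambda), \\
E(a\chi^2_b) &= ab, &\quad \operatorname{Var}(a\chi^2_b) &= 2a^2 b, \\
a &= \frac{\nu_m + 2\lambda}{\nu_m + \lambda} = \frac{1 + 2\gamma_m^2}{1 + \gamma_m^2}, &\quad
b &= \frac{(\nu_m + \lambda)^2}{\nu_m + 2\lambda} = \frac{\nu_m(1 + \gamma_m^2)^2}{1 + 2\gamma_m^2}.
\end{aligned}
```

Dividing the variance equation by the mean equation gives $a$ directly, and $b$ then follows from $ab = \nu_m + \lambda$, so that $ab/\nu_m = 1 + \gamma_m^2$.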
These results will be illustrated by an example. The following set of data is taken from a manuscript of a book by G. E. P. Box and J. S. Hunter.
2.4 Example
A central composite rotatable design in three variables run in four blocks yielded the results shown in Table 2.3.

Table 2.3 Design Matrix and Observations $y$

Block   $x_1$   $x_2$   $x_3$   $y$
The best fitting second degree response relationship is given by

$$\hat{y} = 51.80 + 0.74x_1 + 4.81x_2 + 7.96x_3 - 3.82x_1^2 + 1.21x_2^2 - 6.26x_3^2 - 0.33x_1x_2 + 10.27x_1x_3 - 2.83x_2x_3 \qquad (2.20)$$
and the analysis of variance is as follows:

Table 2.4 Analysis of Variance

Source                                 Sums of Squares   Degrees of Freedom   Mean Squares
Total                                     53571.37              24
Due to Mean                               50527.38               1
Due to Blocks                                28.63               3
Extra due to 1st Degree Polynomial         1393.13               3               464.38
Extra due to 2nd Degree Polynomial         1584.35               6               264.06
Model (1st and 2nd Degree combined)        2977.48               9               330.83
Residual                                     37.88              11                 3.44
Now suppose we wanted to test the null hypothesis that $\gamma_m = 4$ against the alternative hypothesis that $\gamma_m > 4$. Then $b = 9(17)^2/33 \approx 79$, $F_{79,11} = 2.47$ for $\alpha = 0.05$, $M_m/M_r = 330.83/3.44 = 96.17$, and the approximate critical value $F_0$ of $M_m/M_r$ from (2.19) is $17(2.47) = 41.99$. Since $M_m/M_r = 96.17 > 41.99$, the null hypothesis that $\gamma_m = 4$ is rejected. This demonstrates that the value of $\gamma_m$ cannot be as small as 4. Since $\gamma_m = 4$ is the smallest value that we are prepared to accept, the analysis can be continued. The question of confidence intervals shall now be considered.
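The arithmetic of this test can be retraced in a few lines. The sketch below takes the tabulated value $F_{79,11} = 2.47$ from the text rather than computing it, and the function name `patnaik_scale_and_df` is introduced here purely for illustration.

```python
def patnaik_scale_and_df(nu, c):
    """Patnaik constants for a non-central F variate.

    nu : numerator degrees of freedom
    c  : non-centrality per numerator degree of freedom
         (gamma_m^2 in the notation of this chapter)
    Returns (scale, b) such that F' is approximated by scale * F(b, nu_r).
    """
    scale = 1.0 + c
    b = nu * (1.0 + c) ** 2 / (1.0 + 2.0 * c)
    return scale, b

# Example of Section 2.4: gamma_0 = 4, nu_m = 9, nu_r = 11.
scale, b = patnaik_scale_and_df(nu=9, c=4.0 ** 2)
f_table = 2.47            # 5% point of F(79, 11), as quoted in the text
f_crit = scale * f_table  # approximate critical value F_0
f_obs = 330.83 / 3.44     # observed M_m / M_r from Table 2.4
reject = f_obs > f_crit
print(round(b, 2), round(f_crit, 2), round(f_obs, 2), reject)
```

The unrounded value of $b$ is about 78.8, which the text rounds to 79 before entering the F table.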
2.5 A Confidence Interval for $\gamma_m$
The present problem does not fit very naturally into the pattern associated with hypothesis testing. A somewhat improved formulation can be made in terms of confidence intervals. Adopting this approach, consideration is not limited to a particular value of $\gamma_m$. Instead, a range of values for $\gamma_m$ is determined which is acceptable based on available data.
Suppose an experiment has been run in which the observed value of the variance ratio $M_m/M_r$ is equal to $F$. Denote by $F'_{\nu_m,\nu_r,\gamma_m^2}$ a non-central F-variate with the appropriate number of degrees of freedom and non-centrality parameter $\gamma_m^2$. To find a $100(1-\alpha)\%$ confidence interval for $\gamma_m$ two values $\gamma_1$ and $\gamma_2$ are computed such that

$$\Pr\{F'_{\nu_m,\nu_r,\gamma_1^2} > F\} = \alpha/2 \qquad (2.21)$$

and

$$\Pr\{F'_{\nu_m,\nu_r,\gamma_2^2} < F\} = \alpha/2 . \qquad (2.22)$$

In accordance with the general theory of confidence intervals, in repeated sampling such an interval will include the true value of $\gamma_m$ in $100(1-\alpha)\%$ of repeated trials.
Continuing with the example in Section 2.4, the computed value of $F$ is equal to 96.17. The lower 95% confidence limit is obtained by using (2.21) in conjunction with the approximation for $F'_{\nu_m,\nu_r,\gamma_1^2}$ from (2.19) which is $(1 + \gamma_1^2)F_{b_1,\nu_r}$ with $\nu_m = 9$ and $\nu_r = 11$. For several given values of $\gamma_1$, corresponding values of $(1 + \gamma_1^2)F_{b_1,\nu_r}$ are computed. The latter values are plotted against the $\gamma_1$ values, and the intersection of the curve of $(1 + \gamma_1^2)F_{b_1,\nu_r}$ values with the line $F = 96.17$ provides the required value of $\gamma_1$ which satisfies (2.21). This value of $\gamma_1$ is 5.65.
The upper 95% confidence limit is found in a similar manner. Using the approximation, (2.22) becomes

$$\Pr\{F_{\nu_r,b_2} > (1 + \gamma_2^2)/F\} = \alpha/2 .$$

The intersection of the curves of $F_{\nu_r,b_2}$ and $(1 + \gamma_2^2)/F$, each plotted against $\gamma_2$, gives the solution of $\gamma_2$ which satisfies (2.22). This value is 13.79. Thus, the 95% confidence interval for $\gamma_m$ is $5.65 < \gamma_m < 13.79$.
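The graphical search for each limit amounts to solving one equation in $\gamma$, which can also be done by bisection. The sketch below is a hypothetical helper, not the procedure of the report: it assumes the caller supplies a function `f_quantile(p, dfn, dfd)` returning the point of the central F-distribution exceeded with probability `p`, and it assumes the left-hand side increases with $\gamma$ over the search range.

```python
import math

def solve_limit(f_obs, nu_m, nu_r, prob, f_quantile,
                lo=0.0, hi=100.0, tol=1e-10):
    """Find gamma such that (1 + gamma^2) * F_{b, nu_r}(prob) = f_obs,
    where b is the Patnaik degrees of freedom for that gamma.

    f_quantile(p, dfn, dfd) must return the value of a central
    F(dfn, dfd) variate that is exceeded with probability p.
    """
    def excess(g):
        c = g * g
        b = nu_m * (1.0 + c) ** 2 / (1.0 + 2.0 * c)
        return (1.0 + c) * f_quantile(prob, b, nu_r) - f_obs

    if excess(lo) > 0.0 or excess(hi) < 0.0:
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Stand-in quantile function used only to exercise the solver; a real
# analysis would substitute tabulated or computed F percentage points.
constant_quantile = lambda p, dfn, dfd: 2.0
print(solve_limit(96.17, 9, 11, 0.025, constant_quantile))
```

With upper-tail probability $\alpha/2$ the solver gives the lower limit $\gamma_1$, and with $1 - \alpha/2$ it gives the upper limit $\gamma_2$. If SciPy is available, `scipy.stats.f.isf` could serve as `f_quantile`; that substitution is a suggestion and has not been checked against the values quoted in the text.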
2.6 An Approach Via Bayes' Theorem
In the Bayesian approach, a posterior distribution for $\gamma_m$ is calculated. To obtain this distribution it is first necessary to postulate prior distributions for $\gamma_m$ and for $\sigma$. By using "non-informative prior distributions," that is to say, prior distributions which correspond to a state of initial ignorance, the posterior distribution can be made to present information coming essentially from the data.
We shall proceed by first finding the posterior distribution of $\lambda_m^2 = \beta'X'X\beta/\sigma^2$. By means of a linear transformation $\theta = T\beta$ the quadratic form $\beta'X'X\beta/\sigma^2$ can be reduced to a sum of squares $\theta'\theta/\sigma^2 = \sum_{i=1}^{\nu_m}\theta_i^2/\sigma^2$.
Let us envision an experimental situation in which the experimenter has very little knowledge of the values of the $\beta$'s before the data are collected. Thus, the joint prior distribution of the $\beta$'s is disperse in the neighborhood where the likelihood is appreciable. It will then be correspondingly true that the joint prior distribution for the $\theta$'s is approximately locally uniform.

Now if there is a set of $\hat{\beta}$'s which are least squares estimates of the $\beta$'s, then the elements of the vector $\hat{\theta} = T\hat{\beta}$ will be the least squares estimates of the $\theta$'s. Furthermore, the $\hat{\theta}$'s will be distributed independently of one another and will have the same variance $\sigma^2$ as the original observations. Thus, the joint sampling distribution of the $\hat{\theta}$'s is

$$p(\hat{\theta} \mid \theta, \sigma) = (\sqrt{2\pi}\,\sigma)^{-\nu_m}\exp\Big\{-\tfrac{1}{2}\sum_{i=1}^{\nu_m}(\hat{\theta}_i - \theta_i)^2/\sigma^2\Big\} \qquad (2.25)$$

and

$$L_m = \sum_{i=1}^{\nu_m}\hat{\theta}_i^2/\sigma^2$$

has a non-central $\chi^2$-distribution with $\nu_m$ degrees of freedom and non-centrality parameter

$$\lambda_m^2 = \sum_{i=1}^{\nu_m}\theta_i^2/\sigma^2$$

where the quantity $\lambda_m^2$ is consistent with the $\lambda$ in (2.13). The distribution of $L_m$ can be expressed as a Poisson weighted sum of $\chi^2$-distributions with $\tfrac{1}{2}\lambda_m^2$ the "mean" of the Poisson distribution. Thus,
where
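But if a priori the $\theta$'s are uniformly distributed, then (2.25) implies that the posterior distribution of $\theta_1, \theta_2, \ldots, \theta_{\nu_m}$ given $\sigma$ is

$$p(\theta \mid \sigma) = (\sqrt{2\pi}\,\sigma)^{-\nu_m}\exp\Big\{-\tfrac{1}{2}\sum_{i=1}^{\nu_m}(\theta_i - \hat{\theta}_i)^2/\sigma^2\Big\} .$$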
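Since this expression is symmetric in $\theta$ and $\hat{\theta}$ it follows that $\lambda_m^2 = \sum_{i=1}^{\nu_m}\theta_i^2/\sigma^2$ also has a posteriori a non-central $\chi^2$-distribution with $\nu_m$ degrees of freedom and non-centrality parameter $L_m = \sum_{i=1}^{\nu_m}\hat{\theta}_i^2/\sigma^2$.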
The distribution of $\lambda_m^2$ also can be expressed as a Poisson weighted sum of $\chi^2$-distributions
where
in which $\lambda_m^2$ and $L_m$ have now switched roles.
The value of $\sigma$ is generally unknown, but if an estimate $s^2$ of $\sigma^2$ with $\nu_r$ degrees of freedom is available and if the usual assumption is made that $\log\sigma$ is locally uniform, then the posterior distribution of $\sigma^2$ given $s^2$ is

$$p(\sigma^2 \mid s^2) = \frac{(\nu_r s^2/2)^{\frac{1}{2}\nu_r}}{\Gamma(\tfrac{1}{2}\nu_r)}\,(\sigma^2)^{-(\frac{1}{2}\nu_r + 1)}\exp\{-\tfrac{1}{2}(\nu_r s^2/\sigma^2)\} . \qquad (2.35)$$
-
To obtain the joint distribution of $\lambda_m^2$ and $\sigma^2$, $p(\lambda_m^2 \mid \sigma^2)$ is multiplied by $p(\sigma^2 \mid s^2)$, and then $\sigma^2$ is eliminated from this joint distribution by integrating with respect to that variable. Finally, the posterior distribution of $\lambda_m^2$ given $s^2$ is
where
and
Note that $L_m$ is the ratio of the sum of squares due to the model divided by the residual error mean square. If we set
and
the weight function $w_k$ is

and the distribution of $\lambda_m^2$ given $s^2$ can be written as a weighted sum

If $\nu_r$ is an even integer, the weight function $w_k$ can be expressed as
The weights are thus the terms of the negative binomial distribution. This result may be compared with (2.33) where it is noted that the ordinary non-central $\chi^2$-distribution can be expressed as a weighted sum of $\chi^2$-distributions with weights corresponding to the terms of a Poisson distribution. If $L_m = \sum_{i=1}^{\nu_m}\hat{\theta}_i^2/s^2$ is substituted into (2.40), then $P$ becomes the ratio of the residual sum of squares divided by the "total" sum of squares (residual plus model sums of squares).
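A minimal sketch of such weights follows. It assumes they take the standard negative binomial form with index $\tfrac{1}{2}\nu_r$ and parameter $P$, which is what results when Poisson weights are averaged over the posterior distribution (2.35) of $\sigma^2$. The function name and the numerical values are hypothetical.

```python
def negative_binomial_weights(nu_r, p, n_terms):
    """First n_terms weights
        w_k = C(nu_r/2 + k - 1, k) * p**(nu_r/2) * (1 - p)**k,
    built by recurrence so that nu_r/2 need not be an integer.
    (Assumed form of the weights; see the accompanying text.)"""
    r = 0.5 * nu_r
    w = p ** r
    weights = []
    for k in range(n_terms):
        weights.append(w)
        # Ratio of successive negative binomial terms.
        w *= (r + k) / (k + 1.0) * (1.0 - p)
    return weights

# Hypothetical illustration: nu_r = 8 and P = 0.4.
w = negative_binomial_weights(nu_r=8, p=0.4, n_terms=200)
mean_k = sum(k * wk for k, wk in enumerate(w))
print(sum(w), mean_k)
```

Under this form the mean of the index $k$ is $\tfrac{1}{2}\nu_r(1-P)/P$, so a small $P$ (a small residual sum of squares relative to the total) shifts the weight toward $\chi^2$ components with many degrees of freedom.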
The properties of this distribution and the method for approximating it are described in Chapter 4. It is shown there that the distribution is well approximated by a $\chi^2$-distribution in which $\lambda_m^2$ is approximated by a variate $a\chi^2_b$ with
and
This procedure is now applied to the example in Section 2.4. For convenience, the value of $1/P$ is computed, which is

$$\frac{1}{P} = \frac{\text{Model + Residual Sums of Squares}}{\text{Residual Sum of Squares}} = \frac{2977.48 + 37.88}{37.88} = 79.60 .$$

Also, $\nu_m = 9$, $\nu_r = 11$, $a = 79.79$, and $b = 10.95$. The transformation $\gamma_m = \lambda_m/\sqrt{\nu_m}$ is applied to $p(\lambda_m^2 \mid s^2)$ and to the approximating distribution so that the distribution for $\gamma_m = \sigma_{\eta-\tilde{\eta}}/\sigma_{\hat{y}-\tilde{y}}$ can be obtained. Figure 2.2 shows both the exact and approximating distributions of $\gamma_m$; the approximation is indistinguishable from the exact distribution.
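The quoted values of $a$ and $b$ can be reproduced to within rounding under one assumption: that $a\chi^2_b$ is matched to the first two moments of a negative binomial mixture of $\chi^2_{\nu_m+2k}$ distributions with index $\tfrac{1}{2}\nu_r$ and parameter $P$. The sketch below makes that assumption explicit; it is not taken from Chapter 4, and the function name is hypothetical.

```python
def moment_matched_chi2(nu_m, nu_r, inv_p):
    """Constants (a, b) matching a * chi2(b) to the mean and variance of
    a negative binomial mixture of chi2(nu_m + 2k) distributions.

    Assumes mixture index nu_r / 2 and parameter P = 1 / inv_p.
    """
    r = 0.5 * nu_r
    mean_k = r * (inv_p - 1.0)           # r (1 - P) / P
    var_k = r * (inv_p - 1.0) * inv_p    # r (1 - P) / P^2
    mean = nu_m + 2.0 * mean_k
    # Var = E[2 (nu_m + 2k)] + Var(nu_m + 2k)
    var = 2.0 * nu_m + 4.0 * mean_k + 4.0 * var_k
    a = var / (2.0 * mean)
    b = 2.0 * mean ** 2 / var
    return a, b

inv_p = (2977.48 + 37.88) / 37.88   # 1/P from Table 2.4
a, b = moment_matched_chi2(nu_m=9, nu_r=11, inv_p=inv_p)
print(round(inv_p, 2), round(a, 2), round(b, 2))
```

The small difference in the last digit of $a$ relative to the text is of the size expected from rounding of the sums of squares.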
2.7 Discussion of the Various Approaches
We have seen how $\gamma_m$ can be justified as a measure of the change occurring in the actual response $\eta$ compared with the error of estimate of that response. Three approaches have been considered for making inferences about $\gamma_m$. First, we can adopt some value $\gamma_0$ of $\gamma_m$ which is the minimum acceptable value and test the null hypothesis $\gamma_m = \gamma_0$ against the alternative hypothesis $\gamma_m > \gamma_0$. In the example in Section 2.4, this leads to the decision to reject $\gamma_m = 4$.
m
Second, if a particular level of significance $\alpha$ is chosen, then a whole spectrum of decisions concerning all the postulated values of $\gamma_0$ can be made. In accordance with the usual arguments, the confidence interval consists of all the values $\gamma_0$ for which the null hypothesis $\gamma_m = \gamma_0$ is not rejected. Implicit in this, as in all sampling theory, is the comparison of a particular criterion with a reference set of other values which have not occurred, in fact, but which could have occurred on some stated assumptions.
Third, in the Bayesian approach, a posterior distribution for $\gamma_m$ has been found. To those who favor this approach, it may seem to be a more direct and natural mode of inference than that via significance tests and confidence intervals. In the simple case of a single unknown parameter, for each value of that parameter, the likelihood can be considered as a function which gives the frequency with which samples would be obtained "like" that which is actually observed. This frequency, when multiplied by the (prior) probability of occurrence of these different values, would indicate the probability of any particular value of the unknown parameter having given rise to the single set of observed data. When there are nuisance parameters, a similar argument applies, but these parameters are averaged out with weights proportional to their probabilities.
In practice, of course, the above methods may produce numerical results which are close in certain cases, although the interpretation differs. From the posterior distribution of $\gamma_m$ for the example considered in Section 2.4, if we seek two values $\gamma_{m1}$ and $\gamma_{m2}$ at both of which the posterior density is equal to some value $P_0$ and such that

$$\int_{\gamma_{m1}}^{\gamma_{m2}} p(\gamma_m \mid s^2)\,d\gamma_m = 1 - \alpha$$

where $\alpha = 0.05$, we obtain approximately $\gamma_{m1} = 5.5$ and $\gamma_{m2} = 13.9$. The interval defined by $\gamma_{m1}$ and $\gamma_{m2}$ may be compared with the 95% confidence interval for $\gamma_m$ (Section 2.5) which is $5.65 < \gamma_m < 13.79$.
CHAPTER 3
UNCERTAINTY ARISING FROM LACK OF FIT
It has been assumed to this point that the function to be fitted to the data provides a completely adequate representation. We now suppose this is not the case, and that while $\eta = E(y) = f(x)$ represents the true functional relationship between the expected values of the $y$'s and the levels of the variables $x_1, x_2, \ldots, x_k$, it is wrongly assumed that $\eta$ can be represented by some graduating function $g(x)$. Then after the redundant variables are eliminated, the analysis of variance will appear as follows.
Table 3.1 Mean Squares in the Analysis of Variance

Source     Degrees of Freedom   Mean Squares                                                Expected Mean Squares
Model      $\nu_m$              $M_m = \sum_{u=1}^{n}(\hat{y}_u - \tilde{y}_u)^2/\nu_m$     $\sum_{u=1}^{n}[E(\hat{y}_u) - E(\tilde{y}_u)]^2/\nu_m + \sigma^2$
Residual   $\nu_r$              $M_r = \sum_{u=1}^{n}(y_u - \hat{y}_u)^2/\nu_r$             $\sum_{u=1}^{n}[\eta_u - E(\hat{y}_u)]^2/\nu_r + \sigma^2$
If a design is used in which certain experiments are genuinely replicated, then from the residual sum of squares $\nu_r M_r$ we can isolate a sum of squares $\nu_e M_e$ due to pure error alone, and a sum of squares $\nu_r M_r - \nu_e M_e = \nu_\ell M_\ell$ associated with lack of fit where $\nu_\ell = \nu_r - \nu_e$. This is shown in the analysis of variance, Table 3.2.

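Table 3.2 Mean Squares in the Analysis of Variance

Source        Degrees of Freedom   Mean Squares                                                Expected Mean Squares
Model         $\nu_m$              $M_m = \sum_{u=1}^{n}(\hat{y}_u - \tilde{y}_u)^2/\nu_m$     $\sum_{u=1}^{n}[E(\hat{y}_u) - E(\tilde{y}_u)]^2/\nu_m + \sigma^2$
Lack of Fit   $\nu_\ell$           $M_\ell$                                                    $\sum_{u=1}^{n}[\eta_u - E(\hat{y}_u)]^2/\nu_\ell + \sigma^2$
Pure Error    $\nu_e$              $M_e = s^2$                                                 $\sigma^2$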
The situation which occurs when there is lack of fit is illustrated for the case of one variable in Figure 3.1. However, the argument is perfectly general and may be applied to any number of variables.

Figure 3.1 Discrepancies $\eta_x - E(\hat{y}_x)$ Compared with Average Variance of Quantities $\hat{y} - \tilde{y}$.
We suppose the nature of the true function $\eta$ to be unknown. A graduating function $g(x)$ is fitted over the region of interest. Since the fitted value $\hat{y}_x$ at some point $x$ in the space of the variables is given by $\hat{y}_x = x'(X'X)^{-1}X'y$, then $E(\hat{y}_x) = x'(X'X)^{-1}X'\eta$. The function $E(\hat{y}_x)$ is therefore that which would be obtained by fitting to the true values $\eta$ of the response. In the example illustrated in Figure 3.1, the true function $\eta$ might be exactly functionally represented by the difference between two exponentials, and $g(x)$ might be a quadratic function. The average fitted curve in repeated sampling using the same experimental design is the dotted curve $E(\hat{y}_x)$, and the discrepancies between $\eta_x$ and $E(\hat{y}_x)$ are indicated by the heavy lines.
It has been customary in the past to make a test of significance of the lack of fit sum of squares against the error sum of squares using the F-criterion. It is by no means clear, however, what action should or should not be taken as a result of such a test. For example, while from the statistical viewpoint a "significant" lack of fit may exist, it may be of little or no practical importance. This situation may occur when an exactly calculable function is tabulated, and the only error which occurs in the tabulated values is rounding error. When an interpolation is made in a table, there must certainly be lack of fit between the interpolating polynomial and the true function but this is of no importance unless it is large compared with the acceptable error.

To gauge the importance of discrepancies due to lack of fit relative to discrepancies arising from sampling error, we consider the quantity
where
The quantity $\sum_{u=1}^{n}\{\eta_u - E(\hat{y}_u)\}^2$ is precisely that which appears as a factor in the non-centrality parameter in the lack of fit mean square in the analysis of variance (Table 3.1). In fact
and
where
We can therefore proceed in a manner very similar to that used in Chapter 2. A significance test, a confidence interval, and a Bayesian posterior distribution can all be derived in terms of the quantity $\gamma_\ell$.
3.1 A Significance Test for $\gamma_\ell$
It might be argued that we should proceed with the analysis only if we had an assurance that $\gamma_\ell$ was not larger than some specific value $\gamma_0$. This would lead to a test of significance in which the null hypothesis $\gamma_\ell = \gamma_0$ would be rejected in favor of the alternative hypothesis $\gamma_\ell < \gamma_0$ if the mean square ratio $F = M_\ell/M_e$ was less than some specific value $F_0$. The selection of $F_0$ would be such that when $\gamma_\ell = \gamma_0$, the probability that $F < F_0$ would be equal to $\alpha$, where $\alpha$ is some suitably chosen small quantity and corresponds to the significance level or the "size of the test." When $\gamma_\ell = \gamma_0$, the ratio $F = M_\ell/M_e$ has a non-central F-distribution which can be approximated by

$$F \approx \{1 + (\nu_m/\nu_\ell)\gamma_0^2\}F_{b,\nu_e} \qquad (3.6)$$

with

$$b = \frac{\nu_\ell\{1 + (\nu_m/\nu_\ell)\gamma_0^2\}^2}{1 + 2(\nu_m/\nu_\ell)\gamma_0^2} .$$

This approximation and the method for finding the appropriate value $F_0$ exactly parallel those given in Chapter 2. Using the table of the central F-distribution with $b$ and $\nu_e$ degrees of freedom, we now find the value $F_0$ which is such that

$$\Pr\{F_{b,\nu_e} < F_0/[1 + (\nu_m/\nu_\ell)\gamma_0^2]\} = \alpha$$

and we might then agree to proceed only if a significant result were obtained.
For example, consider again the data already discussed in Chapter 2, Section 2.4, for which the appropriate analysis of variance table is:

Table 3.3 Analysis of Variance

Source        Degrees of Freedom   Sums of Squares   Mean Squares
Model                 9                2977.48          330.83
Lack of Fit           3                  24.95            8.32
Pure Error            8                  12.93            1.62
Suppose it was decided that we would prefer $\gamma_\ell$ to be less than 1. Then we can proceed by testing the null hypothesis $H_0: \gamma_\ell = 1$ against the alternative hypothesis $H_1: \gamma_\ell < 1$. Since $\gamma_\ell = 1$ on the null hypothesis, and $\nu_m = 9$, $\nu_\ell = 3$ and $\nu_e = 8$, then $b = 3(4)^2/7 \approx 7$, $F_{7,8} = 0.268$ for $\alpha = 0.05$, $M_\ell/M_e = 8.32/1.62 = 5.14$, and the approximated critical value of $M_\ell/M_e$ from (3.6) is $4(0.268) = 1.07$. The observed value $F = 5.14$ is not less than $F_0 = 1.07$; consequently we do not have the required assurance to proceed with the analysis based on this approach.
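The numbers in this example can be retraced as follows. The sketch assumes, consistently with $b = 3(4)^2/7$ and the factor 4 used above, that the non-centrality per lack-of-fit degree of freedom is $(\nu_m/\nu_\ell)\gamma_0^2$; the lower 5% point $F_{7,8} = 0.268$ is taken from the text, and the function name is hypothetical.

```python
def lack_of_fit_constants(nu_m, nu_l, gamma0):
    """Scale factor and approximating degrees of freedom for M_l / M_e.

    Assumes non-centrality (nu_m / nu_l) * gamma0^2 per lack-of-fit
    degree of freedom, as implied by the worked example.
    """
    c = (nu_m / nu_l) * gamma0 ** 2
    scale = 1.0 + c
    b = nu_l * (1.0 + c) ** 2 / (1.0 + 2.0 * c)
    return scale, b

scale, b = lack_of_fit_constants(nu_m=9, nu_l=3, gamma0=1.0)
f_lower = 0.268            # lower 5% point of F(7, 8), as quoted in the text
f_crit = scale * f_lower   # approximate critical value F_0
f_obs = 8.32 / 1.62        # M_l / M_e from Table 3.3
proceed = f_obs < f_crit   # assurance requires a small observed ratio
print(round(b, 2), round(f_crit, 2), round(f_obs, 2), proceed)
```

Note the direction of the test: here a small ratio is the favorable outcome, the reverse of the situation in Chapter 2.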
3.2 A Confidence Interval for $\gamma_\ell$
The limits $\gamma_1$ and $\gamma_2$ which provide the $100(1-\alpha)\%$ confidence interval for $\gamma_\ell$ can be obtained by satisfying
and
where $F'$ represents the non-central F-variate with $\nu_\ell$ and $\nu_e$ degrees of freedom and with non-centrality parameter $(\nu_m/\nu_\ell)\gamma_\ell^2$, and $F$ is the observed mean square ratio $M_\ell/M_e$. The procedure for finding the confidence limits for $\gamma_\ell$ is exactly the same as that described for $\gamma_m$ in Chapter 2, Section 2.5. Using the results from Table 3.3, the 90% confidence interval for $\gamma_\ell$ is found to be $0.30 < \gamma_\ell < 1.98$.
3.3 A Posterior Distribution for $\gamma_\ell$
A posterior distribution for $\gamma_\ell$ may be obtained in an exactly parallel way to that employed in obtaining the posterior distribution of $\gamma_m$. By a similar argument
and the posterior distribution of $\lambda_\ell^2$ given $s^2$ is

As noted in Chapter 2, Section 2.6, a distribution of this form is well approximated by a $\chi^2$-distribution. Thus, $\lambda_\ell^2$ can be approximated by $a'\chi^2_{b'}$ where
and
and
$$\frac{1}{P'} = \frac{\text{Lack of Fit + Pure Error Sums of Squares}}{\text{Pure Error Sum of Squares}} .$$
Referring again to the analysis of variance (Table 3.3), $1/P' = 2.93$ with $\nu_\ell = 3$ and $\nu_e = 8$. The distribution of $\gamma_\ell$ is now readily obtained since $\gamma_\ell = \lambda_\ell/\sqrt{\nu_m}$. The exact and approximate distributions are compared in Chapter 4 and the agreement is shown to be very good. The approximating distribution for this example is shown in Figure 3.2. From this figure it is clear that in estimating the response $\eta_x$ over the region of interest, the experimenter can expect the error contribution from the bias to be about the same order of magnitude as the error contribution from sampling variation.
3.4 The Amount of True Change Not Accounted for by the Model
The previous approach to lack of fit supplies valuable information about the relative seriousness of lack of fit error compared with sampling error. However, it fails to indicate how serious this lack of fit is in relation to the total change occurring in the function. The issue here can be seen most clearly if we ignore for a moment the problems raised by sampling variation and consider just the question of interpolation.

Figure 3.2 Approximating Posterior Distribution of $\gamma_\ell$
Suppose the function $\eta = f(x)$ could be exactly determined at each of the experimental points and we propose to graduate it by fitting the function $g(x)$ by the method of least squares to the true values $\eta_u$ determined at each of these points. Then in order to measure the effectiveness of $g(x)$ in representing $f(x)$ over the region of interest, we could compare a measure of the discrepancies $f(x) - g(x)$ at the experimental points with the variation accounted for by $g(x)$ at these same points. Such a measure would be

$$\gamma^2_{\ell/m} = \frac{\sum_{u=1}^{n}\{\eta_u - E(\hat{y}_u)\}^2}{\sum_{u=1}^{n}\{E(\hat{y}_u) - E(\tilde{y}_u)\}^2}$$

and this could be regarded as a measure of the adequacy of the graduating function over the region. It will be noted that the quantities we are using here are the exact values of the response and do not contain sampling error. In the previous discussion it appeared that the Bayesian approach supplied the most useful and informative conclusions; as a consequence we shall proceed only with this approach in the present situation.
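The posterior distribution of $\gamma_{\ell/m} = \gamma_\ell/\gamma_m$ can be derived in the following way. By means of linear transformations the quadratic forms $\lambda_\ell^2$ and $\lambda_m^2$ can be expressed as sums of squares of the parameters $\phi$ and $\theta$ respectively. Thus,

$$\lambda_\ell^2 = \sum_{i=1}^{\nu_\ell}\phi_i^2/\sigma^2$$

and

$$\lambda_m^2 = \sum_{i=1}^{\nu_m}\theta_i^2/\sigma^2 .$$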
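If a priori $\phi$ and $\theta$ are uniformly distributed, the posterior distributions for $\phi$ and $\theta$ given $\sigma$ are

$$p(\phi \mid \sigma) = (\sqrt{2\pi}\,\sigma)^{-\nu_\ell}\exp\Big\{-\tfrac{1}{2}\sum_{i=1}^{\nu_\ell}(\phi_i - \hat{\phi}_i)^2/\sigma^2\Big\}$$

and

$$p(\theta \mid \sigma) = (\sqrt{2\pi}\,\sigma)^{-\nu_m}\exp\Big\{-\tfrac{1}{2}\sum_{i=1}^{\nu_m}(\theta_i - \hat{\theta}_i)^2/\sigma^2\Big\}$$

and these distributions are independent. Therefore, $\lambda_\ell^2$ and $\lambda_m^2$ each have non-central $\chi^2$-distributions, with $\nu_\ell$ and $\nu_m$ degrees of freedom and non-centrality parameters $L_\ell = \sum_{i=1}^{\nu_\ell}\hat{\phi}_i^2/\sigma^2$ and $L_m = \sum_{i=1}^{\nu_m}\hat{\theta}_i^2/\sigma^2$ respectively.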
The distribution of the ratio of two independent non-central $\chi^2$-variates, each with the appropriate number of degrees of freedom and non-centrality parameters, has been determined by Kendall and Stuart (1951). In terms of the above development the ratio to be considered is

$$\gamma^2_{\ell/m} = \lambda_\ell^2/\lambda_m^2 .$$
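Applying the result of Kendall and Stuart mentioned above, the posterior distribution of $\gamma^2_{\ell/m}$ given $\sigma^2$ is

$$p(\gamma^2_{\ell/m} \mid \sigma^2) = \exp\{-\tfrac{1}{2}(L_\ell + L_m)\}\sum_{h=0}^{\infty}\sum_{t=0}^{\infty}\frac{(\tfrac{1}{2}L_\ell)^h(\tfrac{1}{2}L_m)^t}{h!\,t!\,B(\tfrac{1}{2}\nu_\ell + h, \tfrac{1}{2}\nu_m + t)}\,\frac{(\gamma^2_{\ell/m})^{\frac{1}{2}\nu_\ell + h - 1}}{(1 + \gamma^2_{\ell/m})^{\frac{1}{2}(\nu_\ell + \nu_m) + h + t}}$$

where $\gamma^2_{\ell/m}$, $L_\ell$ and $L_m$ are all functions of $\sigma^2$, and $B(\tfrac{1}{2}\nu_\ell + h, \tfrac{1}{2}\nu_m + t)$ is the complete beta function. In practice, $\sigma^2$ generally is unknown, but with an estimate $s^2$ and the usual a priori assumption that $\log\sigma$ is uniformly distributed, the posterior distribution of $\sigma^2$ given $s^2$ with $\nu_e$ degrees of freedom is

$$p(\sigma^2 \mid s^2) = \frac{(\nu_e s^2/2)^{\frac{1}{2}\nu_e}}{\Gamma(\tfrac{1}{2}\nu_e)}\,(\sigma^2)^{-(\frac{1}{2}\nu_e + 1)}\exp\{-\tfrac{1}{2}(\nu_e s^2/\sigma^2)\} .$$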
Thus, the distribution of $\gamma^2_{\ell/m}$ given $s^2$ can be obtained by multiplying $p(\gamma^2_{\ell/m} \mid \sigma^2)$ by $p(\sigma^2 \mid s^2)$ and integrating this product with respect to $\sigma^2$. The posterior distribution of $\gamma^2_{\ell/m}$ given $s^2$ is then
where the weights
are the terms of the negative trinomial distribution, and

and
Note that $L_\ell$ is the lack of fit sum of squares divided by the error mean square, and $L_m$ is the ratio of the sum of squares due to the model divided by the error mean square. If the expressions for $L_\ell$ and $L_m$ are substituted into the expressions for $P_1$ and $P_2$, then in more familiar terms $P_1$ becomes the ratio of the lack of fit sum of squares to the "total" sum of squares, and $P_2$ becomes the ratio of the model sum of squares to the "total" sum of squares. The "total" sum of squares denotes the model plus lack of fit plus error sums of squares.
This distribution of $\gamma^2_{\ell/m}$ is somewhat complex; however, as before, an approximation can be obtained for it. To do this a general expression for the moments of $\gamma^2_{\ell/m}$ in the form of finite sums is first derived. It is shown in Chapter 4 that the $r$th moment of $\gamma^2_{\ell/m}$ can be expressed as a finite triple sum which is
w k w-k
x c y)(-Ve/l:) (1 + v , / , ~ f ) 1
k=o S
and
th
The u term i n the summation i n 1 becomes n-k -1 ( -1)1' n ( l +l f / v )
S ( e zl
w h e n p = 0, where p = z - n t k - u -1-1 ,n 1
-. ,v + r - j , z = r-1, with I : and
e
1:defined by ( 3 . 2 8 1 a n d ( 3 . 2 9 ) . T h e o n l y r e s t r i c t i ~ n s o n ~ ~ )are
"~~
r l/m
that $w$ must be a non-negative integer and therefore $\nu_m$ must be a positive even integer such that $\tfrac{1}{2}\nu_m \geq s + 1$, and $n$ must be a positive integer, hence $\nu_e$ must be a positive even integer such that $\tfrac{1}{2}\nu_e \geq k + 1$. These restrictions do not present a major problem because when a situation arises in which $\nu_m$ or $\nu_e$ are odd integers we can interpolate between the adjacent even values. This procedure is expedited by the use of a digital computer.
Now since the numerator and denominator of $\gamma^2_{\ell/m} = \lambda_\ell^2/\lambda_m^2$ can each be approximated by a $\chi^2$-distribution, it might be expected that this ratio can be approximately distributed as an F-distribution. It seems appropriate therefore to approximate the variate $\gamma^2_{\ell/m}$ by a quantity $bF$ where $b$ is a constant and $F$ is the variate of the central F-distribution with $2P$ and $2Q$ degrees of freedom. The approximating distribution for $\gamma^2_{\ell/m} = bF$ is given by

$$p(\gamma^2_{\ell/m}) = \frac{(P/bQ)^P\,\Gamma(P+Q)\,(\gamma^2_{\ell/m})^{P-1}}{\Gamma(P)\,\Gamma(Q)\,(1 + \gamma^2_{\ell/m}P/bQ)^{P+Q}} \qquad (3.33)$$
and it is easily shown that the $r$th moment of the distribution of $bF$ is given by

$$\mu'_r = \Big(\frac{bQ}{P}\Big)^r\frac{\Gamma(P+r)\,\Gamma(Q-r)}{\Gamma(P)\,\Gamma(Q)} \qquad (3.34)$$

for $r < Q$. The moments of $bF$ are functions of $b$, $P$, and $Q$, and these three quantities may thus be evaluated by equating the first three moments of the exact distribution of $\gamma^2_{\ell/m}$ to the same three moments of the $bF$-distribution. The details are given in Chapter 4.
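The moment formula for $bF$ is easy to evaluate numerically. The sketch below uses the standard moments of a central F-variate with $2P$ and $2Q$ degrees of freedom, and for illustration takes $b = 0.00918$, $2P = 20.58$ and $2Q = 15.76$, the values quoted for the example of this section. It also evaluates the mode of $\sqrt{bF}$, obtained here by differentiating the logarithm of the transformed density; the function names are hypothetical.

```python
import math

def bf_moment(r, b, p, q):
    """r-th moment about the origin of b * F(2p, 2q); requires r < q."""
    log_m = (r * math.log(b * q / p)
             + math.lgamma(p + r) + math.lgamma(q - r)
             - math.lgamma(p) - math.lgamma(q))
    return math.exp(log_m)

def sqrt_bf_mode(b, p, q):
    """Mode of the positive square root of b * F(2p, 2q).

    The density of the square root is proportional to
    y**(d1 - 1) * (1 + d1 * y**2 / (b * d2)) ** (-(d1 + d2) / 2),
    with d1 = 2p and d2 = 2q; setting its log-derivative to zero
    gives the expression below.
    """
    d1, d2 = 2.0 * p, 2.0 * q
    return math.sqrt(b * d2 * (d1 - 1.0) / (d1 * (d2 + 1.0)))

b, p, q = 0.00918, 20.58 / 2.0, 15.76 / 2.0
moments = [bf_moment(r, b, p, q) for r in (1, 2, 3)]
mode = sqrt_bf_mode(b, p, q)
print(moments, mode)
```

Agreement of the three computed moments with the exact moments is what determines $b$, $P$ and $Q$ in the first place, so this calculation mainly serves as a consistency check on quoted values.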
This procedure can now be applied to the example previously considered, the information for which is furnished in Table 3.3. The moments of the exact distribution are $\mu'_1 = 1.051(10)^{-2}$, $\mu'_2 = 1.418(10)^{-4}$, and $\mu'_3 = 2.510(10)^{-6}$ and thus $b = 0.00918$, $P = 10.29$, and $Q = 7.88$. Hence, the approximating distribution of $\gamma^2_{\ell/m}$ having the appropriate first three moments is

$$\gamma^2_{\ell/m} = 0.00918\,F_{20.58,\,15.76} . \qquad (3.35)$$

By means of a positive square root transformation the approximate posterior distribution of $\gamma_{\ell/m}$ may be obtained. It is shown in Figure 3.3. The maximum point of the distribution occurs at $\gamma_{\ell/m} = 0.09$. Thus, we estimate

$$\gamma_{\ell/m} = \Big[\sum_{u=1}^{n}\{\eta_u - E(\hat{y}_u)\}^2 \Big/ \sum_{u=1}^{n}\{E(\hat{y}_u) - E(\tilde{y}_u)\}^2\Big]^{1/2}$$

as about 0.09.
CHAPTER 4
DISTRIBUTIONS AND APPROXIMATIONS
4.1 The Exact Distribution of $\lambda_m^2$
It was shown in Chapter 2 that the quantity $\gamma_m$ may be used as a measure of the adequacy of estimation of a response surface where $\gamma_m^2 = \lambda_m^2/\nu_m$. On stated assumptions, we have obtained the posterior distribution of $\lambda_m^2$ as

and the characteristic function of $\lambda_m^2$ as
The distribution of $\lambda_m^2$ is thus a weighted sum of $\chi^2$-distributions
where the weights
are the terms in the negative binomial distribution. The quantity $P$ is the ratio of the residual sum of squares divided by the residual plus model sums of squares, $\nu_m$ represents the degrees of freedom due to the model, and $\nu_r$ denotes the residual degrees of freedom.
If the mean of the negative binomial is fairly small, the distribution of $\lambda_m^2$ might be calculated directly since, in this case, the weights $w_k$ will become small quite rapidly with increasing $k$. Thus, relatively few terms would have to be computed to obtain the distribution (4.1).
It is advantageous to obtain a finite expression for the distribution of $\lambda_m^2$ for use when it is necessary to calculate a large number of terms in the infinite series (4.1). The characteristic function (4.2) can be rearranged so that

$$\phi_{\lambda_m^2}(t) = (1 - 2it)^{-\frac{1}{2}\nu_m}\{(1 - 2it)/(1 - 2it/P)\}^{\frac{1}{2}\nu_r} = (1 - 2it)^{-\frac{1}{2}(\nu_m - \nu_r)}(1 - 2it/P)^{-\frac{1}{2}\nu_r} . \qquad (4.5)$$
To obtain the corresponding exact distribution from the characteristic
function (4.5) it is necessary to consider separately the cases:
1) $\nu_m > \nu_r$, 2) $\nu_m = \nu_r$, and 3) $\nu_m < \nu_r$.
Case 1: $\nu_m > \nu_r$. When both $\nu_m$ and $\nu_r$ are even integers, the
first term in (4.5) is the characteristic function of a $\chi^2$-distribution
with $\nu_m - \nu_r$ degrees of freedom. The second term in (4.5) is the
characteristic function of the distribution of $(1/P)\chi^2$ with $\nu_r$ degrees of
freedom. It follows that $\lambda_m^2$ is distributed as $\chi^2_{\nu_m-\nu_r} + (1/P)\chi^2_{\nu_r}$,
where the two $\chi^2$-variates are distributed independently. Box (1954)
has considered the distribution of $\sum_{j=1}^{k} \lambda_j \chi^2_{\nu_j}$ where the degrees of
freedom $\nu_j$ are even integers. (Here $k = 2$, $\lambda_1 = 1$, $\lambda_2 = 1/P$,
$\nu_1 = \nu_m - \nu_r$ and $\nu_2 = \nu_r$.) Box shows that if the $\nu_j$ are even integers
such that $\nu_j = 2g_j$, the density function of $\lambda_m^2$ is a weighted finite
sum of $\chi^2$-distributions

The constants $a_{js}$ are evaluated by use of the formulae:

and

where the $h$th moment about the origin $\mu'_{ih}$ is related to the $h$th
cumulant $\kappa_{ih}$ by well known expressions. If either one or both of
the quantities $\nu_m$ and $\nu_r$ are odd integers, the distribution of $\lambda_m^2$
may be obtained by using the infinite series due to Robbins and
Pitman (1949), or it may be obtained by interpolation between
adjoining even integers.
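Case 2: $\nu_m = \nu_r$. When $\nu_m = \nu_r$, the characteristic function of
$\lambda_m^2$ reduces to

$\phi(t) = (1-2it/P)^{-\frac{1}{2}\nu_r}$

Therefore, $\lambda_m^2$ is distributed as $(1/P)\chi^2_{\nu_r}$ and the distribution is

$p(\lambda_m^2) = \frac{P\,(P\lambda_m^2)^{\frac{1}{2}\nu_r - 1}\exp(-\frac{1}{2}P\lambda_m^2)}{2^{\frac{1}{2}\nu_r}\,\Gamma(\frac{1}{2}\nu_r)}$    (4.11)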
Case 3: $\nu_m < \nu_r$. If $\nu_r - \nu_m$ is a positive even integer, then it
is possible to obtain the distribution of $\lambda_m^2$ as a finite series. The
characteristic function (4.5) can be rearranged so that

$\phi(t) = (1-2it)^{\frac{1}{2}(\nu_r-\nu_m)}\,(1-2it/P)^{-\frac{1}{2}\nu_r}$

where

In this case, the distribution can be expressed as a finite weighted sum

in which the weights $w_k$ are the terms of a binomial series. When
$\nu_r - \nu_m$ is an odd integer, interpolation between the appropriate even
integers is suitable.
4.2 Approximation for the Distribution of $\lambda_m^2$
So far, methods for calculating the exact distribution of $\lambda_m^2$
have been considered. It is convenient to have available a simple
approximation. A number of authors, Welch (1936, 1937), Fairfield
Smith (1936), Satterthwaite (1941), and Box (1954), have shown that a
sum of weighted $\chi^2$-distributions can often be approximated by $a\chi^2_b$.
We have used $a\chi^2_b$ to approximate the distribution of $\lambda_m^2$. The
quantities $a$ and $b$ are chosen such that the mean and the variance of
the approximation are equal respectively to the mean and the
variance of the exact distribution. The mean of the approximating
distribution is

$ab$    (4.15)

and the variance is

$2a^2 b$    (4.16)

The mean and variance of the exact distribution are obtained from (4.5),
the characteristic function of $\lambda_m^2$. The corresponding cumulant
generating function is

$\ln\phi(t) = -\tfrac{1}{2}(\nu_m-\nu_r)\ln(1-2it) - \tfrac{1}{2}\nu_r\ln(1-2it/P)$    (4.17)

and expanding the logarithms in series, the general form of the $r$th
cumulant of the exact distribution of $\lambda_m^2$ is found to be

$\kappa_r = 2^{r-1}(r-1)!\,[(\nu_m-\nu_r) + \nu_r/P^r]$    (4.18)

Therefore, the mean and the variance are respectively

$\kappa_1 = (\nu_m-\nu_r) + \nu_r/P$    (4.19)

and

$\kappa_2 = 2[(\nu_m-\nu_r) + \nu_r/P^2]$    (4.20)

Equating (4.15) to (4.19) and (4.16) to (4.20), the solutions for the
quantities $a$ and $b$ are

$a = \frac{(\nu_m-\nu_r) + \nu_r/P^2}{(\nu_m-\nu_r) + \nu_r/P}$    (4.21)

and

$b = \frac{[(\nu_m-\nu_r) + \nu_r/P]^2}{(\nu_m-\nu_r) + \nu_r/P^2}$    (4.22)

as asserted in (2.45) and (2.46) in Chapter 2. Since $1/P \ge 1$, $a$ and $b$
are always positive quantities.
The approximating distribution for $\lambda_m^2$ can be written as

$p(\lambda_m^2) = \frac{(\lambda_m^2/a)^{\frac{1}{2}b-1}\exp(-\lambda_m^2/2a)}{a\,2^{\frac{1}{2}b}\,\Gamma(\frac{1}{2}b)}$    (4.23)

where $a$ and $b$ are defined above.
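The moment matching described in this section is easy to check numerically. The Python sketch below is an illustration and not the authors' program. It assumes the cumulants of the exact distribution are $\kappa_r = 2^{r-1}(r-1)!\,[(\nu_m-\nu_r) + \nu_r/P^r]$, the form implied by the two-factor characteristic function discussed in Section 4.1, and that the cumulants of $a\chi^2_b$ are $2^{r-1}(r-1)!\,a^r b$. All function names are hypothetical.

```python
import math

def exact_cumulant(r, P, nu_m, nu_r):
    """r-th cumulant implied by the two-factor characteristic function."""
    return 2 ** (r - 1) * math.factorial(r - 1) * ((nu_m - nu_r) + nu_r / P ** r)

def approx_cumulant(r, a, b):
    """r-th cumulant of a * chi-square(b)."""
    return 2 ** (r - 1) * math.factorial(r - 1) * a ** r * b

def chi2_approx_params(P, nu_m, nu_r):
    """Choose a and b so that a*chi2(b) matches the exact mean and variance."""
    k1 = exact_cumulant(1, P, nu_m, nu_r)   # mean
    k2 = exact_cumulant(2, P, nu_m, nu_r)   # variance
    a = k2 / (2.0 * k1)                     # from a*b = k1 and 2*a^2*b = k2
    b = 2.0 * k1 ** 2 / k2
    return a, b

# Example: the case 1/P = 14, nu_m = 10, nu_r = 6 discussed in Section 4.4.
P, nu_m, nu_r = 1 / 14, 10, 6
a, b = chi2_approx_params(P, nu_m, nu_r)
k4_exact = exact_cumulant(4, P, nu_m, nu_r)
k4_approx = approx_cumulant(4, a, b)
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"relative gap in fourth cumulant = {1 - k4_approx / k4_exact:.4f}")
```

For this case the relative gap in the fourth cumulant comes out close to the 8.0% quoted in Section 4.4, and setting $\nu_m = \nu_r$ returns $a = 1/P$ and $b = \nu_r$, as stated in Section 4.3.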
4.3 Summary of Distribution Theory and Approximations for $\gamma_m$
The exact distribution of $\gamma_m$, in the general case, can be obtained
directly by applying the transformation $\gamma_m = \lambda_m/\sqrt{\nu_m}$ in (4.3). Thus,

$p(\gamma_m) = \sum_{k=0}^{\infty} w_k\,\frac{\nu_m^{\frac{1}{2}\nu_m+k}\,\gamma_m^{\nu_m+2k-1}}{2^{\frac{1}{2}\nu_m+k-1}\,\Gamma(\frac{1}{2}\nu_m+k)}\exp(-\tfrac{1}{2}\nu_m\gamma_m^2)$    (4.24)

where $w_k$ is defined by (4.4). Corresponding to (4.6), (4.11) and
(4.14), the distribution can be written alternatively as follows: 1) when

and 3) when $\nu_m < \nu_r$

where $w_k$ is defined by (4.13).

This distribution has the same form as (4.26) when $\nu_m = \nu_r$. Then
the exact and approximating distributions are identical with $a = 1/P$
and $b = \nu_r$.
4.4 Adequacy of the Approximation
To check the adequacy of the simple $\chi^2$-approximations for the
distributions, equations (4.25), (4.26), (4.27), and (4.28) were
programmed for the CDC 1604 digital computer and a number of
examples were examined for various values of the quantities $\nu_m$, $\nu_r$,
and $1/P$. The densities of the exact and approximating distributions of
$\gamma_m$ are given for various examples in Table 4.1. Tables 4.2 and 4.3
show the third and fourth cumulants respectively for the exact
distribution of $\lambda_m^2$ and for the approximating distribution. In some
cases the agreement between these cumulants becomes worse for a
given value of $1/P$ as the spread between $\nu_m$ and $\nu_r$ increases. However,
this does not seem to affect the goodness of fit of the distributions as
much as might be expected. For example, for the case where $1/P = 14$,
$\nu_m = 10$, $\nu_r = 6$, there is a discrepancy of 8.0% in the fourth cumulant;
nevertheless the difference between the probability densities is never
greater than 0.009 at any point. (The exact and approximate density
functions for this case are shown in Figure 4.1.) The following
conclusions emerged:
a. The approximation was very good in all the examples tried.

Table 4.1 Exact and Approximate Distributions of $\gamma_m$

$1/P = 16$, $\nu_m = 4$, $\nu_r = 8$

$\gamma_m$    $p(\gamma_m)$ Exact    $p(\gamma_m)$ Approx
 0        0.000              0.000
 1        0.034              0.027
 2        0.123              0.127
 3        0.222              0.229
 4        0.249              0.250
 5        0.193              0.191
 6        0.110              0.108
 7        0.047              0.047
 8        0.016              0.016
 9        0.004              0.004
10        0.001              0.001

Table 4.2 Third Cumulants of Exact and Approximate Distributions
of $\lambda_m^2$*

* The first tabled value for each combination of $\nu_m$ and $\nu_r$ corresponds
to the exact distribution and the second to the approximation.
Each value shown is a factor of $8(10)^3$ less than the true cumulant
value.

Table 4.3 Fourth Cumulants of Exact and Approximate Distributions
of $\lambda_m^2$*

 8    137.39  134.14  131.02  128.00  125.09
      160.00  160.00  160.00  160.00  160.00
10    172.57  164.29  166.10  163.00  160.00

* The first tabled value for each combination of $\nu_m$ and $\nu_r$ corresponds
to the exact distribution and the second to the approximation.
Each value shown is a factor of $48(10)^3$ less than the true cumulant
value.

Figure 4.1. Exact and Approximating Distributions of $\gamma_m$
b. The agreement improves as the values of $\nu_m$ and $\nu_r$ become
closer together.
c. The agreement improves if $|\nu_m - \nu_r|$ is held constant as
$1/P$ increases.
d. For a given $1/P$ value, the agreement is better for $\nu_r - \nu_m$
equal to a positive constant than for $\nu_m - \nu_r$ equal to the
same positive constant.
e. If $\nu_m$ and $\nu_r$ are both increased such that $|\nu_m - \nu_r|$ remains
constant for a given $1/P$ value, the agreement improves.
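A comparison of the kind described in this section can be repeated with a few lines of Python. The sketch below is not the original CDC program. It assumes the exact density of $\lambda_m^2$ is a negative binomial mixture of $\chi^2_{\nu_m+2k}$ densities with weights $w_k = \frac{\Gamma(\nu_r/2+k)}{\Gamma(\nu_r/2)\,k!}P^{\nu_r/2}(1-P)^k$, and that the approximation is $a\chi^2_b$ with $a$ and $b$ chosen by matching the mean and variance. Both densities are transformed to $\gamma_m = \sqrt{\lambda_m^2/\nu_m}$. All names, the grid, and the truncation tolerance are hypothetical.

```python
import math

def chi2_pdf(x, df):
    """Density of a central chi-square variate with (possibly fractional) df."""
    if x <= 0.0:
        return 0.0
    h = df / 2.0
    return math.exp((h - 1.0) * math.log(x) - x / 2.0
                    - h * math.log(2.0) - math.lgamma(h))

def mixture_weights(P, nu_r, tol=1e-9, max_terms=100_000):
    """Assumed negative binomial weights w_k; stops when the tail mass < tol."""
    h, out, total = nu_r / 2.0, [], 0.0
    for k in range(max_terms):
        w = math.exp(math.lgamma(h + k) - math.lgamma(h) - math.lgamma(k + 1)
                     + h * math.log(P) + k * math.log1p(-P))
        out.append(w)
        total += w
        if 1.0 - total < tol:
            break
    return out

def gamma_m_exact_pdf(g, nu_m, weights):
    """Mixture density of gamma_m = sqrt(lambda_m^2 / nu_m)."""
    lam2 = nu_m * g * g
    jacobian = 2.0 * nu_m * g               # d(lambda_m^2) / d(gamma_m)
    return jacobian * sum(w * chi2_pdf(lam2, nu_m + 2 * k)
                          for k, w in enumerate(weights))

def gamma_m_approx_pdf(g, P, nu_m, nu_r):
    """Density of gamma_m when lambda_m^2 is approximated by a * chi2(b)."""
    k1 = (nu_m - nu_r) + nu_r / P
    k2 = 2.0 * ((nu_m - nu_r) + nu_r / P ** 2)
    a, b = k2 / (2.0 * k1), 2.0 * k1 ** 2 / k2
    lam2 = nu_m * g * g
    return 2.0 * nu_m * g * chi2_pdf(lam2 / a, b) / a

# The case 1/P = 14, nu_m = 10, nu_r = 6 mentioned in the text.
P, nu_m, nu_r = 1 / 14, 10, 6
weights = mixture_weights(P, nu_r)
step = 0.01
grid = [i * step for i in range(1, 1001)]   # gamma_m from 0.01 to 10
exact = [gamma_m_exact_pdf(g, nu_m, weights) for g in grid]
approx = [gamma_m_approx_pdf(g, P, nu_m, nu_r) for g in grid]
max_gap = max(abs(e - a) for e, a in zip(exact, approx))
print(f"largest density difference on the grid: {max_gap:.4f}")
```

The report quotes a largest difference of 0.009 for this case, which gives a reference point for the printed value. The same loop can be run over other combinations of $\nu_m$, $\nu_r$, and $1/P$ to explore conclusions a to e.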
4.5 Exact and Approximating Distributions for $\lambda_l^2$ and $\gamma_l$
To gain a better understanding of the nature of the uncertainty
due to an inadequate model, the quantity $\gamma_l$ was discussed in Chapter 3,
where $\gamma_l^2 = \lambda_l^2/\nu_l$. A procedure exactly paralleling that used in
Section 4.1, where $\lambda_m^2$ was considered, is now employed in the
development for $\lambda_l^2$. By substituting $P'$ for $P$, $\nu_l$ for $\nu_m$, and $\nu_e$ for
$\nu_r$, the exact and approximating distributions for $\lambda_l^2$ are developed
in precisely the same way as were those for $\lambda_m^2$. Thus, the posterior
distribution of $\lambda_l^2$ is

where $P'$ is the ratio of the error sum of squares divided by the lack of
fit plus error sums of squares, $\nu_l$ denotes the degrees of freedom for
lack of fit, and $\nu_e$ represents the error degrees of freedom. The form
(4.29) is identical to (4.1).
A direct analogy also holds for the various cases of the exact
distribution and the approximation discussed in 4.1. By making the
substitutions noted above, results similar to those for $\lambda_m^2$ and $\gamma_m$
can be obtained for $\lambda_l^2$ and $\gamma_l$. Since the forms of the corresponding
equations and the arguments used in their development are identical,
the details will not be given here.

4.6 Derivation of the $r$th Moment of the Distribution of $\gamma_{l/m}^2$
The quantity $\gamma_{l/m}$, considered in Chapter 3, is a measure of
the discrepancies between $f(\mathbf{x})$ and $g(\mathbf{x})$ at the experimental design
points compared with the amount of variation accounted for by $g(\mathbf{x})$
at the same points. The quantity $f(\mathbf{x})$ is the true functional form of
the response and $g(\mathbf{x})$ is the graduating function. The distribution
of $\gamma_{l/m}$ involves a double infinite sum, and therefore a finite
approximation is desirable. Since we shall obtain an approximation
by equating moments, it is now necessary to derive a general
expression for the moments of $\gamma_{l/m}^2$.
The posterior distribution of $\gamma_{l/m}^2$ given $\sigma^2$, from (3.22), is
V Y
where L' = c1 AZ
(P~/(T' aalld L' = m0
z A
Then the $r$th conditional moment of the distribution of $\gamma_{l/m}^2$ is
1
B($V
1
+ r th, ~v rn -r +t)
X (4.31)
After some simplification this double series can be alternately
expressed as

The $r$th moment of $\gamma_{l/m}^2$ is therefore

where
V
a
and 1: = Z $;is:
i=l
A vmA
and 1 ; = T: 0?/s2
1=1 1
.
Now let
If $w$ is a non-negative integer, $I_s$ can be expressed as
X
w n-k
x k=o
-k J xZ/(l+xl ? v e ) dx . (4.3 6 )
0
When $n$ is a positive integer, the integral in (4.36) has the value
where p = z - n t k - u + l (Dwight 1954 ), W h e n p = 0, the u term i n the
2
summation in I becomes (n-k-1) (-11% n(l-'iepv ). Combining ( 4. 36)
and (4.37) with (4.33), the finite form of the $r$th moment of $\gamma_{l/m}^2$ is

where $I_s$ is defined by (4.36). This is result (3.30) of Chapter 3.
4.7 Approximation for the Distribution of $\gamma_{l/m}^2$
We have shown that both the numerator and the denominator of
$\gamma_{l/m}^2 = \lambda_l^2/\lambda_m^2$ can separately be well approximated by a central
$\chi^2$-distribution. It seems reasonable, therefore, to attempt to approximate
the distribution of $\gamma_{l/m}^2$ by an $F$-distribution. If we set $\gamma_{l/m}^2$ equal to
$bF$, where $b$ is a constant and $F$ is the variate of the central
$F$-distribution with $2P$ and $2Q$ degrees of freedom, then the approximating
distribution for $\gamma_{l/m}^2$ is given by

$p(\gamma_{l/m}^2) = \frac{\Gamma(P+Q)}{b\,\Gamma(P)\Gamma(Q)}\left(\frac{P}{Q}\right)^{P}\left(\frac{\gamma_{l/m}^2}{b}\right)^{P-1}\left(1 + \frac{P\,\gamma_{l/m}^2}{bQ}\right)^{-(P+Q)}$    (4.39)

and the general form of the $r$th moment of the distribution of $bF$ is

$\mu'_r = b^r\left(\frac{Q}{P}\right)^{r}\frac{\Gamma(P+r)\,\Gamma(Q-r)}{\Gamma(P)\,\Gamma(Q)}$    (4.40)

The quantities $b$, $P$, and $Q$ in (4.39) are now evaluated by equating
the first three moments of the exact distribution of $\gamma_{l/m}^2$ (4.38) to
the corresponding moments of the approximating distribution (4.40).
We find

$b = 2\mu'_1(W-Z)/(4W-3Z-1)$,

$P = 2(W-Z)/(2Z-W-WZ)$,

and

$Q = (4W-3Z-1)/(2W-Z-1)$

where

$W = \mu'_2/(\mu'_1)^2$ and $Z = \mu'_3/(\mu'_1\mu'_2)$.

Thus, the approximating distribution (4.39) can be computed. By using
a positive square root transformation, the approximate distribution of
$\gamma_{l/m}$ may be obtained.
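The three-moment fit can be checked with a round trip in Python. This is a sketch under stated assumptions rather than the authors' procedure: the moments of $bF$ are taken to be $b^r(Q/P)^r\,\Gamma(P+r)\Gamma(Q-r)/[\Gamma(P)\Gamma(Q)]$, $W$ and $Z$ are read as $\mu'_2/(\mu'_1)^2$ and $\mu'_3/(\mu'_1\mu'_2)$, and the expression for $P$ is derived here from the same moment equations. The function names are hypothetical.

```python
import math

def bf_moment(r, b, P, Q):
    """r-th moment about the origin of b*F with F ~ F(2P, 2Q); needs r < Q."""
    if r >= Q:
        raise ValueError("the r-th moment exists only for r < Q")
    log_m = (r * math.log(b * Q / P)
             + math.lgamma(P + r) - math.lgamma(P)
             + math.lgamma(Q - r) - math.lgamma(Q))
    return math.exp(log_m)

def fit_bf(mu1, mu2, mu3):
    """Solve for (b, P, Q) from the first three moments about the origin."""
    W = mu2 / mu1 ** 2
    Z = mu3 / (mu1 * mu2)
    b = 2.0 * mu1 * (W - Z) / (4.0 * W - 3.0 * Z - 1.0)
    Q = (4.0 * W - 3.0 * Z - 1.0) / (2.0 * W - Z - 1.0)
    P = 2.0 * (W - Z) / (2.0 * Z - W - W * Z)   # derived for this sketch
    return b, P, Q

# Round trip: start from the values quoted in the Chapter 3 example,
# generate the three moments, and recover the parameters.
b0, P0, Q0 = 0.00918, 10.23, 7.88
moments = [bf_moment(r, b0, P0, Q0) for r in (1, 2, 3)]
b, P, Q = fit_bf(*moments)
print(f"b = {b:.5f}, P = {P:.2f}, Q = {Q:.2f}")
```

The denominators in these expressions are differences of quantities close to one another, so the fitted $P$ and $Q$ are sensitive to rounding in the input moments. Moments carried to only four significant figures can shift $P$ and $Q$ in the second or third figure.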

CHAPTER 5
DISCUSSION
5.1 The Interpolation Efficiency
An interesting interpretation is possible from the results
obtained in the previous sections. Suppose the true values
$\eta_1, \eta_2, \ldots, \eta_n$ of the response at the $n$ experimental points are known.
Then, if the graduating function is fitted to these true values, the
following identity can be written which relates the "total" sum of
squares to the "regression" and "residual" sums of squares.

This identity is true because the $E(\hat{y})$'s are the same as the $\hat{\eta}$'s, that is,
they are the values obtained by fitting the true $\eta$'s to the model. If
each of the three quantities in (5.1) is divided by $\nu_m\sigma^2$ the identity
still holds. Thus, an analysis of variance table with accompanying
diagrams could be prepared as shown in Table 5.1. The quantity

will be called the "interpolation efficiency." Its distribution is
indicated by the appropriate figure in Table 5.1. Bearing in mind that
the analysis is being made on the true values, there would be no
"residual sum of squares" if the model fitted perfectly, since $E(\hat{y}_u)$
would be equal to $\eta_u$; thus the interpolation efficiency would be unity.
Now consider the situation when the exact values of $\eta$ are known
so that there is no experimental error but we are trying to represent the
function $\eta = f(\mathbf{x})$ by means of a graduating function $g(\mathbf{x})$. For any
particular set of "experimental points," the "total sum of squares,"
the "regression sum of squares," the "residual sum of squares," and
the "interpolation efficiency" $\Omega$ would be fixed constants. The total
sum of squares would be a measure of the total amount of change
which was occurring in the true function. The regression sum of
squares would be a measure of the total amount of change which was
occurring in the graduating function, and the residual sum of squares
would measure the total amount of change which was unaccounted
for by the graduating function. The interpolation efficiency would
measure the proportion of the total change occurring in the function
$f(\mathbf{x})$ which was accounted for by the graduating function $g(\mathbf{x})$.
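The error-free situation described above can be made concrete with a small Python sketch. It rests on assumptions that are not taken from the report: the true function is a hypothetical quadratic, the graduating function is a straight line fitted by least squares, and the sums of squares are taken about the mean. The interpolation efficiency is computed as the regression sum of squares divided by the total sum of squares.

```python
def fit_line(x, eta):
    """Least-squares straight line through the true responses."""
    n = len(x)
    xbar, ebar = sum(x) / n, sum(eta) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (ei - ebar) for xi, ei in zip(x, eta))
    slope = sxy / sxx
    return ebar - slope * xbar, slope

def sums_of_squares(x, eta):
    """Return total, regression, residual sums of squares and their ratio."""
    intercept, slope = fit_line(x, eta)
    fitted = [intercept + slope * xi for xi in x]
    ebar = sum(eta) / len(eta)
    total = sum((e - ebar) ** 2 for e in eta)
    regression = sum((g - ebar) ** 2 for g in fitted)
    residual = sum((e - g) ** 2 for e, g in zip(eta, fitted))
    return total, regression, residual, regression / total

x = [-2.0, -1.0, 0.0, 1.0, 2.0]                       # "experimental points"
eta = [1.0 + 2.0 * xi + 0.5 * xi ** 2 for xi in x]    # hypothetical true f(x)
total, regression, residual, omega = sums_of_squares(x, eta)
print(total, regression, residual, round(omega, 4))   # 43.5 40.0 3.5 0.9195
```

With no experimental error all four quantities are fixed constants. Here the straight line accounts for about 92% of the change in the hypothetical true function over the five points.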
In an experimental situation the additional complication arises
that the observations are subject to random errors so that the various
sums of squares and the ratio $\Omega$ are not fixed constants but have
distributions. For the example considered in the previous chapters
beginning in Section 2.4, these distributions are shown in miniature
in Table 5.1. When there is experimental error, it becomes relevant
to consider these sums of squares in relation to the size of $\sigma^2$. Thus,
the size of $\gamma_m$ will, in general, tell us whether very much variation
is being accounted for by the graduating function in relation to the
size of the experimental error. The quantity $\gamma_l$ will indicate the amount
of the variation not accounted for by the graduating function in relation
to the amount of experimental error. The interpolation efficiency
$\Omega = 1/(1 + \gamma_{l/m}^2)$ will measure what proportion of the true functional
variation is being accounted for by the graduating function.
5.2 Interpretation of the Estimation Situation
From a table such as Table 5.1 together with the diagrams
showing the posterior distributions, the experimenter could obtain a
complete picture of the estimation situation which faced him. It
would seem logical to consider the interpolation efficiency as
represented by the distribution of $\Omega$. This would be an estimate of the
sum of squares of the deviations which could be accounted for by the
fitted empirical function compared with the total sum of squares
accounted for by the true underlying function. One might reasonably
expect that any useful interpolation function should attain an
interpolation efficiency well above 90%.
Also, the experimenter would need to be assured that the
error in estimating the response (that is, the error in $\hat{y}$) was reasonably
small compared with the total change in true response. Thus, he
might hope to attain a value of $\gamma_m$ at least as large as 4 and
preferably somewhat larger. The distribution of $\gamma_l$ would probably not,
by itself, influence him unduly. However, if $\gamma_l$ were large, the
interpolation efficiency would tend to be small and the situation
therefore unsatisfactory.
5.3 Practical Application to Response Surface Methodology
The present investigation indicates the desirability of examining
the estimation situation for any given application of response surface
methodology. In regard to the practical use of the criteria developed
here, it is suggested that as a matter of routine, the experimenter
should compute the posterior distributions of $\gamma_m$ and $\gamma_l$, or at least
the 95% posterior intervals, when conducting response surface work.
He should be invited to think of $\gamma_m$ simply as a measure of the
change occurring in the function compared with the average
uncertainty of its estimate. It seems reasonable to require that
for good estimation, values of $\gamma_m \ge 4$ are needed. The quantity
$\gamma_l$ should be considered as a measure of the bias error due to lack
of fit compared with sampling error. Values of $\gamma_l \le 1$ are reasonable
in order to assure that the amount of variation the graduating
function fails to account for is small in relation to the error of
estimate of the response.
Finally, the experimenter should consider the interpolation
efficiency $\Omega = 1/(1 + \gamma_{l/m}^2)$ which is a measure of the proportion of
the true functional variation accounted for by the graduating function.
By obtaining the posterior distribution of $\gamma_{l/m}^2 = \gamma_l^2/\gamma_m^2$, the
interpolation efficiency can be estimated and thus the adequacy of the
graduating function can be determined. For values of $\Omega > 0.90$, the
experimenter can be reasonably assured that the true response is
being adequately estimated by the graduating function.
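The three working thresholds suggested in this section can be collected in a short Python helper. This is an illustrative sketch only: the function names are hypothetical, the inputs stand for point estimates such as posterior modes, and the values of $\gamma_m$ and $\gamma_l$ in the example are invented so that their ratio equals the estimate of 0.09 for $\gamma_{l/m}$ quoted in the Chapter 3 example.

```python
def interpolation_efficiency(gamma_l_over_m):
    """Omega = 1 / (1 + gamma_{l/m}^2)."""
    return 1.0 / (1.0 + gamma_l_over_m ** 2)

def assess(gamma_m, gamma_l, gamma_l_over_m):
    """Apply the rule-of-thumb thresholds suggested in the text."""
    omega = interpolation_efficiency(gamma_l_over_m)
    return {
        "gamma_m >= 4": gamma_m >= 4.0,     # change large relative to error
        "gamma_l <= 1": gamma_l <= 1.0,     # bias small relative to error
        "omega > 0.90": omega > 0.90,       # most true variation captured
        "omega": omega,
    }

# Hypothetical estimates; only gamma_{l/m} = 0.09 comes from the report.
report = assess(gamma_m=5.0, gamma_l=0.45, gamma_l_over_m=0.09)
for name, value in report.items():
    print(name, value)
```

In practice the whole posterior distributions, or at least the 95% posterior intervals, would be examined rather than single point estimates, as the text recommends.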
5.4 Recommendations for Future Work
This study suggests three areas in which further work would
be desirable. First, the approximate distributions for $\gamma_m$ and $\gamma_l$ are
fairly simple but that for $\gamma_{l/m}$ is much more complicated. It would
be of great value to find a simple approximation for this distribution.
Second, an investigation should be made into the feasibility
of constructing useful tables which would help the experimenter to
assess quickly how well he was graduating the response function in
the region of interest. As an example, consider the significance
testing approach for $\gamma_m$. For a given significance level $\alpha$ and for
each minimal acceptable value $\gamma_0$ of $\gamma_m$, it would be convenient
to have a table with $\nu_r$ (residual degrees of freedom) corresponding
to rows, $\nu_m$ (regression degrees of freedom) corresponding to
columns, and the critical value of the quantity $P$ indicated in the
body of the table. The entry $P$ would represent the value of the ratio
of the residual sum of squares divided by the residual plus model
(regression) sums of squares which just led to "significance" at the
$\alpha$ level.
Since the quantity $P$ is somewhat unfamiliar, it would be possible
to transform the table described above so that the more familiar $F$-ratio
between the regression and residual sums of squares could be employed.
If tables of this kind were prepared for a sufficiently large number of
$\gamma_m$ values, then it also would be a relatively simple matter to use
them to calculate confidence intervals. Similar tables could be
prepared for $\gamma_l$.
Tables are particularly needed to assist in the calculation of
the Bayesian posterior distribution of $\gamma_{l/m}$. Although the
$\chi^2$-distribution provides a very good approximation for $\gamma_l$ and $\gamma_m$ with
relatively simple calculations, this is not the case for the
distribution of $\gamma_{l/m}$. Direct tabulation would not be easy because
there are seven values to be considered, namely $P$, $P'$, $\nu_m$, $\nu_r$, $\nu_e$, $\nu_l$
and $\gamma_{l/m}$. It does seem probable, however, that methods could be
devised to greatly simplify these calculations.
Third, throughout this investigation, we have considered only
those functions which are linear in the parameters. It is often useful,
however, to employ non-linear functions of the parameters to represent
response relationships. In some cases, these non-linear functions are
well defined so that the experimenter knows that the lack of fit which
may occur will be small. In other cases, non-linear functions may
be used, to some extent, as empirical graduating functions. For
example, we may postulate a somewhat idealized model by the use of
differential equations. The functional relationship between the
response and the variables which then arises is usually non-linear
in the parameters, but the same considerations apply as in the linear
approach. First of all, the sampling error may be so large that even if
the function could be assumed to fit perfectly, the change which it
represents over the region of interest is of the same order of
magnitude as the average experimental error of $\hat{y}$. Secondly, it may
be that because of simplifying assumptions, the model is inadequate
to represent the response. Thus, it would be desirable to devise
methods, similar to those in the linear situation, in which the
adequacy of the graduating function could be assessed.

Bartlett, M. S. (1947). "The Use of Transformations," Biometrics 3, 39-52.
Box, G. E. P. (1952). "Multi-factor Designs of First Order."
Box, G. E. P. (1954). "Some Theorems on Quadratic Forms Applied in the Study of Analysis of Variance Problems, I. Effect of Inequality of Variance in the One-Way Classification," Ann. Math. Stat. 25, 290-302.
Box, G. E. P. and Cox, D. R. (1964). "An Analysis of Transformations," J. Roy. Stat. Soc. (B). In the press.
Box, G. E. P. and Draper, N. R. (1959). "A Basis for the Selection of a Response Surface Design," J. Amer. Stat. Assn. 54, 622-654.
Box, G. E. P. and Hunter, J. S. "Process Development by Statistical Methods," Manuscript of Book.
Box, G. E. P. and Hunter, J. S. (1957). "Multi-factor Experimental Designs for Exploring Response Surfaces," Ann. Math. Stat. 28, 195-241.
Box, G. E. P. and Hunter, W. G. (1962). "A Useful Method for Model Building," Technometrics 4, 301-318.
Box, G. E. P. and Tidwell, P. W. (1962). "Transformation of the Independent Variables," Technometrics 4, 531-550.
Box, G. E. P. and Wilson, K. B. (1951). "On the Experimental Attainment of Optimum Conditions," J. Roy. Stat. Soc. (B) 13, 1-38.
Davies, O. L. (Editor) (1960). Design and Analysis of Industrial Experiments. Hafner Publishing Co., New York, 495-578.
Dwight, H. B. (1954). Tables of Integrals and Other Mathematical Data, The Macmillan Co., New York, Second Edition.
Fisher, R. A. (1958). Statistical Methods for Research Workers, Hafner Publishing Company, New York, Thirteenth Edition.
Fisher, R. A. (1959). Statistical Methods and Scientific Inference, Hafner Publishing Company, New York, Second Edition.
Fisher, R. A. (1960). The Design of Experiments, Hafner Publishing Company, New York, Seventh Edition.
Fisher, R. A. and Yates, F. (1963). Statistical Tables for Biological, Agricultural and Medical Research, Hafner Publishing Company, New York, Sixth Edition.
Hartley, H. O. (1961). "The Modified Gauss-Newton Method for the Fitting of Non-Linear Regression Functions by Least Squares," Technometrics 3, 269-280.
Hotelling, H. (1944). "Some Improvements in Weighing and Other Experimental Techniques," Ann. Math. Stat. 15, 297-306.
Kendall, M. G. and Stuart, A. (1961). The Advanced Theory of Statistics, Hafner Publishing Company, New York, 252, Vol. 2.
Patnaik, P. B. (1949). "The Non-Central $\chi^2$- and F-Distributions and Their Applications," Biometrika 36, 202-232.
Pearson, E. S. and Hartley, H. O. (1951). "Charts of the Power Function for Analysis of Variance Tests, Derived from the Non-Central F-Distribution," Biometrika 38, 112-130.
Robbins, H. and Pitman, E. J. G. (1949). "Application of the Method of Mixtures to Quadratic Forms in Normal Variates," Ann. Math. Stat. 20, 552-560.
Satterthwaite, F. E. (1941). "Synthesis of Variance," Psychometrika 6, 309-316.
Scheffé, H. (1961). The Analysis of Variance, John Wiley and Sons, New York, 412-415.
Smith, H. Fairfield (1936). "The Problem of Comparing the Results of Two Experiments with Unequal Errors," J. Council Sci. Indust. Res. (Australia) 9, 211-212.
Tang, P. C. (1938). "The Power Function of the Analysis of Variance Tests with Tables and Illustrations of Their Use," Stat. Res. Mem. 2, 126-149.
Tocher, K. D. (1952). "A Note on the Design Problem," Biometrika 39, 189.
Welch, B. L. (1936). "The Specification of Rules for Rejecting Too Variable a Product, with Particular Reference to an Electric Lamp Problem," J. Roy. Stat. Soc., Suppl. 3, 29-48.
Welch, B. L. (1937). "The Significance of the Difference Between Two Means when the Population Variances are Unequal," Biometrika 29, 350-362.
DOCUMENT CONTROL DATA - R & D

Report title: Criteria for Judging Adequacy of Estimation by an Approximating Response Function
Authors: George E. P. Box and John Wetz
Report date: March 1973. Total no. of pages: 42 pp. No. of refs: 29.
Contract or grant no.: ONR-N00014-67A-0128-0017. Originator's report number: Report No. 9.
Distribution statement: Distribution of this document is unlimited.
Sponsoring military activity: Office of Naval Research, Washington, D.C.

Abstract
In response surface methodology the true functional form $f(\mathbf{x})$ is
usually not known and the response must be approximated by means of a
"graduating function" $g(\mathbf{x})$ (for example, by a polynomial in $\mathbf{x}$) over the
region of interest. The relationship between an observation $y$ and the
graduating function $g(\mathbf{x})$ is therefore $y = g(\mathbf{x}) + \beta + \epsilon$, where
$\beta = f(\mathbf{x}) - g(\mathbf{x})$.
In this report, a measure $\gamma_m$ of the variation accounted for by $g(\mathbf{x})$
in relation to the size of the error of estimate of $g(\mathbf{x})$ and a measure $\gamma_l$
of the discrepancy introduced due to the bias term $\beta$ in relation to
the error of estimate of the response are suggested. Furthermore the
"interpolation efficiency" $\Omega = 1/(1 + \gamma_l^2/\gamma_m^2)$ is defined. This is a measure
of the amount of change accounted for by the graduating function $g(\mathbf{x})$
compared with the change occurring in the true function $f(\mathbf{x})$ over the
region of interest. Methods for estimating the criteria are discussed.