DEPARTMENT OF STATISTICS
University of Wisconsin
Madison, Wisconsin 53706
Work done 1964
Report issued 1973
by
INTRODUCTION
variables $x_1, x_2, \ldots, x_k$, such as the amount of potassium, nitrogen, and
$$y = f(x_1, \ldots, x_k) \qquad (1.3)$$
variables $x_{k+1}, \ldots, x_N$ unknown. Suppose these are $n$ observations from
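observations may be written
$$\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon} \qquad (1.5)$$
$$\mathbf{y} = \mathbf{X}_1(\boldsymbol{\beta}_1 + \mathbf{A}\boldsymbol{\beta}_2) + (\mathbf{X}_2 - \mathbf{X}_1\mathbf{A})\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}$$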
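where
$$\mathbf{A} = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2 \qquad (1.8)$$
so that
$$E(\hat{\boldsymbol{\beta}}_1) = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'(\mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2) = \boldsymbol{\beta}_1 + \mathbf{A}\boldsymbol{\beta}_2 \qquad (1.10)$$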
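The aliasing result (1.10) can be checked numerically. The following is a minimal sketch, not a computation from the report: it assumes a hypothetical 4-run experiment in which $\mathbf{X}_1$ holds a constant and one controlled variable, and the unknown $\mathbf{X}_2$ is a single drift column that grows with run order. It computes $\mathbf{A}$ in exact fractions and confirms that, for noise-free responses, the least squares estimate equals $\boldsymbol{\beta}_1 + \mathbf{A}\boldsymbol{\beta}_2$ rather than $\boldsymbol{\beta}_1$. All helper names are illustrative.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a @ x = b (a square and nonsingular, b a matrix) by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in a[i]] + [Fraction(v) for v in b[i]] for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)  # first usable pivot row
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def alias_matrix(x1, x2):
    """A = (X1'X1)^{-1} X1'X2, found by solving (X1'X1) A = X1'X2."""
    x1t = transpose(x1)
    return solve(matmul(x1t, x1), matmul(x1t, x2))


def least_squares(x, y):
    """beta_hat = (X'X)^{-1} X'y, with y stored as a column (list of one-element rows)."""
    xt = transpose(x)
    return solve(matmul(xt, x), matmul(xt, y))


# Hypothetical design: X1 = [1, x1]; the unknown X2 is a drift that grows with run order.
X1 = [[1, -1], [1, -1], [1, 1], [1, 1]]
X2 = [[0], [1], [2], [3]]
beta1 = [[10], [2]]
beta2 = [[3]]

# Noise-free responses, so the least squares estimate equals its expectation.
eta = [[a[0] + b[0]] for a, b in zip(matmul(X1, beta1), matmul(X2, beta2))]
A = alias_matrix(X1, X2)
beta1_hat = least_squares(X1, eta)
# beta1 + A beta2 (X2 has a single column here, so A beta2 is A times a scalar).
biased_target = [[b[0] + a[0] * beta2[0][0]] for b, a in zip(beta1, A)]
```

For this design $\mathbf{A}$ works out to $(3/2, 1)'$, so the estimated slope is $2 + 1 \times 3 = 5$ instead of the true value 2: the drift in the unknown variable is credited to $x_1$.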
$n$-dimensional vector of 0's and 1's and consider the $u$th element $y_u$.
Let $y_u = 1$ indicate that the $u$th subject had, by a particular date, died of lung cancer; and let $y_u = 0$ indicate that he had not died of lung cancer.
should be obtained not from records which had just happened but from
Suppose that for a particular experiment the actual matrix $\mathbf{X}_1$ is chosen quite independently of $\mathbf{X}_2$; then $E(\mathbf{X}_1'\mathbf{X}_2) = \mathbf{0}$, and if $\mathbf{X}_1$ is selected from a set for which $\mathbf{X}_1'\mathbf{X}_1 = \mathbf{C}$, then $E(\hat{\boldsymbol{\beta}}_1) = \boldsymbol{\beta}_1$. The effect of unknown variables $\mathbf{X}_2$ in inducing nonsense correlations would thus be eliminated.
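The effect of choosing $\mathbf{X}_1$ independently of $\mathbf{X}_2$ can be illustrated by enumerating every allocation. The sketch below works under stated assumptions: a hypothetical six-run experiment, $x_1$ set at $\pm 1$ with three runs at each level (so $\mathbf{X}_1'\mathbf{X}_1$ is the same for every allocation), an unknown variable $x_2$ that is a linear time trend, and noise-free responses. A single allocation can give a badly biased slope, but averaged over all 20 equally likely allocations the estimate is exactly $\beta_1$.

```python
from fractions import Fraction
from itertools import combinations


def slope(x, y):
    """Least squares slope of y on x, with an intercept in the model."""
    n = len(x)
    xbar = Fraction(sum(x), n)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx


def randomization_distribution(n_runs, beta1, beta2):
    """Slope estimates of beta1 over every balanced +/-1 allocation of x1 to the runs.

    The unknown variable x2 is a hypothetical time trend 0, 1, ..., n_runs - 1,
    and the responses are noise-free: y = beta1 * x1 + beta2 * x2.
    """
    x2 = list(range(n_runs))
    estimates = []
    for plus_runs in combinations(range(n_runs), n_runs // 2):
        x1 = [1 if u in plus_runs else -1 for u in range(n_runs)]
        y = [beta1 * a + beta2 * b for a, b in zip(x1, x2)]
        estimates.append(slope(x1, y))
    return estimates


est = randomization_distribution(6, beta1=2, beta2=3)
mean_estimate = sum(est) / len(est)   # the trend's contribution averages out exactly
```

Here the individual estimates range from -2.5 to 6.5 around the true $\beta_1 = 2$; only their average equals the true value.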
1.2 Replication to Supply a Valid Estimate of Error
The fact that the estimates $\hat{\boldsymbol{\beta}}_1$ in the previous example were unbiased would be, by itself, of little value if we could not measure them against some reasonably precise estimate of error. Now the $u$th com-
reliable estimate of $V(\varepsilon_u)$ by making a series of runs in which $\mathbf{X}_1$ is kept fixed but $\mathbf{X}_2$ is allowed to vary freely. However, if any of the $x$'s in $\mathbf{X}_2$ are constrained in any manner, we shall not have a true estimate of $V(\varepsilon_u)$.
conditions,
For example, suppose uncontrollable changes occurring from day to
were always made on the same day, the effect of the external variables
time.
1.3 Blocking
$$\mathbf{y} = \mathbf{X}_1\boldsymbol{\theta} + \boldsymbol{\varepsilon} \qquad (1.14)$$
elements of $\mathbf{X}_1$. However, it may be that although the estimates are valid, they are also very imprecise. Techniques are needed, therefore,
where $\mathbf{Z}\boldsymbol{\phi}$ represents the effect of "block" variables, such as reactors or
consider the model
where
- --
e = W$ (1. 18)
is that part of the error which remains after removing the systematic effects.
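The benefit of arranging blocks orthogonally to the treatment columns can be sketched with a hypothetical $2^3$ experiment run in two blocks of four. The split by the sign of $x_1x_2x_3$ is an assumption for this example (a common choice, but not specified in this part of the report), as are the coefficient values and the size of the block shift. Because the block contrast is orthogonal to the constant and to $x_1$, $x_2$, $x_3$, the estimates of those coefficients are identical with and without the block effect.

```python
from fractions import Fraction
from itertools import product


def effect_estimates(design, y):
    """Least squares coefficients for the columns 1, x1, x2, x3 of a two-level factorial.

    The columns are mutually orthogonal with X'X = nI, so each coefficient is (column . y) / n.
    """
    n = len(design)
    columns = [[1] * n] + [[row[j] for row in design] for j in range(len(design[0]))]
    return [Fraction(sum(c * v for c, v in zip(col, y)), n) for col in columns]


design = [list(run) for run in product([-1, 1], repeat=3)]   # 2^3 factorial, 8 runs
block = [x1 * x2 * x3 for x1, x2, x3 in design]            # block I: -1, block II: +1
theta = [50, 4, -2, 1]                                      # hypothetical true coefficients

y_plain = [theta[0] + sum(t * x for t, x in zip(theta[1:], run)) for run in design]
y_blocked = [v + 7 * b for v, b in zip(y_plain, block)]     # add a block shift of +/-7
```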
Furthermore, if it is arranged in advance that $\mathbf{X}$ and $\mathbf{Z}$ are orthogonal so
did not exist. Since the block variables do not affect the estimation, they
1.4 Orthogonality
been shown by Hotelling (1944), Tocher (1952), and Box (1952) that for
where $x_{0u} = 1$, a $2^3$ factorial design might be employed, in which case the set $\{\mathbf{X}\}$ could be the $8!$ matrices of dimension $8 \times 4$ whose rows are
the model
The observations $\mathbf{y}$ can be thought of as a vector in the $n$-dimensional sample space, and each of the columns of $\mathbf{X}$ and $\mathbf{Z}$ provides another such
hyperplane where these coordinates are measured with respect to the basis vectors which are the columns of $\mathbf{X}$ and $\mathbf{Z}$.
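is orthogonal to both, then
$$l_y^2 = l_x^2 + l_z^2 + l_r^2$$
or equivalently in matrix notation
$$\mathbf{y}'\mathbf{y} = \hat{\boldsymbol{\theta}}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\theta}} + \hat{\boldsymbol{\phi}}'\mathbf{Z}'\mathbf{Z}\hat{\boldsymbol{\phi}} + (\mathbf{y} - \hat{\mathbf{y}})'(\mathbf{y} - \hat{\mathbf{y}}) \qquad (1.23)$$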
illustrated diagrammatically in Figure 1.1, where the lines $\mathbf{X}$ and $\mathbf{Z}$ each represent multidimensional spaces. Specifically, in terms of the
If the hyperplane $\mathbf{X}$ had dimensionality $\nu_x$, the hyperplane $\mathbf{Z}$ had dimensionality $\nu_z$, and
$$n - \nu_x - \nu_z = \nu_r \qquad (1.25)$$
then $\nu_x$, $\nu_z$, and $\nu_r$ are called the degrees of freedom associated with treatments, blocks, and residual.
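The decomposition (1.23) and the degrees-of-freedom count (1.25) can be checked on the same kind of blocked $2^3$ layout. This sketch assumes hypothetical observations, counts the mean column among the treatment columns (so $\nu_x = 4$ here), and uses one block contrast. With mutually orthogonal columns, projecting $\mathbf{y}$ on each column separately gives the least squares fit, and $\mathbf{y}'\mathbf{y}$ splits exactly into the three squared lengths.

```python
from fractions import Fraction
from itertools import product


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonal_decomposition(x_cols, z_cols, y):
    """Return [SS_treatments, SS_blocks, SS_residual] for mutually orthogonal columns."""
    fitted = [Fraction(0)] * len(y)
    parts = []
    for group in (x_cols, z_cols):
        ss = Fraction(0)
        for col in group:
            coef = Fraction(dot(col, y), dot(col, col))   # projection coefficient
            ss += coef * coef * dot(col, col)            # squared length of the projection
            fitted = [f + coef * c for f, c in zip(fitted, col)]
        parts.append(ss)
    residual = [v - f for v, f in zip(y, fitted)]
    parts.append(dot(residual, residual))
    return parts


design = [list(run) for run in product([-1, 1], repeat=3)]
x_cols = [[1] * 8] + [[run[j] for run in design] for j in range(3)]   # mean and x1, x2, x3
z_cols = [[run[0] * run[1] * run[2] for run in design]]              # one block contrast
y = [12, 15, 11, 18, 14, 20, 13, 22]                                 # hypothetical data

ss_x, ss_z, ss_r = orthogonal_decomposition(x_cols, z_cols, y)
nu_x, nu_z = len(x_cols), len(z_cols)
nu_r = len(y) - nu_x - nu_z                                          # as in (1.25)
```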
If the $y$'s could be regarded as being normally distributed, Fisher
obtained.
of agriculture. However, they have been used with great success i n many
uncertainty.
could be hoped for because of the large experimental error which existed.
change in level of variable $x_1$ is not the same as that due to the change
in level of variable $x_2$. Again, however, usually only gross effects were
being sought.
equation
the model
each other and of the elements of $\mathbf{X}$ and $\mathbf{Z}$. In response surface
special designs have been developed (Box and Wilson, 1951; Box and J. S. Hunter, 1957; Box and Draper, 1959; Davies, 1960) which are called
J. S. Hunter, 1957).
Table 1.1 Design Matrix and Observations
and choose $T_0$, $S_T$, $P_0$, and $S_P$ so that appropriate ranges are covered.
The design is set out diagrammatically in Figure 1.2, in which it is
[Figure 1.2: axis "Temperature T"]
can be estimated by least squares and the following analysis of variance table can be obtained.
Source of Variation
polynomial
Residual
There are basically two kinds of uncertainty: 1) uncertainty arising from the
variable $x_1$
$$\hat{y} = b_0 + b_1x_1 + b_{11}x_1^2 \qquad (1.33)$$
$$\eta = f(x_1) \qquad (1.34)$$
the equation.
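Fitting (1.33) by least squares amounts to solving the $3 \times 3$ normal equations. A minimal sketch, assuming hypothetical data at $x_1 = -2, -1, 0, 1, 2$: it first recovers a known curve exactly and can equally be applied to noisy responses. The function names are illustrative.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b (b a list) by Gauss-Jordan elimination in exact arithmetic."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)   # pivot row
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n] for row in m]


def fit_quadratic(xs, ys):
    """Coefficients (b0, b1, b11) of y_hat = b0 + b1*x1 + b11*x1**2 from the normal equations."""
    X = [[1, x, x * x] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(3)]
    return solve(xtx, xty)


xs = [-2, -1, 0, 1, 2]
exact = [3 + 2 * x - x * x for x in xs]   # hypothetical curve with no noise
b0, b1, b11 = fit_quadratic(xs, exact)
```

With the noisy responses -4, 0, 3, 5, 3 at the same points, `fit_quadratic` gives $b_0 = 114/35$, $b_1 = 19/10$, $b_{11} = -13/14$.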
may be written in matrix notation as
Suppose that in fact additional terms are needed, so that actually
Then the expected value of the residual sum of squares $S_r$ is not $\nu_r\sigma^2$, as it would be if the model (1.35) were adequate, but
$$E(S_r) = \Lambda^2 + \nu_r\sigma^2 \qquad (1.37)$$
where
$$\Lambda^2 = \boldsymbol{\beta}_2'\mathbf{X}_2'\left[\mathbf{I} - \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\right]\mathbf{X}_2\boldsymbol{\beta}_2 \qquad (1.38)$$
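Equation (1.38) can be evaluated directly for a simple case. The sketch assumes a hypothetical straight-line fit ($\mathbf{X}_1 = [1, x_1]$) when the true response also contains a term $\beta_2x_1^2$ ($\mathbf{X}_2 = x_1^2$). Because $[\mathbf{I} - \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1']\mathbf{X}_1 = \mathbf{0}$, the residual sum of squares from noise-free responses is exactly $\Lambda^2$, the bias part of $E(S_r)$ in (1.37); the code checks that the two routes agree.

```python
from fractions import Fraction


def hat_matrix_line(xs):
    """H1 = X1 (X1'X1)^{-1} X1' for X1 = [1, x]: h_uv = 1/n + (x_u - xbar)(x_v - xbar)/Sxx."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [[Fraction(1, n) + (xu - xbar) * (xv - xbar) / sxx for xv in xs] for xu in xs]


def bias_term(xs, beta2):
    """Lambda^2 of (1.38) when X1 = [1, x] and the omitted term is X2 beta2 = beta2 * x**2."""
    h = hat_matrix_line(xs)
    z = [x * x for x in xs]
    n = len(xs)
    return beta2 ** 2 * sum(z[u] * ((1 if u == v else 0) - h[u][v]) * z[v]
                            for u in range(n) for v in range(n))


def residual_ss_line(xs, ys):
    """Residual sum of squares S_r after a least squares straight-line fit."""
    n = len(xs)
    xbar, ybar = Fraction(sum(xs), n), Fraction(sum(ys), n)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return sum((y - ybar - b1 * (x - xbar)) ** 2 for x, y in zip(xs, ys))


xs = [-2, -1, 0, 1, 2]
eta = [10 + 2 * x + 3 * x * x for x in xs]   # hypothetical true response, beta2 = 3
```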
having $\nu_l = \nu_r - \nu_e$ degrees of freedom can be obtained, in which the effect
Source of Variation    Degrees of Freedom    Sums of Squares    Mean Squares
Lack of Fit            3                     458.3              152.8
precise experiment has been run in which the change in response over-
and the graduating polynomial was representing the function quite well.
further work.
CHAPTER 2
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\theta} + \mathbf{e} \qquad (2.1)$$
$\mathbf{Z}\boldsymbol{\theta}$ represents extraneous effects, such as
interested, and the part
parameter in the set $\boldsymbol{\theta}$. Thus, in the simplest possible case there
$\boldsymbol{\theta}$ will be the overall mean, which is to be "eliminated" in further analysis.
Suppose the model $\mathbf{X}\boldsymbol{\beta}$ contains $\nu_m$ parameters. For example,
if we were concerned with fitting a second degree polynomial in two
would be written
where the variables after elimination of the mean are indicated by a
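Source of Variation     Degrees of Freedom    Sums of Squares    Expected Sums of Squares
Eliminated Variables    $\nu_b$    $\hat{\boldsymbol{\theta}}'\mathbf{Z}'\mathbf{Z}\hat{\boldsymbol{\theta}}$
Model                   $\nu_m$    $\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \sum_{u=1}^{n}(\hat{y}_u - \bar{y})^2$    $\sum_{u=1}^{n}(\eta_u - \bar{\eta})^2 + \nu_m\sigma^2$
Residual                $\nu_r$    $(\mathbf{y} - \hat{\mathbf{y}})'(\mathbf{y} - \hat{\mathbf{y}}) = \sum_{u=1}^{n}(y_u - \hat{y}_u)^2$    $\nu_r\sigma^2$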
where
To test for the necessity of including terms $\mathbf{X}\boldsymbol{\beta}$ in the model, that
measures the level of the response after extraneous variables have been
not too large compared with the real changes in $\eta$ which occur over the experimental region.
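2.1 An Overall Measure of the Error in $\hat{y} - \eta$
the quantities $\eta_u - \bar{\eta}$
The least squares estimate of $\eta_u - \bar{\eta}$ is $\hat{y}_u - \bar{y}$ and
$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$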
The variance-covariance matrix of $\hat{\mathbf{y}}$ is
$$\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\sigma^2 \qquad (2.9)$$
and the variance of $\hat{y}_u$ is therefore the $u$th diagonal element of this matrix.
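Averaging the diagonal of (2.9) over the design points gives a convenient overall measure: because the trace of $\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ equals the number of fitted parameters, the average variance of $\hat{y}_u$ is $\nu_m\sigma^2/n$ for any full-rank design. The sketch assumes a hypothetical quadratic fit at five equally spaced points, with the constant counted among the $\nu_m = 3$ parameters, and shows that the individual variances differ from point to point while their average is $3\sigma^2/5$.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b (b a list) by Gauss-Jordan elimination in exact arithmetic."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n] for row in m]


def leverages(X):
    """Diagonal elements h_uu of X (X'X)^{-1} X'; Var(y_hat_u) = h_uu * sigma^2."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    return [sum(a * b for a, b in zip(row, solve(xtx, row))) for row in X]


X = [[1, x, x * x] for x in [-2, -1, 0, 1, 2]]   # hypothetical quadratic fit, n = 5
h = leverages(X)
average = sum(h) / len(h)   # average variance of y_hat_u in units of sigma^2
```

The individual values are $31/35$ at $x_1 = \pm 2$, $13/35$ at $x_1 = \pm 1$, and $17/35$ at the centre.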
In general, the variance of $\hat{y}_u - \eta_u$ will change from one design point to the next. We may use as an overall measure of variance the average variance of the
that is, the true response function can be represented by the assumed
the spread of the true values about their mean $\bar{\eta}$, indicated by the heavy lines in Figure 2.1, with the average spread of the estimates $\hat{y}$ of these quantities.
fact
is sufficiently large only if $M_m/M_r$ is greater than $F_0$.
The value
parameter $\gamma_m^2$. This can be determined using the non-central $F$-distribution,
distribution
where
and
Note that the factor $w$ is a "Poisson weighting factor," where the quantity $\tfrac{1}{2}\nu_m\gamma_m^2$ is the parameter of the Poisson weighting distribution.
$a\chi^2_b$, where $a$ and $b$ are chosen such that the mean and the variance of the approximating distribution are the same as for the non-central $\chi^2$-distribution.
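The two-moment approximation described above can be written out explicitly. For a non-central $\chi^2$ with $\nu$ degrees of freedom and non-centrality $\lambda$, the mean is $\nu + \lambda$ and the variance $2(\nu + 2\lambda)$, while $a\chi^2_b$ has mean $ab$ and variance $2a^2b$; solving gives $a$ and $b$ below (the same device is often associated with Patnaik, 1949). The sketch also rebuilds the non-central $\chi^2$ as a Poisson($\lambda/2$) mixture of central $\chi^2$ distributions, matching the Poisson weighting described in the text, and checks its moments. Reading $\lambda$ as $\nu_m\gamma_m^2$ follows from the Poisson parameter $\tfrac{1}{2}\nu_m\gamma_m^2$ but is an inference, and the numbers used are hypothetical.

```python
import math


def two_moment_fit(nu, lam):
    """(a, b) such that a * chi^2_b has the mean and variance of a non-central chi^2(nu, lam).

    Non-central chi^2: mean nu + lam, variance 2 (nu + 2 lam).
    Scaled chi^2:      mean a b,      variance 2 a^2 b.
    """
    a = (nu + 2 * lam) / (nu + lam)
    b = (nu + lam) ** 2 / (nu + 2 * lam)
    return a, b


def poisson_mixture_moments(nu, lam, terms=400):
    """Mean and variance of chi^2(nu, lam) written as a Poisson(lam / 2) mixture of chi^2_{nu + 2j}.

    Requires lam > 0; the series is truncated after `terms` Poisson weights.
    """
    mu = lam / 2.0
    mean = second = 0.0
    for j in range(terms):
        w = math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1))   # Poisson weight P(J = j)
        df = nu + 2 * j
        mean += w * df
        second += w * (df * df + 2 * df)    # E[(chi^2_df)^2] = df^2 + 2 df
    return mean, second - mean * mean


a, b = two_moment_fit(4, 8.0)
mix_mean, mix_var = poisson_mixture_moments(4, 8.0)
```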
Then
J. S. Hunter.
2.4 Example
Block    $x_1$    $x_2$    $x_3$    $y$
The best fitting second degree response relationship is given by
$$\hat{y} = 51.80 + 0.74x_1 + 4.81x_2 + 7.96x_3 - 3.82x_1^2 + 1.21x_2^2 - 6.26x_3^2 - 0.33x_1x_2 + 10.27x_1x_3 - 2.83x_2x_3 \qquad (2.20)$$
Source                                Degrees of Freedom    Sums of Squares    Mean Squares
                                                            53571.37
Extra due to 1st Degree Polynomial                          464.38
Extra due to 2nd Degree Polynomial    6                     1584.35            264.06
                                      13                                       330.83
now be considered.
The present problem does not fit very naturally into the pattern
and
several given values of $\gamma_m$, corresponding values of $(1 + \gamma_m^2)F_{b,\nu_r}$
are computed. The latter values are plotted against the $\gamma_m$ values,
This value of $\gamma_m$ is 5.65,
the data.
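By means of a linear transformation $\boldsymbol{\theta} = \mathbf{T}\boldsymbol{\beta}$ the quadratic form $\boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}/\sigma^2$ can be reduced to a sum of squares $\boldsymbol{\theta}'\boldsymbol{\theta}/\sigma^2 = \sum_{i=1}^{\nu_m}\theta_i^2/\sigma^2$.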
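One convenient choice of $\mathbf{T}$ (an assumption here; any $\mathbf{T}$ with $\mathbf{T}'\mathbf{T} = \mathbf{X}'\mathbf{X}$ works) is the transpose of the Cholesky factor of $\mathbf{X}'\mathbf{X}$. The sketch below uses a hypothetical two-parameter design and checks that $\boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$ equals the sum of squares of $\boldsymbol{\theta} = \mathbf{T}\boldsymbol{\beta}$; the division by $\sigma^2$ is omitted since it scales both sides equally.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L' for a symmetric positive definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def canonical_coordinates(xtx, beta):
    """theta = T beta with T = L' (upper triangular), so beta' X'X beta = sum(theta_i ** 2)."""
    L = cholesky(xtx)
    p = len(beta)
    return [sum(L[k][i] * beta[k] for k in range(i, p)) for i in range(p)]


X = [[1, -1], [1, 0], [1, 1], [1, 2]]                       # hypothetical design, nu_m = 2
xtx = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
beta = [1.0, 2.0]
theta = canonical_coordinates(xtx, beta)
quadratic_form = sum(beta[i] * xtx[i][j] * beta[j] for i in range(2) for j in range(2))
```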
Let us envision an experimental situation in which the
where
Since this expression is symmetric in $\boldsymbol{\theta}$ and $\hat{\boldsymbol{\theta}}$, it follows that
also has a posteriori a non-central $\chi^2$-distribution with $\nu_m$ degrees of freedom and non-centrality parameter
where
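$\sigma^2$ given $s^2$ is
$$p(\sigma^2 \mid s^2) = \frac{(\nu_r s^2/2)^{\nu_r/2}}{\Gamma(\nu_r/2)}\,(\sigma^2)^{-(\nu_r/2 + 1)}\exp\left\{-\tfrac{1}{2}\left(\nu_r s^2/\sigma^2\right)\right\} \qquad (2.35)$$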
To obtain the joint distribution of $\Lambda_m^2$ and $\sigma^2$, $p(\Lambda_m^2 \mid \sigma^2)$ is multiplied by $p(\sigma^2 \mid s^2)$, and then $\sigma^2$ is eliminated from this joint distribution by
distribution of $\Lambda_m^2$ given $s^2$ is
where
and
and
and
obtained. Figure 2.2 shows both the exact and approximating distributions
distribution.
of other values which have not occurred, in fact, but which could have
applies, but these parameters are averaged out with weights proportional
to their probabilities.
that
where $\alpha = 0.05$, we obtain approximately $\gamma_{m1} = 5.5$ and $\gamma_{m2} = 13.9$.
The interval defined by $\gamma_{m1}$ and $\gamma_{m2}$ may be compared with the 95% confidence interval for $\gamma_m$ (Section 2.5), which is $5.65 < \gamma_m < 13.79$.
CHAPTER 3
-). Then
that $\eta$ can be represented by some graduating function $g(\mathbf{x})$
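Source         Degrees of Freedom    Mean Squares    Expected Mean Squares
Model          $\nu_m$    $M_m = \sum_{u=1}^{n}(\hat{y}_u - \bar{y})^2/\nu_m$    $\sum_{u=1}^{n}[E(\hat{y}_u) - E(\bar{y})]^2/\nu_m + \sigma^2$
Residual       $\nu_r$    $M_r = \sum_{u=1}^{n}(y_u - \hat{y}_u)^2/\nu_r$    $\sum_{u=1}^{n}[\eta_u - E(\hat{y}_u)]^2/\nu_r + \sigma^2$

$\nu_r M_r - \nu_e M_e = \nu_l M_l$, associated with lack of fit, where $\nu_l = \nu_r - \nu_e$. This

Source         Degrees of Freedom    Mean Squares    Expected Mean Squares
Model          $\nu_m$    $M_m = \sum_{u=1}^{n}(\hat{y}_u - \bar{y})^2/\nu_m$    $\sum_{u=1}^{n}[E(\hat{y}_u) - E(\bar{y})]^2/\nu_m + \sigma^2$
Lack of Fit    $\nu_l$    $M_l$    $\sum_{u=1}^{n}[\eta_u - E(\hat{y}_u)]^2/\nu_l + \sigma^2$
Pure Error     $\nu_e$    $M_e = s^2$    $\sigma^2$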
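The split of the residual sum of squares into lack of fit and pure error can be sketched for a straight-line fit with replicated runs. The design, responses, and the function name `lack_of_fit_table` are hypothetical; pure error is pooled from the within-replicate variation, and lack of fit is obtained by subtraction with $\nu_l = \nu_r - \nu_e$.

```python
from collections import defaultdict
from fractions import Fraction


def lack_of_fit_table(xs, ys):
    """Degrees of freedom and sums of squares for residual, pure error and lack of fit
    after fitting a straight line y = b0 + b1 * x by least squares."""
    n = len(xs)
    xbar, ybar = Fraction(sum(xs), n), Fraction(sum(ys), n)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    s_r = sum((y - ybar - b1 * (x - xbar)) ** 2 for x, y in zip(xs, ys))
    nu_r = n - 2                                   # two fitted parameters
    replicates = defaultdict(list)                 # runs at the same x are replicates
    for x, y in zip(xs, ys):
        replicates[x].append(y)
    s_e = sum(sum((y - Fraction(sum(g), len(g))) ** 2 for y in g) for g in replicates.values())
    nu_e = sum(len(g) - 1 for g in replicates.values())
    return {
        "residual": (nu_r, s_r),
        "pure error": (nu_e, s_e),
        "lack of fit": (nu_r - nu_e, s_r - s_e),
    }


table = lack_of_fit_table([-1, -1, 0, 0, 1, 1], [3, 5, 9, 7, 6, 8])
```

For these data the mean squares are $M_l = 25/3$ on 1 degree of freedom and $M_e = 2$ on 3 degrees of freedom.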
for the case of one variable in Figure 3.1. However, the argument is
Figure 3.1 Discrepancies $\eta_x - E(\hat{y}_x)$ Compared with Average Variance of Quantities $\hat{y} - \eta$.
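We suppose the nature of the true function $\eta$ to be unknown. A graduating function $g(\mathbf{x})$ is fitted over the region of interest. Since the fitted value $\hat{y}_{\mathbf{x}}$ at some point $\mathbf{x}$ in the space of the variables is given by $\hat{y}_{\mathbf{x}} = \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, then $E(\hat{y}_{\mathbf{x}}) = \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\eta}$. The function $E(\hat{y}_{\mathbf{x}})$ is therefore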
between the interpolating polynomial and the true function but this is
the quantity
where
The quantity $\sum_{u=1}^{n}\{\eta_u - E(\hat{y}_u)\}^2$ is precisely that which appears as a factor in the non-centrality parameter in the lack of fit mean square in
and
where
quantity $\gamma_l$
3.1 A Significance Test for v,
with
This approximation and the method for finding the appropriate value
obtained,
table is:
3.3 Analysis of Variance Table
based on this approach,
and
of $\gamma_m$. By a similar argument
and
Lack of Fit + Pure Error Sums of Squares
effectiveness of $g(\mathbf{x})$ in representing $f(\mathbf{x})$ over the region of interest, we could compare a measure of the discrepancies $f(\mathbf{x}) - g(\mathbf{x})$ at the experimental points with the variation accounted for by $g(\mathbf{x})$ at these same points.
ing function over the region. It will be noted that the quantities we are
situation.
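The posterior distribution of $\gamma_{l/m} = \gamma_l/\gamma_m$ can be derived in the following way. By means of linear transformations the quadratic
and
$$p(\boldsymbol{\theta} \mid \sigma) = (\sqrt{2\pi}\,\sigma)^{-\nu_m}\exp\left\{-\tfrac{1}{2}\sum_{i=1}^{\nu_m}(\theta_i - \hat{\theta}_i)^2/\sigma^2\right\}$$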
vf A vmh
' = Z 9?/s2 and L' = z 8?/cr2
and non-centrality parameters L
i-1 1 m i=l I
respectively.
$\chi^2$-variates, each with the appropriate number of degrees of freedom
sidered is
divided by the error mean square. If the expressions for I f and 1 are
and
th
The u term i n the summation i n 1 becomes n-k -1 ( -1)1' n ( l +l f / v )
S ( e zl
w h e n p = 0, where p = z - n t k - u -1-1 ,n 1
-. ,v + r - j , z = r-1, with I : and
e
defined by (3.28) and (3.29). The only restrictions are that $w$ must be a non-negative integer, and therefore $\nu_m$ must be a positive even integer such that $\tfrac{1}{2}\nu_m \ge s + 1$, and $n$ must be a positive integer, hence $\nu_e$ must be a positive even integer such that $\tfrac{1}{2}\nu_e \ge k + 1$. These restrictions do not present a major problem because when a situation
a digital computer.
given by
DISTRIBUTIONS AND APPROXIMATIONS
where $\gamma_m^2 = \Lambda_m^2/\nu_m$. On stated assumptions, we have obtained the posterior distribution of $\Lambda_m^2$ as
is the ratio of the residual sum of squares divided by the residual plus
distribution (4.1).
k
,m (t) = (1-2it)
m
-2v --
= (1-2it)
m
{(l-Zit)/(l-Zit/F')]
t",
with $\nu_m - \nu_r$ degrees of freedom. The second term in (4.5) is the
($\nu_1 = \nu_m - \nu_r$ and $\nu_2 = \nu_r$.) Box shows that if the $\nu_j$ are even integers such that $\nu_j = 2g_j$, the density function of $\Lambda_m^2$ is a weighted finite sum of $\chi^2$-distributions
and
where the $h$th moment about the origin $\mu_h'$ is related to the $h$th cumulant $\kappa_h$ by well-known expressions. If either one or both of the quantities $\nu_m$ and $\nu_r$ are odd integers, the distribution of $\Lambda_m^2$ may be obtained by using the infinite series due to Robbins and
adjoining even integers.
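The well-known expressions linking cumulants and moments about the origin are, for the first four orders, as coded below. As a convenient test case the sketch also gives the cumulants of a linear combination $\sum_j c_j\chi^2_{\nu_j}$ of independent $\chi^2$ variates, $\kappa_h = 2^{h-1}(h-1)!\sum_j c_j^h\nu_j$; the report's own expression (4.5) is not reproduced here, and the function names are illustrative.

```python
import math


def raw_moments_from_cumulants(k1, k2, k3, k4):
    """Moments about the origin mu'_1..mu'_4 from cumulants kappa_1..kappa_4."""
    m1 = k1
    m2 = k2 + k1 ** 2
    m3 = k3 + 3 * k2 * k1 + k1 ** 3
    m4 = k4 + 4 * k3 * k1 + 3 * k2 ** 2 + 6 * k2 * k1 ** 2 + k1 ** 4
    return m1, m2, m3, m4


def weighted_chi2_cumulants(weights, dfs, order=4):
    """Cumulants of sum_j c_j * chi^2_{nu_j} (independent terms):
    kappa_h = 2^(h-1) (h-1)! sum_j c_j^h nu_j."""
    return tuple(
        2 ** (h - 1) * math.factorial(h - 1) * sum(c ** h * nu for c, nu in zip(weights, dfs))
        for h in range(1, order + 1)
    )
```

For a single $\chi^2_2$ variate this gives cumulants (2, 4, 16, 96) and moments (2, 8, 48, 384), in agreement with $\mu_h' = \nu(\nu + 2)\cdots(\nu + 2h - 2)$.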
Case 2: $\nu_m = \nu_r$. When $\nu_m = \nu_r$, the characteristic function of
Case 3: $\nu_m < \nu_r$. If $\nu_r - \nu_m$ is a positive even integer, then it
where
integers is suitable,
Smith (1936), Satterthwaite (1941), and Box (1954) have shown that a
distribution is
The mean and variance of the exact distribution are obtained from (4.5),
and
Equating (4.15) to (4.19) and (4.16) to (4.20), the solutions for the
and
Ym
= 1 Amf i l m i n (4.3). ~ h u s ,
$vmtk v f2k-1
M
k(vm' (Y,)
P(Y,),=
O,
2 -L
k=o pYrn +k-1
-vy,2
/;) ( 4.24)
r($vrn tk)
and 3) when $\nu_m < \nu_r$
and $b = \nu_r$,
4.4 Adequacy of the Approximation
$\gamma_m$ are given for various examples in Table 4.1. Tables 4.2 and 4.3 show the third and fourth cumulants, respectively, for the exact distribution of $\Lambda_m^2$ and for the approximating distribution. In some cases the agreement between these cumulants becomes worse for a
functions for this case are shown in Figure 4.1.) The following conclusions emerged:
$1/P = 16$, $\nu_m = 4$, $\nu_r = 8$

$\gamma_m$    $p(\gamma_m)$ Exact    $p(\gamma_m)$ Approx
0             0.000                  0.000
1             0.034                  0.027
2             0.123                  0.127
3             0.222                  0.229
4             0.249                  0.250
5             0.193                  0.191
6             0.110                  0.108
7             0.047                  0.047
8             0.016                  0.016
9             0.004                  0.004
10            0.001                  0.001
Table 4.2 Third Cumulants of Exact and Approximate Distributions of $\Lambda_m^2$
The first tabled value for each combination of $\nu_m$ and $\nu_r$ corresponds to the exact distribution and the second to the approximation.
Each value shown is a factor of 8(1013 less than the true cumulant value.
Table 4.3 Fourth Cumulants of Exact and Approximate Distributions of $\Lambda_m^2$
8
137.39    134.14    131.02    128.00    125.09
$1/P$ increases.
e. If $\nu_m$ and $\nu_r$ are both increased such that $|\nu_m - \nu_r|$ remains
distribution of A is
1
1/m9
the discrepancies between $f(\mathbf{x})$ and $g(\mathbf{x})$ at the experimental design
the response and $g(\mathbf{x})$ is the graduating function. The distribution
V Y
where L' = c1 AZ
(P~/(T' aalld L' = m0
z A
Then the $r$th conditional moment of the distribution of $\gamma_{l/m}^2$ is
1
B($V
1
+ r th, ~v rn -r +t)
X (4.31)
expressed as
The $r$th moment of $\gamma_{l/m}^2$ is therefore
where
V
a
and 1: = Z $;is:
i=l
A vmA
and 1 ; = T: 0?/s2
1=1 1
.
Now let
If w is a non-negative integer, 3 can be expressed as
X
w n-k
x k=o
-k J xZ/(l+xl ? v e ) dx . (4.3 6 )
0
th
where $p = z - n + k - u + 1$ (Dwight, 1954). When $p = 0$, the $u$th term in the
2
summation in I becomes (n-k-1) (-11% n(l-'iepv ). Combining ( 4. 36)
s e
and (4.37) with (4.33), the finite form of the $r$th moment of $\gamma_{l/m}^2$ is
We find
b . = 2 p ; ( ~ - z } / (4w - 3 2-1)
and
Q = ( 4W-32-1)/(2W-Z-1)
where
w = pypizand Z = p,$/tp;tL~)r
This identity is true because the ~ ( 9 ) ' sare the same a s e ' s , that is,
function $\eta = f(\mathbf{x})$ by means of a graduating function $g(\mathbf{x})$. For any particular set of "experimental points," the "total sum of squares," the "regression sum of squares," the "residual sum of squares," and
occurring in the graduating function, and the residual sum of squares
sums of squares and the ratio $R$ are not fixed constants but have
fore unsatisfactory.
estimate of t h e response.
testing approach for $\gamma_m$. For a given significance level $\alpha$ and for
body of the table. The entry P would represent the value of the ratio
$\alpha$ level.
to transform the table described above so that the more familiar $F$-ratio
prepared for $\gamma_l$.
Tables are particularly needed to assist in the calculation of
39-52.
Box, G. E. P. and Hunter, W. G. (1962). "A Useful Method for Model Building," Technometrics 4, 301-318.
13, 1-38.
In response surface methodology the true functional form $f(\mathbf{x})$ is usually not known, and the response must be approximated by means of a "graduating function" $g(\mathbf{x})$ (for example, by a polynomial in $\mathbf{x}$) over the region of interest. The relationship between an observation $y$ and the graduating function $g(\mathbf{x})$ is therefore $y = g(\mathbf{x}) + P + e$, where $P = f(\mathbf{x}) - g(\mathbf{x})$.
In this report, a measure $\gamma_m$ of the variation accounted for by $g(\mathbf{x})$ in relation to the size of the error of estimate of $g(\mathbf{x})$, and a measure $\gamma_l$ of the discrepancy introduced by the bias term $P$ in relation to the error of estimate of the response, are suggested. Furthermore, the "interpolation efficiency" $1/(1 + \gamma_l^2/\gamma_m^2)$ is defined. This is a measure of the amount of change accounted for by the graduating function $g(\mathbf{x})$ compared with the change occurring in the true function $f(\mathbf{x})$ over the region of interest $R$. Methods for estimating the criteria are discussed.