Professional Documents
Culture Documents
DESIGN OF EXPERIMENTS
Preface 7
2 Avoidance of bias 19
2.1 General remarks 19
2.2 Randomization 19
2.3 Retrospective adjustment for bias 29
2.4 Some more on randomization 32
2.5 More on causality 34
2.6 Bibliographic notes 36
2.7 Further results and exercises 37
References 305
Index 323
APPENDIX C
Computational Issues
Revised and converted to R by Wei Lin and Nancy Reid, July 2010.
C.1 Introduction
In the published version of the book (Chapman & Hall, 2000),
Appendix C included code in S-PLUS for the examples discussed in
the text. In this addendum we provide an updated and corrected
version of this Appendix, with all the code converted to R. The
examples in this supplement were run under R version 2.11.1. For
ease of comparison with the original version we have kept the text
largely the same, except in this Introduction, or where R-specific
functions are introduced.
There is a wide selection of statistical computing packages, and
most of these provide the facility for analysis of variance and esti-
mation of treatment contrasts in one form or another. With small
data sets it is often straightforward, and very informative, to com-
pute the contrasts of interest by hand. In 2k factorial designs this
is easily done using Yates’s algorithm (Exercise 5.1).
R is an open-source statistical language and environment mod-
eled after S and its commercial implementation, S-PLUS. It is freely
available under a GNU General Public License. Originally created
by Robert Gentleman and Ross Ihaka at the University of Auck-
land in 1995, it is now maintained by the R Development Core
Team∗ through the R Foundation, and is very widely used in the
statistics community. A great strength of R is the large number
of packages that can be installed as add-ons to the basic distribu-
tions. Software and packages can be downloaded from the R project
website http://www.r-project.org/.
We give here a very brief overview of the analysis of the more
C.2 Overview
† Faraway, J.J. (2004). Linear Models with R, CRC Press, Boca Raton.
‡ Maindonald, J. and Braun, W.J. (2003). Data Analysis and Graphics Using
R: An Example-based Approach, Cambridge University Press, Cambridge
§ Venables, W.N. and Ripley, B.D. (2002). Modern Applied Statistics with S,
Springer-Verlag, New York.
¶ http://socserv.socsci.mcmaster.ca/jfox/Misc/Rcmdr/
OVERVIEW 269
can be written
y~x+A*B
while
E(Yjs ) = µ + βj xjs + τjA + τsB + τjs
AB
OVERVIEW 271
can be written
y~x+x:A+A*B.
There is also a facility for specifying nested effects; for example
the model E(Ya;j ) = µ + τa + ηaj is specified as y ~ A+B/A.
Model formulae are discussed in detail by Chambers and Hastie
(1992, Chapter 2).
The analysis of variance table is printed by the summary func-
tion, which takes as its argument the name of the aov object. This
will show sums of squares corresponding to individual terms in the
model. The summary function does not show whether or not the
sums of squares are adjusted for other terms in the model. In bal-
anced cases the sums of squares are not affected by other terms
in the model but in unbalanced cases or in more general models
where the effects are not orthogonal, the interpretation of individ-
ual sums of squares depends crucially on the other terms in the
model.
R computes the sums of squares much in the manner of stagewise
fitting described in Appendix A, and it is also possible to update
a fitted model using special notation described in Chambers and
Hastie (1992, Chapter 2). The convention is that terms are entered
into the model in the order in which they appear on the right hand
side of the model statement, so that terms are adjusted for those
appearing above it in the summary of the aov object. For example,
unbalanced.aov <- aov(y ~ x1 + x2 + x3); summary(unbalanced.aov)
C.2.5 Plotting
There are some associated plotting methods that are often use-
ful. The function interaction.plot plots the mean response by
levels of two cross-classified factors, and is illustrated in Section
C.5 below. An optional argument fun= allows some other specified
function of the response, such as the median or the standard error,
to be plotted instead; see the help file for this function.
The function qqnorm.aov()/qqnorm() in package gplots, when
applied to an analysis of variance object created by the aov func-
tion, constructs a full or half-normal plot of the estimated ef-
fects (see Section 5.5). Two optional arguments are very useful:
qqnorm.aov(aov.example, label = T) allows interactive label-
ing of points in the plot by clicking on them, and qqnorm(aov.example,
full = T) will construct a full normal plot of the estimated effects.
> potash.matrix
[, 1] [, 2] [, 3]
[1, ] 7.62 1 1
[2, ] 8.14 2 1
[3, ] 7.76 3 1
[4, ] 7.17 4 1
.
.
.
[15, ] 7.21 5 3
> is.factor(potash.tmt)
[1] TRUE
> is.factor(potash.matrix[, 2])
[1] FALSE
> is.factor(potash.df$tmt)
[1] TRUE
> is.factor(potash.df$blk)
[1] TRUE
Terms:
tmt blk Residuals
Sum of Squares 0.73244 0.09712 0.34948
Deg. of Freedom 4 2 8
> summary(potash.aov)
Df Sum Sq Mean Sq F value Pr(>F)
tmt 4 0.73244 0.18311 4.1916 0.04037 *
blk 2 0.09712 0.04856 1.1116 0.37499
Residuals 8 0.34948 0.04369
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
> dummy.coef(potash.aov)
Full coefficients are
(Intercept): 7.722
tmt: 36 54 72 108 144
0.128 0.33133 0.021333 -0.20867 -0.272
blk: I II III
-0.092 0.104 -0.012
> model.tables(potash.aov)
Tables of effects
tmt
tmt
36 54 72 108 144
0.1280 0.3313 0.0213 -0.2087 -0.2720
blk
blk
I II III
-0.092 0.104 -0.012
7.722
tmt
tmt
36 54 72 108 144
7.850 8.053 7.743 7.513 7.450
blk
blk
I II III
7.630 7.826 7.710
Finally, in this example recall that the treatment levels are not
in fact equally spaced, so that the exact linear contrast is as given
in Section 3.5: (−2, −1.23, −0.46, 1.08, 2.6). This can be specified
using contrasts, as illustrated here.
> contrasts(potash.tmt) <- c(-2, -1.23, -0.46, 1.08, 2.6)
> contrasts(potash.tmt)
[, 1] [, 2] [, 3] [, 4]
1 -2.00 -0.44375 -0.4103 -0.3773
2 -1.23 -0.09398 0.3332 0.7548
3 -0.46 0.86128 -0.1438 -0.1488
4 1.08 -0.15416 0.6917 -0.4605
5 2.60 -0.16939 -0.4707 0.2318
> chick.df
weight tmt blk
1 2.51 C 1
2 2.15 His- 1
3 2.49 C 2
4 2.23 Arg- 2
5 2.54 C 3
6 2.26 Thr- 3
.
.
.
> coef(chick.aov)
(Intercept) blk1 blk2 blk3 blk4 blk5 blk6
2.288 -0.1105 -0.013 0.0245 0.060333 0.060333 -0.25883
> dummy.coef(chick.aov)
Full coefficients are (Intercept): 2.288
...
tmt: C His- Arg- Thr- Val- Lys-
0.26167 0.04333 -0.09167 -0.08667 -0.22833 0.10167
...
tmt
C His- Arg- Thr- Val- Lys-
2.445 2.314 2.233 2.236 2.151 2.349
13 14 15
4.38 4.69 4.84
284 COMPUTATIONAL ISSUES
> dummy.coef(chick.aov)$tmt
C His- Arg- Thr- Val- Lys-
0.26167 0.04333 -0.09167 -0.08667 -0.22833 0.10167
### these do not agree exactly with the text Table 4.12
> taustar
C His- Arg- Thr- Val- Lys-
0.26316 -0.0013772 -0.09593 -0.082979 -0.21716 0.13428
> sqrt(1/((1/vartau1)+(1/vartau2)))
[1] 0.03968671
> setaustar <- .Last.value
> sqrt(2)*setaustar
[1] 0.05612548
12 13 14 15
12.7 11.7 11.4 11.1
> summary(dough.aov)
Df Sum Sq Mean Sq F value Pr(>F)
day 6 49.412 8.235 11.1877 0.002750 **
tmt 14 96.225 6.873 9.3372 0.003149 **
Residuals 7 5.153 0.736
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
> dummy.coef(.Last.value)$tmt
1 2 3 4 5 6 7 8
1.3706 3.5372 -2.3156 -1.0711 2.1622 3.9178 0.85389 2.2539
9 10 11 12 13 14 15
-0.51556 -3.4822 0.58444 -1.9822 -0.71556 -3.2822 -1.3156
> replications(dough.df)
$expansion NULL
$tmt
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2 2 2 2 2 2 2 2 6 1 1 1 1 1 1
$day
[1] 4
5 I 0 1 gnut
.
.
.
, ,
gnut
0 1
0 6425.5 6927
1 6593.0 7192
2 6591.0 6847
, ,
soy
0 1
0 7073.5 7831.0
1 6596.0 7325.5
2 6585.0 6662.0
> dev.off()
6887.375
Lev.f
Lev.f
0 1
6644 7131
. . .
Standard errors for differences of means
Lev.f Lev.pro Type House Lev.f:Lev.pro Lev.f:Type
86.4 105.8 86.4 86.4 149.6 122.2
replic. 12 8 12 12 4 6
290 COMPUTATIONAL ISSUES
Lev.pro:Type Lev.f:Lev.pro:Type
149.6 211.6
replic. 4 2
tion omit=. The effects are extracted from the aov object using
effects(aov-example), which in turn relies on the Q-R decom-
position; these are not equal to, but are proportional to, the effects
as defined as the average difference between the two levels.
> logcancer.df<-data.frame(log.rates,cancer.design)
> logcancer.df
log.rates A B C D
1 -5.159485 -1 -1 -1 -1
2 -5.294907 1 1 -1 -1
3 -5.040542 1 -1 1 -1
4 -5.220409 -1 1 1 -1
5 -5.444233 1 -1 -1 1
6 -5.203099 -1 1 -1 1
7 -5.339566 -1 -1 1 1
8 -5.287310 1 1 1 1
Tables of effects
A
-1 1
0.018054 -0.018054
B
-1 1
0.0027375 -0.0027375
C
-1 1
-0.026737 0.026737
D
-1 1
0.06986 -0.06986
A:B
B
A -1 1
-1 -0.021623 0.021623
1 0.021623 -0.021623
A:C
C
A -1 1
-1 0.07609 -0.07609
1 -0.07609 0.07609
A:D
D
A -1 1
-1 -0.029165 0.029165
1 0.029165 -0.029165
Tables of effects
A
-1 1
1.25 -1.25
B
-1 1
0.75 -0.75
C
-1 1
-2.75 2.75
D
-1 1
6.75 -6.75
EXAMPLES FROM CHAPTER 5 293
A:B
B
A -1 1
-1 -2.5 2.5
1 2.5 -2.5
A:C
C
A -1 1
-1 7.5 -7.5
1 -7.5 7.5
A:D
D
A -1 1
-1 -3 3
1 3 -3
> qqnorm(logcancer.aov, label = TRUE) # plot half-normal quantitle
# then save it as FigC.1.ps
> mean(1/death.c)
[1] 0.01023017
10 11 12 13 14 15 16
449.50 463.75 466.00 449.50 452.75 469.00 471.50
> flour.ybar <- .Last.value
Tables of effects
A
-1 1
-6.828 6.828
B
-1 1
-1.8594 1.8594
C
-1 1
-7.359 7.359
D
-1 1
-3.516 3.516
E
-1 1
0.07812 -0.07812
F
-1 1
1.2031 -1.2031
A:B
B
A -1 1
-1 -1.2656 1.2656
1 1.2656 -1.2656
A:C
C
A -1 1
-1 0.10937 -0.10937
1 -0.10937 0.10937
B:C
C
B -1 1
-1 -1.4219 1.4219
1 1.4219 -1.4219
A:E
E
A -1 1
-1 -0.07813 0.07813
1 0.07813 -0.07813
B:E
E
B -1 1
-1 0.015625 -0.015625
EXAMPLES FROM CHAPTER 6 297
1 -0.015625 0.015625
C:E
E
C -1 1
-1 0.07812 -0.07812
1 -0.07812 0.07812
D:E
E
D -1 1
-1 -0.3281 0.3281
1 0.3281 -0.3281
A:B:E
, , E = -1
B
A -1 1
-1 -1.5781 1.5781
1 1.5781 -1.5781
, , E = 1
B
A -1 1
-1 1.5781 -1.5781
1 -1.5781 1.5781
A:C:E
, , E = -1
C
A -1 1
-1 -1.2656 1.2656
1 1.2656 -1.2656
, , E = 1
C
A -1 1
-1 1.2656 -1.2656
1 -1.2656 1.2656
Error: days
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 2 77.556 38.778
Error: days:prep
Df Sum Sq Mean Sq F value Pr(>F)
prep 2 128.389 64.194 7.0781 0.04854 *
EXAMPLES FROM CHAPTER 6 299
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
temp 3 434.08 144.69 36.4266 7.449e-08 ***
temp:prep 6 75.17 12.53 3.1538 0.02711 *
Residuals 18 71.50 3.97
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Tables of means
temp
1 2 3 4
31.22 34.56 37.89 40.44
prep
1 2 3
35.67 38.50 33.92
temp:prep
prep
temp 1 2 3
1 29.67 33.33 30.67
2 34.67 39.00 30.00
3 39.33 39.67 34.67
4 39.00 42.00 40.33
F: C2 1 26801293 26801293
OE 1 408747 408747
A:OE 2 112170 56085
A:OE: C1 1 35556 35556
A:OE: C2 1 76614 76614
B:OE 2 245020 122510
B:OE: C1 1 70939 70939
B:OE: C2 1 174081 174081
C:OE 2 5983 2991
C:OE: C1 1 523 523
C:OE: C2 1 5460 5460
D:OE 2 159042 79521
D:OE: C1 1 133300 133300
D:OE: C2 1 25741 25741
E:OE 2 272092 136046
E:OE: C1 1 50139 50139
E:OE: C2 1 221953 221953
F:OE 2 13270 6635
F:OE: C1 1 12429 12429
F:OE: C2 1 840 840
Residuals 10 4461690 446169
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
k cran.r-project.org/doc/contrib/Faraway-PRA.pdf
∗∗ http://cran.r-project.org/doc/contrib/Vikneswaran-ED_companion.pdf
†† Berger, P.D. and Maurer, E. (2002). Experimental Design with Applications
in Management, Engineering and the Sciences, Duxbury Press, Belmont.