

Chapter 6 Linear Combinations and Multiple Comparisons of Means

The overall F-test is only one step in the comparison of several groups. In Chapter 5, we saw
how the pooled standard deviation could be used in confidence intervals and hypothesis tests for
the difference of any pair of group means. We can generalize this to confidence intervals and
hypothesis tests for any linear combination of group means.

A linear combination of group means has the form

$\gamma = C_1\mu_1 + C_2\mu_2 + \dots + C_I\mu_I$

where $C_1, \dots, C_I$ are coefficients chosen by the researcher to measure a feature of interest.

Mice example:
Suppose I wanted to examine the difference in the mean lifetimes of the two control diets:
$\mu_1 - \mu_2$. Then $C_1 = 1$, $C_2 = -1$, and $C_3 = C_4 = C_5 = C_6 = 0$.

More complicated combinations are sometimes of interest. Suppose I wanted to compare the
average of the two control diets to the average of the four reduced calorie diets. What are the
$C_i$'s for this combination?

In the above two examples, the sum of the $C_i$'s is 0 in each case. A linear combination of the
means in which the coefficients sum to 0 is called a contrast because it compares or contrasts
some means with others. Specific contrasts are often of interest in ANOVA, but we can create a
confidence interval for any linear combination of means; it does not need to be a contrast.
Which of the following linear combinations of means are contrasts?

a) $(\mu_1 - \mu_3) + (\mu_4 - \mu_5)$

b) $\dfrac{\mu_1 + \mu_2 + \mu_3 + \mu_4 + \mu_5}{5}$

c) $2\mu_1 - \mu_2 - \mu_3$

d) $\dfrac{\mu_2 + \mu_3 + \mu_4 + \mu_5}{4} - \mu_1$

e) $\mu_1 + \mu_2 + \mu_5 - \mu_3 - \mu_4$

f) $\dfrac{\mu_3 - \mu_2}{35} - \dfrac{\mu_6 - \mu_3}{10}$
(this is from the mice example on p. 157 and compares the increase in mean life
expectancy per calorie of going from N/N85 to N/R50 to the increase per calorie of
going from N/R50 to N/R40)
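Checking whether a linear combination is a contrast only requires summing its coefficients. The sketch below does this for items a, b, c, d, and f, with the coefficients read off from the expressions above. The function name `is_contrast` and the dictionary layout are illustrative choices, not part of the notes, and exact fractions are used so that the sum is not affected by floating-point rounding.

```python
from fractions import Fraction as F

def is_contrast(coefficients):
    """Return True when the coefficients of a linear combination sum to zero."""
    return sum(coefficients) == 0

# Coefficients C1, C2, ... read off from the expressions in the list above.
examples = {
    "a": [1, 0, -1, 1, -1],
    "b": [F(1, 5)] * 5,
    "c": [2, -1, -1, 0, 0],
    "d": [-1, F(1, 4), F(1, 4), F(1, 4), F(1, 4)],
    "f": [0, F(-1, 35), F(1, 35) + F(1, 10), 0, 0, F(-1, 10)],
}

for label, coefs in examples.items():
    print(label, "sum =", sum(coefs), "contrast:", is_contrast(coefs))
```

Item b is a plain average of the means, so its coefficients sum to 1 rather than 0.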

Estimation and standard error of a linear combination of means

Estimate $\gamma = C_1\mu_1 + \dots + C_I\mu_I$ by $g = C_1\bar{Y}_1 + \dots + C_I\bar{Y}_I$.

Note: γ is a parameter and g is a statistic.

From results last semester:

$\mathrm{SD}(g) = \sigma\sqrt{\dfrac{C_1^2}{n_1} + \dfrac{C_2^2}{n_2} + \dots + \dfrac{C_I^2}{n_I}}$

We estimate $\sigma$ by the pooled standard deviation $s_p$. Plugging this estimate into SD(g) gives the
estimated standard deviation of g, which is the standard error of g:

$\mathrm{SE}(g) = s_p\sqrt{\dfrac{C_1^2}{n_1} + \dfrac{C_2^2}{n_2} + \dots + \dfrac{C_I^2}{n_I}}$

A $100(1-\alpha)\%$ confidence interval for $\gamma$ is

$g \pm t_{df}(1-\alpha/2)\,\mathrm{SE}(g)$

where df is the degrees of freedom for $s_p$ (that is, $n - I$). A test of the hypothesis
$H_0: \gamma = 0$ is carried out using the test statistic $t = \dfrac{g}{\mathrm{SE}(g)}$.
Examples:
1. Compute a confidence interval for the difference between mean lifetimes for the laboratory
control (N/N85) and the unrestricted controls (NP):

What linear combination $\gamma$ of the $\mu_i$'s do we want to estimate?

What is the estimate of this contrast? g =

What is the standard error of g?

SE(g) =

What are the d.f. associated with this SE?



Compute a 95% confidence interval for the contrast and interpret:

2. Compute an estimate of the contrast which is the average of the two control diets minus the
average of the four reduced calorie diets along with a confidence interval.

$g = \dfrac{27.4 + 32.7}{2} - \dfrac{42.3 + 42.9 + 39.7 + 45.1}{4} = -12.45$

$\mathrm{SE}(g) = 6.678\sqrt{\dfrac{(1/2)^2}{49} + \dfrac{(1/2)^2}{57} + \dfrac{(-1/4)^2}{71} + \dfrac{(-1/4)^2}{56} + \dfrac{(-1/4)^2}{56} + \dfrac{(-1/4)^2}{60}} = 0.7800$

A confidence interval for the contrast is

$-12.45 \pm t_{343}(.975)\,(0.7800) = -12.45 \pm 1.53$, or $-13.98$ to $-10.92$.

We are 95% confident that the mean life expectancy of the two control diets is from 10.9 to 14.0
months less than the mean lifetime of the four restricted diets.
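The calculation in Example 2 can be sketched in a few lines of Python. The helper name `linear_combination` is hypothetical. The group means, sample sizes, and pooled standard deviation are the ones used in the example, and the multiplier 1.967 for $t_{343}(.975)$ is copied from the notes rather than computed.

```python
from math import sqrt

def linear_combination(coefs, means, sizes, pooled_sd):
    """Return (g, SE(g)) for g = sum of C_i * Ybar_i using a pooled SD."""
    g = sum(c * m for c, m in zip(coefs, means))
    se = pooled_sd * sqrt(sum(c ** 2 / n for c, n in zip(coefs, sizes)))
    return g, se

# Group order: NP, N/N85, N/R50, R/R50, N/R lopro, N/R40.
means = [27.4, 32.7, 42.3, 42.9, 39.7, 45.1]
sizes = [49, 57, 71, 56, 56, 60]
pooled_sd = 6.678
t_mult = 1.967  # t_343(.975), taken from the notes (assumption: not computed here)

# Average of the two controls minus average of the four reduced calorie diets.
coefs = [1/2, 1/2, -1/4, -1/4, -1/4, -1/4]
g, se = linear_combination(coefs, means, sizes, pooled_sd)
lower, upper = g - t_mult * se, g + t_mult * se
print(f"g = {g:.2f}, SE(g) = {se:.4f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Run as written, this reproduces g = -12.45 and SE(g) = 0.7800. Calling the same function with coefficients (-1, 1, 0, 0, 0, 0) gives the N/N85 minus NP comparison from Example 1, with a standard error of about 1.30.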

Simultaneous Inferences

Fishing Expeditions and Data Snooping: tests based on how the data turned out.
Say you would like to study a particular stream in Chile that is being used to dispose of waste
from pulp manufacture.

You plan your experiment like this:


i) Measure the concentrations of contaminants in the water
ii) Measure different stream characteristics such as species diversity (plants and
animals), species richness (plants and animals), brood size for several species of fish,
asymmetry in fish, prevalence of bacterial infection (plants and animals) etc., in the
stream and in several other uncontaminated streams in the region.
iii) Keep measuring stream characteristics until you find a difference between
the pulp waste stream and the others that is significant at the 0.05 level (95% confidence).

What is the probability of a type I error in this experimental design?

What is the probability that you find a significant difference?

This example and the one found in your text (which you should study until you understand it)
illustrate the need for family-wise control of the alpha value, or confidence level.
If we form individual 95% confidence intervals for a set of linear combinations of means, then
we cannot be 95% confident that they all include the true parameters they’re estimating. The
actual confidence that a family of confidence intervals are simultaneously correct is called the
familywise confidence level.

Example: Say we conduct an experiment where we make 10 pairwise comparisons and control
each of them at the 95% confidence level. What is the probability of at least one Type I error
occurring in the experiment?

Let $p = \Pr(\text{success}) = 0.95$ and $q = \Pr(\text{failure}) = 0.05$, where success means that we do not make a
Type I error, and let $x$ be the number of Type I errors among the 10 comparisons. Assume that each
comparison is independent of every other, so that $x$ is a binomial random variable. Then the
probability of at least one Type I error is given by

$\Pr(x \ge 1) = 1 - \Pr(x = 0) = 1 - \binom{10}{0} p^{10} q^{0} = 1 - 0.5987 = 0.4013$

Hence, if we would like to control our probability of type I error for our entire experiment, we
must make some adjustments. Unfortunately, the above calculation depends on the trials being
independent, which is probably not the case for most experiments. So we cannot calculate
family-wise probabilities exactly. For this reason, several different alpha correction
techniques have been developed.
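To see how quickly the familywise error rate grows with the number of comparisons, the binomial calculation above can be repeated for several values of k. This is a minimal sketch under the same independence assumption as the example. The function name `familywise_error_rate` is illustrative.

```python
def familywise_error_rate(k, alpha=0.05):
    """P(at least one Type I error) in k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 15, 50):
    rate = familywise_error_rate(k)
    print(f"k = {k:2d}: familywise error rate = {rate:.4f}")
```

For k = 10 this gives the 0.4013 found above. With dependent comparisons the true rate differs, which is why the calculation is only a guide.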

The most common family-wise correction uses the Bonferroni inequality to create
simultaneous confidence intervals with any desired familywise confidence level. To create
100(1-α)% simultaneous confidence intervals for k parameters, we make each confidence
interval an individual 100(1-α/k)% confidence interval.

Example: Simultaneous 95% confidence intervals for 10 parameters:
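A short sketch of this example: with α = 0.05 and k = 10, each interval is built at the individual level 1 - α/k. The function name `bonferroni_individual_level` is hypothetical. The second quantity shows what the familywise error rate would be if the ten intervals happened to be independent, which is an assumption made only for illustration, since Bonferroni itself does not require independence.

```python
def bonferroni_individual_level(k, alpha=0.05):
    """Confidence level for each of k intervals under the Bonferroni correction."""
    return 1 - alpha / k

k, alpha = 10, 0.05
level = bonferroni_individual_level(k, alpha)

# Familywise error rate if the k intervals were independent (illustrative assumption).
fw_error_if_independent = 1 - level ** k

print(f"individual confidence level = {100 * level:.1f}%")
print(f"familywise error rate (independent case) = {fw_error_if_independent:.4f}")
```

Each interval is an individual 99.5% interval, and in the independent case the familywise error rate comes out slightly below 0.05.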

Bonferroni guarantees that the familywise confidence level is at least 1-α, but it can be overkill
(the intervals are wider than necessary), especially when k is large. Several other methods for
creating simultaneous confidence intervals among means are less drastic.

Planned and unplanned comparisons

• Planned comparisons: contrasts which the researcher decides are of interest before the
data are collected. We can control the familywise confidence level using the Bonferroni
inequality or one of the other methods listed below.

• Unplanned comparisons: contrasts which the researcher decides are of interest after
examining the data. These may be chosen from a larger set of contrasts which have been
examined or may be chosen after looking at the data to suggest contrasts of interest. The
confidence intervals must take into account that you actually (in the first case) or
essentially (in the second case) examined a large number of contrasts and picked out the
most “significant” one or ones.

In the specific case of all pairwise comparisons of group means, a number of procedures have
been developed to control the familywise error rate. The primary one is
• Tukey-Kramer procedure (for all planned or unplanned pairwise comparisons)

In the general case of contrasts (or any linear combinations of the means) which are not
necessarily pairwise comparisons, there are two main choices. These methods can also be used
for pairwise comparisons.
• Planned comparisons: Bonferroni
• Unplanned comparisons: Scheffe (can also be used for planned comparisons)

In all these cases, the confidence interval for a contrast γ always has the form:

Estimate ± (multiplier) × (standard error) = g ± (multiplier) × SE(g)

The specific method used determines only the multiplier. If you have a legitimate choice
between two or more procedures, you can choose the one with the smaller multiplier.

In SPSS, the standard errors of one or more contrasts can be calculated by selecting the Contrasts
button on the One-way ANOVA window. You will have to find the value of the appropriate
multiplier to create a confidence interval for the contrast.

Pairwise comparisons between all pairs of means can be obtained by clicking the Post Hoc
button in the One-Way ANOVA window. It will automatically give you confidence intervals for
the difference between each pair of means. There are a multitude of options there; the ones
corresponding to the ones mentioned here are:

Ramsey & Schafer   SPSS                    Multiplier

LSD                LSD                     $t_{n-I}(1-\alpha/2)$ (individual confidence intervals)

Bonferroni         Bonferroni              $t_{n-I}(1-\alpha/(2k))$, where $k = \binom{I}{2}$ is the number of
                                           pairwise comparisons of means

Scheffe            Scheffe                 $\sqrt{(I-1)\,F_{I-1,\,n-I}(1-\alpha)}$ (F is from Table A.4)

Tukey-Kramer       Tukey (not Tukey's-b)   $\dfrac{q_{I,\,n-I}(1-\alpha)}{\sqrt{2}}$ (q is from Table A.5)

These give 100(1-α)% simultaneous (except for LSD) confidence intervals.



Example: mice diet experiment

There are $I = 6$ groups and $n - I = 343$ d.f. within groups. There are $\binom{6}{2} = \dfrac{6!}{2!\,4!} = 15$ pairwise
comparisons. The coefficients or multipliers for 95% confidence intervals for the difference
between each pair of means are:

1. LSD: $t_{343}(.975) = 1.967$ (approx. 1.984 using Table A.2 with 100 d.f.)

2. Bonferroni: $t_{343}(1 - .05/(2 \cdot 15)) = t_{343}(.998333) = 2.956$

3. Scheffe: $\sqrt{5\,F_{5,343}(.95)} = \sqrt{5(2.2403)} = 3.347$

(approx. $\sqrt{5(2.26)} = 3.36$ using Table A.4 with df2 = 200)

4. Tukey-Kramer: $\dfrac{q_{6,343}(.95)}{\sqrt{2}} = \dfrac{4.0530}{\sqrt{2}} = 2.866$

(approx. $\dfrac{4.10}{\sqrt{2}} = 2.90$ using Table A.5 with 120 df)

If I had been interested only in all pairwise comparisons a priori, then I would use Tukey-Kramer.
If there were other pre-planned contrasts of interest in addition to all pairwise
comparisons, then I would use either Bonferroni (increasing k to reflect the
additional contrasts) or Scheffe, whichever gave the smaller multiplier. If there were additional
unplanned comparisons, then I would use Scheffe for all comparisons.
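The multipliers for the mice example can be assembled and compared with a short script. This is a sketch only: the critical values are copied from the notes rather than computed, the variable names are illustrative, and the standard error is for one pairwise difference (N/N85 minus NP) using the pooled standard deviation 6.678 and group sizes 49 and 57.

```python
from math import comb, sqrt

I, df = 6, 343            # number of groups and within-group d.f. (mice example)
k = comb(I, 2)            # number of pairwise comparisons

# Critical values quoted in the notes (assumption: copied, not computed here).
t_lsd = 1.967             # t_343(.975)
t_bonf = 2.956            # t_343(1 - .05/(2k))
f_crit = 2.2403           # F_{5,343}(.95)
q_crit = 4.0530           # q_{6,343}(.95)

multipliers = {
    "LSD": t_lsd,
    "Bonferroni": t_bonf,
    "Scheffe": sqrt((I - 1) * f_crit),
    "Tukey-Kramer": q_crit / sqrt(2),
}

# SE of the difference between the N/N85 and NP means.
se_diff = 6.678 * sqrt(1 / 49 + 1 / 57)

for name, m in multipliers.items():
    print(f"{name:12s} multiplier = {m:.3f}, half-width = {m * se_diff:.3f}")

# LSD is not simultaneous, so compare only the familywise methods.
simultaneous = {n: m for n, m in multipliers.items() if n != "LSD"}
best = min(simultaneous, key=simultaneous.get)
print("smallest simultaneous multiplier:", best)
```

Among the three simultaneous methods, Tukey-Kramer has the smallest multiplier for all pairwise comparisons here, which agrees with the recommendation above.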
Post Hoc Tests
Multiple Comparisons

Dependent Variable: Months survived
Tukey HSD

Diff (I-J) is the mean difference; Lower and Upper are the bounds of the 95% confidence interval.

(I) Diet   (J) Diet     Diff (I-J)  Std. Error  Sig.     Lower     Upper
NP         N/N85           -5.289*       1.301  .001    -9.018    -1.561
           N/R50          -14.895*       1.240  .000   -18.450   -11.341
           R/R50          -15.484*       1.306  .000   -19.228   -11.740
           N/R lopro      -12.284*       1.306  .000   -16.028    -8.540
           N/R40          -17.715*       1.286  .000   -21.400   -14.029
N/N85      NP               5.289*       1.301  .001     1.561     9.018
           N/R50           -9.606*       1.188  .000   -13.010    -6.202
           R/R50          -10.194*       1.257  .000   -13.796    -6.593
           N/R lopro       -6.994*       1.257  .000   -10.596    -3.393
           N/R40          -12.425*       1.235  .000   -15.965    -8.885
N/R50      NP              14.895*       1.240  .000    11.341    18.450
           N/N85            9.606*       1.188  .000     6.202    13.010
           R/R50            -.589        1.194  .996    -4.009     2.832
           N/R lopro        2.611        1.194  .246     -.809     6.032
           N/R40           -2.819        1.171  .156    -6.176      .537
R/R50      NP              15.484*       1.306  .000    11.740    19.228
           N/N85           10.194*       1.257  .000     6.593    13.796
           N/R50             .589        1.194  .996    -2.832     4.009
           N/R lopro        3.200        1.262  .117     -.417     6.817
           N/R40           -2.231        1.241  .468    -5.787     1.325
N/R lopro  NP              12.284*       1.306  .000     8.540    16.028
           N/N85            6.994*       1.257  .000     3.393    10.596
           N/R50           -2.611        1.194  .246    -6.032      .809
           R/R50           -3.200        1.262  .117    -6.817      .417
           N/R40           -5.431*       1.241  .000    -8.987    -1.875
N/R40      NP              17.715*       1.286  .000    14.029    21.400
           N/N85           12.425*       1.235  .000     8.885    15.965
           N/R50            2.819        1.171  .156     -.537     6.176
           R/R50            2.231        1.241  .468    -1.325     5.787
           N/R lopro        5.431*       1.241  .000     1.875     8.987
*. The mean difference is significant at the .05 level.

Homogeneous Subsets
Months survived
Tukey HSD (a,b)

                     Subset for alpha = .05
Diet         N       1         2         3         4
NP           49      27.402
N/N85        57                32.691
N/R lopro    56                          39.686
N/R50        71                          42.297    42.297
R/R50        56                          42.886    42.886
N/R40        60                                    45.117
Sig.                 1.000     1.000     .108      .212

Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 57.462.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error
levels are not guaranteed.
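The harmonic mean sample size in footnote a can be checked directly from the six group sizes. This is a small sketch using the standard library. The dictionary of group sizes is taken from the N column of the output above.

```python
from statistics import harmonic_mean

group_sizes = {"NP": 49, "N/N85": 57, "N/R50": 71,
               "R/R50": 56, "N/R lopro": 56, "N/R40": 60}

# Harmonic mean = I / sum(1 / n_i) over the I groups.
n_harmonic = harmonic_mean(list(group_sizes.values()))
print(f"harmonic mean sample size = {n_harmonic:.3f}")
```

The result agrees with the 57.462 reported by SPSS. Because the homogeneous subsets use this single common sample size instead of the actual unequal group sizes, their significance values differ slightly from those in the pairwise table.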