Chap. 5, page 1
Scope of inference
• Inference extends to what would have happened if all of the mice in the experiment had been fed each diet.
• The scope of inference can be expanded further if these mice can be viewed as
representative of a larger population of female mice.
• Since it’s an experiment, we can infer cause-and-effect if the experiment was well-run.
Comparison of all six diets was of interest, but some specific comparisons were of particular
interest, as outlined in Display 5.3.
• If the variability within each treatment is about the same for all treatments, then it makes
sense to estimate a pooled standard deviation from all the treatments even if we’re only
comparing any two at a time.
• We may want to carry out more complicated comparisons, such as a comparison of a
control group to the average of the other five groups.
• A standard first question of interest when comparing several groups is whether there is
evidence that any of the means are different from each other. Comparing all the treatments
pairwise with two-sample t tests results in a lot of individual tests (15 for 6 treatments). An
overall test of equality of all the treatment means is much more efficient and will not suffer
from the problem of running multiple tests (where statistically significant results have to be
considered in the context of how many tests were run).
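The count of 15 pairwise tests, and the reason it matters, can be checked with a short Python sketch. The 5% level per test and the treatment of the tests as independent are illustrative assumptions (pairwise tests that share a group are not independent), so the second number is only a rough indication of the problem.

```python
from math import comb

n_groups = 6

# Number of distinct pairs of treatments, i.e. number of two-sample t tests.
n_pairwise_tests = comb(n_groups, 2)

# Rough chance of at least one false positive when all means are equal,
# ASSUMING a 5% level per test and (unrealistically) independent tests.
alpha = 0.05
approx_familywise_error = 1 - (1 - alpha) ** n_pairwise_tests

print(n_pairwise_tests)
print(round(approx_familywise_error, 2))
```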
An Ideal Model which allows the problems above to be solved fairly easily
• Population distributions are normal
• Population standard deviations are equal
• Independent random samples from each population (a randomized experiment satisfies this
assumption)
This model is exactly the model for the pooled two-sample t-test when there are two groups:
different means, but common standard deviation
The assumption of equal standard deviations is very important and must be checked. If there are
large differences in variability, this may be of interest in and of itself and the reasons for this
should be addressed. Often, differing variability is caused by higher values of the variable in
some groups than another. For example, the variability in lifetimes of animals is likely to be
greater the longer they tend to live. Transformations (such as log) can sometimes solve this
problem.
The only change in adapting this to several groups is to use the pooled standard deviation from
all of the groups if the assumption of equal standard deviations seems reasonable.
Descriptives
Months survived
The equal variance assumption seems reasonable for this experiment so we will use the pooled standard
deviation from all 6 treatments.
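For reference, the pooled standard deviation over I groups is usually written as follows, where s_i is the sample standard deviation of group i and n_i is its sample size. This is the standard textbook formula restated in the notation of these notes, not a copy of the original display.

```latex
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_I - 1)s_I^2}{n - I}},
\qquad n = n_1 + n_2 + \cdots + n_I
```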
The degrees of freedom for the t distribution when you use this pooled standard deviation are
n − I, the denominator in the above expression, where n is the total sample size (349 in our
example) and I is the number of groups or treatments (6 in our example). So we use a t with 343
degrees of freedom for the mice experiment.
• One desired comparison is between groups 1 and 2: the unrestricted non-purified diet
(NP) to a standard 85 calorie diet (N/N85). The result is summarized in part e) on p. 116.
First, note that
\[ \mathrm{SE}(\bar{Y}_1 - \bar{Y}_2) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 6.678 \sqrt{\frac{1}{49} + \frac{1}{57}} = 1.301. \]
Conclusion: It is estimated that the 85 calorie standard diet increases mean life
expectancy by 6.8 months over an unrestricted diet with a 95% confidence interval of 4.2
to 9.4 months.
• A test of the null hypothesis that $\mu_1 = \mu_2$ against a one-sided alternative that $\mu_1 < \mu_2$
(we would have to decide before collecting the data that we were only interested in
detecting an increase in mean life expectancy with the 85 calorie diet):
\[ \text{Test statistic} = \frac{\bar{Y}_1 - \bar{Y}_2}{\mathrm{SE}(\bar{Y}_1 - \bar{Y}_2)} = \frac{-6.8}{1.301} = -5.23 \]
Conclusion: The data provide very strong evidence that the 85-calorie diet increases life
expectancy over the unrestricted diet.
Note: if the assumption of equal standard deviations across all six groups did not appear
reasonable, then we could have done the confidence interval and hypothesis test the usual way,
using either the pooled standard deviation from just the two groups or the unpooled Welch’s t
procedures. The advantage of pooling all 6 groups is a better estimate of the standard deviation,
with increased degrees of freedom.
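The standard error, confidence interval, and test statistic above can be reproduced with a few lines of Python. This is a minimal sketch: the inputs are the values quoted in these notes, and the multiplier 1.967 is an assumed approximate 97.5th percentile of the t distribution with 343 degrees of freedom rather than a value taken from the notes.

```python
from math import sqrt

s_p = 6.678      # pooled SD from all 6 groups (from the notes)
n1, n2 = 49, 57  # sample sizes for NP and N/N85 (from the notes)
diff = 6.8       # estimated increase in mean lifetime, N/N85 minus NP (months)
t_crit = 1.967   # ASSUMED approximate 97.5th percentile of t with 343 df

# Standard error of the difference between the two group means.
se = s_p * sqrt(1 / n1 + 1 / n2)

# 95% confidence interval for the increase in mean lifetime.
ci_low = diff - t_crit * se
ci_high = diff + t_crit * se

# Test statistic for group 1 minus group 2, i.e. NP minus N/N85 = -6.8.
t_stat = -diff / se

print(round(se, 3), round(ci_low, 1), round(ci_high, 1), round(t_stat, 2))
```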
• Three of the means are the same, and the other three are the same as each other but
different from the first three
The idea of a one-sided alternative hypothesis is meaningless with three or more groups.
Testing the hypothesis of equal means relies on a general approach which we will use frequently
in the rest of the course:
For testing the equality of several population means, these models are:
Full model: the population distributions are normal with the same standard deviations, but
different (possibly) means
Reduced model: the population distributions are normal with the same standard deviations,
and the same means
The general idea is that we “fit” both these models to the data (like regression). Each model
gives a predicted value for every case. The full model uses each observation’s group mean as the
predicted value. The reduced model uses the mean of all the observations together. We then
measure how well the data fit the models by computing the sum of squared residuals. The full
model can fit no worse than the reduced model because the reduced model is a special case of the
full model.
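Group      1             2             3             4             5             6
Full       $\bar{Y}_1$   $\bar{Y}_2$   $\bar{Y}_3$   $\bar{Y}_4$   $\bar{Y}_5$   $\bar{Y}_6$
Reduced    $\bar{Y}$     $\bar{Y}$     $\bar{Y}$     $\bar{Y}$     $\bar{Y}$     $\bar{Y}$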
Example:
To illustrate these calculations, we’ll use a small hypothetical example, with 3 groups and 10
observations in all.
Extra sum of squares = Residual sum of squares (reduced) – Residual sum of squares (full)
The residual sum of squares for a model represents the variability in the original data which is
not explained by the model. The extra sum of squares therefore represents the amount of the
unexplained variability in the reduced model that is explained by the full model.
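The two fits and the extra sum of squares can be sketched in Python as follows. The data here are hypothetical values made up purely for illustration (three groups, ten observations); they are not the data from the example in these notes.

```python
# HYPOTHETICAL data, made up for illustration (not the data from the notes).
groups = {
    "A": [10, 12, 14],
    "B": [15, 17, 19],
    "C": [20, 22, 24, 26],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)

# Reduced model: every observation is predicted by the grand mean.
rss_reduced = sum((y - grand_mean) ** 2 for y in all_values)

# Full model: every observation is predicted by its own group mean.
rss_full = 0.0
for ys in groups.values():
    group_mean = sum(ys) / len(ys)
    rss_full += sum((y - group_mean) ** 2 for y in ys)

# Variability left unexplained by the reduced model that the full model explains.
extra_ss = rss_reduced - rss_full

print(rss_reduced, rss_full, extra_ss)
```

Because the reduced model is a special case of the full model, `rss_full` can never exceed `rss_reduced`, so `extra_ss` is never negative.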
The question now is whether the improved fit represents something real or could just be
attributed to sampling variability. We use the F-statistic to test the null hypothesis that the
populations follow the reduced model against the alternative that they follow the full model and
not the reduced model.
Extra degrees of freedom = # params for full model – # params for reduced model = 4 – 2 = 2
$\hat{\sigma}^2_{\text{full}}$ = estimate of $\sigma^2$ based on full model = $s_p^2$ (square of pooled standard deviation)
The numerator of the F-statistic is the average reduction in residual sum of squares for each
parameter added and the denominator is the reduction we would expect per extra parameter just
by chance.
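In symbols, this description corresponds to the usual extra-sum-of-squares F-statistic. The display below is a reconstruction from the quantities defined above, offered as the standard form rather than a copy of the original formula.

```latex
F = \frac{(\text{Extra sum of squares}) / (\text{Extra degrees of freedom})}{\hat{\sigma}^2_{\text{full}}}
```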
ANOVA
Response
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   219.900          2    109.950       17.111   .002
Within Groups    44.980           7    6.426
Total            264.880          9
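The entries in this table can be checked from the sums of squares and degrees of freedom alone. The sketch below uses only the Python standard library; the closed-form upper-tail probability is a convenience that holds only when the numerator has exactly 2 degrees of freedom, as it does here, and is not how SPSS is documented to compute the value.

```python
ss_between, df_between = 219.900, 2
ss_within, df_within = 44.980, 7

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F-statistic: between-groups mean square over within-groups mean square.
f_stat = ms_between / ms_within

# Upper-tail probability of an F(2, df_within) distribution.
# This closed form is valid ONLY for 2 numerator degrees of freedom.
p_value = (df_within / (df_within + 2 * f_stat)) ** (df_within / 2)

print(round(ms_between, 3), round(ms_within, 3), round(f_stat, 3), round(p_value, 3))
```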
\[ \text{Sum of squares between groups} = \mathrm{SSB} = \sum_{i=1}^{I} n_i (\bar{Y}_i - \bar{Y})^2 \]
\[ \text{Sum of squares within groups} = \mathrm{SSW} = \sum_{i=1}^{I} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 \]
Note that, SST = SSB + SSW and Extra sum of squares = SST – SSW, hence SSB = ESS.
\[ \text{Mean square between groups} = \mathrm{MSB} = \frac{\mathrm{SSB}}{I - 1} \]
\[ \text{Mean square within groups} = \mathrm{MSW} = \frac{\mathrm{SSW}}{n - I} = s_p^2 \]
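If the population means are equal (i.e., if the null hypothesis is true) and every group has the
same sample size $n^*$, then
$\bar{Y}_1$ is N($\mu$, $\sigma / \sqrt{n^*}$)
$\bar{Y}_2$ is N($\mu$, $\sigma / \sqrt{n^*}$)
…
$\bar{Y}_I$ is N($\mu$, $\sigma / \sqrt{n^*}$)
Since the samples are independent, $\bar{Y}_1, \bar{Y}_2, \ldots, \bar{Y}_I$ are like a random sample from a normal
population with mean $\mu$ and standard deviation $\sigma / \sqrt{n^*}$. Therefore, the sample variance of
$\bar{Y}_1, \bar{Y}_2, \ldots, \bar{Y}_I$ is an estimate of $\sigma^2 / n^*$:
\[ \frac{1}{I - 1} \sum_{i=1}^{I} (\bar{Y}_i - \bar{Y})^2 \ \text{is an estimate of} \ \frac{\sigma^2}{n^*}. \]
Hence,
\[ \frac{1}{I - 1} \sum_{i=1}^{I} n^* (\bar{Y}_i - \bar{Y})^2 = \mathrm{MSB} \ \text{is an estimate of} \ \sigma^2. \]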
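A small simulation can make this argument concrete. The sketch below draws equal-sized samples from a single normal population (so the null hypothesis is true) and averages MSB over many repetitions. The function name `average_msb` and every setting in it (6 groups of 10, sigma = 2, 2000 repetitions, seed 1) are hypothetical choices made for illustration.

```python
import random

def average_msb(n_groups=6, n_per_group=10, mu=0.0, sigma=2.0, reps=2000, seed=1):
    """Average the between-groups mean square over repeated samples
    drawn under the null hypothesis (all population means equal)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # One sample mean per group, all groups drawn from the same population.
        means = []
        for _ in range(n_groups):
            sample = [rng.gauss(mu, sigma) for _ in range(n_per_group)]
            means.append(sum(sample) / n_per_group)
        grand = sum(means) / n_groups
        # MSB = n* times the sample variance of the group means.
        total += n_per_group * sum((m - grand) ** 2 for m in means) / (n_groups - 1)
    return total / reps

# Should land close to sigma squared (4.0 with the settings above).
print(average_msb())
```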
To summarize:
ANOVA
Months survived
                 Sum of Squares   df    Mean Square   F        Sig.
Between Groups   12733.942        5     2546.788      57.104   .00000
Within Groups    15297.415        343   44.599
Total            28031.357        348
Conclusion: There is overwhelming evidence that there is a difference in the mean lifetimes
under the different diets. This does not mean that all the diet means are different from each
other, only that at least one of them differs from the others.
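As a check on the table for the mice experiment, the mean squares, the F-statistic, and the pooled standard deviation used earlier can all be recovered from the sums of squares and degrees of freedom. This is a minimal sketch using only the values printed in the table.

```python
from math import sqrt

ss_between, df_between = 12733.942, 5
ss_within, df_within = 15297.415, 343

ms_between = ss_between / df_between
ms_within = ss_within / df_within   # this is the squared pooled SD
f_stat = ms_between / ms_within
pooled_sd = sqrt(ms_within)

# Total sample size and number of groups implied by the degrees of freedom.
n_groups = df_between + 1
n_total = df_within + n_groups

print(round(ms_between, 3), round(ms_within, 3), round(f_stat, 3))
print(round(pooled_sd, 3), n_groups, n_total)
```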
Robustness to assumptions: see Section 5.5.1, p. 130. The main distributional assumptions we
need to worry about are:
• Population standard deviations are roughly equal
• There are no extreme outliers; the F-test is not resistant to outliers, particularly with small
samples
We can judge these assumptions from side-by-side dotplots or boxplots of the raw data. Judging
equality of standard deviations is a little easier if we subtract off each group's mean. That is, we
examine the residuals for the full (separate means) model: $Y_{ij} - \bar{Y}_i$. As in regression, we plot the
residuals versus the predicted values. The predicted value for an observation is the group mean.
Judging from this plot, the original boxplots, and the sample standard deviations, there doesn’t
seem to be any reason to doubt the assumptions of the F test.
Examining models between the separate means and the equal means models
Suppose we wanted to examine the model which assumes the two control groups (NP and
N/N85) have the same mean lifetime and the remaining four calorie restricted diets have the
same mean lifetime. The question is: how much of the difference among the means is due
simply to the differences between these two groups of diets?
This is a two-means model that is between the separate means model (with 6 parameters to
describe the means) and the equal means model (with 1 parameter to describe the means).
These three models are said to be nested because each model is a special case of the ones above
it.
We can test the two-means model against the separate means model in SPSS by creating a new
categorical variable which identifies the first two diets as group 1 and the remaining four diets as
group 2. We then run the ANOVA with this new variable as the explanatory variable.
ANOVA
Months survived
                 Sum of Squares   df    Mean Square   F         Sig.
Between Groups   11131.393        1     11131.393     228.556   .000
Within Groups    16899.964        347   48.703
Total            28031.357        348
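This ANOVA table compares the two-means model to the equal means model. We see that the
difference is significant. Now, to compare the two-means model to the separate means model, we
need to use the sums of squares to compute a new F statistic. Recall
\[ F = \frac{(\text{Extra sum of squares}) / (\text{Extra degrees of freedom})}{\hat{\sigma}^2_{\text{full}}} \]
where the extra sum of squares is the residual sum of squares for the reduced model minus the
residual sum of squares for the full model. The two ANOVA tables are repeated here: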
Months survived (separate means model)
                 Sum of Squares   df    Mean Square   F         Sig.
Between Groups   12733.942        5     2546.788      57.104    .00000
Within Groups    15297.415        343   44.599
Total            28031.357        348

Months survived (two-means model)
                 Sum of Squares   df    Mean Square   F         Sig.
Between Groups   11131.393        1     11131.393     228.556   .000
Within Groups    16899.964        347   48.703
Total            28031.357        348
Calculate the F statistic to test the separate means model against the two-means model:
F , =
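One way to carry out this calculation is sketched below in Python. It takes the Within Groups rows of the two ANOVA tables as the residual sums of squares for the full (separate means) and reduced (two-means) models. The variable names are illustrative, and the result is computed here rather than quoted from the notes.

```python
# Residual (Within Groups) sums of squares and degrees of freedom.
rss_full, df_full = 15297.415, 343        # separate means model
rss_reduced, df_reduced = 16899.964, 347  # two-means model

# Extra sum of squares and extra degrees of freedom (6 means versus 2 means).
extra_ss = rss_reduced - rss_full
extra_df = df_reduced - df_full

# Estimate of sigma squared from the full model (squared pooled SD).
sigma2_full = rss_full / df_full

# F-statistic, to be compared with an F distribution on (extra_df, df_full) df.
f_stat = (extra_ss / extra_df) / sigma2_full

print(round(extra_ss, 3), extra_df, round(f_stat, 2))
```

A large value of this statistic relative to the F distribution on 4 and 343 degrees of freedom would suggest that the two-means model does not fully account for the differences among the six diet means.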