where k is the number of contrasts.
The Bonferroni correction is conservative. This means that the total type I error risk will never
exceed α when the Bonferroni correction is used, but may end up being substantially less than α.
If you are only doing a few contrasts, this is not a large problem. However, if you do a lot of
contrasts, the Bonferroni correction may reduce the true type I error risk so drastically that your
tests end up with little power. In fact, you might be better off (have higher power) using one of
the methods for unplanned comparisons, described later.
We will not delve here into the question of how a P is calculated for a contrast. Instead we will
rely on SPSS to calculate the Ps and concentrate on how to interpret them.
Next, we will show how to get SPSS to calculate the P value for a contrast.
This process is a little complex, so bear with me. First, we have to know how SPSS considers
the groups to be ordered. The default is alphabetical. This can be changed, but there is usually
little reason to do so for an unordered categorical variable. Next, we assign numbers to each
group according to the following rules: one side of the contrast gets positive numbers; the other
side gets negative numbers. A group not in the contrast gets a 0. The numbers must add to 0.
The following approach is a simple way to achieve this result:
Assign a number to the groups on each side of the contrast which equals the number of
groups on the other side of the contrast
The groups on one side of the contrast receive positive numbers, the groups on the other
side receive negative numbers
Any group not in the contrast receives a 0
Let us apply these rules to the first contrast (None versus Protestant, Catholic, Jewish)
Here are the groups in alphabetical order, with appropriate numbers assigned
Catholic   Jewish   None   Protestant
    1         1       -3        1
From the perspective of the None side of the contrast, there are three groups on the other side, so
None gets a three. From the perspective of the Protestant, Catholic, Jewish side of the contrast,
there is one group on the other side, so each of these gets a 1. One side is made negative, the
other is positive: it does not matter which is which. We made None negative.
Here are the numbers that result from applying the rules to the second contrast:
Catholic   Jewish   None   Protestant
    1         1        0       -2
From the perspective of the Protestant side of the contrast, there are two groups on the other side,
so it gets a 2. From the Catholic, Jewish side of the contrast, there is one group on the other side,
so they each get a 1. None is not involved in this contrast, so it gets a 0.
One side gets positive numbers, the other gets negative; it does not matter which side is negative
and which positive. We made Protestant negative.
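These coefficient rules are mechanical enough to sketch in a few lines of Python (illustrative only; the function name and argument names are our own, and the group ordering is SPSS's alphabetical default):

```python
def contrast_coefficients(groups, side_a, side_b):
    """Build contrast coefficients by the rules above: each group on one
    side gets the number of groups on the other side, the two sides get
    opposite signs, and uninvolved groups get 0.  The coefficients
    always sum to zero."""
    coefs = []
    for g in groups:
        if g in side_a:
            coefs.append(-len(side_b))   # we arbitrarily make side A negative
        elif g in side_b:
            coefs.append(len(side_a))
        else:
            coefs.append(0)              # group not in the contrast
    assert sum(coefs) == 0, "contrast coefficients must sum to zero"
    return coefs

groups = ["Catholic", "Jewish", "None", "Protestant"]   # SPSS alphabetical order

# First contrast: None versus the three religions
c1 = contrast_coefficients(groups, side_a=["None"],
                           side_b=["Catholic", "Jewish", "Protestant"])

# Second contrast: Protestant versus Catholic and Jewish
c2 = contrast_coefficients(groups, side_a=["Protestant"],
                           side_b=["Catholic", "Jewish"])
```

Running this reproduces the two coefficient sets derived above.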
Next we need to enter the numbers representing these contrasts into SPSS. You may recall that
SPSS is actually driven by commands, and that the menus generate the commands needed to
carry out an analysis. It turns out that contrasts cannot be specified through the menus, except
for certain predefined ones. Instead, we have to access and edit the commands that carry out the
general linear model procedure. Fortunately, we can do most of the work through the menus: we
just need to add a line by hand.
Here is the procedure. Note that the first three steps are just those previously used to set up the
basic ANOVA, and do not actually need to be repeated if you have already performed the
ANOVA. The steps actually required to set up the contrasts begin with the fourth step:
Click Analyze > General Linear Model > Univariate
Click the variable lham_d and then the arrow next to the Dependent Variable box
Click religion$ and then the arrow next to the Fixed Factor(s) box.
Click the Paste button (this opens a window showing the commands that have been
generated so far). This window should contain the following:
UNIANOVA lham_d BY religion$
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/CRITERIA=ALPHA(0.05)
/DESIGN=religion$.
We need to add a line to these commands. The period (.) after religion$ terminates the
command set, so the first step is to erase this period. Then we add our line, so the
command set will read:
UNIANOVA lham_d BY religion$
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/CRITERIA=ALPHA(0.05)
/DESIGN=religion$
/contrast (religion$)=special (1 1 -3 1, 1 1 0 -2).
Note that the added line contains the specifications for the two contrasts we want to
perform. At least one blank space is required between each number, and a comma
separates the two contrasts. Finally, the added line, which is the new final line, must end
with a period. Now, to execute the commands:
Click Run > All
We get the same ANOVA table as before, with this additional output:
Consider the P values for the contrasts (Sig. in SPSS). For the first contrast, P=0.507. Even
without the Bonferroni correction, this would not be significant. However, keep in mind that we
now need P<0.025 to declare a significant difference. For the second contrast, P=0.009. This is
less than 0.025, so the contrast is significant. We have evidence that Protestant differs from
Catholic and Jewish.
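As a sanity check, the Bonferroni decision above can be written out in a few lines of Python (the P values are the Sig. values reported by SPSS):

```python
alpha = 0.05
k = 2                        # number of planned contrasts
threshold = alpha / k        # Bonferroni-adjusted threshold, 0.025

# Sig. values from the SPSS output
p_values = {"contrast 1": 0.507, "contrast 2": 0.009}
significant = {name: p < threshold for name, p in p_values.items()}
# only the second contrast falls below the adjusted threshold
```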
You may have noticed that we haven't actually seen what the means are. This was done
purposely to emphasize that these are a priori contrasts, decided upon before we know the
results. Most likely, we would like to see the means. We can obtain these by adding the
following to the general linear model procedure:
Click Options
Under Display, click in the box next to Descriptive Statistics
Click Continue, and then OK.
Here is the result:
It appears that Protestants are happier (have a lower ham_d score) than the other religious groups
in the data set, but not a lot happier. We note from the output on p6 that R² is 0.031, so
religious affiliation accounts for only about 3% of the variability in Ham-D score. This is an
example of a relationship that is better than chance, but nevertheless quite weak.
You might have noticed that there is a button for Contrasts in the general linear model menu.
This carries out some predefined contrasts that may or may not be useful in a given situation.
The procedure I have described here allows any contrast to be set up, and so is more versatile.
11.32 Unplanned Contrasts
There are many approaches to carrying out unplanned (post hoc) comparisons. We will cover
two of the more useful ones: the Tukey and the Scheffé, named after the statisticians who devised
them.
The Tukey Procedure
The Tukey procedure examines the differences between all pairs of group means. We consider this to be an
unplanned situation because we are looking at all pairs in scattershot fashion; we are not
specifically planning to compare certain pairs ahead of time. Since we are comparing a number of pairs of
means, we need a multiple comparison adjustment to control the type I error.
It would not be wrong to use the Bonferroni correction, dividing the desired alpha level by the
number of pairs of means to be compared. However, the number of pairs can be quite large (six
in our example), and the Bonferroni correction gets quite severe. With many comparisons, it
becomes extremely conservative, resulting in an alpha-level much lower than intended, and
therefore substantially reduced power. The Tukey provides a less severe adjustment, while still
assuring that the overall type I error risk is below the chosen α.
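A quick sketch (plain Python, not part of the SPSS workflow) shows how fast the Bonferroni threshold tightens as the number of groups grows, since g groups yield g(g−1)/2 pairs:

```python
from math import comb

alpha = 0.05
# number of pairwise comparisons among g groups, and the resulting
# Bonferroni per-test threshold alpha / (number of pairs)
thresholds = {g: alpha / comb(g, 2) for g in (4, 6, 10)}
# 4 groups -> 6 pairs (threshold about 0.0083); 10 groups -> 45 pairs
# (threshold about 0.0011), so power suffers accordingly
```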
To carry out the Tukey procedure, you first set up the general linear model, as previously
described. Before clicking the OK button, you need to carry out the following additional steps:
Click the Post Hoc button
Click the independent variable name in the Factor(s) box. (There is only one choice in
this example: religion$). Then click the arrow next to the Post Hoc Tests for box.
From the choice of various test procedures, click the small box next to Tukey. (Note that
Bonferroni is also a choice, but we have already decided against it for this situation.)
Click Continue
Click OK
Here is the output for the Tukey comparisons:
The table gives the pairwise differences between the group means, the standard errors for the
differences, the 95% confidence intervals for the differences, and the probabilities. All of these
have been adjusted for the multiple comparison situation, so they are not the same as what you
would get if you simply performed a t-test on the difference between each pair of means. The
Tukey adjusts the probabilities, rather than the alpha-level, so the probabilities can be compared
directly to the desired alpha level (usually 0.05). Similarly, you can inspect the confidence
intervals, which have been adjusted for the multiple comparison, to see if any exclude 0.
We are perhaps disappointed to see that none of the pairwise differences is significant (all
have P>0.05, and CIs that span 0). This reflects the fact that unplanned comparisons in general
are less powerful than an insightful planned comparison. That is, having anticipated in a planned
comparison that Protestant would differ from the other two religions, we were able to show a
significant difference. When we try to discern this difference on an unplanned basis, we lose
statistical power, and are unable to make a convincing case for a difference.
You might want to refer back to the discussion of power in Chapter 8. One could think of the
difference between unplanned and planned comparisons as similar to using a low power lens
versus a high power lens on a microscope. With the low power lens, you have a wider field of
view, and don't have to know exactly where to look, but you won't see details. With the high
power lens, you must know exactly where to look, but you will resolve smaller details.
The Scheffé Procedure
The Scheffé allows you to go on a hunting trip through the group means, testing any contrast that
catches your eye. Even if you only formally test one contrast, by implication you have looked
through many others, presumably picking one with a particularly large difference. A multiple
comparison correction is therefore required. The Scheffé has the property that if the original
ANOVA (omnibus test) has a significant P, you are guaranteed to be able to find a significant
contrast using the Scheffé. The Scheffé uses the F ratio from a contrast, from which SPSS
calculates P for the contrast, and adjusts this for multiple contrasts.
The Scheffé adjustment is to divide the F ratio by g−1, where g is the number of groups in the x
variable (four in our example). SPSS does not give you the P for this adjusted F, so significance
is most conveniently judged by using the critical value method. Using a table of areas of the F
distribution, we look up the critical value of F that puts 0.05 in the right tail. We reject the null
hypothesis, and declare the contrast significant, if our Scheffé-adjusted F is greater than the
critical value of F. Remember, with the F distribution we consider only the right tail, not the
two tails as with the t distribution, because differences between the means can only increase F.
By way of example, we first need to decide what contrast to test. Looking through the means
(now permitted, since this is a post-hoc test), it might strike us that Catholic, Jewish, and None
all have mean lham_d scores that are quite close together, while Protestant stands out as lower.
This suggests the contrast: Protestant versus Catholic, Jewish, None. It can be coded as:
Catholic   Jewish   None   Protestant
    1         1        1       -3
Using the procedure described previously for performing a contrast, we obtain:
The relevant output is in the table labeled Test Results, where an F ratio is listed (9.243). This F-
ratio would be appropriate only if this were a planned contrast, and the only one we were
performing.
However, this is an unplanned comparison, chosen only after we looked at the results, and by
implication, all possible contrasts. We therefore need a multiple comparison correction.
The Scheffé correction is to divide F by g−1, so:

F_S = F / (g − 1) = 9.243 / (4 − 1) = 3.08
For a critical value test, we need to compare F_S to the critical F, which is F_.95, from a table of the
F distribution. There is such a table at the end of this chapter. First, we confirm from the usual
diagram that accompanies the table, that the values in the table are based on the area to the left of
F, so to have 0.05 in the right tail, we do want F_.95. Because of the complexity of the F table,
separate tables are published for different areas. The back of the chapter has only the table for
F_.95. To enter the table, we need two values for degrees of freedom. You may recall that F is a
ratio, and has separate degrees of freedom for the numerator and denominator components. F_S
has numerator degrees of freedom g−1, and denominator degrees of freedom n−g.
Our sample has four groups, so numerator df is 4 − 1 = 3. It has 292 cases, so denominator df is
292 − 4 = 288. Note you can get the denominator df from the Test Results table as the Error df.
In the F table, numerator df is listed along the top, and denominator df along the side. The table
does not have an entry for 288 df: the choices are 120 or infinity. In practice, they are not much
different. The conservative approach is to use the lower df, giving a critical F_.95 of 2.68.
With the critical value method, we reject H_0 when our test statistic exceeds the critical value, or in this
case, when F_S > F_.95. In our example, 3.08 > 2.68, so we reject the null hypothesis. We conclude
that we have evidence that Protestants have a lower lham_d score than the other religious groups.
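The arithmetic behind this conclusion is easy to verify outside SPSS; here is a minimal Python check, using the F ratio from the Test Results table and the tabled critical value:

```python
F = 9.243        # contrast F ratio from the Test Results table
g = 4            # number of groups in religion$

F_s = F / (g - 1)          # Scheffé-adjusted F, about 3.081
F_crit = 2.68              # tabled F_.95 with (3, 120) df (conservative choice)
reject = F_s > F_crit      # True: the contrast is significant
```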
11.4 Two-Way ANOVA, and Analysis of Covariance (ANCOVA)
In Chapter 10, on multiple regression, we introduced the idea of using two or more
quantitative x-variables to predict y. One might wonder if the same idea applies to categorical x-
variables. Can we use two or more to increase our understanding of y? For that matter, could
one mix categorical and quantitative x's? For example, could one use a model with a categorical
x and a quantitative x?
The answer to both questions is yes. To keep the discussion from getting too complex, we will
restrict ourselves to the situation where the model includes 2 x-variables. If both are categorical,
we call the resulting model a two-way analysis of variance. If one is categorical, and one is
quantitative, we call the model an analysis of covariance, often abbreviated ANCOVA.
The basic principles involved are precisely the same as for multiple regression. A full model
will include two main effects and an interaction term:
y = b₀ + b₁x₁ + b₂x₂ + b₁₂x₁x₂
However, it is quite possible that not all of these terms will be significant. We use the principle
of hierarchical elimination, as described in Chapter 10, to eliminate non-significant terms,
starting with consideration of the interaction term.
Any categorical x will need to be converted to a set of binary x's, as described earlier in this
chapter. However, SPSS will do this automatically for us, so other than telling SPSS that a
variable is categorical, we can set this up precisely as we would a multiple regression.
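For intuition, here is one hypothetical way a categorical variable could be expanded into binary indicators (a sketch only; SPSS's internal coding may differ in details such as its choice of reference category):

```python
def dummy_code(values, categories):
    """Expand a categorical variable into len(categories) - 1 binary
    indicators, treating the last category as the reference level."""
    indicators = categories[:-1]           # one indicator per non-reference category
    return [[1 if v == c else 0 for c in indicators] for v in values]

cats = ["Catholic", "Jewish", "None", "Protestant"]
rows = dummy_code(["Protestant", "Jewish"], cats)
# Protestant (the reference) codes as all zeros; Jewish switches on
# the second indicator
```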
You tell SPSS whether a variable is categorical or quantitative by which box of the general linear
model menu you put it in. Quantitative variables go in the Covariate(s) box. Categorical
variables go in one of the Factor(s) boxes: for now, the Fixed Factor(s) box.
Note on terminology: In ANOVA, the independent variables, which will necessarily be
categorical, are commonly called factors. In ANCOVA, the categorical independent variables
continue to be called factors, while the quantitative independent variables are called covariates.
Staying with the survey4 data set, let us consider a two-way ANOVA with lham_d as the
dependent variable, and sex$ and religion$ as factors. Beginning with the full model, we set this
up in SPSS as follows (if you are continuing from the prior example, the first three steps have
already been done):
Click Analyze > General Linear Model > Univariate
Click lham_d and then the arrow next to the Dependent Variable box
Click religion$ and the arrow next to the Fixed Factor(s) box
Click sex$ and the arrow next to the Fixed Factor(s) box
Click Model
Click religion$ and then the arrow next to the Model box
Click sex$ and then the arrow next to the Model box
Highlight both religion$ and sex$, make sure interaction is selected in the Build term(s)
box, and then click the arrow next to the Model box
Click Continue
Click Options
Highlight religion$, sex$, and religion$*sex$ by holding down the CTRL key and left
clicking each in turn. Then click the arrow next to the Display Means for box
Click Continue
Click OK
The ANOVA table follows:
The overall model tests significant (P=0.036), so there is evidence for some relationship between
lham_d and the factors. The interaction term, however, is not significant (P=0.711), so we need
to remove the term from the Model box, and run the analysis again. Here is the ANOVA table:
Both main effects are significant (P=0.034 for sex$, and P=0.011 for religion$), so this is our
final model. Next we look at the group means:
We have previously seen and commented on the means for the groups in religion$.
The mean depression score is higher for women than for men. This is a well-known result in the
psychological literature.
We note that the R² value for this analysis is 0.051, so while the effects of religion$ and sex$ are
statistically significant, they are not large.
For an example of an ANCOVA, let us use sex$ and income as x-variables (note that income is
quantitative). Here is the procedure (we assume the previous analysis has been cleared):
Click Analyze > General Linear Model > Univariate
Click lham_d and then the arrow next to the Dependent Variable box
Click sex$ and the arrow next to the Fixed Factor(s) box
Click income and the arrow next to the Covariate(s) box
Click Model
Click sex$ and then the arrow next to the Model box
Click income and then the arrow next to the Model box
Highlight both income and sex$, make sure interaction is selected in the Build term(s)
box, and then click the arrow next to the Model box
Click Continue
Click OK
Here is the ANOVA table:
Neither the sex$ term nor the interaction term is significant. Following the hierarchy principle, we
first remove the interaction term from the model, with the following result:
We see that the sex$ term is still not significant, so we remove it. (Note: if you only remove sex$
from the Model box, and leave it in the Fixed Factor(s) box, you will get a warning message
saying that it is not used in the model. This is what we intend, so you can ignore the message.
Alternatively, you can also remove sex$ from the Fixed Factor(s) box.) Since what is left will
be just a simple regression of lham_d on income, we should also request to see the regression
parameters by clicking Options and Parameter estimates. Here is the result:
The income term is significant (P=0.016), so this is the final model.
Notice that this is just a simple regression. The slope for income is negative, so higher incomes
are associated with lower depression scores.
From this analysis, we learn a couple of interesting things:
It may not be quite true that money doesn't buy happiness. People with higher incomes
do have lower depression scores. On the other hand, r² is only 0.02, which is quite small,
so perhaps when subjected to rigorous statistical analysis, the old saw should be revised
to, "Money doesn't buy much happiness."
The well-publicized higher mean depression score of women disappears when income is
taken into account. It is possible that the higher depression score of women is a result of
the lower average income of women, presumably something that society could change,
rather than a biological effect of sex per se.
We do caution that this is an observational study, so we need to be careful about attributing cause
and effect to the relationships. Also, as pointed out, the effect size is modest.
11.5 Model II ANOVA
Up to now, we have been assuming that the groups were defined by fixed, reproducible
differences. For example, we assume that Protestantism, Catholicism, Judaism, and lack of
religious affiliation are more or less constant conditions. There is another situation in anova
where the groups in our data can be thought of as a random sample of a much larger population
of groups. For example, suppose you are the director of a medical laboratory. You are
concerned about the variability in the results of a particular blood test carried out by your
laboratory. One possible source for this variability is differences among the technicians carrying
out the test. To investigate this, you give each technician several blood samples to analyze. In
reality, all of the samples come from a single source, so the true value of whatever is being
measured is the same for all samples. Our interest is in whether there are differences among the
results produced by the different technicians.
We could view this two ways. One is to ask whether there are differences among the specific
technicians currently working in our lab. This would fall under the fixed effects model we have
already examined. The other would be to view the present technicians as a sample of all of the
technicians that could work in our lab. In other words, we are not concerned about whether Sally
is different from John, but about whether differences among technicians in general contribute
appreciably to the variability of our results. If there is an appreciable technician contribution to
the variability in our results, we might want to look at ways of reducing this variability. For
example, we might be able to find a different way to perform the test that is less affected by the
skill or judgment of the technician. Or, we might want to change how we train the technicians,
to make their performance more consistent.
When we treat the group differences as constant and reproducible, we have a fixed effects, or
model I anova. This is the approach we have been looking at so far. When we treat the groups
as a random sample from a population of groups, we have a random effects, or model II, anova.
Two questions that commonly arise in model II anova are: 1) is there significant variability
among the groups, and 2) what proportion of the total variance of y is attributable to variance
among the groups?
The first question is answered for model II anova in the same way as for model I. That is, we
look at the P value associated with the F-test in the anova table. A P-value less than α leads to the
conclusion that the group means are more variable than one would expect by chance. In fact, the
anova P is arguably more useful in model II anova than in model I, because in model II anova,
our explicit interest is in whether there is overall heterogeneity among the group means, while in
model I, our principal interest may well lie in a more specific question.
We can answer the second question by calculating a quantity called the coefficient of intraclass
correlation, which we will symbolize r_I. It has value zero when differences among the groups
make no contribution to the total variation in y, and value one if all the variation in y is
attributable to variation among the groups.
When the groups have equal sample sizes, n, we can calculate r_I in two steps as:
s²_g = (MSR − MSE) / n

r_I = s²_g / (s²_g + MSE)
However, SPSS will do most of the calculations for us.
For an example, we will use the data set BARLEY. It contains data on yields of barley. We will
use the variable y1932 for y, and site$ for x. Y1932 contains barley yields for experimental plots
planted in 1932. SITE$ designates six different sites. There were 10 plots at each site. (The
plots happen to represent different varieties of barley, but we will ignore this fact for the present
analysis.) We want to know whether there is significant variability in yield among the sites, and
what proportion of the total variability is accounted for by site.
First run an ANOVA with the general linear model routine, precisely as one would for a Model I
ANOVA. From this we determine that there is a significant added variance component due to
site (SPSS reports P = 0.000, i.e., P < 0.0005):
To estimate variance components:
Click Analyze > General Linear Model > Variance Components
Click y1932 and then the arrow next to the Dependent Variable box
Click site$ and then the arrow next to the Random Factor(s) box
Click OK
Here is the output:
The error variance, Var(Error), is the same as error mean square from the ANOVA table,
29.671, as it should be (remember, mean square is just another name for variance).
The site variance is new: 68.909. The coefficient of intraclass correlation is the proportion of the
total variance (error variance + site variance) that is site variance:

r_I = s²_g / (s²_g + MSE) = 68.909 / (68.909 + 29.671) = 0.699
Thus, about 70% of the total variance in crop yield is attributable to variation among sites.
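As a check on this computation (plain Python, using the variance components from the SPSS output and n = 10 plots per site), we can also confirm that the two-step mean-square formula gives the same answer:

```python
# Variance components as reported by SPSS:
var_site, var_error = 68.909, 29.671
r_I = var_site / (var_site + var_error)   # coefficient of intraclass correlation

# Equivalent two-step route from the ANOVA mean squares (equal n per group):
n = 10
MSE = var_error
MSR = var_site * n + MSE                  # group mean square implied by the components
s2_g = (MSR - MSE) / n                    # recovers the added variance among sites
assert abs(s2_g / (s2_g + MSE) - r_I) < 1e-9
# r_I comes out to about 0.699, matching the text
```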
Critical Values of the F-distribution for Area = 0.95