

11. Analysis of Variance



We have already seen the analysis of variance used in connection with linear regression. There
is, however, a more classic use of analysis of variance. It can be employed in the situation where
there is a quantitative y variable, and one or more categorical x variables. Often, when the term
analysis of variance is used in a research report, it refers to this application, since historically it
preceded the use of analysis of variance in regression.


11.1 Binary x:

Let us start with a simple case: a quantitative y and an x-variable with two categories (called a
binary x). We can use, as an example, the radish seedling data that we first encountered in
Chapter 8. To recap, the RQ was, does a particular chemical affect the growth of radish
seedlings? Stem lengths were measured for two groups of seedlings, with 20 seedlings in each
group. One group was a control grown on distilled water. The other was an experimental group
grown on water to which the chemical had been added. The experiment was repeated several
times, with different chemicals. The data file stem1 contains the data from one experiment. In
this data file, STEMLEN is a quantitative variable containing the stem length measurements,
while TRTMNT$ is a categorical variable indicating which group each seedling belongs to,
coded C for control, and E for experimental. We originally analyzed these data with a t-test,
with the following result:



Using the equal variance results, as we have been doing, we saw that the difference between the
means is significant (P = 0.001 < 0.05).
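
For reference, the t-test can also be run from syntax. Here is a minimal sketch, assuming the stem1 file is open and the grouping codes are the letters C and E as described above (minor details may differ by SPSS version):

* Independent-samples t-test of stem length by treatment group.
T-TEST GROUPS=trtmnt$('C' 'E')
  /VARIABLES=stemlen.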

It turns out that we can analyze these data as a simple regression problem, and achieve the same
result. As a reminder, the simple regression equation is:

y = b0 + b1x


Our y variable is the quantitative variable STEMLEN. For the x-variable, we would like to use
TRTMNT$, but there is a problem. The x-variable in regression is required to be quantitative. It
turns out there is a simple solution: recode TRTMNT$ so that it is numeric. All we have to do is
assign a numeric value to C and a different numeric value to E. It turns out that it makes no
difference what numbers we use, as long as they are different. One common practice is to use 0
and 1 (say 0 for control group, and 1 for experimental, though it would not matter if we did it the
other way around). If you examine the stem1 data set, you will see that it contains a third
variable, GROUP, which is coded precisely this way:



What we want to do is perform a regression of STEMLEN on GROUP.

Since this is just a simple regression, we could do it using the regression routine, but we are
going to use the general linear model routine instead, since we will need it later anyway. We use
stemlen as the dependent variable, group as the independent variable (in the Covariate(s) box),
and remember to request parameter estimates under Options. Here is the result:



Notice that the P value is exactly the same as it was for the t-test (0.001). Also, the value of t
from the regression is the same as from the t-test, except that the sign is reversed (3.551). We might
even be struck by the fact that b1, the slope of the regression line, is the same as the difference
between the group means displayed for the t-test, again except for a sign reversal (5.8). (We
should mention that this will only be true if 0,1 coding is used for the categorical variable. On the
other hand, the P and t values do not depend on this specific coding.) We also remind you that F
in the ANOVA table = t², as it is for any simple regression.

Evidently, the regression on the recoded x is exactly equivalent to the t-test, a rather remarkable
fact when you consider that the concepts behind the two seem, on the face of it, to be quite
different.
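
For reference, if you click Paste instead of OK in the general linear model dialog, the generated syntax for this run should look something like the following sketch (exact subcommands may differ by SPSS version):

UNIANOVA stemlen WITH group
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /PRINT=PARAMETER
  /CRITERIA=ALPHA(0.05)
  /DESIGN=group.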

As a matter of convenience, SPSS does not require you to recode the data manually, as was done
for this example. One can use the categorical variable trtmnt$ directly. The only trick is to put it
in the Fixed Factor(s) box. Here is the result:



As you can see, you get precisely the same ANOVA table.

What happens here is that SPSS recodes the categorical trtmnt$ variable into a numerical
variable, and then runs a regression, precisely as we did by hand.
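
Again for reference, the pasted syntax for this fixed-factor version would be roughly:

UNIANOVA stemlen BY trtmnt$
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /CRITERIA=ALPHA(0.05)
  /DESIGN=trtmnt$.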

In summary, we can use the regression procedure to produce the same results as we obtained
with the t-test for the difference between two means.


11.2 More Than Two Categories in x

Perhaps you are thinking, "Well, that is all very nice, but since we already had the t-test for
analyzing this situation, have we really gained anything?" It's a good question. For an answer,
let us turn now to the case where the categorical variable has more than two categories. Our RQ
will be: do the groups defined by the categorical variable have different means for the
quantitative variable?

For an example, let us turn to the survey4 data set. It contains data from an observational study
of life factors involved in depression. In this study, 294 volunteers filled out questionnaires
about various aspects of themselves, and took a test called the Hamilton-D. This is a recognized
psychological test of depression. A low Hamilton-D score indicates that the individual is not
depressed, while higher scores indicate increased levels of depression. The Hamilton_D score is
strongly right skewed, so we will work with its log transform, with the variable name lham_d in
the data set. Another variable in the data set is religion$. This is a categorical variable
indicating religious affiliation, coded: Catholic, Protestant, Jewish, None.

In the original data set, there was also a category Other, but there were only two individuals in
this category. Because this is too few on which to base a conclusion, these cases were removed
from the data set.

The RQ is: Does mean Ham-D score vary among the four religions represented in the data set?
The null hypothesis is that they are the same. In symbols:

H0: μC = μP = μJ = μN


The t-test developed in Chapter 8 is applicable only to the situation where there are two groups.
Here we have four. Perhaps it occurs to you that we could do this as a regression, recoding
RELIGION$ numerically. For example:

Catholic 0
Protestant 1
Jewish 2
None 3

Unfortunately, this does not lead to a sensible result. The following graph shows a regression of
the means of the four groups on a numeric x defined as above:
(Figure: regression of the four group mean LDEPRESS values on the numeric RELIGION codes defined above.)
Although this looks like a perfectly good regression, we recall that religion$ is unordered
(Chapter 1). There is no particular reason for listing the religions in the order given, and no
reason to assign the numeric codes the way we assigned them. Suppose, for example, we listed
the religions in alphabetical order, and assigned numeric codes as follows:


Catholic 0
Jewish 1
None 2
Protestant 3

A regression of the group means on an x coded as above looks like this:

(Figure: regression of the same group mean LDEPRESS values on the reassigned numeric RELIGION codes.)

As you can see, it is quite a different regression.

In other words, the result of the regression depends arbitrarily on how we assign the codes. Any
conclusion from this regression would be useless.

Can we solve this problem? Actually, there is a rather ingenious solution. The trick is to recode
the categorical variable, RELIGION$ in this example, as a series of binary numeric x's. We need
one fewer such x's than there are categories in the categorical variable. Since there are four
categories in RELIGION$, we need three binary x's. There are many ways that these x's could
be set up. Here is one way:

             x1   x2   x3
Catholic      0    0    0
Jewish        1    0    0
None          0    1    0
Protestant    0    0    1

As in the case of a single binary variable, it does not matter much how the coding is set up, as
long as there is a unique code for each group, and each variable can take on only two values (0
and 1 in this example).
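
As noted below, SPSS can do this recoding for us, but if you wanted to build the three binary x's by hand, one way (a sketch, assuming the religion$ labels are stored exactly as spelled above) uses the fact that a relational expression in an SPSS COMPUTE statement evaluates to 1 when true and 0 when false:

* Create three binary x's; Catholic is the reference group coded 0 0 0.
COMPUTE x1 = (religion$ = 'Jewish').
COMPUTE x2 = (religion$ = 'None').
COMPUTE x3 = (religion$ = 'Protestant').
EXECUTE.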

We have already seen that a regression on binary x's produces a sensible result. Now, we run a
multiple regression of y on the binary x's:

y = b0 + b1x1 + b2x2 + b3x3

You may wonder if we need interaction terms. The answer is no, because if you multiply any
combination of the x's, you will find the result is always 0, so any interaction terms always have
the value 0, and can be omitted.

As a matter of convenience, we do not actually have to recode the categorical variable by hand.
If we put a categorical variable in the Fixed Factor(s) box, SPSS will perform the recoding
automatically. Let us try this:

Click Analyze > General Linear Model > Univariate
Click the variable lham_d and then the arrow next to the Dependent Variable box
Click religion$ and then the arrow next to the Fixed Factor(s) box.
Click OK

Here is the result:



The P value for the ANOVA is 0.027, which is less than 0.05. We reject the null hypothesis, and
conclude that we have evidence that the mean depression scores of the religions differ.





11.3 Contrasts

The result from the ANOVA is interesting, but it is only the beginning of the story. We now
have evidence that the religions differ with respect to mean depression score. The obvious
question is, what is the nature of the differences? That is, which religions differ from which
others?

We answer this question through a series of comparisons, called contrasts. A contrast compares
one group or set of groups with another group or set of groups. It turns out that there are two
ways to form the contrasts.

We can set up the contrasts before we see the results, ideally before we even collect the data.
We do so on the basis of what we think ahead of time would be interesting to look at. For
example, we might wonder whether people who lack a religious affiliation (None group) differ
from people who have a religious affiliation, in other words, None versus Catholic, Protestant,
Jewish. One might wonder whether the majority religion in the US (Protestant) differs from the
minority religions Catholic and Jewish.

The other way to set up the contrasts is to collect the data and then look at the results, that is, at
the group means. We could then set up contrasts on the basis of differences that look interesting:
presumably large differences. Contrasts arrived at in this way are called post hoc, or unplanned
contrasts. One variation on this idea is to set up contrasts consisting of all pairs of means, and
look for large differences.

These two ways of setting up contrasts require different statistical treatments.


11.31 Planned Contrasts:

Planned contrasts, also called a priori contrasts, are set up on the basis of considerations external
to the data. They are contrasts considered interesting on theoretical grounds, or because of
specific interests of the investigators. As their name implies, they should be established at the
planning stage of an investigation, before the data have been collected.

Generally, planned contrasts should be relatively few in number. As we shall see, there is a
statistical penalty to be paid for each planned contrast, which reduces power. Plan too many, and
the penalty becomes excessive, making it hard to demonstrate a difference in any of the
contrasts.

There are various contrasts that might be planned for the depression score and religion example.
The questions that interest me might not be the same ones that interest you. However, to have an
example, let's carry out the two contrasts mentioned above:
None versus Protestant, Catholic, Jewish
Protestant versus Catholic, Jewish

Now the penalty part. Each time you perform a statistical test, you run a risk, set by α, of
making a type I error when the null hypothesis is true. Normally, this risk is set at 0.05. If you
perform several tests, you run this risk with each test. If you run a bunch of tests, your total type
I error risk is considerably higher than 0.05. In fact, if you run enough tests, you can be almost
certain to make a type I error. This is a serious problem in complex investigations: the chances
of making a type I error for some conclusion of the investigation can be very high. Statisticians
call this the multiple comparison problem.

So what should we do? The answer is that we adjust α downward for each test, so that the total
type I error risk for all the tests together is held to no more than 0.05. There are various ways to
do this. One method is called the Bonferroni correction, after the Italian statistician who
proposed it. The method is simple. You divide your desired α by the number of tests (contrasts)
you intend to do. The resulting quantity is the Bonferroni-corrected α, symbolized αB. Each
contrast is declared significant only if its P is less than αB. Since we are doing two contrasts,
αB is calculated as:

αB = α/k = 0.05/2 = 0.025

where k is the number of contrasts.

The Bonferroni correction is conservative. This means that the total type I error risk will never
exceed α when the Bonferroni correction is used, but may end up being substantially less than α.
If you are only doing a few contrasts, this is not a large problem. However, if you do a lot of
contrasts, the Bonferroni correction may reduce the true type I error risk so drastically that your
tests end up with little power. In fact, you might be better off (have higher power) using one of
the methods for unplanned comparisons, described later.

We will not delve here into the question of how a P is calculated for a contrast. Instead we will
rely on SPSS to calculate the Ps and concentrate on how to interpret them.

Next, we will show how to get SPSS to calculate the P value for a contrast.

This process is a little complex, so bear with me. First, we have to know how SPSS considers
the groups to be ordered. The default is alphabetical. This can be changed, but there is usually
little reason to do so for an unordered categorical variable. Next, we assign numbers to each
group according to the following rules: one side of the contrast gets positive numbers; the other
side gets negative numbers. A group not in the contrast gets a 0. The numbers must add to 0.
The following approach is a simple way to achieve this result:
Assign a number to the groups on each side of the contrast which equals the number of
groups on the other side of the contrast
The groups on one side of the contrast receive positive numbers, the groups on the other
side receive negative numbers
Any group not in the contrast receives a 0

Let us apply these rules to the first contrast (None versus Protestant, Catholic, Jewish)

Here are the groups in alphabetical order, with appropriate numbers assigned

Catholic Jewish None Protestant
1 1 -3 1

From the perspective of the None side of the contrast, there are three groups on the other side, so
None gets a three. From the perspective of the Protestant, Catholic, Jewish side of the contrast,
there is one group on the other side, so each of these gets a 1. One side is made negative, the
other is positive: it does not matter which is which. We made None negative.

Here are the numbers that result from applying the rules to the second contrast:

Catholic Jewish None Protestant
1 1 0 -2

From the perspective of the Protestant side of the contrast, there are two groups on the other side,
so it gets a 2. From the Catholic, Jewish side of the contrast, there is one group on the other side,
so they each get a 1. None is not involved in this contrast, so it gets a 0.

One side gets positive numbers, the other gets negative; it does not matter which side is negative
and which positive. We made Protestant negative.

Next we need to enter the numbers representing these contrasts into SPSS. You may recall that
SPSS is actually driven by commands, and that the menus generate the commands needed to
carry out an analysis. It turns out that contrasts cannot be specified through the menus, except
for certain predefined ones. Instead, we have to access and edit the commands that carry out the
general linear model procedure. Fortunately, we can do most of the work through the menus: we
just need to add a line by hand.

Here is the procedure. Note that the first three steps are just those previously used to set up the
basic ANOVA, and do not actually need to be repeated if you have already performed the
ANOVA. The steps actually required to set up the contrasts begin with the fourth step:

Click Analyze > General Linear Model > Univariate
Click the variable lham_d and then the arrow next to the Dependent Variable box
Click religion$ and then the arrow next to the Fixed Factor(s) box.
Click the Paste button (this opens a window showing the commands that have been
generated so far). This window should contain the following:

UNIANOVA lham_d BY religion$
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/CRITERIA=ALPHA(0.05)
/DESIGN=religion$.

We need to add a line to these commands. The period (.) after religion$ terminates the
command set, so the first step is to erase this period. Then we add our line, so the
command set will read:


UNIANOVA lham_d BY religion$
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/CRITERIA=ALPHA(0.05)
/DESIGN=religion$
/contrast (religion$)=special (1 1 -3 1, 1 1 0 -2).

Note that the added line contains the specifications for the two contrasts we want to
perform. At least one blank space is required between each number, and a comma
separates the two contrasts. Finally, the added line, which is the new final line, must end
with a period. Now, to execute the commands:

Click Run > All

We get the same ANOVA table as before, with this additional output:



Consider the P values for the contrasts (Sig. in SPSS). For the first contrast, P=0.507. Even
without the Bonferroni correction, this would not be significant. However, keep in mind that we
now need P<0.025 to declare a significant difference. For the second contrast, P=0.009. This is
less than 0.025, so the contrast is significant. We have evidence that Protestant differs from
Catholic and Jewish.

You may have noticed that we haven't actually seen what the means are. This was done
purposely to emphasize that these are a priori contrasts, decided upon before we know the
results. Most likely, we would like to see the means. We can obtain these by adding the
following to the general linear model procedure:
Click Options
Under Display, click in the box next to Descriptive Statistics
Click Continue, and then OK.

Here is the result:


It appears that Protestants are happier (have a lower ham_d score) than the other religious groups
in the data set. But not a lot happier. We note from the earlier ANOVA output that R² is 0.031, so
religious affiliation accounts for only about 3% of the variability in Ham-D score. This is an
example of a relationship that is better than chance, but nevertheless quite weak.

You might have noticed that there is a button for Contrasts in the general linear model menu.
This carries out some predefined contrasts that may or may not be useful in a given situation.
The procedure I have described here allows any contrast to be set up, and so is more versatile.


11.32 Unplanned Contrasts

There are many approaches to carrying out unplanned (post hoc) comparisons. We will cover
two of the more useful ones: the Tukey and the Scheffé, named after the statisticians who devised
them.

The Tukey Procedure:

The Tukey examines the differences between all pairs of group means. We consider this to be an
unplanned situation because we are taking a scattershot look at all pairs. We are not specifically
planning to compare certain pairs ahead of time. Since we are comparing a number of pairs of
means, we need a multiple comparison adjustment to control the type I error.

It would not be wrong to use the Bonferroni correction, dividing the desired alpha level by the
number of pairs of means to be compared. However, the number of pairs can be quite large (six
in our example), and the Bonferroni correction gets quite severe. With many comparisons, it
becomes extremely conservative, resulting in an alpha-level much lower than intended, and
therefore substantially reduced power. The Tukey provides a less severe adjustment, while still
assuring that the overall type I error risk is below the chosen α.

To carry out the Tukey procedure, you first set up the general linear model, as previously
described. Before clicking the OK button, you need to carry out the following additional steps:
Click the Post Hoc button
Click the independent variable name in the Factor(s) box. (There is only one choice in
this example: religion$). Then click the arrow next to the Post Hoc Tests for box.
From the choice of various test procedures, click the small box next to Tukey. (Note that
Bonferroni is also a choice, but we have already decided against it for this situation.)
Click Continue
Click OK
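
If you prefer syntax, clicking Paste instead of OK should produce commands roughly like the following sketch (defaults may vary by SPSS version):

UNIANOVA lham_d BY religion$
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=religion$(TUKEY)
  /CRITERIA=ALPHA(0.05)
  /DESIGN=religion$.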
Here is the output for the Tukey comparisons:



The table gives the pairwise differences between the group means, the standard errors for the
differences, the 95% confidence intervals for the differences, and the probabilities. All of these
have been adjusted for the multiple comparison situation, so they are not the same as what you
would get if you simply performed a t-test on the difference between each pair of means. The
Tukey adjusts the probabilities, rather than the alpha-level, so the probabilities can be compared
directly to the desired alpha level (usually 0.05). Similarly, you can inspect the confidence
intervals, which have been adjusted for the multiple comparison, to see if any exclude 0.

We are perhaps disappointed to see that none of the pairwise differences is significant. (All
have P>0.05, and CIs that span 0.) This reflects the fact that unplanned comparisons in general
are less powerful than an insightful planned comparison. That is, having anticipated in a planned
comparison that Protestant would differ from the other two religions, we were able to show a
significant difference. When we try to discern this difference on an unplanned basis, we lose
statistical power, and are unable to make a convincing case for a difference.

You might want to refer back to the discussion of power in Chapter 8. One could think of the
difference between unplanned and planned comparisons as similar to using a low power lens
versus a high power lens on a microscope. With the low power lens, you have a wider field of
view, and don't have to know exactly where to look, but you won't see details. With the high
power lens, you must know exactly where to look, but you will resolve smaller details.


The Scheffé Procedure

The Scheffé allows you to go on a hunting trip through the group means, testing any contrast that
catches your eye. Even if you only formally test one contrast, by implication you have looked
through many others, presumably picking one with a particularly large difference. A multiple
comparison correction is therefore required. The Scheffé has the property that if the original
ANOVA (omnibus test) has a significant P, you are guaranteed to be able to find a significant
contrast using the Scheffé. The Scheffé uses the F ratio from a contrast, from which SPSS
calculates P for the contrast, and adjusts this for multiple contrasts.

The Scheffé adjustment is to divide the F ratio by g-1, where g is the number of groups in the x
variable (four in our example). SPSS does not give you the P for this adjusted F, so significance
is most conveniently judged by using the critical value method. Using a table of areas of the F-
distribution, we look up the critical value of F that puts 0.05 in the right tail. We reject the null
hypothesis, and declare the contrast significant if our Scheffé-adjusted F is greater than the
critical value for F. Remember, with the F distribution, we consider only the right tail, not the
two tails as with the t distribution, because differences between the means can only increase F.

By way of example, we first need to decide what contrast to test. Looking through the means
(now permitted, since this is a post-hoc test), it might strike us that Catholic, Jewish, and None
all have mean lham_d scores that are quite close together, while Protestant stands out as lower.
This suggests the contrast: Protestant versus Catholic, Jewish, None. It can be coded as:

Catholic Jewish None Protestant
1 1 1 -3
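
In syntax form, this single contrast would be specified by editing the SPECIAL line used earlier, something like:

UNIANOVA lham_d BY religion$
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /CRITERIA=ALPHA(0.05)
  /DESIGN=religion$
  /contrast (religion$)=special (1 1 1 -3).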

Using the procedure described previously for performing a contrast, we obtain:


The relevant output is in the table labeled Test Results, where an F ratio is listed (9.243). This F-
ratio would be appropriate only if this were a planned contrast, and the only one we were
performing.

However, this is an unplanned comparison, chosen only after we looked at the results, and by
implication, all possible contrasts. We therefore need a multiple comparison correction.

The Scheffé correction is to divide F by g-1, so:

FS = F / (g-1) = 9.243 / (4-1) = 3.08

For a critical value test, we need to compare FS to the critical F, which is F.95, from a table of the
F distribution. There is such a table at the end of this chapter. First, we confirm from the usual
diagram that accompanies the table that the values in the table are based on the area to the left of
F, so to have 0.05 in the right tail, we do want F.95. Because of the complexity of the F table,
separate tables are published for different areas. The back of the chapter has only the table for
F.95. To enter the table, we need two values for degrees of freedom. You may recall that F is a
ratio, and has separate degrees of freedom for the numerator and denominator components. FS
has numerator degrees of freedom g-1, and denominator degrees of freedom n-g.

Our sample has four groups, so numerator df is 4-1 = 3. It has 292 cases, so denominator df is
292-4 = 288. Note that you can get the denominator df from the Test Results table as the Error df.

In the F table, numerator df is listed along the top, and denominator df along the side. The table
does not have an entry for 288 df: the choices are 120 or infinity. In practice, they are not much
different. The conservative approach is to use the lower df, for a critical F.95 of 2.68.
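
As an alternative to the printed table, SPSS can compute the critical value directly with its inverse F function. A minimal sketch (it simply adds a column holding the value to the active data set):

* 0.95 quantile of the F distribution with 3 and 288 df.
COMPUTE fcrit = IDF.F(0.95, 3, 288).
EXECUTE.

The exact value for 288 denominator df lies between the tabled entries for 120 df and for infinite df, so the conclusion below is the same either way.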

For the critical value method, we reject H0 when our test statistic exceeds the critical value, or in this
case, when FS > F.95. In our example, 3.08 > 2.68, so we reject the null hypothesis. We conclude
that we have evidence that Protestants have a lower lham_d score than the other religious groups.


11.4 Two-Way ANOVA, and Analysis of Covariance (ANCOVA)

In Chapter 10, on multiple regression, we introduced the idea of using two or more
quantitative x-variables to predict y. One might wonder if the same idea applies to categorical x-
variables. Can we use two or more to increase our understanding of y? For that matter, could
one mix categorical and quantitative x's? For example, could one use a model with a categorical
x and a quantitative x?

The answer to both questions is yes. To keep the discussion from getting too complex, we will
restrict ourselves to the situation where the model includes two x-variables. If both are categorical,
we call the resulting model a two-way analysis of variance. If one is categorical, and one is
quantitative, we call the model an analysis of covariance, often abbreviated ANCOVA.

The basic principles involved are precisely the same as for multiple regression. A full model
will include two main effects and an interaction term:


y = b0 + b1x1 + b2x2 + b12x1x2

However, it is quite possible that not all of these terms will be significant. We use the principle
of hierarchical elimination, as described in Chapter 10, to eliminate non-significant terms,
starting with consideration of the interaction term.

Any categorical x will need to be converted to a set of binary x's, as described earlier in this
chapter. However, SPSS will do this automatically for us, so other than telling SPSS that a
variable is categorical, we can set this up precisely as we would a multiple regression.

You tell SPSS whether a variable is categorical or quantitative by which box of the general linear
model menu you put it in. Quantitative variables go in the Covariate(s) box. Categorical
variables go in one of the Factor(s) boxes: for now, the Fixed Factor(s) box.

Note on terminology: In ANOVA, the independent variables, which will necessarily be
categorical, are commonly called factors. In ANCOVA, the categorical independent variables
continue to be called factors, while the quantitative independent variables are called covariates.

Staying with the survey4 data set, let us consider a two-way ANOVA with lham_d as the
dependent variable, and sex$ and religion$ as factors. Beginning with the full model, we set this
up in SPSS as follows (if you are continuing from the prior example, the first three steps have
already been done; a sketch of the equivalent syntax follows the steps):
Click Analyze > General Linear Model > Univariate
Click lham_d and then the arrow next to the Dependent Variable box
Click religion$ and the arrow next to the Fixed Factor(s) box
Click sex$ and the arrow next to the Fixed Factor(s) box
Click Model
Click religion$ and then the arrow next to the Model box
Click sex$ and then the arrow next to the Model box
Highlight both religion$ and sex$, make sure interaction is selected in the Build term(s)
box, and then click the arrow next to the Model box
Click Continue
Click Options
Highlight religion$, sex$, and religion$*sex$ by holding down the CTRL key and left
clicking each in turn. Then click the arrow next to the Display Means for box
Click Continue
Click OK
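
Here is the promised syntax sketch for this run (the Display Means requests become EMMEANS subcommands; details may vary by SPSS version):

UNIANOVA lham_d BY religion$ sex$
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /EMMEANS=TABLES(religion$)
  /EMMEANS=TABLES(sex$)
  /EMMEANS=TABLES(religion$*sex$)
  /CRITERIA=ALPHA(0.05)
  /DESIGN=religion$ sex$ religion$*sex$.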

The ANOVA table follows:



The overall model tests significant (P=0.036), so there is evidence for some relationship between
lham_d and the factors. The interaction term, however, is not significant (P=0.711), so we need
to remove the term from the Model box, and run the analysis again. Here is the ANOVA table:



Both main effects are significant (P=0.034 for sex$, and P=0.011 for religion$), so this is our
final model. Next we look at the group means:



We have previously seen and commented on the means for the groups in religion$.

The mean depression score is higher for women than for men. This is a well-known result in the
psychological literature.

We note that the R² value for this analysis is 0.051, so while the effects of religion$ and sex$ are
statistically significant, they are not large.

For an example of an ANCOVA, let us use sex$ and income as x-variables (note that income is
quantitative). Here is the procedure (we assume the previous analysis has been cleared; a sketch of the equivalent syntax follows the steps):
Click Analyze > General Linear Model > Univariate
Click lham_d and then the arrow next to the Dependent Variable box
Click sex$ and the arrow next to the Fixed Factor(s) box
Click income and the arrow next to the Covariate(s) box
Click Model
Click sex$ and then the arrow next to the Model box
Click income and then the arrow next to the Model box
Highlight both income and sex$, make sure interaction is selected in the Build term(s)
box, and then click the arrow next to the Model box
Click Continue
Click OK
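
Here is the promised sketch of the pasted syntax for the full ANCOVA model (the covariate appears after WITH, and the factor-by-covariate interaction is listed on the DESIGN line):

UNIANOVA lham_d BY sex$ WITH income
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /CRITERIA=ALPHA(0.05)
  /DESIGN=sex$ income sex$*income.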

Here is the ANOVA table:



Neither the sex$ term nor the interaction term is significant. Following the hierarchy principle, we
first remove the interaction term from the model, with the following result:



We see that the sex$ term is still not significant, so we remove it. (Note: if you only remove sex$
from the Model box, and leave it in the Fixed Factor(s) box, you will get a warning message
saying that it is not used in the model. This is what we intend, so you can ignore the message.
Alternatively, you can also remove sex$ from the Fixed Factor(s) box.) Since what is left will
be just a simple regression of lham_d on income, we should also request to see the regression
parameters by clicking Options and Parameter estimates. Here is the result:



The income term is significant (P=0.016), so this is the final model.

Notice that this is just a simple regression. The slope for income is negative, so higher incomes
are associated with lower depression scores.

From this analysis, we learn a couple of interesting things:

It may not be quite true that money doesn't buy happiness. People with higher incomes
do have lower depression scores. On the other hand, r² is only 0.02, which is quite small,
so perhaps when subjected to rigorous statistical analysis, the old saw should be revised
to, "Money doesn't buy much happiness."
The well-publicized higher mean depression score of women disappears when income is
taken into account. It is possible that the higher depression score of women is a result of
the lower average income of women, presumably something that society could change,
rather than a biological effect of sex per se.

We do caution that this is an observational study, so we need to be careful about attributing cause
and effect to the relationships. Also, as pointed out, the effect size is modest.

11.5 Model II ANOVA:

Up to now, we have been assuming that the groups were defined by fixed, reproducible
differences. For example, we assume that Protestantism, Catholicism, Judaism, and lack of
religious affiliation are more or less constant conditions. There is another situation in anova
where the groups in our data can be thought of as a random sample of a much larger population
of groups. For example, suppose you are the director of a medical laboratory. You are
concerned about the variability in the results of a particular blood test carried out by your
laboratory. One possible source for this variability is differences among the technicians carrying
out the test. To investigate this, you give each technician several blood samples to analyze. In
reality, all of the samples come from a single source, so the true value of whatever is being
measured is the same for all samples. Our interest is in whether there are differences among the
results produced by the different technicians.

We could view this in two ways. One is to ask whether there are differences among the specific
technicians currently working in our lab. This would fall under the fixed effects model we have
already examined. The other would be to view the present technicians as a sample of all of the
technicians that could work in our lab. In other words, we are not concerned about whether Sally
is different from John, but about whether differences among technicians in general contribute
appreciably to the variability of our results. If there is an appreciable technician contribution to
the variability in our results, we might want to look at ways of reducing this variability. For
example, we might be able to find a different way to perform the test that is less affected by the
skill or judgment of the technician. Or, we might want to change how we train the technicians,
to make their performance more consistent.

When we treat the group differences as constant and reproducible, we have a fixed effects, or
model I anova. This is the approach we have been looking at so far. When we treat the groups
as a random sample from a population of groups, we have a random effects, or model II, anova.

Two questions that commonly arise in model II anova are: 1) is there significant variability
among the groups, and 2) what proportion of the total variance of y is attributable to variance
among the groups?

The first question is answered for model II anova in the same way as for model I. That is, we
look at the P value associated with the F-test in the anova table. A P-value less than α leads to the
conclusion that the group means are more variable than one would expect by chance. In fact, the
anova P is arguably more useful in model II anova than in model I, because in model II anova,
our explicit interest is in whether there is overall heterogeneity among the group means, while in
model I, our principal interest may well lie in a more specific question.

We can answer the second question by calculating a quantity called the coefficient of intraclass
correlation, which we will symbolize rI. It has value zero when differences among the groups
make no contribution to the total variation in y, and value one if all the variation in y is
attributable to variation among the groups.

When the groups have equal sample sizes, n, we can calculate rI in two steps as:

sg² = (MSR - MSE) / n

rI = sg² / (sg² + MSE)

However, SPSS will do most of the calculations for us.

For an example, we will use the data set BARLEY. It contains data on yields of barley. We will
use the variable y1932 for y, and site$ for x. Y1932 contains barley yields for experimental plots
planted in 1932. SITE$ designates six different sites. There were 10 plots at each site. (The
plots happen to represent different varieties of barley, but we will ignore this fact for the present
analysis.) We want to know whether there is significant variability in yield among the sites, and
what proportion of the total variability is accounted for by site.

First run an ANOVA with the general linear model routine, precisely as one would for a Model I
ANOVA. From this we determine that there is a significant added variance component due to
site (P = 0.000):



To estimate variance components:
Click Analyze > General Linear Model > Variance Components
Click y1932 and then the arrow next to the Dependent Variable box
Click site$ and then the arrow next to the Random Factor(s) box
Click OK
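
The pasted syntax for this would be roughly as follows (a sketch; the default estimation method may differ by SPSS version):

VARCOMP y1932 BY site$
  /RANDOM=site$
  /DESIGN=site$.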

Here is the output:


The error variance, Var(Error), is the same as error mean square from the ANOVA table,
29.671, as it should be (remember, mean square is just another name for variance).

The site variance is new: 68.909. The coefficient of intraclass correlation is the proportion of the
total variance (error variance + site variance) that is site variance:

rI = sg² / (sg² + MSE) = 68.909 / (68.909 + 29.671) = 0.699

Thus, about 70% of the total variance in crop yield is attributable to variation among sites.
Critical Values of the F-distribution for Area = 0.95



