
ANALYSIS OF VARIANCE

Basic concepts

YOSNI BAKAR
STAB 2004

Before we proceed to discuss the nitty-gritty of experimental designs, let us ask ourselves a couple of basic questions.

ANALYSIS OF VARIANCE

QUESTION 1
Why not use t-tests to compare more than two groups?

Recall that when we want to compare two groups, we use the t-test

t = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

compared against the critical value t_{\alpha/2,\, df}.

The number of t-tests needed to compare all possible pairs would be

\frac{k(k-1)}{2}

where k equals the number of groups.

Imagine we have a design with three groups: G1, G2, G3.

We will have to run three separate t-tests:
compare G1 with G2,
compare G1 with G3, and
compare G2 with G3.

For every test we use a general α-level of 0.05.
Reason: the family-wise error rate.
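As a concrete illustration, here is a minimal sketch of those three pairwise t-tests in Python (the group data are hypothetical, invented for this example):

from itertools import combinations
from scipy import stats

groups = {
    "G1": [74, 76, 75],
    "G2": [78, 80, 79],
    "G3": [70, 72, 71],
}

# k(k-1)/2 = 3 pairwise comparisons for k = 3 groups,
# each tested at alpha = 0.05
for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
    t, p = stats.ttest_ind(a, b)  # pooled-variance two-sample t-test
    print(f"{name_a} vs {name_b}: t = {t:.2f}, p = {p:.4f}")

Running each test at α = 0.05 is exactly what creates the problem described next.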

ANOVA: general concepts


α-level

An α-level of 0.05 means a 5% possibility of making a Type I error, i.e. rejecting H0 when H0 is actually true.

Decision      H0 true             H0 false
Accept H0     Correct (1 − α)     Type II error (β)
Reject H0     Type I error (α)    Correct (1 − β)

There is a 95% possibility of NOT making a Type I error.

Our aim is to keep the possibility of making a Type I error low.

If we were to run 3 separate t-tests to compare G1, G2 and G3, each with an α-level of 0.05, the overall possibility of NOT making a Type I error would be 0.857, that is
(0.95)^3 = 0.857
Subtracting that from the overall possibility (1 = 100%):
1 − 0.857 = 0.143
Thus, we have about a 14% possibility of making a Type I error.
14% is much larger than the usual 5%.

FER: The family-wise error rate

FER = 1 − (0.95)^n

where n is the number of tests that have to be carried out.
The larger the number of tests that have to be carried out, the larger the possibility of making a Type I error.
Example: with 4 groups there are 6 pairwise tests (1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, 3 vs 4):
1 − (0.95)^6 = 0.27
A 27% possibility of making a Type I error!
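The FER is easy to compute for any number of groups; this sketch assumes each pairwise test uses α = 0.05:

def family_wise_error_rate(k, alpha=0.05):
    n_tests = k * (k - 1) // 2           # number of pairwise t-tests
    return 1 - (1 - alpha) ** n_tests    # FER = 1 - (1 - alpha)^n

for k in (3, 4, 5):
    print(k, round(family_wise_error_rate(k), 3))
# 3 -> 0.143, 4 -> 0.265, 5 -> 0.401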

The family-wise error rate is the reason why ANOVA is used instead of multiple separate t-tests.

ANOVA tells us whether there is an overall difference among the groups, but it does not say which specific groups differ.

ANOVA: general concepts


QUESTION 2
Why analyze variances in order to derive conclusions about the means?
To answer the question, we need to understand a few concepts pertaining to analysis of variance:

Measures of variation: sums of squares & mean squares
Partitioning of variation
F-tests

ANOVA: general concepts


Basic premise
Comparing means
We want to know if the observed differences in sample means are likely to have occurred by chance, just because of random sampling.

[Figure: two scenarios: difference in means small relative to overall variability, versus difference in means large relative to overall variability]

This will likely depend on both the difference between the sample means and how much variability there is within each sample.

ANOVA: general concepts


Basic premise
Consider the following two experiments to examine the yields of three different varieties of wheat. In both experiments, nine plots of land were randomized to three different varieties (three plots for each variety) and the yield was measured at the end of the season.

Look at the variability within each group as compared to the variability among the group means to ascertain whether there is evidence of a difference in the group population means.

ANOVA: general concepts


Measures of variation

Recall the sample variance:

s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\text{sum of squares}}{\text{degrees of freedom}}

The distance from any data point to the mean is the deviation of that point from the mean:

X - \bar{X}

Deviations will be both positive and negative, and their sum will be zero.

ANOVA: general concepts


Measures of variation

Variation

We can measure variation as the sum of the squares of the deviations between each value and the mean of the values:

SS = \sum (X - \bar{X})^2
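A quick numerical check of both points, on a small made-up sample:

xs = [72, 75, 78, 81, 74]
mean = sum(xs) / len(xs)                # 76.0

deviations = [x - mean for x in xs]
print(sum(deviations))                  # 0.0: deviations cancel out

ss = sum(d ** 2 for d in deviations)    # SS = 50.0
variance = ss / (len(xs) - 1)           # SS / df = 12.5
print(ss, variance)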

ANOVA: general concepts


Partitioning of variation: Total Variation
Look at the figure in more detail:

[Figure: Experiment I, deviations of the observations from the grand mean; grand mean = 75]

Sum of (deviations)^2 = sum of squares (SS)

ANOVA: general concepts


Partitioning of variation

Are all of the values identical?
No, so there is some variation in the data.
This is a measure of the total variability in the dataset.
It is called the total variation: the sum of squares of the deviations of the data around the grand mean,

SS_{total} = \sum_{i=1}^{k} \sum_{j=1}^{r_i} (Y_{ij} - \bar{Y})^2

where \bar{Y} is the grand mean.

ANOVA: general concepts


Partitioning of variation: Between groups

Are all of the sample means identical?
No, so there is some variation between the groups.
This is called the between-group variation, measured as the sum of squares of the deviations of the group means from the grand mean, weighted by the sample size of each group:

SS_{Between} = \sum_{i=1}^{k} r_i (\bar{Y}_i - \bar{Y})^2

where \bar{Y}_i is the mean of group i and r_i its sample size.

ANOVA: general concepts


Partitioning of variation
Next: Experiment I

[Figure: Experiment I, deviations of the group means (A, B, C) from the grand mean; grand mean = 75]

ANOVA: general concepts


Partitioning of variation: Within group

Are all of the values within each group identical?
No, there is some variation within the groups.
This is called the within-group variation, or the sum of squares of the deviations of the data around their separate group means.
It is sometimes called the error variation:

SS_{within} = SS_{error} = \sum_{i=1}^{k} \sum_{j=1}^{r_i} (Y_{ij} - \bar{Y}_i)^2

ANOVA: general concepts


Partitioning of variation
Next: Experiment I

[Figure: Experiment I, deviations of the observations from their group means (A, B, C); grand mean = 75]

ANOVA: general concepts


Partitioning of variation summary

Variability is measured in terms of sums of squares because these three quantities have a simple relationship:

Variation in all observations = variation of each group mean from the grand mean + variation of each observation from its group mean

Total sum of squares = between-groups sum of squares (explained variation) + within-group sum of squares (unexplained variation)

SST = SSB + SSE

So the total variability (SST) has been divided into two components: that due to differences between plots given different treatments (SSB), and that due to differences between plots given the same treatment (SSE).
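This identity is easy to verify numerically; the sketch below uses a small hypothetical dataset of three groups with three observations each, mirroring the nine-plot experiment:

data = {
    "A": [73, 75, 74],
    "B": [84, 86, 85],
    "C": [64, 66, 65],
}

all_values = [y for ys in data.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)

sst = sum((y - grand_mean) ** 2 for y in all_values)
ssb = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
          for ys in data.values())
sse = sum((y - sum(ys) / len(ys)) ** 2
          for ys in data.values() for y in ys)

print(sst, ssb, sse)   # 608.0 = 602.0 + 6.0 (up to floating-point rounding)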

ANOVA: general concepts


Partitioning of variation summary

[Diagram: total SS split into treatment (between-groups) SS and error (within-groups) SS]

ANOVA: general concepts


Partitioning of variation

A comparison of SSB and SSE is going to indicate whether fitting the three fertilizer means accounts for a significant amount of variability in the data.
However, for a valid comparison between these two sources of variability, we need to compare the variability per degree of freedom, i.e. the variances.

ANOVA: general concepts


Partitioning of variation

Variation: MS

Combining the information on SS and df, we can arrive at a measure of variability per df:

s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\text{sum of squares}}{\text{degrees of freedom}}

This is equivalent to a variance, and in the context of ANOVA it is called a mean square (MS). For a one-way ANOVA with k groups and N observations in total, MSB = SSB/(k − 1) and MSE = SSE/(N − k).
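Continuing the numbers from the partitioning sketch above (a hypothetical balanced design with k = 3 groups and N = 9 observations):

k, N = 3, 9
ssb, sse = 602.0, 6.0  # sums of squares from the sketch above
msb = ssb / (k - 1)    # between-groups mean square: 301.0
mse = sse / (N - k)    # within-groups (error) mean square: 1.0
print(msb, mse)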

ANOVA: general concepts


Hypothesis testing
In ANOVA we ask: is there truly a difference in means among the groups? Formally, we can specify the null hypothesis:

H0: τ1 = τ2 = τ3 = 0

If the fertilizer means are equal, it implies that all τi = 0. Remember τi = (μi − μ), therefore this is equivalent to

H0: μ1 = μ2 = μ3

In words,
H0: The means of all groups are equal.
The alternative hypothesis is

Ha: not all τi equal 0

which is equivalent to

Ha: at least one of the μi's is different

or Ha: The (population) means are not all equal. (Note: this is different from saying they are all unequal!)

Compare the two sources of variability, MSB and MSE.

Our test statistic is

F_{obs} = \frac{MSB}{MSE} = \frac{\text{variance between groups}}{\text{variance within groups}}

If F_obs is small (close to 1), then variability between groups is negligible compared to variation within groups.
→ The grouping does not explain much variation in the data, i.e. the groups are similar.

If F_obs is large, then variability between groups is large compared to variation within groups.
→ The grouping explains a lot of the variation in the data, i.e. the three groups are very different.

The F-ratio can be quite large even when there are no treatment differences.
At what point do we decide that the size of the F-ratio is due to treatment differences rather than chance?
Again, set α = 0.05.

Decision rule for the ANOVA test statistic:

Reject H0 if F_{obs} ≥ F_{α; k−1, N−k}
Fail to reject H0 if F_{obs} < F_{α; k−1, N−k}

We are using a one-sided rejection region: the F-test is always a one-tailed test. Since F is a ratio of two variances, it can never be less than 0, so the rejection region lies entirely in the upper tail, beyond the critical value.

The value of F_{α; k−1, N−k} is found from an F-table or from software, as sketched below.
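A minimal sketch of the decision in Python, assuming scipy is available and reusing the hypothetical mean squares from above:

from scipy import stats

alpha, k, N = 0.05, 3, 9
msb, mse = 301.0, 1.0                          # mean squares from above
f_obs = msb / mse
f_crit = stats.f.ppf(1 - alpha, k - 1, N - k)  # F(0.05; 2, 6) ≈ 5.14
print("reject H0" if f_obs >= f_crit else "fail to reject H0")

# Equivalently, scipy.stats.f_oneway runs the whole one-way ANOVA:
f, p = stats.f_oneway([73, 75, 74], [84, 86, 85], [64, 66, 65])
print(f, p)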

ANOVA: general concepts


F-table
[Table: critical values of F_{α; df1, df2}]

ANOVA: general concepts


Multiple comparison procedures (MCP)

If we do not reject H0 in an ANOVA, the analysis is finished: there is no evidence of differences among the means.
If, however, we reject H0, ANOVA tells us that at least two groups have different means.
But it does not tell us which groups have different means. In order to figure out which of the groups have different means, we need to perform a post-hoc test.
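One common post-hoc choice is Tukey's HSD; the slides do not prescribe a particular procedure, so this is just an illustrative sketch, assuming statsmodels is installed and reusing the hypothetical data from earlier:

from statsmodels.stats.multicomp import pairwise_tukeyhsd

values = [73, 75, 74, 84, 86, 85, 64, 66, 65]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
print(pairwise_tukeyhsd(values, labels, alpha=0.05).summary())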
