
ANALYSIS OF VARIANCE

Basic concepts

YOSNI BAKAR
STAB 2004

Before we proceed to discuss the nitty-gritty of experimental designs, let us ask ourselves a couple of basic questions.

ANALYSIS OF VARIANCE

QUESTION 1
Why not use t-tests to compare more than two groups?

Recall that when we want to compare two groups, we use the t-test

t = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

compared against the critical value t_{\alpha/2,\, df}.

The number of t-tests needed to compare all possible pairs would be

\frac{k(k-1)}{2}

where k equals the number of groups.

Imagine we have a design with three groups: G1, G2, G3.

We will have to run three separate t-tests:
compare G1 with G2,
compare G1 with G3, and
compare G2 with G3.

For every test we use a general α-level of 0.05.
Reason: the family-wise error rate.
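As a concrete illustration, here is a minimal sketch of those three pairwise t-tests in Python (the group data are hypothetical, invented for this example):

from itertools import combinations
from scipy import stats

groups = {
    "G1": [74, 76, 75],
    "G2": [78, 80, 79],
    "G3": [70, 72, 71],
}

# k(k-1)/2 = 3 pairwise comparisons for k = 3 groups,
# each tested at alpha = 0.05
for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
    t, p = stats.ttest_ind(a, b)  # pooled-variance two-sample t-test
    print(f"{name_a} vs {name_b}: t = {t:.2f}, p = {p:.4f}")

Running each test at α = 0.05 is exactly what creates the problem described next.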

ANOVA: general concepts


α-level

An α-level of 0.05 means a 5% possibility of making a Type I error, i.e. rejecting H0 when H0 is actually true.

Decision      H0 true             H0 false
Accept H0     Correct (1 − α)     Type II error (β)
Reject H0     Type I error (α)    Correct (1 − β)

There is a 95% possibility of NOT making a Type I error.

Our aim is to keep the possibility of making a Type I error low.

If we were to run 3 separate t-tests to compare G1, G2 and G3, each with an α-level of 0.05, the overall possibility of NOT making a Type I error would be 0.857, that is
(0.95)^3 = 0.857
Subtracting that from the overall possibility (1 = 100%):
1 − 0.857 = 0.143
Thus, we have about a 14% possibility of making a Type I error.
14% is much larger than the usual 5%.

FER: The family-wise error rate

FER = 1 − (0.95)^n

where n is the number of tests that have to be carried out.
The larger the number of tests that have to be carried out, the larger the possibility of making a Type I error.
Example: with 4 groups there are 6 pairwise tests (1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, 3 vs 4):
1 − (0.95)^6 = 0.27
A 27% possibility of making a Type I error!
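The FER is easy to compute for any number of groups; this sketch assumes each pairwise test uses α = 0.05:

def family_wise_error_rate(k, alpha=0.05):
    n_tests = k * (k - 1) // 2           # number of pairwise t-tests
    return 1 - (1 - alpha) ** n_tests    # FER = 1 - (1 - alpha)^n

for k in (3, 4, 5):
    print(k, round(family_wise_error_rate(k), 3))
# 3 -> 0.143, 4 -> 0.265, 5 -> 0.401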

The family-wise error rate is the reason why ANOVA is used instead of multiple separate t-tests.

ANOVA tells us whether there is an overall difference among the groups, but it does not say which specific groups differ.

ANOVA: general concepts


QUESTION 2
Why analyze variances in order to derive conclusions about the means?
To answer the question, we need to understand a few concepts pertaining to analysis of variance:

Measures of variation: sums of squares & mean squares
Partitioning of variation
F-tests

ANOVA: general concepts


Basic premise
Comparing means
We want to know if the observed differences in sample means are likely to have occurred by chance, just because of random sampling.

[Figure: two scenarios: difference in means small relative to overall variability, versus difference in means large relative to overall variability]

This will likely depend on both the difference between the sample means and how much variability there is within each sample.

ANOVA: general concepts


Basic premise
Consider the following two experiments to examine the yields of three different varieties of wheat. In both experiments, nine plots of land were randomized to three different varieties (three plots for each variety) and the yield was measured at the end of the season.

Look at the variability within each group as compared to the variability among the group means to ascertain whether there is evidence of a difference in the group population means.

ANOVA: general concepts


Measures of variation

Recall the sample variance:

s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\text{sum of squares}}{\text{degrees of freedom}}

The distance from any data point to the mean is the deviation of that point from the mean:

X - \bar{X}

Deviations will be both positive and negative, and their sum will be zero.

ANOVA: general concepts


Measures of variation

Variation

We can measure variation as the sum of the squares of the deviations between each value and the mean of the values:

SS = \sum (X - \bar{X})^2
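A quick numerical check of both points, on a small made-up sample:

xs = [72, 75, 78, 81, 74]
mean = sum(xs) / len(xs)                # 76.0

deviations = [x - mean for x in xs]
print(sum(deviations))                  # 0.0: deviations cancel out

ss = sum(d ** 2 for d in deviations)    # SS = 50.0
variance = ss / (len(xs) - 1)           # SS / df = 12.5
print(ss, variance)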

ANOVA: general concepts


Partitioning of variation: Total Variation
Look at the figure in more detail:

[Figure: Experiment I, deviations of the observations from the grand mean; grand mean = 75]

Sum of (deviations)^2 = sum of squares (SS)

ANOVA: general concepts


Partitioning of variation

Are all of the values identical?
No, so there is some variation in the data.
This is a measure of the total variability in the dataset.
It is called the total variation: the sum of squares of the deviations of the data around the grand mean,

SS_{total} = \sum_{i=1}^{k} \sum_{j=1}^{r_i} (Y_{ij} - \bar{Y})^2

where \bar{Y} is the grand mean.

ANOVA: general concepts


Partitioning of variation: Between groups

Are all of the sample means identical?
No, so there is some variation between the groups.
This is called the between-group variation, measured as the sum of squares of the deviations of the group means from the grand mean, weighted by the sample size of each group:

SS_{Between} = \sum_{i=1}^{k} r_i (\bar{Y}_i - \bar{Y})^2

where \bar{Y}_i is the mean of group i and r_i its sample size.

ANOVA: general concepts


Partitioning of variation
Next: Experiment I

[Figure: Experiment I, deviations of the group means (A, B, C) from the grand mean; grand mean = 75]

ANOVA: general concepts


Partitioning of variation: Within group

Are all of the values within each group identical?
No, there is some variation within the groups.
This is called the within-group variation, or the sum of squares of the deviations of the data around their separate group means.
It is sometimes called the error variation:

SS_{within} = SS_{error} = \sum_{i=1}^{k} \sum_{j=1}^{r_i} (Y_{ij} - \bar{Y}_i)^2

ANOVA: general concepts


Partitioning of variation
Next: Experiment I

[Figure: Experiment I, deviations of the observations from their group means (A, B, C); grand mean = 75]

ANOVA: general concepts


Partitioning of variation summary

Variability is measured in terms of sums of squares because these three quantities have a simple relationship:

Variation in all observations = variation of each group mean from the grand mean + variation of each observation from its group mean

Total sum of squares = between-groups sum of squares (explained variation) + within-group sum of squares (unexplained variation)

SST = SSB + SSE

So the total variability (SST) has been divided into two components: that due to differences between plots given different treatments (SSB), and that due to differences between plots given the same treatment (SSE).
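This identity is easy to verify numerically; the sketch below uses a small hypothetical dataset of three groups with three observations each, mirroring the nine-plot experiment:

data = {
    "A": [73, 75, 74],
    "B": [84, 86, 85],
    "C": [64, 66, 65],
}

all_values = [y for ys in data.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)

sst = sum((y - grand_mean) ** 2 for y in all_values)
ssb = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
          for ys in data.values())
sse = sum((y - sum(ys) / len(ys)) ** 2
          for ys in data.values() for y in ys)

print(sst, ssb, sse)   # 608.0 = 602.0 + 6.0 (up to floating-point rounding)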

ANOVA: general concepts


Partitioning of variation summary

[Diagram: total SS split into treatment (between-groups) SS and error (within-groups) SS]

ANOVA: general concepts


Partitioning of variation

A comparison of SSB and SSE is going to indicate whether fitting the three fertilizer means accounts for a significant amount of variability in the data.
However, for a valid comparison between these two sources of variability, we need to compare the variability per degree of freedom, i.e. the variances.

ANOVA: general concepts


Partitioning of variation

Variation: MS

Combining the information on SS and df, we can arrive at a measure of variability per df:

s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\text{sum of squares}}{\text{degrees of freedom}}

This is equivalent to a variance, and in the context of ANOVA it is called a mean square (MS). For a one-way ANOVA with k groups and N observations in total, MSB = SSB/(k − 1) and MSE = SSE/(N − k).
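Continuing the numbers from the partitioning sketch above (a hypothetical balanced design with k = 3 groups and N = 9 observations):

k, N = 3, 9
ssb, sse = 602.0, 6.0  # sums of squares from the sketch above
msb = ssb / (k - 1)    # between-groups mean square: 301.0
mse = sse / (N - k)    # within-groups (error) mean square: 1.0
print(msb, mse)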

ANOVA: general concepts


Hypothesis testing
In ANOVA we ask: is there truly a difference in means among the groups? Formally, we can specify the null hypothesis:

H0: τ1 = τ2 = τ3 = 0

If the fertilizer means are equal, it implies that all τi = 0. Remember τi = (μi − μ), therefore this is equivalent to

H0: μ1 = μ2 = μ3

In words,
H0: The means of all groups are equal.
The alternative hypothesis is

Ha: not all τi equal 0

which is equivalent to

Ha: at least one of the μi's is different

or Ha: The (population) means are not all equal. (Note: this is different from saying they are all unequal!)

Compare the two sources of variability, MSB and MSE.

Our test statistic is

F_{obs} = \frac{MSB}{MSE} = \frac{\text{variance between groups}}{\text{variance within groups}}

If F_obs is small (close to 1), then variability between groups is negligible compared to variation within groups.
→ The grouping does not explain much variation in the data, i.e. the groups are similar.

If F_obs is large, then variability between groups is large compared to variation within groups.
→ The grouping explains a lot of the variation in the data, i.e. the three groups are very different.

The F-ratio can be quite large even when there are no treatment differences.
At what point do we decide that the size of the F-ratio is due to treatment differences rather than chance?
Again, set α = 0.05.

Decision rule for the ANOVA test statistic:

Reject H0 if F_{obs} ≥ F_{α; k−1, N−k}
Fail to reject H0 if F_{obs} < F_{α; k−1, N−k}

We are using a one-sided rejection region: the F-test is always a one-tailed test. Since F is a ratio of two variances, it can never be less than 0, so the rejection region lies entirely in the upper tail, beyond the critical value.

The value of F_{α; k−1, N−k} is found from an F-table or from software, as sketched below.
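A minimal sketch of the decision in Python, assuming scipy is available and reusing the hypothetical mean squares from above:

from scipy import stats

alpha, k, N = 0.05, 3, 9
msb, mse = 301.0, 1.0                          # mean squares from above
f_obs = msb / mse
f_crit = stats.f.ppf(1 - alpha, k - 1, N - k)  # F(0.05; 2, 6) ≈ 5.14
print("reject H0" if f_obs >= f_crit else "fail to reject H0")

# Equivalently, scipy.stats.f_oneway runs the whole one-way ANOVA:
f, p = stats.f_oneway([73, 75, 74], [84, 86, 85], [64, 66, 65])
print(f, p)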

ANOVA: general concepts


F-table
[Table: critical values of F_{α; df1, df2}]

ANOVA: general concepts


Multiple comparison procedures (MCP)

If we do not reject H0 in an ANOVA, the analysis is finished: there is no evidence of differences among the means.
If, however, we reject H0, ANOVA tells us that at least two groups have different means.
But it does not tell us which groups have different means. In order to figure out which of the groups have different means, we need to perform a post-hoc test.
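One common post-hoc choice is Tukey's HSD; the slides do not prescribe a particular procedure, so this is just an illustrative sketch, assuming statsmodels is installed and reusing the hypothetical data from earlier:

from statsmodels.stats.multicomp import pairwise_tukeyhsd

values = [73, 75, 74, 84, 86, 85, 64, 66, 65]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
print(pairwise_tukeyhsd(values, labels, alpha=0.05).summary())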
