
Analysis of Variance (ANOVA)

Analysis of Variance is a technique that partitions the total sum of squares of deviations of the observations about their mean into portions associated with independent variables in the experiment and a portion associated with error.
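For the one-way case (a single grouping variable with $I$ groups), write $x_{ij}$ for the $j$-th observation in group $i$, $n_i$ for the size of group $i$, $\bar{x}_i$ for the mean of group $i$, and $\bar{x}$ for the overall mean. The partition then reads as follows; the cross term drops out because $\sum_j (x_{ij} - \bar{x}_i) = 0$ within every group. This worked equation is a sketch of the one-way layout only; designs with more independent variables split the total into more pieces.

$$
\underbrace{\sum_{i=1}^{I}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2}_{SSTot}
= \underbrace{\sum_{i=1}^{I} n_i(\bar{x}_i-\bar{x})^2}_{SSG}
+ \underbrace{\sum_{i=1}^{I}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2}_{SSE}
$$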

Assumptions for ANOVA


1. Each of the I population or group distributions is normal. Check with a Normal Quantile Plot (or boxplot) of each group.
2. These distributions have identical variances (standard deviations). Check: as a rule of thumb, the assumption is reasonable if the largest sample sd is no more than 2 times the smallest sample sd.
3. Each of the I samples is a random sample.
4. Each of the I samples is selected independently of the others.
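The equal-spread check in assumption 2 can be sketched in Python. This is a minimal sketch assuming each group is a plain list of numbers with at least two values; it uses the sample standard deviation from the standard-library `statistics` module, and the function name `equal_sd_rule_of_thumb` and the toy data are hypothetical.

```python
import math
import statistics


def equal_sd_rule_of_thumb(groups, max_ratio=2.0):
    """Hypothetical helper: return (ratio, ok) for the largest-vs-smallest sd check.

    ratio is max(s_i) / min(s_i) using sample standard deviations;
    ok is True when the ratio does not exceed max_ratio (2 by default).
    """
    sds = [statistics.stdev(g) for g in groups]  # needs >= 2 values per group
    if min(sds) == 0:
        return math.inf, False  # a constant group makes the ratio undefined
    ratio = max(sds) / min(sds)
    return ratio, ratio <= max_ratio


if __name__ == "__main__":
    # Hypothetical toy data: sample sds are 1, 3 and 1, so the ratio is 3
    print(equal_sd_rule_of_thumb([[4, 5, 6], [2, 5, 8], [1, 2, 3]]))  # (3.0, False)
```

The check only screens the equal-variance assumption; normality (assumption 1) still needs a plot of each group.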

ANOVA: Comparing Several Means
The null hypothesis (step 1) for comparing several means is

$H_0: \mu_1 = \mu_2 = \cdots = \mu_I$

where $I$ is the number of populations to be compared.

The alternative hypothesis (step 2) is

$H_a$: not all of the $\mu_i$ are equal (at least one of the means is different from the others).

Step 3: State the significance level, $\alpha$.
Step 4: Calculate the F-statistic:

$F = \dfrac{\text{Mean Square Group}}{\text{Mean Square Error}} = \dfrac{MSG}{MSE}$

This compares the variation between groups (group mean
to group mean) to the variation within groups (individual
values to group means).

This is what gives it the name Analysis of Variance.
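The between-versus-within comparison in Step 4 can be sketched for the one-way case. This is a minimal sketch assuming the data arrive as a list of groups, each a list of numbers; the function name `f_statistic` and the toy data are hypothetical.

```python
def f_statistic(groups):
    """Hypothetical helper: return the one-way ANOVA F-statistic MSG / MSE.

    `groups` is a list of I groups, each a list of numeric observations.
    """
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)            # N, total sample size
    k = len(groups)                      # I, number of groups
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between groups: each group mean vs. the grand mean, weighted by n_i
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: each value vs. its own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    msg = ssg / (k - 1)                  # df_G = I - 1
    mse = sse / (n_total - k)            # df_E = N - I
    return msg / mse


if __name__ == "__main__":
    toy = [[4, 5, 6], [7, 8, 9], [1, 2, 3]]  # hypothetical toy data
    print(f_statistic(toy))  # 27.0
```

In the toy data the group means (5, 8, 2) are far apart relative to the spread inside each group, so F is large; identical groups would give F = 0.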


Step 5: Find the P-value.

The P-value for an ANOVA F-test is always one-sided (upper tail). The P-value is

$P = \Pr(F_{df_1,\,df_2} \ge F_{calculated})$

where $df_1 = I - 1$ (number of groups minus 1) and $df_2 = N - I$ (total sample size minus number of groups).
[Figure: the F-distribution, with the P-value marked]
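Python's standard library has no F-distribution, so the sketch below computes the upper-tail area through the regularized incomplete beta function, evaluated with the continued-fraction method described in Numerical Recipes (`betacf`). It relies on the fact that if $F \sim F_{df_1, df_2}$, then $df_2 / (df_2 + df_1 F)$ follows a Beta$(df_2/2,\ df_1/2)$ distribution. In practice a statistics library would normally be used instead (for example `scipy.stats.f.sf`); the helper names here are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_beta(x, a, b):
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly; otherwise use
    # the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f_calc, df1, df2):
    """Hypothetical helper: P-value Pr(F_{df1, df2} >= f_calc) for an ANOVA F-test."""
    if f_calc <= 0.0:
        return 1.0
    # If F ~ F(df1, df2), then df2 / (df2 + df1 * F) ~ Beta(df2/2, df1/2)
    return regularized_beta(df2 / (df2 + df1 * f_calc), df2 / 2.0, df1 / 2.0)


if __name__ == "__main__":
    # Hypothetical example: I = 3 groups, N = 9 observations, F = 27
    df1, df2 = 3 - 1, 9 - 3
    print(f_upper_tail(27.0, df1, df2))  # approximately 0.001
```

Only the upper tail is computed, matching the one-sided P-value above.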


Step 6: Reject or fail to reject $H_0$ based on the P-value.

If the P-value is less than or equal to $\alpha$, reject $H_0$.
If the P-value is greater than $\alpha$, fail to reject $H_0$.
Step 7: State your conclusion.
If $H_0$ is rejected: there is significant statistical evidence that at least one of the population means is different from another.
If $H_0$ is not rejected: there is not significant statistical evidence that at least one of the population means is different from another.
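Steps 6 and 7 amount to a simple decision rule. The sketch below assumes the P-value has already been computed; the function name `anova_decision` and the default $\alpha = 0.05$ are assumptions for illustration, not values fixed by these notes.

```python
def anova_decision(p_value, alpha=0.05):
    """Hypothetical helper for Steps 6-7: compare the P-value with alpha.

    The default alpha = 0.05 is an assumption; use the level chosen in Step 3.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if p_value <= alpha:  # less than or equal to alpha: reject H0
        return ("reject H0",
                "There is significant statistical evidence that at least one "
                "of the population means is different from another.")
    return ("fail to reject H0",
            "There is not significant statistical evidence that at least one "
            "of the population means is different from another.")


if __name__ == "__main__":
    decision, conclusion = anova_decision(0.001)
    print(decision)    # reject H0
    print(conclusion)
```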

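ANOVA Table

| Source | df | Sum of Squares | Mean Square | F | p-value |
|---|---|---|---|---|---|
| Group (between) | $I - 1$ | $SSG = \sum_i n_i(\bar{x}_i - \bar{x})^2$ | $MSG = \dfrac{SSG}{df_G}$ | $F_{calc} = \dfrac{MSG}{MSE}$ | $\Pr(F \ge F_{calc})$ |
| Error (within) | $N - I$ | $SSE = \sum_i (n_i - 1)s_i^2$ | $MSE = \dfrac{SSE}{df_E}$ | | |
| Total | $N - 1$ | $SSTot = \sum_i \sum_j (x_{ij} - \bar{x})^2$ | $MST = \dfrac{SSTot}{df_{Tot}}$ | | |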

Note: MSE is the pooled sample variance, and SSG + SSE = SSTot.

$R^2 = \dfrac{SSG}{SSTot}$ is the proportion of the total variation explained by the differences among the group means.
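The table entries can be filled in with a short sketch for the one-way case. It computes SSE from the Error-row formula $\sum (n_i - 1)s_i^2$ and SSTot directly from all observations, so the identity SSG + SSE = SSTot, the pooled-variance reading of MSE, and $R^2$ can all be checked numerically. The function name `anova_table` and the toy data are hypothetical; `statistics.fmean` requires Python 3.8 or later.

```python
import statistics


def anova_table(groups):
    """Hypothetical helper: one-way ANOVA table entries and R^2 for a list of groups."""
    all_values = [x for g in groups for x in g]
    n_total, k = len(all_values), len(groups)  # N and I
    grand_mean = statistics.fmean(all_values)

    # Group row: SSG = sum of n_i * (xbar_i - xbar)^2
    ssg = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Error row: SSE = sum of (n_i - 1) * s_i^2, using sample variances
    sse = sum((len(g) - 1) * float(statistics.variance(g)) for g in groups)
    # Total row: SSTot = sum of (x_ij - xbar)^2 over every observation
    sstot = sum((x - grand_mean) ** 2 for x in all_values)

    df_g, df_e, df_tot = k - 1, n_total - k, n_total - 1
    msg, mse = ssg / df_g, sse / df_e  # MSE is the pooled sample variance
    return {
        "SSG": ssg, "SSE": sse, "SSTot": sstot,
        "dfG": df_g, "dfE": df_e, "dfTot": df_tot,
        "MSG": msg, "MSE": mse, "MST": sstot / df_tot,
        "F": msg / mse,
        "R2": ssg / sstot,  # proportion of total variation explained
    }


if __name__ == "__main__":
    table = anova_table([[4, 5, 6], [7, 8, 9], [1, 2, 3]])  # hypothetical toy data
    print(table["SSG"], table["SSE"], table["SSTot"])  # 54.0 6.0 60.0
    print(table["F"], table["R2"])  # 27.0 0.9
```

Here 90% of the total variation in the toy data is explained by the differences among the three group means.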

Example

Summary
ANOVA:
- allows us to determine whether the variability in a data set is due to differences between groups or merely to variation within groups
- is more versatile than the two-sample t-test
- can compare multiple groups at once
- cannot handle multiple response variables
- does not indicate which groups are different
