
Day 10
Session-I

ANOVA: Analysis of Variance

Learning Objectives

• To compare more than two population means using one-way analysis of variance
• To use the F distribution to test hypotheses about two population variances
• To learn about randomized block design
• To learn the technique of two-way analysis of variance and the concept of interaction

Created by: Prabhat Mittal 1

E-mail ID: profmittal@yahoo.co.in
Chapter Overview

One-Way ANOVA: F-test, Tukey-Kramer test
Randomized Block Design: Multiple Comparisons
Two-Way ANOVA: Interaction Effects

General ANOVA Setting

• Investigator controls one or more independent variables
   • Called factors (or treatment variables)
   • Each factor contains two or more levels (or groups or categories/classifications)
• Observe effects on the dependent variable
   • Response to levels of independent variable
• Experimental design: the plan used to collect the data

Completely Randomized Design

• Experimental units (subjects) are assigned randomly to treatments
   • Subjects are assumed homogeneous
• Only one factor or independent variable
   • With two or more treatment levels
• Analyzed by one-way analysis of variance (ANOVA)

• Evaluate the difference among the means of three or more groups
   Examples: Accident rates for 1st, 2nd, and 3rd shift
             Expected mileage for five brands of tires
• Assumptions
   • Populations are normally distributed

Hypotheses of One-Way ANOVA

• H0: µ1 = µ2 = µ3 = … = µc
   • All population means are equal
   • i.e., no treatment effect (no variation in means among groups)
• H1: Not all of the population means are the same
   • At least one population mean is different
   • i.e., there is a treatment effect
   • Does not mean that all population means are different (some pairs may be the same)

One-Factor ANOVA
H0: µ1 = µ2 = µ3 = … = µc
H1: Not all µj are the same

All means are the same: the null hypothesis is true (no treatment effect)

µ1 = µ2 = µ3

One-Factor ANOVA
(continued)
H0: µ1 = µ2 = µ3 = … = µc
H1: Not all µj are the same

At least one mean is different: the null hypothesis is NOT true (treatment effect is present)

µ1 = µ2 ≠ µ3   or   µ1 ≠ µ2 ≠ µ3

SST = Total Sum of Squares (total variation)
SSA = Sum of Squares Among Groups (among-group variation)
SSW = Sum of Squares Within Groups (within-group variation)

Partitioning the Variation
(continued)

Total Variation = the aggregate dispersion of the individual data values across the various factor levels (SST)

Among-Group Variation = dispersion between the factor sample means (SSA)

Within-Group Variation = dispersion that exists among the data values within a particular factor level (SSW)

Total Variation (SST) = Variation Due to Factor (SSA) + Variation Due to Random Sampling (SSW)
d.f.: n – 1 = (c – 1) + (n – c)

SSA is commonly referred to as:
• Sum of Squares Between
• Sum of Squares Among
• Sum of Squares Explained
• Among Groups Variation

SSW is commonly referred to as:
• Sum of Squares Within
• Sum of Squares Error
• Sum of Squares Unexplained
• Within-Group Variation

Total Sum of Squares
SST = SSA + SSW

$SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2$

Where:
SST = Total sum of squares
c = number of groups (levels or treatments)
$n_j$ = number of observations in group j
$X_{ij}$ = ith observation from group j
$\bar{X}$ = grand mean (mean of all data values)

Total Variation
(continued)

[Figure: vertical axis is Response, X]

Among-Group Variation
SST = SSA + SSW

$SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{X})^2$

Where:
SSA = Sum of squares among groups
c = number of groups
$n_j$ = sample size from group j
$\bar{X}_j$ = sample mean from group j
$\bar{X}$ = grand mean (mean of all data values)

Among-Group Variation
(continued)
$SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{X})^2$

Variation due to differences among groups

$MSA = \frac{SSA}{c - 1}$

Mean Square Among = SSA / degrees of freedom

[Figure: group means µi and µj]

Among-Group Variation
(continued)

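$SSA = n_1 (\bar{X}_1 - \bar{X})^2 + n_2 (\bar{X}_2 - \bar{X})^2 + \cdots + n_c (\bar{X}_c - \bar{X})^2$

[Figure: Response, X for Group 1, Group 2, and Group 3, marking the group means $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$ and the grand mean $\bar{X}$]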

Within-Group Variation
SST = SSA + SSW

$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$

Where:
SSW = Sum of squares within groups
c = number of groups
$n_j$ = sample size from group j
$\bar{X}_j$ = sample mean from group j
$X_{ij}$ = ith observation in group j
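
The identity SST = SSA + SSW can be checked algebraically. What follows is a short derivation sketch that uses only the symbols defined on these slides; it splits each deviation from the grand mean into a within-group part and an among-group part.

```latex
\begin{aligned}
X_{ij} - \bar{X} &= (X_{ij} - \bar{X}_j) + (\bar{X}_j - \bar{X}) \\
\sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2
  &= \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2
   + \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{X})^2
   + 2 \sum_{j=1}^{c} (\bar{X}_j - \bar{X}) \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j) \\
  &= SSW + SSA + 0
\end{aligned}
```

The cross term vanishes because the deviations of the observations from their own group mean sum to zero within every group.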

Within-Group Variation
(continued)

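$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$

Summing the variation within each group and then adding over all groups

$MSW = \frac{SSW}{n - c}$

Mean Square Within = SSW / degrees of freedom

[Figure: a single group distribution centered at µj]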

Within-Group Variation
(continued)

[Figure: Response, X by group, marking the group means $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$]

Obtaining the Mean Squares

$MSA = \frac{SSA}{c - 1}$

$MSW = \frac{SSW}{n - c}$

$MST = \frac{SST}{n - 1}$

One-Way ANOVA Table

Source of Variation   SS                df      MS (Variance)         F ratio
Among Groups          SSA               c - 1   MSA = SSA / (c - 1)   F = MSA / MSW
Within Groups         SSW               n - c   MSW = SSW / (n - c)
Total                 SST = SSA + SSW   n - 1

c = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom

One-Way ANOVA
F Test Statistic
H0: µ1 = µ2 = … = µc
H1: At least two population means are different

• Test statistic: $F = \frac{MSA}{MSW}$
   MSA is mean squares among groups
   MSW is mean squares within groups
• Degrees of freedom
   • df1 = c – 1 (c = number of groups)
   • df2 = n – c (n = sum of sample sizes from all populations)

Interpreting One-Way ANOVA

F Statistic
• The F statistic is the ratio of the among estimate of variance and the within estimate of variance
   • The ratio can never be negative
   • df1 = c – 1 will typically be small
   • df2 = n – c will typically be large

Decision Rule:
• Reject H0 if F > FU, otherwise do not reject H0

[Figure: F distribution with upper-tail area α = .05; do not reject H0 between 0 and FU, reject H0 above FU]

One-Way ANOVA
F Test Example

You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the 0.05 significance level, is there a difference in mean distance?

Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204

One-Way ANOVA Example:

Scatter Diagram

[Figure: scatter diagram of Distance (190 to 270) against Club (1, 2, 3), marking the group means and the grand mean]

$\bar{x}_1 = 249.2$, $\bar{x}_2 = 226.0$, $\bar{x}_3 = 205.8$, $\bar{x} = 227.0$

One-Way ANOVA Example
Computations
Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204

$\bar{X}_1 = 249.2$, $n_1 = 5$
$\bar{X}_2 = 226.0$, $n_2 = 5$
$\bar{X}_3 = 205.8$, $n_3 = 5$
$\bar{X} = 227.0$, n = 15, c = 3

SSA = 5 (249.2 – 227)^2 + 5 (226 – 227)^2 + 5 (205.8 – 227)^2 = 4716.4
SSW = (254 – 249.2)^2 + (263 – 249.2)^2 + … + (204 – 205.8)^2 = 1119.6

MSA = 4716.4 / (3 – 1) = 2358.2
MSW = 1119.6 / (15 – 3) = 93.3
F = 2358.2 / 93.3 = 25.275

One-Way ANOVA Example

Solution
H0: µ1 = µ2 = µ3
H1: µj not all equal
α = 0.05
df1 = 2, df2 = 12

Critical Value: FU = 3.89

Test Statistic: F = MSA / MSW = 2358.2 / 93.3 = 25.275

Decision: Reject H0 at α = 0.05

Conclusion: There is evidence that at least one µj differs from the rest

[Figure: F distribution with upper-tail area α = .05; F = 25.275 falls in the rejection region above FU = 3.89]
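
The computations in this example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `one_way_anova` and the dictionary layout are illustrative choices, and the critical value 3.89 is copied from the slide rather than computed.

```python
from statistics import mean

# Distances from the golf club example (five trials per club).
groups = {
    "Club 1": [254, 263, 241, 237, 251],
    "Club 2": [234, 218, 235, 227, 216],
    "Club 3": [200, 222, 197, 206, 204],
}

def one_way_anova(samples):
    """Return the one-way ANOVA quantities for a list of groups."""
    c = len(samples)                          # number of groups
    n = sum(len(g) for g in samples)          # total sample size
    grand_mean = sum(sum(g) for g in samples) / n
    # Among-group variation: squared distance of each group mean from the grand mean.
    ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples)
    # Within-group variation: squared distance of each value from its own group mean.
    ssw = sum((x - mean(g)) ** 2 for g in samples for x in g)
    msa = ssa / (c - 1)
    msw = ssw / (n - c)
    return {"SSA": ssa, "SSW": ssw, "MSA": msa, "MSW": msw,
            "F": msa / msw, "df1": c - 1, "df2": n - c}

result = one_way_anova(list(groups.values()))
F_U = 3.89  # critical value quoted on the slide for alpha = 0.05, df = (2, 12)
reject_h0 = result["F"] > F_U
print(result)
print("Reject H0:", reject_h0)
```

The same function works for unequal group sizes, because each group contributes its own size to SSA.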

One-Way ANOVA
Excel Output
EXCEL: tools | data analysis | ANOVA: single factor
SUMMARY
Groups Count Sum Average Variance
Club 1 5 1246 249.2 108.2
Club 2 5 1130 226 77.5
Club 3 5 1029 205.8 94.2
ANOVA
Source of Variation   SS       df   MS       F        P-value    F crit
Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.89
Within Groups         1119.6   12   93.3
Total                 5836.0   14

Numerical Problems
Ref: 11-28. Page no. 604 The following data show the
number of claims processed per day for a group of
four insurance company employees observed for a
number of days. Test the hypothesis that the
employees’ mean claims per day are all the same.
Use the 0.05 level of significance.
Employee 1 15 17 14 12
Employee 2 12 10 13 17
Employee 3 11 14 13 15 12
Employee 4 13 12 12 14 10 9

One-Way ANOVA Excel Output
EXCEL: tools | data analysis | ANOVA: single factor
SUMMARY
Groups       Count   Sum   Average   Variance
Employee 1   4       58    14.5      4.333333
Employee 2   4       52    13        8.666667
Employee 3   5       65    13        2.5
Employee 4   6       70    11.67     3.466667

Source of Variation   SS      df   MS     F      P-value   F crit
Between Groups        19.45   3    6.48   1.46   0.26      3.28
Within Groups         66.33   15   4.42
Total                 85.78   18

Since F = 1.46 is less than F crit = 3.28, do not reject H0. The employees' productivities are not significantly different.
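
When group sizes differ, the ANOVA table can also be rebuilt from the SUMMARY columns alone (Count, Average, Variance). The sketch below assumes the claims data from the problem statement and uses the relation SSW = Σ (n_j − 1) s_j², where s_j² is the sample variance of group j; the variable names are illustrative and the critical value is the "F crit" quoted in the output above.

```python
from statistics import mean, variance

claims = {
    "Employee 1": [15, 17, 14, 12],
    "Employee 2": [12, 10, 13, 17],
    "Employee 3": [11, 14, 13, 15, 12],
    "Employee 4": [13, 12, 12, 14, 10, 9],
}

# Per-group summary: (count, average, sample variance), as in the SUMMARY table.
summary = {name: (len(x), mean(x), variance(x)) for name, x in claims.items()}

n = sum(count for count, _, _ in summary.values())   # total observations
c = len(summary)                                     # number of groups
grand_mean = sum(count * avg for count, avg, _ in summary.values()) / n

# SSA weights each squared deviation by that group's own size n_j.
ssa = sum(count * (avg - grand_mean) ** 2 for count, avg, _ in summary.values())
# SSW recovered from the sample variances: (n_j - 1) * s_j^2 summed over groups.
ssw = sum((count - 1) * var for count, _, var in summary.values())

msa = ssa / (c - 1)
msw = ssw / (n - c)
f_stat = msa / msw

F_CRIT = 3.28  # "F crit" value quoted in the Excel output for df = (3, 15)
reject_h0 = f_stat > F_CRIT
print(f"SSA={ssa:.2f} SSW={ssw:.2f} MSA={msa:.2f} MSW={msw:.2f} F={f_stat:.2f}")
```

Small differences in the last digit against the Excel table come from how the displayed values were rounded.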

Day 10
Session-II

The Tukey-Kramer Procedure

• Tells which population means are significantly different
   • e.g.: µ1 = µ2 ≠ µ3
• Done after rejection of equal means in ANOVA
• Allows pair-wise comparisons
   • Compare absolute mean differences with critical range

[Figure: distributions along the x axis with µ1 = µ2 and a separate µ3]

Tukey-Kramer Critical Range

$\text{Critical Range} = Q_U \sqrt{\frac{MSW}{2} \left( \frac{1}{n_j} + \frac{1}{n_{j'}} \right)}$

where:
$Q_U$ = Value from Studentized Range Distribution with c and n – c degrees of freedom for the desired level of α (see appendix E.9 table)
MSW = Mean Square Within
$n_j$ and $n_{j'}$ = Sample sizes from groups j and j’

The Tukey-Kramer Procedure:
Example
Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204

1. Compute absolute mean differences:
   $|\bar{x}_1 - \bar{x}_2| = |249.2 - 226.0| = 23.2$
   $|\bar{x}_1 - \bar{x}_3| = |249.2 - 205.8| = 43.4$
   $|\bar{x}_2 - \bar{x}_3| = |226.0 - 205.8| = 20.2$

2. Find the QU value from the table in appendix E.10 with c = 3 and (n – c) = (15 – 3) = 12 degrees of freedom for the desired level of α (α = 0.05 used here):

   QU = 3.77

The Tukey-Kramer Procedure:

Example
(continued)
3. Compute Critical Range:
   $\text{Critical Range} = Q_U \sqrt{\frac{MSW}{2} \left( \frac{1}{n_j} + \frac{1}{n_{j'}} \right)} = 3.77 \sqrt{\frac{93.3}{2} \left( \frac{1}{5} + \frac{1}{5} \right)} = 16.285$

4. Compare:
   $|\bar{x}_1 - \bar{x}_2| = 23.2$
   $|\bar{x}_1 - \bar{x}_3| = 43.4$
   $|\bar{x}_2 - \bar{x}_3| = 20.2$

5. All of the absolute mean differences are greater than the critical range. Therefore there is a significant difference between each pair of means at the 5% level of significance.

Thus, with 95% confidence we can conclude that the mean distance for club 1 is greater than for clubs 2 and 3, and the mean distance for club 2 is greater than for club 3.
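
The five steps above can be sketched in Python as follows. The group means, MSW, and QU = 3.77 are taken from the slides (QU is a table lookup, not computed here); the helper name `critical_range` is an illustrative choice.

```python
from itertools import combinations
from math import sqrt

means = {"Club 1": 249.2, "Club 2": 226.0, "Club 3": 205.8}
sizes = {"Club 1": 5, "Club 2": 5, "Club 3": 5}
MSW = 93.3   # mean square within, from the one-way ANOVA table
Q_U = 3.77   # studentized range value quoted on the slide (c = 3, n - c = 12)

def critical_range(q_u, msw, n_j, n_k):
    """Tukey-Kramer critical range for one pair of groups."""
    return q_u * sqrt((msw / 2) * (1 / n_j + 1 / n_k))

significant = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    cr = critical_range(Q_U, MSW, sizes[a], sizes[b])
    significant[(a, b)] = diff > cr
    print(f"{a} vs {b}: |diff| = {diff:.1f}, critical range = {cr:.3f}, "
          f"significant = {diff > cr}")
```

Because the critical range is computed per pair, the same loop handles groups of unequal size without modification.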

The Randomized Block Design

• Like One-Way ANOVA, we test for equal population means (for different factor levels, for example)...
• ...but we want to control for possible variation from a second factor (with two or more levels)

SST = Total variation
SSA = Among-Group variation
SSBL = Among-Block variation
SSE = Random variation

Sum of Squares for Blocking
SST = SSA + SSBL + SSE

$SSBL = c \sum_{i=1}^{r} (\bar{X}_{i.} - \bar{X})^2$

Where:
c = number of groups
r = number of blocks
$\bar{X}_{i.}$ = mean of all values in block i
$\bar{X}$ = grand mean (mean of all data values)

Partitioning the Variation

• Total variation can now be split into three parts:

SST = SSA + SSBL + SSE

SST and SSA are computed as they were in One-Way ANOVA
SSE = SST – (SSA + SSBL)

Mean Squares

$MSBL = \text{Mean square blocking} = \frac{SSBL}{r - 1}$

$MSA = \text{Mean square among groups} = \frac{SSA}{c - 1}$

$MSE = \text{Mean square error} = \frac{SSE}{(r - 1)(c - 1)}$

Randomized Block ANOVA Table

Source of Variation   SS     df               MS     F ratio
Among Treatments      SSA    c - 1            MSA    MSA / MSE
Among Blocks          SSBL   r - 1            MSBL   MSBL / MSE
Error                 SSE    (r - 1)(c - 1)   MSE
Total                 SST    rc - 1

c = number of populations
r = number of blocks
rc = sum of the sample sizes from all populations
df = degrees of freedom

Blocking Test
H0: µ1. = µ2. = µ3. = ...
H1: Not all block means are equal

$F = \frac{MSBL}{MSE}$

• Blocking test: df1 = r – 1, df2 = (r – 1)(c – 1)

Reject H0 if F > FU

Main Factor Test

H0: µ.1 = µ.2 = µ.3 = ... = µ.c
H1: Not all population means are equal

$F = \frac{MSA}{MSE}$

• Main Factor test: df1 = c – 1, df2 = (r – 1)(c – 1)

Reject H0 if F > FU
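
The slides give no data set for the randomized block design, so the sketch below uses a small hypothetical table (invented for illustration only) to show how SSA, SSBL, SSE and the two F ratios fit together. Rows are blocks and columns are treatment groups; the variable names are assumptions.

```python
# Hypothetical data, NOT from the slides: rows are blocks, columns are groups.
data = [
    [10, 12, 14],
    [11, 13, 15],
    [9, 13, 14],
    [12, 14, 17],
]
r = len(data)      # number of blocks
c = len(data[0])   # number of groups (treatments)

grand_mean = sum(sum(row) for row in data) / (r * c)
block_means = [sum(row) / c for row in data]
group_means = [sum(row[j] for row in data) / r for j in range(c)]

sst = sum((x - grand_mean) ** 2 for row in data for x in row)
ssa = r * sum((m - grand_mean) ** 2 for m in group_means)    # among groups
ssbl = c * sum((m - grand_mean) ** 2 for m in block_means)   # among blocks
sse = sst - (ssa + ssbl)                                     # random variation

msa = ssa / (c - 1)
msbl = ssbl / (r - 1)
mse = sse / ((r - 1) * (c - 1))

f_groups = msa / mse    # main factor test, df = (c - 1, (r - 1)(c - 1))
f_blocks = msbl / mse   # blocking test,    df = (r - 1, (r - 1)(c - 1))
print(f"SSA={ssa:.3f} SSBL={ssbl:.3f} SSE={sse:.3f}")
print(f"F (groups)={f_groups:.2f} F (blocks)={f_blocks:.2f}")
```

Each F ratio would then be compared with the FU value for its own pair of degrees of freedom, as in the decision rules above.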

The Tukey Procedure

• To test which population means are significantly different
   • e.g.: µ1 = µ2 ≠ µ3
• Done after rejection of equal means in randomized block ANOVA design
• Allows pair-wise comparisons
   • Compare absolute mean differences with critical range

[Figure: distributions along the x axis with µ1 = µ2 and a separate µ3]

The Tukey Procedure

(continued)

$\text{Critical Range} = Q_U \sqrt{\frac{MSE}{r}}$

Compare: Is $|\bar{x}_{.j} - \bar{x}_{.j'}|$ > Critical Range?

$|\bar{x}_{.1} - \bar{x}_{.2}|$
$|\bar{x}_{.1} - \bar{x}_{.3}|$
$|\bar{x}_{.2} - \bar{x}_{.3}|$
etc...

If the absolute mean difference is greater than the critical range then there is a significant difference between that pair of means at the chosen level of significance.

Factorial Design:
Two-Way ANOVA
• Examines the effect of
   • Two factors of interest on the dependent variable
      • e.g., Percent carbonation and line speed on soft drink bottling process
   • Interaction between the different levels of these two factors
      • e.g., Does the effect of one particular carbonation level depend on which level the line speed is set?

Two-Way ANOVA
(continued)

• Assumptions
   • Populations are normally distributed
   • Populations have equal variances
   • Independent random samples are drawn

Two-Way ANOVA
Sources of Variation
Two Factors of interest: A and B
r = number of levels of factor A
c = number of levels of factor B
n’ = number of replications for each cell
n = total number of observations in all cells (n = rcn’)
Xijk = value of the kth observation of level i of factor A and level j of factor B

Two-Way ANOVA
Sources of Variation (continued)

SST = SSA + SSB + SSAB + SSE

Source                                         Sum of Squares   Degrees of Freedom
Total variation                                SST              n – 1
Factor A variation                             SSA              r – 1
Factor B variation                             SSB              c – 1
Variation due to interaction between A and B   SSAB             (r – 1)(c – 1)
Random variation (error)                       SSE              rc(n’ – 1)

Two Factor ANOVA Equations

Total Variation:
$SST = \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{n'} (X_{ijk} - \bar{X})^2$

Factor A Variation:
$SSA = cn' \sum_{i=1}^{r} (\bar{X}_{i..} - \bar{X})^2$

Factor B Variation:
$SSB = rn' \sum_{j=1}^{c} (\bar{X}_{.j.} - \bar{X})^2$

Two Factor ANOVA Equations

(continued)

Interaction Variation:
$SSAB = n' \sum_{i=1}^{r} \sum_{j=1}^{c} (\bar{X}_{ij.} - \bar{X}_{i..} - \bar{X}_{.j.} + \bar{X})^2$

Sum of Squares Error:
$SSE = \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{n'} (X_{ijk} - \bar{X}_{ij.})^2$

Two Factor ANOVA Equations
(continued)
where:

$\bar{X} = \frac{\sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{n'} X_{ijk}}{rcn'}$ = Grand Mean

$\bar{X}_{i..} = \frac{\sum_{j=1}^{c} \sum_{k=1}^{n'} X_{ijk}}{cn'}$ = Mean of ith level of factor A (i = 1, 2, ..., r)

$\bar{X}_{.j.} = \frac{\sum_{i=1}^{r} \sum_{k=1}^{n'} X_{ijk}}{rn'}$ = Mean of jth level of factor B (j = 1, 2, ..., c)

$\bar{X}_{ij.} = \sum_{k=1}^{n'} \frac{X_{ijk}}{n'}$ = Mean of cell ij

r = number of levels of factor A
c = number of levels of factor B
n’ = number of replications in each cell

Mean Square Calculations

$MSA = \text{Mean square factor A} = \frac{SSA}{r - 1}$

$MSB = \text{Mean square factor B} = \frac{SSB}{c - 1}$

$MSAB = \text{Mean square interaction} = \frac{SSAB}{(r - 1)(c - 1)}$

$MSE = \text{Mean square error} = \frac{SSE}{rc(n' - 1)}$

Two-Way ANOVA:
The F Test Statistic
F Test for Factor A Effect
H0: µ1.. = µ2.. = µ3.. = ...
H1: Not all µi.. are equal
$F = \frac{MSA}{MSE}$; reject H0 if F > FU

F Test for Factor B Effect
H0: µ.1. = µ.2. = µ.3. = ...
H1: Not all µ.j. are equal
$F = \frac{MSB}{MSE}$; reject H0 if F > FU

F Test for Interaction Effect
H0: the interaction of A and B is equal to zero
H1: interaction of A and B is not zero
$F = \frac{MSAB}{MSE}$; reject H0 if F > FU

Two-Way ANOVA
Summary Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares                     F Statistic
Factor A              SSA              r – 1                MSA = SSA / (r – 1)              MSA / MSE
Factor B              SSB              c – 1                MSB = SSB / (c – 1)              MSB / MSE
AB (Interaction)      SSAB             (r – 1)(c – 1)       MSAB = SSAB / [(r – 1)(c – 1)]   MSAB / MSE
Error                 SSE              rc(n’ – 1)           MSE = SSE / [rc(n’ – 1)]
Total                 SST              n – 1
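
To make the summary table concrete, here is a minimal Python sketch that fills it in for a tiny 2 × 2 design with two replications per cell. The data are hypothetical (invented for illustration, not from the slides), and the variable names such as `cells` and `n_rep` are assumptions.

```python
# Hypothetical data, NOT from the slides: cells[i][j] holds the n' replications
# for level i of factor A and level j of factor B.
cells = [
    [[4, 6], [8, 10]],
    [[7, 9], [17, 19]],
]
r = len(cells)             # levels of factor A
c = len(cells[0])          # levels of factor B
n_rep = len(cells[0][0])   # replications per cell (n')

def avg(values):
    values = list(values)
    return sum(values) / len(values)

grand = avg(x for row in cells for cell in row for x in cell)
a_means = [avg(x for cell in cells[i] for x in cell) for i in range(r)]
b_means = [avg(x for i in range(r) for x in cells[i][j]) for j in range(c)]
cell_means = [[avg(cells[i][j]) for j in range(c)] for i in range(r)]

sst = sum((x - grand) ** 2 for row in cells for cell in row for x in cell)
ssa = c * n_rep * sum((m - grand) ** 2 for m in a_means)
ssb = r * n_rep * sum((m - grand) ** 2 for m in b_means)
ssab = n_rep * sum(
    (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
    for i in range(r) for j in range(c)
)
sse = sum((x - cell_means[i][j]) ** 2
          for i in range(r) for j in range(c) for x in cells[i][j])

mse = sse / (r * c * (n_rep - 1))
f_a = (ssa / (r - 1)) / mse
f_b = (ssb / (c - 1)) / mse
f_ab = (ssab / ((r - 1) * (c - 1))) / mse
print(f"SSA={ssa} SSB={ssb} SSAB={ssab} SSE={sse} SST={sst}")
print(f"F_A={f_a} F_B={f_b} F_AB={f_ab}")
```

All three F ratios share the denominator MSE; only the numerator changes from test to test.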

Features of Two-Way ANOVA
F Test
• Degrees of freedom always add up
   • n – 1 = rc(n’ – 1) + (r – 1) + (c – 1) + (r – 1)(c – 1)
   • Total = error + factor A + factor B + interaction
• The denominator of the F Test is always the same but the numerator is different
• The sums of squares always add up
   • SST = SSE + SSA + SSB + SSAB
   • Total = error + factor A + factor B + interaction
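
The degrees-of-freedom identity can be verified by expanding the right-hand side. The short check below uses only the slide's own symbols, with n = rcn’.

```latex
\begin{aligned}
rc(n' - 1) + (r - 1) + (c - 1) + (r - 1)(c - 1)
  &= (rcn' - rc) + (r + c - 2) + (rc - r - c + 1) \\
  &= rcn' - 1 \\
  &= n - 1
\end{aligned}
```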

Examples:
Interaction vs. No Interaction
• No interaction:
[Figure: Mean Response plotted for Factor B Level 1, Level 2, and Level 3]
• Interaction is present:
[Figure: Mean Response plotted for Factor B Level 1, Level 2, and Level 3]

Multiple Comparisons:
The Tukey Procedure
• Unless there is a significant interaction, you can determine the levels that are significantly different using the Tukey procedure
• Consider all absolute mean differences and compare to the calculated critical range
• Example: Absolute differences for factor A, assuming three levels:
   $|\bar{X}_{1..} - \bar{X}_{2..}|$
   $|\bar{X}_{1..} - \bar{X}_{3..}|$
   $|\bar{X}_{2..} - \bar{X}_{3..}|$

Multiple Comparisons:
The Tukey Procedure
• Critical Range for Factor A:
   $\text{Critical Range} = Q_U \sqrt{\frac{MSE}{cn'}}$
   (where QU is from Table E.10 with r and rc(n’ – 1) d.f.)

• Critical Range for Factor B:
   $\text{Critical Range} = Q_U \sqrt{\frac{MSE}{rn'}}$
   (where QU is from Table E.10 with c and rc(n’ – 1) d.f.)
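
The two critical ranges differ only in the divisor under the square root: each level of factor A is averaged over cn’ observations, each level of factor B over rn’. A small sketch follows, with a hypothetical helper name and placeholder inputs that are not taken from the slides; in practice the Q values would be looked up in the studentized range table.

```python
from math import sqrt

def tukey_critical_ranges(q_a, q_b, mse, r, c, n_rep):
    """Return (critical range for factor A, critical range for factor B)."""
    range_a = q_a * sqrt(mse / (c * n_rep))  # each level of A averages c * n' values
    range_b = q_b * sqrt(mse / (r * n_rep))  # each level of B averages r * n' values
    return range_a, range_b

# Placeholder inputs for illustration only: the Q values are NOT read from a
# real table and MSE is NOT from a worked example in the slides.
range_a, range_b = tukey_critical_ranges(q_a=3.0, q_b=3.0, mse=2.0, r=3, c=4, n_rep=5)
print(f"Factor A critical range: {range_a:.4f}")
print(f"Factor B critical range: {range_b:.4f}")
```

Any pair of level means whose absolute difference exceeds the corresponding range would be declared significantly different.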

Summary
• Described one-way analysis of variance
   • The logic of ANOVA
   • ANOVA assumptions
   • F test for difference in c means
   • The Tukey-Kramer procedure for multiple comparisons
• Considered the Randomized Block Design
   • Treatment and Block Effects
   • Multiple Comparisons: Tukey Procedure
• Described two-way analysis of variance
   • Examined effects of multiple factors
   • Examined interaction between factors
