
Statistics for Business Analysis

Day 10
Session-I

ANOVA: Analysis of Variance

Learning Objectives

 To compare more than two population means using
  one-way analysis of variance
 To use the F distribution to test hypotheses about
  two population variances
 To learn about the randomized block design
 To learn the technique of two-way analysis of
  variance and the concept of interaction

Created by: Prabhat Mittal
E-mail ID: profmittal@yahoo.co.in
Chapter Overview

Analysis of Variance (ANOVA)

 One-Way ANOVA
   F-test
   Multiple Comparisons: Tukey-Kramer test
 Randomized Block Design
 Two-Way ANOVA
   Interaction Effects

General ANOVA Setting


 Investigator controls one or more independent variables
   Called factors (or treatment variables)
   Each factor contains two or more levels (or groups or
    categories/classifications)
 Observe effects on the dependent variable
   Response to levels of the independent variable
 Experimental design: the plan used to collect the data

Completely Randomized Design

 Experimental units (subjects) are assigned randomly
  to treatments
   Subjects are assumed homogeneous
 Only one factor or independent variable
   With two or more treatment levels
 Analyzed by one-way analysis of variance (ANOVA)

One-Way Analysis of Variance

 Evaluate the difference among the means of three
  or more groups
   Examples: Accident rates for 1st, 2nd, and 3rd shifts;
    expected mileage for five brands of tires
 Assumptions
   Populations are normally distributed
   Populations have equal variances
   Samples are randomly and independently drawn

Hypotheses of One-Way ANOVA

 H0 : µ1 = µ2 = µ3 = … = µc
 All population means are equal
 i.e., no treatment effect (no variation in means among
groups)


 H1 : Not all of the population means are the same
 At least one population mean is different
 i.e., there is a treatment effect
 Does not mean that all population means are different
(some pairs may be the same)

One-Factor ANOVA
H0 : µ1 = µ2 = µ3 = … = µc
H1 : Not all µj are the same

All Means are the same:


The Null Hypothesis is True
(No Treatment Effect)

µ1 = µ2 = µ3

One-Factor ANOVA
(continued)
H0 : µ1 = µ2 = µ3 = … = µc
H1 : Not all µj are the same
At least one mean is different:
the null hypothesis is NOT true
(a treatment effect is present)

µ1 = µ2 ≠ µ3   or   µ1 ≠ µ2 ≠ µ3

Partitioning the Variation

 Total variation can be split into two parts:

SST = SSA + SSW

SST = Total Sum of Squares


(Total variation)
SSA = Sum of Squares Among Groups
(Among-group variation)
SSW = Sum of Squares Within Groups
(Within-group variation)

Partitioning the Variation
(continued)

SST = SSA + SSW

Total Variation = the aggregate dispersion of the individual


data values across the various factor levels (SST)

Among-Group Variation = dispersion between the factor


sample means (SSA)

Within-Group Variation = dispersion that exists among


the data values within a particular factor level (SSW)

Partition of Total Variation

Total Variation (SST)
d.f. = n – 1

 = Variation Due to Factor (SSA) + Variation Due to Random Sampling (SSW)
   d.f. = c – 1                     d.f. = n – c

SSA is commonly referred to as:
 Sum of Squares Between
 Sum of Squares Among
 Sum of Squares Explained
 Among-Groups Variation

SSW is commonly referred to as:
 Sum of Squares Within
 Sum of Squares Error
 Sum of Squares Unexplained
 Within-Group Variation

Total Sum of Squares
SST = SSA + SSW
SST = Σj=1..c Σi=1..nj (Xij − X̄)²

Where:
SST = total sum of squares
c = number of groups (levels or treatments)
nj = number of observations in group j
Xij = ith observation from group j
X̄ = grand mean (mean of all data values)

Total Variation
(continued)

SST = (X11 − X̄)² + (X12 − X̄)² + ... + (Xcnc − X̄)²

(Figure: response X plotted for Groups 1–3 about the grand mean X̄)

Among-Group Variation
SST = SSA + SSW
SSA = Σj=1..c nj (X̄j − X̄)²

Where:
SSA = sum of squares among groups
c = number of groups
nj = sample size from group j
X̄j = sample mean from group j
X̄ = grand mean (mean of all data values)

Among-Group Variation
(continued)
SSA = Σj=1..c nj (X̄j − X̄)²

Variation due to differences among groups.

Mean Square Among = SSA/degrees of freedom:

MSA = SSA / (c − 1)

Among-Group Variation
(continued)

SSA = n1(X̄1 − X̄)² + n2(X̄2 − X̄)² + ... + nc(X̄c − X̄)²

(Figure: response X for Groups 1–3, showing the group means X̄1, X̄2, X̄3 about the grand mean X̄)

Within-Group Variation
SST = SSA + SSW
SSW = Σj=1..c Σi=1..nj (Xij − X̄j)²

Where:
SSW = sum of squares within groups
c = number of groups
nj = sample size from group j
X̄j = sample mean from group j
Xij = ith observation in group j

Within-Group Variation
(continued)

SSW = Σj=1..c Σi=1..nj (Xij − X̄j)²

Summing the variation within each group, then adding over all groups.

Mean Square Within = SSW/degrees of freedom:

MSW = SSW / (n − c)

Within-Group Variation
(continued)

SSW = (X11 − X̄1)² + (X12 − X̄1)² + ... + (Xcnc − X̄c)²

(Figure: response X for Groups 1–3, showing deviations of observations from their group means X̄1, X̄2, X̄3)

Obtaining the Mean Squares

MSA = SSA / (c − 1)

MSW = SSW / (n − c)

MST = SST / (n − 1)
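The partition SST = SSA + SSW and the mean squares above can be computed directly from raw data; a minimal NumPy sketch with three made-up groups (illustrative numbers only):

```python
import numpy as np

# Three hypothetical groups (unequal sizes are allowed)
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0, 10.0]),
          np.array([2.0, 3.0])]

all_x = np.concatenate(groups)
n, c = len(all_x), len(groups)
grand_mean = all_x.mean()

# SST: total dispersion of all observations about the grand mean
SST = ((all_x - grand_mean) ** 2).sum()
# SSA: dispersion of group means about the grand mean, weighted by group sizes
SSA = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# SSW: dispersion of observations about their own group means
SSW = sum(((g - g.mean()) ** 2).sum() for g in groups)

MSA = SSA / (c - 1)   # df1 = c - 1
MSW = SSW / (n - c)   # df2 = n - c
F = MSA / MSW
```

Note that SST = SSA + SSW holds exactly, so only two of the three sums need to be computed from scratch.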

One-Way ANOVA Table

Source of Variation   SS              df      MS (Variance)        F ratio
Among Groups          SSA             c − 1   MSA = SSA/(c − 1)    F = MSA/MSW
Within Groups         SSW             n − c   MSW = SSW/(n − c)
Total                 SST = SSA+SSW   n − 1

c = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom

One-Way ANOVA
F Test Statistic
H0: µ1 = µ2 = … = µc
H1: At least two population means are different

 Test statistic:

   F = MSA / MSW

   MSA is the mean square among groups
   MSW is the mean square within groups

 Degrees of freedom
   df1 = c – 1 (c = number of groups)
   df2 = n – c (n = sum of sample sizes from all populations)

Interpreting One-Way ANOVA


F Statistic
 The F statistic is the ratio of the among-groups
  estimate of variance to the within-groups estimate
  of variance
   The ratio must always be positive
   df1 = c – 1 will typically be small
   df2 = n – c will typically be large

Decision Rule:
 Reject H0 if F > FU; otherwise do not reject H0

(Figure: F distribution with the rejection region beyond the upper critical value FU, α = .05)
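The upper critical value FU need not come from a printed table; a short SciPy sketch (assuming scipy is installed) for c = 3 groups and n = 15 observations at α = 0.05:

```python
from scipy import stats

c, n, alpha = 3, 15, 0.05
# Upper-tail critical value of the F distribution with c-1 and n-c d.f.
FU = stats.f.ppf(1 - alpha, dfn=c - 1, dfd=n - c)
print(round(FU, 2))  # 3.89
```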

One-Way ANOVA
F Test Example

You want to see if three different golf clubs yield
different distances. You randomly select five
measurements from trials on an automated driving
machine for each club. At the 0.05 significance
level, is there a difference in mean distance?

Club 1  Club 2  Club 3
 254     234     200
 263     218     222
 241     235     197
 237     227     206
 251     216     204

One-Way ANOVA Example:


Scatter Diagram
(Figure: scatter diagram of distance by club, showing the group
means x̄1 = 249.2, x̄2 = 226.0, x̄3 = 205.8 and the grand mean x̄ = 227.0)

One-Way ANOVA Example
Computations
Club 1  Club 2  Club 3
 254     234     200        X̄1 = 249.2    n1 = 5
 263     218     222        X̄2 = 226.0    n2 = 5
 241     235     197        X̄3 = 205.8    n3 = 5
 237     227     206        X̄ = 227.0     n = 15
 251     216     204                       c = 3

SSA = 5(249.2 – 227)² + 5(226 – 227)² + 5(205.8 – 227)² = 4716.4
SSW = (254 – 249.2)² + (263 – 249.2)² + … + (204 – 205.8)² = 1119.6

MSA = 4716.4 / (3 – 1) = 2358.2
MSW = 1119.6 / (15 – 3) = 93.3

F = MSA / MSW = 2358.2 / 93.3 = 25.275

One-Way ANOVA Example


Solution
H0: µ1 = µ2 = µ3
H1: µj not all equal
α = 0.05
df1 = 2, df2 = 12

Test Statistic:
F = MSA / MSW = 2358.2 / 93.3 = 25.275

Critical Value: FU = 3.89

Decision: Reject H0 at α = 0.05, since F = 25.275 > FU = 3.89.

Conclusion: There is evidence that at least one µj differs from the rest.

One-Way ANOVA
Excel Output
EXCEL: Tools | Data Analysis | ANOVA: Single Factor

SUMMARY
Groups   Count   Sum    Average   Variance
Club 1   5       1246   249.2     108.2
Club 2   5       1130   226       77.5
Club 3   5       1029   205.8     94.2

ANOVA
Source of Variation   SS       df   MS       F        P-value    F crit
Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.89
Within Groups         1119.6   12   93.3
Total                 5836.0   14
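The same analysis can be reproduced outside Excel; a short sketch using SciPy's one-way ANOVA (assuming scipy is installed):

```python
from scipy import stats

# Golf-club distances from the example above
club1 = [254, 263, 241, 237, 251]
club2 = [234, 218, 235, 227, 216]
club3 = [200, 222, 197, 206, 204]

F, p = stats.f_oneway(club1, club2, club3)
# F matches the hand computation (25.275) and p ≈ 4.99E-05 < 0.05,
# so H0 is rejected
```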

Numerical Problems
Ref: 11-28, page no. 604. The following data show the
number of claims processed per day for a group of
four insurance company employees observed for a
number of days. Test the hypothesis that the
employees’ mean claims per day are all the same.
Use the 0.05 level of significance.
Employee 1 15 17 14 12
Employee 2 12 10 13 17
Employee 3 11 14 13 15 12
Employee 4 13 12 12 14 10 9

Solution: H0: µ1 = µ2 = µ3 = µ4 & H1: µj not all equal
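A quick check of this problem with SciPy (assuming scipy is installed); note that one-way ANOVA handles the unequal sample sizes directly:

```python
from scipy import stats

# Claims processed per day for each employee
e1 = [15, 17, 14, 12]
e2 = [12, 10, 13, 17]
e3 = [11, 14, 13, 15, 12]
e4 = [13, 12, 12, 14, 10, 9]

F, p = stats.f_oneway(e1, e2, e3, e4)
# p ≈ 0.26 > 0.05, so H0 is not rejected
```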

One-Way ANOVA Excel Output
EXCEL: tools | data analysis | ANOVA: single factor
SUMMARY
Groups       Count   Sum   Average   Variance
Employee 1   4       58    14.5      4.333333
Employee 2   4       52    13        8.666667
Employee 3   5       65    13        2.5
Employee 4   6       70    11.67     3.466667

ANOVA
Source of Variation   SS      df   MS     F      P-value   F crit
Between Groups        19.45   3    6.48   1.46   0.26      3.28
Within Groups         66.33   15   4.42
Total                 85.78   18

Do not reject H0. The employees' productivities are not
significantly different.

Statistics for Business Analysis

Day 10
Session-II

ANOVA: Analysis of Variance

The Tukey-Kramer Procedure

 Tells which population means are significantly different
   e.g.: µ1 = µ2 ≠ µ3
 Done after rejection of equal means in ANOVA
 Allows pair-wise comparisons
   Compare absolute mean differences with a critical range

(Figure: sampling distributions on a number line, with µ1 = µ2 together and µ3 apart)

Tukey-Kramer Critical Range

Critical Range = QU √[ (MSW/2) (1/nj + 1/nj′) ]

where:
QU = value from the Studentized Range distribution
     with c and n − c degrees of freedom for
     the desired level of α (see appendix E.9 table)
MSW = mean square within
nj and nj′ = sample sizes from groups j and j′

The Tukey-Kramer Procedure:
Example
Club 1  Club 2  Club 3
 254     234     200
 263     218     222
 241     235     197
 237     227     206
 251     216     204

1. Compute the absolute mean differences:

   |x̄1 − x̄2| = |249.2 − 226.0| = 23.2
   |x̄1 − x̄3| = |249.2 − 205.8| = 43.4
   |x̄2 − x̄3| = |226.0 − 205.8| = 20.2

2. Find the QU value from the table in appendix E.10 with
   c = 3 and (n – c) = (15 – 3) = 12 degrees of freedom
   for the desired level of α (α = 0.05 used here):

   QU = 3.77

The Tukey-Kramer Procedure:


Example
(continued)
3. Compute the critical range:

   Critical Range = QU √[(MSW/2)(1/nj + 1/nj′)] = 3.77 √[(93.3/2)(1/5 + 1/5)] = 16.285

4. Compare:

   |x̄1 − x̄2| = 23.2
   |x̄1 − x̄3| = 43.4
   |x̄2 − x̄3| = 20.2

5. All of the absolute mean differences are greater than the
   critical range, so there is a significant difference between
   each pair of means at the 5% level of significance.
   Thus, with 95% confidence we can conclude that the mean
   distance for club 1 is greater than for clubs 2 and 3, and
   that for club 2 is greater than for club 3.
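The studentized-range value QU and the critical range can also be computed in SciPy (assuming scipy ≥ 1.7, which provides `studentized_range`); a sketch for the golf example:

```python
import math
from scipy.stats import studentized_range

c, df, MSW, alpha = 3, 12, 93.3, 0.05
n_j = n_jp = 5  # equal group sizes here

# QU with c groups and n - c = 12 degrees of freedom
QU = studentized_range.ppf(1 - alpha, c, df)                     # ≈ 3.77
critical_range = QU * math.sqrt(MSW / 2 * (1 / n_j + 1 / n_jp))  # ≈ 16.29

# Absolute mean differences from step 1 of the example
diffs = [abs(249.2 - 226.0), abs(249.2 - 205.8), abs(226.0 - 205.8)]
all_significant = all(d > critical_range for d in diffs)  # every pair differs
```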

The Randomized Block Design

 Like One-Way ANOVA, we test for equal


population means (for different factor levels, for
example)...

 ...but we want to control for possible variation


from a second factor (with two or more levels)

 Levels of the secondary factor are called blocks

Partitioning the Variation

 Total variation can now be split into three parts:

SST = SSA + SSBL + SSE

SST = Total variation


SSA = Among-Group variation
SSBL = Among-Block variation
SSE = Random variation

Sum of Squares for Blocking
SST = SSA + SSBL + SSE

SSBL = c Σi=1..r (X̄i. − X̄)²

Where:
c = number of groups
r = number of blocks
X̄i. = mean of all values in block i
X̄ = grand mean (mean of all data values)

Partitioning the Variation


 Total variation can now be split into three parts:

SST = SSA + SSBL + SSE

SST and SSA are computed as they were in One-Way ANOVA, so

SSE = SST – (SSA + SSBL)

Mean Squares

MSBL = mean square blocking = SSBL / (r − 1)

MSA = mean square among groups = SSA / (c − 1)

MSE = mean square error = SSE / [(r − 1)(c − 1)]

Randomized Block ANOVA Table


Source of Variation   SS     df              MS     F ratio
Among Treatments      SSA    c − 1           MSA    MSA / MSE
Among Blocks          SSBL   r − 1           MSBL   MSBL / MSE
Error                 SSE    (r − 1)(c − 1)  MSE
Total                 SST    rc − 1

c = number of populations    rc = sum of the sample sizes from all populations
r = number of blocks         df = degrees of freedom

Blocking Test
H0 : µ1. = µ2. = µ3. = ...
H1 : Not all block means are equal

F = MSBL / MSE

 Blocking test: df1 = r – 1
                 df2 = (r – 1)(c – 1)

Reject H0 if F > FU

Main Factor Test


H0 : µ.1 = µ.2 = µ.3 = ... = µ.c
H1 : Not all population means are equal

F = MSA / MSE

 Main factor test: df1 = c – 1
                    df2 = (r – 1)(c – 1)

Reject H0 if F > FU
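Both F ratios for a randomized block design can be computed from an r × c table of observations; a minimal NumPy sketch with hypothetical data (rows = blocks, columns = treatments):

```python
import numpy as np

# Hypothetical randomized block layout: r = 4 blocks, c = 3 treatments
X = np.array([[31.0, 27.0, 24.0],
              [31.0, 28.0, 22.0],
              [26.0, 23.0, 21.0],
              [38.0, 32.0, 26.0]])
r, c = X.shape
grand = X.mean()
treat_means = X.mean(axis=0)   # X̄.j, one per treatment
block_means = X.mean(axis=1)   # X̄i., one per block

SST = ((X - grand) ** 2).sum()
SSA = r * ((treat_means - grand) ** 2).sum()    # among treatments (nj = r per group)
SSBL = c * ((block_means - grand) ** 2).sum()   # among blocks
SSE = SST - SSA - SSBL                          # random (error) variation

MSA = SSA / (c - 1)
MSBL = SSBL / (r - 1)
MSE = SSE / ((r - 1) * (c - 1))

F_main = MSA / MSE     # main factor test: df1 = c - 1, df2 = (r - 1)(c - 1)
F_block = MSBL / MSE   # blocking test:    df1 = r - 1, df2 = (r - 1)(c - 1)
```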

The Tukey Procedure

 To test which population means are significantly different
   e.g.: µ1 = µ2 ≠ µ3
 Done after rejection of equal means in a randomized
  block ANOVA design
 Allows pair-wise comparisons
   Compare absolute mean differences with a critical range

(Figure: sampling distributions on a number line, with µ1 = µ2 together and µ3 apart)

The Tukey Procedure


(continued)

Critical Range = QU √(MSE / r)

Compare: is |x̄.j − x̄.j′| > Critical Range?

   |x̄.1 − x̄.2|, |x̄.1 − x̄.3|, |x̄.2 − x̄.3|, etc.

If the absolute mean difference is greater than the critical
range, then there is a significant difference between that
pair of means at the chosen level of significance.

Factorial Design:
Two-Way ANOVA
 Examines the effect of
 Two factors of interest on the dependent
variable
 e.g., Percent carbonation and line speed on soft drink
bottling process
 Interaction between the different levels of these
two factors
 e.g., Does the effect of a particular carbonation
level depend on the level at which the line speed is set?

Two-Way ANOVA
(continued)

 Assumptions

 Populations are normally distributed


 Populations have equal variances
 Independent random samples are
drawn

Two-Way ANOVA
Sources of Variation
Two Factors of interest: A and B
r = number of levels of factor A
c = number of levels of factor B
n’ = number of replications for each cell
n = total number of observations in all cells
(n = rcn’)
Xijk = value of the kth observation of level i of
factor A and level j of factor B

Two-Way ANOVA
Sources of Variation (continued)

SST = SSA + SSB + SSAB + SSE

                                                       Degrees of Freedom:
SST  = Total variation                                 n – 1
SSA  = Factor A variation                              r – 1
SSB  = Factor B variation                              c – 1
SSAB = Variation due to interaction between A and B    (r – 1)(c – 1)
SSE  = Random variation (error)                        rc(n′ – 1)

Two Factor ANOVA Equations

Total Variation:
SST = Σi=1..r Σj=1..c Σk=1..n′ (Xijk − X̄)²

Factor A Variation:
SSA = cn′ Σi=1..r (X̄i.. − X̄)²

Factor B Variation:
SSB = rn′ Σj=1..c (X̄.j. − X̄)²

Two Factor ANOVA Equations


(continued)

Interaction Variation:

SSAB = n′ Σi=1..r Σj=1..c (X̄ij. − X̄i.. − X̄.j. + X̄)²

Sum of Squares Error:

SSE = Σi=1..r Σj=1..c Σk=1..n′ (Xijk − X̄ij.)²

Two Factor ANOVA Equations
(continued)
where:

X̄ = [Σi=1..r Σj=1..c Σk=1..n′ Xijk] / (rcn′) = grand mean

X̄i.. = [Σj=1..c Σk=1..n′ Xijk] / (cn′) = mean of the ith level of factor A (i = 1, 2, ..., r)

X̄.j. = [Σi=1..r Σk=1..n′ Xijk] / (rn′) = mean of the jth level of factor B (j = 1, 2, ..., c)

X̄ij. = [Σk=1..n′ Xijk] / n′ = mean of cell ij

r = number of levels of factor A
c = number of levels of factor B
n′ = number of replications in each cell

Mean Square Calculations


MSA = mean square factor A = SSA / (r − 1)

MSB = mean square factor B = SSB / (c − 1)

MSAB = mean square interaction = SSAB / [(r − 1)(c − 1)]

MSE = mean square error = SSE / [rc(n′ − 1)]
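The two-factor sums of squares and mean squares above can be verified numerically; a NumPy sketch with simulated data (r = 2 levels of A, c = 3 levels of B, n′ = 4 replications per cell):

```python
import numpy as np

# Simulated layout: X[i, j, k] = k-th replicate at level i of A, level j of B
rng = np.random.default_rng(0)
r, c, n_rep = 2, 3, 4                  # n_rep plays the role of n′
X = rng.normal(10.0, 2.0, size=(r, c, n_rep))

grand = X.mean()
A_means = X.mean(axis=(1, 2))          # X̄i..
B_means = X.mean(axis=(0, 2))          # X̄.j.
cell_means = X.mean(axis=2)            # X̄ij.

SST = ((X - grand) ** 2).sum()
SSA = c * n_rep * ((A_means - grand) ** 2).sum()
SSB = r * n_rep * ((B_means - grand) ** 2).sum()
SSAB = n_rep * ((cell_means
                 - A_means[:, None]
                 - B_means[None, :]
                 + grand) ** 2).sum()
SSE = ((X - cell_means[:, :, None]) ** 2).sum()

# The sums of squares partition exactly: SST = SSA + SSB + SSAB + SSE
MSE = SSE / (r * c * (n_rep - 1))
```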

Two-Way ANOVA:
The F Test Statistic
F Test for Factor A Effect
H0: µ1.. = µ2.. = µ3.. = …
H1: Not all µi.. are equal
F = MSA / MSE       Reject H0 if F > FU

F Test for Factor B Effect
H0: µ.1. = µ.2. = µ.3. = …
H1: Not all µ.j. are equal
F = MSB / MSE       Reject H0 if F > FU

F Test for Interaction Effect
H0: the interaction of A and B is equal to zero
H1: the interaction of A and B is not equal to zero
F = MSAB / MSE      Reject H0 if F > FU

Two-Way ANOVA
Summary Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares                     F Statistic
Factor A              SSA              r – 1                MSA = SSA / (r – 1)              MSA / MSE
Factor B              SSB              c – 1                MSB = SSB / (c – 1)              MSB / MSE
AB (Interaction)      SSAB             (r – 1)(c – 1)       MSAB = SSAB / [(r – 1)(c – 1)]   MSAB / MSE
Error                 SSE              rc(n’ – 1)           MSE = SSE / [rc(n’ – 1)]
Total                 SST              n – 1

Features of Two-Way ANOVA
F Test
 Degrees of freedom always add up
 n-1 = rc(n’-1) + (r-1) + (c-1) + (r-1)(c-1)
 Total = error + factor A + factor B + interaction

 The denominator of the F Test is always the


same but the numerator is different
 The sums of squares always add up
 SST = SSE + SSA + SSB + SSAB
 Total = error + factor A + factor B + interaction

Examples:
Interaction vs. No Interaction
 No interaction: the lines of mean response across the
  factor A levels are parallel for factor B levels 1, 2, and 3

 Interaction is present: the lines of mean response across
  the factor A levels cross or diverge for the different
  factor B levels

(Figure: two panels of mean response vs. factor A levels, one with
parallel lines for the three factor B levels and one with
non-parallel lines)

Multiple Comparisons:
The Tukey Procedure
 Unless there is a significant interaction, you can
  determine the levels that are significantly different
  using the Tukey procedure
 Consider all absolute mean differences and compare
  to the calculated critical range
 Example: absolute differences for factor A,
  assuming three levels:

   |X̄1.. − X̄2..|, |X̄1.. − X̄3..|, |X̄2.. − X̄3..|

Multiple Comparisons:
The Tukey Procedure
 Critical Range for Factor A:

   Critical Range = QU √(MSE / (c n′))

   (where QU is from Table E.10 with r and rc(n′ – 1) d.f.)

 Critical Range for Factor B:

   Critical Range = QU √(MSE / (r n′))

   (where QU is from Table E.10 with c and rc(n′ – 1) d.f.)

Summary
 Described one-way analysis of variance
 The logic of ANOVA
 ANOVA assumptions
 F test for difference in c means
 The Tukey-Kramer procedure for multiple comparisons
 Considered the Randomized Block Design
 Treatment and Block Effects
 Multiple Comparisons: Tukey Procedure
 Described two-way analysis of variance
 Examined effects of multiple factors
 Examined interaction between factors
