
QMDS 202 Data Analysis and Modeling

Chapter 14 Analysis Of Variance


Definition:

ANOVA is the name given to the approach that allows us to use sample
data to see if the values of two or more unknown population means are
likely to be equal.

Assumptions:
1. The populations under study are normally distributed.
2. The samples are drawn randomly, and each sample is independent of
the other samples.
3. The populations from which the sample values are obtained all have
the same unknown population variance ($\sigma^2$). That is,
$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_k^2$, where k = no. of populations under study.

Hypothesis Testing Procedure
1. $H_0: \mu_1 = \mu_2 = \dots = \mu_k$
   $H_1$: Not all population means are equal.
2. Set the level of significance ($\alpha$).
3. Use the F-distribution as the testing distribution.
4. Find the critical value with $df_1 = k - 1$ and $df_2 = n - k$, and state the rejection rule.
   $n$ = the total number of items in all samples = $n_1 + n_2 + \dots + n_k$
   $n_j$ = sample size of the $j$th sample
5. Compute the test statistic.
6. Make the statistical decision.

Key Terms in ANOVA


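SS(Total) = Total Sum of Squares

$$SS(\text{Total}) = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(X_{ij}-\bar{X})^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j}X_{ij}^2 - \frac{\left(\sum_{j=1}^{k}\sum_{i=1}^{n_j}X_{ij}\right)^2}{n}$$

where the grand mean is

$$\bar{X} = \frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j}X_{ij}}{n}$$

SST = Sum of Squares Between Groups = Sum of Squares of Treatment groups

$$SST = \sum_{j=1}^{k} n_j(\bar{X}_j-\bar{X})^2 = \sum_{j=1}^{k}\frac{\left(\sum_{i=1}^{n_j}X_{ij}\right)^2}{n_j} - \frac{\left(\sum_{j=1}^{k}\sum_{i=1}^{n_j}X_{ij}\right)^2}{n}$$

SSE = Sum of Squares Within Groups = Sum of Squares of Error

$$SSE = \sum_{j=1}^{k}(n_j-1)s_j^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(X_{ij}-\bar{X}_j)^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j}X_{ij}^2 - \sum_{j=1}^{k}\frac{\left(\sum_{i=1}^{n_j}X_{ij}\right)^2}{n_j}$$

where

$$s_j^2 = \frac{\sum_{i=1}^{n_j}(X_{ij}-\bar{X}_j)^2}{n_j-1} = \text{sample variance of the } j\text{th sample}$$

MST = Mean Squares of Treatment Groups = SST / (k - 1)

MSE = Mean Squares of Error = SSE / (n - k)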
SS(Total) = SST + SSE
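
The partition SS(Total) = SST + SSE follows from the definitions of the three sums of squares. The short derivation below is a sketch using the same symbols ($X_{ij}$, $\bar{X}_j$, $\bar{X}$, $n_j$, $k$); it is added for illustration and is not part of the original handout.

```latex
\begin{align*}
X_{ij}-\bar{X} &= (X_{ij}-\bar{X}_j) + (\bar{X}_j-\bar{X}) \\
\sum_{j=1}^{k}\sum_{i=1}^{n_j}(X_{ij}-\bar{X})^2
  &= \sum_{j=1}^{k}\sum_{i=1}^{n_j}(X_{ij}-\bar{X}_j)^2
   + \sum_{j=1}^{k} n_j(\bar{X}_j-\bar{X})^2
   + 2\sum_{j=1}^{k}(\bar{X}_j-\bar{X})\sum_{i=1}^{n_j}(X_{ij}-\bar{X}_j) \\
  &= SSE + SST + 0
\end{align*}
```

The cross term vanishes because the deviations from each sample mean sum to zero, $\sum_{i=1}^{n_j}(X_{ij}-\bar{X}_j) = 0$ for every $j$.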

Test Statistic of One-Way ANOVA


TS = F = MST / MSE

Example 1

Suppose that we want to compare the cholesterol contents of four
competing diet foods on the basis of the following data (in milligrams per
package) which were obtained from three 6-ounce packages of each of the
diet foods.

Brand A    3.6    4.1    4.0
Brand B    3.1    3.2    3.9
Brand C    3.2    3.5    3.5
Brand D    3.5    3.8    3.8

Perform a test at a significance level of 0.05 to see if the mean cholesterol
contents of the four diet foods are equal.

Solution:

Brand A    3.6    4.1    4.0    T1 = 11.7
Brand B    3.1    3.2    3.9    T2 = 10.2
Brand C    3.2    3.5    3.5    T3 = 10.2
Brand D    3.5    3.8    3.8    T4 = 11.1
                                T  = 43.2

$$SST = \frac{11.7^2}{3} + \frac{10.2^2}{3} + \frac{10.2^2}{3} + \frac{11.1^2}{3} - \frac{43.2^2}{12} = 156.06 - 155.52 = 0.54$$

$$\sum\sum x_{ij}^2 = (3.6)^2 + (4.1)^2 + \dots + (3.8)^2 + (3.8)^2 = 156.7$$

SS(Total) = 156.7 - 155.52 = 1.18

SSE = 1.18 - 0.54 = 0.64
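
The shortcut arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `shortcut_sums_of_squares` and the variable names are illustrative assumptions, not part of the course material.

```python
# Shortcut (totals-based) sums of squares for one-way ANOVA.
# Data: the Example 1 cholesterol readings. All names are illustrative.
samples = {
    "Brand A": [3.6, 4.1, 4.0],
    "Brand B": [3.1, 3.2, 3.9],
    "Brand C": [3.2, 3.5, 3.5],
    "Brand D": [3.5, 3.8, 3.8],
}


def shortcut_sums_of_squares(groups):
    """Return (SST, SSE, SS_total) from group totals T_j and grand total T."""
    n = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / n                     # T^2 / n
    between = sum(sum(g) ** 2 / len(g) for g in groups)   # sum of T_j^2 / n_j
    raw = sum(x ** 2 for g in groups for x in g)          # sum of x_ij^2
    sst = between - correction
    ss_total = raw - correction
    sse = ss_total - sst
    return sst, sse, ss_total


sst, sse, ss_total = shortcut_sums_of_squares(list(samples.values()))
print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SS(Total) = {ss_total:.2f}")
# prints: SST = 0.54, SSE = 0.64, SS(Total) = 1.18
```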
The One-way ANOVA Summary Table of this problem:
Source        df    SS      MS      F
Treatments    3     0.54    0.18    2.25
Error         8     0.64    0.08
Total         11    1.18

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_1$: Not all population means are equal
$\alpha$ = 0.05
Independent populations. Assume that all the populations are normally
distributed with an equal variance $\sigma^2$. The F-distribution will be used in the
ANOVA test.

$df_1 = k - 1 = 4 - 1 = 3$
$df_2 = n - k = 12 - 4 = 8$
Reject $H_0$ if TS > 4.07

TS = 0.18 / 0.08 = 2.25

TS = 2.25 is not greater than 4.07, so do not reject $H_0$.


Conclusion: The mean cholesterol contents of the four diet foods are not
significantly different.
OR
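Sample    Observations          Total    Squared deviations $(x_j - \bar{x}_j)^2$       Sum
$x_1$     3.6    4.1    4.0     11.7     $(-0.3)^2$   $(0.2)^2$    $(0.1)^2$     0.14
$x_2$     3.1    3.2    3.9     10.2     $(-0.3)^2$   $(-0.2)^2$   $(0.5)^2$     0.38
$x_3$     3.2    3.5    3.5     10.2     $(-0.2)^2$   $(0.1)^2$    $(0.1)^2$     0.06
$x_4$     3.5    3.8    3.8     11.1     $(-0.2)^2$   $(0.1)^2$    $(0.1)^2$     0.06

$$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{11.7}{3} = 3.9 \qquad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{10.2}{3} = 3.4$$

$$\bar{x}_3 = \frac{\sum x_3}{n_3} = \frac{10.2}{3} = 3.4 \qquad \bar{x}_4 = \frac{\sum x_4}{n_4} = \frac{11.1}{3} = 3.7$$

n = 3 + 3 + 3 + 3 = 12
k = 4

$$\bar{x} = \frac{11.7 + 10.2 + 10.2 + 11.1}{12} = 3.6$$

$$MST = \frac{n_1(\bar{x}_1-\bar{x})^2 + n_2(\bar{x}_2-\bar{x})^2 + n_3(\bar{x}_3-\bar{x})^2 + n_4(\bar{x}_4-\bar{x})^2}{k-1}$$

$$= \frac{3(3.9-3.6)^2 + 3(3.4-3.6)^2 + 3(3.4-3.6)^2 + 3(3.7-3.6)^2}{4-1} = \frac{0.54}{3} = 0.18$$

$$MSE = \frac{\sum(x_1-\bar{x}_1)^2 + \sum(x_2-\bar{x}_2)^2 + \sum(x_3-\bar{x}_3)^2 + \sum(x_4-\bar{x}_4)^2}{n-k}$$

$$= \frac{0.14 + 0.38 + 0.06 + 0.06}{12-4} = \frac{0.64}{8} = 0.08$$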
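
The means-and-deviations route to MST, MSE and the test statistic can be sketched in Python as follows. This is an illustrative sketch, not the handout's own method: the function name `one_way_anova` is an assumption, and the critical value 4.07 is copied from the worked example rather than computed.

```python
# One-way ANOVA from sample means and squared deviations (definitional formulas).
# Names are illustrative; the critical value comes from the worked example.
groups = [
    [3.6, 4.1, 4.0],  # Brand A
    [3.1, 3.2, 3.9],  # Brand B
    [3.2, 3.5, 3.5],  # Brand C
    [3.5, 3.8, 3.8],  # Brand D
]
CRITICAL_VALUE = 4.07  # F table value for alpha = 0.05, df = (3, 8), as given in the example


def one_way_anova(samples):
    """Return MST, MSE, F and the degrees of freedom for a list of samples."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    # Between-groups: n_j * (sample mean - grand mean)^2, summed over groups.
    sst = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-groups: squared deviations of each value from its own sample mean.
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {"MST": mst, "MSE": mse, "F": mst / mse, "df": (k - 1, n - k)}


result = one_way_anova(groups)
reject_h0 = result["F"] > CRITICAL_VALUE
print(f"MST = {result['MST']:.2f}, MSE = {result['MSE']:.2f}, F = {result['F']:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

With the Example 1 data this gives MST = 0.18, MSE = 0.08 and F = 2.25, so H0 is not rejected at the 0.05 level.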

The One-Way ANOVA Table


Source of      Degrees of       Sum of Squares    Mean Square    F-Statistic
Variation      Freedom (df)     (SS)              (MS)           (TS)
Treatments     k - 1            SST               MST            F = MST / MSE
Error          n - k            SSE               MSE
Total          n - 1            SS(Total)

Randomized Block (Two-Way) Analysis of Variance


This procedure is used to test whether two or more population means are likely to be
equal when 1) the samples are dependent, 2) the populations are normally distributed, and
3) all populations have the same common variance $\sigma^2$.
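Two-way ANOVA table:

Source        df               SS           MS     F
Treatments    k - 1            SST          MST    F = MST / MSE
Blocks        b - 1            SSB          MSB    F = MSB / MSE
Error         n - k - b + 1    SSE          MSE
Total         n - 1            SS(Total)

b = no. of blocks
k = no. of treatments
n = total number of observations = bk
$T_j$ = total of the observations in the jth treatment
$T$ = grand total of all observations
$S_i$ = total of the observations in the ith block

$$SST = \frac{1}{b}\left(T_1^2 + T_2^2 + \dots + T_k^2\right) - \frac{T^2}{bk}$$

$$SS(\text{Total}) = \sum\sum x_{ij}^2 - \frac{T^2}{bk}$$

SSB = Sum of Squares for Blocks

$$SSB = \frac{1}{k}\left(S_1^2 + S_2^2 + \dots + S_b^2\right) - \frac{T^2}{bk}$$

MSB = Mean Square for Blocks = SSB / (b - 1)

SSE = SS(Total) - SST - SSB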
Example 2

In the previous problem, suppose now we learn something that we did not
know earlier - the measurements of the cholesterol contents were
performed in different laboratories. The first value of each sample, we
learn, came from one laboratory, the second value came from another
laboratory, and the third value came from a third laboratory. We might
picture the original data as follows:
           Lab. 1    Lab. 2    Lab. 3
Brand A    3.6       4.1       4.0
Brand B    3.1       3.2       3.9
Brand C    3.2       3.5       3.5
Brand D    3.5       3.8       3.8

Test whether there is a difference in the mean cholesterol contents among
the four diet foods as well as whether there is a difference in the mean
results given by the three laboratories. Use $\alpha$ = 0.05.
Solution:

           Lab. 1    Lab. 2    Lab. 3    Total
Brand A    3.6       4.1       4.0       11.7
Brand B    3.1       3.2       3.9       10.2
Brand C    3.2       3.5       3.5       10.2
Brand D    3.5       3.8       3.8       11.1
Total      13.4      14.6      15.2      43.2

SS(Total) = 1.18

SST = 0.54

$$SSB = \frac{1}{4}\left(13.4^2 + 14.6^2 + 15.2^2\right) - 155.52 = 155.94 - 155.52 = 0.42$$

SSE = 1.18 - 0.54 - 0.42 = 0.22

Source        df    SS      MS        F
Treatments    3     0.54    0.18      4.90
Blocks        2     0.42    0.21      5.72
Error         6     0.22    0.0367
Total         11    1.18

For the treatments:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_1$: Not all population means are equal
$\alpha$ = 0.05
$v_1$ = 3
$v_2$ = 6
C.V. = 4.76
TS = 4.90 > 4.76, so reject $H_0$.
The mean cholesterol contents of the four diet foods are significantly
different.
For the blocks:
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: Not all population means are equal
$\alpha$ = 0.05
$v_1$ = 2
$v_2$ = 6
C.V. = 5.14
TS = 5.72 > 5.14, so reject $H_0$.
The mean results given by the three laboratories are significantly different.
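
The Example 2 computations can be reproduced with a short Python sketch of the randomized block formulas. The function name `randomized_block_anova` and the dictionary keys are illustrative assumptions, and the critical values 4.76 and 5.14 are copied from the worked example rather than computed.

```python
# Randomized block (two-way) ANOVA using the totals-based formulas.
# table[j][i] = observation for treatment j (brand) in block i (laboratory).
table = [
    [3.6, 4.1, 4.0],  # Brand A
    [3.1, 3.2, 3.9],  # Brand B
    [3.2, 3.5, 3.5],  # Brand C
    [3.5, 3.8, 3.8],  # Brand D
]


def randomized_block_anova(data):
    """Return sums of squares, mean squares and both F statistics."""
    k = len(data)        # number of treatments
    b = len(data[0])     # number of blocks
    n = k * b
    grand_total = sum(sum(row) for row in data)
    correction = grand_total ** 2 / n                           # T^2 / (bk)
    treatment_totals = [sum(row) for row in data]               # T_j
    block_totals = [sum(row[i] for row in data) for i in range(b)]  # S_i
    ss_total = sum(x ** 2 for row in data for x in row) - correction
    sst = sum(t ** 2 for t in treatment_totals) / b - correction
    ssb = sum(s ** 2 for s in block_totals) / k - correction
    sse = ss_total - sst - ssb
    mst = sst / (k - 1)
    msb = ssb / (b - 1)
    mse = sse / (n - k - b + 1)
    return {
        "SST": sst, "SSB": ssb, "SSE": sse, "SS_total": ss_total,
        "MST": mst, "MSB": msb, "MSE": mse,
        "F_treatments": mst / mse, "F_blocks": msb / mse,
    }


res = randomized_block_anova(table)
print(f"SSB = {res['SSB']:.2f}, SSE = {res['SSE']:.2f}")
print(f"F (treatments) = {res['F_treatments']:.2f}, F (blocks) = {res['F_blocks']:.2f}")
print("Treatments:", "reject H0" if res["F_treatments"] > 4.76 else "do not reject H0")
print("Blocks:", "reject H0" if res["F_blocks"] > 5.14 else "do not reject H0")
```

Without intermediate rounding the F statistics come out at about 4.91 and 5.73. The table's 4.90 and 5.72 result from dividing by the rounded MSE of 0.0367, and both decisions are unchanged.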

Example 3

The sample data in the following table are the marks in a statistics test
obtained by nine students from 3 majors who were taught by 3 different
instructors:

                    Instructor A    Instructor B    Instructor C
Marketing major     77              88              71
Finance major       88              97              81
Accounting major    85              95              72

At 5% significance level, test whether the mean scores of the majors are
the same by using the instructors as blocks.
Solution:
                    Instructor A    Instructor B    Instructor C
Marketing major     77              88              71
Finance major       88              97              81
Accounting major    85              95              72

k =
b =
n =

SST =
SSB =
$\sum\sum x_{ij}^2$ =
SS(Total) =
SSE =

Source        df    SS    MS
Treatments
Blocks
Error
Total

For the treatments:
$H_0$:
$H_1$:
$\alpha$ = 0.05
$v_1$ =
$v_2$ =
C.V. =
Decision:
Conclusion:

For the blocks:
$H_0$:
$H_1$:
$\alpha$ = 0.05
$v_1$ =
$v_2$ =
C.V. =
Decision:
Conclusion:

Assumptions in this problem:

Review Problems: 14.1, 14.3, 14.51, 14.56, 14.58.
