ANALYSIS OF VARIANCE
Suppose that two machines are to be compared. Because these machines are operated by people, and for other, inexplicable reasons, output per hour is subject to chance fluctuation. In the hope of averaging out and thus reducing the effect of chance fluctuation, a random sample of 5 different hours is obtained from each machine and set out in the table below, where each sample mean is then calculated. Assume that the 2 populations are normally distributed. Are the machines really different?

| Machine | Output per hour (5 sampled hours) | Sample mean |
|---|---|---|
| Machine 1 | 55, 54, 58, 61, 52 | $\bar{x}_1 = 56$ |
| Machine 2 | 47, 53, 49, 50, 46 | $\bar{x}_2 = 49$ |
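Characteristics of the distribution of $\bar{x}_1 - \bar{x}_2$
Mean = $\mu_1 - \mu_2$
Variance = $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
Variance = $\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}$ when the two populations share a common variance $\sigma^2$
The common variance is estimated by the pooled sample variance:
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{4 \times 12.5 + 4 \times 7.5}{5 + 5 - 2} = 10$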
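$H_0: \mu_1 = \mu_2$ (or $\mu_1 - \mu_2 = 0$) and $H_1: \mu_1 \neq \mu_2$
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}} = \frac{56 - 49 - 0}{\sqrt{\frac{10}{5} + \frac{10}{5}}} = 3.5$
The significance level is $\alpha = 0.01$, and the critical value with $n_1 + n_2 - 2 = 8$ degrees of freedom is $t_{0.01,\,8} = 2.896$. Since $t = 3.5 > 2.896$, reject $H_0$. There is a difference between the two population means. (The value 2.896 is the one-tailed critical value; the two-tailed value $t_{0.005,\,8} = 3.355$ leads to the same conclusion.)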
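The pooled-variance t statistic above can be reproduced with a short sketch. It uses only the Python standard library; the variable names are illustrative, and the critical value is copied from the text rather than computed.

```python
from math import sqrt
from statistics import mean, variance

# Hourly output samples for the two machines (from the table above)
machine_1 = [55, 54, 58, 61, 52]
machine_2 = [47, 53, 49, 50, 46]

n1, n2 = len(machine_1), len(machine_2)
s1_sq = variance(machine_1)   # sample variance, divisor n - 1
s2_sq = variance(machine_2)

# Pooled variance: weighted average of the two sample variances
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# t statistic under H0: mu1 - mu2 = 0
t_stat = (mean(machine_1) - mean(machine_2) - 0) / sqrt(sp_sq / n1 + sp_sq / n2)

t_crit = 2.896                # critical value quoted in the text (8 d.f.)
reject_h0 = t_stat > t_crit

print(f"pooled variance = {sp_sq}, t = {t_stat}, reject H0: {reject_h0}")
```

The sample variances come out as 12.5 and 7.5, which are the values plugged into the pooled variance formula.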
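| Machine $j$ | Output per hour | Sample mean $\bar{x}_{.j}$ | $\bar{x}_{.j} - \bar{x}_{..}$ | $(\bar{x}_{.j} - \bar{x}_{..})^2$ |
|---|---|---|---|---|
| 1 | 47, 53, 49, 50, 46 | 49 | -3 | 9 |
| 2 | 55, 54, 58, 61, 52 | 56 | 4 | 16 |
| 3 | 54, 50, 51, 51, 49 | 51 | -1 | 1 |
| | | Grand mean $\bar{x}_{..} = 52$ | $\sum (\bar{x}_{.j} - \bar{x}_{..}) = 0$ | $\sum (\bar{x}_{.j} - \bar{x}_{..})^2 = 26$ |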
From one machine, 3 samples are taken. As expected, sampling fluctuations cause small differences in the sample means $\bar{x}$ even though the population means $\mu$ in this case are identical.
Question
Are the differences in the 3 means from 3 different machines of the same order as the differences in 3 means from 3 samples of one machine? If so, the differences are due to chance fluctuation. If not, the differences among the 3 means from the 3 different machines are large enough to indicate a difference in the underlying population means $\mu$.
$H_0: \mu_1 = \mu_2 = \mu_3$
[Figure: output of the 3 machines. Panels: "Three different machines" and "Three erratic machines".]
ANOVA:
Partition the total variability into smaller components
Identify the source of each component
Measure the extent of each source
Conclude which source is the true cause of the total variability
2 sources
Variability due to distinct population means (treatments)
Variability due to all other sources (residual)
Content
The completely randomized design: one-way ANOVA
The randomized complete block design: two-way ANOVA
The Latin square design
The factorial experiment: 2 factors
ANOVA models
Definitions
Treatment: a treatment is any factor that the experimenter controls. It may refer, for example, to a type of drug, one of several concentrations of a single drug, a new type of house paint, an advertising technique, or a particular training program.
Experimental unit: the entity that receives a treatment is called an experimental unit.
Mean square: the term mean square is synonymous with the word variance.
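Definitions of ANOVA
The total variation $\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{..})^2$ in a set of data is partitioned into several components. Associated with each of these components is a specific source of variation, so that, in the analysis, it is possible to ascertain the magnitude of the contribution of each of these sources to the total variation.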
One-way ANOVA
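ANOVA model
$x_{ij} = \mu_j + e_{ij}$ (1)
where $x_{ij}$ is the $i$th observation from the $j$th treatment ($i = 1, \dots, n_j$; $j = 1, \dots, k$), $\mu_j$ is the mean of the $j$th population, and $e_{ij}$ is the error term: the deviation of $x_{ij}$ from $\mu_j$ (Within).
$\mu = \frac{1}{k}\sum_{j=1}^{k}\mu_j$ is the grand mean, and the treatment effect (Between) is $\tau_j = \mu_j - \mu$, so that
$\mu_j = \mu + \tau_j$ (2)
Substituting (2) into (1):
$x_{ij} = \mu + \tau_j + e_{ij}$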
ANOVA assumptions
Equal population variances
Normally distributed populations
The treatment effects sum to zero: $\sum_{j=1}^{k}\tau_j = 0$
Violating these assumptions increases the probability of rejecting a true null hypothesis.
Test whether the assumptions are violated
Variance test: Levene's test or boxplot
Normality test: boxplot, histogram, normal Q-Q plot
Solution
Make sample sizes equal
Refine the data set to eliminate outliers
ANOVA
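Total sum of squares, with $x_{ij} - \bar{x}_{..} = (x_{ij} - \bar{x}_{.j}) + (\bar{x}_{.j} - \bar{x}_{..})$:
$SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{..})^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left[(x_{ij} - \bar{x}_{.j}) + (\bar{x}_{.j} - \bar{x}_{..})\right]^2$
$= \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j})^2 + 2\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j})(\bar{x}_{.j} - \bar{x}_{..}) + \sum_{j=1}^{k}\sum_{i=1}^{n_j}(\bar{x}_{.j} - \bar{x}_{..})^2$
$= \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j})^2 + 2\sum_{j=1}^{k}(\bar{x}_{.j} - \bar{x}_{..})\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j}) + \sum_{j=1}^{k}n_j(\bar{x}_{.j} - \bar{x}_{..})^2$
The middle term vanishes because $\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j}) = 0$ for every $j$, so
$SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j})^2 + \sum_{j=1}^{k}n_j(\bar{x}_{.j} - \bar{x}_{..})^2 = SSE + SSTr$
SS for error (Within): $SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j})^2$
SS for treatment (Between): $SSTr = \sum_{j=1}^{k}n_j(\bar{x}_{.j} - \bar{x}_{..})^2$
With equal sample sizes $n$, the variance of the sample means is $s_{\bar{x}}^2 = \frac{\sum_{j=1}^{k}(\bar{x}_{.j} - \bar{x}_{..})^2}{k - 1}$, and the treatment mean square is
$MSTr = \frac{SSTr}{k - 1} = n s_{\bar{x}}^2$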
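As a numerical check of the partition of the total sum of squares into a within part and a between part, the following sketch evaluates each piece on the three-machine data given earlier. The variable names are illustrative and not taken from the source.

```python
# Three-machine data from the earlier table (5 sampled hours per machine)
samples = [
    [47, 53, 49, 50, 46],   # machine 1
    [55, 54, 58, 61, 52],   # machine 2
    [54, 50, 51, 51, 49],   # machine 3
]

all_obs = [x for s in samples for x in s]
grand_mean = sum(all_obs) / len(all_obs)
means = [sum(s) / len(s) for s in samples]

# Total, within (error) and between (treatment) sums of squares
sst = sum((x - grand_mean) ** 2 for x in all_obs)
sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
sstr = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))

# Cross-product term of the expansion; it should vanish
cross = sum((x - m) * (m - grand_mean) for s, m in zip(samples, means) for x in s)

print(f"SST = {sst}, SSE = {sse}, SSTr = {sstr}, cross term = {cross}")
```

The cross term is zero because the deviations of the observations from their own sample mean sum to zero within each sample.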
Is the total variability of the data explained by the underlying population means? Test the hypothesis
$H_0: \mu_1 = \mu_2 = \dots = \mu_k$ and $H_1$: not all $\mu_j$ are equal
SSTr measures the proximity of the sample means to each other. If the sample means are remarkably different, this measure will be large.
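For the three machines ($n = 5$, $k = 3$):
$SSTr = n\sum_{j=1}^{k}(\bar{x}_{.j} - \bar{x}_{..})^2 = 5 \times 26 = 130$
$MSTr = \frac{SSTr}{k - 1} = \frac{130}{2} = 65$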
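With equal sample sizes, the treatment mean square can also be read as $n$ times the sample variance of the three sample means. A minimal sketch of that view, using the machine means 49, 56 and 51 from the data (variable names are illustrative):

```python
from statistics import variance

sample_means = [49, 56, 51]   # means of machines 1, 2, 3
n = 5                         # observations per machine
k = len(sample_means)

# Variance of the sample means (divisor k - 1)
var_of_means = variance(sample_means)

mstr = n * var_of_means       # treatment mean square
sstr = mstr * (k - 1)         # treatment sum of squares

print(f"variance of means = {var_of_means}, MSTr = {mstr}, SSTr = {sstr}")
```

This shortcut only applies when every sample has the same size; with unequal sizes each squared deviation has to be weighted by its own $n_j$.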
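$SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j})^2 = \sum_{i=1}^{n_1}(x_{i1} - \bar{x}_{.1})^2 + \sum_{i=1}^{n_2}(x_{i2} - \bar{x}_{.2})^2 + \dots + \sum_{i=1}^{n_k}(x_{ik} - \bar{x}_{.k})^2$
$= (n_1 - 1)\frac{\sum_{i=1}^{n_1}(x_{i1} - \bar{x}_{.1})^2}{n_1 - 1} + (n_2 - 1)\frac{\sum_{i=1}^{n_2}(x_{i2} - \bar{x}_{.2})^2}{n_2 - 1} + \dots + (n_k - 1)\frac{\sum_{i=1}^{n_k}(x_{ik} - \bar{x}_{.k})^2}{n_k - 1}$
$= (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2$
With equal sample sizes $n$: $SSE = (n - 1)(s_1^2 + s_2^2 + \dots + s_k^2)$
The error mean square is the pooled variance:
$MSE = s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2}{(n_1 - 1) + (n_2 - 1) + \dots + (n_k - 1)} = \frac{SSE}{N - k}$, where $N = n_1 + n_2 + \dots + n_k$
For the three machines:
$SSE = 30 + 50 + 14 = 94$
$MSE = \frac{94}{12} = 7.83$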
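The error sum of squares can be rebuilt from the individual sample variances, which makes the link to the pooled variance explicit. A short sketch on the three-machine data (variable names are illustrative):

```python
from statistics import variance

samples = [
    [47, 53, 49, 50, 46],   # machine 1
    [55, 54, 58, 61, 52],   # machine 2
    [54, 50, 51, 51, 49],   # machine 3
]

# Each sample contributes (n_j - 1) * s_j^2 to the error sum of squares
contributions = [(len(s) - 1) * variance(s) for s in samples]
sse = sum(contributions)

# Error degrees of freedom: total observations minus number of samples
df_error = sum(len(s) for s in samples) - len(samples)
mse = sse / df_error          # pooled variance

print(f"contributions = {contributions}, SSE = {sse}, MSE = {mse:.2f}")
```

The three contributions are 30, 50 and 14, matching the within-sample sums of squared deviations of each machine.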
F distribution
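$\chi^2_{(n_1 - 1)} = \frac{(n_1 - 1)s_1^2}{\sigma_1^2}$
$\chi^2_{(n_2 - 1)} = \frac{(n_2 - 1)s_2^2}{\sigma_2^2}$
$F = \frac{\chi^2_{(n_1 - 1)} / (n_1 - 1)}{\chi^2_{(n_2 - 1)} / (n_2 - 1)} = \frac{\frac{(n_1 - 1)s_1^2}{\sigma_1^2} / (n_1 - 1)}{\frac{(n_2 - 1)s_2^2}{\sigma_2^2} / (n_2 - 1)} = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$
When $\sigma_1^2 = \sigma_2^2$, this reduces to $F = \frac{s_1^2}{s_2^2}$
$H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 > \sigma_2^2$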
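To make the variance ratio concrete, this sketch computes $F = s_1^2 / s_2^2$ for the two samples of the two-machine example, with the larger sample variance in the numerator. It only computes the statistic and its degrees of freedom; a critical value would have to be looked up in an F table. Variable names are illustrative.

```python
from statistics import variance

sample_1 = [55, 54, 58, 61, 52]   # two-machine example, machine 1
sample_2 = [47, 53, 49, 50, 46]   # two-machine example, machine 2

s1_sq = variance(sample_1)
s2_sq = variance(sample_2)

# Under H0 (equal population variances) the ratio follows an F distribution
f_ratio = s1_sq / s2_sq
df_num = len(sample_1) - 1        # numerator degrees of freedom
df_den = len(sample_2) - 1        # denominator degrees of freedom

print(f"F = {f_ratio:.3f} with ({df_num}, {df_den}) degrees of freedom")
```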
MSE is an estimate of the common variance ($\sigma^2$).
MSTr measures the variability among the treatment means; it is also an estimate of $\sigma^2$ when $H_0$ is true.
Is the total variability mainly explained by the treatment, or by all sources other than the treatment?
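$F = \frac{MSTr}{MSE} = \frac{n s_{\bar{x}}^2}{s_p^2}$
$H_0: \mu_1 = \mu_2 = \dots = \mu_k$ and $H_1$: not all $\mu_j$ are equal
If $H_0$ is false, then $n s_{\bar{x}}^2$ will be relatively large compared to $s_p^2$, so $F > 1$. Therefore, the larger the computed ratio is compared to the critical value of F, the less credible is the null hypothesis.
For the three machines:
$F = \frac{65}{7.83} = 8.3$
Reject $H_0$
The three machines are really different.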
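The whole one-way ANOVA calculation can be collected into one small function. This is a sketch, not a library routine: the function name `one_way_anova` and the dictionary keys are hypothetical. The critical values in the comments are standard F-table entries for (2, 12) degrees of freedom and are an addition here, since the text does not state the significance level used.

```python
def one_way_anova(samples):
    """Return the sums of squares, mean squares and F ratio for a list of samples."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    n_total = sum(sizes)
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / n_total

    # Between-treatment and within-treatment sums of squares
    sstr = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return {"SSTr": sstr, "SSE": sse, "SST": sstr + sse,
            "df_tr": k - 1, "df_err": n_total - k,
            "MSTr": mstr, "MSE": mse, "F": mstr / mse}


machines = [
    [47, 53, 49, 50, 46],
    [55, 54, 58, 61, 52],
    [54, 50, 51, 51, 49],
]
result = one_way_anova(machines)

# Assumed table values (not from the source): F(2, 12) at 5% and at 1%
F_CRIT_05 = 3.89
F_CRIT_01 = 6.93

print(f"F = {result['F']:.2f}")
print("reject H0 at 5%:", result["F"] > F_CRIT_05)
print("reject H0 at 1%:", result["F"] > F_CRIT_01)
```

The function weights each squared deviation of a sample mean by its own sample size, so it also covers samples of unequal size.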
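ANOVA table ($n_1 = n_2 = \dots = n_k = n$)

| Source of variation | Sum of squares (SS) | d.f. | Mean square (MS) | F ratio |
|---|---|---|---|---|
| Treatment: differences between treatments | $SSTr = n\sum_{j=1}^{k}(\bar{x}_{.j} - \bar{x}_{..})^2$ | $k - 1$ | $MSTr = n s_{\bar{x}}^2 = \frac{SSTr}{k - 1}$ | $F = \frac{MSTr}{MSE} = \frac{n s_{\bar{x}}^2}{s_p^2}$ |
| Residual: differences within treatments | $SSE = \sum_{j=1}^{k}\sum_{i=1}^{n}(x_{ij} - \bar{x}_{.j})^2$ | $nk - k$ | $MSE = s_p^2 = \frac{SSE}{nk - k}$ | |
| Total | $SST = \sum_{j=1}^{k}\sum_{i=1}^{n}(x_{ij} - \bar{x}_{..})^2$ | $nk - 1$ | | |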
ANOVA table ($n_1 = n_2 = n_3 = 5$)

| Source of variation | Sum of squares (SS) | d.f. | Mean square (MS) | F ratio |
|---|---|---|---|---|
| Treatment: differences between machines | 130 | 2 | 65 | $\frac{65}{7.83} = 8.3$ |
| Residual: differences within machines | 94 | 12 | 7.83 | |
| Total | 224 | 14 | | |
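ANOVA table (unequal sample sizes $n_1, n_2, \dots, n_k$)

| Source of variation | Sum of squares (SS) | d.f. | Mean square (MS) | F ratio |
|---|---|---|---|---|
| Treatment: differences between treatments | $SSTr = \sum_{j=1}^{k}n_j(\bar{x}_{.j} - \bar{x}_{..})^2$ | $k - 1$ | $MSTr = \frac{SSTr}{k - 1}$ | $F = \frac{MSTr}{MSE}$ |
| Residual: differences within treatments | $SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j})^2$ | $N - k$ | $MSE = \frac{SSE}{N - k}$ | |
| Total | $SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{..})^2$ | $N - 1$ | | |

where $N = n_1 + n_2 + \dots + n_k$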
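Computational formula

| Population sampled | 1 | 2 | 3 | ... | k | Total |
|---|---|---|---|---|---|---|
| | $x_{11}$ | $x_{12}$ | $x_{13}$ | ... | $x_{1k}$ | |
| | $x_{21}$ | $x_{22}$ | $x_{23}$ | ... | $x_{2k}$ | |
| | $x_{31}$ | $x_{32}$ | $x_{33}$ | ... | $x_{3k}$ | |
| | ... | ... | ... | ... | ... | |
| | $x_{n_1 1}$ | $x_{n_2 2}$ | $x_{n_3 3}$ | ... | $x_{n_k k}$ | |
| Total | $T_{.1}$ | $T_{.2}$ | $T_{.3}$ | ... | $T_{.k}$ | $T_{..}$ |
| Mean | $\bar{x}_{.1}$ | $\bar{x}_{.2}$ | $\bar{x}_{.3}$ | ... | $\bar{x}_{.k}$ | $\bar{x}_{..}$ |

$T_{.j} = \sum_{i=1}^{n_j}x_{ij}$, $\bar{x}_{.j} = \frac{T_{.j}}{n_j}$, $T_{..} = \sum_{j=1}^{k}T_{.j} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}x_{ij}$, $\bar{x}_{..} = \frac{T_{..}}{N}$
$SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{..})^2$
$SSTr = \sum_{j=1}^{k}n_j(\bar{x}_{.j} - \bar{x}_{..})^2$
$SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j})^2$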
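$SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j}x_{ij}^2 - \frac{T_{..}^2}{N}$
The term $\frac{T_{..}^2}{N}$ appears in both computational formulas. We call it C.
$SSTr = \frac{T_{.1}^2}{n_1} + \frac{T_{.2}^2}{n_2} + \dots + \frac{T_{.k}^2}{n_k} - C = \sum_{j=1}^{k}\frac{T_{.j}^2}{n_j} - C$
$SSE = SST - SSTr = \sum_{j=1}^{k}\sum_{i=1}^{n_j}x_{ij}^2 - \sum_{j=1}^{k}\frac{T_{.j}^2}{n_j}$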
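The computational formulas work from totals rather than deviations. The sketch below applies them to the three-machine data as a cross-check against the deviation-based sums of squares; the variable names (`totals`, `correction`) are illustrative.

```python
samples = [
    [47, 53, 49, 50, 46],   # machine 1
    [55, 54, 58, 61, 52],   # machine 2
    [54, 50, 51, 51, 49],   # machine 3
]

totals = [sum(s) for s in samples]          # column totals T.j
grand_total = sum(totals)                   # T..
n_total = sum(len(s) for s in samples)      # N

# Correction term C = T..^2 / N
correction = grand_total ** 2 / n_total

sum_of_squares = sum(x ** 2 for s in samples for x in s)

sst = sum_of_squares - correction
sstr = sum(t ** 2 / len(s) for t, s in zip(totals, samples)) - correction
sse = sst - sstr

print(f"C = {correction}, SST = {sst}, SSTr = {sstr}, SSE = {sse}")
```

The column totals are 245, 280 and 255, and the results agree with the sums of squares obtained from deviations about the means.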
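t tests or ANOVA

| Number of populations | t test | ANOVA |
|---|---|---|
| 2 ($\mu_1, \mu_2$) | $H_0: \mu_1 = \mu_2$ | $H_0: \mu_1 = \mu_2$ |
| 3 ($\mu_1, \mu_2, \mu_3$) | $H_0: \mu_1 = \mu_2$; $H_0: \mu_2 = \mu_3$; $H_0: \mu_1 = \mu_3$ | $H_0: \mu_1 = \mu_2 = \mu_3$ |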
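One common reason to prefer a single ANOVA over several t tests is that the number of pairwise comparisons, and with it the chance of at least one false rejection, grows with the number of populations. The sketch below is a rough illustration only: it assumes the pairwise tests are independent, which is not strictly true when they share samples, and the significance level of 0.05 is an assumed value rather than one from the text.

```python
from math import comb

alpha = 0.05   # assumed per-test significance level (illustrative)

for k in (2, 3, 4, 5):
    n_tests = comb(k, 2)                        # number of pairwise t tests
    # Chance of at least one Type I error, assuming independent tests
    familywise = 1 - (1 - alpha) ** n_tests
    print(f"k = {k}: {n_tests} t tests, familywise error about {familywise:.3f}")

pairs_for_three = comb(3, 2)
familywise_for_three = 1 - (1 - alpha) ** pairs_for_three
```

For 3 populations this gives 3 pairwise tests, matching the three hypotheses listed in the comparison above.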