8 Anova

ANOVA
ANALYSIS OF VARIANCE
Difference between two sample means
Suppose that two machines are to be compared. Because these machines are operated by people, and for other inexplicable reasons, output per hour is subject to chance fluctuation. In the hope of averaging out, and thus reducing, the effect of chance fluctuation, a random sample of 5 different hours is obtained from each machine and set out in the table below, where each sample mean is then calculated. Assume that the 2 populations are normally distributed. Are the machines really different?

| Machine 1 | Machine 2 |
|---|---|
| 55 | 47 |
| 54 | 53 |
| 58 | 49 |
| 61 | 50 |
| 52 | 46 |
| $\bar{x}_1 = 56$ | $\bar{x}_2 = 49$ |
Distribution of difference between two sample means
Characteristics of distribution
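Mean = $\mu_1 - \mu_2$
Variance = $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$, estimated by $\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$ when the population variances are unknown
Test statistic: z statistic or t statistic, with significance level $\alpha$ (the probability of committing a type I error)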

Equal population variance: a common variance ($\sigma_1^2 = \sigma_2^2 = \sigma^2$) exists.
$s_p^2$: pooled estimate of the common population variance $\sigma^2$
\[
s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}
\]
Difference between two sample means

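\[
s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{4(12.5)+4(7.5)}{5+5-2} = 10
\]
$H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$, and $H_1: \mu_1 \ne \mu_2$
\[
t = \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{s_p^2}{n_1}+\frac{s_p^2}{n_2}}} = \frac{(56-49)-0}{\sqrt{\frac{10}{5}+\frac{10}{5}}} = 3.5
\]
The significance level is $\alpha = 0.01$, and with $n_1+n_2-2 = 8$ d.f. the critical t = 2.896. Computed t = 3.5 > critical t = 2.896, so reject $H_0$. There is a difference between the two population means. (2.896 is the upper 1% point of the t distribution with 8 d.f.; the two-sided 1% critical value, 3.355, leads to the same conclusion.)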
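The pooled-variance calculation can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names (`machine_1`, `sp_sq`, `t_stat`) are illustrative and not part of the original slides.

```python
from math import sqrt
from statistics import mean, variance

# Hourly output samples for the two machines (from the table above)
machine_1 = [55, 54, 58, 61, 52]
machine_2 = [47, 53, 49, 50, 46]

n1, n2 = len(machine_1), len(machine_2)
s1_sq = variance(machine_1)  # sample variance, divisor n - 1
s2_sq = variance(machine_2)

# Pooled estimate of the common population variance
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# t statistic for H0: mu1 - mu2 = 0
t_stat = (mean(machine_1) - mean(machine_2) - 0) / sqrt(sp_sq / n1 + sp_sq / n2)

print(f"pooled variance = {sp_sq:.2f}, t = {t_stat:.2f}")
```

The script reproduces the pooled variance of 10 and the t statistic of 3.5 used in the test.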
Difference between three sample means

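| Machine 1 | Machine 2 | Machine 3 |
|---|---|---|
| 47 | 55 | 54 |
| 53 | 54 | 50 |
| 49 | 58 | 51 |
| 50 | 61 | 51 |
| 46 | 52 | 49 |
| $\bar{x}_1 = 49$ | $\bar{x}_2 = 56$ | $\bar{x}_3 = 51$ |

Grand mean: $\bar{\bar{x}} = 52$

| | Machine 1 | Machine 2 | Machine 3 | Sum |
|---|---|---|---|---|
| $\bar{x}_j - \bar{\bar{x}}$ | -3 | 4 | -1 | 0 |
| $(\bar{x}_j - \bar{\bar{x}})^2$ | 9 | 16 | 1 | 26 |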
Are the machines really different?

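| Sample 1 | Sample 2 | Sample 3 |
|---|---|---|
| 49 | 52 | 55 |
| 55 | 51 | 51 |
| 51 | 55 | 52 |
| 52 | 58 | 52 |
| 48 | 49 | 50 |
| $\bar{x}_1 = 51$ | $\bar{x}_2 = 53$ | $\bar{x}_3 = 52$ |

From one machine, 3 samples are taken. As expected, sampling fluctuations cause small differences in the sample means $\bar{x}$, even though the population means $\mu$ in this case are identical.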
Question
Are the differences in the 3 means from 3 different machines of the same order as the differences in the 3 means from 3 samples of one machine? If so, the differences are due to chance fluctuation. If not, the differences among the 3 means from 3 different machines are large enough to indicate a difference in the underlying population means.
$H_0: \mu_1 = \mu_2 = \mu_3$
Samples from three erratic machines

| Machine 1 | Machine 2 | Machine 3 |
|---|---|---|
| 50 | 48 | 57 |
| 42 | 57 | 59 |
| 53 | 65 | 48 |
| 45 | 59 | 46 |
| 55 | 51 | 45 |
| $\bar{x}_1 = 49$ | $\bar{x}_2 = 56$ | $\bar{x}_3 = 51$ |

Grand mean: $\bar{\bar{x}} = 52$
[Figure: samples from 3 machines. Panels: "Three different machines" and "Three erratic machines".]
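The two three-machine data sets share the same sample means (49, 56, 51) but differ in how widely each sample is spread. The sketch below makes that contrast concrete by computing each sample's mean and variance; the dictionary names `different` and `erratic` and the helper `summarize` are illustrative, not from the slides.

```python
from statistics import mean, variance

# Samples from the three different machines
different = {
    "Machine 1": [47, 53, 49, 50, 46],
    "Machine 2": [55, 54, 58, 61, 52],
    "Machine 3": [54, 50, 51, 51, 49],
}
# Samples from the three erratic machines
erratic = {
    "Machine 1": [50, 42, 53, 45, 55],
    "Machine 2": [48, 57, 65, 59, 51],
    "Machine 3": [57, 59, 48, 46, 45],
}

def summarize(samples):
    """Return {machine: (sample mean, sample variance)}."""
    return {name: (mean(x), variance(x)) for name, x in samples.items()}

for label, data in (("different", different), ("erratic", erratic)):
    for name, (m, v) in summarize(data).items():
        print(f"{label:9s} {name}: mean = {m}, variance = {v}")
```

The means match machine by machine, while every erratic sample has a much larger variance, so the same differences among means are less convincing for the erratic machines.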
ANOVA Analysis of Variance
ANOVA:
- Partition the total variability into smaller components
- Identify the source of each component
- Measure the extent of each source
- Conclude which source is the true cause of the total variability
2 sources
- Variability due to distinct population means (treatments)
- Variability due to all other sources (residual)
Content
- The completely randomized design: one-way ANOVA
- The randomized complete block design: two-way ANOVA
- The Latin square design
- The factorial experiment: 2 factors
- ANOVA models
Definitions
- Treatment: a treatment is any factor that the experimenter controls. It may refer, for example, to a type of drug, one of several concentrations of a single drug, a new type of house paint, an advertising technique, or a particular training program.
- Experimental unit: the entity that receives a treatment is called an experimental unit.
- Mean square: the term mean square is synonymous with variance.
Definitions of ANOVA
ANOVA: Analysis of variance is a technique whereby the total variation present in a set of data,
\[
\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{..})^2,
\]
is partitioned into several components. Associated with each of these components is a specific source of variation, so that, in the analysis, it is possible to ascertain the magnitude of the contribution of each of these sources to the total variation.
Definitions of ANOVA
- The completely randomized design: One-way ANOVA
- The randomized complete block design: Two-way ANOVA
- The Latin square design
The completely randomized design One-way ANOVA

- Assign the treatments at random to the experimental units
- 4 brands of tires (A, B, C, D): 4 treatments
- Randomly assign 10 tires of each brand to 20 cars: 20 experimental units
- Observe the number of km driven until tread wear occurs
- Analyze the variance to decide whether or not the brands differ with respect to expected tire km
- Define the variation due to brand differences, and the variation due to all sources other than brands
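The random assignment step can be sketched as follows. The slide does not say how 40 tires (10 per brand) are distributed over 20 cars, so this sketch assumes two tires per car; that layout, the fixed seed, and all names are assumptions made only for illustration.

```python
import random
from collections import Counter

brands = ["A", "B", "C", "D"]                    # the 4 treatments
tires = [b for b in brands for _ in range(10)]   # 10 tires of each brand, 40 in total

rng = random.Random(0)   # fixed seed only so the sketch is reproducible
rng.shuffle(tires)       # put the tires in random order

# Assumption: each of the 20 cars receives two tires from the shuffled list
cars = {car: tires[2 * car: 2 * car + 2] for car in range(20)}

# Sanity check: every brand is still used exactly 10 times
brand_counts = Counter(t for pair in cars.values() for t in pair)
print(brand_counts)
```

Shuffling the full list of tires before dealing them out means that chance alone decides which car receives which brand.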
ANOVA model

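\[
x_{ij} = \mu_j + e_{ij} \qquad (1)
\]
\[
\mu = \frac{1}{k}\sum_{j=1}^{k}\mu_j
\]
\[
\tau_j = \mu_j - \mu \quad\Rightarrow\quad \mu_j = \mu + \tau_j \qquad (2)
\]
Combining (1) and (2):
\[
x_{ij} = \mu + \tau_j + e_{ij}
\]
- $\mu$: grand mean
- $\tau_j$: treatment effect (Between)
- $e_{ij}$: error term, the deviation of $x_{ij}$ from $\mu_j$ (Within)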
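To see the model at work, each observation in the three-machine example can be split into an estimated grand mean, an estimated treatment effect, and a residual. This is a sketch that uses sample quantities in place of the unknown parameters; the names `effects` and `residuals` are illustrative.

```python
from statistics import mean

# Three-machine samples from the earlier example
samples = [
    [47, 53, 49, 50, 46],   # Machine 1
    [55, 54, 58, 61, 52],   # Machine 2
    [54, 50, 51, 51, 49],   # Machine 3
]

grand_mean = mean(x for s in samples for x in s)            # estimate of mu
effects = [mean(s) - grand_mean for s in samples]           # estimates of tau_j
residuals = [[x - mean(s) for x in s] for s in samples]     # estimates of e_ij

# Every observation is recovered as grand mean + effect + residual
rebuilt = [[grand_mean + t + e for e in res] for t, res in zip(effects, residuals)]

print(grand_mean, effects, sum(effects))
```

The estimated effects are -3, 4 and -1, and they sum to zero, mirroring the constraint placed on the treatment effects in the fixed-effects model.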
ANOVA assumptions
Fixed-effects model assumption

- The k sets of observed data constitute k independent random samples from the specified populations.
- Each of the populations represented by a sample is normally distributed with mean $\mu_j$ and variance $\sigma_j^2$.
- Each of the populations has the same variance. That is, $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2 = \sigma^2$.
- The $\tau_j$ are unknown constants, and $\sum_{j=1}^{k}\tau_j = 0$.
Violation of ANOVA assumption
Violating the following assumptions increases the probability of rejecting a true null hypothesis:
- Equal population variances
- Normally distributed populations
Test whether the assumptions are violated
- Variance test: Levene's test or boxplot
- Normality test: boxplot, histogram, normal Q-Q plot
Solution
- Make the sample sizes equal
- Refine the data set to eliminate outliers
ANOVA

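\[
\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{..})^2
= \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left[(x_{ij}-\bar{x}_{.j})+(\bar{x}_{.j}-\bar{x}_{..})\right]^2
\]
\[
= \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{.j})^2
+ 2\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{.j})(\bar{x}_{.j}-\bar{x}_{..})
+ \sum_{j=1}^{k}\sum_{i=1}^{n_j}(\bar{x}_{.j}-\bar{x}_{..})^2
\]
\[
= \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{.j})^2
+ 2\sum_{j=1}^{k}(\bar{x}_{.j}-\bar{x}_{..})\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{.j})
+ \sum_{j=1}^{k}n_j(\bar{x}_{.j}-\bar{x}_{..})^2
\]
The middle term is zero because $\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{.j}) = 0$ for every $j$, so
\[
\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{..})^2
= \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{.j})^2
+ \sum_{j=1}^{k}n_j(\bar{x}_{.j}-\bar{x}_{..})^2
\]
Total sum of squares (Variation) = SS for error (Within) + SS for treatment (Between)
\[
SST = SSE + SSTr
\]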
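The partition can be verified numerically on the three-machine data. The following is a minimal sketch in plain Python; the variable names (`sst`, `sse`, `sstr`, `cross`) are illustrative.

```python
# Three-machine samples from the earlier example
samples = [
    [47, 53, 49, 50, 46],   # Machine 1
    [55, 54, 58, 61, 52],   # Machine 2
    [54, 50, 51, 51, 49],   # Machine 3
]

all_x = [x for s in samples for x in s]
grand = sum(all_x) / len(all_x)           # grand mean
means = [sum(s) / len(s) for s in samples]  # sample means

# Total, within (error) and between (treatment) sums of squares
sst = sum((x - grand) ** 2 for x in all_x)
sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
sstr = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))

# Cross-product term, which should vanish
cross = 2 * sum((x - m) * (m - grand) for s, m in zip(samples, means) for x in s)

print(sst, sse, sstr, cross)
```

For these data the total of 224 splits into 94 within machines and 130 between machines, and the cross-product term is zero.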
Sum of squares for treatment
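\[
SSTr = \sum_{j=1}^{k}n_j(\bar{x}_{.j}-\bar{x}_{..})^2
\]
\[
MSTr = \frac{SSTr}{k-1} = \frac{\sum_{j=1}^{k}n_j(\bar{x}_{.j}-\bar{x}_{..})^2}{k-1}
\]
Is the total variability of the data explained by the underlying population means? Test the hypothesis
$H_0: \mu_1 = \mu_2 = \dots = \mu_k$ against $H_1$: not all $\mu_j$ are equal.
SSTr measures the proximity of the sample means to each other. If the sample means are remarkably different, this measure will be large.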
Sum of squares for treatment

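\[
SSTr = \sum_{j=1}^{3}n_j(\bar{x}_{.j}-\bar{x}_{..})^2 = 5 \times 26 = 130
\]
\[
MSTr = \frac{SSTr}{k-1} = \frac{130}{3-1} = 65
\]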
Sum of squares for error
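\[
SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{.j})^2
= \sum_{i=1}^{n_1}(x_{i1}-\bar{x}_{.1})^2 + \sum_{i=1}^{n_2}(x_{i2}-\bar{x}_{.2})^2 + \dots + \sum_{i=1}^{n_k}(x_{ik}-\bar{x}_{.k})^2
\]
Because $\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{.j})^2 = (n_j-1)s_j^2$,
\[
SSE = (n_1-1)s_1^2 + (n_2-1)s_2^2 + \dots + (n_k-1)s_k^2
\]
\[
MSE = \frac{SSE}{\sum_{j=1}^{k}(n_j-1)}
= \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2 + \dots + (n_k-1)s_k^2}{(n_1-1)+(n_2-1)+\dots+(n_k-1)} = s_p^2
\]
MSE is the pooled estimate of the common population variance.
When the total variability is explained mainly by SSE, this measure will be large.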
Sum of squares for error

\[
SSE = \sum_{j=1}^{3}\sum_{i=1}^{5}(x_{ij}-\bar{x}_{.j})^2 = 94
\]
\[
MSE = \frac{SSE}{N-k} = \frac{94}{12} = 7.83
\]
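The identity between SSE and the weighted sample variances can be checked directly. This sketch computes SSE as the sum of $(n_j-1)s_j^2$ and MSE as the pooled variance, using only the standard library; the names are illustrative.

```python
from statistics import variance

# Three-machine samples from the earlier example
samples = [
    [47, 53, 49, 50, 46],   # Machine 1
    [55, 54, 58, 61, 52],   # Machine 2
    [54, 50, 51, 51, 49],   # Machine 3
]

k = len(samples)                  # number of treatments
N = sum(len(s) for s in samples)  # total number of observations

# SSE as the sum of (n_j - 1) * s_j^2 over the k samples
sse = sum((len(s) - 1) * variance(s) for s in samples)

# MSE is the pooled estimate of the common variance, with N - k d.f.
mse = sse / (N - k)

print(f"SSE = {sse}, d.f. = {N - k}, MSE = {mse:.2f}")
```

The three sample variances are 7.5, 12.5 and 3.5, so SSE = 4(7.5) + 4(12.5) + 4(3.5) = 94 and MSE = 94/12, about 7.83.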
F distribution
Independent Chi squared variables
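\[
\frac{(n_1-1)s_1^2}{\sigma_1^2} \sim \chi^2_{n_1-1},
\qquad
\frac{(n_2-1)s_2^2}{\sigma_2^2} \sim \chi^2_{n_2-1}
\]
Divide each Chi-squared variable by its d.f.
\[
\frac{(n_1-1)s_1^2/\sigma_1^2}{n_1-1} = \frac{s_1^2}{\sigma_1^2},
\qquad
\frac{(n_2-1)s_2^2/\sigma_2^2}{n_2-1} = \frac{s_2^2}{\sigma_2^2}
\]
Ratio of those two is F distributed
\[
F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}
\]
with $n_1-1$ numerator d.f. and $n_2-1$ denominator d.f.
$H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 \ne \sigma_2^2$. Under $H_0$ the ratio reduces to $F = \frac{s_1^2}{s_2^2} > 0$.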
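As an illustration of the variance ratio, the two-machine samples from the opening example can be reused. The slides do not carry out this test, so the sketch below only computes the ratio and its degrees of freedom; treating these two samples as the inputs is an assumption made for illustration.

```python
from statistics import variance

# Two-machine samples from the opening example
machine_1 = [55, 54, 58, 61, 52]
machine_2 = [47, 53, 49, 50, 46]

s1_sq = variance(machine_1)   # sample variance of machine 1
s2_sq = variance(machine_2)   # sample variance of machine 2

# Under H0 (equal population variances) the F ratio is s1^2 / s2^2
f_ratio = s1_sq / s2_sq
df_num, df_den = len(machine_1) - 1, len(machine_2) - 1

print(f"F = {f_ratio:.3f} with d.f. ({df_num}, {df_den})")
```

Here the ratio is 12.5/7.5, about 1.67, on (4, 4) d.f.; it would be compared with a critical value from an F table at the chosen significance level.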
F distribution MSE and MSTr

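MSE is the estimate of the common variance ($\sigma^2$). MSTr also estimates $\sigma^2$ when $H_0$ is true; when $H_0$ is false it is inflated by the differences among the treatment means. Is the total variability mainly explained by the treatment, or by all sources other than the treatment?
\[
F = \frac{MSTr}{MSE}
\]
$H_0: \mu_1 = \mu_2 = \dots = \mu_k$ against $H_1$: not all $\mu_j$ are equal.
If $H_0$ is true, F will be close to 1. If $H_0$ is false, MSTr will be relatively large and F > 1. Therefore, the larger the computed F is compared with the critical F, the less credible is the null hypothesis.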
F distribution MSE and MSTr

\[
F = \frac{MSTr}{MSE} = \frac{65}{7.83} = 8.3
\]
With numerator d.f. of 2, denominator d.f. of 12, and $\alpha = 0.01$:
- Critical F = 6.93
- Computed F > critical F
- Reject $H_0$
- The three machines are really different
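ANOVA table ($n_1 = n_2 = \dots = n_k = n$)

| Source of variation | Sum of squares (SS) | d.f. | Mean square (MS) | F ratio |
|---|---|---|---|---|
| Treatment: differences between treatments | $SSTr = n\sum_{j=1}^{k}(\bar{x}_{.j}-\bar{x}_{..})^2$ | $k-1$ | $MSTr = \frac{SSTr}{k-1}$ | $F = \frac{MSTr}{MSE}$ |
| Residual: differences within treatments | $SSE = \sum_{j=1}^{k}\sum_{i=1}^{n}(x_{ij}-\bar{x}_{.j})^2$ | $nk-k$ | $MSE = \frac{SSE}{nk-k}$ | |
| Total | $SST = \sum_{j=1}^{k}\sum_{i=1}^{n}(x_{ij}-\bar{x}_{..})^2$ | $nk-1$ | | |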
ANOVA table ($n_1 = n_2 = n_3 = 5$)

| Source of variation | Sum of squares (SS) | d.f. | Mean square (MS) | F ratio |
|---|---|---|---|---|
| Treatment: differences between machines | 130 | 2 | 65 | 65/7.83 = 8.3 |
| Residual: differences within machines | 94 | 12 | 7.83 | |
| Total | 224 | 14 | | |
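The whole table can be generated from the raw data. The function below is a minimal sketch of a one-way ANOVA in plain Python; `one_way_anova` and the dictionary keys are illustrative names, and the critical value 6.93 is simply the figure quoted in the slides for d.f. (2, 12) at the 0.01 level.

```python
def one_way_anova(samples):
    """Return the entries of a one-way ANOVA table for a list of samples."""
    k = len(samples)
    N = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / N
    means = [sum(s) / len(s) for s in samples]

    sstr = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    mstr = sstr / (k - 1)
    mse = sse / (N - k)
    return {
        "SSTr": sstr, "df_treatment": k - 1, "MSTr": mstr,
        "SSE": sse, "df_error": N - k, "MSE": mse,
        "SST": sstr + sse, "df_total": N - 1,
        "F": mstr / mse,
    }

machines = [
    [47, 53, 49, 50, 46],   # Machine 1
    [55, 54, 58, 61, 52],   # Machine 2
    [54, 50, 51, 51, 49],   # Machine 3
]

table = one_way_anova(machines)
CRITICAL_F = 6.93   # quoted in the slides for d.f. (2, 12), alpha = 0.01
reject_h0 = table["F"] > CRITICAL_F

for key, value in table.items():
    print(f"{key:13s} {value:.2f}")
print("Reject H0:", reject_h0)
```

Because the function weights each squared deviation by the size of its own sample, the same sketch also covers samples of unequal size.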
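ANOVA table ($n_1 \ne n_2$, unequal sample sizes)

| Source of variation | Sum of squares (SS) | d.f. | Mean square (MS) | F ratio |
|---|---|---|---|---|
| Treatment: differences between treatments | $SSTr = \sum_{j=1}^{k}n_j(\bar{x}_{.j}-\bar{x}_{..})^2$ | $k-1$ | $MSTr = \frac{SSTr}{k-1}$ | $F = \frac{MSTr}{MSE}$ |
| Residual: differences within treatments | $SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{.j})^2$ | $N-k$ | $MSE = \frac{SSE}{N-k}$ | |
| Total | $SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{..})^2$ | $N-1$ | | |

$N = n_1 + n_2 + \dots + n_k$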
Computational formula
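| Population sampled | 1 | 2 | 3 | ... | k | Overall |
|---|---|---|---|---|---|---|
| | $x_{11}$ | $x_{12}$ | $x_{13}$ | ... | $x_{1k}$ | |
| | $x_{21}$ | $x_{22}$ | $x_{23}$ | ... | $x_{2k}$ | |
| | $x_{31}$ | $x_{32}$ | $x_{33}$ | ... | $x_{3k}$ | |
| | ... | ... | ... | ... | ... | |
| | $x_{n_1 1}$ | $x_{n_2 2}$ | $x_{n_3 3}$ | ... | $x_{n_k k}$ | |
| Total | $T_{.1}$ | $T_{.2}$ | $T_{.3}$ | ... | $T_{.k}$ | $T_{..}$ |
| Mean | $\bar{x}_{.1}$ | $\bar{x}_{.2}$ | $\bar{x}_{.3}$ | ... | $\bar{x}_{.k}$ | $\bar{x}_{..}$ |

\[
T_{.j} = \sum_{i=1}^{n_j}x_{ij}, \qquad
\bar{x}_{.j} = \frac{T_{.j}}{n_j}, \qquad
T_{..} = \sum_{j=1}^{k}T_{.j} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}x_{ij}, \qquad
\bar{x}_{..} = \frac{T_{..}}{N}
\]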
Computational formula
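\[
SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j}x_{ij}^2 - \frac{T_{..}^2}{N}
\]
We call the correction term $C$: $C = \frac{T_{..}^2}{N}$.
\[
SSTr = \sum_{j=1}^{k}\frac{T_{.j}^2}{n_j} - C = \frac{T_{.1}^2}{n_1} + \frac{T_{.2}^2}{n_2} + \dots + \frac{T_{.k}^2}{n_k} - C
\]
\[
SSE = SST - SSTr
\]
When all samples are equal in size ($n_j = n$):
\[
SSTr = \frac{\sum_{j=1}^{k}T_{.j}^2}{n} - C
\]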
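The computational formulas give the same sums of squares as the definitional ones, which can be confirmed on the three-machine data. This is a sketch with illustrative names (`T_j`, `T`, `C`).

```python
# Three-machine samples from the earlier example
samples = [
    [47, 53, 49, 50, 46],   # Machine 1
    [55, 54, 58, 61, 52],   # Machine 2
    [54, 50, 51, 51, 49],   # Machine 3
]

N = sum(len(s) for s in samples)   # total number of observations
T_j = [sum(s) for s in samples]    # column totals T.j
T = sum(T_j)                       # grand total T..
C = T ** 2 / N                     # correction term

# Computational formulas
sst = sum(x ** 2 for s in samples for x in s) - C
sstr = sum(t ** 2 / len(s) for t, s in zip(T_j, samples)) - C
sse = sst - sstr

print(T_j, T, C, sst, sstr, sse)
```

With column totals 245, 280 and 255, the grand total is 780 and C = 780²/15 = 40560, which gives SST = 224, SSTr = 130 and SSE = 94, as before.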
t tests or ANOVA
| Number of populations | t test | ANOVA |
|---|---|---|
| 2 ($\mu_1, \mu_2$) | $\mu_1 = \mu_2$ | $\mu_1 = \mu_2$ |
| 3 ($\mu_1, \mu_2, \mu_3$) | $\mu_1 = \mu_2$; $\mu_2 = \mu_3$; $\mu_1 = \mu_3$ | $\mu_1 = \mu_2 = \mu_3$ |
