6.3 - Anova

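CHAPTER 6
Statistical Inference & Hypothesis Testing
6.1 - One Sample

Mean $\mu$, Variance $\sigma^2$, Proportion $\pi$
6.2 - Two Samples

Means, Variances, Proportions
$\mu_1$ vs. $\mu_2$, $\sigma_1^2$ vs. $\sigma_2^2$, $\pi_1$ vs. $\pi_2$
6.3 - Multiple Samples

$\mu_1, \ldots, \mu_k$; $\sigma_1^2, \ldots, \sigma_k^2$; $\pi_1, \ldots, \pi_k$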
Example: Y = $ Cost of a certain medical service
Assume Y is known to be normally distributed at each of k = 2 health care facilities (groups).
Hospital: $Y_1 \sim N(\mu_1, \sigma_1)$   Clinic: $Y_2 \sim N(\mu_2, \sigma_2)$
Null Hypothesis $H_0$: $\mu_1 = \mu_2$,
i.e., $\mu_1 - \mu_2 = 0$
("No difference exists.")
2-sided test at significance level $\alpha = .05$
Data: Sample 1 = {667, 653, 614, 612, 604}; n1 = 5 Sample 2 = {593, 525, 520}; n2 = 3
Analysis via T-test (if equivariance holds): Point estimates $\bar{y} = \sum y_i / n$

Group Means
$\bar{y}_1 = \frac{667 + 653 + 614 + 612 + 604}{5} = 630$,   $\bar{y}_2 = \frac{593 + 525 + 520}{3} = 546$
NOTE: $\bar{y}_1 - \bar{y}_2 = 84 > 0$

Group Variances ($s^2 = SS/df$)
$s_1^2 = \frac{SS_1}{df_1} = \frac{(667 - 630)^2 + \cdots + (604 - 630)^2}{5 - 1} = 788.5$
$s_2^2 = \frac{SS_2}{df_2} = \frac{(593 - 546)^2 + \cdots + (520 - 546)^2}{3 - 1} = 1663$
$F = \frac{1663}{788.5} = 2.11 < 4$

Pooled Variance
$s_{pooled}^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2} = \frac{(5 - 1)(788.5) + (3 - 1)(1663)}{5 + 3 - 2} = 1080$
The pooled variance is a weighted average of the group variances, using the degrees of freedom as the weights.
In the pooled variance, the numerator is $SS_{Err} = 6480$ and the denominator is $df_{Err} = 6$.

Standard Error
$\text{s.e.}_0 = \sqrt{s_{pooled}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = \sqrt{1080 \left( \frac{1}{5} + \frac{1}{3} \right)} = 24$

p-value
$p = 2\, P(\bar{Y}_1 - \bar{Y}_2 \ge 84) = 2\, P\!\left( T_6 \ge \frac{84 - 0}{24} \right) = 2\, P(T_6 \ge 3.5)$
> 2 * (1 - pt(3.5, 6))
[1] 0.01282634
Reject $H_0$ at $\alpha = .05$: statistically significant, Hosp > Clinic.
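The hand calculation above can be reproduced step by step in R. This is a minimal sketch using the example data; the variable names (`s2_pooled`, `se`, `t_stat`, `p_value`) are illustrative choices, not part of the original slides.

```r
# Pooled two-sample t-test, computed by hand from the example data
y1 <- c(667, 653, 614, 612, 604)   # hospital costs
y2 <- c(593, 525, 520)             # clinic costs
n1 <- length(y1)
n2 <- length(y2)

# Pooled variance: weighted average of the group variances,
# with the degrees of freedom (n - 1) as weights
s2_pooled <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)

# Standard error of the difference in sample means
se <- sqrt(s2_pooled * (1 / n1 + 1 / n2))

# Test statistic under H0 (difference = 0) and two-sided p-value on 6 df
t_stat  <- (mean(y1) - mean(y2) - 0) / se
p_value <- 2 * (1 - pt(abs(t_stat), df = n1 + n2 - 2))

c(s2_pooled = s2_pooled, se = se, t = t_stat, p = p_value)
```

The four values should match the slide quantities 1080, 24, 3.5 and 0.01282634.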
R code:
> y1 = c(667, 653, 614, 612, 604)
> y2 = c(593, 525, 520)
>
> t.test(y1, y2, var.equal = T)
Two Sample t-test

data: y1 and y2
t = 3.5, df = 6, p-value = 0.01283
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
25.27412 142.72588
sample estimates:
mean of x mean of y
630 546

Formal Conclusion
p-value < $\alpha$ = .05
Reject $H_0$ at this level.

Interpretation
The samples provide evidence that the difference between mean costs is (moderately) statistically significant, at the 5% level, with the hospital being higher than the clinic (by an average of $84).
Alternate method ~
Analysis of Variance (ANOVA)

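Main Idea: Among several ($k \ge 2$) independent, equivariant, normally-distributed treatment groups:
Total Variability = Variability between groups + Variability within groups
[Figure: density curves for $Y_1, Y_2, \ldots, Y_k$, labeled with their means $\mu_1, \mu_2, \ldots, \mu_k$ and standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_k$]
Null Hypothesis $H_0$: $\mu_1 = \mu_2 = \cdots = \mu_k$
$H_A$: At least one treatment mean $\mu_i$ is significantly different from the others.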
ANOVA F-test (if equivariance holds): Point estimates $\bar{y} = \sum y_i / n$

Group Means
$\bar{y}_1 = \frac{667 + 653 + 614 + 612 + 604}{5} = 630$,   $\bar{y}_2 = \frac{593 + 525 + 520}{3} = 546$
NOTE: $\bar{y}_1 - \bar{y}_2 = 84 > 0$

Grand Mean
$\bar{y} = \frac{667 + 653 + 614 + 612 + 604 + 593 + 525 + 520}{5 + 3} = \frac{5(630) + 3(546)}{5 + 3} = 598.50$
The grand mean is a weighted average of the group means, using the sample sizes as the weights.
How far is the total sample from the grand mean?

$SS_{Tot} = (667 - 598.5)^2 + (653 - 598.5)^2 + (614 - 598.5)^2 + (612 - 598.5)^2 + (604 - 598.5)^2 + (593 - 598.5)^2 + (525 - 598.5)^2 + (520 - 598.5)^2 = 19710$
$df_{Tot} = (5 + 3) - 1 = 7$

How can we measure the variability between groups? Imagine zero variability within groups, i.e., replace each observation by its own group mean:
Data: Sample 1 = {630, 630, 630, 630, 630}; n1 = 5   Sample 2 = {546, 546, 546}; n2 = 3
$SS_{Trt} = 5\,(630 - 598.5)^2 + 3\,(546 - 598.5)^2 = 13230$
$df_{Trt} = (2) - 1 = 1$
How far is each sample from its own group mean?

$SS_{Err} = (667 - 630)^2 + (653 - 630)^2 + (614 - 630)^2 + (612 - 630)^2 + (604 - 630)^2 + (593 - 546)^2 + (525 - 546)^2 + (520 - 546)^2$

RECALL: $s^2 = SS/df$, so $SS_1 = (5 - 1)\, s_1^2 = 4(788.5)$ and $SS_2 = (3 - 1)\, s_2^2 = 2(1663)$. Therefore
$SS_{Err} = 4(788.5) + 2(1663) = 6480$
$df_{Err} = (5 + 3) - 2 = 6$

SSTot = SSTrt + SSErr dfTot = dfTrt + dfErr
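The decomposition can be checked numerically. The following is a small sketch in R on the example data; the names `ss_tot`, `ss_trt`, `ss_err` and `grand_mean` are illustrative and do not appear in the original slides.

```r
# Sum-of-squares decomposition for the two-group example
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
y  <- c(y1, y2)

# Grand mean: equals the sample-size-weighted average of the group means
grand_mean <- mean(y)

# Total: every observation against the grand mean
ss_tot <- sum((y - grand_mean)^2)

# Treatment (between): each group mean against the grand mean, weighted by n
ss_trt <- length(y1) * (mean(y1) - grand_mean)^2 +
          length(y2) * (mean(y2) - grand_mean)^2

# Error (within): every observation against its own group mean
ss_err <- sum((y1 - mean(y1))^2) + sum((y2 - mean(y2))^2)

c(SSTot = ss_tot, SSTrt = ss_trt, SSErr = ss_err)
all.equal(ss_tot, ss_trt + ss_err)   # checks SSTot = SSTrt + SSErr
```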
ANOVA Table

$MS = \frac{SS}{df}$,   $F = \frac{MS_{Trt}}{MS_{Err}}$

Source                df   SS      MS      F-ratio   p-value
Treatment (between)   1    13230   13230   12.25     .01282634
Error (within)        6    6480    1080
Total                 7    19710

Note: $MS_{Trt} = s_{between}^2$ and $MS_{Err} = s_{within}^2$; the latter is also $s_{pooled}^2$.

Compare the test of two variances, $H_0$: $\sigma_1^2 = \sigma_2^2$ vs. $H_A$: $\sigma_1^2 \ne \sigma_2^2$, where $s_1^2 = SS_1 / df_1$ and $s_2^2 = SS_2 / df_2$, with test statistic $F = s_1^2 / s_2^2$. Sampling distribution = ? Here the F-ratio is referred to $F_{1,6}$.

[Figure: $F_{1,6}$ density curve, marking the critical value 5.99 at $\alpha = .05$ and the observed F-ratio 12.25; the p-value is the area to the right of 12.25.]

p-value: 1 - pf(12.25, 1, 6) = .01282634, so p < .05 (on $F_{1,6}$).
Thus, the treatment accounts for $\frac{13230}{19710} = 67.1\%$ of the total variability in the response Y.
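The last columns of the table can be recomputed from the sums of squares alone. A minimal sketch in R, assuming the SS and df values from the table above; the names `ms_trt`, `ms_err`, `f_crit` and `share_explained` are illustrative only.

```r
# F-ratio, p-value, critical value and explained share from the ANOVA table
ss_trt <- 13230; df_trt <- 1
ss_err <- 6480;  df_err <- 6

ms_trt <- ss_trt / df_trt      # mean square between groups
ms_err <- ss_err / df_err      # mean square within groups (pooled variance)
f_stat <- ms_trt / ms_err      # observed F-ratio

# Upper-tail area of the F distribution with (1, 6) degrees of freedom
p_value <- 1 - pf(f_stat, df_trt, df_err)

# Critical value for a test at alpha = .05
f_crit <- qf(0.95, df_trt, df_err)

# Share of the total variability attributed to the treatment
share_explained <- ss_trt / (ss_trt + ss_err)

round(c(F = f_stat, p = p_value, F_crit = f_crit, share = share_explained), 4)
```

Because the observed F-ratio exceeds the critical value, the p-value falls below .05, in agreement with the table.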
R code:
# ANOVA FOR UNBALANCED DESIGN
> y1 = c(667, 653, 614, 612, 604)
> y2 = c(593, 525, 520)
>
> Data = data.frame(
+ Y = c(y1, y2),
+ X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
+ )
>
> var.test(Y ~ X, data = Data) # EQUIVARIANCE?
F test to compare two variances

data: Y by X
F = 0.4741, num df = 4, denom df = 2,
p-value = 0.4738

alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.01208057 5.04920249
sample estimates:
ratio of variances
0.4741431
R code:
# ANOVA FOR UNBALANCED DESIGN
> out = aov(Y ~ X, data = Data)

> anova(out)
Analysis of Variance Table
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
X 1 13230 13230 12.25 0.01283 *
Residuals 6 6480 1080
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note: Vis-à-vis T-test vs. F-test,

p-value is the same using either method (.01283), since the sample is unchanged!
The square of the $T_{df}$-score (3.5) is equal to the $F_{1,\,df}$-score (12.25).
(Recall that the square of the Z-score is equal to the $\chi^2_1$-score.)
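The equivalence noted here can be confirmed directly in R. This sketch rebuilds the example data frame and compares the two analyses; `t_out`, `f_out`, `t_stat` and `f_stat` are illustrative names.

```r
# Check that the pooled t-test and the one-way ANOVA agree for k = 2 groups
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
Data <- data.frame(
  Y = c(y1, y2),
  X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
)

t_out <- t.test(y1, y2, var.equal = TRUE)   # pooled two-sample t-test
f_out <- anova(aov(Y ~ X, data = Data))     # one-way ANOVA table

t_stat <- unname(t_out$statistic)           # t on 6 df
f_stat <- f_out[["F value"]][1]             # F on (1, 6) df

c(t_squared = t_stat^2, F = f_stat)                  # the two should agree
c(p_t = t_out$p.value, p_F = f_out[["Pr(>F)"]][1])   # so should the p-values
```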
Suppose this ANOVA overall F-test of $H_0$: $\mu_1 = \mu_2 = \cdots = \mu_k$ indicates that a significant difference exists between one (or more) of the treatment means, at $\alpha = .05$.
How can we find out which one(s)?
Idea: Test all possible pairwise comparisons, each via a two-sample t-test.
Example: Suppose there are k = 5 treatment groups.
(1, 2) (1, 3) (1, 4) (1, 5) (2, 3) (2, 4) (2, 5) (3, 4) (3, 5) (4, 5)
p = ... p = ... p = ... p = ... p = ... p = ... p = ... p = ... p = ... p = ...
There are $\binom{5}{2} = 10$ such comparisons. PROBLEM??? SPURIOUS SIGNIFICANCE!!! With ten tests, each comparing its p-value against $\alpha = .05$, the chance of at least one false rejection is well above .05.
BONFERRONI CORRECTION: Make each comparison at level $\alpha^* = \alpha / 10 = .05 / 10$.
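The arithmetic behind the correction can be sketched in R. The ten p-values in `p_raw` below are hypothetical, invented purely for illustration; they do not come from the slides. The error-rate calculation additionally assumes the ten tests are independent, which pairwise comparisons on shared groups are not, so it is only a rough guide.

```r
# Bonferroni correction for all pairwise comparisons among k = 5 groups
k     <- 5
alpha <- 0.05
m     <- choose(k, 2)        # number of pairwise comparisons
alpha_star <- alpha / m      # per-comparison significance level

# Chance of at least one false rejection if all H0 are true and the tests
# were independent (a simplifying assumption)
fwer_uncorrected <- 1 - (1 - alpha)^m
fwer_bonferroni  <- 1 - (1 - alpha_star)^m

# Hypothetical p-values for the 10 comparisons (illustration only)
p_raw <- c(0.001, 0.004, 0.012, 0.03, 0.04, 0.2, 0.35, 0.5, 0.7, 0.9)

# Two equivalent ways to apply the correction
reject_by_level <- p_raw < alpha_star
reject_by_padj  <- p.adjust(p_raw, method = "bonferroni") < alpha

c(m = m, alpha_star = alpha_star,
  fwer_uncorrected = fwer_uncorrected, fwer_bonferroni = fwer_bonferroni)
identical(reject_by_level, reject_by_padj)
```

When real group data are available, the base R function `pairwise.t.test` with `p.adjust.method = "bonferroni"` performs the comparisons and the adjustment in one call.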
Alternate method ~ Analysis of Variance (ANOVA)
MODEL ASSUMPTIONS?

Equivariance can be tested via the very similar "two variances" F-test in 6.2.2 (but this is very sensitive to the normality assumption), or others. If violated, can extend the Welch Test for two means.

Normality can be tested via the usual methods. If violated, use the nonparametric Kruskal-Wallis Test.

Extensions of ANOVA exist for data in matched "blocks" designs, repeated measures, multiple factor levels within groups, etc.

How to identify significant group(s)? Pairwise testing, with correction (e.g., Bonferroni) for spurious significance.
Example: k = 5 groups result in 10 such tests, so let each $\alpha^* = \alpha / 10$.
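The alternatives named above are available in base R. The sketch below applies them to the two-group cost example; the object names `welch`, `kw` and `sw` are illustrative, and using the Shapiro-Wilk test on the residuals is only one possible reading of "usual methods" for checking normality.

```r
# Checks and alternatives when ANOVA assumptions are in doubt
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
Data <- data.frame(
  Y = c(y1, y2),
  X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
)

# Welch-type one-way test: does not assume equal group variances
welch <- oneway.test(Y ~ X, data = Data, var.equal = FALSE)

# Kruskal-Wallis test: rank-based, does not assume normality
kw <- kruskal.test(Y ~ X, data = Data)

# One possible normality check: Shapiro-Wilk on the ANOVA residuals
sw <- shapiro.test(residuals(aov(Y ~ X, data = Data)))

c(welch = unname(welch$p.value), kruskal = kw$p.value, shapiro = sw$p.value)
```

With only eight observations these tests have little power, so their p-values should be read with caution.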
