INTRODUCTION
Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or
more population (or treatment) means by examining the variances of samples that are taken.
ANOVA allows one to determine whether the differences between the samples are simply due to
random error (sampling errors) or whether there are systematic treatment effects that cause the mean
in one group to differ from the mean in another.
ANOVA is most often used to compare three or more means. When the means of two independent
samples are compared using ANOVA, the result is equivalent to using a t-test to compare the
means of independent samples.
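The equivalence with the t-test can be checked numerically. The sketch below is only an illustration: the two samples are hypothetical, and the helper names `pooled_t` and `one_way_f` are assumptions introduced here. It computes the pooled-variance t statistic and the one-way F ratio for the same two samples; for two groups the F ratio should equal t squared.

```python
from math import sqrt

def pooled_t(a, b):
    """Two-sample t statistic using the pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / sqrt(pooled_var * (1 / na + 1 / nb))

def one_way_f(groups):
    """F ratio: between-samples mean square over within-samples mean square."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical samples, chosen only for illustration
sample_a = [5.1, 4.8, 6.0, 5.5]
sample_b = [6.2, 6.8, 5.9, 7.1, 6.5]

t_stat = pooled_t(sample_a, sample_b)
f_ratio = one_way_f([sample_a, sample_b])
print(round(t_stat ** 2, 6), round(f_ratio, 6))  # the two numbers should agree
```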
ANOVA is based on comparing the variation between the data samples to the variation within
each particular sample. If the between variation is much larger than the within variation, this is
evidence that the population means are not all equal. If the between and within variations are
approximately the same size, there is no evidence of a significant difference between the sample means.
Definition 1
The response variable is the variable of interest to be measured in the experiment. We also refer to
the response as the dependent variable.
Definition 2
Factors are those variables whose effect on the response is of interest to the experimenter.
Quantitative factors are measured on a numerical scale, whereas qualitative factors are not
(naturally) measured on a numerical scale.
Definition 3
Factor levels are the values of the factor utilized in the experiment.
Definition 4
The treatments of an experiment are the factor-level combinations utilized.
Definition 5
An experimental unit is the object on which the response and factors are observed or measured.
Definition 6
A designed experiment is an experiment in which the analyst controls the specification of the
treatments and the method of assigning the experimental units to each treatment. An observational
experiment is an experiment in which the analyst simply observes the treatments and the response on
a sample of experimental units.
Definition 7
The completely randomized design is a design in which treatments are randomly assigned to the
experimental units or in which independent random samples of experimental units are selected for
each treatment.
The test procedure compares the variation in observations between samples to the variation within
samples. The completely randomized design is the simplest design: the treatments are assigned to the
experimental units completely at random. Every experimental unit (a plot, animal, soil sample, etc.)
therefore has an equal probability of receiving any given treatment.
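The random assignment step can be sketched in a few lines. This is a minimal illustration that assumes equal replication; the function name `assign_completely_at_random`, the unit labels and the treatment labels are all hypothetical.

```python
import random

def assign_completely_at_random(units, treatments, seed=None):
    """Shuffle the units, then deal them out in equal-sized groups, one per treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("this sketch assumes equal replication")
    rng = random.Random(seed)          # a seed only makes the plan reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    reps = len(shuffled) // len(treatments)
    return {
        treatment: shuffled[i * reps:(i + 1) * reps]
        for i, treatment in enumerate(treatments)
    }

units = [f"unit{i}" for i in range(1, 13)]   # 12 hypothetical experimental units
plan = assign_completely_at_random(units, ["A", "B", "C"], seed=1)
for treatment, assigned in plan.items():
    print(treatment, assigned)
```

Because the whole list is shuffled before it is split, every unit has the same chance of ending up under any treatment.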
Suppose we wish to compare k population means (k ≥ 2). This situation can arise in two ways. If the
study is observational, we are obtaining independently drawn samples from k distinct populations and
we wish to compare the population means for some numerical response of interest. If the study is
experimental, then we are using a completely randomized design to obtain our data from k distinct
treatment groups. In a completely randomized design the experimental units are randomly assigned to
one of k treatments and the response value from each unit is obtained. The mean of the numerical
response of interest is then compared across the different treatment groups.
Advantages of the completely randomized design:
1. Complete flexibility is allowed - any number of treatments and replicates may be used.
2. Relatively easy statistical analysis, even with variable replicates and variable experimental errors for
different treatments.
3. Analysis remains simple when data are missing.
4. Provides the maximum number of degrees of freedom for error for a given number of experimental
units and treatments.
Disadvantages of the completely randomized design:
1. Relatively low accuracy, because the lack of restrictions allows environmental variation to enter
the experimental error.
2. Not suited to large numbers of treatments, because a relatively large amount of experimental
material is needed, which increases the variation.
Assumptions:
1. Samples are drawn independently (completely randomized design)
2. Population variances are equal, i.e. σ1² = σ2² = ... = σk².
3. Populations are normally distributed.
Notations:
Total mean square: MST = SST/(n-1)
Between samples mean square: MSB = SSB/(k-1)
Within samples mean square: MSW = SSW/(n-k)
Number of d.o.f. = (k-1) + (n-k) = n-1.
ANOVA TABLE
Source of variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio
Between samples       SSB              k-1                  MSB           MSB/MSW
Within samples        SSW              n-k                  MSW
Total                 SST              n-1
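For reference, the sums of squares in the table can be written out as follows. This is a sketch in standard one-way notation rather than a reproduction of the original notes: the symbols x_{ij} (j-th observation in sample i), n_i (size of sample i), \bar{x}_i (mean of sample i), \bar{x} (grand mean), T_i (total of sample i) and T (grand total) are assumptions introduced here.

```latex
\begin{aligned}
SST &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(x_{ij}-\bar{x}\right)^2
     = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \frac{T^2}{n},\\
SSB &= \sum_{i=1}^{k} n_i\left(\bar{x}_i-\bar{x}\right)^2
     = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{T^2}{n},\\
SSW &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(x_{ij}-\bar{x}_i\right)^2
     = SST - SSB.
\end{aligned}
```

The right-hand forms are the computational shortcuts, and the identity SST = SSB + SSW is what makes the degrees of freedom add up to n-1.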
Working Method
PROBLEM 1
Neuroscience researchers examined the impact of environment on rat development. Rats were
randomly assigned to be raised in one of the following four test conditions: impoverished (wire mesh
cage, housed alone), standard (cage with other rats), enriched (cage with other rats and toys), or super
enriched (cage with rats and toys changed on a periodic basis). After two months, the rats were tested
on a variety of learning measures (including the number of trials to learn a maze to a criterion of three
perfect trials) and several neurological measures (overall cortical weight, degree of dendritic branching, etc.).
The data for the maze task are given below. Compute the appropriate test for the data provided.
Impoverished Standard Enriched Super Enriched
22 17 12 8
19 21 14 7
15 15 11 10
24 12 9 9
18 19 15 12
Solution:
Source of variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio
Between samples       SSB = 323.35     3                    107.7833      MSB/MSW = 12.71
Within samples        SSW = 135.6      16                   8.475
Total                 SST = 458.95     19
Null Hypothesis H0: μ1 = μ2 = μ3 = μ4
Alternative Hypothesis H1: At least two means differ
Test Statistic: Fc = 12.71
Table Value F0.05,(3,16) = 3.24
Conclusion: Fc > F0.05,(3,16), Reject Null Hypothesis
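The entries of the table can be reproduced with a short script. The following is a sketch that uses the computational (totals-based) form of the sums of squares; the function name `one_way_anova` and the dictionary keys are assumptions made for illustration.

```python
def one_way_anova(groups):
    """One-way ANOVA from raw totals: returns sums of squares, d.o.f. and F."""
    n = sum(len(g) for g in groups)           # total number of observations
    k = len(groups)                           # number of samples (treatments)
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / n         # T^2 / n
    sst = sum(x * x for g in groups for x in g) - correction
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ssw = sst - ssb
    msb, msw = ssb / (k - 1), ssw / (n - k)
    return {"SSB": ssb, "SSW": ssw, "SST": sst,
            "df": (k - 1, n - k), "MSB": msb, "MSW": msw, "F": msb / msw}

# Maze-task data from Problem 1, one list per rearing condition
maze = [
    [22, 19, 15, 24, 18],   # impoverished
    [17, 21, 15, 12, 19],   # standard
    [12, 14, 11, 9, 15],    # enriched
    [8, 7, 10, 9, 12],      # super enriched
]
table = one_way_anova(maze)
for name, value in table.items():
    print(name, value)
```

The computed F ratio is then compared with the tabulated critical value for the degrees of freedom in `table["df"]`.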
PROBLEM 2
A research study was conducted to examine the clinical efficacy of a new antidepressant. Depressed
patients were randomly assigned to one of three groups: a placebo group, a group that received a low
dose of the drug, and a group that received a moderate dose of the drug. After four weeks of
treatment, the patients completed the Beck Depression Inventory. The higher the score, the more
depressed the patient. The data are presented below. Compute the appropriate test.
Placebo Low Dose Moderate Dose
38 22 14
47 19 26
39 8 11
25 23 18
42 31 5
Solution:
Source of variation   Sum of Squares     Degrees of Freedom   Mean Square   F Ratio
Between samples       SSB = 1484.9333    2                    742.46666     MSB/MSW = 11.26
Within samples        SSW = 790.8        12                   65.9
Total                 SST = 2275.73333   14
Null Hypothesis H0: μ1 = μ2 = μ3
Alternative Hypothesis H1: At least two means differ
Test Statistic: Fc = 11.26
Table Value F0.05,(2,12) = 3.89
Conclusion: Fc > F0.05,(2,12), Reject Null Hypothesis
PROBLEM 3
A manufacturer of television sets is interested in the effect on tube conductivity of four different types
of coating for color picture tubes. The following conductivity data are obtained.
Coating Type Conductivity
1 143 141 150 146
2 152 149 137 143
3 134 136 132 127
4 129 127 132 129
Test the null hypothesis H0: μ1 = μ2 = μ3 = μ4 against the alternative that at least two of the
means differ. Use α = 0.05.
Solution:
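A minimal sketch of how this test could be carried out on the coating data is given here. It computes the sums of squares directly as squared deviations about the means rather than from totals; the function name `anova_by_deviations` and its return layout are assumptions made for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def anova_by_deviations(groups):
    """One-way ANOVA computed from squared deviations about the means."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between samples: each group mean against the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within samples: each observation against its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio

# Conductivity readings from Problem 3, one list per coating type
coatings = [
    [143, 141, 150, 146],
    [152, 149, 137, 143],
    [134, 136, 132, 127],
    [129, 127, 132, 129],
]
ss_between, ss_within, df_between, df_within, f_ratio = anova_by_deviations(coatings)
print(ss_between, ss_within, (df_between, df_within), round(f_ratio, 2))
```

With these data the F ratio works out to roughly 14.30 on (3, 12) degrees of freedom, which would be compared with the tabulated F0.05,(3,12) = 3.49.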
PROBLEM 4
A manufacturer suspects that the batches of raw material furnished by her supplier differ significantly
in calcium content. There is a large number of batches currently in the warehouse. Five of these are
randomly selected for study. A chemist makes five determinations on each batch and obtains the
following data.
Solution:
H0: μ1 = μ2 = μ3 = μ4 = μ5
H1: At least two means differ
Test Statistic: Fc = 5.54
Table Value F0.05,(4,20) = 2.87
Conclusion: Fc > F0.05,(4,20), Reject Null Hypothesis.
PROBLEM 5
Four laboratories measure the tin coating weight of 12 disks, and the results are as follows.
Lab A 0.25 0.27 0.22 0.30 0.27 0.28 0.32 0.24 0.31 0.26 0.21 0.28
Lab B 0.18 0.28 0.21 0.23 0.25 0.20 0.27 0.19 0.24 0.22 0.29 0.16
Lab C 0.19 0.25 0.27 0.24 0.18 0.26 0.28 0.24 0.25 0.20 0.21 0.19
Lab D 0.23 0.30 0.28 0.28 0.24 0.34 0.20 0.18 0.24 0.28 0.22 0.21
Construct an ANOVA table and test, at the 5% level of significance, whether the differences among
the four sample means can be attributed to chance.
Solution:
H0: μ1 = μ2 = μ3 = μ4
H1: At least two means differ
Test Statistic: Fc = 2.87
Table Value F0.05,(3,44)= 2.82
Conclusion: Fc > F0.05,(3,44) , Reject Null Hypothesis.
PROBLEM 6
A production manager wishes to test the effect of 5 similar milling machines on the surface finish of
small castings. He selected 5 such machines and conducted the experiment with four replications
under each machine as per a 'Completely Randomized Design' and obtained the following readings.
              Machines
Replication   M1   M2   M3   M4   M5
1             25   10   40   27   15
2             30   20   30   20    8
3             16   33   49   35   45
4             36   42   22   48   34
Solution:
H0: μ1 = μ2 = μ3 = μ4 = μ5
H1: At least two means differ
Test Statistic: Fc = 0.4502
Table Value F0.05,(4,15) = 3.06
Conclusion: Fc < F0.05,(4,15), Accept Null Hypothesis.
There is no significant difference between machines in terms of surface finish of small castings.
Two-way (or multi-way) ANOVA is an appropriate analysis method for a study with a quantitative
outcome and two (or more) categorical explanatory variables. It extends the one-factor
situation to take account of a second factor. The second factor is often called a Blocking Factor because
it places subjects or units into homogeneous groups called Blocks, and the design itself is then called a
Randomized Block Design. The usual assumptions of Normality, equal variance, and independent errors apply.
If an experiment has a quantitative outcome and two categorical explanatory variables that are defined in
such a way that each experimental unit (subject) can be exposed to any combination of one level of
one explanatory variable and one level of the other explanatory variable, then the most common
analysis method is two-way ANOVA. Because there are two explanatory variables, the effect
on the outcome of a change in one variable may either not depend on the level of the other variable
(additive model) or depend on the level of the other variable (interaction model).
Assumptions
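Computational Formulae
Here r is the number of rows, c is the number of columns, $T_{Ri}$ is the total of row i, $T_{Cj}$ is the
total of column j, and T is the grand total.
Total Sum of Squares: $SST = \sum_i \sum_j x_{ij}^2 - \frac{T^2}{rc}$
Between Rows Sum of Squares: $SSR = \sum_i \frac{T_{Ri}^2}{c} - \frac{T^2}{rc}$
Between Columns Sum of Squares: $SSC = \sum_j \frac{T_{Cj}^2}{r} - \frac{T^2}{rc}$
Error (residual) Sum of Squares: SSE = SST - SSR - SSC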
ANOVA TABLE
PROBLEM 1
Four laboratories, A, B, C, and D, are used by food manufacturing companies for making nutrition
analyses of their products. The following data are the fat contents (in grams) of the same weight of
three similar types of peanut butter.
                Laboratory
Peanut Butter   A      B      C      D
Brand 1         16.6   17.7   16.0   16.3
Brand 2         16.0   15.5   15.6   15.9
Brand 3         16.4   16.3   15.9   16.2
Analyse the data at 5% significance by (a) carrying out a one-way ANOVA to see if there is a difference
between the fat content of the three brands; (b) performing a two-way ANOVA to see if there is any
difference between the Brands using the laboratories as blocks. (c) Do you think there is any evidence
that the results were not reasonably consistent between the four laboratories?
a) One-way ANOVA
Laboratory
Peanut Butter A B C D Mean
Brand 1 16.6 17.7 16.0 16.3 16.65
Brand 2 16.0 15.5 15.6 15.9 15.75
Brand 3 16.4 16.3 15.9 16.2 16.20
Mean 16.33 16.50 15.83 16.13 16.20
Sums of squares
Total SS: Inputting all the individual values into the calculator gives the following summary statistics:
n = 12, x̄ = 16.20, sn = 0.546, so n·sn² = 3.58
Between Brands SS: The mean scores are x̄1 = 16.65, x̄2 = 15.75 and x̄3 = 16.20
Each of these means came from 4 values, so inputting the means with a frequency of 4 gives:
n = 12, x̄ = 16.20, sn = 0.367, so n·sn² = 1.62 (n and x̄ for checking)
Conclusion: T.S. < C.V. so H0 not rejected. There is no significant difference between the fat content
of the brands.
b) Two-way ANOVA
Sums of squares
From (a): Total SS: n·sn² = 3.58; Between Brands SS: n·sn² = 1.62
Between Labs Sum of Squares: Mean scores x̄A = 16.33, x̄B = 16.50, x̄C = 15.83, x̄D = 16.13
Each of these means came from 3 values, so inputting the means with a frequency of 3 gives:
n = 12, x̄ = 16.20, sn = 0.249, so n·sn² = 0.75 (n and x̄ for checking)
Critical value: F0.05 (2,6) = 5.14 (Deg. of free. from 'between brands' and 'errors'.)
Conclusion: T.S. < C.V. so H0 not rejected. There is no significant difference between the fat content
of the brands. Blocking has not changed the conclusion even though the test statistic has increased.
Critical value: F0.05 (3,6) = 4.76 (Deg. of free. from 'between labs' and 'errors'.)
Test Statistic: 1.25
Conclusion: T.S. < C.V. so H0 not rejected. The results between the different laboratories are
consistent.
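The sums of squares for parts (a) to (c) can be cross-checked with a short script. This is a sketch based on the computational formulae for a two-way layout with one observation per cell; the function name `two_way_anova` and the dictionary keys are assumptions made for illustration. Because it works from unrounded totals, the between-labs sum of squares comes out at about 0.74 rather than the 0.75 obtained from rounded means, so the laboratory F ratio is about 1.21.

```python
def two_way_anova(table):
    """table[i][j]: row i (brand), column j (laboratory), one value per cell."""
    r, c = len(table), len(table[0])
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / (r * c)                        # T^2 / rc
    sst = sum(x * x for row in table for x in row) - correction
    ssr = sum(sum(row) ** 2 for row in table) / c - correction     # between rows
    ssc = sum(sum(col) ** 2 for col in zip(*table)) / r - correction  # between columns
    sse = sst - ssr - ssc
    mse = sse / ((r - 1) * (c - 1))
    return {"SST": sst, "SSR": ssr, "SSC": ssc, "SSE": sse,
            "F_rows": (ssr / (r - 1)) / mse,
            "F_cols": (ssc / (c - 1)) / mse}

# Fat content by brand (rows) and laboratory (columns A to D)
fat = [
    [16.6, 17.7, 16.0, 16.3],
    [16.0, 15.5, 15.6, 15.9],
    [16.4, 16.3, 15.9, 16.2],
]
result = two_way_anova(fat)

# Part (a): the one-way analysis pools the laboratory variation into the error term,
# so its error sum of squares is SST - SSR on 12 - 3 = 9 degrees of freedom.
one_way_f = (result["SSR"] / 2) / ((result["SST"] - result["SSR"]) / 9)

print({name: round(value, 3) for name, value in result.items()})
print("one-way F for brands:", round(one_way_f, 3))
```

Comparing `one_way_f` with `result["F_rows"]` shows how removing the block (laboratory) variation from the error term raises the test statistic for brands.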
PROBLEM 2
The following data represent the number of units of production per day turned out by 5 different workers
using 4 different types of machines
         MACHINE TYPE
WORKER   A    B    C    D
1        44   38   47   36
2        46   40   52   43
3        34   36   44   32
4        43   38   46   33
5        38   42   49   39
a) Test whether the five men differ with respect to mean productivity.
b) Test whether the mean productivity is the same for the four different machine types. Take α = 5%.
Solution
We shift the origin to 40, i.e. subtract 40 from each given value, and work with the new values of xij.
Here Ti is the total for worker i, Tj is the total for machine type j, and c = 4 is the number of machine types.
         MACHINE TYPE
WORKER   A    B    C    D     Ti     Ti²/c     Σj xij²
1         4   -2    7   -4     5      6.25      85
2         6    0   12    3    21    110.25     189
3        -6   -4    4   -8   -14     49.00     132
4         3   -2    6   -7     0      0.00      98
5        -2    2    9   -1     8     16.00      90
Tj        5   -6   38  -17   T = 20   Σ Ti²/c = 181.5   ΣΣ xij² = 594
F > F0.05,(4, 12) with respect to rows, hence 5 workers differ significantly.
F > F0.05,(3, 12) with respect to columns, hence 4 machine types also differ significantly in mean
productivity.
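Shifting the origin is legitimate because the sums of squares depend only on deviations. The sketch below checks this on the production data by computing the sums of squares from both the raw and the coded values; the function name `sums_of_squares` is an assumption made for illustration, and the critical values quoted in the closing note are standard table values, not figures taken from the notes.

```python
def sums_of_squares(table):
    """Return (SST, SSR, SSC, SSE) for a two-way layout with one value per cell."""
    r, c = len(table), len(table[0])
    grand_total = sum(map(sum, table))
    correction = grand_total ** 2 / (r * c)
    sst = sum(x * x for row in table for x in row) - correction
    ssr = sum(sum(row) ** 2 for row in table) / c - correction
    ssc = sum(sum(col) ** 2 for col in zip(*table)) / r - correction
    return sst, ssr, ssc, sst - ssr - ssc

# Units produced per day: rows are workers 1 to 5, columns are machine types A to D
output = [
    [44, 38, 47, 36],
    [46, 40, 52, 43],
    [34, 36, 44, 32],
    [43, 38, 46, 33],
    [38, 42, 49, 39],
]
coded = [[x - 40 for x in row] for row in output]   # origin shifted to 40

raw_ss = sums_of_squares(output)
coded_ss = sums_of_squares(coded)
print(raw_ss)
print(coded_ss)   # same values up to floating-point rounding

sst, ssr, ssc, sse = coded_ss
f_workers = (ssr / 4) / (sse / 12)    # rows: 5 - 1 = 4 d.o.f., error: 4 x 3 = 12
f_machines = (ssc / 3) / (sse / 12)   # columns: 4 - 1 = 3 d.o.f.
print(round(f_workers, 2), round(f_machines, 2))
```

The two F ratios would be compared with the tabulated values F0.05,(4,12) = 3.26 and F0.05,(3,12) = 3.49.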
LATIN SQUARE DESIGN
An n x n Latin square is a square array of n distinct letters, with each letter appearing once and only
once in each row and in each column.
Example:
A B C D
B C D A
C D A B
D A B C
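The example above is a cyclic square: each row is the previous row shifted one place to the left. A minimal sketch of how such a square could be generated and checked follows; the function names `cyclic_latin_square` and `is_latin_square` are assumptions made for illustration, and a cyclic shift is only one of many ways to build a Latin square.

```python
def cyclic_latin_square(letters):
    """Build an n x n square in which row i is the letters shifted left by i places."""
    n = len(letters)
    return [[letters[(i + j) % n] for j in range(n)] for i in range(n)]

def is_latin_square(square):
    """True if every row and every column contains each of the n symbols exactly once."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(symbols) == n and rows_ok and cols_ok

square = cyclic_latin_square("ABCD")
for row in square:
    print(" ".join(row))
print(is_latin_square(square))
```

In an experiment the rows, columns and letters of such a square would usually be randomized before use.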
NOTATIONS:
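Computational Formulae
Here $T_{Ri}$ is the total of row i, $T_{Cj}$ is the total of column j, $T_k$ is the total for treatment k,
and T is the grand total of the n² observations.
Total Sum of Squares: $SST = \sum_i \sum_j x_{ij}^2 - \frac{T^2}{n^2}$
Between Rows Sum of Squares: $SSR = \sum_i \frac{T_{Ri}^2}{n} - \frac{T^2}{n^2}$
Between Columns Sum of Squares: $SSC = \sum_j \frac{T_{Cj}^2}{n} - \frac{T^2}{n^2}$
Between Treatments Sum of Squares: $SSTk = \sum_k \frac{T_k^2}{n} - \frac{T^2}{n^2}$
Error (residual) Sum of Squares: SSE = SST - SSR - SSC - SSTk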
ANOVA TABLE
Source of variation   Sum of Squares   Degrees of Freedom   Mean Square              F Ratio
Between rows          SSR              n-1                  MSR = SSR/(n-1)          MSR/MSE
Between columns       SSC              n-1                  MSC = SSC/(n-1)          MSC/MSE
Between treatments    SSTk             n-1                  MSTk = SSTk/(n-1)        MSTk/MSE
Error (residual)      SSE              (n-1)(n-2)           MSE = SSE/((n-1)(n-2))
Total                 SST              n²-1
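The Latin square table can be filled in programmatically. The following is a sketch only: the response values are hypothetical numbers invented for illustration, the layout is the 4 x 4 example square, and the function name `latin_square_anova` is an assumption introduced here.

```python
def latin_square_anova(layout, data):
    """layout[i][j] is the treatment letter in row i, column j; data[i][j] is its response."""
    n = len(data)
    grand_total = sum(map(sum, data))
    correction = grand_total ** 2 / n ** 2                     # T^2 / n^2
    sst = sum(x * x for row in data for x in row) - correction
    ssr = sum(sum(row) ** 2 for row in data) / n - correction
    ssc = sum(sum(col) ** 2 for col in zip(*data)) / n - correction
    # Accumulate the total response for each treatment letter
    treatment_totals = {}
    for i in range(n):
        for j in range(n):
            letter = layout[i][j]
            treatment_totals[letter] = treatment_totals.get(letter, 0) + data[i][j]
    sstk = sum(t ** 2 for t in treatment_totals.values()) / n - correction
    sse = sst - ssr - ssc - sstk
    df_error = (n - 1) * (n - 2)
    mse = sse / df_error
    return {"SST": sst, "SSR": ssr, "SSC": ssc, "SSTk": sstk, "SSE": sse,
            "df_error": df_error,
            "F_rows": (ssr / (n - 1)) / mse,
            "F_cols": (ssc / (n - 1)) / mse,
            "F_treatments": (sstk / (n - 1)) / mse}

layout = ["ABCD", "BCDA", "CDAB", "DABC"]
# Hypothetical responses, for illustration only
data = [
    [10, 14, 7, 8],
    [7, 18, 11, 8],
    [5, 10, 11, 9],
    [10, 10, 12, 14],
]
anova = latin_square_anova(layout, data)
for name, value in anova.items():
    print(name, value)
```

Each F ratio would be compared with the tabulated F value on (n-1, (n-1)(n-2)) degrees of freedom.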