ANOVA (Analysis of Variance) and Experimentation
The Analysis of Variance technique is used when the independent variables are of
nominal scale (categorical) and the dependent variable is metric (continuous).
Designs
The design of the experiment is most critical in performing any experiment to be
analyzed through the technique of ANOVA.
There are four major types of design:
Completely Randomised Design in a One-Way ANOVA (Single Factor)
Randomised Block Design (Single Blocking Factor)
Latin Square Design (Two Blocking Factors)
Factorial Design (Two-Way ANOVA)
One-Way ANOVA
Researchers are often interested in examining the differences in the
mean values of the dependent variable for several categories of a
single independent variable or factor.
There is one dependent (metric) variable.
There is only one categorical independent variable, which is called a
factor. Each category of the factor is called a level.
The independent variable may be different levels of prices, or different pack sizes,
or different product colours, and the effect (dependent variable) could be sales,
preferences or attitudes towards the brand.
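$SS_y = \sum_{i=1}^{N} (Y_i - \bar{Y})^2$
$SS_x = \sum_{j=1}^{c} n (\bar{Y}_j - \bar{Y})^2$
$SS_{error} = \sum_{j=1}^{c} \sum_{i=1}^{n} (Y_{ij} - \bar{Y}_j)^2$
where
$Y_i$ = individual observation
$\bar{Y}_j$ = mean for category $j$
$\bar{Y}$ = mean over the whole sample, or grand mean
$Y_{ij}$ = $i$th observation in the $j$th category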
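The three sums of squares can be illustrated with a minimal Python sketch. The function name `sums_of_squares` and the small data set are made up for illustration, and the sketch assumes every category has the same number of observations n.

```python
def sums_of_squares(groups):
    """Return (ss_y, ss_x, ss_error) for a list of equally sized categories."""
    n = len(groups[0])  # observations per category (assumed equal)
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal category sizes")

    all_obs = [y for g in groups for y in g]
    grand_mean = sum(all_obs) / len(all_obs)
    group_means = [sum(g) / n for g in groups]

    # Total variation: every observation against the grand mean.
    ss_y = sum((y - grand_mean) ** 2 for y in all_obs)
    # Between-category variation: category means against the grand mean.
    ss_x = sum(n * (m - grand_mean) ** 2 for m in group_means)
    # Within-category variation: observations against their own category mean.
    ss_error = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )
    return ss_y, ss_x, ss_error


# Hypothetical data: three categories with three observations each.
ss_y, ss_x, ss_error = sums_of_squares([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(ss_y, ss_x, ss_error)
```

In this made-up data set the total variation equals the sum of the between-category and within-category variation, which is the decomposition that ANOVA relies on.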
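[Figure: Decomposition of the total variation in one-way ANOVA. The columns are the categories X1, X2, X3, ..., Xc of the independent variable X, each holding observations Y1, Y2, ..., Yn and a category mean; the last column is the total sample Y1, Y2, ..., YN. Within-category variation = SSwithin; total variation = SSy.]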
Test of Significance
In one-way analysis of variance, the
interest lies in testing the null hypothesis
that the category means are equal in the
population.
(i) H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_c$
(ii) H1: At least one of the category means differs from the others.
(iv) Test-statistic:
The null hypothesis may be tested by the F statistic
based on the ratio between these two estimates:
$F = \frac{SS_x / (c - 1)}{SS_{error} / (N - c)} = \frac{MS_x}{MS_{error}}$
This statistic follows the F distribution, with (c - 1) and
(N - c) degrees of freedom (d.f.).
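As a rough illustration of this ratio, the sketch below turns two sums of squares into mean squares and an F value. The function name `f_statistic` and the input numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def f_statistic(ss_x, ss_error, c, n_total):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    df_between = c - 1        # c categories
    df_within = n_total - c   # N observations in total
    ms_x = ss_x / df_between          # between-category mean square
    ms_error = ss_error / df_within   # within-category mean square
    return ms_x / ms_error, df_between, df_within


# Hypothetical inputs: SSx = 42, SSerror = 6, c = 3 categories, N = 9.
f_value, df1, df2 = f_statistic(ss_x=42.0, ss_error=6.0, c=3, n_total=9)
print(f_value, df1, df2)
```

The computed F is then compared with the F distribution on (c - 1) and (N - c) degrees of freedom.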
Source of Variation    Sum of Squares    d.f.    Mean Square
Between the groups     SSx               c-1     MSx
Within the groups      SSerror           N-c     MSerror
Total                  SSy               N-1
Example:
Three different versions of advertising copy have been created by an advertising
agency for a campaign. Let us call these versions of copy ADCOPY 1, 2 and 3. Now,
the ad agency wants to test which of these three versions of the advertising copy is
preferred by its target population, before they launch the campaign.
A sample of 18 respondents is selected from the target population in the nearby areas of
the city. At random, these 18 respondents are assigned to the 3 versions of ad copy.
Each version of ad copy is thus shown to six of the respondents.
The respondents are asked to rate their liking for the ad copy shown to them on a scale
of 1 to 10. (1 = Not liked at all, 10 = Liked a lot, and other values in between these
two). The ratings given by the 18 respondents are tabulated.
Ratings
Respondent (within group)    Adcopy1    Adcopy2    Adcopy3
1                            6          4          5
2                            7          4          5
3                            5          5          4
4                            8          7          7
5                            8          7          8
6                            8          6          7
F2,15 = 7.70
Source of Variation    Sum of Squares    DF    Mean Square    F        Sig. of F
Between Groups         7.000             2     3.500          1.780    .203
Within Groups          29.500            15    1.967
Total                  36.500            17    2.147
The null hypothesis for this F-test is that the mean ratings for the three
ad copy versions are equal in the population.
H0: M1 = M2 = M3, where M1, M2 and M3 are the mean ratings for the
three versions of ad copy.
The significance of F (.203) is greater than .05, so in this case we fail to
reject the null hypothesis at the 95 percent confidence level.
The ANOVA has thus told us what we may not have been able to gauge if we
had simply looked at the mean ratings for each ad copy by computing these.
For example, the ratings for the ad copy version 1 are 6,7,5,8,8,8 and the
mean rating is (6+7+5+8+8+8) / 6, or 42/6 = 7. Similarly, the mean rating of
ad copy version 2 is (4+4+5+7+7+6) / 6, or 33/6 = 5.5. The mean rating for ad
copy version 3 is (5+5+4+7+8+7) / 6, or 36/6 = 6.
At a glance, the three mean ratings appear to be different: 7, 5.5 and 6. But
the ANOVA tells us that this difference is not statistically significant at the 95
percent confidence level.
It does this by performing an F-test.
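The F-test for this example can be retraced with a short Python sketch that uses the 18 ratings quoted above. The variable names are illustrative. The significance is computed with a closed-form tail probability that holds only when the numerator has 2 degrees of freedom, as it does here with three ad copy versions; it is not a general-purpose formula.

```python
# Ratings quoted in the text for the three ad copy versions.
ratings = {
    "ADCOPY 1": [6, 7, 5, 8, 8, 8],
    "ADCOPY 2": [4, 4, 5, 7, 7, 6],
    "ADCOPY 3": [5, 5, 4, 7, 8, 7],
}

groups = list(ratings.values())
all_obs = [y for g in groups for y in g]
N, c = len(all_obs), len(groups)          # 18 respondents, 3 versions
grand_mean = sum(all_obs) / N
means = [sum(g) / len(g) for g in groups]

# Between-groups, within-groups and total sums of squares.
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

ms_between = ss_between / (c - 1)
ms_within = ss_within / (N - c)
f_value = ms_between / ms_within

# Upper-tail probability of F; special case valid ONLY for numerator d.f. = 2.
p_value = (1 + 2 * f_value / (N - c)) ** (-(N - c) / 2)

print(means)
print(round(ss_between, 3), round(ss_within, 3), round(ss_total, 3))
print(round(f_value, 3), round(p_value, 3))
```

The printed sums of squares and F value should agree with the ANOVA table for this example. If SciPy is available, `scipy.stats.f_oneway` applied to the three lists is a more general way to obtain the same F and significance.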
Hypothesis
1. The assignment of our sample of 18 in the above manner assumes that
the magazine in which the version of ad copy appears may have an
impact on the ratings. We can test this hypothesis (in fact, two
hypotheses, one for the ad copy and one for the magazine) by doing an
ANOVA with a randomized block design.
2. For this purpose, we use the variable Rating as the dependent
variable, Adcopy as the factor, and Magazine as the block.
Input Data
Column 1 is Sales, column 2 is Pack Design and Column 3 is Price. Please note that
even though Price is a continuous metric variable, for the purpose of ANOVA, being
an independent variable, it has to be treated as a categorical variable. Hence the
coding (1, 2, 3) for Price.
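The coding step can be sketched in a few lines of Python. The raw prices below are hypothetical and not taken from the data set described here; the sketch only shows one way distinct price points could be mapped to level codes 1, 2, 3.

```python
# Hypothetical raw prices, one per observation (not from the source data).
prices = [10.0, 12.5, 15.0, 12.5, 10.0, 15.0]

# Give each distinct price point a level code, in ascending price order.
levels = sorted(set(prices))
code_of = {price: code for code, price in enumerate(levels, start=1)}

# Replace each metric price with its categorical level code.
price_codes = [code_of[p] for p in prices]
print(price_codes)  # [1, 2, 3, 2, 1, 3]
```

After this step the ANOVA treats Price as three unordered levels, so the spacing between the actual prices plays no role in the analysis.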