
MATH& 146

Lesson 33
Section 4.4
Comparing Many Means

1
Comparing Many Means
Sometimes we want to compare means across
many groups.
We might initially think to do pairwise comparisons;
for example, if there were three groups, we might
be tempted to compare the first mean with the
second, then with the third, and then finally
compare the second and third means for a total of
three comparisons.

2
Comparing Many Means
However, if we have many groups and do many
comparisons, it is likely that we will eventually find
a difference just by chance, even if there is no
difference in the populations.
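The problem compounds quickly: with k groups there are k(k−1)/2 pairwise comparisons, and the chance of at least one false positive grows with every test. A minimal sketch, assuming each test is run independently at significance level α (a simplification, but it shows the trend):

```python
# Probability of at least one false positive across all pairwise tests,
# assuming independent tests each run at significance level alpha.
def familywise_error(k, alpha=0.05):
    m = k * (k - 1) // 2          # number of pairwise comparisons
    return 1 - (1 - alpha) ** m

for k in (3, 5, 10):
    print(k, familywise_error(k))
# With 3 groups the chance is already about 14%; with 10 groups it
# exceeds 90%, even when every null hypothesis is true.
```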

3
The F Distribution
So far, we have discussed the normal, t, and chi-
square distributions. To compare many means at
once, we will need to consider one more: the F
distribution.

4
The F Distribution
Like the chi-square distribution, the F distribution is
always positive and skewed right. However, this
distribution comes from the ratio of two types of
variation, with two different degrees of freedom.
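Both properties can be seen by construction: an F random variable is the ratio of two independent chi-square variables, each divided by its degrees of freedom. A simulation sketch using only the standard library (the specific degrees of freedom 4 and 20 are arbitrary choices for illustration):

```python
import random

random.seed(1)

def chi_square(df):
    # A chi-square draw is the sum of df squared standard normal draws.
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

def f_draw(df1, df2):
    # F = (chi-square / df1) / (chi-square / df2), two degrees of freedom.
    return (chi_square(df1) / df1) / (chi_square(df2) / df2)

draws = [f_draw(4, 20) for _ in range(10_000)]
mean = sum(draws) / len(draws)
median = sorted(draws)[len(draws) // 2]
print(min(draws) > 0)    # always positive
print(mean > median)     # mean exceeds median: skewed right
```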

5
The Hypotheses for
Comparing Many Means
When we want to compare the averages of many
groups, we will use the following hypotheses:
H0: μ1 = μ2 = ... = μk (all population means are equal)
HA: Not all of the means are equal to each other.
Note: If the null hypothesis is rejected, this test
won't tell you which mean(s) differ. Additional
tests/confidence intervals must be performed.

6
Example 1
Suppose that we want to compare the averages of
three groups:
H0: μ1 = μ2 = μ3
HA: Not all of the means are equal to each other.

List all of the possibilities for the alternate hypothesis.
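For three groups there are exactly four possibilities: one mean differs from the other two (three ways), or all three differ. A quick sketch confirming the count by enumerating the set partitions of the three groups (everything except the all-equal partition is an alternative):

```python
from itertools import product

# Assign each group a block label; groups sharing a label have equal means.
# Distinct partitions with more than one block are the HA possibilities.
groups = (1, 2, 3)
partitions = set()
for labels in product(range(3), repeat=3):
    blocks = frozenset(
        frozenset(g for g, lab in zip(groups, labels) if lab == b)
        for b in set(labels)
    )
    partitions.add(blocks)

alternatives = [p for p in partitions if len(p) > 1]
print(len(alternatives))  # 4 ways for the means not to all be equal
```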

7
Example 2
College departments commonly run multiple lectures
of the same introductory course each quarter because
of high demand. Consider a mathematics department
that runs five sections of an introductory statistics
course. We might like to determine whether there are
statistically significant differences in first exam scores
in these five classes (A, B, C, D, and E).
Describe appropriate hypotheses to determine
whether there are any differences between the five
classes.

8
ANOVA
ANOVA (analysis of variance) is used to
compare the means of three or more populations.
(Use a two-sample t test to compare the means of
two populations.)
The ANOVA procedure for comparing these means
involves analysis of the variance in the sample
data.

9
ANOVA
Generally, we must check three conditions on the data
before performing ANOVA:
1) The observations are independent within and
across groups.
2) The data within each group are nearly normal.
3) The variability across the groups is about equal.
When these conditions are met, we may perform
ANOVA to determine whether the data provide strong
evidence against the null hypothesis that all the means
are equal.
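Condition 3 is often screened with a rule of thumb (one common version, an assumption here rather than a formal test): the largest group standard deviation should be no more than about twice the smallest. A sketch with hypothetical data:

```python
import statistics

def roughly_equal_spread(groups, max_ratio=2.0):
    # Rule of thumb: largest sample standard deviation no more than
    # max_ratio times the smallest.
    sds = [statistics.stdev(g) for g in groups]
    return max(sds) / min(sds) <= max_ratio

a = [10, 12, 11, 13, 12]
b = [20, 22, 21, 23, 24]
c = [15, 30, 5, 40, 10]   # much more spread out than a and b

print(roughly_equal_spread([a, b]))     # True
print(roughly_equal_spread([a, b, c]))  # False
```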
10
ANOVA
As its name implies, ANOVA uses variances to
complete the test. ANOVA separates the variance
among the entire set of data into two categories,
called the variation of groups and the variation of
error.
The F-distribution is then used to compare these
two variations.
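The split can be written out directly: the between-group sum of squares (SSG) and within-group sum of squares (SSE) are each divided by their degrees of freedom, and the F statistic is their ratio. A minimal sketch with made-up exam scores for three classes:

```python
import statistics

def anova_f(groups):
    # One-way ANOVA: split total variation into between-group (MSG)
    # and within-group (MSE) pieces, then form F = MSG / MSE.
    k = len(groups)                                # number of groups
    n = sum(len(g) for g in groups)                # total observations
    grand_mean = sum(sum(g) for g in groups) / n

    ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
              for g in groups)
    sse = sum(sum((x - statistics.mean(g)) ** 2 for x in g)
              for g in groups)

    msg = ssg / (k - 1)   # variation of groups
    mse = sse / (n - k)   # variation of error
    return msg / mse

# Hypothetical first-exam scores for three sections.
scores = [[80, 85, 90], [70, 75, 72], [88, 92, 95]]
print(round(anova_f(scores), 2))
```

A large F (variation of groups dwarfing variation of error) is evidence against the null hypothesis; the F distribution with k−1 and n−k degrees of freedom supplies the cutoff.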

11
Variation of Groups
The first category is the variation between the different
levels being tested, called the variation of groups.
Think of this variation as the treatment effect.
[Figure: large variation of groups vs. small variation of groups]

12
Variation of Error
The second category is the variation within the levels
being tested (consistency), called the Variation of
Error. Think of this as a measure of precision.
[Figure: large variation of error vs. small variation of error]

13
Decision of the Null Hypothesis
If the variation between the groups is significantly
more than the variation within the groups (the error),
then the means are considered unequal (the null
hypothesis is rejected).
[Figure: panels illustrating "Reject the null hypothesis" and "Fail to reject the null hypothesis"]

14
Example 3
Compare groups I, II, and III below. Can you visually
determine whether the differences in the group centers
are due to chance? Now compare groups IV, V, and VI.
Do these differences appear to be due to chance?

15
Example 4
The graph below shows commuting times for randomly
selected drivers in six cities. Graphically test whether
the average commuting times are the same. (We'll look
at the numbers in our next class.) Assume the
conditions are met.

16
Batting Performance
We would like to determine whether there are real
differences between the batting performance of
baseball players according to their position: outfielder
(OF), infielder (IF), designated hitter (DH), and catcher
(C).

17
Batting Performance
Our data set includes batting records of all 327 Major
League Baseball (MLB) players from the 2010 season
with more than 200 at bats. Six of the 327 cases are
shown below.

18
Batting Performance
The measure we will use for the player batting
performance (the outcome variable) is on-base
percentage (OBP). The on-base percentage roughly
represents the proportion of the time a player
successfully gets on base or hits a home run.
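The standard MLB formula (an outside detail, not stated on the slide; the course data may use a simplification) counts times on base over the plate appearances that qualify:

```python
def obp(hits, walks, hit_by_pitch, at_bats, sac_flies):
    # Standard MLB on-base percentage:
    # (H + BB + HBP) / (AB + BB + HBP + SF)
    on_base = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sac_flies
    return on_base / chances

# Hypothetical season line: 150 hits, 60 walks, 5 HBP, 500 AB, 5 sac flies.
print(round(obp(150, 60, 5, 500, 5), 3))  # 0.377
```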

19
Example 5
What hypotheses should be used to test whether
there are any differences in the batting averages
between outfielders (OF), infielders (IF),
designated hitters (DH), and catchers (C)?

20
Example 6
The player positions have been divided into four
groups: outfielders (OF), infielders (IF),
designated hitters (DH), and catchers (C). What
would be an appropriate point estimate of the on-
base percentage by outfielders, OF?

21
Batting Performance
Below are the side-by-side box plots of the on-base
percentage for 327 players across four groups. There
is one prominent outlier visible in the infield group, but
with 154 observations in the infield group, this outlier is
not a concern.

(Outlier: Brandon Wood, 3B for LAA, 0.174 OBP)
22
Batting Performance
Notice that the variability
appears to be approximately
constant across groups;
nearly constant variance
across groups is an
important assumption that
must be satisfied before we
can test for equal means.

23
Example 7
The largest difference between the sample means is
between the designated hitter and the catcher
positions. Consider again the original hypotheses:
H0: μOF = μIF = μDH = μC
HA: The average on-base percentage (μi) varies across
some (or all) groups.
Why might it be inappropriate to run the test by simply
estimating whether the difference of DH and C is
statistically significant at a 0.05 significance level?

24
Data Snooping
The primary issue here is that we are inspecting
the data before picking the groups that will be
compared. (This is called data snooping.)
It is inappropriate to examine all data by eye
(informal testing) and only afterwards decide which
parts to formally test.
Instead, we should first reject the hypothesis that
all the means are equal, and only after that find out
which groups are different.

25
Data Snooping
Naturally, we would pick the groups with the largest
differences for the formal test, leading to an inflation of
the Type I (false positive) error rate.
To understand this better, let's consider a slightly
different problem.
Suppose we measure the aptitude of students in
20 classes of a large elementary school at the
beginning of the year.

26
Data Snooping
In this school, all students are randomly assigned to
classrooms, so any differences we observe between
the classes at the start of the year are completely due
to chance.
However, with so many groups, we will probably
observe a few classes that look rather different from
each other. If we select only these classes that look so
different, we will probably make the wrong conclusion
that the assignment wasn't random.
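The inflation is easy to demonstrate by simulation. In this sketch (class sizes, seed, and the |t| > 2 cutoff are illustrative choices), every class is drawn from the same population, yet formally testing only the two most different-looking classes rejects far more often than the nominal 5%:

```python
import random
import statistics

random.seed(42)

def t_stat(x, y):
    # Two-sample t statistic (unpooled standard error).
    se = (statistics.variance(x) / len(x)
          + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se

def snoop_once(n_classes=20, n_students=25):
    # Every class drawn from the SAME population: any observed
    # difference is pure chance.
    classes = [[random.gauss(100, 15) for _ in range(n_students)]
               for _ in range(n_classes)]
    classes.sort(key=statistics.mean)
    # Snooping: formally compare only the highest- and lowest-scoring
    # classes, chosen after looking at the data.
    return abs(t_stat(classes[-1], classes[0])) > 2.0

trials = 500
rate = sum(snoop_once() for _ in range(trials)) / trials
print(rate)  # far above the nominal 0.05
```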

27
