Lesson 24
Section 3.3
The Chi-Square Distribution
One-Way Tables
Previously, we have looked at inference for single
proportions (one group) and for difference of
proportions (two groups).
Now we will develop a method for assessing a null
model when the data are binned (any number of
proportions).
This model can be assessed using a chi-square
distribution.
Representative Juries
Let us consider data from a random sample of 275
jurors in a small county. Jurors identified their
racial group, as shown below, and we would like
to determine if these jurors are racially
representative of the population.
Representative Juries
If the jury is representative of the population, then
the proportions in the sample should roughly
reflect the population of eligible jurors, i.e.
registered voters.
Representative Juries
While the proportions in the juries do not precisely
represent the population proportions, it is unclear
whether these data provide convincing evidence
that the sample is not representative.
Representative Juries
If the jurors really were randomly sampled from the
registered voters, we might expect small
differences due to chance.
However, unusually large differences may provide
convincing evidence that the juries were not
representative.
Example 1
Of the people in the city, 275 served on a jury. If
the individuals are randomly selected to serve on a
jury, about how many of the 275 people would we
expect for each race?
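The arithmetic behind this question can be sketched as follows: each expected count is the sample size multiplied by that group's share of the population. The group labels and population proportions in the sketch are assumptions made for illustration, because the table of registered-voter proportions is not reproduced in this text.

```python
# Assumed inputs: the group labels and population shares are illustrative
# placeholders, since the registered-voter table is not reproduced here.
population_share = {
    "White": 0.72,
    "Black": 0.07,
    "Hispanic": 0.12,
    "Other": 0.09,
}
n_jurors = 275  # sample size stated in the example


def expected_counts(n, shares):
    """Expected count per group = sample size times population proportion."""
    return {group: n * p for group, p in shares.items()}


expected = expected_counts(n_jurors, population_share)
for group, count in expected.items():
    print(f"{group}: {count:.2f}")
```

Expected counts need not be whole numbers. They describe the long-run average under random selection, not any single sample.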
Representative Juries
The sample proportion for each racial group among
the 275 jurors did not precisely match the
corresponding population proportion. While some
sampling variation is expected, we would expect the
sample proportions to be fairly similar to the
population proportions if there is no bias on juries.
Representative Juries
To evaluate these hypotheses, we quantify how
different the observed counts are from the
expected counts.
Strong evidence for the alternative hypothesis
would come in the form of unusually large
deviations in the groups from what would be
expected based on sampling variation alone.
The Chi-Square Test Statistic
Recall (Lesson 20) that a test statistic is given by
the z-score formula:
\[ Z = \frac{\text{point estimate} - \text{null value}}{\text{SE of point estimate}} \]
5.89
The Chi-Square Test Statistic
The chi-square test statistic, χ², is the sum of the
squares of the Z-scores:
\[ \chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \cdots + \frac{(O_k - E_k)^2}{E_k} \]
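As a minimal sketch of this sum, the function below adds up (O - E)^2 / E over the bins, where O is an observed count and E is the matching expected count. The function name `chi_square_statistic` and the four observed and expected counts are assumptions chosen for illustration. The juror table itself is not reproduced in this text.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Assumed counts for four groups out of 275 jurors (illustrative only).
observed = [205, 26, 25, 19]
expected = [198.0, 19.25, 33.0, 24.75]

stat = chi_square_statistic(observed, expected)
print(round(stat, 2))  # about 5.89 for these assumed counts
```

Each term is a squared Z-score, so bins that deviate in either direction push the statistic upward.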
The Chi-Square Distribution
The chi-square distribution is sometimes used to
characterize data sets and statistics that are
always positive and typically right skewed.
The Chi-Square Distribution
Recall the normal distribution had two parameters,
mean and standard deviation, that could be
used to describe its exact characteristics.
The chi-square distribution has just one parameter
called degrees of freedom (df), which influences
the shape, center, and spread of the distribution.
The Chi-Square Distribution
The figure below shows four chi-square distributions.
Notice how the center, variability (spread), and shape
of the distribution change as the degrees of freedom
increase.
The Chi-Square Distribution
The mean of a chi-square distribution equals its
degrees of freedom. Chi-square variables are
always nonnegative, so zero will always be the left
extreme.
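A short simulation can make both properties concrete. The sketch below relies on the standard fact that a chi-square variable with df degrees of freedom can be generated as the sum of df squared independent standard normal values. The seed, the number of draws, and the df values are arbitrary choices for illustration.

```python
import random


def simulate_chi_square(df, rng):
    """One chi-square draw: the sum of df squared standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


rng = random.Random(24)  # fixed seed so the run is repeatable
n_draws = 20000

sample_means = {}
sample_minimums = {}
for df in (3, 5, 9):
    draws = [simulate_chi_square(df, rng) for _ in range(n_draws)]
    sample_means[df] = sum(draws) / n_draws
    sample_minimums[df] = min(draws)
    print(f"df={df}: mean about {sample_means[df]:.2f}, "
          f"smallest draw {sample_minimums[df]:.4f}")
```

In each case the sample mean should land close to df, and no draw can fall below zero because every term in the sum is a square.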
Using χ²cdf on the TI-83/84
χ²cdf computes the chi-square distribution probability
between lowerbound and upperbound for the specified
df (degrees of freedom).
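For readers without the calculator, the same quantity can be sketched in Python. The standard library has no chi-square CDF, so this sketch evaluates the regularized lower incomplete gamma function by its power series. The function names `regularized_lower_gamma` and `chi2_cdf` are hypothetical, and the cutoff of 500 that treats very large bounds as infinity is an assumption suited only to the small df values used in this lesson.

```python
import math


def regularized_lower_gamma(a, x, tol=1e-13):
    """P(a, x), from the power series of the lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    if x > 500.0:
        # Assumption: for small df the area beyond this point is negligible.
        # This also handles stand-ins for infinity such as 1e99 or math.inf.
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    while term > tol * total:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_cdf(lowerbound, upperbound, df):
    """Area under the chi-square density between the two bounds."""
    a = df / 2.0
    return (regularized_lower_gamma(a, upperbound / 2.0)
            - regularized_lower_gamma(a, lowerbound / 2.0))


# Check against the closed form for df = 2, where the CDF is 1 - exp(-x/2).
area = chi2_cdf(1.0, 4.0, 2)
check = math.exp(-0.5) - math.exp(-2.0)
print(area, check)
```

An upper tail is obtained by passing a very large upper bound, just as on the calculator. Where SciPy is available, `scipy.stats.chi2.cdf` is a well-tested alternative to this hand-rolled series.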
Example 3
The graph below shows a chi-square distribution with
3 degrees of freedom and an upper shaded tail
starting at 6.25. Use the χ²cdf function to estimate the
shaded area.
Example 4
The figure below shows a cutoff of 11.7 on a chi-
square distribution with 7 degrees of freedom. Find
the area of the upper tail.
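Examples 3 and 4 can be cross-checked without a calculator. Both use an odd number of degrees of freedom, and for odd df the upper tail has a closed form built from the complementary error function. The sketch below uses that closed form rather than the calculator's χ²cdf, and the function name `upper_tail_odd_df` is hypothetical.

```python
import math


def upper_tail_odd_df(x, df):
    """P(chi-square > x) for odd df, using the erfc-based closed form."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form applies to odd df only")
    # Partial sum 1 + x/3 + x^2/(3*5) + ... with (df - 1) / 2 terms.
    series = 0.0
    term = 1.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        series += term
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * series)


example_3 = upper_tail_odd_df(6.25, 3)   # cutoff 6.25, df = 3
example_4 = upper_tail_odd_df(11.7, 7)   # cutoff 11.7, df = 7
print(f"Example 3: {example_3:.3f}")
print(f"Example 4: {example_4:.3f}")
```

The two areas come out to roughly 0.100 and 0.111, which is what χ²cdf should report when given a very large upper bound.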
p-values for a Chi-Square Distribution
A moment ago, we defined a chi-square test statistic:
\[ \chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \cdots + \frac{(O_k - E_k)^2}{E_k} \]
p-values for a Chi-Square Distribution
The p-value for this test statistic is found by looking at
the upper tail of this chi-square distribution. We
consider the upper tail because larger values of χ²
would provide greater evidence against the null
hypothesis.
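The upper-tail idea can also be illustrated by simulation instead of the chi-square curve. The sketch below repeatedly draws 275 jurors at random from an assumed population, recomputes the statistic each time, and reports the fraction of simulated statistics at least as large as the observed one. The population proportions, the observed counts, the seed, and the number of simulations are all assumptions for illustration, and this is a stand-in for the χ²cdf calculation rather than the method described in the lesson.

```python
import random

# Assumed values for illustration; the juror table is not reproduced here.
shares = [0.72, 0.07, 0.12, 0.09]   # population proportions under the null
observed = [205, 26, 25, 19]        # observed counts in the sample
n = sum(observed)                   # 275 jurors
expected = [n * p for p in shares]


def chi_square_statistic(obs, exp):
    """Sum of (O - E)^2 / E over all bins."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


observed_stat = chi_square_statistic(observed, expected)

rng = random.Random(2024)  # fixed seed so the run is repeatable
n_sims = 5000
at_least_as_large = 0
for _ in range(n_sims):
    # Draw one random jury of size n under the null model and tally it.
    draws = rng.choices(range(len(shares)), weights=shares, k=n)
    counts = [0] * len(shares)
    for g in draws:
        counts[g] += 1
    if chi_square_statistic(counts, expected) >= observed_stat:
        at_least_as_large += 1

p_value_estimate = at_least_as_large / n_sims
print(f"observed statistic: {observed_stat:.2f}")
print(f"simulated upper-tail fraction: {p_value_estimate:.3f}")
```

Only simulated values at or above the observed statistic are counted, which mirrors the use of the upper tail: small values of χ² mean the counts sit close to what the null model expects.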
Example 5
How many categories were there in the juror
example?
How many degrees of freedom should be
associated with the chi-square distribution used for
χ²?