MATH& 146 Lesson 24: The Chi-Square Distribution

MATH& 146
Lesson 24
Section 3.3
The Chi-Square Distribution
One-Way Tables
Previously, we have looked at inference for single
proportions (one group) and for difference of
proportions (two groups).
Now we will develop a method for assessing a null
model when the data are binned (any number of
proportions).
This model can be assessed using a chi-square
distribution.
Representative Juries
Let us consider data from a random sample of 275
jurors in a small county. Jurors identified their
racial group, as shown below, and we would like
to determine if these jurors are racially
representative of the population.
Race                      White  Black  Hispanic  Other  Total
Representation in juries    205     26        25     19    275
Registered voters          0.72   0.07      0.12   0.09   1.00
If the jury is representative of the population, then
the proportions in the sample should roughly
reflect the population of eligible jurors, i.e.
registered voters.

While the proportions in the juries do not precisely
represent the population proportions, it is unclear
whether these data provide convincing evidence
that the sample is not representative.

If the jurors really were randomly sampled from the
registered voters, we might expect small
differences due to chance.
However, unusually large differences may provide
convincing evidence that the juries were not
representative.

Example 1
Of the people in the city, 275 served on a jury. If
the individuals are randomly selected to serve on a
jury, about how many of the 275 people would we
expect for each race?
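
As a rough check on Example 1, the expected count for each group is the sample size multiplied by that group's share of registered voters. The sketch below takes the proportions from the table above; the variable names are illustrative and not part of the lesson.

```python
# Expected juror counts if the 275 jurors were a random sample of registered voters.
n_jurors = 275
voter_share = {"White": 0.72, "Black": 0.07, "Hispanic": 0.12, "Other": 0.09}

# Expected count = sample size * population proportion for that group.
expected = {race: n_jurors * share for race, share in voter_share.items()}

for race, count in expected.items():
    print(f"{race:<9}{count:8.2f}")
```

Expected counts do not need to be whole numbers; they describe a long-run average, not any single jury pool.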

The sample proportion represented from each race
among the 275 jurors was not a precise match for
any ethnic group. While some sampling variation
is expected, we would expect the sample
proportions to be fairly similar to the population
proportions if there is no bias on juries.

Race             White  Black  Hispanic  Other  Total
Observed data      205     26        25     19    275
Expected counts    198  19.25        33  24.75    275
We need to test whether the differences are strong
enough to provide convincing evidence that the
jurors are not a random sample.

These ideas can be organized into hypotheses:
H0: The jurors are a random sample, i.e. there is
no racial bias in who serves on a jury, and the
observed counts reflect natural sampling
fluctuation.
HA: The jurors are not randomly sampled, i.e.
there is a racial bias in juror selection.
To evaluate these hypotheses, we quantify how
different the observed counts are from the
expected counts.
Strong evidence for the alternative hypothesis
would come in the form of unusually large
deviations in the groups from what would be
expected based on sampling variation alone.
The Chi-Square Test Statistic
Recall (Lesson 20) that a test statistic is given by
the z-score formula:
$$Z = \frac{\text{point estimate} - \text{null value}}{\text{SE of point estimate}}$$
This construction was based on (1) identifying the
difference between a point estimate and an
expected value if the null hypothesis was true, and
(2) standardizing that difference using the standard
error of the point estimate.
Our strategy is to compute this Z-score for each
race (category). The standard error in binned data
is the square root of the count under the null (the
expected counts). For whites,
$$Z_1 = \frac{205 - 198}{\sqrt{198}} = 0.50$$
Example 2
Compute the Z-scores for black, Hispanic, and
other groups.
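
One way to check the arithmetic in Example 2 is to apply the same formula to every category at once. This is a minimal sketch that assumes the observed and expected counts from the juror table and uses the square root of the expected count as the standard error, as described earlier; the dictionary names are illustrative.

```python
import math

observed = {"White": 205, "Black": 26, "Hispanic": 25, "Other": 19}
expected = {"White": 198, "Black": 19.25, "Hispanic": 33, "Other": 24.75}

# Z = (observed - expected) / sqrt(expected) for each category.
z_scores = {
    race: (observed[race] - expected[race]) / math.sqrt(expected[race])
    for race in observed
}

for race, z in z_scores.items():
    print(f"{race:<9}{z:6.2f}")
```

A negative Z-score means the group is under-represented relative to its expected count, and a positive one means it is over-represented.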

The Chi-Square Test Statistic
The chi-square test statistic, χ², is the sum of the
squares of the Z-scores:
$$\chi^2 = Z_1^2 + Z_2^2 + Z_3^2 + Z_4^2 = (0.50)^2 + (1.54)^2 + (-1.39)^2 + (-1.16)^2 = 5.89$$
Equivalently, in terms of the observed counts O and
the expected counts E in each of the k categories:
$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \cdots + \frac{(O_k - E_k)^2}{E_k}$$
This summarizes how strongly the observed
counts tend to deviate from the null counts.
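
The observed-versus-expected form of the statistic translates directly into a short function. The sketch below is one possible way to write it; `chi_square_statistic` is a hypothetical helper name, and the counts are those from the juror example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [205, 26, 25, 19]          # White, Black, Hispanic, Other
expected = [198, 19.25, 33, 24.75]

chi_sq = chi_square_statistic(observed, expected)
print(round(chi_sq, 2))  # 5.89
```

Each term is a squared Z-score, so categories with large deviations in either direction push the statistic upward.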
The chi-square distribution is sometimes used to
characterize data sets and statistics that are
always positive and typically right skewed.
Recall that the normal distribution had two parameters,
mean and standard deviation, that could be
used to describe its exact characteristics.
The chi-square distribution has just one parameter
called degrees of freedom (df), which influences
the shape, center, and spread of the distribution.
The figure below shows four chi-square distributions.
Notice how the center, variability (spread), and shape
of the distribution change as the degrees of freedom
increase.
The mean of a chi-square distribution equals its
degrees of freedom. Chi-square variables are
always nonnegative, so zero will always be the left
extreme.
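
A small simulation can make these two properties concrete. It relies on a standard fact that the slides do not state: a chi-square variable with df degrees of freedom behaves like the sum of df squared independent standard normal Z-scores. The df values, seed, and number of draws below are arbitrary choices for illustration.

```python
import random

def simulate_chi_square(df, rng):
    """One simulated value: the sum of df squared standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

rng = random.Random(146)   # fixed seed so the run is repeatable
n_draws = 20_000
sample_means = {}
smallest = {}

for df in (2, 4, 9):
    draws = [simulate_chi_square(df, rng) for _ in range(n_draws)]
    sample_means[df] = sum(draws) / n_draws
    smallest[df] = min(draws)
    print(f"df = {df}: smallest draw = {smallest[df]:.3f}, "
          f"sample mean = {sample_means[df]:.2f}")
```

In each case the sample mean should land close to df, and no draw falls below zero because every term is a square.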
Using χ²cdf on the TI-83/84
χ²cdf computes the chi-square distribution probability
between lowerbound and upperbound for the specified
df (degrees of freedom).
That is, if X ~ χ²(df), then χ²cdf(a,b,df) = P(a < X < b).
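
For readers working without a calculator, the same probability can be approximated in Python using only the standard library. The sketch below is an illustration and does not reproduce how the TI-83/84 computes it: `chi2_upper_tail` and `chi2cdf` are hypothetical helper names, and the approach (starting from a known tail formula for df = 1 or df = 2 and stepping up two degrees of freedom at a time) assumes df is a positive whole number.

```python
import math

def chi2_upper_tail(x, df):
    """P(X > x) for X ~ chi-square(df), where df is a positive integer."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive whole number")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 1:
        s = 0.5
        tail = math.erfc(math.sqrt(y))   # exact upper tail for df = 1
    else:
        s = 1.0
        tail = math.exp(-y)              # exact upper tail for df = 2
    # Each step raises df by 2 and adds y^s * e^(-y) / Gamma(s + 1),
    # computed on the log scale to avoid overflow for large x.
    while s < df / 2.0:
        tail += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return tail

def chi2cdf(a, b, df):
    """P(a < X < b) for X ~ chi-square(df), mirroring the calculator's argument order."""
    return chi2_upper_tail(a, df) - chi2_upper_tail(b, df)

# Illustrative call: P(2 < X < 5) for a chi-square distribution with 4 df.
print(round(chi2cdf(2, 5, 4), 4))
```

An upper-tail area can be obtained either by calling `chi2_upper_tail(x, df)` directly or by passing a very large upper bound to `chi2cdf`.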
Example 3
The graph below shows a chi-square distribution with
3 degrees of freedom and an upper shaded tail
starting at 6.25. Use the χ²cdf function to estimate the
shaded area.
Example 4
The figure below shows a cutoff of 11.7 on a chi-
square distribution with 7 degrees of freedom. Find
the area of the upper tail.
p-values for a Chi-Square Distribution
A moment ago, we defined a chi-square test statistic:
$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \cdots + \frac{(O_k - E_k)^2}{E_k}$$
where k is the number of bins (categories).

Not surprisingly, this test statistic follows a chi-square
distribution. The degrees of freedom is k - 1, or one
less than the number of bins.
The p-value for this test statistic is found by looking at
the upper tail of this chi-square distribution. We
consider the upper tail because larger values of χ²
would provide greater evidence against the null
hypothesis.
p-value = χ²cdf(test statistic, BIG, df)
Here BIG stands for a very large upper bound.
Example 5
How many categories were there in the juror
example?
How many degrees of freedom should be
associated with the chi-square distribution used for
χ²?
Example 6
If the null hypothesis is true, the test statistic
χ² = 5.89 would be closely associated with a
chi-square distribution with three degrees of freedom.
Using this distribution and test statistic, identify the
p-value and interpret the result.
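
For three degrees of freedom specifically, the upper-tail area has a closed form involving the complementary error function, which gives a quick way to check Example 6 without a calculator. The sketch below assumes the test statistic of 5.89 from the juror data and a significance level of 0.05; the slides do not state a significance level, so that value and the helper name `chi2_upper_tail_df3` are assumptions for illustration.

```python
import math

def chi2_upper_tail_df3(x):
    """P(X > x) for X ~ chi-square with exactly 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

chi_sq = 5.89    # test statistic from the juror data
alpha = 0.05     # assumed significance level, not given in the lesson

p_value = chi2_upper_tail_df3(chi_sq)
print(f"p-value = {p_value:.4f}")

if p_value < alpha:
    print("Reject H0: the data suggest the jurors are not a random sample.")
else:
    print("Do not reject H0: no convincing evidence of racial bias in juror selection.")
```

The p-value comes out at roughly 0.117, so under the assumed 0.05 level the juror data do not provide convincing evidence against the null hypothesis.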
