
MATH& 146

Lesson 24
Section 3.3
The Chi-Square Distribution

1
One-Way Tables
Previously, we have looked at inference for single
proportions (one group) and for difference of
proportions (two groups).
Now we will develop a method for assessing a null
model when the data are binned (any number of
proportions).
This model can be assessed using a chi-square
distribution.

2
Representative Juries
Let us consider data from a random sample of 275
jurors in a small county. Jurors identified their
racial group, as shown below, and we would like
to determine whether these jurors are racially
representative of the population.

Race                      White  Black  Hispanic  Other  Total
Representation in juries    205     26        25     19    275
Registered voters          0.72   0.07      0.12   0.09   1.00

3
Representative Juries
If the jury is representative of the population, then
the proportions in the sample should roughly
reflect the population of eligible jurors, i.e.
registered voters.


4
Representative Juries
While the proportions in the juries do not precisely
represent the population proportions, it is unclear
whether these data provide convincing evidence
that the sample is not representative.


5
Representative Juries
If the jurors really were randomly sampled from the
registered voters, we might expect small
differences due to chance.
However, unusually large differences may provide
convincing evidence that the juries were not
representative.


6
Example 1
Of the people in the city, 275 served on a jury. If
the individuals are randomly selected to serve on a
jury, about how many of the 275 people would we
expect for each race?
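
One way to work Example 1 is to multiply the sample size by each registered-voter proportion. The sketch below does this in Python; the variable names (`n_jurors`, `voter_share`, `expected`) are illustrative choices and not part of the lesson.

```python
# Expected juror counts under the null model: sample size times each
# registered-voter proportion from the table.
n_jurors = 275
voter_share = {"White": 0.72, "Black": 0.07, "Hispanic": 0.12, "Other": 0.09}

expected = {race: n_jurors * share for race, share in voter_share.items()}

for race, count in expected.items():
    print(f"{race}: {count:.2f}")
```

Because the proportions sum to 1, the expected counts sum to the sample size of 275.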


7
Representative Juries
The sample proportion represented from each race
among the 275 jurors was not a precise match for
any ethnic group. While some sampling variation
is expected, we would expect the sample
proportions to be fairly similar to the population
proportions if there is no bias on juries.

Race             White  Black  Hispanic  Other  Total
Observed data      205     26        25     19    275
Expected counts    198  19.25        33  24.75    275
8
Representative Juries
We need to test whether the differences are strong
enough to provide convincing evidence that the
jurors are not a random sample.

9
Representative Juries
These ideas can be organized into hypotheses:

H0: The jurors are a random sample, i.e. there is
no racial bias in who serves on a jury, and the
observed counts reflect natural sampling
fluctuation.
HA: The jurors are not randomly sampled, i.e.
there is a racial bias in juror selection.

10
Representative Juries
To evaluate these hypotheses, we quantify how
different the observed counts are from the
expected counts.
Strong evidence for the alternative hypothesis
would come in the form of unusually large
deviations in the groups from what would be
expected based on sampling variation alone.

11
The Chi-Square Test Statistic
Recall (Lesson 20) that a test statistic is given by
the z-score formula:
$$Z = \frac{\text{point estimate} - \text{null value}}{\text{SE of point estimate}}$$

This construction was based on (1) identifying the
difference between a point estimate and an
expected value if the null hypothesis was true, and
(2) standardizing that difference using the standard
error of the point estimate.
12
The Chi-Square Test Statistic
Our strategy is to compute this Z-score for each
race (category). The standard error in binned data
is the square root of the count under the null (the
expected counts). For whites,

$$Z_1 = \frac{205 - 198}{\sqrt{198}} = 0.50$$

13
Example 2
Compute the Z-scores for black, Hispanic, and
other groups.
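
The Z-scores for every category can be checked with a short Python sketch that applies the same formula used for whites, (observed - expected) / sqrt(expected). The dictionary names are illustrative, and the counts are copied from the observed and expected table.

```python
import math

# Observed and expected juror counts from the table.
observed = {"White": 205, "Black": 26, "Hispanic": 25, "Other": 19}
expected = {"White": 198, "Black": 19.25, "Hispanic": 33, "Other": 24.75}

# Z-score per category: difference divided by the square root of the
# expected count (the standard error for binned data under the null).
z_scores = {
    race: (observed[race] - expected[race]) / math.sqrt(expected[race])
    for race in observed
}

for race, z in z_scores.items():
    print(f"{race}: Z = {z:.2f}")
```

A negative Z-score means the category has fewer jurors than expected; the sign disappears once the Z-scores are squared.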

14
The Chi-Square Test Statistic
The chi-square test statistic, $\chi^2$, is the sum of the
squares of the Z-scores:

$$\chi^2 = Z_1^2 + Z_2^2 + Z_3^2 + Z_4^2 = (0.50)^2 + (1.54)^2 + (-1.39)^2 + (-1.16)^2 = 5.89$$

15
The Chi-Square Test Statistic
The chi-square test statistic, $\chi^2$, is the sum of the
squares of the Z-scores:

$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \cdots + \frac{(O_k - E_k)^2}{E_k}$$

This summarizes how strongly the observed
counts tend to deviate from the null counts.
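
As a quick check of the arithmetic, the test statistic for the juror data can be computed directly from the observed and expected counts. This is a minimal Python sketch; the list names are illustrative.

```python
# Observed and expected counts in the order White, Black, Hispanic, Other.
observed = [205, 26, 25, 19]
expected = [198, 19.25, 33, 24.75]

# Sum of (O - E)^2 / E over all categories.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(f"chi-square statistic: {chi_sq:.2f}")
```

Each term is the square of one category's Z-score, so this gives the same value as summing the squared Z-scores.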

16
The Chi-Square Distribution
The chi-square distribution is sometimes used to
characterize data sets and statistics that are
always positive and typically right skewed.

17
The Chi-Square Distribution
Recall the normal distribution had two parameters,
mean and standard deviation, that could be
used to describe its exact characteristics.
The chi-square distribution has just one parameter
called degrees of freedom (df), which influences
the shape, center, and spread of the distribution.

18
The Chi-Square Distribution
The figure below shows four chi-square distributions.
Notice how the center, variability (spread), and shape
of the distribution change as the degrees of freedom
increase.

19
The Chi-Square Distribution
The mean of a chi-square distribution equals its
degrees of freedom, and when df > 2 the peak of the
density sits at df - 2. Chi-square variables are
always nonnegative, so zero will always be the left
extreme.
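
These properties can be explored by simulation. The sketch below relies on a standard fact that the slides do not state: a chi-square variable with df degrees of freedom has the same distribution as a sum of df squared independent standard normal draws. The function name `simulate_chi_square`, the seed, the number of draws, and the df values are arbitrary illustrative choices.

```python
import random

def simulate_chi_square(df, n_draws=20000, seed=0):
    """Draw n_draws values, each the sum of df squared standard normal draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n_draws)]

for df in (2, 4, 9):  # arbitrary df values chosen for illustration
    draws = simulate_chi_square(df)
    mean = sum(draws) / len(draws)
    variance = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)
    print(f"df={df}: mean ~ {mean:.2f}, variance ~ {variance:.2f}, "
          f"min = {min(draws):.4f}")
```

The simulated means should land near the degrees of freedom, the spread should grow as df increases, and no draw can be negative because every term is a square.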

20
Using χ²cdf on the TI-83/84
χ²cdf computes the chi-square distribution probability
between lowerbound and upperbound for the specified
df (degrees of freedom).

That is, if X ~ χ²(df), then χ²cdf(a,b,df) = P(a < X < b).
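
For readers without the calculator, the same probability can be approximated in Python. The sketch below is not the calculator's algorithm. It assumes a series expansion of the regularized lower incomplete gamma function, and the names `chi2_lower_cdf` and `chi2_cdf` are hypothetical helpers written for this illustration. It is intended for the small df values used in this lesson.

```python
import math

def chi2_lower_cdf(x, df):
    """P(X <= x) for X ~ chi-square(df), using a series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    s, t = df / 2.0, x / 2.0
    if t > s + 200:
        # Far into the upper tail the CDF is negligibly different from 1,
        # and stopping here keeps the series from overflowing.
        return 1.0
    term = 1.0 / s
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= t / (s + n)
        total += term
    return total * math.exp(s * math.log(t) - t - math.lgamma(s))

def chi2_cdf(a, b, df):
    """P(a < X < b), mirroring the argument order of the calculator function."""
    return chi2_lower_cdf(b, df) - chi2_lower_cdf(a, df)

# Upper tail above 6.25 with 3 degrees of freedom, using 1e99 as the large bound.
print(round(chi2_cdf(6.25, 1e99, 3), 3))
```

An upper-tail area is obtained by passing a very large number as the upper bound, just as on the calculator.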

[Figure: chi-square density with a and b marked on the horizontal axis]

21
Example 3
The graph below shows a chi-square distribution with
3 degrees of freedom and an upper shaded tail
starting at 6.25. Use the χ²cdf function to estimate the
shaded area.

22
Example 4
The figure below shows a cutoff of 11.7 on a chi-
square distribution with 7 degrees of freedom. Find
the area of the upper tail.

23
p-values for a Chi-Square Distribution
A moment ago, we defined a chi-square test statistic:

$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \cdots + \frac{(O_k - E_k)^2}{E_k}$$

where k is the number of bins (categories).


Not surprisingly, when the null hypothesis is true this
test statistic approximately follows a chi-square
distribution. The degrees of freedom is $k - 1$, or one
less than the number of bins.

24
p-values for a Chi-Square Distribution
The p-value for this test statistic is found by looking at
the upper tail of this chi-square distribution. We
consider the upper tail because larger values of $\chi^2$
would provide greater evidence against the null
hypothesis.

p-value = χ²cdf(test statistic, BIG, df)

Here BIG stands for any very large upper bound, such as 1E99.

25
Example 5
How many categories were there in the juror
example?
How many degrees of freedom should be
associated with the chi-square distribution used for
$\chi^2$?

26
Example 6
If the null hypothesis is true, the test statistic
$\chi^2 = 5.89$ would be closely associated with a
chi-square distribution with three degrees of freedom.
Using this distribution and test statistic, identify the
p-value and interpret the result.
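
The whole calculation for the juror data can be sketched in Python. Instead of the calculator function, this version assumes the closed-form upper tail that holds only for exactly 3 degrees of freedom, P(X > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2). The helper name `upper_tail_df3` is hypothetical.

```python
import math

# Observed and expected counts in the order White, Black, Hispanic, Other.
observed = [205, 26, 25, 19]
expected = [198, 19.25, 33, 24.75]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # k - 1 = 3 for four categories

def upper_tail_df3(x):
    """P(X > x) for X ~ chi-square with exactly 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

assert df == 3  # the closed form above applies only when df is 3
p_value = upper_tail_df3(chi_sq)

print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.3f}")
```

The p-value comes out near 0.117. If we assume a conventional 0.05 significance level (the lesson does not state one), this is too large to reject the null hypothesis, so the data do not provide convincing evidence that the jurors are not a random sample.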

27
