© Copyright McGraw-Hill 2004
Objectives
Test a distribution for goodness of fit using chi-square.
Test two variables for independence using chi-square.
Test proportions for homogeneity using chi-square.

Introduction
The chi-square distribution can be used for tests concerning frequency distributions, such as: “If a sample of buyers is given a choice of automobile colors, will each color be selected with the same frequency?”
The chi-square distribution can also be used to test the independence of two variables. For example, “Are senators’ opinions on gun control independent of party affiliations?”
Introduction (cont’d.)
The chi-square distribution can be used to test the homogeneity of proportions. For example: “Is the proportion of high school seniors who attend college immediately after graduating the same for the northern, southern, eastern, and western parts of the United States?”

Test for Goodness-of-Fit
The chi-square statistic can be used to see whether a frequency distribution fits a specific pattern. This is referred to as the chi-square goodness-of-fit test.
The data are obtained from a random sample.
The expected frequency for each category must be 5 or more.

Is there enough evidence to reject the claim that there is no preference in the selection of fruit soda flavors, using the data shown previously? Let α = 0.05.
Solution
STEP 1 State the hypotheses and identify the claim.
H0: Consumers show no preference for flavors (claim).
H1: Consumers show a preference.
STEP 2 Find the critical value. The degrees of freedom are 5 − 1 = 4, and α = 0.05. Hence, the critical value from the table is 9.488.
STEP 3 Compute χ².
$\chi^2 = \sum \frac{(O - E)^2}{E} = 18.0$
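The critical value 9.488 in Step 2 can be checked without a table. The sketch below is only an illustration, not the textbook's method: it assumes an even number of degrees of freedom, for which the right-tail area of the chi-square distribution has a closed form, and it finds the critical value by bisection. The function names `chi2_right_tail` and `chi2_critical_value` are made up for this example.

```python
import math


def chi2_right_tail(x, df):
    """Right-tail area P(X > x) of a chi-square variable with EVEN df (assumption)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2.0
    # For even df: P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total


def chi2_critical_value(alpha, df, upper=200.0):
    """Find x whose right-tail area equals alpha, by bisection (hypothetical helper)."""
    low, high = 0.0, upper
    for _ in range(100):
        mid = (low + high) / 2.0
        if chi2_right_tail(mid, df) > alpha:
            low = mid   # tail area still larger than alpha: move right
        else:
            high = mid  # tail area at or below alpha: move left
    return (low + high) / 2.0


critical = chi2_critical_value(alpha=0.05, df=4)
test_value = 18.0                   # chi-square value from Step 3 of the example
reject_h0 = test_value > critical   # right-tailed decision rule
print(round(critical, 3), reject_h0)
```

Because the test is right-tailed, the comparison on the last lines is the whole decision rule: the null hypothesis is rejected when the test value exceeds the critical value.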
Not A Good Fit
When the observed values and the expected values are far apart, the chi-square test value will be large. Then, the null hypothesis will be rejected; hence, there is “not a good fit.”
[Figure: observed values and expected values plotted on x and y axes]

Chi-Square Goodness-of-Fit Procedure
Step 1 State the hypotheses and identify the claim.
Step 2 Find the critical value. The test is always right-tailed.
Step 3 Compute the test value. Find the sum of the $\frac{(O - E)^2}{E}$ values.
Step 4 Make the decision.
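Steps 3 and 4 of the procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are invented here, the critical value is supplied by the caller (for instance from a chi-square table), and the counts in the usage lines are hypothetical numbers made up for the example, not data from the slides.

```python
def chi_square_test_value(observed, expected):
    """Step 3: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def goodness_of_fit_decision(observed, expected, critical_value):
    """Step 4: right-tailed test, reject H0 when the test value exceeds the critical value."""
    value = chi_square_test_value(observed, expected)
    return value, value > critical_value


# Hypothetical counts (made up for illustration): 5 categories, 100 observations,
# and equal expected frequencies of 20 per category.
observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]

# 9.488 is the table value for 4 degrees of freedom at alpha = 0.05.
value, reject_h0 = goodness_of_fit_decision(observed, expected, critical_value=9.488)
print(value, reject_h0)
```

When observed and expected counts agree exactly, every term of the sum is zero, so the function returns 0; the value can never be negative because each term is a square divided by a positive expected count.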
When there is a perfect agreement between the observed and the expected values, χ² = 0, but χ² can never be negative.
The test is right-tailed because “H0: Good fit” and “H1: Not a good fit” mean that χ² will be small in the first case and χ² will be large in the second case.

Goodness-of-fit can also be used to test the observed data against known frequency distributions.
Poisson Distribution: Let X be the number of defects in printed circuit boards. A random sample of n = 60 printed circuit boards is taken and the number of defects recorded. The results were as follows:
No. of defects    0    1    2    3
Observed Freq.   32   15    9    4
Does the assumption of a Poisson distribution seem appropriate as a model for these data?
Solution:
The mean of the (assumed) Poisson distribution is unknown so must be estimated from the data by the sample mean:
More Examples (cont.)
$\bar{X} = \frac{32 \times 0 + 15 \times 1 + 9 \times 2 + 4 \times 3}{60} = 0.75$
Using the Poisson distribution with µ = 0.75 we can compute p_i, the hypothesised probabilities associated with each class. From these we can calculate the expected frequencies (under the null hypothesis):
$P(X = 0) = \frac{e^{-0.75}\, 0.75^{0}}{0!} = 0.472 \Rightarrow E(X = 0) = 0.472 \times 60 = 28.32$
$P(X = 1) = \frac{e^{-0.75}\, 0.75^{1}}{1!} = 0.354 \Rightarrow E(X = 1) = 0.354 \times 60 = 21.24$
$P(X = 2) = \frac{e^{-0.75}\, 0.75^{2}}{2!} = 0.133 \Rightarrow E(X = 2) = 0.133 \times 60 = 7.98$
$P(X \ge 3) = 1 - (P(X = 0) + P(X = 1) + P(X = 2)) = 0.041 \Rightarrow E(X \ge 3) = 0.041 \times 60 = 2.46$
The last class is treated as “3 or more” so that the class probabilities sum to 1.

More Examples (cont.)
Now do the chi-square goodness-of-fit.
Step 1: The null hypothesis is H0: X ∼ Poisson. The alternative hypothesis is H1: X does not follow a Poisson distribution.
Step 2: Find the critical value. Note that the degrees of freedom are 4 − 1 − 1 = 2, since we had to estimate one parameter (the mean, µ) from the data. Suppose α = 0.05; the critical value from the chi-square table is 5.991.
Step 3: Compute χ²:
$\chi^2 = \sum \frac{(O - E)^2}{E} = 3.4059$
Step 4: Make a decision: since 3.4059 < 5.991, do not reject the null hypothesis.
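The whole Poisson example can be reproduced with a short script. This is a sketch under stated assumptions rather than the slides' own computation: it uses unrounded probabilities, so its test value comes out close to, but not identical with, the hand-computed one; it treats the last class as "3 or more"; and it obtains the critical value from the fact that, for 2 degrees of freedom, the right-tail area of the chi-square distribution is exp(−x/2). The helper name `poisson_pmf` is chosen for this illustration.

```python
import math

defects = [0, 1, 2, 3]        # last class is treated as "3 or more"
observed = [32, 15, 9, 4]
n = sum(observed)             # 60 printed circuit boards

# Estimate the Poisson mean by the sample mean: 45 / 60 = 0.75.
mean = sum(d * o for d, o in zip(defects, observed)) / n


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


# Class probabilities under H0; the remaining probability goes to the last class.
probs = [poisson_pmf(k, mean) for k in defects[:-1]]
probs.append(1.0 - sum(probs))
expected = [p * n for p in probs]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# d.f. = 4 classes - 1 - 1 estimated parameter = 2.
# With 2 degrees of freedom, P(X > x) = exp(-x / 2), so the critical
# value at level alpha is -2 * ln(alpha).
alpha = 0.05
critical = -2.0 * math.log(alpha)
reject_h0 = chi_square > critical

print(mean, [round(e, 2) for e in expected], round(critical, 3), reject_h0)
```

The decision agrees with Step 4: the test value stays well below the critical value, so the Poisson assumption is not rejected.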
Chi-Square Independence Test

Contingency Table
The formula for the test value for the independence test is the same as the one for the goodness-of-fit test.
$\chi^2 = \sum \frac{(O - E)^2}{E}$
with d.f. = (R − 1)(C − 1)

Using the contingency table, one can compute the expected frequencies for each block (or cell) as shown next.
1. Find the sum of each row and each column, and find the grand total, as shown.
Group        Prefer new procedure   Prefer old procedure   No preference   Row sum
Nurses       100                    80                     20              200
Doctors      50                     120                    30              200
Column sum   150                    200                    50              400 (grand total)
Calculation of the Expected Frequencies (cont’d)
2. For each cell, multiply the corresponding row sum by the column sum and divide by the grand total, to get the expected value:
$\text{Expected value} = \frac{\text{row sum} \times \text{column sum}}{\text{grand total}}$
The rationale for the computation of the expected frequencies for a contingency table uses proportions. For C1,1 a total of 150 out of 400 people prefer the new procedure. And since there are 200 nurses, one would expect, if the null hypothesis were true, (150/400)(200), or 75, of the nurses to be in favor of the new procedure. For example, for C1,2, the expected value, denoted by E1,2, is
$E_{1,2} = \frac{200 \times 200}{400} = 100$

Calculation of the Expected Frequencies (cont’d)
For each cell, the expected values can be computed and placed in the table:
Group        Prefer new procedure   Prefer old procedure   No preference   Row sum
Nurses       100 (75)               80 (100)               20 (25)         200
Doctors      50 (75)                120 (100)              30 (25)         200
Column sum   150                    200                    50              400 (grand total)
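The row-sum times column-sum rule can be sketched as a small function. This is an illustrative sketch, with the function name `expected_frequencies` chosen here; the observed counts are the nurses and doctors figures from the contingency table.

```python
def expected_frequencies(table):
    """Expected count for each cell: row sum * column sum / grand total."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)
    return [[r * c / grand_total for c in col_sums] for r in row_sums]


# Columns: prefer new procedure, prefer old procedure, no preference.
observed = [
    [100, 80, 20],   # nurses
    [50, 120, 30],   # doctors
]

expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

Because both groups have the same row sum of 200, the two rows of expected values are identical, which matches the values in parentheses in the table.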
Homogeneity of Proportions Test (cont’d.)

Independence and Homogeneity
Step 3 Compute the test value. To compute the test value, first find the expected values. For each cell of the contingency table, use the formula
$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$
Step 4 Make the decision.
Step 5 Summarise the results.
         School 1     School 2     School 3     Total
Yes      18           22           16           56
No       32           28           34           94
Total    50           50           50           150

STEP 3 Compute the test value. First, compute the expected values, and the complete table is shown.
         School 1     School 2     School 3     Total
Yes      18 (18.67)   22 (18.67)   16 (18.67)   56
No       32 (31.33)   28 (31.33)   34 (31.33)   94
Total    50           50           50           150
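Starting from the completed table, the test value and the degrees of freedom follow mechanically. The sketch below is one possible way to carry out Step 3 for the three-school counts; the variable names are chosen for this illustration, and the result it prints is computed by the script rather than quoted from the slides.

```python
# Observed counts for the three schools (columns) by answer (rows).
observed = [
    [18, 22, 16],   # yes
    [32, 28, 34],   # no
]

row_sums = [sum(row) for row in observed]          # 56 and 94
col_sums = [sum(col) for col in zip(*observed)]    # 50, 50, 50
grand_total = sum(row_sums)                        # 150

# Expected value for each cell: (row sum)(column sum) / grand total.
expected = [[r * c / grand_total for c in col_sums] for r in row_sums]

# Test value: sum of (O - E)^2 / E over every cell of the table.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# d.f. = (R - 1)(C - 1) for a table with R rows and C columns.
df = (len(observed) - 1) * (len(observed[0]) - 1)

print([[round(e, 2) for e in row] for row in expected], round(chi_square, 3), df)
```

The decision in Step 4 then compares the test value with the table value for the computed degrees of freedom at the chosen α.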
Summary
There are three main uses of the chi-square distribution:
1. The test of independence is used to determine whether two variables are related or are independent.
2. It can be used as a goodness-of-fit test, in order to determine whether the frequencies of a distribution are the same as the hypothesized frequencies.

Summary (cont’d.)
3. The homogeneity of proportions test is used to determine if several proportions are all equal when samples are selected from different populations.
Conclusions