CHI-SQUARE TESTS
THE CHI-SQUARE
DISTRIBUTION
Definition
The chi-square distribution has only one
parameter, called the degrees of freedom (df). The
shape of a chi-square distribution curve is
skewed to the right for small df and becomes
approximately symmetric for large df. The entire chi-square
distribution curve lies to the right of the vertical
axis. The chi-square distribution assumes
nonnegative values only, and these are denoted
by the symbol χ² (read as “chi-square”).
Figure 11.1 Three chi-square distribution
curves.
Example 11-1
Find the value of χ² for 7 degrees of
freedom and an area of .10 in the
right tail of the chi-square
distribution curve.
Table 11.1 χ² for df = 7 and .10 Area in the Right Tail
Required value of χ² = 12.017
Figure 11.2 Chi-square distribution curve for df = 7, with an area of .10 in the right tail beyond χ² = 12.017.
Example 11-2
Find the value of χ² for 12 degrees of
freedom and an area of .05 in the left tail
of the chi-square distribution curve.
Solution 11-2
Area in the right tail
= 1 – Area in the left tail
= 1 – .05 = .95
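The table lookups in Examples 11-1 and 11-2 can be cross-checked numerically. The sketch below is not part of the original slides. It assumes only the Python standard library, evaluates the chi-square cumulative probability through the series for the regularized lower incomplete gamma function, and inverts it by bisection. The function names `chi2_cdf` and `chi2_right_tail_value` are hypothetical.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma
    function P(a, z) with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_right_tail_value(area, df):
    """Value of chi-square that leaves `area` in the right tail (bisection)."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > area:  # widen until the bracket holds the answer
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Example 11-1: df = 7, right-tail area .10
print(round(chi2_right_tail_value(0.10, 7), 3))
# Example 11-2: left-tail area .05 is the same as right-tail area .95, df = 12
print(round(chi2_right_tail_value(0.95, 12), 3))
```

The two printed values should agree with the table values 12.017 and 5.226 to about three decimal places.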
Table 11.2 χ² for df = 12 and .95 Area in the Right Tail
Required value of χ² = 5.226
Figure 11.3 Chi-square distribution curve for df = 12, with an area of .05 in the left tail below χ² = 5.226.
A GOODNESS-OF-FIT TEST
Definition
An experiment with the following
characteristics is called a
multinomial experiment.
Multinomial Experiment
cont.
1. It consists of n identical trials (repetitions).
2. Each trial results in one of k possible
outcomes (or categories), where k > 2.
3. The trials are independent.
4. The probabilities of the various outcomes
remain constant for each trial.
A GOODNESS-OF-FIT TEST
cont.
Definition
The frequencies obtained from the
performance of an experiment are called the
observed frequencies and are denoted by
O. The expected frequencies, denoted by E,
are the frequencies that we expect to obtain if
the null hypothesis is true. The expected
frequency for a category is obtained as
E = np
where n is the sample size and p is the
probability that an element belongs to that
category if the null hypothesis is true.
A GOODNESS-OF-FIT TEST
cont.
Degrees of Freedom for a Goodness-
of-Fit Test
In a goodness-of-fit test, the degrees
of freedom are
df = k – 1
where k is the number of possible
outcomes (or categories).
Solution 11-3
There are five categories
Five days on which the ATM is used
Multinomial experiment
We use the chi-square distribution to
make this test.
Solution 11-3
Area in the right tail = α = .01
k = number of categories = 5
df = k – 1 = 5 – 1 = 4
The critical value of χ² = 13.277
Figure 11.4 Chi-square distribution curve for df = 4, with the rejection region of area α = .01 in the right tail beyond the critical value χ² = 13.277.
Table 11.3
Category (Day) | Observed Frequency O | p | Expected Frequency E = np | (O – E) | (O – E)² | (O – E)² / E
Solution 11-3
All the required calculations to find
the value of the test statistic χ² are
shown in Table 11.3.
χ² = Σ (O – E)² / E = 23.184
Solution 11-3
The value of the test statistic χ² =
23.184 is larger than the critical value
of χ² = 13.277
It falls in the rejection region
Hence, we reject the null hypothesis
Example 11-4
In a National Public Transportation survey
conducted in 1995 on the modes of
transportation used to commute to work, 79.6%
of the respondents said that they drive alone,
11.1% car pool, 5.1% use public transit, and
4.2% depend on other modes of transportation
(USA TODAY, April 14, 1999). Assume that these
percentages hold true for the 1995 population of
all commuting workers. Recently 1000 randomly
selected workers were asked what mode of
transportation they use to commute to work. The
following table lists the results of this survey.
Solution 11-4
H0: The current percentage distribution
of the use of transportation modes
is the same as that for 1995.
H1: The current percentage distribution
of the use of transportation
modes is different from that for
1995.
Solution 11-4
There are four categories
Drive alone, carpool, public transit, and
other
Multinomial experiment
We use the chi-square distribution to
make the test.
Solution 11-4
Area in the right tail = α = .025
k = number of categories = 4
df = k – 1 = 4 – 1 = 3
The critical value of χ² = 9.348
Figure 11.5 Chi-square distribution curve for df = 3, with the rejection region of area α = .025 in the right tail beyond the critical value χ² = 9.348.
Table 11.4
Category | Observed Frequency O | p | Expected Frequency E = np | (O – E) | (O – E)² | (O – E)² / E
χ² = Σ (O – E)² / E = 5.782
Solution 11-4
The value of the test statistic χ² =
5.782 is less than the critical value of
χ² = 9.348
It falls in the nonrejection region
Hence, we fail to reject the null
hypothesis.
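The goodness-of-fit steps (E = np, then the sum of (O – E)² / E, then comparison with the critical value) can be sketched in a few lines of Python. The probabilities, sample size, and critical value below come from Example 11-4. The observed counts are hypothetical placeholders chosen only for illustration, so the statistic they produce is not the 5.782 of Solution 11-4. The function names are also assumptions.

```python
def expected_frequencies(n, probabilities):
    """E = n * p for every category."""
    return [n * p for p in probabilities]

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 1995 percentages from Example 11-4, in the order:
# drive alone, carpool, public transit, other.
p_1995 = [0.796, 0.111, 0.051, 0.042]
n = 1000
expected = expected_frequencies(n, p_1995)

# HYPOTHETICAL observed counts (must add up to n); illustration only.
observed = [800, 110, 50, 40]

statistic = chi_square_statistic(observed, expected)
df = len(p_1995) - 1      # df = k - 1
critical_value = 9.348    # from Solution 11-4 (df = 3, alpha = .025)
reject_h0 = statistic > critical_value

print([round(e, 1) for e in expected], df, round(statistic, 3), reject_h0)
```

Replacing `observed` with the actual survey counts would give the test statistic for the example itself.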
CONTINGENCY TABLES
A TEST OF INDEPENDENCE
OR HOMOGENEITY
A Test of Independence
A Test of Homogeneity
A Test of Independence
Definition
A test of independence involves a test of the
null hypothesis that two attributes of a
population are not related. The degrees of
freedom for a test of independence are
df = (R – 1)(C – 1)
where R and C are the number of rows and
the number of columns, respectively, in the
given contingency table.
A Test of Independence
cont.
Test Statistic for a Test of Independence
The value of the test statistic χ² for a test
of independence is calculated as
χ² = Σ (O – E)² / E
where O and E are the observed and expected
frequencies, respectively, for a cell.
Example 11-5
Violence and lack of discipline have
become major problems in schools in the
United States. A random sample of 300
adults was selected, and they were asked
if they favor giving more freedom to
schoolteachers to punish students for
violence and lack of discipline. The two-
way classification of the responses of
these adults is represented in the
following table.
Example 11-5
              In Favor (F)   Against (A)   No Opinion (N)
Men (M)       93             70            12
Women (W)     87             32            6
Calculate the expected frequencies for
this table assuming that the two
attributes, gender and opinions on the
issue, are independent.
Table 11.6
Solution 11-5
In Favor Against No Opinion Row
(F) (A) (N) Totals
Expected Frequencies for
a Test of Independence
The expected frequency E for a cell is
calculated as
E = (Row total)(Column total) / Sample size
Table 11.7
Solution 11-5
In Favor Against No Opinion Row
(F) (A) (N) Totals
Solution 11-6
α = .01
df = (R – 1)(C – 1) = (2 – 1)(3 – 1) = 2
The critical value of χ² = 9.210
Figure 11.6 Chi-square distribution curve for df = 2, with the rejection region of area α = .01 in the right tail beyond the critical value χ² = 9.210.
Table 11.8 (expected frequencies in parentheses)
                In Favor (F)   Against (A)   No Opinion (N)   Row Totals
Men (M)         93 (105.00)    70 (59.50)    12 (10.50)       175
Women (W)       87 (75.00)     32 (42.50)    6 (7.50)         125
Column Totals   180            102           18               300
Solution 11-6
χ² = Σ (O – E)² / E
= (93 – 105.00)² / 105.00 + (70 – 59.50)² / 59.50 + (12 – 10.50)² / 10.50
+ (87 – 75.00)² / 75.00 + (32 – 42.50)² / 42.50 + (6 – 7.50)² / 7.50
= 1.371 + 1.853 + .214 + 1.920 + 2.594 + .300 = 8.252
Solution 11-6
The value of the test statistic χ² =
8.252
It is less than the critical value of χ² = 9.210
It falls in the nonrejection region
Hence, we fail to reject the null
hypothesis
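The calculation for a test of independence can be sketched in Python as follows. The observed counts are those of the gender-by-opinion table (men: 93, 70, 12; women: 87, 32, 6). The function names `expected_table` and `chi_square_independence` are hypothetical, and the code is an illustration rather than part of the original solution.

```python
def expected_table(observed):
    """Expected frequency of each cell: (row total)(column total) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(observed):
    """Return (test statistic, degrees of freedom, expected frequencies)."""
    expected = expected_table(observed)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (R - 1)(C - 1)
    return statistic, df, expected

# Rows: men, women. Columns: in favor, against, no opinion.
observed = [[93, 70, 12], [87, 32, 6]]
statistic, df, expected = chi_square_independence(observed)

critical_value = 9.210  # df = 2, alpha = .01, from Solution 11-6
print(expected, df, round(statistic, 3), statistic > critical_value)
```

Because the code does not round the individual terms, its statistic can differ from the hand calculation in the third decimal place.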
Example 11-7
A researcher wanted to study the
relationship between gender and
owning cell phones. She took a
sample of 2000 adults and obtained
the information given in the following
table.
Solution 11-7
We are performing a test of
independence
We use the chi-square distribution
α = .05.
df = (R – 1)(C – 1) = (2 – 1)(2 – 1) = 1
The critical value of χ² = 3.841
Figure 11.7 Chi-square distribution curve for df = 1, with the rejection region of area α = .05 in the right tail beyond the critical value χ² = 3.841.
Table 11.9 (expected frequencies in parentheses)
              Own Cell Phones (Y)   Do Not Own Cell Phones (N)   Row Totals
Men (M)       640 (588.60)          450 (501.40)                 1090
Women (W)     440 (491.40)          470 (418.60)                 910
χ² = Σ (O – E)² / E
= (640 – 588.60)² / 588.60 + (450 – 501.40)² / 501.40
+ (440 – 491.40)² / 491.40 + (470 – 418.60)² / 418.60
= 4.489 + 5.269 + 5.376 + 6.311 = 21.445
Solution 11-7
The value of the test statistic χ² =
21.445
It is larger than the critical value of χ² = 3.841
It falls in the rejection region
Hence, we reject the null hypothesis
A Test of Homogeneity
Definition
A test of homogeneity involves
testing the null hypothesis that the
proportions of elements with certain
characteristics in two or more different
populations are the same against the
alternative hypothesis that these
proportions are not the same.
Example 11-8
Consider the data on income
distributions for households in
California and Wisconsin given in
the following table:
                California   Wisconsin   Row Totals
High Income     70           34          104
Medium Income   80           40          120
Low Income      100          76          176
Example 11-8
Using the 2.5% significance level, test
the null hypothesis that the
distribution of households with regard
to income levels is similar
(homogeneous) for the two states.
Solution 11-8
H0: The proportions of households that
belong to different income groups are the
same in both states
H1: The proportions of households that
belong to different income groups are
not the same in both states
Solution 11-8
α = .025
df = (R – 1)(C – 1) = (3 – 1)(2 – 1) = 2
The critical value of χ² = 7.378
Figure 11.8 Chi-square distribution curve for df = 2, with the rejection region of area α = .025 in the right tail beyond the critical value χ² = 7.378.
Table 11.11
California Wisconsin Row Totals
Solution 11-8
χ² = Σ (O – E)² / E
= (70 – 65)² / 65 + (34 – 39)² / 39 + (80 – 75)² / 75
+ (40 – 45)² / 45 + (100 – 110)² / 110 + (76 – 66)² / 66
= .385 + .641 + .333 + .556 + .909 + 1.515 = 4.339
Solution 11-8
The value of the test statistic χ² =
4.339
It is less than the critical value of χ² = 7.378
It falls in the nonrejection region
Hence, we fail to reject the null
hypothesis
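A test of homogeneity compares the proportions of each category across populations, so it can help to look at those proportions next to the test statistic. The sketch below uses the California and Wisconsin counts from Example 11-8. The variable names are assumptions made for illustration.

```python
# Observed households per income group: (California, Wisconsin)
observed = {
    "High income": (70, 34),
    "Medium income": (80, 40),
    "Low income": (100, 76),
}

# Column totals are the two sample sizes; n is the combined sample size.
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
n = sum(col_totals)

# Share of each income group within each state.
proportions = {
    group: tuple(counts[j] / col_totals[j] for j in range(2))
    for group, counts in observed.items()
}

# Chi-square statistic with E = (row total)(column total) / n.
statistic = 0.0
for counts in observed.values():
    row_total = sum(counts)
    for j, o in enumerate(counts):
        e = row_total * col_totals[j] / n
        statistic += (o - e) ** 2 / e

critical_value = 7.378  # df = 2, alpha = .025, from Solution 11-8
for group, (ca, wi) in proportions.items():
    print(f"{group}: California {ca:.3f}, Wisconsin {wi:.3f}")
print(round(statistic, 3), statistic > critical_value)
```

The sample proportions differ between the two states, but the statistic stays below the critical value, which is consistent with failing to reject H0.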
INFERENCES ABOUT THE
POPULATION VARIANCE
Estimation of the Population Variance
Hypothesis Tests About the
Population Variance
INFERENCES ABOUT THE
POPULATION VARIANCE
cont.
Sampling Distribution of (n – 1)s² / σ²
If the population from which the
sample is selected is (approximately)
normally distributed, then
(n – 1)s² / σ²
has a chi-square distribution with
n – 1 degrees of freedom.
Example 11-9
One type of cookie manufactured by
Haddad Food Company is Cocoa
Cookies. The machine that fills
packages of these cookies is set up in
such a way that the average net weight
of these packages is 32 ounces with a
variance of .015 square ounce.
Example 11-9
From time to time the quality control
inspector at the company selects a sample of
a few such packages, calculates the variance
of the net weights of these packages, and
constructs a 95% confidence interval for the
population variance. If either one or both of
the two limits of this confidence interval are
not in the interval .008 to .030, the machine is
stopped and adjusted.
Example 11-9
A recently taken random sample of 25
packages from the production line
gave a sample variance of .029 square
ounce. Based on this sample
information, do you think the machine
needs an adjustment? Assume that the
net weights of cookies in all packages
are normally distributed.
Solution 11-9
n = 25 and s² = .029
α = 1 – .95 = .05
α / 2 = .05 / 2 = .025
1 – α / 2 = 1 – .025 = .975
df = n – 1 = 25 – 1 = 24
χ² for 24 df and .025 area in the right tail =
39.364
χ² for 24 df and .975 area in the right tail =
12.401
Figure 11.9 Chi-square distribution curves for df = 24: an area of α / 2 = .025 lies in the right tail beyond χ² = 39.364, and an area of 1 – α / 2 = .975 lies to the right of χ² = 12.401.
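Solution 11-9
The 95% confidence interval for σ² is
(n – 1)s² / χ²_{α/2} to (n – 1)s² / χ²_{1 – α/2}
= (25 – 1)(.029) / 39.364 to (25 – 1)(.029) / 12.401
= .0177 to .0561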
Solution 11-9
Thus, with 95% confidence, we can
state that the variance for all
packages of Cocoa Cookies lies
between .0177 and .0561 square
ounce. The upper limit, .0561, is not in
the interval .008 to .030, so the machine
needs an adjustment.
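The interval arithmetic of Example 11-9 can be reproduced with a short Python sketch. The sample size, sample variance, the two χ² table values, and the acceptable range .008 to .030 are taken from the example. The function name `variance_confidence_interval` is hypothetical.

```python
def variance_confidence_interval(n, sample_variance, chi2_upper, chi2_lower):
    """Confidence interval for the population variance.

    chi2_upper: chi-square value with area alpha/2 in the right tail
    chi2_lower: chi-square value with area 1 - alpha/2 in the right tail
    """
    numerator = (n - 1) * sample_variance
    return numerator / chi2_upper, numerator / chi2_lower

# Values from Example 11-9: n = 25, s^2 = .029, df = 24, 95% confidence.
lower, upper = variance_confidence_interval(25, 0.029, 39.364, 12.401)

# The machine is adjusted if either limit falls outside .008 to .030.
acceptable_low, acceptable_high = 0.008, 0.030
needs_adjustment = not (acceptable_low <= lower and upper <= acceptable_high)

print(round(lower, 4), round(upper, 4), needs_adjustment)
```

Note that the larger χ² value produces the lower limit of the interval, because it appears in the denominator.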
Hypothesis Tests About
the Population Variance
The value of the test statistic χ² is calculated
as
χ² = (n – 1)s² / σ²
where s² is the sample variance and σ² is the
value of the population variance given in H0.
Example 11-10
She always uses α = .01. The
acceptable value of the population
variance is .015 square ounce or
less. If the conclusion from the test
of hypothesis is that the population
variance is not within the acceptable
limit, the machine is stopped and
adjusted.
Example 11-10
A recently taken random sample of
25 packages from the production line
gave a sample variance of .029
square ounce. Based on this sample
information, do you think the
machine needs an adjustment?
Assume that the net weights of
cookies in all packages are normally
distributed.
Solution 11-10
H0: σ² ≤ .015
The population variance is within the
acceptable limit
H1: σ² > .015
The population variance exceeds the
acceptable limit
Solution 11-10
α = .01
df = n – 1 = 25 – 1 = 24
The critical value of χ² = 42.980
Figure 11.10 Chi-square distribution curve for df = 24, with the rejection region of area α = .01 in the right tail beyond the critical value χ² = 42.980.
Solution 11-10
χ² = (n – 1)s² / σ² = (25 – 1)(.029) / .015 = 46.400
where σ² = .015 is the value from H0
Solution 11-10
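The value of the test statistic χ² = 46.400
It is greater than the critical value of χ² = 42.980
It falls in the rejection region
Hence, we reject H0 and conclude that the
machine needs an adjustment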
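The right-tailed test of Example 11-10 reduces to one formula and one comparison. The sketch below uses the numbers given in the example. The function name `variance_test_statistic` is an assumption made for illustration.

```python
def variance_test_statistic(n, sample_variance, sigma2_null):
    """Chi-square statistic (n - 1) s^2 / sigma^2, with sigma^2 taken from H0."""
    return (n - 1) * sample_variance / sigma2_null

# Example 11-10: n = 25, s^2 = .029, H0: sigma^2 <= .015, alpha = .01.
statistic = variance_test_statistic(25, 0.029, 0.015)
critical_value = 42.980  # df = 24, area .01 in the right tail

# Right-tailed test: reject H0 when the statistic exceeds the critical value.
reject_h0 = statistic > critical_value
print(round(statistic, 3), reject_h0)
```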
Example 11-11
The variance of scores on a standardized
mathematics test for all high school seniors was
150 in 2002. A sample of scores for 20 high school
seniors who took this test this year gave a variance
of 170. Test at the 5% significance level if the
variance of current scores of all high school seniors
on this test is different from 150. Assume that the
scores of all high school seniors on this test are
(approximately) normally distributed.
Solution 11-11
H0: σ² = 150
The population variance is not different
from 150
H1: σ² ≠ 150
The population variance is different from
150
Solution 11-11
α = .05
Area in each tail = α / 2 = .025
df = n – 1 = 20 – 1 = 19
The critical values of χ² are 32.852 and
8.907
Figure 11.11 Chi-square distribution curve for df = 19, with rejection regions of area α / 2 = .025 in each tail: below χ² = 8.907 and above χ² = 32.852.
Solution 11-11
χ² = (n – 1)s² / σ² = (20 – 1)(170) / 150 = 21.533
where σ² = 150 is the value from H0
Solution 11-11
The value of the test statistic χ² =
21.533
It is between the two critical values of χ²
It falls in the nonrejection region
Consequently, we fail to reject H0.
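For a two-tailed test such as Example 11-11, H0 is rejected when the statistic falls below the lower critical value or above the upper one. A minimal Python sketch follows, using the numbers from the example. The function name `two_tailed_variance_test` is hypothetical.

```python
def two_tailed_variance_test(n, sample_variance, sigma2_null,
                             lower_critical, upper_critical):
    """Return the chi-square statistic and whether H0 is rejected."""
    statistic = (n - 1) * sample_variance / sigma2_null
    reject = statistic < lower_critical or statistic > upper_critical
    return statistic, reject

# Example 11-11: n = 20, s^2 = 170, H0: sigma^2 = 150, alpha = .05.
# Critical values for df = 19 with .025 in each tail: 8.907 and 32.852.
statistic, reject_h0 = two_tailed_variance_test(20, 170, 150, 8.907, 32.852)
print(round(statistic, 3), reject_h0)
```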