
CHAPTER 11: CHI-SQUARE TESTS

THE CHI-SQUARE DISTRIBUTION
 Definition
 The chi-square distribution has only one
parameter, called the degrees of freedom (df). The
shape of a chi-square distribution curve is
skewed to the right for small df and becomes
approximately symmetric for large df. The entire
chi-square distribution curve lies to the right of
the vertical axis. The chi-square distribution
assumes nonnegative values only, and these are
denoted by the symbol χ² (read as “chi-square”).

Figure 11.1 Three chi-square distribution
curves.

Example 11-1
 Find the value of χ² for 7 degrees of
freedom and an area of .10 in the
right tail of the chi-square
distribution curve.

Table 11.1 χ² for df = 7 and .10 Area in the Right Tail

        Area in the Right Tail Under the Chi-Square Distribution Curve
df      .995      …    .100       …    .005
1       .000      …    2.706      …    7.879
2       .010      …    4.605      …    10.597
.       …         …    …          …    …
7       .989      …    12.017     …    20.278
.       …         …    …          …    …
100     67.328    …    118.498    …    140.169

Required value of χ²: 12.017 (row df = 7, column .100)
Figure 11.2 Chi-square distribution curve for df = 7, with an area of .10 in the right tail to the right of χ² = 12.017.
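The table lookup above can be cross-checked numerically. The following is a minimal sketch, not part of the original slides: the function name `chi2_right_tail` is hypothetical, and it relies on the standard closed-form expressions for the right-tail area of a chi-square distribution with integer df (a finite sum for even df, and the complementary error function plus a finite sum for odd df).

```python
import math


def chi2_right_tail(x, df):
    """Area to the right of x under a chi-square curve with integer df >= 1.

    Hypothetical helper for checking table values; not a library function.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2 * phi(sqrt(x)) * sum of x^(r - 1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)  # standard normal pdf at sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + 2.0 * density * total


# Example 11-1: the area to the right of 12.017 for df = 7 should be close to .10
print(round(chi2_right_tail(12.017, 7), 3))
```

Because the function returns a right-tail area, a left-tail area such as the one in Example 11-2 can be obtained as 1 minus the returned value.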
Example 11-2
 Find the value of χ² for 12 degrees of
freedom and an area of .05 in the left tail
of the chi-square distribution curve.

Solution 11-2
 Area in the right tail
= 1 – Area in the left tail
= 1 – .05 = .95

Table 11.2 χ² for df = 12 and .95 Area in the Right Tail

        Area in the Right Tail Under the Chi-Square Distribution Curve
df      .995      …    .950      …    .005
1       .000      …    .004      …    7.879
2       .010      …    .103      …    10.597
.       …         …    …         …    …
12      3.074     …    5.226     …    28.300
.       …         …    …         …    …
100     67.328    …    77.929    …    140.169

Required value of χ²: 5.226 (row df = 12, column .950)
Figure 11.3 Chi-square distribution curve for df = 12: the area to the left of χ² = 5.226 is .05, and the shaded area to its right is .95.
A GOODNESS-OF-FIT TEST
 Definition
 An experiment with the following
characteristics is called a
multinomial experiment.

Multinomial Experiment
cont.
1. It consists of n identical trials (repetitions).
2. Each trial results in one of k possible
outcomes (or categories), where k > 2.
3. The trials are independent.
4. The probabilities of the various outcomes
remain constant for each trial.

A GOODNESS-OF-FIT TEST
cont.
 Definition
 The frequencies obtained from the
performance of an experiment are called the
observed frequencies and are denoted by
O. The expected frequencies, denoted by E,
are the frequencies that we expect to obtain if
the null hypothesis is true. The expected
frequency for a category is obtained as
 E = np
 where n is the sample size and p is the
probability that an element belongs to that
category if the null hypothesis is true.
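As a small illustration of E = np, the sketch below computes the expected frequency of every category from a sample size and the category probabilities stated in the null hypothesis. The function name `expected_frequencies` and the sample values are illustrative assumptions, not part of the original slides.

```python
def expected_frequencies(n, probabilities):
    """Return E = n * p for each category under the null hypothesis."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    return [n * p for p in probabilities]


# Illustrative values: 1200 trials and five equally likely categories
print(expected_frequencies(1200, [0.20] * 5))
```

The check that the probabilities sum to 1 is a design choice of this sketch: it catches a mis-specified null hypothesis before any test statistic is computed.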
A GOODNESS-OF-FIT TEST
cont.
 Degrees of Freedom for a Goodness-of-Fit Test
 In a goodness-of-fit test, the degrees
of freedom are
 df = k – 1
 where k denotes the number of
possible outcomes (or categories) for
the experiment.
Test Statistic for a
Goodness-of-Fit Test
 The test statistic for a goodness-of-fit
test is χ², and its value is calculated as
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
 where
 O = observed frequency for a category
 E = expected frequency for a category = np
 Remember that a chi-square goodness-of-fit
test is always right-tailed.
Example 11-3
 A bank has an ATM installed inside the bank, and
it is available to its customers only from 7 AM to 6
PM Monday through Friday. The manager of the
bank wanted to investigate if the percentage of
transactions made on this ATM is the same for
each of the five days (Monday through Friday) of
the week. She randomly selected one week and
counted the number of transactions made on this
ATM on each of the five days during this week.
The information she obtained is given in the
following table, where the number of users
represents the number of transactions on this ATM
on these days. For convenience, we will refer to
these transactions as “people” or “users.”
Example 11-3

Day               Monday   Tuesday   Wednesday   Thursday   Friday
Number of users   253      197       204         279        267

 At the 1% level of significance, can we
reject the null hypothesis that the
proportion of people who use this ATM
each of the five days of the week is the
same? Assume that this week is typical of
all weeks in regard to the use of this ATM.
Solution 11-3
 H0 : p1 = p2 = p3 = p4 = p5 = .20
 H1 : At least two of the five proportions
are not equal to .20

Solution 11-3
 There are five categories
 Five days on which the ATM is used
 Multinomial experiment
 We use the chi-square distribution to
make this test.

Solution 11-3
 Area in the right tail = α = .01
 k = number of categories = 5
 df = k – 1 = 5 – 1 = 4
 The critical value of χ² = 13.277

Figure 11.4 Rejection and nonrejection regions: do not reject H0 to the left of the critical value χ² = 13.277; reject H0 to the right of it (α = .01).
Table 11.3
Category (Day)   Observed Frequency O   p     Expected Frequency E = np   (O – E)   (O – E)²   (O – E)²/E
Monday           253                    .20   1200(.20) = 240              13        169        .704
Tuesday          197                    .20   1200(.20) = 240             -43       1849       7.704
Wednesday        204                    .20   1200(.20) = 240             -36       1296       5.400
Thursday         279                    .20   1200(.20) = 240              39       1521       6.338
Friday           267                    .20   1200(.20) = 240              27        729       3.038
                                                                                     Sum = 23.184

Solution 11-3
 All the required calculations to find
the value of the test statistic χ² are
shown in Table 11.3.
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 23.184$$

Solution 11-3
 The value of the test statistic χ² =
23.184 is larger than the critical value
of χ² = 13.277
 It falls in the rejection region
 Hence, we reject the null hypothesis
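The calculation in Table 11.3 can be reproduced with a few lines of code. This is a minimal sketch using the observed frequencies from Table 11.3 and the critical value 13.277 taken from the chi-square table; the function name `goodness_of_fit_statistic` is a hypothetical choice, not something from the original slides. The unrounded statistic is about 23.183, which differs from 23.184 in the last digit because the table rounds each term before adding.

```python
def goodness_of_fit_statistic(observed, probabilities):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E with E = n * p."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [253, 197, 204, 279, 267]  # Monday through Friday, from Table 11.3
statistic = goodness_of_fit_statistic(observed, [0.20] * 5)

critical_value = 13.277               # table value for df = 4 and alpha = .01
reject_h0 = statistic > critical_value  # the test is always right-tailed

print(f"chi-square = {statistic:.3f}, reject H0: {reject_h0}")
```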

Example 11-4
 In a National Public Transportation survey
conducted in 1995 on the modes of
transportation used to commute to work, 79.6%
of the respondents said that they drive alone,
11.1% car pool, 5.1% use public transit, and
4.2% depend on other modes of transportation
(USA TODAY, April 14, 1999). Assume that these
percentages hold true for the 1995 population of
all commuting workers. Recently 1000 randomly
selected workers were asked what mode of
transportation they use to commute to work. The
following table lists the results of this survey.

Example 11-4

Mode of transportation   Drive alone   Carpool   Public transit   Other
Number of workers        812           102       57               29

 Test at the 2.5% significance level
whether the current pattern of use of
transportation modes is different from
that for 1995.

Solution 11-4
 H0: The current percentage distribution
of the use of transportation modes
is the same as that for 1995.
 H1: The current percentage distribution
of the use of transportation
modes is different from that for
1995.

Solution 11-4
 There are four categories
 Drive alone, carpool, public transit, and
other
 Multinomial experiment
 We use the chi-square distribution to
make the test.

Solution 11-4
 Area in the right tail = α = .025
 k = number of categories = 4
 df = k – 1 = 4 – 1 = 3
 The critical value of χ² = 9.348

Figure 11.5 Rejection and nonrejection regions: do not reject H0 to the left of the critical value χ² = 9.348; reject H0 to the right of it (α = .025).
Table 11.4
Category         Observed Frequency O   p      Expected Frequency E = np   (O – E)   (O – E)²   (O – E)²/E
Drive alone      812                    .796   1000(.796) = 796             16        256        .322
Car pool         102                    .111   1000(.111) = 111             -9         81        .730
Public transit    57                    .051   1000(.051) = 51               6         36        .706
Other             29                    .042   1000(.042) = 42             -13        169       4.024
                 n = 1000                                                                        Sum = 5.782
Solution 11-4
 All the required calculations to find
the value of the test statistic χ² are
shown in Table 11.4.
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 5.782$$

Solution 11-4
 The value of the test statistic χ² =
5.782 is less than the critical value of
χ² = 9.348
 It falls in the nonrejection region
 Hence, we fail to reject the null
hypothesis.

CONTINGENCY TABLES

Table 11.5 Total 2002 Enrollment at a University
         Full-Time   Part-Time
Male     6768        2615
Female   7658        3717
(The cell 2615 gives the number of students who are male and enrolled part-time.)

A TEST OF INDEPENDENCE
OR HOMOGENEITY
 A Test of Independence
 A Test of Homogeneity

A Test of Independence
 Definition
 A test of independence involves a test of the
null hypothesis that two attributes of a
population are not related. The degrees of
freedom for a test of independence are
 df = (R – 1)(C – 1)
 where R and C are the number of rows and
the number of columns, respectively, in the
given contingency table.

A Test of Independence
cont.
 Test Statistic for a Test of Independence
 The value of the test statistic χ² for a test
of independence is calculated as
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
 where O and E are the observed and expected
frequencies, respectively, for a cell.

Example 11-5
 Violence and lack of discipline have
become major problems in schools in the
United States. A random sample of 300
adults was selected, and they were asked
if they favor giving more freedom to
schoolteachers to punish students for
violence and lack of discipline. The two-
way classification of the responses of
these adults is represented in the
following table.
Example 11-5
            In Favor (F)   Against (A)   No Opinion (N)
Men (M)     93             70            12
Women (W)   87             32            6

 Calculate the expected frequencies for
this table assuming that the two
attributes, gender and opinions on the
issue, are independent.

Solution 11-5
Table 11.6
                In Favor (F)   Against (A)   No Opinion (N)   Row Totals
Men (M)         93             70            12               175
Women (W)       87             32            6                125
Column Totals   180            102           18               300

Expected Frequencies for
a Test of Independence
 The expected frequency E for a cell is
calculated as
$$E = \frac{(\text{Row total})(\text{Column total})}{\text{Sample size}}$$
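The formula above can be applied to every cell of a contingency table at once. The following is a minimal sketch using the observed frequencies from Example 11-5; the function name `expected_table` is hypothetical and introduced only for illustration.

```python
def expected_table(observed):
    """Expected cell frequencies under independence:
    E = (row total)(column total) / sample size."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]


observed = [
    [93, 70, 12],  # Men:   in favor, against, no opinion
    [87, 32, 6],   # Women: in favor, against, no opinion
]
for row in expected_table(observed):
    print(row)
```

Multiplying the totals before dividing by the sample size keeps the arithmetic on integers for as long as possible, which avoids small rounding differences in the expected values.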

Solution 11-5
Table 11.7 (expected frequencies are shown in parentheses below the observed frequencies)
            In Favor (F)   Against (A)   No Opinion (N)   Row Totals
Men (M)     93             70            12               175
            (105.00)       (59.50)       (10.50)
Women (W)   87             32            6                125
            (75.00)        (42.50)       (7.50)
Example 11-6
 Reconsider the two-way classification table
given in Example 11-5. In that example, a
random sample of 300 adults was selected,
and they were asked if they favor giving
more freedom to schoolteachers to punish
students for violence and lack of discipline.
Based on the results of the survey, a two-
way classification table was prepared and
presented in Example 11-5. Does the
sample provide sufficient information to
conclude that the two attributes, gender
and opinions of adults, are dependent? Use
a 1% significance level.
Solution 11-6
 H0: Gender and opinions of adults are
independent
 H1: Gender and opinions of adults are
dependent

Solution 11-6
 α = .01
 df = (R – 1)(C – 1) = (2 – 1)(3 – 1) = 2
 The critical value of χ² = 9.210

Figure 11.6 Rejection and nonrejection regions: do not reject H0 to the left of the critical value χ² = 9.210; reject H0 to the right of it (α = .01).
Table 11.8
                In Favor (F)   Against (A)   No Opinion (N)   Row Totals
Men (M)         93             70            12               175
                (105.00)       (59.50)       (10.50)
Women (W)       87             32            6                125
                (75.00)        (42.50)       (7.50)
Column Totals   180            102           18               300
Solution 11-6

$$\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(93 - 105.00)^2}{105.00} + \frac{(70 - 59.50)^2}{59.50} + \frac{(12 - 10.50)^2}{10.50} \\
&\quad + \frac{(87 - 75.00)^2}{75.00} + \frac{(32 - 42.50)^2}{42.50} + \frac{(6 - 7.50)^2}{7.50} \\
&= 1.371 + 1.853 + .214 + 1.920 + 2.594 + .300 = 8.252
\end{aligned}$$

Solution 11-6
 The value of the test statistic χ² =
8.252
 It is less than the critical value of χ² = 9.210
 It falls in the nonrejection region
 Hence, we fail to reject the null
hypothesis
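The whole test of independence can be sketched in code as follows. The observed frequencies are those of Example 11-6 and the critical value 9.210 is the table value for df = 2 and α = .01; the function name `chi_square_independence` is a hypothetical choice. The unrounded statistic is about 8.253, slightly different from 8.252 because the slide rounds each term before adding.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / n  # expected frequency for the cell
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (R - 1)(C - 1)
    return statistic, df


observed = [
    [93, 70, 12],  # Men
    [87, 32, 6],   # Women
]
statistic, df = chi_square_independence(observed)
critical_value = 9.210  # table value for df = 2 and alpha = .01
reject_h0 = statistic > critical_value

print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject_h0}")
```

The same computation applies to a test of homogeneity, since both tests use the same expected frequencies, test statistic, and degrees of freedom.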

Example 11-7
 A researcher wanted to study the
relationship between gender and
owning cell phones. She took a
sample of 2000 adults and obtained
the information given in the following
table.

Example 11-7

        Own Cell Phones   Do Not Own Cell Phones
Men     640               450
Women   440               470

 At the 5% level of significance, can
you conclude that gender and owning
cell phones are related for all adults?
Solution 11-7
 H0: Gender and owning a cell phone
are not related
 H1: Gender and owning a cell phone
are related

Solution 11-7
 We are performing a test of
independence
 We use the chi-square distribution
 α = .05
 df = (R – 1)(C – 1) = (2 – 1)(2 – 1) = 1
 The critical value of χ² = 3.841

Figure 11.7 Rejection and nonrejection regions: do not reject H0 to the left of the critical value χ² = 3.841; reject H0 to the right of it (α = .05).
Table 11.9
                Own Cell Phones (Y)   Do Not Own Cell Phones (N)   Row Totals
Men (M)         640                   450                          1090
                (588.60)              (501.40)
Women (W)       440                   470                          910
                (491.40)              (418.60)
Column Totals   1080                  920                          2000
Solution 11-7

$$\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(640 - 588.60)^2}{588.60} + \frac{(450 - 501.40)^2}{501.40} + \frac{(440 - 491.40)^2}{491.40} + \frac{(470 - 418.60)^2}{418.60} \\
&= 4.489 + 5.269 + 5.376 + 6.311 = 21.445
\end{aligned}$$

Solution 11-7
 The value of the test statistic χ² =
21.445
 It is larger than the critical value of χ² = 3.841
 It falls in the rejection region
 Hence, we reject the null hypothesis

A Test of Homogeneity
 Definition
 A test of homogeneity involves
testing the null hypothesis that the
proportions of elements with certain
characteristics in two or more different
populations are the same against the
alternative hypothesis that these
proportions are not the same.
Example 11-8
 Consider the data on income
distributions for households in
California and Wisconsin given in
the following table:
                California   Wisconsin   Row Totals
High Income     70           34          104
Medium Income   80           40          120
Low Income      100          76          176
Example 11-8
 Using the 2.5% significance level, test
the null hypothesis that the
distribution of households with regard
to income levels is similar
(homogeneous) for the two states.

Solution 11-8
 H0: The proportions of households that
belong to different income groups are the
same in both states
 H1: The proportions of households that
belong to different income groups are
not the same in both states

Solution 11-8
 α = .025
 df = (R – 1)(C – 1) = (3 – 1)(2 – 1) = 2
 The critical value of χ² = 7.378

Figure 11.8 Rejection and nonrejection regions: do not reject H0 to the left of the critical value χ² = 7.378; reject H0 to the right of it (α = .025).
Table 11.11
                California   Wisconsin   Row Totals
High income     70           34          104
                (65)         (39)
Medium income   80           40          120
                (75)         (45)
Low income      100          76          176
                (110)        (66)
Column Totals   250          150         400

Solution 11-8
$$\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(70 - 65)^2}{65} + \frac{(34 - 39)^2}{39} + \frac{(80 - 75)^2}{75} + \frac{(40 - 45)^2}{45} + \frac{(100 - 110)^2}{110} + \frac{(76 - 66)^2}{66} \\
&= .385 + .641 + .333 + .556 + .909 + 1.515 = 4.339
\end{aligned}$$

Solution 11-8
 The value of the test statistic χ² =
4.339
 It is less than the critical value of χ² = 7.378
 It falls in the nonrejection region
 Hence, we fail to reject the null
hypothesis

INFERENCES ABOUT THE
POPULATION VARIANCE
 Estimation of the Population Variance
 Hypothesis Tests About the
Population Variance

INFERENCES ABOUT THE
POPULATION VARIANCE
cont.
 Sampling Distribution of (n – 1)s² / σ²
 If the population from which the
sample is selected is (approximately)
normally distributed, then
$$\frac{(n - 1)s^2}{\sigma^2}$$
 has a chi-square distribution with n – 1
degrees of freedom.
Estimation of the
Population Variance
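 Assuming that the population from
which the sample is selected is
(approximately) normally distributed,
the (1 – α)100% confidence interval
for the population variance σ² is
$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} \quad \text{to} \quad \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}$$
 where χ²(α/2) and χ²(1 – α/2) are the chi-square
values for n – 1 degrees of freedom with areas
of α/2 and 1 – α/2, respectively, in the right tail.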

Example 11-9
 One type of cookie manufactured by
Haddad Food Company is Cocoa
Cookies. The machine that fills
packages of these cookies is set up in
such a way that the average net weight
of these packages is 32 ounces with a
variance of .015 square ounce.

Example 11-9
 From time to time the quality control
inspector at the company selects a sample of
a few such packages, calculates the variance
of the net weights of these packages, and
constructs a 95% confidence interval for the
population variance. If either or both of
the two limits of this confidence interval are
not in the interval .008 to .030, the machine is
stopped and adjusted.

Example 11-9
 A recently taken random sample of 25
packages from the production line
gave a sample variance of .029 square
ounce. Based on this sample
information, do you think the machine
needs an adjustment? Assume that the
net weights of cookies in all packages
are normally distributed.

Solution 11-9
 n = 25 and s² = .029
 α = 1 – .95 = .05
 α / 2 = .05 / 2 = .025
 1 – α / 2 = 1 – .025 = .975
 df = n – 1 = 25 – 1 = 24
 χ² for 24 df and .025 area in the right tail =
39.364
 χ² for 24 df and .975 area in the right tail =
12.401
Figure 11.9 Chi-square distribution curves for df = 24: the value of χ² for α/2 = .025 area in the right tail is 39.364, and the value of χ² for 1 – α/2 = .975 area in the right tail is 12.401.
Solution 11-9

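$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} \quad \text{to} \quad \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}$$
$$\frac{(25 - 1)(.029)}{39.364} \quad \text{to} \quad \frac{(25 - 1)(.029)}{12.401}$$
$$.0177 \quad \text{to} \quad .0561$$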

Solution 11-9
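 Thus, with 95% confidence, we can
state that the variance for all
packages of Cocoa Cookies lies
between .0177 and .0561 square
ounce.
 The lower limit (.0177) is inside the
interval .008 to .030, but the upper limit
(.0561) is outside it, so the machine
needs to be stopped and adjusted.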
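The interval in Solution 11-9 can be reproduced as a short sketch. The two chi-square values are taken from the table (39.364 and 12.401 for df = 24), and the acceptable range .008 to .030 comes from Example 11-9; the function name `variance_confidence_interval` and its parameter names are hypothetical.

```python
def variance_confidence_interval(n, sample_variance, chi2_upper, chi2_lower):
    """Confidence interval for the population variance.

    chi2_upper: chi-square value with alpha/2 area in the right tail
    chi2_lower: chi-square value with 1 - alpha/2 area in the right tail
    """
    numerator = (n - 1) * sample_variance
    return numerator / chi2_upper, numerator / chi2_lower


lower, upper = variance_confidence_interval(25, 0.029, 39.364, 12.401)

# The machine is left alone only if both limits are inside .008 to .030
both_limits_acceptable = 0.008 <= lower and upper <= 0.030

print(f"{lower:.4f} to {upper:.4f}, adjust machine: {not both_limits_acceptable}")
```

Note that the larger chi-square value produces the lower limit of the interval, because it appears in the denominator.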

Hypothesis Tests About
the Population Variance
 The value of the test statistic χ² is calculated
as
$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$
 where s² is the sample variance, σ² is the
hypothesized value of the population variance,
and n – 1 represents the degrees of freedom.
The population from which the sample is
selected is assumed to be (approximately)
normally distributed.
Example 11-10
 One type of cookie manufactured by Haddad Food
Company is Cocoa Cookies. The machine that fills
packages of these cookies is set up in such a way
that the average net weight of these packages is
32 ounces with a variance of .015 square ounce.
From time to time the quality control inspector at
the company selects a sample of a few such
packages, calculates the variance of the net
weights of these packages, and makes a test of
hypothesis about the population variance.

Example 11-10
 She always uses α = .01. The
acceptable value of the population
variance is .015 square ounce or
less. If the conclusion from the test
of hypothesis is that the population
variance is not within the acceptable
limit, the machine is stopped and
adjusted.
Example 11-10
 A recently taken random sample of
25 packages from the production line
gave a sample variance of .029
square ounce. Based on this sample
information, do you think the
machine needs an adjustment?
Assume that the net weights of
cookies in all packages are normally
distributed.

Solution 11-10
 H0: σ² ≤ .015
 The population variance is within the
acceptable limit
 H1: σ² > .015
 The population variance exceeds the
acceptable limit

Solution 11-10
 α = .01
 df = n – 1 = 25 – 1 = 24
 The critical value of χ² = 42.980

Figure 11.10 Rejection and nonrejection regions: do not reject H0 to the left of the critical value χ² = 42.980; reject H0 to the right of it (α = .01).
Solution 11-10

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(25 - 1)(.029)}{.015} = 46.400$$
The value σ² = .015 in the denominator comes from H0.

Solution 11-10
 The value of the test statistic χ² = 46.400
 It is greater than the critical value of χ² = 42.980
 It falls in the rejection region
 Hence, we reject the null hypothesis H0
 We conclude that the population variance is not within
the acceptable limit
 The machine should be stopped and adjusted
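The right-tailed test in Solution 11-10 can be sketched as below. The sample values come from Example 11-10 and the critical value 42.980 is the table value for df = 24 and α = .01; the function name `variance_test_statistic` is a hypothetical choice made for this illustration.

```python
def variance_test_statistic(n, sample_variance, hypothesized_variance):
    """Chi-square statistic (n - 1) s^2 / sigma^2 for a test about a population variance."""
    return (n - 1) * sample_variance / hypothesized_variance


statistic = variance_test_statistic(25, 0.029, 0.015)

critical_value = 42.980                 # table value for df = 24 and alpha = .01
reject_h0 = statistic > critical_value  # right-tailed test: H1 is sigma^2 > .015

print(f"chi-square = {statistic:.3f}, reject H0: {reject_h0}")
```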

Example 11-11
 The variance of scores on a standardized
mathematics test for all high school seniors was
150 in 2002. A sample of scores for 20 high school
seniors who took this test this year gave a variance
of 170. Test at the 5% significance level if the
variance of current scores of all high school seniors
on this test is different from 150. Assume that the
scores of all high school seniors on this test are
(approximately) normally distributed.

Solution 11-11
 H0: σ² = 150
 The population variance is not different
from 150
 H1: σ² ≠ 150
 The population variance is different from
150

Solution 11-11
 α = .05
 Area in each tail = α / 2 = .025
 df = n – 1 = 20 – 1 = 19
 The critical values of χ² are 32.852 and
8.907

Figure 11.11 Rejection and nonrejection regions with two critical values of χ²: reject H0 to the left of 8.907 or to the right of 32.852 (α/2 = .025 in each tail); do not reject H0 between them.
Solution 11-11

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(20 - 1)(170)}{150} = 21.533$$
The value σ² = 150 in the denominator comes from H0.

Solution 11-11
 The value of the test statistic χ² =
21.533
 It is between the two critical values of χ², 8.907 and 32.852
 It falls in the nonrejection region
 Consequently, we fail to reject H0.
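For a two-tailed test the statistic is compared with both critical values. The sketch below uses the data of Example 11-11 and the table values 8.907 and 32.852 for df = 19; the function name `two_tailed_variance_test` and its parameter names are hypothetical.

```python
def two_tailed_variance_test(n, sample_variance, hypothesized_variance,
                             lower_critical, upper_critical):
    """Return (statistic, reject) for H0: sigma^2 = hypothesized_variance.

    H0 is rejected when the statistic falls below the lower critical value
    or above the upper critical value.
    """
    statistic = (n - 1) * sample_variance / hypothesized_variance
    reject = statistic < lower_critical or statistic > upper_critical
    return statistic, reject


statistic, reject_h0 = two_tailed_variance_test(20, 170, 150, 8.907, 32.852)

print(f"chi-square = {statistic:.3f}, reject H0: {reject_h0}")
```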
