Professional Documents
Culture Documents
Chapter 12
Analysis of Categorical Data
LEARNING OBJECTIVES
This chapter presents several nonparametric statistics that can be used to analyze data
enabling you to:
Chapter 12 is a chapter containing the two most prevalent chi-square tests: chi-
square goodness-of-fit and chi-square test of independence. These two techniques are
important because they give the statistician a tool that is particularly useful for analyzing
nominal data (even though independent variable categories can sometimes have ordinal
or higher categories). It should be emphasized that there are many instances in business
research where the resulting data gathered are merely categorical identification. For
example, in segmenting the market place (consumers or industrial users), information is
gathered regarding gender, income level, geographical location, political affiliation,
religious preference, ethnicity, occupation, size of company, type of industry, etc. On
these variables, the measurement is often a tallying of the frequency of occurrence of
individuals, items, or companies in each category. The subject of the research is given no
"score" or "measurement" other than a 0/1 for being a member or not of a given category.
These two chi-square tests are perfectly tailored to analyze such data.
CHAPTER OUTLINE
KEY TERMS
SOLUTIONS TO CHAPTER 16
( f0 − fe )2
12.1 f0 fe
f0
53 68 3.309
37 42 0.595
32 33 0.030
28 22 1.636
18 10 6.400
15 8 6.125
( f0 − fe )2
Observed χ = ∑
2
= 18.095
fe
df = k - 1 = 6 - 1 = 5, α = .05
χ 2
.05,5 = 11.07
The observed frequencies are not distributed the same as the expected
frequencies.
( f0 − fe )2
12.2 f0 fe
f0
19 18 0.056
17 18 0.056
14 18 0.889
18 18 0.000
19 18 0.056
21 18 0.500
18 18 0.000
18 18 0.000
Chapter 12: Analysis of Categorical Data 4
x=
∑f 0
=
144
= 18
k 8
df = k – 1 = 8 – 1 = 7, α = .01
χ 2
.01,7 = 18.4753
( f0 − fe )2
Observed χ = ∑
2
= 1.557
fe
0 28 0
1 17 17
2 11 22
3 5 15
54
54
λ = =0.9
61
Expected Expected
Number Probability Frequency
0 .4066 24.803
1 .3659 22.312
2 .1647 10.047
>3 .0628 3.831
Chapter 12: Analysis of Categorical Data 5
( f0 − fe )2
Number fo fe
f0
0 28 24.803 0.412
1 17 22.312 1.265
>2 16 13.878 0.324
61 60.993 2.001
df = k - 2 = 3 - 2 = 1, = .05
χ 2
.05,1 = 3.84146
( f0 − fe )2
Calculated χ = ∑
2
= 2.001
fe
12.4
x=
∑ fm = 5,715 = 44.3
∑f 129
(∑ fM ) 2 (5,715 ) 2
s= ∑ fM 2 − n
279 ,825 −
129 = 14.43
=
n −1 128
Chapter 12: Analysis of Categorical Data 6
10 − 44 .3
z = = -2.38 .4913
14 .43
20 − 44 .3
z = = -1.68 - .4535
14 .43
Expected prob.: .0378
60 − 44 .3
z = = 1.09 .3621
14 .43
for x = 50, z = 0.40 -.1554
Expected prob: .2067
Chapter 12: Analysis of Categorical Data 7
70 − 44 .3
z = = 1.78 .4625
14 .43
for x = 60, z = 1.09 -.3621
Expected prob: .1004
80 − 44 .3
z = = 2.47 .4932
14 .43
for x = 70, z = 1.78 -.4625
Expected prob: .0307
Probability between 80 and the mean, 44.3 = (.0307 + .1004 + .2067 + .1554) =
.4932. Probability > 80 = .5000 - .4932 = .0068
Due to the small sizes of expected frequencies, category < 10 is folded into 10-20
and >80 into 70-80.
Chapter 12: Analysis of Categorical Data 8
( f0 − fe )2
Category fo fe
f0
10-20 6 5.87 .003
20-30 14 14.78 .041
30-40 29 28.51 .008
40-50 38 32.26 .213
50-60 25 26.66 .103
60-70 10 12.95 .672
70-80 7 4.84 .964
2.004
( f0 − fe )2
Calculated χ = ∑
2
= 2.004
fe
df = k - 3 = 7 - 3 = 4, α = .05
χ 2
.05,1 = 9.48773
Since the observed χ 2 = 2.004 > χ 2.05,4 = 9.48773, the decision is to fail to reject
the null hypothesis. There is not enough evidence to declare that the observed
frequencies are not normally distributed.
( f0 − fe )2
12.5 Definition fo Exp.Prop. fe
f0
Happiness 42 .39 227(.39)= 88.53 24.46
Sales/Profit 95 .12 227(.12)= 27.24 168.55
Helping Others 27 .18 40.86 4.70
Achievement/
Challenge 63 .31 70.34 0.77
227 198.48
Ho: The observed frequencies are distributed the same as the expected
frequencies.
Ha: The observed frequencies are not distributed the same as the expected
frequencies.
Observed χ 2 = 198.48
df = k – 1 = 4 – 1 = 3, α = .05
χ 2
.05,3 = 7.81473
Chapter 12: Analysis of Categorical Data 9
The observed frequencies for men are not distributed the same as the
expected frequencies which are based on the responses of women.
( f0 − fe )2
12.6 Age fo Prop. from survey fe
f0
10-14 22 .09 (.09)(212)=19.08 0.45
15-19 50 .23 (.23)(212)=48.76 0.03
20-24 43 .22 46.64 0.28
25-29 29 .14 29.68 0.02
30-34 19 .10 21.20 0.23
> 35 49 .22 46.64 0.12
212 1.13
Ha: The distribution of observed frequencies is not the same as the distribution of
expected frequencies.
α = .01, df = k - 1 = 6 - 1 = 5
χ 2
.01,5 = 15.0863
x=
∑ fM =
9,155
= 39.63
n 231
(∑ fM ) 2 (9,155 ) 2
s = ∑ fM 2 − n
405,375 −
231 = 13.6
=
n −1 230
10 − 39 .63
z = = -2.18 .4854
13 .6
20 − 39 .63
z = = -1.44 -.4251
13 .6
Expected prob. .0603
30 − 39 .63
z = = -0.71 -.2611
13 .6
Expected prob. .1640
40 − 39 .63
z = = 0.03 +.0120
13 .6
Expected prob. .2731
Chapter 12: Analysis of Categorical Data 11
50 − 39 .63
z = = 0.76 .2764
13 .6
60 − 39 .63
z = = 1.50 .4332
13 .6
70 − 39 .63
z = = 2.23 .4871
13 .6
Age Probability fe
< 10 .0146 (.0146)(231) = 3.37
10-20 .0603 (.0603)(231) = 13.93
20-30 .1640 37.88
30-40 .2731 63.09
40-50 .2644 61.08
50-60 .1568 36.22
Chapter 12: Analysis of Categorical Data 12
( f0 − fe )2
Age fo fe
f0
10-20 16 17.30 0.10
20-30 44 37.88 0.99
30-40 61 63.09 0.07
40-50 56 61.08 0.42
50-60 35 36.22 0.04
60-70 19 15.43 0.83
2.45
df = k - 3 = 6 - 3 = 3, α = .05
χ 2
.05,3 = 7.81473
Observed χ 2 = 2.45
λ =
∑ f ⋅ number =
358
= 2.4
∑f 150
Number Probability fe
0 .0907 (.0907)(150 = 13.61
1 .2177 (.2177)(150) = 32.66
2 .2613 39.20
3 .2090 31.35
4 .1254 18.81
5 .0602 9.03
6 or more .0358 5.36
( f0 − fe )2
fo fe
f0
18 13.61 1.42
28 32.66 0.66
47 39.66 1.55
21 31.35 3.42
16 18.81 0.42
11 9.03 0.43
9 5.36 2.47
10.37
α = .01, df = k – 2 = 7 – 2 = 5, χ 2
.01,5 = 15.0863
There is not enough evidence to reject the claim that the observed
frequencies are Poisson distributed.
( f0 − fe )2
fo fe
f0
χ 2
.025,1 = 5.02389
( f0 − fe )2
f0 fe
f0
Provide 42 180(.30) = 54 2.6666
χ 2
.025,1 = 5.02389
12.11
Variable Two
Variable 203 326 529
One 68 110 178
271 436 707
Variable Two
Variable (202.77 (326.23
One ) ) 529
203 326
(68.23) (109.77 178
68 )
110
χ 2
.05,1 = 3.84146
Since the observed χ 2 = 0.00 < χ 2.05,1 = 3.84146, the decision is to fail to reject
the null hypothesis.
Variable One is independent of Variable Two.
Chapter 12: Analysis of Categorical Data 16
12.12
Variable
Two
Variable 24 13 47 58 142
One 93 59 187 244 583
Variable
Two
Variable (22.92 (14.10 (45.83) (59.15)
One ) ) 47 58 142
24 13
(94.08 (57.90 (188.17 (242.85) 583
) ) ) 244
93 59 187
χ 2
.01,3 = 11.3449
12.13
Social Class
Lower Middle Upper
Number 0 7 18 6 31
of 1 9 38 23 70
Children 2 or 3 34 97 58 189
>3 47 31 30 108
97 184 117 398
(31 )( 97 ) (189 )( 97 )
e11 = = 7.56 e31 = = 46.06
398 398
(70 )( 97 ) (108 )( 97 )
e21 = = 17.06 e41 = = 26.32
398 398
Social Class
Lower Middle Upper
0 (7.56) (14.33 (9.11)
Number 7 ) 6 31
of 1 18
Children (17.06 (32.36 (20.58 70
2 or 3 ) ) )
9 38 23 189
>3 (46.06 (87.38 (55.56
) ) ) 108
34 97 58
(26.32 (49.93 (31.75
) ) )
47 31 30
97 184 117 398
.04 + .94 + 1.06 + 3.81 + .98 + .28 + 3.16 + 1.06 + .11 + 16.25 +
χ 2
.05,6 = 12.5916
12.14
Type of Music Preferred
Rock R&B Coun Clssic
NE 140 32 5 18 195
Region S 134 41 52 8 235
W 154 27 8 13 202
428 100 65 39 632
428 100 65 39
.48 + .04 + 11.31 + 2.96 + 3.97 + .39 + 32.04 + 2.91 + 2.16 + .77 +
χ 2
.01,6 = 16.8119
12.15
Transportation Mode
Air Train Truck
Industr Publishing 32 12 41 85
y Comp.Hard. 5 6 24 35
37 18 65 120
(85 )( 37 ) (35 )( 37 )
e11 = = 26.21 e21 = = 10.79
120 120
(85 )( 65 ) (35 )( 65 )
e13 = = 46.04 e23 = = 18.96
120 120
Transportation Mode
Air Train Truck
Industr Publishing (26.21 (12.75 (46.04
y ) ) ) 85
32 12 41
Comp.Hard. (10.79 (5.25) (18.96 35
) 6 ) 120
5 24
37 18 65
χ 2
.05,2 = 5.99147
12.16
Number of Bedrooms
<2 3 >4
Number of 1 116 101 57 274
Stories 2 90 325 160 575
206 426 217 849
χ 2
= 36.89 + 9.68 + 2.42 + 17.58 + 4.61 + 1.16 = 72.34
χ 2
.10,2 = 4.60517
12.17
Mexican Citizens
Yes No
Type Dept. 24 17 41
of Disc. 20 15 35
Store Hard. 11 19 30
Shoe 32 28 60
87 79 166
( 41 )( 87 ) (30 )( 87 )
e11 = = 21.49 e31 = = 15.72
166 166
( 41 )( 79 ) (30 )( 79 )
e12 = = 19.51 e32 = = 14.28
166 166
(35 )( 87 ) (60 )( 87 )
e21 = = 18.34 e41 = = 31.45
166 166
(35 )( 79 ) (60 )( 79 )
e22 = = 16.66 e42 = = 28.55
166 166
Mexican Citizens
Yes No
Type Dept. (21.49 (19.51
of ) ) 41
Store 24 17
Disc. (18.34 (16.66 35
) )
20 15 30
Hard. (15.72 (14.28
) ) 60
11 19
Shoe (31.45 (28.55
) )
32 28
87 79 166
(24 − 21 .49 ) 2 (17 −19 .51) 2 ( 20 −18 .34 ) 2 (15 −16 .66 ) 2
χ 2
= + + +
21 .49 19 .51 18 .34 16 .66
+
χ 2
.05,3 = 7.81473
12.18 α = .01, k = 7, df = 6
Use:
( f0 − fe )2
χ =∑
2
fe
critical χ 2
.01,7 = 18.4753
( f0 − fe )2
fo fe (f0-fe)2
f0
214 206 64 0.311
235 232 9 0.039
279 268 121 0.451
281 284 9 0.032
264 268 16 0.060
254 232 484 2.086
211 206 25 0.121
3.100
( f0 − fe )2
χ2 = ∑ = 3.100
fe
12.19
Variable 2
12 23 21 56
Variable 1 8 17 20 45
7 11 18 36
27 51 59 137
.084 + .222 + .403 + .085 + .004 + .020 + .001 + .430 + .402 = 1.652
χ 2
.05,4 = 9.48773
12.20
Location
NE W S
Customer Industrial 230 115 68 413
Retail 185 143 89 417
415 258 157 830
Location
NE W S
Customer Industrial (206.5 (128.38 (78.12
) ) ) 413
230 115 68
Retail (208.5 (129.62 (78.88
) ) ) 417
185 143 89
415 258 157 830
χ 2
.10,2 = 4.60517
( f0 − fe )2
fo fe
f0
189 175.67 1.01
168 175.67 0.33
155 175.67 2.43
161 175.67 1.23
216 175.67 9.26
165 175.67 0.65
14.91
α = .05 df = k - 1 = 6 - 1 = 5
χ 2
.05,5 = 11.0705
12.22
Gender
M F
Bought Y 207 65 272
Car N 811 984 1,795
1,018 1,049 2,067
Gender
M F
Bought Y (133.96 (138.04
Car ) ) 272
207 65
N (884.04 (910.96
) ) 1,795
811 984
1,018 1,049 2,067
χ 2
.05,1 = 3.841
λ =
∑( f 0 )( arrivals )
=
426
= 2.2
∑f 0 192
Arrivals Probability fe
0 .1108 (.1108)(192) = 21.27
1 .2438 (.2438)(192) = 46.81
2 .2681 51.48
3 .1966 37.75
4 .1082 20.77
5 .0476 9.14
6 .0249 4.78
( f0 − fe )2
fo fe
f0
26 21.27 1.05
40 46.81 2.18
57 51.48 0.59
32 37.75 0.88
17 20.77 0.68
12 9.14 0.89
8 4.78 2.17
8.44
Observed χ 2
= 8.44
α = .05 df = k - 2 = 7 - 2 = 5
χ 2
.05,5 = 11.0705
Since the observed χ 2 = 8.44 < χ 2.05,5 = 11.0705, the decision is to fail to reject
the null hypothesis. There is not enough evidence to reject the claim that the
observed frequency of arrivals is Poisson distributed.
Chapter 12: Analysis of Categorical Data 31
( f0 − fe )2
Soft Drink fo proportions fe
f0
Classic Coke 361 .206 (.206)(1726) = 355.56 0.08
Pepsi 272 .145 (.145)(1726) = 250.27 1.89
Diet Coke 192 .085 146.71 13.98
Mt. Dew 121 .063 108.74 1.38
Dr. Pepper 94 .059 101.83 0.60
Sprite 102 .062 107.01 0.23
Others 584 .380 655.86 7.87
∑fo = 1,726 26.03
Calculated χ 2
= 26.03
α = .05 df = k - 1 = 7 - 1 = 6
χ 2
.05,6 = 12.5916
The observed frequencies are not distributed the same as the expected frequencies
from the national poll.
12.25
Position
Systems
Manager Programmer Operato Analyst
r
0-3 6 37 11 13 67
Years 4-8 28 16 23 24 91
>8 47 10 12 19 88
81 63 46 56 246
Chapter 12: Analysis of Categorical Data 32
(67 )( 81 ) (91 )( 46 )
e11 = = 22.06 e23 = = 17.02
246 246
(67 )( 63 ) (91 )( 56 )
e12 = = 17.16 e24 = = 20.72
246 246
(67 )( 46 ) (88 )( 81 )
e13 = = 12.53 e31 = = 28.98
246 246
(67 )( 56 ) (88 )( 63 )
e14 = = 15.25 e32 = = 22.54
246 246
(91 )( 81 ) (88 )( 46 )
e21 = = 29.96 e33 = = 16.46
246 246
(91 )( 63 ) (88 )( 56 )
e22 = = 23.30 e34 = = 20.03
246 246
Position
Systems
Manager Programmer Operato Analyst
r
0-3 (22.06) (17.16) (12.53) (15.25) 67
Years 6 37 11 13
4-8 (29.96) (23.30) (17.02) (20.72) 91
28 16 23 24
>8 (28.98) (22.54) (16.46) (20.03) 88
47 10 12 19
81 63 46 56 246
(6 − 22 .06 ) 2 (37 −17 .16 ) 2 (11 −12 .53 ) 2 (13 −15 .25 ) 2
χ 2
= + + + +
22 .06 17 .16 12 .53 15 .25
11.69 + 22.94 + .19 + .33 + .13 + 2.29 + 2.1 + .52 + 11.2 + 6.98 +
χ 2
.01,6 = 16.8119
Since the observed χ 2 = 59.63 > χ 2.01,6 = 16.8119, the decision is to reject the
null hypothesis. Position is not independent of number of years of experience.
( f0 − fe )2
fo fe
f0
More Work,
More Business 120 (.43)(315) = 135.45 1.76
χ 2
.025,1 = 5.02389
12.27
Type of College or University
Community Large Small
College University College
Number 0 25 178 31 234
of 1 49 141 12 202
Children 2 31 54 8 93
>3 22 14 6 42
127 387 57 571
( 234 )( 57 ) (93 )( 57 )
e13 = = 23.36 e33 = = 9.28
571 571
(202 )( 57 ) (42 )( 57 )
e23 = = 20.16 e43 = = 4.19
571 571
χ 2
.05,6 = 12.5916
12.28 The observed chi-square is 30.18 with a p-value of .0000043. The chi-square
goodness-of-fit test indicates that there is a significant difference between the
observed frequencies and the expected frequencies. The distribution of responses
to the question are not the same for adults between 21 and 30 years of age as they
are to others. Marketing and sales people might reorient their 21 to 30 year old
efforts away from home improvement and pay more attention to leisure
travel/vacation, clothing, and home entertainment.
12.29 The observed chi-square value for this test of independence is 5.366. The
associated p-value of .252 indicates failure to reject the null hypothesis. There is
not enough evidence here to say that color choice is dependent upon gender.
Automobile marketing people do not have to worry about which colors especially
appeal to men or to women because car color is independent of gender. In
addition, design and production people can determine car color quotas based on
other variables.