
CHAPTER 9

Categorical Data Analysis


9.2 a. The one-way table is shown below:

Turned Left   Turned Right   Drove Straight   Total
357           321            294              972

b. The form of the confidence interval is:

$\hat{p}_1 \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n}}$, where $\hat{p}_1 = \dfrac{294}{972} = .3025$

For confidence coefficient .95, $\alpha = 1 - .95 = .05$ and $\alpha/2 = .05/2 = .025$. From Table 5, Appendix B, $z_{.025} = 1.96$. The confidence interval is:

$.3025 \pm 1.96\sqrt{\dfrac{.3025(1-.3025)}{972}} \Rightarrow .3025 \pm .0289 \Rightarrow (.2736, .3314)$
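The large-sample interval for a single proportion can be checked numerically. The following is a minimal sketch, not part of the original solution; the function name `proportion_ci` is hypothetical, and the z value is passed in by hand because it is read from Table 5.

```python
import math


def proportion_ci(count, n, z):
    """Large-sample confidence interval for one proportion.

    count: number of observations in the category
    n:     total sample size
    z:     tabled z value for the chosen confidence coefficient
    """
    p_hat = count / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Exercise 9.2b: 294 of 972 cars drove straight, z_.025 = 1.96
low, high = proportion_ci(294, 972, 1.96)
print(round(low, 3), round(high, 3))
```

Because the code carries the unrounded value of 294/972 through the calculation, its endpoints can differ from the hand calculation in the fourth decimal place.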

c. The form of the confidence interval is:

$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1) + \hat{p}_2(1-\hat{p}_2) + 2\hat{p}_1\hat{p}_2}{n}}$

where $\hat{p}_1 = \dfrac{357}{972} = .3673$ and $\hat{p}_2 = \dfrac{321}{972} = .3302$

The confidence interval is:

$(.3673 - .3302) \pm 1.96\sqrt{\dfrac{.3673(1-.3673) + .3302(1-.3302) + 2(.3673)(.3302)}{972}} \Rightarrow .0371 \pm .0524 \Rightarrow (-.0153, .0895)$

We are 95% confident the difference in the proportion of cars turning left and right is contained between -.0153 and .0895.

9.4 a. Let p1 = the proportion of WMU students who agree that their DSIP research experience is valuable to their professional future.

$\hat{p}_1 = \dfrac{47}{50} = .94$

The confidence interval for $p_1$ is $\hat{p}_1 \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n}}$

For confidence coefficient .99, $\alpha = 1 - .99 = .01$ and $\alpha/2 = .01/2 = .005$. From Table 5 in Appendix B, $z_{.005} = 2.576$. The 99% confidence interval for $p_1$ is:

$.94 \pm 2.576\sqrt{\dfrac{.94(.06)}{50}} \Rightarrow .94 \pm .087 \Rightarrow (.853, 1.027)$

b. Let p1 = the proportion of WMU students who agree that their DSIP research experience is valuable to their professional future and let p2 = the proportion of WMU students who are neutral about the statement.

$\hat{p}_1 = \dfrac{47}{50} = .94$ and $\hat{p}_2 = \dfrac{3}{50} = .06$

The confidence interval for $p_1 - p_2$ is:

$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1) + \hat{p}_2(1-\hat{p}_2) + 2\hat{p}_1\hat{p}_2}{n}}$

For confidence coefficient .99, $\alpha = 1 - .99 = .01$ and $\alpha/2 = .01/2 = .005$. From Table 5 in Appendix B, $z_{.005} = 2.576$. The 99% confidence interval for $p_1 - p_2$ is:

$(.94 - .06) \pm 2.576\sqrt{\dfrac{.94(.06) + .06(.94) + 2(.94)(.06)}{50}} \Rightarrow .88 \pm .173 \Rightarrow (.707, 1.053)$

9.6 The form of the interval is:

$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1) + \hat{p}_2(1-\hat{p}_2) + 2\hat{p}_1\hat{p}_2}{n}}$

where $\hat{p}_1 = \dfrac{58}{90} = .6444$ and $\hat{p}_2 = \dfrac{15}{90} = .1667$

For confidence coefficient .95, $\alpha = 1 - .95 = .05$ and $\alpha/2 = .05/2 = .025$. From Table 5, Appendix B, $z_{.025} = 1.96$. The 95% confidence interval is:

$(.6444 - .1667) \pm 1.96\sqrt{\dfrac{.6444(.3556) + .1667(.8333) + 2(.6444)(.1667)}{90}} \Rightarrow .4777 \pm .1577 \Rightarrow (.3200, .6354)$

We are 95% confident the difference between the proportion of subjects who selected brighter side up and the proportion who selected darker side up falls in the interval .3200 to .6354.

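9.8 a. The form of the confidence interval for $p_C$ is:

$\hat{p}_C \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_C(1-\hat{p}_C)}{n}}$, where $\hat{p}_C = \dfrac{n_C}{n} = \dfrac{22}{100} = .22$

For confidence coefficient .90, $\alpha = 1 - .90 = .10$ and $\alpha/2 = .10/2 = .05$. From Table 5, Appendix B, $z_{.05} = 1.645$. The confidence interval is:

$.22 \pm 1.645\sqrt{\dfrac{.22(1-.22)}{100}} \Rightarrow .22 \pm .068 \Rightarrow (.152, .288)$

b. The form of the confidence interval for $(p_E - p_B)$ is:

$(\hat{p}_E - \hat{p}_B) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_E(1-\hat{p}_E) + \hat{p}_B(1-\hat{p}_B) + 2\hat{p}_E\hat{p}_B}{n}}$

where $\hat{p}_E = \dfrac{n_E}{n} = \dfrac{19}{100} = .19$ and $\hat{p}_B = \dfrac{n_B}{n} = \dfrac{27}{100} = .27$

Using the information from part a, the confidence interval is:

$(.19 - .27) \pm 1.645\sqrt{\dfrac{.19(1-.19) + .27(1-.27) + 2(.19)(.27)}{100}} \Rightarrow -.08 \pm .111 \Rightarrow (-.191, .031)$

c.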


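$\hat{p}_A = \dfrac{n_A}{n} = \dfrac{17}{100} = .17$ and $\hat{p}_D = \dfrac{n_D}{n} = \dfrac{15}{100} = .15$

Using the information from part b, the confidence interval is:

$(.17 - .15) \pm 1.645\sqrt{\dfrac{.17(1-.17) + .15(1-.15) + 2(.17)(.15)}{100}} \Rightarrow .02 \pm .093 \Rightarrow (-.073, .113)$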
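The interval for the difference between two proportions from the same multinomial sample, as used in Exercises 9.2c through 9.8, can be sketched in code. This is an illustrative check only; the function name `multinomial_diff_ci` is hypothetical. The extra term $2\hat{p}_a\hat{p}_b$ appears because the two category counts come from one sample and are negatively correlated.

```python
import math


def multinomial_diff_ci(count_a, count_b, n, z):
    """CI for p_a - p_b when both counts come from one multinomial sample."""
    p_a = count_a / n
    p_b = count_b / n
    # Var(p_a_hat - p_b_hat) = [p_a(1-p_a) + p_b(1-p_b) + 2 p_a p_b] / n
    variance = (p_a * (1 - p_a) + p_b * (1 - p_b) + 2 * p_a * p_b) / n
    half_width = z * math.sqrt(variance)
    diff = p_a - p_b
    return diff - half_width, diff + half_width


# Exercise 9.8b: n_E = 19, n_B = 27, n = 100, z_.05 = 1.645
print(multinomial_diff_ci(19, 27, 100, 1.645))
# Exercise 9.8c: n_A = 17, n_D = 15
print(multinomial_diff_ci(17, 15, 100, 1.645))
```

The half-widths this produces round to .111 and .093, matching the hand calculations in parts b and c.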


9.10 a. To determine if the opinions of Internet users are evenly divided among the four categories, we test:

$H_0: p_1 = p_2 = p_3 = p_4 = .25$
$H_a$: At least two of the proportions differ

b. The expected numbers in each category are:

$E(n_i) = np_i = 328(.25) = 82$

The test statistic is:

$\chi^2 = \sum \dfrac{[n_i - E(n_i)]^2}{E(n_i)} = \dfrac{(59-82)^2}{82} + \dfrac{(108-82)^2}{82} + \dfrac{(82-82)^2}{82} + \dfrac{(79-82)^2}{82} = 14.805$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with df = k - 1 = 4 - 1 = 3. From Table 8 in Appendix B, $\chi^2_{.05} = 7.81473$. The rejection region is $\chi^2 > 7.81473$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 14.805 > 7.81473$), $H_0$ is rejected. There is sufficient evidence to indicate that the opinions of Internet users are not evenly divided among the four categories.

c. A Type I error would occur if we conclude that differences exist when, in fact, they do not. A Type II error would occur if we conclude that no differences exist when, in fact, they do.

d. The expected cell counts must all be at least five and the multinomial assumptions must be met.

9.12

To determine if there are significant differences in the percentage of incidents in the four cause categories, we test:

$H_0: p_1 = p_2 = p_3 = p_4 = .25$
$H_a$: At least two of the proportions differ

The expected numbers in each category are:

$E(n_i) = np_i = 83(.25) = 20.75$

The test statistic is:

$\chi^2 = \sum \dfrac{[n_i - E(n_i)]^2}{E(n_i)} = \dfrac{(27-20.75)^2}{20.75} + \dfrac{(24-20.75)^2}{20.75} + \dfrac{(22-20.75)^2}{20.75} + \dfrac{(10-20.75)^2}{20.75} = 8.04$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with df = k - 1 = 4 - 1 = 3. From Table 8 in Appendix B, $\chi^2_{.05} = 7.81473$. The rejection region is $\chi^2 > 7.81473$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 8.04 > 7.81473$), $H_0$ is rejected. There is sufficient evidence to indicate that there are significant differences in the percentage of incidents in the four cause categories.

9.14 a. To determine if the traffic is equally divided among the three directions, we test:
$H_0: p_1 = p_2 = p_3 = 1/3$
$H_a$: At least two proportions are unequal

The expected number in each category is:

$E(n_i) = np_i = 972\left(\dfrac{1}{3}\right) = 324 \quad (i = 1, 2, 3)$

The observed and expected category counts are:

           Straight   Turn Right   Turn Left
Observed   294        321          357
Expected   324        324          324

The test statistic is:

$\chi^2 = \sum \dfrac{(n_i - np_i)^2}{np_i} = \dfrac{(294-324)^2}{324} + \dfrac{(321-324)^2}{324} + \dfrac{(357-324)^2}{324} = 6.167$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with df = k - 1 = 3 - 1 = 2. From Table 8, Appendix B, $\chi^2_{.05} = 5.99147$. The rejection region is $\chi^2 > 5.99147$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 6.167 > 5.99147$), $H_0$ is rejected. There is sufficient evidence to indicate the traffic is not equally divided at $\alpha = .05$.
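The one-way chi-square statistic used in Exercises 9.10 through 9.16 follows the same three steps each time: compute the expected counts $np_i$, sum the scaled squared deviations, and compare with the tabled critical value. A minimal sketch is shown below; the function name `chi_square_gof` is hypothetical, and the critical value is typed in from Table 8 rather than computed.

```python
def chi_square_gof(observed, probs):
    """One-way chi-square statistic and degrees of freedom.

    observed: list of category counts n_i
    probs:    hypothesized category probabilities p_i (same order)
    """
    n = sum(observed)
    expected = [n * p for p in probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df


# Exercise 9.14a: straight, right, left counts under H0: p_i = 1/3
statistic, df = chi_square_gof([294, 321, 357], [1 / 3, 1 / 3, 1 / 3])
critical_value = 5.99147  # chi-square .05 value for 2 df, from Table 8
print(round(statistic, 3), df, statistic > critical_value)
```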


b. To determine if more than one-third of all automobiles entering the intersection turn left, we test:

$H_0: p = 1/3$
$H_a: p > 1/3$

The rejection region for this large-sample, one-tailed test requires $\alpha = .05$ in the upper tail of the z distribution. From Table 5, Appendix B, $z_{.05} = 1.645$. The rejection region is z > 1.645.

The test statistic is $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \dfrac{\dfrac{357}{972} - \dfrac{1}{3}}{\sqrt{\dfrac{(1/3)(2/3)}{972}}} = 2.25$

Since the observed value of the test statistic falls in the rejection region (z = 2.25 > 1.645), $H_0$ is rejected. There is sufficient evidence to indicate the proportion of all automobiles entering this intersection that turn left exceeds 1/3 using $\alpha = .05$.

9.16 To determine if the three proportions differ, we test:

$H_0: p_1 = p_2 = p_3 = 1/3$
$H_a$: At least two of the proportions differ

The expected cell counts are:

$E(n_i) = np_i = 90\left(\dfrac{1}{3}\right) = 30 \quad (i = 1, 2, 3)$

The observed and expected category counts are:

           Brighter Side Up   Darker Side Up   Aligned
Observed   58                 15               17
Expected   30                 30               30

The test statistic is:

$\chi^2 = \sum \dfrac{(n_i - np_i)^2}{np_i} = \dfrac{(58-30)^2}{30} + \dfrac{(15-30)^2}{30} + \dfrac{(17-30)^2}{30} = 39.267$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with df = k - 1 = 3 - 1 = 2. From Table 8, Appendix B, $\chi^2_{.05} = 5.99147$. The rejection region is $\chi^2 > 5.99147$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 39.267 > 5.99147$), $H_0$ is rejected. There is sufficient evidence to indicate at least two of the proportions differ at $\alpha = .05$.


9.18 For $k = 2$:

$$\chi^2 = \sum_{i=1}^{2}\frac{(n_i - np_i)^2}{np_i} = \frac{(n_1 - np_1)^2}{np_1} + \frac{(n_2 - np_2)^2}{np_2}$$

For a binomial experiment, $n_1 = y$, $n_2 = n - y$, $p_1 = p$, and $p_2 = 1 - p$, so

$$\begin{aligned}
\chi^2 &= \frac{(y - np)^2}{np} + \frac{[(n - y) - n(1 - p)]^2}{n(1 - p)} \\
&= \frac{y^2 - 2ynp + n^2p^2}{np} + \frac{(n - y)^2 - 2n(n - y)(1 - p) + n^2(1 - p)^2}{n(1 - p)} \\
&= \frac{y^2 - 2ynp + n^2p^2}{np} + \frac{n^2 - 2ny + y^2 - 2n^2 + 2n^2p + 2ny - 2npy + n^2 - 2n^2p + n^2p^2}{n(1 - p)} \\
&= \frac{y^2(1-p) - 2ynp(1-p) + n^2p^2(1-p) + n^2p - 2nyp + y^2p - 2n^2p + 2n^2p^2 + 2nyp - 2nyp^2 + n^2p - 2n^2p^2 + n^2p^3}{np(1-p)} \\
&= \frac{y^2 - y^2p - 2ynp + 2ynp^2 + n^2p^2 - n^2p^3 + y^2p + 2n^2p^2 - 2nyp^2 - 2n^2p^2 + n^2p^3}{np(1-p)} \\
&= \frac{y^2 - 2nyp + n^2p^2}{np(1-p)} = \frac{(y - np)^2}{np(1-p)} = \frac{(y - np)^2}{npq} = z^2
\end{aligned}$$
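The identity $\chi^2 = z^2$ for k = 2 can also be spot-checked numerically. The sketch below is only a sanity check of the algebra, not a proof; the function names are hypothetical and the sample values (y = 357, n = 972, p = 1/3) are borrowed from Exercise 9.14 purely for illustration.

```python
import math


def chi_square_two_cells(y, n, p):
    """Chi-square statistic for k = 2 cells with counts y and n - y."""
    return (y - n * p) ** 2 / (n * p) + ((n - y) - n * (1 - p)) ** 2 / (n * (1 - p))


def z_squared(y, n, p):
    """Square of the large-sample binomial z statistic."""
    return (y - n * p) ** 2 / (n * p * (1 - p))


y, n, p = 357, 972, 1 / 3
print(math.isclose(chi_square_two_cells(y, n, p), z_squared(y, n, p)))
```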

9.20 a.

Yes, the sampling appears to satisfy the assumptions of a multinomial experiment. The experiment contains 120 trials and 2(4) = 8 categories. Since the 120 rats were randomly selected, the trials are considered independent and the probabilities are considered constant.

b. $\hat{E}(n_{ij}) = \dfrac{n_{i.}\,n_{.j}}{n}$

$\hat{E}(n_{11}) = \dfrac{80(30)}{120} = 20 \qquad \hat{E}(n_{21}) = \dfrac{40(30)}{120} = 10$

$\hat{E}(n_{12}) = \dfrac{80(30)}{120} = 20 \qquad \hat{E}(n_{22}) = \dfrac{40(30)}{120} = 10$

$\hat{E}(n_{13}) = \dfrac{80(30)}{120} = 20 \qquad \hat{E}(n_{23}) = \dfrac{40(30)}{120} = 10$

$\hat{E}(n_{14}) = \dfrac{80(30)}{120} = 20 \qquad \hat{E}(n_{24}) = \dfrac{40(30)}{120} = 10$


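c. $\chi^2 = \sum\sum \dfrac{[n_{ij} - \hat{E}(n_{ij})]^2}{\hat{E}(n_{ij})}$

$= \dfrac{(27-20)^2}{20} + \dfrac{(20-20)^2}{20} + \dfrac{(19-20)^2}{20} + \dfrac{(14-20)^2}{20} + \dfrac{(3-10)^2}{10} + \dfrac{(10-10)^2}{10} + \dfrac{(11-10)^2}{10} + \dfrac{(16-10)^2}{10} = 12.9$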
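The expected counts of part b and the statistic of part c can be reproduced with a short routine. This is a minimal sketch rather than the method of any particular statistics package; the function name `chi_square_independence` is hypothetical, and the table rows are the observed counts that appear in the sum above.

```python
def chi_square_independence(table):
    """Chi-square statistic, df, and expected counts for an r x c table.

    table: list of rows, each a list of observed counts n_ij
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for cell (i, j) is (row total i)(column total j) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Exercise 9.20: two rows by four diets
observed = [[27, 20, 19, 14],
            [3, 10, 11, 16]]
statistic, df, expected = chi_square_independence(observed)
print(round(statistic, 1), df, expected[0][0], expected[1][0])
```

The same routine applies to any of the two-way tables in this chapter, provided all expected counts are at least five.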

d. To determine if diet and presence/absence of cancer are independent, we test:

$H_0$: Diet and presence/absence of cancer are independent
$H_a$: Diet and presence/absence of cancer are dependent

The test statistic is $\chi^2 = 12.9$. The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with df = (r - 1)(c - 1) = (2 - 1)(4 - 1) = 3. From Table 8, Appendix B, $\chi^2_{.05} = 7.81473$. The rejection region is $\chi^2 > 7.81473$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 12.9 > 7.81473$), $H_0$ is rejected. There is sufficient evidence to indicate that diet and presence/absence of cancer are not independent at $\alpha = .05$.

e. Let p1 = proportion of rats on high fat/no fiber diet with cancer and let p2 = proportion of rats on high fat/fiber diet with cancer.

$\hat{p}_1 = \dfrac{27}{30} = .9 \qquad \hat{p}_2 = \dfrac{20}{30} = .667$

The confidence interval for the difference between two proportions is:

$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}$

For confidence coefficient .95, $\alpha = 1 - .95 = .05$ and $\alpha/2 = .05/2 = .025$. From Table 5, Appendix B, $z_{.025} = 1.96$. The 95% confidence interval is:

$(.90 - .667) \pm 1.96\sqrt{\dfrac{.9(.1)}{30} + \dfrac{.667(.333)}{30}} \Rightarrow .233 \pm .200 \Rightarrow (.033, .433)$


To obtain the confidence interval for the percentage, multiply the endpoints by 100%. The interval is (3.3%, 43.3%). We are 95% confident that the difference in the percentage of rats with cancer between those on high fat/no fiber diets and those on high fat/fiber diets is between 3.3% and 43.3%. Since the rats were divided into groups according to diets, we assume the groups are independent.

9.22 Using MINITAB, the results of the analyses are:

Tabulated statistics: Stops, Kills

Using frequencies in Fr

Rows: Stops   Columns: Kills

            1       2       3       4       5     All
1          32      33      19       5       2      91
        28.31   34.88   18.71    6.57    2.53   91.00
2          24      36      18       8       3      89
        27.69   34.12   18.29    6.43    2.47   89.00
All        56      69      37      13       5     180
        56.00   69.00   37.00   13.00    5.00  180.00

Cell Contents:   Count
                 Expected count

Pearson Chi-Square = 2.171, DF = 4, P-Value = 0.704
Likelihood Ratio Chi-Square = 2.182, DF = 4, P-Value = 0.702
* NOTE * 2 cells with expected counts less than 5

First, we check whether the assumption about the expected cell counts is met. From the table, two expected cell counts are less than 5. Thus, the results of the test are suspect. To determine if the number of kills is related to whether the trial was stopped or not, we test:

$H_0$: Number of kills and whether the trial was stopped or not are independent
$H_a$: Number of kills and whether the trial was stopped or not are dependent

The test statistic is $\chi^2 = 2.171$ (from the printout). The p-value of the test is .704. Since this p-value is so large, $H_0$ is not rejected. There is insufficient evidence to indicate that the number of kills is related to whether the trial was stopped or not at $\alpha = .10$.


9.24 a. The contingency table is shown below:

                   High   Low   Total
Less than 300      105    85    190
300-600 meters     121    77    198
600 or more        59     17    76
Total              285    179   464

b. To determine if flight response of the geese depends on altitude of the helicopter, we test:

$H_0$: Flight response and Altitude are independent
$H_a$: Flight response and Altitude are dependent

Statistix was used to create the following printout:

Chi-Square Test for Heterogeneity or Independence
for Count = Altitude Response

                           Response
Altitude                 Low        High
               +-----------+-----------+
1  Observed    |        85 |       105 |     190
   Expected    |     73.30 |    116.70 |
   Cell Chi-Sq |      1.87 |      1.17 |
               +-----------+-----------+
2  Observed    |        77 |       121 |     198
   Expected    |     76.38 |    121.62 |
   Cell Chi-Sq |      0.00 |      0.00 |
               +-----------+-----------+
3  Observed    |        17 |        59 |      76
   Expected    |     29.32 |     46.68 |
   Cell Chi-Sq |      5.18 |      3.25 |
               +-----------+-----------+
                       179         285       464

Overall Chi-Square    11.48
P-Value               0.0032
Degrees of Freedom    2

Since $\alpha = .01$ > p-value = .0032, $H_0$ can be rejected. There is sufficient evidence to indicate that flight response of the geese depends on the altitude of the helicopter.

c. The contingency table is shown below:

                       High   Low   Total
Less than 1,000        243    37    280
1,000-2,000 meters     37     68    105
2,000-3,000 meters     4      44    48
3,000 or more          1      30    31
Total                  285    179   464


d. To determine if flight response of the geese depends on lateral distance of the helicopter, we test:

$H_0$: Flight response and Lateral distance are independent
$H_a$: Flight response and Lateral distance are dependent

Statistix was used to create the following printout:

Chi-Square Test for Heterogeneity or Independence
for Count = Lat_Cat Response

                           Response
Lat_Cat                  Low        High
               +-----------+-----------+
1  Observed    |        37 |       243 |     280
   Expected    |    108.02 |    171.98 |
   Cell Chi-Sq |     46.69 |     29.33 |
               +-----------+-----------+
2  Observed    |        68 |        37 |     105
   Expected    |     40.51 |     64.49 |
   Cell Chi-Sq |     18.66 |     11.72 |
               +-----------+-----------+
3  Observed    |        44 |         4 |      48
   Expected    |     18.52 |     29.48 |
   Cell Chi-Sq |     35.07 |     22.03 |
               +-----------+-----------+
4  Observed    |        30 |         1 |      31
   Expected    |     11.96 |     19.04 |
   Cell Chi-Sq |     27.22 |     17.09 |
               +-----------+-----------+
                       179         285       464

Overall Chi-Square    207.80
P-Value               0.0000
Degrees of Freedom    3

Since $\alpha = .01$ > p-value = .0000, $H_0$ can be rejected. There is sufficient evidence to indicate that flight response of the geese depends on the lateral distance of the helicopter.

9.26 a. To find the proportion of censored measurements for each of the six tractor lines, we take the number of censored measurements for each tractor line and divide it by the total number of measurements for that tractor line.

$\hat{p}_1 = \dfrac{175}{6222} = 0.028 \qquad \hat{p}_2 = \dfrac{236}{4692} = 0.050$

$\hat{p}_3 = \dfrac{319}{7140} = 0.045 \qquad \hat{p}_4 = \dfrac{231}{6120} = 0.038$

$\hat{p}_5 = \dfrac{480}{10353} = 0.046 \qquad \hat{p}_6 = \dfrac{187}{4794} = 0.039$

b. Statistix was used to create the following printout:

Chi-Square Test for Heterogeneity or Independence
for Count = Lat_Cat Response

                           Response
Tractor Line      Uncensored    Censored
               +-----------+-----------+
1  Observed    |       175 |      6047 |    6222
   Expected    |    257.61 |   5964.39 |
   Cell Chi-Sq |     26.49 |      1.14 |
               +-----------+-----------+
2  Observed    |       236 |      4456 |    4692
   Expected    |    194.26 |   4497.74 |
   Cell Chi-Sq |      8.97 |      0.39 |
               +-----------+-----------+
3  Observed    |       319 |      6821 |    7140
   Expected    |    295.62 |   6844.38 |
   Cell Chi-Sq |      1.85 |      0.08 |
               +-----------+-----------+
4  Observed    |       231 |      5889 |    6120
   Expected    |    253.39 |   5866.61 |
   Cell Chi-Sq |      1.98 |      0.09 |
               +-----------+-----------+
5  Observed    |       480 |      9873 |   10353
   Expected    |    428.64 |   9924.36 |
   Cell Chi-Sq |      6.15 |      0.27 |
               +-----------+-----------+
6  Observed    |       187 |      4607 |    4794
   Expected    |    198.49 |   4595.51 |
   Cell Chi-Sq |      0.66 |      0.03 |
               +-----------+-----------+
                      1628       37693     39321

Overall Chi-Square    48.09
P-Value               0.0000
Degrees of Freedom    5

To determine if the proportion of censored measurements differs for the six tractor lines, we test:

$H_0$: Measurement type and tractor line are independent
$H_a$: Measurement type and tractor line are dependent

Since $\alpha = .01$ > p-value = .0000, $H_0$ can be rejected. There is sufficient evidence to indicate that the proportion of censored measurements differs for the six tractor lines.

c. Although the result is statistically significant, we have no way of knowing when a tractor line will produce a large number of censored measurements and when it will produce a small number. From a practical perspective, not much useful information has been learned.


9.28 a. The contingency table is:

                               Committee
                         Acceptable   Rejected   Totals
Inspector   Acceptable   101          23         124
            Rejected     10           19         29
Totals                   111          42         153

b. Yes. To plot the percentages, first convert frequencies to percentages by dividing the numbers in each column by the column total and multiplying by 100. Also, divide the row totals by the overall total and multiply by 100.

                               Committee
                         Acceptable                Rejected                 Totals
Inspector   Acceptable   (101/111)100 = 90.99%     (23/42)100 = 54.76%      (124/153)100 = 81.05%
            Rejected     (10/111)100 = 9.01%       (19/42)100 = 45.24%      (29/153)100 = 18.95%

From the plot, it appears there is a relationship.

[Figure: bar chart of the proportion accepted/rejected by the inspector (vertical axis, 0 to 1) for each committee classification (Acceptable, Rejected) and for the total]

c. Some preliminary calculations are:

$\hat{E}_{11} = \dfrac{r_1c_1}{n} = \dfrac{124(111)}{153} = 89.961 \qquad \hat{E}_{12} = \dfrac{r_1c_2}{n} = \dfrac{124(42)}{153} = 34.039$

$\hat{E}_{21} = \dfrac{r_2c_1}{n} = \dfrac{29(111)}{153} = 21.039 \qquad \hat{E}_{22} = \dfrac{r_2c_2}{n} = \dfrac{29(42)}{153} = 7.961$

To determine if the inspector's classifications and the committee's classifications are related, we test:

$H_0$: The inspector's and committee's classifications are independent
$H_a$: The inspector's and committee's classifications are dependent

The test statistic is

$\chi^2 = \sum\sum \dfrac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \dfrac{(101-89.961)^2}{89.961} + \dfrac{(23-34.039)^2}{34.039} + \dfrac{(10-21.039)^2}{21.039} + \dfrac{(19-7.961)^2}{7.961} = 26.034$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1. From Table 8, Appendix B, $\chi^2_{.05} = 3.84146$. The rejection region is $\chi^2 > 3.84146$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 26.034 > 3.84146$), $H_0$ is rejected. There is sufficient evidence to indicate the inspector's and committee's classifications are dependent at $\alpha = .05$. This indicates that the inspector and committee tend not to make the same decisions.

9.30 We wish to test:

$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = 1/7$
$H_a$: At least two of these proportions differ from 1/7

Our statistic is $\chi^2 = \sum_{i=1}^{7} \dfrac{(O_i - e_i)^2}{e_i}$

The observed counts are found by using the table information:
$O_i$ = (number of specimens)(percentage with manganese nodules)

The expected counts are found by $e_i = n_ip_i$.

These results are summarized as follows:

Age                            Observed             Expected
Miocene-recent                 389(.059) = 23       389(1/7) = 55.6
Oligocene                      140(.179) = 25       140(1/7) = 20.0
Eocene                         214(.164) = 35       214(1/7) = 30.6
Paleocene                      84(.214) = 18        84(1/7) = 12.0
Late Cretaceous                247(.211) = 52       247(1/7) = 35.3
Early and Middle Cretaceous    1120(.142) = 159     1120(1/7) = 160.0
Jurassic                       99(.110) = 11        99(1/7) = 14.1
Total                          323

$\chi^2 = \dfrac{(23-55.6)^2}{55.6} + \dfrac{(25-20.0)^2}{20.0} + \cdots + \dfrac{(11-14.1)^2}{14.1} = 32.59$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with k - 1 = 7 - 1 = 6 df. From Table 8, Appendix B, $\chi^2_{.05} = 12.5916$. Reject $H_0$ if $\chi^2 > 12.5916$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 32.59 > 12.5916$), $H_0$ is rejected.

9.32 a. To determine if the percentages of the different types of programming statements differ for the two languages, we test:

$H_0$: The proportions of the different types of programming statements are the same for the two languages
$H_a$: The proportions of the different types of programming statements are different for the two languages

The expected category counts are $\hat{E}(n_{ij}) = \dfrac{n_{i.}\,n_{.j}}{n}$:

$\hat{E}(n_{11}) = \dfrac{2170(10{,}412)}{19{,}882} = 1136.407$

$\hat{E}(n_{12}) = \dfrac{2170(9470)}{19{,}882} = 1033.593$

$\vdots$

$\hat{E}(n_{52}) = \dfrac{726(9470)}{19{,}882} = 345.801$

The observed and expected category counts are (expected counts in parentheses):

              ALGOL                PASCAL               Totals
IF            125 (1136.407)       2,045 (1033.593)     2,170
FOR           968 (690.223)        350 (627.777)        1,318
IO            135 (1037.953)       1,847 (944.047)      1,982
ASSIGNMENT    8,923 (7167.218)     4,763 (6518.782)     13,686
Other         261 (380.199)        465 (345.801)        726
Totals        10,412               9,470                19,882

The test statistic is:

$\chi^2 = \sum\sum \dfrac{[n_{ij} - \hat{E}(n_{ij})]^2}{\hat{E}(n_{ij})} = \dfrac{(125-1136.407)^2}{1136.407} + \dfrac{(2045-1033.593)^2}{1033.593} + \cdots + \dfrac{(465-345.801)^2}{345.801} = 4755.1993$


The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with df = (r - 1)(c - 1) = (5 - 1)(2 - 1) = 4. From Table 8, Appendix B, $\chi^2_{.05} = 9.48773$. The rejection region is $\chi^2 > 9.48773$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 4755.1993 > 9.48773$), $H_0$ is rejected. There is sufficient evidence to indicate the percentages of the different types of programming statements differ for the two languages at $\alpha = .05$.

b. The form of the confidence interval for $(p_A - p_P)$ is:

$(\hat{p}_A - \hat{p}_P) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_A(1-\hat{p}_A)}{n_A} + \dfrac{\hat{p}_P(1-\hat{p}_P)}{n_P}}$

where $\hat{p}_A = \dfrac{X_A}{n_A} = \dfrac{8923}{10{,}412} = .857$ and $\hat{p}_P = \dfrac{X_P}{n_P} = \dfrac{4763}{9470} = .503$

For confidence coefficient .95, $\alpha = 1 - .95 = .05$ and $\alpha/2 = .05/2 = .025$. From Table 5, Appendix B, $z_{.025} = 1.96$. The confidence interval is:

$(.857 - .503) \pm 1.96\sqrt{\dfrac{.857(1-.857)}{10{,}412} + \dfrac{.503(1-.503)}{9470}} \Rightarrow .354 \pm .0121 \Rightarrow (.3419, .3661)$
9.34 a. The form of the contingency tables will all be:

                   Predicted EVG
Defect     No          Yes       Total
FALSE      439 + y     10 - y    449
TRUE       49 - y      y         49
Total      488         10        498

b. The hypergeometric formula for these tables is:

$P(y) = \dfrac{\dbinom{449}{10-y}\dbinom{49}{y}}{\dbinom{498}{10}}$, where y = 0, 1, 2, ..., 10

Because the sample size is large, the factorials in these probabilities are tedious to evaluate by hand. The resulting probabilities are shown below:

y       P(y)
0       0.3514
1       0.3914
2       0.1917
3       0.0544
4       0.0099
5       0.0012
6       0.0001
7       0
8       0
9       0
10      0
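The hypergeometric probabilities in the table are easy to evaluate with exact integer arithmetic. The sketch below uses `math.comb` from the Python standard library (Python 3.8 or later); the function name `hypergeom_pmf` and its parameter names are hypothetical, with defaults taken from the margins of the table in part a.

```python
from math import comb


def hypergeom_pmf(y, n_true=49, n_false=449, n_predicted_yes=10):
    """P(y) = C(449, 10 - y) C(49, y) / C(498, 10) for the table in part a."""
    total = n_true + n_false
    return comb(n_false, n_predicted_yes - y) * comb(n_true, y) / comb(total, n_predicted_yes)


probabilities = [hypergeom_pmf(y) for y in range(11)]
for y, prob in enumerate(probabilities):
    print(y, round(prob, 4))

# Upper-tail probability P(y >= 2)
upper_tail = sum(probabilities[2:])
print(round(upper_tail, 4))
```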

c. The p-value for Fisher's exact test is found by adding the probabilities of all outcomes at least as contradictory to $H_0$ as the one observed. P-value = P(y = 2 or 3 or ... or 10) = 0.2572.

d. We see that these two probabilities are equal.

9.36 a. The form of the confidence interval is:

$\hat{p}_i \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_i(1-\hat{p}_i)}{n}}$

$\hat{p}_1 = .60, \quad \hat{p}_2 = .23, \quad \hat{p}_3 = .17$

For confidence coefficient .95, $\alpha = 1 - .95 = .05$ and $\alpha/2 = .05/2 = .025$. From Table 5, Appendix B, $z_{.025} = 1.96$. The 95% confidence intervals are:

For $p_1$: $.60 \pm 1.96\sqrt{\dfrac{.60(.40)}{1132}} \Rightarrow .60 \pm .029 \Rightarrow (.571, .629)$

For $p_2$: $.23 \pm 1.96\sqrt{\dfrac{.23(.77)}{1132}} \Rightarrow .23 \pm .025 \Rightarrow (.205, .255)$

For $p_3$: $.17 \pm 1.96\sqrt{\dfrac{.17(.83)}{1132}} \Rightarrow .17 \pm .022 \Rightarrow (.148, .192)$

b. We want to test:

$H_0: p_1 = .8,\ p_2 = .1,\ \text{and } p_3 = .1$
$H_a$: At least two proportions are different than specified

The expected counts in each category are:

$E(n_1) = np_1 = 1132(.8) = 905.6$
$E(n_2) = np_2 = 1132(.1) = 113.2$
$E(n_3) = np_3 = 1132(.1) = 113.2$


The observed and expected category counts are:

           Appropriate   Inappropriate   Avoidable
Observed   679           261             192
Expected   905.6         113.2           113.2

The test statistic is:

$\chi^2 = \sum \dfrac{(n_i - np_i)^2}{np_i} = \dfrac{(679-905.6)^2}{905.6} + \dfrac{(261-113.2)^2}{113.2} + \dfrac{(192-113.2)^2}{113.2} = 304.5$

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with df = k - 1 = 3 - 1 = 2. From Table 8, Appendix B, $\chi^2_{.10} = 4.60517$. The rejection region is $\chi^2 > 4.60517$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 304.5 > 4.60517$), $H_0$ is rejected. There is sufficient evidence to indicate at least two proportions are different than specified at $\alpha = .10$.

9.38 The Statistix printout for the analysis appears below:
Chi-Square Test for Heterogeneity or Independence
for count = Year abuse

                                      abuse
Year                     1           2           3           4
               +-----------+-----------+-----------+-----------+
1  Observed    |         7 |         5 |         9 |         8 |      29
   Expected    |      9.61 |      8.22 |      5.74 |      5.43 |
   Cell Chi-Sq |      0.71 |      1.26 |      1.85 |      1.22 |
               +-----------+-----------+-----------+-----------+
2  Observed    |        22 |        18 |         6 |         6 |      52
   Expected    |     17.24 |     14.74 |     10.29 |      9.73 |
   Cell Chi-Sq |      1.31 |      0.72 |      1.79 |      1.43 |
               +-----------+-----------+-----------+-----------+
3  Observed    |        12 |        15 |         6 |        12 |      45
   Expected    |     14.92 |     12.75 |      8.90 |      8.42 |
   Cell Chi-Sq |      0.57 |      0.40 |      0.95 |      1.52 |
               +-----------+-----------+-----------+-----------+
4  Observed    |        21 |        15 |        16 |         9 |      61
   Expected    |     20.22 |     17.29 |     12.07 |     11.42 |
   Cell Chi-Sq |      0.03 |      0.30 |      1.28 |      0.51 |
               +-----------+-----------+-----------+-----------+
                        62          53          37          35      187

Overall Chi-Square    15.86
P-Value               0.0699
Degrees of Freedom    9

Cases Included 16    Missing Cases 0

To determine if the proportions of different types of abuse are changing over time, we test:

$H_0$: Types of abuse and year are independent
$H_a$: Types of abuse and year are dependent

The expected category counts are shown in the printout.

The test statistic is $\chi^2 = \sum\sum \dfrac{[n_{ij} - \hat{E}(n_{ij})]^2}{\hat{E}(n_{ij})} = 15.86$ from the printout.

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with df = (r - 1)(c - 1) = (4 - 1)(4 - 1) = 9. From Table 8, Appendix B, $\chi^2_{.05} = 16.9190$. The rejection region is $\chi^2 > 16.9190$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 15.859 \not> 16.9190$), $H_0$ is not rejected. There is insufficient evidence to indicate the proportions of different types of abuse are changing over time at $\alpha = .05$.

9.40 a. To determine if pesticide depends on orchard type, we test:

$H_0$: Pesticide and orchard type are independent
$H_a$: Pesticide and orchard type are dependent

The test statistic is $\chi^2 = 31000.416$ (from printout). The p-value for the test is p = .000. At $\alpha = .01$, $\alpha$ > p-value, and we reject $H_0$. There is sufficient evidence to indicate that pesticide used and orchard type are dependent. PHStat was used to conduct the desired analysis and the following printout was created:

Observed Frequencies
                             Column variable
Row variable    Almonds        Peaches        Nectarines     Total
Chlor.          41077          4419           11594          57090
Diazinon        102935         9651           5928           118514
Methid.         21240          5198           1790           28228
Parathion       136064         53384          24417          213865
Total           301316         72652          43729          417697

Expected Frequencies
                             Column variable
Row variable    Almonds        Peaches        Nectarines     Total
Chlor.          41183.27505    9929.931697    5976.79325     57090
Diazinon        85492.98756    20613.69636    12407.31608    118514
Methid.         20362.96178    4909.82855     2955.209666    28228
Parathion       154276.7756    37198.54339    22389.681      213865
Total           301316         72652          43729          417697

Data
Level of Significance        0.01
Number of Rows               4
Number of Columns            3
Degrees of Freedom           6

Results
Critical Value               16.8118718
Chi-Square Test Statistic    31000.41584
p-Value                      0

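b. We will calculate 95% confidence intervals for the rate of parathion application for the three orchard types.

Almonds: $\hat{p} = \dfrac{136{,}064}{301{,}316} = .45$

$\hat{p} \pm z_{.025}\sqrt{\dfrac{\hat{p}\hat{q}}{n}} \Rightarrow .45 \pm 1.96\sqrt{\dfrac{.45(.55)}{301{,}316}} \Rightarrow .45 \pm .002$

Nectarines: $\hat{p} = \dfrac{24{,}417}{43{,}729} = .56$

$\hat{p} \pm z_{.025}\sqrt{\dfrac{\hat{p}\hat{q}}{n}} \Rightarrow .56 \pm 1.96\sqrt{\dfrac{.56(.44)}{43{,}729}} \Rightarrow .56 \pm .005$

Peaches: $\hat{p} = \dfrac{53{,}384}{72{,}652} = .73$

$\hat{p} \pm z_{.025}\sqrt{\dfrac{\hat{p}\hat{q}}{n}} \Rightarrow .73 \pm 1.96\sqrt{\dfrac{.73(.27)}{72{,}652}} \Rightarrow .73 \pm .003$

9.42 a. Test

$H_0: p_1 = p_2 = .5$
$H_a: p_1 \neq p_2$

The test statistic is:

$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \dfrac{(O_{ij} - e_{ij})^2}{e_{ij}}$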


The expected cell counts are $e_{ij} = \dfrac{r_ic_j}{n}$:

$\hat{E}(n_{11}) = \dfrac{r_1c_1}{n} = \dfrac{17(81)}{144} = 9.56 \qquad \hat{E}(n_{12}) = \dfrac{r_1c_2}{n} = \dfrac{17(63)}{144} = 7.44$

$\hat{E}(n_{21}) = \dfrac{r_2c_1}{n} = \dfrac{127(81)}{144} = 71.44 \qquad \hat{E}(n_{22}) = \dfrac{r_2c_2}{n} = \dfrac{127(63)}{144} = 55.56$

$\chi^2 = \dfrac{(5-9.56)^2}{9.56} + \dfrac{(12-7.44)^2}{7.44} + \dfrac{(76-71.44)^2}{71.44} + \dfrac{(51-55.56)^2}{55.56} = 5.635$

The rejection region requires $\alpha = .01$ in the upper tail of the $\chi^2$ distribution with df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1. From Table 8, Appendix B, $\chi^2_{.01} = 6.63490$. Reject $H_0$ if $\chi^2 > 6.63490$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 5.635 \not> 6.63490$), $H_0$ cannot be rejected. There is not sufficient evidence to detect a difference in proportions at $\alpha = .01$.

b. Let p1 = proportion of males with scar tissue in snout and p2 = proportion of females with scar tissue in snout. The form of the confidence interval is:

$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}$

where $\hat{p}_1 = \dfrac{5}{81} = .062$ and $\hat{p}_2 = \dfrac{12}{63} = .190$

For confidence coefficient .99, $\alpha = 1 - .99 = .01$ and $\alpha/2 = .01/2 = .005$. From Table 5, Appendix B, $z_{.005} = 2.576$. The 99% confidence interval is:

$(.062 - .190) \pm 2.576\sqrt{\dfrac{.062(.938)}{81} + \dfrac{.190(.810)}{63}} \Rightarrow -.128 \pm .145 \Rightarrow (-.273, .017)$

We are 99% confident the true difference in the proportions of males and females with scar tissue in snout is contained in the interval -.273 to .017.

c. Fisher's exact test gives a p-value of p = 0.0173. When testing at $\alpha = .01$, $H_0$ cannot be rejected. There is insufficient evidence to detect a difference in proportions, which agrees with our conclusion in part a.

9.44 The Statistix printout for the analysis is shown below:

Chi-Square Test for Heterogeneity or Independence
for count = Technology Group

                                      Group
Technology               1           2           3           4
               +-----------+-----------+-----------+-----------+
1  Observed    |        21 |        42 |        11 |        25 |      99
   Expected    |     24.75 |     24.75 |     24.75 |     24.75 |
   Cell Chi-Sq |      0.57 |     12.02 |      7.64 |      0.00 |
               +-----------+-----------+-----------+-----------+
2  Observed    |        18 |         2 |        16 |        13 |      49
   Expected    |     12.25 |     12.25 |     12.25 |     12.25 |
   Cell Chi-Sq |      2.70 |      8.58 |      1.15 |      0.05 |
               +-----------+-----------+-----------+-----------+
3  Observed    |        11 |         6 |        23 |        12 |      52
   Expected    |     13.00 |     13.00 |     13.00 |     13.00 |
   Cell Chi-Sq |      0.31 |      3.77 |      7.69 |      0.08 |
               +-----------+-----------+-----------+-----------+
                        50          50          50          50      200

Overall Chi-Square    44.548
P-Value               0.0000
Degrees of Freedom    6

Cases Included 12    Missing Cases 0

a. To determine if public opinion regarding the choice of future technology options for generating electricity differs among the four groups, we test:

$H_0$: Choice and group are independent
$H_a$: Choice and group are dependent

The test statistic is $\chi^2 = 44.548$.

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with df = (r - 1)(c - 1) = (3 - 1)(4 - 1) = 6. From Table 8, Appendix B, $\chi^2_{.10} = 10.6446$. The rejection region is $\chi^2 > 10.6446$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 44.548 > 10.6446$), $H_0$ is rejected. There is sufficient evidence to indicate that public opinion does differ among the four groups at $\alpha = .10$.

b. Let p1 = proportion supporting the coal option and p2 = proportion supporting the nuclear option.


To determine if the proportion supporting the coal option exceeds the proportion supporting the nuclear option, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 > 0$

$\hat{p}_1 = \dfrac{99}{200} = .495 \qquad \hat{p}_2 = \dfrac{49}{200} = .245 \qquad \hat{p} = \dfrac{99 + 49}{200 + 200} = .37$

The rejection region requires $\alpha = .10$ in the upper tail of the z distribution. From Table 5, Appendix B, $z_{.10} = 1.282$. The rejection region is z > 1.282.

The test statistic is:

$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{\dfrac{\hat{p}(1-\hat{p}) + \hat{p}(1-\hat{p}) + 2\hat{p}^2}{n}}} = \dfrac{(.495 - .245) - 0}{\sqrt{\dfrac{.37(.63) + .37(.63) + 2(.37)^2}{200}}} = 4.11$

Since the observed value of the test statistic falls in the rejection region (z = 4.11 > 1.282), $H_0$ is rejected. There is sufficient evidence to indicate the proportion supporting coal exceeds the proportion supporting nuclear at $\alpha = .10$.

c. The form of the confidence interval is:

$\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{p} = \dfrac{16}{50} = .32$

For confidence coefficient .90, $\alpha = 1 - .90 = .10$ and $\alpha/2 = .10/2 = .05$. From Table 5, Appendix B, $z_{.05} = 1.645$. The 90% confidence interval is:

$.32 \pm 1.645\sqrt{\dfrac{.32(1-.32)}{50}} \Rightarrow .32 \pm .109 \Rightarrow (.211, .429)$

9.46 The data were tested using Fisher's exact test and the results are shown below:

Two by Two Tables

        +----------+----------+
        |    10    |     6    |    16
        +----------+----------+
        |    12    |     2    |    14
        +----------+----------+
             22          8         30

Fisher Exact Tests:
Lower Tail    0.1541
Upper Tail    0.0715
Two Tailed    0.2255


To determine if fidelity and selectivity are dependent, we test:

$H_0$: Fidelity and Selectivity are independent
$H_a$: Fidelity and Selectivity are dependent

The p-value for the test is 0.2255. When testing at $\alpha = .05$, $H_0$ cannot be rejected. There is insufficient evidence to indicate that fidelity and selectivity are dependent when testing at $\alpha = .05$.

9.48 Some preliminary calculations are:

$e_1 = e_2 = e_3 = e_4 = e_5 = e_6 = e_7 = e_8 = np_i = 714(.125) = 89.25$

a. To determine if the probabilities of worker accidents are higher for some time periods, we test:

$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = p_8 = .125$
$H_a$: At least two of the cell probabilities differ from each other

The test statistic is:

$\chi^2 = \sum \dfrac{(O_i - e_i)^2}{e_i} = \dfrac{(93-89.25)^2}{89.25} + \dfrac{(71-89.25)^2}{89.25} + \dfrac{(79-89.25)^2}{89.25} + \cdots + \dfrac{(110-89.25)^2}{89.25} = 15.905$

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with df = k - 1 = 8 - 1 = 7. From Table 8, Appendix B, $\chi^2_{.10} = 12.0170$. The rejection region is $\chi^2 > 12.0170$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 15.905 > 12.0170$), $H_0$ is rejected. There is sufficient evidence to indicate the probabilities of worker accidents are higher in some time periods at $\alpha = .10$.

b. $\hat{p}_1 = \dfrac{98 + 89 + 102 + 110}{714} = \dfrac{399}{714} = .5588$

$H_0: p_1 = .5$
$H_a: p_1 > .5$


The test statistic is $z = \dfrac{\hat{p}_1 - p_{10}}{\sqrt{\dfrac{p_{10}\,q_{10}}{n}}} = \dfrac{.5588 - .5}{\sqrt{\dfrac{.5(.5)}{714}}} = 3.14$

The rejection region requires $\alpha = .10$ in the upper tail of the z distribution. From Table 5, Appendix B, $z_{.10} = 1.28$. The rejection region is z > 1.28.

Since the observed value of the test statistic falls in the rejection region (z = 3.14 > 1.28), $H_0$ is rejected. There is sufficient evidence to indicate the probability of an accident during the last 4 hours of a shift is greater than during the first 4 hours at $\alpha = .10$.
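The large-sample z statistic for a single proportion, used here and in Exercise 9.14b, can be verified with a few lines of code. This is a minimal sketch; the function name `one_proportion_z` is hypothetical, and the critical value is typed in from Table 5.

```python
import math


def one_proportion_z(count, n, p0):
    """Large-sample z statistic for H0: p = p0."""
    p_hat = count / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


# Exercise 9.48b: 399 of 714 accidents in the last 4 hours, H0: p = .5
z = one_proportion_z(399, 714, 0.5)
critical_value = 1.28  # z_.10 from Table 5
print(round(z, 2), z > critical_value)
```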
