

Chapter 9 Inferences for Count Data



Section 9.3 Inferences for One-way Count Data

Multinomial Distribution
- n independent trials
- each trial results in one of c possible outcomes ($c \ge 2$)
- probability that a trial will result in outcome i is $p_i$ (Note: $\sum_{i=1}^{c} p_i = 1$)
- random variables of interest are $N_i$ = # of trials with outcome i. (Note: $\sum_{i=1}^{c} N_i = n$)
- the corresponding distribution of these random variables is

$$P(N_1 = n_1, N_2 = n_2, \ldots, N_c = n_c) = \frac{n!}{n_1!\, n_2! \cdots n_c!}\, p_1^{n_1} p_2^{n_2} \cdots p_c^{n_c}$$

- expected number of outcomes of type i in n trials is $E(N_i) = n p_i$
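
A minimal Python sketch of the multinomial probability and the expected counts described above, using only the standard library. The function names and the numbers in the illustration (n = 4 trials, probabilities 0.5, 0.3, 0.2) are assumptions made for this example and do not come from the notes.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(N_1 = n_1, ..., N_c = n_c) for n = sum(counts) independent trials."""
    n = sum(counts)
    coef = factorial(n)
    for k in counts:
        coef //= factorial(k)  # each partial quotient is an exact integer
    return coef * prod(p ** k for p, k in zip(probs, counts))


def expected_counts(n, probs):
    """E(N_i) = n * p_i for every outcome i."""
    return [n * p for p in probs]


# Hypothetical illustration: n = 4 trials, c = 3 outcomes.
probs = [0.5, 0.3, 0.2]
pmf_value = multinomial_pmf([2, 1, 1], probs)  # 12 * 0.5^2 * 0.3 * 0.2
expected = expected_counts(4, probs)
```

Here the multinomial coefficient is 4!/(2! 1! 1!) = 12, so the probability is 12 × 0.015 = 0.18.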




Test for Multinomial Distribution (Chi-square Test)
Outline of test:

1. Specify the null and alternative hypotheses


$H_0: p_1 = p_{10},\ p_2 = p_{20},\ \ldots,\ p_c = p_{c0}$
$H_a$: at least one $p_i \ne p_{i0}$

2. Test statistic

$$\chi^2 = \sum_{i=1}^{c} \frac{(n_i - E_i)^2}{E_i}$$

where $n_i$ = observed # of trials with outcome i, and $E_i = n p_{i0}$ (i.e., expected # of trials with outcome i under $H_0$).


3. Rejection rule:
Reject $H_0$ if test statistic $\chi^2 > \chi^2_{c-1,\alpha}$, or reject $H_0$ if p-value < $\alpha$

where $\chi^2_{c-1,\alpha}$ is the upper $100\alpha$ percentile (upper $\alpha$ quantile) of the $\chi^2$ distribution with $c-1$ degrees of freedom.


Note:
- Sample size must be large for the test to be valid.
- How large? Rule of thumb: every $E_i \ge 1$, and no more than 1/5th of all $E_i$'s are < 5.
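
The three steps and the rule of thumb above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, the p-value comes from a closed-form upper-tail formula for integer degrees of freedom (standard library only), and the die-roll counts are made-up data.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square_df > x) for integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # df = 2 starting point
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # df = 1 starting point
    while a < df / 2.0:                       # step up two df at a time
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def multinomial_chi2_test(observed, null_probs, alpha=0.05):
    n = sum(observed)
    expected = [n * p for p in null_probs]                # E_i = n * p_i0
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                                # c - 1
    p_value = chi2_sf(stat, df)
    small = sum(1 for e in expected if e < 5)
    conditions_ok = min(expected) >= 1 and small <= len(expected) / 5
    return {"statistic": stat, "df": df, "p_value": p_value,
            "reject": p_value < alpha, "conditions_ok": conditions_ok}


# Hypothetical data: 60 rolls of a die, H0: all six faces equally likely.
result = multinomial_chi2_test([8, 12, 9, 11, 10, 10], [1 / 6] * 6)
```

Rejecting when the p-value is below $\alpha$ is equivalent to comparing the statistic with the tabulated critical value, so either form of the rejection rule can be read off the returned dictionary.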



Example:
A clinical trial for a new drug is conducted on a random sample of 200 patients suffering from high blood pressure. The classification of the outcomes of the trial is as follows:

1 : patient's blood pressure decreases substantially
2 : patient's blood pressure decreases moderately
3 : patient's blood pressure decreases slightly
4 : patient's blood pressure remains the same or increases

The data are as follows:

Category Observed Counts
1 120
2 60
3 10
4 10

Standard treatment of blood pressure gives the following outcomes for any large clinical trial:

Category Percentage
1 50%
2 25%
3 10%
4 15%

Test the hypothesis that the new drug is no different from the standard treatment in terms of reducing blood pressure. Use $\alpha = 0.05$.
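
One way to carry out the computation for this example is sketched below in Python. The variable names are illustrative, and the critical value 7.815 is the standard table value for the upper 5% point of the chi-square distribution with 3 degrees of freedom; it is typed in rather than computed.

```python
# Observed counts and the standard-treatment proportions from the example.
observed = [120, 60, 10, 10]
null_probs = [0.50, 0.25, 0.10, 0.15]

n = sum(observed)                                  # 200 patients
expected = [n * p for p in null_probs]             # about 100, 50, 20, 30

# Chi-square statistic: sum of (n_i - E_i)^2 / E_i over the 4 categories.
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1                             # c - 1 = 3
critical = 7.815                                   # table value, alpha = 0.05, 3 df
reject = stat > critical

print(f"chi-square = {stat:.2f}, df = {df}, reject H0: {reject}")
```

The four terms are 4 + 2 + 5 + 13.33 = 24.33, which exceeds 7.815, so $H_0$ is rejected: there is enough statistical evidence that the outcome distribution under the new drug differs from that of the standard treatment. All expected counts are at least 20, so the sample-size rule of thumb is satisfied.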




























Chi-Square Goodness-of-Fit Test
The goal is to test whether a specified distribution or model fits a data set.

Basic idea:
- Partition all possible values of the distribution into c categories.
- Treat each category as a possible outcome of a multinomial distribution.


Outline of test:

1. Specify the null and alternative hypotheses

$H_0$: The specified distribution or model fits the data
(i.e., $p_1 = p_{10},\ p_2 = p_{20},\ \ldots,\ p_c = p_{c0}$)
$H_a$: The specified distribution or model does not fit the data
(i.e., at least one $p_i \ne p_{i0}$)
where $p_{i0}$ is the probability that a data point falls in the ith category assuming the data came from the specified distribution (i.e., if $H_0$ is true).


2. Test statistic

$$\chi^2 = \sum_{i=1}^{c} \frac{(n_i - E_i)^2}{E_i}$$

where $n_i$ = observed # of data points in the ith category, $E_i = n p_{i0}$ (i.e., expected # of data points in the ith category under $H_0$), n is the size of the data set, and $p_{i0}$ is defined in step 1.


3. Rejection rule:
Reject $H_0$ if test statistic $\chi^2 > \chi^2_{c-k-1,\alpha}$

where $\chi^2_{c-k-1,\alpha}$ is the upper $100\alpha$ percentile (upper $\alpha$ quantile) of the $\chi^2$ distribution with $c-k-1$ degrees of freedom, and k is the number of parameters of the specified distribution that must be estimated from the data.

Alternative rejection rule:
Reject $H_0$ if p-value < $\alpha$.


Note: For the test to be valid, the number of categories should be as large as possible, as long as
+ $E_i = n p_{i0} \ge 1$ for every category i.
+ At least 4/5th of the $E_i$'s are $\ge 5$.
If these two conditions are not satisfied, combine or pool two or more categories so that the new
categories satisfy the conditions.
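
A sketch of the pooling step and the degrees-of-freedom calculation, in Python. Merging categories from the right-hand tail is only one possible pooling strategy and is an assumption of this example, as are the function names and the made-up observed and expected counts.

```python
def conditions_ok(expected):
    """Every E_i >= 1 and at least 4/5 of the E_i's are >= 5."""
    small = sum(1 for e in expected if e < 5)
    return min(expected) >= 1 and small <= len(expected) / 5


def pool_right_tail(observed, expected):
    """Merge the last two categories until the conditions hold."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 2 and not conditions_ok(exp):
        obs[-2:] = [obs[-2] + obs[-1]]
        exp[-2:] = [exp[-2] + exp[-1]]
    return obs, exp


def gof_statistic(observed, expected, k_estimated):
    """Chi-square statistic and its c - k - 1 degrees of freedom."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - k_estimated - 1
    return stat, df


# Hypothetical counts for six categories; one parameter estimated (k = 1).
obs_pooled, exp_pooled = pool_right_tail(
    [30, 40, 20, 6, 3, 1], [28.0, 42.0, 21.0, 6.5, 2.0, 0.5]
)
stat, df = gof_statistic(obs_pooled, exp_pooled, k_estimated=1)
```

In this illustration the last category has an expected count of 0.5, so it is merged with its neighbour, leaving five categories and 5 - 1 - 1 = 3 degrees of freedom.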








Example: (Prob. 9.21 from text) Bomb hits on London during WWII
During WWII, a 144 km² area of South London was divided into 576 small squares of 0.25 km² each to record bomb hits. The data are as follows:

# of hits 0 1 2 3 4 5 6 7
# of squares 229 211 93 35 7 0 0 1

Test whether a Poisson model would fit the data at $\alpha = 0.05$.
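
A possible way to work this example in Python is sketched below. It assumes the Poisson mean is estimated by the sample mean number of hits per square (so k = 1) and that the categories 5, 6 and 7 are pooled into "5 or more" so that every expected count is at least 1. The critical value 9.488 is the standard table value for 4 degrees of freedom at the 5% level; variable names are illustrative.

```python
import math

hits = [0, 1, 2, 3, 4, 5, 6, 7]
squares = [229, 211, 93, 35, 7, 0, 0, 1]

n = sum(squares)                                       # 576 squares
lam = sum(h * s for h, s in zip(hits, squares)) / n    # sample mean, 537/576

# Categories: 0, 1, 2, 3, 4 and "5 or more" (pooled tail).
observed = squares[:5] + [sum(squares[5:])]
probs = [math.exp(-lam) * lam ** h / math.factorial(h) for h in range(5)]
probs.append(1.0 - sum(probs))                         # P(5 or more hits)
expected = [n * p for p in probs]

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1                             # c - k - 1 = 6 - 1 - 1
critical = 9.488                                       # table value, alpha = 0.05, 4 df
reject = stat > critical

print(f"lambda = {lam:.4f}, chi-square = {stat:.2f}, df = {df}, reject H0: {reject}")
```

With these choices the statistic is about 1.17 on 4 degrees of freedom, well below 9.488, so $H_0$ is not rejected: the data are consistent with a Poisson model. Only one of the six expected counts (the pooled tail, about 1.6) falls below 5, which satisfies the 4/5 rule.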
















































Section 9.4 Inferences for 2-Way Count Data

2-way count data
- For each observational unit in the sample, measurements are made on two characteristics/variables.
- The data are presented in a two-dimensional array or table (see below).
- Example: a survey in which each person is classified according to his/her race and his/her opinion
about a certain political issue.




A 2-Way Table of Observed Counts

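                                   Column (Variable Y)
Row (Variable X)    1               2               ...   j               ...   c               Row total
1                   $n_{11}$        $n_{12}$        ...   $n_{1j}$        ...   $n_{1c}$        $n_{1\cdot}$
2                   $n_{21}$        $n_{22}$        ...   $n_{2j}$        ...   $n_{2c}$        $n_{2\cdot}$
...
i                   $n_{i1}$        $n_{i2}$        ...   $n_{ij}$        ...   $n_{ic}$        $n_{i\cdot}$
...
r                   $n_{r1}$        $n_{r2}$        ...   $n_{rj}$        ...   $n_{rc}$        $n_{r\cdot}$
Col total           $n_{\cdot 1}$   $n_{\cdot 2}$   ...   $n_{\cdot j}$   ...   $n_{\cdot c}$   $n_{\cdot\cdot} = n$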


Notation:
+ $n_{i\cdot} = \sum_{j} n_{ij}$, $n_{\cdot j} = \sum_{i} n_{ij}$, $n = n_{\cdot\cdot} = \sum_{i}\sum_{j} n_{ij}$.
+ r = # of possible values of variable X.
+ c = # of possible values of variable Y.







Independence between two discrete random variables X and Y

Let the possible values of X be $\{1, 2, \ldots, r\}$ and the possible values of Y be $\{1, 2, \ldots, c\}$.
If X and Y are independent, then by definition for all $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, c$,

$$P(X = i, Y = j) = P(X = i)\, P(Y = j)$$


Notation:
+ $p_{ij} = P(X = i, Y = j)$, i.e., the probability that an observation will fall in the i-j cell in the 2-way table.
+ $p_{i\cdot} = P(X = i) = \sum_{j} p_{ij}$, i.e., the probability that an observation will be in the ith row in the 2-way table.
+ $p_{\cdot j} = P(Y = j) = \sum_{i} p_{ij}$, i.e., the probability that an observation will be in the jth column in the 2-way table.


With this notation, the condition of independence between X and Y becomes

$$p_{ij} = p_{i\cdot}\, p_{\cdot j}$$

for all i and j.
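
The independence condition can be checked numerically for a known joint probability table, as in the Python sketch below. The function name, the tolerance, and both probability tables are illustrative assumptions; in practice the true probabilities are unknown, which is why a test is needed.

```python
import math


def is_independent(joint, tol=1e-9):
    """True if p_ij equals p_i. * p_.j in every cell of the joint table."""
    row = [sum(r) for r in joint]           # marginal probabilities p_i.
    col = [sum(c) for c in zip(*joint)]     # marginal probabilities p_.j
    return all(
        math.isclose(joint[i][j], row[i] * col[j], abs_tol=tol)
        for i in range(len(row))
        for j in range(len(col))
    )


# Hypothetical table built as a product of marginals, so X and Y are independent.
p_x = [0.4, 0.6]
p_y = [0.5, 0.3, 0.2]
independent_table = [[px * py for py in p_y] for px in p_x]

# Hypothetical table where X determines Y, so the condition fails.
dependent_table = [[0.5, 0.0], [0.0, 0.5]]
```

For the second table, $p_{11} = 0.5$ while $p_{1\cdot}\, p_{\cdot 1} = 0.25$, so the condition is violated.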





Test of Independence between Two Discrete Variables X and Y

Outline of test:

1. Specify the null and alternative hypotheses

$H_0$: The two variables X and Y are independent
(i.e., $p_{ij} = p_{i\cdot}\, p_{\cdot j}$ for all i and j)
$H_a$: The two variables X and Y are not independent
(i.e., at least one $p_{ij} \ne p_{i\cdot}\, p_{\cdot j}$)


2. Test statistic

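$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}}$$

where $n_{ij}$ = observed # of data points in the i-j cell,

$$\hat{E}_{ij} = n\, \hat{p}_{ij} = n\, \hat{p}_{i\cdot}\, \hat{p}_{\cdot j} = \frac{n_{i\cdot}\, n_{\cdot j}}{n} = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{n}$$

(i.e., estimated expected # of counts in the i-j cell)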



3. Rejection rule:
Reject $H_0$ if test statistic $\chi^2 > \chi^2_{(r-1)(c-1),\alpha}$, or reject $H_0$ if p-value < $\alpha$




Note: The conditions for using the test:
- $\hat{E}_{ij} \ge 1$ for all i and j.
- Not more than 1/5th of all $\hat{E}_{ij}$'s are less than 5.
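
The test outline and its conditions can be sketched as a single Python function. The function name and the 2 × 2 table of counts are hypothetical, and the critical value 3.841 is the standard table value for 1 degree of freedom at the 5% level.

```python
def independence_test_statistic(table):
    """Chi-square statistic, df, estimated expected counts, and condition check."""
    row_totals = [sum(row) for row in table]              # n_i.
    col_totals = [sum(col) for col in zip(*table)]        # n_.j
    n = sum(row_totals)

    # Estimated expected count: (row total)(column total) / n.
    expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]

    stat = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(len(row_totals))
        for j in range(len(col_totals))
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)    # (r - 1)(c - 1)

    flat = [e for row in expected for e in row]
    small = sum(1 for e in flat if e < 5)
    conditions_ok = min(flat) >= 1 and small <= len(flat) / 5
    return stat, df, expected, conditions_ok


# Hypothetical 2 x 2 table of observed counts.
stat, df, expected, ok = independence_test_statistic([[20, 30], [30, 20]])
reject = stat > 3.841        # table value, alpha = 0.05, 1 df
```

Every expected count in this illustration is 25, so each cell contributes 25/25 = 1 and the statistic is 4.0 on 1 degree of freedom.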




Example:

A random survey of 200 employees from a number of educational institutions is taken, and each employee is asked about his/her opinion about collective bargaining by the teachers' union. One of the main objectives of the survey is to determine if the opinion about collective bargaining depends on the employee's profession.
The data are as follows:

                               Opinion on Collective Bargaining (Y)
Employee Classification (X)    Favor    Do not favor    Undecided    Totals
Staff                          30       15              15           60
Faculty                        40       50              10           100
Administrators                 10       25              5            40
Totals                         80       90              30           200

Are the two variables independent? Check using $\alpha = 0.05$.
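
One way to work through this example is sketched below. The estimated expected counts use (row total)(column total)/n with n = 200, and 9.488 is the standard table value for the upper 5% point of the chi-square distribution with (3 - 1)(3 - 1) = 4 degrees of freedom.

```latex
\hat{E} =
\begin{pmatrix}
24 & 27 & 9 \\
40 & 45 & 15 \\
16 & 18 & 6
\end{pmatrix}
\qquad \text{(rows: Staff, Faculty, Administrators)}

\begin{aligned}
\chi^2 &= \frac{(30-24)^2}{24} + \frac{(15-27)^2}{27} + \frac{(15-9)^2}{9}
        + \frac{(40-40)^2}{40} + \frac{(50-45)^2}{45} + \frac{(10-15)^2}{15} \\
       &\quad + \frac{(10-16)^2}{16} + \frac{(25-18)^2}{18} + \frac{(5-6)^2}{6} \\
       &= 1.50 + 5.33 + 4.00 + 0 + 0.56 + 1.67 + 2.25 + 2.72 + 0.17 \\
       &= \frac{655}{36} \approx 18.19
\end{aligned}

\chi^2 \approx 18.19 > \chi^2_{4,\,0.05} = 9.488
```

All estimated expected counts are at least 6, so the conditions for the test hold. Since the statistic exceeds the critical value, $H_0$ is rejected: there is enough statistical evidence that opinion on collective bargaining is not independent of employee classification.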






















Wording of the conclusion of a hypothesis test:

If $H_0$ is rejected, say "There is enough statistical evidence to support (the statement of) $H_a$."

If $H_0$ is not rejected, say either "There is not enough statistical evidence to support (the statement of) $H_a$." or "The data are consistent with (the statement of) $H_0$."
