$\sum_{i=1}^{c} p_i = 1$.)
- random variables of interest are $N_i$ = # of trials with outcome $i$, $i = 1, \dots, c$. (Note: $\sum_{i=1}^{c} N_i = n$.)
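- the corresponding distribution of these random variables is
$$P(N_1 = n_1, N_2 = n_2, \dots, N_c = n_c) = \frac{n!}{n_1!\, n_2! \cdots n_c!}\, p_1^{n_1} p_2^{n_2} \cdots p_c^{n_c}$$
for nonnegative integers $n_1, \dots, n_c$ with $n_1 + n_2 + \cdots + n_c = n$.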
- expected number of outcomes of type $i$ in $n$ trials is $E(N_i) = np_i$.
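The pmf and the expected counts above can be checked numerically. The following is a minimal Python sketch; `multinomial_pmf` and `expected_counts` are hypothetical helper names, not functions from the text or from any library.

```python
import math
from typing import Sequence


def multinomial_pmf(counts: Sequence[int], probs: Sequence[float]) -> float:
    """P(N_1 = n_1, ..., N_c = n_c) for a multinomial with n = sum(counts) trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    # Multinomial coefficient n! / (n_1! n_2! ... n_c!), computed exactly with integers
    coef = math.factorial(n)
    for k in counts:
        coef //= math.factorial(k)
    # Product p_1^{n_1} p_2^{n_2} ... p_c^{n_c}
    prob = 1.0
    for k, p in zip(counts, probs):
        prob *= p ** k
    return coef * prob


def expected_counts(n: int, probs: Sequence[float]) -> list[float]:
    """E(N_i) = n * p_i for each outcome i."""
    return [n * p for p in probs]


if __name__ == "__main__":
    print(multinomial_pmf([2, 1, 1], [0.5, 0.25, 0.25]))
    print(expected_counts(4, [0.5, 0.25, 0.25]))
```

For 4 trials with $p = (0.5, 0.25, 0.25)$, the counts $(2, 1, 1)$ have probability $\frac{4!}{2!\,1!\,1!}(0.5)^2(0.25)(0.25) = 12 \times 0.015625 = 0.1875$, and the expected counts are $(2, 1, 1)$.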
Test for Multinomial Distribution (Chi-square Test)
Outline of test:
1. Specify the null and alternative hypotheses
$H_0$: $p_1 = p_{10}, p_2 = p_{20}, \dots, p_c = p_{c0}$
$H_a$: $p_i \neq p_{i0}$ for at least one $i$
2. Test statistic
$$\chi^2 = \sum_{i=1}^{c} \frac{(n_i - E_i)^2}{E_i}$$
where $n_i$ = observed # of trials with outcome $i$, and $E_i = np_{i0}$ (i.e., the expected # of trials with outcome $i$ under $H_0$).
3. Rejection rule:
Reject $H_0$ if the test statistic $\chi^2 > \chi^2_{c-1,\alpha}$, or reject $H_0$ if p-value $< \alpha$,
where $\chi^2_{c-1,\alpha}$ is the upper $100\alpha$ percentile (upper $\alpha$ quantile) of the $\chi^2$ distribution with $c-1$ degrees of freedom.
Note:
- Sample size must be large for the test to be valid.
- How large? Rule of thumb: every $E_i \ge 1$, and no more than 1/5th of all $E_i$'s are $< 5$.
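The three steps can be collected into a short function. This is a sketch under our own assumptions: it uses only the Python standard library, so the $\chi^2$ upper-tail probability comes from a hypothetical helper `chi2_sf` (valid for integer degrees of freedom) rather than a statistics package, and the names `multinomial_chisq_test` and `large_sample_ok` are ours.

```python
import math
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    # Closed forms for df = 1 and df = 2, then step up by 2 using
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    k = 2 - df % 2
    q = math.erfc(math.sqrt(x / 2)) if k == 1 else math.exp(-x / 2)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def multinomial_chisq_test(observed: Sequence[int], p0: Sequence[float],
                           alpha: float = 0.05) -> dict:
    """Chi-square test of H0: p_i = p_i0 for all i (no parameters estimated)."""
    if abs(sum(p0) - 1.0) > 1e-9:
        raise ValueError("null probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in p0]                    # E_i = n * p_i0
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                            # c - 1
    p_value = chi2_sf(stat, df)
    # Rule of thumb: every E_i >= 1 and at most 1/5 of the E_i are < 5
    ok = min(expected) >= 1 and sum(e < 5 for e in expected) <= len(expected) / 5
    return {"stat": stat, "df": df, "p_value": p_value,
            "reject": p_value < alpha, "large_sample_ok": ok}
```

For example, `multinomial_chisq_test([120, 60, 10, 10], [0.50, 0.25, 0.10, 0.15])` applies the test to the clinical-trial data in the example that follows; the statistic is $73/3 \approx 24.33$ with 3 degrees of freedom. Rejecting when the p-value is below $\alpha$ is equivalent to comparing the statistic with $\chi^2_{c-1,\alpha}$, so no table lookup is needed.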
Example:
A clinical trial for a new drug is conducted on a random sample of 200 patients suffering from high blood pressure. The outcomes of the trial are classified as follows:
1 : patient's blood pressure decreases substantially
2 : patient's blood pressure decreases moderately
3 : patient's blood pressure decreases slightly
4 : patient's blood pressure remains the same or increases
The data are as follows:
Category Observed Counts
1 120
2 60
3 10
4 10
The standard treatment for high blood pressure gives the following outcome percentages in any large clinical trial:
Category Percentage
1 50%
2 25%
3 10%
4 15%
Test the hypothesis that the new drug is no different from the standard treatment in terms of reducing blood pressure. Use $\alpha = 0.05$.
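Plugging the data into the test outline gives the following worked computation (our own arithmetic, not a solution reproduced from the text). The null probabilities are the standard-treatment percentages, and 7.815 is the standard table value of the upper 5% point for 3 degrees of freedom.

```latex
\begin{aligned}
E_i &= n\,p_{i0} = 200\,p_{i0}: \quad E_1 = 100,\; E_2 = 50,\; E_3 = 20,\; E_4 = 30,\\
\chi^2 &= \frac{(120-100)^2}{100} + \frac{(60-50)^2}{50} + \frac{(10-20)^2}{20} + \frac{(10-30)^2}{30}\\
       &= 4 + 2 + 5 + 13.33 = 24.33,\\
\chi^2_{c-1,\alpha} &= \chi^2_{3,\,0.05} = 7.815 < 24.33 \quad\Longrightarrow\quad \text{reject } H_0.
\end{aligned}
```

Every $E_i \ge 5$, so the large-sample rule of thumb holds, and at $\alpha = 0.05$ the data indicate that the outcome distribution under the new drug differs from that of the standard treatment.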
Chi-Square Goodness-of-Fit Test
The goal is to test whether a specified distribution or model fits a data set.
Basic idea:
- Partition all possible values of the distribution into c categories.
- Treat each category as a possible outcome of a multinomial distribution.
Outline of test:
1. Specify the null and alternative hypotheses
$H_0$: The specified distribution or model fits the data
(i.e., $p_1 = p_{10}, p_2 = p_{20}, \dots, p_c = p_{c0}$)
$H_a$: The specified distribution or model does not fit the data
(i.e., $p_i \neq p_{i0}$ for at least one $i$)
where $p_{i0}$ is the probability that a data point falls in the $i$th category assuming the data came from the specified distribution (i.e., if $H_0$ is true).
2. Test statistic
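$$\chi^2 = \sum_{i=1}^{c} \frac{(n_i - E_i)^2}{E_i}$$
where $n_i$ = observed # of data points in the $i$th category, $E_i = np_{i0}$ (i.e., the expected # of data points in the $i$th category under $H_0$), $n$ is the size of the data set, and $p_{i0}$ is defined in step 1 (with any unknown parameters of the specified distribution replaced by their estimates).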
3. Rejection rule:
Reject $H_0$ if test statistic $\chi^2 > \chi^2_{c-k-1,\alpha}$,
where $\chi^2_{c-k-1,\alpha}$ is the upper $100\alpha$ percentile (upper $\alpha$ quantile) of the $\chi^2$ distribution with $c-k-1$ degrees of freedom, and $k$ is the number of parameters of the specified distribution that must be estimated from the data.
Alternative rejection rule:
Reject $H_0$ if p-value $< \alpha$.
Note: For the test to be valid, the number of categories should be as large as possible, as long as
- $E_i = np_{i0} \ge 1$ for every category $i$, and
- at least 4/5th of the $E_i$'s are $\ge 5$.
If these two conditions are not satisfied, combine (pool) two or more categories so that the new categories satisfy them.
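Pooling can be automated when the small expected counts sit at one end of an ordered set of categories, as is typical for count data. The sketch below is a hypothetical heuristic of our own (repeatedly merge the last two categories until both conditions hold, stopping at two categories); the text does not prescribe a particular pooling, and merging in the left tail or in the middle may be more sensible for other data. The names `conditions_hold` and `pool_right_tail` are illustrative.

```python
from typing import Sequence


def conditions_hold(expected: Sequence[float]) -> bool:
    """Every E_i >= 1 and no more than 1/5 of the E_i are < 5."""
    n_small = sum(e < 5 for e in expected)
    return min(expected) >= 1 and n_small <= len(expected) / 5


def pool_right_tail(observed: Sequence[float], expected: Sequence[float]):
    """Merge the last two categories until the validity conditions hold.

    Hypothetical heuristic for ordered categories whose small expected
    counts lie in the right tail; stops once only two categories remain.
    """
    obs, exp = list(observed), list(expected)
    while len(exp) > 2 and not conditions_hold(exp):
        obs[-2:] = [obs[-2] + obs[-1]]   # replace the last two cells by their sum
        exp[-2:] = [exp[-2] + exp[-1]]
    return obs, exp
```

For instance, with hypothetical observed counts (10, 10, 3, 1) and expected counts (9, 9, 4, 2), half of the $E_i$ are below 5, so the last two cells are merged to give observed (10, 10, 4) and expected (9, 9, 6), which satisfies both conditions.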
Example: (Prob. 9.21 from text) Bomb hits on London during WWII
During WWII, a 144 km² area of South London was divided into 576 small squares of 0.25 km² each to record bomb hits. The data are as follows:
# of hits 0 1 2 3 4 5 6 7
# of squares 229 211 93 35 7 0 0 1
Test whether a Poisson model would fit the data at $\alpha = 0.05$.
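A Python sketch of this example, under stated assumptions: $\lambda$ is estimated by its maximum likelihood estimate, the sample mean $\hat\lambda = 537/576 \approx 0.932$, so $k = 1$; the categories are 0, 1, 2, 3, 4 and a pooled "5 or more" tail. By our arithmetic this is the largest set of categories meeting the conditions in the note above, since splitting off a separate "5" category would leave "6 or more" with an expected count below 1. `chi2_sf` is a hypothetical stdlib-only helper for the $\chi^2$ upper-tail probability with integer degrees of freedom.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    # df = 1 or 2 in closed form, then Q(x; k+2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2+1)
    k = 2 - df % 2
    q = math.erfc(math.sqrt(x / 2)) if k == 1 else math.exp(-x / 2)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


# Bomb-hit data from the example
hits = [0, 1, 2, 3, 4, 5, 6, 7]
squares = [229, 211, 93, 35, 7, 0, 0, 1]
n = sum(squares)                                        # 576 squares
lam = sum(h * s for h, s in zip(hits, squares)) / n     # MLE of lambda = sample mean

# Categories 0, 1, 2, 3, 4 and a pooled ">= 5" tail (assumed pooling, checked below)
probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(5)]
probs.append(1.0 - sum(probs))                          # P(X >= 5)
observed = squares[:5] + [sum(squares[5:])]
expected = [n * p for p in probs]

# Validity conditions: all E_i >= 1 and at most 1/5 of the E_i below 5
assert min(expected) >= 1 and sum(e < 5 for e in expected) <= len(expected) / 5

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
k = 1                                                    # one estimated parameter (lambda)
df = len(observed) - 1 - k                               # c - 1 - k
p_value = chi2_sf(stat, df)
print(f"lambda_hat = {lam:.4f}, chi2 = {stat:.3f}, df = {df}, p-value = {p_value:.3f}")
```

With these choices the statistic comes out at roughly 1.17 on $c - 1 - k = 4$ degrees of freedom, well below $\chi^2_{4,\,0.05} = 9.488$, so the Poisson model is not rejected at $\alpha = 0.05$.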
Section 9.4 Inferences for 2-Way Count Data
2-way count data
- For each observational unit in the sample, measurements are made on two characteristics/variables.
- The data are presented in a two-dimensional array or table (see below).
- Example: a survey in which each person is classified according to his/her race and his/her opinion
about a certain political issue.
A 2-Way Table of Observed Counts
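Rows: values $1, \dots, r$ of variable X; columns: values $1, \dots, c$ of variable Y.
$$
\begin{array}{c|cccccc|c}
X \backslash Y & 1 & 2 & \cdots & j & \cdots & c & \text{Row total} \\ \hline
1 & n_{11} & n_{12} & \cdots & n_{1j} & \cdots & n_{1c} & n_{1\cdot} \\
2 & n_{21} & n_{22} & \cdots & n_{2j} & \cdots & n_{2c} & n_{2\cdot} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots \\
i & n_{i1} & n_{i2} & \cdots & n_{ij} & \cdots & n_{ic} & n_{i\cdot} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots \\
r & n_{r1} & n_{r2} & \cdots & n_{rj} & \cdots & n_{rc} & n_{r\cdot} \\ \hline
\text{Col total} & n_{\cdot 1} & n_{\cdot 2} & \cdots & n_{\cdot j} & \cdots & n_{\cdot c} & n_{\cdot\cdot} = n
\end{array}
$$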
Notation:
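- $n_{i\cdot} = \sum_{j=1}^{c} n_{ij}$ (row $i$ total), $n_{\cdot j} = \sum_{i=1}^{r} n_{ij}$ (column $j$ total), $n = n_{\cdot\cdot} = \sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij}$.
- $r$ = # of possible values of variable X.
- $c$ = # of possible values of variable Y.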
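The marginal totals are plain row and column sums. A minimal sketch, assuming the table is stored as a list of rows (rows = values of X, columns = values of Y); `margins` is a hypothetical helper name.

```python
from typing import Sequence


def margins(table: Sequence[Sequence[int]]):
    """Return (row totals n_i., column totals n_.j, grand total n) of an r x c table."""
    row_totals = [sum(row) for row in table]           # n_i. = sum over j of n_ij
    col_totals = [sum(col) for col in zip(*table)]     # n_.j = sum over i of n_ij
    return row_totals, col_totals, sum(row_totals)     # n = n_..


if __name__ == "__main__":
    print(margins([[10, 20], [30, 40]]))
```

For the hypothetical table with rows (10, 20) and (30, 40), `margins` returns `([30, 70], [40, 60], 100)`.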
Independence between two discrete random variables X and Y
Let the possible values of X be $\{1, 2, \dots, r\}$ and the possible values of Y be $\{1, 2, \dots, c\}$.
If X and Y are independent, then by definition, for all $i = 1, 2, \dots, r$ and $j = 1, 2, \dots, c$,
$$P(X = i, Y = j) = P(X = i)\,P(Y = j).$$
Notation:
- $p_{ij} = P(X = i, Y = j)$, i.e., the probability that an observation falls in the $(i, j)$ cell of the 2-way table.
- $p_{i\cdot} = P(X = i) = \sum_{j=1}^{c} p_{ij}$, i.e., the probability that an observation falls in the $i$th row of the 2-way table.
- $p_{\cdot j} = P(Y = j) = \sum_{i=1}^{r} p_{ij}$, i.e., the probability that an observation falls in the $j$th column of the 2-way table.
With this notation, the condition of independence between X and Y becomes
$$p_{ij} = p_{i\cdot}\, p_{\cdot j} \quad \text{for all } i \text{ and } j.$$
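To make the condition concrete, the sketch below checks $p_{ij} = p_{i\cdot}\,p_{\cdot j}$ cell by cell for a joint pmf given as a table. It uses exact `Fraction` arithmetic so that equality is not spoiled by rounding; the function name `is_independent` and the two example tables are hypothetical.

```python
from fractions import Fraction
from typing import Sequence


def is_independent(joint: Sequence[Sequence[Fraction]]) -> bool:
    """Return True if p_ij == p_i. * p_.j holds in every cell of the joint pmf table."""
    row_marg = [sum(row) for row in joint]          # p_i. = sum over j of p_ij
    col_marg = [sum(col) for col in zip(*joint)]    # p_.j = sum over i of p_ij
    return all(
        joint[i][j] == row_marg[i] * col_marg[j]
        for i in range(len(joint))
        for j in range(len(joint[i]))
    )


if __name__ == "__main__":
    F = Fraction
    independent = [[F(1, 8), F(3, 8)], [F(1, 8), F(3, 8)]]
    dependent = [[F(1, 2), F(0)], [F(0), F(1, 2)]]
    print(is_independent(independent))
    print(is_independent(dependent))
```

The first table has row margins $(1/2, 1/2)$ and column margins $(1/4, 3/4)$ and satisfies the condition in every cell; the second does not, since $p_{11} = 1/2$ but $p_{1\cdot}\,p_{\cdot 1} = 1/4$.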
Test of Independence between Two Discrete Variables X and Y
Outline of test:
1. Specify the null and alternative hypotheses
$H_0$: The two variables X and Y are independent
(i.e., $p_{ij} = p_{i\cdot}\,p_{\cdot j}$ for all $i$ and $j$)
$H_a$: The two variables X and Y are not independent
(i.e., $p_{ij} \neq p_{i\cdot}\,p_{\cdot j}$ for at least one pair $(i, j)$)
2. Test statistic
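$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}$$
where $n_{ij}$ = observed # of data points in the $(i, j)$ cell, and
$$E_{ij} = n\hat{p}_{ij} = n\,\hat{p}_{i\cdot}\,\hat{p}_{\cdot j} = n \cdot \frac{n_{i\cdot}}{n} \cdot \frac{n_{\cdot j}}{n} = \frac{n_{i\cdot}\, n_{\cdot j}}{n} = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{n}$$
(i.e., the estimated expected # of counts in the $(i, j)$ cell under $H_0$).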
3. Rejection rule:
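Reject $H_0$ if test statistic $\chi^2 > \chi^2_{(r-1)(c-1),\alpha}$, or reject $H_0$ if p-value $< \alpha$,
where $\chi^2_{(r-1)(c-1),\alpha}$ is the upper $\alpha$ quantile of the $\chi^2$ distribution with $(r-1)(c-1)$ degrees of freedom.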
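The test of independence can be sketched in Python as follows. Assumptions: the table is a list of rows of counts, the helper `chi2_sf` is our own stdlib-only $\chi^2$ upper-tail probability for integer degrees of freedom, and `independence_chisq_test` is a hypothetical name. No continuity correction is applied, matching the statistic above; `scipy.stats.chi2_contingency` applies Yates' correction by default when there is one degree of freedom, so its result can differ for 2 × 2 tables.

```python
import math
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    # df = 1 or 2 in closed form, then Q(x; k+2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2+1)
    k = 2 - df % 2
    q = math.erfc(math.sqrt(x / 2)) if k == 1 else math.exp(-x / 2)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def independence_chisq_test(table: Sequence[Sequence[int]], alpha: float = 0.05) -> dict:
    """Chi-square test of independence for an r x c table of observed counts."""
    r, c = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E_ij = (row i total)(column j total) / n
    expected = [[row_totals[i] * col_totals[j] / n for j in range(c)] for i in range(r)]
    stat = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(r)
        for j in range(c)
    )
    df = (r - 1) * (c - 1)
    p_value = chi2_sf(stat, df)
    return {"stat": stat, "df": df, "p_value": p_value,
            "reject": p_value < alpha, "expected": expected}
```

For the hypothetical table with rows (10, 20) and (30, 40), the estimated expected counts are 12, 18, 28, 42 and the statistic is $200/252 \approx 0.794$ on 1 degree of freedom, so $H_0$ is not rejected at $\alpha = 0.05$.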
Note: The conditions for using the test:
- $E_{ij} \ge 1$ for all $i$ and $j$.
- Not more than 1/5th of all $E_{ij}$