
Pearson's Chi-squared Test

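The multinomial distribution gives the probability of observing the counts $n_1, n_2, \ldots, n_s$ in $s$ categories whose probabilities are $p_1, p_2, \ldots, p_s$, where $n = n_1 + n_2 + \cdots + n_s$:

$$P(n_1, n_2, \ldots, n_s) = \binom{n}{n_1\; n_2\; \ldots\; n_s}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s} = \frac{n!}{n_1!\, n_2! \cdots n_s!}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s}$$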
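As a small illustration of the formula, here is a minimal Python sketch that evaluates the multinomial probability directly from factorials. The function name `multinomial_pmf` and the example numbers are only illustrative assumptions, not part of the original text.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Exact multinomial probability P(n_1, ..., n_s)."""
    n = sum(counts)
    coeff = factorial(n)
    for n_i in counts:
        # Each intermediate quotient is still an integer, so // is exact.
        coeff //= factorial(n_i)
    return coeff * prod(p ** n_i for p, n_i in zip(probs, counts))

# Hypothetical example: three equally likely categories, three observations.
# The value is 3!/(1! 1! 1!) * (1/3)^3 = 6/27, up to floating-point rounding.
print(multinomial_pmf([1, 1, 1], [1/3, 1/3, 1/3]))
```

The integer arithmetic keeps the coefficient exact, but the factorials grow very quickly with $n$.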

But computing $P(n_1, n_2, \ldots, n_s) = \binom{n}{n_1\; n_2\; \ldots\; n_s}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s}$ is overwhelming. To compute the p-value, which is the probability of the given event or of events that are rarer than the given event, we have to calculate $P(n_1, n_2, \ldots, n_s)$ for many different $n_1, n_2, \ldots, n_s$.
For example, let $s = 5$ and suppose we have $n = 100$ observations. For the probability $P(n_1, n_2, \ldots, n_s)$ to be calculated, the conditions $0 \le n_1 \le n$, $0 \le n_2 \le n$, ..., $0 \le n_s \le n$ and $\sum_{i=1}^{s} n_i = n$ must be satisfied. So the number of possible combinations of $(n_1, n_2, \ldots, n_s)$ is $\binom{n+s-1}{s-1} = \binom{104}{4} = 4{,}598{,}126$.
So it would take years to compute $P(n_1, n_2, \ldots, n_s) = \binom{n}{n_1\; n_2\; \ldots\; n_s}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s}$ for every possible combination of $(n_1, n_2, \ldots, n_s)$. Even on my computer, it took 20 minutes!
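To make the brute-force approach concrete, here is a rough Python sketch that counts the admissible combinations and computes an exact p-value by enumeration. It assumes that "rarer" means "no more probable than the observed outcome"; the helper names and the small example are hypothetical.

```python
from math import comb, factorial, prod

def compositions(n, s):
    """Yield every tuple (n_1, ..., n_s) of non-negative integers summing to n."""
    if s == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, s - 1):
            yield (first,) + rest

def multinomial_pmf(counts, probs):
    """Exact multinomial probability P(n_1, ..., n_s)."""
    n = sum(counts)
    coeff = factorial(n)
    for n_i in counts:
        coeff //= factorial(n_i)
    return coeff * prod(p ** n_i for p, n_i in zip(probs, counts))

def exact_p_value(observed, probs):
    """Sum P over all outcomes that are no more probable than the observed one.

    Assumption: this is the meaning of "rarer" used for the p-value here.
    """
    n, s = sum(observed), len(observed)
    p_obs = multinomial_pmf(observed, probs)
    total = 0.0
    for counts in compositions(n, s):
        q = multinomial_pmf(counts, probs)
        if q <= p_obs * (1 + 1e-12):  # small tolerance for floating-point ties
            total += q
    return total

print(comb(100 + 5 - 1, 5 - 1))  # number of outcomes for s = 5, n = 100
print(exact_p_value((7, 2, 1), (1/3, 1/3, 1/3)))  # small hypothetical example
```

The enumeration is only practical for small $n$ and $s$, which is exactly why an approximation is worth deriving.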


$n!$ gets harder to compute as $n$ gets bigger. Stirling's formula gives an approximation for $n!$:

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}$$

It shows that the factorial on the left side is approximated by powers. If we take the logarithm of both sides, we get

$$\log n! = n\log n - n + O(\log n)$$
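A quick numerical check of Stirling's formula can be sketched as follows. It compares $\log n!$, computed through the standard library's `lgamma`, with the logarithm of $\sqrt{2\pi n}\,(n/e)^n$; the function names are illustrative.

```python
from math import lgamma, log, pi

def log_factorial(n):
    """log(n!) via the log-gamma function, since n! = Gamma(n + 1)."""
    return lgamma(n + 1)

def stirling_log_factorial(n):
    """log of sqrt(2*pi*n) * (n/e)^n."""
    return 0.5 * log(2 * pi * n) + n * log(n) - n

for n in (1, 10, 100, 1000):
    exact = log_factorial(n)
    approx = stirling_log_factorial(n)
    # The gap shrinks as n grows (roughly like 1/(12 n)).
    print(n, exact, approx, exact - approx)
```

Working with logarithms avoids the overflow that $n!$ itself would cause for large $n$.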
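Let's use Stirling's formula to compute $\frac{n!}{n_1!\, n_2! \cdots n_s!}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s}$. We can use the approximations

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n},\quad n_1! \approx \sqrt{2\pi n_1}\left(\frac{n_1}{e}\right)^{n_1},\quad n_2! \approx \sqrt{2\pi n_2}\left(\frac{n_2}{e}\right)^{n_2},\quad \ldots,\quad n_s! \approx \sqrt{2\pi n_s}\left(\frac{n_s}{e}\right)^{n_s}$$

So,

$$\begin{aligned}
\frac{n!}{n_1!\, n_2! \cdots n_s!}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s}
&\approx \frac{\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}}{\sqrt{2\pi n_1}\left(\frac{n_1}{e}\right)^{n_1}\sqrt{2\pi n_2}\left(\frac{n_2}{e}\right)^{n_2}\cdots\sqrt{2\pi n_s}\left(\frac{n_s}{e}\right)^{n_s}}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s} \\
&= \frac{\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}}{\sqrt{2\pi n_1}\sqrt{2\pi n_2}\cdots\sqrt{2\pi n_s}\left(\frac{n_1}{e}\right)^{n_1}\left(\frac{n_2}{e}\right)^{n_2}\cdots\left(\frac{n_s}{e}\right)^{n_s}}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s} \\
&= \frac{\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}}{(\sqrt{2\pi})^{s}\sqrt{n_1 n_2 \cdots n_s}\left(\frac{n_1}{e}\right)^{n_1}\left(\frac{n_2}{e}\right)^{n_2}\cdots\left(\frac{n_s}{e}\right)^{n_s}}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s} \\
&= \frac{1}{(\sqrt{2\pi})^{s-1}}\,\frac{\sqrt{n}\left(\frac{n}{e}\right)^{n}}{\sqrt{n_1 n_2 \cdots n_s}\left(\frac{n_1}{e}\right)^{n_1}\left(\frac{n_2}{e}\right)^{n_2}\cdots\left(\frac{n_s}{e}\right)^{n_s}}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s}
\end{aligned}$$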

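We are looking for $P(n_1, n_2, \ldots, n_s)$, which is a function of $n_1, n_2, \ldots, n_s$.
So we can rewrite the previous equation as

$$\begin{aligned}
&\frac{1}{(\sqrt{2\pi})^{s-1}}\,\frac{\sqrt{n}}{\sqrt{n_1 n_2 \cdots n_s}}\left(\frac{n}{e}\right)^{n}\left(\frac{e}{n_1}\right)^{n_1}\left(\frac{e}{n_2}\right)^{n_2}\cdots\left(\frac{e}{n_s}\right)^{n_s} p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s} \\
={}&\frac{1}{(\sqrt{2\pi})^{s-1}}\,\sqrt{n}\left(\frac{n}{e}\right)^{n}\left(\frac{1}{n_1 n_2 \cdots n_s}\right)^{\frac12}\left(\frac{e\,p_1}{n_1}\right)^{n_1}\left(\frac{e\,p_2}{n_2}\right)^{n_2}\cdots\left(\frac{e\,p_s}{n_s}\right)^{n_s}
\end{aligned}$$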
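Use $e^{n_1} e^{n_2} \cdots e^{n_s} = e^{n_1+n_2+\cdots+n_s} = e^{n}$, which cancels the factor $e^{-n}$ in $\left(\frac{n}{e}\right)^{n}$. The expression becomes

$$\frac{1}{(\sqrt{2\pi})^{s-1}}\,\sqrt{n}\;n^{n}\left(\frac{1}{n_1 n_2 \cdots n_s}\right)^{\frac12}\left(\frac{p_1}{n_1}\right)^{n_1}\left(\frac{p_2}{n_2}\right)^{n_2}\cdots\left(\frac{p_s}{n_s}\right)^{n_s}$$

And use $n = n_1 + n_2 + \cdots + n_s$ in the exponent of $n^{n}$:

$$\begin{aligned}
&\frac{1}{(\sqrt{2\pi})^{s-1}}\,\sqrt{n}\;n^{\,n_1+n_2+\cdots+n_s}\left(\frac{1}{n_1 n_2 \cdots n_s}\right)^{\frac12}\left(\frac{p_1}{n_1}\right)^{n_1}\left(\frac{p_2}{n_2}\right)^{n_2}\cdots\left(\frac{p_s}{n_s}\right)^{n_s} \\
={}&\frac{1}{(\sqrt{2\pi})^{s-1}}\,\sqrt{n}\left(\frac{1}{n_1 n_2 \cdots n_s}\right)^{\frac12}\left(\frac{np_1}{n_1}\right)^{n_1}\left(\frac{np_2}{n_2}\right)^{n_2}\cdots\left(\frac{np_s}{n_s}\right)^{n_s}
\end{aligned}$$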
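Remember that $P(n_1, n_2, \ldots, n_s)$ is a function of $n_1, n_2, \ldots, n_s$; $n$ and $p_1, p_2, \ldots, p_s$ are constants.
Let's gather $n_1, n_2, \ldots, n_s$, using $\frac{1}{n_i} = \frac{np_i}{n_i}\cdot\frac{1}{np_i}$:

$$\begin{aligned}
&\frac{1}{(\sqrt{2\pi})^{s-1}}\,\sqrt{n}\left(\frac{1}{n_1}\right)^{\frac12}\left(\frac{1}{n_2}\right)^{\frac12}\cdots\left(\frac{1}{n_s}\right)^{\frac12}\left(\frac{np_1}{n_1}\right)^{n_1}\left(\frac{np_2}{n_2}\right)^{n_2}\cdots\left(\frac{np_s}{n_s}\right)^{n_s} \\
={}&\frac{1}{(\sqrt{2\pi})^{s-1}}\,\sqrt{n}\left(\frac{np_1}{n_1}\right)^{\frac12}\left(\frac{np_2}{n_2}\right)^{\frac12}\cdots\left(\frac{np_s}{n_s}\right)^{\frac12}\left(\frac{1}{n^{s}\,p_1 p_2 \cdots p_s}\right)^{\frac12}\left(\frac{np_1}{n_1}\right)^{n_1}\left(\frac{np_2}{n_2}\right)^{n_2}\cdots\left(\frac{np_s}{n_s}\right)^{n_s} \\
={}&\frac{1}{(\sqrt{2\pi})^{s-1}}\,\sqrt{n}\left(\frac{1}{n^{s}\,p_1 p_2 \cdots p_s}\right)^{\frac12}\left(\frac{np_1}{n_1}\right)^{n_1+\frac12}\left(\frac{np_2}{n_2}\right)^{n_2+\frac12}\cdots\left(\frac{np_s}{n_s}\right)^{n_s+\frac12}
\end{aligned}$$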
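Then gather the powers of $n$:

$$\begin{aligned}
&\frac{1}{(\sqrt{2\pi})^{s-1}}\,\sqrt{n}\left(\frac{1}{n^{s}\,p_1 p_2 \cdots p_s}\right)^{\frac12}\prod_{i=1}^{s}\left(\frac{np_i}{n_i}\right)^{n_i+\frac12} \\
={}&\frac{1}{(\sqrt{2\pi})^{s-1}}\left(\frac{n}{n^{s}\,p_1 p_2 \cdots p_s}\right)^{\frac12}\prod_{i=1}^{s}\left(\frac{np_i}{n_i}\right)^{n_i+\frac12} \\
={}&\frac{1}{(\sqrt{2\pi})^{s-1}}\left(\frac{1}{n^{s-1}\,p_1 p_2 \cdots p_s}\right)^{\frac12}\prod_{i=1}^{s}\left(\frac{np_i}{n_i}\right)^{n_i+\frac12} \\
={}&\frac{1}{(\sqrt{2\pi})^{s-1}}\left(\frac{1}{n^{s-1}}\right)^{\frac12}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\prod_{i=1}^{s}\left(\frac{np_i}{n_i}\right)^{n_i+\frac12} \\
={}&\frac{1}{(\sqrt{2\pi})^{s-1}}\,\frac{1}{(\sqrt{n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\prod_{i=1}^{s}\left(\frac{np_i}{n_i}\right)^{n_i+\frac12} \\
={}&\frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\prod_{i=1}^{s}\left(\frac{np_i}{n_i}\right)^{n_i+\frac12}
\end{aligned}$$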
So,

$$P(n_1, n_2, \ldots, n_s) = C\left(\frac{m_1}{n_1}\right)^{n_1+\frac12}\left(\frac{m_2}{n_2}\right)^{n_2+\frac12}\cdots\left(\frac{m_s}{n_s}\right)^{n_s+\frac12}.$$

($C$ is a constant; $m_1, m_2, \ldots, m_s$ are the expected frequencies, $m_1 = np_1$, ...)
It is a lot simpler than the original equation.
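To see how close the simplified formula is, the following Python sketch compares it with the exact multinomial probability on a logarithmic scale. It assumes the constant is $C = (2\pi n)^{-(s-1)/2}\,(p_1 p_2 \cdots p_s)^{-1/2}$, as the Stirling-based derivation gives; the function names and the counts are hypothetical.

```python
from math import lgamma, log, pi

def log_exact(counts, probs):
    """log of the exact multinomial probability, using lgamma for log(k!)."""
    n = sum(counts)
    return (lgamma(n + 1) - sum(lgamma(k + 1) for k in counts)
            + sum(k * log(p) for k, p in zip(counts, probs)))

def log_approx(counts, probs):
    """log of C * prod (m_i / n_i)^(n_i + 1/2), with m_i = n * p_i.

    Assumes every n_i > 0, because the formula divides by n_i.
    """
    n, s = sum(counts), len(counts)
    log_c = -0.5 * (s - 1) * log(2 * pi * n) - 0.5 * sum(log(p) for p in probs)
    return log_c + sum((k + 0.5) * log(n * p / k) for k, p in zip(counts, probs))

counts = [18, 22, 20, 25, 15]      # hypothetical observed counts, n = 100
probs = [0.2, 0.2, 0.2, 0.2, 0.2]  # hypothetical category probabilities
print(log_exact(counts, probs), log_approx(counts, probs))
```

The two logarithms agree to within a few hundredths here; the approximation degrades when some $n_i$ is small, since Stirling's formula is then less accurate.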
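Let $n_i = np_i + r_i\sqrt{np_i}$. It becomes

$$\frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\left(\frac{np_1}{np_1 + r_1\sqrt{np_1}}\right)^{np_1 + r_1\sqrt{np_1} + \frac12}\left(\frac{np_2}{np_2 + r_2\sqrt{np_2}}\right)^{np_2 + r_2\sqrt{np_2} + \frac12}\cdots\left(\frac{np_s}{np_s + r_s\sqrt{np_s}}\right)^{np_s + r_s\sqrt{np_s} + \frac12}$$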
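Take note that

$$\left(\frac{np_1}{np_1 + r_1\sqrt{np_1}}\right)^{np_1 + r_1\sqrt{np_1} + \frac12} = \left(\frac{np_1 + r_1\sqrt{np_1}}{np_1}\right)^{-np_1 - r_1\sqrt{np_1} - \frac12} = \left(1 + \frac{r_1}{\sqrt{np_1}}\right)^{-np_1 - r_1\sqrt{np_1} - \frac12}$$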
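Recall $\lim_{n\to\infty}\left(1 + \frac{r}{n}\right)^{n} = e^{r}$.
To get the behavior of $\left(1 + \frac{r}{n}\right)^{-n^2 - rn - \frac12}$ as $n\to\infty$ (here $n$ plays the role of $\sqrt{np_1}$ and $r$ the role of $r_1$), remember that $n\to\infty$ means $\varepsilon\to 0$ with $n = \frac{1}{\varepsilon}$.
So,

$$\begin{aligned}
\lim_{n\to\infty}\left(1 + \frac{r}{n}\right)^{-n^2 - rn - \frac12}
&= \lim_{\varepsilon\to 0}\,(1 + r\varepsilon)^{-\left(\frac{1}{\varepsilon}\right)^2 - r\left(\frac{1}{\varepsilon}\right) - \frac12} \\
&= \lim_{\varepsilon\to 0}\exp\left[\log\,(1 + r\varepsilon)^{-\left(\frac{1}{\varepsilon}\right)^2 - r\left(\frac{1}{\varepsilon}\right) - \frac12}\right] \\
&= \lim_{\varepsilon\to 0}\exp\left[\left(-\left(\frac{1}{\varepsilon}\right)^2 - r\left(\frac{1}{\varepsilon}\right) - \frac12\right)\log(1 + r\varepsilon)\right]
\end{aligned}$$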
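To expand $\log(1 + r\varepsilon)$ for small $\varepsilon$, use the series $\log(1 + r) = \sum_{k\ge 1}\frac{(-1)^{k-1}}{k}\,r^{k}$, which is valid for $|r| < 1$:

$$\log(1 + r\varepsilon) = \sum_{k\ge 1}\frac{(-1)^{k-1}}{k}\,r^{k}\varepsilon^{k} = r\varepsilon - \frac12 r^2\varepsilon^2 + \frac13 r^3\varepsilon^3 - \cdots$$

So the original equation is

$$\begin{aligned}
&\lim_{\varepsilon\to 0}\exp\left[\left(-\left(\frac{1}{\varepsilon}\right)^2 - r\left(\frac{1}{\varepsilon}\right) - \frac12\right)\log(1 + r\varepsilon)\right] \\
={}&\lim_{\varepsilon\to 0}\exp\left[\left(-\left(\frac{1}{\varepsilon}\right)^2 - r\left(\frac{1}{\varepsilon}\right) - \frac12\right)\left(r\varepsilon - \frac12 r^2\varepsilon^2 + \frac13 r^3\varepsilon^3 - \cdots\right)\right]
\end{aligned}$$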
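Since $\varepsilon\to 0$, we keep only the terms in $\varepsilon^{-1}$ and the constant terms:

$$\begin{aligned}
&\left(-\left(\frac{1}{\varepsilon}\right)^2 - r\left(\frac{1}{\varepsilon}\right) - \frac12\right)\left(r\varepsilon - \frac12 r^2\varepsilon^2 + \frac13 r^3\varepsilon^3 - \cdots\right) \\
={}&-\left(\frac{1}{\varepsilon}\right)^2\left(r\varepsilon - \frac12 r^2\varepsilon^2 + \frac13 r^3\varepsilon^3 - \cdots\right)
 - r\left(\frac{1}{\varepsilon}\right)\left(r\varepsilon - \frac12 r^2\varepsilon^2 + \frac13 r^3\varepsilon^3 - \cdots\right)
 - \frac12\left(r\varepsilon - \frac12 r^2\varepsilon^2 + \frac13 r^3\varepsilon^3 - \cdots\right) \\
={}&\left(-r\varepsilon^{-1} + \frac12 r^2 - \frac13 r^3\varepsilon + \cdots\right)
 + \left(-r^2 + \frac12 r^3\varepsilon - \frac13 r^4\varepsilon^2 + \cdots\right)
 + \left(-\frac12 r\varepsilon + \frac14 r^2\varepsilon^2 - \frac16 r^3\varepsilon^3 + \cdots\right) \\
={}&-r\varepsilon^{-1} - \frac12 r^2 + \cdots
\end{aligned}$$

So the equation

$$\begin{aligned}
&\lim_{\varepsilon\to 0}\exp\left[\left(-\left(\frac{1}{\varepsilon}\right)^2 - r\left(\frac{1}{\varepsilon}\right) - \frac12\right)\left(r\varepsilon - \frac12 r^2\varepsilon^2 + \frac13 r^3\varepsilon^3 - \cdots\right)\right] \\
={}&\lim_{\varepsilon\to 0}\exp\left(-r\varepsilon^{-1} - \frac12 r^2 + \cdots\right) \\
={}&\lim_{n\to\infty}\exp\left(-rn - \frac12 r^2 + \cdots\right)
\end{aligned}$$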
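So, as $\sqrt{np_1}$ increases, the factor $\left(1 + \frac{r_1}{\sqrt{np_1}}\right)^{-np_1 - r_1\sqrt{np_1} - \frac12}$ behaves as

$$\left(1 + \frac{r_1}{\sqrt{np_1}}\right)^{-np_1 - r_1\sqrt{np_1} - \frac12} \sim \exp\left(-r_1\sqrt{np_1} - \frac12 r_1^2\right).$$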
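The asymptotic statement can be checked numerically. The sketch below compares the logarithms of both sides for growing values of a variable `big_n`, which stands in for $\sqrt{np_1}$; the names and the sample value $r = 1.5$ are illustrative assumptions.

```python
from math import log1p

def log_lhs(r, big_n):
    """log of (1 + r/N)^(-N^2 - r*N - 1/2), where N stands for sqrt(n*p_1)."""
    return (-big_n**2 - r * big_n - 0.5) * log1p(r / big_n)

def log_rhs(r, big_n):
    """log of exp(-r*N - r^2/2)."""
    return -r * big_n - 0.5 * r**2

for big_n in (10, 100, 1000, 10000):
    # The difference of the logarithms should shrink roughly like 1/N.
    print(big_n, log_lhs(1.5, big_n) - log_rhs(1.5, big_n))
```

Comparing logarithms rather than the raw values avoids underflow, because both sides become extremely small as `big_n` grows.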
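So the original equation

$$\frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\left(\frac{np_1}{np_1 + r_1\sqrt{np_1}}\right)^{np_1 + r_1\sqrt{np_1} + \frac12}\left(\frac{np_2}{np_2 + r_2\sqrt{np_2}}\right)^{np_2 + r_2\sqrt{np_2} + \frac12}\cdots\left(\frac{np_s}{np_s + r_s\sqrt{np_s}}\right)^{np_s + r_s\sqrt{np_s} + \frac12}$$

becomes

$$\begin{aligned}
&\frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-r_1\sqrt{np_1} - \frac12 r_1^2\right)\exp\left(-r_2\sqrt{np_2} - \frac12 r_2^2\right)\cdots\exp\left(-r_s\sqrt{np_s} - \frac12 r_s^2\right) \\
={}&\frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-r_1\sqrt{np_1} - r_2\sqrt{np_2} - \cdots - r_s\sqrt{np_s}\right)\exp\left(-\frac12 r_1^2 - \frac12 r_2^2 - \cdots - \frac12 r_s^2\right)
\end{aligned}$$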
Since $n_i = np_i + r_i\sqrt{np_i}$, we have $r_i\sqrt{np_i} = n_i - np_i$, and therefore

$$\sum_i r_i\sqrt{np_i} = \sum_i (n_i - np_i) = \sum_i n_i - \sum_i np_i = \sum_i n_i - n\sum_i p_i = n - n = 0.$$
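The cancellation is easy to confirm on numbers. This short Python sketch computes the $r_i$ for hypothetical counts and probabilities, checks that the weighted sum vanishes, and prints $r_1^2 + \cdots + r_s^2$; the function name and the data are made up for illustration.

```python
from math import sqrt

def standardized_residuals(counts, probs):
    """r_i = (n_i - n*p_i) / sqrt(n*p_i), from n_i = n*p_i + r_i*sqrt(n*p_i)."""
    n = sum(counts)
    return [(k - n * p) / sqrt(n * p) for k, p in zip(counts, probs)]

counts = [18, 22, 20, 25, 15]      # hypothetical observed counts, n = 100
probs = [0.2, 0.2, 0.2, 0.2, 0.2]  # hypothetical category probabilities
n = sum(counts)
r = standardized_residuals(counts, probs)

weighted_sum = sum(r_i * sqrt(n * p) for r_i, p in zip(r, probs))
print(weighted_sum)               # zero up to floating-point rounding
print(sum(r_i**2 for r_i in r))   # r_1^2 + ... + r_s^2
```

Because of this constraint, only $s - 1$ of the $r_i$ can vary freely.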
So the equation becomes

$$\frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-\frac12 r_1^2 - \frac12 r_2^2 - \cdots - \frac12 r_s^2\right)$$
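The p-value is

$$\begin{aligned}
&\int\!\!\int\!\cdots\!\int \frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-\frac12 r_1^2 - \frac12 r_2^2 - \cdots - \frac12 r_s^2\right)dr_1\,dr_2\cdots dr_s \\
={}&\int \frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-\frac12 r_1^2 - \frac12 r_2^2 - \cdots - \frac12 r_s^2\right)dV
\end{aligned}$$

where the integral runs over the outcomes that are at least as rare as the observed one.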
Let $r^2 = r_1^2 + r_2^2 + \cdots + r_s^2$. Then

$$P(n_1, n_2, \ldots, n_s) = P(r_1, r_2, \ldots, r_s) = P(r) = \frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-\frac{r^2}{2}\right)$$
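Since $dV$ is the volume element of the $s$-dimensional space and the integrand depends only on $r$, we can write $dV = C r^{s-1}\,dr$. So computing

$$\int_{r_{\mathrm{obs}}}^{\infty}\frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-\frac{r^2}{2}\right)C r^{s-1}\,dr$$

would suffice, where $r_{\mathrm{obs}}$ is the value of $r$ for the observed counts. However, because of the approximations made along the way,

$$\int_{0}^{\infty}\frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-\frac{r^2}{2}\right)C r^{s-1}\,dr = 1$$

might not be true. So let the constant $C$ absorb the other constant factors and choose it such that

$$\int_{0}^{\infty} C\exp\left(-\frac{r^2}{2}\right)r^{s-1}\,dr = 1.$$
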
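The $\chi^2$-distribution with $k$ degrees of freedom has the density

$$f(x; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{\frac{k}{2}-1}e^{-\frac{x}{2}},\quad x > 0.$$

The substitution $x = r^2$, $dx = 2r\,dr$ makes the former equation exactly a $\chi^2$-distribution.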
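(Notice: $r_i = \frac{n_i - np_i}{\sqrt{np_i}}$ and $r_i^2 = \frac{(n_i - np_i)^2}{np_i}$, so $r^2 = \sum_i r_i^2$ is Pearson's statistic. The constraint $r_1\sqrt{np_1} + r_2\sqrt{np_2} + \cdots + r_s\sqrt{np_s} = 0$ confines $(r_1, r_2, \ldots, r_s)$ to an $(s-1)$-dimensional hyperplane, so the integration effectively runs over $s-1$ dimensions, with $r^{s-2}$ in place of $r^{s-1}$, and the degree of freedom is $k = s - 1$.)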
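Putting the pieces together, the approximate p-value can be sketched in Python using only the standard library. The upper tail of the $\chi^2$-distribution is computed here from a textbook series for the regularized incomplete gamma function; this is an illustrative choice, and the function names and sample data are hypothetical. The sketch takes $k = s - 1$ degrees of freedom, following the hyperplane remark.

```python
from math import exp, lgamma, log

def regularized_lower_gamma(a, x, terms=500):
    """P(a, x) from the series e^-x x^a sum_k x^k / (a (a+1) ... (a+k)) / Gamma(a)."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, terms):
        term *= x / (a + k)
        total += term
        if term < total * 1e-16:  # further terms no longer change the sum
            break
    return total * exp(-x + a * log(x) - lgamma(a))

def chi2_sf(x, k):
    """Upper-tail probability of a chi-squared variable with k degrees of freedom."""
    return 1.0 - regularized_lower_gamma(k / 2, x / 2)

def pearson_p_value(counts, probs):
    """Pearson statistic sum (n_i - n p_i)^2 / (n p_i) and its approximate p-value."""
    n = sum(counts)
    stat = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))
    return stat, chi2_sf(stat, len(counts) - 1)

# Hypothetical data: s = 5 categories, n = 100 observations.
print(pearson_p_value([18, 22, 20, 25, 15], [0.2] * 5))
```

This replaces the enumeration of millions of multinomial terms with a single one-dimensional tail probability; the series is adequate for moderate values of the statistic, though a dedicated statistics library would be more robust for extreme ones.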
