Pearson's Chi-Squared Test

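The multinomial distribution gives the probability of observing the counts $n_1, n_2, \dots, n_s$ in $s$ categories, where $n = n_1 + n_2 + \dots + n_s$ is the total number of observations and $p_i$ is the probability of category $i$:

$$p(n_1, n_2, \dots, n_s) = \binom{n}{n_1\; n_2\; \dots\; n_s}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s} = \frac{n!}{n_1!\, n_2! \cdots n_s!}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s}$$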
But computing

$$p(n_1, n_2, \dots, n_s) = \binom{n}{n_1\; n_2\; \dots\; n_s}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s}$$

is overwhelming. To compute the p-value, which is the probability of the given event or of events that are rarer than the given event, we have to calculate $p(n_1, n_2, \dots, n_s)$ for many different $(n_1, n_2, \dots, n_s)$.
For example, let $s = 5$ and suppose we have $n = 100$ observations. For the probability $p(n_1, n_2, \dots, n_s)$ to be calculated, the conditions $0 \le n_1 \le n$, $0 \le n_2 \le n$, ..., $0 \le n_s \le n$ and $\sum_{i=1}^{s} n_i = n$ should be satisfied. So the number of possible combinations of $(n_1, n_2, \dots, n_s)$ is $\binom{n+s-1}{s-1} = \binom{104}{4} = 4{,}598{,}126$.
So it would take years to compute $p(n_1, n_2, \dots, n_s) = \binom{n}{n_1\; n_2\; \dots\; n_s}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s}$ for every possible combination of $(n_1, n_2, \dots, n_s)$. Even on my computer, it took 20 minutes!
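To get a feel for the size of this computation, here is a minimal Python sketch. It counts the admissible count vectors with the stars-and-bars formula and evaluates one multinomial probability in log space. The function names `num_compositions` and `log_multinomial_pmf` and the equal category probabilities of 0.2 are illustrative assumptions, not part of the original text.

```python
import math


def num_compositions(n: int, s: int) -> int:
    """Count tuples (n_1, ..., n_s) of non-negative integers summing to n."""
    # Stars and bars: place s - 1 dividers among n + s - 1 slots.
    return math.comb(n + s - 1, s - 1)


def log_multinomial_pmf(counts, probs):
    """Log of n! / (n_1! ... n_s!) * p_1^n_1 ... p_s^n_s.

    Assumes every probability is strictly positive.
    """
    n = sum(counts)
    # lgamma(k + 1) equals log(k!), which avoids overflow for large k.
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    log_prob = sum(c * math.log(p) for c, p in zip(counts, probs))
    return log_coef + log_prob


# Example values assumed for illustration: s = 5 categories, n = 100.
total_vectors = num_compositions(100, 5)
single_prob = math.exp(log_multinomial_pmf([20, 20, 20, 20, 20], [0.2] * 5))
print(total_vectors, single_prob)
```

An exact p-value would need a probability like `single_prob` for every one of the `total_vectors` count vectors, which is why an approximation is attractive.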

$n!$ gets harder to compute as $n$ gets bigger. Stirling's formula gives an approximation for $n!$:

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}$$

It shows that the factorial on the left side is approximated by powers and exponentials. If we take the logarithm of both sides, we get

$$\log n! = n\log n - n + O(\log n)$$
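As a quick numerical check of Stirling's formula, the following Python sketch compares $\log n!$ (computed with `math.lgamma`) against $n\log n - n + \tfrac12\log(2\pi n)$. The helper name `stirling_log_factorial` and the sample values of $n$ are assumptions chosen for illustration.

```python
import math


def stirling_log_factorial(n: int) -> float:
    """Stirling's approximation: log(n!) ~ n log n - n + 0.5 log(2 pi n)."""
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)


for n in (5, 50, 500):
    exact = math.lgamma(n + 1)  # log(n!) without forming n! itself
    approx = stirling_log_factorial(n)
    # The gap shrinks as n grows (it behaves roughly like 1 / (12 n)).
    print(n, exact, approx, exact - approx)
```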
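Let's use Stirling's formula to compute $\frac{n!}{n_1!\, n_2! \cdots n_s!}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s}$. We can use the approximations

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n},\quad n_1! \approx \sqrt{2\pi n_1}\left(\frac{n_1}{e}\right)^{n_1},\quad n_2! \approx \sqrt{2\pi n_2}\left(\frac{n_2}{e}\right)^{n_2},\quad \dots,\quad n_s! \approx \sqrt{2\pi n_s}\left(\frac{n_s}{e}\right)^{n_s}$$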
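So,

$$\begin{aligned}
\frac{n!}{n_1!\, n_2! \cdots n_s!}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s}
&\approx \frac{\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}}{\sqrt{2\pi n_1}\left(\frac{n_1}{e}\right)^{n_1}\sqrt{2\pi n_2}\left(\frac{n_2}{e}\right)^{n_2}\cdots\sqrt{2\pi n_s}\left(\frac{n_s}{e}\right)^{n_s}}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s} \\
&= \frac{\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}}{\sqrt{2\pi n_1}\sqrt{2\pi n_2}\cdots\sqrt{2\pi n_s}\left(\frac{n_1}{e}\right)^{n_1}\left(\frac{n_2}{e}\right)^{n_2}\cdots\left(\frac{n_s}{e}\right)^{n_s}}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s} \\
&= \frac{\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}}{(\sqrt{2\pi})^{s}\sqrt{n_1 n_2 \cdots n_s}\left(\frac{n_1}{e}\right)^{n_1}\left(\frac{n_2}{e}\right)^{n_2}\cdots\left(\frac{n_s}{e}\right)^{n_s}}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s} \\
&= \frac{\sqrt{n}\left(\frac{n}{e}\right)^{n}}{(\sqrt{2\pi})^{s-1}\sqrt{n_1 n_2 \cdots n_s}\left(\frac{n_1}{e}\right)^{n_1}\left(\frac{n_2}{e}\right)^{n_2}\cdots\left(\frac{n_s}{e}\right)^{n_s}}\, p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s}
\end{aligned}$$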
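We are looking for $p(n_1, n_2, \dots, n_s)$, which is a function of $n_1, n_2, \dots, n_s$.
So we can rewrite the previous equation as

$$\begin{aligned}
p(n_1, n_2, \dots, n_s)
&\approx \frac{\sqrt{n}\left(\frac{n}{e}\right)^{n}}{(\sqrt{2\pi})^{s-1}\sqrt{n_1 n_2 \cdots n_s}}\left(\frac{e}{n_1}\right)^{n_1}\left(\frac{e}{n_2}\right)^{n_2}\cdots\left(\frac{e}{n_s}\right)^{n_s} p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s} \\
&= \frac{\sqrt{n}\left(\frac{n}{e}\right)^{n}}{(\sqrt{2\pi})^{s-1}}\left(\frac{1}{n_1 n_2 \cdots n_s}\right)^{\frac12}\left(\frac{e p_1}{n_1}\right)^{n_1}\left(\frac{e p_2}{n_2}\right)^{n_2}\cdots\left(\frac{e p_s}{n_s}\right)^{n_s}
\end{aligned}$$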
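Use $e^{n_1} e^{n_2} \cdots e^{n_s} = e^{n_1 + n_2 + \dots + n_s} = e^{n}$, which cancels the $e^{-n}$ in $\left(\frac{n}{e}\right)^{n}$:

$$p(n_1, n_2, \dots, n_s) \approx \frac{\sqrt{n}\, n^{n}}{(\sqrt{2\pi})^{s-1}}\left(\frac{1}{n_1 n_2 \cdots n_s}\right)^{\frac12}\left(\frac{p_1}{n_1}\right)^{n_1}\left(\frac{p_2}{n_2}\right)^{n_2}\cdots\left(\frac{p_s}{n_s}\right)^{n_s}$$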
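And use $n = n_1 + n_2 + \dots + n_s$ in the exponent of $n^{n}$:

$$\begin{aligned}
p(n_1, n_2, \dots, n_s)
&\approx \frac{\sqrt{n}\, n^{n_1 + n_2 + \dots + n_s}}{(\sqrt{2\pi})^{s-1}}\left(\frac{1}{n_1 n_2 \cdots n_s}\right)^{\frac12}\left(\frac{p_1}{n_1}\right)^{n_1}\left(\frac{p_2}{n_2}\right)^{n_2}\cdots\left(\frac{p_s}{n_s}\right)^{n_s} \\
&= \frac{\sqrt{n}}{(\sqrt{2\pi})^{s-1}}\left(\frac{1}{n_1 n_2 \cdots n_s}\right)^{\frac12}\left(\frac{n p_1}{n_1}\right)^{n_1}\left(\frac{n p_2}{n_2}\right)^{n_2}\cdots\left(\frac{n p_s}{n_s}\right)^{n_s}
\end{aligned}$$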
Remember that $p(n_1, n_2, \dots, n_s)$ is a function of $n_1, n_2, \dots, n_s$; $n$ and $p_1, p_2, \dots, p_s$ are constants.
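Let's gather $n_1, n_2, \dots, n_s$:

$$\begin{aligned}
p(n_1, n_2, \dots, n_s)
&\approx \frac{\sqrt{n}}{(\sqrt{2\pi})^{s-1}}\left(\frac{1}{n_1}\right)^{\frac12}\left(\frac{1}{n_2}\right)^{\frac12}\cdots\left(\frac{1}{n_s}\right)^{\frac12}\left(\frac{n p_1}{n_1}\right)^{n_1}\left(\frac{n p_2}{n_2}\right)^{n_2}\cdots\left(\frac{n p_s}{n_s}\right)^{n_s} \\
&= \frac{\sqrt{n}}{(\sqrt{2\pi})^{s-1}}\left(\frac{n p_1}{n_1}\right)^{\frac12}\left(\frac{n p_2}{n_2}\right)^{\frac12}\cdots\left(\frac{n p_s}{n_s}\right)^{\frac12}\left(\frac{1}{n^{s} p_1 p_2 \cdots p_s}\right)^{\frac12}\left(\frac{n p_1}{n_1}\right)^{n_1}\left(\frac{n p_2}{n_2}\right)^{n_2}\cdots\left(\frac{n p_s}{n_s}\right)^{n_s} \\
&= \frac{\sqrt{n}}{(\sqrt{2\pi})^{s-1}}\left(\frac{1}{n^{s} p_1 p_2 \cdots p_s}\right)^{\frac12}\left(\frac{n p_1}{n_1}\right)^{n_1+\frac12}\left(\frac{n p_2}{n_2}\right)^{n_2+\frac12}\cdots\left(\frac{n p_s}{n_s}\right)^{n_s+\frac12}
\end{aligned}$$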
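and gather $n$:

$$\begin{aligned}
p(n_1, n_2, \dots, n_s)
&\approx \frac{n^{\frac12}}{(\sqrt{2\pi})^{s-1}}\left(\frac{1}{n^{s} p_1 p_2 \cdots p_s}\right)^{\frac12}\prod_{i=1}^{s}\left(\frac{n p_i}{n_i}\right)^{n_i+\frac12} \\
&= \frac{1}{(\sqrt{2\pi})^{s-1}}\left(\frac{n}{n^{s} p_1 p_2 \cdots p_s}\right)^{\frac12}\prod_{i=1}^{s}\left(\frac{n p_i}{n_i}\right)^{n_i+\frac12} \\
&= \frac{1}{(\sqrt{2\pi})^{s-1}}\left(\frac{1}{n^{s-1} p_1 p_2 \cdots p_s}\right)^{\frac12}\prod_{i=1}^{s}\left(\frac{n p_i}{n_i}\right)^{n_i+\frac12} \\
&= \frac{1}{(\sqrt{2\pi})^{s-1}}\left(\frac{1}{n^{s-1}}\right)^{\frac12}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\prod_{i=1}^{s}\left(\frac{n p_i}{n_i}\right)^{n_i+\frac12} \\
&= \frac{1}{(\sqrt{2\pi})^{s-1}}\,\frac{1}{(\sqrt{n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\prod_{i=1}^{s}\left(\frac{n p_i}{n_i}\right)^{n_i+\frac12} \\
&= \frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\prod_{i=1}^{s}\left(\frac{n p_i}{n_i}\right)^{n_i+\frac12}
\end{aligned}$$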
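So,

$$p(n_1, n_2, \dots, n_s) = C\left(\frac{m_1}{n_1}\right)^{n_1+\frac12}\left(\frac{m_2}{n_2}\right)^{n_2+\frac12}\cdots\left(\frac{m_s}{n_s}\right)^{n_s+\frac12}.$$

($C$ is a constant, and $m_1, m_2, \dots, m_s$ are the expected frequencies: $m_1 = n p_1$, and so on.)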
It is a lot simpler than the original equation.
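The simplified expression can be checked numerically against the exact multinomial probability. The sketch below is only an illustration under assumed inputs: it takes $C = (2\pi n)^{-(s-1)/2}\,(p_1 \cdots p_s)^{-1/2}$ and expected frequencies $n p_i$, as obtained by applying Stirling's formula to every factorial, and the function names and sample counts are made up for the example. All counts must be positive for the approximation to be defined.

```python
import math


def log_exact(counts, probs):
    """Exact log multinomial probability via lgamma."""
    n = sum(counts)
    return (math.lgamma(n + 1)
            - sum(math.lgamma(c + 1) for c in counts)
            + sum(c * math.log(p) for c, p in zip(counts, probs)))


def log_stirling(counts, probs):
    """Log of C * prod_i (n p_i / n_i)^(n_i + 1/2), with C as assumed above."""
    n, s = sum(counts), len(counts)
    log_c = (-0.5 * (s - 1) * math.log(2 * math.pi * n)
             - 0.5 * sum(math.log(p) for p in probs))
    return log_c + sum((c + 0.5) * math.log(n * p / c)
                       for c, p in zip(counts, probs))


# Hypothetical observation: 100 draws over 5 equally likely categories.
counts = [18, 22, 25, 15, 20]
probs = [0.2] * 5
gap = log_exact(counts, probs) - log_stirling(counts, probs)
print(math.exp(log_exact(counts, probs)), math.exp(log_stirling(counts, probs)), gap)
```

The two log-probabilities differ by a few hundredths here, and the gap shrinks as the individual counts grow.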
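Let $n_i = n p_i + r_i\sqrt{n p_i}$. It becomes

$$\frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\left(\frac{n p_1}{n p_1 + r_1\sqrt{n p_1}}\right)^{n p_1 + r_1\sqrt{n p_1}+\frac12}\left(\frac{n p_2}{n p_2 + r_2\sqrt{n p_2}}\right)^{n p_2 + r_2\sqrt{n p_2}+\frac12}\cdots\left(\frac{n p_s}{n p_s + r_s\sqrt{n p_s}}\right)^{n p_s + r_s\sqrt{n p_s}+\frac12}$$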
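Take note that

$$\left(\frac{n p_1}{n p_1 + r_1\sqrt{n p_1}}\right)^{n p_1 + r_1\sqrt{n p_1}+\frac12}
= \left(\frac{n p_1 + r_1\sqrt{n p_1}}{n p_1}\right)^{-n p_1 - r_1\sqrt{n p_1}-\frac12}
= \left(1 + \frac{r_1}{\sqrt{n p_1}}\right)^{-n p_1 - r_1\sqrt{n p_1}-\frac12}$$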
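Recall that $\lim_{n\to\infty}\left(1+\frac{r}{n}\right)^{n} = e^{r}$.
To get the behavior of $\left(1+\frac{r}{n}\right)^{-n^{2} - rn - \frac12}$ as $n\to\infty$ (here $n$ plays the role of $\sqrt{n p_1}$ and $r$ the role of $r_1$), remember that $n\to\infty$ means $\varepsilon\to 0$ with $n = \frac{1}{\varepsilon}$.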
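So,

$$\begin{aligned}
\lim_{n\to\infty}\left(1+\frac{r}{n}\right)^{-n^{2} - rn - \frac12}
&= \lim_{\varepsilon\to 0}\,(1+r\varepsilon)^{-\left(\frac{1}{\varepsilon}\right)^{2} - r\left(\frac{1}{\varepsilon}\right) - \frac12} \\
&= \lim_{\varepsilon\to 0}\exp\left[\log\,(1+r\varepsilon)^{-\left(\frac{1}{\varepsilon}\right)^{2} - r\left(\frac{1}{\varepsilon}\right) - \frac12}\right] \\
&= \lim_{\varepsilon\to 0}\exp\left[\left(-\left(\frac{1}{\varepsilon}\right)^{2} - r\left(\frac{1}{\varepsilon}\right) - \frac12\right)\log(1+r\varepsilon)\right]
\end{aligned}$$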
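To expand $\log(1+r\varepsilon)$ for small $\varepsilon$, use the series, valid for small $x$,

$$\log(1+x) = \sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{k}\,x^{k}.$$

With $x = r\varepsilon$,

$$\log(1+r\varepsilon) = \sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{k}\,r^{k}\varepsilon^{k} = r\varepsilon - \frac12 r^{2}\varepsilon^{2} + \frac13 r^{3}\varepsilon^{3} - \dots$$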
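So the original expression is

$$\lim_{\varepsilon\to 0}\exp\left[\left(-\left(\frac{1}{\varepsilon}\right)^{2} - r\left(\frac{1}{\varepsilon}\right) - \frac12\right)\log(1+r\varepsilon)\right]
= \lim_{\varepsilon\to 0}\exp\left[\left(-\left(\frac{1}{\varepsilon}\right)^{2} - r\left(\frac{1}{\varepsilon}\right) - \frac12\right)\left(r\varepsilon - \frac12 r^{2}\varepsilon^{2} + \frac13 r^{3}\varepsilon^{3} - \dots\right)\right]$$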
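Since $\varepsilon\to 0$, we keep only the terms of order $\varepsilon^{-1}$ and the constant terms:

$$\begin{aligned}
&\left(-\left(\frac{1}{\varepsilon}\right)^{2} - r\left(\frac{1}{\varepsilon}\right) - \frac12\right)\left(r\varepsilon - \frac12 r^{2}\varepsilon^{2} + \frac13 r^{3}\varepsilon^{3} - \dots\right) \\
&\quad= -\left(\frac{1}{\varepsilon}\right)^{2}\left(r\varepsilon - \frac12 r^{2}\varepsilon^{2} + \frac13 r^{3}\varepsilon^{3} - \dots\right)
 - r\left(\frac{1}{\varepsilon}\right)\left(r\varepsilon - \frac12 r^{2}\varepsilon^{2} + \frac13 r^{3}\varepsilon^{3} - \dots\right)
 - \frac12\left(r\varepsilon - \frac12 r^{2}\varepsilon^{2} + \frac13 r^{3}\varepsilon^{3} - \dots\right) \\
&\quad= \left(-r\varepsilon^{-1} + \frac12 r^{2} - \frac13 r^{3}\varepsilon + \dots\right)
 + \left(-r^{2} + \frac12 r^{3}\varepsilon - \frac13 r^{4}\varepsilon^{2} + \dots\right)
 + \left(-\frac12 r\varepsilon + \frac14 r^{2}\varepsilon^{2} - \frac16 r^{3}\varepsilon^{3} + \dots\right) \\
&\quad= -r\varepsilon^{-1} - \frac12 r^{2} + \dots
\end{aligned}$$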
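So the expression

$$\lim_{\varepsilon\to 0}\exp\left[\left(-\left(\frac{1}{\varepsilon}\right)^{2} - r\left(\frac{1}{\varepsilon}\right) - \frac12\right)\left(r\varepsilon - \frac12 r^{2}\varepsilon^{2} + \frac13 r^{3}\varepsilon^{3} - \dots\right)\right]
= \lim_{\varepsilon\to 0}\exp\left(-r\varepsilon^{-1} - \frac12 r^{2} + \dots\right)
= \lim_{n\to\infty}\exp\left(-rn - \frac12 r^{2} + \dots\right)$$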
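So the expression $\left(1 + \frac{r_1}{\sqrt{n p_1}}\right)^{-n p_1 - r_1\sqrt{n p_1}-\frac12}$ behaves, as $\sqrt{n p_1}$ increases, like

$$\left(1 + \frac{r_1}{\sqrt{n p_1}}\right)^{-n p_1 - r_1\sqrt{n p_1}-\frac12} \sim \exp\left(-r_1\sqrt{n p_1} - \frac12 r_1^{2}\right).$$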
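This asymptotic step can be sanity-checked numerically. The Python sketch below compares the logarithm of $(1 + r/N)^{-N^2 - rN - 1/2}$ with $-rN - \tfrac12 r^2$, where $N$ stands for $\sqrt{n p_1}$. The names `log_exact_factor` and `log_approx_factor` and the sample values are assumptions for illustration only.

```python
import math


def log_exact_factor(r: float, big_n: float) -> float:
    """Log of (1 + r/N)^(-N^2 - r N - 1/2); log1p keeps precision for small r/N."""
    return (-big_n ** 2 - r * big_n - 0.5) * math.log1p(r / big_n)


def log_approx_factor(r: float, big_n: float) -> float:
    """Log of the claimed limit form exp(-r N - r^2 / 2)."""
    return -r * big_n - 0.5 * r ** 2


r = 1.0
for big_n in (10.0, 100.0, 1000.0):
    # The difference of the logs should shrink roughly like 1 / N.
    print(big_n, log_exact_factor(r, big_n) - log_approx_factor(r, big_n))
```

Comparing logarithms rather than the factors themselves avoids underflow, since both factors are extremely small for large $N$.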
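So the original equation

$$\frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\left(\frac{n p_1}{n p_1 + r_1\sqrt{n p_1}}\right)^{n p_1 + r_1\sqrt{n p_1}+\frac12}\left(\frac{n p_2}{n p_2 + r_2\sqrt{n p_2}}\right)^{n p_2 + r_2\sqrt{n p_2}+\frac12}\cdots\left(\frac{n p_s}{n p_s + r_s\sqrt{n p_s}}\right)^{n p_s + r_s\sqrt{n p_s}+\frac12}$$

becomes

$$\begin{aligned}
&\approx \frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-r_1\sqrt{n p_1} - \frac12 r_1^{2}\right)\exp\left(-r_2\sqrt{n p_2} - \frac12 r_2^{2}\right)\cdots\exp\left(-r_s\sqrt{n p_s} - \frac12 r_s^{2}\right) \\
&= \frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-r_1\sqrt{n p_1} - r_2\sqrt{n p_2} - \dots - r_s\sqrt{n p_s}\right)\exp\left(-\frac12 r_1^{2} - \frac12 r_2^{2} - \dots - \frac12 r_s^{2}\right)
\end{aligned}$$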
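Since $n_i = n p_i + r_i\sqrt{n p_i}$, we have $r_i\sqrt{n p_i} = n_i - n p_i$. Therefore

$$\sum_{i} r_i\sqrt{n p_i} = \sum_{i}\left(n_i - n p_i\right) = \sum_{i} n_i - \sum_{i} n p_i = \sum_{i} n_i - n\sum_{i} p_i = n - n = 0.$$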
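The cancellation above is easy to confirm on concrete numbers. In this small Python sketch, `standardized_residuals` is a hypothetical helper that computes $r_i = (n_i - n p_i)/\sqrt{n p_i}$, and the counts and probabilities are invented for the example.

```python
import math


def standardized_residuals(counts, probs):
    """Return r_i = (n_i - n p_i) / sqrt(n p_i) for each category."""
    n = sum(counts)
    return [(c - n * p) / math.sqrt(n * p) for c, p in zip(counts, probs)]


# Made-up data: 100 observations over three categories.
counts = [30, 50, 20]
probs = [0.25, 0.5, 0.25]
n = sum(counts)
r = standardized_residuals(counts, probs)

# Weighted sum of residuals: sum_i r_i * sqrt(n p_i), which should vanish
# because the counts sum to n and the probabilities sum to 1.
weighted_sum = sum(r_i * math.sqrt(n * p) for r_i, p in zip(r, probs))
print(r, weighted_sum)
```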
So the equation becomes

$$= \frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-\frac12 r_1^{2} - \frac12 r_2^{2} - \dots - \frac12 r_s^{2}\right)$$
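The p-value is the integral of this expression over the outcomes that are at least as rare as the observed one:

$$\begin{aligned}
&= \int\!\!\int\!\cdots\!\int \frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-\frac12 r_1^{2} - \frac12 r_2^{2} - \dots - \frac12 r_s^{2}\right) dr_1\, dr_2 \cdots dr_s \\
&= \int \frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-\frac12 r_1^{2} - \frac12 r_2^{2} - \dots - \frac12 r_s^{2}\right) dV
\end{aligned}$$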
Let $r^{2} = r_1^{2} + r_2^{2} + \dots + r_s^{2}$. Then

$$p(n_1, n_2, \dots, n_s) = p(r_1, r_2, \dots, r_s) = p(r) = \frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-\frac12 r^{2}\right)$$
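Since $dV$ is a volume element and the integrand depends only on $r$, we can integrate over spherical shells. In a $k$-dimensional space, $dV = C r^{k-1}\,dr$, so computing

$$\int_{r}^{\infty}\frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-\frac12 r^{2}\right) C r^{k-1}\,dr$$

would suffice. Because of the approximations made along the way,

$$\int_{r=0}^{\infty}\frac{1}{(\sqrt{2\pi n})^{s-1}}\left(\frac{1}{p_1 p_2 \cdots p_s}\right)^{\frac12}\exp\left(-\frac12 r^{2}\right) C r^{k-1}\,dr = 1$$

might not be true. So absorb the constant factors into $C$ and choose $C$ such that

$$\int_{r=0}^{\infty} C\exp\left(-\frac12 r^{2}\right) r^{k-1}\,dr = 1.$$

The $\chi^{2}$-distribution with $k$ degrees of freedom has the density

$$f(x; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{\frac{k}{2}-1}e^{-\frac{x}{2}},\quad x > 0.$$

Setting $x = r^{2}$ and $dx = 2r\,dr$ makes the former equation exactly a $\chi^{2}$-distribution.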
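(Notice: $r_i = \frac{n_i - n p_i}{\sqrt{n p_i}}$ and $r_i^{2} = \frac{(n_i - n p_i)^{2}}{n p_i}$, so $r^{2}$ is Pearson's statistic $\sum_i \frac{(n_i - n p_i)^{2}}{n p_i}$. The constraint $r_1\sqrt{n p_1} + r_2\sqrt{n p_2} + \dots + r_s\sqrt{n p_s} = 0$ confines $(r_1, r_2, \dots, r_s)$ to an $(s-1)$-dimensional hyperplane, so the number of degrees of freedom is $s-1$.)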
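Putting the result to work, here is a minimal Python sketch of the resulting test. It computes Pearson's statistic $\sum_i (n_i - n p_i)^2 / (n p_i)$ and a $\chi^2$ tail probability with $s - 1$ degrees of freedom. To stay within the standard library, the tail probability uses the closed form $e^{-x/2}\sum_{j=0}^{k/2-1}(x/2)^j/j!$, which holds only for even $k$; this restriction, the function names, and the sample counts are all assumptions of the sketch. With $s = 5$ categories, $k = 4$ is even, so the example fits.

```python
import math


def pearson_statistic(counts, probs):
    """Sum over categories of (observed - expected)^2 / expected."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def chi2_sf_even(x: float, k: int) -> float:
    """P(X >= x) for a chi-squared variable with an even number k of
    degrees of freedom, via exp(-x/2) * sum_{j < k/2} (x/2)^j / j!."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("this sketch only handles even, positive k")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(k // 2):
        total += term            # add (x/2)^j / j!
        term *= half / (j + 1)   # advance to the next series term
    return math.exp(-half) * total


# Hypothetical observation: 100 draws over 5 equally likely categories.
counts = [18, 22, 25, 15, 20]
probs = [0.2] * 5
stat = pearson_statistic(counts, probs)
p_value = chi2_sf_even(stat, len(counts) - 1)  # s - 1 degrees of freedom
print(stat, p_value)
```

For odd degrees of freedom, a general routine for the regularized incomplete gamma function would be needed instead of this closed form.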
