
Advanced Statistics

Chen, L.-A.

Distribution of order statistics:


Review: Let X1, . . . , Xk be random variables with joint p.d.f f(x1, . . . , xk)
and let Y1 = h1(X1, . . . , Xk), Y2 = h2(X1, . . . , Xk), . . . , Yk = hk(X1, . . . , Xk) be
a 1-1 transformation with inverse functions x1 = w1(y1, . . . , yk), x2 = w2(y1, . . . , yk), . . . ,
xk = wk(y1, . . . , yk). Then the joint p.d.f of Y1, . . . , Yk is

    fY1,...,Yk(y1, . . . , yk) = f(w1(y1, . . . , yk), . . . , wk(y1, . . . , yk)) |J|

with Jacobian

        | ∂w1/∂y1  · · ·  ∂w1/∂yk |
    J = |   ...     ...     ...   |
        | ∂wk/∂y1  · · ·  ∂wk/∂yk |
Order statistics are not 1-1 transformations. For example, suppose that we have
random variables X1, X2. The order statistics are Y1 = min{X1, X2} and
Y2 = max{X1, X2}, and both (x1, x2) and (x2, x1) give the same (y1, y2).
If (Y1, . . . , Yk) is not a 1-1 transformation, we may partition the space R^k into
sets A1, . . . , Am such that (Y1, . . . , Yk) : Aj −→ B is a 1-1 transformation on
each Aj. There are Jacobians J1, . . . , Jm such that the joint p.d.f of Y1, . . . , Yk is

    fY1,...,Yk(y1, . . . , yk) = Σ_{j=1}^m f(w1j(y1, . . . , yk), . . . , wkj(y1, . . . , yk)) |Jj|, (y1, . . . , yk) ∈ B

where (w1j, . . . , wkj) denotes the inverse transformation on Aj.
Let X1, . . . , Xn be a random sample from a continuous distribution with p.d.f
f(x), a < x < b. We consider the order statistics Y1 = min{X1, . . . , Xn}, . . . ,
Yn = max{X1, . . . , Xn}, which satisfy Y1 ≤ Y2 ≤ · · · ≤ Yn. We call Yi the ith
order statistic of the random sample X1, . . . , Xn.
Thm. The joint p.d.f of Y1, . . . , Yn is

    fY1,...,Yn(y1, . . . , yn) = { n!f(y1)f(y2) · · · f(yn)   if a < y1 < y2 < · · · < yn < b
                                { 0                          otherwise.
Proof. We consider the case n = 3 only.
We have X1, X2, X3 iid ∼ f(x), a < x < b, so the joint p.d.f of X1, X2, X3 is

    f(x1, x2, x3) = f(x1)f(x2)f(x3), a < xi < b, i = 1, 2, 3.

The space of (X1, X2, X3) is

    A = {(x1, x2, x3) : a < xi < b, i = 1, 2, 3}
Consider the partition:

    A1 = {(x1, x2, x3) : a < x1 < x2 < x3 < b}, x1 = y1, x2 = y2, x3 = y3
    A2 = {(x1, x2, x3) : a < x2 < x1 < x3 < b}, x1 = y2, x2 = y1, x3 = y3
    A3 = {(x1, x2, x3) : a < x1 < x3 < x2 < b}, x1 = y1, x2 = y3, x3 = y2
    A4 = {(x1, x2, x3) : a < x2 < x3 < x1 < b}, x1 = y3, x2 = y1, x3 = y2
    A5 = {(x1, x2, x3) : a < x3 < x1 < x2 < b}, x1 = y2, x2 = y3, x3 = y1
    A6 = {(x1, x2, x3) : a < x3 < x2 < x1 < b}, x1 = y3, x2 = y2, x3 = y1
Each set Aj gives a 1-1 transformation, i.e.

    (Y1, Y2, Y3) : Aj −→ B = {(y1, y2, y3) : a < y1 < y2 < y3 < b}
Jacobians:

         | 1 0 0 |            | 0 1 0 |             | 1 0 0 |
    J1 = | 0 1 0 | = 1,  J2 = | 1 0 0 | = −1,  J3 = | 0 0 1 | = −1,
         | 0 0 1 |            | 0 0 1 |             | 0 1 0 |

         | 0 0 1 |            | 0 1 0 |             | 0 0 1 |
    J4 = | 1 0 0 | = 1,  J5 = | 0 0 1 | = 1,   J6 = | 0 1 0 | = −1
         | 0 1 0 |            | 1 0 0 |             | 1 0 0 |
The joint p.d.f of Y1, Y2, Y3 is

    fY1,Y2,Y3(y1, y2, y3) = f(y1)f(y2)f(y3)|1| + f(y2)f(y1)f(y3)|−1|
                          + f(y1)f(y3)f(y2)|−1| + f(y3)f(y1)f(y2)|1|
                          + f(y2)f(y3)f(y1)|1| + f(y3)f(y2)f(y1)|−1|
                          = 6f(y1)f(y2)f(y3)
                          = 3!f(y1)f(y2)f(y3)

Let Y1, . . . , Yn be the order statistics for a random sample X1, . . . , Xn from
a continuous distribution with p.d.f f(x), a < x < b. Then the joint p.d.f of
Y1, . . . , Yn is

    fY1,...,Yn(y1, . . . , yn) = n!f(y1)f(y2) · · · f(yn), a < y1 < y2 < · · · < yn < b

Note: If a < x < b, then

    ∫_y^b f(x)dx = F(x)|_y^b = F(b) − F(y) = 1 − F(y)
    ∫_a^y f(x)dx = F(x)|_a^y = F(y) − F(a) = F(y)

where F is the d.f. of X.


Thm. (a) The marginal p.d.f of Y1 is

    fY1(y1) = n(1 − F(y1))^(n−1) · f(y1), a < y1 < b

(b) The marginal p.d.f of Yn is

    fYn(yn) = n(F(yn))^(n−1) · f(yn), a < yn < b

(c) The marginal p.d.f of Yj is

    fYj(yj) = n!/((j−1)!(n−j)!) · (F(yj))^(j−1) (1 − F(yj))^(n−j) · f(yj), a < yj < b

(d) For 1 ≤ j < k ≤ n, the joint p.d.f of Yj and Yk is

    fYj,Yk(yj, yk) = n!/((j−1)!(k−j−1)!(n−k)!) (F(yj))^(j−1) (F(yk) − F(yj))^(k−j−1) (1 − F(yk))^(n−k) f(yj)f(yk),
                     a < yj < yk < b

Proof. The marginal p.d.f of Y1 is

    fY1(y1) = n! ∫_y1^b ∫_y2^b · · · ∫_y(n−1)^b f(y1)f(y2) · · · f(yn) dyn · · · dy2
            = n! ∫_y1^b · · · ∫_y(n−2)^b f(y1) · · · f(y(n−1)) (1/1!)(1 − F(y(n−1))) dy(n−1) · · · dy2
            = n! ∫_y1^b · · · ∫_y(n−3)^b f(y1) · · · f(y(n−2)) (1/2!)(1 − F(y(n−2)))^2 dy(n−2) · · · dy2
            = · · · · · ·
            = n! ∫_y1^b f(y1)f(y2) (1/(n−2)!)(1 − F(y2))^(n−2) dy2
            = (n!/(n−1)!) f(y1)(1 − F(y1))^(n−1)
            = n(1 − F(y1))^(n−1) f(y1)

The marginal p.d.f of Yn is

    fYn(yn) = n! ∫_a^yn · · · ∫_a^y3 ∫_a^y2 f(y1)f(y2) · · · f(yn) dy1 · · · dy(n−1)
            = n! ∫_a^yn · · · ∫_a^y3 f(y2) · · · f(yn) F(y2) dy2 · · · dy(n−1)
            = n! ∫_a^yn · · · ∫_a^y4 f(y3) · · · f(yn) (1/2!)(F(y3))^2 dy3 · · · dy(n−1)
            = · · · · · ·
            = n! ∫_a^yn f(y(n−1))f(yn) (1/(n−2)!)(F(y(n−1)))^(n−2) dy(n−1)
            = n!f(yn) (1/(n−1)!)(F(yn))^(n−1)
            = n(F(yn))^(n−1) f(yn)

The marginal p.d.f of Yj is

    fYj(yj) = n! ∫_yj^b · · · ∫_y(n−1)^b ∫_a^yj · · · ∫_a^y2 f(y1) · · · f(yn) dy1 · · · dy(j−1) dyn · · · dy(j+1)
            = n!f(yj) (1/(j−1)!)(F(yj))^(j−1) (1/(n−j)!)(1 − F(yj))^(n−j)
            = n!/((j−1)!(n−j)!) (F(yj))^(j−1) (1 − F(yj))^(n−j) f(yj), a < yj < b

The joint p.d.f of Yj and Yk is

    fYj,Yk(yj, yk) = n! ∫_a^yj · · · ∫_a^y2 ∫_yj^yk · · · ∫_y(k−2)^yk ∫_yk^b · · · ∫_y(n−1)^b f(y1) · · · f(yn)
                     dyn dy(n−1) · · · dy(k+1) dy(k−1) · · · dy(j+1) dy1 · · · dy(j−1)
                   = n!/((j−1)!(k−j−1)!(n−k)!) (F(yj))^(j−1) (F(yk) − F(yj))^(k−j−1)
                     (1 − F(yk))^(n−k) f(yj)f(yk), where a < yj < yk < b
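The theorem can be checked numerically. Below is a minimal Monte Carlo sketch (not part of the notes): for X1, . . . , Xn iid U(0, 1) we have F(y) = y and f(y) = 1, so part (c) says the jth order statistic has a Beta(j, n−j+1) density; the sample size, j, and evaluation point are arbitrary illustrative choices.

```python
# Monte Carlo sketch (not part of the notes) of Thm (c) for U(0,1):
# with F(y) = y, f(y) = 1, the jth order statistic has density
# n!/((j-1)!(n-j)!) y^(j-1) (1-y)^(n-j), a Beta(j, n-j+1) density.
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
n, j, y, h = 10, 3, 0.25, 0.01           # arbitrary illustrative choices
order_j = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, j - 1]

theory = (factorial(n) / (factorial(j - 1) * factorial(n - j))
          * y**(j - 1) * (1 - y)**(n - j))
empirical = np.mean(np.abs(order_j - y) < h) / (2 * h)   # local histogram
print(theory, empirical)   # the two values should agree closely
```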

t-distribution

Let Z ∼ N(0, 1) and V ∼ χ²(r) be independent.
We say that T = Z/√(V/r) has a t-distribution with d.f. r and denote T ∼ t(r).
Want the p.d.f of T.
Proof. The joint p.d.f of Z and V is

    fZ,V(z, v) = (1/√(2π)) e^(−z²/2) · (1/(Γ(r/2) 2^(r/2))) v^(r/2−1) e^(−v/2), z ∈ R, v > 0

Consider the transformation T = Z/√(V/r), U = V, which maps

    A = {(z, v) : −∞ < z < ∞, v > 0} −→ B = {(t, u) : −∞ < t < ∞, u > 0}

For (t, u) ∈ B we have the unique solution v = u, z = t√u/√r, so this is a 1-1
transformation with inverse functions v = u, z = t√u/√r.
The Jacobian is

        | ∂z/∂t  ∂z/∂u |   | √u/√r  t u^(−1/2)/(2√r) |
    J = |              | = |                         | = √u/√r
        | ∂v/∂t  ∂v/∂u |   |   0           1         |
The joint p.d.f of T and U is

    fT,U(t, u) = fZ,V(t√u/√r, u) · |√u/√r|
               = (1/√(2π)) e^(−(1/2)u t²/r) (1/(Γ(r/2) 2^(r/2))) u^(r/2−1) e^(−u/2) · √u/√r
               = 1/(√(2π) Γ(r/2) 2^(r/2) √r) · u^((r+1)/2−1) e^(−(1/2)u(1+t²/r)), −∞ < t < ∞, u > 0

The p.d.f of T is

    fT(t) = 1/(√(2π) Γ(r/2) 2^(r/2) √r) ∫_0^∞ u^((r+1)/2−1) e^(−(1/2)u(1+t²/r)) du

( Let h = u(1 + t²/r), so u = h/(1 + t²/r) and du/dh = 1/(1 + t²/r). )

          = 1/(√(2π) Γ(r/2) 2^(r/2) √r) ∫_0^∞ (h^((r+1)/2−1)/(1 + t²/r)^((r+1)/2−1)) e^(−h/2) (1/(1 + t²/r)) dh
          = (Γ((r+1)/2) 2^((r+1)/2))/(√(2π) Γ(r/2) 2^(r/2) √r (1 + t²/r)^((r+1)/2)) ∫_0^∞ (1/(Γ((r+1)/2) 2^((r+1)/2))) h^((r+1)/2−1) e^(−h/2) dh
          = Γ((r+1)/2)/(√(πr) Γ(r/2) (1 + t²/r)^((r+1)/2)), −∞ < t < ∞
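As a quick sanity check (not in the notes), the derived formula can be compared with a library implementation of the Student-t density; the degrees of freedom r = 5 and the grid below are arbitrary.

```python
# Numerical check (not in the notes) that the derived f_T(t) is the
# Student-t density with r degrees of freedom.
import numpy as np
from scipy.special import gamma
from scipy.stats import t as t_dist

r = 5                                    # arbitrary degrees of freedom
ts = np.linspace(-4.0, 4.0, 9)
derived = (gamma((r + 1) / 2)
           / (np.sqrt(np.pi * r) * gamma(r / 2)
              * (1 + ts**2 / r) ** ((r + 1) / 2)))
print(np.max(np.abs(derived - t_dist.pdf(ts, df=r))))   # ~1e-16
```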

X ∼ Gamma(α, β) if f(x) = (1/(Γ(α)β^α)) x^(α−1) e^(−x/β), x > 0.
If β = 2 and we let α = r/2, then X ∼ χ²(r).
If X ∼ χ²(r), then f(x) = (1/(Γ(r/2) 2^(r/2))) x^(r/2−1) e^(−x/2), x > 0.

We need several convergence theorems for use in constructing C.I.'s.


Thm. (a) Central Limit Theorem (CLT)
If X1, . . . , Xn is a random sample with mean µ and variance σ² < ∞, then

    (X̄ − µ)/(σ/√n) →d N(0, 1)

(b) Weak Law of Large Numbers (WLLN)
If X1, . . . , Xn are iid random variables with mean µ and variance σ² < ∞, then

    X̄ →P µ

Thm. If Yn →P a, then g(Yn) →P g(a) for any continuous function g.
Thm. Slutsky's theorem
If Xn →d X and Yn →P a, then

    Xn ± Yn →d X ± a
    Xn Yn →d aX
    Xn/Yn →d X/a, if a ≠ 0.
Concept of C.I.:
(a) Suppose that P(X ∈ A) = 0.9. If we observe X many times, obtaining
x1, . . . , xn for large n, then about 0.9 × n of the observations xi satisfy xi ∈ A.

(b) If (t1(X1, . . . , Xn), t2(X1, . . . , Xn)) is a 90% C.I. for θ, then, when we
observe (t1, t2) many times (say n times), about 0.9 × n of the observed
intervals (t1, t2) contain θ, i.e. θ ∈ (t1, t2).

Note:
(a) The normal approximation by the CLT can be applied to any distribution
f(x, θ), normal or not, when n is large (n ≥ 30).

(b) Why convergence in distribution?

If Xn →d N(0, 1), then

    P(a ≤ Xn ≤ b) = FXn(b) − FXn(a) −→ FZ(b) − FZ(a) = P(a ≤ Z ≤ b)

Approximate C.I.:
(1) Let X1, . . . , Xn be a random sample from f(x) with mean µ and variance
σ², where f can be normal or not. Then

    (X̄ − µ)/(s/√n) = (X̄ − µ)/(σ/√n) · (σ/s) →d N(0, 1) · 1 = N(0, 1) by Slutsky's theorem.

We have

    1 − α = P(−z_{α/2} ≤ Z ≤ z_{α/2}) ≈ P(−z_{α/2} ≤ (X̄ − µ)/(s/√n) ≤ z_{α/2})
          = P(X̄ − z_{α/2} s/√n ≤ µ ≤ X̄ + z_{α/2} s/√n)

Here, (X̄ − z_{α/2} s/√n, X̄ + z_{α/2} s/√n) is an approximate 100(1 − α)% C.I. for µ.
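A minimal sketch of this interval in code (the data below are made up for illustration):

```python
# Sketch (data made up): approximate 100(1-α)% C.I. for µ,
# (x̄ - z_{α/2} s/√n, x̄ + z_{α/2} s/√n).
import numpy as np
from scipy.stats import norm

x = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9, 5.4, 5.0])
n, alpha = len(x), 0.05
xbar, s = x.mean(), x.std(ddof=1)        # s uses the (n-1) divisor
z = norm.ppf(1 - alpha / 2)              # z_{α/2}
print((xbar - z * s / np.sqrt(n), xbar + z * s / np.sqrt(n)))
```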

(2) Let Y ∼ b(n, p). If X1, . . . , Xn are iid Bernoulli(p), then Y =d Σ_{i=1}^n Xi.
Let p̂ = Y/n. We have

    p̂ = Y/n = (1/n) Σ_{i=1}^n Xi = X̄ →P p by WLLN,
    (p̂ − p)/√(p(1−p)/n) = (X̄ − p)/√(p(1−p)/n) →d N(0, 1) by CLT.

Then

    (p̂ − p)/√(p̂(1−p̂)/n) = (p̂ − p)/√(p(1−p)/n) · √(p(1−p)/(p̂(1−p̂))) →d N(0, 1) · 1 = N(0, 1) by Slutsky's theorem.

We have

    1 − α = P(−z_{α/2} ≤ Z ≤ z_{α/2}) ≈ P(−z_{α/2} ≤ (p̂ − p)/√(p̂(1−p̂)/n) ≤ z_{α/2})
          = P(p̂ − z_{α/2}√(p̂(1−p̂)/n) ≤ p ≤ p̂ + z_{α/2}√(p̂(1−p̂)/n))

Here, (p̂ − z_{α/2}√(p̂(1−p̂)/n), p̂ + z_{α/2}√(p̂(1−p̂)/n)) is an approximate
100(1 − α)% C.I. for p.

Example: Y ∼ b(n, p), n = 100, y = 20. Want a 95.4% C.I. for p. z_{0.023} = 2.
A 95.4% C.I. for p is approximately

    (20/100 − z_{0.023}√(0.2 × 0.8/100), 20/100 + z_{0.023}√(0.2 × 0.8/100)) = (0.12, 0.28)

C.I. for σ² or σ:
Let X1, . . . , Xn be a random sample from N(µ, σ²) where µ and σ² are
unknown. We have

    (n − 1)s²/σ² ∼ χ²(n − 1), where s² = (1/(n−1)) Σ_{i=1}^n (Xi − X̄)²

Let χ²_{α/2} and χ²_{1−α/2} satisfy α/2 = P(χ²(n − 1) ≤ χ²_{α/2}) = P(χ²(n − 1) ≥ χ²_{1−α/2}).
Then

    1 − α = P(χ²_{α/2} ≤ χ²(n − 1) ≤ χ²_{1−α/2})
          = P(χ²_{α/2} ≤ (n − 1)s²/σ² ≤ χ²_{1−α/2})
          = P((n − 1)s²/χ²_{1−α/2} ≤ σ² ≤ (n − 1)s²/χ²_{α/2})

Here, ((n − 1)s²/χ²_{1−α/2}, (n − 1)s²/χ²_{α/2}) is a 100(1 − α)% C.I. for σ².
Since everything is positive,

    1 − α = P(√((n − 1)s²/χ²_{1−α/2}) ≤ σ ≤ √((n − 1)s²/χ²_{α/2}))

gives a C.I. for σ.
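A short sketch of the χ² interval in code (made-up data; scipy's chi2.ppf supplies the quantiles χ²_{α/2} and χ²_{1−α/2}):

```python
# Sketch (data made up): 100(1-α)% C.I. for σ² from
# ((n-1)s²/χ²_{1-α/2}, (n-1)s²/χ²_{α/2}).
import numpy as np
from scipy.stats import chi2

x = np.array([9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.7, 10.6])
n, alpha = len(x), 0.05
s2 = x.var(ddof=1)
lo = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, df=n - 1)
hi = (n - 1) * s2 / chi2.ppf(alpha / 2, df=n - 1)
print((lo, hi))                          # take square roots for a C.I. for σ
```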

C.I. for µ:
(a) X1, . . . , Xn iid ∼ N(µ, σ0²) where σ0² is known.
Pivotal quantity: Z = (X̄ − µ)/(σ0/√n) ∼ N(0, 1)

(b) X1, . . . , Xn iid ∼ N(µ, σ²)
Pivotal quantity: T = (X̄ − µ)/(s/√n) ∼ t(n − 1)

(c) X1, . . . , Xn iid ∼ f(x) with mean µ, n large (n ≥ 30).
Pivotal quantity: (X̄ − µ)/(s/√n) ≈ N(0, 1)

C.I. for σ²:
X1, . . . , Xn iid ∼ N(µ, σ²)
Pivotal quantity: (n − 1)s²/σ² ∼ χ²(n − 1).
Confidence Interval for Difference of Means:
Case I:
X1, . . . , Xn iid ∼ N(µx, σx²) and Y1, . . . , Ym iid ∼ N(µy, σy²), independent,
where σx² and σy² are known.

    ⇒ X̄ ∼ N(µx, σx²/n) and Ȳ ∼ N(µy, σy²/m), independent
    ⇒ X̄ − Ȳ ∼ N(µx − µy, σx²/n + σy²/m)
    ⇒ Z = ((X̄ − Ȳ) − (µx − µy))/√(σx²/n + σy²/m) ∼ N(0, 1)

So,

    1 − α = P(−z_{α/2} ≤ Z ≤ z_{α/2})
          = P(−z_{α/2} ≤ ((X̄ − Ȳ) − (µx − µy))/√(σx²/n + σy²/m) ≤ z_{α/2})
          = P(X̄ − Ȳ − z_{α/2}√(σx²/n + σy²/m) ≤ µx − µy ≤ X̄ − Ȳ + z_{α/2}√(σx²/n + σy²/m))

A 100(1 − α)% C.I. for µx − µy is

    (X̄ − Ȳ − z_{α/2}√(σx²/n + σy²/m), X̄ − Ȳ + z_{α/2}√(σx²/n + σy²/m))
In general, let X1, . . . , Xn iid ∼ f(x, θ). Q = h(X1, . . . , Xn, θ) is a pivotal
quantity if it has a distribution free of the parameter θ
⇒ ∃ a, b s.t. 1 − α = Pθ(a ≤ Q ≤ b), ∀θ.
We want a pivotal quantity with

    1 − α = P(a ≤ Q ≤ b) = P(t1(x1, . . . , xn) ≤ θ ≤ t2(x1, . . . , xn))

Case II:
Variances are unknown but they are equal.
X1, . . . , Xn iid ∼ N(µx, σ²) and Y1, . . . , Ym iid ∼ N(µy, σ²), independent,
where σ² is unknown.

    ⇒ X̄ ∼ N(µx, σ²/n), (n − 1)sx²/σ² ∼ χ²(n − 1),
      Ȳ ∼ N(µy, σ²/m), (m − 1)sy²/σ² ∼ χ²(m − 1), all independent,

where sx² = (1/(n−1)) Σ_{i=1}^n (Xi − X̄)² and sy² = (1/(m−1)) Σ_{i=1}^m (Yi − Ȳ)².

    ⇒ X̄ − Ȳ ∼ N(µx − µy, σ²(1/n + 1/m)) and
      ((n − 1)sx² + (m − 1)sy²)/σ² ∼ χ²(n + m − 2), independent.

Then

    T = [((X̄ − Ȳ) − (µx − µy))/(σ√(1/n + 1/m))] / √(((n − 1)sx² + (m − 1)sy²)/(σ²(n + m − 2)))
      = ((X̄ − Ȳ) − (µx − µy)) / (√(((n − 1)sx² + (m − 1)sy²)/(n + m − 2)) √(1/n + 1/m)) ∼ t(n + m − 2)

We have

    1 − α = P(−t_{α/2} ≤ T ≤ t_{α/2})
          = P(X̄ − Ȳ − t_{α/2}√(((n − 1)sx² + (m − 1)sy²)/(n + m − 2))√(1/n + 1/m) ≤ µx − µy
              ≤ X̄ − Ȳ + t_{α/2}√(((n − 1)sx² + (m − 1)sy²)/(n + m − 2))√(1/n + 1/m))

A 100(1 − α)% C.I. for µx − µy is

    (X̄ − Ȳ ± t_{α/2}√(((n − 1)sx² + (m − 1)sy²)/(n + m − 2))√(1/n + 1/m))

Example:
n = 10, m = 7, X̄ = 4.2, Ȳ = 3.4, (n − 1)sx² = 490, (m − 1)sy² = 224, t_{0.05}(15) = 1.753.
A 90% C.I. for µx − µy is

    (4.2 − 3.4 − 1.753√((490 + 224)/15)√(1/10 + 1/7),
     4.2 − 3.4 + 1.753√((490 + 224)/15)√(1/10 + 1/7)) = (−5.16, 6.76)
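Reproducing the example numerically:

```python
# Reproducing the example numerically.
import numpy as np

n, m = 10, 7
diff = 4.2 - 3.4
half = 1.753 * np.sqrt((490 + 224) / (n + m - 2)) * np.sqrt(1/n + 1/m)
print((diff - half, diff + half))        # ≈ (-5.16, 6.76)
```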

Chapter: Testing Hypotheses

Let X1, . . . , Xn be a random sample from f(x, θ) where θ is unknown.

Def. A statistical hypothesis is a conjecture about the unknown parameter θ. If
it specifies a single value for θ, it is called a simple hypothesis; otherwise, it
is a composite hypothesis.
Example:
H : θ = θ0 is a simple hypothesis.
H : θ ≤ θ0, H : θ ≥ θ0, H : θ1 ≤ θ ≤ θ2 are all composite hypotheses.
There are two hypotheses, one called the null hypothesis and one called the
alternative hypothesis.

Def. The null hypothesis, denoted by H0, is the hypothesis that we reject
only if the data reveal strongly that it is not true.
The alternative hypothesis, denoted by H1, is the hypothesis alternative to
the null hypothesis.

We often want to see if a new product (drug, manufacturing process, fertilizer)
functions better than the old product. The hypotheses H0 and H1 both
concern the quality of the new product:
H0 : the new product is not better than the old product.
vs. H1 : the new product is better than the old product.
Once the hypotheses are set, we do an experiment on the new product
to generate a random sample and define a test for the hypotheses.

Def. A test is a rule deciding whether to reject or not to reject the null
hypothesis. Usually, a test specifies a subset C of the sample space of the
random sample X1, . . . , Xn such that we reject H0 if the observation
(x1, . . . , xn) falls in C and do not reject H0 if it does not. This subset
C is called the critical region or the rejection region.
For any test, there are two possible errors that may occur:
Type I error : H0 is true but we reject H0 .
Type II error : H1 is true but we do not reject H0 .

Def. The power function πC(θ) of critical region C is the probability of
rejecting H0 when θ is true.
Let X1, . . . , Xn be a random sample and consider a test with critical region
C. The power function is

    πC(θ) = P(reject H0 : θ is true) = P((X1, . . . , Xn) ∈ C : θ is true)

Example:
X = score of a test ∼ N(θ, 100). Past experience indicates θ = 75. Want
to test the hypothesis H0 : θ = 75 vs. H1 : θ > 75.
sol: Let X1, . . . , X25 be a random sample from N(θ, 100) and consider the
critical region

    C1 = {x̄ > 75} = {(x1, . . . , x25) : (1/25) Σ_{i=1}^25 xi > 75}.

The power function is

    πC1(θ) = P(X̄ > 75 : θ) = P((X̄ − θ)/2 > (75 − θ)/2) = P(Z > (75 − θ)/2)
           = 1 − P(Z ≤ (75 − θ)/2)

where X̄ ∼ N(θ, 100/25) = N(θ, 4).

    πC1(75) = 0.5, πC1(77) = 0.841, πC1(79) = 0.977

If we choose the critical region C2 = {x̄ > 78}, the power function is

    πC2(θ) = P(X̄ > 78 : θ) = 1 − P(Z ≤ (78 − θ)/2)
    πC2(75) = 0.067
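These power values can be reproduced with the standard normal c.d.f.:

```python
# Reproducing the power values: X̄ ~ N(θ, 4), C1 = {x̄ > 75}, C2 = {x̄ > 78}.
from scipy.stats import norm

for theta in (75, 77, 79):
    print(theta, 1 - norm.cdf((75 - theta) / 2))   # πC1: 0.5, 0.841, 0.977
print(1 - norm.cdf((78 - 75) / 2))                 # πC2(75) ≈ 0.067
```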

Def. The size of a test with critical region C is the maximum probability
of type I error, i.e. size = max_{θ∈H0} πC(θ).

The rule for choosing a critical region is to fix a significance level α and find,
among the class of tests with size ≤ α, the test that minimizes the probability
of type II error. (Usually we let α = 0.01 or 0.05.)

Def. Consider the simple hypothesis H0 : θ = θ0 vs. H1 : θ = θ1. We say that
the test with critical region C is a most powerful (MP) test of significance
level α if, for every set A with P((X1, . . . , Xn) ∈ A : H0) ≤ α, the following
hold:
(a) P((X1, . . . , Xn) ∈ C : H0) = πC(θ0) = α
(b) P((X1, . . . , Xn) ∈ C : H1) ≥ P((X1, . . . , Xn) ∈ A : H1),
i.e. πC(θ1) ≥ πA(θ1)
We also call C the MP critical region of significance level α.

Let X1, . . . , Xn be a random sample from a distribution with p.d.f f(x, θ).

Joint p.d.f:

    f(x1, . . . , xn, θ) = Π_{i=1}^n f(xi, θ) ⇒ a function of x1, . . . , xn

Likelihood function:

    L(θ, x1, . . . , xn) = Π_{i=1}^n f(xi, θ) ⇒ a function of θ

A ratio of two likelihoods,

    L(θ0, x1, . . . , xn)/L(θ1, x1, . . . , xn),

is called the likelihood ratio. We will derive the MP test through this
likelihood ratio.

Thm. Neyman-Pearson Theorem

Consider the simple hypothesis H0 : θ = θ0 vs. H1 : θ = θ1. Let C be the
critical region with k > 0 such that

(a) L(θ0, x1, . . . , xn)/L(θ1, x1, . . . , xn) ≤ k, for (x1, . . . , xn) ∈ C

(b) L(θ0, x1, . . . , xn)/L(θ1, x1, . . . , xn) ≥ k, for (x1, . . . , xn) ∉ C

(c) α = P((X1, . . . , Xn) ∈ C : H0) ⇒ α = ∫_C L(θ0)

Then C is the MP critical region of significance level α.

Proof. Note that, for any set B,

    P((X1, . . . , Xn) ∈ B : θ) = ∫ · · · ∫_B f(x1, . . . , xn, θ) dx1 · · · dxn
                               = ∫ · · · ∫_B L(θ, x1, . . . , xn) dx1 · · · dxn
                               = ∫_B L(θ), say.

Let A be a critical region with ∫_A L(θ0) ≤ α.
We want to show that ∫_C L(θ1) ≥ ∫_A L(θ1). Now

    ∫_C L(θ1) − ∫_A L(θ1) = [∫_{C∩A^c} L(θ1) + ∫_{C∩A} L(θ1)] − [∫_{A∩C^c} L(θ1) + ∫_{A∩C} L(θ1)]
                          = ∫_{C∩A^c} L(θ1) − ∫_{A∩C^c} L(θ1)

For (x1, . . . , xn) ∈ C, L(θ0)/L(θ1) ≤ k ⇒ L(θ1) ≥ (1/k)L(θ0).
For (x1, . . . , xn) ∈ C^c, L(θ0)/L(θ1) ≥ k ⇒ L(θ1) ≤ (1/k)L(θ0).
So

    ∫_C L(θ1) − ∫_A L(θ1) ≥ (1/k)∫_{C∩A^c} L(θ0) − (1/k)∫_{A∩C^c} L(θ0)
        = (1/k)[∫_{C∩A^c} L(θ0) + ∫_{C∩A} L(θ0) − (∫_{A∩C^c} L(θ0) + ∫_{A∩C} L(θ0))]
        = (1/k)[∫_C L(θ0) − ∫_A L(θ0)]
        ≥ 0

Note: The MP critical region is

    C = {(x1, . . . , xn) : L(θ0, x1, . . . , xn)/L(θ1, x1, . . . , xn) ≤ k}

s.t. α = P((X1, . . . , Xn) ∈ C : θ = θ0) = P(L(θ0, X1, . . . , Xn)/L(θ1, X1, . . . , Xn) ≤ k : θ = θ0).
You need to know the distribution of L(θ0, X1, . . . , Xn)/L(θ1, X1, . . . , Xn).
With the Neyman-Pearson theorem, a test with this critical region is an MP
test with significance level α. Unfortunately, the distribution of the likelihood
ratio is generally unknown, and then the constant k is not available.
Suppose that, for given k, there exists c such that

    L(θ0, x1, . . . , xn)/L(θ1, x1, . . . , xn) ≤ k iff u(x1, . . . , xn) ≤ c (or, in other cases, iff u(x1, . . . , xn) ≥ c)

and we know the distribution of u(X1, . . . , Xn). Then the test with critical
region

    C = {(x1, . . . , xn) : u(x1, . . . , xn) ≤ c}

s.t. α = P(u(X1, . . . , Xn) ≤ c : θ = θ0) is an MP test with significance level α.

Example:
X1, . . . , Xn iid ∼ N(θ, 1).
Consider the simple hypothesis H0 : θ = 0 vs. H1 : θ = 1.
Want the MP test with significance level α.
sol: The likelihood function is

    L(θ, x1, . . . , xn) = Π_{i=1}^n (1/√(2π)) e^(−(xi−θ)²/2) = (2π)^(−n/2) e^(−Σ(xi−θ)²/2)
                        = (2π)^(−n/2) e^(−(1/2)[Σxi² − 2θΣxi + nθ²])

    L(θ0, x1, . . . , xn)/L(θ1, x1, . . . , xn) = e^(−Σxi + n/2) ≤ k
    ⇔ −Σxi + n/2 ≤ ln k
    ⇔ Σxi ≥ n/2 − ln k
    ⇔ x̄ ≥ (1/n)(n/2 − ln k) = c

The MP critical region is C = {x̄ ≥ c} s.t.

    α = P(type I error) = P(reject H0 : H0)
      = P(X̄ ≥ c : θ = 0) = P((X̄ − 0)/(1/√n) ≥ c/(1/√n) : θ = 0)
      = P(Z ≥ √n c)
    ⇒ √n c = zα ⇒ c = zα/√n

The MP critical region with significance level α is C = {x̄ ≥ zα/√n}.
Example: X1, . . . , Xn iid ∼ Poisson(λ)
H0 : λ = 10 vs. H1 : λ = 1
sol: The likelihood function is

    L(λ, x1, . . . , xn) = Π_{i=1}^n λ^(xi) e^(−λ)/xi! = λ^(Σxi) e^(−nλ) / Π_{i=1}^n xi!

    L(λ = 10, x1, . . . , xn)/L(λ = 1, x1, . . . , xn) = (10^(Σxi) e^(−10n))/(1^(Σxi) e^(−n)) = 10^(Σxi) e^(−9n) ≤ k
    ⇔ (Σxi) ln 10 − 9n ≤ ln k
    ⇔ Σxi ≤ (ln k + 9n)/ln 10 = c

The MP critical region with significance level α is C = {Σ_{i=1}^n xi ≤ c}
where c satisfies

    α = P(Y = Σ_{i=1}^n Xi ≤ c : λ = 10) = Σ_{y=0}^c ((10n)^y e^(−10n))/y!
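Since ΣXi ∼ Poisson(10n) under H0, the cutoff c can be found from the Poisson c.d.f.; a sketch (not in the notes; n and α below are arbitrary), conservatively taking the largest c with exact size ≤ α:

```python
# Sketch (not in the notes; n and α are arbitrary): under H0,
# ΣXi ~ Poisson(10n), so take the largest c with P(ΣXi ≤ c) ≤ α.
from scipy.stats import poisson

n, alpha = 5, 0.05
mu0 = 10 * n
c = poisson.ppf(alpha, mu0)              # starting guess from the quantile
while poisson.cdf(c, mu0) > alpha:
    c -= 1                               # enforce exact size ≤ α
print(c, poisson.cdf(c, mu0))
```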

The hypothesis H0 : θ = θ0 vs. H1 : θ = θ1 is equivalent to the
hypothesis H0 : X ∼ f(x, θ0) vs. H1 : X ∼ f(x, θ1).
Then we can solve the MP critical region for the hypothesis
H0 : X ∼ f1(x) vs. H1 : X ∼ f2(x).

Example: X1, . . . , Xn iid with p.d.f f(x), where f is unknown.
Consider the hypothesis

    H0 : X ∼ f1(x) = { e^(−1)/x!   x = 0, 1, 2, · · ·
                     { 0           elsewhere.

    vs. H1 : X ∼ f2(x) = { (1/2)^(x−1)   x = 0, 1, 2, · · ·
                         { 0             elsewhere.

sol:

    L(f1, x1, . . . , xn)/L(f2, x1, . . . , xn) = (Π_{i=1}^n e^(−1)/xi!) / (Π_{i=1}^n (1/2)^(xi−1))
        = (e^(−n)/Π xi!) / 2^(n−Σxi)
        = e^(−n) 2^(Σxi) / (2^n Π xi!) ≤ k
    ⇔ (Σ_{i=1}^n xi) ln 2 − ln(Π_{i=1}^n xi!) ≤ c

The MP critical region is

    C = {(Σ_{i=1}^n xi) ln 2 − ln(Π_{i=1}^n xi!) ≤ c}

where c satisfies

    α = P((Σ_{i=1}^n Xi) ln 2 − ln(Π_{i=1}^n Xi!) ≤ c : f = f1)

Consider testing the simple hypothesis H0 : θ = θ0 vs. the composite hypothesis
H1 : θ ∈ {θ1, θ2}. With significance level α, we can construct two MP
critical regions with significance level α for simple hypotheses as follows:

    Hypothesis                        MP critical region with significance level α
    H0 : θ = θ0 vs. H1 : θ = θ1       C1
    H0 : θ = θ0 vs. H1 : θ = θ2       C2

If C1 = C2, we call C = C1 = C2 the uniformly most powerful (UMP)
critical region with significance level α for the simple hypothesis H0 : θ = θ0
vs. the composite hypothesis H1 : θ ∈ {θ1, θ2}. If C1 ≠ C2, we say that
the UMP test or UMP critical region doesn't exist.
Def. A test with critical region C is called a UMP test with significance
level α for testing the simple hypothesis H0 : θ = θ0 vs. the composite
hypothesis H1 : θ ∈ Θ1 if, for every θ1 ∈ Θ1, C is the MP critical region with
significance level α for H0 : θ = θ0 vs. H1 : θ = θ1.
Usually a UMP test exists for the following hypotheses:
(a) H0 : θ = θ0 vs. H1 : θ > θ0 (one sided hypothesis)
(b) H0 : θ = θ0 vs. H1 : θ < θ0 (one sided hypothesis)
Usually a UMP test doesn't exist for the hypothesis
H0 : θ = θ0 vs. H1 : θ ≠ θ0 (two sided hypothesis)

Example:
X1, . . . , Xn iid ∼ N(µ, 1), H0 : µ = 0 vs. H1 : µ > 0. Let µ1 > 0.
Consider the simple hypothesis H0 : µ = 0 vs. H1 : µ = µ1.
sol: By the Neyman-Pearson theorem,

    L(µ = 0, x1, . . . , xn)/L(µ = µ1, x1, . . . , xn)
        = (2π)^(−n/2) e^(−(1/2)Σxi²) / ((2π)^(−n/2) e^(−(1/2)[Σxi² − 2µ1Σxi + nµ1²]))
        = e^(−µ1Σxi + (n/2)µ1²) ≤ k
    ⇔ −µ1 Σxi + (n/2)µ1² ≤ ln k
    ⇔ −µ1 Σxi ≤ ln k − (n/2)µ1², where −µ1 < 0 for µ1 > 0
    ⇔ Σxi ≥ −(1/µ1)(ln k − (n/2)µ1²)
    ⇔ √n x̄ ≥ c

Under H0,

    α = P(√n X̄ ≥ c : H0) = P(Z ≥ c) ⇒ c = zα

⇒ the MP critical region for H0 : µ = 0 vs. H1 : µ = µ1 is

    C = {√n x̄ ≥ zα}

This holds for every µ = µ1 > 0. Hence C = {√n x̄ ≥ zα} is the UMP critical
region with significance level α for H0 : µ = 0 vs. H1 : µ > 0.

Example:
X1, . . . , Xn iid ∼ N(0, σ²), H0 : σ² = 1 vs. H1 : σ² < 1. Let θ < 1.
Consider the simple hypothesis H0 : σ² = 1 vs. H1 : σ² = θ.
sol:

    L(σ² = 1, x1, . . . , xn)/L(σ² = θ, x1, . . . , xn)
        = e^(−(1/2)Σxi²) / (θ^(−n/2) e^(−(1/(2θ))Σxi²))
        = θ^(n/2) e^(−(1/2)(1 − 1/θ)Σxi²) ≤ k
    ⇔ −(1/2)(1 − 1/θ) Σ_{i=1}^n xi² ≤ ln(θ^(−n/2) k), where −(1/2)(1 − 1/θ) > 0 for θ < 1
    ⇔ Σ_{i=1}^n xi² ≤ ln(θ^(−n/2) k) / (−(1/2)(1 − 1/θ)) = c

Under H0, Σ_{i=1}^n Xi² ∼ χ²(n).
The MP critical region for the hypothesis H0 : σ² = 1 vs. H1 : σ² = θ with
significance level α is

    C = {Σ_{i=1}^n xi² ≤ χ²_{1−α}(n)}

(the point with P(χ²(n) ≤ χ²_{1−α}(n)) = α).
This holds for every σ² = θ < 1. Hence C is the UMP critical region for the
hypothesis H0 : σ² = 1 vs. H1 : σ² < 1.

Example:
X1, . . . , Xn iid ∼ Poisson(λ). Hypothesis H0 : λ = 1 vs. H1 : λ ≠ 1.
Show that the UMP test with significance level α doesn't exist.
You need to find λ1 and λ2 such that the MP tests for H0 : λ = 1 vs. H1 : λ = λ1
and H0 : λ = 1 vs. H1 : λ = λ2 are different.
sol: Let λ0 ≠ 1. Consider the hypothesis H0 : λ = 1 vs. H1 : λ = λ0.
The likelihood function is

    L(λ, x1, . . . , xn) = Π_{i=1}^n λ^(xi) e^(−λ)/xi! = λ^(Σxi) e^(−nλ) / Π_{i=1}^n xi!

By the Neyman-Pearson theorem,

    L(λ = 1, x1, . . . , xn)/L(λ = λ0, x1, . . . , xn) = e^(−n)/(λ0^(Σxi) e^(−nλ0)) ≤ k
    ⇔ λ0^(−Σxi) ≤ k e^(n−nλ0)
    ⇔ −(Σ_{i=1}^n xi) ln λ0 ≤ ln(k e^(n−nλ0))
    ⇔ (Σ_{i=1}^n xi)(−ln λ0) ≤ ln(k e^(n−nλ0))

If λ0 > 1, then −ln λ0 < 0 ⇒ Σxi ≥ −(1/ln λ0) ln(k e^(n−nλ0)) = k1
⇒ the MP critical region for H0 : λ = 1 vs. H1 : λ = λ0 is

    C1 = {Σ_{i=1}^n xi ≥ k1} s.t. α = P(Σ Xi ≥ k1 : λ = 1)

If λ0 < 1, then −ln λ0 > 0 ⇒ Σxi ≤ −(1/ln λ0) ln(k e^(n−nλ0)) = k2
⇒ the MP critical region for H0 : λ = 1 vs. H1 : λ = λ0 is

    C2 = {Σ_{i=1}^n xi ≤ k2} s.t. α = P(Σ Xi ≤ k2 : λ = 1)

∵ C1 ≠ C2
∴ the UMP test doesn't exist.
There are several types of hypotheses that we treat in different ways:

A1 : H0 : θ = θ0 vs. H1 : θ = θ1 (simple hypothesis)
⇒ the MP test always exists by the Neyman-Pearson theorem.

A2 : H0 : θ = θ0 vs. H1 : θ > θ0 or H0 : θ = θ0 vs. H1 : θ < θ0
⇒ the UMP test exists for some distributions.

A3 : H0 : θ = θ0 vs. H1 : θ ≠ θ0
⇒ the UMP test generally doesn't exist.

A4 : Composite hypotheses
H0 : θ ≤ θ0 vs. H1 : θ > θ0 or
H0 : θ ≥ θ0 vs. H1 : θ < θ0 (one sided test)
⇒ the UMP test exists for some special distributions.

A5 : General hypothesis
H0 : θ ∈ Θ0 vs. H1 : θ ∉ Θ0
⇒ If there is no UMP test, we will consider the likelihood ratio test.

We now consider UMP tests for one sided hypotheses:

A test with critical region C has power function

    πC(θ) = P((X1, . . . , Xn) ∈ C : θ is true)

If the hypothesis is H0 : θ ∈ Θ0 vs. H1 : θ ∉ Θ0, the size of the test is

    sup_{θ∈Θ0} πC(θ)

Def. Consider the composite hypothesis H0 : θ ∈ Θ0 vs. H1 : θ ∉ Θ0. A test
with critical region C is called a UMP test with significance level α if it
satisfies
(a) sup_{θ∈Θ0} πC(θ) = α.
(b) Any critical region C* with size ≤ α satisfies πC(θ) ≥ πC*(θ), ∀θ ∉ Θ0.

Denote by L(θ, x1, . . . , xn) the likelihood function of a random sample
X1, . . . , Xn from p.d.f f(x, θ).

Def. A family of densities {f(x, θ) : θ ∈ Θ} is said to have a monotone
likelihood ratio if, for θ′ < θ″, there exists a statistic T = t(X1, . . . , Xn)
such that the likelihood ratio

    L(θ′, x1, . . . , xn)/L(θ″, x1, . . . , xn) = h(T)

is either nonincreasing or nondecreasing in T.

If X has a distribution with a monotone likelihood ratio and the hypothesis
is one sided, then a UMP test exists.
Idea of UMP test: write L(θ′)/L(θ″) = h(T) for θ′ < θ″.

If h(T) is increasing in T:
    larger T ⇒ θ′ (small) is more reliable; smaller T ⇒ θ″ (large) is more reliable.
    If H1 : θ < θ0, C = {T ≥ t0}; if H1 : θ > θ0, C = {T ≤ t0}.

If h(T) is decreasing in T:
    larger T ⇒ θ″ (large) is more reliable; smaller T ⇒ θ′ (small) is more reliable.
    If H1 : θ < θ0, C = {T ≤ t0}; if H1 : θ > θ0, C = {T ≥ t0}.

We will say that the test following this rule is the UMP test if a monotone
likelihood ratio exists.
Note: When there is a monotone likelihood ratio (MLR), the UMP test has a
monotone power function. If H0 : θ ≤ θ0, πC(θ) is increasing; if H0 : θ ≥ θ0,
πC(θ) is decreasing.

Thm. Suppose that the family of densities has an MLR in a statistic
T = t(X1, . . . , Xn).
(a) Consider the hypothesis H0 : θ ≤ θ0 vs. H1 : θ > θ0 (one sided hypothesis).
If the LR = L(θ′)/L(θ″) = h(T) is nondecreasing in T, then the UMP critical
region with significance level α is C = {T ≤ t0} s.t.

    α = sup_{θ≤θ0} P(T ≤ t0 : θ) = P(T ≤ t0 : θ0)

If the LR = L(θ′)/L(θ″) = h(T) is nonincreasing in T, then the UMP critical
region with significance level α is C = {T ≥ t0} s.t.

    α = sup_{θ≤θ0} P(T ≥ t0 : θ) = P(T ≥ t0 : θ0)

(b) Consider the hypothesis H0 : θ ≥ θ0 vs. H1 : θ < θ0.
If the LR = L(θ′)/L(θ″) = h(T) is nondecreasing in T, then the UMP critical
region with significance level α is C = {T ≥ t0} s.t.

    α = sup_{θ≥θ0} P(T ≥ t0 : θ) = P(T ≥ t0 : θ0)

If the LR = L(θ′)/L(θ″) = h(T) is nonincreasing in T, then the UMP critical
region with significance level α is C = {T ≤ t0} s.t.

    α = sup_{θ≥θ0} P(T ≤ t0 : θ) = P(T ≤ t0 : θ0)

Example:
X1, . . . , Xn iid ∼ U(0, θ), θ > 0. H0 : θ ≤ θ0 vs. H1 : θ > θ0.
sol: f(x, θ) = (1/θ)I(0 < x < θ). The likelihood function is

    L(θ, x1, . . . , xn) = Π_{i=1}^n (1/θ)I(0 < xi < θ) = (1/θ^n) Π_{i=1}^n I(0 < xi < θ) = (1/θ^n) I(0 < yn < θ)

where yn = max{x1, . . . , xn}.
Let θ′ < θ″. The LR is

    L(θ′, x1, . . . , xn)/L(θ″, x1, . . . , xn) = ((1/θ′)^n I(0 < yn < θ′)) / ((1/θ″)^n I(0 < yn < θ″))
        = (θ″/θ′)^n I(0 < yn < θ′)/I(0 < yn < θ″)

which is nonincreasing in yn. The UMP critical region is

    C = {yn ≥ c} s.t. α = P(Yn ≥ c : θ0)

The p.d.f of Yn is

    fYn(y) = n(F(y))^(n−1) f(y) = n(y/θ)^(n−1)(1/θ) = n y^(n−1)/θ^n, 0 < y < θ

    α = P(Yn ≥ c : θ0) = ∫_c^θ0 (n y^(n−1)/θ0^n) dy = 1 − c^n/θ0^n
    ⇒ c = θ0(1 − α)^(1/n)

The UMP critical region with significance level α is C = {yn ≥ θ0(1 − α)^(1/n)}.
Example:
X1, . . . , Xn iid ∼ N(µ, 1). H0 : µ ≥ 0 vs. H1 : µ < 0.
Want the UMP test.
sol: Let µ1 < µ2.

    L(µ1, x1, . . . , xn)/L(µ2, x1, . . . , xn)
        = (2π)^(−n/2) e^(−(1/2)[Σxi² − 2µ1Σxi + nµ1²]) / ((2π)^(−n/2) e^(−(1/2)[Σxi² − 2µ2Σxi + nµ2²]))
        = e^(−(1/2)[2(µ2−µ1)Σxi + n(µ1²−µ2²)]), which is decreasing in Σxi.

The UMP critical region is C = {Σ_{i=1}^n xi ≤ c} s.t.

    α = P(Σ_{i=1}^n Xi ≤ c : µ = 0) = P(Σ Xi/√n ≤ c/√n : µ = 0) = P(Z ≤ c/√n)
    ⇒ c/√n = −zα ⇒ c = −√n zα.

So the UMP critical region with significance level α is

    C = {Σ_{i=1}^n xi ≤ −√n zα} = {x̄ ≤ −zα/√n}
X1, . . . , Xn iid ∼ N(µ, σ0²) where σ0 is known.

    Hypotheses                            UMP critical region
    H0 : µ ≤ µ0 vs. H1 : µ > µ0           (x̄ − µ0)/(σ0/√n) ≥ zα
    H0 : µ ≥ µ0 vs. H1 : µ < µ0           (x̄ − µ0)/(σ0/√n) ≤ −zα
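A minimal sketch of the first row of this table in code (the data and parameter values below are made up):

```python
# Sketch (data and parameters made up): UMP one-sided z-test of
# H0: µ ≤ µ0 vs. H1: µ > µ0 with σ0 known.
import numpy as np
from scipy.stats import norm

x = np.array([10.8, 11.2, 10.5, 11.6, 10.9, 11.4])
mu0, sigma0, alpha = 10.5, 1.0, 0.05
z = (x.mean() - mu0) / (sigma0 / np.sqrt(len(x)))
print(z, z >= norm.ppf(1 - alpha))       # reject H0 iff z ≥ z_α
```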

Likelihood Ratio Test:

Let X1, . . . , Xn be a random sample from p.d.f f(x, θ), θ ∈ Θ.
Denote by L(θ, x1, . . . , xn) the likelihood function of X1, . . . , Xn. So

    L(θ, x1, . . . , xn) = Π_{i=1}^n f(xi, θ)

If we have the hypothesis H0 : θ ∈ Θ0 vs. H1 : θ ∉ Θ0, consider the two
maximized likelihoods

    max_{θ∈Θ0} L(θ, x1, . . . , xn) and max_{θ∈Θ} L(θ, x1, . . . , xn) = L(θ̂mle, x1, . . . , xn)

where θ̂mle is the m.l.e. of θ.

Obviously,

    max_{θ∈Θ0} L(θ, x1, . . . , xn) ≤ max_{θ∈Θ} L(θ, x1, . . . , xn)

If θ ∈ Θ0 is true, they must be very close.
If θ ∈ Θ0 is not true, they are no longer close.

Def. The generalized likelihood ratio is

    λ = λ(x1, . . . , xn) = max_{θ∈Θ0} L(θ, x1, . . . , xn) / max_{θ∈Θ} L(θ, x1, . . . , xn)
                         = max_{θ∈Θ0} L(θ, x1, . . . , xn) / L(θ̂mle, x1, . . . , xn)

The likelihood ratio test for testing the hypothesis H0 : θ ∈ Θ0 vs. H1 : θ ∉ Θ0 is

    rejecting H0 if λ ≤ λ0

where λ0 satisfies α = sup_{θ∈Θ0} P(λ(X1, . . . , Xn) ≤ λ0 : θ).
Note:
The power function P(λ(X1, . . . , Xn) ≤ λ0 : θ) is not generally monotone,
but it often is.
H0 : θ = θ0 vs. H1 : θ = θ1
⇒ the MP test exists by the Neyman-Pearson theorem.
H0 : θ = θ0 vs. H1 : θ > θ0 or H0 : θ = θ0 vs. H1 : θ < θ0
⇒ the UMP test often exists by the Neyman-Pearson theorem.
H0 : θ ≤ θ0 vs. H1 : θ > θ0 or H0 : θ ≥ θ0 vs. H1 : θ < θ0
⇒ one sided hypothesis + MLR ⇒ the UMP test exists.
The critical region of the likelihood ratio test is

    C = {λ = λ(x1, . . . , xn) = max_{θ∈Θ0} L(θ, x1, . . . , xn) / max_{θ∈Θ} L(θ, x1, . . . , xn) ≤ λ0}

s.t. α = sup_{θ∈Θ0} P(λ(X1, . . . , Xn) ≤ λ0 : θ)

In many cases, there is a statistic T = t(X1, . . . , Xn) such that
λ(x1, . . . , xn) ≤ λ0 if and only if t(x1, . . . , xn) ≤ k, and then
C = {t(x1, . . . , xn) ≤ k}.

Example:
X1, . . . , Xn iid with p.d.f f(x, θ) = θe^(−θx), x > 0, θ > 0.
H0 : θ ≤ θ0 vs. H1 : θ > θ0
sol:

    L(θ, x1, . . . , xn) = Π_{i=1}^n θe^(−θxi) = θ^n e^(−θΣxi)
    ln L(θ, x1, . . . , xn) = n ln θ − θ Σ_{i=1}^n xi
    ∂ ln L(θ)/∂θ = n/θ − Σ_{i=1}^n xi = 0
    ⇒ m.l.e. θ̂ = n/Σxi = 1/x̄

So max_{θ∈Θ} L(θ, x1, . . . , xn) = L(θ̂, x1, . . . , xn) = (1/x̄)^n e^(−n).
Since L(θ, x1, . . . , xn) achieves its maximum at θ = 1/x̄,

    max_{θ≤θ0} L(θ, x1, . . . , xn) = max_{θ≤θ0} θ^n e^(−θΣxi) = { (1/x̄)^n e^(−n)    if θ0 > 1/x̄
                                                                 { θ0^n e^(−θ0Σxi)   if θ0 < 1/x̄

So,

    λ = λ(x1, . . . , xn) = max_{0<θ≤θ0} L(θ) / max_{θ>0} L(θ)
      = { 1                                                            if x̄ > 1/θ0
        { θ0^n e^(−θ0Σxi) / ((1/x̄)^n e^(−n)) = (θ0x̄)^n e^(−n(θ0x̄−1))   if x̄ ≤ 1/θ0

which is increasing in x̄. The critical region of the LRT is C = {x̄ ≤ c}
s.t. α = P(X̄ ≤ c : θ0):

    α = P(X̄ ≤ c : θ0) = P(Σ_{i=1}^n Xi ≤ nc : θ0)
      = P(2θ0 Σ Xi ≤ 2θ0nc : θ0)
      = P(χ²(2n) ≤ 2θ0nc)
    ⇒ 2θ0nc = χ²_α(2n), the lower α point of χ²(2n)
    ⇒ c = χ²_α(2n)/(2nθ0)

Example:
X1, . . . , Xn iid ∼ N(θ, 1), H0 : θ = 0 vs. H1 : θ ≠ 0.
There is no UMP test for this hypothesis. We want the LRT.
sol:

    L(θ, x1, . . . , xn) = Π_{i=1}^n (1/√(2π)) e^(−(xi−θ)²/2) = (2π)^(−n/2) e^(−Σ(xi−θ)²/2)

    ∂ ln L(θ, x1, . . . , xn)/∂θ = 0 ⇒ m.l.e. θ̂ = x̄

The likelihood ratio is

    λ = max_{θ=0} L(θ, x1, . . . , xn) / max_{θ∈R} L(θ, x1, . . . , xn) = L(0, x1, . . . , xn)/L(θ̂, x1, . . . , xn)
      = e^(−(1/2)Σxi²) / e^(−(1/2)Σ(xi−x̄)²)
      = e^(−(1/2)[Σxi² − Σxi² + 2x̄Σxi − nx̄²]) = e^(−(1/2)nx̄²) ≤ λ0
    ⇔ −(1/2)nx̄² ≤ ln λ0 ⇔ |x̄| ≥ c′ ⇔ |√n x̄| ≥ c

Under H0, X̄ ∼ N(0, 1/n) ⇒ √n X̄ ∼ N(0, 1), so

    α = P(|√n X̄| ≥ c : θ = 0) = P(|Z| ≥ c) = 1 − P(|Z| ≤ c)
    ⇒ P(|Z| ≤ c) = 1 − α ⇒ c = z_{α/2}

The critical region with significance level α for the LRT is

    C = {x : |√n x̄| ≥ z_{α/2}}

Example:
X1, . . . , Xn iid ∼ N(θ1, θ2), −∞ < θ1 < ∞, θ2 > 0.
H0 : θ1 = 0 vs. H1 : θ1 ≠ 0,
i.e. H0 : θ1 = 0, θ2 > 0 vs. H1 : θ1 ≠ 0, θ2 > 0.
sol:

    L(θ1, θ2, x1, . . . , xn) = (2π)^(−n/2) θ2^(−n/2) e^(−Σ(xi−θ1)²/(2θ2))

Under H0:

    ln L(0, θ2, x1, . . . , xn) = −(n/2) ln 2π − (n/2) ln θ2 − Σxi²/(2θ2)
    ∂ ln L(0, θ2, x1, . . . , xn)/∂θ2 = −n/(2θ2) + Σxi²/(2θ2²) = 0
    ⇒ θ̂2 = (1/n)Σxi²
    max_{θ1=0, θ2>0} L(θ1, θ2, x1, . . . , xn) = L(0, (1/n)Σxi², x1, . . . , xn)
        = (2π)^(−n/2) ((1/n)Σxi²)^(−n/2) e^(−n/2)

Over the whole parameter space:

    ∂ ln L(θ1, θ2, x1, . . . , xn)/∂θ1 = (1/θ2) Σ(xi − θ1) = 0 ⇒ θ̂1 = x̄
    ∂ ln L(x̄, θ2, x1, . . . , xn)/∂θ2 = −n/(2θ2) + (1/(2θ2²))Σ(xi − x̄)² = 0 ⇒ θ̂2 = (1/n)Σ(xi − x̄)²
    max_{θ1∈R, θ2>0} L(θ1, θ2, x1, . . . , xn) = (2π)^(−n/2) ((1/n)Σ(xi − x̄)²)^(−n/2) e^(−n/2)

⇒ the likelihood ratio is

    λ = max_{θ1=0, θ2>0} L / max_{θ1∈R, θ2>0} L = (Σ(xi − x̄)²/Σxi²)^(n/2) ≤ λ0
    ⇔ Σ(xi − x̄)²/Σxi² ≤ λ0^(2/n)
    ⇔ Σ(xi − x̄)²/(Σ(xi − x̄)² + nx̄²) = 1/(1 + nx̄²/Σ(xi − x̄)²) ≤ λ0^(2/n)
    ⇔ nx̄²/Σ(xi − x̄)² = nx̄²/((n − 1)s²) > c0
    ⇔ nx̄²/s² > (n − 1)c0
    ⇔ |x̄|/(s/√n) > c* = t_{α/2}

    α = P(|X̄|/(s/√n) > c* : θ1 = 0) = P(|T| > c*),

where T = X̄/(s/√n) ∼ t(n − 1) under H0, so c* = t_{α/2}(n − 1).
The critical region with significance level α for the LRT is

    C = {|x̄|/(s/√n) > t_{α/2}(n − 1)}
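This LRT is exactly the usual two-sided one-sample t-test, so a library routine should reproduce the statistic; a sketch with made-up data:

```python
# Sketch (data made up): the LRT above is the two-sided one-sample
# t-test; scipy reproduces the same statistic T.
import numpy as np
from scipy.stats import t as t_dist, ttest_1samp

x = np.array([0.3, -0.8, 1.2, 0.5, -0.1, 0.9, 0.4, -0.2])
n, alpha = len(x), 0.05
T = x.mean() / (x.std(ddof=1) / np.sqrt(n))
print(abs(T) > t_dist.ppf(1 - alpha / 2, df=n - 1))   # reject H0: θ1 = 0 ?
print(ttest_1samp(x, popmean=0.0))                    # same T, with p-value
```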

F-distribution:

Def. A random variable F is said to have an F-distribution with degrees of
freedom (r1, r2) if there are independent chi-square r.v.'s χ²(r1) and χ²(r2)
such that

    F = (χ²(r1)/r1)/(χ²(r2)/r2), F > 0

We denote this by F ∼ f(r1, r2).

We denote by fα(r1, r2) the point with P(F ≤ fα(r1, r2)) = α.
In the table we can find fα(r1, r2) for α = 0.95, 0.975, 0.99 only.
How can we find fα(r1, r2) for α = 0.05, 0.025, 0.01?

Thm. fα(r1, r2) = 1/f_{1−α}(r2, r1)

Proof. If F ∼ f(r1, r2), then there exist independent χ²(r1) and χ²(r2) with
F = (χ²(r1)/r1)/(χ²(r2)/r2), so

    1/F = (χ²(r2)/r2)/(χ²(r1)/r1) ∼ f(r2, r1)

Let F ∼ f(r1, r2). We want fα(r1, r2) for α = 0.05, 0.025, 0.01.

    α = P(F ≤ fα(r1, r2)) = P(1/F ≥ 1/fα(r1, r2))
    ⇒ 1 − α = P(1/F ≤ 1/fα(r1, r2))

Since 1/F ∼ f(r2, r1),

    1/fα(r1, r2) = f_{1−α}(r2, r1) ⇒ fα(r1, r2) = 1/f_{1−α}(r2, r1)

Ratio of Variances σx²/σy² or σy²/σx²:

X1, . . . , Xn iid ∼ N(µx, σx²) and Y1, . . . , Ym iid ∼ N(µy, σy²), independent.

    ⇒ (n − 1)sx²/σx² ∼ χ²(n − 1) and (m − 1)sy²/σy² ∼ χ²(m − 1), independent,

where sx² = (1/(n−1)) Σ_{i=1}^n (xi − x̄)² and sy² = (1/(m−1)) Σ_{i=1}^m (yi − ȳ)².

    F = (((m − 1)sy²/σy²)/(m − 1)) / (((n − 1)sx²/σx²)/(n − 1)) = (σx² sy²)/(σy² sx²) ∼ f(m − 1, n − 1)

C.I. for σx²/σy²:
Let a, b satisfy

    α/2 = P(f(m − 1, n − 1) ≤ a) and 1 − α/2 = P(f(m − 1, n − 1) ≤ b)
    ⇒ a = 1/f_{1−α/2}(n − 1, m − 1), b = f_{1−α/2}(m − 1, n − 1)

So,

    1 − α = P(f_{α/2}(m − 1, n − 1) ≤ F ≤ f_{1−α/2}(m − 1, n − 1))
          = P(1/f_{1−α/2}(n − 1, m − 1) ≤ (σx² sy²)/(σy² sx²) ≤ f_{1−α/2}(m − 1, n − 1))
          = P((1/f_{1−α/2}(n − 1, m − 1)) sx²/sy² ≤ σx²/σy² ≤ f_{1−α/2}(m − 1, n − 1) sx²/sy²)

Hence, a 100(1 − α)% C.I. for σx²/σy² is:

    ((1/f_{1−α/2}(n − 1, m − 1)) sx²/sy², f_{1−α/2}(m − 1, n − 1) sx²/sy²)
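A sketch of this interval in code (made-up data; f_dist.ppf supplies the f_{1−α/2} quantiles):

```python
# Sketch (data made up): 100(1-α)% C.I. for σx²/σy².
import numpy as np
from scipy.stats import f as f_dist

x = np.array([4.1, 5.0, 4.6, 5.3, 4.8, 4.4, 5.1])             # n = 7
y = np.array([6.2, 5.7, 6.8, 6.0, 5.5, 6.4, 6.6, 5.9, 6.1])   # m = 9
n, m, alpha = len(x), len(y), 0.10
ratio = x.var(ddof=1) / y.var(ddof=1)                # sx²/sy²
lo = ratio / f_dist.ppf(1 - alpha / 2, n - 1, m - 1)
hi = ratio * f_dist.ppf(1 - alpha / 2, m - 1, n - 1)
print((lo, hi))
```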

Hypothesis Testing for Equal Variance:

Consider the hypothesis

    H0 : σx² = σy² vs. H1 : σx² ≠ σy²
    or H0 : σx²/σy² = 1 vs. H1 : σx²/σy² ≠ 1

Since F = (σx² sy²)/(σy² sx²) ∼ f(m − 1, n − 1), we have

    α/2 = P((σx² sy²)/(σy² sx²) ≤ f_{α/2}(m − 1, n − 1)) = P(sy²/sx² ≤ 1/f_{1−α/2}(n − 1, m − 1) : H0)
    and α/2 = P((σx² sy²)/(σy² sx²) ≥ f_{1−α/2}(m − 1, n − 1)) = P(sy²/sx² ≥ f_{1−α/2}(m − 1, n − 1) : H0)

We have

    α = P(sy²/sx² ≤ 1/f_{1−α/2}(n − 1, m − 1) or sy²/sx² ≥ f_{1−α/2}(m − 1, n − 1) : H0)

Then a critical region with significance level α for H0 is

    C = {sy²/sx² ≤ 1/f_{1−α/2}(n − 1, m − 1) or sy²/sx² ≥ f_{1−α/2}(m − 1, n − 1)}

What is this test? A UMP test or LRT?
We will show that it is a LRT.
Likelihood Ratio Test for Ratio of Variances:

X1, . . . , Xn iid ∼ N(µ1, σ1²) and Y1, . . . , Ym iid ∼ N(µ2, σ2²), independent.
Θ = {(µ1, µ2, σ1, σ2) : µ1, µ2 ∈ R, σ1, σ2 > 0}
Hypothesis H0 : σ1² = σ2² (σ1²/σ2² = 1) vs. H1 : σ1² ≠ σ2²
Θ0 = {(µ1, µ2, σ1, σ2) : µ1, µ2 ∈ R, σ1 = σ2 > 0}
The likelihood function is the joint p.d.f of X1, . . . , Xn and Y1, . . . , Ym, which is

    L(µ1, µ2, σ1², σ2²) = (2π)^(−(n+m)/2) (σ1²)^(−n/2) (σ2²)^(−m/2) e^(−Σ(xi−µ1)²/(2σ1²) − Σ(yi−µ2)²/(2σ2²))

The likelihood ratio is

    λ = sup_{µ1,µ2∈R, σ1=σ2} L(µ1, µ2, σ1², σ2²) / sup_{µ1,µ2∈R, σ1,σ2>0} L(µ1, µ2, σ1², σ2²)

    ln L(µ1, µ2, σ1², σ2²) = −((n+m)/2) ln(2π) − (n/2) ln(σ1²) − (m/2) ln(σ2²)
                             − Σ(xi−µ1)²/(2σ1²) − Σ(yi−µ2)²/(2σ2²)

Under H0 (σ² = σ1² = σ2²):

    ln L = −((n+m)/2) ln(2π) − ((n+m)/2) ln(σ²) − (1/(2σ²))[Σ(xi−µ1)² + Σ(yi−µ2)²]
    ∂ ln L/∂µ1 = (1/σ²)Σ(xi − µ1) = 0 ⇒ µ̂1 = x̄
    ∂ ln L/∂µ2 = (1/σ²)Σ(yi − µ2) = 0 ⇒ µ̂2 = ȳ
    ∂ ln L(x̄, ȳ, σ²)/∂σ² = −(n+m)/(2σ²) + (1/(2σ⁴))[Σ(xi−x̄)² + Σ(yi−ȳ)²] = 0
    ⇒ σ̂² = (1/(n+m))[Σ(xi−x̄)² + Σ(yi−ȳ)²]

    sup_{Θ0} L = (2π)^(−(n+m)/2) ((1/(n+m))[Σ(xi−x̄)² + Σ(yi−ȳ)²])^(−(n+m)/2) e^(−(n+m)/2)

For sup_Θ L(µ1, µ2, σ1², σ2²):

    ∂L/∂µ1 = 0 ⇒ µ̂1 = x̄, ∂L(x̄, µ2, σ1², σ2²)/∂σ1² = 0 ⇒ σ̂1² = (1/n)Σ(xi−x̄)²
    ∂L/∂µ2 = 0 ⇒ µ̂2 = ȳ, ∂L(µ1, ȳ, σ1², σ2²)/∂σ2² = 0 ⇒ σ̂2² = (1/m)Σ(yi−ȳ)²

    sup_Θ L = (2π)^(−(n+m)/2) ((1/n)Σ(xi−x̄)²)^(−n/2) ((1/m)Σ(yi−ȳ)²)^(−m/2) e^(−(n+m)/2)

Likelihood ratio:

    λ = ((n+m)^((n+m)/2) (Σ(xi−x̄)² + Σ(yi−ȳ)²)^(−(n+m)/2)) / (n^(n/2) m^(m/2) (Σ(xi−x̄)²)^(−n/2) (Σ(yi−ȳ)²)^(−m/2)) ≤ λ0
    ⇔ ((Σ(xi−x̄)²)^(n/2) (Σ(yi−ȳ)²)^(m/2)) / (Σ(xi−x̄)² + Σ(yi−ȳ)²)^((n+m)/2) ≤ λ1
    ⇔ 1/((1 + Σ(yi−ȳ)²/Σ(xi−x̄)²)^(n/2) (1 + Σ(xi−x̄)²/Σ(yi−ȳ)²)^(m/2)) ≤ λ1

∵ 1/((1 + x)^(n/2)(1 + 1/x)^(m/2)) → 0 as x → ∞ or x → 0,

∴ the LR is small iff

    Σ_{i=1}^m (yi − ȳ)² / Σ_{i=1}^n (xi − x̄)² ≤ c1 or ≥ c2

The LR test with significance level α is

    ⇒ F = (Σ_{i=1}^m (Yi − Ȳ)²/(m − 1)) / (Σ_{i=1}^n (Xi − X̄)²/(n − 1))
          ≤ f_{α/2}(m − 1, n − 1) or ≥ f_{1−α/2}(m − 1, n − 1)

Testing for Hypothesis of Equal Means:

X1, . . . , Xn iid ∼ N(µ1, σ1²) and Y1, . . . , Ym iid ∼ N(µ2, σ2²), independent.

H0 : µ1 = µ2 vs. H1 : µ1 ≠ µ2
(1) Suppose that it is known that σ = σ1 = σ2. We have
X1, . . . , Xn iid ∼ N(µ1, σ²) and Y1, . . . , Ym iid ∼ N(µ2, σ²), independent.

    ⇒ X̄ ∼ N(µ1, σ²/n), (n − 1)sx²/σ² ∼ χ²(n − 1),
      Ȳ ∼ N(µ2, σ²/m), (m − 1)sy²/σ² ∼ χ²(m − 1), all independent,

where sx² = (1/(n−1)) Σ_{i=1}^n (xi − x̄)² and sy² = (1/(m−1)) Σ_{i=1}^m (yi − ȳ)².

    ⇒ X̄ − Ȳ ∼ N(µ1 − µ2, σ²(1/n + 1/m)) and ((n − 1)sx² + (m − 1)sy²)/σ² ∼ χ²(n + m − 2), independent.

    T = [(X̄ − Ȳ − (µ1 − µ2))/(σ√(1/n + 1/m))] / √(((n − 1)sx² + (m − 1)sy²)/(σ²(n + m − 2)))
      = (X̄ − Ȳ − (µ1 − µ2)) / (√(((n − 1)sx² + (m − 1)sy²)/(n + m − 2)) √(1/n + 1/m)) ∼ t(n + m − 2)

And under H0,

    T = (X̄ − Ȳ) / (√(((n − 1)sx² + (m − 1)sy²)/(n + m − 2)) √(1/n + 1/m)) ∼ t(n + m − 2)

Reject H0 if

    |T| = |X̄ − Ȳ| / (√(((n − 1)sx² + (m − 1)sy²)/(n + m − 2)) √(1/n + 1/m)) > t_{α/2}(n + m − 2)

(2) Suppose that we don't know if σ1² = σ2² is true.

Two steps to test H0:
(i) Test H0′ : σ1² = σ2². Reject H0′ if

    F = (Σ(yi − ȳ)²/(m − 1)) / (Σ(xi − x̄)²/(n − 1)) ≤ f_{α/2}(m − 1, n − 1) or ≥ f_{1−α/2}(m − 1, n − 1)

(ii) If we accept H0′, we test H0 : µ1 = µ2 by rejecting H0 if

    |T| = |X̄ − Ȳ| / (√(((n − 1)sx² + (m − 1)sy²)/(n + m − 2)) √(1/n + 1/m)) > t_{α/2}(n + m − 2)

(iii) If we reject H0′, we can do nothing.
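A sketch of the two-step procedure with made-up data (step (i) uses the F rejection region above, step (ii) the pooled t-test):

```python
# Sketch (data made up) of the two-step procedure.
import numpy as np
from scipy.stats import f as f_dist, t as t_dist

x = np.array([4.2, 5.1, 4.8, 5.5, 4.6, 5.0, 4.4, 5.2])
y = np.array([3.9, 4.5, 3.6, 4.8, 4.1, 3.8, 4.3])
n, m, alpha = len(x), len(y), 0.05
F = y.var(ddof=1) / x.var(ddof=1)                    # step (i)
lo = 1 / f_dist.ppf(1 - alpha / 2, n - 1, m - 1)     # f_{α/2}(m-1, n-1)
hi = f_dist.ppf(1 - alpha / 2, m - 1, n - 1)
if lo < F < hi:                                      # accept H0': σ1² = σ2²
    sp = np.sqrt(((n-1)*x.var(ddof=1) + (m-1)*y.var(ddof=1)) / (n+m-2))
    T = (x.mean() - y.mean()) / (sp * np.sqrt(1/n + 1/m))
    print("reject H0?", abs(T) > t_dist.ppf(1 - alpha/2, n + m - 2))
else:
    print("H0' rejected; the notes' procedure stops here")
```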
Hypothesis Testing for p:
Y ∼ b(n, p). Want to test the hypothesis H0 : p = p0 vs. H1 : p ≠ p0.
Let Y1, . . . , Yn iid ∼ Bernoulli(p). Then Y =d Σ_{i=1}^n Yi ∼ b(n, p).

    p̂ = Y/n = (1/n) Σ_{i=1}^n Yi →P p by WLLN.
    (p̂ − p)/√(p(1−p)/n) →d N(0, 1) by CLT.
    ⇒ (p̂ − p)/√(p̂(1−p̂)/n) = (p̂ − p)/√(p(1−p)/n) · √(p(1−p)/(p̂(1−p̂))) →d 1 · N(0, 1) = N(0, 1) by Slutsky's theorem.

Under H0,

    (p̂ − p0)/√(p̂(1−p̂)/n) →d N(0, 1)

    α = P(|Z| ≥ z_{α/2}) ≈ P(|p̂ − p0|/√(p̂(1−p̂)/n) ≥ z_{α/2} : H0)

Then an approximate test with significance level α is

    rejecting H0 if (p̂ − p0)/√(p̂(1−p̂)/n) ≥ z_{α/2} or ≤ −z_{α/2}

Table of approximate tests for p:

    Hypothesis                          Critical region
    H0 : p = p0 vs. H1 : p ≠ p0         (p̂ − p0)/√(p̂(1−p̂)/n) ≥ z_{α/2} or ≤ −z_{α/2}
    H0 : p = p0 vs. H1 : p > p0         (p̂ − p0)/√(p̂(1−p̂)/n) ≥ zα
    H0 : p = p0 vs. H1 : p < p0         (p̂ − p0)/√(p̂(1−p̂)/n) ≤ −zα

Hypothesis for difference of p's:

X ∼ b(n, p1) and Y ∼ b(m, p2), independent.
Want to test the hypothesis H0 : p1 = p2 vs. H1 : p1 ≠ p2.
Let p̂1 = X/n, p̂2 = Y/m. We have, by CLT,

    (p̂1 − p1)/√(p1(1−p1)/n) →d N(0, 1) and (p̂2 − p2)/√(p2(1−p2)/m) →d N(0, 1), independent
    ⇒ p̂1 ≈ N(p1, p1(1−p1)/n) and p̂2 ≈ N(p2, p2(1−p2)/m), independent
    ⇒ p̂1 − p̂2 ≈ N(p1 − p2, p1(1−p1)/n + p2(1−p2)/m)

or

    ((p̂1 − p̂2) − (p1 − p2))/√(p1(1−p1)/n + p2(1−p2)/m) ≈ N(0, 1)

Since p̂1 →P p1 and p̂2 →P p2, we have

    ((p̂1 − p̂2) − (p1 − p2))/√(p̂1(1−p̂1)/n + p̂2(1−p̂2)/m) ≈ N(0, 1)

We further have, under H0,

    (p̂1 − p̂2)/√(p̂1(1−p̂1)/n + p̂2(1−p̂2)/m) ≈ N(0, 1)

Since

    α ≈ P((p̂1 − p̂2)/√(p̂1(1−p̂1)/n + p̂2(1−p̂2)/m) ≥ z_{α/2} or ≤ −z_{α/2} : H0),

the rule for testing H0 is

    rejecting H0 if (p̂1 − p̂2)/√(p̂1(1−p̂1)/n + p̂2(1−p̂2)/m) ≥ z_{α/2} or ≤ −z_{α/2}
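A sketch of this approximate test in code (the counts are made up):

```python
# Sketch (counts made up): approximate two-proportion z-test.
import numpy as np
from scipy.stats import norm

n, x_succ = 200, 120       # X ~ b(n, p1)
m, y_succ = 150, 75        # Y ~ b(m, p2)
p1, p2 = x_succ / n, y_succ / m
se = np.sqrt(p1*(1 - p1)/n + p2*(1 - p2)/m)
z, alpha = (p1 - p2) / se, 0.05
print(z, abs(z) >= norm.ppf(1 - alpha / 2))   # reject H0: p1 = p2 ?
```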

Bivariate Normal Distribution:
Note:

(a) S: sample space, P: probability set function.
Random variable X : S → R, P(X ∈ A) = P({w ∈ S : X(w) ∈ A}).
If there exists f ≥ 0 such that

    P(X ∈ A) = { Σ_{x∈A} f(x)    discrete X
               { ∫_A f(x)dx      continuous X

then we call f the p.d.f of the r.v. X.
The p.d.f f satisfies f(x) ≥ 0 and ∫_{−∞}^∞ f(x)dx = 1.

(b) For each function f with f(x) ≥ 0 and ∫_{−∞}^∞ f(x)dx = 1, there exists a
r.v. X such that f is the p.d.f of X. On the other hand, if f(x, y) satisfies
f(x, y) ≥ 0 and ∫_{−∞}^∞ ∫_{−∞}^∞ f(x, y)dxdy = 1, there exist r.v.'s X and Y
such that f(x, y) is the joint p.d.f of X, Y.
Consider the function

    f(x, y) = (1/(2πσ1σ2√(1−ρ²))) e^(−(1/(2(1−ρ²)))[(x−µ1)²/σ1² − 2ρ(x−µ1)(y−µ2)/(σ1σ2) + (y−µ2)²/σ2²]),
              −∞ < x < ∞, −∞ < y < ∞

for some µ1, µ2 ∈ R, σ1, σ2 > 0, −1 < ρ < 1.

We want to do the following:

(a) Show that ∫_{−∞}^∞ ∫_{−∞}^∞ f(x, y)dxdy = 1, so that there exist X and Y such
that f(x, y) is the joint p.d.f of X, Y. We call f the bivariate normal distribution.

(b) Find the marginal p.d.f's of X and Y.

(c) Find the conditional mean E(Y|X = x) and variance Var(Y|X = x).
This is the basis for linear regression.

Now, f(x, y) ≥ 0 for x, y ∈ R.
To show f is a joint p.d.f we need to show that ∫_{−∞}^∞ ∫_{−∞}^∞ f(x, y)dxdy = 1.

We want to show that ∫∫ f(x, y)dxdy = ∫(∫ f(x, y)dy)dx = 1. Now

    ∫_{−∞}^∞ f(x, y)dy
        = (1/(√(2π)σ1)) e^(−(x−µ1)²/(2σ1²)) ∫_{−∞}^∞ (1/(√(2π)σ2√(1−ρ²))) e^(−(y−(µ2+ρ(σ2/σ1)(x−µ1)))²/(2σ2²(1−ρ²))) dy
        = (1/(√(2π)σ1)) e^(−(x−µ1)²/(2σ1²))

    ⇒ ∫_{−∞}^∞ ∫_{−∞}^∞ f(x, y)dxdy = ∫_{−∞}^∞ (1/(√(2π)σ1)) e^(−(x−µ1)²/(2σ1²)) dx = 1

⇒ f(x, y) is a joint p.d.f of two r.v.'s X and Y.
We have shown that a bivariate normal distribution has p.d.f

    f(x, y) = (1/(2πσ1σ2√(1−ρ²))) e^(−(1/(2(1−ρ²)))[(x−µ1)²/σ1² − 2ρ(x−µ1)(y−µ2)/(σ1σ2) + (y−µ2)²/σ2²]),
              −∞ < x < ∞, −∞ < y < ∞

and it may be written as

    f(x, y) = (1/(√(2π)σ1)) e^(−(x−µ1)²/(2σ1²)) · (1/(√(2π)σ2√(1−ρ²))) e^(−(y−(µ2+ρ(σ2/σ1)(x−µ1)))²/(2σ2²(1−ρ²)))

We now derive the marginal p.d.f's.
The marginal p.d.f of X is

    fX(x) = ∫_{−∞}^∞ f(x, y)dy
          = (1/(√(2π)σ1)) e^(−(x−µ1)²/(2σ1²)) ∫_{−∞}^∞ (1/(√(2π)σ2√(1−ρ²))) e^(−(y−(µ2+ρ(σ2/σ1)(x−µ1)))²/(2σ2²(1−ρ²))) dy
          = (1/(√(2π)σ1)) e^(−(x−µ1)²/(2σ1²)), x ∈ R

Then X ∼ N(µ1, σ1²).
We can also show that Y ∼ N(µ2, σ2²).
Conditional distribution:
The distribution of Y given X = x is called the conditional distribution.
The conditional mean is E(Y|x) and the conditional variance is
Var(Y|x) = E[(Y − E(Y|x))²|x].
The conditional p.d.f of Y given X = x is

    fY|x(y) = f(x, y)/fX(x)
            = (1/(√(2π)σ2√(1−ρ²))) e^(−(y−(µ2+ρ(σ2/σ1)(x−µ1)))²/(2σ2²(1−ρ²))), y ∈ R

    ⇒ Y|x ∼ N(µ2 + ρ(σ2/σ1)(x − µ1), σ2²(1 − ρ²))

And

    X|y ∼ N(µ1 + ρ(σ1/σ2)(y − µ2), σ1²(1 − ρ²))

So,

    Y|x = µ2 + ρ(σ2/σ1)(x − µ1) + ε, ε ∼ N(0, σ2²(1 − ρ²))
        = µ2 − ρ(σ2/σ1)µ1 + ρ(σ2/σ1)x + ε
        = β0 + β1x + ε, ε ∼ N(0, σ²)

This is the linear regression model.
If we have observations (x1, y1), . . . , (xn, yn), then the linear regression model is

    yi = β0 + β1xi + εi, i = 1, . . . , n

where ε1, . . . , εn are iid N(0, σ²).
The problem in linear regression is that we have a sequence of random vectors
(X1, Y1), . . . , (Xn, Yn) with observations (x1, y1), . . . , (xn, yn) (a scatter plot),
and we believe that the observations obey the linear regression model

    yi = β0 + β1xi + εi, i = 1, . . . , n

where ε1, . . . , εn are iid random variables with mean 0 and variance σ².

The aim in linear regression analysis is inference (estimation and hypothesis
testing) for the parameters β0, β1. If we know β0 and β1, then the prediction
of a future y0 where x is x0 is

    ŷ0 = β0 + β1x0
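A simulation sketch (not in the notes): drawing from a bivariate normal and computing the least-squares slope recovers β1 = ρσ2/σ1; all parameter values below are arbitrary.

```python
# Simulation sketch (not in the notes; all parameters arbitrary):
# for bivariate normal data the least-squares slope is ρ σ2/σ1.
import numpy as np

rng = np.random.default_rng(1)
mu1, mu2, s1, s2, rho = 1.0, 2.0, 1.5, 0.8, 0.6
cov = [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]]
xy = rng.multivariate_normal([mu1, mu2], cov, size=200_000)
x, y = xy[:, 0], xy[:, 1]
C = np.cov(x, y)
print(C[0, 1] / C[0, 0], rho * s2 / s1)   # both ≈ 0.32
```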

Chi-square Test (Goodness of Fit Test)
In developing C.I.'s for θ and hypothesis tests, most methods are derived
assuming that the random sample X1, . . . , Xn is drawn from a normal
distribution. An important question is: how do you know that it is really
drawn from a normal distribution?
So, we may try to test the following hypothesis:

    H0 : X1, . . . , Xn iid ∼ N(µ, σ²)

This is a goodness of fit problem that can be solved by the chi-square test of
Karl Pearson (the father of the Pearson in the Neyman-Pearson theorem).
We first consider the hypothesis

    H0 : X1, . . . , Xn iid ∼ f0(x)

where f0 is a known p.d.f.

Let A1, . . . , Ak be a partition (mutually exclusive sets) of the space of X.
Define

    Pj = P(X ∈ Aj) = ∫_{Aj} f0(x)dx under H0, j = 1, . . . , k
    ⇒ Σ_{j=1}^k Pj = 1

Let Nj denote the number of X1, . . . , Xn falling in the set Aj, j = 1, . . . , k.
So Σ_{j=1}^k Nj = n. We have the following theorem.

Thm. Let

    Qk = Σ_{j=1}^k (Nj − nPj)²/(nPj) = Σ_{j=1}^k (observed # − theoretical #)²/(theoretical # in Aj)

We have Qk →d χ²(k − 1) if H0 is true.
Proof. We consider k = 2 only.

    Q2 = (N1 − nP1)²/(nP1) + (N2 − nP2)²/(nP2)
       = (N1 − nP1)²/(nP1) + ((n − N1) − n(1 − P1))²/(n(1 − P1))
       = (N1 − nP1)² (1/(nP1) + 1/(n(1 − P1)))
       = (N1 − nP1)²/(nP1(1 − P1))
       = ((N1 − nP1)/√(nP1(1 − P1)))² →d χ²(1) as n → ∞

since under H0, N1 (the # of X1, . . . , Xn falling in A1) ∼ b(n, P1)

    ⇒ (N1 − nP1)/√(nP1(1 − P1)) →d N(0, 1) by CLT

Def. The Pearson chi-square test for the hypothesis

    H0 : X1, . . . , Xn iid ∼ f0(x)

is: reject H0 if Qk ≥ χ²_α(k − 1), where α = P(χ²(k − 1) ≥ χ²_α(k − 1)).

Note:

    P(Qk ≥ χ²_α(k − 1) : H0) −→ P(χ²(k − 1) ≥ χ²_α(k − 1)) = α as n → ∞

Example:
Mendelian theory:
Shape and color of a pea ought to be grouped into four groups with
probabilities as follows:

    Group                 probability    observed
    Round and yellow      P1 = 9/16      N1 = 315
    Round and green       P2 = 3/16      N2 = 108
    Angular and yellow    P3 = 3/16      N3 = 101
    Angular and green     P4 = 1/16      N4 = 32

With a sample of n = 556 (x1, . . . , x556), the grouped counts are displayed
above. We want to test, at significance level α = 0.05, the hypothesis

    H0 : P1 = 9/16, P2 = 3/16, P3 = 3/16, P4 = 1/16.

    Q4 = Σ_{j=1}^4 (Nj − nPj)²/(nPj)
       = (315 − 556 × 9/16)²/(556 × 9/16) + (108 − 556 × 3/16)²/(556 × 3/16)
         + (101 − 556 × 3/16)²/(556 × 3/16) + (32 − 556 × 1/16)²/(556 × 1/16)
       = 0.47

    χ²_{0.05}(3) = 7.81 > Q4

We do not reject H0 and conclude that the data are consistent with the
Mendelian theory.
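Reproducing the computation (scipy's chisquare gives the same statistic together with a p-value):

```python
# Reproducing the Mendel example: Q4 ≈ 0.47 < χ²_{0.05}(3) = 7.81.
import numpy as np
from scipy.stats import chi2, chisquare

obs = np.array([315, 108, 101, 32])
exp = obs.sum() * np.array([9, 3, 3, 1]) / 16
Q4 = ((obs - exp) ** 2 / exp).sum()
print(Q4, chi2.ppf(0.95, df=3))           # 0.470..., 7.814...
print(chisquare(obs, exp))
```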
How can we test the hypothesis

    H0 : X1, . . . , Xn iid ∼ f(x, θ1, . . . , θp)

where f is a known p.d.f but θ1, . . . , θp are unknown?
Let A1, . . . , Ak be a partition of the space of X and θ̂1, . . . , θ̂p be the mle's of
θ1, . . . , θp. Define

    P̂j = ∫_{Aj} f(x, θ̂1, . . . , θ̂p)dx, j = 1, . . . , k

Again, we denote by N̂j the # of X1, . . . , Xn falling in Aj. So Σ_{j=1}^k N̂j = n.

Thm. Let

    Qk = Σ_{j=1}^k (N̂j − nP̂j)²/(nP̂j)

We have Qk →d χ²(k − p − 1) if H0 is true.
Def. The chi-square test for H0 : X ∼ f(x, θ1, . . . , θp) is rejecting H0 if
Qk ≥ χ²_α(k − p − 1).
Example:
X1, . . . , Xn iid ∼ f(x, θ). Consider the hypothesis H0 : X ∼ N(µ, σ²).
sol: The mle's are µ̂ = x̄ and σ̂² = (1/n) Σ_{i=1}^n (xi − x̄)² (written s² below).
Let A1 = (−∞, a1), A2 = (a1, a2), . . . , Ak = (a_{k−1}, ∞) be a partition of the
space of X. Under the fitted N(x̄, s²), define

    P̂1 = P_{N(x̄,s²)}(X ≤ a1) = P_{N(x̄,s²)}((X − x̄)/s ≤ (a1 − x̄)/s) = P(Z ≤ (a1 − x̄)/s)
    P̂2 = P_{N(x̄,s²)}(a1 ≤ X ≤ a2) = P(Z ≤ (a2 − x̄)/s) − P(Z ≤ (a1 − x̄)/s)
    ...
    P̂j = P_{N(x̄,s²)}(a_{j−1} ≤ X ≤ aj) = P(Z ≤ (aj − x̄)/s) − P(Z ≤ (a_{j−1} − x̄)/s)
    ...
    P̂k = P_{N(x̄,s²)}(X ≥ a_{k−1}) = P(Z ≥ (a_{k−1} − x̄)/s)

    Qk = Σ_{j=1}^k (N̂j − nP̂j)²/(nP̂j) →d χ²(k − 3)

Reject H0 if Qk ≥ χ²_α(k − 3).

