This week
Evaluating estimators:
- UMVUE (unbiased, lowest variance);
- Cramér-Rao lower bound;
- Rao-Blackwell Theorem.
MSE

The variance of the sample mean is
$$\mathrm{Var}\left(\bar{X}\right) = \mathrm{Var}\left(\frac{1}{n}\sum_{k=1}^{n} X_k\right) = \frac{1}{n^2}\sum_{k=1}^{n}\mathrm{Var}(X_k) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$
Approximation

Note that the variance of the estimator is a function of the parameter we are estimating. Hence Var(T) is unknown; we approximate it by plugging in the estimator. For the Poisson mean, with λ̂ = X̄ and Var(X̄) = λ/n, this gives:
$$\widehat{\mathrm{Var}}(T) = \frac{\hat{\lambda}}{n} = \frac{\bar{X}}{n}.$$
The square root of this is called the standard error of the estimate:
$$\mathrm{se}\left(\hat{\lambda}\right) = \sqrt{\frac{\bar{X}}{n}}.$$
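A minimal numerical sketch of this plug-in standard error (the Poisson rate, sample size, and seed below are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(lam=3.0, size=200)   # illustrative i.i.d. Poisson sample

lam_hat = x.mean()                   # the MLE of lambda is the sample mean
se_hat = np.sqrt(lam_hat / len(x))   # plug-in estimate of sd(lam_hat) = sqrt(lambda/n)
print(f"lambda_hat = {lam_hat:.3f}, standard error = {se_hat:.3f}")
```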
Functions of θ

Note that we defined τ(θ) as a function of the unknown parameters.

Question: Why might we be interested in determining an estimate of a function of the parameters instead of an estimate of the parameters?

Solution: We might be interested in an estimate of a non-linear transformation of the parameters.

Example: consider Pr(X = 0), where X ~ Poi(λ):
$$\Pr(X=0) = \frac{e^{-\lambda}\lambda^0}{0!} = e^{-\lambda}.$$
We know that $\mathbb{E}\left[\hat\lambda\right] = \lambda$; however,
$$\mathbb{E}\left[\Pr\left(X=0 \mid \hat\lambda\right)\right] = \mathbb{E}\left[\frac{e^{-\hat\lambda}\hat\lambda^0}{0!}\right] = \mathbb{E}\left[e^{-\hat\lambda}\right] \neq e^{-\lambda}.$$
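A quick simulation illustrates this bias (the true λ, sample size, and replication count are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
lam, n, reps = 3.0, 10, 100_000

# Each row is one sample of size n; lam_hat is the row mean.
lam_hat = rng.poisson(lam, size=(reps, n)).mean(axis=1)

print(f"true Pr(X=0) = exp(-lambda)       = {np.exp(-lam):.4f}")
print(f"mean of exp(-lam_hat) over samples = {np.exp(-lam_hat).mean():.4f}")
# The plug-in estimator is biased upward here, by Jensen's inequality
# (exp(-x) is convex).
```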
UMVUEs

Consider two unbiased estimators, say T₁ and T₂. We define the efficiency of T₁ relative to T₂ as:
$$\mathrm{eff}(T_1, T_2) = \frac{\mathrm{Var}(T_2)}{\mathrm{Var}(T_1)}.$$
If
$$I_{f^\star}(\theta) = \mathbb{E}\left[\left(\frac{\partial}{\partial\theta}\log f_X(x|\theta)\right)^2\right] = -\mathbb{E}\left[\frac{\partial^2}{\partial\theta^2}\log f_X(x|\theta)\right]$$
denotes the Fisher information* of a single observation, then in terms of the full-sample log-likelihood ℓ(x; θ) it can also be written** as
$$I_{f^\star}(\theta) = \mathbb{E}\left[\left(\frac{\partial \ell(x;\theta)}{\partial\theta}\right)^2\right]\Big/\, n = -\mathbb{E}\left[\frac{\partial^2 \ell(x;\theta)}{\partial\theta^2}\right]\Big/\, n.$$

* see also slides 1166-1168 (we do not need to prove it in this course). Fisher information is the variance of the score (which has mean zero). ** using i.i.d. samples.

Note: asymptotically, as n → ∞, the MLE attains the CRLB, so the MLE is asymptotically UMVUE.
Example: for X ~ Bin(m, p),
$$\frac{\partial}{\partial p}\log\left(f_X(x;p)\right) = \frac{x}{p} - \frac{m-x}{1-p} = \frac{x-mp}{p(1-p)},$$
so
$$\left(\frac{\partial}{\partial p}\log\left(f_X(x;p)\right)\right)^2 = \frac{(x-mp)^2}{p^2(1-p)^2}$$
and
$$I_{f^\star}(p) = \mathbb{E}\left[\left(\frac{\partial}{\partial p}\log\left(f_X(x;p)\right)\right)^2\right] = \frac{\mathbb{E}\left[(X-mp)^2\right]}{p^2(1-p)^2} = \frac{\mathrm{Var}(X)}{p^2(1-p)^2} = \frac{mp(1-p)}{p^2(1-p)^2} = \frac{m}{p(1-p)}.$$

Thus, the Cramér-Rao Lower Bound is given by:
$$\mathrm{Var}\left(T(X_1, \ldots, X_n)\right) \ge \frac{1}{n \cdot \dfrac{m}{p(1-p)}} = \frac{p(1-p)}{mn}.$$
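As a sanity check, one can verify I(p) = m/(p(1-p)) by simulating the variance of the score (the values of m, p, and the replication count below are arbitrary):

```python
import numpy as np

m, p, reps = 8, 0.3, 1_000_000
rng = np.random.default_rng(0)
x = rng.binomial(m, p, size=reps)

# Score of a single Bin(m, p) observation: d/dp log f(x; p)
score = (x - m * p) / (p * (1 - p))

print(f"Var(score)  ~= {score.var():.3f}")         # simulated Fisher information
print(f"m / (p(1-p)) = {m / (p * (1 - p)):.3f}")   # closed form: 38.095
```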
Consistency

A sequence of estimators {Tₙ} is a consistent sequence of estimators of the parameter θ if for every ε > 0 we have:
$$\lim_{n\to\infty}\Pr\left(|T_n - \theta| < \varepsilon\right) = 1,$$
i.e., $T_n \xrightarrow{P} \theta$ (convergence in probability).

A sufficient condition: if Tₙ is a sequence of estimators of a parameter θ that satisfies the following two conditions:
i) $\lim_{n\to\infty} \mathrm{Var}(T_n) = 0$ (the uncertainty in the estimate vanishes as n → ∞);
ii) $\lim_{n\to\infty} \mathrm{Bias}(T_n) = 0$ (the estimator is asymptotically unbiased);
then Tₙ is consistent.

Moreover, the MLE is asymptotically normal:
$$\sqrt{n}\left(\hat\theta_n - \theta\right) \xrightarrow{d} N\!\left(0,\ I_{f^\star}(\theta)^{-1}\right).$$
Sufficient Statistics

Let (X₁, X₂, ..., Xₙ) have joint p.d.f. f(x; θ). A statistic S is said to be sufficient for θ if for any other statistic T the conditional p.d.f. of T given S = s, denoted by f_{T|S}(t), does not depend on θ, for any value of t.

Idea: if S is observed, additional information about θ cannot be obtained from T if the conditional distribution of T given S = s is free of θ.

Factorization Theorem. A necessary and sufficient condition for T(X₁, ..., Xₙ) to be a sufficient statistic for θ is that the joint probability function (density function or frequency function) factors in the form:
$$f(x_1, \ldots, x_n; \theta) = g\left(T(x_1, \ldots, x_n); \theta\right)\, h(x_1, \ldots, x_n).$$
Rao-Blackwell. Let θ̂ be an unbiased estimator and T a sufficient statistic, and define θ̃ = E[θ̂ | T]. Then:
$$\mathbb{E}\left[\tilde\theta\right] = \mathbb{E}\left[\mathbb{E}\left[\hat\theta \mid T\right]\right] = \mathbb{E}\left[\hat\theta\right] = \theta,$$
so θ̃ is also unbiased, and by the variance decomposition
$$\mathrm{Var}\left(\hat\theta\right) = \mathrm{Var}\left(\mathbb{E}\left[\hat\theta \mid T\right]\right) + \mathbb{E}\left[\mathrm{Var}\left(\hat\theta \mid T\right)\right] = \mathrm{Var}\left(\tilde\theta\right) + \mathbb{E}\left[\mathrm{Var}\left(\hat\theta \mid T\right)\right].$$
Thus, $\mathrm{Var}(\hat\theta) > \mathrm{Var}(\tilde\theta)$, unless $\mathrm{Var}(\hat\theta \mid T) = 0$. This is the case only if θ̂ is a function of T, which would imply θ̂ = θ̃.
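A small simulation sketch of Rao-Blackwellisation for Bernoulli(p) data: start from the crude unbiased estimator θ̂ = X₁ and condition on the sufficient statistic T = ΣXᵢ, which gives θ̃ = E[X₁ | T] = X̄. The values of p, n, and the replication count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 0.3, 20, 100_000
x = rng.binomial(1, p, size=(reps, n))   # reps Bernoulli samples of size n

theta_hat = x[:, 0]                      # crude unbiased estimator: first observation
theta_tilde = x.mean(axis=1)             # E[X1 | sum] = sample mean (Rao-Blackwellised)

print(f"both unbiased: {theta_hat.mean():.3f}, {theta_tilde.mean():.3f} (true p = {p})")
print(f"Var(theta_hat)   ~= {theta_hat.var():.4f}")    # ~ p(1-p)
print(f"Var(theta_tilde) ~= {theta_tilde.var():.4f}")  # ~ p(1-p)/n, much smaller
```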
Example: for $S = \sum_{i=1}^{n} X_i$ with i.i.d. exponential(λ) observations, we know that S ~ Gamma(n, λ), with density
$$f_S(s) = \frac{\lambda^n}{\Gamma(n)}\, s^{n-1} \exp(-\lambda s).$$
Introduction

Last week we saw point estimators;
A point estimator uses a sample to describe the distribution of a population;
However, the sample itself is a random variable;
This implies that parameters estimated using a sample are uncertain!
You should take that into account, especially when you are interested in tail risk (example for an insurer: the probability of ruin). Using a point estimate would underestimate the true risk.
Example

Consider an i.i.d. sample of size 4, X₁, X₂, X₃, X₄, from N(μ, 1). Recall that we can estimate the population mean μ by X̄. The probability that μ will be in the range (X̄ − 1, X̄ + 1) is:
$$\Pr\left(\bar X - 1 < \mu < \bar X + 1\right) = \Pr\left(-1 < \bar X - \mu < 1\right) = \Pr(-2 < Z < 2) = 0.9544,$$
since $\bar X - \mu \sim N(0, 1/4)$ has standard deviation 1/2.
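This probability is easy to verify numerically:

```python
from scipy.stats import norm

# X_bar - mu ~ N(0, 1/4), so sd = 1/2 and (X_bar - mu)/(1/2) ~ N(0, 1)
prob = norm.cdf(2) - norm.cdf(-2)
print(f"{prob:.4f}")  # 0.9545
```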
Confidence intervals

For an i.i.d. sample X₁, ..., Xₙ from an exponential distribution with rate λ, the pivot is
$$Y = 2\lambda\, n\bar X = 2\lambda \sum_{i=1}^{n} X_i \sim \chi^2(2n).$$

[Figure: p.d.f. of Y, with quantiles q₁ = χ²_{α/2}(2n) and q₂ = χ²_{1−α/2}(2n) cutting off probability α/2 in each tail and leaving probability 1 − α in the middle.]
Each term of the pivot has m.g.f. $\mathbb{E}\left[e^{2\lambda t X_i}\right] = \frac{1}{1-2t}$, so $\mathbb{E}\left[e^{tY}\right] = \left(\frac{1}{1-2t}\right)^{2n/2}$, the m.g.f. of χ²(2n). With quantiles q₁ = χ²_{α/2}(2n) and q₂ = χ²_{1−α/2}(2n), the 100(1 − α)% confidence interval for λ is:
$$\frac{\chi^2_{\alpha/2}(2n)}{2n\bar x} < \lambda < \frac{\chi^2_{1-\alpha/2}(2n)}{2n\bar x}.$$
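A numerical sketch of this interval (the true rate, sample size, and confidence level are illustrative):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
lam_true, n, alpha = 2.0, 40, 0.05
x = rng.exponential(scale=1 / lam_true, size=n)

# Pivot: 2 * lambda * n * x_bar ~ chi^2(2n)
lo = chi2.ppf(alpha / 2, 2 * n) / (2 * n * x.mean())
hi = chi2.ppf(1 - alpha / 2, 2 * n) / (2 * n * x.mean())
print(f"95% CI for lambda: ({lo:.3f}, {hi:.3f})")
```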
By the Central Limit Theorem, $\frac{\bar X - \mu}{\sigma/\sqrt n} \xrightarrow{d} N(0,1)$ as n → ∞. This holds for all r.v. with finite mean and variance, not only normal r.v.!

Suppose X₁, ..., Xₙ is a random sample from a population with mean μ and known variance σ².

1. The pivot is
$$Z = \frac{\bar X - \mu}{\sigma/\sqrt n} \sim N(0, 1).$$
2. The function $g(X_1, \ldots, X_n; \mu) = \frac{\bar X - \mu}{\sigma/\sqrt n}$ is decreasing in μ. Using:
$$\Pr\left(z_{\alpha/2} < Z < z_{1-\alpha/2}\right) = 1 - \alpha,$$
we obtain
$$\Pr\left(\bar X - z_{1-\alpha/2}\frac{\sigma}{\sqrt n} < \mu < \bar X + z_{1-\alpha/2}\frac{\sigma}{\sqrt n}\right) = 1 - \alpha.$$
α                       1%      5%      10%
two-sided: z_{1−α/2}    2.576   1.960   1.645
one-sided: z_{1−α}      2.326   1.645   1.282
Note that the above gives the confidence interval for the mean when the population variance is known. When the population is not normal, the interval is only approximate, with the approximation improving as the sample size increases. The same confidence interval formula for the mean holds even if the population variance is replaced by the sample variance, provided the sample is large (generally, n > 30 is a rule of thumb for large samples).
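A sketch of the large-sample interval with the variance replaced by the sample variance (the non-normal population and sample below are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.gamma(shape=2.0, scale=3.0, size=100)  # non-normal population, n > 30

alpha = 0.05
z = norm.ppf(1 - alpha / 2)
half_width = z * x.std(ddof=1) / np.sqrt(len(x))
print(f"95% CI for mu: ({x.mean() - half_width:.2f}, {x.mean() + half_width:.2f})")
```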
When σ² is unknown, replace it by S²:
$$T = \frac{\bar X - \mu}{S/\sqrt n} = \frac{(\bar X - \mu)\big/(\sigma/\sqrt n)}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}} \sim t_{n-1},$$
the numerator being $Z \sim N(0,1)$ and the quantity under the square root a $\chi^2(n-1)$ random variable divided by its degrees of freedom.

2. The function $g(X_1, \ldots, X_n; \mu) = \frac{\bar X - \mu}{S/\sqrt n}$ is decreasing in μ.

Interpretation: as n → ∞ we have s → σ, so the t interval approaches the normal interval.
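A minimal t-interval sketch using scipy (the data are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(loc=5.0, scale=2.0, size=12)  # small normal sample

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=len(x) - 1)
half_width = t_crit * x.std(ddof=1) / np.sqrt(len(x))
print(f"95% CI for mu: ({x.mean() - half_width:.2f}, {x.mean() + half_width:.2f})")
```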
Let X₁, ..., Xₙ be a random sample from N(μ, σ²).
We suppose that μ is not known and we wish to construct a 100(1 − α)% confidence interval for σ².
a. Question: What is the pivot? See week 5 online lecture.
b. Question: Find an (approximate) 100(1 − α)% confidence interval for σ².
Using the pivot $(n-1)S^2/\sigma^2 \sim \chi^2(n-1)$:
$$\Pr\left(\frac{(n-1)S^2}{\chi^2_{1-\alpha/2}(n-1)} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{\alpha/2}(n-1)}\right) = 1 - \alpha.$$
The 100(1 − α)% confidence interval is
$$\left(\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}(n-1)},\ \frac{(n-1)s^2}{\chi^2_{\alpha/2}(n-1)}\right),$$
where s² is the observed sample variance.
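A sketch of this variance interval (the data are illustrative):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(13)
x = rng.normal(loc=0.0, scale=1.5, size=25)

n, alpha = len(x), 0.05
s2 = x.var(ddof=1)
lo = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, n - 1)
hi = (n - 1) * s2 / chi2.ppf(alpha / 2, n - 1)
print(f"95% CI for sigma^2: ({lo:.2f}, {hi:.2f})")  # true sigma^2 = 2.25
```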
Comparing two variances. Consider a random sample of size n₁ from N(μ₁, σ₁²) and a random sample of size n₂ from N(μ₂, σ₂²).
Denote the respective sample variances by S₁² and S₂².
Application: Is one portfolio riskier than another?
1. We know that
$$\frac{(n_1-1)S_1^2}{\sigma_1^2} \sim \chi^2(n_1-1) \quad\text{and}\quad \frac{(n_2-1)S_2^2}{\sigma_2^2} \sim \chi^2(n_2-1),$$
so the pivot is
$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1-1,\ n_2-1).$$
2. The function $g\left(X_1, \ldots, X_n;\ \sigma_1^2/\sigma_2^2\right) = \frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2}$ is decreasing in σ₁²/σ₂². Hence:
$$\Pr\left(\frac{S_1^2}{S_2^2\, F_{1-\alpha/2}(n_1-1,\ n_2-1)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2\, F_{\alpha/2}(n_1-1,\ n_2-1)}\right) = 1 - \alpha,$$
where F_{α/2}(n₁−1, n₂−1) and F_{1−α/2}(n₁−1, n₂−1) are determined from the table of the F-distribution (see F&T pages 170-174).
[Figure: Snedecor's F p.d.f. and c.d.f. for (n₁, n₂) = (3, 15) and (15, 3). Marked quantiles: for (3, 15) the c.d.f. reaches 1/8, 1/2, and 7/8 at x ≈ 0.23, 0.83, and 2.25; for (15, 3), at x ≈ 0.45, 1.21, and 4.37.]
Note the reciprocal relation between lower and upper F quantiles:
$$F_{1-\alpha/2}(n_2-1,\ n_1-1) = \frac{1}{F_{\alpha/2}(n_1-1,\ n_2-1)}.$$
The 100(1 − α)% confidence interval for σ₁²/σ₂² is given by
$$\frac{s_1^2}{s_2^2\, F_{1-\alpha/2}(n_1-1,\ n_2-1)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2}\, F_{1-\alpha/2}(n_2-1,\ n_1-1),$$
where s₁² and s₂² are the observed sample variances from the two populations.
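A numerical sketch of the variance-ratio interval (the samples are simulated, with sizes chosen to match the degrees of freedom in the worked example below):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(17)
x1 = rng.normal(scale=0.7, size=51)   # n1 - 1 = 50
x2 = rng.normal(scale=0.9, size=179)  # n2 - 1 = 178

alpha = 0.05
ratio = x1.var(ddof=1) / x2.var(ddof=1)
lo = ratio / f.ppf(1 - alpha / 2, len(x1) - 1, len(x2) - 1)
hi = ratio * f.ppf(1 - alpha / 2, len(x2) - 1, len(x1) - 1)
print(f"95% CI for sigma1^2/sigma2^2: ({lo:.3f}, {hi:.3f})")
```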
Example: a 95% confidence interval for σ₁²/σ₂², with observed standard deviations s₁ = 0.68 and s₂ = 0.85 and degrees of freedom 50 and 178:
$$\frac{0.68^2}{0.85^2}\cdot\frac{1}{F_{0.975}(50,\ 178)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{0.68^2}{0.85^2}\cdot F_{0.975}(178,\ 50)$$
$$\frac{0.68^2}{0.85^2}\cdot\frac{1}{1.56} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{0.68^2}{0.85^2}\cdot 1.435$$
$$0.4103 < \frac{\sigma_1^2}{\sigma_2^2} < 0.9184.$$
Difference of two means (variances known). The pivot is
$$\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0, 1),$$
so that
$$\Pr\left(\bar X_1 - \bar X_2 - z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < \bar X_1 - \bar X_2 + z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) = 1 - \alpha.$$
The 100(1 − α)% confidence interval is
$$(\bar x_1 - \bar x_2) - z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar x_1 - \bar x_2) + z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},$$
where z_{1−α/2} is the standard normal quantile with probability α/2 above it (equivalently, probability 1 − α/2 below it).
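A sketch of this known-variance interval (all summary statistics below are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

# Illustrative summary statistics: sample means, known variances, sample sizes
x1_bar, x2_bar = 10.2, 9.5
var1, var2 = 4.0, 9.0
n1, n2 = 50, 60

alpha = 0.05
half_width = norm.ppf(1 - alpha / 2) * np.sqrt(var1 / n1 + var2 / n2)
diff = x1_bar - x2_bar
print(f"95% CI for mu1 - mu2: ({diff - half_width:.2f}, {diff + half_width:.2f})")
```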
If σ₁² = σ₂² = σ², the statistic
$$\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
has an approximate standard normal distribution.

σ² can be estimated by pooling the squared deviations from the means of the two samples with the pooled estimator:
$$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}.$$
Moreover,
$$\underbrace{\frac{(n_1-1)S_1^2}{\sigma^2}}_{=\sum_{i=1}^{n_1-1} Z_i^2\ \sim\ \chi^2(n_1-1)} + \underbrace{\frac{(n_2-1)S_2^2}{\sigma^2}}_{=\sum_{i=1}^{n_2-1} Z_i^2\ \sim\ \chi^2(n_2-1)} = \underbrace{\frac{(n_1+n_2-2)S_p^2}{\sigma^2}}_{=\sum_{i=1}^{n_1+n_2-2} Z_i^2\ \sim\ \chi^2(n_1+n_2-2)},$$
and a ratio of the form $Z\big/\sqrt{Y/(n_1+n_2-2)}$, with $Y \sim \chi^2(n_1+n_2-2)$ independent of Z, has a $t_{n_1+n_2-2}$ distribution.
1. Indeed,
$$\frac{\dfrac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}}}{\sqrt{S_p^2/\sigma^2}} = \frac{\bar X_1 - \bar X_2 - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1+n_2-2}.$$
Here S_p is the pooled standard deviation (see slide 1151).
2. We have (decreasing function):
$$g\left(X_1, \ldots, X_n;\ \mu_1 - \mu_2\right) = \left(\bar X_1 - \bar X_2 - (\mu_1 - \mu_2)\right)\Big/\left(S_p\sqrt{1/n_1 + 1/n_2}\right).$$
3. Thus:
$$\Pr\left(\bar X_1 - \bar X_2 - t_{1-\alpha/2,\, n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < \bar X_1 - \bar X_2 + t_{1-\alpha/2,\, n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right) = 1 - \alpha,$$
and the 100(1 − α)% confidence interval for μ₁ − μ₂ is:
$$(\bar x_1 - \bar x_2) \pm t_{1-\alpha/2,\, n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$
where t_{1−α/2, n₁+n₂−2} is the point on the t-distribution (with n₁ + n₂ − 2 degrees of freedom) for which the probability above it is α/2.
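A pooled two-sample t interval sketch (the data are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(23)
x1 = rng.normal(loc=10.0, scale=2.0, size=15)
x2 = rng.normal(loc=9.0, scale=2.0, size=12)

n1, n2, alpha = len(x1), len(x2), 0.05
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
t_crit = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)
half_width = t_crit * np.sqrt(sp2) * np.sqrt(1 / n1 + 1 / n2)
diff = x1.mean() - x2.mean()
print(f"95% CI for mu1 - mu2: ({diff - half_width:.2f}, {diff + half_width:.2f})")
```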
Confidence interval for a proportion. We have $\mathrm{Var}(\hat p) = \frac{p(1-p)}{n}$, and for large n:
$$\frac{\hat p - p}{\sqrt{\hat p(1-\hat p)/n}} \overset{\text{approx}}{\sim} N(0, 1).$$
2. Thus, $g(X_1, \ldots, X_n; p) = (\hat p - p)\big/\sqrt{\hat p(1-\hat p)/n}$ and
$$\Pr\left(\hat p - \sqrt{\frac{\hat p(1-\hat p)}{n}}\, z_{1-\alpha/2} < p < \hat p + \sqrt{\frac{\hat p(1-\hat p)}{n}}\, z_{1-\alpha/2}\right) = 1 - \alpha.$$
3. A 100(1 − α)% confidence interval for p is given by:
$$\hat p - z_{1-\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}} < p < \hat p + z_{1-\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}.$$
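A sketch of this approximate (Wald-type) interval, using the counts that appear in the worked example further below:

```python
import numpy as np
from scipy.stats import norm

successes, n, alpha = 216, 1500, 0.05  # counts as in the example below
p_hat = successes / n
half_width = norm.ppf(1 - alpha / 2) * np.sqrt(p_hat * (1 - p_hat) / n)
print(f"95% CI for p: ({p_hat - half_width:.4f}, {p_hat + half_width:.4f})")
```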
Difference of two proportions. The variance is
$$\hat\sigma^2_{\hat p_1 - \hat p_2} = \mathrm{Var}(\hat p_1 - \hat p_2) = \mathrm{Var}(\hat p_1) + \mathrm{Var}(\hat p_2) - 2\,\mathrm{Cov}(\hat p_1, \hat p_2) = \frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}.$$
Question: Why is Cov(p̂₁, p̂₂) = 0?
Solution: The two samples are drawn from different populations and are therefore independent.
1. For large samples,
$$\frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\hat\sigma_{\hat p_1 - \hat p_2}} \overset{\text{approx}}{\sim} N(0, 1).$$
2. Thus, $g(X_1, \ldots, X_n;\ p_1 - p_2) = \left((\hat p_1 - \hat p_2) - (p_1 - p_2)\right)\big/\hat\sigma_{\hat p_1 - \hat p_2}$ and
$$\Pr\left((\hat p_1 - \hat p_2) - \hat\sigma_{\hat p_1 - \hat p_2}\, z_{1-\alpha/2} < p_1 - p_2 < (\hat p_1 - \hat p_2) + \hat\sigma_{\hat p_1 - \hat p_2}\, z_{1-\alpha/2}\right) = 1 - \alpha.$$
Example: claim sizes in 2011 and 2010 (surviving rows of the data table):

2011    2010
45      46
37      42
216     188

Solution: $\hat p_M = \frac{216}{1500} = 0.144$ and $\hat p_F = \frac{188}{1350} = 0.139259$, so $\hat p_M - \hat p_F = 0.004740741$. Note: p_M = 0.15 and p_F = 0.13!
$$\hat\sigma^2_{\hat p_M - \hat p_F} = \frac{0.144(1-0.144)}{1500} + \frac{0.139259(1-0.139259)}{1350} = 1.71\times 10^{-4},$$
so $\hat\sigma_{\hat p_M - \hat p_F} = 0.0131$. Then Z = 0.004740741/0.01307538 = 0.362569852, and Pr(Z > 0.362569852) = 1 − 0.641536883 = 0.358463117.
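The arithmetic is easy to check numerically:

```python
import numpy as np
from scipy.stats import norm

p_m, n_m = 216 / 1500, 1500
p_f, n_f = 188 / 1350, 1350

var_diff = p_m * (1 - p_m) / n_m + p_f * (1 - p_f) / n_f
z = (p_m - p_f) / np.sqrt(var_diff)
print(f"z = {z:.4f}, Pr(Z > z) = {1 - norm.cdf(z):.4f}")  # ~0.3626, ~0.3585
```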
For paired observations, define the differences Dₖ = X₁ₖ − X₂ₖ, with mean
$$\bar D = \frac{1}{n}\sum_{k=1}^{n} D_k = \frac{1}{n}\sum_{k=1}^{n}\left(X_{1k} - X_{2k}\right),$$
and define:
$$S_D = \sqrt{\frac{\sum_{k=1}^{n}\left(D_k - \bar D\right)^2}{n-1}}.$$
Solution: with n = 10 paired differences, $\sum_k d_k = 1{,}241 - 1{,}104 = 137$ and $\sum_k d_k^2 = 4{,}829$, so
$$\bar d = \frac{137}{10} = 13.7, \qquad s_d = \sqrt{\frac{1}{9}\left(4{,}829 - 13.7^2 \times 10\right)} = 18.11.$$
The t statistic is
$$t = \frac{13.7 - 10}{18.11/3} = 0.61288, \qquad \Pr\left(T_9 < 0.61288\right) = 1 - 0.277561 = 0.722439.$$
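A scipy check of this arithmetic. Note that the slide's standard error divides by 3 rather than by √10 ≈ 3.16; the code mirrors the slide's numbers and flags the more usual choice in a comment:

```python
import numpy as np
from scipy.stats import t

n, sum_d, sum_d2 = 10, 137, 4829
d_bar = sum_d / n                                  # 13.7
s_d = np.sqrt((sum_d2 - n * d_bar**2) / (n - 1))   # 18.11

# The slide's standard error is 18.11/3; the usual choice is s_d / np.sqrt(n).
t_stat = (d_bar - 10) / (s_d / 3)
print(f"t = {t_stat:.5f}")                              # 0.61288
print(f"Pr(T_9 < t) = {t.cdf(t_stat, df=n - 1):.6f}")   # ~0.722439
```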
Summary
"
E
2 #!1
1
log (fX (x|))
=
.
n If ? ()
1
If ? ()
.
"
If ? () = E
2 #
.
log (fX (x|))
E
log (fX (x|))
= E
log (fX (x|)) .
2
In evaluating the variance of MLE, you can therefore use either
form of this variance formula.
1167/1175
1
(1 + (y )2 )
Summary
Asymptotically, the MLE attains this bound:
$$\mathrm{Var}\left(\hat\theta\right) = \left(\mathbb{E}\left[\left(\frac{\partial}{\partial\theta}\log\left(f_X(x|\theta)\right)\right)^2\right]\right)^{-1}\frac{1}{n} = \frac{1}{n\, I_{f^\star}(\theta)},$$
where
$$I_{f^\star}(\theta) = \mathbb{E}\left[\left(\frac{\partial}{\partial\theta}\log\left(f_X(x|\theta)\right)\right)^2\right] = -\mathbb{E}\left[\frac{\partial^2}{\partial\theta^2}\log\left(f_X(x|\theta)\right)\right],$$
so that, for large n,
$$\hat\theta \overset{\text{approx}}{\sim} N\!\left(\theta,\ \frac{1}{n\, I_{f^\star}(\theta)}\right).$$
Summary

Evaluating estimators:
1. UMVUE estimator: unbiased (E[T] = τ(θ)) and of minimum variance (Var(T) ≤ Var(T*) for every unbiased estimator T*). If an estimator T attains the CRLB, then T is UMVUE.
   CRLB: $\mathrm{Var}\left(T(X_1, X_2, \ldots, X_n)\right) \ge \frac{1}{n\, I_{f^\star}(\theta)}$.
2. Consistent estimator: for every ε > 0,
   $$\lim_{n\to\infty}\Pr\left(|T_n - \theta| < \varepsilon\right) = 1,$$
   i.e., $T_n \xrightarrow{P} \theta$.
3. Sufficiency (factorization theorem); asymptotically, the MLE attains the CRLB.

Interval estimators
Pivotal quantity method:
1. Find the pivot;
2. Find the function g(X₁, ..., Xₙ; θ) such that Pr(q₁ < g(X₁, ..., Xₙ; θ) < q₂) = 1 − α;
3. The 100(1 − α)% confidence interval of θ: g⁻¹(X₁, ..., Xₙ; q₁) ≤ θ ≤ g⁻¹(X₁, ..., Xₙ; q₂).