This week
Evaluating estimators:
- UMVUE (unbiased, lowest variance);
- Cramér-Rao lower bound;
- Rao-Blackwell Theorem.
MSE

The variance of the sample mean is
$$\mathrm{Var}\left(\bar{X}\right) = \mathrm{Var}\left(\frac{1}{n}\sum_{k=1}^{n} X_k\right) = \frac{1}{n^2}\sum_{k=1}^{n}\mathrm{Var}(X_k) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$
Approximation

Note that the variance of the estimator is a function of the parameter we are estimating. Hence Var(T) is unknown; we approximate it by plugging in the estimator. For the Poisson mean, with λ̂ = X̄ and Var(X̄) = λ/n, this gives:
$$\widehat{\mathrm{Var}}(T) = \frac{\hat{\lambda}}{n} = \frac{\bar{X}}{n}.$$
The square root of this is called the standard error of the estimate:
$$\mathrm{se}\left(\hat{\lambda}\right) = \sqrt{\frac{\bar{X}}{n}}.$$
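A minimal numerical sketch of this plug-in standard error (the Poisson rate, sample size, and seed below are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(lam=3.0, size=200)   # illustrative i.i.d. Poisson sample

lam_hat = x.mean()                   # the MLE of lambda is the sample mean
se_hat = np.sqrt(lam_hat / len(x))   # plug-in estimate of sd(lam_hat) = sqrt(lambda/n)
print(f"lambda_hat = {lam_hat:.3f}, standard error = {se_hat:.3f}")
```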
Functions of θ

Note that we defined τ(θ) as a function of the unknown parameters.

Question: Why might we be interested in determining an estimate of a function of the parameters instead of an estimate of the parameters?

Solution: We might be interested in an estimate of a non-linear transformation of the parameters.

Example: consider Pr(X = 0), where X ~ Poi(λ):
$$\Pr(X=0) = \frac{e^{-\lambda}\lambda^0}{0!} = e^{-\lambda}.$$
We know that $\mathbb{E}\left[\hat\lambda\right] = \lambda$; however,
$$\mathbb{E}\left[\Pr\left(X=0 \mid \hat\lambda\right)\right] = \mathbb{E}\left[\frac{e^{-\hat\lambda}\hat\lambda^0}{0!}\right] = \mathbb{E}\left[e^{-\hat\lambda}\right] \neq e^{-\lambda}.$$
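A quick simulation illustrates this bias (the true λ, sample size, and replication count are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
lam, n, reps = 3.0, 10, 100_000

# Each row is one sample of size n; lam_hat is the row mean.
lam_hat = rng.poisson(lam, size=(reps, n)).mean(axis=1)

print(f"true Pr(X=0) = exp(-lambda)       = {np.exp(-lam):.4f}")
print(f"mean of exp(-lam_hat) over samples = {np.exp(-lam_hat).mean():.4f}")
# The plug-in estimator is biased upward here, by Jensen's inequality
# (exp(-x) is convex).
```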
UMVUEs

Consider two unbiased estimators, say T₁ and T₂. We define the efficiency of T₁ relative to T₂ as:
$$\mathrm{eff}(T_1, T_2) = \frac{\mathrm{Var}(T_2)}{\mathrm{Var}(T_1)}.$$
If
$$I_{f^\star}(\theta) = \mathbb{E}\left[\left(\frac{\partial}{\partial\theta}\log f_X(x|\theta)\right)^2\right] = -\mathbb{E}\left[\frac{\partial^2}{\partial\theta^2}\log f_X(x|\theta)\right]$$
denotes the Fisher information* of a single observation, then in terms of the full-sample log-likelihood ℓ(x; θ) it can also be written** as
$$I_{f^\star}(\theta) = \mathbb{E}\left[\left(\frac{\partial \ell(x;\theta)}{\partial\theta}\right)^2\right]\Big/\, n = -\mathbb{E}\left[\frac{\partial^2 \ell(x;\theta)}{\partial\theta^2}\right]\Big/\, n.$$

* see also slides 1166-1168 (we do not need to prove it in this course). Fisher information is the variance of the score (which has mean zero). ** using i.i.d. samples.

Note: asymptotically, as n → ∞, the MLE attains the CRLB, so the MLE is asymptotically UMVUE.
Example: for X ~ Bin(m, p),
$$\frac{\partial}{\partial p}\log\left(f_X(x;p)\right) = \frac{x}{p} - \frac{m-x}{1-p} = \frac{x-mp}{p(1-p)},$$
so
$$\left(\frac{\partial}{\partial p}\log\left(f_X(x;p)\right)\right)^2 = \frac{(x-mp)^2}{p^2(1-p)^2}$$
and
$$I_{f^\star}(p) = \mathbb{E}\left[\left(\frac{\partial}{\partial p}\log\left(f_X(x;p)\right)\right)^2\right] = \frac{\mathbb{E}\left[(X-mp)^2\right]}{p^2(1-p)^2} = \frac{\mathrm{Var}(X)}{p^2(1-p)^2} = \frac{mp(1-p)}{p^2(1-p)^2} = \frac{m}{p(1-p)}.$$

Thus, the Cramér-Rao Lower Bound is given by:
$$\mathrm{Var}\left(T(X_1, \ldots, X_n)\right) \ge \frac{1}{n \cdot \dfrac{m}{p(1-p)}} = \frac{p(1-p)}{mn}.$$
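As a sanity check, one can verify I(p) = m/(p(1-p)) by simulating the variance of the score (the values of m, p, and the replication count below are arbitrary):

```python
import numpy as np

m, p, reps = 8, 0.3, 1_000_000
rng = np.random.default_rng(0)
x = rng.binomial(m, p, size=reps)

# Score of a single Bin(m, p) observation: d/dp log f(x; p)
score = (x - m * p) / (p * (1 - p))

print(f"Var(score)  ~= {score.var():.3f}")         # simulated Fisher information
print(f"m / (p(1-p)) = {m / (p * (1 - p)):.3f}")   # closed form: 38.095
```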
Consistency

A sequence of estimators {Tₙ} is a consistent sequence of estimators of the parameter θ if for every ε > 0 we have:
$$\lim_{n\to\infty}\Pr\left(|T_n - \theta| < \varepsilon\right) = 1,$$
i.e., $T_n \xrightarrow{P} \theta$ (convergence in probability).

A sufficient condition: if Tₙ is a sequence of estimators of a parameter θ that satisfies the following two conditions:
i) $\lim_{n\to\infty} \mathrm{Var}(T_n) = 0$ (the uncertainty in the estimate vanishes as n → ∞);
ii) $\lim_{n\to\infty} \mathrm{Bias}(T_n) = 0$ (the estimator is asymptotically unbiased);
then Tₙ is consistent.

Moreover, the MLE is asymptotically normal:
$$\sqrt{n}\left(\hat\theta_n - \theta\right) \xrightarrow{d} N\!\left(0,\ I_{f^\star}(\theta)^{-1}\right).$$
Sufficient Statistics

Let (X₁, X₂, ..., Xₙ) have joint p.d.f. f(x; θ). A statistic S is said to be sufficient for θ if for any other statistic T the conditional p.d.f. of T given S = s, denoted by f_{T|S}(t), does not depend on θ, for any value of t.

Idea: if S is observed, additional information about θ cannot be obtained from T if the conditional distribution of T given S = s is free of θ.

Factorization Theorem. A necessary and sufficient condition for T(X₁, ..., Xₙ) to be a sufficient statistic for θ is that the joint probability function (density function or frequency function) factors in the form:
$$f(x_1, \ldots, x_n; \theta) = g\left(T(x_1, \ldots, x_n); \theta\right)\, h(x_1, \ldots, x_n).$$
Rao-Blackwell. Let θ̂ be an unbiased estimator and T a sufficient statistic, and define θ̃ = E[θ̂ | T]. Then:
$$\mathbb{E}\left[\tilde\theta\right] = \mathbb{E}\left[\mathbb{E}\left[\hat\theta \mid T\right]\right] = \mathbb{E}\left[\hat\theta\right] = \theta,$$
so θ̃ is also unbiased, and by the variance decomposition
$$\mathrm{Var}\left(\hat\theta\right) = \mathrm{Var}\left(\mathbb{E}\left[\hat\theta \mid T\right]\right) + \mathbb{E}\left[\mathrm{Var}\left(\hat\theta \mid T\right)\right] = \mathrm{Var}\left(\tilde\theta\right) + \mathbb{E}\left[\mathrm{Var}\left(\hat\theta \mid T\right)\right].$$
Thus, $\mathrm{Var}(\hat\theta) > \mathrm{Var}(\tilde\theta)$, unless $\mathrm{Var}(\hat\theta \mid T) = 0$. This is the case only if θ̂ is a function of T, which would imply θ̂ = θ̃.
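A small simulation sketch of Rao-Blackwellisation for Bernoulli(p) data: start from the crude unbiased estimator θ̂ = X₁ and condition on the sufficient statistic T = ΣXᵢ, which gives θ̃ = E[X₁ | T] = X̄. The values of p, n, and the replication count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 0.3, 20, 100_000
x = rng.binomial(1, p, size=(reps, n))   # reps Bernoulli samples of size n

theta_hat = x[:, 0]                      # crude unbiased estimator: first observation
theta_tilde = x.mean(axis=1)             # E[X1 | sum] = sample mean (Rao-Blackwellised)

print(f"both unbiased: {theta_hat.mean():.3f}, {theta_tilde.mean():.3f} (true p = {p})")
print(f"Var(theta_hat)   ~= {theta_hat.var():.4f}")    # ~ p(1-p)
print(f"Var(theta_tilde) ~= {theta_tilde.var():.4f}")  # ~ p(1-p)/n, much smaller
```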
Example: for $S = \sum_{i=1}^{n} X_i$ with i.i.d. exponential(λ) observations, we know that S ~ Gamma(n, λ), with density
$$f_S(s) = \frac{\lambda^n}{\Gamma(n)}\, s^{n-1} \exp(-\lambda s).$$
Introduction

Last week we saw point estimators;
A point estimator uses a sample to describe the distribution of a population;
However, the sample itself is a random variable;
This implies that parameters estimated using a sample are uncertain!
You should take that into account, especially when you are interested in tail risk (example for an insurer: the probability of ruin). Using a point estimate would underestimate the true risk.
Example

Consider an i.i.d. sample of size 4, X₁, X₂, X₃, X₄, from N(μ, 1). Recall that we can estimate the population mean μ by X̄. The probability that μ will be in the range (X̄ − 1, X̄ + 1) is:
$$\Pr\left(\bar X - 1 < \mu < \bar X + 1\right) = \Pr\left(-1 < \bar X - \mu < 1\right) = \Pr(-2 < Z < 2) = 0.9544,$$
since $\bar X - \mu \sim N(0, 1/4)$ has standard deviation 1/2.
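This probability is easy to verify numerically:

```python
from scipy.stats import norm

# X_bar - mu ~ N(0, 1/4), so sd = 1/2 and (X_bar - mu)/(1/2) ~ N(0, 1)
prob = norm.cdf(2) - norm.cdf(-2)
print(f"{prob:.4f}")  # 0.9545
```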
Confidence intervals

For an i.i.d. sample X₁, ..., Xₙ from an exponential distribution with rate λ, the pivot is
$$Y = 2\lambda\, n\bar X = 2\lambda \sum_{i=1}^{n} X_i \sim \chi^2(2n).$$

[Figure: p.d.f. of Y, with quantiles q₁ = χ²_{α/2}(2n) and q₂ = χ²_{1−α/2}(2n) cutting off probability α/2 in each tail and leaving probability 1 − α in the middle.]
Each term of the pivot has m.g.f. $\mathbb{E}\left[e^{2\lambda t X_i}\right] = \frac{1}{1-2t}$, so $\mathbb{E}\left[e^{tY}\right] = \left(\frac{1}{1-2t}\right)^{2n/2}$, the m.g.f. of χ²(2n). With quantiles q₁ = χ²_{α/2}(2n) and q₂ = χ²_{1−α/2}(2n), the 100(1 − α)% confidence interval for λ is:
$$\frac{\chi^2_{\alpha/2}(2n)}{2n\bar x} < \lambda < \frac{\chi^2_{1-\alpha/2}(2n)}{2n\bar x}.$$
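A numerical sketch of this interval (the true rate, sample size, and confidence level are illustrative):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
lam_true, n, alpha = 2.0, 40, 0.05
x = rng.exponential(scale=1 / lam_true, size=n)

# Pivot: 2 * lambda * n * x_bar ~ chi^2(2n)
lo = chi2.ppf(alpha / 2, 2 * n) / (2 * n * x.mean())
hi = chi2.ppf(1 - alpha / 2, 2 * n) / (2 * n * x.mean())
print(f"95% CI for lambda: ({lo:.3f}, {hi:.3f})")
```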
By the Central Limit Theorem, $\frac{\bar X - \mu}{\sigma/\sqrt n} \xrightarrow{d} N(0,1)$ as n → ∞. This holds for all r.v. with finite mean and variance, not only normal r.v.!

Suppose X₁, ..., Xₙ is a random sample from a population with mean μ and known variance σ².

1. The pivot is
$$Z = \frac{\bar X - \mu}{\sigma/\sqrt n} \sim N(0, 1).$$
2. The function $g(X_1, \ldots, X_n; \mu) = \frac{\bar X - \mu}{\sigma/\sqrt n}$ is decreasing in μ. Using:
$$\Pr\left(z_{\alpha/2} < Z < z_{1-\alpha/2}\right) = 1 - \alpha,$$
we obtain
$$\Pr\left(\bar X - z_{1-\alpha/2}\frac{\sigma}{\sqrt n} < \mu < \bar X + z_{1-\alpha/2}\frac{\sigma}{\sqrt n}\right) = 1 - \alpha.$$
α                       1%      5%      10%
two-sided: z_{1−α/2}    2.576   1.960   1.645
one-sided: z_{1−α}      2.326   1.645   1.282
Note that the above gives the confidence interval for the mean when the population variance is known. When the population is not normal, the interval is only approximate, with the approximation improving as the sample size increases. The same confidence interval formula for the mean holds even if the population variance is replaced by the sample variance, provided the sample is large (generally, n > 30 is a rule of thumb for large samples).
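A sketch of the large-sample interval with the variance replaced by the sample variance (the non-normal population and sample below are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.gamma(shape=2.0, scale=3.0, size=100)  # non-normal population, n > 30

alpha = 0.05
z = norm.ppf(1 - alpha / 2)
half_width = z * x.std(ddof=1) / np.sqrt(len(x))
print(f"95% CI for mu: ({x.mean() - half_width:.2f}, {x.mean() + half_width:.2f})")
```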
When σ² is unknown, replace it by S²:
$$T = \frac{\bar X - \mu}{S/\sqrt n} = \frac{(\bar X - \mu)\big/(\sigma/\sqrt n)}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}} \sim t_{n-1},$$
the numerator being $Z \sim N(0,1)$ and the quantity under the square root a $\chi^2(n-1)$ random variable divided by its degrees of freedom.

2. The function $g(X_1, \ldots, X_n; \mu) = \frac{\bar X - \mu}{S/\sqrt n}$ is decreasing in μ.

Interpretation: as n → ∞ we have s → σ, so the t interval approaches the normal interval.
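A minimal t-interval sketch using scipy (the data are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(loc=5.0, scale=2.0, size=12)  # small normal sample

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=len(x) - 1)
half_width = t_crit * x.std(ddof=1) / np.sqrt(len(x))
print(f"95% CI for mu: ({x.mean() - half_width:.2f}, {x.mean() + half_width:.2f})")
```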
Let X₁, ..., Xₙ be a random sample from N(μ, σ²).
We suppose that μ is not known and we wish to construct a 100(1 − α)% confidence interval for σ².
a. Question: What is the pivot? See week 5 online lecture.
b. Question: Find an (approximate) 100(1 − α)% confidence interval for σ².
Using the pivot $(n-1)S^2/\sigma^2 \sim \chi^2(n-1)$:
$$\Pr\left(\frac{(n-1)S^2}{\chi^2_{1-\alpha/2}(n-1)} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{\alpha/2}(n-1)}\right) = 1 - \alpha.$$
The 100(1 − α)% confidence interval is
$$\left(\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}(n-1)},\ \frac{(n-1)s^2}{\chi^2_{\alpha/2}(n-1)}\right),$$
where s² is the observed sample variance.
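A sketch of this variance interval (the data are illustrative):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(13)
x = rng.normal(loc=0.0, scale=1.5, size=25)

n, alpha = len(x), 0.05
s2 = x.var(ddof=1)
lo = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, n - 1)
hi = (n - 1) * s2 / chi2.ppf(alpha / 2, n - 1)
print(f"95% CI for sigma^2: ({lo:.2f}, {hi:.2f})")  # true sigma^2 = 2.25
```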
Comparing two variances. Consider a random sample of size n₁ from N(μ₁, σ₁²) and a random sample of size n₂ from N(μ₂, σ₂²).
Denote the respective sample variances by S₁² and S₂².
Application: Is one portfolio riskier than another?
1. We know that
$$\frac{(n_1-1)S_1^2}{\sigma_1^2} \sim \chi^2(n_1-1) \quad\text{and}\quad \frac{(n_2-1)S_2^2}{\sigma_2^2} \sim \chi^2(n_2-1),$$
so the pivot is
$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1-1,\ n_2-1).$$
2. The function $g\left(X_1, \ldots, X_n;\ \sigma_1^2/\sigma_2^2\right) = \frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2}$ is decreasing in σ₁²/σ₂². Hence:
$$\Pr\left(\frac{S_1^2}{S_2^2\, F_{1-\alpha/2}(n_1-1,\ n_2-1)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2\, F_{\alpha/2}(n_1-1,\ n_2-1)}\right) = 1 - \alpha,$$
where F_{α/2}(n₁−1, n₂−1) and F_{1−α/2}(n₁−1, n₂−1) are determined from the table of the F-distribution (see F&T pages 170-174).
[Figure: Snedecor's F p.d.f. and c.d.f. for (n₁, n₂) = (3, 15) and (15, 3). Marked quantiles: for (3, 15) the c.d.f. reaches 1/8, 1/2, and 7/8 at x ≈ 0.23, 0.83, and 2.25; for (15, 3), at x ≈ 0.45, 1.21, and 4.37.]
Note the reciprocal relation between lower and upper F quantiles:
$$F_{1-\alpha/2}(n_2-1,\ n_1-1) = \frac{1}{F_{\alpha/2}(n_1-1,\ n_2-1)}.$$
The 100(1 − α)% confidence interval for σ₁²/σ₂² is given by
$$\frac{s_1^2}{s_2^2\, F_{1-\alpha/2}(n_1-1,\ n_2-1)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2}\, F_{1-\alpha/2}(n_2-1,\ n_1-1),$$
where s₁² and s₂² are the observed sample variances from the two populations.
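A numerical sketch of the variance-ratio interval (the samples are simulated, with sizes chosen to match the degrees of freedom in the worked example below):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(17)
x1 = rng.normal(scale=0.7, size=51)   # n1 - 1 = 50
x2 = rng.normal(scale=0.9, size=179)  # n2 - 1 = 178

alpha = 0.05
ratio = x1.var(ddof=1) / x2.var(ddof=1)
lo = ratio / f.ppf(1 - alpha / 2, len(x1) - 1, len(x2) - 1)
hi = ratio * f.ppf(1 - alpha / 2, len(x2) - 1, len(x1) - 1)
print(f"95% CI for sigma1^2/sigma2^2: ({lo:.3f}, {hi:.3f})")
```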
Example: a 95% confidence interval for σ₁²/σ₂², with observed standard deviations s₁ = 0.68 and s₂ = 0.85 and degrees of freedom 50 and 178:
$$\frac{0.68^2}{0.85^2}\cdot\frac{1}{F_{0.975}(50,\ 178)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{0.68^2}{0.85^2}\cdot F_{0.975}(178,\ 50)$$
$$\frac{0.68^2}{0.85^2}\cdot\frac{1}{1.56} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{0.68^2}{0.85^2}\cdot 1.435$$
$$0.4103 < \frac{\sigma_1^2}{\sigma_2^2} < 0.9184.$$
Difference of two means (variances known). The pivot is
$$\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0, 1),$$
so that
$$\Pr\left(\bar X_1 - \bar X_2 - z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < \bar X_1 - \bar X_2 + z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) = 1 - \alpha.$$
The 100(1 − α)% confidence interval is
$$(\bar x_1 - \bar x_2) - z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar x_1 - \bar x_2) + z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},$$
where z_{1−α/2} is the standard normal quantile with probability α/2 above it (equivalently, probability 1 − α/2 below it).
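A sketch of this known-variance interval (all summary statistics below are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

# Illustrative summary statistics: sample means, known variances, sample sizes
x1_bar, x2_bar = 10.2, 9.5
var1, var2 = 4.0, 9.0
n1, n2 = 50, 60

alpha = 0.05
half_width = norm.ppf(1 - alpha / 2) * np.sqrt(var1 / n1 + var2 / n2)
diff = x1_bar - x2_bar
print(f"95% CI for mu1 - mu2: ({diff - half_width:.2f}, {diff + half_width:.2f})")
```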
If σ₁² = σ₂² = σ², the statistic
$$\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
has an approximate standard normal distribution.

σ² can be estimated by pooling the squared deviations from the means of the two samples with the pooled estimator:
$$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}.$$
Moreover,
$$\underbrace{\frac{(n_1-1)S_1^2}{\sigma^2}}_{=\sum_{i=1}^{n_1-1} Z_i^2\ \sim\ \chi^2(n_1-1)} + \underbrace{\frac{(n_2-1)S_2^2}{\sigma^2}}_{=\sum_{i=1}^{n_2-1} Z_i^2\ \sim\ \chi^2(n_2-1)} = \underbrace{\frac{(n_1+n_2-2)S_p^2}{\sigma^2}}_{=\sum_{i=1}^{n_1+n_2-2} Z_i^2\ \sim\ \chi^2(n_1+n_2-2)},$$
and a ratio of the form $Z\big/\sqrt{Y/(n_1+n_2-2)}$, with $Y \sim \chi^2(n_1+n_2-2)$ independent of Z, has a $t_{n_1+n_2-2}$ distribution.
1. Indeed,
$$\frac{\dfrac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}}}{\sqrt{S_p^2/\sigma^2}} = \frac{\bar X_1 - \bar X_2 - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1+n_2-2}.$$
Here S_p is the pooled standard deviation (see slide 1151).
2. We have (decreasing function):
$$g\left(X_1, \ldots, X_n;\ \mu_1 - \mu_2\right) = \left(\bar X_1 - \bar X_2 - (\mu_1 - \mu_2)\right)\Big/\left(S_p\sqrt{1/n_1 + 1/n_2}\right).$$
3. Thus:
$$\Pr\left(\bar X_1 - \bar X_2 - t_{1-\alpha/2,\, n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < \bar X_1 - \bar X_2 + t_{1-\alpha/2,\, n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right) = 1 - \alpha,$$
and the 100(1 − α)% confidence interval for μ₁ − μ₂ is:
$$(\bar x_1 - \bar x_2) \pm t_{1-\alpha/2,\, n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$
where t_{1−α/2, n₁+n₂−2} is the point on the t-distribution (with n₁ + n₂ − 2 degrees of freedom) for which the probability above it is α/2.
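A pooled two-sample t interval sketch (the data are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(23)
x1 = rng.normal(loc=10.0, scale=2.0, size=15)
x2 = rng.normal(loc=9.0, scale=2.0, size=12)

n1, n2, alpha = len(x1), len(x2), 0.05
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
t_crit = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)
half_width = t_crit * np.sqrt(sp2) * np.sqrt(1 / n1 + 1 / n2)
diff = x1.mean() - x2.mean()
print(f"95% CI for mu1 - mu2: ({diff - half_width:.2f}, {diff + half_width:.2f})")
```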
Confidence interval for a proportion. We have $\mathrm{Var}(\hat p) = \frac{p(1-p)}{n}$, and for large n:
$$\frac{\hat p - p}{\sqrt{\hat p(1-\hat p)/n}} \overset{\text{approx}}{\sim} N(0, 1).$$
2. Thus, $g(X_1, \ldots, X_n; p) = (\hat p - p)\big/\sqrt{\hat p(1-\hat p)/n}$ and
$$\Pr\left(\hat p - \sqrt{\frac{\hat p(1-\hat p)}{n}}\, z_{1-\alpha/2} < p < \hat p + \sqrt{\frac{\hat p(1-\hat p)}{n}}\, z_{1-\alpha/2}\right) = 1 - \alpha.$$
3. A 100(1 − α)% confidence interval for p is given by:
$$\hat p - z_{1-\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}} < p < \hat p + z_{1-\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}.$$
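A sketch of this approximate (Wald-type) interval, using the counts that appear in the worked example further below:

```python
import numpy as np
from scipy.stats import norm

successes, n, alpha = 216, 1500, 0.05  # counts as in the example below
p_hat = successes / n
half_width = norm.ppf(1 - alpha / 2) * np.sqrt(p_hat * (1 - p_hat) / n)
print(f"95% CI for p: ({p_hat - half_width:.4f}, {p_hat + half_width:.4f})")
```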
Difference of two proportions. The variance is
$$\hat\sigma^2_{\hat p_1 - \hat p_2} = \mathrm{Var}(\hat p_1 - \hat p_2) = \mathrm{Var}(\hat p_1) + \mathrm{Var}(\hat p_2) - 2\,\mathrm{Cov}(\hat p_1, \hat p_2) = \frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}.$$
Question: Why is Cov(p̂₁, p̂₂) = 0?
Solution: The two samples are drawn from different populations and are therefore independent.
1. For large samples,
$$\frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\hat\sigma_{\hat p_1 - \hat p_2}} \overset{\text{approx}}{\sim} N(0, 1).$$
2. Thus, $g(X_1, \ldots, X_n;\ p_1 - p_2) = \left((\hat p_1 - \hat p_2) - (p_1 - p_2)\right)\big/\hat\sigma_{\hat p_1 - \hat p_2}$ and
$$\Pr\left((\hat p_1 - \hat p_2) - \hat\sigma_{\hat p_1 - \hat p_2}\, z_{1-\alpha/2} < p_1 - p_2 < (\hat p_1 - \hat p_2) + \hat\sigma_{\hat p_1 - \hat p_2}\, z_{1-\alpha/2}\right) = 1 - \alpha.$$
Example: claim sizes in 2011 and 2010 (surviving rows of the data table):

2011    2010
45      46
37      42
216     188

Solution: $\hat p_M = \frac{216}{1500} = 0.144$ and $\hat p_F = \frac{188}{1350} = 0.139259$, so $\hat p_M - \hat p_F = 0.004740741$. Note: p_M = 0.15 and p_F = 0.13!
$$\hat\sigma^2_{\hat p_M - \hat p_F} = \frac{0.144(1-0.144)}{1500} + \frac{0.139259(1-0.139259)}{1350} = 1.71\times 10^{-4},$$
so $\hat\sigma_{\hat p_M - \hat p_F} = 0.0131$. Then Z = 0.004740741/0.01307538 = 0.362569852, and Pr(Z > 0.362569852) = 1 − 0.641536883 = 0.358463117.
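The arithmetic is easy to check numerically:

```python
import numpy as np
from scipy.stats import norm

p_m, n_m = 216 / 1500, 1500
p_f, n_f = 188 / 1350, 1350

var_diff = p_m * (1 - p_m) / n_m + p_f * (1 - p_f) / n_f
z = (p_m - p_f) / np.sqrt(var_diff)
print(f"z = {z:.4f}, Pr(Z > z) = {1 - norm.cdf(z):.4f}")  # ~0.3626, ~0.3585
```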
For paired observations, define the differences Dₖ = X₁ₖ − X₂ₖ, with mean
$$\bar D = \frac{1}{n}\sum_{k=1}^{n} D_k = \frac{1}{n}\sum_{k=1}^{n}\left(X_{1k} - X_{2k}\right),$$
and define:
$$S_D = \sqrt{\frac{\sum_{k=1}^{n}\left(D_k - \bar D\right)^2}{n-1}}.$$
Solution: with n = 10 paired differences, $\sum_k d_k = 1{,}241 - 1{,}104 = 137$ and $\sum_k d_k^2 = 4{,}829$, so
$$\bar d = \frac{137}{10} = 13.7, \qquad s_d = \sqrt{\frac{1}{9}\left(4{,}829 - 13.7^2 \times 10\right)} = 18.11.$$
The t statistic is
$$t = \frac{13.7 - 10}{18.11/3} = 0.61288, \qquad \Pr\left(T_9 < 0.61288\right) = 1 - 0.277561 = 0.722439.$$
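A scipy check of this arithmetic. Note that the slide's standard error divides by 3 rather than by √10 ≈ 3.16; the code mirrors the slide's numbers and flags the more usual choice in a comment:

```python
import numpy as np
from scipy.stats import t

n, sum_d, sum_d2 = 10, 137, 4829
d_bar = sum_d / n                                  # 13.7
s_d = np.sqrt((sum_d2 - n * d_bar**2) / (n - 1))   # 18.11

# The slide's standard error is 18.11/3; the usual choice is s_d / np.sqrt(n).
t_stat = (d_bar - 10) / (s_d / 3)
print(f"t = {t_stat:.5f}")                              # 0.61288
print(f"Pr(T_9 < t) = {t.cdf(t_stat, df=n - 1):.6f}")   # ~0.722439
```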
Summary
"
E
2 #!1
1
log (fX (x|))
=
.
n If ? ()
1
If ? ()
.
"
If ? () = E
2 #
.
log (fX (x|))
E
log (fX (x|))
= E
log (fX (x|)) .
2
In evaluating the variance of MLE, you can therefore use either
form of this variance formula.
1167/1175
1
(1 + (y )2 )
Summary
Asymptotically, the MLE attains this bound:
$$\mathrm{Var}\left(\hat\theta\right) = \left(\mathbb{E}\left[\left(\frac{\partial}{\partial\theta}\log\left(f_X(x|\theta)\right)\right)^2\right]\right)^{-1}\frac{1}{n} = \frac{1}{n\, I_{f^\star}(\theta)},$$
where
$$I_{f^\star}(\theta) = \mathbb{E}\left[\left(\frac{\partial}{\partial\theta}\log\left(f_X(x|\theta)\right)\right)^2\right] = -\mathbb{E}\left[\frac{\partial^2}{\partial\theta^2}\log\left(f_X(x|\theta)\right)\right],$$
so that, for large n,
$$\hat\theta \overset{\text{approx}}{\sim} N\!\left(\theta,\ \frac{1}{n\, I_{f^\star}(\theta)}\right).$$
Summary

Evaluating estimators:
1. UMVUE estimator: unbiased (E[T] = τ(θ)) and of minimum variance (Var(T) ≤ Var(T*) for every unbiased estimator T*). If an estimator T attains the CRLB, then T is UMVUE.
   CRLB: $\mathrm{Var}\left(T(X_1, X_2, \ldots, X_n)\right) \ge \frac{1}{n\, I_{f^\star}(\theta)}$.
2. Consistent estimator: for every ε > 0,
   $$\lim_{n\to\infty}\Pr\left(|T_n - \theta| < \varepsilon\right) = 1,$$
   i.e., $T_n \xrightarrow{P} \theta$.
3. Sufficiency (factorization theorem); asymptotically, the MLE attains the CRLB.

Interval estimators
Pivotal quantity method:
1. Find the pivot;
2. Find the function g(X₁, ..., Xₙ; θ) such that Pr(q₁ < g(X₁, ..., Xₙ; θ) < q₂) = 1 − α;
3. The 100(1 − α)% confidence interval of θ: g⁻¹(X₁, ..., Xₙ; q₁) ≤ θ ≤ g⁻¹(X₁, ..., Xₙ; q₂).