\[
P(X_i = k) = p^{\mathrm{Poisson}}_\lambda(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k \in \mathbb{N},\ \lambda > 0
\]
Formal model: for a given $n$, the family
\[
\left\{ p^{\mathrm{Poisson}}_\lambda(X_i) \right\}_{\lambda \in \mathbb{R}_+};
\]
here the parameter space is $\mathbb{R}_+$.
2
Parametric Statistical Models
General definition

A parametric statistical model for observations $X = (X_1, \dots, X_n)$ is a family $\{f_\theta(x)\}_\theta$ of probability distributions. We want to know which $f_\theta$ is responsible for the data.

Point estimator: a function $\hat\theta(x)$ of the observations. This is a guess at what $\theta$ was used to generate the data.

Sampling distribution: $\hat\theta(X)$ takes random values. It has a distribution derived from that of $X$.
3
Method of Moments
... is a simple way of finding an estimator.

$k$th sample moment:
\[
\hat\mu_k = \frac{1}{n} \sum_{i=1}^n X_i^k
\]
$k$th population moment:
\[
\mu_k(\theta) = \mathbb{E}_\theta[X^k]
\]
Now solve for $\theta$ in the system $(\mu_k(\theta) = \hat\mu_k)_{k=1,\dots,p}$.

One usually needs as many equations ($p$) as there are parameters.
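As a minimal sketch of the one-parameter case (the function name and simulated data are illustrative, not from the lecture): for Poisson($\lambda$) the first population moment is $\mu_1(\lambda) = \lambda$, so the single moment equation is solved directly by the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)

def mom_poisson(x):
    # For Poisson(lam), mu_1(lam) = E[X] = lam, so the system
    # mu_1(lam) = mu_hat_1 is solved by lam_hat = sample mean.
    return float(np.mean(x))

x = rng.poisson(lam=3.0, size=10_000)
lam_hat = mom_poisson(x)
```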
4
Method of Moments: Examples
Poisson

For Poisson, $\theta = \lambda$. Also
\[
\mathbb{E}_\lambda[X] = \sum_{k=0}^\infty k\, P(X = k) = \sum_{k=0}^\infty k\, \frac{\lambda^k e^{-\lambda}}{k!} = \lambda,
\]
thus the method of moments estimator is
\[
\hat\lambda(X) = \hat\mu_1 = \frac{1}{n} \sum_{i=1}^n X_i.
\]

Normal

The parameters are two, $\theta = (\mu, \sigma^2)$, so we need 2 equations. We know that $\mathbb{E}[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2$. Thus
\[
\mu_1 = \mu, \qquad \mu_2 = \sigma^2 + \mu^2.
\]
The method of moments estimators are $\hat\mu = \hat\mu_1$ and
\[
\hat\sigma^2 = \hat\mu_2 - \hat\mu_1^2 = \frac{1}{n}\sum_{i=1}^n x_i^2 - \bar{x}^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2,
\]
where the last equality follows from expanding
\[
\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^n x_i^2 - \frac{2}{n}\sum_{i=1}^n x_i \bar{x} + \bar{x}^2 = \frac{1}{n}\sum_{i=1}^n x_i^2 - \bar{x}^2.
\]
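A hedged sketch of the two-parameter case (the function name and parameter values are illustrative): the two moment equations are solved in closed form, and the resulting $\hat\sigma^2$ coincides with the plug-in variance.

```python
import numpy as np

rng = np.random.default_rng(1)

def mom_normal(x):
    # Solve mu_1 = mu and mu_2 = sigma^2 + mu^2 for (mu, sigma^2).
    m1 = float(np.mean(x))       # first sample moment
    m2 = float(np.mean(x ** 2))  # second sample moment
    return m1, m2 - m1 ** 2

x = rng.normal(loc=2.0, scale=1.5, size=50_000)
mu_hat, var_hat = mom_normal(x)
```

Note that `var_hat` equals `np.var(x)` (the mean of squared deviations) by the algebraic identity derived above.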
5
Maximum Likelihood
The likelihood is the density of the data as a function of $\theta$. When the data are i.i.d. (independent and identically distributed) with density $f_\theta$, then
\[
L(\theta) = f_\theta(X_1, \dots, X_n) = \prod_{i=1}^n f_\theta(X_i).
\]
The maximum likelihood estimate (MLE), $\hat\theta$, is the $\theta$ that maximizes the likelihood. The idea is that $\hat\theta$ is the value of $\theta$ for which the observed sample is most likely.

Finding the MLE
- It is helpful to take logs.
- Usually use calculus ($L'(\theta) = 0$).
- Remember to check values at boundaries and second derivatives.
6
MLE: Example
Poisson
\[
L(\lambda) = \prod_{i=1}^n \frac{\lambda^{X_i} e^{-\lambda}}{X_i!}
\]
Take logs (using $\operatorname{argmax}_x \log f(x) = \operatorname{argmax}_x f(x)$):
\[
l(\lambda) = \log L(\lambda) = \sum_{i=1}^n \left( X_i \log \lambda - \lambda - \log X_i! \right)
\]
\[
l'(\lambda) = \frac{1}{\lambda} \sum_{i=1}^n X_i - n,
\]
i.e. the MLE is
\[
\hat\lambda = \frac{1}{n} \sum_{i=1}^n X_i.
\]
In this example, method of moments and MLE give the same answer.

Note: we don't really need all of $X = (X_1, \dots, X_n)$ here, only the sum. $T(X) = \sum_i X_i$ is called a sufficient statistic.
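To see the calculus result numerically, a small sketch (the grid search and parameter values are illustrative, not part of the lecture) maximizes $l(\lambda)$ directly and compares the maximizer with the closed-form MLE $\bar X$:

```python
import math
import numpy as np

rng = np.random.default_rng(2)
x = rng.poisson(lam=4.0, size=1_000)

sx = float(np.sum(x))
nn = len(x)
# log X_i! terms do not depend on lam, but we keep them for the full l(lam)
log_fact = sum(math.lgamma(k + 1) for k in x)

def loglik(lam):
    # l(lam) = sum_i (X_i log lam - lam - log X_i!)
    return sx * math.log(lam) - nn * lam - log_fact

grid = np.linspace(1.0, 8.0, 7001)  # spacing 0.001
lam_grid = float(grid[np.argmax([loglik(l) for l in grid])])
lam_mle = float(x.mean())           # closed-form MLE
```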
7
Bias and MSE
\[
\mathrm{Bias}_\theta = \mathbb{E}_\theta[\hat\theta(X)] - \theta
\]
\[
\mathrm{MSE}_\theta = \mathbb{E}_\theta[(\hat\theta(X) - \theta)^2]
\]
Note:
- $\hat\theta(X)$ is random (a function of the sample).
- Bias and MSE are nonrandom.
- Bias and MSE are functions of $\theta$.

When $\hat\sigma^2 = \frac{1}{n} \sum_i (X_i - \bar{X})^2$, this estimator is biased:
\[
\mathbb{E}[\hat\sigma^2] = \frac{1}{n} \mathbb{E}\left[ \sum_i \left( X_i^2 - 2 X_i \bar{X} + \bar{X}^2 \right) \right]
= \frac{1}{n} \mathbb{E}\left[ \sum_i X_i^2 - n \bar{X}^2 \right]
= \mathbb{E}[X_1^2] - \mathbb{E}[\bar{X}^2]
= (\sigma^2 + \mu^2) - \left( \frac{\sigma^2}{n} + \mu^2 \right)
= \sigma^2\, \frac{n-1}{n},
\]
which implies
\[
\mathrm{Bias} = \mathbb{E}[\hat\sigma^2] - \sigma^2 = -\frac{\sigma^2}{n}.
\]
9
Bias Example: Sample variance
From last lecture, the sample variance is
\[
S_n^2 = \frac{1}{n-1} \sum_i (X_i - \bar{X})^2;
\]
by the preceding derivation, $S_n^2$ is an unbiased estimate of $\sigma^2$.

Frequently, method of moments estimators and MLEs are biased and can be made slightly better by a small change.
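The bias computed above can be checked by simulation; a sketch (the parameter values and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, n, reps = 1.0, 5, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

mle_mean = float((ss / n).mean())       # ~ sigma^2 (n-1)/n = 0.8 (biased)
s2_mean = float((ss / (n - 1)).mean())  # ~ sigma^2 = 1.0 (unbiased)
```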
10
Bias-variance tradeoff

We would like no bias and low variance. Often there is a choice:
- low bias, high variance
- some bias, some variance
- high bias, low variance

Illustration (figure omitted).
11
Mean Squared Error
The MSE combines the bias and the variance of the estimator:
\[
\mathrm{MSE}_\theta = \mathbb{E}\left[ (\hat\theta - \theta)^2 \right]
= \mathbb{E}\left[ \left( \hat\theta - \mathbb{E}[\hat\theta] + \mathbb{E}[\hat\theta] - \theta \right)^2 \right]
\]
\[
= \mathbb{E}\left[ \left( \hat\theta - \mathbb{E}[\hat\theta] \right)^2 \right]
+ \mathbb{E}\left[ \left( \mathbb{E}[\hat\theta] - \theta \right)^2 \right]
+ 2\, \mathbb{E}\left[ \hat\theta - \mathbb{E}[\hat\theta] \right] \left( \mathbb{E}[\hat\theta] - \theta \right)
= \mathrm{Var}[\hat\theta] + \mathrm{Bias}^2,
\]
since the cross term vanishes ($\mathbb{E}[\hat\theta - \mathbb{E}[\hat\theta]] = 0$).

Bias and variance are sometimes called (lack of) accuracy and precision, respectively.
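The decomposition can be verified on the empirical distribution of any estimator; the sketch below applies it to the plug-in variance estimator (the setup is illustrative). For the empirical measure of the replications the identity holds exactly, not just approximately:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2, n, reps = 1.0, 5, 100_000
samples = rng.normal(0.0, 1.0, size=(reps, n))
est = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) / n

mse = float(np.mean((est - sigma2) ** 2))
bias = float(est.mean()) - sigma2
decomp = float(est.var()) + bias ** 2  # Var + Bias^2
```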
12
MSE Example: Sample Variance
To compute the MSE of $S_n^2$, recall that under independence and normality
\[
\frac{\sum_i (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1},
\]
which has mean $n-1$ and variance $2(n-1)$. Thus
\[
\mathrm{MSE}_{S_n^2} = \mathrm{Bias}^2 + \mathrm{Var}[S_n^2]
= \mathrm{Var}\left[ \frac{1}{n-1} \sum_i (X_i - \bar{X})^2 \right]
= \left( \frac{\sigma^2}{n-1} \right)^2 2(n-1) = \frac{2\sigma^4}{n-1}.
\]
For the MLE $\hat\sigma^2$, however,
\[
\mathrm{MSE}_{\hat\sigma^2} = \mathrm{Bias}^2 + \mathrm{Var}[\hat\sigma^2]
= \left( \frac{\sigma^2}{n} \right)^2 + \mathrm{Var}\left[ \frac{1}{n} \sum_i (X_i - \bar{X})^2 \right]
= \frac{\sigma^4}{n^2} + \frac{2(n-1)\,\sigma^4}{n^2}
= \sigma^4\, \frac{2n-1}{n^2} < \mathrm{MSE}_{S_n^2}.
\]
13
MSE Example: Sample Variance (2)

MSE is perhaps not natural for scale parameters. In fact, the minimum-MSE estimator of this form is
\[
\frac{1}{n+1} \sum_i (X_i - \bar{X})^2.
\]
The estimators are asymptotically identical ($\lim_{n\to\infty} \frac{n+1}{n-1} = 1$).

Method of moments and MLE estimators are rarely unbiased. However, the MLE has nice asymptotic properties.
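A Monte Carlo sketch (sample size and replication count are arbitrary) comparing the three denominators $n-1$, $n$, $n+1$ on normal data; the ordering of the empirical MSEs matches the formulas above:

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2, n, reps = 1.0, 8, 400_000
samples = rng.normal(0.0, 1.0, size=(reps, n))
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

# empirical MSE of ss/d around the true sigma^2, for d = n-1, n, n+1
mse = {d: float(np.mean((ss / d - sigma2) ** 2)) for d in (n - 1, n, n + 1)}
```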
14
Example: Gamma distribution, one observation
Recall that $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t} \, dt$.

Important property: $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$ for $\alpha > 1$.

The Gamma distribution with parameters $\alpha, \lambda > 0$ has density
\[
f_{\alpha,\lambda}(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}, \qquad x > 0.
\]
We have one observation, $X \in \mathbb{R}_+$, and we know $\alpha$. Find the MLE for $\lambda$; compute its bias and MSE.

Solution:
\[
l(\lambda) = \alpha \log \lambda - \log \Gamma(\alpha) + (\alpha - 1) \log x - \lambda x
\]
\[
l'(\lambda) = \frac{\alpha}{\lambda} - x = 0,
\]
so our candidate is $\hat\lambda = \frac{\alpha}{x}$.
15
Example: Gamma distribution, one observation (2)
Check:
\[
l''(\lambda) = -\frac{\alpha}{\lambda^2} < 0
\]
and $\frac{\alpha}{x} > 0$, i.e. $\hat\lambda$ is in the parameter space. Moments:
\[
\mathbb{E}[\hat\lambda] = \int_0^\infty \frac{\alpha}{x} \cdot \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x} \, dx
= \frac{\alpha \lambda\, \Gamma(\alpha - 1)}{\Gamma(\alpha)} \int_0^\infty \frac{\lambda^{\alpha-1}}{\Gamma(\alpha - 1)}\, x^{(\alpha-1)-1} e^{-\lambda x} \, dx
= \frac{\alpha \lambda}{\alpha - 1},
\]
where the last integral equals 1 because the integrand is a Gamma$(\alpha-1, \lambda)$ density. In the same way,
\[
\mathbb{E}[\hat\lambda^2] = \frac{\alpha^2 \lambda^2\, \Gamma(\alpha - 2)}{\Gamma(\alpha)} = \frac{\alpha^2 \lambda^2}{(\alpha - 1)(\alpha - 2)}.
\]
16
Example: Gamma distribution, one observation (3)
Since $\mathbb{E}[\hat\lambda] = \frac{\alpha\lambda}{\alpha-1}$ and $\mathbb{E}[\hat\lambda^2] = \frac{\alpha^2\lambda^2}{(\alpha-1)(\alpha-2)}$,
\[
\mathrm{Bias} = \mathbb{E}[\hat\lambda] - \lambda = \lambda \left( \frac{\alpha}{\alpha - 1} - 1 \right) = \frac{\lambda}{\alpha - 1}
\]
\[
\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Var}[\hat\lambda]
= \lambda^2 \left( \frac{\alpha}{\alpha-1} - 1 \right)^2 + \mathbb{E}[\hat\lambda^2] - \mathbb{E}[\hat\lambda]^2
\]
\[
= \lambda^2 \left( \frac{1}{(\alpha-1)^2} + \frac{\alpha^2}{(\alpha-1)(\alpha-2)} - \frac{\alpha^2}{(\alpha-1)^2} \right)
= \lambda^2\, \frac{\alpha + 2}{(\alpha-1)(\alpha-2)}.
\]
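These formulas can be spot-checked by simulation; a sketch ($\alpha = 5$, $\lambda = 2$ are arbitrary choices with $\alpha > 2$ so that the second moment exists):

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, lam, reps = 5.0, 2.0, 1_000_000

# numpy parametrizes the Gamma by shape and scale = 1/lam
x = rng.gamma(shape=alpha, scale=1.0 / lam, size=reps)
lam_hat = alpha / x  # MLE from a single observation

mean_theory = alpha * lam / (alpha - 1)                            # 2.5
mse_theory = lam ** 2 * (alpha + 2) / ((alpha - 1) * (alpha - 2))  # 7/3
mean_emp = float(lam_hat.mean())
mse_emp = float(np.mean((lam_hat - lam) ** 2))
```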
17
Properties of MLE
Invariance

MLEs are invariant under transformations: the MLE of $g(\theta)$ is $g(\hat\theta)$.
Ex: in the Poisson model, the MLE of $\tau = 1/\lambda$ is $\hat\tau = 1/\hat\lambda = n / \sum_i X_i$.

Consistency

Let $\hat\theta_n$ be the MLE obtained from $X_1, \dots, X_n$. Then, under minimal technical conditions, $\hat\theta_n \xrightarrow{P} \theta$; under stronger conditions, strong consistency holds: $\hat\theta_n \to \theta$ a.s.

Method of moments estimators are often also consistent.
18
Properties of MLE (2)
Theorem (Cramér-Rao). Under regularity conditions, notably that differentiation and integration may be interchanged in $\frac{\partial}{\partial\theta} \int f_\theta(x)\,dx = \int \frac{\partial}{\partial\theta} f_\theta(x)\,dx$,
\[
\mathrm{Var}[\hat\theta(X)] \geq \frac{\left( 1 + \mathrm{Bias}'(\theta) \right)^2}{n\, I(\theta)},
\]
where
\[
I(\theta) = \mathbb{E}\left[ \left( \frac{\partial}{\partial\theta} \log f_\theta(X) \right)^2 \right]
= -\mathbb{E}\left[ \frac{\partial^2}{\partial\theta^2} \log f_\theta(X) \right]
\]
is the Fisher information of $f_\theta$.

Theorem (Fisher). The MLE achieves the bound asymptotically, and
\[
\sqrt{n}\left( \hat\theta_n - \theta \right) \xrightarrow{D} N\!\left( 0, \frac{1}{I(\theta)} \right).
\]
The MLE is asymptotically efficient, i.e. it attains the lowest possible variance.
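For the Poisson model the score is $X/\lambda - 1$, so $I(\lambda) = \mathrm{Var}[X]/\lambda^2 = 1/\lambda$, and the unbiased MLE $\bar X$ attains the bound $\lambda/n$ exactly. A simulation sketch (constants are arbitrary; it uses the fact that a sum of $n$ i.i.d. Poisson($\lambda$) variables is Poisson($n\lambda$) to sample the means directly):

```python
import numpy as np

rng = np.random.default_rng(7)
lam, n, reps = 4.0, 200, 200_000

# sum of n iid Poisson(lam) is Poisson(n*lam), so draw sample means directly
lam_hat = rng.poisson(n * lam, size=reps) / n

crlb = lam / n  # 1 / (n I(lam)) with I(lam) = 1/lam
var_emp = float(lam_hat.var())
```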
19
Example: Gamma, n observations
Let $X_1, \dots, X_n$ be i.i.d. Gamma$(\alpha, \lambda)$, with $\alpha$ and $\lambda$ unknown.
\[
L(\alpha, \lambda) = \prod_{i=1}^n f_{\alpha,\lambda}(x_i) = \prod_{i=1}^n \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x_i^{\alpha-1} e^{-\lambda x_i}
\]
\[
l(\alpha, \lambda) = n\alpha \log \lambda - n \log \Gamma(\alpha) + (\alpha - 1) \sum_i \log x_i - \lambda \sum_i x_i
\]
\[
l_\alpha(\alpha, \lambda) = n \log \lambda + \sum_i \log x_i - n\, \frac{\Gamma'(\alpha)}{\Gamma(\alpha)}
\]
\[
l_\lambda(\alpha, \lambda) = \frac{n\alpha}{\lambda} - \sum_i x_i
\]
The last equation gives $\hat\lambda = \frac{n\hat\alpha}{\sum_i x_i}$. One must then solve $l_\alpha(\alpha, \hat\lambda(\alpha)) = 0$ numerically.

It is hard to compute the small-sample bias and MSE here.
Use asymptotic methods.
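A sketch of the numerical solve (the crude digamma via differencing $\log\Gamma$ and the bisection bracket $[0.1, 100]$ are illustrative choices, not from the lecture): substitute $\hat\lambda(\alpha) = n\alpha/\sum_i x_i$ into $l_\alpha$ and find the root in $\alpha$.

```python
import math
import numpy as np

rng = np.random.default_rng(8)
alpha_true, lam_true = 3.0, 1.5
x = rng.gamma(shape=alpha_true, scale=1.0 / lam_true, size=20_000)

def digamma(a, h=1e-6):
    # crude digamma = d/da log Gamma(a), via central differencing (illustrative)
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

xbar = float(np.mean(x))
logx_bar = float(np.mean(np.log(x)))

def g(a):
    # l_alpha / n with lam profiled out at lam_hat(a) = a / xbar:
    # g(a) = log a - log xbar + mean(log x) - digamma(a)
    return math.log(a) - math.log(xbar) + logx_bar - digamma(a)

lo, hi = 0.1, 100.0  # bracket; g(lo) > 0 > g(hi) here
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if g(lo) * g(mid) <= 0:
        hi = mid
    else:
        lo = mid
alpha_hat = 0.5 * (lo + hi)
lam_hat = alpha_hat / xbar
```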
20
Example: Gamma, n observations (2)
Fisher information: $nI(\alpha, \lambda) = -\mathbb{E}[l''(\alpha, \lambda)]$ (a $2 \times 2$ matrix of second derivatives).
\[
l_{\alpha\alpha}(\alpha, \lambda) = -n\, \frac{\partial}{\partial\alpha}\left( \frac{\Gamma'(\alpha)}{\Gamma(\alpha)} \right)
= -n\, \frac{\Gamma''(\alpha)\Gamma(\alpha) - \Gamma'(\alpha)^2}{\Gamma(\alpha)^2}
\]
\[
l_{\alpha\lambda}(\alpha, \lambda) = \frac{n}{\lambda}
\]
\[
l_{\lambda\lambda}(\alpha, \lambda) = -\frac{n\alpha}{\lambda^2}
\]
We can use e.g. the approximation
\[
\sqrt{n}(\hat\lambda - \lambda) \approx N\!\left( 0, \frac{\lambda^2}{\alpha} \right),
\quad \text{i.e.} \quad
\hat\lambda \approx N\!\left( \lambda, \frac{\lambda^2}{\alpha n} \right).
\]
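A simulation sketch of this approximation, treating $\alpha$ as known so that $\hat\lambda = \alpha/\bar x$ (the constants are illustrative). A sum of $n$ i.i.d. Gamma($\alpha, \lambda$) variables is Gamma($n\alpha, \lambda$), which lets us sample $\bar x$ directly:

```python
import numpy as np

rng = np.random.default_rng(9)
alpha, lam, n, reps = 2.0, 3.0, 500, 200_000

# xbar = (sum of n iid Gamma(alpha, rate lam)) / n; numpy scale = 1/lam
xbar = rng.gamma(shape=n * alpha, scale=1.0 / lam, size=reps) / n
lam_hat = alpha / xbar  # MLE of lam when alpha is known

approx_sd = lam / np.sqrt(alpha * n)  # from lam_hat ~ N(lam, lam^2/(alpha n))
sd_emp = float(lam_hat.std())
mean_emp = float(lam_hat.mean())
```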
21