
Data Analysis and Statistical Arbitrage

Lecture 2: Point estimation


Oli Atlason
Outline
Parametric Statistical Models
Method of Moments and Maximum Likelihood
Bias and MSE
Asymptotic Properties
Parametric Statistical Models
Example
We observe X_1, \dots, X_n.
X_i: number of alpha particles emitted by a sample during the i-th time interval of an experiment.
Natural model: X_i \sim \mathrm{Poisson}(\lambda) and X_1, \dots, X_n independent.
Poisson distribution
P_\lambda(X_i = k) = p^{\mathrm{Poisson}}_\lambda(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k \in \mathbb{N}, \ \lambda > 0
Formal model
consists, for a given n, of the family
\left\{ \prod_{i=1}^{n} p^{\mathrm{Poisson}}_\lambda(X_i) \right\}_{\lambda \in \mathbb{R}_+}
here the parameter space is \mathbb{R}_+.
Parametric Statistical Models
General definition
A parametric statistical model for observations X = (X_1, \dots, X_n) is a family \{f_\theta(x)\}_{\theta \in \Theta} of probability distributions.
We want to know which f_\theta is responsible.
Point estimator: a function \hat\theta(x) of the observations. This is a guess at what \theta was used to generate the data.
Sampling distribution
\hat\theta(X) takes random values. It has a distribution derived from X.
Method of Moments
... is a simple way of finding an estimator.
k-th sample moment
\hat\mu_k = \frac{1}{n} \sum_{i=1}^{n} (X_i)^k
k-th population moment
\mu_k(\theta) = E_\theta[X^k]
Now solve for \theta in the system (\mu_k(\theta) = \hat\mu_k)_{k=1,\dots,p}.
Usually we need as many equations (p) as there are parameters.
Method of Moments: Examples
Poisson
For the Poisson, \theta = \lambda. Also
E_\lambda[X] = \sum_{k=0}^{\infty} k\, P(X = k) = \sum_{k=0}^{\infty} k\, \frac{\lambda^k e^{-\lambda}}{k!} = \lambda
thus the method of moments estimator is
\hat\lambda(X) = \hat\mu_1 = \frac{1}{n} \sum_i X_i
Normal
The parameters are two, \theta = (\mu, \sigma^2). We need 2 equations. We know that E[X] = \mu and \mathrm{Var}[X] = \sigma^2. Thus
\mu_1 = \mu, \quad \mu_2 = \mu^2 + \sigma^2
The method of moments estimator is \hat\mu = \hat\mu_1 and
\hat\sigma^2 = \hat\mu_2 - \hat\mu_1^2 = \frac{1}{n} \sum_i x_i^2 - \bar{x}^2 = \frac{1}{n} \sum_i (x_i - \bar{x})^2,
where the last equality follows from
\frac{1}{n} \sum_i (x_i - \bar{x})^2 = \frac{1}{n} \sum_i x_i^2 - \frac{2}{n} \sum_i x_i \bar{x} + \bar{x}^2 = \frac{1}{n} \sum_i x_i^2 - \bar{x}^2.
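These formulas are easy to check numerically. The sketch below is illustrative and not from the lecture; the parameter values \mu = 2, \sigma = 3 are made up. It computes the two sample moments and the resulting method-of-moments estimates for a normal sample.

```python
import random

# Illustrative sketch (assumed parameters mu = 2, sigma = 3):
# method-of-moments estimates for a normal sample,
# mu-hat = first sample moment, sigma2-hat = mu2-hat - mu1-hat^2.
random.seed(0)
n = 10_000
sample = [random.normalvariate(2.0, 3.0) for _ in range(n)]

mu_hat = sum(sample) / n                 # first sample moment
m2_hat = sum(x * x for x in sample) / n  # second sample moment
sigma2_hat = m2_hat - mu_hat ** 2        # equals (1/n) * sum((x - xbar)^2)

print(mu_hat, sigma2_hat)  # close to 2 and 9
```

With n = 10,000 draws, the estimates should land close to the true values \mu = 2 and \sigma^2 = 9, up to simulation noise.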
Maximum Likelihood
The likelihood is the density of the data as a function of \theta. When the data are i.i.d. (independent and identically distributed) with density f_\theta, then
L(\theta) = f_\theta(X_1, \dots, X_n) = \prod_{i=1}^{n} f_\theta(X_i)
The maximum likelihood estimate (MLE), \hat\theta, is the \theta that maximizes the likelihood.
The idea is that \hat\theta is the value of \theta for which the sample is most likely.
Finding the MLE
- Helpful to take the log
- Usually use calculus (L'(\theta) = 0)
- Remember to check values at boundaries and second derivatives
MLE: Example
Poisson
L(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{X_i} e^{-\lambda}}{X_i!}
take logs (\operatorname{argmax}_x \log f(x) = \operatorname{argmax}_x f(x)):
l(\lambda) = \log L(\lambda) = \sum_{i=1}^{n} \big(X_i \log\lambda - \lambda - \log X_i!\big)
l'(\lambda) = \frac{1}{\lambda} \sum_i X_i - n
i.e. the MLE is
\hat\lambda = \frac{1}{n} \sum_i X_i.
In this example, method of moments and MLE give the same answer.
Note, we don't really need X = (X_1, \dots, X_n) here, only the sum.
T(X) = \sum_i X_i is called a sufficient statistic.
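The calculus above can be sanity-checked numerically. In this sketch the count data are made up for illustration; a grid search over \lambda should land on the sample mean, as the derivative argument predicts.

```python
import math

# Sketch with made-up count data (not from the slides): maximize the
# Poisson log-likelihood over a grid and compare with the sample mean.
data = [2, 0, 3, 1, 4, 2, 2, 1]

def loglik(lam):
    # l(lambda) = sum_i (X_i log lambda - lambda - log X_i!)
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

grid = [0.01 * k for k in range(1, 1001)]  # lambda in (0, 10]
lam_hat = max(grid, key=loglik)
print(lam_hat, sum(data) / len(data))      # both approx 1.875
```

The grid maximizer agrees with the sample mean up to the grid spacing, illustrating \hat\lambda = \bar{X}.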
Bias and MSE

\hat\theta(X) is an estimator of \theta. Then
\mathrm{Bias}_\theta = E_\theta[\hat\theta(X)] - \theta
\mathrm{MSE}_\theta = E_\theta\big[(\hat\theta(X) - \theta)^2\big]
Note
- \hat\theta(X) is random (a function of the sample)
- Bias and MSE are non-random
- Bias and MSE are functions of \theta
When E_\theta[\hat\theta] = \theta, the estimator is called unbiased.
Bias Example: Sample variance
For an i.i.d. sample X_1, \dots, X_n from a N(\mu, \sigma^2), the method of moments and maximum likelihood estimators coincide and are
\hat\sigma^2 = \frac{1}{n} \sum_i (X_i - \bar{X})^2
This estimator is biased:
E[\hat\sigma^2] = \frac{1}{n} E\Big[\sum_i \big(X_i^2 - 2 X_i \bar{X} + \bar{X}^2\big)\Big]
= \frac{1}{n} E\Big[\sum_i X_i^2 - n \bar{X}^2\Big]
= E[X_1^2] - E[\bar{X}^2]
= (\mu^2 + \sigma^2) - \Big(\frac{\sigma^2}{n} + \mu^2\Big)
= \sigma^2\, \frac{n-1}{n}
which implies
\mathrm{Bias} = E[\hat\sigma^2] - \sigma^2 = -\frac{\sigma^2}{n}
Bias Example: Sample variance
From last lecture, the sample variance is
S_n^2 = \frac{1}{n-1} \sum_i (X_i - \bar{X})^2
By the preceding derivation, S_n^2 is an unbiased estimate of \sigma^2.
Frequently method of moments estimators and MLEs are biased and can be made slightly better by a small change.
Bias-variance trade-off
We would like no bias and low variance.
Often there is a choice:
- low bias, high variance
- some bias, some variance
- high bias, low variance
Illustration (figure omitted)
Mean Squared Error
The MSE combines the bias and the variance of the estimator.
\mathrm{MSE}_\theta = E\big[(\hat\theta - \theta)^2\big]
= E\Big[\big(\hat\theta - E[\hat\theta] + E[\hat\theta] - \theta\big)^2\Big]
= E\Big[\big(\hat\theta - E[\hat\theta]\big)^2\Big] + E\Big[\big(E[\hat\theta] - \theta\big)^2\Big] + 2\, E\big[\hat\theta - E[\hat\theta]\big]\,\big(E[\hat\theta] - \theta\big)
= \mathrm{Var}[\hat\theta] + \mathrm{Bias}^2
since the cross term vanishes: E[\hat\theta - E[\hat\theta]] = 0.
Bias and variance are sometimes described as accuracy and precision: low bias means high accuracy, low variance means high precision.
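The decomposition can be checked empirically: for any set of simulated estimates, the empirical MSE equals the empirical variance plus the squared empirical bias exactly. A sketch using the variance MLE from the earlier slides (N(0,1) data and n = 5 are assumed choices):

```python
import random

# Sketch: verify MSE = Var + Bias^2 on simulated estimates of the
# variance MLE (assumed setup: N(0, 1) data, n = 5, true variance 1).
random.seed(2)
n, reps, true_var = 5, 100_000, 1.0

ests = []
for _ in range(reps):
    xs = [random.gauss(0, 1) for _ in range(n)]
    xbar = sum(xs) / n
    ests.append(sum((x - xbar) ** 2 for x in xs) / n)

mean_est = sum(ests) / reps
mse = sum((e - true_var) ** 2 for e in ests) / reps
var = sum((e - mean_est) ** 2 for e in ests) / reps
bias = mean_est - true_var
print(mse, var + bias ** 2)  # identical up to floating-point error
```

The identity holds exactly for empirical moments because the cross term sums to zero, mirroring the algebra above.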
MSE Example: Sample Variance
To compute the MSE of S_n^2, recall that under independence and normality
\frac{\sum_i (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1}
which has mean n-1 and variance 2(n-1). Thus
\mathrm{MSE}_{S_n^2} = \mathrm{Bias}^2 + \mathrm{Var}[S_n^2] = \mathrm{Var}\Big[\frac{1}{n-1} \sum_i (X_i - \bar{X})^2\Big] = \Big(\frac{\sigma^2}{n-1}\Big)^2 2(n-1) = \frac{2\sigma^4}{n-1}
For the MLE, however,
\mathrm{MSE}_{\hat\sigma^2} = \mathrm{Bias}^2 + \mathrm{Var}[\hat\sigma^2] = \Big(\frac{\sigma^2}{n}\Big)^2 + \mathrm{Var}\Big[\frac{1}{n} \sum_i (X_i - \bar{X})^2\Big] = \frac{\sigma^4}{n^2} + \sigma^4\, \frac{2(n-1)}{n^2} = \sigma^4\, \frac{2n-1}{n^2} < \mathrm{MSE}_{S_n^2}
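The two MSE formulas can be compared by simulation. The sketch below (N(0,1) data and n = 5 are assumed choices) also includes the divisor n + 1, which appears below as the minimum-MSE choice; with \sigma^2 = 1 and n = 5 the theoretical MSEs are 0.5, 0.36, and 1/3.

```python
import random

# Simulation sketch (assumed setup: N(0,1), n = 5) comparing the MSE of
# variance estimators with divisors n-1 (unbiased), n (MLE), and n+1.
random.seed(3)
n, reps = 5, 100_000
ss = []
for _ in range(reps):
    xs = [random.gauss(0, 1) for _ in range(n)]
    xbar = sum(xs) / n
    ss.append(sum((x - xbar) ** 2 for x in xs))  # sum of squared deviations

def mse(div):
    # empirical MSE of the estimator (sum of squares) / div, true value 1
    return sum((s / div - 1.0) ** 2 for s in ss) / reps

print(mse(n - 1), mse(n), mse(n + 1))
# theory: 2/(n-1) = 0.5, (2n-1)/n^2 = 0.36, 2/(n+1) = 0.333...
```

The simulated values should reproduce the theoretical ordering MSE_{n+1} < MSE_{MLE} < MSE_{S^2}.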
MSE Example: Sample Variance
The MSE is perhaps not natural for scale parameters. In fact, the minimum-MSE estimator is
\frac{1}{n+1} \sum_i (X_i - \bar{X})^2
Asymptotically the estimators are identical (\lim_{n \to \infty} \frac{n+1}{n-1} = 1).
Method of moments and MLE estimators are rarely unbiased.
However, the MLE has nice asymptotic properties.
Example: Gamma distribution, one observation
Recall that \Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt.
Important property: \Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1) for \alpha > 1.
Gamma distribution with parameters \alpha, \lambda > 0:
f_{\alpha,\lambda}(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\lambda x}, \quad x > 0
We have one observation, X \in \mathbb{R}_+, and we know \alpha. Find the MLE for \lambda; compute its bias and MSE.
Solution:
l(\lambda) = \alpha \log(\lambda) - \log(\Gamma(\alpha)) + (\alpha - 1)\log(x) - \lambda x
l'(\lambda) = \frac{\alpha}{\lambda} - x = 0
our candidate is \hat\lambda = \frac{\alpha}{x}.
Example: Gamma distribution, one observation (2)
Check:
l''(\lambda) = -\frac{\alpha}{\lambda^2} < 0
and \frac{\alpha}{x} > 0, i.e. \hat\lambda is in the parameter space. Moments:
E[\hat\lambda] = \int_0^\infty \frac{\alpha}{x}\, \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\lambda x}\, dx
= \frac{\alpha \lambda\, \Gamma(\alpha - 1)}{\Gamma(\alpha)} \int_0^\infty \frac{\lambda^{\alpha - 1}}{\Gamma(\alpha - 1)}\, x^{(\alpha - 1) - 1} e^{-\lambda x}\, dx
= \frac{\alpha \lambda}{\alpha - 1}
and in the same way
E[\hat\lambda^2] = \frac{\alpha^2 \lambda^2\, \Gamma(\alpha - 2)}{\Gamma(\alpha)} = \frac{\alpha^2 \lambda^2}{(\alpha - 1)(\alpha - 2)}
Example: Gamma distribution, one observation (3)
E[\hat\lambda] = \frac{\alpha \lambda}{\alpha - 1} and E[\hat\lambda^2] = \frac{\alpha^2 \lambda^2}{(\alpha - 1)(\alpha - 2)}, so
\mathrm{Bias} = E[\hat\lambda] - \lambda = \lambda \Big(\frac{\alpha}{\alpha - 1} - 1\Big) = \frac{\lambda}{\alpha - 1}
\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Var}[\hat\lambda]
= \lambda^2 \Big(\frac{1}{\alpha - 1}\Big)^2 + E[\hat\lambda^2] - E[\hat\lambda]^2
= \lambda^2 \Big(\frac{1}{(\alpha - 1)^2} + \frac{\alpha^2}{(\alpha - 1)(\alpha - 2)} - \frac{\alpha^2}{(\alpha - 1)^2}\Big)
= \lambda^2\, \frac{\alpha + 2}{(\alpha - 1)(\alpha - 2)}
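These moment formulas can be verified by simulation. A sketch with arbitrary assumed values \alpha = 5, \lambda = 2, for which E[\hat\lambda] = \alpha\lambda/(\alpha - 1) = 2.5:

```python
import random

# Simulation sketch (assumed alpha = 5, lambda = 2): for one Gamma
# observation the MLE is lambda-hat = alpha / X, with expectation
# alpha*lambda/(alpha - 1), i.e. bias lambda/(alpha - 1).
random.seed(4)
alpha, lam, reps = 5.0, 2.0, 100_000
# random.gammavariate takes (shape, scale); scale = 1/lambda for rate lambda
ests = [alpha / random.gammavariate(alpha, 1.0 / lam) for _ in range(reps)]
mean_est = sum(ests) / reps
print(mean_est)  # close to alpha*lam/(alpha - 1) = 2.5
```

The simulated mean overshoots the true \lambda = 2, matching the positive bias \lambda/(\alpha - 1) = 0.5.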
Properties of MLE
Invariance
MLEs are invariant under transformations.
Ex: In the Poisson model, \tau = \frac{1}{\lambda} measures the waiting time between observations. By invariance,
\hat\tau = \frac{1}{\hat\lambda} = \frac{n}{\sum_i X_i}.
Consistency
\hat\theta_n is the MLE obtained from X_1, \dots, X_n. Then, under minimal technical conditions,
\hat\theta_n \xrightarrow{P} \theta
Compare with the statements of:
- unbiasedness (E[\hat\theta] = \theta),
- strong consistency, \hat\theta_n \xrightarrow{a.s.} \theta.
Method of moments estimators are often also consistent.
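Consistency and invariance can be illustrated together. The sketch below assumes the waiting times themselves are observed as Exponential(\lambda) draws (an illustrative modeling choice, not from the slides); then \hat\tau = \bar{X} and, by invariance, \hat\lambda = 1/\bar{X}, and both settle down as n grows.

```python
import random

# Sketch (assumed setup: observed Exponential(lambda) waiting times,
# lambda = 4): the estimates tau-hat = xbar and lambda-hat = 1/xbar
# converge to tau = 0.25 and lambda = 4 as n grows (consistency).
random.seed(5)
lam = 4.0
for n in (10, 1_000, 100_000):
    xs = [random.expovariate(lam) for _ in range(n)]
    tau_hat = sum(xs) / n
    print(n, tau_hat, 1.0 / tau_hat)  # approaches 0.25 and 4.0
```
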
Properties of MLE (2)
Theorem (Cramér-Rao) Under regularity conditions, notably
\frac{\partial}{\partial\theta} E_\theta[\hat\theta(X)] = \int \hat\theta(x)\, \frac{\partial}{\partial\theta} f(x|\theta)\, dx,
\mathrm{Var}_\theta[\hat\theta(X)] \geq \frac{\big(1 + \mathrm{Bias}_\theta'(\theta)\big)^2}{n I(\theta)},
where
I(\theta) = E_\theta\Big[\Big(\frac{\partial}{\partial\theta} \log f(X|\theta)\Big)^2\Big] = -E_\theta\Big[\frac{\partial^2}{\partial\theta^2} \log f(X|\theta)\Big]
is the Fisher information of f_\theta.
Theorem (Fisher) The MLE achieves the bound asymptotically and
\sqrt{n}\,\big(\hat\theta - \theta\big) \xrightarrow{D} N\Big(0, \frac{1}{I(\theta)}\Big)
The MLE is asymptotically efficient, i.e. attains the lowest possible variance.
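Fisher's theorem can be illustrated by simulation. The sketch below uses the Exponential(\lambda) model as an assumed example (not from the slides), for which I(\lambda) = 1/\lambda^2, so \sqrt{n}(\hat\lambda - \lambda) should have variance near 1/I(\lambda) = \lambda^2.

```python
import random
import statistics

# Sketch (assumed model: Exponential(lambda), lambda = 2, for which the
# MLE is 1/xbar and I(lambda) = 1/lambda^2): the scaled error
# sqrt(n)*(lambda-hat - lambda) should have variance near lambda^2.
random.seed(6)
lam, n, reps = 2.0, 200, 20_000
errs = []
for _ in range(reps):
    xbar = sum(random.expovariate(lam) for _ in range(n)) / n
    errs.append(n ** 0.5 * (1.0 / xbar - lam))  # MLE of rate is 1/xbar
print(statistics.variance(errs))  # close to lam^2 = 4
```
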
Example: Gamma, n observations
Let X_1, \dots, X_n be i.i.d. Gamma(\alpha, \lambda), \alpha and \lambda unknown.
L(\alpha, \lambda) = \prod_{i=1}^{n} f_{\alpha,\lambda}(x_i) = \prod_{i=1}^{n} \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x_i^{\alpha - 1} e^{-\lambda x_i}
l(\alpha, \lambda) = \sum_i \big(\alpha \log\lambda - \log\Gamma(\alpha) + (\alpha - 1)\log x_i\big) - \lambda \sum_i x_i
l'_\alpha(\alpha, \lambda) = n \log\lambda + \sum_i \log x_i - n\, \frac{\Gamma'(\alpha)}{\Gamma(\alpha)}
l'_\lambda(\alpha, \lambda) = n\, \frac{\alpha}{\lambda} - \sum_i x_i
Last equation: \hat\lambda = \frac{n\alpha}{\sum_i x_i}. Must solve l'_\alpha(\alpha, \lambda) = 0 numerically.
Hard to compute the small-sample bias and MSE.
Use asymptotic methods.
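A numerical sketch of that solve (the data are simulated with assumed values \alpha = 3, \lambda = 2, and the `digamma` helper is a finite-difference stand-in for \Gamma'/\Gamma): substituting \hat\lambda = \alpha/\bar{x} into l'_\alpha = 0 gives a one-dimensional equation in \alpha, solved here by bisection.

```python
import math
import random

# Numeric sketch (assumed true values alpha = 3, lambda = 2; helper
# names are ours): solve the profile score equation for alpha by
# bisection, with lambda-hat = alpha / xbar substituted in.
random.seed(7)
xs = [random.gammavariate(3.0, 1.0 / 2.0) for _ in range(5_000)]  # scale = 1/lambda
xbar = sum(xs) / len(xs)
mean_log = sum(math.log(x) for x in xs) / len(xs)

def digamma(a, h=1e-6):
    # finite-difference approximation to Gamma'(a)/Gamma(a)
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def score(a):
    # (1/n) * l'_alpha with lambda = a / xbar plugged in
    return math.log(a) - math.log(xbar) + mean_log - digamma(a)

lo, hi = 0.1, 50.0        # bracket: score is decreasing in a
for _ in range(60):       # bisection
    mid = (lo + hi) / 2
    if score(lo) * score(mid) <= 0:
        hi = mid
    else:
        lo = mid
alpha_hat = (lo + hi) / 2
lam_hat = alpha_hat / xbar
print(alpha_hat, lam_hat)  # close to 3 and 2
```

With n = 5,000 observations the estimates should land near the assumed true values.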
Example: Gamma, n observations (2)
Fisher information: n I(\lambda) = -E[l''_{\lambda\lambda}(\alpha, \lambda)].
l''_{\alpha\alpha}(\alpha, \lambda) = -n \Big(\frac{\Gamma'(\alpha)}{\Gamma(\alpha)}\Big)' = -n\, \frac{\Gamma''(\alpha)\Gamma(\alpha) - \Gamma'(\alpha)\Gamma'(\alpha)}{\Gamma(\alpha)^2}
l''_{\alpha\lambda}(\alpha, \lambda) = \frac{n}{\lambda}
l''_{\lambda\lambda}(\alpha, \lambda) = -n\, \frac{\alpha}{\lambda^2}
we can use e.g. the approximation
\sqrt{n}\,(\hat\lambda - \lambda) \approx N\Big(0, \frac{\lambda^2}{\alpha}\Big)
i.e.
\hat\lambda \approx N\Big(\lambda, \frac{\lambda^2}{\alpha n}\Big)
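The final approximation can be checked by simulation; the sketch below assumes \alpha = 3 and \lambda = 2 and treats \alpha as known, so \hat\lambda = \alpha/\bar{X} and the approximate variance is \lambda^2/(\alpha n).

```python
import random
import statistics

# Simulation sketch (assumed alpha = 3, lambda = 2, alpha known): check
# lambda-hat = alpha / xbar is approximately N(lambda, lambda^2/(alpha*n)).
random.seed(8)
alpha, lam, n, reps = 3.0, 2.0, 400, 5_000
ests = []
for _ in range(reps):
    xbar = sum(random.gammavariate(alpha, 1.0 / lam) for _ in range(n)) / n
    ests.append(alpha / xbar)
print(statistics.mean(ests), statistics.variance(ests))
# mean close to 2, variance close to lam^2/(alpha*n) = 4/1200
```
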