
# Data Analysis and Statistical Arbitrage

## Lecture 2: Point estimation

Oli Atlason
## Outline

- Parametric Statistical Models
- Method of Moments and Maximum Likelihood
- Bias and MSE
- Asymptotic Properties
## Parametric Statistical Models

**Example.** We observe $X_1, \ldots, X_n$, where $X_i$ is the number of alpha particles emitted by a sample during the $i$-th time interval of an experiment.

Natural model: $X_i \sim \text{Poisson}(\lambda)$, with $X_1, \ldots, X_n$ independent.

**Poisson distribution**

$$P_\lambda(X_i = k) = p^{\text{Poisson}}_\lambda(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k \in \mathbb{N},\ \lambda > 0$$

**Formal model.** The model consists, for a given $n$, of the family

$$\left\{ \prod_{i=1}^n p^{\text{Poisson}}_\lambda(X_i) \right\}_{\lambda \in \mathbb{R}_+};$$

here the parameter space is $\mathbb{R}_+$.
## Parametric Statistical Models: General Definition

A parametric statistical model for observations $X = (X_1, \ldots, X_n)$ is a family $\{f_\theta(x)\}_{\theta \in \Theta}$ of probability distributions. We want to know which $f_\theta$ is responsible for the data.

**Point estimator:** a function $\hat\theta(x)$ of the observations. This is a guess at what $\theta$ was used to generate the data.

**Sampling distribution.** $\hat\theta(X)$ takes random values. It has a distribution derived from that of $X$.
## Method of Moments

... is a simple way of finding an estimator.

$k$-th sample moment:

$$\hat\mu_k = \frac{1}{n} \sum_{i=1}^n (X_i)^k$$

$k$-th population moment:

$$\mu_k(\theta) = E_\theta[X^k]$$

Now solve for $\theta$ in the system $(\mu_k(\theta) = \hat\mu_k)_{k=1,\ldots,p}$. Usually we need as many equations ($p$) as the number of parameters.
## Method of Moments: Examples

**Poisson.** For the Poisson model, $\theta = \lambda$. Also

$$E_\lambda[X] = \sum_{k=0}^\infty k\, P(X = k) = \sum_{k=0}^\infty k\, \frac{\lambda^k e^{-\lambda}}{k!} = \lambda,$$

thus the method of moments estimator is

$$\hat\lambda(X) = \hat\mu_1 = \frac{1}{n} \sum_{i=1}^n X_i.$$

**Normal.** The parameters are two, $\theta = (\mu, \sigma^2)$, so we need 2 equations. We know that $E[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2$. Thus

$$\mu_1 = \mu, \qquad \mu_2 = \sigma^2 + \mu^2.$$

The method of moments estimator is $\hat\mu = \hat\mu_1$ and

$$\hat\sigma^2 = \hat\mu_2 - \hat\mu_1^2 = \frac{1}{n} \sum_i x_i^2 - \bar x^2 = \frac{1}{n} \sum_i (x_i - \bar x)^2,$$

since expanding the square gives

$$\frac{1}{n} \sum_i (x_i - \bar x)^2 = \frac{1}{n} \sum_i x_i^2 - \frac{2}{n} \sum_i x_i \bar x + \bar x^2 = \frac{1}{n} \sum_i x_i^2 - \bar x^2.$$
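The two moment equations translate directly into code. Below is a minimal Python sketch (added for illustration, not part of the original slides; NumPy, the seed, and the simulated parameters are assumptions):

```python
# Method-of-moments estimates for a Normal sample, matching the formulas
# above: mu-hat is the first sample moment, sigma2-hat = mu2-hat - mu1-hat^2.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)  # true mu = 2, sigma^2 = 9

mu_hat = x.mean()                        # hat{mu}_1
sigma2_hat = (x**2).mean() - mu_hat**2   # hat{mu}_2 - hat{mu}_1^2

print(mu_hat, sigma2_hat)  # approximately 2 and 9 for large n
```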
## Maximum Likelihood

The likelihood is the density of the data viewed as a function of $\theta$. When the data are i.i.d. (independent and identically distributed) with density $f_\theta$, then

$$L(\theta) = f_\theta(X_1, \ldots, X_n) = \prod_{i=1}^n f_\theta(X_i).$$

The maximum likelihood estimate (MLE), $\hat\theta$, is the $\theta$ that maximizes the likelihood. The idea is that $\hat\theta$ is the value of $\theta$ for which the observed sample is most likely.

**Finding the MLE**

- Usually use calculus ($L'(\theta) = 0$).
- Remember to check values at boundaries and second derivatives.
## MLE: Example

**Poisson.**

$$L(\lambda) = \prod_{i=1}^n \frac{\lambda^{X_i} e^{-\lambda}}{X_i!}$$

Take logs (using $\operatorname{argmax}_x \log f(x) = \operatorname{argmax}_x f(x)$):

$$l(\lambda) = \log L(\lambda) = \sum_{i=1}^n \left( X_i \log \lambda - \lambda - \log X_i! \right)$$

$$l'(\lambda) = \frac{1}{\lambda} \sum_i X_i - n,$$

i.e. the MLE is

$$\hat\lambda = \frac{1}{n} \sum_i X_i.$$

In this example, method of moments and MLE give the same answer.

Note: we don't really need $X = (X_1, \ldots, X_n)$ here, only the sum. $T(X) = \sum_i X_i$ is called a sufficient statistic.
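Since the MLE is available in closed form here, a numerical optimizer should reproduce the sample mean. A small Python check (an illustrative addition; SciPy and the chosen parameters are assumptions):

```python
# Verify numerically that the sample mean maximizes the Poisson log-likelihood.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(1)
x = rng.poisson(lam=4.2, size=500)

def neg_loglik(lam):
    # -l(lambda) = -sum_i (X_i log(lambda) - lambda - log X_i!)
    return -np.sum(x * np.log(lam) - lam - gammaln(x + 1))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
print(res.x, x.mean())  # the numerical maximizer matches the sample mean
```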
## Bias and MSE

$\hat\theta(X)$ is an estimator of $\theta$. Then

$$\mathrm{Bias}_\theta = E_\theta[\hat\theta(X)] - \theta$$

$$\mathrm{MSE}_\theta = E_\theta\left[ (\hat\theta(X) - \theta)^2 \right]$$

Note:

- $\hat\theta(X)$ is random (a function of the sample)
- Bias and MSE are non-random
- Bias and MSE are functions of $\theta$

When $E_\theta[\hat\theta] = \theta$, the estimator is called unbiased.
## Bias Example: Sample Variance

For an i.i.d. sample $X_1, \ldots, X_n$ from a $N(\mu, \sigma^2)$, the method of moments and maximum likelihood estimators coincide and are

$$\hat\sigma^2 = \frac{1}{n} \sum_i (X_i - \bar X)^2.$$

This estimator is biased:

$$E[\hat\sigma^2] = \frac{1}{n} E\left[ \sum_i (X_i^2 - 2 X_i \bar X + \bar X^2) \right] = \frac{1}{n} E\left[ \sum_i X_i^2 - n \bar X^2 \right] = E[X_1^2] - E[\bar X^2]$$

$$= (\sigma^2 + \mu^2) - \left( \frac{\sigma^2}{n} + \mu^2 \right) = \sigma^2\, \frac{n-1}{n},$$

which implies

$$\mathrm{Bias} = E[\hat\sigma^2] - \sigma^2 = -\frac{\sigma^2}{n}.$$
## Bias Example: Sample Variance (2)

From last lecture, the sample variance is

$$S_n^2 = \frac{1}{n-1} \sum_i (X_i - \bar X)^2.$$

By the preceding derivation, $S_n^2$ is an unbiased estimate of $\sigma^2$.

Frequently, method of moments estimators and MLEs are biased and can be made slightly better by a small change.
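A quick Monte Carlo check of the two formulas (an added sketch, not from the slides; NumPy, the seed, and the chosen $n$ and $\sigma^2$ are assumptions):

```python
# Monte Carlo: E[sigma2-MLE] ~ sigma^2 (n-1)/n (biased), E[S_n^2] ~ sigma^2.
import numpy as np

rng = np.random.default_rng(2)
n, sigma2, reps = 10, 4.0, 200_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

mle = x.var(axis=1, ddof=0)  # (1/n) sum (X_i - X-bar)^2
s2 = x.var(axis=1, ddof=1)   # (1/(n-1)) sum (X_i - X-bar)^2

print(mle.mean())  # ~ 4.0 * 9/10 = 3.6
print(s2.mean())   # ~ 4.0
```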
## Bias-Variance Trade-off

We would like no bias and low variance. Often there is a choice:

- low bias, high variance
- some bias, some variance
- high bias, low variance

*(Illustration of the trade-off omitted.)*
## Mean Squared Error

The MSE combines the bias and the variance of the estimator:

$$\mathrm{MSE}_\theta = E\left[ (\hat\theta - \theta)^2 \right] = E\left[ \left( \hat\theta - E[\hat\theta] + E[\hat\theta] - \theta \right)^2 \right]$$

$$= E\left[ \left( \hat\theta - E[\hat\theta] \right)^2 \right] + E\left[ \left( E[\hat\theta] - \theta \right)^2 \right] + 2\, E\left[ \hat\theta - E[\hat\theta] \right] \left( E[\hat\theta] - \theta \right)$$

$$= \mathrm{Var}[\hat\theta] + \mathrm{Bias}^2,$$

since the cross term vanishes: $E[\hat\theta - E[\hat\theta]] = 0$.

Variance and bias are sometimes called precision and accuracy, respectively.
## MSE Example: Sample Variance

To compute the MSE of $S_n^2$, recall that under independence and normality

$$\frac{\sum_i (X_i - \bar X)^2}{\sigma^2} \sim \chi^2_{n-1},$$

which has mean $n-1$ and variance $2(n-1)$. Thus

$$\mathrm{MSE}_{S_n^2} = \mathrm{Bias}^2 + \mathrm{Var}[S_n^2] = \mathrm{Var}\left[ \frac{1}{n-1} \sum_i (X_i - \bar X)^2 \right] = \left( \frac{\sigma^2}{n-1} \right)^2 \cdot 2(n-1) = \frac{2\sigma^4}{n-1}.$$

For the MLE, however,

$$\mathrm{MSE}_{\hat\sigma^2} = \mathrm{Bias}^2 + \mathrm{Var}[\hat\sigma^2] = \left( \frac{\sigma^2}{n} \right)^2 + \mathrm{Var}\left[ \frac{1}{n} \sum_i (X_i - \bar X)^2 \right] = \frac{\sigma^4}{n^2} + \sigma^4\, \frac{2(n-1)}{n^2} = \sigma^4\, \frac{2n-1}{n^2} < \mathrm{MSE}_{S_n^2}.$$
## MSE Example: Sample Variance (2)

The MSE is perhaps not natural for scale parameters. In fact, the minimum-MSE estimator among multiples of the sum of squares is

$$\frac{1}{n+1} \sum_i (X_i - \bar X)^2.$$

The estimators are asymptotically identical ($\lim_{n\to\infty} \frac{n+1}{n-1} = 1$).

Method of moments and MLE estimators are rarely unbiased. However, the MLE has nice asymptotic properties.
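The divisor comparison is easy to see by simulation. An illustrative Python sketch (added, not from the slides; parameters are arbitrary choices):

```python
# Empirical MSE of the variance estimators with divisors n (MLE),
# n-1 (unbiased), and n+1 (minimum MSE).
import numpy as np

rng = np.random.default_rng(3)
n, sigma2, reps = 10, 1.0, 500_000
x = rng.normal(0.0, 1.0, size=(reps, n))
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

for divisor in (n, n - 1, n + 1):
    est = ss / divisor
    print(divisor, ((est - sigma2) ** 2).mean())  # smallest MSE at n+1
```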
## Example: Gamma distribution, one observation

Recall that $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt$.

Important property: $\Gamma(\alpha) = (\alpha - 1)\, \Gamma(\alpha - 1)$ for $\alpha > 1$.

The Gamma distribution with parameters $\alpha, \lambda > 0$ has density

$$f_{\alpha,\lambda}(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}, \qquad x > 0.$$

We have one observation, $X \in \mathbb{R}_+$, and we know $\alpha$. Find the MLE for $\lambda$; compute its bias and MSE.

Solution:

$$l(\lambda) = \alpha \log \lambda - \log \Gamma(\alpha) + (\alpha - 1) \log x - \lambda x$$

$$l'(\lambda) = \frac{\alpha}{\lambda} - x = 0,$$

so our candidate is $\hat\lambda = \frac{\alpha}{x}$.
## Example: Gamma distribution, one observation (2)

Check:

$$l''(\lambda) = -\frac{\alpha}{\lambda^2} < 0,$$

and $\frac{\alpha}{x} > 0$, i.e. $\hat\lambda$ is in the parameter space. Moments:

$$E[\hat\lambda] = \int_0^\infty \frac{\alpha}{x} \cdot \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}\, dx = \frac{\alpha \lambda\, \Gamma(\alpha-1)}{\Gamma(\alpha)} \int_0^\infty \frac{\lambda^{\alpha-1}}{\Gamma(\alpha-1)}\, x^{(\alpha-1)-1} e^{-\lambda x}\, dx = \frac{\alpha \lambda}{\alpha - 1},$$

and in the same way

$$E[\hat\lambda^2] = \frac{\alpha^2 \lambda^2\, \Gamma(\alpha-2)}{\Gamma(\alpha)} = \frac{\alpha^2 \lambda^2}{(\alpha-1)(\alpha-2)}.$$
## Example: Gamma distribution, one observation (3)

$E[\hat\lambda] = \frac{\alpha\lambda}{\alpha-1}$ and $E[\hat\lambda^2] = \frac{\alpha^2\lambda^2}{(\alpha-1)(\alpha-2)}$, so

$$\mathrm{Bias} = E[\hat\lambda] - \lambda = \lambda \left( \frac{\alpha}{\alpha-1} - 1 \right) = \frac{\lambda}{\alpha - 1}$$

$$\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Var}[\hat\lambda] = \lambda^2 \left( \frac{1}{\alpha-1} \right)^2 + E[\hat\lambda^2] - E[\hat\lambda]^2$$

$$= \lambda^2 \left( \frac{1}{(\alpha-1)^2} + \frac{\alpha^2}{(\alpha-1)(\alpha-2)} - \frac{\alpha^2}{(\alpha-1)^2} \right) = \lambda^2\, \frac{\alpha + 2}{(\alpha-1)(\alpha-2)}.$$
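Both formulas can be checked by simulation (an added sketch; NumPy and the parameter choices are assumptions, with $\alpha > 2$ so that both expressions are finite):

```python
# Monte Carlo check of Bias = lambda/(alpha-1) and
# MSE = lambda^2 (alpha+2) / ((alpha-1)(alpha-2)) for lambda-hat = alpha/X.
import numpy as np

rng = np.random.default_rng(4)
alpha, lam, reps = 5.0, 2.0, 1_000_000
x = rng.gamma(shape=alpha, scale=1.0 / lam, size=reps)  # NumPy takes a scale, not a rate
lam_hat = alpha / x

print(lam_hat.mean() - lam, lam / (alpha - 1))  # empirical vs analytic bias (0.5)
print(((lam_hat - lam) ** 2).mean(),
      lam**2 * (alpha + 2) / ((alpha - 1) * (alpha - 2)))  # empirical vs analytic MSE
```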
## Properties of MLE

**Invariance.** MLEs are invariant under transformations.

Ex: In the Poisson model, $\tau = \frac{1}{\lambda}$ measures the waiting time between observations. By invariance,

$$\hat\tau = \frac{1}{\hat\lambda} = \frac{n}{\sum_i X_i}.$$

**Consistency.** Let $\hat\theta_n$ be the MLE obtained from $X_1, \ldots, X_n$. Then, under minimal technical conditions,

$$\hat\theta_n \xrightarrow{P} \theta.$$

Compare with the statements of:

- unbiasedness ($E[\hat\theta] = \theta$),
- strong consistency, $\hat\theta_n \to \theta$ a.s.

Method of moments estimators are often also consistent.
## Properties of MLE (2)

**Theorem (Cramér-Rao).** Under regularity conditions, notably that differentiation and integration can be interchanged,

$$\frac{\partial}{\partial\theta}\, E_\theta[\hat\theta(X)] = \int \hat\theta(x)\, \frac{\partial}{\partial\theta} f(x\,|\,\theta)\, dx,$$

we have

$$\mathrm{Var}_\theta[\hat\theta(X)] \ge \frac{\left( 1 + \mathrm{Bias}_\theta'(\hat\theta) \right)^2}{n\, I(\theta)},$$

where

$$I(\theta) = E_\theta\left[ \left( \frac{\partial}{\partial\theta} \log f(X\,|\,\theta) \right)^2 \right] = -E_\theta\left[ \frac{\partial^2}{\partial\theta^2} \log f(X\,|\,\theta) \right]$$

is the Fisher information of $f_\theta$.

**Theorem (Fisher).** The MLE achieves the bound asymptotically, and

$$\sqrt{n}\left( \hat\theta_n - \theta \right) \xrightarrow{D} N\left( 0, \frac{1}{I(\theta)} \right).$$

The MLE is asymptotically efficient, i.e. it attains the lowest possible variance.
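To make the bound concrete, here is a short worked check for the Poisson model from earlier (added for illustration; it is not on the original slide). Since $\log f(k\,|\,\lambda) = k \log \lambda - \lambda - \log k!$,

$$I(\lambda) = E_\lambda\left[ \left( \frac{X}{\lambda} - 1 \right)^2 \right] = \frac{\mathrm{Var}[X]}{\lambda^2} = \frac{1}{\lambda},$$

so the Cramér-Rao bound for unbiased estimators is $\frac{1}{n I(\lambda)} = \frac{\lambda}{n}$, and the MLE $\bar X$ attains it exactly: $\mathrm{Var}[\bar X] = \frac{\lambda}{n}$.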
## Example: Gamma, n observations

Let $X_1, \ldots, X_n$ be i.i.d. $\text{Gamma}(\alpha, \lambda)$, with $\alpha$ and $\lambda$ unknown.

$$L(\alpha, \lambda) = \prod_{i=1}^n f_{\alpha,\lambda}(x_i) = \prod_{i=1}^n \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x_i^{\alpha-1} e^{-\lambda x_i}$$

$$l(\alpha, \lambda) = \sum_i \left( \alpha \log \lambda - \log \Gamma(\alpha) + (\alpha - 1) \log x_i - \lambda x_i \right)$$

$$l_\alpha(\alpha, \lambda) = n \log \lambda + \sum_i \log x_i - n\, \frac{\Gamma'(\alpha)}{\Gamma(\alpha)}$$

$$l_\lambda(\alpha, \lambda) = \frac{n\alpha}{\lambda} - \sum_i x_i$$

The last equation gives $\hat\lambda = \frac{n\hat\alpha}{\sum_i x_i}$. We must solve $l_\alpha(\alpha, \hat\lambda(\alpha)) = 0$ numerically.

It is hard to compute the small-sample bias and MSE here; we use asymptotic methods instead.
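Substituting $\hat\lambda(\alpha) = \alpha/\bar x$ into $l_\alpha = 0$ reduces the problem to the one-dimensional equation $\log \alpha - \psi(\alpha) = \log \bar x - \frac{1}{n}\sum_i \log x_i$, where $\psi = \Gamma'/\Gamma$ is the digamma function. A Python sketch of the numerical solution (added, not from the slides; SciPy, the bracket, and the parameter choices are assumptions):

```python
# Numerical MLE for Gamma(alpha, lambda): solve the profile score
# log(alpha) - digamma(alpha) = log(x-bar) - mean(log x_i) for alpha,
# then set lambda-hat = alpha-hat / x-bar.
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma

rng = np.random.default_rng(5)
x = rng.gamma(shape=3.0, scale=0.5, size=5_000)  # true alpha = 3, lambda = 2

c = np.log(x.mean()) - np.log(x).mean()  # > 0 by Jensen's inequality
alpha_hat = brentq(lambda a: np.log(a) - digamma(a) - c, 1e-3, 1e3)
lam_hat = alpha_hat / x.mean()

print(alpha_hat, lam_hat)  # close to 3 and 2
```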
## Example: Gamma, n observations (2)

Fisher information: $n I(\theta) = -E[l''(\theta)]$. Here

$$l_{\alpha\alpha}(\alpha, \lambda) = -n \left( \frac{\Gamma'(\alpha)}{\Gamma(\alpha)} \right)' = -n\, \frac{\Gamma''(\alpha)\Gamma(\alpha) - \Gamma'(\alpha)^2}{\Gamma(\alpha)^2}$$

$$l_{\alpha\lambda}(\alpha, \lambda) = \frac{n}{\lambda}$$

$$l_{\lambda\lambda}(\alpha, \lambda) = -\frac{n\alpha}{\lambda^2}$$

We can use e.g. the approximation

$$\sqrt{n}\left( \hat\lambda - \lambda \right) \approx N\left( 0, \frac{\lambda^2}{\alpha} \right),$$

i.e.

$$\hat\lambda \approx N\left( \lambda, \frac{\lambda^2}{\alpha n} \right).$$
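As a final check, this approximation can be compared with a simulated sampling distribution (an added sketch, treating $\alpha$ as known so that $\hat\lambda = \alpha/\bar X$; NumPy and the parameters are assumptions):

```python
# Compare the empirical spread of lambda-hat = alpha / X-bar with the
# asymptotic standard deviation lambda / sqrt(alpha * n).
import numpy as np

rng = np.random.default_rng(6)
alpha, lam, n, reps = 3.0, 2.0, 200, 100_000
x = rng.gamma(shape=alpha, scale=1.0 / lam, size=(reps, n))
lam_hat = alpha / x.mean(axis=1)

print(lam_hat.std(), lam / np.sqrt(alpha * n))  # both roughly 0.082
```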