INTRODUCTION TO MAXIMUM LIKELIHOOD ESTIMATION


This sequence introduces the principle of maximum likelihood estimation and illustrates it
with some simple examples.
[Figure: upper chart, the probability density p; lower chart, the joint density L; both plotted over the range 0 to 8.]
Suppose that you have a normally distributed random variable X with unknown population mean μ and standard deviation σ, and that you have a sample of two observations, 4 and 6. For the time being, we will assume that σ is equal to 1.
Suppose initially you consider the hypothesis μ = 3.5. Under this hypothesis the probability density at 4 would be 0.3521 and that at 6 would be 0.0175.
The joint probability density, shown in the bottom chart, is the product of these, 0.0062.

Next consider the hypothesis μ = 4.0. Under this hypothesis the probability densities associated with the two observations are 0.3989 and 0.0540, and the joint probability density is 0.0215.

Under the hypothesis μ = 4.5, the probability densities are 0.3521 and 0.1295, and the joint probability density is 0.0456.

Under the hypothesis μ = 5.0, the probability densities are both 0.2420 and the joint probability density is 0.0585.

Under the hypothesis μ = 5.5, the probability densities are 0.1295 and 0.3521, and the joint probability density is 0.0456.
The complete joint density function for all values of μ has now been plotted in the lower diagram. We see that it peaks at μ = 5.
  μ      p(4)     p(6)     L
 3.5    0.3521   0.0175   0.0062
 4.0    0.3989   0.0540   0.0215
 4.5    0.3521   0.1295   0.0456
 5.0    0.2420   0.2420   0.0585
 5.5    0.1295   0.3521   0.0456
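The table of densities can be reproduced numerically. This is a quick check added here, not part of the original sequence; it assumes only the standard normal density formula.

```python
import math

def npdf(x, mu, sigma=1.0):
    # Normal density: (1 / (sigma * sqrt(2*pi))) * exp(-0.5 * ((x - mu) / sigma)**2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

rows = []
for mu in (3.5, 4.0, 4.5, 5.0, 5.5):
    p4, p6 = npdf(4, mu), npdf(6, mu)
    rows.append((mu, p4 * p6))
    print(f"{mu:3.1f}   {p4:.4f}   {p6:.4f}   {p4 * p6:.4f}")

best_mu = max(rows, key=lambda r: r[1])[0]   # mu with the greatest joint density
```

Running this reproduces the table row by row, with the joint density peaking at μ = 5.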


Now we will look at the mathematics of the example. If X is normally distributed with mean μ and standard deviation σ, its density function is as shown.

f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2}
For the time being, we are assuming σ is equal to 1, so the density function simplifies to the expression shown.

f(X) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X-\mu)^2}
Hence we obtain the probability densities for the observations where X = 4 and X = 6.

f(4) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-\mu)^2} \qquad\qquad f(6) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-\mu)^2}
The joint probability density for the two observations in the sample is just the product of
their individual densities.

( )
2
4
2
1
2
1
) 4 (

t

= e f
( )
2
6
2
1
2
1
) 6 (

t

= e f
( ) ( )
|
|
.
|

\
|
|
|
.
|

\
|
=

2
6
2
1
2
4
2
1
2
1
2
1
density joint

t t
e e
( )
2
2
1
2
1
) (

t

=
X
e X f
2
2
1
2
1
) (
|
.
|

\
|

=
o

t o
X
e X f


14
INTRODUCTION TO MAXIMUM LIKELIHOOD ESTIMATION
In maximum likelihood estimation we choose as our estimate of μ the value that gives us the greatest joint density for the observations in our sample. This value is associated with the greatest probability, or maximum likelihood, of obtaining the observations in the sample.
In the graphical treatment we saw that this occurs when μ is equal to 5. We will prove that this must be the case mathematically.
To do this, we treat the sample values X = 4 and X = 6 as given and use calculus to determine the value of μ that maximizes the expression.
L(\mu \mid 4, 6) = \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-\mu)^2}\right)\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-\mu)^2}\right)
When it is regarded in this way, the expression is called the likelihood function for μ, given the sample observations 4 and 6. This is the meaning of L(μ | 4, 6).
To maximize the expression, we could differentiate with respect to μ and set the result equal to 0. This would be a little laborious. Fortunately, we can simplify the problem with a trick.
log L is a monotonically increasing function of L (meaning that log L increases if L
increases and decreases if L decreases).
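A small numerical illustration of this monotonicity (an added check, not from the original text): over a grid of candidate values of μ for the two-observation example, the grid point that maximizes L is the same one that maximizes log L.

```python
import math

# Joint density of the observations 4 and 6 as a function of mu (sigma = 1)
def L(mu):
    return math.exp(-0.5 * (4 - mu) ** 2) * math.exp(-0.5 * (6 - mu) ** 2) / (2 * math.pi)

mus = [3.5 + k / 100 for k in range(201)]             # candidate values 3.50 .. 5.50
argmax_L = max(mus, key=L)                            # value maximizing L
argmax_logL = max(mus, key=lambda m: math.log(L(m)))  # value maximizing log L
```

Because log is strictly increasing, the two argmaxes must coincide, and both land on μ = 5.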
It follows that the value of μ which maximizes log L is the same as the one that maximizes L. As it happens, it is easier to maximize log L with respect to μ than it is to maximize L.
The logarithm of the product of the density functions can be decomposed as the sum of
their logarithms.

\log L = \log\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-\mu)^2}\right) + \log\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-\mu)^2}\right)
Using the product rule a second time, we can decompose each term as shown.
\log L = \log\frac{1}{\sqrt{2\pi}} + \log e^{-\frac{1}{2}(4-\mu)^2} + \log\frac{1}{\sqrt{2\pi}} + \log e^{-\frac{1}{2}(6-\mu)^2}
Now one of the basic rules for manipulating logarithms allows us to rewrite the second term
as shown.
\log b^a = a \log b

\log e^{-\frac{1}{2}(4-\mu)^2} = -\frac{1}{2}(4-\mu)^2 \log e
log e is equal to 1, another basic logarithm result. (Remember, as always, we are using
natural logarithms, that is, logarithms to base e.)
\log e^{-\frac{1}{2}(4-\mu)^2} = -\frac{1}{2}(4-\mu)^2 \log e = -\frac{1}{2}(4-\mu)^2
Hence the second term reduces to a simple quadratic in X. And so does the fourth.
\log L = 2\log\frac{1}{\sqrt{2\pi}} - \frac{1}{2}(4-\mu)^2 - \frac{1}{2}(6-\mu)^2
We will now choose μ so as to maximize this expression.
Quadratic terms of the type in the expression can be expanded as shown.
-\frac{1}{2}(a-\mu)^2 = -\frac{1}{2}\left(a^2 - 2a\mu + \mu^2\right) = -\frac{1}{2}a^2 + a\mu - \frac{1}{2}\mu^2
Thus we obtain the differential of the quadratic term.
\frac{d}{d\mu}\left\{-\frac{1}{2}(a-\mu)^2\right\} = a - \mu
Applying this result, we obtain the differential of log L with respect to μ. (The first term in the expression for log L disappears completely, since it is not a function of μ.)
\frac{d \log L}{d\mu} = (4-\mu) + (6-\mu)
Thus from the first order condition we confirm that 5 is the value of μ that maximizes the log-likelihood function, and hence the likelihood function.
\frac{d \log L}{d\mu} = 0 \quad\Rightarrow\quad \hat{\mu} = 5
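As a sanity check (an added sketch, not part of the original sequence), the first-order condition can be verified with a central-difference approximation to the derivative of the log-likelihood, dropping the constant term.

```python
def log_L(mu):
    # Log-likelihood for observations 4 and 6, omitting the constant 2*log(1/sqrt(2*pi))
    return -0.5 * (4 - mu) ** 2 - 0.5 * (6 - mu) ** 2

def d_log_L(mu, h=1e-6):
    # Central-difference approximation; analytically (4 - mu) + (6 - mu) = 10 - 2*mu
    return (log_L(mu + h) - log_L(mu - h)) / (2 * h)
```

The derivative is positive just below 5, negative just above, and (to numerical precision) zero at 5, confirming a maximum there.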
Note that a caret mark has been placed over μ, because we are now talking about an estimate of μ, not its true value.
Note also that the second differential of log L with respect to μ is −2. Since this is negative, we have found a maximum, not a minimum.
We will generalize this result to a sample of n observations X₁, ..., Xₙ. The probability density for Xᵢ is as shown.

f(X_i) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_i-\mu)^2}
The joint density function for a sample of n observations is the product of their individual
densities.

\text{joint density} = \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_1-\mu)^2}\right) \times \cdots \times \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_n-\mu)^2}\right)
Now, treating the sample values as fixed, we can re-interpret the joint density function as the likelihood function for μ, given this sample. We will find the value of μ that maximizes it.
L(\mu \mid X_1, \ldots, X_n) = \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_1-\mu)^2}\right) \times \cdots \times \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_n-\mu)^2}\right)
We will do this indirectly, as before, by maximizing log L with respect to μ. The logarithm decomposes as shown.

\log L = \log\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_1-\mu)^2}\right) + \cdots + \log\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_n-\mu)^2}\right) = n\log\frac{1}{\sqrt{2\pi}} - \frac{1}{2}(X_1-\mu)^2 - \cdots - \frac{1}{2}(X_n-\mu)^2
We differentiate log L with respect to μ.
\frac{d \log L}{d\mu} = (X_1-\mu) + \cdots + (X_n-\mu)
The first order condition for a maximum is that the differential be equal to zero.

\frac{d \log L}{d\mu} = 0 \quad\Rightarrow\quad \sum X_i - n\mu = 0
Thus we have demonstrated that the maximum likelihood estimator of μ is the sample mean. The second differential, −n, is negative, confirming that we have maximized log L.

\hat{\mu} = \frac{1}{n}\sum X_i = \bar{X}
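A numerical check of this general result (an illustrative sketch added here, not part of the original): for a simulated sample of n observations, a grid search over the log-likelihood lands on the sample mean.

```python
import math
import random

random.seed(0)
sample = [random.gauss(5, 1) for _ in range(1000)]
x_bar = sum(sample) / len(sample)

def log_L(mu):
    # Log-likelihood of the sample under sigma = 1
    return sum(-0.5 * (x - mu) ** 2 - 0.5 * math.log(2 * math.pi) for x in sample)

grid = [4 + k / 1000 for k in range(2001)]   # candidate mu values 4.000 .. 6.000
mu_hat = max(grid, key=log_L)                # grid-search maximum likelihood estimate
```

The grid maximizer agrees with the sample mean up to the grid spacing.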
So far we have assumed that σ, the standard deviation of the distribution of X, is equal to 1. We will now relax this assumption and find the maximum likelihood estimator of it.
f(X_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_i-\mu}{\sigma}\right)^2}
We will illustrate the process graphically with the two-observation example, keeping μ fixed at 5. We will start with σ equal to 2.
[Figure: upper chart, the normal density p for candidate values of σ, with μ = 5; lower chart, the joint density L as a function of σ.]
With σ equal to 2, the probability density is 0.1760 for both X = 4 and X = 6, and the joint density is 0.0310.
Now try σ equal to 1. The individual densities are 0.2420, and so the joint density, 0.0586, has increased.
Now try putting σ equal to 0.5. The individual densities have fallen, and the joint density is only 0.0117.
The joint density has now been plotted as a function of σ in the lower diagram. You can see that in this example it is greatest for σ equal to 1.
  σ      p(4)     p(6)     L
 2.0    0.1760   0.1760   0.0310
 1.0    0.2420   0.2420   0.0586
 0.5    0.1080   0.1080   0.0117
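These values can be reproduced with the σ-dependent density (a quick check added here, not from the original slides).

```python
import math

def npdf(x, mu, sigma):
    # Normal density with general sigma
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

joint = {}
for sigma in (2.0, 1.0, 0.5):
    p4, p6 = npdf(4, 5, sigma), npdf(6, 5, sigma)
    joint[sigma] = p4 * p6
    print(f"sigma = {sigma}:  p(4) = {p4:.4f}  p(6) = {p6:.4f}  L = {p4 * p6:.4f}")
```

Among the three candidates, σ = 1 gives the largest joint density, matching the table.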
We will now look at this mathematically, starting with the probability density function for X given μ and σ.
f(X_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_i-\mu}{\sigma}\right)^2}
The joint density function for the sample of n observations is as shown.
\text{joint density} = \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_1-\mu}{\sigma}\right)^2}\right) \times \cdots \times \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_n-\mu}{\sigma}\right)^2}\right)
As before, we can re-interpret this function as the likelihood function for μ and σ, given the sample of observations.
L(\mu, \sigma \mid X_1, \ldots, X_n) = \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_1-\mu}{\sigma}\right)^2}\right) \times \cdots \times \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_n-\mu}{\sigma}\right)^2}\right)
We will find the values of μ and σ that maximize this function. We will do this indirectly by maximizing log L.
We can decompose the logarithm as shown. To maximize it, we will set the partial derivatives with respect to μ and σ equal to zero.

\log L = \log\left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_1-\mu}{\sigma}\right)^2}\right) + \cdots + \log\left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_n-\mu}{\sigma}\right)^2}\right)

= n\log\frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2}\left(\frac{X_1-\mu}{\sigma}\right)^2 - \cdots - \frac{1}{2}\left(\frac{X_n-\mu}{\sigma}\right)^2

= n\log\frac{1}{\sigma} + n\log\frac{1}{\sqrt{2\pi}} - \frac{1}{2\sigma^2}\sum(X_i-\mu)^2
INTRODUCTION TO MAXIMUM LIKELIHOOD ESTIMATION
When differentiating with respect to , the first two terms disappear. We have already seen
how to differentiate the other terms.
\frac{\partial \log L}{\partial \mu} = \frac{1}{\sigma^2}\left[(X_1-\mu) + \cdots + (X_n-\mu)\right] = \frac{1}{\sigma^2}\left(\sum X_i - n\mu\right)
Setting the first differential equal to 0, the maximum likelihood estimate of μ is the sample mean, as before.
\frac{\partial \log L}{\partial \mu} = 0 \quad\Rightarrow\quad \hat{\mu} = \bar{X}
Next, we take the partial differential of the log-likelihood function with respect to σ.

Before doing so, it is convenient to rewrite the equation.
\log\frac{1}{\sigma} = \log\sigma^{-1} = (-1)\log\sigma = -\log\sigma

\log L = -n\log\sigma + n\log\frac{1}{\sqrt{2\pi}} - \frac{1}{2\sigma^2}\sum(X_i-\mu)^2
The derivative of log σ with respect to σ is 1/σ. The derivative of σ⁻² is −2σ⁻³.

\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \sigma^{-3}\sum(X_i-\mu)^2
Setting the first derivative of log L to zero gives us a condition that must be satisfied by the
maximum likelihood estimator.
\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \sigma^{-3}\sum(X_i-\mu)^2 = 0
We have already demonstrated that the maximum likelihood estimator of μ is the sample mean.
-n\hat{\sigma}^2 + \sum(X_i-\bar{X})^2 = 0
Hence the maximum likelihood estimator of the population variance is the mean square
deviation of X.

\hat{\sigma}^2 = \frac{1}{n}\sum(X_i-\bar{X})^2
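A numerical check (an added sketch, not in the original): a grid search over σ in the log-likelihood, with μ̂ = X̄ plugged in, recovers the mean square deviation.

```python
import math
import random

random.seed(1)
xs = [random.gauss(5, 2) for _ in range(500)]
x_bar = sum(xs) / len(xs)
msd = sum((x - x_bar) ** 2 for x in xs) / len(xs)   # mean square deviation of X

def log_L(sigma):
    # Log-likelihood with mu set to the sample mean
    return sum(-math.log(sigma) - 0.5 * math.log(2 * math.pi)
               - 0.5 * ((x - x_bar) / sigma) ** 2 for x in xs)

grid = [1 + k / 1000 for k in range(3001)]          # candidate sigma values 1.000 .. 4.000
sigma_hat = max(grid, key=log_L)                    # grid-search MLE of sigma
```

The square of the grid maximizer agrees with the mean square deviation, as the formula above asserts.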
Note that it is biased. The unbiased estimator is obtained by dividing by n − 1, not n.

However it can be shown that the maximum likelihood estimator is asymptotically efficient,
in the sense of having a smaller mean square error than the unbiased estimator in large
samples.
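The bias and the mean-square-error comparison can be illustrated by simulation (an added sketch; the sample size and replication count below are arbitrary choices). For this normal case the MLE's smaller mean square error shows up even at a modest n.

```python
import random

random.seed(42)
n, reps = 10, 20000
mle, unbiased = [], []
for _ in range(reps):
    xs = [random.gauss(0, 1) for _ in range(n)]   # true variance is 1
    x_bar = sum(xs) / n
    ss = sum((x - x_bar) ** 2 for x in xs)
    mle.append(ss / n)               # maximum likelihood estimator (divides by n)
    unbiased.append(ss / (n - 1))    # unbiased estimator (divides by n - 1)

mean_mle = sum(mle) / reps           # expected value is (n - 1)/n = 0.9, showing the bias
mean_unb = sum(unbiased) / reps      # expected value is 1
mse_mle = sum((v - 1) ** 2 for v in mle) / reps
mse_unb = sum((v - 1) ** 2 for v in unbiased) / reps
```

The simulation shows the MLE biased downward but with the smaller mean square error of the two.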
MAXIMUM LIKELIHOOD ESTIMATION OF REGRESSION COEFFICIENTS
We will now apply the maximum likelihood principle to regression analysis, using the simple linear model Y = β₁ + β₂X + u.
The black marker shows the value that Y would have if X were equal to Xᵢ and if there were no disturbance term.
[Figure: the line β₁ + β₂X, with a marker at (Xᵢ, β₁ + β₂Xᵢ) and the conditional distribution of Y centered on it.]
However we will assume that there is a disturbance term in the model and that it has a normal
distribution as shown.
Relative to the black marker, the curve represents the ex ante distribution for u, that is, its potential
distribution before the observation is generated. Ex post, of course, it is fixed at some specific value.
Relative to the horizontal axis, the curve also represents the ex ante distribution of Y for that observation, that is, conditional on X = Xᵢ.
Potential values of Y close to β₁ + β₂Xᵢ will have relatively large densities ...
... while potential values of Y relatively far from β₁ + β₂Xᵢ will have small ones.
The mean value of the distribution of Yᵢ is β₁ + β₂Xᵢ. Its standard deviation is σ, the standard deviation of the disturbance term.
Hence the density function for the ex ante distribution of Yᵢ is as shown.
f(Y_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_i-\beta_1-\beta_2 X_i}{\sigma}\right)^2}
The joint density function for the observations on Y is the product of their individual densities.
f(Y_1) \times \cdots \times f(Y_n) = \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1-\beta_1-\beta_2 X_1}{\sigma}\right)^2}\right) \times \cdots \times \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n-\beta_1-\beta_2 X_n}{\sigma}\right)^2}\right)
Now, taking β₁, β₂, and σ as our choice variables, and taking the data on Y and X as given, we can re-interpret this function as the likelihood function for β₁, β₂, and σ.
L(\beta_1, \beta_2, \sigma \mid Y_1, \ldots, Y_n) = \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1-\beta_1-\beta_2 X_1}{\sigma}\right)^2}\right) \times \cdots \times \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n-\beta_1-\beta_2 X_n}{\sigma}\right)^2}\right)
We will choose β₁, β₂, and σ so as to maximize the likelihood, given the data on Y and X. As usual, it is easier to do this indirectly, maximizing the log-likelihood instead.
As usual, the first step is to decompose the expression as the sum of the logarithms of the factors.
\log L = \log\left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1-\beta_1-\beta_2 X_1}{\sigma}\right)^2}\right) + \cdots + \log\left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n-\beta_1-\beta_2 X_n}{\sigma}\right)^2}\right)
Then we split the logarithm of each factor into two components. The first component is the same in
each case.
= n\log\frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2}\left(\frac{Y_1-\beta_1-\beta_2 X_1}{\sigma}\right)^2 - \cdots - \frac{1}{2}\left(\frac{Y_n-\beta_1-\beta_2 X_n}{\sigma}\right)^2
Hence the log-likelihood simplifies as shown.
\log L = n\log\frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2\sigma^2}\, Z
\qquad\text{where}\qquad
Z = (Y_1-\beta_1-\beta_2 X_1)^2 + \cdots + (Y_n-\beta_1-\beta_2 X_n)^2
To maximize the log-likelihood, we need to minimize Z. But choosing estimators of β₁ and β₂ to minimize Z is exactly what we did when we derived the least squares regression coefficients.
Thus, for this regression model, the maximum likelihood estimators are identical to the least squares estimators. The estimator of σ will, however, be slightly different.
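A numerical confirmation (an added sketch, not from the original text): the least squares coefficients computed from the usual closed-form formulas satisfy the first-order conditions for minimizing Z, and hence maximize the likelihood. The data-generating values below are arbitrary.

```python
import random

random.seed(7)
n = 100
X = [random.uniform(0, 10) for _ in range(n)]
Y = [2.0 + 0.5 * x + random.gauss(0, 1) for x in X]   # arbitrary true coefficients

# Closed-form least squares estimates
x_bar, y_bar = sum(X) / n, sum(Y) / n
b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum((x - x_bar) ** 2 for x in X)
b1 = y_bar - b2 * x_bar

def Z(beta1, beta2):
    # Sum of squared residuals, which the MLE of (beta1, beta2) must minimize
    return sum((y - beta1 - beta2 * x) ** 2 for x, y in zip(X, Y))

# Gradient of Z at the least squares solution (vanishes at the minimum)
g1 = -2 * sum(y - b1 - b2 * x for x, y in zip(X, Y))
g2 = -2 * sum(x * (y - b1 - b2 * x) for x, y in zip(X, Y))
```

Both partial derivatives of Z are zero (to numerical precision) at the least squares coefficients, and perturbing either coefficient increases Z.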