
HO #2 EE 722

9/3/2002 J. Chun
MLD (Maximum Likelihood Detection)
Example 1 A blood test is 95% effective in detecting the HIV infection when it is, in fact, present.
However, the test also yields a false positive result for 1% of the healthy persons tested. If a person
tests positive, would you decide that he has HIV?
$$P\{\text{positive} \mid \text{HIV}\} = 0.95$$
$$P\{\text{positive} \mid \text{no HIV}\} = 0.01$$
It is more likely that a person with HIV gives a positive result than a person with no HIV, so we
would decide that the person has HIV.
The decision criterion used in Example 1 is called the maximum likelihood decision criterion. Let us
generalize the idea in Example 1. There are two hypotheses
$H_0$ and $H_1$ ($H_0$: no HIV, $H_1$: HIV in the above example). Each of the two messages generates a point $z$ in the observation space $Z$. It is desired to divide $Z$ into two decision regions $Z_0$ and $Z_1$ (the division gives a decision rule). We make decision $d_0$, that hypothesis $H_0$ is true, if $z \in Z_0$, and similarly for decision $d_1$.
In Example 1, $Z = \{\text{negative}, \text{positive}\}$.
[Figure: the observation space $Z$ partitioned into $Z_0$ (where $d(z) = d_0$) and $Z_1$ (where $d(z) = d_1$); binary hypotheses, single observation.]
MLD Criterion
Given an observation $z \in Z$, set $d(z) = d_0$ if it is more likely that $H_0$ generated $z$ than that $H_1$ generated $z$. Namely,
$$Z_0 = \{ z \mid p(z \mid H_0) > p(z \mid H_1) \}$$
$$Z_1 = \{ z \mid p(z \mid H_0) < p(z \mid H_1) \}$$
Shorthand notation:
$$\Lambda(z) \equiv \frac{p(z \mid H_1)}{p(z \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} 1$$
where $\Lambda(z)$ is called the likelihood ratio.
Example 2
$$H_0: \; z = n, \quad n \sim N(0,1), \qquad H_1: \; z = 1 + n$$
$$p(z \mid H_0) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \qquad p(z \mid H_1) = \frac{1}{\sqrt{2\pi}} e^{-(z-1)^2/2}$$

0 5 . 0 1
) | (
1
H z p ) | (
0
H z p
z
1
Z
0
Z
$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} = \frac{e^{-(z-1)^2/2}}{e^{-z^2/2}} = e^{(2z-1)/2} \underset{H_0}{\overset{H_1}{\gtrless}} 1$$
$$\ln \Lambda(z) = \frac{2z-1}{2} \underset{H_0}{\overset{H_1}{\gtrless}} 0 \qquad \text{or} \qquad z \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{2}$$
So, if $z = 0.6$, we choose $H_1$ because $0.6 > \tfrac{1}{2}$.
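As a quick numerical check (not part of the original handout), a MATLAB sketch that simulates Example 2 and estimates the two error probabilities of the rule $z \gtrless 1/2$; the sample size and seed are arbitrary choices.

% Monte Carlo check of the MLD rule z >< 1/2 for Example 2 (sketch).
rng(0);                          % arbitrary seed for repeatability
M  = 1e5;                        % number of trials per hypothesis
z0 = randn(M,1);                 % observations under H0: z = n
z1 = 1 + randn(M,1);             % observations under H1: z = 1 + n
pFalseAlarm = mean(z0 > 0.5);    % P{decide H1 | H0}, near Q(0.5) ~ 0.31
pMiss       = mean(z1 < 0.5);    % P{decide H0 | H1}, also near Q(0.5)
fprintf('P(d1|H0) = %.3f, P(d0|H1) = %.3f\n', pFalseAlarm, pMiss);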
Example 3
$$H_0: \; p(z \mid H_0) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \qquad (z \sim N(0,1))$$
$$H_1: \; p(z \mid H_1) = \frac{1}{2\sqrt{2\pi}} e^{-z^2/8} \qquad (z \sim N(0,2^2))$$
[Figure: the narrow density $p(z \mid H_0)$ and the wide density $p(z \mid H_1)$; $Z_0$ is an interval around $z = 0$ and $Z_1$ consists of the two tails.]
$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} = \frac{\frac{1}{2\sqrt{2\pi}} e^{-z^2/8}}{\frac{1}{\sqrt{2\pi}} e^{-z^2/2}} = \frac{1}{2} e^{3z^2/8} \underset{H_0}{\overset{H_1}{\gtrless}} 1$$
$$\ln \Lambda(z) = \frac{3z^2}{8} - \ln 2 \underset{H_0}{\overset{H_1}{\gtrless}} 0, \qquad z^2 \underset{H_0}{\overset{H_1}{\gtrless}} \frac{8}{3} \ln 2$$
or $|z| \underset{H_0}{\overset{H_1}{\gtrless}} \sqrt{\tfrac{8}{3} \ln 2} \approx 1.36$.
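Similarly, a short MATLAB sketch (my addition, with an arbitrary sample size) that checks the rule $|z| \gtrless \sqrt{(8/3)\ln 2}$ by simulation:

% Monte Carlo check of the MLD rule |z| >< sqrt(8/3*log(2)) for Example 3 (sketch).
rng(0);
M   = 1e5;
thr = sqrt(8/3*log(2));              % threshold on |z|, about 1.36
z0  = randn(M,1);                    % H0: z ~ N(0,1)
z1  = 2*randn(M,1);                  % H1: z ~ N(0,4), i.e. sigma = 2
pFalseAlarm = mean(abs(z0) > thr);   % P{decide H1 | H0}
pDetect     = mean(abs(z1) > thr);   % P{decide H1 | H1}
fprintf('P(d1|H0) = %.3f, P(d1|H1) = %.3f\n', pFalseAlarm, pDetect);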
Example 4 (Multiple observation)
$$H_0: \; \mathbf{z} = \mathbf{n}, \qquad H_1: \; \mathbf{z} = \mathbf{s} + \mathbf{n}$$
$$p(\mathbf{n}) = \frac{1}{(2\pi)^{p/2} \det(\mathbf{R})^{1/2}} \exp\!\left( -\tfrac{1}{2}\, \mathbf{n}^T \mathbf{R}^{-1} \mathbf{n} \right)$$
Aside:
(i) If $\mathbf{R} = \sigma^2 \mathbf{I}$ then $\mathbf{R}^{-1} = \frac{1}{\sigma^2}\mathbf{I}$ and $\det(\sigma^2 \mathbf{I}) = (\sigma^2)^p$ (here $\mathbf{n}$ is a $p$-element vector). So
$$p(\mathbf{n}) = \frac{1}{(2\pi\sigma^2)^{p/2}} \exp\!\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{p} n_i^2 \right).$$
(ii) If $p = 1$:
$$p(n) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{n^2}{2\sigma^2} \right).$$
$$\Lambda(\mathbf{z}) = \frac{p(\mathbf{z} \mid H_1)}{p(\mathbf{z} \mid H_0)} = \exp\!\left\{ -\tfrac{1}{2}\left[ (\mathbf{z}-\mathbf{s})^T \mathbf{R}^{-1} (\mathbf{z}-\mathbf{s}) - \mathbf{z}^T \mathbf{R}^{-1} \mathbf{z} \right] \right\}$$
$$\ln \Lambda(\mathbf{z}) = -\tfrac{1}{2}\left( \mathbf{s}^T \mathbf{R}^{-1} \mathbf{s} - 2\,\mathbf{s}^T \mathbf{R}^{-1} \mathbf{z} \right) \underset{H_0}{\overset{H_1}{\gtrless}} 0$$
So,
$$\mathbf{z}^T \mathbf{R}^{-1} \mathbf{s} \underset{H_0}{\overset{H_1}{\gtrless}} \tfrac{1}{2}\, \mathbf{s}^T \mathbf{R}^{-1} \mathbf{s}.$$
This, too, is a binary decision, but now based on multiple observations.
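A minimal MATLAB sketch of this multiple-observation rule $\mathbf{z}^T\mathbf{R}^{-1}\mathbf{s} \gtrless \tfrac{1}{2}\mathbf{s}^T\mathbf{R}^{-1}\mathbf{s}$; the signal s and covariance R below are arbitrary choices for illustration only.

% Multiple-observation ML detector z'*R^{-1}*s >< 0.5*s'*R^{-1}*s (sketch).
rng(0);
p = 4;                                   % arbitrary dimension
s = [1; 2; -1; 0.5];                     % arbitrary signal vector
A = randn(p); R = A*A' + p*eye(p);       % an arbitrary positive-definite covariance
n = chol(R,'lower')*randn(p,1);          % noise with covariance R
z = s + n;                               % one observation generated under H1
T   = z'*(R\s);                          % test statistic z' R^{-1} s
thr = 0.5*(s'*(R\s));                    % threshold (1/2) s' R^{-1} s
if T > thr, disp('decide H1'), else, disp('decide H0'), end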
Neyman-Pearson Criterion
Fix $P\{d_1 \mid H_0\}$ at a preselected value $\alpha_0$, and then maximize $P\{d_1 \mid H_1\}$ (a constrained maximization).
[Figure: $p(z \mid H_0)$ and $p(z \mid H_1)$ with a threshold dividing $Z$ into $Z_0$ ($d(z) = d_0$) and $Z_1$ ($d(z) = d_1$).]
Area under $p(z \mid H_0)$ over $Z_1$: $P\{d_1 \mid H_0\} = P_F$, the false alarm probability.
Area under $p(z \mid H_1)$ over $Z_1$: $P\{d_1 \mid H_1\} = P_D$, the detection probability.
We want $P\{d_1 \mid H_1\}$ to be large and $P\{d_1 \mid H_0\}$ to be small. By moving the threshold, however, $P\{d_1 \mid H_1\}$ and $P\{d_1 \mid H_0\}$ increase or decrease together.
To find the threshold according to the Neyman-Pearson criterion, we want to maximize:
$$P\{d_1 \mid H_1\} - \lambda\left[ P\{d_1 \mid H_0\} - \alpha_0 \right] = \int_{Z_1} p(z \mid H_1)\,dz - \lambda\left[ \int_{Z_1} p(z \mid H_0)\,dz - \alpha_0 \right] = \int_{Z_1} \left[ p(z \mid H_1) - \lambda\, p(z \mid H_0) \right] dz + \lambda \alpha_0$$
This can be maximized by selecting for $Z_1$ all $z$ such that
$$p(z \mid H_1) - \lambda\, p(z \mid H_0) > 0 \qquad \text{or} \qquad \frac{p(z \mid H_1)}{p(z \mid H_0)} > \lambda.$$
Namely,
$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda.$$
We must select $\lambda$ such that the constraint
$$P\{d_1 \mid H_0\} = \int_{Z_1} p(z \mid H_0)\,dz = \alpha_0$$
is satisfied.
Example 5
$$p(z \mid H_0) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{z^2}{2}\right), \qquad p(z \mid H_1) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(z-\mu)^2}{2}\right), \qquad \mu > 0$$
We require that $P\{d_1 \mid H_0\} = 0.25$.
$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} = \frac{e^{-(z-\mu)^2/2}}{e^{-z^2/2}} = \exp\!\left( \mu z - \frac{\mu^2}{2} \right)$$
So,
$$\ln \Lambda(z) = \mu z - \frac{\mu^2}{2} \underset{H_0}{\overset{H_1}{\gtrless}} \ln\lambda \qquad \text{or} \qquad z \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\ln\lambda}{\mu} + \frac{\mu}{2}.$$
$$0.25 = P\{d_1 \mid H_0\} = \int_{\frac{\ln\lambda}{\mu}+\frac{\mu}{2}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz = Q\!\left( \frac{\ln\lambda}{\mu} + \frac{\mu}{2} \right)$$
So,
$$\frac{\ln\lambda}{\mu} + \frac{\mu}{2} = Q^{-1}(0.25) \approx 0.674.$$
Notice that we did not have to know the value of $\mu$ to derive the Neyman-Pearson detection rule: the rule is simply $z \underset{H_0}{\overset{H_1}{\gtrless}} 0.674$.

[Figure: the resulting decision regions, $Z_0 = \{z < 0.674\}$ and $Z_1 = \{z > 0.674\}$; the tail area $Q(z_0)$ at $z_0 = 0.674$ equals $0.25$, while $\mu$ remains unknown.]
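The 0.674 value can be reproduced numerically; a one-off MATLAB sketch (my addition) using the built-in erfcinv to invert the Q-function:

% Neyman-Pearson threshold for Example 5: solve Q(gamma) = 0.25 (sketch).
alpha0 = 0.25;                      % required false alarm probability
gamma  = sqrt(2)*erfcinv(2*alpha0); % Q^{-1}(0.25), about 0.674
Q      = @(x) 0.5*erfc(x/sqrt(2));  % Gaussian tail function
fprintf('gamma = %.4f, Q(gamma) = %.4f\n', gamma, Q(gamma));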
Example 6
$$p(z \mid H_0) = \begin{cases} \tfrac{3}{2}(z-1)^2, & 0 \le z \le 2 \\ 0, & \text{elsewhere} \end{cases} \qquad\qquad p(z \mid H_1) = \begin{cases} \tfrac{3}{4}\,z(2-z), & 0 \le z \le 2 \\ 0, & \text{elsewhere} \end{cases}$$
We require $P\{d_1 \mid H_0\} = 0.2$.
$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} = \frac{z(2-z)}{2(z-1)^2} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda$$
Since $\Lambda(z)$ is symmetric about $z = 1$ and grows toward $z = 1$, $Z_1$ is an interval $(\gamma,\, 2-\gamma)$ around $z = 1$, and
$$0.2 = P\{d_1 \mid H_0\} = \int_{Z_1} p(z \mid H_0)\,dz = \int_{\gamma}^{2-\gamma} \tfrac{3}{2}(z-1)^2\,dz = (1-\gamma)^3.$$
So $\gamma = 1 - 0.2^{1/3} \approx 0.415$.

[Figure: $p(z \mid H_0)$, $p(z \mid H_1)$, and $\Lambda(z)$ on $0 \le z \le 2$; $Z_1 = (\gamma,\, 2-\gamma)$ around $z = 1$, $Z_0$ elsewhere; the area under $p(z \mid H_0)$ over $Z_1$ is $P\{d_1 \mid H_0\} = 0.2$.]
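A short MATLAB sketch (my addition) that recomputes $\gamma$ and the resulting detection probability for this Neyman-Pearson test:

% Example 6: Neyman-Pearson threshold gamma and detection probability (sketch).
alpha0 = 0.2;
gamma  = 1 - alpha0^(1/3);                  % from (1-gamma)^3 = 0.2, about 0.415
p1     = @(z) (3/4)*z.*(2-z);               % p(z|H1) on [0,2]
PD     = integral(p1, gamma, 2-gamma);      % P{d1|H1} over Z1 = (gamma, 2-gamma)
fprintf('gamma = %.3f, P_D = %.3f\n', gamma, PD);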
MAP (maximum a posteriori criterion)
The MLD and Neyman-Pearson criteria are simple, but they can give bad decisions. If we have a priori
information, we would like to incorporate it to make a better decision. Going back to Example 1, suppose
that it is known that only 0.5% of the Korean population has HIV. Would you still strongly trust the test
when you test positive?
The probability that a person has HIV, given that his test result is positive, is
$$P\{\text{HIV} \mid \text{positive}\} = \frac{P\{\text{HIV}, \text{positive}\}}{P\{\text{positive}\}} = \frac{P\{\text{positive} \mid \text{HIV}\}\, P\{\text{HIV}\}}{P\{\text{positive} \mid \text{HIV}\}\, P\{\text{HIV}\} + P\{\text{positive} \mid \text{no HIV}\}\, P\{\text{no HIV}\}} = \frac{(0.95)(0.005)}{(0.95)(0.005) + (0.01)(0.995)} \approx 0.323.$$
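The same posterior computation in MATLAB, added here only as a check:

% Posterior probability of HIV given a positive test (Example 1 revisited).
pPos_HIV   = 0.95;    % P{positive | HIV}
pPos_noHIV = 0.01;    % P{positive | no HIV}
pHIV       = 0.005;   % a priori P{HIV}
pHIV_Pos = pPos_HIV*pHIV / (pPos_HIV*pHIV + pPos_noHIV*(1-pHIV));  % about 0.323
fprintf('P{HIV | positive} = %.3f\n', pHIV_Pos);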
So, we need a better decision criterion that can use a priori information, such as the 0.5% in the above
argument. The idea is that if either $H_0$ or $H_1$ is highly unlikely to be true, the MLD is not a good
criterion.
MAP (maximum a posteriori decision criterion)
Given an observation $z$, choose $H_0$ if $H_0$ is more likely than $H_1$:
$$\frac{P(H_1 \mid z)}{P(H_0 \mid z)} \underset{H_0}{\overset{H_1}{\gtrless}} 1$$
So,
$$\Lambda(z) \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P\{H_0\}}{P\{H_1\}} \qquad (*)$$
For Example 1, we consider the ratio,
$$\Lambda(z)\, \frac{P\{H_1\}}{P\{H_0\}} = \frac{p(z \mid H_1)\, P\{H_1\}}{p(z \mid H_0)\, P\{H_0\}}$$
$$\frac{P\{\text{HIV} \mid \text{positive}\}}{P\{\text{no HIV} \mid \text{positive}\}} = \frac{P\{\text{positive} \mid \text{HIV}\}\, P\{\text{HIV}\} \,/\, P\{\text{positive}\}}{P\{\text{positive} \mid \text{no HIV}\}\, P\{\text{no HIV}\} \,/\, P\{\text{positive}\}} = \frac{(0.95)(0.005)}{(0.01)(0.995)} \approx 0.477 < 1$$
So the MAP criterion decides that he does not have HIV.
Another good thing about the MAP criterion is that it minimizes the probability of error (of making an
incorrect decision).
Proof.
$$P_e = P\{d_1, H_0\} + P\{d_0, H_1\} = P\{d_1 \mid H_0\}\, P\{H_0\} + P\{d_0 \mid H_1\}\, P\{H_1\} = P\{H_1\} + \int_{Z_1} \left[ p(z \mid H_0)\, P\{H_0\} - p(z \mid H_1)\, P\{H_1\} \right] dz$$
To minimize $P_e$, put into $Z_1$ every $z$ for which $p(z \mid H_0) P\{H_0\} - p(z \mid H_1) P\{H_1\}$ is negative:
$$Z_1 = \left\{ z \mid p(z \mid H_0)\, P\{H_0\} - p(z \mid H_1)\, P\{H_1\} < 0 \right\}$$
i.e.
$$\Lambda(z) \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P\{H_0\}}{P\{H_1\}},$$
which is the same as (*).
Example 7
$$p(z \mid H_0) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{z^2}{2}\right), \qquad p(z \mid H_1) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(z-1)^2}{2}\right)$$
[Figure: $p(z \mid H_0)$ and $p(z \mid H_1)$ with the decision regions $Z_0$ and $Z_1$; the MAP boundary derived below is at $z \approx -0.6$.]
Choose $H_1$ when
$$\frac{p(z \mid H_1)}{p(z \mid H_0)} > \frac{P\{H_0\}}{P\{H_1\}}, \qquad P\{H_0\} = 0.25, \quad P\{H_1\} = 0.75.$$
$$\Lambda(z) = \exp\!\left( z - \tfrac{1}{2} \right) \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P\{H_0\}}{P\{H_1\}} = \frac{1}{3}$$
i.e.
$$z \underset{H_0}{\overset{H_1}{\gtrless}} \ln\frac{1}{3} + \frac{1}{2} \approx -0.6.$$
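As a sanity check (not in the original), a Monte Carlo sketch comparing the error probability of this MAP threshold with that of the MLD threshold $z \gtrless 1/2$; the sample size is an arbitrary choice.

% Example 7: error probabilities of the MAP rule (z > -0.6) vs the MLD rule (z > 0.5).
rng(0);
M  = 1e5;
h  = rand(M,1) < 0.75;              % true hypothesis: H1 with prior 0.75
z  = randn(M,1) + h;                % z = n under H0, z = 1 + n under H1
thrMAP = log(1/3) + 0.5;            % about -0.6
peMAP  = mean((z > thrMAP) ~= h);   % probability of error with the MAP threshold
peMLD  = mean((z > 0.5)    ~= h);   % probability of error with the MLD threshold
fprintf('Pe(MAP) = %.3f, Pe(MLD) = %.3f\n', peMAP, peMLD);  % about 0.22 vs 0.31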

Example 8 (single observation, multiple decision)


A closed region has $N$ animals.
(i) Catch $r$ animals, mark them, and release them.
(ii) After they have dispersed, catch $n$ animals and count the number $i$ of marked animals among them.
Let $x$ denote the number of marked animals.
$N$ is the unknown that we want to estimate (decide): a single observation, but a multiple decision. The probability $P_i(N)$ below is a function of the unknown parameter $N$.
$$P_i(N) \equiv P\{x = i\} = \frac{\binom{r}{i}\binom{N-r}{\,n-i\,}}{\binom{N}{n}}$$
Suppose that $r = 50$, $n = 40$, $i = 4$.
The MLD chooses the value $N$ that maximizes $P_i(N)$, the probability of the observed event ($i = 4$) when there are actually $N$ animals.
% MLD estimate of N for Example 8: evaluate P_i(N) over a range of N.
clear all
r = 50;     % animals marked and released
n = 40;     % animals caught the second time
i = 4;      % marked animals found among the n
for N = 50:1000
    rCi     = prod(r:-1:r-(i-1))/factorial(i);              % C(r,i)
    NmrCnmi = prod(N-r:-1:N-r-(n-i-1))/factorial(n-i);      % C(N-r,n-i)
    NCn     = prod(N:-1:N-(n-1))/factorial(n);               % C(N,n)
    Pi(N) = rCi*NmrCnmi/NCn;
end
plot(Pi);   % the maximum occurs near N = r*n/i = 500
[Figure: plot of $P_i(N)$ versus $N$ produced by the code above; $i$ is the observation.]
Decision vs. Estimation
In the decision problem, the number of hypotheses is finite or countably infinite (so the hypotheses form a discrete space), as in Example 8. In the estimation problem, the number of hypotheses is uncountably infinite.
The same physical problem may be formulated as either a decision problem or an estimation problem. Example 8, where we used the decision-problem setting, may be formulated as an estimation problem, which would give a solution such as N = 501.42.
e.g. locating a target on an image plane: the decision setting gives the target position as a pixel, such as $(5, 6)$; the estimation setting gives a subpixel-resolution position, such as $(4.98, 6.12)$.
Is there something more general than the MLD, Neyman-Pearson, or MAP criteria?
The Bayes risk criterion.
(Terminology: estimator corresponds to decision rule; estimate corresponds to decision.)
Assign a cost to each of the four possible situations and minimize the total average cost:
$c_{00}$ = cost of deciding $d_0$ when $H_0$ is true,
$c_{10}$ = cost of deciding $d_1$ when $H_0$ is true,
$c_{01}$ = cost of deciding $d_0$ when $H_1$ is true,
$c_{11}$ = cost of deciding $d_1$ when $H_1$ is true.
Total average cost:
$$B = E\{c_{ij}\} = c_{00}\, P\{d_0, H_0\} + c_{10}\, P\{d_1, H_0\} + c_{01}\, P\{d_0, H_1\} + c_{11}\, P\{d_1, H_1\}$$
$$= \underbrace{\left[ c_{00}\, P\{d_0 \mid H_0\} + c_{10}\, P\{d_1 \mid H_0\} \right]}_{b_0}\, P\{H_0\} + \underbrace{\left[ c_{01}\, P\{d_0 \mid H_1\} + c_{11}\, P\{d_1 \mid H_1\} \right]}_{b_1}\, P\{H_1\}$$
$$= c_{00}\, P\{H_0\} + c_{01}\, P\{H_1\} + \int_{Z_1} \left[ (c_{10} - c_{00})\, P\{H_0\}\, p(z \mid H_0) + (c_{11} - c_{01})\, P\{H_1\}\, p(z \mid H_1) \right] dz$$
($b_0$ and $b_1$ are the conditional costs, i.e., the average costs assuming that $H_0$ and $H_1$, respectively, are true.)
To minimize $B$, put into $Z_1$ every $z$ for which the bracketed term is negative, i.e., choose $H_1$ if
$$(c_{10} - c_{00})\, P\{H_0\}\, p(z \mid H_0) + (c_{11} - c_{01})\, P\{H_1\}\, p(z \mid H_1) < 0$$
or
$$\frac{p(z \mid H_1)\, P\{H_1\}}{p(z \mid H_0)\, P\{H_0\}} > \frac{c_{10} - c_{00}}{c_{01} - c_{11}},$$
assuming that $(c_{01} - c_{11}) > 0$.

Therefore, the Bayes decision rule is:
} {
} {
) (
1
0
1 1 0 1
0 0 1 0
0
1
H P
H P
c c
c c
z
H
H

<
>

(when
11 01
c c > )
Example 9
$$p(z \mid H_0) = \tfrac{1}{2} e^{-|z|}, \qquad p(z \mid H_1) = e^{-2|z|}$$
$$c_{00} = c_{11} = 0, \qquad c_{01} = 1 \ (\text{cost for a miss}), \qquad c_{10} = 2 \ (\text{cost for a false alarm})$$
(Zero cost for correct decisions, positive cost for incorrect decisions.)
$$P\{H_1\} = 0.75$$
$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} = 2 e^{-|z|} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{(2-0)(0.25)}{(1-0)(0.75)} = \frac{2}{3}$$
i.e.
$$-|z| \underset{H_0}{\overset{H_1}{\gtrless}} \ln\frac{1}{3} \qquad \text{or} \qquad |z| \underset{H_1}{\overset{H_0}{\gtrless}} \ln 3.$$
To summarize
MLD: $\Lambda(z) \underset{H_0}{\overset{H_1}{\gtrless}} 1$

Neyman-Pearson: $\Lambda(z) \underset{H_0}{\overset{H_1}{\gtrless}} \lambda$

MAP: $\Lambda(z) \underset{H_0}{\overset{H_1}{\gtrless}} \dfrac{P\{H_0\}}{P\{H_1\}}$

Bayes: $\Lambda(z) \underset{H_0}{\overset{H_1}{\gtrless}} \dfrac{c_{10} - c_{00}}{c_{01} - c_{11}} \cdot \dfrac{P\{H_0\}}{P\{H_1\}}$

Each of the above decision methods compares the likelihood ratio $\Lambda(z)$ to a different threshold.
(The MAP rule reduces to the MLD rule if $P\{H_1\} = P\{H_0\}$; the Bayes rule reduces to the MAP rule if $c_{00} = c_{11} = 0$ and $c_{10} = c_{01} = 1$.)
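To make the comparison concrete, here is a small illustrative MATLAB sketch (my addition); the priors are taken from Example 7, the costs from Example 9, and the evaluation point z = 0.2 is arbitrary.

% Generic binary likelihood-ratio test: compare Lambda(z) to a threshold eta (sketch).
% decide = 1 means "choose H1", decide = 0 means "choose H0".
lrt = @(Lambda, eta) double(Lambda > eta);
P0 = 0.25; P1 = 0.75;                      % priors from Example 7
c00 = 0; c11 = 0; c01 = 1; c10 = 2;        % costs from Example 9
etaMLD   = 1;
etaMAP   = P0/P1;                          % 1/3
etaBayes = (c10 - c00)/(c01 - c11)*P0/P1;  % 2/3
% (The Neyman-Pearson threshold is instead fixed by the false-alarm constraint.)
Lambda = exp(0.2 - 0.5);                   % Lambda(z) at z = 0.2 for Example 7
fprintf('MLD: %d, MAP: %d, Bayes: %d\n', ...
        lrt(Lambda,etaMLD), lrt(Lambda,etaMAP), lrt(Lambda,etaBayes));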
Min-Max criterion
Can we use the Bayes method even if we do not know $P\{H_0\}$ (or $P\{H_1\}$)?
The min-max criterion uses the Bayes decision rule with the least favorable $P\{H_0\}$ (a conservative approach).
Note that the average cost $B$ derived above is a function of $P\{H_0\}$ and $Z_1$:
$$B(P\{H_0\}, Z_1) = \underbrace{\left[ c_{00} + (c_{10} - c_{00})\, P\{d_1 \mid H_0\} \right]}_{b_0(Z_1)}\, P\{H_0\} + \underbrace{\left[ c_{01} + (c_{11} - c_{01})\, P\{d_1 \mid H_1\} \right]}_{b_1(Z_1)}\, P\{H_1\}$$
So, writing $P_0 \equiv P\{H_0\}$,
$$B(P_0, Z_1) = b_0(Z_1)\, P_0 + b_1(Z_1)(1 - P_0).$$
The min-max decision region $Z_1^*$ is defined as the $Z_1$ that minimizes $\max_{P_0} B(P_0, Z_1)$.
So
$$\max_{P_0} B(P_0, Z_1^*) = \min_{Z_1} \max_{P_0} B(P_0, Z_1) \le \max_{P_0} B(P_0, Z_1) \quad \text{for any } Z_1.$$
($\min_{Z_1} B(P_0, Z_1)$ is the minimum (Bayes) cost associated with the a priori probability $P_0$.)
Example 10
$$p(z \mid H_0) = e^{-z}, \quad z > 0; \qquad p(z \mid H_1) = 2 e^{-2z}, \quad z > 0$$
$$c_{00} = c_{11} = 0, \qquad c_{01} = 2, \qquad c_{10} = 1$$
[Figure: $p(z \mid H_0) = e^{-z}$ and $p(z \mid H_1) = 2e^{-2z}$ for $0 \le z \le 2$; the curves cross at $z = \ln 2$, with $H_1$ favored below the threshold and $H_0$ above it.]
$$\Lambda(z) = \frac{2 e^{-2z}}{e^{-z}} = 2 e^{-z} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{c_{10} - c_{00}}{c_{01} - c_{11}} \cdot \frac{P_0}{1 - P_0} = \frac{1}{2} \cdot \frac{P_0}{1 - P_0}$$
or
$$z \underset{H_1}{\overset{H_0}{\gtrless}} \ln\frac{4(1 - P_0)}{P_0}.$$
The decision threshold is a function of the unknown $P_0$, and so is the resulting Bayes cost: (i) express the Bayes cost as a function of $P_0$, then (ii) find the least favorable $P_0$ by maximizing it. (Note that the threshold is negative, so that $Z_1$ is empty since $z > 0$, if $P_0 > 0.8$.)
$$B(P_0) = \left[ 0 + (1-0)\, P\{d_1 \mid H_0\} \right] P_0 + \left[ 2 + (0-2)\, P\{d_1 \mid H_1\} \right](1 - P_0)$$
$$= P_0 \left( 1 - e^{-\ln\frac{4(1-P_0)}{P_0}} \right) + 2(1 - P_0)\, e^{-2 \ln\frac{4(1-P_0)}{P_0}} = \frac{8 P_0 - 9 P_0^2}{8 (1 - P_0)}$$
$$\frac{dB}{dP_0} = 0 \;\Rightarrow\; P_0 = \frac{2}{3}$$
So the min-max rule is
$$z \underset{H_1}{\overset{H_0}{\gtrless}} \ln\frac{4\left(1 - \tfrac{2}{3}\right)}{\tfrac{2}{3}} = \ln 2 \approx 0.69.$$
Check:
$$P\{d_1 \mid H_0\} = \int_0^{\ln 2} e^{-z}\,dz = 1 - e^{-\ln 2} = \frac{1}{2}, \qquad P\{d_1 \mid H_1\} = \int_0^{\ln 2} 2 e^{-2z}\,dz = \frac{3}{4},$$
so $b_0(Z_1^*) = \tfrac{1}{2}$ and $b_1(Z_1^*) = 2\left(1 - \tfrac{3}{4}\right) = \tfrac{1}{2}$: the two conditional costs are equal.
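A MATLAB sketch (my addition), analogous to the code given below for the second cost assignment, that plots B(P_0) = (8 P_0 - 9 P_0^2)/(8(1 - P_0)) and locates its maximum:

% Example 10: Bayes cost as a function of P0 and its maximum (min-max point).
p0 = 0.01:0.01:0.79;                  % the Bayes region Z1 is nonempty for P0 < 0.8
B  = (8*p0 - 9*p0.^2)./(8*(1 - p0));  % minimum (Bayes) cost for each P0
plot(p0, B), grid
[Bmax, k] = max(B);
fprintf('max B = %.3f at P0 = %.2f\n', Bmax, p0(k));  % about 0.5 at P0 = 2/3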
Example 10
Same densities as above, but now with costs
$$c_{00} = c_{11} = 0, \qquad c_{01} = c_{10} = 1.$$
$$\Lambda(z) = 2 e^{-z} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P_0}{1 - P_0}$$
or
$$z \underset{H_1}{\overset{H_0}{\gtrless}} \ln\frac{2(1 - P_0)}{P_0}.$$
$$B(P_0) = \left[ 0 + 1 \cdot P\{d_1 \mid H_0\} \right] P_0 + \left[ 1 \cdot \left(1 - P\{d_1 \mid H_1\}\right) + 0 \right](1 - P_0)$$
$$= P_0 \int_0^{\ln\frac{2(1-P_0)}{P_0}} e^{-z}\,dz + (1 - P_0)\left( 1 - \int_0^{\ln\frac{2(1-P_0)}{P_0}} 2 e^{-2z}\,dz \right)$$
$$= P_0\left( 1 - \frac{P_0}{2(1-P_0)} \right) + (1 - P_0)\,\frac{P_0^2}{4(1-P_0)^2} = \frac{4 P_0 - 5 P_0^2}{4(1 - P_0)}$$
$$\frac{dB}{dP_0} = 0 \;\Rightarrow\; 5 P_0^2 - 10 P_0 + 4 = 0 \;\Rightarrow\; P_0 = \frac{10 - \sqrt{100 - 80}}{10} \approx 0.553$$
(the other root, $\approx 1.447$, is rejected).
(The threshold is negative, so that $Z_1$ is empty, if $P_0 > \tfrac{2}{3}$.)
B=[];
for p0=0:0.01:1
    if p0<=2/3                              % Bayes region Z1 is nonempty only for p0 <= 2/3
        B=[B,(4*p0-5*p0^2)/(4*(1-p0))];     % Bayes cost B(p0)
    end
end
plot((0:max(size(B))-1)*0.01,B)             % maximum near p0 = 0.553
grid
Claim
Suppose that there exists $Z_1^*$ such that $b_0(Z_1^*) = b_1(Z_1^*)$, and $Z_1^*$ is a Bayes decision region for some $P_0^*$ $(= P\{H_0\})$. Then $Z_1^*$ is the min-max decision region.
Proof
Suppose that $Z_1^*$ satisfies $b_0(Z_1^*) = b_1(Z_1^*)$ but is not the min-max decision region. Then there exists $Z_1'$ such that
$$\max_{P_0} B(P_0, Z_1') < \max_{P_0} B(P_0, Z_1^*).$$
Since $b_0(Z_1^*) = b_1(Z_1^*)$, we have $B(P_0, Z_1^*) = b_0(Z_1^*)\, P_0 + b_1(Z_1^*)(1 - P_0) = b_0(Z_1^*)$, independent of $P_0$. So
$$\max_{P_0} B(P_0, Z_1') < b_0(Z_1^*) = b_1(Z_1^*).$$
So, for all $P_0$,
$$B(P_0, Z_1') \le \max_{P_0} B(P_0, Z_1') < b_0(Z_1^*) = B(P_0, Z_1^*).$$
But $Z_1^*$ is a Bayes decision region for some $P_0^*$, and therefore $B(P_0^*, Z_1^*) \le B(P_0^*, Z_1')$, a contradiction. So $Z_1^*$ is the min-max decision region.
Graphically, recall from the expression for $B$ above that
$$B(P_0, Z_1) = b_0(Z_1)\, P_0 + b_1(Z_1)(1 - P_0). \qquad (*)$$
[Figure: $B(P_0, Z_1)$ versus $P_0$ for several fixed regions $Z_1^{(1)}, Z_1^{(2)}, Z_1^{(3)}$; by (*) each is a straight line in $P_0$. The minimum-cost (Bayes) curve is tangent to all such lines (why?). For each $P_0^{(k)}$ the tangent line gives the minimum cost, i.e., $Z_1^{(k)}$ is the optimal Bayes decision region for $P_0^{(k)}$; the maximum of the minimum-cost curve is the min-max point.]
So from (*), at the maximum of the minimum-cost curve,
$$0 = \frac{dB(P_0, Z_1^*)}{dP_0} = b_0(Z_1^*) - b_1(Z_1^*), \qquad \text{i.e.,} \qquad b_0(Z_1^*) = b_1(Z_1^*).$$
Connection to game theory (G. Strang)
[Figure: a dealer gives two cards to player x and two cards to player y; each player holds a $20 card and a $10 card.]
Player x and player y show one of their two cards simultaneously.
If player y matches the card of player x (both show $20, or both show $10), then player y gets $10 from player x.
If player y does not match the card of player x (one shows $20 and the other shows $10), then player x gets $20 (if player x showed $20) or $10 (if player x showed $10) from player y.
Some thought: players x and y must make decisions that do not follow a regular pattern, and each decision must be independent of the previous decisions. Otherwise the opponent would try to take advantage of it.
x chooses $20 with probability $P_{x,20}$, and $10 with probability $1 - P_{x,20}$;
y chooses $20 with probability $P_{y,20}$, and $10 with probability $1 - P_{y,20}$.
We want to find the optimal $P_{x,20}$ and $P_{y,20}$ (the equilibrium point).
Suppose that x and y each choose a card with equal probability, i.e., $P_{x,20} = P_{y,20} = \tfrac{1}{2}$. Then the average cost of y is
$$20 \cdot \tfrac{1}{4} + 10 \cdot \tfrac{1}{4} - 10 \cdot \tfrac{1}{2} = \$2.5.$$
Player y does not know what card player x would show.
($P_{x,20}$ is the analogue of $P_0$ in the previous examples.) But player y wishes to minimize the average cost by choosing $P_{y,20}$.
The cost matrix for y (the cost for y is the earning for x; its entries play the role of the costs $c_{ij}$):

                 x shows $10    x shows $20
  y shows $10        -10             20
  y shows $20         10            -10
y's strategy is to minimize the average cost.
If y shows the $20 card with probability $P_{y,20}$, y's average cost, depending on x's choice, is
$$(1 - P_{y,20})\,[-10,\; 20] + P_{y,20}\,[10,\; -10] = [\,a,\; b\,] = [\,-10 + 20 P_{y,20},\;\; 20 - 30 P_{y,20}\,],$$
where $a$ applies when x shows $10 and $b$ when x shows $20. Setting $a = b$,
$$-10 + 20 P_{y,20} = 20 - 30 P_{y,20} \;\Rightarrow\; P_{y,20} = \frac{3}{5}.$$
So, y should show the $20 card at the rate of $\tfrac{3}{5}$ and the $10 card at the rate of $\tfrac{2}{5}$.
What is the cost for y with this strategy?
$$[\,-10 + 20 \cdot \tfrac{3}{5},\;\; 20 - 30 \cdot \tfrac{3}{5}\,] = [\,2,\; 2\,]$$
The average cost is $2 whichever card x shows, which is less than $2.5. So y minimizes his maximum cost.
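A small MATLAB sketch (my addition) that finds y's equalizing strategy directly from the cost matrix above:

% y's mixed strategy that equalizes the average cost over x's two choices (sketch).
C = [-10 20;      % row 1: y shows $10 (columns: x shows $10, x shows $20)
      10 -10];    % row 2: y shows $20
% Average cost vector if y plays [$10 $20] with probabilities [1-p, p]:
% (1-p)*C(1,:) + p*C(2,:).  Equalize the two entries and solve for p.
d = C(2,:) - C(1,:);
p = (C(1,2) - C(1,1)) / (d(1) - d(2));       % P_{y,20} = 3/5
cost = (1-p)*C(1,:) + p*C(2,:);              % [2 2]: $2 whatever x does
fprintf('P_y20 = %.2f, average cost = %.2f\n', p, cost(1));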
[Figure: y's average cost versus $P_{y,20}$, one line for "x chooses $10" and one for "x chooses $20"; their crossing at $P_{y,20} = 3/5$ is the optimal point (equilibrium point, saddle point) of this zero-sum game and gives the optimal cost for y; x tries to stay on this line. Other examples where the same min-max idea appears: an unknown noise covariance matrix, our parameter estimate.]