
4. Bayesian Inference, Part IV: Least Mean Squares (LMS) estimation

ECE 302, Fall 2009, TR 3-4:15pm
Purdue University, School of ECE
Prof. Ilya Pollak


An Estimate vs an Estimator

•  Suppose X and Y are random variables.
•  Y is observed, X is not observed.
•  An estimator of X based on Y is any function g(Y).
•  Since g(Y) is a function of a r.v., it is a r.v. itself.
•  An estimate of X based on Y=y is a number, the value of an estimator as determined by the observed value y of Y.
•  E.g., if g(Y) is an estimator of X, then g(y) is an estimate of X for any specific observation y.


Pros and Cons of MAP Estimators

•  Minimize the error probability.
•  Not appropriate when different errors have different costs (as in, e.g., the burglar alarm and spam filtering examples).
•  For some posterior distributions, may result in large probabilities of large errors.

An example when the MAP estimate is bad

Suppose fX|Y(x|0) is a piecewise-triangular density: a tall triangle of height 10 over [0, 0.01] and a short triangle over [1, 1.38]. (Figure: density vs. x, with tick marks at 0, 0.01, 1, and 1.38.)

•  MAP: the max of the tall left triangle, x* = 0.005.
•  Left triangle: 0.05 of the conditional probability mass.
•  MAP makes errors of ≥ 0.995 with cond. prob. 0.95.
•  Another estimate: conditional mean E[X|Y=0] = 0.05·0.005 + 0.95·1.19 ≈ 1.13 (a numerical check follows below).
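
Not from the slides: a minimal numerical sketch of the MAP-vs-LMS comparison above; it assumes each triangle of the density is symmetric about its midpoint, which is consistent with the numbers quoted on the slide.

# Minimal check of the MAP vs. LMS comparison, assuming symmetric triangles.
import numpy as np

x = np.linspace(0.0, 1.5, 300001)
dx = x[1] - x[0]

def triangle(x, a, b, height):
    # Symmetric triangular bump on [a, b] with peak `height` at the midpoint.
    mid = 0.5 * (a + b)
    f = np.zeros_like(x)
    up = (x >= a) & (x <= mid)
    down = (x > mid) & (x <= b)
    f[up] = height * (x[up] - a) / (mid - a)
    f[down] = height * (b - x[down]) / (b - mid)
    return f

# Tall triangle on [0, 0.01] with height 10 (mass 0.05) plus a short triangle
# on [1, 1.38] carrying the remaining mass 0.95 (height 5).
f = triangle(x, 0.0, 0.01, 10.0) + triangle(x, 1.0, 1.38, 5.0)

x_map = x[np.argmax(f)]        # MAP estimate: peak of the density, ~0.005
x_lms = np.sum(x * f) * dx     # LMS estimate: conditional mean, ~1.13
print(x_map, x_lms)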

Mean Squared Error (MSE) for an Estimator g(Y) of X

The MSE of g(Y) is E[(X − g(Y))²].

•  The expectation is taken with respect to the joint distribution of X and Y.
•  This is the expectation of the squared difference between the unobserved random variable X and its estimator based on the observed random variable Y.
•  Makes large errors more costly than small ones.

Conditional Mean Squared Error for an Estimate g(y) of X, given Y=y

The conditional MSE of g(y) is E[(X − g(y))² | Y=y].

•  The expectation is taken with respect to the conditional distribution of X given Y=y.
•  This is the conditional expectation, given Y=y, of the squared difference between the unobserved random variable X and its estimate based on the specific observation y of the random variable Y.

Least Mean Squares (LMS) Estimator

•  The LMS estimator of X based on Y is the estimator that minimizes the MSE among all estimators of X based on Y.
•  If g(Y) is the LMS estimator, then, for any observation y of Y, the number g(y) minimizes the conditional MSE among all estimates of X.
•  The LMS estimator of X based on Y is E[X|Y].
•  For any observation y of Y, the LMS estimate of X is the mean of the posterior density, i.e., E[X|Y=y].


LMS estimate with no data

•  Suppose X is a r.v. with a known mean.
•  We observe nothing.
•  We would like an estimate x* of X which minimizes the MSE, E[(X−x*)²].
•  E[(X−x*)²] = var(X−x*) + (E[X−x*])² = var(X) + (E[X] − x*)²
•  This is a quadratic function of x*, minimized by x* = E[X] (a numerical check follows below).

(Figure: the MSE as a function of x* is a parabola with minimum value var(X) attained at x* = E[X].)

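Not from the slides: a minimal Monte Carlo sketch checking that x* = E[X] minimizes E[(X − x*)²]; the exponential distribution is just an arbitrary example with a known mean.

# Empirical check that the MSE is minimized at x* = E[X].
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=200_000)    # example r.v.: E[X] = 2, var(X) = 4

candidates = np.linspace(0.0, 4.0, 401)
mse = np.array([np.mean((X - c) ** 2) for c in candidates])

print(candidates[np.argmin(mse)], X.mean())     # both close to 2
print(mse.min())                                # close to var(X) = 4
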
LMS estimate from one observation

•  Suppose X, Y are r.v.'s, and E[X|Y=y] is known.
•  We observe Y=y.
•  We would like an estimate x* of X which minimizes the posterior MSE, E[(X−x*)² | Y=y].
•  E[(X−x*)² | Y=y] = var(X−x* | Y=y) + (E[X−x* | Y=y])² = var(X|Y=y) + (E[X|Y=y] − x*)²
•  Quadratic function of x*, minimized by x* = E[X|Y=y].

(Figure: the posterior MSE as a function of x* is a parabola with minimum value var(X|Y=y) attained at x* = E[X|Y=y].)

LMS estimator

•  Suppose X, Y are r.v.'s, and E[X|Y=y] is known for all y.
•  Let g(y) = E[X|Y=y] be the LMS estimate of X based on data Y=y.
•  Then g(Y) = E[X|Y] is an estimator of X based on Y.
•  Let h(Y) be any other estimator of X based on Y.
•  We just showed that, for every value of y, we have:
   E[(X−g(y))² | Y=y] ≤ E[(X−h(y))² | Y=y]
•  Therefore,
   E[(X−g(Y))² | Y] ≤ E[(X−h(Y))² | Y]
•  Take expectations of both sides and use the iterated expectation formula:
   E[(X−g(Y))²] ≤ E[(X−h(Y))²]
•  Therefore, the conditional expectation estimator achieves an MSE which is no larger than the MSE of any other estimator (a Monte Carlo illustration follows below).
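
Not from the slides: a Monte Carlo illustration of this conclusion for one illustrative model, X ~ N(0,1) and Y = X + W with W ~ N(0,1) independent; for that model E[X|Y] = Y/2, and the comparison estimator h(Y) = Y is an arbitrary choice.

# Compare the MSE of the conditional-mean estimator to another estimator.
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
X = rng.standard_normal(n)
W = rng.standard_normal(n)
Y = X + W

g = Y / 2.0    # E[X|Y] for this particular jointly Gaussian model
h = Y          # some other estimator of X based on Y

print(np.mean((X - g) ** 2))   # ~0.5
print(np.mean((X - h) ** 2))   # ~1.0, larger, as the argument above predicts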

Other names that the LMS estimate goes by in the literature

•  Least-squares estimate (LSE)
•  Bayes least-squares estimate (BLSE)
•  Minimum mean square error estimate (MMSE)

Ex. 8.3: estimating the mean of a Gaussian r.v.

•  Observe Yi = X + Wi for i = 1, …, n.
•  X, W1, …, Wn are independent Gaussian, with known means and variances.
•  Wi has mean 0 and variance σi².
•  X has mean x0 and variance σ0².
•  Previously, we found the MAP estimate of X based on observing Y = (Y1, …, Yn).
•  Now let's find the LMS estimate of X based on observing Y = (Y1, …, Yn).

Prior model and measurement model

Prior model (X is Gaussian with mean x0 and variance σ0²):

f_X(x) = \frac{1}{\sigma_0 \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x - x_0}{\sigma_0} \right)^2 \right)

Measurement model:

f_{Y|X}(y \mid x) = \prod_{i=1}^{n} \frac{1}{\sigma_i \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{y_i - x}{\sigma_i} \right)^2 \right)
 = (2\pi)^{-n/2} \left( \prod_{i=1}^{n} \frac{1}{\sigma_i} \right) \exp\left( -\sum_{i=1}^{n} \frac{1}{2} \left( \frac{y_i - x}{\sigma_i} \right)^2 \right)

Posterior PDF

f_{X|Y}(x \mid y) = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{f_Y(y)}

 = \frac{1}{f_Y(y)} (2\pi)^{-n/2} \left( \prod_{i=1}^{n} \frac{1}{\sigma_i} \right) \exp\left( -\sum_{i=1}^{n} \frac{1}{2} \left( \frac{y_i - x}{\sigma_i} \right)^2 \right) \frac{1}{\sigma_0 \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x - x_0}{\sigma_0} \right)^2 \right)

 = C \exp\left( -\sum_{i=0}^{n} \frac{1}{2} \left( \frac{y_i - x}{\sigma_i} \right)^2 \right)

(Denoting y_0 = x_0 and C = the factor that's not a function of x.)

Further massaging the posterior PDF…

f_{X|Y}(x \mid y) = C \exp\left( -\sum_{i=0}^{n} \frac{1}{2} \left( \frac{y_i - x}{\sigma_i} \right)^2 \right) = C \exp\left( -\frac{1}{2} \left[ x^2 \sum_{i=0}^{n} \frac{1}{\sigma_i^2} - 2x \sum_{i=0}^{n} \frac{y_i}{\sigma_i^2} + \sum_{i=0}^{n} \frac{y_i^2}{\sigma_i^2} \right] \right)

Denoting v = \frac{1}{\sum_{i=0}^{n} 1/\sigma_i^2} and m = \frac{\sum_{i=0}^{n} y_i/\sigma_i^2}{\sum_{i=0}^{n} 1/\sigma_i^2}, we have:

f_{X|Y}(x \mid y) = C \exp\left( -\frac{1}{2v} \left[ x^2 - 2xm + m^2 \right] + \frac{m^2}{2v} - \frac{1}{2} \sum_{i=0}^{n} \frac{y_i^2}{\sigma_i^2} \right) = C_1 \exp\left( -\frac{(x - m)^2}{2v} \right)

This is a Gaussian PDF with mean m and variance v, and therefore we immediately have: C_1 = \frac{1}{\sqrt{2\pi v}}.

Posterior PDF and LMS estimate

f_{X|Y}(x \mid y) = \frac{1}{\sqrt{2\pi v}} \exp\left( -\frac{(x - m)^2}{2v} \right), where v = \frac{1}{\sum_{i=0}^{n} 1/\sigma_i^2} and m = \frac{\sum_{i=0}^{n} y_i/\sigma_i^2}{\sum_{i=0}^{n} 1/\sigma_i^2}.

Therefore, the LMS estimate is E[X \mid Y = y] = m = \frac{\sum_{i=0}^{n} y_i/\sigma_i^2}{\sum_{i=0}^{n} 1/\sigma_i^2},

and the LMS estimator is E[X \mid Y] = \frac{\sum_{i=0}^{n} Y_i/\sigma_i^2}{\sum_{i=0}^{n} 1/\sigma_i^2} (with Y_0 = x_0).

Note that, in this example, LMS = MAP, because:
•  the posterior density is Gaussian
•  for a Gaussian density, the maximum is at the mean
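
Not from the slides: a small sketch of computing the Example 8.3 estimate. The posterior mean m is a precision-weighted average of the prior mean x0 (playing the role of y0) and the observations; the helper name and the example numbers below are made up for illustration.

# Compute m = (sum_i y_i/sigma_i^2) / (sum_i 1/sigma_i^2), with y_0 = x0.
import numpy as np

def lms_estimate(y, sigma, x0, sigma0):
    # Returns the posterior mean m and posterior variance v.
    y_all = np.concatenate(([x0], np.asarray(y, dtype=float)))
    s_all = np.concatenate(([sigma0], np.asarray(sigma, dtype=float)))
    w = 1.0 / s_all ** 2
    m = np.sum(w * y_all) / np.sum(w)
    v = 1.0 / np.sum(w)
    return m, v

# Example use: prior N(0, 2^2) and three noisy observations.
m, v = lms_estimate(y=[1.2, 0.8, 1.0], sigma=[1.0, 1.0, 2.0], x0=0.0, sigma0=2.0)
print(m, v)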

Ex. 8.11

•  X = continuous uniform over [4, 10].
•  W = continuous uniform over [−1, 1].
•  X and W are independent.
•  Y = X + W.
•  Calculate the LMS estimate of X based on Y=y.
•  Calculate the associated conditional MSE.
•  fY|X(y|x) is uniform over [x−1, x+1], so

f_{X,Y}(x, y) = f_X(x)\, f_{Y|X}(y \mid x) = \begin{cases} 1/12, & \text{if } 4 \le x \le 10 \text{ and } x - 1 \le y \le x + 1 \\ 0, & \text{otherwise} \end{cases}

Ex. 8.11: support of the joint PDF of X and Y

f_{X,Y}(x, y) = f_X(x)\, f_{Y|X}(y \mid x) = \begin{cases} 1/12, & \text{if } 4 \le x \le 10 \text{ and } x - 1 \le y \le x + 1 \\ 0, & \text{otherwise} \end{cases}

The joint PDF is supported (i.e., is nonzero) inside the parallelogram in the (y, x) plane bounded by the lines x − 1 = y, x + 1 = y, x = 4, and x = 10. (Figure: the parallelogram has corners at y = 3, 5, 9, 11.)

Ex. 8.11: the posterior

f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}

fX,Y is uniform over the parallelogram. Therefore, for any fixed y=y0, the posterior is uniform over the intersection of the line y=y0 with the parallelogram.

fX|Y(x|y) is uniform over:
•  [4, y+1] for 3 ≤ y ≤ 5
•  [y−1, y+1] for 5 ≤ y ≤ 9
•  [y−1, 10] for 9 ≤ y ≤ 11

Ex. 8.11: the LMS estimate

Therefore, E[X|Y=y] =
•  (y+5)/2 for 3 ≤ y ≤ 5
•  y for 5 ≤ y ≤ 9
•  (y+9)/2 for 9 ≤ y ≤ 11

Ex. 8.11: the conditional MSE

E[(X−E[X|Y=y])² | Y=y] = var(X|Y=y) =
•  (y−3)²/12 for 3 ≤ y ≤ 5
•  1/3 for 5 ≤ y ≤ 9
•  (11−y)²/12 for 9 ≤ y ≤ 11

(Figure: the conditional MSE as a function of y rises from 0 at y = 3 to 1/3 at y = 5, equals 1/3 for 5 ≤ y ≤ 9, and falls back to 0 at y = 11. A Monte Carlo check follows below.)
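
Not from the slides: a Monte Carlo sketch that checks the piecewise formulas above for E[X|Y=y] and var(X|Y=y) by crudely conditioning on a narrow window around a few values of y.

# Check the Example 8.11 conditional mean and conditional MSE by simulation.
import numpy as np

rng = np.random.default_rng(2)
n = 2_000_000
X = rng.uniform(4.0, 10.0, n)
W = rng.uniform(-1.0, 1.0, n)
Y = X + W

for y0 in (4.0, 7.0, 10.5):
    sel = np.abs(Y - y0) < 0.01      # keep samples with Y within 0.01 of y0
    print(y0, X[sel].mean(), X[sel].var())

# Expected, up to Monte Carlo error:
#   y0 = 4.0  -> mean (y+5)/2 = 4.5,  var (y-3)^2/12 = 1/12
#   y0 = 7.0  -> mean y = 7.0,        var 1/3
#   y0 = 10.5 -> mean (y+9)/2 = 9.75, var (11-y)^2/12 ~ 0.021
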
Estimation error for LMS

We denote the LMS estimator of X based on Y by \hat{X}_{LMS}:
\hat{X}_{LMS} = E[X \mid Y]

The associated estimation error is denoted by \tilde{X}_{LMS} and is defined as:
\tilde{X}_{LMS} = \hat{X}_{LMS} - X = E[X \mid Y] - X

Properties of the estimation error for LMS

The estimation error has zero mean:
E\left[ \tilde{X}_{LMS} \right] = E\left[ E[X \mid Y] - X \right] = E\left[ E[X \mid Y] \right] - E[X] = 0

The estimation error is uncorrelated with \hat{X}_{LMS}:
E\left[ \hat{X}_{LMS} \tilde{X}_{LMS} \right] = 0

Therefore,
var(X) = var\left( \hat{X}_{LMS} - (\hat{X}_{LMS} - X) \right) = var\left( \hat{X}_{LMS} - \tilde{X}_{LMS} \right) = var\left( \hat{X}_{LMS} \right) + var\left( -\tilde{X}_{LMS} \right) = var\left( \hat{X}_{LMS} \right) + var\left( \tilde{X}_{LMS} \right)
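
The uncorrelatedness property is used in the last step but stated without proof; here is a short derivation sketch (not from the slides), using the iterated expectation formula and the fact that \hat{X}_{LMS} is a function of Y and can therefore be pulled out of the inner conditional expectation:

E\left[ \hat{X}_{LMS} \tilde{X}_{LMS} \right]
 = E\left[ E\left[ \hat{X}_{LMS} \tilde{X}_{LMS} \mid Y \right] \right]
 = E\left[ \hat{X}_{LMS}\, E\left[ \tilde{X}_{LMS} \mid Y \right] \right]
 = E\left[ \hat{X}_{LMS} \left( E[X \mid Y] - E[X \mid Y] \right) \right]
 = 0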
