Bayesian Inference, Part IV: Least Mean Squares (LMS) Estimation
ECE 302, Fall 2009, TR 3–4:15pm
Purdue University, School of ECE
Prof. Ilya Pollak
An Estimate vs an Estimator
• Suppose X and Y are random variables.
• Y is observed, X is not observed.
• An estimator of X based on Y is any function g(Y).
• Since g(Y) is a function of a r.v., it is a r.v. itself.
• An estimate of X based on Y=y is a number: the value of an estimator as determined by the observed value y of Y.
• E.g., if g(Y) is an estimator of X, then g(y) is an estimate of X for any specific observation y.
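A minimal sketch of this distinction in Python, under an assumed toy model (X ~ N(0,1), Y = X + W with W ~ N(0,1), for which the conditional mean works out to E[X | Y = y] = y/2). The estimator is a function; an estimate is that function evaluated at one observed value:

import numpy as np

rng = np.random.default_rng(0)

def g(y):
    """An estimator of X based on Y: just a function of the observation."""
    return y / 2.0

# The estimator g(Y) is a random variable: apply g to random draws of Y.
x = rng.normal(size=5)
y = x + rng.normal(size=5)
print("estimator values g(Y):", g(y))   # five random numbers

# An estimate is a single number: g evaluated at one observed value y.
print("estimate g(0.8):", g(0.8))       # 0.4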
Pros and Cons of MAP Estimators
• Minimize the error probability.
• Not appropriate when different errors have different costs (as in, e.g., the burglar alarm and spam filtering examples).
• For some posterior distributions, may result in large probabilities of large errors.
An example when the MAP estimate is bad
Suppose fX|Y(x|0) is:
[Figure: a tall, narrow triangle of height 10 with base [0, 0.01], and a shorter, wider triangle with base [1, 1.38].]
• MAP: the max of the tall left triangle, x* = 0.005.
• Left triangle: 0.05 of the conditional probability mass.
• MAP makes errors of ≥ 0.995 with cond. prob. 0.95.
• Another estimate: conditional mean E[X|Y=0] = 0.05·0.005 + 0.95·1.19 ≈ 1.13.
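A quick Monte Carlo sketch of this comparison. It assumes both triangles are symmetric (peaks at 0.005 and 1.19), which is consistent with the numbers on the slide but not stated explicitly:

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Sample from the two-triangle posterior: mass 0.05 on triangular(0, 0.005, 0.01),
# mass 0.95 on triangular(1, 1.19, 1.38).
from_left = rng.random(n) < 0.05
x = np.where(from_left,
             rng.triangular(0.0, 0.005, 0.01, n),
             rng.triangular(1.0, 1.19, 1.38, n))

x_map = 0.005                 # MAP estimate
x_lms = x.mean()              # conditional mean, ≈ 1.13

print("E[X|Y=0] ≈", x_lms)
print("MSE of MAP estimate:", np.mean((x - x_map) ** 2))   # large
print("MSE of cond. mean:  ", np.mean((x - x_lms) ** 2))   # much smaller
print("P(|X - MAP| ≥ 0.995):", np.mean(np.abs(x - x_map) >= 0.995))  # ≈ 0.95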
Mean Squared Error (MSE) for an Estimator g(Y) of X

MSE = E[(X − g(Y))²]

• The expectation is taken with respect to the joint distribution of X and Y.
• This is the expectation of the squared difference between the unobserved random variable X and its estimator based on the observed random variable Y.
• Makes large errors more costly than small ones.
Conditional Mean Squared Error for an Estimate g(y) of X, given Y=y

Conditional MSE = E[(X − g(y))² | Y = y]

• The expectation is taken with respect to the conditional distribution of X given Y=y.
• This is the conditional expectation, given Y=y, of the squared difference between the unobserved random variable X and its estimate based on the specific observation y of the random variable Y.
Least Mean Squares (LMS) Estimator
• The LMS estimator for X based on Y is the estimator that minimizes the MSE among all estimators of X based on Y.
• If g(Y) is the LMS estimator, then, for any observation y of Y, the number g(y) minimizes the conditional MSE among all estimates of X.
• The LMS estimator of X based on Y is E[X|Y].
• For any observation y of Y, the LMS estimate of X is the mean of the posterior density, i.e., E[X|Y=y].
LMS estimate with no data
• Suppose X is a r.v. with a known mean.
• We observe nothing.
• We would like an estimate x* of X which minimizes the MSE, E[(X−x*)²].
• E[(X−x*)²] = var(X−x*) + (E[X−x*])² = var(X) + (E[X] − x*)²
• This is a quadratic function of x*, minimized by x*=E[X].
[Figure: MSE as a function of x* — a parabola with minimum value var(X), attained at x* = E[X].]
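A numerical illustration of this parabola, with a distribution assumed purely for illustration (X exponential with rate 1, so E[X] = var(X) = 1):

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(1.0, size=1_000_000)

# Empirical MSE as a function of the constant estimate x*.
for c in np.linspace(0.0, 2.0, 9):
    print(f"x* = {c:.2f}   MSE ≈ {np.mean((x - c) ** 2):.4f}")

# The minimum is at x* = E[X] ≈ 1, where the MSE ≈ var(X) ≈ 1,
# matching var(X) + (E[X] − x*)².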
LMS estimate from one observation
• Suppose X, Y are r.v.'s, and E[X|Y=y] is known.
• We observe Y=y.
• We would like an estimate x* of X which minimizes the posterior MSE, E[(X−x*)²|Y=y].
• E[(X−x*)²|Y=y] = var(X−x*|Y=y) + (E[X−x*|Y=y])² = var(X|Y=y) + (E[X|Y=y] − x*)²
• Quadratic function of x*, minimized by x*=E[X|Y=y].
[Figure: posterior MSE as a function of x* — a parabola with minimum value var(X|Y=y), attained at x* = E[X|Y=y].]
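The same parabola can be tabulated directly for a toy posterior pmf (all numbers hypothetical, chosen only to make the minimum visible):

import numpy as np

x_vals = np.array([0.0, 1.0, 2.0])
p_x_given_y = np.array([0.2, 0.5, 0.3])   # assumed posterior pmf of X given Y=y

post_mean = np.sum(x_vals * p_x_given_y)  # E[X | Y = y] = 1.1

# Posterior MSE as a function of the estimate x*: a parabola in x*.
for x_star in [0.8, 1.0, 1.1, 1.2, 1.4]:
    mse = np.sum(p_x_given_y * (x_vals - x_star) ** 2)
    print(f"x* = {x_star}:  E[(X - x*)² | Y=y] = {mse:.4f}")
# The minimum, 0.49 = var(X|Y=y), is attained at x* = E[X|Y=y] = 1.1.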
LMS estimator
• Suppose X, Y are r.v.'s, and E[X|Y=y] is known for all y.
• Let g(y) = E[X|Y=y] be the LMS estimate of X based on data Y=y.
• Then g(Y) = E[X|Y] is an estimator of X based on Y.
• Let h(Y) be any other estimator of X based on Y.
• We just showed that, for every value of y, we have:
  E[(X−g(y))²|Y=y] ≤ E[(X−h(y))²|Y=y]
• Therefore, E[(X−g(Y))²|Y] ≤ E[(X−h(Y))²|Y]
• Take expectations of both sides and use the iterated expectation formula:
  E[(X−g(Y))²] ≤ E[(X−h(Y))²]
• Therefore, the conditional expectation estimator achieves an MSE which is ≤ the MSE of any other estimator.
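A small Monte Carlo sketch of this optimality claim, under an assumed jointly Gaussian model (X ~ N(0,1), Y = X + W with W ~ N(0,1) independent, for which E[X|Y] = Y/2 is a standard result):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.normal(size=n)
y = x + rng.normal(size=n)

candidates = {
    "g(Y) = E[X|Y] = Y/2": y / 2.0,   # the LMS estimator
    "h(Y) = Y":            y,
    "h(Y) = 0.8*Y":        0.8 * y,
    "h(Y) = Y/2 + 0.1":    y / 2.0 + 0.1,
}
for name, est in candidates.items():
    print(f"{name:22s}  MSE ≈ {np.mean((x - est) ** 2):.4f}")
# E[X|Y] = Y/2 achieves the smallest MSE (≈ 0.5 here), as the slide proves.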
Other names that the LMS estimate goes by in the literature
• Least-squares estimate (LSE)
• Bayes least-squares estimate (BLSE)
• Minimum mean square error estimate (MMSE)
Ex. 8.3: estimating the mean of a Gaussian r.v.
• Observe Yi = X + Wi for i = 1,…,n
• X, W1, …, Wn independent Gaussian, with known means and variances
• Wi has mean 0 and variance σi²
• X has mean x0 and variance σ0²
• Previously, we found the MAP estimate of X based on observing Y = (Y1, …, Yn)
• Now let's find the LMS estimate of X based on observing Y = (Y1, …, Yn)
Prior model and measurement model

Prior model:

f_X(x) = \frac{1}{\sigma_0 \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - x_0}{\sigma_0}\right)^2\right)

Measurement model:

f_{Y|X}(\mathbf{y} \mid x) = \prod_{i=1}^{n} \frac{1}{\sigma_i \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{y_i - x}{\sigma_i}\right)^2\right)
 = (2\pi)^{-n/2} \left(\prod_{i=1}^{n} \frac{1}{\sigma_i}\right) \exp\left(-\sum_{i=1}^{n} \frac{1}{2}\left(\frac{y_i - x}{\sigma_i}\right)^2\right)
Posterior PDF

f_{X|Y}(x \mid \mathbf{y}) = \frac{f_{Y|X}(\mathbf{y} \mid x)\, f_X(x)}{f_Y(\mathbf{y})}
 = \frac{1}{f_Y(\mathbf{y})} (2\pi)^{-n/2} \left(\prod_{i=1}^{n} \frac{1}{\sigma_i}\right) \exp\left(-\sum_{i=1}^{n} \frac{1}{2}\left(\frac{y_i - x}{\sigma_i}\right)^2\right) \cdot \frac{1}{\sigma_0 \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - x_0}{\sigma_0}\right)^2\right)
 = C \exp\left(-\sum_{i=0}^{n} \frac{1}{2}\left(\frac{y_i - x}{\sigma_i}\right)^2\right)

(Denoting y0 = x0 and C = the factor that's not a function of x.)
Further massaging the posterior PDF…

f_{X|Y}(x \mid \mathbf{y}) = C \exp\left(-\sum_{i=0}^{n} \frac{1}{2}\left(\frac{y_i - x}{\sigma_i}\right)^2\right)
 = C \exp\left(-\frac{1}{2}\left[x^2 \sum_{i=0}^{n} \frac{1}{\sigma_i^2} - 2x \sum_{i=0}^{n} \frac{y_i}{\sigma_i^2} + \sum_{i=0}^{n} \frac{y_i^2}{\sigma_i^2}\right]\right)

Defining v = 1 \Big/ \sum_{i=0}^{n} \frac{1}{\sigma_i^2} and m = v \sum_{i=0}^{n} \frac{y_i}{\sigma_i^2}, and completing the square:

f_{X|Y}(x \mid \mathbf{y}) = C \exp\left(-\frac{1}{2v}\left[x^2 - 2xm + m^2\right] + \frac{m^2}{2v} - \frac{1}{2}\sum_{i=0}^{n} \frac{y_i^2}{\sigma_i^2}\right)
 = C_1 \exp\left(-\frac{(x - m)^2}{2v}\right)
Posterior PDF and LMS estimate

f_{X|Y}(x \mid \mathbf{y}) = \frac{1}{\sqrt{2\pi v}} \exp\left(-\frac{(x - m)^2}{2v}\right), \quad \text{where } v = \frac{1}{\sum_{i=0}^{n} 1/\sigma_i^2} \text{ and } m = \frac{\sum_{i=0}^{n} y_i/\sigma_i^2}{\sum_{i=0}^{n} 1/\sigma_i^2}

Therefore, the LMS estimate is

E[X \mid Y = y] = m = \frac{\sum_{i=0}^{n} y_i/\sigma_i^2}{\sum_{i=0}^{n} 1/\sigma_i^2},

and the LMS estimator is

E[X \mid Y] = \frac{\sum_{i=0}^{n} Y_i/\sigma_i^2}{\sum_{i=0}^{n} 1/\sigma_i^2}   (with Y_0 = y_0 = x_0).

Note that, in this example, LMS = MAP, because:
• the posterior density is Gaussian
• for a Gaussian density, the maximum is at the mean
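A sketch of this formula in code, with hypothetical numbers for x0, σ0, and the σi (any values with the stated independence would do):

import numpy as np

rng = np.random.default_rng(0)

# Prior X ~ N(x0, σ0²); observations Yi = X + Wi with Wi ~ N(0, σi²).
x0, sigma0 = 5.0, 2.0
sigmas = np.array([1.0, 0.5, 2.0])          # σ1, …, σn

x = rng.normal(x0, sigma0)                  # draw the unobserved X
y = x + rng.normal(0.0, sigmas)             # observed data y1, …, yn

# LMS estimate m = (Σ_{i=0}^n yi/σi²) / (Σ_{i=0}^n 1/σi²), with y0 = x0.
y_all = np.concatenate(([x0], y))
sig_all = np.concatenate(([sigma0], sigmas))
m = np.sum(y_all / sig_all**2) / np.sum(1.0 / sig_all**2)
v = 1.0 / np.sum(1.0 / sig_all**2)          # posterior variance

# Since the posterior is Gaussian, the MAP estimate coincides with m.
print("true X:", x)
print("LMS estimate m:", m, "  posterior variance v:", v)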
Ex. 8.11
• X = continuous uniform over [4, 10].
• W = continuous uniform over [−1, 1].
• X and W are independent.
• Y = X + W.
• Calculate the LMS estimate of X based on Y=y.
• Calculate the associated conditional MSE.
Ex. 8.11: the joint PDF
• fY|X(y|x) is uniform over [x−1, x+1]. Therefore,

f_{X,Y}(x, y) = f_X(x)\, f_{Y|X}(y \mid x) = \begin{cases} 1/12, & \text{if } 4 \le x \le 10 \text{ and } x - 1 \le y \le x + 1 \\ 0, & \text{otherwise} \end{cases}
Ex. 8.11: support of the joint PDF of X and Y

f_{X,Y}(x, y) = \begin{cases} 1/12, & \text{if } 4 \le x \le 10 \text{ and } x - 1 \le y \le x + 1 \\ 0, & \text{otherwise} \end{cases}

The joint PDF is supported (i.e., is nonzero) inside the parallelogram in the (y, x) plane bounded by the lines x − 1 = y and x + 1 = y, with 4 ≤ x ≤ 10 and 3 ≤ y ≤ 11.
[Figure: the support parallelogram, with vertices at (y, x) = (3, 4), (5, 4), (11, 10), and (9, 10).]
Ex. 8.11: the form of the posterior

f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}

fX,Y is uniform over the parallelogram. Therefore, for any fixed y=y0, the posterior is uniform over the intersection of the line y=y0 with the parallelogram.
Ex. 8.11: the posterior

fX|Y(x|y) is uniform over:
• [4, y+1] for 3 ≤ y ≤ 5
• [y−1, y+1] for 5 ≤ y ≤ 9
• [y−1, 10] for 9 ≤ y ≤ 11
Ex. 8.11: the LMS estimate

Since the posterior is uniform over an interval, its mean is the midpoint of that interval. Therefore,

E[X|Y=y] =
• (y+5)/2 for 3 ≤ y ≤ 5
• y for 5 ≤ y ≤ 9
• (y+9)/2 for 9 ≤ y ≤ 11
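A Monte Carlo sanity check of this piecewise formula (a sketch written for this example; the bin width 0.02 is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000

# Simulate the model: X ~ U[4, 10], W ~ U[-1, 1], Y = X + W.
x = rng.uniform(4.0, 10.0, n)
y = x + rng.uniform(-1.0, 1.0, n)

def lms_estimate(y0):
    """Piecewise LMS estimate E[X | Y = y] from the slide."""
    if y0 <= 5.0:
        return (y0 + 5.0) / 2.0
    if y0 <= 9.0:
        return y0
    return (y0 + 9.0) / 2.0

# Empirical check: average X over a narrow bin of Y around each test point.
for y0 in [3.5, 4.5, 6.0, 8.0, 9.5, 10.5]:
    mask = np.abs(y - y0) < 0.02
    print(f"y ≈ {y0}: empirical E[X|Y=y] ≈ {x[mask].mean():.3f}, "
          f"formula gives {lms_estimate(y0):.3f}")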
Ex. 8.11: the conditional MSE

The variance of a uniform density over an interval of width ℓ is ℓ²/12. Therefore,

E[(X−E[X|Y=y])² | Y=y] = var(X|Y=y) =
• (y−3)²/12 for 3 ≤ y ≤ 5
• 1/3 for 5 ≤ y ≤ 9
• (11−y)²/12 for 9 ≤ y ≤ 11

[Figure: the conditional MSE as a function of y — it rises from 0 at y=3 to 1/3 at y=5, stays at 1/3 for 5 ≤ y ≤ 9, and falls back to 0 at y=11.]
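A matching Monte Carlo check of the conditional MSE (again a sketch; the bin width is arbitrary):

import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000

x = rng.uniform(4.0, 10.0, n)
y = x + rng.uniform(-1.0, 1.0, n)

def cond_mse(y0):
    """Piecewise conditional MSE var(X | Y = y) from the slide."""
    if y0 <= 5.0:
        return (y0 - 3.0) ** 2 / 12.0
    if y0 <= 9.0:
        return 1.0 / 3.0
    return (11.0 - y0) ** 2 / 12.0

for y0 in [4.0, 4.8, 6.0, 8.5, 10.0]:
    mask = np.abs(y - y0) < 0.02
    print(f"y ≈ {y0}: empirical var(X|Y=y) ≈ {x[mask].var():.4f}, "
          f"formula gives {cond_mse(y0):.4f}")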
Estimation error for LMS

We denote the LMS estimator of X based on Y by X̂_LMS:

X̂_LMS = E[X | Y]

and denote the estimation error by X̃_LMS = X̂_LMS − X. Since E[X̃_LMS | Y] = E[X|Y] − E[X|Y] = 0, the error X̃_LMS is uncorrelated with the estimator X̂_LMS. Therefore,

var(X) = var(X̂_LMS − (X̂_LMS − X)) = var(X̂_LMS − X̃_LMS) = var(X̂_LMS) + var(−X̃_LMS) = var(X̂_LMS) + var(X̃_LMS)
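A quick empirical check of this decomposition, reusing the assumed Gaussian model from the earlier sketch (X ~ N(0,1), Y = X + W with W ~ N(0,1), so X̂_LMS = Y/2):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.normal(size=n)
y = x + rng.normal(size=n)

x_hat = y / 2.0            # LMS estimator
x_tilde = x_hat - x        # estimation error

print("var(X)           ≈", x.var())                       # ≈ 1.0
print("var(X̂) + var(X̃)  ≈", x_hat.var() + x_tilde.var())   # ≈ 1.0
print("cov(X̂, X̃)        ≈", np.cov(x_hat, x_tilde)[0, 1])  # ≈ 0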