Minimum Mean-Square Error (MMSE) and Linear MMSE (LMMSE) Estimation

Outline:
• MMSE estimation,
• Linear MMSE (LMMSE) estimation,
• Geometric formulation of LMMSE estimation and the orthogonality principle.
Reading: Chapter 12 in Kay-I.
MMSE Estimation

Consider the following problem:

A signal $\Theta = \theta$ is transmitted through a noisy channel, modeled using the conditional pdf $f_{X|\Theta}(x\,|\,\theta)$, which is the likelihood function of $\theta$. We observe $X = x$. The signal has known prior (marginal) pdf

$f_\Theta(\theta) = \pi(\theta)$

which summarizes our knowledge about $\Theta$ before (i.e. prior to) collecting $X = x$. We wish to estimate $\Theta$ using the observation $X = x$:

$\hat\theta = \hat\theta(x) = g(x).$

We choose $g(X)$ to minimize the Bayesian (preposterior) mean-square error:

$\mathrm{BMSE} = E_{\Theta,X}\{[\hat\theta(X) - \Theta]^2\} = E_{\Theta,X}\{[g(X) - \Theta]^2\}.$

Here, the estimates $\hat\theta(X)$ that achieve the minimum BMSE are called minimum MSE (MMSE) estimates of $\Theta$; $\hat\theta(X)$ may not be unique.
A Reminder: MMSE Estimation

Theorem 1. The MMSE estimate of $\Theta$ (based on the observation $X = x$) is given by

$\hat\theta_{\rm MMSE} = g(x) = E_{\Theta|X}(\Theta\,|\,x). \qquad (1)$

The minimum BMSE (i.e. the BMSE of $\hat\theta_{\rm MMSE}(x) = E_{\Theta|X}[\Theta\,|\,x]$) is

$\mathrm{MBMSE} = E_X[\mathrm{var}_{\Theta|X}(\Theta\,|\,X)] = E_\Theta(\Theta^2) - E_X\{[E_{\Theta|X}(\Theta\,|\,X)]^2\}. \qquad (2)$

Lemma 1. We first show that

$\min_b \, E_\Theta[(b - \Theta)^2] = \mathrm{var}_\Theta(\Theta)$

is achieved for

$b = E_\Theta(\Theta).$

Therefore, in the absence of any observations, the MMSE estimate
of $\Theta$ is equal to the mean of the (prior, marginal) pdf of $\Theta$:

$E_\Theta[(\Theta - b)^2] = E_\Theta\{[\Theta - E_\Theta(\Theta) + E_\Theta(\Theta) - b]^2\}$
$\quad = E_\Theta\big\{[\Theta - E_\Theta(\Theta)]^2 + [E_\Theta(\Theta) - b]^2 + 2\,[E_\Theta(\Theta) - b]\,[\Theta - E_\Theta(\Theta)]\big\}$
$\quad = E_\Theta\{[\Theta - E_\Theta(\Theta)]^2\} + (E_\Theta[\Theta] - b)^2 + 2\,[E_\Theta(\Theta) - b]\,\underbrace{E_\Theta[\Theta - E_\Theta(\Theta)]}_{0}$
$\quad \ge E_\Theta\{[\Theta - E_\Theta(\Theta)]^2\}$

with equality if and only if $b = E_\Theta(\Theta)$.

Proof. (Theorem 1) We now consider our MMSE estimation problem, write the BMSE of an estimator $g(X)$ as

$\mathrm{BMSE} = E_{\Theta,X}\{[\Theta - g(X)]^2\} \overset{\text{iter. exp.}}{=} E_X\big[\underbrace{E_{\Theta|X}\{[\Theta - g(X)]^2\,|\,X\}}_{\text{posterior expected squared loss, see handout \# 4}}\big]$

and use Lemma 1 to conclude that, for each $X = x$, the posterior expected squared loss

$E_{\Theta|X}\{[\Theta - g(X)]^2\,|\,x\}$
is minimized for

$g(x) = E_{\Theta|X}(\Theta\,|\,x).$

Thus, the BMSE is minimized for

$g(X) = E_{\Theta|X}(\Theta\,|\,X).$

We now find the minimum BMSE:

$\mathrm{MBMSE} = E_{\Theta,X}\{[\Theta - E_{\Theta|X}(\Theta\,|\,X)]^2\} \overset{\text{iter. exp.}}{=} E_X\big[E_{\Theta|X}\{[\Theta - E_{\Theta|X}(\Theta\,|\,X)]^2\,|\,X\}\big] = E_X[\mathrm{var}_{\Theta|X}(\Theta\,|\,X)]. \qquad (3)$

$\Box$

Comments:

• $E_X[\hat\theta_{\rm MMSE}(X)] = E_\Theta(\Theta)$ — unbiased on average. $\qquad (4)$

• However, $\hat\theta_{\rm MMSE}(X)$ is practically never unbiased in the classical sense:

$E_{X|\Theta}[\hat\theta_{\rm MMSE}(X)\,|\,\theta] \ne \theta \quad \text{in general.} \qquad (5)$

You will show (5) in a HW assignment.
• For independent $\Theta$ and $X$, the MMSE estimate of $\Theta$ is $\hat\theta_{\rm MMSE}(X) = E_\Theta(\Theta)$.

• The estimation error

$\mathcal{E} = \Theta - \hat\theta_{\rm MMSE}(X) \qquad (6)$

and the MMSE estimate $\hat\theta_{\rm MMSE}(X)$ are orthogonal:

$E_{\Theta,X}[\mathcal{E}\,\hat\theta_{\rm MMSE}(X)] = E_{\Theta,X}\{[\Theta - \hat\theta_{\rm MMSE}(X)]\,\hat\theta_{\rm MMSE}(X)\}$
$\quad \overset{\text{iter. exp.}}{=} E_X\{E_{\Theta|X}([\Theta - \hat\theta_{\rm MMSE}(X)]\,\hat\theta_{\rm MMSE}(X)\,|\,X)\}$
$\quad = E_X\{\hat\theta_{\rm MMSE}(X)\,E_{\Theta|X}[\Theta - \hat\theta_{\rm MMSE}(X)\,|\,X]\} = 0$

since $\hat\theta_{\rm MMSE}(X) = E_{\Theta|X}[\Theta\,|\,X]$. It is clear from this derivation that the estimation error $\mathcal{E}$ in (6) is orthogonal to any function $g(X)$ of $X$:

$E_{\Theta,X}\{[\Theta - \hat\theta_{\rm MMSE}(X)]\,g(X)\} = E_X\{E_{\Theta|X}([\Theta - \hat\theta_{\rm MMSE}(X)]\,g(X)\,|\,X)\} = E_X\{g(X)\,E_{\Theta|X}[\Theta - \hat\theta_{\rm MMSE}(X)\,|\,X]\} = 0.$

• The law of conditional variances [(5) in handout # 0b]
implies

$\mathrm{var}_\Theta(\Theta) = \underbrace{E_X[\mathrm{var}_{\Theta|X}(\Theta\,|\,X)]}_{\text{MBMSE, see (3)}} + \mathrm{var}_X\big(\underbrace{E_{\Theta|X}[\Theta\,|\,X]}_{\hat\theta_{\rm MMSE}(X),\ \text{see (1)}}\big)$

i.e. the sum of

• the minimum BMSE for estimating $\Theta$ and
• the variance of the MMSE estimate of $\Theta$

is equal to the (marginal, prior) variance of $\Theta$.
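As a quick numerical illustration of this decomposition (a sketch that is not part of the original notes; the toy model $\Theta \sim N(0,1)$, $X = \Theta + W$, $W \sim N(0,1)$ and the Python/numpy setting are assumptions made here for illustration only):

```python
import numpy as np

# Monte Carlo check of var(Theta) = E[var(Theta|X)] + var(E[Theta|X])
# for the assumed toy model Theta ~ N(0,1), X = Theta + W, W ~ N(0,1),
# where E[Theta|X] = X/2 and var(Theta|X) = 1/2.
rng = np.random.default_rng(0)
n = 200_000
theta = rng.standard_normal(n)
x = theta + rng.standard_normal(n)

post_mean = x / 2.0          # MMSE estimate E[Theta|X]
post_var = 0.5               # var(Theta|X), constant for this model

lhs = theta.var()                          # var_Theta(Theta), approx. 1
rhs = post_var + post_mean.var()           # E[var(.|X)] + var(E[.|X])
bmse = np.mean((post_mean - theta) ** 2)   # empirical BMSE of the MMSE estimate
print(lhs, rhs, bmse)
```

The first two printed values should agree (both near 1), and the empirical BMSE should be close to $E_X[\mathrm{var}_{\Theta|X}(\Theta\,|\,X)] = 0.5$, as in (3).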
Additive Gaussian Noise Channel

Consider a communication channel with input

$\Theta \sim N(\mu_\Theta, \sigma_\Theta^2)$

and noise

$W \sim N(0, \sigma^2)$

where $\Theta$ and $W$ are independent and the measurement $X$ is modeled as

$X = \Theta + W. \qquad (7)$

Find the MMSE estimate of $\Theta$ based on $X$ and the resulting minimum BMSE (MBMSE), i.e. $E_{\Theta|X}(\Theta\,|\,X)$ and $E_X[\mathrm{var}_{\Theta|X}(\Theta\,|\,X)]$, see (1) and (2).

Note: We have already considered this problem in handout # 4. We revisit it here with the focus on MMSE estimation and finding the MBMSE.

Solution: From (7), we have:

$f_{X|\Theta}(x\,|\,\theta) = N(x\,|\,\theta, \sigma^2).$
We now find $f_{\Theta|X}(\theta\,|\,x)$ using the Bayes rule:

$f_{\Theta|X}(\theta\,|\,x) \propto f_\Theta(\theta)\, f_{X|\Theta}(x\,|\,\theta)$
$\quad \propto \exp\Big[-\tfrac{1}{2\sigma_\Theta^2}\,(\theta - \mu_\Theta)^2\Big]\,\exp\Big[-\tfrac{1}{2\sigma^2}\,(x - \theta)^2\Big]$
$\quad \propto \exp\Big[-\tfrac{1}{2}\Big(\tfrac{1}{\sigma^2} + \tfrac{1}{\sigma_\Theta^2}\Big)\theta^2 + \Big(\tfrac{1}{\sigma^2}\,x + \tfrac{1}{\sigma_\Theta^2}\,\mu_\Theta\Big)\theta\Big]$
$\quad = N\Big(\theta \,\Big|\, \dfrac{\tfrac{1}{\sigma^2}\,x + \tfrac{1}{\sigma_\Theta^2}\,\mu_\Theta}{\tfrac{1}{\sigma^2} + \tfrac{1}{\sigma_\Theta^2}},\ \Big(\tfrac{1}{\sigma^2} + \tfrac{1}{\sigma_\Theta^2}\Big)^{-1}\Big)$

implying that

$\hat\theta_{\rm MMSE}(X) = E_{\Theta|X}(\Theta\,|\,X) = \dfrac{\tfrac{1}{\sigma^2}\,X + \tfrac{1}{\sigma_\Theta^2}\,\mu_\Theta}{\tfrac{1}{\sigma^2} + \tfrac{1}{\sigma_\Theta^2}} \qquad (8)$

$\mathrm{var}_{\Theta|X}(\Theta\,|\,X) = \Big(\tfrac{1}{\sigma^2} + \tfrac{1}{\sigma_\Theta^2}\Big)^{-1} \qquad (9)$

and, consequently,

$\mathrm{MBMSE} = E_X[\mathrm{var}_{\Theta|X}(\Theta\,|\,X)] = \Big(\tfrac{1}{\sigma^2} + \tfrac{1}{\sigma_\Theta^2}\Big)^{-1}. \qquad (10)$
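A minimal Monte Carlo check of (8)-(10) (not part of the original notes; the parameter values and the Python/numpy setting below are illustrative assumptions):

```python
import numpy as np

# Theta ~ N(mu, sigma_theta^2), X = Theta + W, W ~ N(0, sigma^2).
rng = np.random.default_rng(1)
mu, sigma_theta, sigma = 2.0, 1.5, 1.0
n = 200_000

theta = mu + sigma_theta * rng.standard_normal(n)
x = theta + sigma * rng.standard_normal(n)

# MMSE estimate (8): precision-weighted combination of the data and the prior mean.
theta_hat = (x / sigma**2 + mu / sigma_theta**2) / (1 / sigma**2 + 1 / sigma_theta**2)

empirical_bmse = np.mean((theta_hat - theta) ** 2)
mbmse = 1.0 / (1 / sigma**2 + 1 / sigma_theta**2)   # (10)
print(empirical_bmse, mbmse)    # the two values should nearly agree
```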
Note: In the above example, the MMSE estimate is a linear (more precisely, constant + linear = affine) function of the observation $X$. This is not always the case, e.g. for

$f_{\Theta|X}(\theta\,|\,x) = \begin{cases} x\,e^{-x\theta}, & \theta > 0,\ x > 0 \\ 0, & \text{otherwise} \end{cases}$

we obtain $E_{\Theta|X}(\Theta\,|\,X) = 1/X$. Here is another example.

Computing the MMSE estimator: Another example.
Gaussian Linear Model (Theorem 10.3 in Kay-I)

Theorem 2. Consider the linear model

$X = H\,\Theta + W$

where $H$ is a known matrix, and

$W \sim N(0, C_W), \qquad \Theta \sim N(\mu_\Theta, C_\Theta)$

where $W$ and $\Theta$ are independent and $C_W$, $\mu_\Theta$, and $C_\Theta$ are known hyperparameters. Then, the posterior pdf $f_{\Theta|X}(\theta\,|\,x)$ is Gaussian:

$f_{\Theta|X}(\theta\,|\,x) = N\big(\theta \,\big|\, (H^T C_W^{-1} H + C_\Theta^{-1})^{-1}(H^T C_W^{-1} x + C_\Theta^{-1}\mu_\Theta),\ (H^T C_W^{-1} H + C_\Theta^{-1})^{-1}\big). \qquad (11)$
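A short numpy sketch of evaluating the posterior in (11) (illustration only; the dimensions, covariances, and variable names below are assumptions, not values from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 20, 3
H = rng.standard_normal((N, p))       # known observation matrix
C_w = 0.5 * np.eye(N)                 # noise covariance C_W
mu_theta = np.zeros(p)                # prior mean
C_theta = 4.0 * np.eye(p)             # prior covariance C_Theta

# Simulate one realization of Theta and X = H Theta + W.
theta = mu_theta + np.linalg.cholesky(C_theta) @ rng.standard_normal(p)
x = H @ theta + np.linalg.cholesky(C_w) @ rng.standard_normal(N)

# Posterior covariance and mean from (11).
precision = H.T @ np.linalg.inv(C_w) @ H + np.linalg.inv(C_theta)
C_post = np.linalg.inv(precision)
theta_post = C_post @ (H.T @ np.linalg.inv(C_w) @ x + np.linalg.inv(C_theta) @ mu_theta)
print(theta_post, np.diag(C_post))
```

(In practice one would solve the linear systems rather than form explicit inverses; the inverses are kept here only to mirror (11).)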
Proof.

$f_{\Theta|X}(\theta\,|\,x) \propto f_{X|\Theta}(x\,|\,\theta)\,\pi(\theta)$
$\quad \propto \exp[-\tfrac{1}{2}(x - H\theta)^T C_W^{-1}(x - H\theta)]\,\exp[-\tfrac{1}{2}(\theta - \mu_\Theta)^T C_\Theta^{-1}(\theta - \mu_\Theta)]$
$\quad \propto \exp(-\tfrac{1}{2}\theta^T H^T C_W^{-1} H\,\theta + x^T C_W^{-1} H\,\theta)\,\exp(-\tfrac{1}{2}\theta^T C_\Theta^{-1}\theta + \mu_\Theta^T C_\Theta^{-1}\theta)$
$\quad = \exp[-\tfrac{1}{2}\theta^T (H^T C_W^{-1} H + C_\Theta^{-1})\theta + (x^T C_W^{-1} H + \mu_\Theta^T C_\Theta^{-1})\theta]$
$\quad \propto N\big(\theta \,\big|\, (H^T C_W^{-1} H + C_\Theta^{-1})^{-1}(H^T C_W^{-1} x + C_\Theta^{-1}\mu_\Theta),\ (H^T C_W^{-1} H + C_\Theta^{-1})^{-1}\big). \qquad \Box$

Comments:

• DC-level estimation in AWGN with known variance, introduced on p. 17 of handout # 4, is a special case of this result; see also Example 10.2 in Kay-I.
• Examine the posterior mean:

$E_{\Theta|X}(\Theta\,|\,x) = \big(\underbrace{H^T C_W^{-1} H}_{\text{likelihood precision}} + \underbrace{C_\Theta^{-1}}_{\text{prior precision}}\big)^{-1}\big(\underbrace{H^T C_W^{-1} x}_{\text{data-dependent term}} + \underbrace{C_\Theta^{-1}\mu_\Theta}_{\text{prior-dependent term}}\big).$

• Noninformative (flat) prior on $\Theta$ and white noise. Consider the Jeffreys noninformative (flat) prior pdf for $\Theta$:

$\pi(\theta) \propto 1 \qquad (C_\Theta^{-1} = 0)$

and white noise:

$C_W = \sigma^2\,\underbrace{I}_{\text{identity matrix}}.$

Then, $f_{\Theta|X}(\theta\,|\,x)$ in (11) simplifies to

$f_{\Theta|X}(\theta\,|\,x) = N\big(\theta \,\big|\, \underbrace{\hat\theta_{\rm LS}(x)}_{(H^T H)^{-1} H^T x},\ \sigma^2 (H^T H)^{-1}\big).$

• Prediction: We now practice prediction for this model. Say we wish to predict an $X_\star$ coming from the following model:

$X_\star = h_\star^T\,\Theta + W_\star$


where $W_\star \sim N(0, \sigma^2)$ is independent of $W$, implying that $X_\star$ and $X$ are conditionally independent given $\Theta = \theta$ and, therefore,

$f_{X_\star|\Theta,X}(x_\star\,|\,\theta, x) = f_{X_\star|\Theta}(x_\star\,|\,\theta) = N(x_\star\,|\,h_\star^T\theta,\ \sigma^2).$

Then, our posterior predictive pdf is [along the lines of (10)]

$f_{X_\star|X}(x_\star\,|\,x) = \int \underbrace{f_{X_\star|\Theta}(x_\star\,|\,\theta)}_{N(x_\star|\,h_\star^T\theta,\ \sigma^2)}\,\underbrace{f_{\Theta|X}(\theta\,|\,x)}_{N(\theta|\,\hat\theta(x),\ C_{\rm post})}\,d\theta$

where

$\hat\theta(x) = (H^T C_W^{-1} H + C_\Theta^{-1})^{-1}(H^T C_W^{-1} x + C_\Theta^{-1}\mu_\Theta)$
$C_{\rm post} = (H^T C_W^{-1} H + C_\Theta^{-1})^{-1}$

implying

$f_{X_\star|X}(x_\star\,|\,x) = N\big(h_\star^T\,\hat\theta(x),\ h_\star^T C_{\rm post}\,h_\star + \sigma^2\big).$
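A sketch of this posterior predictive computation (illustrative values and names only, in the same assumed numpy setting as the earlier snippet; here $C_W = \sigma^2 I$ for simplicity):

```python
import numpy as np

rng = np.random.default_rng(3)
N, p, sigma2 = 20, 3, 0.5
H = rng.standard_normal((N, p))
mu_theta, C_theta = np.zeros(p), 4.0 * np.eye(p)
x = H @ rng.standard_normal(p) + np.sqrt(sigma2) * rng.standard_normal(N)

# Posterior mean and covariance from (11), with C_W = sigma2 * I.
C_post = np.linalg.inv(H.T @ H / sigma2 + np.linalg.inv(C_theta))
theta_post = C_post @ (H.T @ x / sigma2 + np.linalg.inv(C_theta) @ mu_theta)

# Posterior predictive mean and variance for a new regressor h_star.
h_star = rng.standard_normal(p)
pred_mean = h_star @ theta_post                  # h_star^T theta_hat(x)
pred_var = h_star @ C_post @ h_star + sigma2     # h_star^T C_post h_star + sigma^2
print(pred_mean, pred_var)
```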
Linear MMSE (LMMSE) Estimation

For exact MMSE estimation, we need to know the joint pdf (or joint pmf) $f_{\Theta,X}(\theta, x)$, typically specified through the prior (marginal) pdf/pmf $f_\Theta(\theta)$ and the conditional pdf/pmf $f_{X|\Theta}(x\,|\,\theta)$, which together yield the joint pdf (or pmf, or combined pdf/pmf)

$f_{\Theta,X}(\theta, x) = f_{X|\Theta}(x\,|\,\theta)\, f_\Theta(\theta).$

This information may not be available.

We typically have estimates of the first and second moments of the signal and the observation, i.e. of the means, variances, and the covariance between $\Theta$ and $X$.

This information is generally not sufficient for MMSE estimation of $\Theta$, but is sufficient for linear MMSE (LMMSE) estimation of $\Theta$, i.e. for finding estimates of the form

$\hat\theta = \hat\theta(X) = a\,X + b \qquad (12)$

that minimize the BMSE:

$\mathrm{BMSE} = E_{\Theta,X}\{[\Theta - \hat\theta(X)]^2\}.$

The minimization is with respect to $a$ and $b$.
Note: Even though it is more appropriate to refer to this estimator as the affine MMSE estimator, "linear MMSE estimator" is the most widely used name for it. In most applications, we consider zero-mean $X$ and $\Theta$; then, our estimator is indeed linear, see Theorem 3 below.

Theorem 3. The LMMSE estimate of $\Theta$ is

$\hat\theta(X) = \underbrace{\dfrac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sigma_X^2}}_{a_{\rm opt}}\,[X - E_X(X)] + E_\Theta(\Theta) = \rho_{\Theta,X}\,\sigma_\Theta\,\dfrac{X - E_X(X)}{\sigma_X} + E_\Theta(\Theta) \qquad (13)$

and its BMSE is given by

$\mathrm{MBMSE}_{\rm linear} = \mathrm{cov}_{\Theta,X}(\Theta - \hat\theta(X),\ \Theta) \qquad (14)$
$\quad = \sigma_\Theta^2 - \dfrac{\mathrm{cov}^2_{\Theta,X}(\Theta, X)}{\sigma_X^2} = (1 - \rho^2_{\Theta,X})\,\sigma_\Theta^2. \qquad (15)$

Here,

$\mathrm{cov}_{\Theta,X}(\Theta, X) = E_{\Theta,X}[(\Theta - \mu_\Theta)(X - \mu_X)] = E_{\Theta,X}(\Theta X) - E_\Theta(\Theta)\,E_X(X)$
is introduced on p. 4 of handout # 0b,

$\mathrm{var}_\Theta(\Theta) = \mathrm{cov}_\Theta(\Theta, \Theta) = \sigma_\Theta^2, \qquad \mathrm{var}_X(X) = \sigma_X^2$

and $\rho_{\Theta,X}$ is the correlation coefficient between $\Theta$ and $X$, defined as

$\rho_{\Theta,X} = \dfrac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sqrt{\mathrm{var}_\Theta(\Theta)}\,\sqrt{\mathrm{var}_X(X)}} = \dfrac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sigma_\Theta\,\sigma_X}$

where $\sigma_\Theta = \sqrt{\sigma_\Theta^2}$ and $\sigma_X = \sqrt{\sigma_X^2}$ are the (marginal) standard deviations of $\Theta$ and $X$.

Proof. Suppose first that the constant $a$ has already been chosen. Then, choosing the constant $b$ to minimize the BMSE

$E_{\Theta,X}[(\Theta - a\,X - b)^2]$

is equivalent to the problem of Lemma 1 with $\Theta - a\,X$ in place of $\Theta$; the optimal $b$ is therefore the mean of $\Theta - a\,X$, i.e.

$b = E_{\Theta,X}(\Theta - a\,X) = E_\Theta(\Theta) - a\,E_X(X). \qquad (16)$
Substituting (16) into $E_{\Theta,X}[(\Theta - a\,X - b)^2]$ yields

$\mathrm{var}_{\Theta,X}(\Theta - a\,X) \qquad (17)$
$\quad = \sigma_\Theta^2 + a^2\,\sigma_X^2 - 2\,a\,\mathrm{cov}_{\Theta,X}(\Theta, X) \qquad (18)$

which is easy to minimize with respect to $a$. In particular, differentiating (18) with respect to $a$ and setting the result to zero yields

$2\,a\,\sigma_X^2 - 2\,\mathrm{cov}_{\Theta,X}(\Theta, X) = 0$

i.e.

$a\,\mathrm{cov}_X(X, X) - \mathrm{cov}_{\Theta,X}(\Theta, X) = 0$

and, finally,

$\mathrm{cov}_{\Theta,X}(\Theta - a\,X,\ X) = 0$

which is the famous orthogonality principle. Clearly, the optimal $a$ is

$a_{\rm opt} = \dfrac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sigma_X^2} \qquad (19)$

and (13) follows. We summarize the orthogonality principle:
$\mathrm{cov}_{\Theta,X}(\Theta - a_{\rm opt}\,X,\ X) = 0 \qquad (20)$

or, equivalently,

$\mathrm{cov}_{\Theta,X}\big(\Theta - \underbrace{\hat\theta(X)}_{\text{LMMSE est. of } \Theta \text{ based on } X},\ X\big) = 0. \qquad (21)$

Substituting (19) into (17) yields

$\mathrm{MBMSE}_{\rm linear} = \underbrace{\mathrm{cov}_{\Theta,X}(\Theta - a_{\rm opt}\,X,\ \Theta - a_{\rm opt}\,X)}_{\mathrm{var}_{\Theta,X}(\Theta - a_{\rm opt}\,X)}$
$\quad = \mathrm{cov}_{\Theta,X}(\Theta - a_{\rm opt}\,X,\ \Theta) - a_{\rm opt}\,\underbrace{\mathrm{cov}_{\Theta,X}(\Theta - a_{\rm opt}\,X,\ X)}_{0,\ \text{by (20)}}$
$\quad = \sigma_\Theta^2 - \dfrac{\mathrm{cov}^2_{\Theta,X}(\Theta, X)}{\sigma_X^2}$

and (15) follows. By completing the squares, it is easy to check
that, for any $a \in \mathbb{R}$,

$\mathrm{var}_{\Theta,X}(\Theta - a\,X) = \mathrm{var}_{\Theta,X}(\Theta - a\,X + a_{\rm opt}\,X - a_{\rm opt}\,X)$
$\quad = \mathrm{var}_{\Theta,X}\big[(\Theta - a_{\rm opt}\,X) - (a - a_{\rm opt})\,X\big]$
$\quad = \underbrace{\mathrm{var}_{\Theta,X}(\Theta - a_{\rm opt}\,X)}_{\mathrm{MBMSE}_{\rm linear}} + (a - a_{\rm opt})^2\,\mathrm{var}_X(X) - 2\,(a - a_{\rm opt})\,\underbrace{\mathrm{cov}_{\Theta,X}(\Theta - a_{\rm opt}\,X,\ X)}_{0,\ \text{by (20)}}$
$\quad = \underbrace{\sigma_\Theta^2 - \dfrac{[\mathrm{cov}_{\Theta,X}(\Theta, X)]^2}{\sigma_X^2}}_{\text{see (15)}} + \sigma_X^2\,(a - a_{\rm opt})^2 \qquad (22)$

which proves the MMSE optimality of (19). $\Box$

Comments:

• $E_X[\hat\theta(X)] = E_\Theta(\Theta)$ — also true for the MMSE estimate, see (4).

• If $\rho_{\Theta,X} = 0$, i.e. $\Theta$ and $X$ are uncorrelated, then

$\hat\theta(X) = E_\Theta(\Theta) = \text{constant}$

i.e. LMMSE estimation ignores the observation $X$.

• If $|\rho_{\Theta,X}| = 1$, i.e. $\Theta - E_\Theta(\Theta)$ and $X - E_X(X)$ are linearly dependent with probability one, then the LMMSE estimate is perfect.
LMMSE vs. MMSE

In general, the LMMSE estimate is not as good as the MMSE estimate.

Example: Suppose that

$X \sim U(-1, 1)$ — uniform pdf, see the table of distributions

and

$\Theta = X^2.$

The MMSE estimate of $\Theta$ based on $X$ is

$\hat\theta(X) = E_{\Theta|X}(\Theta\,|\,X) = X^2$

which is perfect. To find the LMMSE estimate of $\Theta$ based on $X$, we need

$E_X(X) = 0$
$E_\Theta(\Theta) = \int_{-1}^{1} x^2\,\tfrac{1}{2}\,dx = \tfrac{1}{3}$
$\mathrm{cov}_{\Theta,X}(\Theta, X) = E_{\Theta,X}(\Theta X) - 0 = 0$ — $\Theta$ and $X$ uncorrelated
yielding the LMMSE estimate

$\hat\theta(X) = E_\Theta(\Theta) = \tfrac{1}{3}$

i.e. the observation $X$ is totally ignored even though it completely determines $\Theta$.
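The gap between the two estimators is easy to see numerically; a Monte Carlo sketch (not from the notes; the Python/numpy setting and sample size are assumptions):

```python
import numpy as np

# X ~ U(-1, 1), Theta = X^2.
rng = np.random.default_rng(4)
x = rng.uniform(-1.0, 1.0, size=200_000)
theta = x**2

mmse_est = x**2          # posterior mean: zero error
lmmse_est = 1.0 / 3.0    # uncorrelated case: LMMSE falls back to the prior mean

print(np.mean((mmse_est - theta) ** 2))    # 0
print(np.mean((lmmse_est - theta) ** 2))   # approx. var(Theta) = 4/45 ~ 0.089
```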
A class of random signals for which the MMSE estimate is linear is the class of jointly Gaussian random signals, e.g. $\Theta$ and $X$ in the additive Gaussian noise channel example on p. 8.
Linear MMSE Estimation: A Geometric Formulation

We first introduce some background:

A vector space $\mathcal{V}$ (e.g. the common Euclidean space) consists of a set of vectors that is closed under two operations:

• vector addition: if $v_1, v_2 \in \mathcal{V}$, then $v_1 + v_2 \in \mathcal{V}$ and
• scalar multiplication: if $a \in \mathbb{R}$ and $v \in \mathcal{V}$, then $a\,v \in \mathcal{V}$.

An inner product (e.g. the scalar product in Euclidean spaces) is an operation $u \cdot v$ satisfying

• commutativity: $u \cdot v = v \cdot u$,
• linearity: $(a\,u + b\,v) \cdot w = a\,u \cdot w + b\,v \cdot w$, and
• non-negativity of the inner product of any vector with itself: $u \cdot u \ge 0$, with $u \cdot u = 0$ if and only if $u = 0$.

The norm of $u$ is defined as $\|u\| = \sqrt{u \cdot u}$.

$u$ and $v$ are orthogonal (written $u \perp v$) if and only if $u \cdot v = 0$.
A vector space with an inner product is called an inner-product space. Example: Euclidean space with the scalar product.

How about a vector space for random variables?

Consider random variables $X$ and $Y$ as vectors in an inner-product space $\mathcal{V}$ that contains all RVs defined over the same probability space, with

• vector addition: $V_1 + V_2 \in \mathcal{V}$,
• scalar multiplication: $a\,V \in \mathcal{V}$,
• inner product: $V_1 \cdot V_2 = \mathrm{cov}_{V_1,V_2}(V_1, V_2)$ (check that it is a legitimate inner product),
• the norm of $V$: $\|V\| = \sqrt{\mathrm{var}_V(V)} = \sigma_V$.

Hence,
$\underbrace{\mathrm{cov}_{X,Y}(X, Y)}_{\text{inner product}} = \underbrace{\sigma_X}_{\text{norm of } X}\;\underbrace{\sigma_Y}_{\text{norm of } Y}\;\underbrace{\rho_{X,Y}}_{\cos}$

i.e. the correlation coefficient $\rho_{X,Y}$ plays the role of the cosine of the angle between $X$ and $Y$.
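A tiny numerical illustration of this correspondence (an assumed example, not from the notes): the sample correlation coefficient equals the cosine of the angle between the centered sample vectors.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(10_000)
y = 0.7 * x + rng.standard_normal(10_000)

xc, yc = x - x.mean(), y - y.mean()
cos_angle = (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))
rho = np.corrcoef(x, y)[0, 1]
print(cos_angle, rho)    # the two values coincide
```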
The linear MMSE estimation problem can be recast in the above geometric framework after substituting the optimal $b$ from (16) into $E_{\Theta,X}\{[\Theta - a\,X - b]^2\}$, yielding

$\mathrm{var}_{\Theta,X}(\Theta - a\,X) = \|\Theta - a\,X\|^2.$

We wish to minimize this variance with respect to $a$.

Clearly, $\|\Theta - a\,X\|^2$ is minimized if

$(\Theta - a\,X) \perp X$
i.e. if

$\mathrm{cov}_{\Theta,X}(\Theta - a\,X,\ X) = 0$

and, consequently, the MMSE-optimal linear term $a$ is

$a_{\rm opt} = \dfrac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\mathrm{var}_X(X)} = \dfrac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sigma_X^2}.$
To summarize:

Orthogonality principle:

$(\Theta - a_{\rm opt}\,X) \perp X$

i.e.

$\mathrm{cov}_{\Theta,X}(\Theta - a_{\rm opt}\,X,\ X) = 0 \qquad (23)$

see (20).
Additive White Noise Channel

Consider again the communication channel example on p. 8, with input $\Theta$ having mean $\mu_\Theta$ and variance $\sigma_\Theta^2$ and noise $W$ having mean zero and variance $\sigma^2$, where $\Theta$ and $W$ are independent and the measurement $X$ is

$X = \Theta + W.$

Find the LMMSE estimate of $\Theta$ based on $X$ and the resulting BMSE ($\mathrm{MBMSE}_{\rm linear}$). We need

$E_\Theta(\Theta) = \mu_\Theta$
$E_X(X) = E_{\Theta,W}(\Theta + W) = E_\Theta(\Theta) + E_W(W) = \mu_\Theta$

and

$\mathrm{cov}_{\Theta,X}(\Theta, X) = \mathrm{cov}_{\Theta,W}(\Theta,\ \Theta + W) \overset{\Theta,\,W\ \text{uncorr.}}{=} \sigma_\Theta^2$
$\mathrm{var}_X(X) = \mathrm{cov}_{\Theta,W}(\Theta + W,\ \Theta + W) \overset{\Theta,\,W\ \text{uncorr.}}{=} \sigma_\Theta^2 + \sigma^2.$
The LMMSE estimate of $\Theta$ is

$\hat\theta(X) = \dfrac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sigma_X^2}\,[X - E_X(X)] + E_\Theta(\Theta)$
$\quad = \dfrac{\sigma_\Theta^2}{\sigma_\Theta^2 + \sigma^2}\,(X - \mu_\Theta) + \mu_\Theta$
$\quad = \dfrac{\sigma_\Theta^2}{\sigma_\Theta^2 + \sigma^2}\,X + \dfrac{\sigma^2}{\sigma_\Theta^2 + \sigma^2}\,\mu_\Theta$
$\quad = \dfrac{\tfrac{1}{\sigma^2}\,X + \tfrac{1}{\sigma_\Theta^2}\,\mu_\Theta}{\tfrac{1}{\sigma^2} + \tfrac{1}{\sigma_\Theta^2}}$

which is the same as the MMSE estimate in (8).
Example: Estimating the Bias of a Coin

Suppose that the (prior) pdf of the probability of heads $\Theta$ of a coin is

$f_\Theta(\theta) = U(\theta\,|\,0, 1) = i_{(0,1)}(\theta).$

We flip this coin $N$ times and record the number of heads $X$. Then, if the coin flips are independent, identically distributed (i.i.d.), the conditional pdf of $X$ given $\Theta = \theta$ is

$f_{X|\Theta}(x\,|\,\theta) = \binom{N}{x}\,\theta^x\,(1 - \theta)^{N - x} = \mathrm{Bin}(x\,|\,N, \theta)$ — binomial pdf. $\qquad (24)$

Find the MMSE and LMMSE estimates of $\Theta$ based on $X$.

MMSE:

$f_{\Theta|X}(\theta\,|\,x) \propto f_\Theta(\theta)\,f_{X|\Theta}(x\,|\,\theta) \propto i_{(0,1)}(\theta)\,\theta^x\,(1 - \theta)^{N - x} = \mathrm{Beta}(\theta\,|\,x + 1,\ N - x + 1)$

see the table of distributions. Now, the MMSE estimate of $\Theta$ is

$\hat\theta_{\rm MMSE}(x) = E_{\Theta|X}(\Theta\,|\,X = x) = \dfrac{x + 1}{N + 2}.$
LMMSE: We need

$\mu_\Theta = E_\Theta(\Theta) = \tfrac{1}{2}$ — mean of the uniform$(0, 1)$ pdf

$\mu_X = E_{\Theta,X}(X) \overset{\text{iter. exp.}}{=} E_\Theta[E_{X|\Theta}(X\,|\,\Theta)] = E_\Theta(\underbrace{N\,\Theta}_{\text{mean of the binomial pdf in (24)}}) = \tfrac{1}{2}\,N$

and

$\sigma_X^2 \overset{\text{cond. var.}}{=} E_\Theta\{\underbrace{\mathrm{var}_{X|\Theta}(X\,|\,\Theta)}_{\text{var of the binomial in (24)}}\} + \mathrm{var}_\Theta\{\underbrace{E_{X|\Theta}[X\,|\,\Theta]}_{\text{mean of the binomial in (24)}}\}$
$\quad = E_\Theta[N\,\Theta\,(1 - \Theta)] + \mathrm{var}_\Theta(N\,\Theta)$
$\quad = N\,E_\Theta[\Theta\,(1 - \Theta)] + N^2\,\mathrm{var}_\Theta(\Theta)$
$\quad = N\,\big(\tfrac{1}{2} - \tfrac{1}{3}\big) + N^2\,\tfrac{1}{12} = \dfrac{N\,(N + 2)}{12}$

$\mathrm{cov}_{\Theta,X}(\Theta, X) = E_{\Theta,X}(\Theta X) - \mu_\Theta\,\mu_X \overset{\text{iterated exp.}}{=} E_\Theta\{\Theta\,E_{X|\Theta}[X\,|\,\Theta]\} - \tfrac{N}{4}$
$\quad = E_\Theta(\Theta\cdot\underbrace{N\,\Theta}_{\text{mean of the binomial in (24)}}) - \tfrac{N}{4} = \tfrac{N}{3} - \tfrac{N}{4} = \tfrac{N}{12}.$
Now,

$\hat\theta(X) = \dfrac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sigma_X^2}\,(X - \mu_X) + \mu_\Theta = \dfrac{N/12}{N\,(N + 2)/12}\,\Big(X - \tfrac{1}{2}\,N\Big) + \tfrac{1}{2} = \dfrac{X + 1}{N + 2}.$

In this example, the MMSE and LMMSE estimates of $\Theta$ are the same.
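A Monte Carlo sketch of this coincidence (illustration only; the Python/numpy setting and sample size are assumptions): computing the LMMSE coefficients from the joint moments of $(\Theta, X)$ reproduces the coefficients of $(X + 1)/(N + 2)$.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 10
theta = rng.uniform(0.0, 1.0, size=500_000)
x = rng.binomial(N, theta)

a = np.cov(theta, x, bias=True)[0, 1] / x.var()   # cov(Theta, X) / sigma_X^2
b = theta.mean() - a * x.mean()                   # E(Theta) - a E(X)
print(a, b)    # both approx. 1/(N + 2) = 1/12, i.e. theta_hat = (X + 1)/(N + 2)
```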
Linear MMSE Estimation: the Vector Case (FIR Wiener Filter)

Consider the signal of interest $\Theta$, with prior knowledge described by the pdf

$f_\Theta(\theta)$

and an $N$-dimensional random vector $X$ representing the observations.

The MMSE estimate of $\Theta$ based on $X$ is the conditional expectation

$E_{\Theta|X}(\Theta\,|\,X)$

which may be difficult to find in practice, since it requires knowledge of the joint distribution of $\Theta$ and $X$.

The linear MMSE estimate of $\Theta$ based on $X$ is easier to find, since it depends only on the means, variances, and covariances of the random variables and vectors involved.
Linear MMSE Estimation via the Orthogonality Principle

We wish to find an $N \times 1$ vector $a$ and a constant $b$ such that

$\hat\theta(X) = a^T X + b = \sum_{n=0}^{N-1} a_n\,X[n] + b$

minimizes the BMSE

$\mathrm{BMSE} = E_{\Theta,X}\{[\Theta - \hat\theta(X)]^2\}$

where

$X = [X[0], \ldots, X[N - 1]]^T.$

Suppose first that the constant vector $a$ has already been chosen. Then, choosing the constant $b$ to minimize the BMSE

$\mathrm{BMSE} = E_{\Theta,X}[(\Theta - a^T X - b)^2]$

is equivalent to the problem of Lemma 1 with $\Theta - a^T X$ in place of $\Theta$; the optimal $b$ is therefore the mean of $\Theta - a^T X$, i.e.

$b = E_{\Theta,X}(\Theta - a^T X) = E_\Theta(\Theta) - a^T E_X(X). \qquad (25)$
We view $\Theta, X[0], \ldots, X[N - 1]$ as vectors in an inner-product space. The linear MMSE estimation problem can be cast into our geometric framework after substituting the optimal $b$ in (25) into $\mathrm{BMSE} = E_{\Theta,X}\{[\Theta - \hat\theta(X)]^2\}$, yielding

$\mathrm{var}_{\Theta,X}(\Theta - a^T X) = \|\Theta - a^T X\|^2. \qquad (26)$

We minimize this variance with respect to $a$.

Clearly, $\|\Theta - a^T X\|^2$ is minimized if $a$ is chosen to satisfy the orthogonality principle:

$(\Theta - a^T X) \perp \text{subspace } \mathcal{V}_N \text{ spanned by } X[0], X[1], \ldots, X[N - 1]$
or, equivalently,

$\mathrm{cov}_{\Theta,X}(\Theta - a^T X,\ X[n]) = 0, \qquad n = 0, 1, \ldots, N - 1 \qquad (27)$

which gives the following set of equations:

$\mathrm{cov}_{\Theta,X[n]}(\Theta, X[n]) - \mathrm{cov}_X\Big(\sum_{l=0}^{N-1} a_l\,X[l],\ X[n]\Big) = 0$

or

$\sum_{l=0}^{N-1} \mathrm{cov}_{X[n],X[l]}(X[n], X[l])\,a_l = \mathrm{cov}_{X[n],\Theta}(X[n], \Theta). \qquad (28)$

Define the crosscovariance vector between $X$ and $\Theta$ and the covariance matrix of $X$ as

$\Sigma_{X,\Theta} = \mathrm{cov}_{X,\Theta}(X, \Theta) = \big[\mathrm{cov}_{X[0],\Theta}(X[0], \Theta),\ \mathrm{cov}_{X[1],\Theta}(X[1], \Theta),\ \ldots,\ \mathrm{cov}_{X[N-1],\Theta}(X[N - 1], \Theta)\big]^T$

and

$\Sigma_X = \mathrm{cov}_X(X)$

and use these definitions to compactly write (28):

$\Sigma_X\,a = \Sigma_{X,\Theta}.$
If $\Sigma_X$ is a positive definite matrix, we can solve for $a$:

$a_{\rm opt} = \Sigma_X^{-1}\,\Sigma_{X,\Theta} \qquad (29)$

and, finally, the LMMSE estimate of $\Theta$ is [using (25)]

$\hat\theta(X) = a_{\rm opt}^T X + E_\Theta(\Theta) - a_{\rm opt}^T E_X(X) = \underbrace{\Sigma_{X,\Theta}^T\,\Sigma_X^{-1}}_{a_{\rm opt}^T}\,[X - E_X(X)] + E_\Theta(\Theta). \qquad (30)$

Compare this result to the scalar case in (13):

$\hat\theta(X) = \underbrace{\dfrac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sigma_X^2}}_{a_{\rm opt}}\,[X - E_X(X)] + E_\Theta(\Theta).$

We now find the minimum BMSE of our LMMSE estimator: substitute (29) into (26), yielding

$\mathrm{MBMSE}_{\rm linear} = \mathrm{cov}_{\Theta,X}(\Theta - a_{\rm opt}^T X,\ \Theta - a_{\rm opt}^T X)$
$\quad = \mathrm{cov}_{\Theta,X}(\Theta - a_{\rm opt}^T X,\ \Theta) - \mathrm{cov}_{\Theta,X}(\Theta - a_{\rm opt}^T X,\ a_{\rm opt}^T X)$
$\quad = \mathrm{cov}_{\Theta,X}(\Theta - a_{\rm opt}^T X,\ \Theta) - \underbrace{\mathrm{cov}_{\Theta,X}(\Theta - a_{\rm opt}^T X,\ X)}_{=\,0,\ \text{see (27)}}\,a_{\rm opt}$
$\quad = \mathrm{cov}_{\Theta,X}(\Theta - a_{\rm opt}^T X,\ \Theta) \qquad (31)$
which can also be written as

$\mathrm{MBMSE}_{\rm linear} = \mathrm{cov}_{\Theta,X}(\Theta - \hat\theta(X),\ \Theta) \qquad (32)$

and further simplified:

$\mathrm{MBMSE}_{\rm linear} = \sigma_\Theta^2 - a_{\rm opt}^T\,\underbrace{\mathrm{cov}_{X,\Theta}(X, \Theta)}_{\Sigma_{X,\Theta},\ \text{see (29)}} = \sigma_\Theta^2 - \Sigma_{X,\Theta}^T\,\Sigma_X^{-1}\,\Sigma_{X,\Theta}. \qquad (33)$

Compare this result to the scalar case in (15):

$\mathrm{MBMSE}_{\rm linear} = \sigma_\Theta^2 - \dfrac{\mathrm{cov}^2_{\Theta,X}(\Theta, X)}{\sigma_X^2}.$
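A generic numpy sketch of (29), (30), and (33) (the moment values below are made up for illustration; none of the numbers come from the notes):

```python
import numpy as np

def lmmse(x, mean_x, mean_theta, Sigma_x, Sigma_x_theta, var_theta):
    """LMMSE estimate of a scalar Theta from the vector x, plus the minimum BMSE."""
    a_opt = np.linalg.solve(Sigma_x, Sigma_x_theta)     # (29), without forming the inverse
    theta_hat = mean_theta + a_opt @ (x - mean_x)       # (30)
    mbmse = var_theta - Sigma_x_theta @ a_opt           # (33)
    return theta_hat, mbmse

# Example with arbitrary (assumed) moments:
Sigma_x = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma_x_theta = np.array([0.8, 0.3])
est, mbmse = lmmse(np.array([1.0, -0.5]), np.zeros(2), 0.0,
                   Sigma_x, Sigma_x_theta, 1.0)
print(est, mbmse)
```

Note that only first and second moments enter the computation, which is the whole point of LMMSE estimation.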
Example: Additive Noise Channel

Again:

$\Theta \sim N(\mu_\Theta, \sigma_\Theta^2)$

where $\mu_\Theta$ and $\sigma_\Theta^2$ are known hyperparameters. We collect multiple observations $X[n]$, modeled as

$X[n] = \Theta + W[n], \qquad n = 0, 1, \ldots, N - 1$

where the $W[n]$ are zero-mean uncorrelated RVs with known variance $\sigma^2$. We also know that $\Theta$ and $W[n]$ are uncorrelated for all $n$. Find the LMMSE estimate of $\Theta$ based on

$X = [X[0], \ldots, X[N - 1]]^T.$

Find also the minimum BMSE.

By the orthogonality principle (27), we have:

$\mathrm{cov}_{\Theta,X[n]}(\Theta, X[n]) - \mathrm{cov}_X\Big(\sum_{l=0}^{N-1} a_l\,X[l],\ X[n]\Big) = 0$
for $n = 0, 1, \ldots, N - 1$. Here,

$\mathrm{cov}_{\Theta,X[n]}(\Theta, X[n]) = \mathrm{cov}_{\Theta,W[n]}(\Theta,\ \Theta + W[n]) = \mathrm{cov}_\Theta(\Theta, \Theta) + \mathrm{cov}_{\Theta,W[n]}(\Theta, W[n]) = \mathrm{var}_\Theta(\Theta) = \sigma_\Theta^2 \qquad (34)$

$\mathrm{cov}_{X[l],X[n]}(X[l], X[n]) = \mathrm{cov}_{\Theta,W}(\Theta + W[l],\ \Theta + W[n]) = \begin{cases} \sigma_\Theta^2, & l \ne n \\ \sigma_\Theta^2 + \sigma^2, & l = n \end{cases} \qquad (35)$

and, therefore,

$\sigma_\Theta^2 = (\sigma_\Theta^2 + \sigma^2)\,a_0 + \sigma_\Theta^2\,a_1 + \ldots + \sigma_\Theta^2\,a_{N-1}$
$\sigma_\Theta^2 = \sigma_\Theta^2\,a_0 + (\sigma_\Theta^2 + \sigma^2)\,a_1 + \ldots + \sigma_\Theta^2\,a_{N-1}$
$\quad\vdots$
$\sigma_\Theta^2 = \sigma_\Theta^2\,a_0 + \sigma_\Theta^2\,a_1 + \ldots + (\sigma_\Theta^2 + \sigma^2)\,a_{N-1}.$

Now, by symmetry,

$a_{{\rm opt},0} = a_{{\rm opt},1} = \ldots = a_{{\rm opt},N-1} = \dfrac{\sigma_\Theta^2}{N\,\sigma_\Theta^2 + \sigma^2}$
yielding

$\hat\theta(X) = \dfrac{\sigma_\Theta^2}{N\,\sigma_\Theta^2 + \sigma^2}\,\sum_{n=0}^{N-1}(X[n] - \mu_\Theta) + \mu_\Theta$
$\quad = \dfrac{\sigma_\Theta^2}{N\,\sigma_\Theta^2 + \sigma^2}\,\Big(\sum_{n=0}^{N-1} X[n]\Big) + \dfrac{\sigma^2}{N\,\sigma_\Theta^2 + \sigma^2}\,\mu_\Theta \qquad (36)$
$\quad = \dfrac{\tfrac{N}{\sigma^2}\,\bar{X} + \tfrac{1}{\sigma_\Theta^2}\,\mu_\Theta}{\tfrac{N}{\sigma^2} + \tfrac{1}{\sigma_\Theta^2}}$

where

$\bar{X} = \dfrac{1}{N}\,\sum_{n=0}^{N-1} X[n].$

The minimum average MSE of our LMMSE estimator follows
by using (31):

$\mathrm{MBMSE}_{\rm linear} = \mathrm{cov}_{\Theta,X}(\Theta - \hat\theta(X),\ \Theta)$
$\quad = \sigma_\Theta^2 - \mathrm{cov}_{\Theta,X}(\hat\theta(X),\ \Theta)$
$\quad \overset{\text{see (36)}}{=} \sigma_\Theta^2 - \mathrm{cov}_{\Theta,X}\Big(\dfrac{\sigma_\Theta^2}{N\,\sigma_\Theta^2 + \sigma^2}\,\Big(\sum_{n=0}^{N-1} X[n]\Big),\ \Theta\Big)$
$\quad \overset{\text{see (34)}}{=} \sigma_\Theta^2 - \dfrac{\sigma_\Theta^2}{N\,\sigma_\Theta^2 + \sigma^2}\,\sum_{n=0}^{N-1}\mathrm{cov}_{X[n],\Theta}(X[n], \Theta)$
$\quad = \sigma_\Theta^2 - \dfrac{\sigma_\Theta^2}{N\,\sigma_\Theta^2 + \sigma^2}\,N\,\sigma_\Theta^2 = \dfrac{\sigma^2\,\sigma_\Theta^2}{N\,\sigma_\Theta^2 + \sigma^2} = \Big(\tfrac{N}{\sigma^2} + \tfrac{1}{\sigma_\Theta^2}\Big)^{-1}$
which is the same as the posterior variance in (15) of handout # 4.
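A quick numerical confirmation of this example (illustration only; the parameter values are arbitrary assumptions), built on the general solution (29):

```python
import numpy as np

var_theta, sigma2, N = 2.0, 0.5, 8

Sigma_x = var_theta * np.ones((N, N)) + sigma2 * np.eye(N)   # (35)
Sigma_x_theta = var_theta * np.ones(N)                        # (34)
a_opt = np.linalg.solve(Sigma_x, Sigma_x_theta)

print(a_opt)                                  # every entry = var_theta / (N var_theta + sigma2)
print(var_theta - Sigma_x_theta @ a_opt)      # MBMSE_linear
print(1.0 / (N / sigma2 + 1.0 / var_theta))   # (N/sigma^2 + 1/sigma_theta^2)^(-1), same value
```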