Bayesian approach:
We assume that θ is a random variable whose particular
realization we must estimate.
This is the Bayesian approach, so named because its
implementation is based directly on Bayes' theorem.
Prior knowledge about θ can be incorporated into our
estimator by assuming that θ is a random variable with a
given prior PDF.
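For reference, the theorem as it is used here: the posterior PDF of θ given the data x follows from the prior p(θ) and the likelihood p(x|θ) as

$$p(\theta \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \theta)\, p(\theta)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \int p(\mathbf{x} \mid \theta)\, p(\theta)\, d\theta.$$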
Bayesian Estimation
Classical MSE (θ deterministic):

$$\mathrm{MSE} = E_{x}\big[(\hat{\theta} - \theta)^2\big] = \int (\hat{\theta} - \theta)^2\, p(\mathbf{x};\theta)\, d\mathbf{x}$$

Bayesian MSE (θ random, expectation also over the prior):

$$\mathrm{BMSE} = E_{x,\theta}\big[(\theta - \hat{\theta})^2\big] = \iint (\theta - \hat{\theta})^2\, p(\mathbf{x},\theta)\, d\mathbf{x}\, d\theta$$
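To make the distinction concrete, here is a minimal MATLAB sketch that approximates the BMSE of the simple sample-mean estimator by Monte-Carlo. The model (a DC level θ in white Gaussian noise with a Gaussian prior) and all parameter values are our own illustration:

% Monte-Carlo approximation of the Bayesian MSE of theta_hat = mean(x).
% Illustrative model: theta ~ N(0,1) (prior), x[n] = theta + w[n], w[n] ~ N(0,4).
M = 1e5; N = 10; sigma2 = 4;
err = zeros(M,1);
for m = 1:M
    theta = randn;                        % draw theta from the prior
    x = theta + sqrt(sigma2)*randn(N,1);  % draw data given theta
    err(m) = (theta - mean(x))^2;         % squared error for this trial
end
BMSE_mc = mean(err)   % close to sigma2/N = 0.4 for this estimator

Because θ is drawn anew from the prior in every trial, the average is taken over p(x, θ), which is exactly the double integral above.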
[Figure taken from Kay: Fundamentals of Statistical Signal Processing, Vol. 1: Estimation Theory, Prentice Hall, Upper Saddle River, 2009]
Using the factorization $p(\mathbf{x},\theta) = p(\theta \mid \mathbf{x})\, p(\mathbf{x})$, the Bayesian MSE can be rewritten as

$$J = \int \left[ \int (\theta - \hat{\theta})^2\, p(\theta \mid \mathbf{x})\, d\theta \right] p(\mathbf{x})\, d\mathbf{x}.$$

Since $p(\mathbf{x}) \ge 0$ for all x, if the integral in brackets can be minimized for each x, then the Bayesian MSE will be minimized.
Hence it suffices to minimize, for each x,

$$J' = \int (\theta - \hat{\theta})^2\, p(\theta \mid \mathbf{x})\, d\theta,$$

which yields (see the steps below)

$$\hat{\theta} = E(\theta \mid \mathbf{x}).$$
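The minimizing value follows by differentiating with respect to $\hat{\theta}$ and setting the result to zero:

$$\frac{\partial J'}{\partial \hat{\theta}} = -2\int (\theta - \hat{\theta})\, p(\theta \mid \mathbf{x})\, d\theta = 0 \;\Rightarrow\; \hat{\theta} \underbrace{\int p(\theta \mid \mathbf{x})\, d\theta}_{=\,1} = \int \theta\, p(\theta \mid \mathbf{x})\, d\theta = E(\theta \mid \mathbf{x}).$$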
Comments:
It is seen that the optimal estimator in terms of minimizing the
Bayesian MSE is the mean of the posterior PDF $p(\theta \mid \mathbf{x})$.
The posterior PDF refers to the PDF of θ after the data have been
observed. It summarizes our new state of knowledge about the
parameter.
We will term the estimator that minimizes the Bayesian MSE the
minimum mean square error (MMSE) estimator.
Intuitively, the effect of observing data will be to concentrate the
PDF of θ.
Comments:
The MMSE estimator will in general depend on the prior knowledge as
well as the data.
If the prior knowledge is weak relative to that of the data, then the
estimator will ignore the prior knowledge.
Otherwise, the estimator will be "biased" towards the prior mean. As
expected, the use of prior information always improves the estimation
accuracy.
It is often not possible to find a closed-form solution for the MMSE
estimator. An exception is the case where x and θ are jointly
Gaussian distributed.
Example (DC level in WGN): assume the Gaussian prior

$$p(A) = \frac{1}{\sqrt{2\pi\sigma_A^2}} \exp\left(-\frac{1}{2\sigma_A^2}(A - \mu_A)^2\right)$$

with $\mu_A = 0$.

If

$$p(\mathbf{x} \mid A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\big(x[n] - A\big)^2\right)$$

then the MMSE estimator is

$$\hat{A} = \frac{\sigma_A^2}{\sigma_A^2 + \frac{\sigma^2}{N}}\,\bar{x} + \frac{\frac{\sigma^2}{N}}{\sigma_A^2 + \frac{\sigma^2}{N}}\,\mu_A = \alpha\bar{x} + (1-\alpha)\mu_A$$

with

$$\alpha = \frac{\sigma_A^2}{\sigma_A^2 + \frac{\sigma^2}{N}}.$$
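A minimal MATLAB sketch of this estimator; the parameter values are our own illustration:

% MMSE estimate of a DC level A in WGN under the Gaussian prior above.
mu_A = 0; sigma_A2 = 1;            % prior mean and variance (example values)
sigma2 = 4; N = 10;                % noise variance and number of samples
A = mu_A + sqrt(sigma_A2)*randn;   % true A, drawn from the prior
x = A + sqrt(sigma2)*randn(N,1);   % observed data
alpha = sigma_A2/(sigma_A2 + sigma2/N);
A_hat = alpha*mean(x) + (1 - alpha)*mu_A   % shrinks x-bar toward mu_A

For large N, α → 1 and the estimator approaches the sample mean; for small N it falls back on the prior mean.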
The posterior variance

$$\sigma_{A\mid x}^2 = \frac{1}{\dfrac{N}{\sigma^2} + \dfrac{1}{\sigma_A^2}}$$

decreases as N increases.
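The two limiting cases make this explicit: with no data the posterior variance is just the prior variance, while for large N the prior becomes irrelevant:

$$\sigma_{A\mid x}^2 = \sigma_A^2 \ \text{ for } N = 0, \qquad \sigma_{A\mid x}^2 \approx \frac{\sigma^2}{N} \ \text{ for } N\sigma_A^2 \gg \sigma^2.$$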
[Figure taken from Kay: Fundamentals of Statistical Signal Processing, Vol. 1: Estimation Theory, Prentice Hall, Upper Saddle River, 2009]
Bayesian Estimation
Vector Case
Assume that the data vector x (dimension k) and the parameter vector θ (dimension l) are jointly Gaussian, with C the covariance matrix of the stacked vector:

$$p(\mathbf{x}, \boldsymbol{\theta}) = \frac{1}{(2\pi)^{\frac{k+l}{2}} \det^{\frac{1}{2}}(\mathbf{C})} \exp\left(-\frac{1}{2} \begin{bmatrix} \mathbf{x} - E(\mathbf{x}) \\ \boldsymbol{\theta} - E(\boldsymbol{\theta}) \end{bmatrix}^T \mathbf{C}^{-1} \begin{bmatrix} \mathbf{x} - E(\mathbf{x}) \\ \boldsymbol{\theta} - E(\boldsymbol{\theta}) \end{bmatrix}\right)$$
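Since x and θ are jointly Gaussian, the posterior $p(\boldsymbol{\theta} \mid \mathbf{x})$ is again Gaussian, and the MMSE estimator takes a closed form (a standard result for jointly Gaussian vectors; C is partitioned into $\mathbf{C}_{xx}$, $\mathbf{C}_{x\theta}$, $\mathbf{C}_{\theta x}$, $\mathbf{C}_{\theta\theta}$):

$$E(\boldsymbol{\theta} \mid \mathbf{x}) = E(\boldsymbol{\theta}) + \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\big(\mathbf{x} - E(\mathbf{x})\big), \qquad \mathbf{C}_{\theta \mid x} = \mathbf{C}_{\theta\theta} - \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{C}_{x\theta}.$$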
For scalar x and θ, consider the linear estimator

$$\hat{\theta} = ax + b$$

that minimizes the Bayesian MSE:

$$J = E_{x,\theta}\big[(\theta - \hat{\theta})^2\big].$$

Solution:

$$\hat{\theta} = E(\theta) + \frac{\mathrm{cov}(x,\theta)}{\mathrm{var}(x)}\,\big(x - E(x)\big)$$
Example: Derive the LMMSE estimator and the MSE for Example 1.
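One way to carry out this derivation, assuming "Example 1" refers to the DC level in WGN above and taking the scalar data to be the sample mean $\bar{x}$ (our reading of the exercise): with $E(\bar{x}) = \mu_A$, $\mathrm{var}(\bar{x}) = \sigma_A^2 + \sigma^2/N$, and $\mathrm{cov}(\bar{x}, A) = \sigma_A^2$,

$$\hat{A} = \mu_A + \frac{\sigma_A^2}{\sigma_A^2 + \frac{\sigma^2}{N}}\,(\bar{x} - \mu_A) = \alpha\bar{x} + (1-\alpha)\mu_A,$$

which coincides with the MMSE estimator, and

$$\mathrm{BMSE}(\hat{A}) = \mathrm{var}(A) - \frac{\mathrm{cov}(\bar{x},A)^2}{\mathrm{var}(\bar{x})} = \sigma_A^2 - \frac{\sigma_A^4}{\sigma_A^2 + \frac{\sigma^2}{N}} = \frac{\sigma_A^2\,\frac{\sigma^2}{N}}{\sigma_A^2 + \frac{\sigma^2}{N}}.$$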
Linear Bayesian Estimation
Scalar Parameter
For a scalar parameter θ estimated from the data vector x by the linear form $\hat{\theta} = \sum_{n=0}^{N-1} a_n x[n] + a_N$, the weights are chosen to minimize

$$J = E_{x,\theta}\big[(\theta - \hat{\theta})^2\big].$$
$$\frac{\partial}{\partial a_N} E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n] - a_N\right)^2\right] = -2\, E\left[\theta - \sum_{n=0}^{N-1} a_n x[n] - a_N\right] = -2\left[E(\theta) - \sum_{n=0}^{N-1} a_n E(x[n]) - a_N\right]$$

Setting this to zero results in:

$$a_N = E(\theta) - \sum_{n=0}^{N-1} a_n E(x[n])$$
Substituting $a_N$ back into $J$ gives

$$J = E\Big[\big(\mathbf{a}^T(\mathbf{x} - E(\mathbf{x})) - (\theta - E(\theta))\big)^2\Big]$$
$$= E\big[\mathbf{a}^T(\mathbf{x} - E(\mathbf{x}))(\mathbf{x} - E(\mathbf{x}))^T\mathbf{a}\big] - E\big[\mathbf{a}^T(\mathbf{x} - E(\mathbf{x}))(\theta - E(\theta))\big] - E\big[(\theta - E(\theta))(\mathbf{x} - E(\mathbf{x}))^T\mathbf{a}\big] + E\big[(\theta - E(\theta))^2\big]$$
$$= \mathbf{a}^T\mathbf{C}_{xx}\mathbf{a} - \mathbf{a}^T\mathbf{c}_{x\theta} - \mathbf{c}_{\theta x}\mathbf{a} + \sigma_\theta^2$$
where $\mathbf{C}_{xx}$ is the N×N covariance matrix of x, $\mathbf{c}_{x\theta}$ is the N×1
cross-covariance vector having the property $\mathbf{c}_{x\theta} = \mathbf{c}_{\theta x}^T$, and $\sigma_\theta^2$ is
the variance of θ.
Taking the gradient yields

$$\frac{\partial J}{\partial \mathbf{a}} = 2\mathbf{C}_{xx}\mathbf{a} - 2\mathbf{c}_{x\theta}$$

and setting it to zero results in

$$\mathbf{a} = \mathbf{C}_{xx}^{-1}\mathbf{c}_{x\theta}.$$
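Combining $\mathbf{a}$ with the expression for $a_N$ above gives the scalar LMMSE estimator and its minimum Bayesian MSE (a standard intermediate step, cf. Kay, Chapter 12):

$$\hat{\theta} = E(\theta) + \mathbf{c}_{\theta x}\mathbf{C}_{xx}^{-1}\big(\mathbf{x} - E(\mathbf{x})\big), \qquad \mathrm{BMSE}(\hat{\theta}) = \sigma_\theta^2 - \mathbf{c}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{c}_{x\theta}.$$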
Linear Bayesian Estimation
Vector Parameter
For a vector parameter, each component $\theta_i$ is assigned its own Bayesian MSE

$$J_i = E\big[(\theta_i - \hat{\theta}_i)^2\big].$$

Combining the scalar LMMSE estimators leads to

$$\hat{\boldsymbol{\theta}} = E(\boldsymbol{\theta}) + \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\big(\mathbf{x} - E(\mathbf{x})\big)$$

and

$$\mathrm{BMSE}(\hat{\theta}_i) = \sigma_{\theta_i}^2 - \mathbf{c}_{\theta_i x}\mathbf{C}_{xx}^{-1}\mathbf{c}_{x\theta_i}.$$
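A minimal MATLAB sketch of these two formulas for a toy linear-Gaussian model; the model x = Hθ + n and all values below are our own illustration, chosen so that the second-order moments are self-consistent:

% LMMSE for a vector parameter under the toy linear model x = H*theta + n.
H = [1 0; 1 1; 0 1; 1 -1];            % 4x2 observation matrix (illustrative)
Ctt = eye(2);                          % prior covariance of theta (zero mean)
sigma_n2 = 0.5;                        % noise variance
Cxx = H*Ctt*H' + sigma_n2*eye(4);      % covariance of the data x
Ctx = Ctt*H';                          % cross-covariance C_theta_x (2x4)
theta = randn(2,1);                    % draw theta from its prior
x = H*theta + sqrt(sigma_n2)*randn(4,1);
theta_hat = Ctx*(Cxx\x)                % LMMSE estimate (E(theta)=E(x)=0 here)
Bmse = diag(Ctt - Ctx*(Cxx\Ctx'))      % per-component minimum Bayesian MSE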
Hints:
As we will see in the following lectures: for uncorrelated data samples and uncorrelated noise with zero means and variances $\sigma_y^2$ and $\sigma_n^2$, respectively, we have

$$\mathbf{R}_{xx} = \sigma_y^2\,\mathbf{H}^H\mathbf{H} + \frac{\sigma_n^2}{\sigma_y^2}\,\mathbf{I}$$

as the autocorrelation matrix of the samples and

$$\mathbf{r}_{xy} = \sigma_y^2\,\mathbf{H}^H\mathbf{e}_i$$

as the cross-correlation vector of x and y. $\mathbf{e}_i$ is the vector that has a
one at position i and zeros at all other elements. Choose i = l+1 and
the length of $\mathbf{e}_i$ as 7.
$\mathbf{H}$ is the convolution matrix of h. Use convmtx(h,l) to obtain $\mathbf{H}$ in
MATLAB.
Please be aware that the output sequence after the filter w is shifted
by i = l+1 samples.
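A minimal MATLAB sketch of how these hints fit together; h, l, and the variances are placeholders standing in for the exercise's actual values, and the solution w = R_xx⁻¹ r_xy is assumed from the LMMSE result above:

% Sketch of the LMMSE (Wiener) equalizer assembled from the hints above.
h = [1 0.5 0.2].';                % placeholder channel impulse response (length 3)
l = 5;                            % placeholder filter length, so length(h)+l-1 = 7
sigma_y2 = 1; sigma_n2 = 0.1;     % placeholder symbol and noise variances
H = convmtx(h, l);                % 7x5 convolution matrix of h
i = l + 1;                        % delay i = l+1 as stated in the hint
e_i = zeros(length(h)+l-1, 1);    % e_i of length 7
e_i(i) = 1;                       % one at position i, zeros elsewhere
Rxx = sigma_y2*(H'*H) + (sigma_n2/sigma_y2)*eye(l);  % per the hint
rxy = sigma_y2*(H'*e_i);          % cross-correlation vector
w = Rxx \ rxy                     % filter taps; output delayed by i samples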