
Ch2: Wiener Filters

Optimal filters for stationary stochastic models are reviewed and

derived in this presentation.
Linear optimal filtering
Principle of orthogonality
Minimum mean squared error
Wiener-Hopf equations
Error-performance surface
Multiple Linear Regressor Model
Numerical example
Channel equalization
Linearly constrained minimum variance filter
Generalized Sidelobe Cancellers

Linear Optimum Filtering: Statement

Complex-valued stationary (at least w.s.s.) stochastic processes.

Linear discrete-time filter, w0, w1, w2, ... (IIR or FIR (inherently stable))
y(n) is the estimate of the desired response d(n)
e(n) is the estimation error, i.e., the difference between the desired response
and the filter output, e(n) = d(n) - y(n)
Linear Optimum Filtering: Statement
Problem statement:
Filter input, u(n),
Desired response, d(n),
Find the optimum filter coefficients, w(n)
To make the estimation error as small as possible

An optimization problem.

Linear Optimum Filtering: Statement
Optimization (minimization) criterion:
1. Expectation of the absolute value,
2. Expectation (mean) square value,
3. Expectation of higher powers of the absolute value
of the estimation error.
Minimization of the Mean Square value of the Error (MSE) is
mathematically tractable.
Problem becomes:
Design a linear discrete-time filter whose output y(n) provides an
estimate of a desired response d(n), given a set of input samples
u(0), u(1), u(2) ..., such that the mean-square value of the
estimation error e(n), defined as the difference between the
desired response d(n) and the actual response, is minimized.
Principle of Orthogonality
Filter output is the convolution of the filter impulse response and the input:

$$y(n) = \sum_{k=0}^{\infty} w_k^* u(n-k), \quad n = 0, 1, 2, \ldots$$

where the asterisk denotes complex conjugation. Note that in complex
terminology, the term $w_k^* u(n-k)$ represents the scalar version of an
inner product of the filter coefficient $w_k$ and the filter input $u(n-k)$.

Principle of Orthogonality

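MSE (Mean-Square Error) criterion:

$$J = E[e(n)e^*(n)] = E[|e(n)|^2]$$

Square → quadratic function → convex function.

Minimum is attained when

$$\nabla_k J = 0, \quad k = 0, 1, 2, \ldots$$

(Gradient w.r.t. optimization variable w is zero.)

Derivative in complex variables: let $w_k = a_k + j b_k$.

Then the derivative w.r.t. $w_k$ is

$$\nabla_k = \frac{\partial}{\partial a_k} + j\frac{\partial}{\partial b_k}, \quad k = 0, 1, 2, \ldots$$

The cost function J is a scalar independent of time n.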

Principle of Orthogonality
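Partial derivative of J is

$$\nabla_k J = E\left[\frac{\partial e(n)}{\partial a_k}e^*(n) + \frac{\partial e^*(n)}{\partial a_k}e(n) + j\frac{\partial e(n)}{\partial b_k}e^*(n) + j\frac{\partial e^*(n)}{\partial b_k}e(n)\right]$$

Using $e(n) = d(n) - y(n)$, $y(n) = \sum_k w_k^* u(n-k)$ and $w_k = a_k + j b_k$:

$$\frac{\partial e(n)}{\partial a_k} = -u(n-k), \quad \frac{\partial e(n)}{\partial b_k} = j\,u(n-k), \quad \frac{\partial e^*(n)}{\partial a_k} = -u^*(n-k), \quad \frac{\partial e^*(n)}{\partial b_k} = -j\,u^*(n-k)$$

Hence

$$\nabla_k J = -2E[u(n-k)e^*(n)]$$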
Principle of Orthogonality
Since $\nabla_k J = 0$ at the minimum, $E[u(n-k)e_o^*(n)] = 0$ for $k = 0, 1, 2, \ldots$

The necessary and sufficient condition for the cost function J to
attain its minimum value is for the corresponding value of the
estimation error eo(n) to be orthogonal to each input sample that
enters into the estimation of the desired response at time n.

Error at the minimum is uncorrelated with the filter input!

A good basis for testing whether the linear filter is operating in its
optimum condition.

Principle of Orthogonality

Corollary: if the filter is operating in optimum conditions (in the MSE sense), then

$$E[y_o(n)e_o^*(n)] = \sum_{k=0}^{\infty} w_{ok}^* E[u(n-k)e_o^*(n)] = 0$$

When the filter operates in its optimum condition, the estimate of
the desired response defined by the filter output yo(n) and the
corresponding estimation error eo(n) are orthogonal to each other.

Minimum Mean-Square Error
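Let the estimate of the desired response that is optimized in the MSE sense,
depending on the inputs which span the space $\mathcal{U}_n$, be
$\hat{d}(n|\mathcal{U}_n)$, i.e. $\hat{d}(n|\mathcal{U}_n) = y_o(n)$.

Then the error in optimal conditions is

$$e_o(n) = d(n) - \hat{d}(n|\mathcal{U}_n), \quad \text{or} \quad d(n) = \hat{d}(n|\mathcal{U}_n) + e_o(n)$$

Also let the minimum MSE be $J_{\min} = E[|e_o(n)|^2]$. For zero-mean processes,

$$\sigma_d^2 = \sigma_{\hat{d}}^2 + J_{\min}$$

HW: try to derive this relation from the corollary.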
Minimum Mean-Square Error
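Normalized MSE: Let

$$\varepsilon = \frac{J_{\min}}{\sigma_d^2} = 1 - \frac{\sigma_{\hat{d}}^2}{\sigma_d^2}, \quad 0 \le \varepsilon \le 1$$

If $\varepsilon$ is zero, the optimum filter operates perfectly, in the sense
that there is complete agreement between d(n) and $\hat{d}(n|\mathcal{U}_n)$.
(Optimum case)

If $\varepsilon$ is unity, there is no agreement whatsoever between d(n) and
$\hat{d}(n|\mathcal{U}_n)$.
(Worst case)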

Wiener-Hopf Equations
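We have (principle of orthogonality)

$$E\left[u(n-k)\left(d^*(n) - \sum_{i=0}^{\infty} w_{oi}u^*(n-i)\right)\right] = 0, \quad k = 0, 1, 2, \ldots$$

Rearranging, with $r(i-k) = E[u(n-k)u^*(n-i)]$ and $p(-k) = E[u(n-k)d^*(n)]$,

$$\sum_{i=0}^{\infty} w_{oi}\,r(i-k) = p(-k), \quad k = 0, 1, 2, \ldots$$

(a set of infinitely many equations: the Wiener-Hopf equations)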

Wiener-Hopf Equations
Solution of Wiener-Hopf Equations for Linear Transversal (FIR) Filter

The Wiener-Hopf equations reduce to M simultaneous equations:

$$\sum_{i=0}^{M-1} w_{oi}\,r(i-k) = p(-k), \quad k = 0, 1, \ldots, M-1$$

The transversal filter involves a combination of three operations:
storage, multiplication and addition, as described here:
1. The storage is represented by a cascade of M - 1 one-sample
delays, with the block for each such unit labeled $z^{-1}$. We refer
to the various points at which the one-sample delays are
accessed as tap points. The tap inputs are denoted by u(n),
u(n - 1), ..., u(n - M + 1). Thus, with u(n) viewed as the
current value of the filter input, the remaining M - 1 tap inputs,
u(n - 1), ..., u(n - M + 1), represent past values of the input.
2. The scalar inner products of tap inputs u(n), u(n - 1), ...,
u(n - M + 1) and tap weights w0, w1, ..., wM-1 are respectively
formed by using a corresponding set of multipliers. In
particular, the multiplication involved in forming the scalar
product of u(n) and w0 is represented by a block labeled w0*,
and so on for the other inner products.
3. The function of the adders is to sum the multiplier outputs to
produce an overall output for the filter.
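The three operations above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `transversal_filter` is hypothetical, and samples before n = 0 are assumed to be zero, which the slides do not specify.

```python
def transversal_filter(w, u):
    """Return y(n) = sum_k conj(w[k]) * u(n - k) for n = 0 .. len(u) - 1.

    Assumption of this sketch: input samples before n = 0 are zero.
    """
    M = len(w)
    y = []
    for n in range(len(u)):
        acc = 0
        for k in range(M):                 # one multiplier per tap
            if n - k >= 0:                 # tap input u(n - k) from the delay line
                acc += complex(w[k]).conjugate() * u[n - k]
        y.append(acc)                      # adder output at time n
    return y


# Two-tap example with real weights and a short input sequence.
y = transversal_filter([1, 0.5], [1, 2, 3])
```

With real weights the conjugation has no effect; it matters only for complex-valued taps, where the multiplier block is labeled with the conjugated weight.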
Wiener-Hopf Equations (Matrix Form)



Wiener-Hopf Equations (Matrix Form)
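With the tap-input vector $\mathbf{u}(n) = [u(n), u(n-1), \ldots, u(n-M+1)]^T$, the
correlation matrix $\mathbf{R} = E[\mathbf{u}(n)\mathbf{u}^H(n)]$ and the cross-correlation vector
$\mathbf{p} = E[\mathbf{u}(n)d^*(n)] = [p(0), p(-1), \ldots, p(1-M)]^T$,
the Wiener-Hopf equations can be written as

$$\mathbf{R}\mathbf{w}_o = \mathbf{p}$$

where $\mathbf{w}_o = [w_{o0}, w_{o1}, \ldots, w_{o,M-1}]^T$ is composed of the optimum (FIR) filter coefficients.

The solution is found to be

$$\mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p}$$

Note that R is almost always positive definite, so the inverse exists.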

Error-Performance Surface


Error-Performance Surface
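$$J(\mathbf{w}) = \sigma_d^2 - \mathbf{w}^H\mathbf{p} - \mathbf{p}^H\mathbf{w} + \mathbf{w}^H\mathbf{R}\mathbf{w}$$

Quadratic function of the filter coefficients → convex function, then

$$\nabla J(\mathbf{w}) = -2\mathbf{p} + 2\mathbf{R}\mathbf{w} = \mathbf{0} \quad \Rightarrow \quad \mathbf{R}\mathbf{w}_o = \mathbf{p}$$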
Minimum value of Mean-Squared Error
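We calculated that

$$J_{\min} = \sigma_d^2 - \sigma_{\hat{d}}^2$$

The estimate of the desired response is

$$\hat{d}(n|\mathcal{U}_n) = \mathbf{w}_o^H\mathbf{u}(n)$$

Hence its variance is

$$\sigma_{\hat{d}}^2 = E[\mathbf{w}_o^H\mathbf{u}(n)\mathbf{u}^H(n)\mathbf{w}_o] = \mathbf{w}_o^H\mathbf{R}\mathbf{w}_o = \mathbf{p}^H\mathbf{R}^{-1}\mathbf{p}$$

Then

$$J_{\min} = \sigma_d^2 - \mathbf{w}_o^H\mathbf{p} = \sigma_d^2 - \mathbf{p}^H\mathbf{w}_o = \sigma_d^2 - \mathbf{p}^H\mathbf{R}^{-1}\mathbf{p}$$

at wo.
(Jmin is independent of w)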
Canonical Form of the Error-Performance Surface

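Rewrite the cost function in matrix form

$$J(\mathbf{w}) = \sigma_d^2 - \mathbf{w}^H\mathbf{p} - \mathbf{p}^H\mathbf{w} + \mathbf{w}^H\mathbf{R}\mathbf{w}$$

Next, express J(w) as a perfect square in w

$$J(\mathbf{w}) = \sigma_d^2 - \mathbf{p}^H\mathbf{R}^{-1}\mathbf{p} + (\mathbf{w} - \mathbf{R}^{-1}\mathbf{p})^H\mathbf{R}(\mathbf{w} - \mathbf{R}^{-1}\mathbf{p})$$

Then, by substituting $\mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p}$ and $J_{\min} = \sigma_d^2 - \mathbf{p}^H\mathbf{R}^{-1}\mathbf{p}$,

$$J(\mathbf{w}) = J_{\min} + (\mathbf{w} - \mathbf{w}_o)^H\mathbf{R}(\mathbf{w} - \mathbf{w}_o)$$

In other words,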

Canonical Form of the Error-Performance Surface

J(w) is quadratic in w,
Minimum is attained at w=wo,
J(w) is bounded below by Jmin, which is always a non-negative quantity.

Canonical Form of the Error-Performance Surface
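Transformations may significantly simplify the analysis.
Use the eigendecomposition of R:

$$\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H$$

Then

$$J(\mathbf{w}) = J_{\min} + (\mathbf{w} - \mathbf{w}_o)^H\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H(\mathbf{w} - \mathbf{w}_o)$$

Let a vector

$$\mathbf{v} = \mathbf{Q}^H(\mathbf{w} - \mathbf{w}_o)$$

be a transformed version of the difference between the tap-weight vector w
and the optimum solution wo.
Substituting back into J gives the canonical form

$$J = J_{\min} + \mathbf{v}^H\boldsymbol{\Lambda}\mathbf{v} = J_{\min} + \sum_{k=1}^{M}\lambda_k|v_k|^2$$

The components of the transformed vector v define the principal axes of the surface.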

Canonical Form of the Error-Performance Surface
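[Figure: contours J(w) = c in the (w1, w2) plane, centered at wo where J(wo) = Jmin, and the same contours J(v) = c in the principal-axes coordinates (v1, v2).]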



Multiple Linear Regressor Model
Wiener Filter tries to match the filter coefficients to the model of the
desired response, d(n).

Desired response can be generated by

1. a linear model, a,
2. with noisy observable data, d(n),
3. where the noise is additive and white:

$$d(n) = \mathbf{a}^H\mathbf{u}_m(n) + v(n)$$

Model order is m, i.e. $\mathbf{a} = [a_0, a_1, \ldots, a_{m-1}]^T$.

What should the length of the Wiener filter be to achieve the minimum MSE?

Multiple Linear Regressor Model
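The variance of the desired response is

$$\sigma_d^2 = \sigma_v^2 + \mathbf{a}^H\mathbf{R}_m\mathbf{a}, \quad \mathbf{R}_m = E[\mathbf{u}_m(n)\mathbf{u}_m^H(n)]$$

But we know that

$$J_{\min} = \sigma_d^2 - \mathbf{w}_o^H\mathbf{R}\mathbf{w}_o = \sigma_v^2 + (\mathbf{a}^H\mathbf{R}_m\mathbf{a} - \mathbf{w}_o^H\mathbf{R}\mathbf{w}_o)$$

where wo is the filter optimized w.r.t. MSE (Wiener filter) of
length M. The term $\mathbf{w}_o^H\mathbf{R}\mathbf{w}_o$ is the only adjustable term; it depends on M.
1. Underfitted model: M < m
Performance improves quadratically with increasing M.
Worst case: M = 0, $J_{\min} = \sigma_v^2 + \mathbf{a}^H\mathbf{R}_m\mathbf{a}$
2. Critically fitted model: M = m
wo = a, R = Rm, $J_{\min} = \sigma_v^2$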
Multiple Linear Regressor Model
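3. Overfitted model: M > m

$$\mathbf{w}_o = \begin{bmatrix}\mathbf{a} \\ \mathbf{0}\end{bmatrix}, \quad \mathbf{u}(n) = \begin{bmatrix}\mathbf{u}_m(n) \\ \mathbf{u}_{M-m}(n)\end{bmatrix}, \quad J_{\min} = \sigma_v^2$$

A filter longer than the model does not improve performance.

$\mathbf{u}_{M-m}(n)$ is an (M-m)-by-1 vector made up of past data
samples immediately preceding the m-by-1 vector $\mathbf{u}_m(n)$.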

See Example 2.7 pp 108 or 110

Numerical Example (Ch2:P11)
The desired response d(n) is modeled as an AR process of order 1;
that is, it may be produced by applying a white-noise process v1(n) of
zero mean and variance $\sigma_1^2 = 0.27$ to the input of an all-pole filter of
order 1:

$$H_1(z) = \frac{1}{1 + 0.8458z^{-1}}$$

The process d(n) is applied to a communication channel modeled by
the all-pole transfer function

$$H_2(z) = \frac{1}{1 - 0.9458z^{-1}}$$

The channel output x(n) is corrupted by an additive white-noise
process v2(n) of zero mean and variance $\sigma_2^2 = 0.1$, so a sample of the
received signal u(n) equals u(n) = x(n) + v2(n)

[Figure: (a) Autoregressive model of desired response d(n); (b) model of noisy communication channel.]

The requirement is to specify a Wiener filter consisting of a
transversal filter with two taps, which operates on the received
signal u(n) so as to produce an estimate of the desired response that
is optimum in the mean-square sense.
Statistical Characterization of the Desired Response d(n) and
the Received signal u(n)
d(n) + a1 d(n - 1) = v1(n)
where a1 = 0.8458. The variance of the process d(n) equals

$$\sigma_d^2 = \frac{\sigma_1^2}{1 - a_1^2} = \frac{0.27}{1 - (0.8458)^2} = 0.9486$$

The process d(n) acts as input to the channel. Hence, from Fig. (b).
we find that the channel output x(n) is related to the channel input
d(n) by the first-order difference equation
x(n) + b1x(n - 1) = d(n)

where b1 = -0.9458. We also observe from the two parts of the figure that
the channel output x(n) may be generated by applying the white-noise
process v1(n) to a second-order all-pole filter whose transfer
function equals

$$H(z) = H_1(z)H_2(z) = \frac{1}{(1 + 0.8458z^{-1})(1 - 0.9458z^{-1})}$$

so that $X(z) = H(z)V_1(z)$.

Accordingly, x(n) is a second-order AR process described by
the difference equation

$$x(n) + a_1x(n-1) + a_2x(n-2) = v_1(n)$$

where, for this second-order model, a1 = -0.1 and a2 = -0.8. Note that both AR processes
d(n) and x(n) are wide-sense stationary.

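Since the processes x(n) and v2(n) are uncorrelated, it follows
that the correlation matrix R equals the correlation matrix of
x(n) plus the correlation matrix of v2(n): $\mathbf{R} = \mathbf{R}_x + \mathbf{R}_2$, with

$$\mathbf{R}_x = \begin{bmatrix} r_x(0) & r_x(1) \\ r_x(1) & r_x(0) \end{bmatrix}$$

$$\sigma_x^2 = r_x(0) = \left(\frac{1 + a_2}{1 - a_2}\right)\frac{\sigma_1^2}{(1 + a_2)^2 - a_1^2} = \left(\frac{1 - 0.8}{1 + 0.8}\right)\frac{0.27}{(1 - 0.8)^2 - (0.1)^2} = 1$$

$$r_x(1) = \frac{-a_1}{1 + a_2}\,\sigma_x^2 = \frac{0.1}{1 - 0.8}\times 1 = 0.5$$

(See Section 1.9, Computer Experiments.)

$$\mathbf{R}_x = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$$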
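Next we observe that since v2(n) is a white-noise process of zero
mean and variance $\sigma_2^2 = 0.1$,

$$\mathbf{R}_2 = \begin{bmatrix} 0.1 & 0 \\ 0 & 0.1 \end{bmatrix}$$

$$\mathbf{R} = \mathbf{R}_x + \mathbf{R}_2 = \begin{bmatrix} 1.1 & 0.5 \\ 0.5 & 1.1 \end{bmatrix}$$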
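The statistics above can be checked numerically. The sketch below is illustrative only: the variable names are my own, and it uses the unrounded product of the two pole coefficients, so the results agree with the rounded slide values (1, 0.5, 0.9486) to about three decimals rather than exactly.

```python
sigma1_sq = 0.27      # variance of the white noise v1(n)
sigma2_sq = 0.1       # variance of the white noise v2(n)
a_d = 0.8458          # AR(1) coefficient of d(n): d(n) + a_d d(n-1) = v1(n)
b1 = -0.9458          # channel coefficient: x(n) + b1 x(n-1) = d(n)

# Variance of the AR(1) process d(n).
var_d = sigma1_sq / (1 - a_d ** 2)

# Multiplying the two first-order denominators gives the AR(2) model of x(n):
# x(n) + a1 x(n-1) + a2 x(n-2) = v1(n).
a1 = a_d + b1
a2 = a_d * b1

# Autocorrelation of an AR(2) process at lags 0 and 1.
var_x = ((1 + a2) / (1 - a2)) * sigma1_sq / ((1 + a2) ** 2 - a1 ** 2)
rx1 = -a1 / (1 + a2) * var_x

# Correlation matrix of u(n) = x(n) + v2(n), with x and v2 uncorrelated.
R = [[var_x + sigma2_sq, rx1],
     [rx1, var_x + sigma2_sq]]
```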

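x(n) + b1x(n - 1) = d(n)

Since these two processes are real valued,

$$p(k) = p(-k) = E[u(n-k)d(n)], \quad k = 0, 1$$

$$p(k) = r_x(k) + b_1r_x(k-1), \quad k = 0, 1, \quad b_1 = -0.9458$$

$$p(0) = r_x(0) + b_1r_x(-1) = 1 - 0.9458 \times 0.5 = 0.5272$$

$$p(1) = r_x(1) + b_1r_x(0) = 0.5 - 0.9458 \times 1 = -0.4458$$

$$\mathbf{p} = \begin{bmatrix} 0.5272 \\ -0.4458 \end{bmatrix}$$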
Error Performance Surface

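Wiener Filter and MMSE

$$\mathbf{R}^{-1} = \frac{1}{r^2(0) - r^2(1)}\begin{bmatrix} r(0) & -r(1) \\ -r(1) & r(0) \end{bmatrix} = \begin{bmatrix} 1.1456 & -0.5208 \\ -0.5208 & 1.1456 \end{bmatrix}$$

$$\mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p} = \begin{bmatrix} 0.8360 \\ -0.7853 \end{bmatrix}$$

$$J_{\min} = \sigma_d^2 - \mathbf{p}^H\mathbf{w}_o = 0.9486 - [0.5272, \; -0.4458]\begin{bmatrix} 0.8360 \\ -0.7853 \end{bmatrix} = 0.1579$$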
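The two-tap solution can be reproduced with a few lines of Python. This is a minimal sketch that assumes the values R = [[1.1, 0.5], [0.5, 1.1]], p = [0.5272, -0.4458] and a desired-response variance of 0.9486 from this example; the helper `solve_2x2` is a hypothetical name, and all quantities are real, so no conjugation is needed.

```python
def solve_2x2(R, p):
    """Solve the 2-by-2 system R w = p by the explicit inverse formula."""
    (a, b), (c, d) = R
    det = a * d - b * c
    return [(d * p[0] - b * p[1]) / det,
            (a * p[1] - c * p[0]) / det]


R = [[1.1, 0.5], [0.5, 1.1]]   # correlation matrix of the tap inputs
p = [0.5272, -0.4458]          # cross-correlation vector
var_d = 0.9486                 # variance of the desired response

w_o = solve_2x2(R, p)                              # Wiener solution
J_min = var_d - (p[0] * w_o[0] + p[1] * w_o[1])    # minimum MSE
```

The computed values agree with the rounded figures in the slides to about three decimal places.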
Canonical Error-Performance Surface
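We know that

$$J(\mathbf{v}) = J_{\min} + \sum_{k=1}^{M}\lambda_k|v_k|^2$$

where for M = 2 the eigenvalues of R = [[1.1, 0.5], [0.5, 1.1]] are $\lambda_1 = 1.6$ and $\lambda_2 = 0.6$, so that

$$J(v_1, v_2) = J_{\min} + 1.6\,v_1^2 + 0.6\,v_2^2$$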
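As a quick numerical check of the canonical form, the sketch below compares the direct quadratic cost with the principal-axes expression for the two-tap example. It assumes the example values R = [[1.1, 0.5], [0.5, 1.1]], p = [0.5272, -0.4458] and a desired-response variance of 0.9486, and it exploits the special structure R = [[a, b], [b, a]], whose eigenvalues are a + b and a - b; the function names are hypothetical.

```python
import math

R = [[1.1, 0.5], [0.5, 1.1]]
p = [0.5272, -0.4458]
var_d = 0.9486

# Wiener solution and minimum MSE, computed without rounding.
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
w_o = [(R[1][1] * p[0] - R[0][1] * p[1]) / det,
       (R[0][0] * p[1] - R[1][0] * p[0]) / det]
J_min = var_d - (p[0] * w_o[0] + p[1] * w_o[1])

# For R = [[a, b], [b, a]] the eigenvalues are a + b and a - b, with
# orthonormal eigenvectors [1, 1]/sqrt(2) and [1, -1]/sqrt(2).
lam = [R[0][0] + R[0][1], R[0][0] - R[0][1]]
c = 1 / math.sqrt(2)
Q = [[c, c], [c, -c]]          # columns are the eigenvectors


def J_direct(w):
    """J(w) = var_d - 2 p^T w + w^T R w (real-valued case)."""
    quad = sum(w[i] * R[i][j] * w[j] for i in range(2) for j in range(2))
    return var_d - 2 * (p[0] * w[0] + p[1] * w[1]) + quad


def J_canonical(w):
    """J = J_min + sum_k lam_k v_k^2 with v = Q^T (w - w_o)."""
    d = [w[0] - w_o[0], w[1] - w_o[1]]
    v = [Q[0][k] * d[0] + Q[1][k] * d[1] for k in range(2)]
    return J_min + sum(lam[k] * v[k] ** 2 for k in range(2))
```

Evaluating both functions at any tap-weight vector should give the same cost, which is the content of the canonical form.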



* Application Channel Equalization

We consider a temporal signal-processing problem, namely that
of channel equalization.
When data are transmitted over the channel by means of discrete
pulse-amplitude modulation combined with a linear modulation
scheme (e.g., quadriphase-shift keying), the number of detectable
levels that the telephone channel can support is essentially
limited by intersymbol interference (ISI) rather than by additive
noise.
Criterion:
1. Zero Forcing (ZF)
2. Minimum Mean Square Error (MMSE)

Let the impulse response of the equalizer be denoted by $h_n$ and
the impulse response of the channel by $c_n$.

Ignoring the effect of channel noise, the cascade
connection of the channel and the equalizer is
equivalent to a single tapped-delay-line filter with impulse response

$$w_l = \sum_{k=-N}^{N} h_k c_{l-k}, \quad l = 0, \pm 1, \ldots, \pm 2N$$

where the sequence $w_l$ is equal to the convolution of the sequences $c_n$ and $h_n$.

Let the data sequence u(n) applied to the channel input consist of a white-noise
sequence of zero mean and unit variance.

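Accordingly, we may express the elements of the correlation matrix R of the
channel input as follows:

$$r(l) = \begin{cases} 1, & l = 0 \\ 0, & l \ne 0 \end{cases}$$

For d(n) supplied to the equalizer, we assume the availability of a delayed
"replica" of the transmitted sequence. This d(n) may be generated by using
another feedback shift register of identical design to that used to supply the original
data sequence u(n). The two feedback shift registers are synchronized with each
other such that we may set
d(n) = u(n)
Thus the cross-correlation between the transmitted sequence u(n) and the desired
response d(n) is defined by

$$p(l) = \begin{cases} 1, & l = 0 \\ 0, & l = \pm 1, \pm 2, \ldots, \pm N \end{cases}$$

Using these in the Wiener-Hopf equations, the tap weights of the equivalent filter are

$$w_l = \begin{cases} 1, & l = 0 \\ 0, & l = \pm 1, \pm 2, \ldots, \pm N \end{cases}$$

or, equivalently,

$$\sum_{k=-N}^{N} h_k c_{l-k} = \begin{cases} 1, & l = 0 \\ 0, & l = \pm 1, \pm 2, \ldots, \pm N \end{cases}$$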

Given the impulse response of the channel
characterized by the coefficients $c_{-N}, \ldots, c_{-1}, c_0, c_1, \ldots, c_N$,
we may use the above equation to solve for the
unknown tap weights $h_{-N}, \ldots, h_{-1}, h_0, h_1, \ldots, h_N$ of
the equalizer.
In the literature on digital communications, an
equalizer designed in accordance with the above equation is
referred to as a zero-forcing equalizer. The equalizer
is so called because, with a single pulse transmitted
over the channel, it "forces" the receiver output to
be zero at all the sampling instants, except for the
time instant that corresponds to the transmitted pulse.
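The zero-forcing conditions form a (2N+1)-by-(2N+1) linear system in the equalizer taps. The following sketch solves it for a made-up three-tap channel (N = 1); the channel values, the dictionary representation of the taps and the helper names are all assumptions chosen for illustration, not values from the slides.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def zero_forcing_taps(c, N):
    """Solve sum_k h_k c_{l-k} = delta(l) for l = -N..N; c maps index -> tap."""
    idx = range(-N, N + 1)
    A = [[c.get(l - k, 0.0) for k in idx] for l in idx]
    b = [1.0 if l == 0 else 0.0 for l in idx]
    return solve(A, b)          # h[k + N] holds the tap h_k


def cascade(c, h, N, l):
    """Sample w_l of the combined channel-equalizer response."""
    return sum(h[k + N] * c.get(l - k, 0.0) for k in range(-N, N + 1))


N = 1
c = {-1: 0.2, 0: 1.0, 1: 0.3}   # hypothetical channel impulse response
h = zero_forcing_taps(c, N)
```

Only the samples at l = 0, ±1, ..., ±N are forced; the cascade response at larger lags (here l = ±2) is generally nonzero, which is the residual ISI of a finite-length zero-forcing equalizer.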
Application Channel Equalization - MMSE

[Block diagram: x(n) → channel h (length L) → addition of noise v(n) → y(n) → filter w (length M) → z(n), which is compared with the desired signal to form the error signal.]

Transmitted signal passes through the dispersive channel and a
corrupted version (both channel & noise) of x(n) arrives at the receiver.
Problem: Design a receiver filter so that we can obtain a delayed
version of the transmitted signal at its output.

Application Channel Equalization
MMSE cost function is:

Filter output Convolution

Filter input Convolution

Application Channel Equalization
Combine the last two equations (convolution):

A Toeplitz matrix performs the convolution.
Compact form of the filter output:

Desired signal is the delayed input x(n - δ), where δ is the delay, or

Application Channel Equalization
Rewrite the MMSE cost function

Expanding (data and noise are uncorrelated, E{x(n)v(k)} = 0 for all n, k):


Re-expressing the expectations

Application Channel Equalization
Quadratic function → gradient is zero at the minimum

The solution is found as

And Jmin is

Jmin depends on the design parameter δ (the delay).

Application Linearly Constrained
Minimum Variance (LCMV) Filter
1. We want to design an FIR filter which suppresses all frequency
components of the filter input except $\omega_0$, with a gain of g at $\omega_0$.

Application Linearly Constrained
Minimum - Variance Filter
2. We want to design a beamformer which can resolve an incident
wave coming from angle $\theta_0$ (with a scaling factor g), while at the same
time suppressing all other waves coming from other directions.

Fig. 2.10: Plane wave incident on a linear-array antenna

Application Linearly Constrained
Minimum - Variance Filter
Although these problems are physically different, they are
mathematically equivalent.

They can be expressed as follows:

Suppress all components (frequency $\omega$ or direction $\theta$) of a signal while
setting the gain of a certain component constant ($\omega_0$ or $\theta_0$).

They can be formulated as a constrained optimization problem:

Cost function: variance of all components (to be minimized)
Constraint (equality): the gain of a single component has to be g.
Observe that there is no desired response!

Application Linearly Constrained
Minimum - Variance Filter
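Mathematical model:

Filter output:

$$y(n) = \sum_{k=0}^{M-1} w_k^* u(n-k)$$

Minimize the mean-square value of y(n) subject to the linear constraint

$$\sum_{k=0}^{M-1} w_k^* e^{-j\omega_0 k} = g$$

where $\omega_0$ is the angular frequency normalized with respect to the sampling rate.

Beamformer output:

$$y(n) = u_0(n)\sum_{k=0}^{M-1} w_k^* e^{-j\theta_0 k}$$

Minimize the mean-square value of the beamformer output y(n) subject to the linear constraint

$$\sum_{k=0}^{M-1} w_k^* e^{-j\theta_0 k} = g$$

where $\theta_0$ is the electrical angle of the incident wave.

g is a complex-valued gain.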
Application Linearly Constrained
Minimum - Variance Filter
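Cost function: output power → quadratic → convex
Constraint: linear
The method of Lagrange multipliers can be utilized to solve the problem:

$$J = \underbrace{\sum_{k=0}^{M-1}\sum_{i=0}^{M-1} w_k^* w_i\, r(i-k)}_{\text{output power}} + \underbrace{\mathrm{Re}\left[\lambda^*\left(\sum_{k=0}^{M-1} w_k^* e^{-j\theta_0 k} - g\right)\right]}_{\text{constraint}}$$

Solution: set the gradient of J to zero.

Optimum beamformer weights are found from the set of equations

$$\sum_{i=0}^{M-1} w_{oi}\, r(i-k) = -\frac{\lambda^*}{2}e^{-j\theta_0 k}, \quad k = 0, 1, \ldots, M-1$$

which are similar to the Wiener-Hopf equations.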

Application Linearly Constrained
Minimum - Variance Filter
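Rewrite the equations in matrix form:

$$\mathbf{R}\mathbf{w}_o = -\frac{\lambda^*}{2}\mathbf{s}(\theta_0), \quad \text{where} \quad \mathbf{s}(\theta_0) = [1, e^{-j\theta_0}, \ldots, e^{-j\theta_0(M-1)}]^T$$

Hence

$$\mathbf{w}_o = -\frac{\lambda^*}{2}\mathbf{R}^{-1}\mathbf{s}(\theta_0)$$

How to find $\lambda$? Use the linear constraint

$$\mathbf{w}_o^H\mathbf{s}(\theta_0) = g$$

to find

$$\lambda = -\frac{2g}{\mathbf{s}^H(\theta_0)\mathbf{R}^{-1}\mathbf{s}(\theta_0)}$$

Therefore the solution becomes

$$\mathbf{w}_o = \frac{g^*\,\mathbf{R}^{-1}\mathbf{s}(\theta_0)}{\mathbf{s}^H(\theta_0)\mathbf{R}^{-1}\mathbf{s}(\theta_0)}$$

For $\theta_0$, wo is
the linearly constrained minimum-variance (LCMV) beamformer.
For $\omega_0$, wo is
the linearly constrained minimum-variance (LCMV) filter.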
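The closed-form LCMV solution, wo = g* R^{-1} s / (s^H R^{-1} s), can be checked on a small case. The sketch below uses M = 2 with an assumed real correlation matrix, an assumed electrical angle of 0.5 rad and an assumed complex gain; these numbers and the helper names are hypothetical and only serve to verify that the resulting weights satisfy the constraint wo^H s = g.

```python
import cmath


def solve_2x2(R, b):
    """Solve the 2-by-2 system R x = b (entries may be complex)."""
    (a, bb), (c, d) = R
    det = a * d - bb * c
    return [(d * b[0] - bb * b[1]) / det,
            (a * b[1] - c * b[0]) / det]


def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(complex(xi).conjugate() * yi for xi, yi in zip(x, y))


def lcmv_weights(R, theta0, g):
    """Return (w_o, s) for a two-element array or two-tap filter."""
    s = [cmath.exp(-1j * theta0 * k) for k in range(2)]   # steering vector
    Rinv_s = solve_2x2(R, s)                              # R^{-1} s
    denom = inner(s, Rinv_s)                              # s^H R^{-1} s
    w = [complex(g).conjugate() * x / denom for x in Rinv_s]
    return w, s


R = [[1.1, 0.5], [0.5, 1.1]]      # assumed correlation matrix
g = 1 + 0.5j                      # assumed complex gain
w_o, s = lcmv_weights(R, 0.5, g)
gain = inner(w_o, s)              # should reproduce g
```

Because R is Hermitian and positive definite, the denominator s^H R^{-1} s is real and positive, which is why conjugating g is all that is needed to meet the constraint.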
Minimum-Variance Distortionless Response
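Distortionless: set g = 1, then

$$\mathbf{w}_o = \frac{\mathbf{R}^{-1}\mathbf{s}(\theta_0)}{\mathbf{s}^H(\theta_0)\mathbf{R}^{-1}\mathbf{s}(\theta_0)}$$

We can show that (HW)

$$J_{\min} = \mathbf{w}_o^H\mathbf{R}\mathbf{w}_o = \frac{1}{\mathbf{s}^H(\theta_0)\mathbf{R}^{-1}\mathbf{s}(\theta_0)}$$

Jmin represents an estimate of the variance of the signal impinging on the
antenna array along the direction $\theta_0$.
Generalize the result to any direction $\theta$ (angular frequency $\omega$):

$$S_{\mathrm{MVDR}}(\omega) = \frac{1}{\mathbf{s}^H(\omega)\mathbf{R}^{-1}\mathbf{s}(\omega)}$$

the minimum-variance distortionless response (MVDR) spectrum:

An estimate of the power of the signal coming from direction $\theta$
An estimate of the power of the signal at frequency $\omega$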
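The MVDR spectrum 1 / (s^H R^{-1} s) is straightforward to evaluate once R is known. The sketch below does so for M = 2 with an assumed correlation matrix R = [[1.1, 0.5], [0.5, 1.1]]; the function name `mvdr_spectrum` and the chosen matrix are illustrative assumptions.

```python
import cmath
import math


def mvdr_spectrum(R, omega):
    """Return 1 / (s^H R^{-1} s) for a 2-by-2 Hermitian R at frequency omega."""
    s = [cmath.exp(-1j * omega * k) for k in range(2)]    # s(omega)
    (a, b), (c, d) = R
    det = a * d - b * c
    Rinv_s = [(d * s[0] - b * s[1]) / det,                # R^{-1} s
              (a * s[1] - c * s[0]) / det]
    quad = sum(si.conjugate() * xi for si, xi in zip(s, Rinv_s))
    return 1.0 / quad.real        # the quadratic form is real for Hermitian R


R = [[1.1, 0.5], [0.5, 1.1]]      # assumed correlation matrix
S_low = mvdr_spectrum(R, 0.0)     # power estimate at omega = 0
S_high = mvdr_spectrum(R, math.pi)  # power estimate at omega = pi
```

For this R the steering vectors at 0 and π coincide with the eigenvectors, so the spectrum values are the eigenvalues divided by M = 2: larger at low frequency, where the positively correlated input has most of its power.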
Minimum Variance Distortionless Response Spectrum

In addition to spectrum estimation, constrained optimization is
popular in array signal processing, in the spatial rather than the
temporal domain.
Therefore one can include multiple constraints,
which results in the generalized sidelobe canceller.

For stationary signals, the MSE is a quadratic function of
the linear filter coefficients.
The optimal linear filter in the MMSE sense is found by
setting the gradient to zero; this leads to the
orthogonality principle and the
Wiener filter.
It depends on the second-order statistics.
It can be used as an approximation if the signals are
locally stationary.
A competing optimization criterion is to minimize the filter
output mean power (variance) given constraints on desired outputs;
the optimization is carried out by the method of Lagrange multipliers.
Generalized Sidelobe Cancellers
Continuing with the discussion of the LCMV narrowband beamformer
defined by the linear constraint of Eq.(2.76), we note that this constraint
represents the inner product

$$\mathbf{w}^H\mathbf{s}(\theta_0) = g$$

in which w is the weight vector and $\mathbf{s}(\theta_0)$ is the steering vector pointing
along the electrical angle $\theta_0$. The steering vector is an M-by-1 vector,
where M is the number of antenna elements in the beamformer. We
may generalize the notion of a linear constraint by introducing multiple
linear constraints defined by

$$\mathbf{C}^H\mathbf{w} = \mathbf{g} \quad (2.91)$$
Generalized Sidelobe Cancellers

The matrix C is termed the constraint matrix, and the vector g,
termed the gain vector, has constant elements. Assuming that there
are L linear constraints, C is an M-by-L matrix and g is an L-by-1
vector; each column of the matrix C represents a single linear
constraint. Furthermore, it is assumed that the constraint matrix C
has linearly independent columns. For example, with

$$[\mathbf{s}(\theta_0), \mathbf{s}(\theta_1)]^H\mathbf{w} = \begin{bmatrix} 1 \\ 0 \end{bmatrix},$$

the narrowband beamformer is constrained to preserve a signal of
interest impinging on the array along the electrical angle $\theta_0$ and, at
the same time, to suppress an interference known to originate along
the electrical angle $\theta_1$.

Generalized Sidelobe Cancellers
Let the columns of an M-by-(M - L) matrix Ca be defined as a basis
for the orthogonal complement of the space spanned by the columns
of matrix C. Using the definition of an orthogonal complement, we
may thus write

$$\mathbf{C}^H\mathbf{C}_a = \mathbf{0} \quad (2.92)$$

or, just as well,

$$\mathbf{C}_a^H\mathbf{C} = \mathbf{0} \quad (2.93)$$

The null matrix 0 in Eq.(2.92) is L-by-(M - L), whereas in Eq.(2.93)
it is (M - L)-by-L; we naturally have M > L. We now define the
M-by-M partitioned matrix

$$\mathbf{U} = [\mathbf{C} \;\; \mathbf{C}_a] \quad (2.94)$$

whose columns span the entire M-dimensional signal space. The
inverse matrix $\mathbf{U}^{-1}$ exists by virtue of the fact that the determinant of
matrix U is nonzero.
Generalized Sidelobe Cancellers
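Next, let the M-by-1 weight vector of the beamformer be written in
terms of the matrix U as

$$\mathbf{w} = \mathbf{U}\mathbf{q} \quad (2.95)$$

Equivalently, the M-by-1 vector q is defined by

$$\mathbf{q} = \mathbf{U}^{-1}\mathbf{w} \quad (2.96)$$

Let q be partitioned in a manner compatible with that in Eq.(2.94),
as shown by

$$\mathbf{q} = \begin{bmatrix} \mathbf{v} \\ -\mathbf{w}_a \end{bmatrix} \quad (2.97)$$

where v is an L-by-1 vector and the (M - L)-by-1 vector wa is that
portion of the weight vector w that is not affected by the constraints.
We may then use the definitions of Eqs. (2.94) and (2.97) in
Eq.(2.95) to write

$$\mathbf{w} = [\mathbf{C} \;\; \mathbf{C}_a]\begin{bmatrix} \mathbf{v} \\ -\mathbf{w}_a \end{bmatrix} = \mathbf{C}\mathbf{v} - \mathbf{C}_a\mathbf{w}_a \quad (2.98)$$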
Generalized Sidelobe Cancellers
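We may now apply the multiple linear constraints of Eq.(2.91),
$\mathbf{C}^H\mathbf{w} = \mathbf{g}$, obtaining

$$\mathbf{C}^H\mathbf{C}\mathbf{v} - \mathbf{C}^H\mathbf{C}_a\mathbf{w}_a = \mathbf{g} \quad (2.99)$$

But, from Eq.(2.92), we know that $\mathbf{C}^H\mathbf{C}_a$ is zero; hence, Eq.(2.99)
reduces to

$$\mathbf{C}^H\mathbf{C}\mathbf{v} = \mathbf{g}$$

Solving for the vector v, we thus get

$$\mathbf{v} = (\mathbf{C}^H\mathbf{C})^{-1}\mathbf{g}$$

which shows that the multiple linear constraints do not affect wa.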

Generalized Sidelobe Cancellers

Next, we define a fixed beamformer component represented by

$$\mathbf{w}_q = \mathbf{C}\mathbf{v} = \mathbf{C}(\mathbf{C}^H\mathbf{C})^{-1}\mathbf{g} \quad (2.102)$$

which is orthogonal to the columns of matrix Ca by virtue of the
property described in Eq.(2.93); the rationale for using the subscript
q in wq will become apparent later. From this definition, we may use
Eq.(2.98) to express the overall weight vector of the beamformer as

$$\mathbf{w} = \mathbf{w}_q - \mathbf{C}_a\mathbf{w}_a \quad (2.103)$$

Substituting Eq.(2.103) into Eq.(2.91) yields

$$\mathbf{C}^H\mathbf{w}_q - \mathbf{C}^H\mathbf{C}_a\mathbf{w}_a = \mathbf{g}$$

Generalized Sidelobe Cancellers

Substituting the decomposition of Eq. (2.103) into the constraints of Eq. (2.91) gives an expression which, by virtue of Eq. (2.92), reduces to

$$\mathbf{C}^H\mathbf{w}_q = \mathbf{g} \quad (2.104)$$

Equation (2.104) shows that weight vector wq is that part of the
weight vector w which satisfies the constraints. In contrast, the
vector wa is unaffected by the constraints.

Thus, in light of Eq.(2.103), the beamformer may be represented by
the block diagram shown in Fig. 2.11(a). The beamformer described
herein is referred to as a generalized sidelobe canceller (GSC).
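A small real-valued sketch can make the decomposition w = wq - Ca wa concrete. It assumes M = 3 and a single constraint (L = 1) with constraint vector C = [1, 1, 1]^T and gain g = 1; this C, the particular Ca and the function names are hypothetical choices for illustration. Whatever values the adjustable weights wa take, the constraint C^H w = g stays satisfied.

```python
C = [1.0, 1.0, 1.0]          # single constraint column (assumed example)
g = 1.0                      # required gain

# Columns of Ca span the orthogonal complement of C (one possible choice).
Ca = [[-1.0, -1.0],
      [1.0, 0.0],
      [0.0, 1.0]]

# Quiescent part w_q = C (C^H C)^{-1} g; for L = 1 the inverse is a scalar.
w_q = [ci * g / sum(cj * cj for cj in C) for ci in C]


def gsc_weights(w_a):
    """Overall weight vector w = w_q - Ca w_a."""
    return [w_q[i] - sum(Ca[i][j] * w_a[j] for j in range(2))
            for i in range(3)]


def constraint_value(w):
    """C^H w (real-valued case, so no conjugation is needed)."""
    return sum(ci * wi for ci, wi in zip(C, w))


w = gsc_weights([0.3, -2.0])     # arbitrary adjustable weights
```

Only wa is free to be optimized; the constraint is carried entirely by the fixed component wq.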

Generalized Sidelobe Cancellers

In light of Eq. (2.102), we may now perform an unconstrained
minimization of the mean-square value of the beamformer output
y(n) with respect to the adjustable weight vector wa. According to
Eq.(2.75), the beamformer output is defined by the inner product

$$y(n) = \mathbf{w}^H\mathbf{u}(n) \quad (2.105)$$


FIGURE 2.11 (a) Block diagram of generalized sidelobe
canceller. (b) Reformulation of the generalized sidelobe
cancelling problem as a standard optimum filtering problem.

Generalized Sidelobe Cancellers

The input signal vector is

$$\mathbf{u}(n) = u_0(n)[1, e^{-j\theta_0}, \ldots, e^{-j(M-1)\theta_0}]^T$$

in which the electrical angle $\theta_0$ is
defined by the direction of arrival of the incoming plane wave
and $u_0(n)$ is the electrical signal picked up by antenna element 0 of
the linear array in Fig. 2.10 at time n. Hence, substituting Eq. (2.103)
into Eq. (2.105) yields

$$y(n) = \mathbf{w}_q^H\mathbf{u}(n) - \mathbf{w}_a^H\mathbf{C}_a^H\mathbf{u}(n) \quad (2.107)$$

Generalized Sidelobe Cancellers

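If we now define

$$d(n) = \mathbf{w}_q^H\mathbf{u}(n) \quad (2.108)$$

$$\mathbf{x}(n) = \mathbf{C}_a^H\mathbf{u}(n) \quad (2.109)$$

we may rewrite Eq.(2.107) in a form that resembles the standard
Wiener filter exactly, as shown by

$$y(n) = d(n) - \mathbf{w}_a^H\mathbf{x}(n)$$

where d(n) plays the role of a desired response for the GSC and
x(n) plays the role of input vector, as depicted in Fig.2.11(b).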

Generalized Sidelobe Cancellers

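We thus see that the combined use of vector wq and matrix Ca has
converted the linearly constrained optimization problem into a
standard optimum filtering problem.
In particular, we now have an unconstrained optimization problem
involving the adjustable portion wa of the weight vector, which may
be formally written as

$$\min_{\mathbf{w}_a} E[|y(n)|^2] = \min_{\mathbf{w}_a}\left(\sigma_d^2 - \mathbf{w}_a^H\mathbf{p}_x - \mathbf{p}_x^H\mathbf{w}_a + \mathbf{w}_a^H\mathbf{R}_x\mathbf{w}_a\right) \quad (2.111)$$

where the (M-L)-by-1 cross-correlation vector is

$$\mathbf{p}_x = E[\mathbf{x}(n)d^*(n)] \quad (2.112)$$

and the (M-L)-by-(M-L) correlation matrix is

$$\mathbf{R}_x = E[\mathbf{x}(n)\mathbf{x}^H(n)] \quad (2.113)$$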

Generalized Sidelobe Cancellers
The cost function of Eq.(2.111) is quadratic in the unknown vector
wa, which, as previously stated, embodies the available degrees of
freedom in the GSC.
Most importantly, this cost function has exactly the same
mathematical form as that of the standard Wiener filter.
Accordingly, we may readily use our previous results to obtain the
optimum value of wa as

$$\mathbf{w}_{ao} = \mathbf{R}_x^{-1}\mathbf{p}_x \quad (2.114)$$

Using the definitions of Eqs.(2.108) and (2.109) in Eq.(2.112), we
may express the vector px as

$$\mathbf{p}_x = E[\mathbf{C}_a^H\mathbf{u}(n)\mathbf{u}^H(n)\mathbf{w}_q] = \mathbf{C}_a^H\mathbf{R}\mathbf{w}_q$$

where R is the correlation matrix of the incoming data vector u(n).

Generalized Sidelobe Cancellers

Similarly, using the definition of Eq.(2.109) in Eq.(2.113), we may
express the matrix Rx as

$$\mathbf{R}_x = \mathbf{C}_a^H\mathbf{R}\mathbf{C}_a$$

The matrix Ca has full rank, and the correlation matrix R is positive
definite, since the incoming data always contain some form of
additive sensor noise, with the result that Rx is nonsingular.
Accordingly, we may rewrite the optimum solution of Eq.(2.114) as

$$\mathbf{w}_{ao} = (\mathbf{C}_a^H\mathbf{R}\mathbf{C}_a)^{-1}\mathbf{C}_a^H\mathbf{R}\mathbf{w}_q \quad (2.117)$$

Generalized Sidelobe Cancellers
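Let Po denote the minimum output power of the GSC attained by using
the optimum solution wao. Then adapting the previous result derived in
Eq.(2.49) for the standard Wiener filter and proceeding in a manner
similar to that just described, we may express Po as

$$P_o = \sigma_d^2 - \mathbf{p}_x^H\mathbf{R}_x^{-1}\mathbf{p}_x = \mathbf{w}_q^H\mathbf{R}\mathbf{w}_q - \mathbf{w}_q^H\mathbf{R}\mathbf{C}_a(\mathbf{C}_a^H\mathbf{R}\mathbf{C}_a)^{-1}\mathbf{C}_a^H\mathbf{R}\mathbf{w}_q$$

Now consider the special case of a quiet environment, for which the
received signal consists of white noise acting alone. Let the
corresponding value of the correlation matrix R be written as

$$\mathbf{R} = \sigma^2\mathbf{I} \quad (2.119)$$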

Generalized Sidelobe Cancellers
where I is the M-by-M identity matrix and $\sigma^2$ is the noise variance.
Under this condition we readily find, from Eq.(2.117), that

$$\mathbf{w}_{ao} = (\mathbf{C}_a^H\mathbf{C}_a)^{-1}\mathbf{C}_a^H\mathbf{w}_q$$

By definition, the weight vector wq is orthogonal to the columns of
matrix Ca. It follows, therefore, that the optimum weight vector wao is
identically zero for the quiet environment described by Eq.(2.119).
Thus, with wao equal to zero, we find from Eq.(2.103) that w = wq. It is
for this reason that wq is often referred to as the quiescent weight vector,
hence the use of subscript q to denote it.

Generalized Sidelobe Cancellers
Filtering Interpretations of Wq and Ca
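The quiescent weight vector wq and matrix Ca play critical roles of their
own in the operation of the GSC. To develop physical interpretations of
them, consider an MVDR spectrum estimator (formulated in temporal
terms) for which we have

$$\mathbf{C} = \mathbf{s}(\omega_0) = [1, e^{-j\omega_0}, \ldots, e^{-j(M-1)\omega_0}]^T, \quad g = 1 \quad (2.120)$$

Hence, the use of these values in Eq. (2.102) yields the corresponding
value of the quiescent weight vector, viz.,

$$\mathbf{w}_q = \mathbf{C}(\mathbf{C}^H\mathbf{C})^{-1}g = \frac{1}{M}\mathbf{s}(\omega_0)$$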

Generalized Sidelobe Cancellers
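The quiescent weight vector $\mathbf{w}_q = \frac{1}{M}\mathbf{s}(\omega_0)$ represents an FIR filter of length M. The frequency response of
this filter is given by

$$\mathbf{w}_q^H\mathbf{s}(\omega) = \frac{1}{M}\sum_{k=0}^{M-1} e^{jk(\omega_0 - \omega)} = \frac{\sin\left(M(\omega_0 - \omega)/2\right)}{M\sin\left((\omega_0 - \omega)/2\right)}\, e^{j(M-1)(\omega_0 - \omega)/2}$$

Figure 2.12(a) shows the amplitude response of this filter for M = 4
and $\omega_0 = 1$. From this figure, we clearly see that the FIR filter
representing the quiescent weight vector wq acts like a bandpass filter
tuned to the angular frequency $\omega_0$, for which the MVDR spectrum
estimator is constrained to produce a distortionless response.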
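The bandpass behaviour of the quiescent weight vector can be checked numerically. The sketch below assumes wq = s(ω0)/M with M = 4 and ω0 = 1, as in the figure, and compares the directly computed response with the closed-form magnitude |sin(M(ω0 - ω)/2) / (M sin((ω0 - ω)/2))|; the function names are hypothetical.

```python
import cmath
import math


def steering(omega, M):
    """s(omega) = [1, e^{-j omega}, ..., e^{-j (M-1) omega}]."""
    return [cmath.exp(-1j * omega * k) for k in range(M)]


def quiescent_response(omega, omega0=1.0, M=4):
    """w_q^H s(omega) for the quiescent weights w_q = s(omega0) / M."""
    w_q = [x / M for x in steering(omega0, M)]
    return sum(wk.conjugate() * sk for wk, sk in zip(w_q, steering(omega, M)))


def closed_form_magnitude(omega, omega0=1.0, M=4):
    """|sin(M x / 2) / (M sin(x / 2))| with x = omega0 - omega."""
    x = omega0 - omega
    if abs(math.sin(x / 2)) < 1e-12:
        return 1.0                    # limiting value at the tuned frequency
    return abs(math.sin(M * x / 2) / (M * math.sin(x / 2)))


peak = quiescent_response(1.0)                    # response at omega0
null = quiescent_response(1.0 + math.pi / 2)      # first null for M = 4
```

The response equals one at ω0 (the distortionless constraint) and has nulls at ω0 + 2πm/M for m = 1, ..., M - 1, with sidelobes in between.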
Generalized Sidelobe Cancellers
Consider next a physical interpretation of the matrix Ca. The use of
Eq.(2.120) in Eq.(2.92) yields

$$\mathbf{s}^H(\omega_0)\mathbf{C}_a = \mathbf{0} \quad (2.123)$$

According to Eq.(2.123), each of the (M-L) columns of matrix Ca
represents an FIR filter with an amplitude response that is zero at $\omega_0$, as
illustrated in Fig.2.12(b) for $\omega_0 = 1$, M = 4, L = 1, and a particular choice of Ca.

In other words, the matrix Ca is represented by a bank of band-rejection
filters, each of which is tuned to $\omega_0$. Thus, Ca is referred to as a signal-blocking
matrix, since it blocks (rejects) the received signal at the
angular frequency $\omega_0$. The function of the matrix Ca is to cancel
interference that leaks through the sidelobes of the bandpass filter
representing the quiescent weight vector wq.
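To illustrate the blocking property, the sketch below builds one possible Ca for M = 4, L = 1 and ω0 = 1 and checks that every column has zero response at ω0. The particular construction (column i equal to -e_0 + e^{-j i ω0} e_i) is an assumption made for this example; any basis of the orthogonal complement of s(ω0) would serve.

```python
import cmath

M = 4
omega0 = 1.0


def steering(omega):
    """s(omega) = [1, e^{-j omega}, ..., e^{-j (M-1) omega}]."""
    return [cmath.exp(-1j * omega * k) for k in range(M)]


# One possible blocking matrix, stored as a list of M - 1 columns.
# Column i has -1 in position 0 and e^{-j i omega0} in position i.
Ca_columns = []
for i in range(1, M):
    col = [0j] * M
    col[0] = -1 + 0j
    col[i] = cmath.exp(-1j * i * omega0)
    Ca_columns.append(col)


def column_response(col, omega):
    """Frequency response col^H s(omega) of the FIR filter given by col."""
    return sum(c.conjugate() * s for c, s in zip(col, steering(omega)))


at_omega0 = [abs(column_response(col, omega0)) for col in Ca_columns]
elsewhere = [abs(column_response(col, omega0 + 0.5)) for col in Ca_columns]
```

Each column passes energy away from ω0 but rejects the constrained frequency itself, which is what allows the adjustable weights wa to cancel interference without touching the signal of interest.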
Generalized Sidelobe Cancellers

FIGURE 2.12 (a) Interpretation of $\mathbf{w}_q^H\mathbf{s}(\omega)$ as the response of an FIR filter. (b) Interpretation of
each column of matrix Ca as a band-rejection filter. In both parts of the figure it is assumed
that $\omega_0 = 1$.

HW2: Ch2; P 2, 5, 7, 10, 13, 15