
Steepest Descent

Summary

- Wiener Filtering: A Numerical Example
- Steepest Descent Algorithm
- Stability of Steepest Descent
- Some Numerical Examples

Example: Problem Statement (1)


Consider the following communication channel scheme:

$$H_1(z) = \frac{1}{1 + 0.8458\,z^{-1}}, \qquad \sigma_1^2 = 0.27, \qquad \sigma_2^2 = 0.1$$

$$u(n) = v_2(n) + x(n)$$

Example: Problem Statement (2)

Goal: design a Wiener filter with two taps which operates on u(n) to obtain an estimate of d(n), optimum in the MSE sense.

Example: Signal Characterization


Signal Characterization:

Input signal — v1(n) vs. d(n) relation:

$$d(n) + a_1 d(n-1) = v_1(n), \qquad a_1 = 0.8458$$

Variance of d(n):

$$\sigma_d^2 = \frac{\sigma_1^2}{1 - a_1^2} = \frac{0.27}{1 - 0.8458^2} = 0.9486$$

Output signal — x(n) vs. d(n) relationship:

$$x(n) + b_1 x(n-1) = d(n)$$

$$H(z) = H_1(z)\,H_2(z) = \frac{1}{(1 + 0.8458\,z^{-1})(1 - 0.9458\,z^{-1})}$$

Example: Optimum solution (1)


To characterize the Wiener filter we need to solve the Wiener-Hopf equations.
Assumption: x(n) and v2(n) are uncorrelated, thus we may write

$$\mathbf{R} = \mathbf{R}_x + \mathbf{R}_2, \qquad \mathbf{R}_x = \begin{bmatrix} r_x(0) & r_x(1) \\ r_x(1) & r_x(0) \end{bmatrix}$$

Example: Optimum solution (2)

where b1 = -0.9458.

We can finally solve the Wiener-Hopf equations:

$$\mathbf{R}\,\mathbf{w}_o = \mathbf{p}$$
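As a sanity check, here is a minimal Python sketch that estimates R and p by simulating the model with the parameters stated above (a1 = 0.8458, b1 = -0.9458, σ1² = 0.27, σ2² = 0.1) and then solves the Wiener-Hopf equations; the sample size, seed, and variable names are arbitrary choices of this sketch.

```python
import numpy as np

# Simulate the channel model from the previous slides and solve the
# Wiener-Hopf equations R w_o = p for the two-tap filter.
rng = np.random.default_rng(0)
N = 500_000
a1, b1 = 0.8458, -0.9458
v1 = rng.normal(scale=np.sqrt(0.27), size=N)  # white noise driving d(n)
v2 = rng.normal(scale=np.sqrt(0.10), size=N)  # additive noise at the output

d = np.zeros(N)
x = np.zeros(N)
for n in range(1, N):
    d[n] = -a1 * d[n - 1] + v1[n]   # d(n) + a1 d(n-1) = v1(n)
    x[n] = -b1 * x[n - 1] + d[n]    # x(n) + b1 x(n-1) = d(n)
u = x + v2                          # u(n) = v2(n) + x(n)

# Sample estimates of R = E{u(n) u^T(n)} and p = E{u(n) d(n)}.
U = np.stack([u[1:], u[:-1]])       # regressor rows: u(n), u(n-1)
R = U @ U.T / U.shape[1]
p = U @ d[1:] / U.shape[1]
w_o = np.linalg.solve(R, p)         # Wiener-Hopf solution
print("R =\n", R, "\np =", p, "\nw_o =", w_o)
```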

Example: Error Performance Surface (1)


The error performance surface is the cost function

$$J(\mathbf{w}) = \sigma_d^2 - \mathbf{w}^H \mathbf{p} - \mathbf{p}^H \mathbf{w} + \mathbf{w}^H \mathbf{R}\,\mathbf{w}$$

where R is the autocorrelation matrix of the input u(n) and p is the cross-correlation vector between u(n) and the desired response d(n).

Applying the previous equation to our case we obtain a quadratic surface in the two tap weights.

Example: Error Performance Surface (2)


Error Performance Surface:

Example: Error Performance Surface (3)


Error Performance Contour lines
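A minimal sketch of how such contour plots can be generated. Here σd² = 0.9486 comes from the slides, while the numerical R and p are assumptions: they are the values this classic example is usually worked out to, and the simulation sketch above reproduces them.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed numerical values for this example (see the hedge above).
R = np.array([[1.1, 0.5], [0.5, 1.1]])
p = np.array([0.5272, -0.4458])
var_d = 0.9486

w1, w2 = np.meshgrid(np.linspace(-2, 2, 201), np.linspace(-2, 2, 201))
# J(w) = var_d - 2 w^T p + w^T R w, evaluated on the grid (real data).
Jsurf = (var_d - 2 * (w1 * p[0] + w2 * p[1])
         + R[0, 0] * w1**2 + 2 * R[0, 1] * w1 * w2 + R[1, 1] * w2**2)
plt.contour(w1, w2, Jsurf, levels=20)
plt.xlabel("w1"); plt.ylabel("w2"); plt.title("Error-performance contours")
plt.show()
```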

Example: Canonical Error Performance Surface


The eigenvalues of R follow from its characteristic equation:

$$\det(\mathbf{R} - \lambda \mathbf{I}) = 0$$

Observation: The canonical form of the cost function highlights the fact that its contours are ellipses whose principal axes lie along the eigenvectors of R; the major axis corresponds to the smallest eigenvalue and the minor axis to the largest (the axis lengths scale as $1/\sqrt{\lambda_k}$).
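A small sketch of this decomposition for the assumed example matrix R (same assumption as above):

```python
import numpy as np

# Eigendecomposition R = Q Λ Q^T; the canonical form of the cost is
# J = Jmin + Σ_k λ_k v_k^2, with v = Q^T (w - w_o).
R = np.array([[1.1, 0.5], [0.5, 1.1]])
lam, Q = np.linalg.eigh(R)
print("eigenvalues:", lam)            # contour axes scale as 1/sqrt(λ_k)
print("eigenvectors (columns):\n", Q)  # directions of the ellipse axes
```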

Steepest Descent Algorithm (1)


Problem: solving the Wiener-Hopf equations directly may be computationally inefficient. Alternative: the method of steepest descent. Procedure (see the sketch below):

1. Start with an initial guess.
2. Using this guess, compute the gradient of the cost function.
3. Make a change to the previous weight set, stepping opposite to the gradient.
4. Go back to step 2 and repeat.
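Here is a minimal sketch of these four steps for the quadratic cost of this chapter; R and p are the assumed example values used earlier, and μ and the initial guess are arbitrary choices.

```python
import numpy as np

# Steepest descent on J(w) = var_d - 2 w^T p + w^T R w (real data).
R = np.array([[1.1, 0.5], [0.5, 1.1]])
p = np.array([0.5272, -0.4458])
mu = 0.3                       # step size; must satisfy 0 < mu < 2/λ_max
w = np.zeros(2)                # step 1: initial guess
for n in range(200):
    grad = 2 * (R @ w - p)     # step 2: ∇J(n) = -2p + 2R w(n)
    w = w - 0.5 * mu * grad    # step 3: w(n+1) = w(n) + mu [p - R w(n)]
print("w after 200 iterations:", w)   # approaches w_o = R^{-1} p
```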

Steepest Descent Algorithm (2)


Let ∇J(n) be the gradient of the cost function. Written in vector form:

$$\nabla J(n) = -2\mathbf{p} + 2\mathbf{R}\,\mathbf{w}(n)$$

Update equation
Using the previous equation we may finally write the tap-weight update equation:

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,[\mathbf{p} - \mathbf{R}\,\mathbf{w}(n)], \qquad n = 0, 1, 2, \ldots$$

where μ is referred to as the step-size parameter.


We can also observe that

$$\mathbf{p} - \mathbf{R}\,\mathbf{w}(n) = E\{\mathbf{u}(n)\,e^*(n)\},$$

thus we may compute the update using a bank of cross-correlators.
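A sketch of this correlation-driven form of the update, replacing the expectation with a block average over the data; u and d are assumed to come from the simulation sketch earlier (real-valued, so e* = e), and μ and the iteration count are arbitrary.

```python
import numpy as np

def descend_by_correlation(u, d, mu=0.3, iters=100):
    """Steepest descent driven by a sample estimate of E{u(n) e(n)}."""
    w = np.zeros(2)
    U = np.stack([u[1:], u[:-1]])          # regressors [u(n), u(n-1)]
    for _ in range(iters):
        e = d[1:] - w @ U                  # e(n) = d(n) - w^T u(n)
        w = w + mu * (U @ e) / U.shape[1]  # block average of u(n) e(n)
    return w
```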

Signal-flow graph
We may visualize the update equation with the following block diagram:

Stability of the algorithm (1)


The previous block diagram highlights the fact that the steepest-descent algorithm involves a feedback loop, thus stability must be considered. The stability of the feedback loop is determined by:
- the step-size parameter μ;
- the autocorrelation matrix R of the input data u(n).

Stability of the algorithm (2)


Let us define the weight-error vector c(n) = w(n) - w_o. The update equation can be rewritten in terms of the weight-error vector as

$$\mathbf{c}(n+1) = (\mathbf{I} - \mu\mathbf{R})\,\mathbf{c}(n)$$

Using the eigenvalue decomposition of the autocorrelation matrix, $\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H$, we can write

$$\mathbf{c}(n+1) = (\mathbf{I} - \mu\,\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H)\,\mathbf{c}(n)$$

Premultiplying both sides of the previous equation by $\mathbf{Q}^H$ we obtain

$$\mathbf{Q}^H\mathbf{c}(n+1) = (\mathbf{I} - \mu\boldsymbol{\Lambda})\,\mathbf{Q}^H\mathbf{c}(n)$$

Stability of the algorithm (3)


We define a new set of coordinates as follows: $\mathbf{v}(n) = \mathbf{Q}^H\mathbf{c}(n)$. The update equation becomes

$$\mathbf{v}(n+1) = (\mathbf{I} - \mu\boldsymbol{\Lambda})\,\mathbf{v}(n)$$

Applying the previous update equation recursively, we obtain for the k-th natural mode

$$v_k(n) = (1 - \mu\lambda_k)^n\,v_k(0)$$

Stability of the algorithm (4)


In order to achieve stability of the algorithm, the following condition must hold for all k:

$$|1 - \mu\lambda_k| < 1$$

Applying the previous condition to the worst case (the largest eigenvalue) we obtain:

$$0 < \mu < \frac{2}{\lambda_{\max}}$$
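A quick numerical check of this bound for the assumed example matrix R used earlier:

```python
import numpy as np

# Stability requires 0 < mu < 2/λ_max, i.e. |1 - mu λ_k| < 1 for all k.
R = np.array([[1.1, 0.5], [0.5, 1.1]])
lam = np.linalg.eigvalsh(R)
print("eigenvalues:", lam)                        # 0.6 and 1.6 here
print("mu must satisfy 0 < mu <", 2 / lam.max())  # 2/1.6 = 1.25
mu = 0.3
print("|1 - mu λ_k|:", np.abs(1 - mu * lam))      # all < 1 => stable
```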

Transient Behavior of MSE (1)


The cost function may be rewritten in terms of the transformed coordinates as

$$J(n) = J_{\min} + \sum_k \lambda_k\,|v_k(0)|^2\,(1 - \mu\lambda_k)^{2n}$$

This is the learning curve. As expected, when the algorithm is stable, J(n) → J_min as n → ∞. It can be noticed that the learning curve is a sum of exponentials, each of which corresponds to a natural mode of the algorithm.

Transient Behavior of MSE (2)


The exponential decay of the k-th natural mode has time constant

$$\tau_k = \frac{-1}{2\ln(1 - \mu\lambda_k)}$$

If $\mu\lambda_k \ll 1$ we may approximate

$$\tau_k \approx \frac{1}{2\mu\lambda_k}$$

Observation: The overall speed of convergence depends upon the minimum eigenvalue (the slowest mode).
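A small sketch evaluating both the exact and approximate time constants; the eigenvalues and μ are the assumed example values used above.

```python
import numpy as np

# Time constant of each natural mode: exact and small-mu approximation.
lam = np.array([0.6, 1.6])   # assumed example eigenvalues
mu = 0.3
tau_exact = -1.0 / (2 * np.log(1 - mu * lam))
tau_approx = 1.0 / (2 * mu * lam)     # valid when mu*λ_k << 1
print(tau_exact, tau_approx)          # slowest mode <-> smallest eigenvalue
```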

Conclusions

- Stability of the algorithm depends on the maximum eigenvalue.
- Convergence rate depends on the minimum eigenvalue.
- A convergence-rate problem arises when the eigenvalue ratio λ_max/λ_min (the condition number) is high.

Example
We consider a second-order real-valued AR process that generates the input signal u(n):

$$u(n) + a_1 u(n-1) + a_2 u(n-2) = v(n)$$

where v(n) is white noise. Eigenvalues: for the 2 × 2 correlation matrix of such a process, $\lambda_{1,2} = r(0) \pm r(1)$.

Example
Linear predictor: estimate the value of u(n) by a linear combination of u(n-1), u(n-2), ..., u(n-M+1).

Example
We perform several experiments, varying the parameters summarized in the following table:

Example
The locus of constant J is an ellipse in the v1-v2 plane, centered at the origin. The pairs [v1(0), v2(0)], [v1(1), v2(1)], ..., [v1(n), v2(n)] define a trajectory in the v1-v2 plane, converging to the origin.

The tap-weight estimates w1(n) and w2(n) likewise trace an elliptical trajectory; in this case the center is at [w_o(1), w_o(2)].
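A sketch of one such experiment: a two-tap steepest-descent predictor on an AR(2) process, recording the trajectory in the transformed v1-v2 coordinates. The AR coefficients and μ below are placeholder choices, not necessarily the values from the original table.

```python
import numpy as np

# Two-tap steepest-descent predictor on an AR(2) process; record the
# trajectory v(n) = Q^T (w(n) - w_o). Parameters are placeholders.
rng = np.random.default_rng(0)
a1, a2, mu, N = -0.195, 0.95, 0.3, 100_000
v = rng.normal(size=N)
u = np.zeros(N)
for n in range(2, N):
    u[n] = -a1 * u[n - 1] - a2 * u[n - 2] + v[n]  # AR(2) recursion

# Sample estimates of R and p for the predictor (desired signal = u(n)).
U = np.stack([u[1:-1], u[:-2]])      # regressors [u(n-1), u(n-2)]
d = u[2:]
R = U @ U.T / d.size
p = U @ d / d.size
w_o = np.linalg.solve(R, p)          # ideally close to [-a1, -a2]

lam, Q = np.linalg.eigh(R)
w = np.zeros(2)
traj = []
for n in range(60):
    traj.append(Q.T @ (w - w_o))     # transformed coordinates v(n)
    w = w + mu * (p - R @ w)         # steepest-descent update
traj = np.array(traj)                # plot traj[:, 0] vs traj[:, 1]
```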

Experiment 1: Varying eigenvalue spread, fixed μ


Experiment 2: Varying μ


Observations (1)
- When λ1 = λ2, the trajectory of [v1(n), v2(n)] (or that of [w1(n), w2(n)]) is a straight line: this corresponds to the shortest path to the optimum.
- This also happens if v1(0) or v2(0) is zero (a right choice of the initial condition).
- In the other cases the trajectory follows a curved path; the larger the eigenvalue spread, the more curved the path and the longer the convergence takes.

Observations (2)
- If μ is too small, the transient behavior is overdamped.
- When μ approaches the maximum allowable value, the transient behavior is underdamped, i.e. the trajectory exhibits oscillations.
