
Adaptive Filters: Theory and Applications

B. Farhang-Boroujeny

National University of Singapore

John Wiley & Sons

Chichester · New York · Weinheim · Brisbane · Singapore · Toronto

National 01243 779777
International (+44) 1243 779777

Copyright © 1998 John Wiley & Sons Ltd, Baffins Lane, Chichester, West Sussex PO19 1UD, England

Visit our Home Page on http://www.wiley.co.uk or http://www.wiley.com

To my family for their support, understanding and love

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency, 90 Tottenham Court Road, London W1P 9HE, UK, without the permission in writing of the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for the exclusive use by the purchaser of the publication.

Other Wiley Editorial Offices

John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, USA

Wiley-VCH Verlag GmbH, Pappelallee 3, D-69469 Weinheim, Germany

Jacaranda Wiley Ltd, 33 Park Road, Milton, Queensland 4064, Australia

John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

John Wiley & Sons (Canada) Ltd, 22 Worcester Road, Rexdale, Ontario M9W 1L1, Canada

Library of Congress Cataloging-in-Publication Data

Farhang-Boroujeny, B.
Adaptive filters: theory and applications / B. Farhang-Boroujeny.
p. cm.
Includes bibliographical references and index.
ISBN 0-471-98337-3
1. Adaptive filters. I. Title.
TK7872.F5F37 1999
621.3815'324 dc21
CIP

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN 0-471-98337-3

Typeset in part from the author's disks in 10/12pt Times by the Alden Group, Oxford. Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham.

This book is printed on acid-free paper responsibly manufactured from sustainable forestry, in which at least two trees are planted for each one used for paper production.

Contents

Preface xiii

Acknowledgements xvii

1 Introduction 1
1.1 Linear Filters 1
1.2 Adaptive Filters 2
1.3 Adaptive Filter Structures 3
1.4 Adaptation Approaches 6
  1.4.1 Approach based on Wiener filter theory 7
  1.4.2 Method of least squares 7
1.5 Real and Complex Forms of Adaptive Filters 9
1.6 Applications 9
  1.6.1 Modelling 10
  1.6.2 Inverse modelling 11
  1.6.3 Linear prediction 15
  1.6.4 Interference cancellation 21

2 Discrete-Time Signals and Systems 29
2.1 Sequences and the z-Transform 29
2.2 Parseval's Relation 34
2.3 System Function 34
2.4 Stochastic Processes 36
  2.4.1 Stochastic averages 37
  2.4.2 z-transform representations 39
  2.4.3 The power spectral density 40
  2.4.4 Response of linear systems to stochastic processes 42
  2.4.5 Ergodicity and time averages 46
Problems 48

3 Wiener Filters 49
3.1 Mean-Square Error Criterion 49
3.2 Wiener Filter - the Transversal, Real-Valued Case 51
3.3 Principle of Orthogonality 56
3.4 Normalized Performance Function 58
3.5 Extension to the Complex-Valued Case 59
3.6 Unconstrained Wiener Filters 62
  3.6.1 Performance function 63
  3.6.2 Optimum transfer function 65
  3.6.3 Modelling 68
  3.6.4 Inverse modelling 71
  3.6.5 Noise cancellation 75
3.7 Summary and Discussion 81
Problems 82

4 Eigenanalysis and the Performance Surface 89
4.1 Eigenvalues and Eigenvectors 89
4.2 Properties of Eigenvalues and Eigenvectors 90
4.3 The Performance Surface 104
Problems 118

5 Search Methods 119
5.1 Method of Steepest Descent 120
5.2 Learning Curve 127
5.3 The Effect of Eigenvalue Spread 130
5.4 Newton's Method 132
5.5 An Alternative Interpretation of Newton's Algorithm 134
Problems 135

6 The LMS Algorithm 139
6.1 Derivation of the LMS Algorithm 139
6.2 Average Tap-Weight Behaviour of the LMS Algorithm 141
6.3 MSE Behaviour of the LMS Algorithm 144
  6.3.1 Learning curve 146
  6.3.2 The weight-error correlation matrix 149
  6.3.3 Excess MSE and misadjustment 152
  6.3.4 Stability 154
  6.3.5 The effect of initial values of tap weights on the transient behaviour of the LMS algorithm 156
6.4 Computer Simulations 157
  6.4.1 System modelling 157
  6.4.2 Channel equalization 159
  6.4.3 Adaptive line enhancement 164
  6.4.4 Beamforming 165
6.5 Simplified LMS Algorithms 169
6.6 Normalized LMS Algorithm 172
6.7 Variable Step-Size LMS Algorithm 175
6.8 LMS Algorithm for Complex-Valued Signals 178
6.9 Beamforming (Revisited) 180
6.10 Linearly Constrained LMS Algorithm 184
  6.10.1 Statement of the problem and its optimal solution 184
  6.10.2 Update equations 185
  6.10.3 Extension to the complex-valued case 186
Problems 188
Appendix 6A: Derivation of (6.39) 199

7 Transform Domain Adaptive Filters 201
7.1 Overview of Transform Domain Adaptive Filters 202
7.2 The Band-Partitioning Property of Orthogonal Transforms 204
7.3 The Orthogonalization Property of Orthogonal Transforms 205
7.4 The Transform Domain LMS Algorithm 208
7.5 The Ideal LMS-Newton Algorithm and Its Relationship with TDLMS 210
7.6 Selection of the Transform T 210
  7.6.1 A geometrical interpretation 211
  7.6.2 A useful performance index 215
  7.6.3 Improvement factor and comparisons 216
  7.6.4 Filtering view 219
7.7 Transforms 224
7.8 Sliding Transforms 225
  7.8.1 Frequency sampling filters 226
  7.8.2 Recursive realization of sliding transforms 227
  7.8.3 Non-recursive realization of sliding transforms 230
  7.8.4 Comparison of recursive and non-recursive sliding transforms 235
7.9 Summary and Discussion 237
Problems 238

8 Block Implementation of Adaptive Filters 247
8.1 Block LMS Algorithm 248
8.2 Mathematical Background 251
  8.2.1 Linear convolution using the discrete Fourier transform 252
  8.2.2 Circular matrices 254
  8.2.3 Window matrices and matrix formulation of the overlap-save method 256
8.3 The FBLMS Algorithm 257
  8.3.1 Constrained and unconstrained FBLMS algorithms 259
  8.3.2 Convergence behaviour of the FBLMS algorithm 259
  8.3.3 Step-normalization 261
  8.3.4 Summary of the FBLMS algorithm 262
  8.3.5 FBLMS misadjustment equations 264
  8.3.6 Selection of the block length 265
8.4 The Partitioned FBLMS Algorithm 265
  8.4.1 Analysis of the PFBLMS algorithm 268
  8.4.2 The PFBLMS algorithm with M > L 270
  8.4.3 PFBLMS misadjustment equations 273
  8.4.4 Computational complexity and memory requirement 273
  8.4.5 Modified constrained PFBLMS algorithm 275
8.5 Computer Simulations 275
Problems 278
Appendix 8A: Derivation of a Misadjustment Equation for the BLMS Algorithm 283
Appendix 8B: Derivation of Misadjustment Equations for the FBLMS Algorithm 285

9 Subband Adaptive Filters 293
9.1 DFT Filter Banks 294
  9.1.1 The weighted overlap-add method for the realization of DFT analysis filter banks 295
  9.1.2 The weighted overlap-add method for the realization of DFT synthesis filter banks 296
9.2 Complementary Filter Banks 298
9.3 Subband Adaptive Filter Structures 302
9.4 Selection of Analysis and Synthesis Filters 303
9.5 Computational Complexity 306
9.6 Decimation Factor and Aliasing 307
9.7 Low-Delay Analysis and Synthesis Filter Banks 309
  9.7.1 Design method 309
  9.7.2 Properties of the filters 311
9.8 A Design Procedure for Subband Adaptive Filters 314
9.9 An Example 315
9.10 Application to Acoustic Echo Cancellation 317
9.11 Comparison with the FBLMS Algorithm 319
Problems 320

10 IIR Adaptive Filters 323
10.1 The Output Error Method 324
10.2 The Equation Error Method 330
10.3 Case Study I: IIR Adaptive Line Enhancement 334
  10.3.1 IIR ALE filter, W(z) 334
  10.3.2 Performance functions 335
  10.3.3 Simultaneous adaptation of s and w 337
  10.3.4 Robust adaptation of w 339
  10.3.5 Simulation results 340
10.4 Case Study II: Equalizer Design for Magnetic Recording Channels 344
  10.4.1 Channel discretization 345
  10.4.2 Design steps 346
  10.4.3 FIR equalizer design 346
  10.4.4 Conversion from the FIR to the IIR equalizer 348
  10.4.5 Conversion from the z-domain to the s-domain 349
  10.4.6 Numerical results 350
10.5 Concluding Remarks 352
Problems 353

11 Lattice Filters 357
11.1 Forward Linear Prediction 357
11.2 Backward Linear Prediction 359
11.3 The Relationship Between Forward and Backward Predictors 361
11.4 Prediction-Error Filters 361
11.5 The Properties of Prediction Errors 362
11.6 Derivation of the Lattice Structure 364
11.7 The Lattice as an Orthogonalization Transform 370
11.8 The Lattice Joint Process Estimator 371
11.9 System Functions 372
11.10 Conversions 373
  11.10.1 Conversion between the lattice and transversal predictors 373
  11.10.2 The Levinson-Durbin algorithm 375
  11.10.3 Extension of the Levinson-Durbin algorithm 377
11.11 All-Pole Lattice Structure 379
11.12 Pole-Zero Lattice Structure 380
11.13 Adaptive Lattice Filter 381
  11.13.1 Discussion and simulations 383
11.14 Autoregressive Modelling of Random Processes 386
11.15 Adaptive Algorithms Based on Autoregressive Modelling 388
  11.15.1 Algorithms 389
  11.15.2 Performance analysis 394
  11.15.3 Simulation results and discussion 398
Problems 403
Appendix 11A: Evaluation of E[u(n) x^T(n) K(n) x(n) u(n)] 409
Appendix 11B: Evaluation of the Parameter η 410

12 Method of Least Squares 413
12.1 Formulation of the Least-Squares Estimation for a Linear Combiner 414
12.2 The Principle of Orthogonality 416
12.3 Projection Operator 418
12.4 The Standard Recursive Least-Squares Algorithm 419
  12.4.1 RLS recursions 419
  12.4.2 Initialization of the RLS algorithm 422
  12.4.3 Summary of the standard RLS algorithm 423
12.5 The Convergence Behaviour of the RLS Algorithm 425
  12.5.1 Average tap-weight behaviour of the RLS algorithm 425
  12.5.2 Weight-error correlation matrix 426
  12.5.3 The learning curve 427
  12.5.4 Excess MSE and misadjustment 430
  12.5.5 Initial transient behaviour of the RLS algorithm 431
Problems 434

13 Fast RLS Algorithms 439
13.1 Least-Squares Forward Prediction 440
13.2 Least-Squares Backward Prediction 442
13.3 The Least-Squares Lattice 443
13.4 The RLSL Algorithm 446
  13.4.1 Notations and preliminaries 446
  13.4.2 Update recursion for the least-squares error sums 449
  13.4.3 Conversion factor 450
  13.4.4 Update equation for the conversion factor 452
  13.4.5 Update equation for cross-correlations 453
  13.4.6 The RLSL algorithm using a posteriori errors 456
  13.4.7 The RLSL algorithm with error feedback 458
13.5 The FTRLS Algorithm 460
  13.5.1 Derivation of the FTRLS algorithm 461
  13.5.2 Summary of the FTRLS algorithm 465
  13.5.3 The stabilized FTRLS algorithm 466
Problems 466

14 Tracking 471
14.1 Formulation of the Tracking Problem 471
14.2 Generalized Formulation of the LMS Algorithm 472
14.3 MSE Analysis of the Generalized LMS Algorithm 473
14.4 Optimum Step-Size Parameters 477
14.5 Comparisons of Conventional Algorithms 479
14.6 Comparisons Based on the Optimum Step-Size Parameters 483
14.7 VSLMS: An Algorithm with Optimum Tracking Behaviour 485
  14.7.1 Derivation of the VSLMS algorithm 486
  14.7.2 Variations and extensions 487
  14.7.3 Normalization of the parameter ρ 489
  14.7.4 Computer simulations 489
14.8 The RLS Algorithm with a Variable Forgetting Factor 494
14.9 Summary 496
Problems 497

Appendix A: List of MATLAB Programs 501

References 503

Index 517

Preface

This book has grown out of the author's research work and teaching experience in the field of adaptive signal processing. It is primarily designed as a text for a first-year graduate level course in adaptive filters. It is also intended to serve as a technical reference for practising engineers.

The book is based on the author's class notes used for teaching a graduate level course at the Department of Electrical Engineering, National University of Singapore. These notes have also been used to conduct short courses for practising engineers from industry.

A typical one-semester course would cover Chapters 1, 3-6 and 12, and the first half of Chapter 11, in depth. Chapter 2, which contains a short review of the basic concepts of discrete-time signals and systems, may be left as self-study material for students. Selected parts of the rest of the book may also be taught in the same semester, or may be used with supplemental readings for a second semester course on advanced topics and applications.

In the study of adaptive filters, computer simulations constitute an important supplemental component to theoretical analyses and deductions. Often, theoretical developments and analyses involve a number of approximations and/or assumptions. Hence, computer simulations become necessary to confirm the theoretical results. Apart from this, computer simulation turns out to be a necessity in the study of adaptive filters for gaining an in-depth understanding of the behaviour and properties of the various adaptive algorithms. MATLAB, from MathWorks Inc., appears to be the most commonly used software simulation package. Throughout the book we use MATLAB to present a number of simulation results to clarify and/or confirm the theoretical developments. A diskette containing the programs used for generating these results is supplied along with the book so that the reader can run these programs and acquire a more in-depth insight into the concepts of adaptive filtering.

Another integral part of this text is the exercise problems at the end of chapters. With the exception of the first few chapters, two kinds of exercise problems are provided in each chapter:

1. The usual problem exercises. These problems are designed to sharpen the reader's skill in theoretical development. They are designed to extend results developed in the text, to develop some results that are referred to in the text, and to illustrate applications to practical problems. Solutions to these problems are available to instructors through the publisher (ISBN 0-471-98758-5).


2. Simulation-oriented problems. These involve computer simulations and are designed to enhance the reader's understanding of the behaviour of the different adaptive algorithms that are introduced in the text. Most of these problems are based on the MATLAB programs that are provided on the diskette accompanying the book. In addition, there are also other (open-ended) simulation-oriented problems designed to help the reader develop his/her own programs and prepare him/her to experiment with practical problems.

This book assumes that the reader has some basic background of discrete-time signals and systems (including an introduction to linear system theory and random signal analysis), complex variable theory and matrix algebra. However, brief reviews of these topics are provided in Chapters 2 and 4.

The book starts with a general overview of adaptive filters in Chapter 1. Many examples of applications such as system modelling, channel equalization, echo cancellation and antenna arrays are reviewed in this chapter. This is followed by a brief review of discrete-time signals and systems, in Chapter 2, which puts the related concepts in a framework appropriate for the rest of the book.

In Chapter 3 we introduce a class of optimum linear systems collectively known as Wiener filters. Wiener filters are fundamental to the implementation of adaptive filters. We note that the cost function used to formulate the Wiener filters is an elegant choice leading to a mathematically tractable problem. We also discuss the unconstrained Wiener filters with respect to causality and duration of the filter impulse response. This study reveals many interesting aspects of Wiener filters and establishes a good foundation for the study of adaptive filters for the rest of the book. In particular, we find that, in the limit, when the filter length tends to infinity, a Wiener filter treats different frequency components of the underlying processes separately. Numerical examples reveal that when the filter length is limited, separation of frequency components may be replaced by separation of frequency bands, within a good approximation. This treatment of adaptive filters that is pursued throughout the book turns out to be an enlightening engineering approach to the study of adaptive filters.

Eigenanalysis is an essential mathematical tool for the study of adaptive filters. A thorough treatment of this topic is covered in the first half of Chapter 4. The second half of this chapter gives an analysis of the performance surface of transversal Wiener filters. This is followed by search methods, which are introduced in Chapter 5. The search methods discussed in this chapter are idealized versions of the statistical search methods that are used in practice for the actual implementation of adaptive filters. They are idealized in the sense that the statistics of the underlying processes are assumed to be known a priori.

The celebrated least-mean-square (LMS) algorithm is introduced in Chapter 6 and studied extensively in Chapters 7-11. The LMS algorithm, which was first proposed by Widrow and Hoff in 1960, is the most widely used adaptive filtering algorithm in practice, owing to its simplicity and robustness to signal statistics.

Chapters 12 and 13 are devoted to the method of least squares. This discussion, although brief, gives the basic concept of the method of least squares and highlights its advantages and disadvantages compared with the LMS-based algorithms. In Chapter 13 the reader is introduced to the fast versions of least-squares algorithms. Overall, these two chapters lay a good foundation for the reader to continue his/her study of this subject with reference to more advanced books and/or papers.


The problem of tracking is discussed in the final chapter of the book. In the context of a system modelling problem, we present a generalized formulation of the LMS algorithm which covers most of the algorithms that are discussed in the various chapters of the book, thus bringing a common platform for the comparison of the different algorithms. We also discuss how the step-size parameter(s) of the LMS algorithm and the forgetting factor of the RLS algorithm may be optimized to achieve good tracking behaviour.

The following notations are adopted in this book. We use non-bold lowercase letters for scalar quantities, bold lowercase for vectors, and bold uppercase for matrices. Non-bold uppercase letters are used for functions of variables, such as X(z), and lengths/dimensions of vectors/matrices. The lowercase letter n is used for the time index. In the case of block processing algorithms, such as those discussed in Chapters 8 and 9, we reserve the lowercase letter k as the block index. The time and block indices are put in brackets, while subscripts are used to refer to elements of vectors and matrices. For example, the ith element of the time-varying tap-weight vector w(n) is denoted as w_i(n). The superscripts T and H denote vector or matrix transposition and Hermitian transposition, respectively. We keep all vectors in column form. More specific notations are explained in the text as and when found necessary.

B. Farhang-Boroujeny


Acknowledgements

I am deeply indebted to Dr George Mathew of Data Storage Institute, National University of Singapore, for critically reviewing the entire manuscript of this book. Dr Mathew checked through every single line of the manuscript and made numerous invaluable suggestions and improved the book in many ways. I am truly grateful to him for his invaluable help.

I am also grateful to Professor V. U. Reddy, Indian Institute of Science, Bangalore, India, for reviewing Chapters 2-7, and Dr M. Chakraborty, Indian Institute of Technology, Kharagpur, India, for reviewing Chapters 5, 4 and 11 and making many valuable suggestions.

I am indebted to my graduate students, both in Iran and Singapore, for helping me in the development of many results that are presented in this book. In particular, I am grateful to Dr S. Gazor and Mr Y. Lee for helping me to develop some of the results on transform domain adaptive filters that are presented in Chapter 7. I am also grateful to Mr Z. Wang for his enthusiasm towards the development of subband adaptive filters in the form presented in Chapter 9.

1 wish to thank my students R B. Chionh, K. K.... Ng (Adrian) and T. P. Ng Ior their great help and patience in checking the accaracy of all che references in the bibliography.

I also wish to thank my colleagues in the Department of Electrical Engineering, National University of Singapore, for their support and encouragement in the course of the development of this book.

1 Introduction

As we begin our study of 'adaptive filters', it may be worth trying to understand the meaning of the terms 'adaptive' and 'filters' in a very general sense. The adjective 'adaptive' can be understood by considering a system which is trying to adjust itself so as to respond to some phenomenon that is taking place in its surroundings. In other words, the system tries to adjust its parameters with the aim of meeting some well-defined goal or target which depends upon the state of the system as well as its surroundings. This is what 'adaptation' means. Moreover, there is a need to have a set of steps or a certain procedure by which this process of 'adaptation' is carried out. And finally, the 'system' that carries out and undergoes the process of 'adaptation' is called by the more technical, yet general enough, name 'filter' - a term that is very familiar to and a favourite of any engineer. Clearly, depending upon the time required to meet the final target of the adaptation process, which we call the convergence time, and the complexity/resources that are available to carry out the adaptation, we can have a variety of adaptation algorithms and filter structures. From this point of view, we may summarize the contents/contribution of this book as 'the study of some selected adaptive algorithms and their implementations, along with the associated filter structures, from the points of view of their convergence and complexity performance'.

1.1 Linear Filters

The term 'filter' is commonly used to refer to any device or system that takes a mixture of particles/elements from its input and processes them according to some specific rules to generate a corresponding set of particles/elements at its output. In the context of signals and systems, particles/elements are the frequency components of the underlying signals and, traditionally, filters are used to retain all the frequency components that belong to a particular band of frequencies, while rejecting the rest of them, as much as possible. In a more general sense, the term filter may be used to refer to a system that reshapes the frequency components of the input to generate an output signal with some desirable features, and this is how we view the concept of filtering throughout the chapters which follow.

Filters (or systems, in general) may be either linear or non-linear. In this book, we consider only linear filters, and our emphasis will also be on discrete-time signals and systems. Thus, all the signals will be represented by sequences, such as x(n). The most basic feature of linear systems is that their behaviour is governed by the principle of superposition. This means that if the responses of a linear discrete-time system to input sequences x1(n) and x2(n) are y1(n) and y2(n), respectively, then the response of the same system to the input sequence x(n) = a·x1(n) + b·x2(n), where a and b are arbitrary constants, will be y(n) = a·y1(n) + b·y2(n). This property leads to many interesting results in 'linear system theory'. In particular, a linear system is completely characterized by its impulse response or the Fourier transform of its impulse response, known as the transfer function. The transfer function of a system at any frequency is equal to its gain at that frequency. In other words, in the context of our discussion above, we may say that the transfer function of a system determines how the various frequency components of its input are reshaped by the system.

Figure 1.1 depicts a general schematic diagram of a filter emphasizing the purpose for which it is used in the different problems addressed/discussed in this book. In particular, the filter is used to reshape certain input signals in such a way that its output is a good estimate of the given desired signal. The process of selecting the filter parameters (coefficients) so as to achieve the best match between the desired signal and the filter output is often done by optimizing an appropriately defined performance function. The performance function can be defined in a statistical or deterministic framework. In the statistical approach, the most commonly used performance function is the mean-square value of the error signal, i.e. the difference between the desired signal and the filter output. For stationary input and desired signals, minimizing the mean-square error results in the well-known Wiener filter, which is said to be optimum in the mean-square sense. The subject of Wiener filters will be covered extensively in Chapter 3. Most of the adaptive algorithms that are studied in this book are practical solutions to Wiener filters. In the deterministic approach, the usual choice of performance function is a weighted sum of the squared error signal. Minimizing this function results in a filter which is optimum for the given set of data. However, under some assumptions on certain statistical properties of the data, the deterministic solution will approach the statistical solution, i.e. the Wiener filter, for large data lengths. Chapters 12 and 13 deal with the deterministic approach in detail. We refer the reader to Section 1.4 of this chapter for a brief overview of the adaptive formulations under the stochastic (i.e. statistical) and deterministic frameworks.

Figure 1.1 Schematic diagram of a filter emphasizing its role in reshaping the input signal to match the desired signal

1.2 Adaptive Filters

As mentioned in the previous section, the filter required for estimating the given desired signal can be designed using either the stochastic or deterministic formulations. In the deterministic formulation, the filter design requires the computation of certain average quantities using the given set of data that the filter should process. On the other hand, the design of the Wiener filter (i.e. in the stochastic approach) requires a priori knowledge of the statistics of the underlying signals. Strictly speaking, a large number of realizations of the underlying signal sequences are required for reliably estimating these statistics. This procedure is not feasible in practice, since we usually have only one realization for each of the signal sequences. To resolve this problem, it is assumed that the underlying signal sequences are ergodic, which means that they are stationary and their statistical and time averages are identical. Thus, by using time averages, Wiener filters can be designed even though there is only one realization for each of the signal sequences.

Although direct measurement of the signal averages to obtain the necessary information for the design of Wiener or other optimum filters is possible, in most of the applications the signal averages (statistics) are used in an indirect manner. All the algorithms covered in this book take the output error of the filter, correlate that with the samples of the filter input in some way, and use the result in a recursive equation to adjust the filter coefficients iteratively. The reasons for solving the problem of adaptive filtering in an iterative manner are:

1. Direct computation of the necessary averages and their application for computing the filter coefficients requires the accumulation of a large amount of signal samples. Iterative solutions, on the other hand, do not require the accumulation of signal samples, thereby resulting in a significant amount of saving in memory.

2. The accumulation of signal samples and their post-processing to generate the filter output, as required in non-iterative solutions, introduces a large delay in the filter output. This is unacceptable in many applications. Iterative solutions, on the contrary, do not introduce any significant delay in the filter output.

3. The use of iterations results in adaptive solutions with some tracking capability. That is, if the signal statistics are changing with time, then the solution provided by an iterative adjustment of the filter coefficients will be able to adapt to the new statistics.

4. Iterative solutions, in general, are much simpler to code in software or to implement in hardware than their non-iterative counterparts.

1.3 Adaptive Filter Structures

The most commonly used structure in the implementation of adaptive filters is the transversal structure, depicted in Figure 1.2. Here, the adaptive filter has a single input, x(n), and an output, y(n). The sequence d(n) is the desired signal. The output, y(n), is generated as a linear combination of the delayed samples of the input sequence, x(n), according to the equation

$$y(n) = \sum_{i=0}^{N-1} w_i(n)\, x(n-i), \qquad (1.1)$$


where the w_i(n)s are the filter tap weights (coefficients) and N is the filter length. We refer to the input samples, x(n - i), for i = 0, 1, ..., N - 1, as the filter tap inputs. The tap weights, the w_i(n)s, which may vary in time, are controlled by the adaptation algorithm.
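As a concrete illustration of (1.1), the following MATLAB fragment (our own sketch, not one of the programs supplied on the book's diskette; the tap weights and input below are arbitrary placeholders) computes the output of a fixed-weight transversal filter by direct evaluation of the sum:

    % Output of an N-tap transversal filter, y(n) = sum_i w_i x(n-i).
    N = 4;
    w = [0.5; -0.2; 0.1; 0.05];   % example (fixed) tap weights
    x = randn(100, 1);            % example input sequence
    y = zeros(size(x));
    for n = N:length(x)
        u = x(n:-1:n-N+1);        % tap-input vector [x(n), ..., x(n-N+1)]'
        y(n) = w' * u;            % inner product of tap weights and tap inputs
    end

In an adaptive filter the same loop is retained, but w is updated at every iteration by the adaptation algorithm.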


In some applications, such as beamforming (see Section 1.6.4), the filter tap inputs are not the delayed samples of a single input. In such cases the structure of the adaptive filter assumes the form shown in Figure 1.3. This is called a linear combiner, since its output is a linear combination of the different signals received at its tap inputs:

$$y(n) = \sum_{i=0}^{N-1} w_i(n)\, x_i(n). \qquad (1.2)$$

Figure 1.2 Adaptive transversal filter

Figure 1.3 Adaptive linear combiner

Figure 1.4 The structure of an IIR filter

Note that the linear combiner structure is more general than the transversal. The latter, as a special case of the former, can be obtained by choosing x_i(n) = x(n - i).

The structures of Figures 1.2 and 1.3 are those of the non-recursive filters, i.e. computation of the filter output does not involve any feedback mechanism. We also refer to Figure 1.2 as a finite-impulse response (FIR) filter, since its impulse response is of finite duration in time. An infinite-impulse response (IIR) filter is governed by recursive equations such as (see Figure 1.4)

$$y(n) = \sum_{i=0}^{N-1} a_i(n)\, x(n-i) + \sum_{i=1}^{M-1} b_i(n)\, y(n-i), \qquad (1.3)$$

where ai(") and bi(n) are the forward and feedback tap weights, respectively. IIR filters have been used in many applications. However, as we shall see in the later chapters, because of the many difficul ties hrvolvedIn the adaptation of IIR filters, their application in the area of adaptive filters is rather limited. In particular, they can easily become unstablesince their poles rnay gel shifted out of the unit circle {i.e. Izi = 1, in the s-plane (see next chapter)) by the adaptation _PTOCesS. Moreover, the performance function (e.g. mean-square error as a function of filter coefficients) of an IIR filter usually has many local minima points. This may result in convergence of the Iilter to one of the local minima and not to the desiredglobal minimum point of (he peJ"fonnance function, On the contrary, the mean-square error functions of the FIR filter and linear combiner are well behaved quadratic functions with a single minimura point which can easily be found


where a_i(n) and b_i(n) are the forward and feedback tap weights, respectively. IIR filters have been used in many applications. However, as we shall see in the later chapters, because of the many difficulties involved in the adaptation of IIR filters, their application in the area of adaptive filters is rather limited. In particular, they can easily become unstable, since their poles may get shifted out of the unit circle (i.e. |z| = 1, in the z-plane (see next chapter)) by the adaptation process. Moreover, the performance function (e.g. mean-square error as a function of filter coefficients) of an IIR filter usually has many local minima points. This may result in convergence of the filter to one of the local minima and not to the desired global minimum point of the performance function. On the contrary, the mean-square error functions of the FIR filter and linear combiner are well-behaved quadratic functions with a single minimum point, which can easily be found through various adaptive algorithms. Because of these points, the non-recursive filters are the sole candidates in most of the applications of adaptive filters. Hence, most of the discussion in the subsequent chapters is limited to the non-recursive filters. The IIR adaptive filters, with two specific examples of their applications, are discussed in Chapter 10.
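As a sketch of the recursion in (1.3), the following MATLAB fragment (our own illustration; the coefficients are arbitrary and fixed, whereas an adaptive IIR filter would update them) computes the output of a direct-form IIR filter:

    % Direct-form IIR filter: y(n) = sum_i a_i x(n-i) + sum_i b_i y(n-i).
    a = [1.0, 0.5, 0.25];          % forward tap weights a_0 ... a_{N-1}
    b = [0.3, -0.1];               % feedback tap weights b_1 ... b_{M-1}
    x = randn(200, 1);
    y = zeros(size(x));
    for n = 1:length(x)
        for i = 0:length(a)-1      % non-recursive (forward) part
            if n-i >= 1, y(n) = y(n) + a(i+1) * x(n-i); end
        end
        for i = 1:length(b)        % recursive (feedback) part
            if n-i >= 1, y(n) = y(n) + b(i) * y(n-i); end
        end
    end
    % Equivalently, using MATLAB's built-in routine: y = filter(a, [1, -b], x);

The feedback taps are what make the impulse response infinite in duration, and also what allow the poles to move outside the unit circle during adaptation.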

The FIR and IIR structures shown in Figures 1.2 and 1.4 are obtained by direct realization of the respective difference equations (1.1) and (1.3). These filters may alternatively be implemented using the lattice structures. The lattice structures, in general, are more complicated than the direct implementations. However, in certain applications they have some advantages which make them better candidates than the direct forms. For instance, in the application of linear prediction for speech processing, where we need to realize all-pole (IIR) filters, the lattice structure can be more easily controlled to prevent possible instability of the filter. The derivation of lattice structures for both FIR and IIR filters is presented in Chapter 11. Also, in the implementation of the least-squares method (see Section 1.4.2), the use of lattice structures leads to a computationally efficient algorithm known as the recursive least-squares lattice. A derivation of this algorithm is presented in Chapter 13.

The FIR and IIR filters that were discussed above are classified as linear filters, since their outputs are obtained as linear combinations of the present and past samples of the input and, in the case of the IIR filter, the past samples of the output also. Although most applications are restricted to the use of linear filters, non-linear adaptive filters become necessary in some applications where the underlying physical phenomena to be modelled are far from being linear. A typical example is magnetic recording, where the recording channel becomes non-linear at high densities as a result of the interaction between the magnetization transitions written on the medium. The Volterra series representation of systems is usually used in such applications. The output, y(n), of a Volterra system is related to its input, x(n), according to the equation

$$y(n) = w_{0,0}(n) + \sum_i w_{1,i}(n)\, x(n-i) + \sum_{i,j} w_{2,i,j}(n)\, x(n-i)\, x(n-j) + \sum_{i,j,k} w_{3,i,j,k}(n)\, x(n-i)\, x(n-j)\, x(n-k) + \cdots, \qquad (1.4)$$

where w_{0,0}(n), the w_{1,i}(n)s, the w_{2,i,j}(n)s, the w_{3,i,j,k}(n)s, ... are filter coefficients. In this book, we do not discuss the Volterra filters any further. However, we note that all the summations in (1.4) may be put together and the Volterra filter may be thought of as a linear combiner whose inputs are determined by the delayed samples of x(n) and their cross-multiplications. Noting this, we find that the extension of most of the adaptive filtering algorithms to the Volterra filters is straightforward.
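To illustrate the last point, the following MATLAB sketch (our own construction, truncating (1.4) after the second-order term, with arbitrary kernels) implements a Volterra filter as a linear combiner whose inputs are the delayed input samples and their pairwise products:

    % Second-order (truncated) Volterra filter viewed as a linear combiner.
    N = 3;                          % memory length
    x = randn(50, 1);
    y = zeros(size(x));
    w0 = 0.1;                       % w_{0,0}: constant term
    w1 = randn(N, 1);               % w_{1,i}: linear kernel
    w2 = randn(N, N);               % w_{2,i,j}: quadratic kernel
    for n = N:length(x)
        u = x(n:-1:n-N+1);          % delayed samples x(n-i), i = 0, ..., N-1
        y(n) = w0 + w1' * u + u' * w2 * u;   % includes all products x(n-i)x(n-j)
    end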

1.4 Adaptation Approaches

As introduced in Sections 1.1 and 1.2, there are two distinct approaches that have been widely used in the development of various adaptive algorithms: namely, stochastic and deterministic. Both approaches have many variations in their implementations, leading to a rich variety of algorithms, each of which offers desirable features of its own. In this section we present a review of these two approaches and highlight the main features of the related algorithms.

1.4.1 Approach based on Wiener filter theory

According to the Wiener filter theory, which comes from the stochastic framework, the optimum coefficients of a linear filter are obtained by minimization of its mean-square error (MSE). As already noted, strictly speaking, the minimization of MSE requires certain statistics obtained through ensemble averaging, which may not be possible in practical applications. The problem is resolved using ergodicity so as to use time averages instead of ensemble averages. Furthermore, to come up with simple recursive algorithms, very rough estimates of the required statistics are used. In fact, the celebrated least-mean-square (LMS) algorithm, which is the most basic and widely used algorithm in various adaptive filtering applications, uses the instantaneous value of the square of the error signal as an estimate of the MSE. It turns out that this very rough estimate of the MSE, when used with a small step-size parameter in searching for the optimum coefficients of the Wiener filter, leads to a very simple and yet reliable adaptive algorithm.
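The resulting recursion is remarkably compact. The following MATLAB sketch (our own minimal illustration, not the book's derivation, which appears in Chapter 6; the step-size mu, the filter length and the signals are placeholders) adapts the tap weights of a transversal filter using the instantaneous squared error:

    % LMS adaptation of an N-tap transversal filter (minimal sketch).
    N = 8; mu = 0.01;                     % filter length and step-size
    x = randn(1000, 1);                   % input signal
    d = filter([0.8, 0.4, -0.2], 1, x);   % hypothetical desired signal
    w = zeros(N, 1);                      % initial tap weights
    for n = N:length(x)
        u = x(n:-1:n-N+1);                % tap-input vector
        e = d(n) - w' * u;                % output error e(n)
        w = w + 2 * mu * e * u;           % update along the instantaneous gradient
    end

(Some texts absorb the factor of 2 into the step-size parameter; the exact constant is fixed by the derivation in Chapter 6.)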

The main disadvantage of the LMS algorithm is that its convergence behaviour is highly dependent on the power spectral density of the filter input. When the filter input is white, i.e. its power spectrum is flat across the whole range of frequencies, the LMS algorithm converges very fast. However, when certain frequency bands are not well excited (i.e. the signal energy in those bands is relatively low), some slow modes of convergence appear, resulting in very slow convergence compared with the case of white input. In other words, to converge fast, the LMS algorithm requires equal excitation over the whole range of frequencies. Noting this, over the years researchers have developed many algorithms which effectively divide the frequency band of the input signal into a number of subbands and achieve some degree of signal whitening by using some power normalization mechanism, prior to applying the adaptive algorithm. These algorithms, which appear in different forms, are presented in Chapters 7, 9 and 11.

In some applications, we need to use adaptive filters whose length exceeds a few hundreds or even a few thousands of taps. Clearly, such filters are computationally expensive to implement. An effective way of implementing such filters at a much lower computational complexity is to use the fast Fourier transform (FFT) algorithm to implement time domain convolutions in the frequency domain, as is commonly done in the implementation of long digital filters (Oppenheim and Schafer, 1975, 1989). Adaptive algorithms that use the FFT for reducing computational complexity are presented in Chapter 8.

1.4.2 Method of least squares

The adaptive filtering algorithms whose derivations are based on the Wiener filter theory have their origin in a statistical formulation of the problem. In contrast to this, the method of least squares approaches the problem of filter optimization from a deterministic point of view. As already mentioned, in the Wiener filter theory the desired filter is obtained by minimizing the mean-square error (MSE), i.e. a statistical quantity. In the method of least squares, on the other hand, the performance index is the sum of weighted error squares for the given data, i.e. a deterministic quantity. A consequence of this deterministic approach (which will become clear as we go through its derivation in Chapter 12) is that the least-squares-based algorithms, in general, converge much faster than the LMS-based algorithms. They are also insensitive to the power spectral density of the input signal. The price that is paid for achieving this improved convergence performance is higher computational complexity and poorer numerical stability.

Direct formulation of the least-squares problem results in a matrix formulation of its solution, which can be applied on a block-by-block basis to the incoming signals. This, which is referred to as the block estimation of the least-squares method, has some useful applications in areas such as linear predictive coding of speech signals. However, in the context of adaptive filters, recursive formulations of the least-squares method that update the filter coefficients after the arrival of every sample of input are preferred, for reasons that were given in Section 1.2. There are three major classes of recursive least-squares (RLS) adaptive filtering algorithms:

• The standard RLS algorithm

• The QR-decomposition-based RLS (QRD-RLS) algorithm

• Fast RLS algorithms

The standard RLS algorithm

The derivation of this algorithm involves the use of a well-known result from linear algebra known as the matrix inversion lemma. Consequently, the implementation of the standard RLS algorithm involves matrix manipulations that result in a computational complexity proportional to the square of the filter length.
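A minimal sketch of the resulting recursions is given below (our own illustration; lambda is the forgetting factor and delta the initialization constant, both placeholder values). The point to note is that the inverse of the input correlation matrix, P(n), is updated directly, so no explicit matrix inversion is needed:

    % Standard RLS algorithm (minimal sketch): O(N^2) operations per iteration.
    N = 8; lambda = 0.99; delta = 0.01;
    x = randn(1000, 1);
    d = filter([0.8, 0.4, -0.2], 1, x);      % hypothetical desired signal
    w = zeros(N, 1);
    P = (1/delta) * eye(N);                  % initial inverse correlation matrix
    for n = N:length(x)
        u = x(n:-1:n-N+1);                   % tap-input vector
        k = (P * u) / (lambda + u' * P * u); % gain vector
        e = d(n) - w' * u;                   % a priori error
        w = w + k * e;                       % tap-weight update
        P = (P - k * (u' * P)) / lambda;     % matrix-inversion-lemma update
    end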


The QR-decomposition-based RLS (QRD-RLS) algorithm

This formulation of the RLS algorithm also involves matrix manipulations which lead to a computational complexity that grows with the square of the filter length. However, the operations involved here are such that they can be put into some regular structures known as systolic arrays. Another important feature of the QRD-RLS algorithm is its robustness to numerical errors as compared with other types of RLS algorithms (Haykin, 1991, 1996).

Fast RLS algorithms

In the case of transversal filters, the tap inputs are successive samples of the input signal, x(n) (see Figure 1.2). The fast RLS algorithms use this property of the filter input and solve the problem of least squares with a computational complexity which is proportional to the length of the filter, hence the name fast RLS. Two types of fast RLS algorithms may be recognized:

1. RLS lattice algorithms: These lattice algorithms involve the use of order-update as well as time-update equations. A consequence of this feature is that it results in modular structures which are suitable for hardware implementations using the pipelining technique. Another desirable feature of these algorithms is that certain variants of them are very robust against numerical errors arising from the use of finite word lengths in computations.

2. Fast transversal RLS algorithm: In terms of the number of operations per iteration, the fast transversal RLS algorithm is less complex than the lattice RLS algorithms. However, it suffers from numerical instability problems which require careful attention to prevent undesirable behaviour in practice.

In this book we present a complete treatment of the various LMS-based algorithms, in seven chapters. However, our discussion of RLS algorithms is rather limited. We present a comprehensive treatment of the properties of the method of least squares and a derivation of the standard RLS algorithm in Chapter 12. The basic results related to the development of fast RLS algorithms and some examples of such algorithms are presented in Chapter 13. A study of the tracking behaviour of selected adaptive filtering algorithms is presented in the final chapter of the book.

1.5 Real and Complex Forms of Adaptive Filters

There are some practical applications in which the filter input and its desired signal are complex-valued. A good example of this situation appears in digital data transmission, where the most widely used signalling techniques are phase shift keying (PSK) and quadrature amplitude modulation (QAM). In this application, the baseband signal consists of two separate components, which are the real and imaginary parts of a complex-valued signal. Moreover, in the case of frequency domain implementation of adaptive filters (Chapter 8) and subband adaptive filters (Chapter 9), we will be dealing with complex-valued signals, even though the original signals may be real-valued. Thus, we find cases where the formulation of the adaptive filtering algorithms must be given in terms of complex-valued variables.

In this book, to keep our presentation as simple as possible, most of the derivations are given for real-valued signals. However, wherever we find it necessary, the extensions to complex forms will also be followed.

1.6 Applications

Adaptive filters, by their very nature, are self-designing systems which can adjust themselves to different environments. As a result, adaptive filters find applications in such diverse fields as control, communications, radar and sonar signal processing, interference cancellation, active noise control, biomedical engineering, etc. The common feature of these applications which brings them under the same basic formulation of adaptive filtering is that they all involve a process of filtering some input signal to match a desired response. The filter parameters are updated by making a set of measurements of the underlying signals and applying that set to the adaptive filtering algorithm such that the difference between the filter output and the desired response is minimized in either a statistical or a deterministic sense. In this context, four basic classes of adaptive filtering applications are recognized: namely, modelling, inverse modelling, linear prediction, and interference cancellation. In the rest of this chapter, we present an overview of these applications.



Figure 1.5 Adaptive system modelling

1.6.1 Modelling

Figure 1.5 depicts the problem of modelling in the context of adaptive filters. The aim is to estimate the parameters of the model, W(z), of a plant, G(z). On the basis of some a priori knowledge of the plant, G(z), a transfer function, W(z), with a certain number of adjustable parameters is selected first. The parameters of W(z) are then chosen through an adaptive filtering algorithm such that the difference between the plant output, d(n), and the adaptive filter output, y(n), is minimized.
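A minimal MATLAB sketch of this modelling set-up follows (our own illustration; the plant below is a hypothetical FIR system, and the LMS update of Section 1.4.1 is used for the adaptation):

    % Adaptive system modelling (cf. Figure 1.5) with an FIR plant.
    g = [1.0; 0.5; -0.3; 0.1];     % hypothetical plant impulse response G(z)
    N = length(g); mu = 0.01;
    x = randn(2000, 1);            % input applied to both plant and model
    d = filter(g, 1, x);           % plant output = desired signal d(n)
    w = zeros(N, 1);               % adaptive model W(z)
    for n = N:length(x)
        u = x(n:-1:n-N+1);
        e = d(n) - w' * u;         % modelling error
        w = w + 2 * mu * e * u;    % LMS update
    end
    % After convergence, w approximates the plant response g.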

An application of modelling, which may be readily thought of, is system identification. In most modern control systems, the plant under control is identified on-line and the result is used in a self-tuning regulator (STR) loop, as depicted in Figure 1.6 (see Åström and Wittenmark, 1989, for example).

Another application of modelling is echo cancellation. In this application, an adaptive filter is used to identify the impulse response of the path between the source from which the echo originates and the point where the echo appears. The output of the adaptive filter, which is an estimate of the echo signal, can then be used to cancel the undesirable echo. The subject of echo cancellation is discussed further below in Section 1.6.4.

Figure 1.7 An adaptive data receiver using channel identification

Non-ideal characteristics of communication channels often result in some distortion in the received signals. To mitigate such distortion, channel equalizers are usually used. This technique, which is equivalent to implementing the inverse of the channel response, is discussed below in Section 1.6.2. Direct modelling of the channel, however, has also been found useful in some implementations of data receivers. For instance, data receivers equipped with maximum likelihood detectors require an estimate of the channel response (Proakis, 1995). Furthermore, computation of equalizer coefficients from the channel response has been proposed by some researchers, since this technique has been found to result in better tracking of time-varying channels (Fechtel and Meyr, 1991, and Farhang-Boroujeny, 1996c). In such applications, a training pattern is transmitted in the beginning of every connection. The received signal, which acts as the desired signal to an adaptive filter, is used in a set-up to identify the channel, as shown in Figure 1.7. Once the channel is identified and the normal mode of transmission begins, the detected data symbols, ŝ(n), are used as input to the channel model and the adaptation process continues for tracking possible variations of the channel. This is known as the decision directed mode and is also shown in Figure 1.7.


1.6.2 Inverse modelling

Inverse modelling, also known as deconvolution, is another application of adaptive filters which has found extensive use in various engineering disciplines. The most widely used application of inverse modelling is in communications, where an inverse model (also called an equalizer) is used to mitigate the channel distortion. The concept of inverse modelling has also been applied to adaptive control systems, where a controller is to be designed and cascaded with a plant so that the overall response of this cascade matches a desired (target) response (Widrow and Stearns, 1985). The process of prediction, which will be explained later, may also be viewed as an inverse modelling scheme (see Section 1.6.3). In this section we concentrate on the application of inverse modelling in channel equalization.

Figure 1.6 Block diagram of a self-tuning regulator

Figure 1.8 A baseband data transmission system with channel equalizer

Channel equalization

Figure 1.8 depicts the block diagram of a baseband transmission system equipped with a channel equalizer. Here, the channel represents the combined response of the transmitter filter, the actual channel, and the receiver front-end filter. The additive noise sequence, v(n), arises from thermal noise in the electronic circuits and possible cross-talk from neighbouring channels. The transmitted data symbols, s(n), that appear in the form of amplitude/phase modulated pulses, are distorted by the channel. The most significant among the different distortions is the pulse-spreading effect, which results because the channel impulse response is not equal to an ideal impulse function, but rather a response that is non-zero over many symbol periods. This distortion results in interference of the neighbouring data symbols with one another, thereby making the detection process through a simple threshold detector unreliable. The phenomenon of interference between neighbouring data symbols is known as intersymbol interference (ISI). The presence of the additive noise samples, v(n), further deteriorates the performance of data receivers. The role of the equalizer, as a filter, is to resolve the distortion introduced by the channel (i.e. rejection or minimization of ISI) while minimizing the effect of additive noise at the threshold detector input (equalizer output) as much as possible. If the additive noise could be ignored, then the task of the equalizer would be rather straightforward. For a channel H(z), an equalizer with transfer function W(z) = 1/H(z) could do the job perfectly, as this results in an overall channel-equalizer transfer function H(z)W(z) = 1, which implies that the transmitted data sequence, s(n), will appear at the detector input without any distortion. Unfortunately, this is an ideal situation which cannot be used in most of the practical applications.

We note that the inverse of the channel transfer function, i.e. 1/H(z), may be non-causal if H(z) happens to have a zero outside the unit circle, thus making it unrealizable in practice. This problem is solved by selecting the equalizer so that H(z)W(z) ≈ z^{-Δ}, where Δ is an appropriate integer delay. This is equivalent to saying that a delayed replica of the transmitted symbols appears at the equalizer output. Example 3.4 of Chapter 3 clarifies the concept of non-causality of 1/H(z) and also the way the problem is (approximately) solved by introducing a delay, Δ.

We also note that the choice of W(z) = 1/H(z) (or W(z) = z^{-Δ}/H(z)) may lead to a significant enhancement of the additive noise, v(n), in those frequency bands where the magnitude of H(z) is small (i.e. 1/H(z) is large). Hence, in choosing an equalizer, W(z), we should keep a balance between residual ISI and noise enhancement at the equalizer output. A Wiener filter is a solution with such a balance (see Chapter 3, Section 3.6.4).

Figure 1.9 Details of a baseband data transmission system equipped with an adaptive channel equalizer

Figure 1.9 presents the details of a baseband transmission system, equipped with an adaptive equalizer. The equalizer is usually implemented in the form of a transversal filter. Initial training of the equalizer requires knowledge of the transmitted data symbols (or, to be more accurate, a delayed replica of them), since they should be used as the desired signal samples for adaptation of the equalizer tap weights. This follows from the fact that the equalizer output should ideally be the same as the transmitted data symbols. We thus require an initialization period during which the transmitter sends a sequence of training symbols that are known to the receiver. This is called the training mode. Training symbols are usually specified as part of the standards, and the manufacturers of data modems should comply with these so that the modems of different manufacturers can communicate with one another. (The term modem, which is an abbreviation for 'modulator and demodulator', is commonly used to refer to data transceivers (transmitter and receiver).)

At the end of the training mode, the tap weights of the equalizer would have converged close to their optimal values. The detected symbols would then be similar to the transmitted symbols with probability close to one. Hence, from then onwards, the detected symbols can be treated as the desired signal for further adaptation of the equalizer so that possible variations in the channel can be tracked. This mode of operation of the equalizer is called the decision directed mode. The decision directed mode successfully works as long as the channel variation is slow enough so that the adaptation algorithm is able to follow the channel variations satisfactorily. This is necessary for the purpose of ensuring low symbol error rates in detection so that these symbols can still be used as the desired signal.
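The following MATLAB sketch (our own illustration; the channel, delay and step-size are placeholder values) shows the training mode of an adaptive equalizer, with a comment indicating how the decision directed mode differs:

    % Adaptive channel equalizer, training mode (minimal sketch).
    h = [0.5, 1.0, 0.3];                 % hypothetical channel H(z)
    N = 11; Delta = 6; mu = 0.005;       % equalizer length, delay, step-size
    s = sign(randn(3000, 1));            % binary training symbols, +/-1
    x = filter(h, 1, s) + 0.01 * randn(3000, 1);   % received signal
    w = zeros(N, 1);
    for n = max(N, Delta+1):length(x)
        u = x(n:-1:n-N+1);               % equalizer tap inputs
        e = s(n-Delta) - w' * u;         % desired signal: delayed training symbol
        w = w + 2 * mu * e * u;          % LMS update
    end
    % In the decision directed mode, s(n-Delta) is replaced by the detected
    % symbol, e.g. sign(w'*u) for binary signalling.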

The inverse modelling discussed above defines the equalizer as an approximation of z^{-Δ}/H(z), i.e. the target/desired response of the cascade of channel and equalizer is z^{-Δ}, a pure delay. This can be generalized by replacing the target response z^{-Δ} by a general target response, say Γ(z). In fact, to achieve higher efficiency in the usage of the available bandwidth, some special choices of Γ(z) ≠ z^{-Δ} are usually considered in communication systems. Systems that incorporate such non-trivial target responses are referred to as partial-response signalling systems. The detector in such systems is no more the simple threshold detector, but one which can exploit the information that the overall channel is now Γ(z), instead of the trivial memoryless channel z^{-Δ}. The Viterbi detector (Proakis, 1995) is an example of such a detector. The target response, Γ(z), is selected so that its magnitude response approximately matches the channel response, i.e. |Γ(e^{jω})| ≈ |H(e^{jω})|, over the range of frequencies of interest. The impact of this choice is that the equalizer, which is now W(z) ≈ Γ(z)/H(z), has a magnitude response that is approximately equal to one, thereby minimizing the noise enhancement. To clarify this further, and also to mention another application of inverse modelling, we next discuss the problem of magnetic recording.

Magnetic recording

The process of writing data bits on a magnetic medium (tape or disk) and reading them back later is similar to sending data bits over a communication channel from one end of a transmission line and receiving them at the other end of the line. The data bits, which are converted to signal pulses prior to recording, undergo some distortion due to the non-perfect behaviour of the head and media, as happens in communication channels because of the non-ideal response of the channel. Additive thermal noise and interference from neighbouring recording tracks (just like neighbouring channels in communications) are also present in the magnetic recording channels (Bergmans, 1996).

pulse o.f width one hit in terval,· T. uti: is :: ,ill acfel'lZed b~ their responSe to an isolated hard-dis~ chaonels it isusuajly modelled b °twn as the dl~l~ response, ,and in lhe case of

Lor, ,emzwJ1 pulses separated b'b" Y, he sll,perposlt1011 of POsttive and ne t'

, , ' ' . y ooe II Interval T In ga Ive

models the Step respOnse of the channel Th ' , . .other words, the L,orentzjll!l pulse

. e LorentMBn pulse is defined as

ga(l)=-~_

1+ (2l)2'

Iso

where 1 is th '·1 .

.' so e pu se width measured at 50~ f i .

a' Ill.Kn (t) and other functions that appear'· °b Jts maxlm~U1 ampIitud~. The subscript thal t~ey are analog (oon-sampled) sizu It t; rest o! this 8ubsectl0J1lS to emphasize recording del1sity. Typical values of D a --: ~ s. ,~ rano D == t5(}/T is known as th that more hi ts are contained in one t int~:'::l i.?e range 1 to J. A higher density means a temporal measure of the recoro1!g densit·I.C:;;;Or:eISI. We may also note tbaL (51) is another parameter, plV.5Q = tsO/v Where . - rJ' I e~ measured spatIally, we obtain ~e h~a.d. AccQrdingLy. for a gjve; speed, ~,I:be~~:;~;~ of tn:' medium with reSpect to

ts "'.IHlen on a IGlJglb pW5Q along the trackLh sp_eci:fies ~e actual DlJmber of

USJllg (1.5), tbe dibit response of a hard-di~: ',h'o ma~ettc. n;edrum.

c rannel 15 obUulled as

lIa(t) = g_,.(t) - ga (I - T).

The response of the cha I

en n"oI'u,,]'on sum' nne' La a sequence sen) of .:l~ 'a, h' .

, , Uul rts IS the-ngt\,en by the

(J .5)

(1.6)

uaCt) = LS(h)hn(r - 111').

~

(L7)

Applica.tions 15

Thus. the dibit response, h.U), is nothing but the impul e response 01 the recording channel.

Figures l.H~(a) and (b) mow the dihit (time domain) and magnitude (frequency domain) responses, respectively, of the magnetic channels (based on the Lorentzian model) for dellsitiesD = 1,2 and 3. From Figure 1.I0(b) we note tha.tmost of the energy io the read-back signals is concentrated in a midband range between zero and an upper" limit around 1/2T Clearly, the bandwidth increases withincrease in density. In the light of our previous discussions, we may thus choose tbe target response, P(z), of the equalizer S0 that it.resemblesa bandpass filter whose bandwidth and magnitude response are close to that of the Lorentzian dibit responses" In magnetic recording, the most commonly used partial responses (i.e. target responses) are given by the class IV response

(1.8)

where d as before, is an integer delay and K is an integer greater than or equal to one, As the recording density increases, higher values or K will be required to match the channel characteristics. But, as K increases, the channel lengthalso increases, implying higher complexity in the detecror, In CHapter 10, we elaborate OD these aspects of partial response systems.

1.6.3 Linear prediction

Prediction is a spectral estimation technique that is used for modelling correlated random processes for the purpose of finding a parametri representation of these processes. In general. different parametric representations could- be used to model the processes. In the context or linear prediction, the model used is sbown ill Figure 1.11. Here, tbe random process, x(n). is assumed 1O be generated by exciting the filter G(z) with the input u(n). Since G(z) is au ail-pole filter, this is known as autoregressive (AR) modelling. TIle choice/type of the excitation signal, urn), is application dependent and may vary depending on the nature of the process being modelled. However, iL is usually chosen to be a white process.

Other models used for parametric representation are moving average (MA) models, where G(z) is anall-zero (transversal) filter, and autoregressive moving average (ARMA) models, where G(:z) has both poles and zeres, However, the use of AR model is more popular than the other two.

The rationale behind the use of AR modelling may be explained as follows. Since the samples of any given Don-white random signal, X(II), are correlated with one another, these correlations could be used to make a prediction of the present sample of the process, x(n}, in terms of its past samples, .;(11 - I), x(n - 2), ... , X(II - N), as in Figure I.li. Intuitively. such prediction improves as the predictor length increases, HOwever, the improvement obtained may become negligible. once the predictor length, N, exceeds a certain value, which depends upon tll,e extent of the correlation in the given process. The prediction error, e(n), will then be approximately white. We now note that the transfer function between the input process, x(n), and the prediction error, e(n), is



H(z) = 1- L ajz-",

(L9)

Hi

Introduction

0.8
0.6
Q.4
Q.2
2: 0
L
-0.2
-0.4
-0.6
-0.8
-1
-5 OJ :£.

w -5

o

::J

t::

z . (!J" -1Q <l":

:2

-15

Applications

17

a tfT (a)

5~----~----~----~----~---'

o

x(n)

1

u(n)

.. G(z) = _""N _I

I i-l=l ail!

FlgLlre 1.11 Autoreqresstve m.o061l1ng of a random process

where the (liS are thepredieror coefflcients. Now j[ a white process, u(n), with similar statistics as e(ll) is passed through an a'lI-pole filter with (he transfer function

1

G(z) = L'''' _-i 1

I - l-I14~

(l.l())

as in Figure 1.11, then the generated output, i(n), will dearly be a process with the same statistics as x(n).

With the background developed above, we are now ready 10 discuss a few applications ofadaptive predict jon.

5

Autoregressive spectral analysis

In certain applications we need to estimate the power spectrum of a random process. A trivial way of obtaining such an estimate is to take the Fourier transform (discrete Fourier transform (DFT) in the case of discrete-lime processes) and use some averaging (smoothing) technique ro improve the estimate, This comes under the class or nONparametric spectralestimatian techniques (Kay, 1988)_ When the number of samples €I[ the input are limited. the estimates provided by non-parametric spectral estimation techniques will become unreliable. In such cases the parametric spec.tral estimation, as explained above, may give more reliable estimate.

As mentioned already, parametric spectral estimatioc could be done by using either AR, MAor ARMAruodels (Kay, 1988). III the Case at AR modelling We proceed as follows. Wefust chcosea proper order. N. for the model. The observed sequence. x(n), is then applied to a predictor structure similar to Figure 1 .l2 whose coefficients. the a,s. are optimized 'by minimizing the prediction error, e(fI). Once the predictor coefficients have converged, an estimate: of the power spectral density of X (11) is obtained according.to the following equation:

(1.11 )

x(n)

_20~----~----~_L--~L_--~~--~

a

0.2 0.4 0.6 O.B NORMALIZED FREQUENCY, fT

(b)

+

i(n) ~ e(n)

Figure 1'_10 Time and frequency domain responses pf magnetic recordlnp c:l)anneJs for densities D = 1, 2and 3. modeled' Llsing the Locentzlan pulse. (a) Dibit response. (b) Magnitude response of dlblt response

Figure 1.12 Linear predletor

18

lntroducji.on

where No Is an estimate of the power of the. prediction error, e(,,). Tbblfollows from the medel of Figure 1.11 and the fact thatafter convergence of tIw predictor, e(n) is approximately white. For further explanation on the derivation dO. I 1) from tile signal mode! of Pjgl,lre ! .11, refer to Chapter 2 (SectiQn 2.4.4).

Adaptive nne enhancement

Adaptive line enhancement refers to thesituatlonwhere a narrow-bandalgnal embedded in a wide-band signal (usually while). needs to be extracted. Depending on the application, the- extracted signal may be the signal of interest, or an unwanted inrerference that should be removed. Examples of'the latter oasearea spread spectrum signal.that has been corrupted by:a narrow-band signal an.a b~bmediCa! measurement signals tBat have been corrupted by the SO/60Hz power-line interference.

TIle idea. of using predlction to extract a narrl;lw-band signal when. mixed with a wideband signal follows from the following fundamental result of s{gnal analysis: suecessivc samples of a narrow-band signal are highly correlated with one-another, whereas there is a lmost no correlation between suecessi ve sam ples of a wide-band process. Because oHms, j,fa pmc.(!S{; x(n) consiiiting of tbe sv:m of rnirtow~band and wide-hand precesses isappJ;iGd to a predictor, then the predictor eutpnt, ~(Ii), will bea good estimate of the, narrow-band portion of X(II).In other words; the predictor will aetas a narrow-hand fil ter which rej,e~ts most of the wide-band POFtiOD of X(II) and lreeps (enhances) the narrow-band portion, thus the name true enhaneer. Example'S Q( Iineenhaneess can be found in Chapters 6 and 10. In particular, in Chapter 1.0 we find that Iine enhancers can be bestimplemented I')sing llRfillcrs.

We also note that in tbe applieatiaas where the: narrow-band portion Qf .\"(n} bas to be rejected (suchas the examples mentioned above); tlse differenceb¥tween x(n) and i(n),. i.e. the estimatior; error, e(Il}, is taken as the system output. In this case the transfer function between the input, X(II), and the output, e{nJ, wilt be that of a notch

filter. .

Speech coding

Since theadvent or digita: signal processing, speech processing has ah~ays been one of the focused research areas. Among various.processing; techfliqUfS that have been applied to speech signals, linear prediction has been found [0 be the most promising technique, leading La many useful algorithms. In fact, most of the theory of prediction was developed III the context of sp6cch processing.

There are two major speech coding tecb:iti.quesl::l}at involve linear prediorlon (J ayant and Noll, 1984). Both techniques aim at reducing the numher ofbits-u~cl fore'VC:ry second of speech to achieve saving in storage. and!qr transmission bandwidth. The first technique, which is categorized under the class of source coders. strives to produce digitized voice daraat low bft rates in rbe range 2:'" to kb/s. TIle synthesized speech, however, is not of a high quality. It sounds more Syilthetic, lacking naruralism, Hence, it becomes difficult to recognize the speaker. The second technique, which comes under the dass of i'.'Dl'ejorfrl.coders, gi-ves much better quality at the cost ofa mueh higher bit rate (typically, 32 kb/s),

Applications

19

Impulse train (voeal-cordsound pulse generator)

i(n)

",,_U_(U-i) ~,Gez) = . !

] - ".' a. -I .L..I=L'Z

u(n) :.exciration

x(n): synthesized speech

Vocal-tract model

White noise generator

Figure 1.1'3 Sp¢e.oh-pr()ldl,H::ticofl mOdel

The main reason Tor linear prediction being wrdely used in speech co ding is that speech signals can be accurately modelled, as in F'1gu-re 1.13. Here, the all-pole filter is the vocal trast model. The excitation to this model; !i{n), is either a white noise in theease of unvoiced sounds (fricatives such as /s/ and (f/), or an impulse train in the ease of voiced sounds (vowels such as lin. The period of the impulse train; knownas lhe pitch period; and the power ofthewh_ite noise, known as the excttation lew::;!, are parameters oftae speech model which are [0 be i.denti:fioo. ill the ceding process.

Linear predictive coding (L?C). Speech signal is a hlghly non-stationary process. The vocal-tract shape undergoes variations to generate different sounds ill uttering each word. ACC(lrdlngly,in LPC, tooode a-speech signal, it is first partitioned into segments. of 1 0-30ros long. These segments are short enough for thevoeal-tractshape to be nearly stationary, so that the parameters of thespeeeh-production model of Figure' I. Daould be.assumed med. Then, the following steps are used to obtain the parameters of each segmen l:

1. Using the predictor structure shown in Figure 1.12, thepredicter ccefficients, the a,s,. are obtalned by miaimizing the prediction error e(n-) in the least-sqaares sense, for the given segment.

1. The energy of the prediction errore(n) is measured. This specifies the level of

excitation required for syntheSizing [his segment, 3.. The segment i~ classified as voiced Or unvoiced.

4. In the C~ of voiced .speech, the pitch period of the segment is measured.

11,.e foIlowing parameters are then stored or transmitted for every segmen t, as the coded speeoh: (i) the predictor coefficients, (ii) the energy of the excitation signal, (iii) voiced! unvoiced classification, and (iv) the pitch period i11 the case of voiced speech. These parameters can then (when necessary) be used in a model similar to Figure 1.13 to synthesize the Speech signal.

Waveform coding. The most direct way ofwaveformooding is the standard pulse oode modulation (PCM) techniq ue, where (In: speech signal samples are directly digitized in to

20

InfrodlJc!iOi1

a prescribed number of bits to generate the information bits associated with the ceded speech. DheCI quantization of speech samples requires rela lively a large number of bits (usually 11 bits per sample) in order to be able 10 reconstruos the original speech 'with an acceptable quality.

A modification of the standard PCM, known as dij!erential pulse code modutauo» (OPCM), employs ·alinear predictor such as Figure 1.12 and uses the bits associated wi th the quantized samples of the prediction error, .t(Il), as the coded speech. The rationale here is that the prediction etwr"e(lI),. hasa much smaller variance than the input, x(n). Thus, for a given q uamizarioa level, e{n) may be qnanrized wi til fewer bits, as compared with .\"(11). Moreover, since the number of information bits perevery second of the coded speech is directly proportional to the number of bits used per sample. the bit rate of the DPCM will be less compared with the standard reM.

The predictiPI1 filter used in DPCM.cao be fixed 0[' bemade adaptive. A OPCM system with en adaptive predictor iscaJled an adajJlive DPCM (ADPCM).ln the ease of'speeeli sign:;!is, use of the ADPCM results in superior performance as compared with the cas!': w.hore a non-adaptive OPCM is used. In fact, the ADPCM has been standardized and widely used in practice (International Telecemmunioatlon Unit (ITU) Recommendation G.72:6).

Figure 1.14 depicts a simplified diagram of the ADPCM system, :;IS proposed in lTU Rec{)mmell{hUiGJl 0.726. Here, the pn;dic1or is a six-zero, two-pole adaptive lIRfilter. The coefficients orlll-is filter are adjusted adapt:ively so tha; the quantized error, e(n), is minlraized in the mean-sq uare sense. The predictor in put, i(n), is the same as the original input, x(f/J. exeept.for the quantization error in i(n). To understand tbe jointoperation of the encoder and decoder in Figere, 1 . 14,liQ,te tha t the sallie signal. e(n J, is used as inputs to the pTediclQf structures at Ute encode, and decoder. Hence, if the stability Qrthe loop censisting of-th.e predictof.,;lUd adaptation al gorithm could.be guaranteed, then the s ready stare value of the reconstructed speech at the decoder, i.e. x(n). win beequal to that at the encoder. i.e .. X(/l) , since non "equal in itial condi tions of the encoder and decoder lOQPs will die away after the-ir uanslent phase.

ENCODER DECODER

Figure 1,14 AOPCM encoder-d'H~oder

Applications

21

Adaptive fiIter

FlglJr~ 1.15 lnterterenee cancellation

1.6.4 .Inter.ference cancellation

Interference cancellation f'0f~ts La situations where if is required to cancel an interferingsignal/noise frotnlhe given signal which is a mixture of the desired signal and the interference. TIle principle of interference caneellatien is to obtain an estimate of the in terferi ng signal and su btract th at from the corrupted signal, The feasibility of this idea relies on the availability of ,a reference seuree from which the interfering stgnal originates.

Pigure 1.15 d.epfcts th:e·concepl olinterferenee cancella tion, in its simplest form .. There MC two Inputs to the =celler:p,.if/laJY 'and reference. TIle prhnary Input is the corrupted s.ignal, i.e, tile desired signal plus iuterferenee. The reference.input, on the other band, originaees from. the interference source only.' The adaptive iilter is adjusted so that 1:1 replica of (he interference sipal that is. present ill the primary signal appears at its output, yen), Subtracting this frOl'll the primary input results in an output that is cleared from in terference, liltls the name interference cancella tion,

We note that the Interference canc~llatiQu configuration of Figure US is dilferent From the previous cases of adaptive filter-s, ill the. sense that the residual error (which was -discarded in other cases) is the eleaned-up signal here, The desired signal in the previous eases has bsen replaced here by a noisy (corrupted) version of the actual desired signal. Moreover, the use of the term 'reference to refer to the adaptive filter input is clearly related to the role of this input in the canceller,

til the fest of this section we present some specific applications of interference cancelling.

EOho cancellation in telephone lines

EchQe-£ ill telephone lines n~(1,sl1y occur 81 points WIJe.:-e hyb'r;id G'ircnits are used to convert four-wirenerwnrks to two-wire networks. Figure 1.16 presents a simplified diagram ,of a telephone co nnection network, hlghligh ti ng the poin IS where 'echoes .occur. The two "wires at the ends are subscriber loops connecting ens \ omers , telephones to cen tral offices. It may abo include portions of the local. network. The four-wires, on the

I ln Snm$ applications of interference eancellation there might also be some leakagecf' the desi_r~ signal 10 the reference input. Rete. we have Ignored this situation for simplicity,

22

Int;cJducUon

1

Central switching offices and inter~office trunk lines

r-""-~------------------------~~~~-~---'

I I I I I I I I I ,

1

: two-wire ,

I

,

,

I

I

: four-wire :

~-----------------------~-~--------------

Figure 1.16 Simplifj'ed diagram or a telephone network

J

other hand, are carrier systems (trunk 'lines) for medium- to long-haul transmission. The distinction is that the two-wire segments carry signals in both directions on the same lines, while iu the four-wire segments signals in the t\VO directions are transmitted on two separase lines. ACGordillgly, the role of the hybrid circuit is to separate the signalsin the two directions. Perfect operation of the 'hybrid circuit requires that the in-coming: signal from the, trunk lines shou ld be directed 10 the su bscriber line and that there be no leakaze (echo) of that to the return line. In practice, however. S11Ch ideal behaviour cannot be expected, from hybrid circuits. There would always be some echo on the return path. In the case of voice communications (i.e. ordinary COnversation On telephone lines), the effect of the echoes becomes more obvious (and annoying to the speaker) in longdistance calls where the delay with which the echo returns to the speaker may be in the range of a few hundred milliseconds, In digital data transmission, both short- and longdelay echoes are serious.

As notedearlier, and also can clearly be seen from Figure 1.17, the problem ofecbo cancellation may be viewedss 0I1C of system modellil'lg. An adaptivefiUer is pin between the in-coming and out-going Lines of the hybrid. By aqapting the filter terealize an approximation of the echo path, a replica of the echo is obtainedat its output. This is then subtracted from the out-going signal (0 clear that from the undesirable echo.

Echo cancellers are usually implemented in transversal form, The time spread of echoes in a typical hybrld eirouit.is in the range 20-30 IDS. If we assume a sampling rate of 8 kHz [or the operation of the echo canceller, then an echo spread of 30ms requires an

J

Figure 1.17 Adaptive echo canceller

Applications

23

Subscriber Modem Trunk Lines

~-----~-------------, l------~------,

: transrnhted : I I

I d .. ta ': :

I I

I I I 1

I I I I

I ~ ~I~ I

I I

I t

I I

:~~ :

I data I I

t I I

~ ~ < J

FIgur'e 1.18 Data echo canceller

adaptive filter with at least 240 taps (30 ms x 8 kHz). This.is a relatively long filter, requiringa high-speed digital signal processor fOf jts realization. Frequency domain processing.is often used to reduce the high, computational complexity of long filters. The subject of frequency domainadaptive (lJrers is covered in Chapter 8. .

The echo cancellers described above are applicable to both voice und data transmission. However more stringent conditions need to- be .satisfied in the case of data transmission. To maximize .the usage of the available bandwidth, full-duplex data transmissionis often used. This requires the use of a hybrid circuit for connecting the data modem to the two-wire subscriber loop, as ShOWUll Figure 1.16. The leakage of the transmitted data back. to the receiver input is thus inevitable and an echo canceller has to be added. as indicated in Figure Ll8. However, we note that the data echo cancellers are different from the voice echo cancellers used in central switching offices in many way. For instance, since tjie input to the data echo canceller are data symbols, it tim operate at

. the data symbol rate, which isin the range of2.4-3 kHz (about three times smaller than the 8 kHz sampling frequency used in voice ecao cancellers), For a given echo spread, a lower sampling frequency implies fewer raps for the echo canceller. Cl early , this greatly simplifies the implementation of the echo canceller, On the other hand, thedata echo cancellers require a much higher level of echocancellalioo to ensure the reliable transrrtisstnn of data at higher bit rates. In addition, the echoes returned from tile other side of the trunk lines should also be. taken care of. Detailed discussions on these issues c-an be found in Lee and Messerschrnitt (1994) and Gitlin, Hayes and Weinstein

1992).

Acoustic echo cancellation

The problem of acoustic echo cancellation can be best explained by referrina to Figure L19 which depicts the scenario that arises in teleconferencing applications. The speech signal froma.Jar-end speaker, received through II communication channel. is broadcast by a loudspeaker in a room and its echo is picked up by a microphone. This echo must be cancelled to prevent its feedback to the far-cod speaker. TIle microphone also picks up the near-end speaker's speech and possible background noise which may exist in the room. An adaptive transversal filter with sufficient length is used to model the

24

tmroduction

receive

Conferenee room

ABC

transmit

Figure 1.19 AcQUs\<11;l-ectmoaocellalton. Repr,nted from Far'hang-8oroujeny (1~97b)

acoustics of the room. A replica of tneIoudspeaker echo is, then obtained and subtracted

from the microphone signal prior to transmission. "

Clearly. the problem of acoustic echo cancellation can. also be posed as OTIe of system modelling, The main challenge here j that the echo paths spread over a relatively long length in time. For typical office rooms, echoes in the range 100-2501115 spread is quite common, For a sampling rate of kffz, litis would mean 800-2000 laps! Thus. the main problem of acoustic echo cancellation is (hal of realizing very long adaptive filters. In addition. since speech is a low-pass signal, it becomes necessary LO use special algorithms to ensure fast adaptation of the echo canceller, The algorithms discussed in Chapters 8 alld 9 lull'l' been Ividel)' used to overcome these dJ.fficulties in the implementation .of aeousnc echo cancellers,

Active noise control

Active noise control (ANe) refers to situations where acoustic antinoise waves 'are generated from electronic circuits (Kuo and Morgan, 1996). The A C can be best e x plained by the Iallowing example,

25

error microphone

Noise Source

-+A

B

c

F1gure 1.20 Active noise cancellation in a naHOW duel

A well-exami ned application of ANC is eancella lion of noise in narrow ducts, such as exhausr pipes and ventilarioa systems, as illustrated in Figure 1.20. The acoustic noise travelling alongthe d oct is picked up by a mierephone at position A. This is used as reference input to an ANC filter whose parameters are adapted so that its output, after conversion to an acoustic wave (through the cancelling loudspeakerj.Is eq ual tQ the negative; val ue of the duct noise-at position B, thereby cancelling that. The residual noise, picked up by the error micrcphoneat position C. is the error signal used for adaptation of the ANC filter.

Comparing thi ANC set-up with the interference cancellation set-up given in Figure 1.15, \ e may note the following. The source of Interference here is the duct noise, me reference input is the noise picked up by tile reference mierophone, tile desired output (i,e. what we wish 10 see after cancelling the duct noise) is zero, and the primary input is the duet noise reaehing position B. Accordingly the [ole Of the ANC filter is to model the response of the duel Irom position A to B.

The above description of ANC assumes that the duct is narrow and [he acoustic noise waves are travelling along the duct, which is-like a OTIe dimensional model. The acoustie models or wider ducts and large enclestnes, such as cars and aircraft, are usually more complicated. MuJ tipJemicrophom;s/loudspeakers are needed for successful implementalion of ANCs in such enclosures. The adaptive filtering problem is then that of a multiplelnpur-reultiple-cutput system (Kuo and Morgan. 1996). Nevertheless. the basic principle remains the same, i.e, the generation of antinoise to cancel the actual noise.

Beamforming

In the applications that have been discussed so Car the filters/predictors are used to coroNne slJmples of til<: inPlJI signal{s) at diifecceDl time instants 10 gel'lerate the Qutput. Hence, these areelassified as temporalfihering, Beamforming, however, isdifferem from these in the sense that the inputs to a bearnforrner are samples of incoming signals at different positions in space, This is called spariaJ ji/tering. Bearnfcrming finds applications in communications. radar and sonar (Johnson and Dudgeon. 1993), and also imaging in radar and medical engineering (Soumekh, 1994).

In spatia] filtering, a number of independent sensors are placed at different point in space to pick up ignals corning from various sources (see Figure 1.2}). In radar and

28

fnt.roduotion

App!fcations

27

PROCESSOR

arrives. at.ele)1lentsA and Sat the same time, whereas the arrival times of signal v.(n)ar A and BaTe different. We may thus write

Sensors

[)

• •

I-------j~ Output (Bearaformer filter)



Fig.ure 1.21 Spatial filtering (beamformil1g}

where the subscripts A and B are used to denote the signals picked up by elements A and B, respectively, and 'P is the phase-shift.arisirrg fremthe lime delay of arrival of v{tI) at

element A with respect to its. arrival at element B. .

. Now, iFwe assume that S(I1) is the desired sign(!lan.d 0:(11) is an interference, then, by rnSpeotion, 'We call see (hat if the phase-shifter phase is ohosenequal t.o \0, then the interference, 11(11), will be completely cancelled by the bearnfortner. The desired signal, Oil the other hand, reaches the beamformer output as a(coswo'tl - cos(w"n - 'P) )., which is non-zero (and still holding the information contained in itsenvelope, 0:) when <p #- '0, i.e, when the interfsrence.direction is different from the direction of the desired signal. This shows that we can tune a beamformertoallow tile desired signal arriving from a direction to pass through.i t.while rejecting the unwan ted signals (inteuerences) arrivingf rom other

directions. .

The idea of using: a phase-shifter to adjust the beam pattern of two sensors. is easily extendible to the general case of more tban two sensers, In general. by introducing approprialephase shifts and also gains at tho: outpmof the varieus.sensnrs and summing up tl1~se outputs, we ean realize any arbitrary beam pattern. Thisis similar to the selection Qftap weights fora transversal filter s.o thar the filter frequency response becomes a good approximation to the desired response. Clearly, by increasing the number of elements in the array, better approximations to the desired beam pattern can be achieved.

The final point that Wit wish to add 'here is that in cases where the input signals to the beamformer arc aotnarrow-band, a combination of spatial and temporal fiJ tering needs 10 be used. [n suchcases, spatial information is pbta:iJwd by having sensors at different positions in space.aswasdiscussed above, rhe temporal informatica ls.obtainedby using a transversal filter at the output of each sensor. The output of the broad-band beamfarmer is the summation of the outputs of these. transversalfilters.

communications. the signals are usually electromagnetic waves and the sensors are [bus antenna elements. Accordingly, the term antenna arrays is often used to refer to these applications of beamformers. In S.OIl3r applications, the sensors are Eydrophenes. designed to. respond teaceustic waves,

. In a beam FOJIl]er , the samples ofsignaJs picked up bj the sensors at a parti ¢.t! lar instant of time constitutes a snapsho«. The samples of snapshot (spatialsamples) plil,.Y the same role as the successive (temporal) samples of input in a transversal filter. The beamfo"aner fdter linearly combines the sensors' signals so that signals .arriving from some particular directions axe amplified, while signals [rom other directions are attetruated. Thus, in ana.logy with the frequency response of temporal ill ttrli, spatial filters have responses that vary according to the direction of arrival of tho in-coming signal{s). Thls i~ givenin the form of a polar plot (gain vs, angle) and is referred to. as the beam pattern.

In many applisations of beamformers, the signals picked up by sensors. are narrow" band having the samecarrier (cenire) frequency. These signals differ in their directlcnsof-arrival, w.hic)J an: related to tho Ioeatioa oftl1ei1; sousces. The eperatien ofbe.atllfo:r~ mzrs in such applications can behest esplafned by the following e;.:ampl~.

Consider an antenna array eouslsrirrg of tWO emni-direetienal elements A and B, as" presented in Figure 1.22. The tone (as an approximation to narrow-band) sigmtlss(ll) = Cl'CQSW.(}11 and LI(II) = t3C.OSW,,1J arriving at angles 0 and' 80 (with respect to the line perpendicular to the lineeonnectiag A ant! B). Alspectiv:ely , arc the inputs to the array CbeamfQrIner) filter which consists of a phase-shifter and a subtracter. The signal sen)

J

A primary input

j

v(n

B x(n)

den)

sen)

reference input

Phase-shifter

yen)

e(n)

Figure 1.22 A two-element bearntormar

2

Discrete- Time Signals and Systems

Most adaptive algerithms have been d~velep.ed for discrete-time (sampled) signals; Discrete-time systems are used fOT the implementation of adaptive filters, In this-chapter we present a short review of'discrete-time signals and sys~eIl1s. We assume [.nat the-reader 'is familiar with-the basic concepts of diserete-time systems, suchas the Nyq uisr sampling theorem. ehe z-rransfcrru and system Iunetion, and' also with the theory of random variables and stm:rhastic processes. Ot1J goal, in this chapter. is [0 review rheseconeepts and put them in ,1 [Tamewo(j{appr::opria!e fOT the f:(iSL of the book.

2.1 Sequences and the z- Transtorm

In discrete-time systems We are concerned with prot~sing signals that are represented by sequences, Such sequences may be samples of'a.ccntin uous-time analogue signal 0 r [nay be discrete innature, As au example, in thecbacnel eqimli~r structure presented 1U Figure 1.9. the input sequence to' the equalizer, x(n), consists of rhe samples ofthe channel output which is an analogue signal, but tile original data sequence, :1'(11), is discrete in namre.

A discrete-time sequence, "(II), may be equivalently represented by its z-transform defin!'!d as

:);;

X(z) = LX(Il)z~"

(2.1 )

n=-DLr

where z is a complex variable. The range ef values or "1 for which the above summation converges is called the region 0/ convergence Qf X(z). The following two examples

iIlusttale this, .

E.wr;mpi.e 2.1

Considerthe sequence

{(l' II::::: O.

);1(11) = I

0, II < O.

(2.2)

I

f

30

Discrete- Time Signals and Systems

Sequences and the z-Trensiorm

31

The r-transforrrt of:l:1 (11) is

We thus ,I1ote that the specjficafion of the z-trausform, X(z), ofa sequence is complete auly when irs region of convergence is also specified. In other words, the inverse ztransform of X(t) can be uniquely round only if'its region of'convergence-is also specified. For example, one may note that XI (z) and X2(Z) in the above examples have exactly the same form , except a sign reversal. HenIX, if their regions of convergence are not specified, then both may be interpreted as the z-transforms of either left~sided Or right-sided sequences.

Two-sided seqtrenees may also exist. A two-sidedsequence is one that extends from n = -00 to n =+00. The following example shows how to deal with two-sided sequences.

oc

XI(·t) = L:a"z-~ .::0

whish converges to

I X1() = l-a,rl

(2..3)

Example. 2.3

for taz-'I < 1. i.e, Izi > 101. We may also write

XI{z) = _ z ~-a

(2:.4)

Consider the sequence

() { d'., n ~ 0,

:1:3 II = n

b, /I <0,

(2.7)

for Izl"> 1111.

Rrample 2.2

where 101 < Ibl· As we shall see, 111c conaiti.on 101 < Ibl is necessary to make the convergence of the z-transform of Xi (11) possible. The z-transfnrm of .xAI!) 'is

-.I cc

XJ(z) = L b"z-n + Ld'z-·

I'I=-ne ri=O

(2.8)

Consider the sequence

{o. II :0:0, ~2(1I}= b\ /I <0.

(2.5)

Clearly. me first sum converges when l:rl < Ibl, and the second sum converges when !,lll > laJ. Thus, we obtain

The z-tran form of ."\'1(11) IS

X(l Z Z z(/I-b)

J Z = -b ---z + -z_ --()_ = -,-( 2--"":'(,-;-7 )-'-C ;;:--'-'--h=)

(2.9)

-I

Xl{Z) = L b~"t-"

for lal <:: Izi < Ihl.

= fw'z)", ,

(1=1

We may note that the region of convergence of %3(:0) is the urea in between two concentric circles. Tills is true, in general. for aJI two-sided sequences. For a sequence with a rational s-transform, the radii of the two circles are determined by two of the poles of the a-transform of the sequence, The right-sided part of the sequence-is determined by the poles which are surrounded by the region of convergence, and the poles surrounding the region of convergence determine the left-sided part of the sequence, The following example, which also shows one way or calculating the inverse z-traasfcrm, clarifies the above points.

which converges to

(2;6)

for 1.::1 < Ibl·

The two sequences presented in the above eX31llple? are .different in many respects. The sequence x: (n) in Example 2.1 is called right-sided, smce lis non-zero elements start at a finite » ='" /11 (here, III = 0) and extend up to /1 = -1-00. On the other ban~, tile sequence

( .)' Example 2.2 is a 1e{1",,·idedol1e. I is non-zero elements stall at a finite n= tt2 (~ere.

il:2 /1 ill - .' . r' h id d d I ft ded

n'l = -1) and extend up to n = -00. This definition 0 ng .Hi! e. an e -S] •

sequence, also implies that the region of convergence _of a nghHllde~ sequence IS always the exterior ora circle (lzl > lal in ~xamplc 21), while that.of a left-sided sequence is .. always the interior of a circle (1:.;1 < Ibl ill Example 2.2).

EX{U7lpEe J.4

Consider <I two-sided sequence, x(n), with (he z-rransform

(2.IO)

and the region or convergence 0.7 < 12'1 < 2_

32

Disere:!e- TIme Signals and Systems

Sequem~es and !.he z-treostarm

33

To find .:1:(11), i .. e. the inverse z-transform o:f XC''), we use themethnd ofp.utiaJ fraction and

expand X (,) as "

ABC

XC,>') = +" + --' -

" 1 -0.5.r1 1+ O.7c1 ! +2rT'

All a Iternative way of performing an i averse a-transfcrmoan be derived by using the O:J!.wJ!y integral theorem, which is, stated as rollows:

where A, B and -C are constants thatcan be determined as follows:

1 i b_I' {l' k=O.

- .. -r dz=

2ilrJ c - 0, If,£ 9,

(2-16)

A = 0 - 0.5::-I)X{.::)I,=Q_' = I

B = (I + O.lz-T )X(<;} 1'=-0.7 = -2 C = (l + 2Z-T)X(_;;)J"=_~ = I..

wlH<:~e C is a counterclockwise eentour that encircles the origin,

The a-transform relation, reproduced here ror convenience, is giv.;:.n by

This __ gives

IX

X{z) = E x(n}z-".

'j=-~

(2. (1)

1 2 1

J'(z) = 1- O.5z-T - 1 + 0.7z-1 + 1 + 221 •

(2.11)

Muleiplying both side~ of (2.1 7) by /-1 and integrating, we obtain

We trear eaoh of the terms in the above equation separately, To expand these terms and. from there, cx:tfm;:t their correspond mg.sequences, we us!' the .following idellTIty. which holds for 1111 < I:

(2.18)

1 ,

l_a=I+"'+(1""+ ....

where C is a contour within the region of eonvergerree of X(z) and encircling the origin. Interchanging the order of integration and summation Oil me right-handside Of ().. -18), we obtain

We aote that withiJJ the region of convergence of }(;;)~ both 10"5::-' 1 and 10. 7:rl ) are less than one, and rhus

I ! 05-1+0.2-2

-1---O~~5-:;;-----'-1 = +. z.) j' + ...

CU2)

(2. !9)

aDd

:): 2

- J + 0.7::-1 = -1 - (-0.7):-1

= -2(1 + (_0:7),,-1 + (_0.7)2:::--2 + ... ).

A pplica tion ef the Cauchy integra I theorem !.O (2. J 9) give-s the inverse a-transform rela li an,

(2.13)

x(n) = -21 .1 }((z)zn-I dz, 1\1 J'c

(2 . .20)

However. for the third term on the right-hand side or (2. 1'1). J1;~-11 )0 I, and. thus. anexpansion similar to the last \1110 is not applicable, A similarexptl-ll!\ioD will bepo~sible if we rearrange this term as

0_);;-

I + 4-l 1 + 05z .

where C is a countercleckwise clcsed.comourin the region of convergence of X(z) and encircling tbecrigin of the z-plane,

For rational a-transforms, contour integrals are often conveniently evaluated using the residue theorem. i.e,

!l.52 ( (' , ~

--_ = 0.5:, 1 + (~O.51:-+ -O.5t-=-- + ... / 1+0.5_

x(,,) = _1-. i X(Z)t"-1 d:: 2TL} Jt;

= l)re~{due1; of X·'{.;: }z,,-I a! the peles inside C].

(2.21)

Here, within the region of convergence of X(;;), 10-5·:1 < l , nnd, thus, we may write

(i.14)

lngeneral, if X(z)t'--I is a rational Iurrctlon . .of z, and zp is a pole of X(?)l-I, repealed m time-s" then

Substituting (2.12), (2.13) and (2.14) into (2.11) and recalling (2.1). we obtain

{ -: -.2.)", II < ,I),

x(,,)= (I,Sn-2(-O.7)", ,,:::.:0.

(2.L5)

1 [d"'-I.i.( J]

',' . - - _",-1 ,,_ '. . 'I' E

residue ot %(z)_ a! +p - (m _ 1)! . dz",-l .•. ~".'

(l.22)

34

Discre!e-T1'me Signals arrd Syslems

System Function

35

where ,1,b(,,) = (z - Zp)'" X(.z)~n-I. In particular, if there is a .first-order pole at-.i!"'= z"!" i.e, m = I, then

00 If

L x(n)y'(n) =.,.---:' X(z)?'(l/z~)z-I da,

"=-00 ~'lr}

(224)

Region I

z-plane

(2.23)

2,,2 Parseval's Relation

Among various important r~suJ.ts and properties oftbe s-transform, in this bock we, Inc particularly interested in Pntsev,(Il's reiation" which state; tb:tl tbr any pair of sequences x(n) and _v (11)"

where the superscript asterisk denotes complex eonjugation and the CClOttmr of integration is taken ia.the overlap o~ th?- regions of convergence of X (z)w and r(l/z'). ] r %(z) and YJz) converge on the umt circle, tbcll we can ehoose 'Z = e '-, and (2.24) becomes

Figure 2.1 Possible re_Qlons 01 convergence for a 'lVi(o-pole z-transform

(2.25)

This shews that the input-output relation for a Iinear, time-invariant system corresponds to a multiplication of the z-transforms of the input and the impulse response of rhe system.

The s-transform of the impulse response of a linear, time-invariant system is referred to as itS systern functkm. The systemfuncaon evaluated over the unit circle_, I,,] = 1, is the frequency response of the system, H(e-1"'). For any particular frequency w, /i(e!"') is the gai 11 (complex-valued, in .general) of the system.when its input is-the compfex sinusoid ei"".

Any stable, linear, time-invariant system has a finite frequency response for ail values ofw, This means that the region of convergence of H (z) has to include the unit eircle, This. Ear:t C;Pl be used to determine uniq uely the region of convergence of any rati onal system lnnctien, once its poles. are- known. As an example, if we consider a sequence with the l_. transform

Furthermore, if y(n) = x(n), for all n. then (2.25) becomes

(2.26)

I

Equation (2_26) hasrhe following interpretation. The totalenergyin /J seqllelice x(n.), i.e. L'::~ """,,--]X(l1) 12. may be equ.illtilemly abutined b.l'.al'emgin:g IX{eJf.i) 12 owr o}lecycle oflha;,

. .

.2.3 System Function

I

Consider a discrete-time, linear, time-invariantsystem with the impulse response l!.{n). With X(li) and )1(/1) denoting', respectively, the input and output of the system,

I

H(z) = (I _ 0.52-1)( I _ 2;;--J) ,

(2.30)

yen) =;~tf/) .t h(n},

(2.27)

we fiml that there me three possible regionscf convergence for H(z) •. as specified ill Figure 2_ L These are regions I. Ilaud ill, each giving a different time sequence. However, .. if we assume that H(.1) "is the system function of a stable, time-invariant system, the only acceptable region of convergence will be region II, Netting this. we obtain (see Problem 2..1)

where the asterisk denotes coevolution and is defined as

X(II) * hen) = f h(k)x(n- k).

k=--"M

(2.28)

Equation (2.27) suggests that any linear, time-invariant system is completely characterized by its impulse response, h(I1). Taking the z-transfcrm from both sides or{1.l?), we

obtain -

() { -j x 2",.

h 11 =

-~ X 0.5",

11<0, fl~O.

(2.31 )

J

Y(z) = X(z)H(z}.

(2..29)

We note that the impulse 'response, hen). obtained above, extends from 11 = -00 to " = +00. This _[i:II~a1IS that, although the input, 6(/1),. is applied at time 11 = 0, the system output takes Don-zero values even prior to tbat. Such a system is called non-causal. In

36

Discrete-Time Signals and Systems

.37

are called stechastic processes. Adaptivealgorithms arc designed to ext rae r these eharacte ris tics an d use t b em fb r ad j usti fig the fi I te r coefficien ts.

A disorete-time, stochastic process is an indexed set of random variables {.l:'(n); n = ... , -2,- i,O, 1,2, ... J. As a random signal, the index n is associated with. lime or posSibly some other physical dimension, [a this hook, for convenience, we frequently refer to 11 as the time index. So lar, we have used the notation X(ll) to refer to a particular sequence x(n) that extends from n = -00 to 11 = +00. We use' the notation {.'l:(n)} for a stochastic process a. particular sequence x(n) which may be a single realization of tl:ml.

The elements of a stochastic process, {x(n)}, fOT diff~reI1[ values of n, are in genera] comptex-valued random variable's that are eharaorerized by their probability distribution functions. Tbe interrelationships between different CIClnt:DlS of {x(n)} are determined by their joint distribution functions, Such distribution functions, in general, may change with the lime Index II. A stochastic process is called sltltiOlwry in {he strict sense if all of its (&ingIe and joint) distribuiio» functions are independent of a shifl in the time IJrigin.

S(I!)"L-I_C_(Z_) --ll X(n) 'L-I_fl_<Z_) _:-_y_(,n .. ~ Equalizer

Channel

Ffgure2.2 A communicll,tion s)!stem

contrast to this, a system is said 1:0 be causal if its Impulse response is non-zero only for non-negative values ofn. Non-causal systems, although not reali-stie, may be eneountered in some theoretical developments .. It is lmportant that we find a practical solution for handling such QaS,BS. The followmg example eonsisers such a case and gives a selutiou to that.

E'Kample 2.5

Figure 2.2 shows a communica tion sjii t¢m. I1 consj'Sls of lit commnn ieat ion chan ne j IV hich is characterized by the-system function

2.4.1 Stochasticave'rages

It is often useful to characterize stochastic processes by st<t tistica! a verages or their elements. These averages are called ensembte averages and, in general, are timedependent. For example, the mean of the .I! th element of a stochastic process {,,(u}}, is defined as

C(zj = I - 2.52-1 + ,-1 = (1 - 0.5z-1 )(1 - 2;;:-]).

(2.31)

The equalizer, H(z), should be selected so that the original transmitted signal, s(n), ca n be

recovered from the eqlJ,alj~er output without -any di~toni(l!l.. .

For this will shall select H(z}so W\lL\VC Sllly\li) = .\'(11). This ~1Ii1 be achieved if fl(::) is selected SD that C(z)H(z) = I. This gives

17I ... (n} = E[x(n)]

(235)

where E[·] denotes statistical expectation. II should be noted that since 111.< (ll) is ingeneral a functionof 11, it may not be possible to obtain m ... (n.) by time averagiagofa single realization orall'! srochasticproeess -{.Y(IIJ], unless ('he process possesseseertain special properties, indicated at the end of this chapter. Instead, 11 has to be fixed and averaging has to be done over the 11 th element of the stochastic pro.c:e.ss, !is a single random variable.

1n our later developments we are heavily dependent on the follcwtag averages,

1. The autocorrelation [unction: For a stochastic process {x(n)}, it is defined as

(2.33)

Notihg.Lha.l this is similar La H(z) in (2.30), \ve finel that the equalizer impulse response is the-one given tn (2.31). Th:i8.of course, is non-causal and. therefote~ not realizable. The 'problem ell." be easily sol ved by sbi.ftitJ.S the non-csnsai IT.spOH$C of tbe equalizer l{! the righ I by Sl,ffficienl num ber of samples so ll'iat the remalnlngncn-causal samples are sulficientlysmall and can be ignored. Mathematically, we say

¢",,(n,m) = Elx(ll)x'(m)j

(2.36)

(1 . .>4)

where A is thsnum ber of sample delays introduced to achieve II realizable causal s:%\em. We me (he approximation sign .. "", in (2.34), since 1'11:: ignore tbe non-causal samples of ;:-":'/C(=). The equ11kizer output is then .\'(n - fl.).

where the superscript asterisk denotes complex ccnjugatlon.

2. The cross-correlatlon function: It is defined rOT two stochastic processes {X(LI)} and {r(nJ} as

<p s: An, PII) = E[x(n)y" (m)J.

(2.37)

2.4 StochastIc Processes

A stochastic process {."(II)} is said to be suutonary ill IIle wide sense if m,,(11) and w;,<.~(n, 111) are independent Qf a shiftin the time origin, Thatis.Jor any k, tr1.and II:,

The input signal to an adaptive ftltera.ild its desired output are, in genera], random, l.e, they are Dot known a priori. However, they exhibit some statistical characteristies (hat have to be utilized for optimum adjustment of the filter coefficients. Such random Signals

m~(fl) = m,..(11 + k)

38

Discrete- Time-Signals and Systems

and

1

These imply that m",.(I'l) is- a constant for ill /I and ¢xx(n,m) depends on £he difference n - m only. Then, it would be more a ppropriate to define the autocorrelation fun oti an of {x(n)} as

tP",..,(k) = E[x(n)x' (n - k)].

(2_38)

SiinHarly, the processes {xCn nand {Y(/I)} are said.to be jointly stationaryin the wide semmiftheir rn¢tuiS-3te independent of IJ, and ,rP;t;J,(n, m) depends on n - m oidy. We may tlIen define the cross-correlation function of {X(II) Jand {),(_lI)} as

rf":.y(k) = E[x(Ii)Y~(1! - k)].

(2,.39)

Besides the autocorrelation and cross-correlation functions, the autooovariance and crollS-..el'lvari'MlOO M'lCtions. are also d~fm,ed. For stationary processes, t1\eSe ave defined as

1'x..,(k) = E[(x(ll) - m,,)(x(n - k) - In.)']

(2.40)

and

I

re spec tJvely. By e.xp!Uld:m.~, the right-hand sides of (2.40) and (2.41), we obtain

(2.42)

J

and

(2.4-3)

respectively. This shows thaI the correlation and covariance functions. differ by some bias. whicb is. determined by the means of the corresponding processes.

We m~y also note [hut for many random ,sigtlals the signal samples become less correlated as lhlllY become more separated in time. Thus, we may write

J

lim oP,-,,(k) = Imi,

k-'JC

(2.44) (2.45) (2.46) (2.47)

lim l.rx(k) = 0. ~-3;0

lim rI>.<:y(k) = !n_rm;.,

k-(Xi

11m 1''xJ.{k) = 0,

k~o<!

Stochaslic Processes

39

Other iinport~nt p£operties of the corretaticn and cava riance functions tha t should be noted here are their symmetry properties which are summarized below:

f~x(k) = ¢i:x( -k), /,,,,,,(k-) = .,,;"(-k), ¢xy(k) = <b;'J;{ ~k), 1 .. y(k) = 'Y_;....{ -k)c

(2..48) (2.49) (2.50) (2.51 )

Wemay,also note that

rP=(O) = E[j,\:(n)l2'l = mean-square oJ x(n.) 1:~'.,(O) = 0-; = variance of x(Il).

(2.S2) (1.53)

204.2 z-trans.fo.rmrepresentations The s-transform of q,~(k) is given by

<p"",(z) = t ¢, ... (k)z-k. /;=--'JJI;J

(2.54)

We note that a necessary condition for \lin,(:z) to be convergent is that mot should be zero (see Problem P2.3). We .assume this for the random processes that are considered" in me r~t of this chapter. and also the fellowmgcaapiers. Exceptional cases will be mentioned expJjcitly ..

From (2.48) we .aote thaI

ID_,",.(z) = 1>:,,( \/z').

Simi'larly, n<l>Xy(z) denotes the z-transform of ¢"",(k), then

(2,55)

(2.56)

Equation (2..55) implies that if 1I>;o (z) is a ra tioaal Iuncti OIl of 2, then its poles and zeros must occur ill complex-conjugate reciprocal pairs,,as depicted in Figure 2.3. Moreover, (2.55) implies tnal the points [hat bel 0 ng tothe region of convergence of<p_u(z) also occur in c-01D.pleX:,cbnjug,HtI reciprocal pairs. This in turn suggests-that the regioIloh:onver-

gence of iIl"Az) must be of the fbrm (see Figure 2.3) .

1 lal < Izi < lal·

(2S7)

It is important that we note this covers the unit circle, lzl = I.

The inverse z-transferm relation (2,20) m~y be used to evaluate 1'xAO) as

<1> ... ,..(0) "" -21 . i.. <PTl;(Z)Z-1 ru. _1fJ Ie "

(2.58)

Dlsct:ete- Time Signals and Systems

Stochastic Processes

41

Izl=o1 lil=JaJ

z-plane

Coajuga ting both sides of (2.62), .and replacing n by III., we obtain N

X,v(eJ"')=L x·(in)eJ",,,,.

rtI'=-N

(2 .. 63)

Next, we multiply (2.62) and (2.63) to obtain

':

'" .x"

/ <; _ .. _ .•.....

I:d= IIlal

N .N

IX N(e)"'}f' = L L x(n)x' (m) e-jw(n-m} ..

n=-Xm=-N

Taking the expectation oil beth sloe:S of (2.64), and interchanging the order of expectation and double summation, we get

figure2.3 Poles and zeros of ,8, typibahHranslorm of an autocorrelattonfunctlen

N II

E!lXN(e'''')12j = L L E[x(rl}x' (mlj e-jul(,,-m).

fr=-J\~ nl=-IV

(2.65)

We aSSWl1e that ~.<Az-) is oonvergeat on the urn! circle and select the unit circle asthe contour of integration. For this we suhstitutez by el"". Then,wchanges: from -wto -l-rras -we traverse the uni t circle once. Noting tbat z 1 dz = j dw, (2.S~) becomes

N oting I hat E[x{n )x' (m)] = ¢,,~(n - Ill). and I.etting k = n - 117, we may rearrangethe termsin (2.65) toobtain

( :"I - ! 1"· (jw') d

rf:!.u 01 - 2n- _" ~ .... e .w.

(:L59)

(2,66)

Since I.nl< = 0, * can.cembine (2.-52).and (2.53) with (2.59) 'to obtain.

To simplify (2.66) we assume thai for kgre-.ater [hall <to arbitrary large constant, but less [han infinity, t/!={k) is identically equal to zero. This.Tn general, is a fair assumptlon, unless the {X{II)} conlai"Uss,i·ntisoidal eomponems, in which case the summationon lh.e right-hand side of (1.6"6) will not be convergent. With this assumption, we. gel from (2.66)

(2.60)

2.4.3 The power spectral density

Tlie function iI> <.\ (z), when evaluated on the unit eircle, is the Fourier traosferm or the autoccrrelation sequence iP;r,x(k). It is called the power spectra! den~it)' sjnceitrefiect;s the spectral content of the underlying process as a function of fr~uency. It is also called the /lOWer speatrtan or sim ply the spectrum. The convergence performance of an adaptive filter is directly related LO ibe spectrum of irs input process. Next, we present ~ direct development of the power spectral density ora wide-sense, stationary, discrete-Lime,

stochastic process which reveals its .im'Por~an~ properties, .

Consider 11 sequence x(n) wmch represents asingle realizarion of a zero-mean, wldesense, statlonary, stochastic process {x(n)}. We. consider a window or2N + I. elements of x(n) as

(2.67)

which is nothing but the Fourier transform of the autocorrelation function, ¢~":f(kJ We may thus write

(2.68)

(2.61)

The Function <I>"", (e)"') is called the power speer ral density of the stochastic. wide-sense, stationary process {x(nJ}. II is defined as. in (2.68) or, mareconveniently, as the Fourier transform of the .autocorrelaliolJ function or {x(jl)}:

{x(n), -NSlrSN, l; /1) =

. IV! 0, otherwise.

ce

<II.tt(ej-') = L r/J", (k) e-iwk.

;,= -00

(2 .. 69)

By defrrruien, the discrete-time Fourier transform of x,"",n) is

lXJ N

XN(ef"') = L: x;v(n)e-JI.m = L- X(II) c-i""'.

I'T=~OO n=-iJI

(2.62)

The power spectral density possesses certain special properties. These are indicated below ror later reference.

42

Discrete- Time Signals and Systems

Property I When the limit in (2.68) exists, iJ.>.xAel"') ha.s the following interpretation: 2~ ~ xx (eJ",) dw = average contribution of the Frequency components

of {x(/l)}loca~ed between w and w + dw. (2.70)

This interpretation matches (2.60) if both sides of (2.70) are in regrated over w from -\'f to +?r. We will cia bora te more on this later, once we introduce response of linear systems to random signals; sec Erample 2.6.

Property 2 The power spectral density w_.",(eJ'W) is always real and lIOn-negative.

This property is obvious from definition (2.68), since IX N(eJ"')12 is always real and nonnegative.

Property 3 The power spectral densilY of a real-valued suuionary stochastia process is even. i.e. symmetric with respect to the origin w = O. In othsr words.

.(2.7l)

However, (his may 1101 true when the process is cpmplex-I'alued.

This follows from (2.69) by replacing k with -k and noting that for a real-valued stationary pTOceS'S rP.o(k) = ¢.~.~( -k).

I

2.4.4 Response of linear systems to stochastic processes We consider a linear time-invariant discrete-time system with input {x(n)}, output {y(n.)} and impulse response h(l1) as depicted in Figure 2.4. The input and output sequences arc stochastic processes, but the system impulse response, h(II), is a deterministic sequence. Since {x.(n)} and {Y(IIl} are stochastic processes, we are interested in fi:nd.ing how they are statistically related. We assume that tPxAk) is known, and find the relationships that relate this with ¢..,,(k) and rP-")I(k). These relationships can be conveniently established through the a-transforms of the sequences.

We note that.

I

cI>.~.,,(z) = E q)XJ,(k)z-k

k=-<x"

00

= E £[.1:(11).1"(11- k)I;;-k

~=-<X!

(2.72)

I

x(n) I

--. hen)

y(n)

II

J

Figure 2.4 A linear time-Invariant system

Stochastk Processes

43

Since Doth summation and expectation are linear operators, their orders can be interchanged. Using this in (2.72) we get

OC 00

<l!xy(z) = L 2: h' (l)Ef::x{n)x'(n - k _1)Jz-k

1:=-001=-=

cc eo

= L It (I) L ¢,IpOCk + l)z-k.

1=-<:<, k=--;;<l

! .

If' we substitute k+ I by m, we get

co eo

il'o\)'(z) = L IJ'(l) 2: tP.xx(m)z-(m-l)

1=- m=-tiil

oc

= L 1((I)z' E tpx.,,(m)z-m.

,=-~ m=-oo

Thi!l' gives

where H(i) follows the eonventional definition

00

N(z) = E h(n)z-II.

n=-oo

Furthermore, using (2.55) (4.56) and (2.75) we can also g~t

1l?J'~(Z) = H(z)cI>;;;x(z).

The autocerrelation of the output process {yen)} Is obtained as follows:

wyi.z) = E ¢.Vy(k),rk

1:=-00

00

= L E[y(n)y"(n - kll;;;-k

k=-oo

~ fro W ]

= k~ E I~ h(l)x(n - I) ",~,>/t(m)x·(I1- k-Iu) :-k

r.::Q oc (Xl

= L hell L: h'(m) L E[x(n -/)x·(n - k - 1I1)]:-k

1=-00 m~-"" k=-""

:oq oc 00

= L h(/) L /{(m) L I/J xx (k+ m _/)Z~k.

1=-00 nt=-o.:l k=~t:Jo

(2.73)

(2.74)

(2.15)

(2.76)

( .77)

(1.7&)

44

Discrefe- Time Signals and Systems

Substituting k + nt -/ by p. we get

oc ~ .3:.-

{')'J·G;:) =L hff):F1E }({m)affl z= t/!~,.(P)z-P

l'= -"2!i.': III = -:no P'= ~

(2~79)

or

(2.80)

It would be also convenient if we assume that z vades only ever the unit circle, i.e, Izi = J_ Til that case lIz"' =, z and (2.75) and (i,SO) simplifY to'

(2.~1)

(2.82)

respeetively. Also. by replacing 2 with e'".1 .. we obtain

<p-.YJ,(eflJ) = H"(eiw,w;c.."t"(eJ"'), if!yAej",) = 1'l(e:j"').p .• ~{(l.J"'), <p,.,),(ei"') = IH(e/"')I"$.,._~(ej"').

(2.83) (2.84) (2:85)

These equa t ions show how the cross power spectral densities. <P"", (e}"'] and 9'> r~'(~.J"'), and also the output power spectral density, <VJ'I,(eje!), are related to the input power spectral deusi tt, 4>.<.< (e j",). and the system transfer-Iunction, fI( e1"')-. As a usefulappl icatlon of the above results, we consider the following example.

Example 2.6

Consider a bandpass filter with a magnitude response as ill Fi~'uw 'L'i. The input process to thefilter is-a zero-mean, wide-sense, staiionary, sl6chas'lJ,: precess {~(nl). We are illl~led in finding

IH(el"')1

OJ

Figurec :1.5 Magnitude- respense '11 a bandpasa II Iter

StoOh'8Slic Processe.s

the variance of the Ii!ril'r output. {y(1I1 I. We note iha;

$$|H(e^{j\omega})| = \begin{cases} 1, & \omega_1 \le \omega \le \omega_2, \\ 0, & \text{otherwise}. \end{cases}$$

Substituting this into (2.85) and using an equation similar to (2.60) for {y(n)}, we obtain

$$\sigma_y^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |H(e^{j\omega})|^2\, \Phi_{xx}(e^{j\omega})\, d\omega = \frac{1}{2\pi}\int_{\omega_1}^{\omega_2} \Phi_{xx}(e^{j\omega})\, d\omega. \qquad (2.86)$$

If ω₂ approaches ω₁, then we may write ω₂ − ω₁ = dω, where dω is a variable approaching zero. In that case, we may write

$$\sigma_y^2 = \frac{1}{2\pi}\, \Phi_{xx}(e^{j\omega_1})\, d\omega.$$

This proves the interpretation of the power spectral density given by Property 1 in Section 2.4.3, i.e. (2.70).
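The relation (2.85) lends itself to a quick numerical check. The following sketch, a non-authoritative illustration in which the filter h, the segment count and the DFT length are arbitrary choices not taken from the text, passes white noise through a known FIR filter and compares the averaged periodogram of the output with |H(e^{jω})|²Φ_xx(e^{jω}).

```python
import numpy as np

# Rough check of (2.85): Phi_yy should match |H|^2 * Phi_xx.
rng = np.random.default_rng(0)
h = np.array([0.5, 1.0, -0.3])           # example impulse response (assumed)
nseg, nfft = 2000, 128                   # segments to average, DFT length

pxx = np.zeros(nfft)
pyy = np.zeros(nfft)
for _ in range(nseg):
    x = rng.standard_normal(nfft)        # white, unit variance -> Phi_xx = 1
    y = np.convolve(x, h)[:nfft]         # filtered segment (edge effects ignored)
    pxx += np.abs(np.fft.fft(x))**2 / nfft
    pyy += np.abs(np.fft.fft(y))**2 / nfft
pxx /= nseg
pyy /= nseg

H = np.fft.fft(h, nfft)                  # H(e^{jw}) on the DFT grid
print(np.allclose(pyy, np.abs(H)**2 * pxx, rtol=0.2, atol=0.2))  # expect True
```

The loose tolerances reflect the variance of the averaged periodograms, which shrinks only as the number of segments grows.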

Consider the case where there is a third process, {d(n)}, whose cross-correlation with the input process, {x(n)}, of Figure 2.4 is known. We are interested in finding the cross-correlation of {d(n)} and {y(n)}.

In terms of z-transforms, we have

$$\begin{aligned}
\Phi_{dy}(z) &= \sum_{k=-\infty}^{\infty} \phi_{dy}(k)\, z^{-k} \\
&= \sum_{k=-\infty}^{\infty} E[d(n)\, y^*(n-k)]\, z^{-k} \\
&= \sum_{k=-\infty}^{\infty} E\!\left[d(n) \sum_{l=-\infty}^{\infty} h^*(l)\, x^*(n-k-l)\right] z^{-k} \\
&= \sum_{l=-\infty}^{\infty} h^*(l) \sum_{k=-\infty}^{\infty} \phi_{dx}(k+l)\, z^{-k}. \qquad (2.88)
\end{aligned}$$

Substituting k + l by m, we obtain

$$\Phi_{dy}(z) = \sum_{l=-\infty}^{\infty} h^*(l)\, z^{l} \sum_{m=-\infty}^{\infty} \phi_{dx}(m)\, z^{-m} \qquad (2.89)$$

or

$$\Phi_{dy}(z) = H^*(1/z^*)\, \Phi_{dx}(z). \qquad (2.90)$$

Also, using (2.56) we can get, from (2.90),

$$\Phi_{yd}(z) = H(z)\, \Phi_{xd}(z). \qquad (2.91)$$
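As with (2.85), the relation (2.90) can be observed numerically on the unit circle. The sketch below, an illustration only, averages raw cross-spectra of segments of {d(n)} and {y(n)} and compares them against H*(e^{jω})Φ_dx(e^{jω}); the filter h and the way d(n) is generated are made-up assumptions.

```python
import numpy as np

# Rough check of (2.90): Phi_dy(e^{jw}) = H*(e^{jw}) Phi_dx(e^{jw}).
rng = np.random.default_rng(6)
h = np.array([1.0, 0.4])                 # example impulse response (assumed)
nseg, nfft = 1000, 64

pdy = np.zeros(nfft, dtype=complex)
pdx = np.zeros(nfft, dtype=complex)
for _ in range(nseg):
    x = rng.standard_normal(nfft)
    d = np.roll(x, 2) + 0.5 * rng.standard_normal(nfft)  # some process correlated with x
    y = np.convolve(x, h)[:nfft]
    D, X, Y = np.fft.fft(d), np.fft.fft(x), np.fft.fft(y)
    pdy += D * np.conj(Y) / nfft         # raw cross-spectrum estimate of Phi_dy
    pdx += D * np.conj(X) / nfft         # raw cross-spectrum estimate of Phi_dx

H = np.fft.fft(h, nfft)
print(np.allclose(pdy / nseg, np.conj(H) * pdx / nseg, atol=0.1))  # expect True
```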

2.4.5 Ergodicity and time averages

The estimation of stochastic averages, as was suggested above, requires a large number of realizations (sample sequences) of the underlying stochastic processes, even under the condition that the processes are stationary. This is not feasible in practice, where usually only a single realization of each stochastic process is available. In that case, we have no choice but to use time averages to estimate the desired ensemble averages. Then, a fundamental question that arises is the following: Under what condition(s) do time averages become equal to ensemble averages? As may be intuitively understood, it turns out that under rather mild conditions the only requirement for the time and ensemble averages to be the same is that the corresponding stochastic process be stationary; see Papoulis (1991).

A stationary stochastic process {x(n)} is said to be ergodic if its ensemble averages are equal to time averages. Ergodicity is usually defined for specific averages. For example, we may come across terms such as mean-ergodic or correlation-ergodic. In adaptive filters theory it is always assumed that all the underlying processes are ergodic in the strict sense. This means that all averages can be obtained by time averages. We make such an assumption throughout this book, whenever necessary.
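Since ergodicity is what later justifies replacing ensemble averages by time averages, the following minimal sketch may help to fix the idea: it estimates φ_xx(k) from a single long realization of a correlation-ergodic process and compares the estimate with the known ensemble value. The AR(1) model and its parameter are assumptions made only for this illustration.

```python
import numpy as np

# Time-average estimate of phi_xx(k) from one realization of an AR(1) process.
rng = np.random.default_rng(1)
a, N = 0.8, 100_000
v = rng.standard_normal(N)
x = np.zeros(N)
for n in range(1, N):                    # x(n) = a x(n-1) + v(n)
    x[n] = a * x[n - 1] + v[n]

def phi_hat(x, k):
    """Time-average estimate of phi_xx(k) = E[x(n) x(n-k)]."""
    return np.mean(x[k:] * x[:len(x) - k])

for k in range(4):
    theory = a**k / (1 - a**2)           # ensemble autocorrelation of this AR(1)
    print(k, phi_hat(x, k), theory)      # estimates approach the theory as N grows
```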

Problems

P2.1  Find the inverse z-transform of (2.30) when
(i) its region of convergence is region I of Figure 2.1;
(ii) its region of convergence is region II of Figure 2.1;
(iii) its region of convergence is region III of Figure 2.1.

P2.2  Use the basic definitions of the correlation and covariance functions to prove the symmetry properties (2.48)-(2.51).

P2.3  Show that for a stationary stochastic process {x(n)}, if m_x ≠ 0, then Φ_xx(z) contains a summation that is not convergent for any value of the complex variable z.

P2.4  Consider a stationary stochastic process

$x(n) = \nu(n) + \sin(\omega_0 n + \theta),$

where {ν(n)} is a stationary white noise, ω₀ is a fixed angular frequency, and θ is a random phase which is uniformly distributed in the interval −π ≤ θ ≤ π, but constant for each realization of {x(n)}. Find the autocorrelation function of {x(n)} and show that Φ_xx(z) has no region of convergence in the z-plane.

P2.5  Prove the symmetry equations (2.55) and (2.56).

P2.6  A stationary white noise process, {ν(n)}, is passed through a linear time-invariant system with the system function

$H(z) = \frac{1}{1 - az^{-1}}, \quad |a| < 1.$

If the system output is referred to as {y(n)}, find the following:
(i) Φ_νy(z) and Φ_yy(z);
(ii) the cross-correlation and autocorrelation functions φ_νy(k) and φ_yy(k).

P2.7  Repeat P2.6 when

$H(z) = \frac{1}{(1 - az^{-1})(1 - bz^{-1})}.$

Find the answers for the two cases a = b and a ≠ b.

P2.8  Repeat P2.6 when H(z) is a finite-duration impulse response system with

$H(z) = \sum_{n=0}^{N-1} h(n)\, z^{-n}.$

P2.9  Work out the details of the derivation of (2.66) from (2.65).

P2.10  By direct derivation show that for a linear time-invariant system with input {x(n)}, output {y(n)}, and system function H(z),

$\Phi_{yx}(z) = H(z)\, \Phi_{xx}(z).$

P2.11  Write the following z-transform relations in terms of the time series x(n) and the correlation functions. Also, if {d(n)} is a third process, do the same for the corresponding relations involving d(n).

P2.12  Consider the system shown in Figure P2.12. The input processes, {u(n)} and {v(n)}, are zero-mean and uncorrelated with each other. Derive the relationships that relate Φ_vv(z), Φ_uu(z), H(z) and G(z) with the following functions:
(i) Φ_uy(z);
(ii) Φ_vy(z);
(iii) Φ_yy(z).

Figure P2.12  A system driven by the two input processes u(n) and v(n), with output y(n)

P2.13  Consider the system shown in Figure P2.13. The input, {ν(n)}, is a stationary zero-mean, unit-variance, white noise process. Show that

(i) $\Phi_{xx}(z) = \dfrac{1}{(1 - 0.5z^{-1})(1 - 0.5z)};$

(ii) $\Phi_{yy}(z) = 4;$

(iii) $\Phi_{xy}(z) = \dfrac{1 - 2z}{1 - 0.5z}.$

Figure P2.13  ν(n) → [1/(1 − 0.5z⁻¹)] → x(n) → [1 − 2z⁻¹] → y(n)

P2.14  Consider the system shown in Figure P2.14. The input, {u(n)}, is a stationary zero-mean, unit-variance, white noise process. Show that

(i) $\phi_{xy}(m) = \sum_l h(l + m)\, g^*(l);$

(ii) $\Phi_{xy}(z) = H(z)\, G^*(1/z^*),$

where h(n) and g(n) are the impulse responses of the subsystems H(z) and G(z), respectively.

Figure P2.14  The common input u(n) drives the two subsystems H(z) and G(z), whose outputs are x(n) and y(n), respectively
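Results such as the one claimed in P2.13(ii) can be checked numerically before being derived. The sketch below is only an illustration with arbitrarily chosen sample sizes: it filters unit-variance white noise through the two subsystems of Figure P2.13 and confirms that the output behaves as white noise of variance 4.

```python
import numpy as np

# Numerical sanity check of P2.13(ii): Phi_yy(z) = 4 means y is white, var 4.
rng = np.random.default_rng(2)
N = 200_000
v = rng.standard_normal(N)

x = np.zeros(N)
for n in range(N):                        # x(n) = 0.5 x(n-1) + v(n)
    x[n] = 0.5 * (x[n - 1] if n else 0.0) + v[n]
y = x - 2 * np.concatenate(([0.0], x[:-1]))   # y(n) = x(n) - 2 x(n-1)

print(np.var(y))                          # close to 4
print(np.mean(y[1:] * y[:-1]))            # lag-1 correlation, close to 0
```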

3

Wiener Filters

In this chapter we study a class of optimum linear filters known as Wiener filters. As we will see in later chapters, the concept of Wiener filters is essential as well as helpful to understand and appreciate adaptive filters. Furthermore, Wiener filtering is general and applicable to any application that involves linear estimation of a desired signal sequence from another related sequence. Applications such as prediction, smoothing, joint process estimation, and channel equalization (deconvolution) are all covered by Wiener filters.

We study Wiener filters by looking at them from different angles. We first develop the theory of causal transversal Wiener filters, for the case of discrete-time, real-valued signals. This will then be extended to the case of complex-valued signals. Our discussion follows with a study of unconstrained Wiener filters. The term unconstrained signifies that the filter impulse response is allowed to be non-causal and infinite in duration. The study of unconstrained Wiener filters is very instructive, as it reveals many important aspects of Wiener filters which otherwise would be difficult to see.

In the theory of Wiener filters the underlying signals are assumed to be random processes and the filter is designed using the statistics obtained by ensemble averaging. We follow this approach while doing the theoretical development and analysis of Wiener filters. However, from the implementation point of view and, in particular, while developing adaptive algorithms in later chapters, we have to consider the use of time averages instead of ensemble averages. The adoption of this approach in the development of Wiener filters is also possible, once we assume all the underlying processes are ergodic; that is, their time and ensemble averages are the same (see Section 2.4.5).

3.1 Mean-Square Error Criterion

Figure 3.1 shows the block schematic of a linear discrete-time filter W(z) in the context of estimating a desired signal d(n) based on an excitation x(n). Here, we assume that both x(n) and d(n) are samples of infinite-length random processes. The filter output is y(n) and e(n) is the estimation error. Clearly, the smaller the estimation error, the better the filter performance. As the error approaches zero, the output of the filter approaches the desired signal, d(n). Hence, the question that arises is the following: What is the most appropriate choice for the parameters of the filter which would result in the smallest possible estimation error? To a certain extent, the statement of this question itself gives us some hints on the choice of the filter parameters. Since we want the estimation error to be as small as possible, a straightforward approach to the design of the filter parameters appears to be to choose an appropriate function of this estimation error as a cost function and select that set of filter parameters which optimizes this cost function in some sense. This is indeed the philosophy that underlies almost all filter design approaches. The various details of this design principle will become clear as we go along. Commonly used synonyms for the cost function are the performance function and the performance surface.

Figure 3.1  Block diagram of a filtering problem

In choosing a performance function the following points have to be considered:

1. The performance function must be mathematically tractable.

2. The performance function should preferably have a single minimum (or maximum) point, so that the optimum set of filter parameters can be selected unambiguously.

The tractability of the performance function is essential, as it permits analysis of the filter and also greatly simplifies the development of adaptive algorithms for adjustment of the filter parameters. The number of minima (or maxima) points for a performance function is closely related to the filter structure. The recursive (infinite-duration impulse response - IIR) filters, in general, result in performance functions that may have many minima (or maxima) points, whereas the non-recursive (finite-duration impulse response - FIR) filters are guaranteed to have a single global minimum (or maximum) point if a proper performance function is used. Because of this, application of the IIR filters in adaptive filtering has been very limited. In this book, also, with the exception of a few cases, our discussion is limited to FIR adaptive filters.

In Wiener filters the performance function is chosen to be

$\xi = E[e^2(n)],$   (3.1)

where E[·] denotes statistical expectation. In fact, the performance function ξ, which is also called the mean-square error criterion, turns out to be the simplest possible function that satisfies the two requirements noted above. It can easily be handled mathematically, and in many cases of interest it has a single global minimum. In particular, in the case of FIR filters the performance function ξ is a hyperparaboloid (bowl shaped) with a single minimum point which can easily be calculated by using the second-order statistics of the underlying random processes.

It is instructive to note that a possible generalization of the mean-square error criterion (3.1) is

$\xi_p = E[|e(n)|^p],$   (3.2)

where p takes integer values 1, 2, 3, .... Clearly, the case of p = 2 leads to the Wiener filter performance function defined above. Cases where p > 2, with p being even, may result in more than one minimum and/or maximum point. Furthermore, the case of odd p turns out to be difficult to handle mathematically, because of the modulus sign on e(n).

3.2 Wiener Filter - the Transversal, Real-Valued Case

Consider a transversal filter as shown in Figure 3.2. The filter input, x(n), and its desired output, d(n), are assumed to be real-valued stationary processes. The filter tap weights, w₀, w₁, ..., w_{N−1}, are also assumed to be real-valued. The filter tap-weight and input vectors are defined, respectively, as the column vectors

$w = [w_0 \;\; w_1 \;\; \cdots \;\; w_{N-1}]^{\mathrm T}$   (3.3)

and

$x(n) = [x(n) \;\; x(n-1) \;\; \cdots \;\; x(n-N+1)]^{\mathrm T},$   (3.4)

where the superscript T stands for transpose.

Figure 3.2  A transversal filter

The filter output is

$y(n) = \sum_{i=0}^{N-1} w_i\, x(n-i) = w^{\mathrm T} x(n),$   (3.5)

which can also be written as

$y(n) = x^{\mathrm T}(n)\, w,$   (3.6)


since $w^{\mathrm T}x(n)$ is a scalar and thus it is equal to its transpose, i.e. $w^{\mathrm T}x(n) = (w^{\mathrm T}x(n))^{\mathrm T} = x^{\mathrm T}(n)\,w$. Thus, we may write

$e(n) = d(n) - y(n) = d(n) - w^{\mathrm T}x(n) = d(n) - x^{\mathrm T}(n)\,w.$   (3.7)

Using (3.7) in (3.1) we get

$\xi = E[e^2(n)] = E[(d(n) - w^{\mathrm T}x(n))(d(n) - x^{\mathrm T}(n)\,w)].$   (3.8)

Expanding the right-hand side of (3.8) and noting that w can be shifted out of the expectation operator, E[·], since it is not a statistical variable, we obtain

$\xi = E[d^2(n)] - w^{\mathrm T}E[x(n)\,d(n)] - E[d(n)\,x^{\mathrm T}(n)]\,w + w^{\mathrm T}E[x(n)\,x^{\mathrm T}(n)]\,w.$   (3.9)

Next, if we define the N × 1 cross-correlation vector

$p = E[x(n)\,d(n)] = [p_0 \;\; p_1 \;\; \cdots \;\; p_{N-1}]^{\mathrm T}$   (3.10)

and the N × N autocorrelation matrix

$$R = E[x(n)\,x^{\mathrm T}(n)] = \begin{bmatrix}
r_{00} & r_{01} & r_{02} & \cdots & r_{0,N-1} \\
r_{10} & r_{11} & r_{12} & \cdots & r_{1,N-1} \\
r_{20} & r_{21} & r_{22} & \cdots & r_{2,N-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_{N-1,0} & r_{N-1,1} & r_{N-1,2} & \cdots & r_{N-1,N-1}
\end{bmatrix} \qquad (3.11)$$

and note that $E[d(n)\,x^{\mathrm T}(n)] = p^{\mathrm T}$, and also $w^{\mathrm T}p = p^{\mathrm T}w$, we obtain

$\xi = E[d^2(n)] - 2w^{\mathrm T}p + w^{\mathrm T}R\,w.$   (3.12)

This is a quadratic function of the tap-weight vector w with a single global minimum.¹ We give the full details of this function in Chapter 4.

To obtain the set of tap weights that minimizes the performance function ξ, we need to solve the system of equations that results from setting the partial derivatives of ξ with respect to every tap weight to zero. That is,

$\dfrac{\partial\xi}{\partial w_i} = 0, \quad \text{for } i = 0, 1, \ldots, N-1.$   (3.13)

¹ It may be noted that for (3.12) to correspond to a convex quadratic surface, so that it has a unique minimum point and not a saddle point, R has to be a positive definite matrix. This point, which is missed out here, will be examined in detail in Chapter 4.

These equations may collectively be written as

$\nabla\xi = 0,$   (3.14)

where ∇ is the gradient operator defined as the column vector

$\nabla = \left[\dfrac{\partial}{\partial w_0} \;\; \dfrac{\partial}{\partial w_1} \;\; \cdots \;\; \dfrac{\partial}{\partial w_{N-1}}\right]^{\mathrm T},$   (3.15)

and 0 on the right-hand side of (3.14) denotes the column vector consisting of N zeros.

To find the partial derivatives of ξ with respect to the filter tap weights, we first expand (3.12) as

$\xi = E[d^2(n)] - 2\sum_{l=0}^{N-1} p_l w_l + \sum_{l=0}^{N-1}\sum_{m=0}^{N-1} w_l w_m r_{lm}.$   (3.16)

Also, we note that the double summation on the right-hand side of (3.16) may be expanded as

$$\sum_{l=0}^{N-1}\sum_{m=0}^{N-1} w_l w_m r_{lm} = \sum_{\substack{l=0\\ l\ne i}}^{N-1}\sum_{\substack{m=0\\ m\ne i}}^{N-1} w_l w_m r_{lm} + w_i\sum_{\substack{l=0\\ l\ne i}}^{N-1} w_l r_{li} + w_i\sum_{\substack{m=0\\ m\ne i}}^{N-1} w_m r_{im} + w_i^2 r_{ii}. \qquad (3.17)$$

Substituting (3.17) into (3.16), taking the partial derivative of ξ with respect to w_i, and replacing m by l, we obtain

$\dfrac{\partial\xi}{\partial w_i} = -2p_i + \sum_{l=0}^{N-1} w_l (r_{li} + r_{il}), \quad \text{for } i = 0, 1, \ldots, N-1.$   (3.18)

To simplify this, we note that

$r_{li} = E[x(n-l)\,x(n-i)] = \phi_{xx}(i-l),$   (3.19)

where φ_xx(i − l) is the autocorrelation function of x(n) for lag i − l. Similarly,

$r_{il} = E[x(n-i)\,x(n-l)] = \phi_{xx}(l-i).$   (3.20)

Considering the symmetry property of the autocorrelation function, i.e. φ_xx(k) = φ_xx(−k), we get

$r_{li} = r_{il} = \phi_{xx}(i-l).$   (3.21)

Substituting (3.21) in (3.18) we obtain

$\dfrac{\partial\xi}{\partial w_i} = 2\sum_{l=0}^{N-1} r_{il}\, w_l - 2p_i, \quad \text{for } i = 0, 1, \ldots, N-1,$   (3.22)


which can be expressed using matrix notation as

$\nabla\xi = 2R\,w - 2p.$   (3.23)

Letting ∇ξ = 0 gives the following equation from which the optimum set of Wiener filter tap weights can be obtained:

$R\,w_o = p.$   (3.24)

Note that we have added the subscript 'o' to w to emphasize that it is the optimum tap-weight vector. Equation (3.24), which is known as the Wiener-Hopf equation, has the following solution:

$w_o = R^{-1}p,$   (3.25)

assuming that R has an inverse.

Replacing w by w_o and Rw_o by p in (3.12) we obtain

$\xi_{\min} = E[d^2(n)] - w_o^{\mathrm T}p = E[d^2(n)] - w_o^{\mathrm T}R\,w_o.$   (3.26)

This is the minimum mean-square error that can be achieved by the transversal Wiener filter W(z) and is obtained when its tap weights are chosen according to the optimum solution given by (3.25).

For our later reference, we may also note that by substituting (3.25) into (3.26) we obtain

$\xi_{\min} = E[d^2(n)] - p^{\mathrm T}R^{-1}p.$   (3.27)

Example 3.1

Consider the modelling problem shown in Figure 3.3. The plant is a two-tap filter with an additive noise, ν(n), added to its output. A two-tap Wiener filter with tap weights w₀ and w₁ is used to model the plant parameters. The same input is applied to both the plant and the Wiener filter. The input, x(n), is a stationary white process with a variance of unity. The additive noise, ν(n), is zero-mean and uncorrelated with x(n), and its variance is σ_ν² = 0.1. We want to compute the optimum values of w₀ and w₁ that minimize E[e²(n)].

Figure 3.3  A modelling problem

We need to compute R and p to obtain the optimum values of w₀ and w₁ that minimize E[e²(n)]. For this example, we get

$$R = \begin{bmatrix} E[x^2(n)] & E[x(n)x(n-1)] \\ E[x(n-1)x(n)] & E[x^2(n-1)] \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \qquad (3.28)$$

This follows since x(n) is white, thus E[x(n)x(n−1)] = E[x(n−1)x(n)] = 0, and also it has a variance of unity. The latter implies that E[x²(n)] = E[x²(n−1)] = 1.

Also, we note that d(n) = 2x(n) + 3x(n−1) + ν(n), and thus,

$$p = \begin{bmatrix} E[x(n)d(n)] \\ E[x(n-1)d(n)] \end{bmatrix} = \begin{bmatrix} E[x(n)(2x(n) + 3x(n-1) + \nu(n))] \\ E[x(n-1)(2x(n) + 3x(n-1) + \nu(n))] \end{bmatrix}. \qquad (3.29)$$

Expanding the terms under the expectation operators and noting that E[x²(n)] = E[x²(n−1)] = 1 and E[x(n)x(n−1)] = E[x(n)ν(n)] = E[x(n−1)ν(n)] = 0, we get

$p = \begin{bmatrix} 2 \\ 3 \end{bmatrix}.$   (3.30)

Similarly, we obtain

$E[d^2(n)] = E[(2x(n) + 3x(n-1) + \nu(n))^2] = 4E[x^2(n)] + 9E[x^2(n-1)] + \sigma_\nu^2 = 13.1.$   (3.31)

Substituting (3.28), (3.30) and (3.31) in (3.12) we get

$\xi = 13.1 - 4w_0 - 6w_1 + w_0^2 + w_1^2.$   (3.32)

This is a paraboloid in the three-dimensional space with the axes w₀, w₁ and ξ. Figure 3.4 shows this paraboloid. We may note that the optimum tap weights of the Wiener filter are given by (3.25), which for the present example may be written as

$w_o = \begin{bmatrix} w_{o,0} \\ w_{o,1} \end{bmatrix} = R^{-1}p = \begin{bmatrix} 2 \\ 3 \end{bmatrix}.$   (3.33)

Also, from (3.26),

$\xi_{\min} = 13.1 - [2 \;\; 3]\begin{bmatrix} 2 \\ 3 \end{bmatrix} = 13.1 - 13 = 0.1.$   (3.34)

Clearly, the values of w_{o,0}, w_{o,1} and ξ_min coincide with the minimum point in Figure 3.4.
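The computations of Example 3.1 are easily mirrored numerically. The following sketch, which assumes the same signal model as the example but with arbitrary sample sizes, estimates R and p by time averaging (relying on ergodicity, Section 2.4.5) and then solves the Wiener-Hopf equation (3.24).

```python
import numpy as np

# Numerical companion to Example 3.1: estimate R and p, solve (3.24).
rng = np.random.default_rng(3)
N = 100_000
x = rng.standard_normal(N)                        # white, unit variance
v = np.sqrt(0.1) * rng.standard_normal(N)         # noise, variance 0.1
x1 = np.concatenate(([0.0], x[:-1]))              # x(n-1)
d = 2 * x + 3 * x1 + v                            # plant: d(n) = 2x(n) + 3x(n-1) + v(n)

X = np.stack([x, x1])                             # tap-input vectors over time
R = X @ X.T / N                                   # estimate of E[x(n) x^T(n)]
p = X @ d / N                                     # estimate of E[x(n) d(n)]

w_o = np.linalg.solve(R, p)
xi_min = np.mean(d**2) - w_o @ p                  # (3.26)
print(w_o)       # close to [2, 3]
print(xi_min)    # close to 0.1
```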


Figure 3.4  The performance surface of the modelling problem of Figure 3.3

The features of interest on the performance surface in Figure 3.4 and the results obtained in (3.33) and (3.34) may be understood better if we note that the right side of (3.32) may also be expressed as

$\xi = 0.1 + (w_0 - 2)^2 + (w_1 - 3)^2.$   (3.35)

Clearly, the minimum value of ξ is achieved when the last two terms on the right-hand side of (3.35) are forced to zero. This coincides with the results in (3.33) and (3.34).

3.3 Principle of Orthogonality

In this section we present an alternative approach to the design of Wiener filters. This presentation is a complement to the derivations in the previous section in the sense that the approach presented below can be considered as a simplified/shortened version of the approach in the previous section. More importantly, it leads to more insight into the concept of the Wiener filtering problem.

We start with the cost function equation (3.1), which in the case of real-valued data may be written as

$\xi = E[e^2(n)].$   (3.36)

Taking partial derivatives of ξ with respect to the filter tap weights, {w_i, i = 0, 1, ..., N−1}, and interchanging the derivative and expectation operators (since these are linear operators), we obtain

$\dfrac{\partial\xi}{\partial w_i} = E\!\left[2e(n)\,\dfrac{\partial e(n)}{\partial w_i}\right], \quad \text{for } i = 0, 1, \ldots, N-1,$   (3.37)

where e(n) = d(n) − y(n). Since d(n) is independent of the filter tap weights, we get

$\dfrac{\partial e(n)}{\partial w_i} = -\dfrac{\partial y(n)}{\partial w_i} = -x(n-i),$   (3.38)

where the last result is obtained by replacing for y(n) from (3.5). Using this result in (3.37), we obtain

$\dfrac{\partial\xi}{\partial w_i} = -2E[e(n)\,x(n-i)], \quad \text{for } i = 0, 1, \ldots, N-1.$   (3.39)

From our discussion in the previous section we know that when the Wiener filter tap weights are set to their optimal values, the partial derivatives of the cost function, ξ, with respect to the filter tap weights are all zero. Hence, if e_o(n) is the estimation error when the filter tap weights are set equal to their optimal values, then (3.39) becomes

$E[e_o(n)\,x(n-i)] = 0, \quad \text{for } i = 0, 1, \ldots, N-1.$   (3.40)

This shows that at the optimal setting of the Wiener filter tap weights, the estimation error is uncorrelated with the filter tap inputs, i.e. the input samples used for estimation. This is known as the principle of orthogonality.

The principle of orthogonality is an elegant result of Wiener filtering that is frequently used for simple derivations of results which otherwise would seem far more difficult to derive. We will use the principle of orthogonality throughout this book for many of our derivations.

As a useful corollary to the principle of orthogonality, we note that the filter output is also uncorrelated with the estimation error when its tap weights are set to their optimal values. This may be shown as follows:

$E[e_o(n)\,y_o(n)] = E\!\left[e_o(n)\sum_{i=0}^{N-1} w_{o,i}\,x(n-i)\right] = \sum_{i=0}^{N-1} w_{o,i}\,E[e_o(n)\,x(n-i)],$   (3.41)

where y_o(n) is the Wiener filter output when its tap weights are set to their optimal values. Then, using (3.40) in (3.41) we obtain

$E[e_o(n)\,y_o(n)] = 0.$   (3.42)

We may also refer to the above result by saying that the optimized Wiener filter output and the estimation error are orthogonal.

The words orthogonality and orthogonal are commonly used for referring to pairs of random variables that are uncorrelated with each other. This originates from the fact that the set of all random variables with finite second moments constitutes a linear space with an inner product. The inner product in this space is defined to be the correlation between its elements. In particular, if x and y are two elements of the linear space of random variables, then the inner product of x and y is defined as E[xy], when x and y are real-valued, or E[xy*], in the more general case of complex-valued random variables. Then, in analogy with the Euclidean space in which the elements are vectors, geometrical concepts such as orthogonality, projection and subspaces may also be defined for the space of random variables. The interested reader may refer to Honig and Messerschmitt (1984) for an excellent, yet simple, discussion on this topic.
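The orthogonality relations (3.40) and (3.42) can also be observed numerically. The sketch below reuses the toy model of Example 3.1 — an assumption made only for this illustration — and checks that the optimum error is uncorrelated with both the tap inputs and the filter output.

```python
import numpy as np

# Numerical check of the principle of orthogonality, (3.40) and (3.42).
rng = np.random.default_rng(4)
N = 200_000
x = rng.standard_normal(N)
x1 = np.concatenate(([0.0], x[:-1]))
d = 2 * x + 3 * x1 + np.sqrt(0.1) * rng.standard_normal(N)

X = np.stack([x, x1])
w_o = np.linalg.solve(X @ X.T / N, X @ d / N)     # Wiener-Hopf solution
y_o = w_o @ X                                     # optimum filter output
e_o = d - y_o                                     # optimum estimation error

print(np.mean(e_o * x), np.mean(e_o * x1))        # both close to 0, cf. (3.40)
print(np.mean(e_o * y_o))                         # close to 0, cf. (3.42)
```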

Next, we use the principle of orthogonality to give an alternative derivation of the Wiener-Hopf equation of (3.24) and also the minimum mean-squared error of (3.26). We note that

$e_o(n) = d(n) - \sum_{l=0}^{N-1} w_{o,l}\,x(n-l),$   (3.43)

where the w_{o,l}s are the optimum values of the Wiener filter tap weights. Substituting (3.43) in (3.40) and rearranging the results, we get

$\sum_{l=0}^{N-1} E[x(n-i)\,x(n-l)]\,w_{o,l} = E[d(n)\,x(n-i)], \quad \text{for } i = 0, 1, \ldots, N-1.$   (3.44)

We also note that E[x(n−l)x(n−i)] = r_{li} and E[d(n)x(n−i)] = p_i. Using these in (3.44) we obtain

$\sum_{l=0}^{N-1} r_{li}\,w_{o,l} = p_i, \quad \text{for } i = 0, 1, \ldots, N-1,$   (3.45)

which is nothing but (3.24) in expanded form. Also, we note that

$\xi_{\min} = E[e_o^2(n)] = E[e_o(n)(d(n) - y_o(n))] = E[e_o(n)\,d(n)] - E[e_o(n)\,y_o(n)] = E[e_o(n)\,d(n)],$   (3.46)

where (3.42) has been used to obtain the last equality. Now, substituting (3.43) in (3.46), we obtain

$\xi_{\min} = E[d^2(n)] - \sum_{i=0}^{N-1} w_{o,i}\,E[d(n)\,x(n-i)] = E[d^2(n)] - \sum_{i=0}^{N-1} w_{o,i}\,p_i,$   (3.47)

which is nothing but (3.26) in expanded form.

3.4 Normalized Performance Function

Equation (3.43) can be written as

$d(n) = e_o(n) + y_o(n).$   (3.48)

Squaring both sides of (3.48) and taking expectations, we get

$E[d^2(n)] = E[e_o^2(n)] + E[y_o^2(n)] + 2E[e_o(n)\,y_o(n)].$   (3.49)

We may note that E[e_o²(n)] = ξ_min, and the last term in (3.49) is zero because of (3.42). Thus, we obtain

$\xi_{\min} = E[d^2(n)] - E[y_o^2(n)],$   (3.50)

which suggests that the minimum mean-square error at the Wiener filter output is the difference between the mean-square value of the desired output and the mean-square value of the best estimate of that at the filter output.

It is appropriate if we define the ratio

$\zeta = \dfrac{\xi}{E[d^2(n)]}$   (3.51)

as the normalized performance function. We may note that ζ = 1 when y(n) is forced to zero, i.e. when no estimation of d(n) has been made. It reaches its minimum value, ζ_min, when the filter tap weights are chosen to achieve the minimum mean-squared error. This is given by

$\zeta_{\min} = \dfrac{\xi_{\min}}{E[d^2(n)]} = 1 - \dfrac{E[y_o^2(n)]}{E[d^2(n)]}.$   (3.52)

Noting that ξ_min cannot be negative, we find that the value of ζ_min remains between 0 and 1. The value of ζ_min is an indication of the ability of the filter to estimate the desired output. A value of ζ_min close to zero is an indication of good performance of the filter, and a value of ζ_min close to one indicates poor performance of the filter.

3.5 Extension to the Complex-Valued Case

There are some practical applications in which the underlying random processes are complex-valued. For instance, in data transmission the most frequently used signalling techniques are phase shift keying (PSK) and quadrature amplitude modulation (QAM), in which the baseband signal consists of two separate components which are the real and imaginary parts of a complex-valued signal. Moreover, in the case of frequency domain implementation of adaptive filters (Chapter 8) and subband adaptive filters (Chapter 9), we will be dealing with complex-valued signals, even though the original signals may be real-valued.


In this section we extend the results of the previous two sections to the case of complex-valued signals. We assume a transversal filter as in Figure 3.2. The input, x(n), the desired output, d(n), and the filter tap weights are all assumed to be complex variables. Then, the estimation error, e(n), is also complex and we may write

$\xi = E[|e(n)|^2] = E[e(n)\,e^*(n)],$   (3.53)

where the asterisk denotes complex conjugation.

As in the real-valued case, the performance function, ξ, in the complex-valued case is also a quadratic function of the filter tap weights. Similarly, to find the optimum set of the filter tap weights, we have to solve the system of equations that results from setting the partial derivatives of ξ with respect to every tap weight to zero. However, noting that the filter tap weights are complex variables, the conventional definition of the derivative with respect to an independent variable is not applicable to the present case. In fact, each tap weight, in the present case, consists of two independent variables that make up its real and imaginary parts. Thus, the partial derivatives with respect to these two independent variables have to be taken separately and the results set to zero to obtain the optimum tap weights of the Wiener filter. In particular, to obtain the optimum set of filter tap weights, the following set of equations has to be solved simultaneously:

$\dfrac{\partial\xi}{\partial w_{i,\mathrm R}} = 0 \quad \text{and} \quad \dfrac{\partial\xi}{\partial w_{i,\mathrm I}} = 0, \quad \text{for } i = 0, 1, \ldots, N-1,$   (3.54)

where w_{i,R} and w_{i,I} denote the real and imaginary parts of w_i, respectively. To write (3.54) in a more compact form, we note that ξ, w_{i,R} and w_{i,I} are all real. This implies that the partial derivatives in (3.54) are also all real and thus the pairs of equations in (3.54) may be combined to obtain

$\dfrac{\partial\xi}{\partial w_{i,\mathrm R}} + j\,\dfrac{\partial\xi}{\partial w_{i,\mathrm I}} = 0, \quad \text{for } i = 0, 1, \ldots, N-1,$   (3.55)

where j = √−1. This, in turn, suggests the following definition of the gradient of a function with respect to a complex variable w = w_R + j w_I:

$\nabla^c_w \xi = \dfrac{\partial\xi}{\partial w_{\mathrm R}} + j\,\dfrac{\partial\xi}{\partial w_{\mathrm I}}.$   (3.56)

We note that when ξ is a real function of w_R and w_I, the real and imaginary parts of ∇ᶜ_wξ are, respectively, equal to ∂ξ/∂w_R and ∂ξ/∂w_I, and in that case ∇ᶜ_wξ = 0 implies that ∂ξ/∂w_R = ∂ξ/∂w_I = 0. It is in this context that we can say (3.54) and (3.55) are equivalent. This would not be true, in general, if ξ was complex (see Problem P3.5).

With the above background, we may now continue with the derivation of the principle of orthogonality and its subsequent results, for the case of complex-valued signals. From (3.53) we note that

$\nabla^c_i \xi = E[e(n)\,\nabla^c_i e^*(n) + e^*(n)\,\nabla^c_i e(n)].$   (3.57)

Noting that

$e(n) = d(n) - \sum_{k=0}^{N-1} w_k\,x(n-k),$   (3.58)

we obtain

$\nabla^c_i e(n) = -(\nabla^c_i w_i)\,x(n-i)$   (3.59)

and

$\nabla^c_i e^*(n) = -(\nabla^c_i w_i^*)\,x^*(n-i).$   (3.60)

Applying the definition (3.56) we obtain

$\nabla^c_i w_i = \dfrac{\partial w_i}{\partial w_{i,\mathrm R}} + j\,\dfrac{\partial w_i}{\partial w_{i,\mathrm I}} = 1 + j(j) = 1 - 1 = 0$   (3.61)

and

$\nabla^c_i w_i^* = \dfrac{\partial w_i^*}{\partial w_{i,\mathrm R}} + j\,\dfrac{\partial w_i^*}{\partial w_{i,\mathrm I}} = 1 + j(-j) = 1 + 1 = 2.$   (3.62)

Substituting (3.61) and (3.62) into (3.59) and (3.60), respectively, and the results into (3.57), we obtain

$\nabla^c_i \xi = -2E[e(n)\,x^*(n-i)].$   (3.63)

When the Wiener filter tap weights are set to their optimal values, ∇ᶜᵢξ = 0. This gives

$E[e_o(n)\,x^*(n-i)] = 0, \quad \text{for } i = 0, 1, \ldots, N-1,$   (3.64)

where e_o(n) is the optimum estimation error. The set of equations (3.64) represents the principle of orthogonality for the case of complex-valued signals.

To proceed with the derivation of the Wiener-Hopf equation, we define the input and tap-weight vectors of the filter as

$x(n) = [x(n) \;\; x(n-1) \;\; \cdots \;\; x(n-N+1)]^{\mathrm T}$   (3.65)

and

$w = [w_0^* \;\; w_1^* \;\; \cdots \;\; w_{N-1}^*]^{\mathrm T},$   (3.66)

respectively, where the asterisk and T denote complex conjugation and transpose, respectively. Note that the elements of the column vector w are complex conjugates of


the actual tap weights of the filter, while conjugation is not applied to samples of the input in x(n). Also, for future reference we may write

$x^{\mathrm H}(n) = [x^*(n) \;\; x^*(n-1) \;\; \cdots \;\; x^*(n-N+1)]$   (3.67)

and

$w^{\mathrm H} = [w_0 \;\; w_1 \;\; \cdots \;\; w_{N-1}],$   (3.68)

where the superscript H denotes complex-conjugate transpose or Hermitian. The set of equations (3.64) may also be written as

$E[e_o^*(n)\,x(n-i)] = 0, \quad \text{for } i = 0, 1, \ldots, N-1.$   (3.69)

Using definition (3.65), these may be packed together as

$E[e_o^*(n)\,x(n)] = 0.$   (3.70)

Also, we note that

$e_o(n) = d(n) - w_o^{\mathrm H}x(n),$   (3.71)

where w_o is the optimum tap-weight vector of the Wiener filter.

Replacing (3.71) in (3.70), we obtain

$E[x(n)\,d^*(n)] - E[x(n)\,x^{\mathrm H}(n)]\,w_o = 0.$   (3.72)

Rearranging (3.72) we get

$R\,w_o = p,$   (3.73)

where R = E[x(n)xᴴ(n)] and p = E[x(n)d*(n)]. This is the Wiener-Hopf equation for the case of complex-valued signals.

Also, following the same derivations as (3.26) and (3.47), for the present case we obtain

$\xi_{\min} = E[|d(n)|^2] - w_o^{\mathrm H}p = E[|d(n)|^2] - w_o^{\mathrm H}R\,w_o.$   (3.74)

3.6 Unconstrained Wiener Filters

The developments in the previous three sections put some constraints on the Wiener filter by assuming that it is causal and the duration of its impulse response is limited. In this section we remove such constraints and let the Wiener filter impulse response, w_i, extend from i = −∞ to i = +∞, and derive equations for the filter performance function and its optimal system function. Such developments are very instructive for understanding many of the important aspects of the Wiener filter, which otherwise could not be easily understood.

Consider the Wiener filter shown in Figure 3.1, repeated here in Figure 3.5 for convenience. We assume that the filter W(z) may be non-causal and/or IIR. To keep the derivations in this section as simple as possible and also to concentrate more on the concepts, we consider only the case in which the underlying signals and system parameters are real-valued. Moreover, we assume that the complex variable z remains on the unit circle, i.e. |z| = 1. This implies that z* = z⁻¹. Also, for future reference, we note that when the coefficients of a system function, such as W(z), are real-valued, W*(1/z*) = W(z⁻¹) for all values of z, and W(z⁻¹) = W*(z), when |z| = 1.

Figure 3.5  Block diagram of a Wiener filter

The derivations that follow in this section depend highly on the results developed in Section 2.4.4 of the previous chapter. The reader is encouraged to review the latter section before continuing with the rest of this section.

3.6.1 Performance function

Recall that the Wiener filter performance function is defined as

$\xi = E[e^2(n)].$

Substituting e(n) by d(n) − y(n) and expanding, we get

$\xi = E[d^2(n)] + E[y^2(n)] - 2E[y(n)\,d(n)].$   (3.75)

In terms of autocorrelation and cross-correlation functions (see Chapter 2), we may write

$\xi = \phi_{dd}(0) + \phi_{yy}(0) - 2\phi_{yd}(0).$   (3.76)

Replacing the last two terms on the right-hand side of (3.76) with their corresponding inverse z-transform relations, we obtain

$\xi = \phi_{dd}(0) + \dfrac{1}{2\pi j}\oint_C \Phi_{yy}(z)\,\dfrac{dz}{z} - 2\times\dfrac{1}{2\pi j}\oint_C \Phi_{yd}(z)\,\dfrac{dz}{z}.$   (3.77)

Also, from our discussion in Chapter 2, Section 2.4.4, we recall that when x(n) and y(n) are related as in Figure 3.5, for an arbitrary sequence d(n), Φ_yd(z) = W(z)Φ_xd(z). Also, if z is selected to be on the unit circle in the z-plane, then Φ_yy(z) = |W(z)|²Φ_xx(z),


|W(z)|² = W(z)W*(z) and W*(z) = W(z⁻¹). Using these in (3.77), we obtain

$$\begin{aligned}
\xi &= \phi_{dd}(0) + \frac{1}{2\pi j}\oint_C W(z)\,W(z^{-1})\,\Phi_{xx}(z)\,\frac{dz}{z} - 2\times\frac{1}{2\pi j}\oint_C W(z)\,\Phi_{xd}(z)\,\frac{dz}{z} \\
&= \phi_{dd}(0) + \frac{1}{2\pi j}\oint_C \left[W(z^{-1})\,\Phi_{xx}(z) - 2\Phi_{xd}(z)\right] W(z)\,\frac{dz}{z}, \qquad (3.78)
\end{aligned}$$

where the contour of integration, C, is the unit circle. This is the performance function for a Wiener filter with the system function W(z), in its most general form. It covers IIR and FIR, as well as causal and non-causal filters. The following examples show some of the flexibilities of (3.78).

Example 3.2

Consider the case where the Wiener filter is an N-tap FIR filter with the system function

$W(z) = \sum_{l=0}^{N-1} w_l\,z^{-l}.$   (3.79)

This is the case that we studied in Section 3.2.

Using (3.79) in the first line of (3.78), we obtain

$$\xi = \phi_{dd}(0) + \frac{1}{2\pi j}\oint_C \sum_{l=0}^{N-1}\sum_{m=0}^{N-1} w_l w_m\, z^{m-l}\,\Phi_{xx}(z)\,\frac{dz}{z} - 2\times\frac{1}{2\pi j}\oint_C \sum_{l=0}^{N-1} w_l\,z^{-l}\,\Phi_{xd}(z)\,\frac{dz}{z}. \qquad (3.80)$$

Interchanging the order of the integrations and summations, (3.80) is simplified to

$$\xi = \phi_{dd}(0) + \sum_{l=0}^{N-1}\sum_{m=0}^{N-1} w_l w_m\,\frac{1}{2\pi j}\oint_C \Phi_{xx}(z)\,z^{m-l-1}\,dz - 2\sum_{l=0}^{N-1} w_l\,\frac{1}{2\pi j}\oint_C \Phi_{xd}(z)\,z^{-l-1}\,dz. \qquad (3.81)$$

Using the inverse z-transform relation, this gives

$$\xi = \phi_{dd}(0) + \sum_{l=0}^{N-1}\sum_{m=0}^{N-1} w_l w_m\,\phi_{xx}(m-l) - 2\sum_{l=0}^{N-1} w_l\,\phi_{xd}(-l). \qquad (3.82)$$

Now, using the notations φ_dd(0) = E[d²(n)], φ_xd(−l) = p_l and φ_xx(m−l) = φ_xx(l−m) = r_{lm}, we see that the performance function given by (3.82) is the same as what we derived earlier in (3.16).

Example 3.3

Consider the modelling problem depicted in Figure 3.6, where a plant G(z) is being modelled by a single-pole, single-zero Wiener filter

$W(z) = \dfrac{1 - w_0 z^{-1}}{1 - w_1 z^{-1}}.$   (3.83)

Figure 3.6  A modelling problem with an IIR model

To keep our discussion simple, we assume all the involved signals and system parameters are real-valued. The input sequence, x(n), is assumed to be a white process with zero mean and a variance of unity, and uncorrelated with the additive noise ν(n). This implies that

$\Phi_{xx}(z) = 1.$   (3.84)

We note that d(n) is the noise-corrupted output of the plant, G(z), when it is excited with the input x(n). Then, using the relationship (2.75) of the previous chapter, and noting that all the signals here are real-valued, we get

$\Phi_{xd}(z) = G(z^{-1}).$   (3.85)

Using this in (3.78) we obtain

$\xi = \phi_{dd}(0) + \dfrac{1}{2\pi j}\oint_C \left[W(z^{-1}) - 2G(z^{-1})\right] W(z)\,\dfrac{dz}{z}.$   (3.86)

Using the residue theorem to calculate the above integrals, and assuming that G(z⁻¹) has no pole inside the unit circle, we get

$$\xi = \phi_{dd}(0) + \frac{w_0}{w_1} + \frac{(w_1 - w_0)(1 - w_0 w_1)}{w_1(1 - w_1^2)} - 2\left[\frac{w_1 - w_0}{w_1}\,G(w_1^{-1}) + \frac{w_0}{w_1}\,G(\infty)\right]. \qquad (3.87)$$

This is the performance function of the IIR filter shown in Figure 3.6. We note that although we have selected a very simple example, the resulting performance function is a complicated one. It is clear that a performance function such as (3.87) — or the more complicated ones that would result for higher order filters — is difficult to handle. In particular, we may find that there can be many local minima, and searching for the global minimum of the performance function may not be a trivial task. This, when compared with the nicely shaped quadratic performance function of FIR filters, makes it clear why most of the attention in adaptive filters has been devoted to the transversal structure.

3.6.2 Optimum transfer function

We now derive an equation for the optimum transfer function of unconstrained Wiener filters, i.e. when the filter impulse response is allowed to extend from time n = −∞ to


n = +∞. We use the principle of orthogonality for this purpose. Since the filter impulse response stretches from time n = −∞ to n = +∞, the principle of orthogonality for real-valued signals suggests

$E[e_o(n)\,x(n-i)] = 0, \quad \text{for } i = \ldots, -2, -1, 0, 1, 2, \ldots,$   (3.88)

where e_o(n) is the optimum estimation error and is given by

$e_o(n) = d(n) - \sum_{l=-\infty}^{\infty} w_{o,l}\,x(n-l).$   (3.89)

Here, the w_{o,l}s are the samples of the optimized Wiener filter impulse response. Substituting (3.89) in (3.88) and rearranging the result, we obtain

$\sum_{l=-\infty}^{\infty} w_{o,l}\,E[x(n-l)\,x(n-i)] = E[d(n)\,x(n-i)].$   (3.90)

We may also note that E[x(n−l)x(n−i)] = φ_xx(i−l) and E[d(n)x(n−i)] = φ_dx(i). Using these in (3.90) we get

$\sum_{l=-\infty}^{\infty} w_{o,l}\,\phi_{xx}(i-l) = \phi_{dx}(i), \quad \text{for } i = \ldots, -2, -1, 0, 1, 2, \ldots$   (3.91)

Noting that (3.91) holds for all values of i, we may take z-transforms on both sides to obtain

$W_o(z)\,\Phi_{xx}(z) = \Phi_{dx}(z).$   (3.92)

This is referred to as the Wiener-Hopf equation for the unconstrained Wiener filtering problem. The optimum unconstrained Wiener filter is given by

$W_o(z) = \dfrac{\Phi_{dx}(z)}{\Phi_{xx}(z)}.$   (3.93)

Replacing z by e^{jω} in (3.93) we obtain

$W_o(e^{j\omega}) = \dfrac{\Phi_{dx}(e^{j\omega})}{\Phi_{xx}(e^{j\omega})}.$   (3.94)

This result has an interesting interpretation. It shows that the frequency response of the optimal Wiener filter, for a particular frequency, say ω = ω_i, is determined by the ratio of the cross-power spectral density of d(n) and x(n) to the power spectral density of x(n), at ω = ω_i. This, in turn, may be obtained through a sequence of filtering and averaging steps, as depicted in Figure 3.7. The sequences x(n) and d(n) are first filtered by two identical narrow-band filters, centred at ω = ω_i. To retain the phase information of the underlying signals, these filters are designed to pick up signals from the positive side of the frequency axis only. The signal spectra belonging to negative frequencies are completely rejected. As a result, the filtered signals, d_i(n) and x_i(n), are both complex-valued. The cross-correlation of d_i(n) and x_i(n) with zero lag, i.e. E[d_i(n)x_i*(n)], gives a quantity proportional to Φ_dx(e^{jω_i}), and the average energy of x_i(n) gives a quantity proportional to Φ_xx(e^{jω_i}); see Papoulis (1991). The ratio of these two quantities gives W_o(e^{jω_i}). This interpretation becomes more interesting if we note that W_o(e^{jω_i}) is also the optimum tap weight of a single-tap Wiener filter whose input and desired output are the complex-valued random processes x_i(n) and d_i(n), respectively; see Problem P3.12.

Figure 3.7  Procedure for calculating the transfer function of a Wiener filter through a sequence of filtering and averaging (the asterisk denotes conjugation)

The minimum mean-squared estimation error for the unconstrained Wiener filtering case can be obtained by substituting (3.93) in (3.78). For this, we first note that when |z| = 1,

$|W_o(z)|^2 = W_o(z)\,W_o^*(z) = W_o(z)\,\dfrac{\Phi_{dx}^*(z)}{\Phi_{xx}^*(z)} = W_o(z)\,\dfrac{\Phi_{xd}(z)}{\Phi_{xx}(z)},$   (3.95)

since, on the unit circle, Φ_xd(z) = Φ*_dx(z) and Φ_xx(z) = Φ*_xx(z). Using this result in (3.78), we get

$\xi_{\min} = \phi_{dd}(0) - \dfrac{1}{2\pi j}\oint_C W_o(z)\,\Phi_{xd}(z)\,\dfrac{dz}{z}.$   (3.96)

This may be considered as a dual of the previous derivations in (3.26), (3.27) and (3.74); see Problem P3.13.

Replacing z by e^{jω} in (3.96), we obtain

$\xi_{\min} = \dfrac{1}{2\pi}\int_{-\pi}^{\pi}\left[\Phi_{dd}(e^{j\omega}) - W_o(e^{j\omega})\,\Phi_{xd}(e^{j\omega})\right] d\omega.$   (3.97)
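Equation (3.94) suggests a direct numerical recipe that parallels the filtering-and-averaging interpretation of Figure 3.7: estimate Φ_dx and Φ_xx per frequency bin and take their ratio. The following sketch does this with DFT-based averaged spectra; the unknown system g, the noise level and the sizes are invented for the illustration.

```python
import numpy as np

# Estimate W_o(e^{jw}) = Phi_dx / Phi_xx, per DFT bin, cf. (3.94).
rng = np.random.default_rng(5)
g = np.array([1.0, -0.5, 0.25])           # assumed unknown system
nseg, nfft = 500, 128

num = np.zeros(nfft, dtype=complex)       # accumulates D(w) X*(w)
den = np.zeros(nfft)                      # accumulates |X(w)|^2
for _ in range(nseg):
    x = rng.standard_normal(nfft)
    d = np.convolve(x, g)[:nfft] + 0.1 * rng.standard_normal(nfft)
    X = np.fft.fft(x)
    D = np.fft.fft(d)
    num += D * np.conj(X)
    den += np.abs(X)**2

W_o = num / den                           # estimate of the optimum filter
print(np.round(np.fft.ifft(W_o).real[:4], 2))   # approximately [1, -0.5, 0.25, 0]
```

Since the additive noise is uncorrelated with x(n), it averages out of the cross-spectrum, and the estimated W_o(e^{jω}) approaches the frequency response of g.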


Figure 3.8  Block diagram of a modelling problem

3.6.3 Modelling

In this and the subsequent two subsections, we discuss three specific applications of Wiener filters, namely modelling, inverse modelling, and noise cancellation. These cover most of the cases that we encounter in adaptive filtering. Our aim, in these presentations, is to highlight some of the important features of Wiener filters when applied to various applications of adaptive signal processing.

Consider the modelling problem depicted in Figure 3.8. An estimate of the model of a plant G(z) is to be obtained by the Wiener filter W(z). The plant input u(n), contaminated with an additive noise ν_i(n), is available as the Wiener filter input. The noise sequence ν_i(n) may be thought of as introduced by a transducer that is used to get samples of the plant input. There is also an additive noise ν_o(n) at the plant output. The sequences u(n), ν_i(n) and ν_o(n) are assumed to be stationary, zero-mean and uncorrelated with one another.

We note that, for the present problem, the Wiener filter input and its desired output are, respectively,

$x(n) = u(n) + \nu_i(n)$   (3.98)

and

$d(n) = g_n * u(n) + \nu_o(n),$   (3.99)

where the g_n s are the samples of the plant impulse response, and the asterisk denotes convolution.

We use (3.93) to obtain the optimum transfer function of the Wiener filter, W_o(z). For this, we should first find Φ_xx(z) and Φ_dx(z). We note that

$$\begin{aligned}
\phi_{xx}(k) &= E[x(n)\,x(n-k)] \\
&= E[(u(n) + \nu_i(n))(u(n-k) + \nu_i(n-k))] \\
&= E[u(n)\,u(n-k)] + E[u(n)\,\nu_i(n-k)] + E[\nu_i(n)\,u(n-k)] + E[\nu_i(n)\,\nu_i(n-k)]. \qquad (3.100)
\end{aligned}$$

Since u(n) and ν_i(n) are uncorrelated with each other, the second and third terms on the right-hand side of (3.100) are zero. Thus, we obtain

$\phi_{xx}(k) = \phi_{uu}(k) + \phi_{\nu_i\nu_i}(k).$   (3.101)

Taking z-transforms on both sides of (3.101), we get

$\Phi_{xx}(z) = \Phi_{uu}(z) + \Phi_{\nu_i\nu_i}(z).$   (3.102)

To find Φ_dx(z), we note that only u(n) is common to x(n) and d(n), and the signals u(n), ν_i(n) and ν_o(n) are uncorrelated with one another. Considering these, and following a procedure similar to the one used to arrive at (3.102), one can show that

$\Phi_{dx}(z) = \Phi_{d'u}(z),$   (3.103)

where d′(n) is the plant output when the additive noise ν_o(n) is excluded from it. Moreover, from our discussions in the previous chapter, we have

$\Phi_{d'u}(z) = G(z)\,\Phi_{uu}(z).$   (3.104)

Thus,

$\Phi_{dx}(z) = G(z)\,\Phi_{uu}(z).$   (3.105)

Using (3.102) and (3.105) in (3.93), we obtain

$W_o(z) = \dfrac{\Phi_{uu}(z)}{\Phi_{uu}(z) + \Phi_{\nu_i\nu_i}(z)}\,G(z).$   (3.106)

We note that W_o(z) is equal to G(z) only when Φ_{ν_iν_i}(z) is equal to zero. That is, when ν_i(n) is zero for all values of n.

It is also instructive to replace z by e^{jω} in (3.106). This gives

$W_o(e^{j\omega}) = \dfrac{\Phi_{uu}(e^{j\omega})}{\Phi_{uu}(e^{j\omega}) + \Phi_{\nu_i\nu_i}(e^{j\omega})}\,G(e^{j\omega}).$   (3.107)

This result has the following interpretation. Matching between the unconstrained Wiener filter and the plant frequency response at any particular frequency, ω, depends on the signal-to-noise power spectral density ratio Φ_uu(e^{jω})/Φ_{ν_iν_i}(e^{jω}). Perfect matching is achieved when this ratio is infinity (i.e. when Φ_{ν_iν_i}(e^{jω}) = 0), and the mismatch between the plant and its model increases as Φ_uu(e^{jω})/Φ_{ν_iν_i}(e^{jω}) decreases. Note that Φ_uu(e^{jω}) and Φ_{ν_iν_i}(e^{jω}) are power spectral density functions, and thus are real and non-negative.

We may also define

$K(e^{j\omega}) = \dfrac{\Phi_{uu}(e^{j\omega})}{\Phi_{uu}(e^{j\omega}) + \Phi_{\nu_i\nu_i}(e^{j\omega})}$   (3.108)


and note that K(e^{jω}) is real and varies in the range 0 to 1, since the power spectral density functions Φ_uu(e^{jω}) and Φ_{ν_iν_i}(e^{jω}) are both real and non-negative. Further, to prevent ambiguity of the above ratio, we assume that for all values of ω, Φ_uu(e^{jω}) and Φ_{ν_iν_i}(e^{jω}) are never simultaneously equal to zero. Using this, we obtain

$W_o(e^{j\omega}) = K(e^{j\omega})\,G(e^{j\omega}).$   (3.109)

An expression for the minimum mean-square error of the modelling problem is obtained by replacing (3.105) and (3.109) in (3.97). This gives

$\xi_{\min} = \dfrac{1}{2\pi}\int_{-\pi}^{\pi}\left[\Phi_{dd}(e^{j\omega}) - K(e^{j\omega})\,|G(e^{j\omega})|^2\,\Phi_{uu}(e^{j\omega})\right] d\omega.$   (3.110)

We may also note that d′(n) and ν_o(n) are uncorrelated, and thus

$\Phi_{dd}(e^{j\omega}) = \Phi_{d'd'}(e^{j\omega}) + \Phi_{\nu_o\nu_o}(e^{j\omega}).$   (3.111)

Also,

$\Phi_{d'd'}(e^{j\omega}) = |G(e^{j\omega})|^2\,\Phi_{uu}(e^{j\omega}).$   (3.112)

Substituting (3.111) and (3.112) into (3.110) we obtain

$\xi_{\min} = \phi_{\nu_o\nu_o}(0) + \dfrac{1}{2\pi}\int_{-\pi}^{\pi}\left[1 - K(e^{j\omega})\right]|G(e^{j\omega})|^2\,\Phi_{uu}(e^{j\omega})\,d\omega.$   (3.113)

We note that the minimum mean-square error consists of two distinct components. The first one comes directly from the additive noise, ν_o(n), at the plant output. The Wiener filter will not be able to reduce this component since ν_o(n) is uncorrelated with its input x(n). The second component arises due to the input noise, ν_i(n), which, in turn, results in some mismatch between G(z) and W_o(z). Thus, the best performance that one can expect from the optimum unconstrained Wiener filter is ξ_min = φ_{ν_oν_o}(0), and this happens when the input noise ν_i(n) is absent.

Another very important and useful concept that can be understood based on the above theoretical exercise is the principle of correlation cancellation. We remarked above that the Wiener filter cannot do anything to reduce the contribution φ_{ν_oν_o}(0) from the total mean-square error. This is because the input x(n) of the Wiener filter is uncorrelated with the output noise ν_o(n), and hence the filter tries to match its output y(n) with the plant output d′(n) without bothering about ν_o(n). In other words, the Wiener filter attempts to estimate that part of the target signal d(n) that is correlated with its own input x(n) (i.e. d′(n)) and leaves the remaining part of d(n) (i.e. ν_o(n)) unaffected. This is known as the principle of correlation cancellation. However, as noted above, perfect cancellation of the correlated part d′(n) from d(n) will be possible only when the input noise ν_i(n) is absent.
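The shrinkage behaviour described by (3.106)-(3.109) can be made concrete with a few lines of code. In the sketch below the plant, the flat input spectrum and the flat input-noise spectrum are all assumptions chosen so that K(e^{jω}) is constant; with these choices the optimum model is simply a uniformly scaled-down copy of the plant response.

```python
import numpy as np

# Illustration of (3.108)-(3.109): the optimum model is K(e^{jw}) G(e^{jw}).
w = np.linspace(-np.pi, np.pi, 512)
G = 1.0 / (1.0 - 0.5 * np.exp(-1j * w))   # assumed plant frequency response
phi_uu = np.ones_like(w)                  # white plant input, unit variance
phi_vivi = 0.25 * np.ones_like(w)         # assumed white input noise, variance 0.25

K = phi_uu / (phi_uu + phi_vivi)          # (3.108): here K = 0.8 at all frequencies
W_o = K * G                               # (3.109)

print(np.allclose(np.abs(W_o), 0.8 * np.abs(G)))  # True: uniform shrinkage
```

With frequency-dependent noise the scaling K(e^{jω}) would vary with ω, pulling the model toward zero exactly in the bands where the input measurement is least reliable.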


Figure 3.9  Channel equalization

3.6.4 Inverse modelling

Inverse modelling has applications in both communications and control. However, most of the theory of inverse modelling has been developed in the context of channel equalization. We also concentrate on the latter. Figure 3.9 depicts a channel equalization scenario. The data samples, s(n), are transmitted through a communication channel with the system function H(z). The received signal at the channel output is contaminated with an additive noise ν(n), which is assumed to be uncorrelated with the data samples, s(n). An equalizer, W(z), is used to process the received noisy signal samples, x(n), to recover the original data samples, s(n).

When the additive noise at the channel output is absent, the equalizer has the following trivial solution:

$W_o(z) = \dfrac{1}{H(z)}.$   (3.114)

In the absence of channel noise, this results in perfect recovery of the original data samples, as W_o(z)H(z) = 1. This implies that y(n) = s(n), and thus e(n) = 0, for all n. This, clearly, is the optimum solution, as it results in a zero mean-square error, which of course is the minimum, since the mean-square error is a non-negative quantity. The following example gives a better view of the problem.

Example 3.4

Consider a channel with

$H(z) = -0.4 + z^{-1} - 0.4z^{-2}.$   (3.115)

Also, assume that the channel noise, ν(n), is zero for all n. The channel output then is obtained by convolving the input data sequence, s(n), with the channel impulse response, h_n, which consists of three non-zero samples h₀ = −0.4, h₁ = 1 and h₂ = −0.4. This gives

$x(n) = -0.4s(n) + s(n-1) - 0.4s(n-2)$   (3.116)

in the absence of channel noise.

We note that each sample of x(n) is made from a mixture of three successive samples of the original data. This is called intersymbol interference (ISI) and should be compensated or cancelled for correct detection of the transmitted data. For this purpose, we may use an equalizer with the

system function (see (3.114))

$W(z) = \dfrac{1}{H(z)} = \dfrac{1}{-0.4 + z^{-1} - 0.4z^{-2}}.$   (3.117)

Factorizing the denominator of W_o(z) and rearranging, we get

$W_o(z) = \dfrac{-2.5}{(1 - 0.5z^{-1})(1 - 2z^{-1})}.$   (3.118)

This is a system function with one pole inside and one pole outside the unit circle. With reference to our discussions in the previous chapter, we recall that (3.118) will correspond to a stable time-invariant system if the region of convergence of W_o(z) includes the unit circle. Considering this and finding the inverse z-transform of W_o(z), we obtain (see Chapter 2, Section 2.3)

$w_o(n) = \begin{cases} \frac{5}{6}\,(0.5)^n, & n \ge 0, \\ \frac{10}{3}\,2^n, & n \le -1. \end{cases}$   (3.119)

To obtain this result, we have noted that W_o(z) of (3.118) is similar to H(z) of (2.30), except for the factor −2.5 in (3.118). Figures 3.10(a), (b) and (c) show the samples of the impulse responses of the channel, equalizer, and their convolution, respectively. Existence of ISI at the channel output, as noted above, is due to more than one (here, three) non-zero samples in the channel impulse response. This is observed in Figure 3.10(a). Figure 3.10(c) shows that the ISI is completely removed after passing the received signal through the equalizer.

Figure 3.10  Impulse responses of (a) the channel, (b) the equalizer, and (c) the cascade of channel and equalizer

When the channel noise, ν(n), is non-zero, the solution provided by (3.114) may not be optimal. The channel noise also passes through the equalizer and may be greatly enhanced in the frequency bands where H(e^{jω}) is small. In this situation a compromise has to be made between the cancellation of ISI and noise enhancement. As we show below, the Wiener filter achieves this trade-off in an effective way.

To derive an equation for W_o(z) when the channel noise is non-zero, we use (3.93). We note that

$x(n) = h_n * s(n) + \nu(n)$   (3.120)

and

$d(n) = s(n),$   (3.121)

where h_n is the impulse response of the channel, H(z).

Noting that s(n) and ν(n) are uncorrelated and using the results of Section 2.4.4, we obtain from (3.120)

$\Phi_{xx}(z) = H(z)\,H(z^{-1})\,\Phi_{ss}(z) + \Phi_{\nu\nu}(z).$   (3.122)

Also, from (3.120) and (3.121) we may note that x(n) is the output of a system with input s(n) and impulse response h_n, plus an uncorrelated noise, ν(n). Noting these, and the fact that all the processes and system parameters are real-valued, we obtain

$\Phi_{dx}(z) = H(z^{-1})\,\Phi_{ss}(z).$   (3.123)

Note that the above result is independent of ν(n). Also, with |z| = 1, we may also write

$\Phi_{dx}(z) = H^*(z)\,\Phi_{ss}(z).$   (3.124)

Using (3.122) and (3.124) in (3.93), we obtain

$W_o(z) = \dfrac{H^*(z)\,\Phi_{ss}(z)}{|H(z)|^2\,\Phi_{ss}(z) + \Phi_{\nu\nu}(z)}.$   (3.125)

This is the general solution to the equalization problem when there is no constraint on the equalizer length and, also, it may be let to be non-causal. Equation (3.125) includes the effects of the autocorrelation function of the data, s(n), and the noise, ν(n).

To give an interpretation of (3.125), we divide the numerator and denominator by the first term in the denominator to obtain

$W_o(z) = \dfrac{1}{1 + \dfrac{\Phi_{\nu\nu}(z)}{\Phi_{ss}(z)\,|H(z)|^2}} \cdot \dfrac{1}{H(z)}.$   (3.126)

Next, we replace z by e^{jω} and define the parameter

$\rho(e^{j\omega}) = \dfrac{\Phi_{ss}(e^{j\omega})\,|H(e^{j\omega})|^2}{\Phi_{\nu\nu}(e^{j\omega})}.$   (3.127)


We may note that this is the signal-to-noise power spectral density ratio at the channel output; Φ_ss(e^{jω})|H(e^{jω})|² and Φ_νν(e^{jω}) are the signal power spectral density and noise power spectral density, respectively, at the channel output. Substituting (3.127) in (3.126) and rearranging, we obtain

$W_o(e^{j\omega}) = \dfrac{\rho(e^{j\omega})}{1 + \rho(e^{j\omega})} \cdot \dfrac{1}{H(e^{j\omega})}.$   (3.128)

We note that the frequency response of the optimized equalizer is proportional to the inverse of the channel frequency response, with a proportionality factor that is frequency dependent. Furthermore, ρ(e^{jω}) is a non-negative real quantity, for power spectra are non-negative real functions. Hence,

$0 \le \dfrac{\rho(e^{j\omega})}{1 + \rho(e^{j\omega})} \le 1.$   (3.129)

This brings us to the following interpretation of (3.128). The frequency response of the optimal equalizer resembles the channel inverse within a real-valued factor in the range of zero to one. This factor, which is frequency dependent, depends on the signal-to-noise power spectral density ratio, ρ(e^{jω}), at the equalizer input. It approaches one when ρ(e^{jω}) is large, and reduces with ρ(e^{jω}).

Once again, it is important to note that different frequencies are treated independently of one another by the equalizer. In particular, at a given frequency ω = ω_i, W_o(e^{jω_i}) depends only on the values of H(e^{jω}) and ρ(e^{jω}) at ω = ω_i. With this background, we shall now examine (3.128) closely to see how the equalizer is able to make a good trade-off between the cancellation of ISI and noise enhancement. In the frequency regions where the noise is almost absent, the value of ρ(e^{jω}) is very large and hence the equalizer approximates the inverse of the channel closely without any significant enhancement of noise. On the other hand, in the frequency regions where the noise level is high (relative to the signal level) the value of ρ(e^{jω}) is not large and hence the equalizer does not approximate the channel inverse well. This, of course, is to prevent noise enhancement.

Example 3.5

Consider the channel H(z) of Example 3.4. We assume that the data sequence, s(n), is binary (taking values of +1 and −1) and white. We also assume that ν(n) is a white noise process with a variance of 0.04. With these we obtain

$\Phi_{ss}(z) = 1 \quad \text{and} \quad \Phi_{\nu\nu}(z) = 0.04.$

Using these in (3.125) we get

$W_o(z) = \dfrac{-0.4 + z - 0.4z^2}{(-0.4 + z^{-1} - 0.4z^{-2})(-0.4 + z - 0.4z^2) + 0.04}.$

Figure 3.11  Plots of 1/|H(e^{jω})| and |W_o(e^{jω})| versus normalized frequency

Figure 3.11 presents the plots of 1/|H(e^{jω})| and |W_o(e^{jω})|. We note that at those frequencies where 1/|H(e^{jω})| is small, a near perfect match between 1/|H(e^{jω})| and |W_o(e^{jω})| is observed. On the other hand, at those frequencies where 1/|H(e^{jω})| is large, the deviation between the two increases. We may also note that |W_o(e^{jω})| remains less than 1/|H(e^{jω})| for all values of ω. This is consistent with the conclusion drawn above, since a small value of 1/|H(e^{jω})| implies that |H(e^{jω})| is large and thus, according to (3.127), ρ(e^{jω}) also is large. This, in turn, implies that the ratio ρ(e^{jω})/(1 + ρ(e^{jω})) is close to one, and hence from (3.128) we get

$|W_o(e^{j\omega})| \approx \dfrac{1}{|H(e^{j\omega})|}.$

A similar argument may be used to explain why |W_o(e^{jω})| is significantly smaller than 1/|H(e^{jω})| when the latter is large. Furthermore, the fact that |W_o(e^{jω})| remains less than 1/|H(e^{jω})|, for all values of ω, is predicted by (3.128).
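The behaviour seen in Figure 3.11 can be regenerated directly from (3.127) and (3.128). The short sketch below evaluates both |W_o(e^{jω})| and 1/|H(e^{jω})| on a frequency grid, using the channel of Example 3.4 and the noise variance of Example 3.5; plotting is omitted, and the grid size is an arbitrary choice.

```python
import numpy as np

# Equalizer of (3.128) for the channel of Example 3.4, noise as in Example 3.5.
f = np.linspace(0, 0.5, 256)              # normalized frequency
w = 2 * np.pi * f
H = -0.4 + np.exp(-1j * w) - 0.4 * np.exp(-2j * w)
rho = np.abs(H)**2 / 0.04                 # (3.127) with Phi_ss = 1, Phi_vv = 0.04
W_o = rho / (1 + rho) / H                 # (3.128)

# |W_o| never exceeds the channel inverse, as predicted by (3.129):
print(np.all(np.abs(W_o) <= 1 / np.abs(H) + 1e-12))  # True
```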

3.6.5 Noise cancellation

Figure 3.12 depicts a typical noise canceller set-up. There are two inputs to this set-up: a signal source, s(n), and a noise source, ν(n). These two signals, which are assumed to be uncorrelated with each other, are mixed together through the system functions H(z) and G(z) and result in the primary input, d(n), and reference input, x(n), as shown in Figure 3.12. The reference input is passed through a Wiener filter W(z), which is designed so that the difference between the primary input and the filter output is minimized in the mean-squared sense. The noise canceller output is the error sequence e(n). The aim of a noise-canceller set-up as explained above is to extract the signal s(n) from the primary input d(n).

Figure 3.12  Noise canceller set-up

We note that

$x(n) = \nu(n) + h_n * s(n)$   (3.131)

and

$d(n) = s(n) + g_n * \nu(n),$   (3.132)

where h_n and g_n are the impulse responses of the filters H(z) and G(z), respectively.

Noting that s(n) and ν(n) are uncorrelated with each other and recalling the results of Section 2.4.4, from (3.131) we obtain

$\Phi_{xx}(z) = \Phi_{\nu\nu}(z) + |H(z)|^2\,\Phi_{ss}(z).$   (3.133)

To find Φ_dx(z), we note that d(n) and x(n) are related with each other through the signal sequences s(n) and ν(n) and the filters H(z) and G(z). Since s(n) and ν(n) are uncorrelated with each other, their contributions in Φ_dx(z) may be considered separately. In particular, we may write

$\Phi_{dx}(z) = \Phi_{dx}^{s}(z) + \Phi_{dx}^{\nu}(z),$   (3.134)

where Φˢ_dx(z) is Φ_dx(z) when ν(n) = 0, for all values of n, and Φ^ν_dx(z) is Φ_dx(z) when s(n) = 0, for all values of n. Thus, we obtain

$\Phi_{dx}^{s}(z) = H^*(z)\,\Phi_{ss}(z)$   (3.135)

and

$\Phi_{dx}^{\nu}(z) = G(z)\,\Phi_{\nu\nu}(z).$   (3.136)

Recall that we assume |z| = 1. Substituting (3.135) and (3.136) in (3.134), we get

$\Phi_{dx}(z) = H^*(z)\,\Phi_{ss}(z) + G(z)\,\Phi_{\nu\nu}(z).$   (3.137)

Using (3.133) and (3.137) in (3.93), we obtain

$W_o(z) = \dfrac{H^*(z)\,\Phi_{ss}(z) + G(z)\,\Phi_{\nu\nu}(z)}{\Phi_{\nu\nu}(z) + |H(z)|^2\,\Phi_{ss}(z)}.$   (3.138)

A comparison of (3.138) with (3.106) and (3.125) reveals that (3.138) may be thought of as a generalization of the results we obtained in the previous two sections for the modelling and inverse modelling scenarios. In fact, if we refer to Figure 3.12, we can easily find that the modelling and inverse modelling scenarios are embedded in the noise canceller set-up. While trying to minimize the mean-square value of the output error, we must strike a balance between noise cancellation and signal cancellation at the output of the noise canceller. Cancellation of the noise ν(n) occurs when the Wiener filter W(z) is chosen to be close to G(z), and cancellation of the signal s(n) occurs when W(z) is close to the inverse of H(z). In this sense we may note that the noise canceller treats s(n) and ν(n) without making any distinction between them and tries to cancel both of them as much as possible so as to achieve the minimum mean-square error in e(n). This seems contrary to the main goal of the noise canceller, which is meant to cancel only the noise. The following discussion aims at revealing some of the peculiar characteristics of the noise canceller set-up and shows under which conditions an acceptable cancellation occurs.

To proceed with our discussion, we define ρ_pri(e^{jω}), ρ_ref(e^{jω}) and ρ_out(e^{jω}) as the signal-to-noise power spectral density ratios at the primary input, reference input, and output, respectively. By direct inspection of Figure 3.12 and application of (2.85) we obtain

$\rho_{\mathrm{pri}}(e^{j\omega}) = \dfrac{\Phi_{ss}(e^{j\omega})}{|G(e^{j\omega})|^2\,\Phi_{\nu\nu}(e^{j\omega})}$   (3.139)

and

$\rho_{\mathrm{ref}}(e^{j\omega}) = \dfrac{|H(e^{j\omega})|^2\,\Phi_{ss}(e^{j\omega})}{\Phi_{\nu\nu}(e^{j\omega})}.$   (3.140)

To derive a similar equation for ρ_out(e^{jω}), we note that s(n) reaches the canceller output through two routes: one direct and one through the cascade of H(z) and W(z). This gives

$\Phi_{ee}^{s}(e^{j\omega}) = |1 - H(e^{j\omega})\,W(e^{j\omega})|^2\,\Phi_{ss}(e^{j\omega}),$   (3.141)

where the superscript s refers to the portion of Φ_ee(e^{jω}) that comes from s(n). Similarly, ν(n) reaches the output through the routes G(z) and W(z). Thus,

$\Phi_{ee}^{\nu}(e^{j\omega}) = |G(e^{j\omega}) - W(e^{j\omega})|^2\,\Phi_{\nu\nu}(e^{j\omega}).$   (3.142)

Replacing W(e^{jω}) by W_o(e^{jω}), and using (3.138) in (3.141) and (3.142), we obtain

$\Phi_{ee}^{s}(e^{j\omega}) = \dfrac{|1 - H(e^{j\omega})G(e^{j\omega})|^2\,\Phi_{\nu\nu}^2(e^{j\omega})}{\left(|H(e^{j\omega})|^2\,\Phi_{ss}(e^{j\omega}) + \Phi_{\nu\nu}(e^{j\omega})\right)^2}\,\Phi_{ss}(e^{j\omega})$   (3.143)

and

$\Phi_{ee}^{\nu}(e^{j\omega}) = \dfrac{|H(e^{j\omega})|^2\,|1 - H(e^{j\omega})G(e^{j\omega})|^2\,\Phi_{ss}^2(e^{j\omega})}{\left(|H(e^{j\omega})|^2\,\Phi_{ss}(e^{j\omega}) + \Phi_{\nu\nu}(e^{j\omega})\right)^2}\,\Phi_{\nu\nu}(e^{j\omega}),$   (3.144)

respectively. Hence, ρ_out(e^{jω}) can now be obtained as

$\rho_{\mathrm{out}}(e^{j\omega}) = \dfrac{\Phi_{ee}^{s}(e^{j\omega})}{\Phi_{ee}^{\nu}(e^{j\omega})} = \dfrac{\Phi_{\nu\nu}(e^{j\omega})}{|H(e^{j\omega})|^2\,\Phi_{ss}(e^{j\omega})}.$   (3.145)

Comparing (3.145) with (3.140), we find that

$\rho_{\mathrm{out}}(e^{j\omega}) = \dfrac{1}{\rho_{\mathrm{ref}}(e^{j\omega})}.$   (3.146)

This is blown as pOlw!r i!7I1ersiolJ(Widrow et al., 1915). It shows that lhe signal-te-notse PQ)I'er speetra! dinsity rcti« at tk« noiseeaneelle» output ill e.quallo the imtrse of the sigllailo.-1l0~e powe« spt'ctral de11.sify ratio at the r1!fetence blJlut. This means !"ha.t if the signaJro-neise power spectraldensity ratio at the reference input is low, then we should expect a good cancellation of the noise at the output. 0,] the other hand, we should expect a poor performance from the canceller when the signal-to-noise power spectral density ratieat the refereace input is high. This surprising result.suggests that the noise caneeller works better in SilUqliol1S when the noise level is high ant} s.ignal fevel if low. The following example gives a dear picture of this general result,

Exampk3.6

J

To demonstrate how the power insersion property of the noisecanceller may be utilized in practice, we consider II: receiver with two omni-directional (equally sensitive to all dlrection~) antennas, A and B. as ill Figure 3.13.

A desired SiS11:!J :I'(n) = ~(n)~osnwo arrives ill the direction perpendicular to the line connecting A ana B. Ali interferer (jannner) signal 1'(11) = (j(n) eosllWo arrives 11:1 au angle fi" wteh respect TO the direcrionof sen). The amplitudes ~(n) and fJ('I'I} are narrow-band baseband signalS. This implies, that .I'(n) und 1/(11) are narrow-band signalsconcantraied around 41 = WOo Such signals may be treated. as $lngl'e tones, and thus a filter with two degrees of f're&Iom is sufficient for auy linear filtering that may have to be performed on them. This is why only a lWO. lap linearcombiner is. consjdenldin Figure 3..13, This is expected to perform almostas good as any other unconstrained linear. (Wiener) IDler. We also assume lhata(lI)and p(,nr·are zero-mean and uncorrelated wilheach other, Th.G two omnia are separated bya t;list!i:Il¢ of I metres, The linear combine, coefficients are adjus~ed so that the output error, e(n),. is minimized in the meansquare sense,

I

Unconstrained Wiener Fitters

jl'9

A

sen)

den)

I

yen}

Figure 3·.13 A receiver with lwo·omni-direct!ona.1 antennas

Thedesired.signal, s{n), arrives at the same time at both omnis, However; I'(n) arrivesat B first, nnd:anil'el> at A whh a deJay

s = /sinE"

" . ,

c

(3.]47)

whereeis the propagation .speed, To add Ibis to the tinte index n, U·Pas to he Dornia1i~d by tim time st(lp T which correspouds to Que inyremeril of 11. This.gives

~' _ [sinl1Q 0- cT '

(3.1481

N oting these. in Figure 3.13, we have

d(n) = O{li) ~OSllWa +(1(11) cos[(n - 60)wol x(n) = 0;(,,) COS·/lWo +~(n)oosnwo

:\'(11) = a(rr) sin /Two + $(11) SillJIW"o

(3.14g) (3.150) (3.ISJ)

II lIIay be .IlO red tha t in (3. 149) 'life have used f3.{]!) instead ort3( 1"1 - fio). This. which has bee n done to simplify the following. equations, in practice is v.al.rd witha very good approximation because (If the narrow bandwidth of f](J,1, which implies that its variation in time is StOW, and the small size 0[60•

To find the optimmroceelficieats of'the linearcombiner; we shall derive and" solve the WillnerBopf equation-governing-the linear combiner. We note here that

R =[ E [xl (II)] Ef.x(I~_i.(II)J 1· .

• Elx(n).i{ll)] E!X-(n).1

(3.152)

E!2(n)] = E[(Il'.(n.) cosnwo + p(n) oosnwoJ2J.

(3 . .153)

80

Wiener Filters

Expanding (3.1 53) and recalling thar a,{n) and /3(11) are uncorrelated with each other, we- obtain

1 2

B[..?'{il}] = a~ ;0"11 +-H~ErGOs2/lW,,1 + ujE [cos UIWoJ),

(3.154)

where 0;. and ~ are variancesof a(n) and 0(11), respilJl;:tiv¢ly. Also, E[GO~2n!<!oJ is replaced by its time average.~ This is assumed 10 be zero. Thus, we obtain

(3.155)

Similarly, we can obtain

E[?(JI)j = a! + ~ 2

(3.156)

and

E[x(l1)x(nll = O.

(3.157)

Substituting these ill (3.1S2) we ha ve

(3.158)

It is also straightforward to show that

(3.159)

Using (3 .l58) and (3.15-9) in the Wiener -H opf equation -Rw" = p. we get

[a! +?CO~ 5ow" j

a;;.+rrp:

fJ~ sin Void" ,

07.+aj ,

I

(3.160)

Wo =

The ophmized output of the receiver is

co(") =d(n) - w~x(n),

(3.161)

2Strictly speaking the replacement of the- time aver.ag_-e of the periodic sequence cos 211w" ali Efoos 2trw"j does .not tit into the conventional definitions of stochastic processes. The SegUeDC'; Cos mwo Is deterministic and thus 11 does not, really make senseto talk about its expectation which conventionally is defined as an ensemble average. On the other hand'. this is a reality (hat in many occasions in adaptive fllters (such as our example here) the involved signals arc deterministic and the time averages are -used to evaluate the performance of the filters and/or ealculate their parameters. This is ill this conte~t that we replace statistical expectations by time averages. We may note that the problem Slated in Example 3.6 could also be put to a more statistical form to prevent the above arguments: sl::e-Problem P3.21. Here, we have decided not to do this in order LD emphasize the fact that ill practice time averages are used instead of statistical expectatiens,

Summary and Discussion

81

wMrex{1I.) = [x(l1).~(/IW. Usi-ng (3. 149), (3.150), (3.151) and (3.160).iI1 (3.161), we get, after some manipulations,

(3.162)

Now, by inspection of (3.150) and (3.162), we lind thai

.. . al • . the ref . ~ l-'1e signai-te-nerse ratio at . e re,erenc>e input = ~

f!

(3.l63j

and

. . the c (uWrla ~

the slgnal-to-ncise ratio at e output == (c-' L~ = d

u;;)op"

(3.164)

which match the power inversion equation (3.146).

3.7 Summary and Discussion

ill this chapter we reviewed a class of optimum linear systems collectively known as Wiener filters. We noted that the performance function used in formulating the Wiener filters is an elegant choice which leads to a mathematically tractable problem. We discussed the Wiener filters in the context of discrete-time signals and systems, and presented different formulations of the Wiener filtering problem. We started with the Wiener filtering problem for a finite-duration impulse response (FIR) filter. The case of real-valued signals was dealt with first, and the formulation was then extended [0 the case or complex-valued sign-als.

The uaconsrrainedWiener filters were also discussed in detail. By unconstrained, we mean there is no constraint on the duratio» of tbe impulse response of th~ futer, It may extend from time n = -00 to II = +00. This study, althougb non-realistic in actual implemen lation, turned out to be very instructive in revealing many aspects of the_ Wiener Ii] ters that could not be easily perceived when the duration of the filter impulse response is limited.

The eminent features of the Wiener filters which were observed are:

l. For a transversal Wiener filter. the performance functiea is a quadratic function of its tap weigh ts with a single global minimum. The set of tap weigll ts that minimizes the Wiener filter cest function can be obtained analytically by solving a set of simulta:neous linear equations, known as the Wiener=Hopf equation.

2. Wben the optimum Wlencdllte, is used. tbe estimatiDn errnris uaeorrelated with the input samples of the filrer. This property of Wiener filters, which is referred 10 as the principle of orthogonality, is useful and handy for many related derivations.

3. The Wiener filter can also be viewed as a correlation canceller In the sense that the Optimum Wiener filter cancels that part of the desired output ilia tis 00 rrel a ted with Its input, while generating the estimation error.

4. In the case of unconstrained Wiener filter'S, the Wiener filter treats different frequency components of the underlying processes separately. In particular, the Wiener filter

82

Wiener filters

transfer fUllCtiOll at any 'partieular frequency Ckp0Uds only on the power spectral density of the filter input and the erose-power spectra! density between the filter input and its desired output, at rhat frequency.

The last property, although it could only be derived ia the case of UIl con st rained Wiener filters, is also approximately valid when the filter I.en'gth is constrained. The concept of power spectra and their influence on Ihe performance ofWieaer "filters. is fundamental to uaderstanding the behaviour of adaptive filters, We note that the. adaptive filters, as commonly implemented, are aimed at implementing Wiener filters .. In this chapter we saw that the optimum coefficients of the Wiener filter are a function of me autocorrelation function of thefilter input and the cross-correlation function between th(;l..lilter .inpul and its desired output. Since correlation functions and power spectra are uniquely related, we also saw (hat the optimum coefficients can be expressed in tenus of the corresponding power spectra indead of the correlation functitms. In the next Jew chapters, ·",'e w.lll show that file eonvergenee behaviour of adaptive filters is Closely related to the power spectrum of their inputs. In the: rest of th~s book we will make frequent references to the results derived in tl:Jis chapter.

Problems

P3.1 Considera two-tap Wiener filter with the following statistics:

(0 Use the above information to obtain the performance function or the filter, By direct evaluation of the performance function obtain the optimum values of the ulter rap weights.

(ii) Insert the resrrlt obtained in (i) In the perfcrrsance function expressionto obtain the minimum mean-square error of the filter.

(ill) Find the optimum tap weights of the filter and its minimum mean-square error using the equations derived in lIDS chapter to confirm rheresolts obtained in (i) and (Ii).

P3.2 Consider a three-tap Wiener filter with the following statistics:

E[d2(n)] = la,

R = [O~5

0.25

0.5

0.25]

O.~ ,

1

0.5

Repeat Steps (i), (ii) and (ili)of Pro blem P3. L

J

P3.3 Consider the modelling problem shown in Figure P3.3,

(i) Find the correlation matrix R of the filter tap inputs and the cross-correlation vector II between the filter tap inputs and its desired output.

Problems

83

(ii) Find the optimum !:ap weights of the Wiener filter.

(iii) Wha:t is ~ minimum mean-squared error? Obtain this analytically as well as by direct Insp{letioll of FiguE'e P3.1 ..

whlte ~

--~·---.~~I ~--~~

Mth unit 1 ~ az.-1

variance

ern)

Figure P3.3

P3.4 Consider the channelequallzatson problem shown in Figure P.3.4. The data symbols, S(li:), are assumed to be samples of a stationary white process.

(1) Find the correlation matrix R of the equalizer tap inputs and the eross-oorreiation

vector p between the equalizer tap inputs and the desired output .. (ii) Find t,he optimum tap weights of the equalizer.

(m) What is the minimum mean-square error 'at the equalizer output?

(iv) Could you guess the results obtained in (ii) and. (iii) without going through tile derivations? How and why?

s(n)

l~az-I Channel

X(fI)1 -I ._1 I yen)

I Wo +W1Z +w2z I

Equalizer

+ e(n) :-f'___"'-.

1

Pig1:Jre: 'P3A

P3.S In Section 3.5 we ernphaS:i7..ed that for a complex variable 11'

(p3.5-1)

does not imply tlJ~l

8f(w) 8f(w)

--=--=0

DlI'g aWl '

ill.general. In this problem we want to elaborate on. this further,

84

Wiener Filters

Problems

85

(i) Assume that!{w) = III'" and show that for this function (P3.5- J) is true, but (p3.5- 2) is false.

(il) Can you e",reiId thlsresiJlr to ,llie ease when

L Itll)) = Ea,-lIi ;=0

V(n)

1

Equalizer

I ~ J.l~~1 +035..:-'" Channel

and the a;s are fixedreal or complex yQefficieIl,ts?

Figure 1"(1,9

P3.6 WOrk out the details of the der.ivatioll of (3.74).

M.II) By following a procedure si,milru Eo the om: given in Section 3.6.1, show that when the ineolved PIOC!l.SSes and system parameters are complex-valued

P3.7 In Section 3.5,f.or the. complex-valued 'signals, we used the principle of orthogonality to derive the Wiener -Hopf equation and the minimum mean-square error (3.,74). Starting with the de:finitionof the performance function, derive an equation similar to (3,12) fQr theease of complex-valued signals, Use this eq uatioa to give a direct den vation for the Wiener-Hopf equation ill the present case. AJso,con:fi.rru the minimum. meansquareerror equarion (3.74).

where .~{ x} denotes the realpart of x. Proceed with the above result to develop t1:te dual of (3:78).

¥3.8 Show Ilia! fora Wiollar filter with acemplex-valued tap-input vector x(n) and optimum tap-weight vector w"

P3.11 8how that (3.9:3) Is a v.uid result even when the involved processes are complex ~ valued.

\11~P = E[I~x(I1)Pl,

P3.12 Consider Figure P3.12, in which x,(n) and di(n) are the ourpats of two similar narrowbandfilters centred at w = wj,as in Figure 3.7. Show that ifwlO is the optimum value of Wi that minimizes the.meen-sqeare error of the output crror:e;{n), then

where p = Efd(ll)x"(n)] and d(n) is the desired output of the filter. Use this result to argue that ~p is always positive. Also, U$ tlHl above result to derive an equation similar to (3.50) Cor thegenera] case of Wieuer fitters wirhcomplex-valued signals.

'J {I, k=O, E[s(ll)s(n - Ii) = ',' 0,

kef. 0,

d(n.) n d/(n)
~ )
w,
+
xCn) n x/en) _LD e;(n)
) X _'-v
ro,
W.
, P3.9' Consider the droll nnel e,qualizatitin problem depicted in FigureP3.9. Assume that lhe underl.yingpcocesse,sare teal-valued with

~

and

-, ( ) ( k')] { O'~, k: = O.

E~n~n-· = ... . .

0, k# o.

U) FOIO"~ = 0, obtain the equalizer tap weights by cli reci solution of the Wiencr~ Hopf equation. TO' be sure of your results yon may also guess the equalizer tap weights and compare them with the calculated ones.

(ii) Find the equalizer tap weights when ~ = O. I and compare the results with what you obtained in ei}.

(ijj) PItH tile magnitude aadphase responses pf the two designs obtained above and eompare the resuhs,

F1gure P3.12

P3..13 Assuming that Wo(z) is the Qptlmum system function of a FIR Wiener filter, show that (3.9'6) can be.converted to (1.26), and vice-versa.

P3.14 Give a detailed derivation. of (3.122) from (3.120).

I.'3.1.S Give it detailed derivation of (3.123).

86

Wiener Filters

P3.16 For the noise canceller set-up of Figure 3.12, consider the case when <I>s.(z)IH{t)12« i;liw(z).

0) Show that, in this case,

W (2) ~ G(z) + <I>".f(z) H' (2)

o ~ <p",,(:;;) , .

(ii) Show that the power spectral density of [he noise reaching the noise canceller output is

(iii) Define the signal distortion at thecanceller output, D(z), as the ratio ofthc power spectral density of'the signal propagating through W,,(z) to the output to the power spectral density of the signal al the primary input Show that

D(zJ R::: 11I(z)G(z) + p",r(zW·

l (iv) Show that (he result obtained in CUi) may be written as D(z) !::;: Pttr(z)

, Ppri(z)

when Prcr(z) <K [8(z)[·IG(z)l.

P3.l7 Consider the noise canceller set-up shown in Figure P3.l7. (1) Derive an. uceonstrained Wiener filter Wo(z).

Oi) Show that the power inversion formula (3.146) is also valid for this set-up.

den) +:t: e(n)

v(n)

FlgunH>3.17

P3.18 Consider an array of three omni-directional antennas as in Figure P3 . .I8. The signal, s(n), and jammer, v(n), are narrowband processes, as in Example 3.6. To cancel

Problems

87

the jammer, we use a two-tap filter, similar to the One used in Figure J.U, at either of the points I or 2, in Figure P3.18.

(i) To maximize the cancellation of the jammer, where will you place the two-tap filter? (ii) Peryour choice in (i) cnd the optimum values of the filter tap weights.

(iii) Find sn expression for the signal and jammer eomponents reaching the canceller output, andeonfirm the power inversion formula.

F1gure P3.1'8

P3.19 Consider an array of three ornni-directional antennas as in Figure P3.J9. The signal, s(n), and jammer, lI(n), are narrow-band processes, as in Example 3.6.

(i) Find the optimum values of the filtertap weights that minimize the mean-square error of the outpur error e(ri).

(ii) Find all expression for thceanoellar output, and investigate the validity of the power inversion formula in this case,

sen)

den)

Figure P3.19

88 Wiener Filtel'S

P3.20 Repeat P3.19 for the arrayshown in Figure P3.20., ano gC!m]:la~e tile results obtained with those o( P3.19.

(}

Q

4

sen)

den)

Eigenanalysis and the Performance Surface

yen)

Figure P3:20

Tbe tra.nsl'ersalWlener filterwM il1£rothlC'eil in rhe previous chapter as a powerful signal }Xfocessing structure with a unique performance funtfioD> -which has many desirable features lor adaptive filtering applications, In particular, it was noted that the performance function of the transversal Wiener filter has 11 unique global minimum point which can be easily obtained using the second-order moments of the underlying processes, This is a consequence of the 'fact thacl the performance function of the transversal Wiener filler is a .convex quad ratic Iurrction of j rs tap weJghts.,

QUr goal in this. chapter is to a:Tlalys~ in detail Ute quadratic performance Iuncrion of the transversal Wiener filter. We get a Clear picture of the sh<j.p:e of the performance. funct10n when it is visualized asa surface in the (N +1 )-dimensiona I space of variables consisting of the filler tap weights, as. the first N axes, and the performanoe fun ctlon r as the (N + 1) th axis, This is called the perfonnence surface.

The shape of the perfonnao'ce surface of a transversal Wiener filter is closely related to 111:(: eigerrvalnes of the correlation matrix II of the filter tap inputs .. Heoce, 'we start wifh a thorouga discussion on the eigenvalues and eigen vee tors of the correlation matrix R.

F3.21 To prevent time averages and derive [he results presented in Example 3.6 through e~J>emble averages, the desired signal and jammer may be redefined as s(ll) = a(l1) 005(nwo + 'Pt) and l'{n) = .8(11) COS(IIW<;> + 'Pl), .. respectively, where i{Ji and 'P'.!. are random initial phases of the carrier, and assumed to he uniformly distributed in the interval -1l' to +11:. The amplitudes Q!\Il) and (3(II) , asin Example.3.6, are-eacorrelated narrow-band baseband signals, Furthermore, the random phases 'PI and!(J2 are .<l,'SsumeQ to be independent among themselves as wellas with .resP\lct to 6'.(11) and /3(n).

(I) Using the new definitions 0[:8(n) and lI(n) .• show that the same result as in (3.160) IS also obtained through ensemble averages,

(il) Sl10w that, for the present case,

a}o(ll) ...

e..,(n) = ~ + c1 [cm{llw~ +!PI} - cos((/! - .so)w" + 'PI)]

D""!p(n) r. ( )

- ~lC:OS 1!Wo +!f'2 -cos(n - Oo)Wo +'P2)J.

o;,+ua ..

4.1. Eigenvalues and Eigenvectors Let

(iii) Use the result.in (in to verify the poW~r inverSion formula in the present ease ..

R = E[x(Il)x'f(i'I)]

(4.1)



be the N X N correlation matrix or a complex-valued 'Wide~sense stationarv stoebastlc process represented by the N x I observation vector x(n) = [:ten) .~(rI- J) •.. x(l1- N + l)J1", where the superscripts Hand T .denote Hermitian and transpose, r-espectively.

A non-zetc N x I vector q is said ro be an eigenvector of R.if it satisfies the equation

Rq = >.q

(4.2)

90

Eige.Ranalysls and the .Performance Surface

Properties of Eigenvalues and Eigenvectors

91

for some scalar constant );. The scalar J. is called the eigenvalue of R assoeiated with the eigenvector q. W~note that if q is an eigenveetor of R,. tben for any nOD-ZW"O scalar t.!, aq is also aneigenvector of R, corresponding to the same eigenvalue, ..\. This is easily verified by multiplying (4.2) through by Q.

To .find the eigenvalues and eigenvectors ofR,we note that (4.2) may be rearranged as

We note that vHx(n) and x~(n)' constiture sparr of comples-conjugate scalars. This, when used in (4.6), gives

(4.7)

1

(R - >'rl)q = 0,

(4.3)

which bltol1'-nega:-ti'lle [braDY vector 1'.

From (4.7) we notel:batwben , is non-zero, the Hermitiaa ferDl"H-R" may be zero only when there is a consistent dependency between the elements of the observation yectotx(n), so that v~(n.) = 0 for all observatiousofx{n). Fora random process {x(n)}, this cap only happen. When {x(n)} consists of e sum of L sinusoids with L < N. io practice, we lind that this situation is very rare and thus for any non-zero ", "B R" is almost always positive. We tl]lu, say that the correlation matrix R is almost always positive definite.

. With this background, we are now prepared to discuss the properties of'theeigen values and eigenvectors Of the correlatien matrix R.

wh-ere I is the N x N identity matrix, and 0 is theN x I null vector. To prevent the trivial solution q = 0, thematrix R - ).J has to be .singalar, This implies

det(R - )J) = 0,

(4.4)

1

where det(·) denotes determinant, Equation (4.4) is called the eharacteristifwqUtlti{m of the matrix R. Thechata,Cteristic equation (4.4), when expanded, is an Nth order equation ill tile unknown parameter J.. The roots of this equation, which may be called ~,..\r, ... , >'i'1-I, are CbeeigenvaJ'Iletl of R. When A,S are distinct, R - Ail, for t = 0, I, ... ,N - I , will be of rank. N - 1. This leads to N elgcll:vectorsqo, q I, ... , IlJ,r -I, for the matrix R, wmoh are unique Ill' to. a scale Factor. On the other hand, when the ch_aract-cri.stic equation (4.4) has repeated roots, the matrix His said to have deg(':lle1'l:ue eigenvalues. In tbat case, the eigenvectors of R will net be unique, For example, if Am is an eigenvalue of R repeated p times, then the rank of R. - ..\ml is N - p, and thus the solution cif the equation (R - J.,,,I)q,,., = 0 can be any vector in a p:<limensi.onal8'I.IDspare of the N~dimell-Sl0ual cemplex vector spa.be. This, in.geueral, creates some cqnfUliion in ci:gemma1ysios of matrices which should be handled carefully, To prevent such confusien, in the dis.cus~on that follows, wherever necessary. we start wita the case thai lhe eigenvalues of R ate distinct. The results will then be extended to the case of repeated eigenvalues.

Property 1 The eigenvalues of the correlation matrix R au all real (HId non-negq!ive.

O~nsider an eigenvector .q;- of R and its ccrresponding 'eigenvalue Ai' These twO' an: reLat~d according to the equation

(4.8)

Premultiplying (4.8) byqr and noting that >I_{:is a scalar, we get

f

(4.9)

4.2 Properflas of Eigenvalues and Eigenvectors

Wt!. discuss the various. properties of the 'eigenvalues, and elgenvecters of rue correlation matrfx K Some Qf the properties derived here are directly related to the fa.ct thatthe correlation matrix RIs Hermitian andaon-aegative definite. A matrix A, in general, is said to be Hennltian if A = Ali. This, for the correlation matrix R, is. observed by direct inspect jon of (4.1 ) .. The, N x N Hermitian matrix A is said to be non-negative definite or positive semi.riejilLi te, if

The quantity <Lllq, on the tight-hand side is always real and positive. since it is the squared le~gth of'the vector IJj. Furthermore, the Hermitian form qr1Rql on the Left-hand side of (4.9} is always real and nos-negative, since the correlation matrix R ill nonnegative definite. Noting these, it follows from (4.9) that

Af ~ 0, far i = 0, 1, ... ,N - 1.

(4.W)

Property 2 T] 'Ii aJldqj are two. eigenvectors o.f the eotrelatlon matrix R that correspond to two of Its distinct eigeIlVOLf(es. then

I

(4.5)

(4./1)

1

for any N x I vector y, The fact that Ais Hermitian implies that ;t Av is real-valued. This- can be seen easily if we note that with the dirneasions specified. above, ~i A~· is a scalar and (vB Av)' = ("Ii AY )-H = -0 Av. For the cotrelation matrix R, to show that vl±Rv elm never be negative, we replace R from (4.1) to obtain

In ether words, eigenveC{prsassocitrli:dwfth tit!! disLin'ct eigenvalues oj the correlation matrix R {Ire mutually orthngonal.

j

v"R" = ",sE[x(n)xB(n)]v = Efv~(n)xH(n}vl_

Let Ai and >'j be the distinct eigenvahiescerrespouding to the eigenvectors. ~ and qj. respectively. We have

(4.[2)

92

Eigenanaly.s;is and th.e Performance Surface

and

(4.13)

Applying theoonjugate transpose on both sides of (4. U) and npting that >'1 IS a scalar and for the Hermitian matrix R, RH = R, we obtain

(4.14)

Premultiplying (4 . .13) by llI"H, post-multiplying' (4.14) by qj' and subtmd:ing tlJ.e two re.sulting equations, gives

(4.15)

Ndrmg, that >../ and >'J are. distinct, th:is gives (4.11).

Prop.erty.3 Let qO,.ql, ..... , qN _] be theeigi!lilieetors (Issocfaltui wttl: the dislim~t eigenvalues .Ao, >'It _ .. ,)w -I of the N X N C(HTi!iation matrix: R. respectively. Asmme lhal the ,eigenvectors q(h ql, ... , qN -I (Ire ,all normalized to have a lel~gtll oj unity .. and dejihe the Nx N matrix

(4.16)

Q is then a ullitary malr.f)r:, i.e.

This impliiM that the matrices Q alia QHar.e the rnlierse of each athe«.

To show this property, we note that the ijth element of the N x N matrix QllQ is the product of the i th row of QfI, whichis q~. and fuajth coharm of Q, which iSllJ. That is,

(4.18)

Noting this, (4.17) follows immediately from Property 2,

In cases where the correlation matrix R has one or more repeated eigenvalues, as was netedabove, attached to each of these repeatedeigenvalues there is a subspace of the sams dimension as thernultlplicity oftheelgenvalae.in whicli any vector is all eigenvector ofR. From Property 2, we can say that the subspaoes Lhat belong- to distiact eigenvalues are orthogonal Moreocer, within each subspace we can always find a s-et oforthogonal basis vectors which. span (he whole subspace. Cleatly. such a set is not unique, but can always be choseu. This means Ibat [or any repeated eigenvalue with PlUltipJiCity p, one can always find a set of P QrthogQD'a! eigeuveetors, Noting this, we: can say, in general, that fur any N x Ncorrelatitm matrix R, we cap always make a unitary matrix Q whose columns are made-up ef a set of eigenvectors of R.

Properties of Eigenvalues and Eigenvectors

93

Property 4 For any N x N correlation matrix 9.. we\C/I!1 always fold a set (;}j mutut!1fy ortlwgolra/,e.ig:eflveclolS. Sf/eh Q ses may be used as a basis to express any vector in the Nd'imenSf.01Ull space of complex vectors.

This property follows Irorn, the above. discussion.

Propert;y 5 Unitary Similarity Transformation. The correlation matrix R can always be decomposed as

(4.19)

wnere the matrix Q is made lip from a. set of ll'1il4'e1tg II! ortfiogonaleigertVec.tors oJ:R as

specified ia (4.16) and (4. 17), ..

(4.20)

and file order of the eigensalues ),0').1" _. , AN _I matches lito I oj the corresponding elg,em'ec/,ors ill the eolumns oIQ.

to prove this property We note lhlU the set of'equatioas

(4.21)

may be packed (o.geilieir as a single matrix equatiea

RQ=QA.

(4.2Z)

Then, post-multiplying (4.22) by QH and noting that QQH = I, we can get (4.19).

The right-hand side of (4.19) may be expanded as

N.-I

R = L \q,.qp.

i~O

(4.23)

PrOperty 6 Let ..\(1) AI " ••. ).,N _ I be the eigenvalues of the correlation matrix R. Then,

N-I

rr[RJ = L );1"

i~Q

(4.24)

where ti"[R] denotes trace of R and is deji11ed (1S the-sum of tile dingonal elemems afRo

Taking the trace on both sides of (4 .. 19), we get

te[R] = tr[QAQH].

(4.25)

94

Eigenanalysis an~ the Performance Surface-

To proceed. we may use the following result from matrix algebra.H' A and B are N x M and M x N matrices., respectively, then

li'{ABj = tr[BA,1,

(4.26)

Using this result, we may swap QA and QH on the right-hand side of (4.25). Then, noting that QElQ = I, (4.25) is simplified as

tr[RJ = trfAJ.

(4.27)

Using definition (4.20) in (4.27) completes the proof.

An alternative way of proving the above result is by direct expansion of (4.4); see Problem P4.8. This proof shows that the identity (4.24) ill not limited 0 the Hermitian matrices. It applies tu any square matrfx,

Property 7: Minimax Theorem' The distinct dgrtfll'ailros AO > A) > ... > A"'_I of the correlation matrix R of all observation vector x(n). lJ11d their corresponding eigenvectors, qo, q], ...• ~-I' may be obtained through I.beJollOJvilig optimization procedure:

(4.28)

and for i = 1,2, - ..• N - 1

(4_29)

with

J

for a $j < i

(4.10)

where IIqjll ~ m; denotes (he lengih or nannof the c()li!plex l'ectorlJi·

Altem(l1ively. the following procedure may also be I(se,d to obtain rheeig:ellv(uUl'S of the correlation matrix R, ill the ascending order:

J

(4.:n)

Gild/or i = N - 2, ... , 1,0

(4_,32)

with

J

(4.33)

J

1 In lhe: matrix algebra literature, the minimax theorem is usually stated using the Hermitian form q;HRql instead ofE[lqHx(nWl. see .Haykin (1991), for example. The method that we have adopted'here is 10 simplify some of our discussions in the following chapters. This: method has been adopted from Farhang-Boroujeny and Gazer (1992),

Properties of Eigenvalues and Eigenvectors

95

Let us assume that tbe set of vectors that satisfies the minimax optimization procedure are the unit-length vectors Po, PI-, ... ,Pi\' _ t· From Property 4 we recall. that the eigenvectors qo, ql j'" I qN _1 arc a set of basis vectors [or the N-dimef\$ional complex vector space. This implies that. we may write

N-I

Pi = L (}:ijqj, for i = 0, 1, ... , N - 1, 1"",0

(4.34)

where the complex-valued coefficients ll:iiJ> are the coordinates of the complex vectors Po, PI,··· ,PN-l in tb,e N-climeJlsiooal space spanned by tile crasis v-cotors:!'>, 'h,··· ,qN-J' Let Po be tbe unit-length complex vector which maximizes B[I~x{n)1 J. We note that

E[IJl~x(n)121 == E[Nx(ll)xH(n}poJ

H . H

= Po E[x(n)x (n)lpo

= P'lj'-Rpo.

(4.35)

Substinrtiag (4.23) in (4.35) we obtain

N-l

EUp~x(lIW"] = l:= >'i[l~qiq~Po-

1=-0

(436)

Using Property 3, we get

H •

Po q, = (to;

(4.37)

and

(4.3&)

S-ubstituting these in (4.36) we obtain

N-I

El~ x{n)\2] = L ,\\00;\2. 1=0

(4.39)

On. the other hand, we may note -that, since Ao > >'ll A2, ... ,AN -J'

{{-I lV-I

L A;laol S Ao L IO'orl2

;=0 ;=0

( 4.40)

where the equality holds (i.e, Po maximizes EUp~x(I!)12J) only wben(.tQ; = 0 for i = 1,2, ... , N - I. Furthermore, the faer the Po is constrained to the hmglb of unity implies that

N-J

LI%12=1.

<=0

(4.41 )

96

Eigen3'na/ysis and the Perform.,nce Surface

Properties of Eigenvalues and Efgenv,eclors

97

Application of(4.39) and (4.41) in (4.40) gives-

(4.42)

PrQperty8 The eigtlWall!f!s of the carreiauon matri» .R of a df .. serete-iime stationary stochastic process {~(;l)} are bounded' by the fninim{LI.Il and maximum ~ahies of the power spectral density. !Ii", ... (eJ"'). of the process.

Pn =~ooqil with 10'001 = 1.

(4043)

Tlre minimax theorem, as introduced in Property 7, views the eigenvectors of the correlation matrix of a discrete-time stochastic process as the conjugate of it set of tap-weight vectorscorresponding to a set of FIR filters which are optimized in. tlre minimax seuss introduced there. Such filters are conveniently called eigel/filters. The minimax optinrizaricn procedure suggests that tlm.eigenfi1ters may be obtained through amaximizatinn ora minimization procedure that looks at the output powers of the eigenfilters. In particular, the maxim urn and minimum eigenva:lues of R may be obtained by solving the following two independent problems, respecticely:

and this is achieved when

We may now that the factor Oiilo is, arbitrary and has no significanc-e since it does not affect the maximum in (4.42) because of the constraint 101001 = 1 in (4.43). Hence, without any loss of generality, we assutheaoo = 1. This gives

(4.44)

(4.50)

asa solution to the maximization problem

and

(4.45)

(4.51)

The fact that the solution obtained here is not unique follows from the more general fa.ct tha'( tlreeigenveetor corresponding to ali eigenvalue is always arbitrary to the extent of a scalar multiplier factor. Here, the. scalar multiplier i:s eoastrained to have a modulus of unity to satisfy the condition that both the pfand lL vectors are constrained to the 19n91h of unity;

In proceeding to lind P l s we Dote that the!;OnSlraillt (4_30)., for i = I, implies that

L(!tQ;(.z) denote the system funetion of the i th eigenfilter of the discrete-time stochastic prosess {x(n)}. Usip:.g: the Parseval's relation (equation (2.26) of Chapter 2), we obtain

(4.52)

pJ'Iqo = O.

(4.46)

With the ceustraint Ilqlll = 1, this gives

This in turn. requires 'PI to bdi.oti1.ed to a<l'inear combination or ql, CU, .. ',~_I> only. Thal is,

(4.53)

N-I

PI = E rliJqj· j~1

OD the other band, if we detlne .i;(1I) as the outPl,lt of the i th digetrliltet .of R, i.e,

(4.47)

(4.54)

Noting this and following a procedure similar to the one used to find Po, we gel

tbj!D. using the power spectral density relationships provided in Chapter 2, we obtain

(4.48)

(4.5.5)

We may also recall from the results presented in Chapter 2 that

and

(4.49)

(4.56)

Following the same procedure for the rest of the eigenvalues and eigeaveetors of R completes the proof of the first procedure of the minimax theorem.

The alternative procedure of the mimnrax theo relJj,. suggested by (4.3 J )-(4.33), can also be proved isa similar way.

Substituting (4.55) in (4.56) we obtain

(4.57)

98

E'igenanalysfs and the Perfarmam;e Surface

Properties or Eigefllfalues and Eigenvectors

99

This .result has the following interpretation. The s.fgllQ/ pOWl!F tu the output qf the ill! eige.ujffler' oJ the correlation matrix R ala stochastic process VIiC(n)} isgiven by a weigh/ltd. a'letageoj,/repoj~er speCtral dsnsiiy~f {Xenn. The weigli.till:gfiiRctWn used/or U1'froging is the sqiiarf!t/ magnitude response. pI the corresponding elgi'mjifler.

Using the above results, (4.5"0) may be written as

Using (4.66) we obtain

(4.67)

Substituting for R from (4 .. 19) and assuming that theeigenvectcrsq-, Ill, ... ,'IN _ I' are normalized to the length ofuoity2 SO that QHQ = I. we obtain.

(4.58)

(4:68)

subject to the constraint

(4.59)

Noting that A is.a diagonal matrix, this cllilaily shows that the &.Iemc.nts of x'(I1) are uncorrelated with one another.

n is worthnoting that the ith'elcmcnt of x(n) is the outputof the itheigenfilter of the correlation matrix of the process {x-(n)}, i.e, the variablex;(n)as defined by (5-.54). Thus, an alternative way of statiag Property 9 rs to say that tIre eigenfillel"s tlss~dated with a prOIi,€_FS x(n) may be selected so tha; their o!ifPIiI" samples, at any time i.m;um/ n, canstituie a set of mutually .orthogonlil raIJdom variables.

it may also be noted that by premultiplying (4.66) wifh Q and using QQH = .I, we obtain

We may also note that

(4 .. 60)

where

X(II) = Qx'(n).

(4iI9)

(4.61)

RepLacing.x'(n) by the: columa vecrot IX~(II) A) (n) ... xN_! (n)]T, and e}"'PaD9i,ug (4.69) in terms oftbe elements of x/en) and columns of Q, we get

Witb the censtraint (4.59), (4.60) simplifies to

21 1" ] Qo(efl"') ]2q,:u (e1"') dw fq,;,:~.

7T ~j1'

(4.62.)

N-I

x(,,) =2: X;(iI)q,. 1=0

(4.70)

I

Using (4,.62) in (4.58) we obtain

Tbis is known as the KaFbulIen-LtJe've e,-,:paf!sii:m.

1

(4.63)

J

I

i

EXlDllple 4.1

Fonowiu~ a similar procedure, we may also find that

(4_64)

Consider a stationary random process V{1!J1 that is generated by passing ;J. rea l-valued stationary zero-mean, unit-variance, while noise. process {lI{l1)} througl» a system with the syStem function

where

(4.65)

v1 - r2- ff(;;) =-1. ----I' -QlZ

(4.11)

Property 9 Let x(n) be all observation vector willi' the correlation matrix R. Assume-that QO,q'b ... , qJif-1 we a set o!ort}wgonaleigenvectors afR and {he matrix Qis defined as in (4.16). Then, the elem(mts.oJ lhi! vector

where uis II real-valued ceastant in the range -1 t9 + 1. We wantto ..rerify sCl"mE of the results developedabove.for the process {",(/lH.

We note lli.a.l for the Imit-va:rianEll white noise process {''In)}

(4.65)

<l!",,(z) = L

constinue a seiof uncorrelated random variables. The tronsj-ornUi:Ii(m iJejil'led by (4,66) is caned the Karhwen.-Loe.W! transform.

\!CThls ill net necessary for the above property to hold. liowe"ver,it is a useful assumption as it

simplifies Our disclJ:j.s!~n.· ,

10Cl

At 0, using (2.80) Qr Chapter 2 and noting that it is real-valued, we obtain

Eigenanalysis and the Performance Surface

Taking an. inverse a-transform, we gel

¢"",(I.:-) = oJkl, [or k = ... -1, - I, 0, 1,2, ....

Properlies of Eigel"Jllalue'sand Eigenve.ctors

101

(4.72)

From Property g we recall that the eigenvalues of the correlation matrix R are bounded by the minimum and maximum values of 'li .... (e/-'). To illustrate this, in Figures 4.2{a), (b) and (c.) we have p/Qtfed the rninimlIDl811d marimmiI ei~lll:m:sof R for values of a = 0.5,0 . .75 and 6..9, as Pi varies from 2 to zo, It lDay be noted that the limits predicted by the minimum "and maximum values of -»=(ej",) are achieved asymptotically as N increases, However, for values of a close to One) such limits are approached only when N is: very large. This may be explained using the concept of eigenfliu:rs .. We note tllai when c:r is close [0 one, the peak of the power spectral density function cI>.q(eI~') Is very narrow; see the case of a = 0.15 in Figure 4.1. To pick up this peak a-ccurately. an eigenfilter with a very narrow pass-band (i.e. high selectivity) is required. On llie other hand, a narrow- band filter can be realized only if the filter length, N, is selected long enough.

(4.1])

Using this result, we :ll:od that the correlation matrix uT an N-tap transversal filter with input

{x(n)} is

[j_, T ~-'1
a a~ ...
a ... »> (4.J4)
R= . .
_N-l aN-l
a Example 4.2

Consider the ease where the input process, {Xffj)}, to an N-tapU'a.llSversal filter consists of the summation of a zero-mean, white noise process, {u(Il)}, and a complex sinusoid. {c1(".p~Oi}, where e is an initial random phase whieb varies fOT differenr realiza tions of tile precess, The correlation matrix of {X(II)} is

:Nex:t, we present some numerical results that demonstrate the relarionships between the power spectral de~ily of the process {.~{,,)},1>.<Ae-'""), and its Corresponding eorrelation matrix,

F~re 4_1 shows a set of (be plots of <I>"",(ef.» for values of IX = O. 0.5, and 0.75. We note that Q = 0 corresponds 10 the case where {x(n)} is white and, therefore, irs. power spectral density is flat. As e inereases frem 0 t.o J, {x(n») becomes more coloured and Ior values of n close to L, most of its energy is concentrated around w == 0.

~

w6

6j \

D' \

__.J5

« a:

54

w a..

(j)a- \

a: ........

w

~2

a..

8~------~------~------~------~-------'

7

\

\

'. a=O.75 \

,~ ....

" ........ u=O

1~------~~~~----------------------~

a=0.5

el.·• .. o1,N-'" 1
f~ 1 ... eJ{N-2Jw.
R = 0;1+ e . , (4.75)
e-J(;-1).>. e-j(N-2)"", where the first term Q!J the right-hand side is the correlation matrix of the white noise 'process and the sI:C0Dd krill is thai Qfthe sinusoidal process. We are interested in finding the eigenvalues and eigenvectors ofR These are conveuiently obtained through the minimax theorem and theconcept of eigenfil ters.

Figure 4.3 shows the power spectra] density of the process {X(II)}. It consists of a fiat level which is contributed by {II(")} and an impulse at w = Wo due to the sinusoidal part of {x(nl}. The eigcnfilter that picks up rnaximunrenergy of the input is the one that is nwelled to the sinusoidal part of the input. The coefficients of this filler are the elements.of the eigenfiltcr

(4.76)

The factor l/.JN in (4.76) is to normalize 'l!l to the length of uniry, The Vector qu can easily be confirmed LO be an eigenvector of R by evalnati ng Rqo and noting that this gives

(4.77)

This also sh0WS that the eigenvalue 'corresponding to the eigenvector qo is

(4.78)

..... _ - --

._ . - .. - . - . - . - ,-_ ~- -: :. = . =- .. - . - .. -

OL_ ~ L_ ~ l_ __

o 0.1 0.2 0.3 0.4 0.5

NORMALIZED FREQUENCY

Figure 4.1 Power spectral density of {x{n)} for dltlerenl values 01 the parameter f"<

Also. [rom the minimax theorem, we note that the ftS! of the eigenvectors of R have to be orthogonal to qo. i.e,

(4.79)

1(12

Efgenanalysis and the Performance' Surfaoe

10°

10~~--------~----------_L----------~--------~

o 5 10 15 20 N

(a)

,q,max

_w _

q.,mlll xx

10°

A .

mill

o

5

10 N (b)

15

20

Figure 4.2 Minimum and m~l(imlJm elg'Emvalues of the eorretatlon matrix for d [fferelltvalu!'js of the parameter n':(a) 0: = 0.5, (b) tl< = 0.75, (e) a = 0_9

Properti'es of Ej,genvalu-es and Eigenv,ectors

104'--------'------------''----------''------.-....J

o 5 10 rs N

(c)

Figure 4..2 Conti!luf}o

5·.-------.-------.-------.-------.-------.

4.5

0.5

;: 4 Ci5

Z 3.5 w

Cl

_j 3 <

a:

t)2.5 lJJ

@) 2

a:

~1.5

o

a,

0.2 0.4 0.6 O.B

NORMALIZED FREOUENCY

103

20

Figure 4.,3 Power spectral (lensi'ty of the process {x(n)} consislIng of a white noise plusa single lone sinusoidal sigflal

104

Eigenanalysis and the Performance Surface

Using-ibis, it is not difficulL (see Problem P4.7) to show that

Rq( = U;qit for i = I, 2, ... , N - I.

(4:80)

This result showsthat ·as long <ci (4.79) hold, the eigenvectors q 1.!Il, •.. ,!Lv -1 of R are arbitrary. In other words, any set of vectors which belongs to the subspace orthogonal to the eigenvector ql) makes an -aceeprable set for the rest of the eigenvectors of R Furthermore, the eigenvhlues corresponding to these eigenvectors an: all equal to u!.

4.3 The Performance Surface

With the background developed so far, we are now ready to proceed with. exploring the performanee surfaeeof transversal Wiener filters. We start with the case where the'; filter coefficients, input and desired output are real-valued. The results will then be extended to the complex-valued case.

We recall from Chapter 3 that the performance function of a transversal Wiener filler with a real-valued inpur sequence x(n) and a desired OUtpUL sequence d(n) is

(4.81)

where the superscript T denotes vector or matrix transpose w = lWu W1 ••• lVN_l]T is the filter tap-weight vector, R = E[x(n)x r (1'1)1 is the correlation matrix of the .filter tap-input vector x(n) = [x(n) X(71- 1) ... X(I1- N + I W, and Jl = E[d(n)x(n») is the cross-ccrrelatien vector between d(l1) and x{n). We want to stntiy the shape of the performance function e when it is viewed as a surface in the (N + J )-dimensional Euclidian space constituted by the filter tap weights Wi, i = 0, I, ... ,N - 1 ,and the performance function, ~.

Also, we recall that the optimum value of the Witmer filter tap'-weighl vector is obtained frorn theWiener-Hopf equation

(4.82)

The performance function' may be rearranged as: follows:

(4.83)

where we have aoted tliat wT ¥ = p1'w. Nex::, we substitute fOT pin (4.83} frorn (4.82) and add and subtract the term woRwo to obtain

(4.84)

Since RT = R, the first four terms on the ri.ght-haud side of (4.84) can be combined [0 obtain

(4.85)

The Performanc{J Surface

105

1

2

4

5

3

Figure 4.4 A typical performance surface of.a two-tap transversal filter

We may also recall from Chapter 3 that

(4.86)

where (min is theminimum value of ~ which is obtained when \\' = wII. Substituting (4.86) in (4)~5), we ~et

(4.&7)

This re-sult has the following. interpretation. The Don-negative definiteness of the correlation matrix R implies that the second term on the right-hand side of (4.87) is non-negative. When R is positive definite (a case very likely to. happen in practice), the second term OD the right-hand side of(4.87) is zero only when Vi' = w", and in that case. € coincides with its mi.ni:m.mn value. This is depicted in Figure 4.4 where a lypic-oJ performance surface of a two-tap Wiener filter is presented by a SeL of contours which correspond to different levels -of t;, and

£mill<6 <Q< ....

To proceed further, we define the vector

v~w-w"

(4.88)

106

Eigenanalysisand the Perfo1'm~nce SUrface

and substitute it in (4-.87) to obtain

(4.89)

Tijis simpler form of the 'performance funorion in effect is equivalent to shifting the origin of the N-dimensional Euclidian space defined by the elements of w to the point W = wQ' The new Euclidian space has a new set of axes given by va, Vi," . ,V'N-l; see Figure 4.4. These are in parallel wrth the original axes Wo. I~'!, ... , wiI'_I' Obviously, the shape of the performance surface is not affected by the shift in the orlgin.

To simplify (4.89) further, we use the unitary similarity transformation Le. (4.19) of the previous section, which for real-valued signals is written as

(4.90)

Substituting (4.90) in (4.89) we obtain

(= {min + vTQAQTv.

(4.91)

We define

and note that multiplication of the vector v by the unitary matrix QT is equivalent to rotating the I}-axes toa new set ofaxe.s gives by vb, v: , . . 'lI~. _ I, as depicted ill Figu re 4.4. The new ax:e~ arein the directions specified by the rows of the transformation matrix QT. We may further aote that the rows of Q T are the eigenvectors of the correlation matrix R. This means that the J -axes, defined by (4:92), are in the directions of the basis vectors specified by the eigenvectors of R.

Snbstituting (4.92) in (4.91) we obtain

~ = ~"'in + vITAv'.

J

(4.93)

This in known as the canonical form of the performance function, Expanding (4,93) in terms of the elements of the vector v' and the diagenal elements of the matrix l1., We ~l

N-I

.; = {min + L A{uf

;-=;;-0

(4.94)

This, when compared with the previous forms of the performance funerion in (4.81) and (4 . .89), is a much easier function to visualize, l.J;J. particular, if all the variables 'Va, v'1 , ••• , IJ~ _I e except v~, are set to' zero, then

(4.95)

This is a parabola whnse minimum occurs at vic = 0, The parameter Ak determines the shape of the parabola, in the sense that for .smaller values of Ak the resulting parabolas

the Performance Surface

107

0.9
0.8
0.7
0.6
0::
-rEa. 5
>V'
0.4
o.a
0.2
0.1
0
-1 -0.5 o 0.5

v'

k

Figure 4.5 The effect oteigenvalues on the shape sf the performance fu notion when en!y one 91 the tillw tap weights is v;:,!l'ieQ

are wider (flatter in shape) when compared with those obtained for la~ger values of ;\". TIlls is demonstrated if Figure 4.5 where /,£, as a function ofdlc> is plotted for a few values of Ak.

When all varia bles 'lI~, 1111 •.• 1 v;v _ 1 are varied simultaneously, the performance function €, in the (N + 1)-dimen,~1onal Euclidian space, is a hyperparabela, The path, traced byeas we. move alDng any Df the axes va, u;, .... vI. _ ( is a parabola whose shape is determined by the corresponding eigenvalue.

The hyperparabola shape of the perfcrmancc surface can be best understood in the case of a two, tap filter when, the performance surface can easily be visualized in. the Jdimensional Eucl idian space whose axes axe the two independenttaps of the filter and the function ~; see Figure 3.4 as an example, Alternatively, the contour plots, such as those presented in Figure 4.4, may be used to visualize the performance surface in a very convenient way.

FOI N = 2, the canonical form of the perfbrmance function is

(4.96)

This may be rearranged as

(')1 (')2

Vo .r. ~ = I

all , at

(4.97)

108

Eigenanalysis and the Performance Surfaoe

Sr---~----~---'----~----~---'

2

-2

-3 l___ __ --'-- --'-_-'- --'- L___--'

-a

-1

2

3

o

v' o

-2

FigUre 4.6 A typical pial of the ellipse defined by (4.97)

where

a _It -(rru'n

0- ---

>"0

(4.98)

and

(4.99)

Equation (4.97) represents an ellipse whose pr.neipal axes are alollgv~ and 1I~, and for (11 > (la, 'the lengths of its. major and minor principal axes are 201 and 2ao, respectively. These are highlighted in Figure 4.6, where a typical plot of the ellipse defined by (4.97) is presented, We may also note tha"tttl /110 = .j >.,;/ >'1. This implies Uta! for a particular performance surface the aspect ratio of the contour ellipses is fixed and is equal to the square root of the ratio of its eigenvalues. In other words, the eccentricity of the contour ellipses of a performance surface is determined by the ratio of the eigenvalues of the corresponding correlation matrix. A larger ratio of the eigenvalues results in more- eccentric ellipses and, thus. a narrower bowl-shape performance surface.

The Performance Surface

109

Example 43

Consider the ease where a two-tap transversal Wiener filter is characterized by the following parameters:

We. want LO explore the perforrna:uce surface of [his filter for values of u .ranglag rrorn 0 to 1.

The performance function of the filter is obtained by substituting the above parameters in (4.81). Thi!! gives

~ = [WD IVtl·[l a] [!Po] -2[1 11["'°] + 2.

. a. J W] WI

(4.100)

SQJvin-g the Wiener -Hopf equation LO obtain the optimum tap weights o:fthe filter, we obtain

[WD.ll] =R-Ip"" [1 "'1-I[l] = [I~tll.

11'0,1 Dc I I 1

1 +cx

(4.101)

Using this result, W_e gel

(4.102)

Also,

20; [' 11] [t1U]

= 1 + 0'. -t- [Vo lid Q 1 VI'

(4.103)

To convert this to its canonical form, we should first find the eigenvalues and eigen vectors of R. To find the eigeavalues of R, we should solve the characteristic equation

I), -I -(} I

det(XI - R)= = D.

-Q_ ).,-1

(4.104)

Expanding (4. 104), we obtain

(), - 1 f - (:i = 0,

which gives

(4.J 05)

and

(4.106)

110

EigenanalYs;s and the P,erformance Surface

[-\J'~ I -ti.] I[q."ou] =0

-Q -\J - 1. qDI

.(4.107)

-,

and

[,),1 - 1 -0;

--a ], [qT~] _ 0

..\1- t qll -.'

(4.108)

Substituting (4. Um IIlld C4.1(6) io (4.Hi7) >lud' (4, 1G8), respecti·vely, we ,tb¢aio

Using rhese:res\l.lls a:nd normalizing qQ andq, to have lengths Qfl,lIlHy, we obtain

1

1 ![ I ]

and 'ql = f;i .'

v2,-1

II may be noted thatth(H.'ligenvectors. qo and q] ofR are indej;)el1d\lot of'the parameter IX, This is an interesting.property of the correlation matrices of two-tap transversal filters which Implies that the .J·ljxes are always obtained. by a 45 degree rotatien of the t!-ax~. The eigenvectors associated with the eerrelation matrices of three- tap transversal filkts also h ave so me special (OrIll. This is discussed ill Problem 1'4.5.

Wf.tll the above results, we .gel

[vb.] I. [I

'U~ = .;"2 .. 1

(4.109)

and

(4.110)

Figures 4.7(a.), (b), and (c) show tbe comour plots of the penoU\l1l1lce surface of the two-tap transversal filler for a = 0.5, 0 .. 8 and 0.95, whiehcortespond to (he eigenvalue.ratios af], 9 and 39. resj?eotiv'ety. Th6$e plots clearly show how the ~ntriyi):yof theperformance surface changes as (he eigenvalue ratio of the correlation matrix R inereases.

The above results may be generalized as follows. The performance surface 'of an N -tap transversal filterwith real-valued data i!) a !;typerPau:bQloid ill the (N + I)-dimensional Euctidian space whose axes are ihe N ulp-weight variables of the fllter and the performance function {. The performance function may also be represented by a set of uyperel1ipses in the N-dimWisional Euclidian space ofthe filter tap-weight variables. Each h)lpcreUipse corresponds to a fixed value of €. The direclions ofthe principal axes of the hyperellipse.s .are determined by theeigeaveetors of the correlation matrix R. The size of the various princjpaJ axes of each hyperellipse are proportioaal to the square fool of the inverse of the corresponding: eigenvalues. Thus, the eccentricity of the hyperellips0s is determined by the spread of theeigenvalues of the.eorrelatien matrix R. This shows that the shape of the performance surface of'a Wiener FIR filter is directly related to the spread of the eigenvaluesof R. In additlon, from Property .8 oflb,e eigenvalues and

The perform<l.nce Surface

111

-11
-3
-4
-s
-5 0 5
we
(a)
5

4 ~4

4.9

_5~~----~----~_L--~------~~~

-5

5

(b)

Figure 4.1 Performancitsllrisce ot a two-tap tran:sversa'l filt,er lor ,ditfe~entaige.nval.ue spread

oj R: (a) >.01>'1 = 3, (b) ),,0/.\1 = 9, (c)).,;/ A\ = 39 .

112

Eigenarralysls and the Performance Surface

(e)

Figure 4.7 Contlnuad

eigenvectors we .recall that the spread of U1C eigenvalues of the correlation matrix of a stochastic process {X(fI)} is directly linked to the variation ill the power spectral density function <I>",,,(e-'Jd) of the process, 11J.i& in turn, means that there is a close relationship between the power spectral density ofa random process and the-shape of the performance surface of an FIR Wiener filter for which the latter is used as in put.

The above results cau easily be extended to the case where the filter coefficients, input and desired output are complex-valued. We should remember that the elements of all the involved vectors and matrices are complex-val ued and replace all the transpose operators in the developed equations by Hermitian operators. Doing this (4.93) becomes

(4.111)

This can be expanded as

.'.1-1

€ = €min +L: ,\1'11\12.

i'=(\

(4.1 (2)

The difference between this result and its dual (for the real-valued case) in (4.94) is an additional modulus sign on the drs, in (4.112). This, of course, is due to the fact that here the v~sate complex-valued.

The performance function { of (4. ! 12) may be thought of as a hyperparabola in the (N + 1 )-dimensional Space whose first N axes are defined by the complex-valued

Problems

113

variables, ilie'lJ:s and its (N +l)th axis is the real-valued performance fnnctiOll~. To prevent such a.mixed domain and have a clearer picture of the performance surface in the case of complex signals, we DJ..a)' expand (4.112) further by replacing V; With vtR + iJ;). where 1/~1t and v~,J are the real and imaginary parts of t1;. With this, we obtain

/II-I

{ =€l11in + L ArC V0R + vfd·

;=0

(4.113)

Here, uta and 1J~1 are both real-valued variables. Equation (4.113) shows that II/e performance surface of all N-tap transversal Wiener filter Willi complex-valued coefficients is a hyperparaboia ill the (2N + I )-dimcnsional Euclidian space of the variables consisting oj the real «TId imaginary parts of the filler coefficients and the performance function.

Problems

P4.J Consider the performance Iuncticn

(i) Convert this to its canonical form.

(ii) Plot the set of contour ellipses of the performance surface of ~ [or values of (= 1 2, 3 and 4.

P4.2 R is a correlation maITL'!:.

(i) Using the unitary similarity transformation, show that fer any integer II

Rfi = Q.A"QH.

(ii.) The matrix RIJ2 Witll the property R 1j1R 1/2 = R is.defined as the square-root of R.

Show that

(iii) Show that the identity

is valid for any rational number a.

1~4.3. Consider the correlation .matrlx R of an N x 1 observation vector X{fl), and an arbitrary N x N unitary transformatiun matrix U. Define the vector

xu(n) = Ux(,,)

and its corresponding correlation matrix Ru = E[xu(n)xS(n)].

114

Eige.nanalysis and fhe Pe.rform·ance Surface

(I) Sh.DW that H, and Ru share the same set 'of eigenvalues.

(li) Find an expression for theeigenvectors of Ru "in terms.of Lt.e eigtiDvectO[so[ Rand the transformatien matrix U.

P4.4 In Example 4.3 we noted that the eigenvectors of the correlation matrix of alLY two-tap transversal fitter with real-valued input are fixed and are

Plot the magnitude responses of the eigenfilters defined by qo and q'l and verify that qp corresponds to a: lowpass filter and ql eorresponds 10 a hlghpassone, How do you relate this observation with the minimax theorem?

-P4.5 Consider the correlation matrix R of a three-tap transversel filter with a realvalued input ."«(11).

(i) Show that when E[x"(II)] = 1, R has the form

[I PI

R=. PI 1 fh P·I

1

(ii) Show that

J

1

is an eigenvector of R and find its corresp-onding eigenvalue. (iii) Show that the other eigenvectors of R arc

for j = 1,2

J

where

1

Find the eigenvalues that correspond to ql tl(ldCJ1.

Problems

115

(iv) For the following numerical values plot the magnitude responses of the-eigenfihers defined by Ilo,QI and q,2 and find thai in all case.s these co Hf;)SpO rtd to bandpass, Icwpass and b~gbpass filters, respectively:

PI P2
05 0.25
0..:8 0 . .30
M -0.4 How do you relate this observation to the minimax theorem?

P4.6 Consider the correlation matrix R af aa observation vector.x('i1) .. Define the vector

X(II) = R-1/2x(n),

where a-1j:l is the inverse of Rl/2, and R 1/2 is defined as in Problem P4.2. Show thai the correlation matrix of X (II) isthe identity matrix.

P4.,7 Consider the case discussed in Example 4,2, and the eigenvector qo as defined by (4.76). Show that any vector q] thi;l.t is orthogenal Wllo (i,e. Qiliqo = 0) is a solurion to the eq-uation

P4.8 The determinant of an N × N matrix A can be obtained by iterating the equation

det(A) = Σ_{j=0}^{N−1} a_{0j} cof_{0j}(A),

where a_{ij} is the ijth element of A, and cof_{ij}(A) denotes the ijth cofactor of A, which is defined as

cof_{ij}(A) = (−1)^{i+j} det(A_{ij}),

where A_{ij} is the (N−1) × (N−1) matrix obtained by deleting the ith row and jth column of A. This procedure is general and applicable to all square matrices. Use this procedure to show that

(i) equation (4.24) is a valid result for any arbitrary square matrix A,

(ii) for any square matrix A,

det(A) = Π_{i=0}^{N−1} λ_i,

where the λ_i s are the eigenvalues of A.
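The cofactor recursion of this problem translates directly into code. The following Python sketch (illustrative only; it is exponentially slow compared with library routines) expands along row 0 exactly as in the equation above, and checks the result against numpy.linalg.det and the product of eigenvalues from part (ii):

    import numpy as np

    def det_cofactor(A):
        # determinant by cofactor expansion along row 0
        N = A.shape[0]
        if N == 1:
            return A[0, 0]
        total = 0.0
        for j in range(N):
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # A_0j
            total += A[0, j] * (-1) ** j * det_cofactor(minor)
        return total

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    print(det_cofactor(A))                  # 8.0
    print(np.linalg.det(A))                 # 8.0
    print(np.prod(np.linalg.eigvals(A)))    # 8.0, in agreement with part (ii)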


P4.9 Give a proof for the minimax procedure suggested by (4.31)–(4.33).

P4.10 Consider a filter whose input is the vector x̄(n), as defined in Problem P4.6, and its output is y(n) = w̄^T x̄(n), where w̄ is the N × 1 tap-weight vector of the filter. Discuss the shape of the performance surface of this filter.

P4.11 Work out the details of the derivation of (4.70).

P4.12 Give a detailed derivation of (4.111).

P4.13 The input process to an N-tap transversal filter is

x(n) = α1 e^{jω1 n} + α2 e^{jω2 n} + ν(n),

where α1 and α2 are uncorrelated, complex-valued, zero-mean random variables with variances σ1² and σ2², respectively, and {ν(n)} is a white noise process with variance unity.

(i) Derive an equation for the correlation matrix R of the observation vector at the filter input.

(ii) Following an argument similar to the one in Example 4.4, show that the smallest N − 2 eigenvalues of R are all equal to unity.

(iii) Let

s1 = [1  e^{jω1}  ...  e^{jω1(N−1)}]^T   and   s2 = [1  e^{jω2}  ...  e^{jω2(N−1)}]^T.

Show that the eigenvectors corresponding to the largest two eigenvalues of R are

q1 = α00 s1 + α01 s2   and   q2 = α10 s1 + α11 s2,

where α00, α01, α10 and α11 are a set of coefficients to be found. Propose a minimax procedure for finding these coefficients.

(iv) Find the coefficients α00, α01, α10 and α11 of part (iii) in the case where s1^H s2 = 0. Discuss the uniqueness of the answer in the cases where σ1² ≠ σ2² and σ1² = σ2².

P4.14 Equation (4.113) suggests that the performance surface of an N-tap FIR Wiener filter with complex-valued input is equivalent to the performance surface of a 2N-tap filter with real-valued input. Furthermore, the eigenvalues corresponding to the latter surface appear with multiplicity of at least two. This problem suggests an alternative procedure which also leads to the same results.


(i) Show that the Hermitian form w^H R w, with w = w_R + j w_I and R = R_R + j R_I, may be expanded as

w^H R w = [w_R^T  w_I^T] [ R_R  −R_I ] [ w_R ]
                         [ R_I   R_R ] [ w_I ],

where the subscripts R and I refer to the real and imaginary parts.

Hint: Note that R_I^T = −R_I, and this implies that for any arbitrary vector v,

v^T R_I v = 0.

(ii) Show that the equation

R q = λ q    (p4.14-1)

implies

[ R_R  −R_I ] [ q_R ]     [ q_R ]
[ R_I   R_R ] [ q_I ] = λ [ q_I ].

Also, multiplying (p4.14-1) through by j = √−1, we get R(jq) = λ(jq). Show that this implies

[ R_R  −R_I ] [ −q_I ]     [ −q_I ]
[ R_I   R_R ] [  q_R ] = λ [  q_R ].

Relate these with (4.113).

5

Search Methods


In the previous two chapters we established that the optimum tap weights of a transversal Wiener filter can be obtained by solving the Wiener–Hopf equation, provided the required statistics of the underlying signals are available. We arrived at this solution by minimizing a cost function that is a quadratic function of the filter tap-weight vector. An alternative way of finding the optimum tap weights of a transversal filter is to use an iterative search algorithm that starts at some arbitrary initial point in the tap-weight vector space and progressively moves towards the optimum tap-weight vector in steps. Each step is chosen so that the underlying cost function is reduced. If the cost function is convex (which is so for the transversal filter problem), then such an iterative search procedure is guaranteed to converge to the optimum solution. The principle of finding the optimum tap-weight vector by progressive minimization of the underlying cost function by means of an iterative algorithm is central to the development of adaptive algorithms, which will be discussed extensively in the forthcoming chapters of this book. Using a highly simplified language, we might state at this point that adaptive algorithms are nothing but iterative search algorithms derived for minimizing the underlying cost function with the true statistics replaced by their estimates obtained in some manner. Hence, a very thorough understanding of the iterative algorithms, from the point of view of their development and convergence properties, is an essential prerequisite for the study of adaptive algorithms. This is the subject of this chapter.

In this chapter we discuss two gradient-based iterative methods for searching the performance surface of a transversal Wiener filter to find the tap weights that correspond to its minimum point. These methods are idealized versions of the class of practical algorithms which will be presented in the next few chapters. We assume that the correlation matrix of the input samples to the filter and the cross-correlation vector between the desired output and filter input are known a priori.

The first method that we discuss is known as the method of steepest descent. The basic concept behind this method is simple. Assuming that the cost function to be minimized is convex, we may start with an arbitrary point on the performance surface and take a small step in the direction in which the cost function decreases fastest. This corresponds to a step along the steepest-descent slope of the performance surface at that point. Repeating this successively, convergence towards the bottom of the performance surface, at which point the set of parameters that minimize the cost function assume


their optimum values, is guaranteed. For the transversal Wiener filters we find that this method may suffer from slow convergence. The second method that we introduce can overcome this problem at the cost of additional complexity. This, which is known as Newton's method, takes steps that are in the direction pointing towards the bottom of the performance surface.

Our discussion in this chapter is limited to the case where the filter tap weights, input and desired output are real-valued. The extension of the results to the case of complex-valued signals is straightforward and is deferred to the problems at the end of the chapter.

5.1 Method of Steepest Descent

Consider a transversal Wiener filter as in Figure 5.1. The filter input, x(n), and its desired output, d(n), are assumed to be real-valued sequences. The filter tap weights, w0, w1, ..., w_{N−1}, are also assumed to be real-valued. The filter input and tap-weight vectors are defined, respectively, by the column vectors

x(n) = [x(n)  x(n−1)  ...  x(n−N+1)]^T    (5.2)

and

w = [w0  w1  ...  w_{N−1}]^T,

where the superscript T stands for transpose. The filter output is

y(n) = w^T x(n).    (5.3)

Figure 5.1 A transversal filter

We recall from Chapter 3 that the optimum tap-weight vector w_o is the one that minimizes the performance function

ξ = E[e²(n)],    (5.4)

where e(n) = d(n) − y(n) is the estimation error of the Wiener filter. Also, we recall that the performance function ξ can be expanded as

ξ = E[d²(n)] − 2w^T p + w^T Rw,    (5.5)

where R = E[x(n)x^T(n)] is the autocorrelation matrix of the filter input and p = E[x(n)d(n)] is the cross-correlation vector between the filter input and its desired output. The function ξ (whose details were given in the previous chapter) is a quadratic function of the filter tap-weight vector w. It has a single global minimum which can be obtained by solving the Wiener–Hopf equation

Rw_o = p,    (5.6)

if R and p are available. Here, we assume that R and p are available, but resort to a different approach to find w_o. Instead of trying to solve equation (5.6) directly, we choose an iterative search method in which, starting with an initial guess for w_o, say w(0), a recursive search method that may require many iterations (steps) to converge to w_o is used. An understanding of this method is basic to the development of the iterative algorithms which are commonly used in the implementation of adaptive filters in practice.

The method of steepest descent is a general scheme that uses the following steps to search for the minimum point of any convex function of a set of parameters:

1. Start with an initial guess of the parameters whose optimum values are to be found for minimizing the function.

2. Find the gradient of the function with respect to these parameters at the present point.

3. Update the parameters by taking a step in the opposite direction of the gradient vector obtained in Step 2. This corresponds to a step in the direction of steepest descent in the cost function at the present point. Furthermore, the size of the step taken is chosen proportional to the size of the gradient vector.

4. Repeat Steps 2 and 3 until no further significant change is observed in the parameters.

To implement this procedure in the case of the transversal filter shown in Figure 5.1, we recall from Chapter 3 that

∇ξ = 2Rw − 2p,    (5.7)

where ∇ is the gradient operator defined as the column vector

∇ = [∂/∂w0  ∂/∂w1  ...  ∂/∂w_{N−1}]^T.    (5.8)

According to the above procedure, if w(k) is the tap-weight vector at the kth iteration, then the following recursive equation may be used to update w(k):

w(k+1) = w(k) − μ∇_k ξ,    (5.9)


where μ is a positive scalar called the step-size, and ∇_k ξ denotes the gradient vector ∇ξ evaluated at the point w = w(k). Substituting (5.7) in (5.9), we get

w(k+1) = w(k) − 2μ(Rw(k) − p).    (5.10)

As we shall soon show, the convergence of w(k) to the optimum solution w_o and the speed at which this convergence takes place are dependent on the size of the step-size parameter μ. A large step-size may result in divergence of this recursive equation.

To see how the recursive update of w(k) converges towards w_o, we rearrange (5.10) as

w(k+1) = (I − 2μR)w(k) + 2μp,    (5.11)

where I is the N × N identity matrix. Next, we substitute for p from (5.6). Also, we subtract w_o from both sides of (5.11) and rearrange the result to obtain

w(k+1) − w_o = (I − 2μR)(w(k) − w_o).    (5.12)

Defining the vector v(k) as

v(k) = w(k) − w_o    (5.13)

and substituting this in (5.12), we obtain

v(k+1) = (I − 2μR)v(k).    (5.14)

This is the tap-weight update equation in terms of the v-axes (see Chapter 4 for further discussion on the v-axes). This result can be simplified further if we transform these to the v'-axes (see (4.92) of Chapter 4 for the definition of the v'-axes). Recall from Chapter 4 that R has the following unitary similarity decomposition:

R = QΛQ^T,    (5.15)

where Λ is a diagonal matrix consisting of the eigenvalues λ0, λ1, ..., λ_{N−1} of R and the columns of Q contain the corresponding orthonormal eigenvectors. Substituting (5.15) in (5.14) and replacing I with QQ^T, we get

v(k+1) = (QQ^T − 2μQΛQ^T)v(k) = Q(I − 2μΛ)Q^T v(k).    (5.16)

Premultiplying (5.16) by Q^T and recalling the transformation

v'(k) = Q^T v(k),    (5.17)

we obtain

v'(k+1) = (I − 2μΛ)v'(k).    (5.18)

The vector recursive equation (5.18) may be separated into the scalar recursive equations

v'_i(k+1) = (1 − 2μλ_i)v'_i(k),  for i = 0, 1, ..., N−1,    (5.19)

where v'_i(k) is the ith element of the vector v'(k).

Starting with a set of initial values v'_0(0), v'_1(0), ..., v'_{N−1}(0) and iterating (5.19) k times, we get

v'_i(k) = (1 − 2μλ_i)^k v'_i(0),  for i = 0, 1, ..., N−1.    (5.20)

From (5.13) and (5.17) we see that w(k) converges to w_o if and only if v'(k) converges to the zero vector. But (5.20) implies that v'(k) can converge to zero if and only if the step-size parameter μ is selected so that

|1 − 2μλ_i| < 1,  for i = 0, 1, ..., N−1.    (5.21)

When (5.21) is satisfied, the scalars v'_i(k), for i = 0, 1, ..., N−1, exponentially decay towards zero as the number of iterations, k, increases. Furthermore, (5.21) provides the condition for the recursive equations (5.20) and, hence, the steepest-descent algorithm to be stable. The inequalities (5.21) may be expanded as

−1 < 1 − 2μλ_i < 1

or

0 < μ < 1/λ_i,  for i = 0, 1, ..., N−1.    (5.22)

Noting that the step-size parameter μ is common for all values of i, convergence (stability) of the steepest-descent algorithm is guaranteed only when

0 < μ < 1/λ_max,    (5.23)

where λ_max is the maximum of the eigenvalues λ0, λ1, ..., λ_{N−1}. The left limit in (5.23) refers to the fact that the tap-weight correction must be in the opposite direction of the gradient vector. The right limit is to ensure that all the scalar tap-weight parameters in the recursive equations (5.19) decay exponentially as k increases.

Figure 5.2 depicts a set of plots that shows how a particular tap-weight parameter v'_i(k) varies as a function of the iteration index k and for different values of the step-size parameter μ. The cases considered here correspond to the typical distinct ranges of μ, referred to as overdamped (0 < μ < 1/2λ_i), underdamped (1/2λ_i < μ < 1/λ_i), and unstable (μ < 0 or μ > 1/λ_i).


Figure 5.2 Convergence of v'_i(k) as a function of the iteration index k, for different values of the step-size parameter μ: (a) overdamped case: 0 < μ < 1/2λ_i, (b) underdamped case: 1/2λ_i < μ < 1/λ_i, (c) unstable: μ < 0, (d) unstable: μ > 1/λ_i
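The scalar recursions (5.19)–(5.20) and the bound (5.23) are easy to verify numerically. The following Python sketch (a minimal illustration; R and p are chosen arbitrarily) iterates (5.10) and compares the decay of each mode with the prediction (5.20):

    import numpy as np

    R = np.array([[1.0, 0.75],
                  [0.75, 1.0]])
    p = np.array([1.0, -1.0])
    w_o = np.linalg.solve(R, p)            # Wiener-Hopf solution (5.6)
    lam, Q = np.linalg.eigh(R)

    mu = 0.3                               # satisfies 0 < mu < 1/lam.max(), cf. (5.23)
    w = np.zeros(2)
    for k in range(50):
        w = w - 2 * mu * (R @ w - p)       # steepest-descent recursion (5.10)

    vprime = Q.T @ (w - w_o)               # v'(k), cf. (5.17)
    pred = (1 - 2 * mu * lam) ** 50 * (Q.T @ (-w_o))   # (5.20) with w(0) = 0
    print(vprime, pred)                    # the two agree to machine precision

Choosing mu above 1/lam.max() makes the same loop diverge, in agreement with (5.23).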

We may now derive a more explicit formulation for the transient behaviour of the steepest-descent algorithm in terms of the original tap-weight vector w(k). We note that

w(k) = w_o + v(k) = w_o + Qv'(k)
     = w_o + [q0  q1  ...  q_{N−1}] [v'_0(k)  v'_1(k)  ...  v'_{N−1}(k)]^T
     = w_o + Σ_{i=0}^{N−1} q_i v'_i(k),    (5.24)

where q0, q1, ..., q_{N−1} are the eigenvectors associated with the eigenvalues λ0, λ1, ..., λ_{N−1} of the correlation matrix R. Substituting (5.20) in (5.24) we obtain

w(k) = w_o + Σ_{i=0}^{N−1} v'_i(0)(1 − 2μλ_i)^k q_i.    (5.25)

This result shows that the transient behaviour of the steepest-descent algorithm for an N-tap transversal filter is determined by a sum of N exponential terms, each of which is controlled by one of the eigenvalues of the correlation matrix R. Each eigenvalue λ_i determines a particular mode of convergence in the direction defined by its associated eigenvector q_i. The various modes work independently of one another. For a selected value of the step-size parameter μ, the geometrical ratio factor 1 − 2μλ_i, which determines how fast the ith mode converges, is determined by the value of λ_i.

Example 5.1

Consider the modelling problem depicted in Figure 5.3. The input signal, x(n), is generated by passing a white noise signal, ν(n), through a colouring filter with the system function

H(z) = √(1 − α²) / (1 − αz^{−1}),    (5.26)

where α is a real-valued constant in the range −1 to +1. The plant is a two-tap FIR system with the system function

P(z) = 1 − 4z^{−1}.

An adaptive filter with the system function

W(z) = w0 + w1 z^{−1}

is used to identify the plant system function. The steepest-descent algorithm is used to find the optimum values of the tap weights w0 and w1. We want to see, as the iteration number increases, how the tap weights w0 and w1 converge towards the plant coefficients 1 and −4, respectively. We examine this for different values of the parameter α.

From the results derived in Example 4.1 of Chapter 4, we note that

E[x²(n)] = 1   and   E[x(n)x(n−1)] = α.

Figure 5.3 A modelling problem


These give

R = E[x(n)x^T(n)] = [ 1   α ]
                    [ α   1 ],    (5.27)

where x(n) = [x(n)  x(n−1)]^T. Furthermore, the elements of the cross-correlation vector p = E[x(n)d(n)] are obtained as follows:

p0 = E[x(n)d(n)] = E[x(n)(x(n) − 4x(n−1))] = E[x²(n)] − 4E[x(n)x(n−1)] = 1 − 4α

and

p1 = E[x(n−1)d(n)] = E[x(n−1)(x(n) − 4x(n−1))] = E[x(n−1)x(n)] − 4E[x²(n−1)] = α − 4.

These give

p = [ 1 − 4α ]
    [ α − 4  ].    (5.28)

Substituting (5.27) and (5.28) in (5.11), we get

[ w0(k+1) ]   [ 1 − 2μ   −2μα   ] [ w0(k) ]        [ 1 − 4α ]
[ w1(k+1) ] = [ −2μα     1 − 2μ ] [ w1(k) ] + 2μ   [ α − 4  ].    (5.29)

Starting with an initial value w(0) = [w0(0)  w1(0)]^T and letting the recursive equation (5.29) run, we get two sequences of the tap-weight variables, w0(k) and w1(k). We may then plot w1(k) versus w0(k) to get the trajectory (path) that the steepest-descent algorithm follows. Figures 5.4(a), (b), (c) and (d) show four such trajectories that we have obtained for values of α = 0, 0.5, 0.75 and 0.9, respectively. Also shown in the figures are the contour plots that highlight the performance surface of the filter. The convergence of the algorithm along the steepest-descent slope of the performance surface can be clearly seen. The results presented are for μ = 0.05 and 30 iterations, for all cases. It is interesting to note that in the case α = 0, which corresponds to a white input sequence, x(n), the convergence is almost complete within 30 iterations. However, the other three cases require more iterations before they converge to the minimum point of the performance surface. This can be understood if we note that the eigenvalues of R are λ0 = 1 + α and λ1 = 1 − α, and for α close to one, the geometrical ratio factor 1 − 2μλ1 may be very close to one. This introduces a slow mode of convergence along the v'_1-axis (i.e. in the direction defined by the eigenvector q1).
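This experiment is easy to reproduce; a minimal Python sketch (NumPy assumed; R and p are used in the closed forms (5.27) and (5.28) rather than estimated from data) follows:

    import numpy as np

    alpha, mu = 0.75, 0.05
    R = np.array([[1.0, alpha], [alpha, 1.0]])    # (5.27)
    p = np.array([1 - 4 * alpha, alpha - 4])      # (5.28)

    w = np.zeros(2)
    trajectory = [w.copy()]
    for k in range(30):
        w = w - 2 * mu * (R @ w - p)              # (5.29) in vector form
        trajectory.append(w.copy())
    print(trajectory[-1])                         # approaches the plant coefficients [1, -4]

Plotting the second element of each point against the first reproduces trajectories of the kind shown in Figure 5.4.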

5.2 Learning Curve

Although the recursive equations (5.19) and (5.24) provide detailed information about the transient behaviour of the steepest-descent algorithm, the multi-parameter nature of the equations makes it difficult to visualize such behaviour graphically. Instead, it is more convenient to consider the variation of the mean-square error (MSE), i.e. the performance function ξ, versus the number of iterations.

We define ξ(k) as the value of the performance function ξ when w = w(k). Then, using (4.94) of Chapter 4, we get

ξ(k) = ξ_min + Σ_{i=0}^{N−1} λ_i v'_i²(k),    (5.30)

Figure 5.4 Trajectories showing how the filter tap weights vary when the steepest-descent algorithm is used: (a) α = 0, (b) α = 0.5, (c) α = 0.75, (d) α = 0.9. Each plot is based on 30 iterations and μ = 0.05

where ξ_min is the minimum MSE. Substituting (5.20) in (5.30) we obtain

ξ(k) = ξ_min + Σ_{i=0}^{N−1} λ_i (1 − 2μλ_i)^{2k} v'_i²(0).    (5.31)


When μ is selected within the bounds defined by (5.23), the terms under the summation in (5.31) converge to zero as k increases. As a result, the minimum MSE is achieved after a sufficient number of iterations.

The curve obtained by plotting ξ(k) as a function of the iteration index, k, is called the learning curve. A learning curve of the steepest-descent algorithm, as can be seen from (5.31), consists of the sum of N exponentially decaying terms, each of which corresponds to one of the modes of convergence of the algorithm. Each exponential term may be characterized by a time constant, which is obtained as follows.

Let

(1 − 2μλ_i)^{2k} = e^{−k/τ_i}    (5.32)

and define τ_i as the time constant associated with the exponential term (1 − 2μλ_i)^{2k}. Solving (5.32) for τ_i, we get

τ_i = −1 / (2 ln(1 − 2μλ_i)).    (5.33)

For small values of the step-size parameter μ, when 2μλ_i ≪ 1, we note that

ln(1 − 2μλ_i) ≈ −2μλ_i.    (5.34)

Substituting this in (5.33) we obtain

τ_i ≈ 1/(4μλ_i).    (5.35)

This result, which is true for all values of i = 0, 1, ..., N−1, shows that, in general, the number of time constants that characterize a learning curve is equal to the number of filter taps. Furthermore, the time constants that are associated with the smaller eigenvalues are larger than those associated with the larger eigenvalues.

Example 5.2

Consider the modelling arrangement discussed in Example 5.1. The correlation matrix R of the filter input is given by (5.27). The eigenvalues of R are

λ0 = 1 + α   and   λ1 = 1 − α.    (5.36)

Substituting these in (5.35), we obtain

τ0 ≈ 1/(4μ(1 + α))   and   τ1 ≈ 1/(4μ(1 − α)).    (5.37)

These are the time constants that characterize the learning curve of the modelling problem. Figure 5.5 shows a learning curve of the modelling problem when w(0) = [2  2]^T, α = 0.75 and μ = 0.05. For these values, we obtain

τ0 ≈ 2.9   and   τ1 = 20.    (5.38)

Figure 5.5 A learning curve of the modelling problem. The ξ (MSE) axis is scaled linearly

Figure 5.6 A learning curve of the modelling problem. The ξ (MSE) axis is scaled logarithmically


The existence of two distinct time constants on the learning curve in Figure 5.5 is clearly observed. The two time constants could be observed more clearly if the ξ-axis were scaled logarithmically. To see this, the learning curve of the modelling problem is plotted in Figure 5.6 with the ξ-axis scaled logarithmically. The two exponentials appear as two straight lines on this plot. The first part of the plot, with a steep slope, is dominantly controlled by τ0. The remaining part of the learning curve shows the contribution of the second exponential, which is characterized by τ1. Estimates of the time constants may be obtained by finding the number of iterations required for ξ to drop e ≈ 2.72 (the Napier number) times along each of the slopes. This gives

τ0 ≈ 3   and   τ1 ≈ 20,

which match well with those in (5.38).
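The two slopes can be reproduced by evaluating (5.31) directly; below is a Python sketch with the numbers used above (α = 0.75, μ = 0.05, w(0) = [2 2]^T; note that ξ_min = 0 here, because the model can match the plant exactly and there is no additive noise in this example):

    import numpy as np

    alpha, mu = 0.75, 0.05
    R = np.array([[1.0, alpha], [alpha, 1.0]])
    p = np.array([1 - 4 * alpha, alpha - 4])
    w_o = np.linalg.solve(R, p)                   # equals [1, -4]
    lam, Q = np.linalg.eigh(R)
    v0 = Q.T @ (np.array([2.0, 2.0]) - w_o)       # v'(0)

    k = np.arange(100)
    xi = ((lam[:, None] * (1 - 2 * mu * lam[:, None]) ** (2 * k))
          * v0[:, None] ** 2).sum(axis=0)         # learning curve (5.31)
    print(xi[:5])
    print(1 / (4 * mu * lam))                     # time constants (5.35): about [20, 2.9]

Plotting xi on a logarithmic axis shows the two straight-line segments discussed above.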

5.3 The Effect of Eigenvalue Spread

Our study in the previous two sections shows that the performance of the steepest-descent algorithm is highly dependent on the eigenvalues of the correlation matrix R. In general, a wider spread of the eigenvalues results in a poorer performance of the steepest-descent algorithm. To gain further insight into this property of the steepest-descent algorithm, we find the optimum value of the step-size parameter μ which results in the fastest possible convergence of the steepest-descent algorithm.

We note that the speeds at which various modes of the steepest-descent algorithm converge are determined by the size (absolute value) of the geometrical ratio factors 1 − 2μλ_i, for i = 0, 1, ..., N−1. For a given value of μ, the transient time of the steepest-descent algorithm is determined by the largest element in the set {|1 − 2μλ_i|, i = 0, 1, ..., N−1}. The optimum value of μ that minimizes the largest element in the latter set is obtained by looking at the two extreme cases, which correspond to λ_max and λ_min, i.e. the maximum and minimum eigenvalues of R. Figure 5.7 shows the plots of |1 − 2μλ_min| and |1 − 2μλ_max| as functions of μ. The plots of the other eigenvalues lie in between these two plots. From these plots we can clearly see that the optimum value of the step-size parameter μ corresponds to the point where the two plots meet. This is the point highlighted as μ_opt in Figure 5.7. It corresponds to the case where

Figure 5.7 The extreme cases showing how |1 − 2μλ_i| varies as a function of the step-size parameter μ

1 − 2μ_opt λ_min = −(1 − 2μ_opt λ_max).    (5.39)

Solving this for μ_opt, we obtain

μ_opt = 1/(λ_min + λ_max).    (5.40)

For this choice of the step-size parameter, 1 − 2μ_opt λ_min is positive and 1 − 2μ_opt λ_max is negative. These correspond to the overdamped and underdamped cases presented in Figure 5.2, respectively. However, the two modes converge at the same speed. For μ = μ_opt, the speed of convergence of the steepest-descent algorithm is determined by the geometrical ratio factor

β = 1 − 2μ_opt λ_min.    (5.41)

Substituting (5.40) in (5.41) we obtain

β = (λ_max − λ_min)/(λ_max + λ_min).    (5.42)

This has a value that remains between 0 and 1. When λ_max = λ_min, β = 0 and the steepest-descent algorithm can converge in one step. As the ratio λ_max/λ_min increases, β also increases and becomes close to one when λ_max/λ_min is large. Clearly, a value of β close to one corresponds to a slow mode of convergence. Thus, we note that the ratio λ_max/λ_min plays a fundamental role in limiting the convergence performance of the steepest-descent algorithm. This ratio is called the eigenvalue spread.

We may also recall from the previous chapter that the values of λ_max and λ_min are closely related to the maximum and minimum values of the power spectral density of the underlying process. Noting this, we may say that the performance of the steepest-descent algorithm is closely related to the shape of the power spectral density of the underlying input process. A wide distribution of the energy of the underlying process within different frequency bands introduces slow modes of convergence, which result in a poor performance of the steepest-descent algorithm. When the underlying process contains very little energy in a band of frequencies, we say the filter is weakly excited in that band. Weak excitation, as we see, degrades the performance of the steepest-descent algorithm.
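For a concrete feel of these quantities, the following few lines of Python (a sketch; R is the Example 5.1 matrix with α = 0.9) compute μ_opt and β:

    import numpy as np

    lam = np.linalg.eigvalsh(np.array([[1.0, 0.9], [0.9, 1.0]]))
    lam_min, lam_max = lam.min(), lam.max()
    mu_opt = 1.0 / (lam_min + lam_max)                  # (5.40)
    beta = (lam_max - lam_min) / (lam_max + lam_min)    # (5.42)
    print(mu_opt, beta)   # 0.5 and 0.9: an eigenvalue spread of 19 gives a slow ratio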


5.4 Newton's Method

Our discussions in the previous sections show that the steepest-descent algorithm may suffer from slow modes of convergence, which arise as a result of the spread in the eigenvalues of the correlation matrix R. This means that if we can somehow get rid of the eigenvalue spread, we can get much better convergence performance. This is exactly what Newton's method does. To derive Newton's method for the quadratic case, we start from the steepest-descent algorithm given in (5.10). Using p = Rw_o, (5.10) becomes

w(k+1) = w(k) − 2μR(w(k) − w_o).    (5.43)

We may note that it is the presence of R in (5.43) that causes the eigenvalue-spread problem in the steepest-descent algorithm. Newton's method overcomes this problem by replacing the scalar step-size parameter μ with a matrix step-size given by μR^{−1}. The resulting algorithm is

w(k+1) = w(k) − μR^{−1}∇_k ξ.    (5.44)

Figure 5.8 demonstrates the effect of the addition of R^{−1} in front of the gradient vector in Newton's update equation (5.44). This has the effect of rotating the gradient vector to the direction pointing towards the minimum point of the performance surface.

Figure 5.8 The negative gradient vector and its correction by Newton's method

Substituting (5.7) in (5.44) we obtain

w(k+1) = w(k) − 2μR^{−1}(Rw(k) − p) = (1 − 2μ)w(k) + 2μR^{−1}p.    (5.45)

We also note that R^{−1}p is equal to the optimum tap-weight vector w_o. Using this in (5.45) we obtain

w(k+1) = (1 − 2μ)w(k) + 2μw_o.    (5.46)

Subtracting w_o from both sides of (5.46), we get

w(k+1) − w_o = (1 − 2μ)(w(k) − w_o).    (5.47)

Starting with an initial value w(0) and iterating (5.47), we obtain

w(k) − w_o = (1 − 2μ)^k (w(0) − w_o).    (5.48)

The original Newton's method selects the step-size parameter μ equal to 0.5. This leads to convergence of w(k) to its optimum value, w_o, in one iteration. In particular, we note that setting μ = 0.5 and k = 1 in (5.48), we obtain w(1) = w_o. However, in the actual implementation of adaptive filters, where the exact values of ∇_k ξ and R^{−1} are not available and have to be estimated, we need to use a step-size parameter much smaller than 0.5. Thus, an evaluation of Newton's recursion (5.44) for values of μ ≠ 0.5 is instructive for our further study in later chapters.

Using (5.48) and following the same line of derivations as in the case of the steepest-descent method, it is straightforward to show that (see Problem P5.4)

ξ(k) = ξ_min + (1 − 2μ)^{2k} (ξ(0) − ξ_min),    (5.49)

where ξ(k) is the value of the performance function ξ when w = w(k).

From (5.49) we note that the stability of Newton's algorithm is guaranteed when |1 − 2μ| < 1 or, equivalently,

0 < μ < 1.    (5.50)

With reference to (5.49), we make the following observations. The transient behaviour of Newton's algorithm is characterized by a single exponential whose corresponding time constant is obtained by solving the equation

(1 − 2μ)^{2k} = e^{−k/τ}.    (5.51)

When 2μ ≪ 1, this gives

τ ≈ 1/(4μ).    (5.52)

This result shows that Newton's method has only one mode of convergence and that is solely determined by its step-size parameter μ.
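The single-mode behaviour is easy to confirm against steepest descent; a minimal Python sketch follows (reusing R and p from Example 5.1 with α = 0.9; in practice R^{−1} is not known exactly and must be estimated, which is the subject of later chapters):

    import numpy as np

    alpha, mu = 0.9, 0.05
    R = np.array([[1.0, alpha], [alpha, 1.0]])
    p = np.array([1 - 4 * alpha, alpha - 4])
    Rinv = np.linalg.inv(R)

    w_sd = np.zeros(2)                        # steepest descent
    w_nt = np.zeros(2)                        # Newton's method
    for k in range(100):
        w_sd = w_sd - 2 * mu * (R @ w_sd - p)            # (5.10)
        w_nt = w_nt - 2 * mu * Rinv @ (R @ w_nt - p)     # (5.44) with grad = 2(Rw - p)
    print(w_sd, w_nt)   # Newton is essentially at [1, -4]; steepest descent lags behind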


5.5 An Alternative Interpretation of Newton's Algorithm

Further insight into the operation of Newton's algorithm is developed by giving an alternative derivation of it. This derivation uses the Karhunen–Loève transform (KLT), which was introduced in the previous chapter.

For an observation vector x(n) with real-valued elements, the KLT is defined by the equation

x'(n) = Q^T x(n),    (5.53)

where Q is the N × N matrix whose columns are the eigenvectors q0, q1, ..., q_{N−1} of the correlation matrix R = E[x(n)x^T(n)]. We recall from Chapter 4 that the elements of the transformed vector x'(n), denoted by x'_0(n), x'_1(n), ..., x'_{N−1}(n), constitute a set of mutually uncorrelated random variables. Furthermore, (4.68) implies that

E[x'_i²(n)] = λ_i,  for i = 0, 1, ..., N−1,    (5.54)

where the λ_i s are the eigenvalues of the correlation matrix R.

We define the vector x^N(n) whose elements are

x^N_i(n) = λ_i^{−1/2} x'_i(n),  for i = 0, 1, ..., N−1,    (5.55)

where the superscript N signifies the fact that x^N(n) is normalized to the power of unity (see (5.57) below). These equations may collectively be written as

x^N(n) = Λ^{−1/2} x'(n),    (5.56)

where Λ is a diagonal matrix consisting of the eigenvalues λ0, λ1, ..., λ_{N−1}. It is straightforward to show that

E[x^N(n) x^{NT}(n)] = I,    (5.57)

where I is the N × N identity matrix.

We also define

w^N = Λ^{1/2} Q^T w    (5.58)

and note that

y(n) = w^{NT} x^N(n) = w^T x(n).    (5.59)

This result shows that a filter with an input vector x(n) and output y(n) = w^T x(n) may alternatively be realized by using x^N(n) and w^N as the filter input and tap-weight vector, respectively. The steepest-descent algorithm for this realization may be written as (see (5.11))

w^N(k+1) = (I − 2μR^N) w^N(k) + 2μ p^N,    (5.60)

where

R^N = E[x^N(n) x^{NT}(n)]   and   p^N = E[x^N(n) d(n)].    (5.61)

Since R^N = I, (5.60) simplifies to

w^N(k+1) = (1 − 2μ) w^N(k) + 2μ w_o^N,    (5.62)

where w_o^N = (R^N)^{−1} p^N = p^N is the optimum value of the tap-weight vector w^N.

Comparing this with Newton's algorithm (5.46), we find that the steepest-descent algorithm in this case works just like Newton's algorithm.

Next, we show that the recursive equation (5.60) is nothing but Newton's recursive equation (5.44) written in a slightly different form. For this, we use (5.58) in (5.62) to obtain

Λ^{1/2} Q^T w(k+1) = (1 − 2μ) Λ^{1/2} Q^T w(k) + 2μ Λ^{1/2} Q^T w_o.    (5.63)

Premultiplying both sides of this equation by (Λ^{1/2} Q^T)^{−1} = Q Λ^{−1/2} (since (Q^T)^{−1} = Q), we get (5.46), which can easily be converted to (5.44).

The above development shows that Newton's algorithm may be viewed as a steepest-descent algorithm for the transformed input signal. The eigenvalue-spread problem associated with the steepest-descent algorithm is resolved by decorrelating the filter input samples (through their corresponding Karhunen–Loève transform) followed by a power normalization procedure. This is a whitening process; namely, the input samples are decorrelated and then normalized to the unit power prior to the filtering process.

Problems

P5.1 Starting with the canonical form of the performance function, i.e. (4.93), suggest an alternative derivation of (5.15).

P5.2 Show that when the steepest-descent algorithm (5.10) is used, the time constants that control the variation of the tap weights of a transversal filter are

τ_i = 1/(2μλ_i),  for i = 0, 1, ..., N−1.

P5.3 Give a detailed derivation of (5.25) in the case where the underlying signals are complex-valued.

P5.4 Give a detailed derivation of (5.49).

P5.5 Show that if in the steepest-descent algorithm the tap-weight vector is initialized to zero,

w(k) = [I − (I − 2μR)^k] w_o,

where w_o is the optimum tap-weight vector.


P5.6 Consider the modelling problem depicted in Figure P5.6. Note that the input to the model is a noisy version of the plant input. The additive noise at the model input, νi(n), is white and its variance is σν². The sequence νo(n) is the plant noise. It is uncorrelated with u(n) and νi(n). The correlation matrix of the plant input, u(n), is denoted by R. The model has to be selected so that the MSE at the model output is minimized.

(i) Find the correlation matrix of the model input and show that it shares the same set of eigenvectors with R.

(ii) Derive the corresponding Wiener–Hopf equation.

(iii) Show that the difference between the plant tap-weight vector, w_o, and its estimate, w̃_o, which is obtained through the Wiener–Hopf equation derived in (ii), is

w_o − w̃_o = σν² Σ_{i=0}^{N−1} (q_i^T p)/(λ_i(λ_i + σν²)) q_i,

where the q_i s are the eigenvectors of R and p is the cross-correlation between the model input and the desired output.

(iv) Show that the mismatch of the plant and model, i.e. the difference w_o − w̃_o, results in an excess MSE at the model output, which is given by

MSE_excess = σν⁴ Σ_{i=0}^{N−1} (q_i^T p)² / (λ_i²(λ_i + σν²)).

(v) If the steepest-descent algorithm is used to find w̃_o, find the time constants of the resulting learning curve. How do these time constants vary with σν²? Discuss the eigenvalue-spread problem as σν² varies.

Figure P5.6

P5.7 Consider a transversal filter with the input and tap-weight vectors x(n) and w, respectively, and output

y(n) = w^T x(n).


Define the vector

x̄(n) = R^{−1/2} x(n),

where R = E[x(n)x^T(n)]. Let x̄(n) be the input to a filter whose output is obtained through the equation

ȳ(n) = w̄^T x̄(n),

where w̄ is the filter tap-weight vector.

(i) Derive an equation for w̄ so that the two outputs y(n) and ȳ(n) are the same.

(ii) Derive a steepest-descent update equation for the tap-weight vector w̄.

(iii) Derive an equation that demonstrates the variation of the tap weights of the filter as the steepest-descent algorithm derived in part (ii) is running.

(iv) Find the time constants of the learning curve of the algorithm.

(v) Show that the update equation derived in (ii) is equivalent to Newton's algorithm.

P5.8 Consider a two-tap Wiener filter which is characterized by the following parameters:

R = [ 1    0.8 ]
    [ 0.8  1   ]

and a cross-correlation vector p, where R is the correlation matrix of the filter tap-input vector, x(n), and p is the cross-correlation between x(n) and the desired output, d(n).

(i) Find the range of the step-size parameter μ that ensures convergence of the steepest-descent algorithm. Does this result depend on the cross-correlation vector p?

(ii) Run the steepest-descent algorithm for μ = 0.05, 0.1, 0.5 and 1 and plot the corresponding trajectories in the (w0, w1)-plane.

(iii) For μ = 0.05, plot w0(k) and w1(k), separately, as functions of the iteration index, k.

(iv) On the plots obtained in (iii), you should find that the variation in each tap weight is signified by two distinct time constants. This implies that the variation of each tap weight may be decomposed into the summation of two distinct exponential series. Explain this observation.

P5.9 For the modelling problem discussed in Example 5.1, plot the trajectories of the steepest-descent and Newton's algorithms on the same plane for μ = 0.05 and α = 0, 0.5, 0.75 and 0.9. Comment on your observations.

P5.10 Consider the modelling problem depicted in Figure 5.3. Let x(n) = 1, for all values of n.

(i) Derive the steepest-descent algorithm which may be used to find the model parameters.


(ii) Derive an equation for the performance function of the present problem, and plot the contours that show its performance surface.

(iii) Run the algorithm that you have derived in (i) and find the model parameters that it converges to.

(iv) On the performance surface obtained in (ii), plot the trajectory showing the variation of the model parameters. Comment on your observation.


6

The LMS Algorithm

The celebrated least-mean-square (LMS) algorithm is introduced in this chapter. The LMS algorithm, which was first proposed by Widrow and Hoff in 1960, is the most widely used adaptive filtering algorithm in practice. This wide spectrum of applications of the LMS algorithm can be attributed to its simplicity and robustness to signal statistics. The LMS algorithm has also been cited and worked upon by many researchers, and over the years many modifications to it have been proposed. In this and the subsequent few chapters we introduce and study several of such modifications.

6.1 Derivation of the LMS Algorithm

Figure 6.1 depicts an N-tap transversal adaptive filter. The filter input, x(n), desired output, d(n), and the filter output,

y(n) = Σ_{i=0}^{N−1} w_i(n) x(n − i),    (6.1)

are assumed to be real-valued sequences. The tap weights w0(n), w1(n), ..., w_{N−1}(n) are selected so that the difference (error)

e(n) = d(n) − y(n)    (6.2)

is minimized in some sense. It may be noted that the filter tap weights are explicitly indicated to be functions of the time index n. This signifies that in an adaptive filter, in general, tap weights are time-varying, since they are continuously being adapted so that any variations in the signal's statistics can be tracked. The LMS algorithm changes (adapts) the filter tap weights so that e(n) is minimized in the mean-square sense, thus the name least mean square. When the processes x(n) and d(n) are jointly stationary, this algorithm converges to a set of tap weights which, on average, are equal to the Wiener–Hopf solution discussed in Chapter 3. In other words, the LMS algorithm is a practical scheme for realizing Wiener filters, without explicitly solving the Wiener–Hopf equation. It is a sequential algorithm which can be used to adapt the tap weights of a filter by continuous observation of its input, x(n), and desired output, d(n).


Figure 6.1 An N-tap transversal adaptive filter

The conventional LMS algorithm is a stochastic implementation of the steepest-descent algorithm. It simply replaces the cost function ξ = E[e²(n)] by its instantaneous coarse estimate ξ̂(n) = e²(n). Substituting ξ̂(n) = e²(n) for ξ in the steepest-descent recursion (5.9) of Chapter 5, and replacing the iteration index k by the time index n, we obtain

w(n+1) = w(n) − μ∇ξ̂(n),    (6.3)

where w(n) = [w0(n)  w1(n)  ...  w_{N−1}(n)]^T, μ is the algorithm step-size parameter and ∇ is the gradient operator defined as the column vector

∇ = [∂/∂w0  ∂/∂w1  ...  ∂/∂w_{N−1}]^T.    (6.4)

We note that the ith element of the gradient vector ∇ξ̂(n) is

∂e²(n)/∂w_i = 2e(n) ∂e(n)/∂w_i.    (6.5)

Substituting (6.2) in the last factor on the right-hand side of (6.5) and noting that d(n) is independent of w_i, we obtain

∂e²(n)/∂w_i = −2e(n) ∂y(n)/∂w_i.    (6.6)

Substituting for y(n) from (6.1) we get

∂e²(n)/∂w_i = −2e(n)x(n − i).    (6.7)


Table 6.1 Summary of the LMS algorithm

Input:  tap-weight vector, w(n),
        input vector, x(n),
        and desired output, d(n).
Output: filter output, y(n),
        tap-weight vector update, w(n+1).

1. Filtering:
       y(n) = w^T(n)x(n)
2. Error estimation:
       e(n) = d(n) − y(n)
3. Tap-weight vector adaptation:
       w(n+1) = w(n) + 2μe(n)x(n)

Using (6.4) and (6.7) we obtain

∇ξ̂(n) = −2e(n)x(n),    (6.8)

where x(n) = [x(n)  x(n−1)  ...  x(n−N+1)]^T. Substituting this result in (6.3) we get

w(n+1) = w(n) + 2μe(n)x(n).    (6.9)

This is referred to as the LMS recursion. It suggests a simple procedure for recursive adaptation of the filter coefficients after the arrival of every new input sample, x(n), and its corresponding desired output sample, d(n). Equations (6.1), (6.2) and (6.9), in this order, specify the three steps required to complete each iteration of the LMS algorithm. Equation (6.1) is referred to as filtering; it is performed to obtain the filter output. Equation (6.2) is used to calculate the estimation error. Equation (6.9) is the tap-weight adaptation recursion. Table 6.1 summarizes the LMS algorithm.
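In code, the three steps map directly onto a few lines per sample. The following Python sketch (illustrative; NumPy assumed, with a white input and the plant P(z) = 1 − 4z^{−1} from the modelling examples) implements the recursion of Table 6.1:

    import numpy as np

    def lms(x, d, N, mu):
        # LMS recursion of Table 6.1 over the sequences x and d
        w = np.zeros(N)
        e = np.zeros(len(x))
        for n in range(N - 1, len(x)):
            xv = x[n::-1][:N]             # tap-input vector [x(n) ... x(n-N+1)]
            y = w @ xv                    # 1. filtering, (6.1)
            e[n] = d[n] - y               # 2. error estimation, (6.2)
            w = w + 2 * mu * e[n] * xv    # 3. tap-weight adaptation, (6.9)
        return w, e

    rng = np.random.default_rng(0)
    x = rng.standard_normal(5000)
    d = x - 4 * np.concatenate(([0.0], x[:-1]))   # plant P(z) = 1 - 4 z^-1
    w, e = lms(x, d, N=2, mu=0.01)
    print(w)   # close to the plant coefficients [1, -4]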

The eminent feature of the LMS algorithm which has made it the most popular adaptive filtering scheme is its simplicity. Its implementation requires 2N + 1 multiplications (N multiplications for calculating the output y(n), one to obtain (2μ) × e(n), and N for the scalar-by-vector multiplication (2μe(n)) × x(n)) and 2N additions. Another important feature of the LMS algorithm, which is equally important from an implementation point of view, is its stable and robust performance against different signal conditions. This aspect of the LMS algorithm will be studied in later chapters when it is compared with other alternative adaptive filtering algorithms. The major problem of the LMS recursion (6.9) is its slow convergence when the underlying input process is highly coloured. This aspect of the LMS algorithm is discussed in the next section and solutions to it will be given in later chapters.

6.2 Average Tap-Weight Behaviour of the LMS Algorithm

Consider the case where the filter input, x(n), and its desired output, d(n), are stationary. In that case the optimum tap-weight vector, w_o, of the transversal Wiener filter is fixed and can be obtained according to the Wiener–Hopf equation (3.24). Subtracting w_o


from both sides of (6.9) we obtain

v(n+1) = v(n) + 2μe(n)x(n),    (6.10)

where v(n) = w(n) − w_o is the weight-error vector. We also note that

e(n) = d(n) − w^T(n)x(n) = d(n) − x^T(n)w(n)
     = d(n) − x^T(n)w_o − x^T(n)(w(n) − w_o) = e_o(n) − x^T(n)v(n),    (6.11)

where

e_o(n) = d(n) − w_o^T x(n)    (6.12)

is the estimation error when the filter tap weights are optimum. Substituting (6.11) in (6.10) and rearranging, we obtain

v(n+1) = (I − 2μx(n)x^T(n))v(n) + 2μe_o(n)x(n),    (6.13)

where I is the identity matrix. Taking expectations on both sides of (6.13), we get

E[v(n+1)] = E[(I − 2μx(n)x^T(n))v(n)] + 2μE[e_o(n)x(n)]
          = E[(I − 2μx(n)x^T(n))v(n)],    (6.14)

where the last equality follows from the fact that E[e_o(n)x(n)] = 0, according to the principle of orthogonality.

The main difficulty with any further analysis of the right-hand side of (6.14) is that it involves evaluation of the third-order moment vector E[x(n)x^T(n)v(n)], which, in general, is a difficult mathematical task. Different approaches have been adopted by researchers to overcome this mathematical hurdle. The most widely used analysis assumes that the present observation data samples (x(n), d(n)) are independent of the past observations (x(n−1), d(n−1)), (x(n−2), d(n−2)), ...; see, for example, Widrow et al. (1976) and Feuer and Weinstein (1985). This is referred to as the independence assumption. Using the independence assumption, we can argue that since v(n) depends only on the past observations (x(n−1), d(n−1)), (x(n−2), d(n−2)), ..., it is independent of x(n), and thus

E[x(n)x^T(n)v(n)] = E[x(n)x^T(n)]E[v(n)].    (6.15)

We may note that in most practical cases the independence assumption is questionable. For example, in the case of a length-N transversal filter the input vectors

x(n) = [x(n)  x(n−1)  ...  x(n−N+1)]^T

and

x(n−1) = [x(n−1)  x(n−2)  ...  x(n−N)]^T


have N − 1 terms in common, out of N. Nevertheless, experience with the LMS algorithm has shown that the predictions made by the independence assumption match the computer simulations and the actual performance of the LMS algorithm in practice. This may be explained as follows. The tap-weight vector w(n) at any given time has been affected by the whole past history of the observation data samples (x(n−1), d(n−1)), (x(n−2), d(n−2)), .... When the step-size parameter μ is small, the share of the last N observations in the present value of w(n) is small, and thus we may say x(n) and w(n) are weakly dependent. This clearly leads to (6.15), with some degree of approximation, if we can assume that the observation samples which are apart from each other at a distance of N or greater are weakly dependent. This reasoning seems to be more appealing than the independence assumption. In any case, we use (6.15) and other similar equations (approximations), which will be introduced later, to proceed with our analysis in this book.

Substituting (6.15) in (6.14) we obtain

E[v(n+1)] = (I − 2μR)E[v(n)],    (6.16)

where R = E[x(n)x^T(n)] is the correlation matrix of the input vector x(n).

Comparing the recursions (6.16) and (5.14), we find that they are of exactly the same mathematical form. The deterministic weight-error vector v(k) in (5.14) of the steepest-descent algorithm is replaced by the averaged weight-error vector E[v(n)] of the LMS algorithm. This suggests that, on average, the LMS algorithm behaves just like the steepest-descent algorithm. In particular, similar to the steepest-descent algorithm, the LMS algorithm is controlled by N modes of convergence, which are characterized by the eigenvalues of the correlation matrix R. Consequently, the convergence behaviour of the LMS algorithm is directly linked to the eigenvalue spread of the correlation matrix R. Furthermore, recalling the relationship between the eigenvalue spread of R and the power spectrum of x(n), we can say that the convergence of the LMS algorithm is directly related to the flatness of the spectral content of the underlying input process.

Following a similar procedure as in Chapter 5, by manipulating (6.16) we can show that E[v(n)] converges to zero when μ remains within the range

0 < μ < 1/λ_max,    (6.17)

where λ_max is the maximum eigenvalue of R. However, we should point out here that the above range does not necessarily guarantee the stability of the LMS algorithm. The convergence of the LMS algorithm requires convergence of the mean of w(n) towards w_o and also convergence of the variance of the elements of w(n) to some limited values. As we shall show later, to guarantee the stability of the LMS algorithm the latter requirement imposes a stringent condition on the size of μ. Furthermore, we may note that the independence assumption used to obtain (6.16) was based on the assumption that μ was very small. The upper limit of μ in (6.17) may badly violate this assumption. Thus, the validity of (6.17), even for the convergence of E[w(n)], is questionable.

Example 6.1

Consider the modelling problem of Example 5.1, which is repeated in Figure 6.2 for convenience. As in Example 5.1, the input signal, x(n), is generated by passing a white noise signal, ν(n),


Figure 6.2 A modelling problem

through a colouring filter with the system function

H(z) = √(1 − α²) / (1 − αz^{−1}),    (6.18)

where α is a real-valued constant in the range −1 to +1. The plant is a two-tap FIR system with the system function P(z) = 1 − 4z^{−1}. An adaptive filter with the system function W(z) = w0 + w1 z^{−1} is used to identify the plant. Here, the LMS algorithm is used to find the optimum values of the tap weights w0 and w1. We want to see, as the iteration number increases, how the tap weights w0 and w1 converge towards the plant coefficients, 1 and −4, respectively. We examine this for different values of the parameter α. We recall from Example 5.1 that the parameter α controls the eigenvalue spread of the correlation matrix R of the input samples to the filter W(z).

Figures 6.3(a), (b), (c) and (d) present four plots showing typical trajectories of the LMS algorithm, obtained for the values of α = 0, 0.5, 0.75 and 0.9, respectively. Also shown in the figures are the contour plots which highlight the performance surface of the filter. The results presented are for μ = 0.01 and 150 iterations, for all cases. In comparison with the parameters used in Figure 5.4 of Example 5.1, here μ is selected five times smaller, while the number of iterations is chosen five times larger. Comparing the results here with those of Figure 5.4, we can clearly see that, as predicted above, the LMS algorithm, on average, follows the same trajectories as the steepest-descent algorithm. In particular, the convergence of the LMS algorithm along the steepest-descent slope of the performance surface is clearly observed. Also, we note that in the case α = 0, which corresponds to a white input sequence, full convergence of the LMS algorithm is almost complete within 150 iterations. However, the other three cases require more iterations before they converge to the vicinity of the minimum point of the performance surface. This, as was noted in Example 5.1, can be understood if we note that the eigenvalues of the correlation matrix R of the input samples to the adaptive filter are λ0 = 1 + α and λ1 = 1 − α, and for α close to one, the time constant τ1 = 1/4μλ1 may be very large.
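To reproduce these trajectories, the coloured input can be generated by driving H(z) of (6.18) with white noise; a Python sketch (scipy.signal.lfilter assumed available) follows:

    import numpy as np
    from scipy.signal import lfilter

    rng = np.random.default_rng(1)
    alpha, mu = 0.9, 0.01
    nu = rng.standard_normal(20000)
    # colouring filter (6.18): unit-variance output with E[x(n)x(n-1)] = alpha
    x = lfilter([np.sqrt(1 - alpha**2)], [1.0, -alpha], nu)
    d = x - 4 * np.concatenate(([0.0], x[:-1]))      # plant P(z) = 1 - 4 z^-1

    w = np.zeros(2)
    for n in range(1, len(x)):
        xv = np.array([x[n], x[n - 1]])
        e = d[n] - w @ xv
        w = w + 2 * mu * e * xv                      # LMS recursion (6.9)
    print(w)   # the slow mode along q1 delays full convergence to [1, -4]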

6.3 MSE Behaviour of the LMS Algorithm

In this section, the variation of ξ(n) = E[e²(n)] as the LMS algorithm is being iterated is studied.¹ This study is directly related to the convergence of the LMS algorithm.

¹ The derivations provided in this section follow the work of Feuer and Weinstein (1985). Prior to Feuer and Weinstein (1985), Horowitz and Senne (1981) also arrived at similar results, using a different approach.


Figure 6.3 Trajectories showing how the filter tap weights vary when the LMS algorithm is used: (a) α = 0, (b) α = 0.5, (c) α = 0.75, and (d) α = 0.9. Each plot is based on 150 iterations and μ = 0.01

In the derivations that follow it is assumed that:

1. the input, x(n), and desired output, d(n), are zero-mean stationary processes;

2. x(n) and d(n) are jointly Gaussian-distributed random variables, for all n; and

3. at time n, the tap-weight vector w(n) is independent of the input vector x(n) and the desired output d(n).

The validity of the last assumption is justified for small values of the step-size parameter μ, as was discussed in the previous section. This, as was noted before, is referred to as the independence assumption. Assumption 1 greatly simplifies the analysis. Assumption 2


results in some simplification of the final results, as the third- and higher-order moments that appear in the derivations can be expressed in terms of the second-order moments when the underlying random variables are jointly Gaussian.

6.3.1 Learning curve

We note from (6.11) that the estimation error, e(n), can be expressed as

e(n) = e_o(n) − v^T(n)x(n).    (6.19)

Squaring both sides of (6.19) and taking the expectation on both sides, we obtain

E[e²(n)] = E[e_o²(n)] + E[(v^T(n)x(n))²] − 2E[e_o(n)v^T(n)x(n)].    (6.20)

Noting that v^T(n)x(n) = x^T(n)v(n) and using the independence assumption, the second term on the right-hand side of (6.20) can be expanded as²

E[(v^T(n)x(n))²] = E[v^T(n)x(n)x^T(n)v(n)]
                 = E[v^T(n)E[x(n)x^T(n)]v(n)] = E[v^T(n)Rv(n)].    (6.21)

Noting that E[(v^T(n)x(n))²] is a scalar and using (6.21), we may also write

E[(v^T(n)x(n))²] = tr[E[(v^T(n)x(n))²]] = tr[E[v^T(n)Rv(n)]] = E[tr[v^T(n)Rv(n)]],    (6.22)

where tr[·] denotes the trace of a matrix, and in writing the last identity we have noted that 'trace' and 'expectation' are linear operators and, thus, could be exchanged. This result can be further simplified by using the following result from matrix algebra. For any pair of N × M and M × N matrices A and B,

tr[AB] = tr[BA].    (6.23)

² We note that when x and y are two independent random variables, E[xy] = E[x]E[y] = E[xE[y]]. A similar procedure is used to arrive at (6.21) and the other similar derivations that appear in the rest of this book.


Using this identity, we obtain

E[tr[v^T(n)Rv(n)]] = E[tr[v(n)v^T(n)R]] = tr[E[v(n)v^T(n)]R].    (6.24)

Defining the correlation matrix of the weight-error vector v(n) as

K(n) = E[v(n)v^T(n)],    (6.25)

the above result reduces to

E[(v^T(n)x(n))²] = tr[K(n)R].    (6.26)

Using the independence assumption and noting that e_o(n) is a scalar, the last term on the right-hand side of (6.20) can be written as

E[e_o(n)v^T(n)x(n)] = E[v^T(n)x(n)e_o(n)]
                    = E[v^T(n)]E[x(n)e_o(n)]
                    = 0,    (6.27)

where the last step follows from the principle of orthogonality, which states that the optimal estimation error and the input data samples to a Wiener filter are orthogonal (uncorrelated), i.e. E[e_o(n)x(n)] = 0.

Using (6.26) and (6.27) in (6.20), we obtain

ξ(n) = E[e²(n)] = ξ_min + tr[K(n)R],    (6.28)

where ξ_min = E[e_o²(n)] is the minimum mean-square error (MSE) at the filter output.

This result may be written in a more convenient form, for future analysis, if we recall from Chapter 4 that the correlation matrix R may be decomposed as

R = QΛQ^T,    (6.29)

where Q is the N × N matrix whose columns are the eigenvectors of R and Λ is the diagonal matrix consisting of the eigenvalues λ0, λ1, ..., λ_{N−1} of R. Substituting (6.29) in (6.28) and using the identity (6.23), we obtain

ξ(n) = ξ_min + tr[K'(n)Λ],    (6.30)

where K'(n) = Q^T K(n)Q. Furthermore, using (6.25), and recalling the definition v'(n) = Q^T v(n) from Chapter 4, we find that

K'(n) = E[v'(n)v'^T(n)].    (6.31)

Also, we recall that v'(n) is the weight-error vector in the coordinates defined by the basis vectors specified by the eigenvectors of R.


Noting that Λ is a diagonal matrix, (6.30) can be expanded as

ξ(n) = ξ_min + Σ_{i=0}^{N−1} λ_i k'_ii(n),    (6.32)

where k'_ii(n) is the iith (diagonal) element of the matrix K'(n).

The plot of ξ(n) versus the time index n, defined by (6.28) or its alternative forms in (6.30) or (6.32), is called the learning curve of the LMS algorithm. It is very similar to the learning curve of the steepest-descent algorithm, since, according to the derivations in the previous section, the LMS algorithm on average follows the same trajectory as the steepest-descent algorithm. The noisy variations of the filter tap weights in the case of the LMS algorithm introduce some additional error and push up its learning curve compared with that of the steepest-descent algorithm. However, when the step-size parameter, μ, is small (which is usually the case in practice) we find that the difference between the two curves is noticeable only when they have converged and approached their steady state. The following example shows this.

Example 6.2

Figure 6.4 shows the learning curves of the LMS algorithm and the steepest-descent algorithm for the modelling problem discussed in Examples 5.1 and 6.1, when α = 0.75 and μ = 0.01. For both cases the filter tap weights have been initialized with w0(0) = w1(0) = 0. The learning curve of the steepest-descent algorithm has been obtained by inserting the numerical values of the parameters

Figure 6.4 Learning curves of the steepest-descent algorithm and the LMS algorithm for the modelling problem of Figure 6.2 and the parameter values α = 0.75 and μ = 0.01


in (5.31). The learning curve of the LMS algorithm is obtained by an ensemble average of the sequence e²(n) over 1000 independent runs. We note that the two curves match closely. The learning curve of the LMS algorithm remains slightly above the learning curve of the steepest-descent algorithm. This is because of the use of noisy estimates of the gradient vector in the LMS algorithm.

We shall emphasize that, despite the noisy variation of the filter tap weights, the learning curve of the LMS algorithm matches closely the theoretical results of the steepest-descent algorithm. In particular, (5.31) is applicable and the time constant equation

τ_i = 1/(4μλ_i)    (6.33)

can be used for predicting the transient behaviour of the LMS algorithm.
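The ensemble averaging used for Figure 6.4 can be sketched as follows in Python (1000 independent runs of the set-up of Example 6.1 with α = 0.75 and μ = 0.01; run count and length are arbitrary choices, and scipy.signal.lfilter is assumed available):

    import numpy as np
    from scipy.signal import lfilter

    alpha, mu, runs, iters = 0.75, 0.01, 1000, 300
    mse = np.zeros(iters)
    rng = np.random.default_rng(2)
    for r in range(runs):
        nu = rng.standard_normal(iters + 1)
        x = lfilter([np.sqrt(1 - alpha**2)], [1.0, -alpha], nu)
        d = x - 4 * np.concatenate(([0.0], x[:-1]))
        w = np.zeros(2)
        for n in range(1, iters + 1):
            xv = np.array([x[n], x[n - 1]])
            e = d[n] - w @ xv
            mse[n - 1] += e * e / runs     # ensemble average of e^2(n)
            w = w + 2 * mu * e * xv
    # mse now estimates xi(n) and can be compared with the theoretical curve (5.31)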

6.3.2 The weight-error correlation matrix

The weight-error correlation matrix K(n) plays an important role in the study of the LMS algorithm. From (6.28) we note that the value of ξ(n) is directly related to K(n). Equation (6.28) implies that the stability of the LMS algorithm is guaranteed if, as n increases, the elements of K(n) remain bounded. Also, from (6.30) and (6.32) we note that K'(n) may equivalently be used in the study of the convergence of the LMS algorithm. Here, we develop a time-update equation for K'(n).

Multiplying both sides of (6.13) from the left by Q^T, using the definitions v'(n) = Q^T v(n) and x'(n) = Q^T x(n), and rearranging the result, we obtain

v'(n+1) = (I − 2μx'(n)x'^T(n))v'(n) + 2μe_o(n)x'(n).    (6.34)

Next, we multiply both sides of (6.34) from the right by their respective transposes, take statistical expectation of the result and expand to obtain

K'(n+1) = K'(n) − 2μE[x'(n)x'^T(n)v'(n)v'^T(n)] − 2μE[v'(n)v'^T(n)x'(n)x'^T(n)]
          + 4μ²E[x'(n)x'^T(n)v'(n)v'^T(n)x'(n)x'^T(n)]
          + 2μE[e_o(n)x'(n)v'^T(n)] + 2μE[e_o(n)v'(n)x'^T(n)]
          − 4μ²E[e_o(n)x'(n)v'^T(n)x'(n)x'^T(n)]
          − 4μ²E[e_o(n)x'(n)x'^T(n)v'(n)x'^T(n)]
          + 4μ²E[e_o²(n)x'(n)x'^T(n)].    (6.35)

We note that the independence assumption (which states that V(II) is Independent ofx(1l ) and den)) is also applicable to the transformed (prime) variables. in. (6.15). That is, the

150

The LMS Algorithm

1

random vectorv'(n) j·s mdep<::ndtlDtofx'(n;) and d(II). This is immediately observedjf we note that :i (II) and i(ll) are indepehdentlyob~ai;Jled from )((n) and v(n), :rcspec(ivc~y. Also, lh~ assumption thal d(lI) and x(n) are zero-mean and metually Gaussiandistributed implies thatd(n) and x(lJ) are also zero-mean and jointly Gaussian. Furthermore, using the definition x'(n) = QTx(n), we note that the principle of orthogonality, i.e. E{eo(n)x(rr)] = 0, may also be written as

E[eo(n)l"(n)] = o.

(6.36)

Noting Chis, which shows that eo{n) and x'(n) are uneorrelated, and the ract that d(n) and x'(Il) and, tbus, eo{n) and x'(n) ar.ejointLy Gaussian, weean say that the random variables eo(n) Md:xl(n) are .independent ef eaeh other .. 3 Also, the independence of r'(«) from den) and x(n}implies fua"t v'(n) and eo('!.) are independent, since e·Q(n) depends Only on den) andx(n), With these paws in mind,the expectations on theright-hand side of (6.35) can be simplified as follows:

Elx' {n)x'T (n:)v'(n)v'T (N)] = E[x'(n);x'T (n)JE[v' (n)v'T (TI)] =AK'(n)

(6-,37)

where we have noted that E(,l(ll)x'T (n)] = Ii.. -Similarly E[I"(n)v'r(n)x'(n);;lt (ill] = K'(n)A·

((:i.38)

Simplification of the third expectation requires some algebraic manipulations; These are provided in Appendix 6A. The remit is

Elx'(1t)xiT (11) Vi {n)v!T (n)x'(IJ)x'T (nl] = 2A.K' (n)A +tr[AK' (n)]A. (6.39)

Using, the independence of eo(n), x' (n) and i(lI) and noting that eo (1'1) has zero mean, we get

I E[l:lo(n)x'(Il)~"T (/1;)] = Ele,;(II)]E[x'(n)fT (!i)l = 0,

(6.40)

1

where 0 denotes the N x N zero matrix. Similarly,

(6.41)

(6.42)

(6.43)

l We recall that when the random variables x and p are jointlyGaussiaa and uacorrelated, they are also.in dependent (see .P apou Ii s, 1991).

MSE Behaviour of the LMS Afgorit.hm

151

and

Ere; (n)x' (n)x1T (n)] = El~ (n )lE!x' (JI)X'T (n)] = €minA.·

(!'i.44)

Substituting (6.37)-(6.44) in (6.3'5), we obtain

.K'(n.+ 1) = K'(n) - 2p.(A.1{I(n) + K'(n)A.)

+SiAH;'(n)A + 4tltr[AK,(n)]A + 4p;,2S~ (6.45)

The difference equatioa (6.45) is dilli:cuH to handle. However; the fact thaL Ais a diagonal matrix. can be used to sim:pJ[[l' the analysis. Consider the j th d:iagonru element ofK'(Il)anct note that its corresponding time-updateequation, obtained from (6.45), is

N.-I

!C;i(lI+ I) = pjk!n,(n) +4#2),,, ~ ..\l9,,(n) + 4t?emln..\f,

(6.46)

where

(6.47)

andwe have noted that

/if-I

tr[AK'(n)J = L ..\}c.u(n). r-v

(6.48)

The important feature of (6.46) to be noted is that the update of .!C;,(11) is independent of the off~diagooaJ elements ofK'(n). Furthermore. we 11.0te that since K'(!!) isa~orrelation matrix, k.i#(II) ~ k~i(fI )kQ(n) for all values Of i and}. This suggests that the convergence of the diagenal elements of K' (n) is suffieisnt to ensure thecon vergence ef all elements of'ft, which, in tnrn, are required toguarantee the stability of the LMS algorithm. Tlrus, we concenrrete on (6.46), for i = 0, I, ... , N - I.

Let us define the column vectors

(6A9)

'and

(6.50)

and the matrlx

(6.5\ )

152

T/Je LMS Algorithm

MSE Behaviour of ,he LMS Algorllhm

153

(6.52)

We nom. that ~0l;<;<:S' is proportional to emin' This is intuitively uuderstandable if we note that when w(n.) has converged toa viciility of WQ the variance of the elements of the stocbastis gradient vector 'Vi? (n) are: proportional toem;o (see Problem _PQ.l). We also note that similar to ~mln' {""~ also has the units .of power. It is 'convenient to normali1"e {",,,,,,,, to {win, SO thata danension-free degradation measure Is obta-ined. The result is called misadjustme:nt and denoted as .M. For the LMS algorithm, from (6.56)

we obtain ..

where diag[···J refers to a diagonal matrix qorisu;tmg of the indicatede1"iHlMts. Considertng these definitions and the ti;me-up~te ~q\(atipn (6.46), for i = 0, 1, ... , N - I, we get

The difference equation (6.52) can be used to study the stability of the LMS algorithm. As was noted before, the stability 01 the LMS'aIgorithm is guaranteed if the elements of K(n) (or, equivalently, theelements of .k'(n)) remain bounded. as 11 increases. The necessary and s.p:fficient CQuditlon for (bis to ljappeu is that all the eigmvalues of the coefficient matrix F of (6.51) be less than one in magnlmde, Feuer and Weinstein (198.5) have discussed th:edgu'I'alues .ofF and given the, condition required to keep the LMS algorithm stable. Here, we will eomment on the st.abi!'ity of the LMS hlgorithnUn an indirect way. This is done after we find an expression for the excess MSE of the L..MS algorithm, which is defined below.

M = .~;x=. = 4;,l >. T (I - F)-I x ...

~ ... iD

(6.57)

The special structure of (he matrix (I - F) C.3.11 be used to, find .. its inverse.

We note from (6,51) that

(6..58)

6.3.3 Excess MSE and misediustmeni

We note that even when the filttr tap-weight vector w(tI) approaches its opthnal value, w:o, :m:d the J,1lea!l of the stochastic gradient V"CEQf V;(I1.) tends to zero) the instantsneous value of this gradient may not be zero; This results in a perturbation of the tapweight vector w(n J around Lts optimal value, "'0' evenarterconvergence of the algorithm. This, in turn, increases the- MSB of the LMS algorithm to II level above the minimum MSB thal would be obtained if the ftltertap weights were fixed 3:[ their opti)1ra! values, This additional error IS called thee:xcess M31£. ill other words, the excess MSE of an adaptive filter is defined as the difference between its steady~ljtate: MSE and its minimum MSE.

The steady-state MSE of the LMS algorithm can be found from (6..28) or, equivaIently, (6.10) or (6.32) by letting the time-index n tend to Infinity. Thus, subtracting ~min from beth sides of (6.28), we ebtsin

On the other hand, we note that according ttl tlie matrix inversion lemma," for an arbitrary positive-definite N x N matrix A., auy N x 1 vector" and a scalar Dr,

A-I TA-I

( ..!.. . T)-l _ -I _ a .. aa

A,a~ -A T i'

l+aaA-a

Letting A = diagfl- Po, I - Pl,"" I - PN-d, a =A and 0: = -4 .. / in (6 .. 59) to obtain the inverse of (J -F), substituting the result in (6.57), and after some straightforward manipulations, we get

(6.S9)

i\'-I /lA, L l-2w.i

M = -----"_=.:;.0 _

N-! ),

J '" f.L I f;;o' I - 2/1),'<

(6.60)

{O;<q1i$ = tr[K( (0) R 1

(6.53)

n is useful to simplify this result by making some approp ri ate approximations, so that it can conveniently be 'used for the selection of'the step-size parameter, p,. ln practice, one usually selects p sa that a mtsadjustmeut of 10% (M = OJ) or less is.achieved.Tn that case, We may find. that

where, ·('l:XOCSS denotes the excess MSE. Alternatively, IT (6;32) Is used, we get

N-l

(I'K<= = L. Ajk';t(OO) = >.'k'(oo).

[=0

(6.54)

When the LMS algorithm is conuergent, k'(i'!) converges toa bounded steady-state value and we can say k'(n + 1) = k'(n), when II -1-,00, Noting mis,.from (6.52) we obtain

(6.61)

(6.55)

!I The general form of the marrix inversion le.rrlmil; SUI l~ ttUIl if A. B ami C are, respectively.

N>i: N, M x M and N x M. rna trices and the nece-ssary inverses exist, then

Substituting this iu (6.54) we get

(6.50)

(A + CBCT)-l = A-I -A-1C(a-1 + CTA-Iq-ICTA-I.

CleaTly. the identity (6.59) is a special case of th~.

154

The LMS Algorithm

MSE Beh.aviour of tne LMS Algorithm

155

M= /.ktr[R]

I -/J.tr[R]"

(6.62)

From (6.66) we note that :Jis an inereasing function of f1;, since its derivative with respect to PIs always pesitive, In a similar way, W!;l can shew that M is an i:ocreasing{U11ction of :1. Ihis,in tUIIl, implies that Jhe flu'sadjustmettt M 0[(6.60) is all Increasing function a/the step-size parameter, 1->. Thus starting with p = 0 (i.e. the lower bound of /.,.) and increasing 1.£ we find that :J and M also start from zero and increase with fl.. We also note !hat when :r approaches unity, M tends to infinity. This clearly coincides with the upper bound of the step-size parameter, say 1.£=,. below which /10 has to remain to ensure the stable behavionn of the LMS algorithm. Thus, the value of J1.1JIV, is obtained by linding' the first positive root of the equation

where the last equality is obtained from (4.24). This approximation is understood if we Bote that when M is small, the summation all the left-hand Side of (6.61) is also small. Moreover, when the tatter summation is small, /J)../ « 1, for i = 0, L, ... , N - I, and thus these may be deleted from the denominators ofthe terms under the summation on the right-hand side Of (6.60). Thus we obtain

Furthermore, we note that when Mis small, say M .$ 0.1,1.£ tr[R] is also small, and thus it may be ignore-din the denominator of (6.62) to obtain

M ~.",tr[Rl.

(6.63)

(6.67)

This is a very convenient.equation, ilstr[R] is equal to the sum of he powers of the signal samples at the filter tap inputs. This can be easily measured and used f-or the selection of 11K step-size parameter, /J.-, for achieving a certain. level of rnisadjustment. Furthermore, when the input process to the filter is non-stationary, estimate of rr[Ri may be updated recursively and the step-size parameter, po, chosen accordingly to, keep a certain level of misadjustment.

Finding the exact solution of this problem, in general, turns out to be a difficult mathematical task. Furthermore from a practical point of view such a SOlution is not rewarding because it de-pends GD the statistics of the filter input in a complicated way. Here we give an upper bound of J1. tha t depends only Oll ~r=o 1 A, = IJ'[R]. This results in a smaller (marc stringent) value as the upper bound of u, but a value that can easily be measured in practice, For this we note thar when

6.3.4 Stability

1

o < P < -2 -",~N~--"--I )..-.

LA~O r

(6.68)

f

In Chapter 5 we noted that the steepest-descent algorithm remains stable only when its corresponding step-size parameter, /./,_, takes a value between zero and an upper bound value which was found to be dependent on the stausdcs or the filler input. The same ill true for the LMS algorithm. However, [he use of a stochastic gradient in the LMS algorithm makes it more sensitive to the value of its step-size parameter p. and, as a result, the upper bound of /1, which can ensure the stable behaviour of the LMS algorithm, is much lower than the corresponding bocnd in the case of the steepestdescent algorithm. To find the upper bound of p. that guarantees the stability of the L:M,S algorithm we elaborateon the misadjustment equation (6.60).

We define -

the foUowillg inequality always holds:

(6.69)

The proof of this inequality is discussed in Problem P6.4. From (6-.69), we find that the value of 1.£ that satisfies the equation

I,

(6.70)

1

N-I >..

:.r - \;'" 11- i

- L.., 1 - 2/~).·

!=G '

(6.64)

satisfies the inequality

and note thaJ

jV-1 x

" /1oj <1-

~ I - 211)..-

r=O ,..., 1

(6.70

J

(6.65)

Furthermore. any value of /" fual remains between zero and the solution of (6.70) satisfies (6.71). This means that (6·.70) gives-an upper bound for 1.£ which is sufficient for the stability of the LMS algorithm, but.is not necessary in general. If we call the solution of (6.70) l4.a-u we obtain

We also note that

I

(6.66)

I 1

fJ.max= 3 "'N .. -\ \" L...,,1=O ,r':\,

. i 3tr[R]'

(6.72)

156

TheLMS Algorithm

To summarize, we found that, under the assumptions made at tile beginning of this section • .the LMS algorithm remains stable wben

(6.13)

The sig:ni6,cance of the UPPQT beundnf J.t, which is "provided by (6. 73}. i~ !;hat it cap. easily be measured from the filter input samples. We also note that the raage of ~ d]',U is _provided by (6 .. 13)i'.'J sufficient for the stability of the LMS algorlthm, but is not necessary. The firs! pDs1tive rw! of (6.67) gi-vesa.dIiMc accurate uppe,r hpund Of l~· However, this depends <OIl the filter input statistics in a very complicated way which prohibirs its appli¢.a.bility in aetnal practice,

6.3.5 The effect of inifial v~rues ·of ta.p weights on the transient .behaviour of the· LMS algorlthm

As was IlQte.d before, the LMS algorithm en average follews the same trajectory as the steepest-descent algorithm. A::. a' result, the learning curves of the two a!gorit:h:ms are found to be similar when the same step-size parameter is used for both. In particular, the learningcarve equation (5.31) is also (approximately) appiJ.icab\e to the LMS algorithm. Thus. w,e may write

/X-I

{(n) :::o;:tmin + L Aifl - 2ttA/)2IIV:2(O). (6.74)

;=()

In most applications the filter tap weight!; are all initialized to zero .. 10 that ease.

vIOl = w(O) - wo = -WIt. (6.75)

Using this result and recalling the defiltition Vi (0) = Q T'l(O) , we get

1"(0) = -w~ (6.76)

whenI' w~ = QTwo. Using (6.76) in (6.74), we obtain

#-1

{In) ~,{",," + E .\AI -2.u.\I)'bIW~1

1=0

(6.77)

where W~,i is the itb elemeru of w~.

The contribution of various modes of convergence of the LMS algorithm (i.e, the 'terms under (he summation on the right-hand side .0f (6.74)) on its learning curve depends on the ),,111:;; coefficients. A::. a result, we find that even for a similar eigenvalue distribution, the convergence behaviour of the LMSalgorirhm is application dependent. For instance. if the w{,,1s ccrresponding te the smaller etgen.values ofR are all dose to zero, t1i¢n the transient behaviourofthe LMS algorithm is determjned by the larger

Computet S.imuJafions

157

eigenvalues. of R whose associated lime comt3;:llts-are small, thus a fast convergence is observed. On the contrary, if the w'",.s corresponding to the smaller eigenvalues of R are ::;ig,nifi.cantly large, then we find that the slower modes of the LMSalgorilb.mare prominent on its learning curve .. Examples given in the next, section show that these two extreme cases can happen in practice.

6.4 Computer Simulations

Computer simulation pIays a major role in: the study of adaptive filters. In the analysis presented in the previous .'i.cctian" we bad to consider a number of assJ,Iinptio.n,<; to make the problem mathematically tractable. TIle validity of these assumptieas and the matching between mathematical results and the actual pertormenee of adaptive filters are usually verified lhr-ough computer simulations.

Inthis section we present a few examples of computer simulations. We present ex.ampteiLof fow differeat appli.cationsQ[ adaptive filters:

.. System medeiling

.. Channel equalization

.. Adaptive line ilnhanqement (this is anexample of prediction) .. Beam[oUllIr,lg,

Ourbbjec;tives in this presentation are to:

l , Help the novice reader to make a quick start at doing computer simuiations,

2. Cuee!;; the accuracy of the developed theoretical results.

3. Enhance the uaderstandiag of the theoretical results by careful observation and

Interpretation of simulation results.

All the, results ;given below have been generated by using, the MATLAB numerical .paekage. The MA1LAB. programs used to generate fue results-presented in this-section and other parts or this book are available all aaaecempanying diskette. A list of these programs (m-files as they are called in MATLAB) is given at the end of the book and also in the. 'read.rae' :file OD the attached illskette. We eneourage the novice reader to try to run these programs, as this, we believe, isessential for a better understandingof the adeptivefdtering concepts,

6.4..1 System modeillng

Consider a system modelling problem as depicted in Figure 6.5. The filter input is obtained by passing a unit variance white Gaussian sequence, !.I(lI),tbrough I:l fille.r with the system mnction H(z). The plant,Wo(z),. is assumed to be a fiaite impulse response (FIR) system with the impulse response duration of N samples. The plant output is contaminated WiLI;i an additive white Gaussian noise sequence, eo(n), with variance u-~. An N-tap adaptive filter, W(z), is used. toestimate the plant parameters.

FOT simulatinns, in tl::iis section we select N = 15., ~ = 0.001 and

(6.78)

158

The LMS Algorllhm

Compu.ter S;mu.lati"ons

159

1

showsthe power spectral deusities of the two input:$ generated by using the filters HI (i?) Rod Hz(z). These plots are-obtained by noting t!:tat

(6,81)

(6,79)

lind @",,(ei_,}) = I, sincell{JI)i:s a unit "analise white noise proeesaThe fact that H2(t) generares a preeess that is highly coloured, while the process generated by "I (z) is

relatively fiat is clearly 'seen. .

Figures 6.7(a}and (b) show the learning curves of the LMS algorithm for the two choices- of H(z}. The step-size parameter, /1, Js selected according to the simplified misadjustrnentequarion (6.63) for the misadjustmeut values 10%,. 20% and 300!~. The 1i1tar tap ~ights are initialized to ZeFO. Each plot is obtai:zwd by an ens~ b!~a~'e[age of 100 independent simulation runs. We note that ~nrln = !T~ = 0.001, and this .is achieved when the model and plant coefficienta match. Careful examination of the results presented in Figures 6.7(a) and (b) revsals (hat the predictions made by (6.63) are accurate fp:t the case.') where J1. is set fOJ a misadjustment of 10% (or less) ... For larger values of .fJ.,. we find that a more accurate theoretica] estimate of the, misadjustmem is obtained by using equation (6;60). Such an estimate. of course, requires calculation of the eigenvalues of the corretation maw R. The MA TLAB program 'rnedelling.m" on theaccompanyiI(g diskette contains instruetions that generate' the matrix R and the other parameters required for .these calculations. The readeF is encouraged to USe this. program and experiment with it t.b examine tbe effect of'various parameters, such as the step size, 14 tbeplant Model. W,,(z),apd (he input sequence to the adaptive filter, Suah. experiments. will gr.eatLy enhance the readers understandlngof rae concepts of eonvergeaee and misadjustment.

Experiments with the LMS algorithm shaw that the accuracy of the misadjustment equations developedabove varies IviLl:! the statistics of the filter input and the step-size parameter. For example, we-find that-all Qfthe three plots in Figus€ 6.7(a) and two nf the plats in Pigure_6.7(0) match the theoretical predictions made by (6.60), but the third plot in Figwe 6,7(0) (i.e .. the case M = 30%) does not match (6:60). Ttl the latter C!J1le theLMSalgorithm experiences some-instability problem. The mismatch between the theory and experimen ts here is attributed to the fact that the independence assumption made in the development of the theoretical results is baUly violated for larger values of po.

Figure 6.5 Adcaptive mode-I!ingof an FIR plant

Wepresellt the results .ofsimul ations for twochoices of input whieh are.characterized by

and

(6.80)

TIUl first C,IToice results ·in,lPl Iaput, x( /I).. whose corresponding cone-Iauo_n metriA has au eigenvalue spread of 1.45. This IS close 10 \ilbite input. On theeontrary, the second choice of R($) re_stIlts is a highly coloured input with an associated d:gellvaluc. spread of 28.7. From the £esa/{s of Chapter 4, we reca.ll thaI the ejgem<;r.!lue spread figures. can approximately be obtained from the underlying power spectra] densities. Figure. 6.6

I

3r------,------~------~------_r------~

I

\

I

f 2.15

00 z

gJ. 2

_,

i!E

ti 1.5

w n, en

c::

UJ ~

a

0.. 0.5

....

....

,

,

,

\

0.1 0.2 0.3 0.4

NORMALIZED FREQUENCY

0.5

6.4.2 Channe/equalization

Figure 6.8 depicts a channel equalization problem, The Input.sequence to the channel is assumed to be binary (takingvalues of -l-I and -]) and white. The channel system function is denoted by H{z). The channel noise, Ve(II), is modelled as aneddirivewhire Gaussi_a-II process with variance u~" Theequalizer is implemented as an N-lap transversalfilter, The desired output of the equalizer is assumed to be s(ll- L':,..), i.e. a delayod replica of the transmitted data symbols. For the training of the equalizer, it is assumed ~t the tr-ansmitted data symbols ate available at the receiver .. This is called the training mode. Once the equalizer is trained and swit¢hed to the data mode, its output, after passing through a slicer, gives the transmitted symbols. A discnssionoa the training and data mod.e of equalizers can be found in Chapter 1.

\

\

,

, ,

....

J

...

.... __ .....

J

FIgUt'e 6.6 Power spectral d:ensltias of the two inpvt precesses us!!-.d tor the strnulatlon of the modelling problem: (a) H(z) = H1(z), '(b) H(Z) = H2(z) .

160

The LMS Afgorithm

----10%

----20%

...... "· .. 30%

. - . - . -. MInimum MSE

ur 10.-' eo

.:E

Q

SOD

1000 i 500 2000

NO. OF ITERATIONS

(a)

2500

----10%

---- 20%

........... 300/0

,_. _. -. Minimum MSE

':\: ··.t _.:\ _ ~ ':. \\

-, \ :i:

. '-:. ~ :·~.~!;..~~ .• ;:'~;,,\c.i~;;r~i:::·~G"'';'·~_·~;~· ;;.iU~~~9ri9~~

-._.-.-._._.-._.-.-._.-._._._.

o

500

1 000 15002000

NO. OF ITERATIONS

(b)

2500

3000

3000

FTgure6.7 L~arnlng curves of the LMS algorIthm for the modeWng problem ·of Fi9.L1re 6.5, for the two Input processes discussed In Lhe text: (a) H{z) = H,(z) and (b) H(z) = H'2(z). The step-size parameter, jJ" is selected tor the misadjustment value$10%, 20% and 30%, according to Ihesimpllfllldllquat!on (6.63)

Computer Simulations

181

W(z)

sen)

H(z)

-.6.

Z

Figure 6.8 Adaptive channel equallzatlon

Two choices of the channel response, H(z). are considered for OUI study here. These are purposefully selected to be the same as the two chcices of f:J(z) In the modelling problem above, where H(z) was used to shape the power spectral density of the input process to the plant and mode], This facilitates a comparison of the tesults in the two cases. III par tic ular, we note that, in the present problem,

= IH(e.1"')f + ~.

<

(6.82)

Comparing (6.S1) and (6.82) we note iharwhen a similar H(z) is used. for both cases and the signal-to-noise ratio at the channel output is high (i.e. 0;, i small) the power spectral densities of the input samples to the lWO adaptive filters are almost the same. This, in turn, Implies tb.at the convergence of both Iilters is controlled by the same set of eigenvalues. As a result. on aveuge we may expect to SU sitnilar Iearning-C1).rw~s far bath cases,

Figures 6.9(a) and (b) present the learning curves of the equalizer for the two choices Qf the channel response, i.e . .Rt(z) and H2(z) of (6.79) and (6.l!0), respectively. The equalizer lel'lglb, N,and the delay, A, are set equal te 15 and 9, respectively. The step-size parameter, J.i, is chosen according to me simplified equation (6.63) for the' three misadjustment values 10%,20%, and 30%. The equalizer tap weights are initialized to zero. Each plot.Is based on an ensemble average of lUOindependenl simulation fU,QS. The MA TLAB program used to obtain these results is available on the accompanied diskette, It is called equalizer.m'. Careful study of Figures 6.9(a) and (b) and further numerical rests (using the 'equ!llizer.m' or any sintilar simulation program) reveal thai similar to the modelling case the theoretical and simulation results match well when the step-size parameter, p, is small. However, the accuracy of the theoretical results is lost for larger val nell of p,. The latter effeer is more noticeable when the eigenvalue spread of the correlation matrix R is large.

Comparing the results presented ill Figures 6.. 7(a) and 6.9(a), we lind that the performance of the adaptive filters in both cases are about the. same. Moreover, these

162

The LMS Algorithm

101 .------,-------r------,-------~----_,,_----_.

-----10%

----20%~

·,·········30%

. -. _. _. Minimum MSE

W 10-1
(f)
~
10-"2
1 10-9
0 1000 1500 2000

NO. OF IT~RATIONS

(a]

2800

500

----10%

---- 20%

...... , .... 30%

. _. -. -. Minimum MSE

I

J

10~

1000 1500 2000

NO. OF ITERATIONS

(b)

2.500

500

o

Computer Simulations

163

results compare very well witb the prediction made hy theory. We recall that the_se correspond to the case where the eigenvalue spread of the correlation matrix R is small Some differences between the results of the two cases are observed as the eigenvalue spread of R increases. In. partieular, a comparison of Figures 6.7(b) and 6.9{b) shows that the learning curve ofthe channel equalizer is predominantly controlled by its slower modes of convergence, while in the modelling case a balance of slow and fast modes Of convergence is observed. In the latter case. a drop in the MSE from 10 to OJ within the first 100 iterations of the LMS algorithm is observed. The slower modes of the algorithm arc observed after the filter output MSE has dropped to a relatively low level. As a result the existence of slow and fast modes of convergence on the learning curve ate clearly visible. On the co ntrary , in theease of channel eq nalization, we find that the convergence of the LMSa(gorithm Is predorulsantly determined by its slower modes. We canhardly see any fast mode of convergence on the learning curves presented in Figure 6.9(b). An explanation of this phenomenori. which is usually observed when the LMS algorithm is used to adapt channel equalizers, is instructive,

As was noted before, besides the eigenvalue spread of R. the transient behaviour of the LMS algorithm is also affected by the initial offset of the filter tap weights from their optimal values; see (6.74). We also noted that when the filter tap weights are initialized to zero, tile transient behaviour of the LMS al:gorithm is affected by the optimum tap weights of the filter; see (6.77). To be more precise, the contribution ofvarious modes of convergence of the LMS algorithm in shaping its learning curve is determined by the values of AiW~~i for i = 0, 1, ... ,N - J.

For a modelling problem, the statistics of the filter input and its optimum tap weigh ls, w" (i.e, the plant response) are, in general, independent of each other. In this situation it is hard to make any comment on the values of the ).ilV~; terms, The only comment that may be made is that ifwe aSSl1Il1e that the sratistics-of'the filter input are fixed and the plant response is arbitrary, the elements ofw~, i.e. the \II~\iS, may be thought of as II set of zero-mean random variables whose values change from one plant to another, and they all have the same variance, say ~ e , Using this in (6..77), We obtain, for the modelling

problem, ~

3000

]i-l

E[t'(n)] ~ €mill + ~" I:' ),/(1 - 211),;)271 i=Q

(6.83 )

where the statistical expectation on €(n) is with respect to the variations in the IV~ is, i.e,

the plant response. .

On the ccatrary, in the case of channel equalization there .is a close nda~tionsl.rip between the filter (equalizer) input statistics and the optimum setting Of its tap weights, The equalizer is aaiiproo to implement the inverse of the channel response, Le,

3000

I

Figllre. 6.9 Learning curves of the LMS algorithm for the Channel equalizer. for the two choices of channel responeea discussed In the text (a) H(z) = HI (z) and (13) H(~) = H~'z). The step-slzepaeameter. IJ.. is selected for the mlsadjustment values 10%,20% and 30-%. according to the simplified equation l6.63).

This result, which may be referred to as the spectral inversion property of the channel e-qualizer, can be used to evaluate the >'il~; terms when the equalizer length is relatively long. A procedure for approximation of ),;W~f is discussed in Problem P6.1 1. The result

164

Tbe lMS Algorithm

Compuler Simulations

165

there is that when the equalizer length iV is relatively long

>'III~I ~l "or; - 0 1 N I

0, N ' .' • - , , .. ,. " - .

x(n)

1 N~l , '

€(n)::::-: {"',in + N L: (I - 2pA;)2n.

;;1)

(6.86)

-;f
x(lt-M} 7
yen)
L.._ -.M W(z) q:
z / -
+

(6;85)

Substituting this in (6.77) we get. for an N-tap channel equalizer.

The difference between the learning curves .of the modelling and channel equalization proble,m5m<lY now be ~w.plaw.ed by eomparmg (6,83) M.d (6.86)., When the ei~j] l<alueS ),0, ),1,"" )W-l are widelyspread ana n is small (i.e, the adaptaUo.ll has just started), the summation .on the right-hand side of (0.83) is predornina:n:tiy determined by tne larger ),/s. However, noting that the geometeical regressor factors, the (1 - 2jl>.S2t,s, correspondingto the larger A,5, con verge to zero ata relatively fast r3!'t!e, the summarion on the right-hand side of (6.83) experien,ct}s a fast drop to. a level significantly below its initial valee when. II = 0 .. Theslower modes of the LMS algorithnrare a bserved aCrer this initial fast drop of the MSE, This. of eeurse, lswhat we observe in Figurc6.7(b). In the case of channel equalizer, we notefuatwhen n 'is small all the terms under the snmraafina an the right-hand side of (6.86) are about the same, Tills means that there is no dominant term ill the latter summation and, as a result, anlike the modeIll!lJg problem case. the couvergense of the faster modes of the LM'S afgorit[UIi may nat reduce {(nJ signi(icanJ;:ly. A :Signific{mt r~ductio;ti ine(>l} after convergence of the faster modes of the LMS algQJi,thmmay only be observed When ib~ filter length, N,. islargeand only a few of HIe eigenvalues of'R are small.

e(.n)

Figure 6.10 Adaptive linea enhancer

assume that

x(n) = a sin (woJl + fJ)+ 1'(11),

(6.87)

where V(II) is a white noise sequence. The delay parameter M is set 10 1, since 1'(11) is white,

Figure 6.11 shows the learning curves of the adaptive line enhancer when x(n) is chosen as in (6.8.7). The [onawIng parameters are used to obtain these results: N = 30, M = I, fl = 1, w", = O.,J, and B is chosen to be a random '\IambIc withcons.tant

6.4.3 Adaptiv,e line enhancement

0 "
..
J-i'&.
'.
-2
i\ '.,
m ~-
'0
.....,. -4
w
U)
::;:;;
0
w ... ,6
NI "
::J
« \
::;:;;
II: -8
0
.2
~10 Misadjusfment 1%

5%

10%

Adaptive line enhancement refers. to the case where a noisy signal consisting of a few .sinusoidal coinp.ommts is a vail a bleand Ute aim: is to filter o;ut the noise part of the signal. The mooring s<;llut;on to. tills problem is tri via1. The noisy signa:l is passed through a filler which is, tuned to thesinusoidal components. When the frequency of (he sine waves pmsent in the noisy signal are known, OfCOlliSC, a fixed filter will suffice. However, when the sine-wave frequencies are unknown or may be time-varying, an -adaptive sol ution has .10 be adopted.

Figure 6.! (l d'epicl:s the block schematic of an adaptive line enhancer . It is basically an M-slep~ahead predictor, The assumption ill that tbe noise samples which are more than M samples apart are uncorrelatsd with one another. As a resalt, the predictor can only make a prediction about the sinusoidal components of the input signal, and when adapted lO minimize the €Itltput MSE, the line enhancer will be a filter tuned. to the sinusoid al components. The maximum possible rejection of the noise wiH also be achieved. since any portion of the noise that passes through the prediction filter will enhance the output MSE whose minimization is the criterion in adapting the filter tap weights.

Here, to simplify our discussiou, we assume that the enhancer input 9O,nsis~ of (l. single sinusoidal component and the additive noise is white. More specifically, we

'._

""r"" ~..._ 4 ~

.. ~ •. ~

_12~------~--------_i--~ _L ___

o 50 100 150 200

NO. OF ITEAA TlONS

Figure 6_1'1 Le(lrning curves 0.1 thee ad,aptive Iina enhancer, Ths line enhancer M8E Is

normalized to the InpLlt signal power .

166

The LMS Algorithm

distribution in the range Qro to 211', for different !$imuJa:tion runs, The variance of lI{n) is chosen as lOdE below the sinusoidal signal energy. The learning curves are given for three choices of me step-size panimeter,f,i, which result in 1 %, 5% and 10·% misadjustrnent, The predictortap weightsare initialized to-zero. The pr.ograrn used to obtain these results is available on the act;OIiIpanyt)1g diskette. It is c.aJJ!XI 'Isnhncr.m',

From the results presented in Figure 6.11, it appears lJii'at the convergence of'the line enhancer is governed 'by only one mode. Examination of the eigenvalues. of the underlying process and the resulting lime constants of the various modes of the lineenhancer reveals [hat tile mode :rba1 is observed in Figure 6.11 coincides with the fastest CO\IVerglimc~ mode of tile l"MS algorithm. in the present casco An 'explanatlon of 1bis phensmenon is instructive.

We note-that the-optimized predictor of the line enhancer is a filter tuned to the peak of the spectrum 8I'x(n). Funhenaore, from the minimax theorem (of Chapter 4) we may say thai the latter is the eigenfilrer associared with the maximum eigenvalue of the correlation matrix R oJ the nnderlying process, This iiilplioo that the optimum Lapweight vector of the line enhancer coincides with the eigenveetor associated with the larges; eigenvalue of its ecrresponding Gorrelatjon matrbe, In other words., in the Euclidian space.associated with the tap weights of the line enhaneer, the line connecting the origin to the point defined by the optimized lap weights is along the eigenvector assoeiated with largest eigenvalue of its corresponding eorrelatioa matrix. This. clearly explains w1:Jy the learning curves of the line enhancer presented i1:l, Figure 6..11are predomma.ntly controlled by only one mode and this cqim::ides With the fastest mode of conv(lrg\!o:ce of the corresponding LMS a[gprithm.

1

6.4 .. 4 Beamtorming

J

Consider a two-elemenr antenna array similar to the one discussed in Example 1.6. The array consists of two omnl-directienal (equally sensitive to all directions) antennas A and E, as in Figure 6.12. The desired .sJgnal. s(n) = (;fen) cos{ l1!AJu. + ¢ I) arrives in the direction perpendicular to the line connecting A and B. An interferer Gammer) signal 11(11) = .0<") COS (nwo + 411) arrives at an angle (}o relative to sen). The signal. sequences

A

I

Figure 6 .. 12 A twc-element antenna.array

1

Computer Simulations

167

s(,Il) and v(n) :are assumed to be narrow-hand processes with random phasese, and ¢2, respectively. Itis also assumed that the random amplitudes a(n) and .8(n) are zero-mean and uncorrelated witheach ether, The two omaisare separated bya distance of I = )",,/2 metres, when, A.< Is the w.aveJengrh associated with the continuous time carrier frequency

Wo t.,!, = t:'

(6.88)

with T being the sampling period .. The coefficients, IVi) and 11'1, of thebeamfunner .are adjusted so that the output error, e(n), is minimized. in the mean-square sense.

As in Example 3.6, the adaptive beamformer of Figure 6.12 is characterized by the following signal sequencesr'

1. Primary input

(6.89)

2. Reference tap-input vector

X(/I) = [~(n)]

.x(n)

= [. a(n) c~s(nulQ + i1r .. d + .8(n) qos(nw" +r/Fl)] . a(/I) Sllj(IIW." + ¢II) + .8(11) sm(l1Wo + rP2l

(6.90)

The phase shift cPo is introduced because of the difference between the arrival time of'the jammer at A and B. It is given by

(6.91)

where e is the prepagstlon speed, Replacrog' { with >'a/2 ju (6:91) and noting that IP~/c = 2,,"/ Aa. we obtain

(6.92)

We note that, as expected, rPo is independent of the sampling period T. It depends only on the angle of arrival of the jammer signal, eGo

The beamforrner coefficients, 1110 and wI, are selected (adapted) so thaI the difference,

l'

e(lI} = d(n) - W X(I1) ,

where VI = \Wo wljT, is min-imiz.ed in Lhe m.eal:l-sq_1l1;l.re serlr~. T\\c e.r.rnr signal e(u) is the beamformer output.

For II given set of beamforrner ceefficien tSII'O and 1111 and a signal arriving a tan angle 0, the array power ,gain, 9(0), is.defined as the ratio of we signal power in the outp ut e(n) to the signal power at one of the omnis, Assurningvthat a narrow-band signal

5 Tn Example 3.6, to' simplify the derivations ¢1 and ¢z were assumed to be zero.

faB

The LMS Algorithm

,(1'1) cos nwo is arriving at an angle 8,

e(n) = 1'(n)[cos(nw". -1l"sin 9) - 11'0 cosnwo - WI slnnw,,]

= ')'(n) [(cos(lTsin 6)- wQ)cos nwo + (sin(1fsinB) - WI) sin nwol = a{Oh(n) sin(nwo + <p(O)) ,

(6.93)

where

a(O) = J (cos(1fsinO) - \tIo))! + (~in{7I"sinB) - !I'll

and

(8) _ ... -I (C.OS{"Il"f sin B) - ltIo)

«J -tan •. .

. si:n(xsjnB)-w[.·

Using these, we get

9(8) = t?(O} = (cos(r.sin 8) - wof + (sir! (x sin 9) - II/Ii.

(6:94)

9(0), wben ploued ag~!1SL the angle of arrival of the received signal, is called the direclil1il), pauem of the array (beamformer), The names Ql1(lm psuem, "trap pal I ern and spatial r.esj1(Jllseare also used to refer to 9(11). The directivity patterns are usually ploued in polar coordinates.

Figure 6. i3 ShOWiS the directivity pattern of the two-element bmunfotmer of Figure 6.t2 when its ooeffick:ni~ have been adjusted ncar their .<lptima! values using

•• ~ • ~ ~~ •• + ....

300

270

Figure '6.13 The dlrectrvtty pattern 01 the Iwoelem.entalltenna arraY when a jammer arrlv~ from the dirllction 45° wltM respect to the desired signal, as defined in Figurll<.6. 12

.STmplitied LMS AlgorI!hms

169

the LMS algorithm. The follewing parameters have been used toobtaln thess results:

·0 ' 0 2

60 = 45., ~ = O. I, frp = I,

where ~ and fl~t are the variances of 0,(11 land fJ(n). respectively. The results, as could be predicted from the theory, shew a clear deep nnll in I he direerion from which the jammer arrives (8 = (0) and a reasonably good gain in the direction of the desired signal (9 = 0). TIle array pattern issynnnetrical with respeerto the line c.QnnectingA to B, bece:U!$eofthe cmni-direetionalproperties of the antennas. TM MA. TLAB program used to obtain this result is available on the accompanying dis:kette. 11 is called 'bformllr.m'. We encourag!!' the reader to try tlris progl'aro for different values of go,·a! and ui. An interesting observation that can be made is taa t a null is always produced 10 the direction of arrival of the desired signal or jammer, wnlcheveris stronger. The theererical results related to these observadons.ean be [.aunO ill Clfapter 3, Section 3.6.5.

6.5 Simplified LMS Algorithms

Over lhe yearsa number ofmodificaritlnswrueb simplify the hardware implementation of the LMS algorithm luwe been proposed (Hirsch and Wolf, 19'70; Claasen arid Mecklenbrauke1f, linn; and Duttweiler, 1982). These simplifications are discussed in this section. The most important members of lbis class of algorithms are:

The Sign Algorlthm This algorithm lsebtalned from (he conventional LMS recursion (6 .. 9) by replacing f!{n) with its sign. nus leads.to the folloWing reeursien:

w(n + 1) = w(l!) + 2f.1 sign(c(n))x(n).

(6.95)

Because .of the replacement of e(n) by itssign, .iaiplemenunien of this reclli~l()n may be cheaper than the conventional LMS recursion, especially in highspeed applications where a hardware iroplementation of the adaptation recursion may be necessary. Purthermare, the .step-slze parameter is usuaUy selected to be a power-of-two so that no multiplioation would be required [or implementing the recursion (6.95). A set of shift and add/subtract operations would suffice to update thefilter Lap weights,

The Signed.-Reg;re5,So.r AlgoritltmThe signed-regressnr algorithm is obtained from the eonventional LMS recursion (6.9) .oy replacing the tap-input Vector XC'I) with the vector sign(X.(lt)), Where the sign function is applied 10 the vector x(/J) .on an element-byelement basis. Thesigned-regressor recursion is then

well + l) = w(/7) + 2.lHt.(n)sign(x(n)).

(6.96)

AltliOUgll quite sirmlar in form, the signed-regressor algorithm performs much better than the sign algorithm, This will be shown later through a simulation example.

TbeSigu-Sign Algo.rithm Thesign-slgn algorithm, as may be understood fwm its name, combines the sign and signed-regressor recursions, resulting in the roUawing recursion:

1

w(n + 1) = w(n) +4P sign(e:(n)) s~gn(x(n)).

(6.97)

170

The LMS Algorithm

lt may be noted that even though in many practical cases all the above algorithms are likely to converge to the optimum Wi~ner-aopf solution, lhis may not be true in general. For example, the sign-slgrr algorithm converges toward a.set of tap weights that satisfy the equation

E[S'ign(e(n)X(fl))J = 0,

(6.9B)

which in general may net be equivalent to the principle of orthogonality

E{e(n)x(n)] = 0

(6.99)

which leads to the Wiener-H apr equation. For instance, when the clements of the vector x(n) are zero-mean but have a non-symmetricel distribution around zero, the elements of e{n)x(n) may also have a non-symmetrical distribution around zero. In thatcase, it Js !:ikely that the solutions to (6.98) and (6.99) lead to tWQ different set of Lap weigh ts, Nevertheless, we shall emphasize that ill most of the praetical applications the scenario that was just mentioned i unlikely to happen, Even i(ithappens, the solutions obtained from (e.9B") and (6.99) are usually about ilia same.

To compare the. performance of the algorithms that were introduced above witb the conventional LMS algorithm ana among themselves, we run the system mo.delling problem that was introduced in. Section 6.4.1. Figure 6, 14 shows the convergence behaviour of the algorithms when the input colouring filter H (z) = HI (z) is used and

J

'.
~
\
I
I
,
-
I '.."

::: -.-.~ •• ~::;:;;~;:<>

,

---CO/lventlonal LMS ~ - __ Signed Regressor . ..... _._ Sign

...•...... Sign-Sign

I lli , 0-1
2
-2 I
10 I
1 \
10-3 1
10-4
a 0.5 1.5 2 2.5

NO. OF ITERATIONS

3

3.5 4

x lQ4

I

Figure 6.14 Learning curves of the conventlonal LMS algoriltlm and Its strnplltled verslons, Dltferent step-size parameters ara used. These have been selected experimentally so thai all algorithms approach Ihe same steady-state MSE

Simplified LMS Algorithms

171

the step-size parameters for different algorithms are selected experimentally so that.they all reach the same steady-state MSE.

From the results presented in Figure 6.14, we see that the performance of the signedregressor algorithm is only slightly worse than theconventlonai LMS algorithm. However, the sign and sign-sign algorithms are both much slower than. the conventional LM algorithm. Their coo vergence behaviour is also rather peculiar. They converge very slowly at the bllginning, but \S~d up as the MSE 1evel drops. This can be ex.plained as follows.

Consider the sign algorithm recursion and note that it may be written as

. ern)

w(n + 1) = w(n) + 2p je(n) I x{n},

{6JOO)

since sign(e(iI) = e(n)//e(n)!. This may be rearranged as

w(n + I) = w(n) + 2Ie0)! e(n)x(n).

(6.101)

Inspection of (6.101) reveals that the sign algorithm may be thought of as all LMS algorithm will< a variable step-size parameter tl(ll) = fL/le(n)l. The step-size para-meter p.'(n) increases. on an average, as the slgn algorithm converges, since eCn) decreases in magnitude. Thus, to keep the sIgn algorithm stable, with a small steady-state error, a very small step-size parameter J.l has to be used. Choosing a very small It leads to an equally small value (on average) for t!'(r1) in the initial portion of the sign algorithm, This clearly explains why the sign algorithm initially converges very slowly. However; as the algorithm converges and e(lI) becomes smaller in magnitude the step-size parameter ,/(n) becomes larger, on average, acd this, of course.Ieads to a Jaster convergence of the algorithm. A rigOTOUS analysis of the "Sign algorithm for a non-stationary case can be found in Ev.<eda (.\990b) .

The same procedure may be followed to explain the behaviour of the signed-regressor algorithm. In this case, each tap of the filler is controlled bya separate variable step-size parameter. In particular, the step-size parameter of the ith tap of the filter at the IJth iteration is j.i;(I1) = IJ./lx(/I - i)J, where f.J, is a. common parameter to all taps. The fundamental difference between the variable step-size parameters, the 11-;(n)&, here and what was observed above for the sign algorithm is that 111 the present case the variations "in the j.I.;(n)s are independent of the filcer convergence. The selection of the common parameter fL is based on the average size of IX(fI)I. This leads to a more homogeneous convergence of the signed-regressor algorithm when compared with the sign algmithm_ In fact, the analysis of the signed-regressor algorithm given by Eweda (1<)90a) shows that rO[ Gaussian signals the convergence behaviour of the signed-regressor algorithm is very similar to the eonven tional LMS algorithm. The replacemen l, of the x{Tl - f) t!!:rIDS by their signs leads to an increase in the time constants of the algorithm learning curve by a rued factor of rr/2. This. dearly, increases the convergence time of the signed-regressor algorithm by the same factor waen it is compared with theccnventional LMS algorithm. Problem P6.13 contain the necessary theoretical elements which lead to this. result. Another interesting proposal which also leads to some simplification of tile LMS algorithm was suggested by Duttweiier nn2). H,'e -suggested that in calculating: the

172

The LMS Algorithm

gradient vector e(n)x (u) , e(n) and/or x{,,) may be quantized to their respective nearest power-of-two. This leads to an algorithm that performs v.ery similar to the conventional LMS algorithm.

6.6 Normaliz.ed LMS Algorithm

The nermalized LMS (NLMS) algorithm may he viewed as a special.impletnenratioc of the l.MS algorithm which takes into account the variation in the signal jevd at the filter input and selects a normalized step-size parameter which resul ts in a sta bIe as well as fast converging adaptation algorithm, The NLMS algorithm may be developed [rom different viewpoints. Goodwin and Sin (I984) formula Led the NLMS algorithm as a constrained optimizaricn problem; see also Haykin (1991). Nitzberg (1985) obtained the Nl.MS recursion by running the conventional LMS algorithm many times, for every now sample 01 the input. Here, we start with a rather straightforward derivation of the NLMS recursion and later show that the recursion obtained satisfies the constrained optimization criterlon of Goodwin and Sill and also that it matches the result of Nitzberg,

We consider th-e LMS recursion

wen + 1) = wen) + 2J.i.(IT)e(n)x(n),

(6.J02)

where the step-size parameter fAll) i time-varying, We select #(71) so that the a posteriori error

(6.103)

is minimized ill magnitude. Substituting (6.102) in (3.103) and rearranging, we obtain

.e+(I1) = (\ - L/.4(It}X' (n)x(n))e(n).

Min.imizin:g (e+(n))2 with respect to ,u(n) results in the following: 1

,LI(") = 2xT(n)x(n)'

(6.105)

whieh forces e* (n) to zero. Substituting (6.lO5)in (6.102) we obtain

1

W(I1+ 1) = w(n) +T':"{ -) ( ) e(11)x(II).

x II x fI

(6.106)

This is the NLMS recursion. When this is combined with fire ill terin g equa tion (6.!) and the error estimation equarion (6.2) we obtain the NLMS algorithm. There have been a variety of inrerpretations to the 1\1LMS algorithm. W~ review Some or these below, since it can help in enhancing our understanding of this- algorithm.

1. The use of /ten) as in (6.105) is appealing, since il selects a step-size parameter proportional to the inverse of the instantaneous signal sample's energy at the adaptive filter input. This matches the misadjustrneat equation (6.63) which suggests that a

Normalized LMS Algorithm

173

step-size parameter for the LMS algorithm should be selected _proportional to the inverse of the average total ener$Y at [he filter tap inputs. Note that

,v-I [N-I]

triRI = ~ E[f(lI- i)]. = E ~ xl(n - i) ,

and Lr~1l1 .:I?·(11 - t') is the total instantaneous signal energy at the filter tap inputs.

2. The NLMS recursion (6.106) is equivalent to running the LMS recursion for every new sample of input for many iterations untilit converges. (Nitzberg; 1985); see Problem P6.14.

3. The Nl.MSrecursionmay also be derived by solving the following constrained

optimization problem (Goodwin a.nd Sin, 1984):

Given the tap-input vector x(n} and the desired output sample d(II), choose the updated .sap-weight vector w(n + 1) so as to minimize the squared Euclidian norm or thedifferenee

'1(1'1) = w(n + 1) - W(II)

(6.107)

subject to the constraint

wT(n + I)X(Il) = d{II).

(6.108)

Observe that the solution given by (6.106) satisfies the constraint (6.10&). Hence, we define '1NLI\1S(rt) as

1

'11'1 MS(n) = "'(n + 1) - wen) = ::r ( ). ( ) e(n)x.(IIJ.

. x 71 x 11

(6.109)

We will now show that l1NJ..MS{n) is indeed the solution to the preblem posed above.

Let the optimum '1"/(11) be given by

(6.110)

where 7Jl ell) indicates any difference that may exist between 1}o(lI) and "7NLMS(ll)· Since the updated vector W(II + 1) = wen) + 'I1NLMS(n) satisfies the constraint (6.108) we get

Tbe tap-weight vector W(11 + 1) = w{n} +'1,,(11) also satisfies the constraint (6.108), since '110(11) is the optimum solution Thus,

(w(n) + fjlo(fIWx(n) = d(lI).

(6.112)

Subtracting (6.111) fro:tn(6.1 U» and using (6.110), we get "7icn}x(Tl} = o.

(6,113)

1

J

J

1

j

I

The LMS Algorithm

Variable Step~Siz.e LMS ALgorithm

175

Wert)

Table 6.2 Summary of the' normalized LMSalgorfth.m

l1NfMs(n)

Input: Tap-Iveigbl ¥WIiQr, W(II), Input veetor, 1'(11),

and desired 'Output, den).

Output: Filter output, )1(;1),

Tap-weight vector update, "-(11 + I).

L .Fiitepll'g:

)'(11) = W T(n)x(lI) 2_ Error estimation:

<1(") = din) - )I(lI)

3. Tap-weight vector adaptation:

w{1l + 1) = w{n) + T ( ) ~) {-e(i~)x{n)

XliX II TV' .

Figure 6.15 Ga'Qmefricallnterpretatiqri. of the NLM$ r:acuroron

Multiplying the left-hand and right-hand sides of (6.110) by their respective transposes, from left, we obtain

o rthogonruto this subspace, It is.also t.lle vector connecting the point associated with wen} to its projection on the subspace. Tills cle.arlY shows that t]N'LMS(n) is the minimum le'ngth vector that results in the updated ~<.Ip-weighi vector '11'(;, + J) = W(tl) +11NLMs(nJ subject to tile constraint te. 108),

De-spite its appealing interpretations, the NLMS recursion (6,106) is seldom used in actual applications, Instead. it is often. observed thatthe followiQ:g relaxed recursion results in a more reliable implemeutation of adaptive filters:

Tj!(.n)11Q(n) = ('I1NI.MS(n) +111 (/1))T(1JNLMS(n) + 171 (11»)

= 'I1];LMS (1l)17NIA\4S (1i) + 17f (n}'11 en) + 2vkl<I'S(n)T/l (n),.

Preli1ul:ti_plyj:ng (6, lOS!) by 7lT (11) and using (6, II J), we obtain 111 (n) 71N1 .. MS (n) = o.

(6, 114)

((5.1 (5)

W{II + I) = w(n) + T ( ) 1) e(n)x{il). x II:!: 11 +'1,b

(6.11&)

S'Ubs.tltuting (6.1] 5) in (6,1 14) we obtain

(6.1 16)

In this recursion jl and 'l/J are positive constants which should be.selected ap prop ria tely , The ta tiona le for the introducsion of the eonstaat 'if; Is to prevent division by a small value when tlie squared Euclidian nann, ;.,T (n)x(.Il), is small, Thisresults in a more stable im prementatiou of the NLMS algorithm. The constant it may be thought of as.a step-size parameter which controls the rate of convergence of the algorithm and also its misadjustment. We also nole that the recursion ("6.J 18') reduces 10 (6.106) when ii, = I and 1/J = O. Table 6.2 summarizes the NLMS algortthm.

TIlls suggests that the squared Euclidian norm of tile vecter ·1]0(11), i.e .. '7~(II)T]Q(n), attains its minimum when ~be squared Euclidian [lOTIn of the vector TJI ('I) is minimum, This, of course, is achieved when '71 (n) = -0. Thus, we obtain

(6.l17)

This completes our proof."

Figure 6.! 5 gives a geometrical interpretation of the above result. The tap-welglll vector W(fl) is represented ~y a point. The constraint wT(n+ l)x(n) = .d(lI} limits W(II + I) to the points in a subspace whesedimensicn is Que less than the filter length, N, i,e, N -\. This is represented as a line in Figure 6,15. The vector tl'N"LMs(I"t) is

6.7 Variable Step-Si'.ze LMS Algorithm

Theanalysis presenred La Section (;.3 shows that the step-size; parameter, p" plays a significant role-in controlling the performance of the LMS algorithm, On the one hand, the speed ;of convergence of the LM5 algorithm changes in proportion to its step-size patmmeter. As a result, a large step-size parameter may be required to minimize the transient time of the LMS algerlthin, On the other hand, to achieve a small misadjustmenr asmall step-size: parameter has to be used. These are. ooniIlctin.g-reqlriremen.r.s and, thus, a eompromise solution bas. to be adopted. The variablestep-size LMS (VSLMS) algorithm which is introduced in this section is an effective solution to this problem (Harris, Chabries and Bishop, 1986).

fi The above results could also be derived by application of the methodcf the Lagrange multiplier; see Seotion 6.10.1 for an example-of the use ofthe Lagrange multiplier. Here we have selected 1\0 give a d:i¢ol'Q~vi:l:tion QUite results from the 6 rst principles of vectorcalculus, TIlls derivatioa isalsc instructive, since its application leads to the geometrical interpretation of the NLMS recursion depicted i.n figute6 .. 15.

176

The L.MS Algorithm

The vstMS algorithm works eo me basis of a simpk b:ew:isDctltat comes from the mechanism of the LMS algoruhm. Each ttLP of the adaptive ruler is given a separate time-varying step-size parameter and rhe LMS recursion is written as

Wt(n + I) = w;(7I) + 2pAn)e(l/)x{11 - i), thr i = 0" I, ••• ,N - 1,

(9.l19)

where \11,-(1'1) is Ili.e ith element of the tap~weight vector 11'(11) ani:! l11(n) is its associated step-size parameter at iterp:tion .II. the adjustment of the step-size pasametsr /),i(") is. done as follows. The correspondirrg stochassic gradient term Ki{n) = e(n)x(n - i) is monitored OVer successive iterarions.of the algorithm and J11.{n) is increased if the latter term censlsrently shows-a positlve.or negative d:irection~ This happens when the adaptive filter bas not yet converged. As the adaptive filte-r tap weights. converge to some vicini"l;y of thew c--ptimull\.\!Q.lucs, \ht. 2l'\!erageS of the stbchaStic gradient teI;ll\~ approac.h zeeo and hence ~h.t<y change signs more ft<::quentlY. This is detected by iee alg9rithm and the eerrespondiag st'ep-I'lfze parameters are gradually reduced to some minimum values. If tb.e situation changes and thealgcrtthm begins to hu-nt for a new optimum poins, then the gradient terms will indicate consistent (positive or negative) directions, resulting in an increase in the eorrespnndiegstep-size parameters, Toensure bhat the step-si2e parameters do n01 become too [aIlle (which liIay result in system ins1l,l:bility) or too small (which ma:y result lri a slow.reaction of the system to sudden changes), upper and lower Iimits should be specified ror each step-size parameter,

Following 'the above argument, the VSLMS algorithm step-size parameters, the !-',,(n)s, may be adjusted using the following recursion:

b(';(fl) = J.ti(n - l) + p sign fg,(il}] sign[g;.(I1- 1)]

(6.120)

who!:e p is a small poshivestep-siz» parameter, Thesign' funetiens may be dropped from (0.120). This results ill the following al terna live step-size parameter update eq ua tioa:

(6.121)

Both update equations (6.120) and (6.1Zl) work well in practice. Whj~ of the two ehoices works better is a pplieatlon dependen r, The ehoiee of one over the {Jth~r may also be decided .on (hi: basis of lheavailable hIlTdware/s0flwan!: platform on which the algorithm is to- be implemented. For instance, ira digital signal processor is being used, then recursion (6.121) may be much easier to implement. On the ocher hand, if a. custom chip is to he designed, them the update equation (6.120) may be preferred,

The-derivation of an 'ineql.la!ity'SUnilax Ip (6.73) to determine the range of the step-size

parameters thateusure the. stability of the VSLMS algorithm is rather difficult, because of the time-variation of the step-size parametess, Here, we-adopt a simple approach by assuming that the step-size parameters vary sl'ow}y so that for the stability analysis 'they may be assumed fixed and use the aealcgy between the resulting VSL1\il:8 algorithm equations and the conventional LMS algorithm to arrive ar a result which. through computer simulations, has been fo-und to be reasonable. Further results on the vSLMS algorithm misadjusunent and its tracking behaviour. along with computer simulation results, can be found in Chapter 14.

The set of update equations (6; L19) may be written in vector Iorm as

wtll + I) = w(n) + 2p\n)e-(n)x(n),

(6_ i21)

li;1dable Step-Size LMS Algorithm

17.7


where μ(n) is a diagonal matrix consisting of the step-size parameters μ_0(n), μ_1(n), ..., μ_{N-1}(n). Equation (6.122) may further be rearranged as

v(n + 1) = [I - 2μ(n)x(n)x^T(n)]v(n) + 2μ(n)e_o(n)x(n),  (6.123)

where notations follow those of Section 6.2. Comparing (6.123) with (6.13) and the subsequent discussions on the stability of the conventional LMS algorithm in Section 6.3, we may argue that to ensure the stability of the VSLMS algorithm, the scalar step-size parameter μ in (6.13) should be replaced by the diagonal matrix μ(n). This leads to the inequality¹

tr[μ(n)R] < 1/3  (6.124)

as a sufficient condition which assures the stability of the VSLMS algorithm. Although the inequality (6.124) may be used to impose some dynamic bounds on the step-size parameters μ_i(n) as the adaptation of the filter proceeds, this leads to a rather complicated process. Instead, in practice we usually prefer to use (6.73) to limit all the μ_i(n)s to the same maximum value, say μ_max.
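As a rough illustration of this practice: if all the μ_i(n)s are limited to a common value μ_max, (6.124) reduces to μ_max tr[R] < 1/3, and for an N-tap transversal filter with a stationary input, tr[R] = N E[x²(n)]. The sketch below estimates such a bound from a block of input samples; the sample-average power estimate and the safety factor are assumptions, not prescriptions from the text.

import numpy as np

def vslms_mu_max(x_samples, num_taps, safety=0.5):
    # For a stationary input, tr[R] = N * E[x^2(n)]
    tr_R = num_taps * np.mean(np.asarray(x_samples, dtype=float) ** 2)
    # Pick mu_max so that mu_max * tr[R] stays safely below 1/3, cf. (6.124)
    return safety / (3.0 * tr_R)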

The minimum bound that may be imposed on the variable step-size parameters, the μ_i(n)s, can be as low as zero. However, in actual practice a positive bound is usually used so that the adaptation process will be on all the time and possible variations in the adaptive filter optimum tap weights can always be tracked. Here, we use the notation μ_min to refer to this lower bound. Table 6.3 summarizes an implementation of the VSLMS algorithm.

Table 6.3 Summary of an implementation of the variable step-size LMS algorithm

Input: Tap-weight vector, w(n),
    input vector, x(n),
    gradient terms g_0(n - 1), g_1(n - 1), ..., g_{N-1}(n - 1),
    step-size parameters μ_0(n - 1), μ_1(n - 1), ..., μ_{N-1}(n - 1),
    and desired output, d(n).
Output: Filter output, y(n),
    tap-weight vector update, w(n + 1),
    gradient terms g_0(n), g_1(n), ..., g_{N-1}(n),
    and updated step-size parameters μ_0(n), μ_1(n), ..., μ_{N-1}(n).

1. Filtering:
    y(n) = w^T(n)x(n)
2. Error estimation:
    e(n) = d(n) - y(n)
3. Tap weights and step-size parameters adaptation:
    for i = 0, 1, ..., N - 1
        g_i(n) = e(n)x(n - i)
        μ_i(n) = μ_i(n - 1) + ρ sign[g_i(n)] sign[g_i(n - 1)]
        if μ_i(n) > μ_max, μ_i(n) = μ_max
        if μ_i(n) < μ_min, μ_i(n) = μ_min
        w_i(n + 1) = w_i(n) + 2μ_i(n)g_i(n)
    end

1. See Chapter 14 for a formal derivation of (6.124).



6.8 LMS Algorithm for Complex-Valued Signals

In applications such as data transmission with quadrature amplitude modulation (QAM) signalling, and beamforming with baseband processing of signals, the underlying data signals and filter coefficients are complex-valued. To modify the LMS recursion for such applications, we use the definition of the gradient of real-valued functions of complex-valued variables given in Section 3.5. We consider an adaptive filter with a complex-valued tap-input vector x(n), a tap-weight vector w(n) = [w_0(n) w_1(n) ... w_{N-1}(n)]^T, output y(n) = w^H(n)x(n), and desired output d(n).

The LMS algorithm in this case works on the basis of the update equation

w(n + 1) = w(n) - μ ∇ᶜ_w |e(n)|²,  (6.125)

where ∇ᶜ_w denotes the complex gradient operator with respect to the variable vector w. This is defined as

∇ᶜ_w = [∇ᶜ_0  ∇ᶜ_1  ...  ∇ᶜ_{N-1}]^T,  (6.126)

where ∇ᶜ, as was defined in Section 3.5, is the complex gradient with respect to the complex variable w. We recall that

∇ᶜ = ∂/∂w_R + j ∂/∂w_I,  (6.127)


where w_R and w_I are the real and imaginary parts of w, respectively, and j = √-1. We also note that in (6.126) the elements of the gradient vector ∇ᶜ_w are complex gradients with respect to the elements of w, and these elements are the conjugates of the actual tap weights, i.e., w_0*, w_1*, ..., w_{N-1}*. Furthermore, we note that a direct substitution in (6.127) gives

∇ᶜ w = 0  and  ∇ᶜ w* = 2.  (6.128)

Replacing |e(n)|² by e(n)e*(n), using (6.128), and following a derivation similar to the one that led to equation (3.63), we obtain

∇ᶜ_i |e(n)|² = -2e*(n)x(n - i),  for i = 0, 1, ..., N - 1,  (6.129)

where the asterisk denotes complex conjugation. Substituting (6.129) and the definition (6.126) in (6.125), we obtain

w(n + 1) = w(n) + 2μe*(n)x(n).  (6.130)

This is the desired LMS recursion for the case where the underlying processes are complex-valued. Table 6.4 summarizes an implementation of the LMS algorithm for complex-valued signals.

Table 6.4 Summary of the complex LMS algorithm

Input: Tap-weight vector, w(n),
    input vector, x(n),
    and desired output, d(n).
Output: Filter output, y(n),
    tap-weight vector update, w(n + 1).

1. Filtering:
    y(n) = w^H(n)x(n)
2. Error estimation:
    e(n) = d(n) - y(n)
3. Tap-weight vector adaptation:
    w(n + 1) = w(n) + 2μe*(n)x(n)
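One iteration of Table 6.4 may be coded as in the following minimal NumPy sketch; the function name is an illustrative choice, and np.vdot conjugates its first argument, so it computes w^H(n)x(n) directly.

import numpy as np

def complex_lms_iteration(w, x, d, mu):
    # Filtering: y(n) = w^H(n) x(n)
    y = np.vdot(w, x)
    # Error estimation: e(n) = d(n) - y(n)
    e = d - y
    # Tap-weight vector adaptation (6.130): w(n+1) = w(n) + 2 mu e*(n) x(n)
    w = w + 2.0 * mu * np.conj(e) * x
    return w, y, e

For real-valued signals e*(n) = e(n), and the recursion reduces to the conventional LMS update.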

The convergence properties of the LMS algorithm for complex-valued signals are very similar to those for real-valued signals. These properties are summarized below for reference:

• The time-constant equation (6.33) is also applicable to adaptive filters with complex-valued signals.

• The misadjustment equation (6.60) has to be modified slightly. This modification is the result of the fact that for complex-valued jointly Gaussian random variables the equality (6.46) has to be replaced by

(6.131)

Taking note of this and following a derivation similar to that in Section 6.3, we obtain²

(6.132)

• When the step-size parameter, μ, is small, so that μλ_i ≪ 1, for i = 0, 1, ..., N - 1, (6.132) reduces to (6.63). Thus, the approximation (6.63) is also applicable to the case where the underlying signals are complex-valued.

2. A detailed derivation of this result can be found in Haykin (1991).
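Since the complex-valued misadjustment parallels the real-valued result, an empirical check is straightforward: run the complex LMS recursion on a known system and compare the steady-state excess MSE with the minimum MSE. The sketch below is such an experiment under assumed settings (the filter length, noise variance and step size are illustrative, not from the text); it reuses complex_lms_iteration from the sketch above.

import numpy as np

rng = np.random.default_rng(0)
N, mu, n_iter = 8, 0.005, 20000
w_o = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # unknown system
w = np.zeros(N, dtype=complex)
x_line = (rng.standard_normal(n_iter + N)
          + 1j * rng.standard_normal(n_iter + N)) / np.sqrt(2)
e2 = np.empty(n_iter)
for n in range(n_iter):
    x = x_line[n:n + N][::-1]          # tap-input vector [x(n), ..., x(n-N+1)]
    noise = 0.1 * (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    d = np.vdot(w_o, x) + noise        # desired output of the unknown system
    w, y, e = complex_lms_iteration(w, x, d, mu)
    e2[n] = abs(e) ** 2
xi_min = 0.1 ** 2                      # minimum MSE = additive noise variance
xi_ss = e2[n_iter // 2:].mean()        # steady-state MSE estimate
print("measured misadjustment:", (xi_ss - xi_min) / xi_min)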
