by
Mohammad Ariful Haque
DOCTOR OF PHILOSOPHY
2009
CANDIDATE'S DECLARATION
It is hereby declared that this dissertation or any part of it has not been submitted
elsewhere for the award of any degree or diploma.
Board of Examiners

1. Prof. Md. Kamrul Hasan
   Department of Electrical & Electronic Engineering
   BUET, Dhaka 1000
   Chairman (Supervisor)

2. Prof. M. Rezwan Khan
   The Vice Chancellor
   United International University, Dhaka 1209
   Member

3.
   Member

4.
   Member

5.
   Member

6. Prof. Satya Prasad Majumder
   Head of the Department
   Department of Electrical & Electronic Engineering
   BUET, Dhaka 1000
   Member (Ex-officio)

7. Prof. Keikichi Hirose
   Department of Information and Communication Engineering
   The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
   Member (External)
Dedication
To the people who are working for real change for humanity.
Contents

Acknowledgements  xviii
Abstract  xix

1 Introduction
1.1 Motivation
1.2
1.2.1 Background noise
1.2.2 Reverberation
1.3
1.4 Dereverberation problem
1.5 Research Overview
1.6 Dissertation Outline  11

2 Literature Review  13
2.1 Non-channel-information-based Techniques  14
2.1.1 Beamforming  14
2.1.2 LP residual processing  15
2.1.3 Spectral enhancement  17
2.1.4 LIME  18
2.1.5 HERB  20
2.2 Channel-information-based Techniques  20
2.2.1 Direct equalization  21
2.2.2  22
2.3 Conclusion  25

3 Multichannel LMS Algorithm for Blind Channel Identification: Robustness Issue  26
3.1  26
3.1.1 Problem formulation  26
3.1.2 Identifiability condition  27
3.2  28
3.2.1  29
3.2.2  30
3.2.3  32
3.3  33
3.4  36
3.5 Convergence Analysis  37
3.6  42
3.7 Conclusion  46

4  47
4.1  48
4.2  49
4.3  55
4.3.1  55
4.3.2  58
4.4 Simulation Results  60
4.4.1  60
4.4.2  62
4.4.3  64
4.5 Conclusion  67

5  68
5.1  68
5.1.1  69
5.1.2  72
5.1.3  75
5.1.4  77
5.1.5 Simulation results  79
5.1.6  82
5.2  82
5.2.1 Simulation results  86
5.2.2  89
5.2.3  91
5.2.4  92
5.3 Conclusion  93

6  95
6.1  95
6.1.1 Delay-and-sum beamforming  96
6.1.2 Channel shortening  97
6.1.3
6.1.4
6.2
6.2.2
6.2.3
6.2.4
6.2.5
6.2.6
6.3 Conclusion  129

7  130
7.1 Conclusions  130
7.2  134
136
List of Tables

5.1  72
5.2  75
5.3  87
6.1
6.2 Results of SNR, DRR and PESQ improvement with and without delay-and-sum beamformer  104
6.3
6.4
6.5
6.6
6.7
6.8
6.9
6.10 Quality of the dereverberated speech in terms of WSS for the proposed and other state-of-the-art techniques  122
6.11 Quality of the dereverberated speech in terms of PESQ for the proposed and other state-of-the-art techniques  123
6.12 Quality of the dereverberated speech for the real acoustic channels in terms of LLR  125
6.13 Quality of the dereverberated speech for the real acoustic channels in terms of segSNR  125
6.14 Quality of the dereverberated speech for the real acoustic channels in terms of WSS  126
6.15 Quality of the dereverberated speech for the real acoustic channels in terms of PESQ  126
List of Figures

1.1
1.2
1.3
1.4
1.5
1.6
1.7  10
2.1  14
2.2  17
2.3 LIME structure.  19
2.4  23
3.1  27
3.2  41
3.3  43
3.4  44
3.5  45
4.1  54
4.2  58
4.3 Comparison of the computational complexities of the proposed VSS-MCFLMS and NMCFLMS algorithms  60
4.4  61
4.5  62
4.6  62
4.7  63
4.8  64
4.9  65
4.10 NPM of the VSS-MCFLMS and NMCFLMS algorithms for SIMO FIR system at SNR = 25 dB  65
5.1  66
5.2
5.3
5.4  80
5.5  81
5.6  90
5.7  91
5.8  92
5.9 (a) True acoustic channel obtained from the MARDY. (b) Estimated channel using the Spectrally Constrained algorithm.  92
6.1  96
6.2
6.3
6.4
6.5 Comparison of mean time per iteration for the proposed and infinity-norm algorithms.  103
6.6
6.7 Block diagram of the signal path and noise path for the kth channel.  108
6.8
6.10 Spectrogram of the (a) clean speech (b) noisy reverberated speech at 30 dB SNR (c) denoised speech (d) dereverberated using the proposed method.  120
6.11 Quality of the dereverberated speech at different block-lengths of the proposed zero forcing equalization.  120
6.12 Impulse responses of real reverberant acoustic channels. The length of each impulse response is L = 4400.  124
6.13 Convergence profile of the robust NMCFLMS algorithm for time-varying channels.  129
Glossary

AIR  Acoustic Impulse Response
Avg-Seg-SNR  Average Segmental Signal-to-Noise Ratio
AR  Auto Regressive
BCI  Blind Channel Identification
CR  Cross-Relation
DRR  Direct to Reverberation Ratio
FIR  Finite Impulse Response
GSC  Generalized Sidelobe Canceller
HERB  Harmonicity-based Dereverberation
ISS
LIME
LLR  Log-Likelihood Ratio
LMS  Least-Mean-Square
LP  Linear Prediction
LPC  Linear Prediction Coefficients
LS  Least-Squares
MARDY  Multichannel Acoustic Reverberation Database at York
MCLMS  Multichannel Least-Mean-Square
MCFLMS  Multichannel Frequency-domain Least-Mean-Square
MINT  Multiple-input/output Inverse Theorem
MIMO  Multiple-Input Multiple-Output
MLP
MMSE  Minimum Mean-Square Error
MRE  Mutually Referenced Equalizers
NMCFLMS  Normalized Multichannel Frequency-domain Least-Mean-Square
NPM  Normalized Projection Misalignment
PESQ  Perceptual Evaluation of Speech Quality
RMCFLMS
RNMCFLMS  Robust Normalized Multichannel Frequency-domain Least-Mean-Square
RT  Reverberation Time
SIMO  Single-Input Multiple-Output
SNR  Signal-to-Noise Ratio
SOS  Second-Order Statistics
VSS  Variable Step-Size
ZFE  Zero-Forcing Equalizer
Acknowledgments
All praises and thanks belong to Allah, the creator and manager of this world, for
His innumerable favors and bounties on me. It is He who has granted me physical and
mental capabilities to complete the dissertation, without anything or anyone compelling
Him to do so.
I cannot find appropriate words to convey my appreciation and gratitude to Prof.
Md. Kamrul Hasan for allowing me to work with him. It was his idea to work with
dereverberation, an interesting and challenging problem. His guidance and amazing
insight helped me to overcome many obstacles throughout the period of my research.
I am grateful for his patience, support and encouragement. His care was extended not
only to academic affairs, but also during some tumultuous period of my life. For this,
I am ever grateful to him.
I would like to thank the esteemed committee members for being willing to take on the extra
burden of reading and discussing the dissertation with me. This gave me a deeper
insight into the research topic. I am personally grateful to Prof. Aminul Hoque, Head
of the department of EEE, for his encouragement to complete the work. I also express
my gratitude for the members of committee for advanced studies and research (CASR)
for approving the financial grant requested for the research work.
I also feel lucky for the excellent research environment we have in the DSP lab.
I got acquainted with many signal processing topics ranging from speech to image
and biomedical signal processing from the discussions and presentations of the fellow
students. I am particularly thankful to Toufiqul Islam for the interactions I had with
him on channel shortening techniques.
Finally, this dissertation is as much my family's effort as it is mine. In fact, it is a
result of the motivation and support provided by my parents, brother, sister, and my
wife.
Abstract
Reverberation is one of the primary factors that degrade the quality of speech
signals when recorded by a distant microphone in order to facilitate hands-free
communication. Undoing the effect of reverberation is still a challenging problem,
especially when additive noise and time-varying acoustic channels are considered. In
this dissertation, several multimicrophone dereverberation techniques are developed
that can dereverberate the recorded speech as well as improve the signal-to-noise
ratio (SNR) considering a practical acoustic environment. The methods are based
on the adaptive estimation of the long acoustic impulse responses (AIRs) using the
multichannel LMS (MCLMS) algorithm. Although the MCLMS algorithm is attractive
for its simplicity and computational efficiency, it suffers from a slow convergence rate,
step-size ambiguity, and, last but not least, a lack of robustness in the presence
of noise. A variable-step-size frequency-domain MCLMS algorithm is proposed that
can ensure stability and optimal convergence speed in both noise-free and noisy
conditions. To improve the noise-robustness of the class of MCLMS algorithms, two
novel solutions, namely, the excitation-driven MCLMS and spectrally constrained MCLMS
algorithms, are proposed that can successfully estimate the long AIRs with reasonable
accuracy.
Based on adaptive estimation of the AIRs, two different dereverberation techniques
are proposed. In the first approach, dereverberation is achieved by suppressing the
late reverberation using channel shortening technique and the SNR is improved by
delay-and-sum beamforming. The proposed shortening algorithm is also optimized
so that it makes a trade-off between shortening performance and spectral distortion
in the dereverberated speech. In the second approach, the power of the speech
components in the received microphone signals is first enhanced by an eigenfilter,
and then a block-adaptive zero-forcing equalizer is employed to eliminate the channel
distortion introduced by the AIRs and eigenfilter. The eigenfilter is efficiently estimated
avoiding the tedious Cholesky factorization and it also resists spectral nulls so that
noise amplification is mitigated at the output of the zero-forcing equalizer. Extensive
experiments are conducted, using both simulated and real reverberant acoustic
channels, which demonstrate that the proposed methods can offer better speech quality
and SNR improvement as compared to the state-of-the-art dereverberation techniques.
Chapter 1
Introduction
1.1 Motivation
Hands-free operation is one of the key aspects of next-generation speech communication.
The main user benefit of hands-free operation is the ability to walk freely without
wearing a headset or holding a microphone.
[Figure: a hands-free recording scenario, with a speaker, a fan, and a microphone.]
1.2
Figure 1.2: Acoustic wave propagation from the speaker to the microphone.
1.2.1 Background noise
Background noise is the most common unwanted signal in the recorded sound, generally
arising from fans, traffic, audio equipment, or other speakers present in the room. One
of the widely used properties of a noise signal is the assumption that it is stationary.
This means that its statistical properties, such as mean and variance, do not
change with time. A good statistical model of stationary noise in the time domain
is a zero-mean Gaussian process. Under this model, known as Gaussian noise,
every sample of the noise signal is a random value with a Gaussian
probability density function given by

p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right)   (1.1)

where \sigma^2 is the variance of the noise signal. Engine noise and air-conditioner or fan
noise are good examples of stationary noise. There are non-stationary noises too, such
as the sound of doors opening and closing, passing cars, etc. In this work, only
stationary Gaussian noise is considered.
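As a small illustration of this noise model, the following pure-Python sketch (the sample size and variance are arbitrary choices) draws zero-mean Gaussian samples per eq. (1.1) and checks that the mean and variance are essentially the same in two disjoint segments, which is what stationarity implies:

```python
import random
import statistics

def gaussian_noise(n, sigma, seed=0):
    """Sample n values of zero-mean stationary Gaussian noise, per eq. (1.1)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

noise = gaussian_noise(100_000, sigma=0.5)

# Stationarity check: mean and variance barely change between segments.
first, second = noise[:50_000], noise[50_000:]
print(round(statistics.mean(first), 2), round(statistics.mean(second), 2))
print(round(statistics.pvariance(first), 2), round(statistics.pvariance(second), 2))
```

Both segments show a mean near 0 and a variance near sigma squared (0.25 here), consistent with the stationary model assumed throughout this work.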
1.2.2 Reverberation
[Figure: a typical acoustic impulse response, showing the direct path, the early reflections, and the late reflections (amplitude versus time, 0-300 ms).]
1.3
In fact, speech and audio appear more pleasant to the listener when some reverberation
is present [1]. However, in highly reverberant environments the intelligibility of the
speech signal drops considerably [2]. The effects of reverberation on speech are clearly
visible in the spectrogram as well as in the time waveform.
Fig. 1.4 shows the
spectrogram and waveform of a clean speech signal. It is observed from the spectrogram
that the clean speech has vivid harmonic structure whereas the time-waveform shows
that phonemes are well separated in time. The spectrogram and waveform of the
reverberated signal are shown in Fig. 1.5. The distortion of the speech signal that is
caused by reverberation is clearly visible. The frequency spectrum has been smeared
which is observed from the spectrogram, and the phonemes are overlapped, which
is visible from both the spectrogram and the time waveform. Due to this overlapping,
the empty spaces between words and syllables are now filled by reverberation. These
distortions cause an audible difference between the clean and the reverberant speech,
and degrade the perceptual quality and intelligibility of the speech.
The distortions resulting from the reverberation seem to go unnoticed by the normal
1.4 Dereverberation problem
1.5 Research Overview
[Figure: hands-free dereverberation scenario, with a speaker, a fan, and a microphone array feeding the dereverberation system.]

[Figure: block diagram of the multimicrophone system. The source signal s(n) passes through the channels H_1(z), ..., H_M(z); additive noise v_1(n), ..., v_M(n) corrupts the outputs y_1(n), ..., y_M(n) to give the microphone signals x_1(n), ..., x_M(n), which feed the signal enhancing block and the dereverberation block, driven by a robust channel estimate, to produce the enhanced signal z(n) and the dereverberated speech.]
1.6 Dissertation Outline
minimizes the misalignment of the estimated channel vector with the true one in each
iteration. It has been demonstrated that the proposed VSS guarantees the stability
of the MCLMS algorithm. Performance comparison of the proposed algorithm with
a state-of-the-art normalized MCFLMS (NMCFLMS) algorithm shows that the VSS-MCFLMS algorithm is more noise-robust without sacrificing the speed of convergence
and computational efficiency.
None of the time- and frequency-domain MCLMS algorithms are sufficiently robust
for estimating the AIRs in the noisy condition. Two novel solutions are presented in
Chapter 5 that improve the noise-robustness of the class of MCLMS algorithms. The
first one is termed the excitation-driven MCLMS algorithm and the second one is called
the spectrally-constrained MCLMS algorithm. It is demonstrated that the latter algorithm
can successfully estimate the AIRs in a time-varying acoustic environment with noise.
Chapter 6 addresses the problem of speech dereverberation utilizing the estimates
of AIRs. Two dereverberation techniques are presented. The first approach focuses
on the suppression of late reflections, based on the fact that dereverberation does not
need complete equalization of the acoustic channel; therefore, a shortened channel,
which requires less computation with acceptable performance, can serve the purpose.
The proposed shortening algorithm is optimized so that it makes a trade-off between
shortening performance and spectral distortion in the dereverberated speech. In the
second approach, both early and late reverberations are eliminated using a zero forcing
equalizer and the SNR is improved using an eigenfilter. The eigenfilter is efficiently
computed avoiding the tedious Cholesky decomposition, solely from the estimates of
AIRs. The design of the eigenfilter also resists spectral nulls in the equivalent channel
so that noise amplification is significantly mitigated at the output of the equalization
process. Then the block-adaptive zero-forcing equalizer eliminates the channel distortion
introduced by the AIRs and the eigenfilter. The proposed technique is found effective for
dereverberation in situations where the speaker moves from his/her position, causing
frequent changes in the AIR. Extensive simulation results are presented using both
simulated and real reverberant channels that demonstrate the superior performance of
the proposed methods as compared to the state-of-the-art dereverberation techniques.
The conclusion, Chapter 7, summarizes the main contributions of this work and
gives directions for further research.
Chapter 2
Literature Review
The effect of reverberation on speech perception was first reported in a patent by Ryall
in 1938 [7]. Since then, so many researchers have contributed to the dereverberation
problem that it has become a complicated task, if not an impossible one, to categorize all these
methods. The classification may be based on different criteria such as the number of
required microphones (single-channel or multi-channel), the need for channel estimation
(channel-estimation based or non-channel-estimation based), or the way the technique
affects the reverberation (those affecting entire reverberation or those affecting late
reverberation). An extensive survey of the dereverberation techniques is presented
in [8] by grouping them in two major classes based on whether the AIRs need to be
estimated or not. The first category that does not require channel estimation is termed
as reverberation suppression technique. The second category that requires channel
knowledge is termed as reverberation cancelation technique. Although the survey seems
to be comprehensive, the classification does not fit the nomenclature: not all the
techniques that are based on channel estimation are capable of canceling reverberation;
rather, they can only reduce the reverberation effect. For this reason, we categorize the
dereverberation techniques as channel-information-based and non-channel-information-based techniques, regardless of whether they cancel or suppress the reverberation.
[Figure 2.1: beamformer structure, with source signal s(n), adaptive filter w(n), and error signal e(n).]
2.1 Non-channel-information-based Techniques
2.1.1 Beamforming
adaptive filter w(n) will try to match the interference in the adaptive branch as closely
as possible to the interference in the nonadaptive branch. The optimal filter w(n) is
found by minimizing the energy of the error signal

(2.1)

Thus, with the GSC-based
2.1.2 LP residual processing
The speech signal is often modeled as the output of a time-varying all-pole filter excited
by a random noise for unvoiced speech and quasi-periodic pulses for voiced speech. The
all-pole filter coefficients can be estimated through Linear Prediction (LP) analysis of
the recorded speech and are commonly called Linear Prediction Coefficients (LPC).
The excitation sequence, or LP residual, can be obtained by inverse filtering of the
speech signal. The motivation for the LP residual based dereverberation techniques
is the observation that in reverberant environments, the LP residual of voiced speech
segments contains the original impulses followed by several other peaks due to multipath reflections. Furthermore, an important assumption is made that the LPCs are
unaffected by reverberation. Consequently, dereverberation is achieved by attenuating
the peaks in the excitation sequence due to multi-path reflections, and synthesizing
the enhanced speech waveform using the modified LP residual and the time-varying
all-pole filter with coefficients calculated from the reverberant speech.
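The LP analysis-synthesis loop described above can be sketched in a few lines of Python. This is a minimal illustration on a synthetic AR(2) signal, not the implementation used in the cited works; the model order and the Levinson-Durbin routine are standard textbook choices:

```python
import random

def lpc(x, order):
    """Levinson-Durbin recursion: LP coefficients a (a[0] = 1) from autocorrelation."""
    N = len(x)
    r = [sum(x[i] * x[i + k] for i in range(N - k)) for k in range(order + 1)]
    a, err = [1.0], r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err
        a = [1.0] + [a[j] + k * a[m - j] for j in range(1, m)] + [k]
        err *= 1.0 - k * k
    return a

def residual(x, a):
    """LP residual by inverse filtering: e(n) = sum_j a[j] x(n - j)."""
    p = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(p + 1) if n - j >= 0)
            for n in range(len(x))]

def synthesize(e, a):
    """Excite the all-pole filter 1/A(z) with e(n) to rebuild the waveform."""
    p = len(a) - 1
    x = []
    for n in range(len(e)):
        x.append(e[n] - sum(a[j] * x[n - j] for j in range(1, p + 1) if n - j >= 0))
    return x

# Synthetic AR(2) "speech-like" signal driven by white noise.
rng = random.Random(1)
x = [0.0, 0.0]
for _ in range(2000):
    x.append(0.6 * x[-1] - 0.2 * x[-2] + rng.gauss(0, 1))
x = x[2:]

a = lpc(x, order=2)
e = residual(x, a)        # excitation sequence (LP residual)
x_hat = synthesize(e, a)  # analysis-synthesis round trip
print(max(abs(u - v) for u, v in zip(x, x_hat)))
```

The round trip reproduces the signal to floating-point precision. On real reverberant speech, the residual e(n) would additionally contain the multi-path peaks that the techniques above attenuate before resynthesis.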
Yegnanarayana and Murthy proposed a single-microphone technique to
dereverberate speech [11], [12], and provided a comprehensive study on the effects of
reverberation on the LP residual. The technique is based on analysis of short (2 ms)
segments of data to enhance the regions in the speech signal having low Signal to
Reverberation Ratio (SRR) components. The short segment analysis shows that SRR is
different in different segments of speech. The processing technique involves identifying
and manipulating the LP residual in three different regions of the speech signal,
namely, high SRR region, low SRR region and only reverberation component region. A
weighting function is derived to modify the LP residual. The weighted residual samples
are used to excite the time-varying LP all-pole filter to obtain perceptually enhanced
speech.
Experiments performed by Gillespie [13] showed that the kurtosis of the LP residual
is a reasonable measure of reverberation.
[Figure 2.2: block diagram of spectral-subtraction-based dereverberation. The reverberated signal y(k) is transformed by the STFT to Y(w, m); a suppression gain G(w, m), computed from the late-reverberant energy estimate P(w, m), is applied; and the inverse STFT of Z(w, m) yields the dereverberated signal z(k).]
2.1.3 Spectral enhancement
The spectral enhancement techniques achieve dereverberation by modifying the short-time spectrum of the received microphone signal. The block diagram of the spectral
subtraction based dereverberation technique is illustrated in Fig. 2.2. In this technique,
an estimate of the late reverberant energy, P(w, m), is obtained directly from the
received reverberant signal, and the suppression gain is computed as

G(w, m) = \frac{|Y(w, m)|^2 - P(w, m)}{|Y(w, m)|^2}.   (2.2)

If G(w, m) ≤ 0, then it is replaced by a small positive number. Now the STFT of the
dereverberated speech is obtained as

Z(w, m) = G(w, m) Y(w, m).   (2.3)

The dereverberated signal, z(k), is reconstructed from the estimated STFT, Z(w, m),
through the inverse STFT.
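The gain-and-resynthesis step can be sketched as follows, assuming the common power-subtraction rule for the gain in (2.2) before applying (2.3); the per-bin spectra and late-reverberant estimates below are hypothetical toy values standing in for real STFT frames:

```python
def suppression_gain(y_power, late_power, floor=1e-3):
    """Spectral-subtraction style gain G = (|Y|^2 - P) / |Y|^2, floored so that
    non-positive gains are replaced by a small positive number."""
    g = (y_power - late_power) / y_power if y_power > 0 else floor
    return max(g, floor)

# One STFT frame: hypothetical bins Y(w, m) and late-reverberant energies P(w, m).
Y = [4 + 0j, 2 + 2j, 0.5 + 0j, 1 - 1j]
P = [1.0, 6.0, 0.1, 0.5]

# Z(w, m) = G(w, m) Y(w, m), per eq. (2.3).
Z = [suppression_gain(abs(y) ** 2, p) * y for y, p in zip(Y, P)]
print([round(abs(z), 3) for z in Z])  # → [3.75, 0.707, 0.3, 1.061]
```

Note how the second bin, where the late-reverberant estimate dominates, is strongly attenuated while the others are only mildly scaled; an inverse STFT of the gained frames would then yield the time-domain output.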
Lebart et al. proposed a single-microphone spectral enhancement technique for
speech dereverberation [17] which only requires an estimate of the reverberation time
to calculate the reverberation energy. Lebart et al. assumed that the reverberation time
was frequency independent, and implicitly assumed that the energy related to the direct
sound could be ignored. Wu et al. proposed a two-stage approach for multi-microphone
dereverberation [18]. In the first stage, the LP residual enhancement technique proposed
by Gillespie [13] was used to enhance the Direct to Reverberation Ratio (DRR). In the
second stage, spectral subtraction was used to reduce late reverberation. They used
a heuristic function to estimate the late reverberant energy, thereby assuming that
the first stage was able to reduce a significant amount of reverberation. In [19] a
single-microphone solution was proposed by the same authors using a similar two-stage
approach.
2.1.4 LIME
1. The matrix Q is calculated as Q = (E{u(n−1)u^T(n−1)})^+ E{u(n−1)u^T(n)},
   where u(n) is the multimicrophone received-signal vector at the nth instant and
   A^+ denotes the Moore-Penrose generalized inverse of a matrix A.
[Figure 2.3: LIME structure. The received signals u_1(n), ..., u_M(n) pass through prediction filters w_1(z), ..., w_M(z) to form the prediction error e(n), which is filtered by 1/a(z), the inverse of the estimated AR polynomial, to produce the dereverberated estimate ŝ(n).]
2.1.5 HERB
In the design of a
dereverberation filter, HERB explicitly uses the fact that the source signal has a
harmonic structure. The HERB dereverberation filter, W (k), is calculated as follows:
W(k) = A( X̂(l, k) / X(l, k) )   (2.4)

where X(l, k) and X̂(l, k) are the discrete STFTs of an observed reverberant signal and
of the output of an adaptive harmonic filter at time frame l and frequency bin k,
respectively. Here A(·) is a function that calculates the weighted average of X̂(l, k)/X(l, k) for each
k over different time frames. The adaptive harmonic filter is a time-varying filter
that extracts frequency components whose frequencies correspond to multiples of the
fundamental frequency of a short speech segment. The filter, W (k), has been proven
to approximate the inverse filter of the acoustic transfer function between a speaker
and a microphone. In [26] Kinoshita et al. evaluated the effect on speech intelligibility,
and the potential to use HERB as a preprocessing algorithm for ASR. In both cases
HERB seems to be able to decrease the Word Error Rate (WER) of the ASR system.
The main disadvantage is that they required more than 5000 reverberant words, i.e.,
more than 60 minutes of speech data, to acquire the dereverberation filter under the
assumption that the system is time-invariant.
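The averaging in (2.4) can be illustrated with a toy sketch. The |X|^2 weighting used for A(·) here is an assumption (one common weighting choice, not necessarily the one used in the cited work), and the spectra are hypothetical:

```python
def herb_filter(X, X_hat):
    """HERB-style dereverberation filter, in the spirit of eq. (2.4): for each
    frequency bin k, a weighted average of X_hat(l, k) / X(l, k) over time
    frames l, weighted here by |X(l, k)|^2 (an assumed choice for A)."""
    n_bins = len(X[0])
    W = []
    for k in range(n_bins):
        num = sum((abs(X[l][k]) ** 2) * (X_hat[l][k] / X[l][k])
                  for l in range(len(X)))
        den = sum(abs(X[l][k]) ** 2 for l in range(len(X)))
        W.append(num / den)
    return W

# Toy check: if the harmonic-filter output is a fixed per-bin scaling of the
# observation, the weighted average recovers exactly that scaling.
X = [[1 + 1j, 2 + 0j], [0 + 2j, 1 - 1j]]   # frames l = 0, 1; bins k = 0, 1
scale = [0.5 + 0j, 2 + 0j]
X_hat = [[scale[k] * X[l][k] for k in range(2)] for l in range(2)]
print(herb_filter(X, X_hat))  # → [(0.5+0j), (2+0j)]
```

In a real system the ratios vary from frame to frame, and the averaging is what makes W(k) approximate the inverse of the acoustic transfer function, as stated above.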
2.2
Channel-information-based Techniques
The inversion/equalization of the AIRs can fall under one of two broad categories.
2.2.1
Direct equalization
The direct equalization methods bypass the channel identification stage and attempt to
find the inverse of the AIRs directly from the microphone signals. Bakir et al. proposed
a multichannel direct equalization technique for speech dereverberation based on the
Mutually Referenced Equalizers (MRE) [27]. In MRE, second-order statistics of each
channel's output are used to find a set of equalizers, one for each possible delay. Letting
v_0, v_1, v_2, v_3, and v_4 represent the multichannel equalizers with 0, 1, 2, 3, and 4 samples
of delay, respectively, we can write the MRE cost function as

J = \sum_{i=0}^{3} E\{ (v_i^T x_n - v_{i+1}^T x_{n+1})^2 \}   (2.5)

where x_n is the multichannel received signal vector at the nth sample and x_{n+1} represents
the one-sample-advanced received signal vector. The MRE equalizers can be obtained by
minimizing this cost function using the LMS or RLS adaptive algorithms. Regarding
the application of MRE equalizers to speech dereverberation, two relevant points arise.
First, the number of equalizers needs to be increased to make the method robust to noise in real
environments, which increases the computation time enormously. Second, the algorithm
is sensitive to channel order mismatch and is thus impractical for real situations.
A correlation-based multichannel direct inverse filtering technique was proposed by
Furuya et al. in [28]. Here, the inverse filters were estimated using the correlation matrix
between the multichannel received signals, assuming that the source signal is stationary and
statistically white. Since the whiteness assumption does not hold for speech input, the
received signal needs to be pre-whitened before calculating the coefficients of the
inverse filter. The whitening filter is usually estimated by long-term averaging of the
reverberant speech spectrum with a short-time span. The filter thus obtained only
corresponds to the magnitude spectrum of the AR system transfer function; hence
the pre-whitening becomes erroneous and causes improper inversion of
the AIRs. Moreover, the inverse filter can only reduce the energy in the early reflections,
and significant energy remains in the late reflections; therefore, post-processing is
necessary to reduce the late reverberation.
2.2.2
The methods in this category are the most common and the most studied. BCI has
long been a topic of interest in communication theory. The reason is that in wireless
environments, the cost of channel training and the rapid changes in the channel make
classical channel identification too costly and limited in value. The first generation
of BCI methods relied on higher-order statistics (HOS) to estimate the channel.
However, these methods suffer from one or more problems of slow convergence, high
computational complexity, or local minima. The first breakthrough for equalization
methods came when Tong et al. published results on the feasibility of BCI based solely
on the second-order statistics (SOS) of the output (receiver) signal [29]. The authors
showed that with a single-input multiple-output (SIMO) FIR channel model, and
assuming the input s(n) is independent and identically distributed (i.i.d.), it is
possible to identify all the FIR channels in the SIMO system. A SIMO system is easily
obtained from a SISO system by either temporal or spatial oversampling. Since
the report of Tong et al., several closed-form batch solutions for BCI have been
proposed and reviewed in [30], [31], [32].
Gannot and Moonen [33] use subspace methods for dereverberation both in the
fullband and in the subband domains. Huang and Benesty proposed the cross-relation
between the different microphone signals as an error function for adaptive filters and
used it to derive multichannel LMS and Newton adaptive filters both in the time
domain [34] and in the frequency domain [35]. The main shortcoming of the MCLMS
algorithm, however, is related to the selection of an appropriate step-size, which greatly
influences the speed, final misalignment, and stability of the algorithm. The step-size
depends on the power of the microphone signals, and hence the optimum value varies
with the acoustic environment. This dependency was relaxed by proposing
the normalized multichannel frequency-domain LMS (NMCFLMS) algorithm, in which
[Figure 2.4: NPM (dB) versus number of iterations for the NMCFLMS algorithm at SNR = 25, 30, and 40 dB.]
Several approaches other than direct inversion have been studied. Least-squares
(LS) inverse filters can be designed for speech dereverberation by minimizing the error
function ‖h(n) ∗ g(n) − δ(n − k)‖² [37], which can also be applied in an adaptive
framework [38]. Here, h(n) and g(n) represent the AIR and the equalizer impulse response,
framework [38]. Here, h(n) and g(n) represent the AIR and equalizer impulse response,
respectively. Although LS technique is more noise robust than direct inversion, the
advantage is obtained at the expense of computational complexity as the minimum of
the error function is to be searched for a wide range of delays. Homomorphic inverse
filtering has been investigated [37],[39], where the impulse response is decomposed into a
minimum phase component and an all-pass component. Consequently, magnitude and
phase are equalized separately, where an exact inverse can be found for the magnitude,
while the phase can be equalized, e.g., using matched filtering [39]. It is important to
note that magnitude compensation alone results in audible distortions in the processed
speech signal [36],[39].
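The LS design just described can be sketched by forming the convolution matrix of the channel and solving the normal equations for the equalizer. The channel taps, equalizer length, and target delay below are hypothetical short examples, not values from the cited works:

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for the normal equations."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ls_inverse(h, Lg, k):
    """LS equalizer g of length Lg minimizing ||h * g - delta(n - k)||^2."""
    n = len(h) + Lg - 1
    # Convolution matrix H: column j is h delayed by j samples.
    H = [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(Lg)]
         for i in range(n)]
    d = [1.0 if i == k else 0.0 for i in range(n)]
    HtH = [[sum(H[i][a] * H[i][b] for i in range(n)) for b in range(Lg)]
           for a in range(Lg)]
    Htd = [sum(H[i][a] * d[i] for i in range(n)) for a in range(Lg)]
    return solve(HtH, Htd)

h = [1.0, 0.5, 0.25]           # hypothetical short (minimum-phase) impulse response
g = ls_inverse(h, Lg=16, k=4)  # equalizer with a 4-sample target delay
eq = convolve(h, g)            # equalized channel, close to delta(n - 4)
err = sum(v * v for i, v in enumerate(eq) if i != 4) + (eq[4] - 1.0) ** 2
print(err < 1e-3)
```

For a long AIR the search over the target delay k, mentioned above as the source of the method's computational cost, would repeat this solve for many candidate delays.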
Speech dereverberation does not need complete equalization of the acoustic
channel; therefore, a shortened channel, which requires less computation with
acceptable performance, can serve the purpose. The LS minimization [40] is a very
popular technique for channel shortening; however, it suffers from severe distortion
of the equalized channel, showing nonuniform spectral attenuation.
To overcome
(2.6)
Thus, exact inverse filtering can be performed. However, it has been observed that
the MINT method has limited value for practical dereverberation problems. Even if
the channel estimate contains moderate estimation errors, equalization using MINT
inversion introduces significant spectral distortion.
2.3 Conclusion
All the existing dereverberation techniques can be divided into two broad classes based
on whether they equalize the AIR or not. The fundamental limitation of the approaches
that do not equalize the AIR is that they cannot eradicate the cause of reverberation
and hence always give suboptimal performance. Therefore, the better approach from
a theoretical point of view would be to equalize the AIRs that caused the reverberation
using a proper inverse filtering technique. The AIRs can be equalized using an inverse
filter directly obtained from the received microphone signals; however, such methods
are very sensitive to additive noise.
Speech dereverberation can be perfectly done through blind identification folowed
by equalization of the AIRs using MINT method. But the MINT method requires
that the AIRs are to be exactly known in advance, which is a very difficult task in a
practical acoustic environment. The single-channel inversion of the AIRs are not as
much sensitive as the MINT method, however, narrowband noise amplification occurs
due to the presence of spectral nulls in the AIRs. In the subsequent chapters, we
present robust multichannel blind adaptive algorithms that can estimate the AIRs
with reasonable accuracy in noisy conditions. Then, utilizing these adaptive
estimates, we consider the equalization of the AIRs while mitigating the noise
amplification problem and preserving the quality of the speech signal.
Chapter 3
Multichannel LMS Algorithm for
Blind Channel Identification:
Robustness Issue
Generally, a blind identification technique aims to retrieve the unknown parameters of a
channel from the received signal only. At first glance, the problem may seem impossible
to solve. How is it possible to distinguish the signal from the channel when neither
is known? The beauty of blind channel identification rests on the exploitation of
structures of the channel and properties of the input to separate the input from the
channel.
3.1
3.1.1
Consider a speech signal recorded inside an echoic room using a linear array of
microphones. The block diagram of the speech acquisition system is shown in Fig. 3.1.
The received signals at the microphones can be modeled as convolutional mixtures of
the speech signal and the impulse responses of the acoustic paths between the source and
the microphones:

y_i(n) = h_i(n) * s(n)    (3.1)

x_i(n) = y_i(n) + v_i(n),    i = 1, 2, ..., M    (3.2)

[Figure 3.1: Block diagram of the speech acquisition system: the source s(n) drives the
acoustic channels H_1(z), H_2(z), ..., H_M(z); each channel output y_i(n) is corrupted by
additive noise v_i(n) to produce the microphone signal x_i(n).]
where M is the number of microphones, and s(n), y_i(n), x_i(n), v_i(n) and h_i(n) denote,
respectively, the clean speech, the reverberant speech, the reverberant speech corrupted
by background noise, the observation noise, and the impulse response from the source to the
ith microphone. Using vector notation, (3.1) can be written as

y_i(n) = h_i^T s(n)    (3.3)

where h_i = [h_{i,0} h_{i,1} ⋯ h_{i,L−1}]^T denotes the length-L impulse response vector of
the ith channel and s(n) = [s(n) s(n−1) ⋯ s(n−L+1)]^T.
A BCI algorithm estimates h_i, i = 1, 2, ..., M, solely from the observations x_i(n),
n = 0, 1, ..., N−1, where N denotes the data length.
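As an illustration, the convolutive SIMO model in (3.1)-(3.3) can be simulated numerically. The following sketch is not part of the dissertation: it uses white Gaussian noise in place of speech, randomly drawn impulse responses, and arbitrary values of M, L, N and the SNR.

```python
import numpy as np

rng = np.random.default_rng(0)

M, L, N = 3, 8, 1000              # microphones, channel length, data length (arbitrary)
s = rng.standard_normal(N)         # source signal (white noise in place of speech)
h = rng.standard_normal((M, L))    # impulse responses h_i(n)

# y_i(n) = h_i(n) * s(n): reverberant (noise-free) channel outputs, Eq. (3.1)
y = np.array([np.convolve(s, h[i])[:N] for i in range(M)])

# x_i(n) = y_i(n) + v_i(n): observations corrupted by additive noise, Eq. (3.2)
snr_db = 20.0
noise_power = y.var(axis=1, keepdims=True) / 10 ** (snr_db / 10)
v = np.sqrt(noise_power) * rng.standard_normal((M, N))
x = y + v

# vector form, Eq. (3.3): y_i(n) = h_i^T s(n) with s(n) = [s(n), ..., s(n-L+1)]^T
n = 500
s_vec = s[n::-1][:L]
assert np.allclose(h[0] @ s_vec, y[0, n])
```

The final check confirms that the sample-wise convolution agrees with the vector form in (3.3).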
3.1.2
Identifiability condition
It follows from (3.3) that, in the absence of noise, a given output sequence y_m(n) can at best
determine a unique input s(n) and a unique channel impulse response h_m(n) up to an unknown
scalar. Given this constraint, the identifiability conditions can be listed as follows [31]:
1. The channel transfer functions do not contain any common zeros.
2. The autocorrelation matrix of the source signal is of full rank.
3. N ≥ 3L + 1.
The identifiability conditions shown above essentially ensure the following intuitive
requirements:
1. The channels cannot be identical; they must be sufficiently different from each other.
2. The input must be a sufficiently complex sequence; it cannot be constant or a single sinusoid.
3. There must be enough output samples available. A set of output data
cannot provide sufficient information about a system having a larger set of
unknown parameters.
3.2
The notion of blind channel identification has been known since the early 1980s. Research
interest in BCI grew during the 1990s, when Tong et al. [29]
explored the possibility of BCI using second-order statistics (SOS). As the SOS contain
sufficient information for blind identification, many other approaches for BCI have been
developed, such as the least-squares approach [30], the maximum-likelihood method [31],
and the subspace method [33].
Although these methods can estimate the channel impulse response, they are generally
computationally intensive and difficult to implement in the adaptive mode. Among the
various techniques proposed so far, the adaptive multichannel least-mean-square (MCLMS)
algorithm [34] outperforms
the aforementioned ones. The beauty of LMS lies in its low computational complexity
and efficiency in real-time applications. In recent years, adaptive filtering in the
frequency domain has attracted a great deal of research interest with a view to reducing
the computational complexities of the convolution and correlation operations needed in
the time-domain algorithm. The frequency-domain implementation of the multichannel
Newton (MCN) algorithm known as the normalized multichannel frequency-domain
LMS (NMCFLMS) has been proposed as an efficient and effective method for BCI
[35]. The main shortcoming of the frequency-domain MCLMS algorithm, however,
is related to the selection of an appropriate step-size, which greatly influences the speed,
final misalignment and stability of the algorithm. Although the step-size ambiguity
is resolved to some extent using the normalizing factor of the NMCFLMS algorithm,
it cannot ensure optimal convergence speed. Moreover, the algorithm diverges from
the desired solution even in a moderate SNR environment [46]. In the subsequent
chapters, a class of step-size-optimized robust MCLMS algorithms will be developed for
blind estimation of the AIRs.
3.2.1
First, we briefly describe the basic MCLMS algorithm proposed in [34]. The method is
based on the cross-relation (CR) between the received signals and different channels
in the noise-free case. The CR is as follows:
yi (n) hj = yj (n) hi .
(3.4)
(3.5)
30
(3.6)
where, eii = 0. The LMS-type adaptive algorithms estimate h by minimizing the cost
function in (3.6). The update equation of the MCLMS algorithm is given by
ĥ(n+1) = ĥ(n) − μ ∇J(n)    (3.7)

where μ is the step-size and the gradient vector is

∇J(n) = [ (∂J(n)/∂ĥ_1)^T ⋯ (∂J(n)/∂ĥ_k)^T ⋯ (∂J(n)/∂ĥ_M)^T ]^T.

The kth partial gradient follows from (3.6) as

∂J(n)/∂ĥ_k = ∂[Σ_{i=1}^{M−1} Σ_{j=i+1}^{M} e_ij²(n)] / ∂ĥ_k
= Σ_{i=1}^{k−1} 2 e_ik(n) x_i(n) − Σ_{j=k+1}^{M} 2 e_kj(n) x_j(n)
= 2 Σ_{i=1}^{M} e_ik(n) x_i(n)

where the last step follows from the facts that e_jk = −e_kj and e_kk = 0. We may express
this equation concisely in matrix form as

∂J(n)/∂ĥ_k = 2 X(n) e_k(n)    (3.8)
where X(n) = [x_1(n) x_2(n) ⋯ x_M(n)] and e_k(n) = [e_1k(n) e_2k(n) ⋯ e_Mk(n)]^T. It is
to be mentioned here that the channel estimate is always normalized after each update
in order to avoid a trivial estimate with all zero elements [34].
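A compact sketch of the MCLMS iteration, assuming the error and gradient definitions in (3.5)-(3.8); the channels, input signal and step-size below are arbitrary illustrative choices, not the dissertation's experimental setup:

```python
import numpy as np

def mclms_step(x_vecs, h_hat, mu):
    """One time-domain MCLMS update (Eqs. 3.5-3.8).
    x_vecs: (M, L) regressors [x_i(n), ..., x_i(n-L+1)];
    h_hat:  (M, L) current channel estimates."""
    A = x_vecs @ h_hat.T
    e = A - A.T                      # e[i, j] = x_i^T h_j - x_j^T h_i, e_kk = 0
    grad = 2.0 * e.T @ x_vecs        # row k: dJ/dh_k = 2 * sum_i e_ik * x_i
    h_new = h_hat - mu * grad
    return h_new / np.linalg.norm(h_new), e  # normalize to avoid the zero solution

rng = np.random.default_rng(2)
M, L, N = 3, 5, 2000
s = rng.standard_normal(N)
h = rng.standard_normal((M, L))
x = np.array([np.convolve(s, h[i])[:N] for i in range(M)])  # noise-free case

# at the true channel (up to scale) every cross-relation error is zero
x_vecs = np.array([x[i, 100::-1][:L] for i in range(M)])
_, e_true = mclms_step(x_vecs, h / np.linalg.norm(h), mu=0.0)
assert np.allclose(e_true, 0.0)

# run the adaptation from a random start
h_hat = rng.standard_normal((M, L))
h_hat /= np.linalg.norm(h_hat)
for n in range(L - 1, N):
    x_vecs = np.array([x[i, n::-1][:L] for i in range(M)])
    h_hat, _ = mclms_step(x_vecs, h_hat, mu=1e-4)
cos = abs(h.ravel() @ h_hat.ravel()) / np.linalg.norm(h)
print("alignment with true channel:", round(float(cos), 3))
```

The first assertion verifies the key property of the cost (3.6): it vanishes exactly at the true channels in the noise-free case.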
3.2.2
Expressing the gradient in terms of the data correlation matrix R̃(n), the update
equation can be written as

ĥ(n+1) = ĥ(n) − 2μ R̃(n) ĥ(n)    (3.9)

where

R̃(n) = [ Σ_{i≠1} R_ii(n)    −R_21(n)          ⋯   −R_M1(n)
          −R_12(n)           Σ_{i≠2} R_ii(n)   ⋯   −R_M2(n)
          ⋮                  ⋮                 ⋱   ⋮
          −R_1M(n)           −R_2M(n)          ⋯   Σ_{i≠M} R_ii(n) ].

Taking the mathematical expectation of (3.9), we obtain

h̄(n+1) = h̄(n) − 2μ R h̄(n)    (3.10)

where h̄(n) = E{ĥ(n)} and R = E{R̃(n)}, and we assume statistical independence
between R̃(n) and ĥ(n).
The autocorrelation matrix R can be diagonalized as

R = U Λ U^T    (3.11)

where U is the unitary matrix whose columns are the eigenvectors of R and Λ is a
diagonal matrix with diagonal elements λ_k, 1 ≤ k ≤ ML, equal to the eigenvalues of
R. Substituting (3.11) into (3.10) and premultiplying by U^T, we obtain
h̄°(n+1) = (I − 2μΛ) h̄°(n)    (3.12)

where I denotes the identity matrix and h̄°(n) = U^T h̄(n) represents an orthogonal
mapping of h̄(n) in the transformed domain. The set of ML first-order difference
equations is now decoupled. Therefore, the solution of the kth equation can be
obtained as [49]

h̄°_k(n) = C_k (1 − 2μλ_k)^n u(n),    k = 1, 2, ..., ML    (3.13)

where h̄°_k(n), k = 1, 2, ..., ML, are the components of h̄°(n), C_k is an arbitrary constant
that depends on the initial value of h̄(n), and u(n) is the unit step function. Now the
channel estimate h̄(n) can be expressed as
h̄(n) = U h̄°(n) = [u_1 ⋯ u_k ⋯ u_ML] [h̄°_1(n) ⋯ h̄°_k(n) ⋯ h̄°_ML(n)]^T.    (3.14)

For a convergent adaptation, |1 − 2μλ_k| < 1 for every k, so each component h̄°_k(n)
decays exponentially; the component associated with the minimum eigenvalue λ_min decays
slowest. After a sufficiently large number of iterations N, therefore,

h̄°_k(N) ≈ 0 for all λ_k ≫ λ_min.    (3.15)

Substituting (3.15) into (3.14), the final estimate of the channel can be approximated as

h̄(N)|_{N very large} ≈ C_min u_min

where u_min is the eigenvector corresponding to the minimum eigenvalue λ_min. Therefore,
we find that the MCLMS algorithm converges to the eigenvector that corresponds to
the minimum eigenvalue of the data correlation matrix.
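The convergence result above can be illustrated numerically: in the noise-free case the stacked true channel vector lies in the null space of the block correlation matrix in (3.9), so the eigenvector of the minimum eigenvalue recovers the channels up to a scale. A sketch with arbitrary random channels (not the dissertation's data):

```python
import numpy as np

rng = np.random.default_rng(3)
M, L, N = 3, 5, 2000
s = rng.standard_normal(N)
h = rng.standard_normal((M, L))
x = np.array([np.convolve(s, h[i])[:N] for i in range(M)])

# stack the regressors [x_i(n), ..., x_i(n-L+1)] for all valid n
Xs = [np.array([x[i, n::-1][:L] for n in range(L - 1, N)]) for i in range(M)]
Rblk = [[Xs[i].T @ Xs[j] / (N - L + 1) for j in range(M)] for i in range(M)]

# assemble the block matrix of (3.9): sum_{i != k} R_ii on the diagonal,
# -R_{lk} in off-diagonal block (k, l)
R = np.zeros((M * L, M * L))
for k in range(M):
    for l in range(M):
        if k == l:
            R[k*L:(k+1)*L, l*L:(l+1)*L] = sum(Rblk[i][i] for i in range(M) if i != k)
        else:
            R[k*L:(k+1)*L, l*L:(l+1)*L] = -Rblk[l][k]

hv = h.ravel()
assert np.allclose(R @ hv, 0.0)        # the true channel lies in the null space of R
w, V = np.linalg.eigh(R)
u_min = V[:, np.argmin(w)]             # eigenvector of the minimum eigenvalue
cos = abs(u_min @ hv) / np.linalg.norm(hv)
assert cos > 0.9                       # ... and it recovers the channels up to a scale
```

Because the per-sample cross-relation errors are exactly zero at the true channels, R hv = 0 holds to machine precision even with finite data.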
3.2.3
The time-domain MCLMS algorithm exploits the channel diversity and minimizes
a cross-relation error criterion between the different microphone signals to obtain
the desired channel estimate. The main advantage of the MCLMS algorithm is the
algebraic simplicity, which makes it a potential choice in many applications requiring
BCI. However, the time-domain MCLMS algorithm has the following limitations,
which make it unsuitable for practical applications.
3.3
In the frequency-domain implementation, the signals are processed in blocks of L
samples. The output of the jth channel estimate driven by the ith microphone signal
over the mth block can be written as

ỹ_ij(m) = C_xi(m) ĥ^10_j(m)    (3.16)

where m is the block time index, C_xi(m) is a 2L × 2L circulant matrix,

ỹ_ij(m) = [y_ij(mL−L) y_ij(mL−L+1) ⋯ y_ij(mL) ⋯ y_ij(mL+L−1)]^T,

C_xi(m) = [ x_i(mL−L)      x_i(mL+L−1)   ⋯   x_i(mL−L+1)
            x_i(mL−L+1)    x_i(mL−L)     ⋯   x_i(mL−L+2)
            ⋮              ⋮             ⋱   ⋮
            x_i(mL)        x_i(mL−1)     ⋯   x_i(mL+1)
            ⋮              ⋮             ⋱   ⋮
            x_i(mL+L−1)    x_i(mL+L−2)   ⋯   x_i(mL−L) ]

and ĥ^10_j(m) = [ĥ_j^T(m) 0^T_{L×1}]^T is the zero-padded channel estimate.
The linear-convolution part of the block output is obtained by windowing:

y_ij(m) = W^01_{L×2L} C_xi(m) ĥ^10_j(m)    (3.17)

where

y_ij(m) = [y_ij(mL) y_ij(mL+1) ⋯ y_ij(mL+L−1)]^T
W^01_{L×2L} = [0_{L×L} I_{L×L}]
W^10_{2L×L} = [I_{L×L} 0_{L×L}]^T
ĥ_j(m) = [ĥ_{j,0}(m) ĥ_{j,1}(m) ⋯ ĥ_{j,L−1}(m)]^T
where I denotes an identity matrix and 0 is a matrix of zeros. A block error signal
based on the cross-relation between the ith and jth channels is determined as
e_ij(m) = y_ij(m) − y_ji(m)
= W^01_{L×2L} [C_xi(m) W^10_{2L×L} ĥ_j(m) − C_xj(m) W^10_{2L×L} ĥ_i(m)].    (3.18)
Let F_{L×L} be the discrete Fourier transform (DFT) matrix of size L × L. Then the
block error sequence in the frequency domain can be expressed as

ē_ij(m) = F_{L×L} e_ij(m)
= W̃^01_{L×2L} [D_xi(m) W̃^10_{2L×L} h̃_j(m) − D_xj(m) W̃^10_{2L×L} h̃_i(m)]    (3.19)
where we have used the fact that a circulant matrix is diagonalized by the DFT matrix,
so that C_xi(m) can be decomposed as

C_xi(m) = F^{−1}_{2L×2L} D_xi(m) F_{2L×2L}    (3.20)

where D_xi(m) is a diagonal matrix whose elements are obtained from the DFT
coefficients of the first column of C_xi(m), and
W̃^01_{L×2L} = F_{L×L} W^01_{L×2L} F^{−1}_{2L×2L}
W̃^10_{2L×L} = F_{2L×2L} W^10_{2L×L} F^{−1}_{L×L}
h̃_j(m) = F_{L×L} ĥ_j(m).
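The decomposition (3.20) is the standard diagonalization of a circulant matrix by the DFT; it can be verified numerically as follows (the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
L = 8
c = rng.standard_normal(2 * L)             # first column of the 2L x 2L circulant

# build the circulant matrix C whose (p, q) entry is c[(p - q) mod 2L]
C = np.array([[c[(p - q) % (2 * L)] for q in range(2 * L)] for p in range(2 * L)])

# Eq. (3.20): C = F^{-1} D F, with D diagonal holding the DFT of the first column
F = np.fft.fft(np.eye(2 * L), axis=0)      # DFT matrix
D = np.diag(np.fft.fft(c))
C_rebuilt = np.linalg.inv(F) @ D @ F
assert np.allclose(C, C_rebuilt)

# consequence: a circulant-matrix product is just an FFT-domain multiplication
v = rng.standard_normal(2 * L)
assert np.allclose(C @ v, np.fft.ifft(np.fft.fft(c) * np.fft.fft(v)).real)
```

This is precisely why the frequency-domain algorithm replaces the convolution and correlation operations by element-wise multiplications of DFT coefficients.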
The frequency-domain cost function J_f(m) using the frequency-domain block error
signal ē_ij(m) is defined as

J_f(m) = Σ_{i=1}^{M−1} Σ_{j=i+1}^{M} ē^H_ij(m) ē_ij(m)    (3.21)
where the superscript H denotes the Hermitian transpose. The MCFLMS algorithm approaches the
desired solution by moving along the opposite direction of the gradient at each iteration.
The update equation of the MCFLMS algorithm is given by

h̃_k(m+1) = h̃_k(m) − μ_f ∇J_k(m),    k = 1, 2, ..., M    (3.22)
where μ_f is the step-size in the frequency domain and the gradient vector ∇J_k(m) can
be obtained as

∇J_k(m) = ∂J_f(m) / ∂h̃*_k(m)
= W̃^10_{L×2L} Σ_{i=1}^{M} D^*_xi(m) W̃^01_{2L×L} ē_ik(m)    (3.23)

which, in the zero-padded 2L-point representation used for the update, is commonly
approximated as

∇J^10_k(m) ≈ Σ_{i=1}^{M} D^*_xi(m) W̃^01_{2L×L} ē_ik(m),    k = 1, 2, ..., M.    (3.24)
Concatenating the M impulse response vectors into a longer one, we can write the
update equation for the MCFLMS algorithm as
h̃(m+1) = h̃(m) − μ_f ∇J_f(m)    (3.25)

where

h̃(m) = [h̃_1^T(m) h̃_2^T(m) ⋯ h̃_M^T(m)]^T
∇J_f(m) = [∇J_1^T(m) ∇J_2^T(m) ⋯ ∇J_M^T(m)]^T.
3.4
The MCFLMS algorithm converges to the desired solution with a fast convergence
speed; however, its performance depends critically on the choice of a proper
step-size, which influences the speed of convergence as well as the final misalignment
error. The selection of the step-size depends on the power of the microphone signals,
and hence the algorithm requires re-tuning whenever the acoustic environment changes.
Consequently, the normalized MCFLMS (NMCFLMS) algorithm was proposed that
relaxes the dependency of step-size parameter on the signal power. The algorithm also
reduces the eigenvalue spread of the autocorrelation matrix of the input signal and
thus accelerates the convergence speed.
The update equation of the NMCFLMS algorithm is expressed as
ĥ^10_k(m+1) = ĥ^10_k(m) − ρ P^−1_k(m) Σ_{i=1}^{M} D^*_xi(m) ē^01_ik(m),    k = 1, 2, ..., M    (3.26)
where

P_k(m) = Σ_{i=1,i≠k}^{M} D^*_xi(m) D_xi(m)

ĥ^10_k(m) = W̃^10_{2L×L} h̃_k(m)

ē^01_ik(m) = W̃^01_{2L×L} ē_ik(m).

Here 0 < ρ < 2 is the step-size parameter of the NMCFLMS algorithm. The power
spectrum P_k(m) of the channel outputs is usually computed using a recursive scheme:

P_k(m) = λ P_k(m−1) + (1 − λ) Σ_{i=1,i≠k}^{M} D^*_xi(m) D_xi(m),    k = 1, 2, ..., M    (3.27)

where λ is a smoothing parameter.
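The recursion (3.27) amounts to exponential smoothing of instantaneous power spectra across blocks. A small sketch with white-noise blocks and arbitrary parameters (M, L, λ), not taken from the dissertation's experiments:

```python
import numpy as np

rng = np.random.default_rng(5)
M, L = 3, 16
lam = 0.8                                   # smoothing parameter (lambda)
P = [np.zeros(2 * L) for _ in range(M)]     # P_k(m): one 2L-point spectrum per channel

for m in range(50):                         # stream of signal blocks
    X = np.fft.fft(rng.standard_normal((M, 2 * L)), axis=1)
    for k in range(M):
        # instantaneous power, summed over all channels except k
        inst = sum(np.abs(X[i]) ** 2 for i in range(M) if i != k)
        P[k] = lam * P[k] + (1 - lam) * inst   # Eq. (3.27)

# for unit-variance white noise, E|X_i(f)|^2 = 2L, so the smoothed estimate
# settles near (M - 1) * 2L for this setup
print(float(np.mean(P[0])))
```

The smoothing trades tracking speed against variance of the power estimate, exactly the role λ plays in the NMCFLMS normalization.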
3.5 Convergence Analysis
Using the relation W̃^10_{L×2L} = F_{L×L} W^10_{L×2L} F^{−1}_{2L×2L}, we get

h̃_k(m+1) = h̃_k(m) − ρ W̃^10_{L×2L} P^−1_k(m) Σ_{i=1}^{M} D^*_xi(m) W̃^01_{2L×L} ē_ik(m),    k = 1, 2, ..., M.    (3.28)
For the ease of convergence analysis with noise, the update equation of the NMCFLMS
algorithm can be rewritten as

h̃_k(m+1) = h̃_k(m) − 2ρ W̃^10_{L×2L} P^−1_k(m) W̃^10_{2L×L} W̃^10_{L×2L} Σ_{i=1}^{M} D^*_xi(m) W̃^01_{2L×L} ē_ik(m)
= h̃_k(m) − 2ρ P̃_k(m) Σ_{i=1}^{M} D^*_xi(m) W̃^01_{2L×L} ē_ik(m)    (3.29)

where W̃^10_{2L×L} W̃^10_{L×2L} ≈ (1/2) I_{2L×2L} has been used and

P̃_k(m) = W̃^10_{L×2L} P^−1_k(m) W̃^10_{2L×L}.
Now, using the observation data correlation matrix, (3.29) can be expressed as [35]

h̃_k(m+1) = h̃_k(m) − 2ρ P̃_k(m) [−R̃_1k(m) −R̃_2k(m) ⋯ Σ_{i≠k} R̃_ii(m) ⋯ −R̃_Mk(m)] h̃(m),
k = 1, 2, ..., M    (3.30)

where

h̃(m) = [h̃_1^T(m) h̃_2^T(m) ⋯ h̃_M^T(m)]^T

and the entries R̃_ij(m) are given by

R̃_ij(m) = S^H_xi(m) S_xj(m)

with

S^H_xi(m) = W̃^10_{L×2L} D^*_xi(m) W̃^01_{2L×L}
S_xi(m) = W̃^01_{L×2L} D_xi(m) W̃^10_{2L×L}.
Concatenating the M equations in (3.30), we obtain

h̃(m+1) = h̃(m) − 2ρ P̃(m) R̃_x(m) h̃(m)    (3.31)

where

P̃(m) = [ P̃_1(m)   0        ⋯   0
          0        P̃_2(m)   ⋯   0
          ⋮        ⋮        ⋱   ⋮
          0        0        ⋯   P̃_M(m) ]

and R̃_x(m) is defined as

R̃_x(m) = [ Σ_{i≠1} R̃_ii(m)   −R̃_21(m)          ⋯   −R̃_M1(m)
            −R̃_12(m)          Σ_{i≠2} R̃_ii(m)   ⋯   −R̃_M2(m)
            ⋮                  ⋮                 ⋱   ⋮
            −R̃_1M(m)          −R̃_2M(m)          ⋯   Σ_{i≠M} R̃_ii(m) ].
Taking the mathematical expectation of (3.31), we obtain

h̄(m+1) = h̄(m) − 2ρ P R_x h̄(m)    (3.32)

where h̄(m) = E{h̃(m)}, P = E{P̃(m)} and R_x = E{R̃_x(m)}. To relate the channel
estimate with the eigenvectors of the clean data correlation matrix, we expand the
noisy data autocorrelation matrix R_x as

R_x = R_y + R_v + R_yv + R_vy = R_y + R_n    (3.33)

where R_n = R_v + R_yv + R_vy. Here R_y and R_v denote the clean
data and noise autocorrelation matrices, respectively, and R_yv and R_vy denote the
crosscorrelation matrices between them. What follows is the eigen-analysis of the
NMCFLMS algorithm with noise.
Since the autocorrelation matrix Ry is Hermitian, it can be represented as
R_y = U_y Λ_y U^H_y    (3.34)
Substituting (3.33) and (3.34) into (3.32), we get

h̄(m+1) = h̄(m) − 2ρ P U_y Λ_y U^H_y h̄(m) − 2ρ P R_n h̄(m).    (3.35)
Premultiplying (3.35) by U^H_y, it can be represented as

h̄°(m+1) = h̄°(m) − 2ρ (Λ'_y + U^H_y P R_n U_y) h̄°(m)
= h̄°(m) − 2ρ T h̄°(m)    (3.36)
where

h̄°(m) = U^H_y h̄(m)    (3.37)
Λ'_y = P' Λ_y    (3.38)
P' = Diag[U^H_y P U_y]    (3.39)
T = Λ'_y + U^H_y P R_n U_y.    (3.40)
Therefore, assuming T has a full set of eigenvectors, it can be
diagonalized as T = V D V^{−1}, where V and D are the matrices whose columns and
diagonal elements are, respectively, the eigenvectors and eigenvalues of T. Substituting
T in (3.36) and premultiplying by V^{−1}, we obtain
g(m+1) = (I − 2ρD) g(m)    (3.41)

where

g(m) = V^{−1} h̄°(m).    (3.42)

The set of ML first-order difference equations in (3.41) is now decoupled. The
solution of the kth equation can be obtained as

g_k(m) = C_k (1 − 2ρd_k)^m u(m),    k = 1, 2, ..., ML    (3.43)
Figure 3.2: Amplitude distribution of the transform coefficients, g_k(m), at the end of
60000 iterations.
where Ck is an arbitrary constant which depends on the initial value of g(m), and
u(m) is a unit step function. Clearly, g_k(m) converges to zero exponentially for all d_k
satisfying |1 − 2ρd_k| < 1, except for d_k = 0.
In the absence of noise, T = Λ'_y = D and V becomes an identity matrix. Therefore,
the final value of g_k(m) for the noise-free case becomes

g_k(∞) = 0 when d_k ≠ 0, and g_k(∞) = C_k when d_k = 0.    (3.44)
The profile of g_k(m) for the noise-free case after 60000 iterations is shown in Fig. 3.2,
which justifies the well-known result in (3.44). Using (3.37), (3.42) and (3.44), the
final estimate becomes

h̄(∞) = U_y V g(∞) = U_y g(∞) = C_k u_{ky}|_{λ_k = 0}    (3.45)
3.6
In the presence of noise, however, we can see from (3.40) that the diagonal matrix Λ'_y is
additionally corrupted by the noise term U^H_y P R_n U_y. The resultant matrix T in (3.40)
would be diagonal only if R_n and P were diagonal matrices with equal diagonal entries.
However, by definition, Rv is a matrix with unequal diagonal entries. Also, in practical
cases, Rv contains off-diagonal entries and Ryv , Rvy are non-zero matrices. Therefore,
R_n = R_v + R_yv + R_vy and, in turn, T would be non-diagonal matrices containing unequal
diagonal entries. As a result, in the presence of noise, none of the diagonal entries in matrix
D would be practically zero. Therefore, from (3.43), we can deduce that for a stable
system, gk (m), k = 1, 2, . . . , M L would decay exponentially to zero with iterations
unless a constraint is applied. Thus it appears that no fruitful final output can be
obtained from the NMCFLMS algorithm in the noisy case. However in this analysis,
rather than the actual values of gk (m), k = 1, 2, . . . , M L, the relative values of the
elements are important. Using (3.43), the ratio of the kth component to the i-th one
can be expressed as

g_k(m) / g_i(m) = (C_k / C_i) [(1 − 2ρd_k) / (1 − 2ρd_i)]^m    (3.46)

where i, k = 1, 2, ..., ML. Taking i as the index of the slowest-decaying mode, after a
sufficiently large number of iterations

g_k(m)|_{m very large} = β when k = i, and 0 when k ≠ i    (3.47)

where β is an arbitrary constant. Using (3.42) and (3.47), we can now conclude that
after a sufficiently large number of iterations, h̄° would be equal to a scaled version of
the ith column of V.
[Figure 3.3: (a) Profile of the eigenvalues d_i and (b) profile of the transform
coefficients g_k(m) in the noisy case.]
Then, (3.42) and (3.37) give the final estimate of h̄ in the noisy case as

h̄|_{m very large} = U_y h̄°(m) = U_y V g(m) = β U_y v_i
= β [u_1y u_2y ⋯ u_MLy] [v_1i v_2i ⋯ v_MLi]^T
= β Σ_{k=1}^{ML} v_ki u_ky.    (3.48)
The above equation reveals that, unlike the noise-free solution in (3.45), the
estimate in the noisy case is a weighted sum of the eigenvectors of R_y, where the weights
are the elements of the ith column of the matrix V, i.e., v_i. To vividly show the relation
between the true and the noisy estimate, we rewrite (3.48) as

h̄ = β_0 h + h̄_noisy    (3.49)

where β_0 h is the true channel vector (up to a scale) and h̄_noisy is the contribution
from noise.
Figure 3.4: Amplitude distribution (normalized with respect to the 1st coefficient) of
the elements of v_i (i = 1) after 15000 iterations for SNR = 15 dB. Here, in addition to
the 1st element (corresponding to the zero eigenvalue in the noise-free case), the 2nd, 3rd,
6th, 7th, 9th, 16th, 17th and 18th elements are also significant.
Here, the true impulse response vector h is a scaled version of u_1y (e.g., h = (1/c) u_1y).
Figure 3.5: (a) Magnitude spectrum of the true channel, (b) Magnitude spectrum of the
linear combination of all the eigenvectors of R_y according to the weight profile shown
in Fig. 3.4, (c) Estimated magnitude spectrum using the NMCFLMS, (d) Estimated
impulse responses (concatenated) using the two methods in the time-domain.
The magnitude spectrum of the linear combination of all the eigenvectors of R_y is very
close to that of the NMCFLMS estimate. The small difference between the two curves can
be attributed to incomplete convergence of the adaptive algorithm. It is also interesting
to observe the narrowband shape of the magnitude spectra of the estimates at 15 dB SNR
in Figs. 3.5(b) and (c). This shape, in contrast to the uniform spectrum of the true channel
shown in Fig. 3.5(a), is due to the additional eigenvectors (i.e., the 2nd, 3rd, 6th, 7th, 9th,
16th, 17th and 18th, etc.) with dominant narrowband characteristics. The relative weight
of the eigenvector of the noise-free solution is small compared to the unified strength of the
other vectors in the estimate. The presence of noise thus invokes other eigenvectors in the
solution and deemphasizes the relative effect of the desired one when the SNR is below a certain
threshold value.
3.7
Conclusion
In this chapter, we reviewed the blind channel identification technique and the
identifiability conditions of BCI. We introduced the time- and frequency-domain
multichannel LMS algorithms as effective algorithms for BCI. A detailed convergence
analysis of the NMCFLMS algorithm was also presented, which gave a generalized view of
the final solution in both the noise-free and noisy conditions. It has been shown that
the final solution of the NMCFLMS algorithm comes from the weighted combination
of all the eigenvectors of the clean data correlation matrix. The presence of noise
paves the way for the other eigenvectors to become dominant over the eigenvector
corresponding to the minimum eigenvalue. As a result, the conventional minimum
mean-square cross-relation error solution cannot ensure the desired channel estimate in
the presence of noise.
Chapter 4
Variable Step-size Multichannel
Frequency-Domain LMS for Blind
Identification of FIR Channels
The choice of step-size is a critical factor in blind identification of SIMO channels using
the MCFLMS algorithm. The proper step-size is dependent on the signal power and
it influences the speed of convergence as well as the final misalignment error. The
NMCFLMS algorithm can relax the dependency of step-size parameter on the signal
power, however, it cannot ensure the appropriate step-size for optimal convergence.
We propose a variable-step-size MCFLMS (VSS-MCFLMS) algorithm which optimizes
the performance of the algorithm in each iteration to achieve minimum misalignment
between the true and estimated channel vectors. The proposed VSS ensures the minimum
mean-squared-error solution in the mean for both noise-free and noisy conditions, and is
more noise-robust than the NMCFLMS algorithm. Using theoretical analysis
and numerical examples, it is shown that this step-size guarantees the stability of the
algorithm.
4.1
The update equation of the conventional MCFLMS algorithm can be expressed as (Eq.
3.24)
h̃_k(m+1) = h̃_k(m) − μ_f W̃^10_{L×2L} Σ_{i=1}^{M} D^*_xi(m) W̃^01_{2L×L} ē_ik(m),    k = 1, 2, ..., M.
Concatenating the M impulse response vectors into a longer one, we can write the
update equation for the MCFLMS algorithm as

h̃(m+1) = h̃(m) − μ_f ∇J_f(m)    (4.1)

where

h̃(m) = [h̃_1^T(m) h̃_2^T(m) ⋯ h̃_M^T(m)]^T
∇J_f(m) = [∇J_1^T(m) ∇J_2^T(m) ⋯ ∇J_M^T(m)]^T.

To derive the variable step-size, we consider the squared misalignment after the update,

ε(m+1) = || h̃(m+1) − (1/γ) h ||²    (4.2)

where γ is a constant used to resolve the scaling ambiguity between h and h̃(m+1).
The MCFLMS update equation for an adaptive step-size μ_f(m) can be written as

h̃(m+1) = h̃(m) − μ_f(m) ∇J_f(m).    (4.3)

Substituting (4.3) into (4.2) and expanding,

ε(m+1) = || h̃(m) − (1/γ) h ||² − 2μ_f(m) [h̃(m) − (1/γ) h]^H ∇J_f(m)    (4.4)
+ μ_f²(m) ||∇J_f(m)||².    (4.5)

Setting the derivative of ε(m+1) with respect to μ_f(m) to zero gives the minimizing
step-size

μ_f(m) = [h̃^H(m) − (1/γ) h^H] ∇J_f(m) / ||∇J_f(m)||².    (4.6)

Since the true channel vector h is unknown, the term containing h is neglected; near
the desired solution the gradient tends to be orthogonal to h, so that

(1/γ) h^H ∇J_f(m) ≈ 0.    (4.7)

The practical variable step-size is therefore

μ_f^opt(m) = h̃^H(m) ∇J_f(m) / ||∇J_f(m)||².    (4.8)
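The behaviour of the step-size (4.8) can be checked on a toy problem. The sketch below uses a random Hermitian positive semidefinite matrix as a stand-in for the data correlation matrix, so that the gradient has the form ∇J = R h̃ as in (4.9); it confirms that the step-size is real and that the update removes the component of the estimate along the gradient direction:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 32
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R = A.conj().T @ A                    # Hermitian PSD stand-in for the data matrix
h_hat = rng.standard_normal(n) + 1j * rng.standard_normal(n)

grad = R @ h_hat                      # gradient of the form R(m) h(m)

# Eq. (4.8): mu(m) = h^H grad / ||grad||^2; real because h^H R h is real
inner = h_hat.conj() @ grad
assert abs(inner.imag) < 1e-6 * abs(inner)
mu = inner.real / np.linalg.norm(grad) ** 2

h_next = h_hat - mu * grad
# the updated estimate is orthogonal to the gradient direction
assert abs(h_next.conj() @ grad) < 1e-6 * np.linalg.norm(grad) ** 2
```

The orthogonality property is exactly what makes (4.8) the misalignment-minimizing choice along the gradient direction.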
In this work, we investigate the effectiveness of the MCFLMS algorithm in (4.1) with
the fixed step-size μ_f replaced by μ_f^opt(m), in both noise-free and noisy conditions.
We also give a performance comparison of the proposed VSS-MCFLMS with the
NMCFLMS [35] in different noisy environments and show the superiority of our
method. Before doing so, the stability and convergence analysis of the proposed
VSS-MCFLMS and the algorithmic difference between the two approaches in the presence
of noise are of interest.
4.2
In this section, we give a theoretical analysis of the mean convergence of the
VSS-MCFLMS algorithm. In particular, we focus on the mechanism by which the adaptive
algorithm for BCI converges to the eigenvector corresponding to the minimum
eigenvalue in both the noise-free and noisy cases.
The gradient can be expressed as

∇J_f(m) = R̃(m) h̃(m)    (4.9)

where R̃(m) is defined as [35]

R̃(m) = [ Σ_{i≠1} R̃_ii(m)   −R̃_21(m)          ⋯   −R̃_M1(m)
          −R̃_12(m)          Σ_{i≠2} R̃_ii(m)   ⋯   −R̃_M2(m)
          ⋮                  ⋮                 ⋱   ⋮
          −R̃_1M(m)          −R̃_2M(m)          ⋯   Σ_{i≠M} R̃_ii(m) ].
Substituting (4.9) into (4.3), the update equation of the VSS-MCFLMS algorithm can
be written as

h̃(m+1) = h̃(m) − μ_f(m) R̃(m) h̃(m).    (4.10)

Taking the mathematical expectation,

h̄(m+1) = h̄(m) − μ̄_f R h̄(m)    (4.11)

where h̄(m) = E{h̃(m)}, R = E{R̃(m)}, and μ̄_f denotes the step-size in the mean.
The matrix R can be diagonalized as

R = U Λ U^H    (4.12)
where U is the unitary matrix whose columns are the eigenvectors of R and Λ is a
diagonal matrix with diagonal elements λ_k, 1 ≤ k ≤ ML, equal to the eigenvalues of
R. Substituting (4.12) into (4.11), we obtain

h̄°(m+1) = (I − μ̄_f Λ) h̄°(m)    (4.13)

where

h̄°(m) = U^H h̄(m).    (4.14)
The set of ML first-order difference equations in (4.13) is now decoupled. Therefore,
the solution of the kth equation can be obtained as

h̄°_k(m) = D_k (1 − μ̄_f λ_k)^m u(m),    k = 1, 2, ..., ML    (4.15)
where h̄°_k(m), k = 1, 2, ..., ML, are the components of h̄°(m), and D_k is an arbitrary
constant. The channel estimate can then be expanded as

h̄(m) = U h̄°(m) = [u_1 ⋯ u_k ⋯ u_ML] [h̄°_1(m) ⋯ h̄°_k(m) ⋯ h̄°_ML(m)]^T    (4.16)
= u_1 h̄°_1(m) + ⋯ + u_k h̄°_k(m) + ⋯ + u_ML h̄°_ML(m).    (4.17)
As all the components except the one corresponding to the minimum eigenvalue decay
away, the final estimate can be expressed as

h̄(∞) = U h̄°(∞) = [u_1 ⋯ u_min ⋯ u_ML] [0 ⋯ D_min ⋯ 0]^T    (4.18)
= D_min u_min.    (4.19)
In the transformed domain, the mean step-size can be expressed as

μ̄_f = h̄^H(m) R h̄(m) / [h̄^H(m) R² h̄(m)]    (4.20)
= Σ_{k=1}^{ML} |h̄°_k(m)|² λ_k / Σ_{k=1}^{ML} |h̄°_k(m)|² λ_k².    (4.21)

Now, we verify the similarity between μ_f(m) obtained in (4.8) and that of (4.21).
Substituting ∇J_f(m) = R̃(m) h̃(m) into (4.8), the step-size becomes

μ_f(m) = h̃^H(m) R̃(m) h̃(m) / [h̃^H(m) R̃²(m) h̃(m)]
= [h̃°(m)]^H Λ(m) h̃°(m) / [h̃°(m)]^H Λ²(m) h̃°(m)
= Σ_{k=1}^{ML} |h̃°_k(m)|² λ_k(m) / Σ_{k=1}^{ML} |h̃°_k(m)|² λ_k²(m).    (4.22)
This shows that although the step-size is computed from complex vectors, it is a real number.
Equations (4.21) and (4.22) are similar in form, which reveals that the minimum-norm
solution of (4.19) is identical to the LS solution of (4.2). Therefore, the variable step-size
represented by (4.8) gives the fastest convergence speed for the MCFLMS algorithm
in the noise-free case.
We now discuss the stability of the MCFLMS algorithm when the proposed variable
step-size is adopted. From (4.15) we see that the VSS-MCFLMS algorithm would become
unstable if, for any value of k, |1 − μ̄_f λ_k| exceeded 1, since h̃°_k(m) would then rise
exponentially. In that case, according to (4.22), μ_f(m) becomes (neglecting all
other components compared to the rising one)

μ_f(m) = |h̃°_k(m)|² λ_k / [|h̃°_k(m)|² λ_k²] = 1 / λ_k.    (4.23)

Therefore, we find that the proposed μ_f(m) is auto-regulated: it forces the blowing-up
component to decay as quickly as possible and thus ensures the stability of the
MCFLMS algorithm. To verify the above statement, a fixed step-size μ_f that causes
divergence was arbitrarily selected for a random multichannel system at SNR = 20 dB
and compared with the estimated variable step-size μ_f(m). The results are shown in
Fig. 4.1. The norm of h̃(m) blows up under the fixed step-size, whereas the adaptation
remains stable under the variable step-size.
[Figure 4.1: Profiles of the step-size, the norm of the channel estimate, and the NPM
versus iterations for the VSS-MCFLMS algorithm.]
In the noisy case, the ratio of the kth orthogonal component to the minimum-eigenvalue
one follows from (4.15) as

h̄°_k(m) / h̄°_min(m) = (D_k / D_min) [(1 − μ̄_f λ_k) / (1 − μ̄_f λ_min)]^m.    (4.24)

Now, for any k except the one which corresponds to the minimum eigenvalue, we can write
|1 − μ̄_f λ_k| < |1 − μ̄_f λ_min|. As a result, the expression in (4.24) diminishes exponentially
to zero as m tends to infinity. It means that after a large number of iterations all the
orthogonal components become negligible compared to the minimum-eigenvalue
component. Therefore, similar to the noise-free case, we can conclude that when
the received data are corrupted by noise, the MCFLMS algorithm converges to the
eigenvector corresponding to the minimum eigenvalue of the data correlation matrix.
4.3 VSS-MCFLMS vs. NMCFLMS: Algorithmic Difference
In this section, we highlight the difference between the proposed VSS-MCFLMS and
the NMCFLMS algorithms in terms of (i) final solution in noisy environments and (ii)
computational complexity.
4.3.1

The update equation of the NMCFLMS algorithm can be written as

ĥ^10_k(m+1) = ĥ^10_k(m) − ρ P^−1_k(m) Σ_{i=1}^{M} D^*_xi(m) W̃^01_{2L×L} ē_ik(m),    k = 1, 2, ..., M    (4.25)

where

P_k(m) = Σ_{i=1,i≠k}^{M} D^*_xi(m) D_xi(m)    (4.26)
and

P̃_k(m) = W̃^10_{L×2L} P^−1_k(m) W̃^10_{2L×L}.

In concatenated form, the NMCFLMS update can be written as

h̃(m+1) = h̃(m) − 2ρ P̃(m) R̃(m) h̃(m)    (4.27)

where

P̃(m) = [ P̃_1(m)   0        ⋯   0
          0        P̃_2(m)   ⋯   0
          ⋮        ⋮        ⋱   ⋮
          0        0        ⋯   P̃_M(m) ].    (4.28)
Taking the expectation,

h̄(m+1) = h̄(m) − 2ρ P R h̄(m)    (4.29)

where P = E{P̃(m)} and we assume statistical independence between P̃(m), R̃(m) and h̃(m).
Substituting (4.12) into (4.29) and premultiplying by U^H, we obtain

h̄°(m+1) = h̄°(m) − 2ρ U^H P U Λ h̄°(m)
≈ (I − 2ρ Λ_p Λ) h̄°(m)    (4.30)

where

Λ_p = Diag[U^H P U]    (4.31)

and Diag[·] refers to a diagonal matrix formed from the diagonal elements of U^H P U. We have
found that U^H P U is very close to a diagonal matrix. Therefore, our approximation in
(4.30) introduces insignificant error.
Observing (4.13) and (4.30), we can clearly visualize the algorithmic difference
between the proposed VSS-MCFLMS and the NMCFLMS. We find that an additional
multiplying factor Λ_p appears in the NMCFLMS algorithm, which modulates the
eigenvalues of the data correlation matrix. From (4.31) we can derive the analytic
expression of the diagonal components of Λ_p, which can be expressed in vector form as
λ_p = [ u²_11 p_1 + u²_12 p_2 + ⋯ + u²_1(ML) p_ML
  u²_21 p_1 + u²_22 p_2 + ⋯ + u²_2(ML) p_ML
  ⋮ ]

where u_kl denotes the (k, l)th element of U and p_l the lth diagonal element of P.
Figure 4.2: (a) Eigenvalue profile of the data correlation matrix. (b) Scaling factor p .
(c) Resultant eigenvalue profile of the NMCFLMS algorithm.
In the NMCFLMS case, the eigenvalue at position 31 becomes the minimum one. As
a result, the NMCFLMS algorithm will misconverge completely, which can be verified
from the adaptive solutions in the simulation section.
4.3.2
The optimal step-size can also be evaluated in the zero-padded 2L-point representation
of the algorithm,

ĥ^10(m+1) = ĥ^10(m) − μ_f ∇J^10_f(m)    (4.32)

where

ĥ^10(m) = W̃^10_{2L×L} h̃(m)    (4.33)

∇J^10_f(m) = W̃^10_{2L×L} ∇J_f(m).    (4.34)

Premultiplying (4.33) and (4.34) by W̃^10_{L×2L}, we get

h̃(m) = W̃^10_{L×2L} ĥ^10(m)    (4.36)

∇J_f(m) = W̃^10_{L×2L} ∇J^10_f(m)    (4.37)

where we have used the relation W̃^10_{L×2L} W̃^10_{2L×L} = I_{L×L}. Substituting (4.36)
and (4.37) into (4.8),

μ_f(m) = [W̃^10_{L×2L} ĥ^10(m)]^H W̃^10_{L×2L} ∇J^10_f(m) / ||W̃^10_{L×2L} ∇J^10_f(m)||²    (4.38)

and, using the approximation [W̃^10_{L×2L}]^H W̃^10_{L×2L} ≈ 0.25 I_{2L×2L},    (4.39)

μ^opt_f(m) = [ĥ^10(m)]^H ∇J^10_f(m) / ||∇J^10_f(m)||².    (4.40)

Therefore, the optimal step-size of the proposed VSS-MCFLMS can be evaluated using
the 2L representation of the algorithm given in (4.32). With this optimal step-size,
(4.32) can be written as

ĥ^10(m+1) = ĥ^10(m) − μ^opt_f(m) ∇J^10_f(m).    (4.41)
[Figure 4.3: Computational load versus channel length L for the VSS-MCFLMS (2L and
L representations) and the NMCFLMS (2L) algorithms.]
4.4 Simulation Results
The normalized projection misalignment (NPM) is used as the performance measure:

NPM(m) = 20 log10( ||ε(m)|| / ||h|| )

where ε(m) = h − [h^T ĥ(m) / (ĥ^T(m) ĥ(m))] ĥ(m) is the projection misalignment vector.
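The NPM defined above is invariant to the scaling ambiguity of BCI, since it measures only the component of h that cannot be explained by ĥ(m). A direct implementation as a sketch; the test vectors below are arbitrary:

```python
import numpy as np

def npm_db(h_true, h_est):
    """Normalized projection misalignment (NPM) in dB between the true and
    estimated concatenated channel vectors; invariant to the scale of h_est."""
    h_true, h_est = np.ravel(h_true), np.ravel(h_est)
    # projection misalignment vector: part of h_true not explained by h_est
    eps = h_true - (h_true @ h_est) / (h_est @ h_est) * h_est
    return 20.0 * np.log10(np.linalg.norm(eps) / np.linalg.norm(h_true))

h = np.array([1.0, -0.5, 0.25, 0.0])           # toy channel vector (arbitrary)
almost_scaled = 3.0 * h + np.array([0.0, 1e-9, 0.0, 0.0])
assert npm_db(h, almost_scaled) < -100          # near-perfect (scaled) estimate
assert -10 < npm_db(h, h + np.array([0.0, 0.0, 0.0, 1.0])) < 0  # poor estimate
```

A perfectly scaled estimate drives the NPM towards minus infinity, while an estimate with an extra, unmatched component yields an NPM of only a few dB below zero.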
4.4.1
First, we present blind channel identification results for a random-coefficient SIMO FIR
system. The impulse responses are generated using the randn function of MATLAB.
[Figure 4.4: Impulse responses of the random-coefficient SIMO system.]
Figure 4.5: NPM profiles of the VSS-MCFLMS and NMCFLMS algorithms for an M = 3
channel, L = 32 random-coefficient SIMO system at SNR = 20 dB.
Figure 4.6: NPM profiles of the VSS-MCFLMS and NMCFLMS algorithms for an M = 3
channel, L = 32 random-coefficient SIMO system at SNR = 40 dB.
4.4.2
A virtual acoustic room is used throughout the dissertation for evaluating the
conventional and proposed channel estimation as well as speech dereverberation
algorithms. The dimensions of the room are taken to be (5 × 4 × 3) m. The schematic
diagram of the room is shown in Fig. 4.7. A linear array consisting of M = 5
[Figure 4.7: Schematic diagram of the virtual acoustic room, showing the speaker and
the linear microphone array.]
Figure 4.8: Virtual sources in a rectangular room. The dotted line from the source
to the listener represents a reflected sound path which is equivalent to the free field
contribution from the indicated virtual source.
4.4.3
Figure 4.9: Noise robustness of the NMCFLMS and the proposed VSS-MCFLMS
algorithms for an M = 5 channel, L = 128 coefficient acoustic system at low SNR.
[Figures: NPM profiles of the VSS-MCFLMS and NMCFLMS algorithms versus iterations
for the acoustic system.]
4.5 Conclusion
In this chapter, we have proposed a variable step-size (VSS) multichannel
frequency-domain LMS (MCFLMS) algorithm. The expression of the VSS has been derived in such
a way that it minimizes the misalignment of the estimated channel vector with the
true one at each iteration. It has been demonstrated that the proposed variable
step-size guarantees the stability of the MCFLMS algorithm. The convergence analysis has
revealed that the VSS-MCFLMS algorithm is more noise-robust than the
NMCFLMS algorithm. In spite of this relative robustness, the VSS-MCFLMS algorithm
is unable to estimate the acoustic channels with speech input even at a moderate SNR of
20 dB. Therefore, we need to improve the robustness of the class of MCLMS algorithms
for speech dereverberation.
Chapter 5
Noise Robust Multichannel Time- and Frequency-Domain LMS-type
Algorithms
In this chapter, we present two novel solutions to improve the noise robustness
of multichannel LMS-type algorithms, in both time- and frequency-domain
implementations. The proposed algorithms are termed the excitation-driven MCLMS
algorithm and the spectrally constrained MCLMS algorithm. The former converges to a
steady-state multi-eigenvector solution instead of the traditional single-eigenvector
solution and thus provides improved robustness. The second approach
relies on the fact that the misconvergence characteristic is associated with nonuniform
spectral attenuation of the estimated channel coefficients. Therefore, a novel cost
function is formulated that inherently opposes such spectral attenuation and thus
contributes to ameliorating the misconvergence of the MCLMS algorithm.
5.1
Excitation-Driven MCLMS Algorithm
It is demonstrated in Chap. 3 that the final estimate of the MCLMS algorithm comes
from only one eigenvector that corresponds to the minimum eigenvalue of the data
68
69
correlation matrix. Here we show that the single eigenvector solution cannot produce
a reasonable estimate of the channel impulse response when observations are corrupted
by noise.
5.1.1
Let the noisy data correlation matrix be decomposed as

R = R_y + R_{yv} + R_{vy} + R_v = R_y + R_n,   (5.1)

so that the desired channel estimate can be defined as

h_d = R^{-1} R_n h,   (5.2)

where R_y and R_v denote the clean data correlation matrix and the noise correlation matrix, respectively, and R_{yv} and R_{vy} denote the cross-correlation matrices between them. For a SIMO system, the true channel vector lies in the null space of the clean data correlation matrix. Therefore, R_y h = 0, and using the eigendecomposition R = U \Lambda U^T, we can write (5.2) as

h_d = U \Lambda^{-1} U^T R_n h.   (5.3)
In order to achieve this solution in the final estimate of the adaptive algorithm, let us simplify R_n. Although R_n is a non-diagonal matrix, its off-diagonal components are generally much smaller than the diagonal ones. If we assume that the signal and
noise are uncorrelated, Rn reduces to Rv . Now, in a multichannel speech enhancement
system, noise may be introduced from the system level (sampling jitter, quantization
noise) as well as from the environment. The system level noise is usually uncorrelated.
70
The environmental noise, in the presence of multiple noise sources, contains both
correlated and uncorrelated noise.
Considering only the uncorrelated noise component, R_v takes the block-diagonal form

R_v = \begin{bmatrix} \sum_{i \neq 1} \sigma_{v_i}^2 I_{L \times L} & 0_{L \times L} & \cdots & 0_{L \times L} \\ 0_{L \times L} & \sum_{i \neq 2} \sigma_{v_i}^2 I_{L \times L} & \cdots & 0_{L \times L} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{L \times L} & 0_{L \times L} & \cdots & \sum_{i \neq M} \sigma_{v_i}^2 I_{L \times L} \end{bmatrix},

where \sigma_{v_i}^2 is the noise power in the ith channel. Therefore, we can write (5.3) as

h_d \approx U \Lambda^{-1} U^T R_v h.   (5.4)
Now, the diagonal components of R_v are divided into M blocks with L identical elements each. Each block corresponds to a specific channel, and its value for the jth channel can be defined as \sigma_j = \sum_{i \neq j} \sigma_{v_i}^2. Since the sensors are close to one another and \sigma_j is the sum of the noise powers on all channels except the jth one, we can approximate \sigma_1 \approx \sigma_2 \approx \cdots \approx \sigma_M = \sigma. Then R_v reduces to \sigma I, a diagonal matrix with equal components, which significantly reduces the systemic complexity and computational load for the adaptive implementation of the noise robust MCLMS algorithm.
With this approximation, the desired estimate becomes

h_d \approx \sigma U \Lambda^{-1} h^o = \sigma \begin{bmatrix} u_1 & \cdots & u_k & \cdots & u_{ML} \end{bmatrix} \begin{bmatrix} h^o_1/\lambda_1 \\ \vdots \\ h^o_k/\lambda_k \\ \vdots \\ h^o_{ML}/\lambda_{ML} \end{bmatrix}   (5.5)

= \sigma \sum_{k=1}^{ML} \frac{h^o_k}{\lambda_k} u_k,   (5.6)
where h^o = U^T h and h^o_k is the kth component of h^o. Therefore, we find that the
desired channel estimate is a weighted combination of all the eigenvectors with weight
profile inversely proportional to the eigenvalues. This relationship is a generalization
of the single-eigenvector solution applicable for the noise-free case. Since the minimum
eigenvalue is zero in the noise-free case, the corresponding eigenvector should receive
infinite weight in the linear combination of the eigenvectors. As a result, the MCLMS
algorithm, which always converges to the eigenvector corresponding to the smallest
eigenvalue, can converge to the desired solution only in the noise-free condition.
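The claim can be checked numerically. The sketch below (NumPy; the dimensions and noise powers are hypothetical toy values, not the dissertation's experiment) builds a clean correlation matrix with the true channel in its null space, adds unequal per-channel noise on the diagonal, and compares the smallest-eigenvalue eigenvector with the 1/λ-weighted combination of all eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32                                   # toy dimension (stands in for ML)
h = rng.standard_normal(n)
h /= np.linalg.norm(h)                   # true unit-norm channel vector

# Clean correlation matrix with h in its null space (R_y h = 0).
P = np.eye(n) - np.outer(h, h)
G = rng.standard_normal((n, n))
Ry = P @ (G @ G.T) @ P

# Non-uniform diagonal noise correlation (unequal noise powers).
Rv = np.diag(rng.uniform(0.5, 1.5, n))
R = Ry + Rv                              # noisy data correlation matrix

lam, U = np.linalg.eigh(R)               # eigenvalues in ascending order

h_single = U[:, 0]                         # eigenvector of smallest eigenvalue
h_weighted = U @ ((U.T @ (Rv @ h)) / lam)  # all eigenvectors, weights ~ 1/lambda

def npm_db(h_true, h_est):
    # projection misalignment in dB, blind to the scaling ambiguity
    e = h_true - (h_true @ h_est) / (h_est @ h_est) * h_est
    return 20 * np.log10(np.linalg.norm(e) / np.linalg.norm(h_true) + 1e-300)
```

Since R_y h = 0 implies R h = R_v h, the weighted combination here recovers h up to numerical precision, while the single eigenvector is rotated away from h by the unequal noise powers.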
Now, let us see the impact of the different approximations on the formulation of the desired estimate. Table 5.1 shows the NPM values for M = 5 acoustic channels of length L = 512, obtained using (5.3)-(5.5), and compares these results with the conventional single-eigenvector solution of the MCLMS algorithm. Eq. (5.3) corresponds to the ideal solution, which gives -218.6 dB NPM. Considering the signal and noise to be uncorrelated, we obtain a final estimate of -23.3 dB. Next, due to the assumption that the diagonal components of R_v are equal (made only for analytical simplification), the NPM of the desired solution increases to -18.3 dB. Although the approximation in each stage introduces some degree of error, the weighted combination of the eigenvectors in the noisy condition is a much better solution than the single-eigenvector solution corresponding to the smallest eigenvalue.
Table 5.1: NPM of the desired solution under successive approximations: M = 5, L = 512

    Solution            NPM (dB)
    Equation (5.3)      -218.6
    Equation (5.4)      -23.3
    Equation (5.5)      -18.3
    MCLMS solution      0.0

5.1.2
The proposed robust MCLMS (RMCLMS) algorithm augments the MCLMS update with an excitation term:

\hat{h}(n+1) = \hat{h}(n) - \mu \nabla J(n) + \beta \bar{h}(n),   (5.7)

where \bar{h}(n) = [\bar{h}_1^T(n) \; \bar{h}_2^T(n) \; \cdots \; \bar{h}_M^T(n)]^T acts as an excitation function for the original MCLMS algorithm, coupled through a tunable parameter \beta which is estimated as in [48]:

\beta = \frac{\bar{h}^T(n) \nabla J(n)}{||\bar{h}(n)||^2}.   (5.8)

Here, \bar{h}(n) resembles the true channel vector in its large dominant components and is determined from the adaptive estimate as

\bar{h}_{i,l}(n) = \begin{cases} \mathrm{sign}[\hat{h}_{i,l}(n)], & |\hat{h}_{i,l}(n)| > 0.75 \max_l |\hat{h}_{i,l}(n)| \\ 0, & \text{otherwise,} \end{cases}   (5.9)

where the maximum is taken over the L estimated coefficients of the ith channel. After the initial transient phase, we can approximate \bar{h}(n) as a constant vector, because the positions of the large dominant components in the channel estimate are essentially fixed within the first few iterations.
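A minimal sketch of the thresholding rule in (5.9), for a single channel (function and variable names are illustrative, not from the dissertation):

```python
import numpy as np

def excitation_vector(h_hat, thresh=0.75):
    """Keep only the sign of coefficients whose magnitude exceeds
    `thresh` times the largest magnitude of the channel estimate;
    set all other positions to zero (cf. (5.9))."""
    h_hat = np.asarray(h_hat, dtype=float)
    peak = np.abs(h_hat).max()
    return np.where(np.abs(h_hat) > thresh * peak, np.sign(h_hat), 0.0)

h_hat = np.array([0.05, 0.9, -0.1, -0.8, 0.2])   # toy channel estimate
h_bar = excitation_vector(h_hat)                 # -> [0., 1., 0., -1., 0.]
```

For a multichannel estimate, the rule is applied per channel using that channel's own maximum.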
Therefore, the following convergence analysis of the proposed RMCLMS algorithm can be carried out with \bar{h} held fixed. The update equation can be written as

\hat{h}(n+1) = \hat{h}(n) - 2\mu R \hat{h}(n) + \beta \bar{h},   (5.10)

and transforming with U^T,

\hat{h}^o(n+1) = \hat{h}^o(n) - 2\mu \Lambda \hat{h}^o(n) + \beta \bar{h}^o,   (5.11)

where \hat{h}^o(n) = U^T \hat{h}(n) and \bar{h}^o = [\bar{h}^o_1 \; \bar{h}^o_2 \; \cdots \; \bar{h}^o_{ML}]^T = U^T \bar{h}. The set of ML first-order difference equations in (5.11) is now decoupled. Therefore, the kth equation in the transformed-vector domain can be expressed as

\hat{h}^o_k(n+1) = (1 - 2\mu\lambda_k) \hat{h}^o_k(n) + \beta \bar{h}^o_k.   (5.12)
The solution of the first-order difference equation (5.12) can be written as

\hat{h}^o_k(n) = h_{kH}(n) + h_{kP},   (5.13)

where h_{kH}(n) and h_{kP} denote the homogeneous and particular solutions of \hat{h}^o_k(n), respectively. Similar to (4.15), the homogeneous solution can be written as

h_{kH}(n) = C_k (1 - 2\mu\lambda_k)^n u(n).   (5.14)

The particular solution must satisfy

2\mu\lambda_k h_{kP} = \beta \bar{h}^o_k,   (5.15)

and therefore

h_{kP} = \frac{\beta \bar{h}^o_k}{2\mu\lambda_k}.

Substituting the values of h_{kH}(n) and h_{kP} into (5.13), the solution of the kth difference equation of the robust MCLMS algorithm can be written as

\hat{h}^o_k(n) = C_k (1 - 2\mu\lambda_k)^n u(n) + \frac{\beta \bar{h}^o_k}{2\mu\lambda_k}.

In the presence of noise, none of the \lambda_k is zero, and therefore the transient term represented by (5.14) decays exponentially to zero for any k. Therefore, the final value of \hat{h}^o_k(n) can be obtained as

\hat{h}^o_k(\infty) = \frac{\beta \bar{h}^o_k}{2\mu\lambda_k}, \quad k = 1, 2, \ldots, ML.   (5.16)
Utilizing the relation \hat{h}(n) = U \hat{h}^o(n) and (5.16), the final estimate of the channel can be expressed as

\hat{h}(\infty) = U \begin{bmatrix} \beta \bar{h}^o_1 / (2\mu\lambda_1) \\ \beta \bar{h}^o_2 / (2\mu\lambda_2) \\ \vdots \\ \beta \bar{h}^o_{ML} / (2\mu\lambda_{ML}) \end{bmatrix} = \frac{\beta}{2\mu} U \begin{bmatrix} 1/\lambda_1 & 0 & \cdots & 0 \\ 0 & 1/\lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\lambda_{ML} \end{bmatrix} \bar{h}^o = \frac{\beta}{2\mu} U \Lambda^{-1} \bar{h}^o.   (5.17)
We now compare the final solution given by (5.17) resulting from the RMCLMS
algorithm with the desired solution in (5.5). Both expressions represent the final
solution as a weighted combination of all the eigenvectors with similar weight profile
which include constant terms, inverse of the eigenvalue, and a transform-domain
variable. It is known that the MCLMS-type algorithms can only estimate the channel
impulse response with a scaling factor ambiguity [34]. As a result, the difference in
Table 5.2: Comparison of the final solution using the conventional and robust MCLMS algorithms: M = 5, L = 512, SNR = 10 dB

    Algorithm           NPM (dB)
    MCLMS solution      0.0
    RMCLMS solution     -9.3
the constant terms (\sigma versus \beta/2\mu) is not significant at all. The second term, the inverse of the eigenvalue 1/\lambda_k, is common to both expressions. Finally, by definition, \bar{h}^o resembles the true transform vector h^o. Therefore, the proposed RMCLMS algorithm can reasonably approach the desired solution in terms of a weighted combination of the noisy eigenvectors. The error introduced by approximating h^o by \bar{h}^o is illustrated in Table 5.2 using a numerical example. Here, we see that the NPM value achieved by the RMCLMS solution is -9.3 dB, which is a good estimate of the channel for the given 10 dB noise level. On the contrary, the single eigenvector corresponding to the smallest eigenvalue shows almost 0 dB NPM, a figure that indicates a complete failure of the conventional MCLMS algorithm to estimate the channel in a noisy condition.
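For reference, the NPM figures quoted throughout can be computed with the standard projection-misalignment measure (a sketch; the function name is ours):

```python
import numpy as np

def npm_db(h, h_hat):
    """Normalized projection misalignment in dB: project h onto the
    estimate before differencing, so the score ignores the scaling
    ambiguity inherent in blind channel identification."""
    h = np.asarray(h, dtype=float)
    h_hat = np.asarray(h_hat, dtype=float)
    eps = h - (h @ h_hat) / (h_hat @ h_hat) * h_hat
    return 20.0 * np.log10(np.linalg.norm(eps) / np.linalg.norm(h) + 1e-300)

h = np.array([1.0, 0.5, -0.25])
assert npm_db(h, 3.0 * h) < -100     # a scaled copy is a perfect estimate
```

The projection step is what makes 0 dB (no alignment at all) and strongly negative values (near-perfect alignment) the two extremes seen in Tables 5.1 and 5.2.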
5.1.3
The step size controls the speed of convergence as well as stability of the LMS algorithm.
We first formulate a variable step size for the RMCLMS algorithm that guarantees the
stability and at the same time ensures fast decay of the transient response, giving
rapid convergence to the steady-state solution. The update equation of the RMCLMS
algorithm in the transform domain is given by (5.11). For the stability of the algorithm, we should choose a step size, \mu(n), that causes the transient terms in (5.11) to decay rapidly with iteration. To that end, \mu(n) is selected such that the squared norm of the transient part of (5.11) is minimized at each iteration. Then, a cost function J_o(n) is
defined as

J_o(n) = ||(I - 2\mu(n)\Lambda) \hat{h}^o(n)||^2 = \sum_{k=1}^{ML} |[1 - 2\mu(n)\lambda_k] \hat{h}^o_k(n)|^2.   (5.18)
Setting \partial J_o(n)/\partial \mu(n) = 0 yields

\mu(n) = \frac{\sum_{k=1}^{ML} \lambda_k |\hat{h}^o_k(n)|^2}{2 \sum_{k=1}^{ML} \lambda_k^2 |\hat{h}^o_k(n)|^2} = \frac{[U^T \hat{h}(n)]^T \Lambda [U^T \hat{h}(n)]}{2 [U^T \hat{h}(n)]^T \Lambda^2 [U^T \hat{h}(n)]}.   (5.19)

Using R = U \Lambda U^T, this can be written in terms of the original variables as

\mu(n) = \frac{\hat{h}^T(n) R \hat{h}(n)}{2 \hat{h}^T(n) R^T R \hat{h}(n)}.   (5.20)

Replacing R by its instantaneous estimate R(n) gives the practical form

\mu(n) = \frac{\hat{h}^T(n) R(n) \hat{h}(n)}{2 \hat{h}^T(n) R^T(n) R(n) \hat{h}(n)}.   (5.21)

Since \nabla J(n) = 2 R(n) \hat{h}(n), the variable step-size can be computed directly from the gradient as

\mu(n) = \frac{\hat{h}^T(n) \nabla J(n)}{||\nabla J(n)||^2 + \delta},   (5.22)

where \delta is a small regularization constant. For a single mode k, the optimum step size reduces to

\mu_k = \frac{\lambda_k |\hat{h}^o_k(n)|^2}{2 \lambda_k^2 |\hat{h}^o_k(n)|^2} = \frac{1}{2\lambda_k}.   (5.23)
The proportionate variant assigns each coefficient an individual gain:

\hat{h}_i(n+1) = \hat{h}_i(n) - \mu(n) G_i(n) \nabla J_i(n) + \beta \bar{h}_i(n),   (5.24)

where G_i(n) = \mathrm{diag}[g_{i,0}(n) \; g_{i,1}(n) \; \ldots \; g_{i,l}(n) \; \ldots \; g_{i,L-1}(n)], and g_{i,l}(n) is expressed as

g_{i,l}(n) = \frac{1-\alpha}{2L} + \frac{(1+\alpha)|\hat{h}_{i,l}(n)|}{2 \sum_{l=0}^{L-1} |\hat{h}_{i,l}(n)| + \delta}.   (5.25)
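A sketch of the gain computation in (5.25) (the symbols α and δ follow the usual proportionate-LMS convention; treat the exact names as our reconstruction):

```python
import numpy as np

def proportionate_gains(h_hat, alpha=0.5, delta=1e-8):
    """Per-tap gains: a uniform floor (1-alpha)/(2L) plus a share
    proportional to each tap's magnitude, so dominant taps adapt
    faster; the gains sum to ~1 when delta is negligible."""
    h_abs = np.abs(np.asarray(h_hat, dtype=float))
    L = h_abs.size
    return (1.0 - alpha) / (2.0 * L) + (1.0 + alpha) * h_abs / (2.0 * h_abs.sum() + delta)

g = proportionate_gains(np.array([1.0, 0.1, 0.0, -0.5]))
```

This is why the proportionate version speeds up convergence on sparse acoustic responses: near-zero taps receive only the small floor gain.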
5.1.4
The excitation-driven approach can also be implemented in the frequency domain. Writing the time-domain robust update compactly as

\hat{h}(m+1) = \hat{h}(m) - \rho_f \nabla J_f(m) + \beta_f \bar{h}(m),   (5.26)

we formulate this update equation with a length-2L vector instead of a length-L vector, which reduces the cost of computation as shown in Sec. 4.3.2. To get an equivalent length-2L vector update equation of the MCFLMS algorithm, we pre-multiply (5.26) by W^{10}_{2L \times L}, which gives

\hat{h}^{10}(m+1) = \hat{h}^{10}(m) - \rho_f \nabla J_f^{10}(m) + \beta_f \bar{h}^{10}(m),   (5.27)

where

\hat{h}^{10}(m) = W^{10}_{2L \times L} \hat{h}(m),   (5.28)

W^{10}_{2L \times L} = F_{2L \times 2L} [I_{L \times L} \; 0_{L \times L}]^T F^{-1}_{L \times L}.   (5.29)

In (5.30)-(5.33), the frequency-domain excitation vector \bar{h}^{10}(m) is defined analogously to (5.9): the excitation components take unit magnitude at the dominant positions of the current estimate and zero elsewhere, and are transformed as \bar{h}^{10}_k(m) = W^{10}_{2L \times L} \bar{h}_k(m), k = 1, 2, \ldots, M. The coupling factor is estimated as

\beta_f(m) = \frac{\rho_f [\bar{h}^{10}(m)]^H \nabla J_f^{10}(m)}{||\nabla J_f^{10}(m)||^2 + \delta}.   (5.34)

The proportionate frequency-domain variant is

\hat{h}_i(m+1) = \hat{h}_i(m) - \rho_f(m) G_i(m) \nabla J_{f,i}(m) + \beta_f(m) \bar{h}_i(m), \quad i = 1, \ldots, M,   (5.35)
where

\nabla J_{f,i}(m) = W^{10}_{L \times 2L} \nabla J_i^{10}(m), \qquad \bar{h}_i(m) = W^{10}_{L \times 2L} \bar{h}_i^{10}(m),

W^{10}_{L \times 2L} = F_{L \times L} [I_{L \times L} \; 0_{L \times L}] F^{-1}_{2L \times 2L},

and G_i(m) = \mathrm{diag}[g_{i,0}(m) \; g_{i,1}(m) \; \ldots \; g_{i,l}(m) \; \ldots \; g_{i,L-1}(m)], where

g_{k,l}(m) = \frac{1-\alpha}{2L} + \frac{(1+\alpha)|\hat{h}_{k,l}(m)|}{2 \sum_{l=0}^{L-1} |\hat{h}_{k,l}(m)| + \delta}.
The normalized robust frequency-domain (RNMCFLMS) update for each channel can then be written as

\hat{h}_k^{10}(m+1) = \hat{h}_k^{10}(m) - \rho P_k^{-1}(m) \sum_{i=1}^{M} D^*_{x_i}(m) e^{01}_{ik}(m) + \beta_f(m) \bar{h}_k^{10}(m), \quad k = 1, 2, \ldots, M,   (5.36)

where

P_k(m) = \sum_{i=1, i \neq k}^{M} D^*_{x_i}(m) D_{x_i}(m).
5.1.5
Simulation results
[Figure 5.1: NPM (dB) versus iteration n for the FSS-, VSS-, and PVSS-MCLMS algorithms and their robust (R) counterparts: panels (a) and (b).]

[Figure 5.2: NPM (dB) versus frame index m for the NMCFLMS, VSS-MCFLMS, and PVSS-MCFLMS algorithms and their robust counterparts (RNMCFLMS, VSS-RMCFLMS, PVSS-RMCFLMS): (a) random input at 10 dB SNR; (b) speech input at 15 dB SNR.]
In both cases, the conventional algorithms show good initial convergence, as revealed by the lower NPM values in the early stage of iterations. But following this apparent convergence, the NPM starts to increase until complete misconvergence. On the contrary, the proposed algorithms converge to a steady-state solution with no sacrifice in the speed of convergence. The accuracy of the final estimate is, however, dictated by the noise level of the observation data. Moreover, the proportionate version can improve the final misalignment performance while keeping the same speed of convergence.
Next, we present comparative performance results for the same acoustic systems considered in Chapter 4 using the frequency-domain MCLMS algorithm. In Fig. 5.2, we show the results of channel estimation at 10 dB and 15 dB SNR using random and speech inputs, respectively. The advantage of the frequency domain is readily understood from the number of iterations required to reach final convergence as compared to the time-domain algorithm. However, the misconvergence phenomenon is still prevalent in the frequency domain. Here we find that the proposed excitation function added to the original update equation brings noise robustness to the channel estimation for all variants of the MCFLMS algorithm. In particular, in Fig. 5.2(a) we note that the NMCFLMS algorithm, which shows the highest speed of convergence with the least robust characteristics, now outperforms all the other algorithms.
5.1.6
5.2
Spectrally Constrained MCFLMS Algorithm
It is observed that both the NMCFLMS and VSS-MCFLMS algorithms give a good initial estimate of the channels, followed by rapid divergence from this estimate in the presence of additive noise. This misconvergence is associated with the nonuniform spectral attenuation of the estimated channel impulse response, as illustrated in Section 3.6. Therefore, we propose a modified cost function J_mod(m) = J_f(m) + \lambda(m) J_p(m), where J_f (cf. Eq. 3.21) and J_p are the original and penalty cost functions, respectively, coupled through the Lagrange multiplier \lambda(m). The penalty function that can ameliorate the misconvergence of the MCFLMS-type algorithms is defined in this work as
maximize J_p(m) = \prod_{i=1}^{ML} |\hat{h}_i(m)|^2   (5.37)

subject to

|\hat{h}_1(m)|^2 + |\hat{h}_2(m)|^2 + \cdots + |\hat{h}_{ML}(m)|^2 = \frac{1}{ML},   (5.38)
where (5.38) is ensured by the unit norm constraint imposed on the update equation. Now, substituting (5.38) into (5.37) to eliminate |\hat{h}_{ML}(m)|^2, we obtain

J_p(m) = |\hat{h}_1(m)|^2 \, |\hat{h}_2(m)|^2 \cdots |\hat{h}_{ML-1}(m)|^2 \left[ \frac{1}{ML} - |\hat{h}_1(m)|^2 - \cdots - |\hat{h}_{ML-1}(m)|^2 \right].   (5.39)
Differentiating (5.39) with respect to \hat{h}_k(m), we get

\nabla J_{p,k}(m) = 2 \hat{h}_k(m) \left\{ \frac{1}{ML} - |\hat{h}_1(m)|^2 - \cdots - 2|\hat{h}_k(m)|^2 - \cdots - |\hat{h}_{ML-1}(m)|^2 \right\} \prod_{i=1, i \neq k}^{ML-1} |\hat{h}_i(m)|^2.   (5.40)
We know that the penalty function J_p(m) will be either maximized or minimized when \nabla J_{p,k}(m) = 0 for all k. From (5.40), we see that \nabla J_{p,k}(m) can be zero under two different conditions.

1. When \hat{h}_k(m) = 0 for any k, \nabla J_{p,k}(m) becomes zero. But this condition minimizes the penalty function.

2. When the following condition is satisfied,

|\hat{h}_1(m)|^2 + \cdots + 2|\hat{h}_k(m)|^2 + \cdots + |\hat{h}_{ML-1}(m)|^2 = \frac{1}{ML},   (5.41)

\nabla J_{p,k}(m) also becomes zero. However, this condition maximizes the penalty function.
Therefore, it is clear that if we maximize the penalty function by moving along its gradient, the latter condition will be satisfied. Formulating Eq. (5.40) for each value of k, we may obtain (ML - 1) simultaneous linear equations of the same form as (5.41). Adding all such equations, we get

|\hat{h}_1(m)|^2 + \cdots + |\hat{h}_k(m)|^2 + \cdots + |\hat{h}_{ML-1}(m)|^2 = \frac{ML - 1}{M^2 L^2}.   (5.42)
Subtracting (5.42) from (5.41), we obtain the condition for penalty function maximization as |\hat{h}_k(m)|^2 = 1/(M^2 L^2), which means the penalty function will be maximum when the estimated channel coefficients have uniform magnitude spectra in the frequency domain. Therefore, to combat the nonuniform spectral attenuation problem in the misconvergence phase, spectral flatness can be attached as a constraint to the original cost function J_f(m) via the penalty term J_p(m) using the Lagrange multiplier. The total regularized cost function to be minimized can be defined as J_t(m) = J_f(m) + \lambda(m)\{-J_p(m)\}. The negative sign before J_p(m) ensures that while J_t(m) is minimized, J_p(m) is maximized. The adaptive update rule for this constrained minimization can be readily obtained as

\hat{h}(m+1) = \frac{\hat{h}(m) - \rho_f(m) \nabla J_f(m) + \lambda(m) \rho_f(m) \nabla J_p(m)}{\sqrt{ML} \, ||\hat{h}(m)||},   (5.43)

where the denominator enforces the norm constraint (5.38).
The beauty of the proposed penalty function is that its gradient remains almost inactive compared to the original signal gradient in the initial phase of iterations. This phenomenon stems from the fact that the true channel vector, whether acoustic or random, is spectrally wideband. Thus, the original cost function is expected to be better minimized with a wideband estimate of the channel. This leads to an almost negligible gradient of the penalty term and thereby no noticeable effect on the update equation. However, when misconvergence starts because of nonuniform spectral attenuation in the estimate, the initially dormant gradient of the penalty term becomes active, enforces spectral flatness, and eventually eradicates the misconvergence. In order to simplify the expression of the penalty gradient, we take the natural logarithm of both sides of (5.37). This does not relax the functionality of the penalty term.
The penalty cost function then becomes

J_p(m) = \sum_{i=1}^{ML} \ln(|\hat{h}_i(m)|^2).   (5.44)
Its complex gradient with respect to \hat{h}_k(m) is

\frac{\partial J_p(m)}{\partial \hat{h}_k(m)} = \left( \frac{\partial}{\partial \hat{h}^r_k(m)} + j \frac{\partial}{\partial \hat{h}^i_k(m)} \right) J_p(m) = \frac{2}{|\hat{h}_k(m)|^2} \hat{h}_k(m),

where \hat{h}^r_k(m) = \mathrm{real}\{\hat{h}_k(m)\} and \hat{h}^i_k(m) = \mathrm{imag}\{\hat{h}_k(m)\}. Therefore, we can write \nabla J_p(m) as

\nabla J_p(m) = Q(m) \hat{h}(m),   (5.45)

where Q(m) is a diagonal matrix with diagonal elements 2/|\hat{h}_k(m)|^2, k = 1, 2, \ldots, ML.
The coupling factor \lambda(m) is estimated such that the total gradient becomes zero (\nabla J_t(m) = 0) in the steady-state condition. This gives \nabla J_f(m) = \lambda(m) \nabla J_p(m) and, premultiplying both sides by \nabla J_p^H(m), we can obtain \lambda(m) as

\lambda(m) = \frac{\nabla J_p^H(m) \nabla J_f(m)}{||\nabla J_p(m)||^2}.   (5.46)
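The penalty gradient and coupling factor of (5.44)-(5.46) reduce to a few lines (a sketch; the small `delta` guard against division by zero is our addition):

```python
import numpy as np

def penalty_gradient(h_hat, delta=1e-12):
    """Gradient of J_p = sum_k ln|h_k|^2, i.e. Q(m) h_hat with diagonal
    entries 2 / |h_k|^2 (cf. (5.45))."""
    return 2.0 * h_hat / (np.abs(h_hat) ** 2 + delta)

def coupling_factor(grad_f, grad_p):
    """lambda(m) = grad_p^H grad_f / ||grad_p||^2 (cf. (5.46))."""
    return np.real(np.vdot(grad_p, grad_f)) / np.real(np.vdot(grad_p, grad_p))

# For a spectrally flat estimate (|h_k| = 1) the penalty gradient has
# uniform magnitude 2, exerting no differential pull on the spectrum.
h_flat = np.exp(1j * np.array([0.0, 1.0, 2.0, 3.0]))
g = penalty_gradient(h_flat)
```

When the estimate's spectrum sags at some bin, the 2/|h_k|^2 weighting pulls that bin up hardest, which is the mechanism that opposes nonuniform spectral attenuation.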
Similarly, the spectral constraint J_p(m) can also be attached to the update equation of the normalized MCFLMS (NMCFLMS) algorithm in order to improve its noise robustness. The update equation of the original NMCFLMS algorithm can be expressed as [35]

\hat{h}_k^{10}(m+1) = \hat{h}_k^{10}(m) - \rho P_k^{-1}(m) J_k^{01}(m), \quad k = 1, \ldots, M,   (5.47)

where

\hat{h}_k^{10}(m) = F_{2L \times 2L} [I_{L \times L} \; 0_{L \times L}]^T \hat{h}_k(m),   (5.48)

P_k(m) = \sum_{i=1, i \neq k}^{M} D^*_{x_i}(m) D_{x_i}(m),   (5.49)

J_k^{01}(m) = \sum_{i=1}^{M} D^*_{x_i}(m) W^{01}_{2L \times L} e_{ik}(m).   (5.50)
Concatenating the M impulse response vectors into a longer one, we can write the update equation as

\hat{h}^{10}(m+1) = \hat{h}^{10}(m) - \rho P^{-1}(m) J^{01}(m).   (5.51)

Attaching the spectral penalty, the proposed spectrally constrained (robust) NMCFLMS update becomes

\hat{h}^{10}(m+1) = \hat{h}^{10}(m) - \rho \left[ \nabla J_n^{01}(m) - \lambda_n(m) \nabla J_p^{10}(m) \right],   (5.52)

where

\nabla J_n^{01}(m) = P^{-1}(m) J^{01}(m),   (5.53)

\nabla J_p^{10}(m) = Q^{10}(m) \hat{h}^{10}(m),   (5.54)

\lambda_n(m) = \frac{[\nabla J_p^{10}(m)]^H \nabla J_n^{01}(m)}{||\nabla J_p^{10}(m)||^2}.   (5.55)
In this work, the AIRs will be estimated using (5.52) for speech dereverberation. The extra computational cost required to implement the proposed penalty term is not significant. For example, the total number of multiplications and divisions required by the NMCFLMS algorithm is 4M^2 L + 5M L^2 + 4ML per iteration, whereas the increase in the computational cost due to the added penalty term is only 5ML + 1. The implementation of our blind channel estimation algorithm is shown in Table 5.3.
5.2.1
Simulation results
Table 5.3 (excerpt), Step 2: for k = 1, 2, \ldots, M compute

e_{ik}(mL) = x_i^T(m) \hat{h}_k(m) - x_k^T(m) \hat{h}_i(m),
e_{ik}(m) = [e_{ik}(mL) \; e_{ik}(mL+1) \; \cdots \; e_{ik}(mL+L-1)]^T,
\tilde{e}_{ik}(m) = F_{L \times L} \, e_{ik}(m).
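Step 2 is built on the familiar cross-relation error; a time-domain sketch (toy signals, illustrative names):

```python
import numpy as np

def cross_relation_error(x_i, x_k, h_hat_k, h_hat_i):
    """Cross-relation error e_ik = x_i * h_hat_k - x_k * h_hat_i
    (* is linear convolution); it vanishes when the estimates equal
    the true channels and the observations are noise-free."""
    return np.convolve(x_i, h_hat_k) - np.convolve(x_k, h_hat_i)

rng = np.random.default_rng(1)
s = rng.standard_normal(200)             # unknown source
h1 = np.array([1.0, 0.4, 0.1])           # toy true channels
h2 = np.array([0.8, -0.3, 0.2])
x1, x2 = np.convolve(s, h1), np.convolve(s, h2)

e = cross_relation_error(x1, x2, h2, h1)  # true channels -> ~zero error
```

The identity x_1 * h_2 = s * h_1 * h_2 = x_2 * h_1 is what every MCLMS-type cost function in this chapter minimizes.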
Figure 5.3: NPM profile of the spectrally constrained (RNMCFLMS with \rho = 0.5 and \rho = 0.25, RVSS-MCFLMS) and conventional (NMCFLMS with \rho = 0.5, VSS-MCFLMS) MCFLMS algorithms for acoustic channel identification with white Gaussian input at SNR = 10 dB.
Figure 5.4: NPM profile of the spectrally constrained and conventional MCFLMS algorithms for acoustic channel identification with speech input at SNR = 15 dB.
without sacrificing the speed of convergence, for the reason explained after (5.43). It can also be seen that the step-size parameter \rho acts as a trade-off between the convergence speed and the final misalignment error for the proposed RNMCFLMS algorithm.

We now present the NPM profile of the estimated channel using speech input at SNR 15 dB in Fig. 5.4. For all the algorithms we see good initial convergence. With increased iterations, the NMCFLMS completely misconverges. As stated earlier, the VSS-MCFLMS is more robust than the NMCFLMS and shows a slow misconverging trend. On the contrary, the proposed spectrally constrained algorithms converge steadily toward the final solution.

[Figure 5.5: NPM (dB) versus iterations for the conventional and spectrally constrained algorithms.]
5.2.2
Now, the spectrally constrained RNMCFLMS algorithm will be used for estimating the long AIRs commonly encountered in the dereverberation problem. The AIRs were generated considering the same virtual room described in Section 4.4.2 for a reverberation time T60 = 0.55 s and then truncated to a length of 4400. The sampling frequency was 8 kHz. For the speech input, we consider a female speech signal, sampled at 8 kHz. For the noise, we consider a computer-generated Gaussian random sequence.
Fig. 5.5 shows the channel estimation results with the conventional and proposed RNMCFLMS algorithms at 25 dB SNR with speech input. The conventional algorithm shows good initial convergence, as revealed by the lower NPM values in the early stage of iterations. But following this apparent convergence, the NPM starts to increase until complete misconvergence. On the contrary, the spectrally constrained algorithm converges to a steady-state solution with almost no sacrifice in the speed of convergence. The accuracy of the final estimate is, however, dictated by the noise level of the observation data.

Figure 5.6: (a) The true 5 acoustic channels with 4400 coefficients generated by the image model. (b) Estimated channels using the spectrally constrained RNMCFLMS algorithm. (c) Estimated channels using the original NMCFLMS algorithm.
Fig. 5.6 illustrates the true 5 acoustic channels with 4400 coefficients generated by the image model and those estimated using the spectrally constrained and conventional NMCFLMS algorithms. It is observed that the RNMCFLMS algorithm gives a close estimate of the AIRs, as indicated by the NPM level of around -8 dB. However, without this constraint, the algorithm fails to estimate the channels even in the moderate noise of 25 dB, as shown in Fig. 5.6(c).
Fig. 5.7 shows the convergence profile of the RNMCFLMS algorithm in a time-varying condition. Here, the source position was shifted four times to the left, by 1 cm at each step, during the channel estimation process. The notches in the NPM curve show the instants when the AIRs changed. It is observed that the algorithm steadily converges to the final solution despite frequent changes in the AIRs, without significant perturbation.

Figure 5.7: Channel estimation profile with iterations for time-varying channels using the spectrally constrained RNMCFLMS algorithm.
5.2.3
[Figure 5.8: NPM (dB) versus iterations m.]

Figure 5.9: (a) True acoustic channels obtained from the MARDY database. (b) Estimated channels using the spectrally constrained algorithm.
5.2.4
[Figure 5.10: Channel estimation profile (NPM in dB versus iteration index) for real reverberant channels with M = 5 and M = 8 microphones.]
As the number of microphones increases, the number of zeros common to all the channels is reduced. As a result, the estimation quality improves. However, the more channels there are, the higher the computational complexity. Fig. 5.10 shows the channel estimation profile for real reverberant channels with 5 and 8 microphones. The final NPM value with 8 microphones is around 1.21 dB better than that obtained using 5 microphones.
5.3
Conclusion
In this chapter, we have proposed two novel solutions to improve the noise robustness of both the NMCFLMS and the VSS-MCFLMS algorithms. The first algorithm, termed the excitation-driven MCLMS, converges in the steady state to a weighted combination of all the eigenvectors, giving a noise-robust solution. However, the algorithm is not suitable for the dereverberation problem, as the AIR does not remain time-invariant long enough in a practical situation to allow steady-state convergence. The second technique is free from this limitation, as it can estimate the time-varying AIRs required for speech dereverberation. We have proposed a novel cost function that inherently opposes the nonuniform spectral attenuation resulting from the noisy update vector and thus contributes to ameliorating the misconvergence of the MCFLMS algorithm.
Chapter 6
Robust Speech Dereverberation
Using Channel Information
Robust channel estimation algorithms developed in the previous chapters will now
be utilized for speech dereverberation. We present two different techniques that can
dereverberate speech as well as improve SNR of the signal recorded by an array of
microphones. In the first technique, the focus is primarily on the suppression of
late reverberation, whereas in the second one, the elimination of both early and late reflections is targeted. The proposed techniques do not require a priori information about the AIRs, the location of the source and microphones, or statistical properties of the speech/noise, which are common assumptions in the related literature.
6.1
Dereverberation Using Channel Shortening
Speech dereverberation does not need complete equalization of the acoustic channel; therefore, a shortened channel, which requires less computation while giving acceptable performance, can serve the purpose. The least-squares (LS) minimization method is very common, but it suffers from severe distortion in the perceptual quality of speech and in the signal-to-noise ratio.

[Figure 6.1: Schematic diagram of the proposed dereverberation model. The source s(n) passes through M acoustic channels H_1(z), ..., H_M(z); additive noise v_k(n) gives the microphone signals x_k(n), which are combined into \bar{x}(n) and filtered by the shortening filter W(z) to produce the output \hat{s}(n).]

The schematic diagram of the proposed dereverberation model is shown in Fig. 6.1. The method works in two stages: delay-and-sum beamforming followed by channel shortening.
6.1.1
Delay-and-sum beamforming
In the first stage, we perform delay-and-sum beamforming: the signals received by the microphone array are time-aligned with respect to each other and then added together. The output of the delay-and-sum beamformer can be expressed as

\bar{x}(n) = \frac{1}{M} \sum_{k=1}^{M} x_k(n - \tau_k) = \frac{1}{M} \sum_{k=1}^{M} [y_k(n - \tau_k) + v_k(n - \tau_k)],   (6.1)
where \tau_k is the delay required to compensate for the propagation time of the kth channel. We assume that \tau_k is known to us. Since y_k(n) = s^T(n) h_k, we can write (6.1) as

\bar{x}(n) = s^T(n) \left[ \frac{1}{M} \sum_{k=1}^{M} h_{k,\tau_k} \right] + \bar{v}(n) = s^T(n) \bar{h} + \bar{v}(n),   (6.2)

where h_{k,\tau_k} is the \tau_k-samples-delayed version of h_k and \bar{v}(n) = \frac{1}{M} \sum_{k=1}^{M} v_k(n - \tau_k). Therefore, the beamformer output can be viewed as the output of an equivalent single channel \bar{h}. The noise, being uncorrelated with the speech source, is attenuated at the output of the beamformer when the signals are superimposed. Thus, delay-and-sum beamforming improves the signal-to-noise ratio of the received microphone signal.
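A minimal sketch of (6.1) (NumPy; the circular shift via `np.roll` stands in for an ideal integer-sample delay, and the signals are synthetic):

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Time-align each microphone signal by removing its propagation
    delay (in samples) and average over the M channels (cf. (6.1))."""
    M = len(signals)
    aligned = [np.roll(x, -d) for x, d in zip(signals, delays)]
    return sum(aligned) / M

rng = np.random.default_rng(0)
s = rng.standard_normal(1000)                        # source signal
delays = [0, 3, 7]                                   # assumed-known tau_k
mics = [np.roll(s, d) + 0.5 * rng.standard_normal(s.size) for d in delays]
out = delay_and_sum(mics, delays)
# Averaging M aligned copies preserves the signal while cutting the
# uncorrelated noise power by about 1/M, i.e. an SNR gain of ~10 log10(M) dB.
```

This is the SNR-improvement mechanism described above: the coherent signal survives the average, the independent noise partially cancels.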
6.1.2
Channel shortening
The shortening filter shown in Fig. 6.1 minimizes the energy in the late reflections and hence reduces the reverberation effect. To design the shortening filter, we assume that there is no significant variation in the AIRs until the LS minimization algorithm converges. Let \hat{h}, w and c represent the estimated equivalent SISO channel, the shortening filter, and the equalized channel impulse responses, of lengths L_h, L_w and L_c, respectively. Therefore, we can write

c = \hat{H} w,

where \hat{H} is the tall convolution matrix of \hat{h}, which is L_c \times L_w Toeplitz. The equalized channel response c can be divided into two parts: an early portion c_early and a late portion c_late. Therefore, the cost function for minimizing the energy in the late portion can be written as

J_{late} = \frac{c_{late}^T c_{late}}{c^T c} = \frac{w^T \hat{H}^T D^2 \hat{H} w}{w^T \hat{H}^T \hat{H} w} = \frac{w^T A w}{w^T B w},   (6.3)

where D is a diagonal selection matrix that retains the late part of c, A = \hat{H}^T D^2 \hat{H}, and B = \hat{H}^T \hat{H}.
Keeping the denominator fixed through a norm constraint, the problem reduces to minimizing [58]

J_{late}(l) = w^T(l) A w(l).   (6.4)

The gradient-descent update is

w(l+1) = w(l) - \mu \nabla J_{late}(l),   (6.5)

where \nabla J_{late}(l) = (A + A^T) w(l). Here, the step-size \mu governs the stability and convergence speed of the algorithm. Now, we propose a variable step-size that ensures optimal performance in the adaptation process. Let w_{opt} be the optimum equalizer. We define a cost function

J_\mu(l) = [w_{opt} - w(l+1)]^T [w_{opt} - w(l+1)],   (6.6)

which measures the distance between w(l+1) and w_{opt} at each iteration. Substituting (6.5) into (6.6) and setting \partial J_\mu(l)/\partial \mu(l) = 0, we obtain the expression of the variable step-size as

\mu_{adap}(l) = \frac{\nabla J_{late}^T(l) [w(l) - w_{opt}]}{||\nabla J_{late}(l)||^2},   (6.7)

where ||\cdot|| is the l2 norm. In the above expression, w_{opt} is unknown. However, it can be easily shown that

\nabla J_{late}^T(l) w_{opt} = w^T(l)(A + A^T) w_{opt} = 0,   (6.8)

so that the variable step-size reduces to

\mu_{adap}(l) = \frac{\nabla J_{late}^T(l) w(l)}{||\nabla J_{late}(l)||^2}.   (6.9)
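The iteration (6.5) with the variable step-size (6.9) can be sketched as follows (a toy single-channel example with an exponentially decaying impulse response; the unit-norm renormalization is our stand-in for the constraint that keeps the trivial solution w = 0 away):

```python
import numpy as np

def conv_matrix(h, Lw):
    """Tall (len(h)+Lw-1) x Lw Toeplitz convolution matrix of h."""
    H = np.zeros((len(h) + Lw - 1, Lw))
    for j in range(Lw):
        H[j:j + len(h), j] = h
    return H

rng = np.random.default_rng(0)
h = rng.standard_normal(64) * 0.97 ** np.arange(64)   # decaying toy channel
Lw, early = 32, 8

H = conv_matrix(h, Lw)
d = np.zeros(H.shape[0]); d[early:] = 1.0             # selects the late taps
A = H.T @ (d[:, None] * H)                            # A = H^T D^2 H

w = np.zeros(Lw); w[0] = 1.0                          # start from an impulse
for _ in range(2000):
    grad = 2.0 * A @ w                                # grad of w^T A w (A symmetric)
    mu = (grad @ w) / (grad @ grad + 1e-12)           # variable step-size (6.9)
    w -= mu * grad
    w /= np.linalg.norm(w)                            # keep ||w|| = 1

late_energy = w @ A @ w                               # energy left in c_late
```

Starting from an impulse (so that c = h), the iteration drives the late-tap energy well below its initial value.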
Similar to Section 5.2, a spectral-flatness penalty can be attached to the shortening cost:

maximize J_{pc}(l) = \prod_{i=1}^{L_c} |c_i(l)|^2 = \prod_{i=1}^{L_c} |\hat{h}_i w_i(l)|^2   (6.10)

subject to

|c_1(l)|^2 + |c_2(l)|^2 + \cdots + |c_{L_c}(l)|^2 = \frac{1}{L_c},   (6.11)

where c_i, \hat{h}_i and w_i represent the ith elements of the L_c-point DFTs of c, \hat{h} and w, respectively. Eq. (6.11) can be easily ensured by imposing the unit-norm constraint on the shortened channel vector c. Maximizing the penalty function J_{pc}(l) ensures near spectral flatness of c. The proof of this statement comes from the fact that the product of a set of elements becomes maximum only when all the elements are equal (provided that their sum remains constant). In order to simplify the expression of the penalty gradient, we take the natural logarithm of both sides of (6.10), which does not relax the functionality of the penalty term. Therefore, we can rewrite the penalty cost function as

J_{pc}(l) = \sum_{i=1}^{L_c} \ln(|c_i(l)|^2).   (6.12)

Its gradient with respect to the ith DFT coefficient of the filter is

\frac{\partial J_{pc}(l)}{\partial w_i(l)} = \frac{2}{w_i(l)},   (6.13)

so that, in (6.14)-(6.15), the penalty gradient is again written as a diagonal scaling of the current filter spectrum and combined with \nabla J_{late}(l) through a coupling factor in the update equation, analogously to (5.43).
6.1.3
Simulation results
Figure 6.2: Channel impulse responses: (a) original, (b) estimated using the robust MCLMS algorithm, (c) shortened channel using the estimated impulse responses.
We now evaluate the proposed shortening technique and compare the result with infinity-norm optimization, both using the estimated channels, in terms of direct-to-reverberant energy ratio (DRR), perceptual evaluation of speech quality (PESQ), and signal-to-noise ratio (SNR). DRR is a popular objective measure of room reverberation. If reverberation is considered as noise, DRR is similar to SNR. The DRR is measured as

DRR = 10 \log_{10} \frac{\sum_n d^2(n)}{\sum_n r^2(n)},   (6.16)

where d(n) and r(n) are the direct sound and the reverberant part of the recorded signal, respectively. The time boundary between the direct sound and the reverberant part is usually taken as 50 ms. Table 6.3 shows that the proposed algorithm gives a higher DRR as compared to
infinity-norm optimization. PESQ scores measure the level of speech distortion at the output of the shortened channel. The advantage of PESQ is that it maintains a good correlation with subjective scores over a very wide range of conditions. The highest PESQ value is 5, which indicates that the compared signal is identical to the original signal. The proposed technique gives a better PESQ score and a higher SNR as compared to the infinity-norm optimization. Both the PESQ and SNR results indicate that channel shortening can be a good approach for speech dereverberation.
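A sketch of the DRR measurement in (6.16), applied to an impulse response with a 50 ms direct/reverberant boundary (the toy response and its numbers are illustrative):

```python
import numpy as np

def drr_db(h, fs, boundary_ms=50.0):
    """DRR of an impulse response: energy before the boundary (direct
    sound and early part) over energy after it (reverberant tail)."""
    h = np.asarray(h, dtype=float)
    nb = int(round(boundary_ms * 1e-3 * fs))
    return 10.0 * np.log10(np.sum(h[:nb] ** 2) / np.sum(h[nb:] ** 2))

fs = 8000
h = np.zeros(4000)
h[0] = 1.0                                    # strong direct path
h[400:] = 0.01 * 0.999 ** np.arange(3600)     # decaying reverberant tail
drr = drr_db(h, fs)                           # clearly positive for this h
```

Shortening raises DRR precisely by shrinking the denominator (the tail energy) while leaving the numerator largely intact.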
Figure 6.3: Convergence profile of the shortening algorithm with fixed step-sizes (\mu_{fixed} = 0.01, 0.05, 0.13) and the proposed variable step-size \mu_{adap}.
Figure 6.4: Frequency spectra of the original and shortened channels: (a) original, (b) shortened using the proposed spectral constraint algorithm, (c) shortened using LS minimization.
Although it may appear that the performance improvement of the proposed algorithm over the infinity-norm optimization is not significant in terms of PESQ and SNR, the strength of the proposed algorithm lies in its computational efficiency. For example, the proposed algorithm converges in 500 iterations, whereas the infinity-norm optimization requires 50,000 iterations to produce steady-state results. Moreover, the mean time per iteration is higher for the infinity-norm
Table 6.3: DRR, PESQ and SNR results for reverberated (rev) speech and speech dereverberated by the infinity-norm and proposed algorithms (TIMIT database)

                 DRR (dB)                    PESQ                     SNR (dB)
             rev   ∞-norm  proposed   rev   ∞-norm  proposed   rev   ∞-norm  proposed
    Female   6.56  12.65   13.54      2.09  2.44    2.47       20    25.30   25.80
             5.36  13.72   15.12      1.98  2.26    2.30       20    22.27   23.00
             4.31  13.67   14.97      1.98  2.17    2.28       20    22.04   22.55
             5.37  13.30   14.27      1.99  2.14    2.30       20    23.06   23.94
    Male     5.09  13.64   14.93      2.09  2.28    2.31       20    23.43   24.00
             5.40  13.29   14.40      2.22  2.33    2.41       20    23.18   23.54
             4.11  12.08   13.05      2.12  2.17    2.29       20    23.73   24.03
             4.86  12.36   13.21      2.32  2.38    2.43       20    24.40   24.78
Figure 6.5: Comparison of mean time per iteration for the proposed and infinity-norm algorithms as a function of the channel length L.
algorithm than for the proposed one. The mean time per iteration for the two compared methods is shown in Fig. 6.5, measured on a system comprising a 2.4 GHz Intel(R) Core(TM)2 Quad CPU with 996 MB RAM. Therefore, the proposed algorithm is much faster than the infinity-norm optimization.
Now we investigate the effectiveness of the beamformer in the proposed dereverberation model. The shortening filter could be designed from each of the estimated AIRs instead of obtaining it from the equivalent SISO channel after the beamforming operation.
Table 6.2: Results of SNR, DRR and PESQ improvement with and without the delay-and-sum beamformer

    Shortening filter             SNR (dB)   DRR (dB)   PESQ
    Reverberated (no filter)      20         6.55       2.09
    Individual filter, Ch-1       23.62      12.92      2.15
    Individual filter, Ch-2       17.53      12.19      2.18
    Individual filter, Ch-3       23.41      11.86      2.52
    Individual filter, Ch-4       18.38      12.49      2.15
    Individual filter, Ch-5       18.27      12.95      2.35
    Delay-and-sum beamformed      25.86      13.53      2.47
Table 6.2 compares the dereverberation performance of the individual and delay-and-sum shortening filters in terms of SNR, PESQ and DRR.
The SNR of the reverberated and dereverberated signals is estimated as the ratio
of the energy of the speech component to that of the noise component in the
respective signal. The results show that the shortening filter followed by delay-and-sum
beamforming gives the highest improvement in SNR and DRR. The PESQ points are
also better than most of the individual outputs. Considering all these factors, it can be
remarked that the beamformer plays a positive role in the dereverberation technique.
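The delay-and-sum operation discussed above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the steering delays are assumed to be known integers, whereas in practice they would be derived from the estimated AIRs or the microphone geometry.

```python
import numpy as np

def delay_and_sum(x, delays):
    # x: (M, N) array of microphone signals; delays[m]: integer steering
    # delay (in samples) that time-aligns channel m with the reference channel.
    M, _ = x.shape
    y = np.zeros(x.shape[1])
    for m in range(M):
        y += np.roll(x[m], -delays[m])   # advance channel m by its delay
    return y / M                          # average the aligned channels
```

Aligning the direct-path components before summation reinforces the speech while averaging down uncorrelated noise, which is consistent with the beamformed shortening filter attaining the best SNR in Table 6.2.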
Finally we present speech dereverberation performance using the real reverberant
channels obtained from MARDY. The average improvement in DRR is 7.25 dB, PESQ
improves by 0.38 points and improvement in SNR is 4.96 dB.
6.1.4
Table: DRR, PESQ and SNR of the reverberated (rev) and proposed dereverberated
speech at 30 dB input SNR (TIMIT database)

            DRR (dB)         PESQ            SNR (dB)
Speech      rev    prop.     rev    prop.    rev   prop.
Female 1    8.70   15.87     2.18   2.65     30    34.74
Female 2    8.57   15.94     2.21   2.61     30    34.77
Female 3    8.25   15.20     2.25   2.61     30    35.04
Female 4    9.64   17.10     2.31   2.65     30    35.40
Male 1      7.63   15.11     2.55   2.89     30    34.81
Male 2      7.98   15.48     2.13   2.55     30    35.40
Male 3      7.46   14.17     2.33   2.73     30    35.08
Male 4      8.92   16.34     2.44   2.79     30    34.44
the AIRs. The psychoacoustic properties of the human auditory system allow us to
keep the early reverberation unchanged, but the late reverberation must be suppressed
as much as possible to improve the perceptual quality of the speech
signal. In spite of these advantages, the channel shortening technique has the following
limitations.
1. The shortening filter obtained from the LS minimization is a narrowband filter.
As a result, the early reflections do not remain unchanged after the shortening
process. This introduces speech distortion in the dereverberated signal.
2. The early reverberation causes a spectral distortion called the coloration effect. The
shortening filter cannot eliminate this distortion.
3. No sophisticated SNR improvement technique can be incorporated with the
shortening filter. The reason is that the SNR improvement filter heavily distorts
the AIR, and hence the distortion in the early portion of the shortened channel
is so severe that the perceptual quality of the speech drastically falls.
6.2
6.2.1
If the ZFE is cascaded just after the channel estimation stage, the SNR deteriorates
at the output of the equalizer due to severe noise amplification near the spectral nulls.
Therefore, an SNR improvement scheme is essential before the channel equalization
stage. To this end, we propose a modified eigenfilter in this section. The modification
is made in two ways. First, the conventional design of eigenfilters is computationally
expensive. Therefore, an efficient eigensolver technique is proposed that reduces the
cost of computation by avoiding the Cholesky decomposition. Second, the equivalent
Figure 6.6: Block diagram of the proposed dereverberation technique: the source s(n)
passes through the channels h_1, ..., h_M with additive noise v_1(n), ..., v_M(n); the
received signals x_1(n), ..., x_M(n) are processed by a block DFT stage, channel
estimation, and a block IDFT (F^{-1}) to produce the output ŝ(n).
channel becomes extremely narrowband when the AIRs are filtered through the
eigenfilters. As a result, the speech signals get distorted at the output of the eigenfilters.
To overcome this limitation as well as remove the spectral nulls, a frequency-domain
constraint is attached to the eigenfilters that improves the quality of the dereverberated
speech.
It is reported in the literature that an eigenfilter that maximizes the SNR can
be obtained from the eigenvector of the data correlation matrix corresponding to the
largest eigenvalue [51]. However, speech dereverberation is a blind problem, and we
do not have access to the clean speech signal. One may use the correlation matrix obtained
from the received microphone signals, under the assumption that the desired signal (here,
the speech signal) is a wide-sense stationary (WSS) random process. But speech is
highly nonstationary, and the WSS assumption does not hold. Therefore, the
eigenfilter (the eigenvector corresponding to the largest eigenvalue) estimated from the
data correlation matrix is not a proper choice.
Figure 6.7: Block diagram of the signal path and noise path for the kth channel.
We propose an improved eigenfilter technique utilizing the estimates of the AIRs
that enhances the energy in the signal path as compared to that of the noise path.
Let g_k(n) of length L_g represent the eigenfilter in the kth channel. Now, we can
separate the signal and noise paths as shown in Fig. 6.7, where z_k^{signal} and z_k^{noise} are the
speech and noise components at the output of the filter g_k, respectively. If we design
g_k(n) such that the energy of the signal path is maximized relative to that of the
noise path, the SNR will increase at the output of the eigenfilter. Let h_{eq}(n) of length
L_{eq} = L + L_g - 1 represent the equivalent channel impulse response at the output of
the eigenfilter block. Then, we can write

h_{eq}(n) = H(n)\,g(n)    (6.17)
where g(n) = [g_1^{T}(n)\; g_2^{T}(n)\; \cdots\; g_M^{T}(n)]^{T} is the composite eigenfilter and H(n) =
[H_1(n)\; H_2(n)\; \ldots\; H_M(n)] is the composite convolution matrix, where H_i(n) of size
L_{eq} \times L_g is the convolution matrix of h_i(n). Now, the desired objective function can
be written as

J_c(n) = \frac{h_{eq}^{T}(n)\,h_{eq}(n)}{g^{T}(n)\,g(n)} = \frac{g^{T}(n)\,A(n)\,g(n)}{g^{T}(n)\,g(n)}    (6.18)
where A(n) = H^{T}(n)H(n). The optimal method for maximizing the signal-path energy
finds g(n) so as to maximize g^{T}(n)A(n)g(n) while satisfying g^{T}(n)g(n) = 1. We
see, therefore, that this problem may be viewed as an eigenvalue problem, and the
optimum FIR filter that maximizes J_c(n) can be obtained as

g_{opt}(n) = q_{max}    (6.19)
where q_{max} is the eigenvector associated with the largest eigenvalue of A(n). We can
easily estimate A(n) from the estimates of the AIRs, \hat{h}(n). Since the AIRs may vary with
time, a fixed q_{max} obtained from A(n) at a particular instant would not work for the
entire speech waveform. Therefore, we need to update the matrix A(n) with a new
set of \hat{h}(n) at regular intervals. Moreover, a sharp change in the AIRs usually gives
an abrupt rise in the cost function, and this fluctuation may also be used for updating
the matrix A(n) with a new set of \hat{h}(n) whenever the AIRs change.
An iterative algorithm is proposed here for finding the eigenfilter, which gives a
number of advantages over the one-pass solution. First, we can avoid the computationally
intensive Cholesky decomposition, which may become unstable for large A(n) [58].
Second, the optimum eigenfilter is extremely narrowband in the frequency domain,
causing speech distortion in the output. Moreover, spectral nulls are present in the
equivalent channels, which cause significant noise amplification at the output of the
equalization process.
(6.20)
for finding the eigenvector corresponding to the maximum eigenvalue. The update
equation at the lth iteration can be expressed as

g^{l+1}(n) = \mu\,\hat{A}(n)\,g^{l}(n)    (6.21)

where \mu = 1/\mathrm{Tr}\{\hat{A}(n)\} is the step size and \mathrm{Tr}\{\cdot\} represents the trace of a matrix. The
proof of (6.21) is provided in the Appendix, which shows that the algorithm converges in
the mean to the eigenvector of \hat{A}(n) corresponding to the largest eigenvalue. In order
to enforce spectral flatness of g^{l}(n) in the frequency domain, we formulate a penalty
function similar to (5.37), which is

maximize \; J_{fc}(n) = \prod_{i=1}^{L_g} |\bar{g}^{l}_{i}(n)|^{2}    (6.22)
where \bar{g}^{l}_{i}(n) represents the ith element of the L_g-point discrete Fourier transform (DFT)
of g^{l}(n). Maximizing the penalty function J_{fc}(n) tries to make every
component of \bar{g}^{l}(n) uniform in the frequency domain. Thus it resists spectral nulls
in the equivalent channel impulse response, h_{eq}. In order to simplify the expression of
the penalty gradient, we take the natural logarithm of both sides of (6.22), which does not
relax the functionality of the penalty term. Therefore, we can rewrite the penalty cost
function as

J_{fc}(n) = \sum_{i=1}^{L_g} \ln\big(|\bar{g}^{l}_{i}(n)|^{2}\big).    (6.23)
Differentiating (6.23) with respect to \bar{g}^{l}_{k}(n) gives the kth element of the penalty gradient,

\frac{\partial J_{fc}(n)}{\partial \bar{g}^{l}_{k}(n)} = \frac{2}{\mathrm{conj}\{\bar{g}^{l}_{k}(n)\}}.    (6.24)

Stacking these elements for k = 1, \ldots, L_g gives the gradient vector

\nabla J_{fc}(n) = \big[\,\partial J_{fc}(n)/\partial\bar{g}^{l}_{1}(n)\; \cdots\; \partial J_{fc}(n)/\partial\bar{g}^{l}_{L_g}(n)\,\big]^{T}.    (6.25)

(6.26)
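The iterative eigenfilter with the spectral-flatness penalty can be sketched as below. This is a hedged sketch, not the thesis code: the combination weight `alpha`, the impulse initialization and the per-iteration renormalization are assumptions, and the penalty DFT is taken over the whole filter vector.

```python
import numpy as np

def design_eigenfilter(A, alpha=0.5, n_iter=500):
    # A: matrix A(n) = H^T(n) H(n) built from the estimated AIRs.
    N = A.shape[0]
    mu = 1.0 / np.trace(A)          # step size of (6.21)
    g = np.zeros(N)
    g[0] = 1.0                      # impulse initialization (assumed): flat spectrum
    for _ in range(n_iter):
        G = np.fft.fft(g)           # current frequency response of the filter
        pen = np.real(np.fft.ifft(2.0 / np.conj(G)))  # F^{-1} of the (6.24) gradient
        g = alpha * mu * (A @ g) + (1.0 - alpha) * pen
        g /= np.linalg.norm(g)      # keep g^T g = 1
    return g
```

With `alpha` close to 1 the update reduces to plain power iteration toward the dominant eigenvector; the penalty term pulls the solution toward a spectrally flat filter.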
Step 1  Pre-compute \hat{A}(n) using the estimated channels, \hat{h}(n).
Step 2  Obtain \mu = 1/\mathrm{Tr}\{\hat{A}(n)\}.
Step 3  Compute \hat{A}(n)\,g^{l}(n) and \nabla J_{fc}(n) as defined in (6.25).
Step 4  Update g:
        g^{l+1}(n) = \alpha\,\mu\,\hat{A}(n)\,g^{l}(n) + (1-\alpha)\,F^{-1}\nabla J_{fc}(n)

6.2.2

Dereverberation of speech requires blind equalization of the AIRs. Among the various
linear equalization techniques proposed in the literature, the zero-forcing equalizer
(ZFE) and the minimum mean-square error (MMSE) equalizer are the most common
[60]. Although the MMSE technique is more noise robust than the ZFE, this advantage
comes at the expense of computational complexity, as the minimum of the error
function has to be searched over a wide range of delays. The ZFE is computationally
efficient and gives direct equalization of the AIRs in the frequency domain. However,
the ZFE can lead to considerable noise amplification, which makes it unsuitable for
practical applications. The proposed eigenfilter along with the frequency-domain
constraint can effectively compensate for such SNR degradation by providing adequate
signal power enhancement in the previous stage. Moreover, the ZFE is implemented
in block-adaptive mode for the reasons stated after (6.27).
We can obtain an estimate of the equivalent channel from the speaker to the output of
the eigenfilter, h_{eq}(n), from the estimates of the AIRs, \hat{h}(n), and the impulse response
of the eigenfilter, g(n). Let \bar{h}_{eq} represent the equivalent channel vector in the frequency
domain. Therefore, the kth frequency component of the required ZFE can be expressed
as
a(e^{j\omega_k}) = \frac{\mathrm{conj}\{h_{eq}(e^{j\omega_k})\}}{|h_{eq}(e^{j\omega_k})|^{2} + \delta}    (6.27)

where h_{eq}(e^{j\omega_k}) is the kth frequency component of \bar{h}_{eq}. A small positive number, \delta, is
added in the denominator to avoid division by zero. However, this simple ZFE is not
implementable in practice. The main reason is that the DFT size for obtaining heq
should be, at least, the sum of the lengths of the signal vector and the channel impulse
response vector minus one. But the length of the speech signal is usually undefined.
It is not practically possible to store the entire speech waveform and then perform
zero forcing equalization. Moreover, the AIRs are slowly time-varying. We cannot
assume the same h_{eq} for the entire speech signal. The above-mentioned problems can
be resolved in two different ways. The first is a time-domain filter obtained from
the IDFT of h_{eq}(e^{j\omega_k}). However, the zeros of the AIRs are very close to the unit circle
in the z-plane, and hence the FIR approximation of the inverse filter becomes of very high
order. In other words, the noncausal part of the inverse filter is prohibitively large
for causal implementation. For example, the length of the noncausal part of a typical
inverse filter is around 2.5 × 10^5 taps, which requires a 31.25 s delay for causal implementation.
The second approach is a block-adaptive ZFE in the frequency domain. Although a
block delay is unavoidably introduced in this case, such a delay is smaller than that
of the causal implementation in the time domain. For example, with a block size
of 9L and L = 4400, the block delay is 4.95 s. In this work, we propose a block ZFE
utilizing the overlap-save method [61].
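The delay figures quoted above follow directly from the 8 kHz sampling rate:

```python
fs = 8000                 # sampling frequency (Hz)
L = 4400                  # channel length (samples)

noncausal = 2.5e5         # taps in the noncausal part of a typical inverse filter
print(noncausal / fs)     # 31.25 s delay for a causal time-domain implementation

block = 9 * L             # block size of the frequency-domain ZFE
print(block / fs)         # 4.95 s block delay
```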
The combined output of the signal power enhancement eigenfilters, as shown in Fig.
6.1, can be expressed as

z(n) = x_1(n) * g_1(n) + \cdots + x_M(n) * g_M(n) = s(n) * h_{eq}(n) + v(n)    (6.28)

where h_{eq}(n) = h_1(n) * g_1(n) + \cdots + h_M(n) * g_M(n) and v(n) = v_1(n) * g_1(n) + \cdots +
v_M(n) * g_M(n). The improved SNR at the output of the eigenfilter can be estimated
as

(6.29)
The power of the signal term in (6.28) is significantly enhanced as compared to that
of the noise term due to eigenfiltering in the previous stage. Therefore, the noise term
can be ignored in the derivation of the ZFE for simplification.
Now, we formulate a suitable transformation that converts a block of data at the output
of the eigenfilter into a direct product of the source signal vector and the equivalent
channel vector. Then it becomes easy to dereverberate that block of data by canceling
out the effect of h_{eq}(n) using its estimate. Let \bar{z}(m) represent a vector of length pL
(p \ge 2, an integer) that results from the circular convolution of the source signal
vector, s, and the equivalent channel zero-padded to length pL, h^{10}_{eq}(m) = W^{10}_{pL \times L}\, h_{eq}(m):

\bar{z}(m) = C_s(m)\, h^{10}_{eq}(m)    (6.30)
We can find by inspection that the last (p-1)L points of \bar{z}(m) are identical to the linear
convolution between s and h_{eq}, which can be represented as

z(m) = W^{01}_{(p-1)L \times pL}\, \bar{z}(m) = W^{01}_{(p-1)L \times pL}\, C_s(m)\, W^{10}_{pL \times L}\, h_{eq}(m)    (6.31)

where

z(m) = [\,z(m(p-1)L)\; \cdots\; z((m+1)(p-1)L - 1)\,]^{T}

and

W^{01}_{(p-1)L \times pL} = [\,0_{(p-1)L \times L}\;\; I_{(p-1)L \times (p-1)L}\,],
W^{10}_{pL \times L} = [\,I_{L \times L}\;\; 0_{L \times (p-1)L}\,]^{T}.    (6.32)
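The identity behind (6.31) — that the last (p-1)L samples of a length-pL circular convolution coincide with the linear convolution — can be checked numerically (toy sizes here, not the thesis values):

```python
import numpy as np

p, L = 3, 8                        # block factor and channel length (toy sizes)
rng = np.random.default_rng(1)
s = rng.standard_normal(p * L)     # source block of length pL
h_eq = rng.standard_normal(L)      # equivalent channel of length L

# Circular convolution of length pL, i.e. C_s(m) h_eq^{10} in (6.30)
z_bar = np.fft.ifft(np.fft.fft(s) * np.fft.fft(h_eq, p * L)).real

z_lin = np.convolve(s, h_eq)       # linear convolution for reference

# The last (p-1)L points of the circular result equal the linear convolution
assert np.allclose(z_bar[L:], z_lin[L:p * L])
```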
Since C_s(m) is a circulant matrix, it can be diagonalized as C_s(m) = F^{-1}_{pL \times pL}\, D_s(m)\, F_{pL \times pL},
where D_s(m) is a diagonal matrix whose elements are obtained from the DFT
coefficients of the first column of C_s(m). Substituting D_s(m) into (6.31) and taking the
DFT of z(m), we obtain the frequency-domain block vector as

\tilde{z}(m) = \tilde{W}^{01}_{(p-1)L \times pL}\, D_s(m)\, \tilde{h}^{10}_{eq}(m)    (6.33)

where

\tilde{z}(m) = F_{(p-1)L \times (p-1)L}\, z(m)
\tilde{W}^{01}_{(p-1)L \times pL} = F_{(p-1)L \times (p-1)L}\, W^{01}_{(p-1)L \times pL}\, F^{-1}_{pL \times pL}
\tilde{h}^{10}_{eq}(m) = F_{pL \times pL}\, W^{10}_{pL \times L}\, h_{eq}(m).

Multiplying both sides of (6.33) by \tilde{W}^{01}_{pL \times (p-1)L}, we get

\tilde{W}^{01}_{pL \times (p-1)L}\, \tilde{z}(m) = \tilde{W}^{01}_{pL \times pL}\, D_s(m)\, \tilde{h}^{10}_{eq}(m)    (6.34)

where

\tilde{W}^{01}_{pL \times (p-1)L} = F_{pL \times pL}\, [\,0_{(p-1)L \times L}\;\; I_{(p-1)L \times (p-1)L}\,]^{T}\, F^{-1}_{(p-1)L \times (p-1)L}

\tilde{W}^{01}_{pL \times pL} = \tilde{W}^{01}_{pL \times (p-1)L}\, \tilde{W}^{01}_{(p-1)L \times pL}
                             = F_{pL \times pL} \begin{bmatrix} 0_{L \times L} & 0_{L \times (p-1)L} \\ 0_{(p-1)L \times L} & I_{(p-1)L \times (p-1)L} \end{bmatrix} F^{-1}_{pL \times pL}.

For acoustic channels, L is usually very large, and hence \tilde{W}^{01}_{pL \times pL} can be approximated as

\tilde{W}^{01}_{pL \times pL} \approx \frac{p-1}{p}\, I_{pL \times pL}.    (6.35)
With this approximation, (6.34) becomes

\tilde{W}^{01}_{pL \times (p-1)L}\, \tilde{z}(m) \approx \frac{p-1}{p}\, D_s(m)\, \tilde{h}^{10}_{eq}(m).    (6.36)

Now, the right-hand side of (6.36) is the product of the source data matrix and the
equivalent channel vector in the frequency domain. The term (p-1)/p is simply a scalar
quantity; therefore, its effect can be neglected in the derivation. Now, the block-adaptive
ZFE that can compensate for \tilde{h}^{10}_{eq}(m) in \tilde{z}(m) can be easily obtained as

\tilde{A}(m) = \mathrm{diag}\{\,a_1,\; a_2,\; \ldots,\; a_i,\; \ldots,\; a_{pL}\,\}    (6.37)

where a_i = \mathrm{conj}\{\tilde{h}^{10}_{eq,i}(m)\}\,/\,(|\tilde{h}^{10}_{eq,i}(m)|^{2} + \delta) and \tilde{h}^{10}_{eq,i}(m) is the ith component of the
vector \tilde{h}^{10}_{eq}(m). An estimate of \tilde{A}(m) can be obtained from the estimated AIRs and
the eigenfilter impulse response. Therefore, the source signal block \hat{s}_b(m) can be extracted
from \tilde{z}(m) as

\hat{s}_b(m) = F^{-1}_{pL \times pL}\, \hat{\tilde{A}}(m)\, \tilde{W}^{01}_{pL \times (p-1)L}\, \tilde{z}(m)    (6.38)

where

\hat{s}_b(m) = [\,\hat{s}(m(p-1)L - L)\; \cdots\; \hat{s}(m(p-1)L)\; \cdots\; \hat{s}((m+1)(p-1)L - 1)\,]^{T}.

Now, we can obtain the dereverberated speech block at the output of the equalizer
corresponding to each element of z(m) as

\hat{s}(m) = W^{01}_{(p-1)L \times pL}\, \hat{s}_b(m)    (6.39)

(6.40)
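A minimal overlap-save implementation of this block-adaptive ZFE might look as follows. It is a sketch under simplifying assumptions: the equivalent channel is treated as fixed over the whole signal, and the regularized inverse of (6.37) is applied per DFT bin.

```python
import numpy as np

def block_zfe(z, h_eq, p=3, delta=0.05):
    # z: eigenfilter output; h_eq: estimated equivalent channel (length L);
    # p: block factor, so the DFT size is pL and each block yields (p-1)L
    # new output samples (overlap-save).
    L = len(h_eq)
    N, hop = p * L, (p - 1) * L
    H = np.fft.fft(h_eq, N)
    A = np.conj(H) / (np.abs(H) ** 2 + delta)   # regularized inverse, eq. (6.37)
    out = []
    for start in range(0, len(z) - N + 1, hop):
        Z = np.fft.fft(z[start:start + N])
        s_hat = np.fft.ifft(A * Z).real
        out.append(s_hat[L:])                   # discard the first L (wrapped) samples
    return np.concatenate(out) if out else np.zeros(0)
```

With a well-conditioned minimum-phase channel and a small `delta`, the equalized blocks closely recover the source, at the cost of the (p-1)L-sample block delay discussed above.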
6.2.3
Simulation results
The synthetic acoustic channels were generated using the image model [53], and the real
reverberant channels were obtained from the multichannel acoustic reverberation
database at York (MARDY) [57]. We also present comparative dereverberation
performance using the correlation-based multichannel inversion with spectral
subtraction (ISS) [28], multichannel linear prediction (MLP) [22] and infinity-norm
minimization (∞-norm) [41] algorithms. For speech input, a number of both
female and male utterances, sampled at 8 kHz, were used. The objective measures
used to evaluate the quality of speech are the log-likelihood ratio (LLR), average
segmental SNR (segSNR), weighted spectral slope (WSS) and perceptual evaluation of
speech quality (PESQ).
Figure 6.8: (a) Original channel, (b) estimated channel, and (c) IDFT sequence of the
equalized channel at the output of the ZFE.
6.2.4
The room dimensions were taken to be (5 × 4 × 3) m. A linear array consisting of
M = 5 microphones with a uniform separation of d = 0.2 m was used in the experiment.
The first microphone was positioned at (1.0, 1.5, 1.6) m, and the locations of the other
microphones are obtained by successively adding d = 0.2 m to the y-coordinate of
the first microphone. The initial position of the speaker was fixed at (2.0, 1.2, 1.6) m.
The wall reflection coefficients were 0.9 for all walls, the ceiling and the floor. The length of each
impulse response was L = 4400 samples, and the reverberation time was T60 = 0.55 s.
The additive noise was white zero-mean Gaussian. In all cases, α = 0.5,
δ = 0.05 and an eigenfilter length L_g = 1100 were used. The block length for the ZFE was 9L,
which means a block delay of 4.95 s.
Fig. 6.8 (a) depicts the original channel and (b) the estimated channel using the
Figure 6.9: Frequency spectrum of the equivalent channel from the speaker to the
output of the beamformer (a) without spectral constraint (b) with spectral constraint,
α = 0.5.
Table 6.6: Effect of the eigenfilter on the dereverberation performance of the proposed
scheme

Cases          SNR     LLR     segSNR   WSS     PESQ
Reverberated   20      0.791   3.58     59.63   2.010
Case 1         19.49   0.697   0.74     39.89   2.380
Case 2         22.13   0.511   0.24     40.75   2.454
Case 3         24.58   0.485   0.20     41.49   2.553
robust NMCFLMS algorithm at 20 dB SNR. The NPM between the original and
estimated channels is 7.8 dB. The direct inversion using the MINT method fails
to equalize the AIRs with such an estimate. Fig. 6.8 (c) shows the IDFT sequence of
the equalized channel at the output of the ZFE using the proposed method. We see
that the equalized channel is nearly impulse-like, and both the early and late reflections
are significantly attenuated. As a result, we can say that the dereverberation of the
dereverberation performance of our algorithm. The results reveal that the proposed
technique is slightly dependent on this parameter giving almost similar performance
when the block-length is varied.
Figure 6.10: Spectrogram of the (a) clean speech (b) noisy reverberated speech at 30
dB SNR (c) denoised speech (d) dereverberated using the proposed method.
Figure 6.11: Objective scores (LLR, segSNR and PESQ) of the proposed scheme as a
function of the ZFE block length (in multiples of L).
Table 6.7: Dereverberation performance of the proposed scheme using the estimated
channels at different input SNRs (TIMIT database)

NPM (dB)          SNR (dB)        LLR           segSNR          WSS            PESQ
(est. channel)   input  output   rev    derev  rev    derev    rev    derev   rev   derev
8.23              25     28.2    0.74   0.39   1.15   1.55     51.00  32.88   2.23  2.86
7.76              20     23.0    0.94   0.59   1.34   1.08     50.54  35.58   2.16  2.75
7.60              15     18.0    1.25   0.84   1.77   0.57     49.64  36.71   2.05  2.58
5.33              10     12.7    1.66   1.32   2.63  -0.04     49.06  38.79   1.84  2.32
Table 6.8: Quality of the dereverberated speech in terms of LLR for the proposed
and other state-of-the-art techniques (TIMIT database)

Speech      rev     ∞-norm   ISS     MLP     proposed
Female 1    1.225   1.017    0.989   0.993   0.751
Female 2    1.265   1.074    1.156   0.930   0.915
Female 3    1.039   0.875    1.055   0.709   0.746
Female 4    1.064   0.876    0.935   0.831   0.749
Male 1      0.869   0.690    0.609   0.655   0.545
Male 2      1.118   0.861    1.001   0.918   0.698
Male 3      1.030   0.750    0.804   0.766   0.638
Male 4      1.290   1.057    0.959   0.974   0.886
Table 6.9: Quality of the dereverberated speech in terms of segSNR for the proposed
and other state-of-the-art techniques (TIMIT database)

Speech      rev     ∞-norm   ISS     MLP     proposed
Female 1    3.44    4.93     1.31    2.03    2.41
Female 2    4.51    4.97     1.29    3.69    0.55
Female 3    3.53    3.72     2.64    2.57    0.39
Female 4    3.69    3.69     2.02    4.33    0.45
Male 1      5.89    4.42     2.50    2.76    1.40
Male 2      4.92    5.14     2.33    3.62    1.27
Male 3      5.18    5.34     4.29    2.29    0.49
Male 4      5.30    6.36     2.39    3.96    0.69
Table 6.10: Quality of the dereverberated speech in terms of WSS for the proposed
and other state-of-the-art techniques (TIMIT database)

Speech      rev     ∞-norm   ISS     MLP     proposed
Female 1    68.44   60.34    61.22   64.15   40.99
Female 2    70.58   61.22    80.49   66.85   45.77
Female 3    65.78   58.67    54.71   56.36   42.15
Female 4    61.64   55.36    47.33   62.27   36.82
Male 1      48.56   48.65    45.16   29.85   33.47
Male 2      59.24   52.46    79.26   45.50   39.45
Male 3      51.96   49.07    53.41   42.82   34.47
Male 4      55.62   52.77    72.00   39.37   38.55
Table 6.11: Quality of the dereverberated speech in terms of PESQ for the proposed
and other state-of-the-art techniques (TIMIT database)

Speech      rev     ∞-norm   ISS     MLP     proposed
Female 1    1.883   1.852    2.117   1.756   2.634
Female 2    1.823   2.181    2.051   1.923   2.600
Female 3    1.893   1.964    2.095   1.935   2.624
Female 4    2.061   2.166    2.333   1.978   2.750
Male 1      2.329   2.327    2.346   2.246   2.923
Male 2      1.542   2.169    1.594   2.024   2.572
Male 3      1.960   2.240    2.030   1.962   2.741
Male 4      2.063   2.315    1.958   2.207   2.659
Now, we compare the performance of the proposed method with other state-of-the-art
dereverberation techniques (the ∞-norm, ISS and MLP methods) in Tables 6.8 to
6.11. For the ∞-norm method, the length of the shortening filter was taken as 1100, the
step size in the update equation was set to 0.00001, and the iteration was continued
until the cost function reached a steady-state minimum value. The ISS and MLP
methods were implemented with the same parameters as in the respective papers.
An SNR of 20 dB was considered for evaluating all the techniques. The results show that
the proposed technique performs better than the compared methods in terms of LLR,
segSNR, WSS and PESQ. The average improvement in the LLR is 0.372 point, which is
0.160, 0.196 and 0.106 points better than the ∞-norm, ISS and MLP methods, respectively.
The average improvement in the segSNR is 4.76 dB, which is 5.02, 2.55 and 3.36 dB better
than the ∞-norm, ISS and MLP methods, respectively. The average improvement in
the WSS score is 21.26, which is 15.85, 22.73 and 11.93 points better than the ∞-norm, ISS
and MLP methods, respectively. The average improvement in PESQ is 0.743 point,
which is 0.536, 0.622 and 0.684 points better than the ∞-norm, ISS and MLP methods,
Figure 6.12: Impulse responses of real reverberant acoustic channels. The length of
each impulse response is L = 4400.
respectively.
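The reported average PESQ improvement can be cross-checked against the per-utterance rev and proposed columns of Table 6.11:

```python
rev      = [1.883, 1.823, 1.893, 2.061, 2.329, 1.542, 1.960, 2.063]
proposed = [2.634, 2.600, 2.624, 2.750, 2.923, 2.572, 2.741, 2.659]
avg_impr = sum(p - r for p, r in zip(proposed, rev)) / len(rev)
print(avg_impr)   # approximately 0.74, consistent with the 0.743 quoted in the text
```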
6.2.5
Table 6.12: Quality of the dereverberated speech for the real acoustic channels in terms
of LLR (TIMIT database)

Speech      rev     ∞-norm   ISS     MLP     proposed
Female 1    0.954   0.910    0.790   0.978   0.748
Female 2    1.080   1.077    1.179   0.923   0.815
Female 3    0.831   0.882    0.870   0.690   0.638
Female 4    0.849   0.863    0.822   0.849   0.666
Male 1      0.692   0.783    0.786   0.643   0.512
Male 2      0.954   0.819    1.130   0.890   0.749
Male 3      0.778   0.735    1.130   0.770   0.665
Male 4      1.049   1.030    1.000   0.988   0.789
Table 6.13: Quality of the dereverberated speech for the real acoustic channels in terms
of segSNR (TIMIT database)

Speech      rev     ∞-norm   ISS     MLP     proposed
Female 1    2.24    2.49     1.82    5.03    1.75
Female 2    3.29    3.36     1.32    6.50    0.39
Female 3    2.79    1.98     2.36    5.55    1.83
Female 4    2.69    2.06     4.17    6.07    2.40
Male 1      4.62    4.97     2.50    5.07    1.96
Male 2      5.24    4.77     2.66    6.71    0.50
Male 3      5.08    2.94     3.09    5.72    0.14
Male 4      4.84    5.32     1.75    6.43    0.02
Table 6.14: Quality of the dereverberated speech for the real acoustic channels in terms
of WSS (TIMIT database)

Speech      rev     ∞-norm   ISS     MLP     proposed
Female 1    43.19   41.75    51.01   55.00   33.46
Female 2    48.08   50.48    61.53   66.00   43.43
Female 3    42.96   45.32    56.78   61.66   32.34
Female 4    42.23   42.27    44.73   58.19   32.58
Male 1      29.35   33.38    30.63   30.69   28.47
Male 2      43.79   42.32    49.07   44.72   40.00
Male 3      34.06   36.75    45.86   41.62   31.71
Male 4      39.91   42.12    37.39   39.50   35.66
Table 6.15: Quality of the dereverberated speech for the real acoustic channels in terms
of PESQ (TIMIT database)

Speech      rev     ∞-norm   ISS     MLP     proposed
Female 1    2.534   2.691    2.301   1.840   2.739
Female 2    2.469   2.573    2.340   1.939   2.681
Female 3    2.576   2.669    2.344   1.937   2.818
Female 4    2.490   2.700    2.573   2.042   2.792
Male 1      2.650   2.758    2.755   2.301   2.896
Male 2      2.412   2.504    2.091   2.030   2.721
Male 3      2.510   2.671    2.390   2.020   2.711
Male 4      2.580   2.678    2.566   2.237   2.741
The average improvement in the LLR is 0.200 point, which is 0.190, 0.266 and 0.144 points better than
the ∞-norm, ISS and MLP methods, respectively. The average improvement in the
segSNR is 4.81 dB, which is 4.45, 3.42 and 6.84 dB better than the ∞-norm, ISS and MLP
methods, respectively. The average improvement in the WSS score is 5.74, which is
7.09, 12.41 and 14.96 points better than the ∞-norm, ISS and MLP methods, respectively.
The average improvement in the PESQ is 0.235 point, which is 0.107, 0.342 and 0.719
points better than the ∞-norm, ISS and MLP methods, respectively. The average
improvement in SNR for these utterances using the real acoustic channels was 2.43 dB.
The inferior performance of the compared methods can be explained as follows.
The ISS assumes that the source signal is white, which does not hold for speech
input. Therefore, the received signal is prewhitened before calculating the coefficients
of the inverse filter. The technique proposed in [28] for estimating the whitening
filter is based on the magnitude spectrum of the autoregressive (AR) system of the
speech signal. Since the phase spectrum of the AR system function is ignored, the
prewhitening becomes erroneous, which causes improper inversion of the
AIRs. Moreover, the presence of noise, which is not considered in the ISS, further
deteriorates its performance. The MLP method [22] estimates the AR parameters
of the speech from the characteristic polynomials of the prediction matrix calculated
using the correlation between the current samples and a one-sample-delayed version
of the multichannel received signals. The prediction matrix was estimated from a
2 s speech segment. However, the AR parameters cannot be assumed stationary for
such a long duration. As a result, the estimated AR parameters are an average of
the actual variables, which deteriorates the perceptual quality of the dereverberated
speech. Moreover, in our implementation of the MLP method, the characteristic
polynomials of the prediction matrix tend to diverge from the actual values when the
AIRs exceed a few hundred taps. For comparison purposes, we have used known AR
parameters for simulating the MLP method. The method further assumes that at
least one microphone is closer to the speaker than to the noise source in order to obtain
the source LP residual from the noisy received signal. However, this assumption does
not hold for incoherent noise, and the dereverberated output was found to be severely
noise corrupted.
6.2.6
Time-varying condition
In a realistic environment, the acoustic channels are time-varying. A slight movement
of the speaker's head, which is very natural during conversation, causes the AIRs to
change. An adaptive channel estimation algorithm can track the time-varying
channels. Therefore, the proposed dereverberation technique is suitable for changing
acoustic conditions. In order to simulate the time-varying condition, the length of each
impulse response was taken to be L = 2400 samples, corresponding to a reverberation
time T60 = 0.3 s. Fig. 6.13 shows the NPM of the channel estimation in which the source
position was shifted six times from the original position. In the first four cases, the
speaker moved to the left by 1 cm at each step, and in the last two cases, the speaker
moved to the right by 1 cm at each step. The notches in the NPM curve show the
instants when the speaker moved. We find that the algorithm maintains a good NPM
level despite frequent changes in the AIRs and quickly converges to the previous level
after the movement of the speaker. In order to visualize how fast the algorithm can
track the time-varying AIRs, the speaker was moved at a faster pace in subsequent
experiments. The convergence profile of the adaptive channel estimation algorithm is
shown in Fig. 6.13 (a) to (d). Here, we see that the algorithm requires around 20 blocks
of data to converge to the previous NPM level after the speaker has moved. Since each
block of data requires L new speech samples, the algorithm requires around
6 seconds (for an 8000 Hz sampling frequency) to converge in the time-varying condition.
In other words, if the AIRs remain the same for around 20 blocks of data, we can obtain
an estimate of the AIRs from the noisy received signal. Since the estimated channels
show a good NPM value, the dereverberation performance using these estimates would
Figure 6.13: Convergence profile of the robust NMCFLMS algorithm for time-varying
channels.
be reasonable.
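The quoted re-convergence time follows directly from the block size and sampling rate:

```python
fs = 8000        # sampling frequency (Hz)
L = 2400         # channel length in the time-varying experiment (samples)
blocks = 20      # blocks of new data needed to regain the previous NPM level
print(blocks * L / fs)   # 6.0 seconds of fresh speech per re-convergence
```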
6.3
Conclusion
In this chapter, we have presented two speech dereverberation techniques suitable for
a noisy environment with slowly varying long acoustic channels. In the first approach,
a channel shortening algorithm was utilized to suppress the energy in the late
reflections. A step-size optimized iterative shortening algorithm was proposed that
maintains a trade-off between shortening performance and spectral distortion of the
dereverberated speech. In the second dereverberation approach, a block-adaptive
zero-forcing equalizer along with eigenfilter-based signal power enhancement was
proposed that eliminates both early and late reverberation. The technique was found
effective for practical room impulse responses and time-varying acoustic environments.
Chapter 7
Conclusions and Further Research
The speaker enjoys a natural way of communication using hands-free systems. This is
because the microphones are placed a certain distance away from the speaker, and there is
no need to wear a headset or hold a microphone during conversation. However,
this freedom of movement usually comes at the expense of increased background noise
and reverberation recorded by the distant microphones. These contaminations may
lead to a total loss of intelligibility of the speech signal. Since the early days of acoustic
signal processing, researchers have developed numerous algorithms to counteract the
detrimental effects of reverberation and background noise, but their performance is
limited and only a few of these are useful in practice. In this dissertation, several
multi-microphone speech dereverberation techniques have been developed using robust
acoustic channel estimation and equalization in order to improve the performance
of hands-free systems. This chapter summarizes the obtained results, highlights the
contributions and provides a guideline for future research work.
7.1
Conclusions
Although various
alternatives are available in the literature, the blind channel identification and
for a wide range of SNRs and channel lengths. Acoustic channels were obtained from
the image model developed by Allen et al. as well as from experimental data stored
in the MARDY database [57]. The noise considered was computer-generated additive white
Gaussian noise. The channel estimation accuracy was measured by the normalized
projection misalignment (NPM) index. The perceptual quality of the dereverberated
speech was measured using a variety of objective measures, such as the LLR, average
segmental SNR, WSS and PESQ measures. Both female and male utterances, randomly
taken from the TIMIT database, have been used to evaluate and compare the performance
of the proposed techniques.
The channel estimation results of the proposed robust MCLMS algorithms show
that noise robustness was achieved with almost no sacrifice in the speed of
convergence. However, the final NPM was dictated by the noise level in the received
signal. The spectrally constrained NMCFLMS algorithm achieved a steady-state NPM
of 8 dB when a 5-channel acoustic system with 4400-coefficient-long impulse responses
was estimated with speech input at 25 dB SNR. The NPM decreased to 5 dB when the
SNR was 10 dB. The algorithm can also track the variation in the AIRs when the speaker
moves slowly from his/her original position. For real reverberant channels, the obtained
final NPM was 7 dB at 30 dB SNR. The proposed dereverberation technique can provide
around 3 dB SNR improvement in the 10 to 25 dB range. The quality of the dereverberated
speech was significantly improved as compared to the state-of-the-art techniques.
7.2
Future Research
In this section, we provide some suggestions and guidelines for future research work.
The effectiveness of the proposed noise-robust MCLMS algorithm is studied
and verified considering white Gaussian noise. A practical acoustic environment, however,
includes colored background noise, which violates certain assumptions of the developed
algorithms. The simulation environment becomes more realistic when colored noise is
considered. Thus there is room for improvement of the blind channel identification
Appendix A
Iterative solution for finding the
eigenvector corresponding to the
largest eigenvalue
The iterative update equation for finding the eigenvector of \hat{A} corresponding to the
largest eigenvalue can be formulated as

g(l+1) = \mu\,\hat{A}\,g(l)    (A.1)

\hat{A} can be diagonalized as

\hat{A} = U \Lambda U^{T}    (A.2)

where U is the unitary matrix whose columns are the eigenvectors of \hat{A} and \Lambda is a
diagonal matrix whose diagonal elements \lambda_k, 1 \le k \le L_g, are the eigenvalues of \hat{A}.
Substituting (A.2) into (A.1) and premultiplying by U^{T}, we obtain

g^{o}(l+1) = \mu \Lambda\, g^{o}(l)    (A.3)

where g^{o}(l) = U^{T} g(l). The set of L_g first-order difference equations in (A.1) is now
decoupled. Therefore, the solution of the kth equation can be obtained as [49]

g^{o}_{k}(l) = C_k\, (\mu\lambda_k)^{l}\, u(l)    (A.4)
where gko (l) is the component of go (l), Ck is an arbitrary constant that depends on the
initial value of g(l), and u(l) is the unit step function. Now g(l) can be obtained as,
g(l) = Ugo (l)
g1o (l)
h
=
u 1 . . . u k . . . u Lg
..
.
i
o
gk (l)
..
.
gLo g (l)
(A.5)
b
where, uk is the eigenvector corresponding to the eigenvalue k . Since = 1/T r{A},
we have k < 1 for all k. As a result, gko (l) decays with each iteration, where the rate
of decay is dependent on the value of k . The larger the value of k , the smaller the
rate of decay. Therefore, after a large number of iterations, the final value of gko (l) can
be expressed as
gko (N )|N large = k , when k 6= max
= max , when k = max
(A.6)
where represents a small number and max k . Substituting (A.6) into (A.5), the
final estimate of the channel can be approximated as
g(N )|N large max umax
where umax is the eigenvector corresponding to the largest eigenvalue max .
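The iteration above is, in essence, the power method. As a minimal numerical sketch (using NumPy, with a hypothetical test matrix), the snippet below applies μ Â repeatedly as in (A.1); the iterate is renormalized at each step purely for numerical stability, since with μ = 1/Tr{Â} every component would otherwise decay toward zero as (A.4) shows, while still aligning with u_max:

```python
import numpy as np

def dominant_eigenvector(A, num_iters=200):
    """Power iteration: repeatedly apply mu*A, as in (A.1).

    The iterate is renormalized each step for numerical stability;
    without renormalization the vector decays in magnitude but its
    direction still converges to u_max, per (A.6)-(A.7).
    """
    mu = 1.0 / np.trace(A)       # step size mu = 1/Tr{A}
    g = np.ones(A.shape[0])      # arbitrary nonzero initial vector
    for _ in range(num_iters):
        g = mu * (A @ g)         # one iteration of (A.1)
        g /= np.linalg.norm(g)   # keep unit norm
    return g

# Symmetric test matrix; compare against a direct eigendecomposition
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
g = dominant_eigenvector(A)
w, V = np.linalg.eigh(A)
u_max = V[:, np.argmax(w)]
print(abs(g @ u_max))  # close to 1: g is aligned with u_max
```

Convergence requires that the initial vector have a nonzero component along u_max and that the largest eigenvalue be strictly separated from the rest; the rate is governed by the ratio of the two largest eigenvalues.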
List of Publications
Journal
1. M. A. Haque and M. K. Hasan, "Noise robust multichannel frequency-domain
LMS-type algorithms for blind channel identification," IEEE Signal Processing
Letters, vol. 15, pp. 305-308, 2008.
2. M. A. Haque and M. K. Hasan, "Robust multichannel LMS-type algorithms
with fast decaying transient for blind identification of acoustic channels," IET
Signal Processing (formerly IEE Proceedings, UK), vol. 2, no. 4, pp. 431-441,
Dec. 2008.
3. M. A. Haque and M. K. Hasan, "Variable step-size multichannel frequency-domain
LMS algorithm for blind identification of finite impulse response systems," IET
Signal Processing (formerly IEE Proceedings, UK), vol. 1, no. 4, pp. 182-189, 2007.
4. M. A. Haque, M. S. A. Bashar, P. A. Naylor, K. Hirose and M. K. Hasan,
"Energy constrained frequency-domain normalized LMS algorithm for blind
channel identification," Signal, Image and Video Processing (SIViP), Springer
(UK), pp. 203-213, 2007.
International Conferences
1. M. A. Haque and M. K. Hasan, "Performance comparison of the frequency-domain
multichannel normalized and variable step-size LMS algorithms," in Proc.
Bibliography
[1] F. A. Everest, The Master Handbook of Acoustics, 4th ed., McGraw-Hill, 2001.
[2] T. Houtgast and H. J. M. Steeneken, "A review of the MTF concept in room
acoustics and its use for estimating speech intelligibility in auditoria," Journal of
the Acoustical Society of America, vol. 77, no. 3, pp. 1069-1077, Mar. 1985.
[3] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localisation,
The MIT Press, 1983.
[4] I. J. Tashev, Sound Capture and Processing: Practical Approaches, John Wiley
& Sons, West Sussex, U.K., 2009.
[5] J. H. L. Hansen and M. A. Clements, "Constrained iterative speech enhancement
with application to speech recognition," IEEE Trans. Signal Process., vol. 39, no.
4, pp. 795-805, Apr. 1991.
[6] M. Omologo, P. Svaizer and M. Matassoni, "Environmental conditions and acoustic
transduction in hands-free speech recognition," Speech Communication, vol. 25, no.
3, pp. 75-95, Aug. 1998.
[7] L. E. Ryall, "Improvements in electric signal amplifiers incorporating voice-operated
devices," G.B. Patent No. 509613, 1939.
[8] E. A. P. Habets, Single- and Multi-Microphone Speech Dereverberation using
Spectral Enhancement, Ph.D. dissertation, Technische Universiteit Eindhoven,
2007.
[9] S. Gannot, D. Burshtein and E. Weinstein, "Signal enhancement using beamforming
and nonstationarity with applications to speech," IEEE Trans. Signal Process., vol.
49, no. 8, pp. 1614-1626, 2001.
[10] J. Bitzer, K. Simmer and K. D. Kammeyer, "Theoretical noise reduction limits of
the generalized sidelobe canceller for speech enhancement," in Proc. IEEE
International Conference on Acoustics, Speech, and Signal Processing (ICASSP),
Mar. 1999, vol. 5, pp. 2965-2968.