Correspondence
Suppression of Acoustic Noise in Speech Using Two Microphone Adaptive Noise Cancellation

STEVEN F. BOLL AND DENNIS C. PULSIPHER

Manuscript received September 5, 1979; revised February 8, 1980. This work was supported by the Information Processing Techniques Branch of the Defense Advanced Research Projects Agency, monitored by the Naval Research Laboratory under Contract N00173-79-C-0045.
S. F. Boll is with the Department of Computer Science, University of Utah, Salt Lake City, UT 84112.
D. C. Pulsipher is with Sandia Laboratories, Livermore, CA 94550.

Abstract-Acoustic noise with energy greater than or equal to that of the speech can be suppressed by adaptively filtering a separately recorded correlated version of the noise signal and subtracting it from the speech waveform. It is shown that for this application of adaptive noise cancellation, large filter lengths are required to account for a highly reverberant recording environment and that there is a direct relation between filter misadjustment and induced echo in the output speech. The second reference noise signal is adaptively filtered using the least mean squares, LMS, and the lattice gradient algorithms. These two approaches are compared in terms of degree of noise power reduction, algorithm convergence time, and degree of speech enhancement. Both methods were shown to reduce ambient noise power by at least 20 dB with minimal speech distortion and thus to be potentially powerful as noise suppression preprocessors for voice communication in severe noise environments.

I. INTRODUCTION

It has been shown that there is a significant reduction in measured speech intelligibility and quality due to the ambient background noise generated in many operating environments [1]. A number of single microphone approaches for reducing the background noise added to speech have been developed [2]. However, these methods become ineffective when the noise power is equal to or greater than the signal power or when the noise spectral characteristics change rapidly in time. This correspondence describes an alternative approach to noise suppression in which a second correlated noise source is adaptively filtered to minimize the output power between the two microphone signals. Two adaptive algorithm implementations were investigated: the LMS approach [3] and the lattice gradient approach [4]. Each approach was compared in terms of degree of noise power reduction, algorithm settling time, and degree of speech enhancement.

II. IMPLEMENTATION CONSIDERATIONS

The estimated adaptive filter in the absence of uncorrelated noise represents a transfer function equal to the product of the transfer function from the noise source to the speaker and the inverse of the transfer function from the noise source to the reference microphone. Based on simulation studies [5], approximating this inverse transfer function adequately requires using an all-zero filter having 1500 tap weights. Such a large filter in turn increases misadjustment (the ratio of excess to minimum mean-square error, [3]). As is discussed below, the amount of misadjustment is an important design factor in noise suppression since large misadjustment manifests itself as a pronounced echo in the speech waveform. Echo can be present in the output speech since the output is continually fed back when estimating the tap weights. The echo is removed by reducing the adaptation step size, and thus the misadjustment. This reduction, of course, conflicts with the requirement of quick settling time. The tradeoff between misadjustment and settling time is discussed below.

Another issue is filter causality. In general, a noncausal filter is required if the noise reaches the speaker before reaching the reference microphone. Noncausal adaptive filters are easily generated by placing a delay into the primary channel. However, more tap weights are then required with the accompanying misadjustment problems described above. For the experiments described here, the reference microphone was placed next to the noise source, eliminating the need for delay.

III. EXPERIMENTATION AND RESULTS

An analog white noise generator was played out through a loudspeaker into a hard-walled room. The reference signal microphone was placed next to the loudspeaker, while the primary microphone was placed 12 ft away next to the control terminal. The author (D.P.) spoke into the primary microphone while controlling the stereo recording program. The noise power was amplified to such a level that the recorded speech was completely masked. The signals were filtered at 3.2 kHz, sampled at 6.67 kHz, and quantized to 15 bits. Recordings were made with and without speech present, each lasting 23.4 s.

Each algorithm's performance is measured in terms of the degree of steady-state noise power reduction during nonspeech activity, the time it takes to reach this steady-state value (algorithm settling time), and the amount of echo induced when speech is present. Three experiments were conducted to measure algorithm settling time and induced echo as a function of specified misadjustment. Step sizes were used corresponding to misadjustments of 1, 5, and 10 percent. The results showed that both algorithms converge to a steady-state noise power reduction of -20 dB in approximately 15 s for 10 percent misadjustment and 21 s for 5 percent misadjustment. At 1 percent misadjustment the step size for the LMS algorithm was so small that the noise power was reduced by only -10 dB before the data ran out. For the lattice algorithm, at 1 percent misadjustment, essentially no convergence was measured. In listening to the output during speech activity, it was judged that at 10 percent misadjustment an unacceptable amount of echo was present and that at 5 percent misadjustment the echo was just noticeable.

To illustrate this noise suppression capability, isometric plots of the short-time magnitude spectra with and without noise suppression are shown in Figs. 1 and 2. The plot construction is described in [2]. Fig. 1 corresponds to the short-time spectrum of the unprocessed speech signal: "the pipe began to." Fig. 2 corresponds to the processed speech signal using the 5 percent misadjustment after the filter has converged. Since the noise was acoustically added, no underlying clean speech spectrum was available for comparison. However, it was judged that the intelligibility of the processed speech had clearly improved. This was based