
DESIGN AND IMPLEMENTATION

OF A NONLINEAR ACOUSTIC ECHO CANCELLER


Abdellatif Ben Rabaa, Abdellatif Mtibaa, Mohamed Abid and Rached Tourki
Electronic and Micro-Electronic Laboratory, Sciences Faculty of Monastir
Route de Kairouan, 5000 - Monastir, TUNISIA.
Tel : 216 3 460 919, Fax : 216 3 462 873.
Abstract - In this paper, an Acoustic Echo Canceller based on Neural Networks and
a Fast Affine Projection (FAP) algorithm is introduced. This structure allows a
large set of trade-offs between convergence rate, residual error, tracking capacity, and
arithmetic complexity. Hence, the proposed structure has the potential for solving
other difficult nonlinear adaptive signal processing tasks, such as system
identification where nonlinearity and nonstationarity are both important factors. To
investigate the feasibility of a practical implementation of the proposed structure, a
hardware/software implementation is discussed. The specific objective is to find an
implementation that satisfies all the system design constraints at least cost.

1. INTRODUCTION
Acoustic Echo Cancellers (AECs) are widely used to cancel echo signals and to
prevent acoustic feedback in teleconference systems. The AEC reduces the echo
signal by subtracting the estimated echo signal. This signal is produced by an
adaptive filter that models the room transfer function between a loudspeaker and a
microphone (Figure 1). The adaptive filter is trained by an adaptive algorithm. A
variety of techniques have been proposed that strike a balance between
performance and computational complexity [1].
Figure 1 : Acoustic Echo Cancellation (diagram labels: external inputs, echo, echo signal)


A major difficulty usually encountered in acoustic echo cancellation is the
implementation of long adaptive filters (the filter length L is often chosen between
250 and 4000) [2]. The fast versions of the Recursive Least Squares (RLS)
algorithm, which reduce the arithmetic complexity to the order of 8L
multiplications per iteration, are therefore ruled out by their complexity [5]. The
Normalized Least Mean Square (NLMS) algorithm allows a simple
implementation (2L multiplications per iteration) and robust performance [3].

0-7803-4997-0/98/$10.00
However, the NLMS convergence rate depends on the input signal statistics,
and may be very slow when the input signal is correlated [4], which is the case in
acoustic echo cancellation.
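To make the 2L figure concrete, the following is a minimal NLMS sketch in plain Python. It assumes a noise-free echo d(n) produced by a short FIR echo path; the function name `nlms_identify` and the parameters `mu` and `eps` are illustrative, not taken from the paper. The echo estimate and the coefficient update each cost L multiplications; for clarity the input energy is recomputed every sample, whereas a practical implementation would update it recursively.

```python
import random

def nlms_identify(x, d, L, mu=0.5, eps=1e-6):
    """NLMS identification of an FIR echo path of length L.

    x: loudspeaker (far-end) samples, d: microphone echo samples.
    Returns the final coefficients and the residual-echo sequence."""
    h = [0.0] * L      # adaptive filter coefficients
    buf = [0.0] * L    # tapped delay line: x(n), x(n-1), ..., x(n-L+1)
    residual = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]                           # shift in the new sample
        y_hat = sum(hi * bi for hi, bi in zip(h, buf))  # L mults: echo estimate
        e = dn - y_hat                                  # residual echo
        energy = sum(bi * bi for bi in buf) + eps       # ||x||^2 plus a small guard
        step = mu * e / energy
        h = [hi + step * bi for hi, bi in zip(h, buf)]  # L mults: coefficient update
        residual.append(e)
    return h, residual

# Usage: identify an assumed 4-tap echo path from white-noise excitation.
random.seed(0)
true_h = [0.5, -0.3, 0.2, 0.1]
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(true_h[k] * x[n - k] for k in range(4) if n >= k) for n in range(2000)]
h, residual = nlms_identify(x, d, L=4)
```

With a white input this converges quickly; with a strongly correlated input such as speech the same code converges much more slowly, which is the limitation discussed above.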
In order to improve the adaptive behavior of the NLMS algorithm, Ozeki and
Umeda proposed the Affine Projection (AP) algorithm in [5]. AP algorithms are
based on a multidimensional projection at each tap-weight update. Their
properties lie between those of the NLMS and RLS algorithms: they are less
complex than RLS but converge much faster than NLMS for an input signal
such as speech [3].
However, linear Acoustic Echo Cancellers are incapable of reducing nonlinear
distortions, which are generated mainly in the loudspeaker during large signal peaks
[6]. Nonlinear distortions in the loudspeaker sometimes severely degrade the quality of
sound reproduction. These distortions include nonlinearity in the suspension system
and inhomogeneity in the flux density [7]. Nonlinear techniques must be employed
to deal with such system nonlinearities.
Neural networks are well suited to nonlinear system identification by
virtue of their ability to learn from their environment and the distributed nonlinearity
built into their design. However, the use of neural networks remains contingent on the
availability of powerful hardware to provide adequate speed. Fortunately, the high
density of modern technologies lets us implement a large number of identical,
concurrently operating processors on one chip, thus exploiting the inherent
parallelism of neural networks. The regularity of neural networks and the small
number of well-defined arithmetic operations used by neural algorithms greatly
simplify the design and layout of VLSI circuits [8].
In this paper, an Acoustic Echo Canceller based on a neural network
cascaded with a Fast Affine Projection (FAP) algorithm is introduced. This
structure allows a large set of trade-offs between convergence rate, residual error,
tracking capacity, and arithmetic complexity. Hence, the proposed structure has the
potential for solving other difficult nonlinear adaptive signal processing tasks,
such as system identification where nonlinearity and nonstationarity are both
important factors.
To investigate the feasibility of a practical implementation of the proposed
structure, a mixed hardware/software implementation is discussed. We focus on the
design of the hardware module using VHDL. VHDL (VHSIC, i.e. Very High Speed
Integrated Circuit, Hardware Description Language) is the hardware description
language standardized as IEEE 1076. Simulations are performed with the V-System
simulator [9], and Galileo, a logic synthesis tool [10], is used to generate the gate-level
description of the hardware module. The use of VHDL and logic synthesis provides
flexibility in the implementation of the proposed AEC: modifying the design only
involves changing the text of the VHDL source. The specific objective is to
find an implementation that satisfies all the system design constraints at least cost.
This paper is organised as follows. Section 2 presents the proposed AEC
structure. Section 3 presents a mixed hardware/software implementation of the
proposed structure. Section 4 describes an extendible communication model
developed to allow efficient communication between the hardware and software
modules. The paper concludes with some final remarks in Section 5.

2. THE PROPOSED ACOUSTIC ECHO CANCELLER
The most common measure of AEC performance is the Echo Return Loss
Enhancement (ERLE): the ratio of the echo signal power to the residual echo power,
usually expressed in dB. To improve the ERLE compared to a linear NLMS Acoustic
Echo Canceller, Birkett and Goubran proposed in [11] an AEC based on a neural
network cascaded with a linear transversal filter. In their scheme, the loudspeaker is
modelled by a fully adaptive three-layer time-delay feedforward neural network, and
the linear transversal filter part is trained by the NLMS algorithm.
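As a minimal sketch (an illustration, not the authors' measurement code), ERLE in dB can be computed over a frame from the echo d(n) and the residual e(n) as 10 log10(mean d^2 / mean e^2). The `eps` guard against division by zero is an added assumption.

```python
import math

def erle_db(echo, residual, eps=1e-12):
    """ERLE in dB: mean echo power divided by mean residual-echo power."""
    p_echo = sum(v * v for v in echo) / len(echo)
    p_res = sum(v * v for v in residual) / len(residual)
    return 10.0 * math.log10((p_echo + eps) / (p_res + eps))

# Usage: a residual 10x smaller in amplitude (100x in power) gives about 20 dB.
echo = [1.0, -1.0] * 50
residual = [0.1, -0.1] * 50
value = erle_db(echo, residual)
```

In practice ERLE is usually tracked over short consecutive frames so that convergence can be plotted over time.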
A feedforward neural network offers a significant improvement in ERLE in the
low-to-medium volume range, where acoustic and digital-signal-processing-related
noise are significant. However, when feedforward structures are used at high
volumes, little or no improvement in converged ERLE is observed for filtered-noise
inputs; this is confirmed by both the Volterra models and three neural network
models [11]. It appears that the room/speakerphone transfer function may contain
poles when the loudspeaker is at high volume. This is most likely caused by
resonances in the plastics of the speakerphone and, to a lesser extent, by poles in the
room transfer function [12]. Modelling the room/speakerphone transfer function more
accurately might require recurrent neural networks. Unfortunately, training recurrent
neural networks to perform certain tasks is known to be difficult, and its
computational complexity grows as $O(N^4)$, where $N$ is the total number of
neurons in the network. When the learning task of interest is a difficult one, $N$ may
take a large value, making the computational requirement of the algorithm
unacceptable [13]. To resolve this drawback, we propose to model the
room/speakerphone transfer function with a feedforward neural network fitted with
tapped delay lines at its input and output and a feedback loop (Figure 2). This
identification scheme is founded on a standard technique, called the series-parallel
model in [14]. This formulation allows the use of the standard backpropagation
algorithm for training feedforward neural networks [15].
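The series-parallel idea can be sketched as follows (plain Python, illustrative only): the network input is assembled from a tapped delay line on the excitation u(n) and another on past *measured* outputs y(n), so the network's own predictions are never fed back during training and the mapping stays purely feedforward. The helper name and the tap counts `nu` and `ny` are hypothetical.

```python
def series_parallel_inputs(u, y, nu, ny):
    """Build (input vector, target) training pairs for a series-parallel model.

    Input at time n: u(n), ..., u(n-nu+1) followed by the measured outputs
    y(n-1), ..., y(n-ny). Target: y(n). Because measured outputs (not network
    outputs) are fed back, standard backpropagation applies unchanged."""
    start = max(nu - 1, ny)            # first n with a full set of taps
    pairs = []
    for n in range(start, len(u)):
        taps_u = [u[n - k] for k in range(nu)]
        taps_y = [y[n - k] for k in range(1, ny + 1)]
        pairs.append((taps_u + taps_y, y[n]))
    return pairs

# Usage with toy sequences.
u = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]
pairs = series_parallel_inputs(u, y, nu=2, ny=2)
```

After training, the same network can be run in a parallel configuration by substituting its own past outputs for the measured ones.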
A. The backpropagation algorithm
At time $n$, the output of neuron $i$ is given by a sigmoidal activation function,
the logistic function:

$$y_i(n) = \varphi(v_i(n)) = \frac{1}{1 + \exp(-v_i(n))}$$

where $v_i(n)$ is the internal activation of the $i$th neuron and $y_i(n)$ is the output of
the $i$th neuron.
The error for the neurons in the output layer is:

$$\delta_i(n) = \varphi'(v_i(n))\,\bigl(d_i(n) - y_i(n)\bigr)$$

where $\varphi'$ is the first derivative of the activation function and $d_i(n)$ is the reference
signal.
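The logistic function has the convenient property $\varphi'(v) = \varphi(v)\,(1 - \varphi(v))$, so the output-layer error can be formed from the neuron output $y_i(n)$ without evaluating another exponential. A minimal Python sketch of the two formulas above (function names are illustrative):

```python
import math

def logistic(v):
    """phi(v) = 1 / (1 + exp(-v)).
    Note: math.exp overflows for v below about -709; a production version would clamp v."""
    return 1.0 / (1.0 + math.exp(-v))

def logistic_prime(v):
    """phi'(v) = phi(v) * (1 - phi(v)), reusing the neuron output."""
    y = logistic(v)
    return y * (1.0 - y)

def output_delta(v, d):
    """Output-layer error: phi'(v) * (d - y)."""
    return logistic_prime(v) * (d - logistic(v))
```

For example, at $v = 0$ the output is 0.5 and the derivative is 0.25, so with a reference of 1.0 the error is 0.125.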

The errors found for the output neurons are then propagated back to the
previous layer so that errors can be assigned to the neurons in the hidden layers.
This calculation is governed by:

$$\delta_i(n) = \varphi'(v_i(n)) \sum_j \delta_j(n)\, w_{i,j}(n)$$

where $\{i\}$ is the set of all neurons in a non-input layer and $\{j\}$ is the set of all
neurons in the next layer, so that $w_{i,j}$ is the weight of the connection from neuron $i$
to neuron $j$. This calculation is repeated for every hidden layer in the network. Once
every non-input neuron has an activation and an error associated with it, the
network's weights can be updated.
The amount by which each weight is changed is:

$$\Delta w_{i,j}(n) = \eta\, \delta_j(n)\, y_i(n)$$

where $\eta$ is the learning rate, a constant controlling the size of the weight changes.
The weight is then updated as:

$$w_{i,j}(n+1) = w_{i,j}(n) + \Delta w_{i,j}(n)$$
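Putting the output-error, hidden-error and weight-update rules together, one training step for a network with a single hidden layer might look like the Python sketch below. It takes $w_{i,j}$ as the weight from neuron $i$ to neuron $j$ (an assumption consistent with the hidden-error sum running over the next layer), uses the logistic function in every layer, and omits bias terms; the layer sizes, `eta`, the initial weights and the function name are illustrative assumptions.

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def backprop_step(x, d, w_ih, w_ho, eta=0.5):
    """One forward/backward pass; updates w_ih and w_ho in place.

    w_ih[i][j]: weight from input i to hidden neuron j.
    w_ho[j][k]: weight from hidden neuron j to output neuron k.
    Returns the network outputs computed before the update."""
    n_in, n_hid, n_out = len(x), len(w_ho), len(w_ho[0])
    # Forward pass: potentials v and outputs y = phi(v).
    y_h = [logistic(sum(x[i] * w_ih[i][j] for i in range(n_in))) for j in range(n_hid)]
    y_o = [logistic(sum(y_h[j] * w_ho[j][k] for j in range(n_hid))) for k in range(n_out)]
    # Output errors: delta_k = phi'(v_k) (d_k - y_k), with phi' = y (1 - y).
    delta_o = [y_o[k] * (1.0 - y_o[k]) * (d[k] - y_o[k]) for k in range(n_out)]
    # Hidden errors: delta_j = phi'(v_j) * sum_k delta_k w_jk (weights before update).
    delta_h = [y_h[j] * (1.0 - y_h[j]) * sum(delta_o[k] * w_ho[j][k] for k in range(n_out))
               for j in range(n_hid)]
    # Weight updates: w_ij += eta * delta_j * y_i (for the first layer y_i = x_i).
    for j in range(n_hid):
        for k in range(n_out):
            w_ho[j][k] += eta * delta_o[k] * y_h[j]
    for i in range(n_in):
        for j in range(n_hid):
            w_ih[i][j] += eta * delta_h[j] * x[i]
    return y_o

# Usage: drive a 3-2-1 network toward a target of 0.9 for one input pattern.
x = [1.0, 0.5, 1.0]                      # last entry acts as a constant input
w_ih = [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.2]]
w_ho = [[0.2], [-0.1]]
first = backprop_step(x, [0.9], w_ih, w_ho)[0]
for _ in range(2000):
    last = backprop_step(x, [0.9], w_ih, w_ho)[0]
```

The hidden errors are deliberately computed before the output-layer weights change, so every error uses the same set of weights as the forward pass.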

Figure 2 : The proposed Acoustic Echo Canceller structure (diagram labels: external inputs, neural network and its update, Fast Affine Projection algorithm (FAP), echo signal)


B. The Fast Affine Projection algorithm
The Affine Projection algorithm consists of four parts: decorrelation of the
input data sequence, calculation of the decorrelation filter coefficients, convolution
of the input signal with the filter vector $H(n)$, and adjustment of the filter
vector $H(n)$ using the decorrelated input signal, according to:

$$H(n+1) = H(n) + \mu\, X(n)\, g(n)$$

where $X(n)$ is an $L \times p$ matrix defined as:

$$X(n) = \left[x_L(n),\ x_L(n-1),\ \ldots,\ x_L(n-p+1)\right]$$

with

$$x_L(n) = \left[x(n),\ x(n-1),\ \ldots,\ x(n-L+1)\right]^T$$

$\mu$, $p$ and $L$ denote respectively the step size, the projection order and the length of
the filter $H(n)$.
The decorrelation filter vector $g(n)$ is calculated as:

$$g(n) = \left(X(n)^T X(n) + \delta I\right)^{-1} e(n)$$

$$e(n) = \left[y(n),\ y(n-1),\ \ldots,\ y(n-p+1)\right]^T - X(n)^T H(n)$$

$\delta$ denotes a small positive number for initializing and regularizing the covariance
matrix $X(n)^T X(n)$ [5].
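A direct (non-fast) implementation of these AP equations is sketched below in plain Python, with a small Gaussian-elimination solver standing in for the matrix inversion. It is an illustration under assumed parameters (`mu`, `delta`, the first usable sample index), not the simplified FAP of List 1; its per-sample cost is dominated by forming $X(n)^T X(n)$ and solving the $p \times p$ system, which is precisely what the fast versions avoid.

```python
import random

def solve(A, b):
    """Solve A z = b (small dense system) by Gaussian elimination with pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def ap_identify(x, y, L, p, mu=0.5, delta=1e-4):
    """Direct affine projection: H(n+1) = H(n) + mu X(n) g(n)."""
    H = [0.0] * L
    for n in range(L + p - 2, len(x)):
        # X[j] is column j of X(n): x_L(n-j) = [x(n-j), ..., x(n-j-L+1)].
        X = [[x[n - j - i] for i in range(L)] for j in range(p)]
        # e(n) = [y(n), ..., y(n-p+1)]^T - X(n)^T H(n)
        e = [y[n - j] - sum(X[j][i] * H[i] for i in range(L)) for j in range(p)]
        # R = X(n)^T X(n) + delta I  (p x p)
        R = [[sum(X[a][i] * X[b][i] for i in range(L)) + (delta if a == b else 0.0)
              for b in range(p)] for a in range(p)]
        g = solve(R, e)                          # g(n) = R^{-1} e(n)
        for i in range(L):
            H[i] += mu * sum(X[j][i] * g[j] for j in range(p))
    return H

# Usage: identify an assumed 6-tap path from a correlated AR(1) input.
random.seed(1)
true_H = [0.5, -0.3, 0.2, 0.1, 0.05, -0.02]
x, s = [], 0.0
for _ in range(3000):
    s = 0.8 * s + random.gauss(0.0, 1.0)
    x.append(s)
y = [sum(true_H[k] * x[n - k] for k in range(6) if n >= k) for n in range(3000)]
H = ap_identify(x, y, L=6, p=4)
```

With `p=1` the update reduces to a regularized NLMS step, since $g(n)$ becomes the scalar $e(n)/(x_L(n)^T x_L(n) + \delta)$.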
Fast versions of the AP algorithm have been derived [17] [18], resulting in a
multiplicative complexity of about $2L + p^2$ for the most efficient versions [19]. The
reduction in computational complexity of the AP algorithm is achieved by the
recursive update of the decorrelation filter vector $g(k)$ and by the use of a
filter $z(n)$, an approximation of $H(n)$ [5] [17]. In this paper, we propose the use of a
simplified FAP algorithm introduced in [18] and [19]. This algorithm is
described in List 1.

List 1: Fast Affine Projection Algorithm [18], [19]

$$z(k-1) = z(k) + x_L(k-p-1)\, S_p(k)$$

Subscripts of vectors represent the number of elements.


The Affine Projection algorithms implicitly require the inversion of a covariance
matrix of dimension $p$ (the projection order, with $p \ll L$ in acoustic echo
cancellation) [3]. The $p \times p$ correlation matrix $R$ is approximated as a Toeplitz matrix,
whose inversion is dominated by $O(p^2)$ operations [4]. Filter adjustment and
convolution require $2L$ operations.
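The $O(p^2)$ cost of a Toeplitz solve can be illustrated with the classical Levinson recursion for a symmetric Toeplitz system $R\,g = e$. This is a generic sketch, not necessarily the solver used in [18] or [19]; it assumes $R$ is symmetric positive definite and is described by its first row `t`. Step $k$ of the loop costs $O(k)$, so the whole solve costs $O(p^2)$ instead of the $O(p^3)$ of general elimination.

```python
def levinson_solve(t, b):
    """Solve T x = b for symmetric positive-definite Toeplitz T, T[i][j] = t[|i-j|]."""
    n = len(b)
    r = [ti / t[0] for ti in t[1:n]]      # normalize so the diagonal is 1
    bn = [bi / t[0] for bi in b]
    x = [bn[0]]
    if n == 1:
        return x
    y = [-r[0]]                           # auxiliary (Yule-Walker) solution
    alpha, beta = -r[0], 1.0
    for k in range(1, n):
        beta *= 1.0 - alpha * alpha
        mu = (bn[k] - sum(r[i] * x[k - 1 - i] for i in range(k))) / beta
        x = [x[i] + mu * y[k - 1 - i] for i in range(k)] + [mu]
        if k < n - 1:
            alpha = (-r[k] - sum(r[i] * y[k - 1 - i] for i in range(k))) / beta
            y = [y[i] + alpha * y[k - 1 - i] for i in range(k)] + [alpha]
    return x

# Usage: T = [[4, 1, 0.5], [1, 4, 1], [0.5, 1, 4]] and b = T @ [1, 2, 3].
sol = levinson_solve([4.0, 1.0, 0.5], [7.5, 12.0, 14.5])
```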

3. HARDWARE/SOFTWARE IMPLEMENTATION
By using embedded processors and implementing tasks as programs, a purely
software implementation requires a shorter design time and a lower cost than
other implementations. This solution also eases maintenance.
Depending on the specification and on the system operating environment, a
software solution can range from a simple microprocessor to a multiprocessor
configuration. Unfortunately, a software implementation does not always satisfy all
requirements, in particular timing performance. Hardware implementations
produce faster solutions, which are generally more expensive and require a longer
design time, but they are more efficient with respect to computation speed, circuit
area and power consumption. They allow an appropriate choice of architectural
elements so that the computational task is achieved with minimum loss of time.
This solution appears to be the only realistic possibility for implementing real-time
learning neural networks [8]. In order to optimize the performance/cost ratio, we
propose in this paper a mixed hardware/software implementation.
A. The Software Module
To execute the program code of the FAP algorithm, a fixed-point DSP
implementation has been selected. This software implementation of the FAP
algorithm is a way of more fully exploiting the flexibility of the DSP. In fact, an AP
algorithm can be considered as a generalization of both the Block RLS (BRLS) and
NLMS algorithms: for $p = 1$ or $p = L$, it is expected to be equivalent to the NLMS
and BRLS algorithms respectively [16].
However, the FAP algorithm contains division operations, which require more
DSP cycles than the operation count suggests: one division takes more than 20
cycles, whereas a multiply/accumulate (MAC) operation takes 1 cycle. To provide
the necessary amount of computation, we propose to implement the division
operations in the hardware module.
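To illustrate why division is multi-cycle, a fixed-point processor without a single-cycle divider typically builds the quotient one bit at a time with shift-and-subtract steps, and the same recurrence maps naturally onto a dedicated hardware divider. The sketch below is a generic unsigned restoring divider with a Q15 helper for fractional operands $0 \le a < b$; the function names and the Q15 format are assumptions, and this is not the authors' hardware design.

```python
def restoring_divide(num, den, bits=16):
    """Unsigned restoring division producing one quotient bit per iteration."""
    if den <= 0:
        raise ValueError("den must be a positive integer")
    if not 0 <= num < (1 << bits):
        raise ValueError("num must fit in `bits` unsigned bits")
    rem, quo = 0, 0
    for i in range(bits - 1, -1, -1):        # `bits` iterations, MSB first
        rem = (rem << 1) | ((num >> i) & 1)  # shift in the next dividend bit
        if rem >= den:                       # trial subtraction succeeds
            rem -= den
            quo |= 1 << i                    # record a 1 quotient bit
    return quo, rem

def q15_divide(a, b):
    """Q15 quotient a/b for integers 0 <= a < b < 2**15 (fractions scaled by 2**15)."""
    q, _ = restoring_divide(a << 15, b, bits=30)
    return q

# Usage: 0.25 / 0.5 in Q15 (8192 / 16384) gives 16384, i.e. 0.5.
half = q15_divide(8192, 16384)
```

Each iteration needs only a shift, a compare/subtract and a bit set, so a hardware divider can retire one quotient bit per clock without occupying the DSP's MAC pipeline.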
B. The Hardware Module
As shown in Figure 3, the hardware module is composed of a set of neural
processors and a global controller. There are three kinds of bus: one bus for
transmitting instructions, two 8-bit broadcast buses (OUT and IN), and two 2-bit
buses for inter-processor communication [20].

Figure 3 : Global Architecture of the Hardware Module (diagram labels: global controller, neural processors, Bus OUT)


Each neural processor is based on three functional units : the memory unit, the
processing unit and the control unit (Figure 4).

1. The memory unit: The large number of connections between the neurons is
a severe drawback for implementing a neural network on silicon. To
resolve this problem, we propose the use of area-optimized delay lines as
memorisation elements. This is possible in our case because the algorithm accesses
the input samples and the weights sequentially. A delay line is a dynamic
memory with sequential access and high density [21], which increases the
hardware density. The delay line allows a read-write cycle to be executed in one
clock cycle, which enables the operating unit to be used at its maximum rate.
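Behaviorally, such a delay line can be modeled as a recirculating circular buffer: each clock reads the word at the current position, optionally writes back a modified value (for example an updated weight) at the same position, and advances. The Python sketch below assumes that model; the class and method names are hypothetical, and the sketch ignores the dynamic-storage refresh and timing details of a real circuit.

```python
class DelayLine:
    """Behavioral model of a recirculating sequential-access memory."""

    def __init__(self, words):
        self.mem = list(words)
        self.ptr = 0                            # current read/write position

    def cycle(self, update=None):
        """One clock: read the current word, optionally write back update(word),
        then advance to the next position, wrapping around at the end."""
        word = self.mem[self.ptr]
        if update is not None:
            self.mem[self.ptr] = update(word)   # read-modify-write in one cycle
        self.ptr = (self.ptr + 1) % len(self.mem)
        return word

# Usage: stream three weights, scaling each one as it passes.
weights = DelayLine([1, 2, 3])
seen = [weights.cycle(lambda w: 10 * w) for _ in range(3)]
```

Because access is strictly sequential, no address decoder is needed, which is where the area saving over a random-access memory comes from.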
2. The processing unit: The proposed processing unit is made up of two
multipliers, one for computing the potential of the neuron and the other for updating
the weight vector. A direct path from the multiplier to the adder allows
the neural processor to operate in Vector Mode, where a simultaneous
multiply and accumulate occurs in one clock cycle. The data path is connected
to dynamic delay lines with sequential access (Figure 4).
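At cycle level, Vector Mode means one synapse per clock: the first multiplier feeds the adder so that the potential update $v \leftarrow v + w\,x$ completes every cycle, while the second multiplier forms the weight correction $\eta\,\delta\,x$. The Python sketch below counts cycles for a neuron with N synapses; splitting the work into a separate forward pass and update pass is an assumption made for illustration, not a statement about the authors' scheduling.

```python
def forward_pass(weights, inputs):
    """Vector Mode: one multiply-accumulate per clock, v += w * x."""
    v, cycles = 0, 0
    for w, x in zip(weights, inputs):   # one (w, x) pair streamed per clock
        v += w * x                      # multiplier 1 feeding the adder
        cycles += 1
    return v, cycles

def update_pass(weights, inputs, eta, delta):
    """Weight update: eta*delta is constant for the neuron, so multiplier 2
    needs only one product per clock, (eta*delta)*x, added to w."""
    scale = eta * delta
    new_weights, cycles = [], 0
    for w, x in zip(weights, inputs):
        new_weights.append(w + scale * x)   # written back to the weight delay line
        cycles += 1
    return new_weights, cycles

# Usage: a neuron with 3 synapses needs 3 clocks for its potential.
v, n = forward_pass([1, 2, 3], [4, 5, 6])
```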

Figure 4 : Specific architecture of a Neural Processor


3. The control unit: The control unit controls the computation of the
potential and the state of the neuron, and the data transfers with the other neurons.
In response to the clock pulses, it generates the commands for the
memories, multiplexers and registers. To ensure correct synchronisation, the
operating unit registers are sensitive to the rising clock edge, while the
memorisation elements are sensitive to the falling edge. This control unit is a state
machine that is synthesized automatically using logic synthesis tools such as Galileo.
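The control unit can be pictured as a Moore-style state machine that issues one set of command signals per clock. The states and signal names below (`IDLE`, `ACCUMULATE`, `mac_en`, and so on) are hypothetical, chosen only to illustrate the sequencing of one neuron evaluation; the model is cycle-level and does not represent the rising/falling-edge split described above.

```python
# Hypothetical per-state command outputs (Moore machine); names are illustrative.
COMMANDS = {
    "IDLE":       {"mem_read": 0, "mac_en": 0, "act_load": 0, "out_en": 0},
    "ACCUMULATE": {"mem_read": 1, "mac_en": 1, "act_load": 0, "out_en": 0},
    "ACTIVATE":   {"mem_read": 0, "mac_en": 0, "act_load": 1, "out_en": 0},
    "OUTPUT":     {"mem_read": 0, "mac_en": 0, "act_load": 0, "out_en": 1},
}

def next_state(state, macs_done, n_synapses):
    """Next-state logic evaluated once per clock."""
    if state == "IDLE":
        return "ACCUMULATE"
    if state == "ACCUMULATE":
        return "ACTIVATE" if macs_done == n_synapses else "ACCUMULATE"
    if state == "ACTIVATE":
        return "OUTPUT"
    return "IDLE"                          # OUTPUT -> IDLE

def run_neuron(n_synapses):
    """Clock the FSM through one neuron evaluation; return (state, commands) per clock."""
    if n_synapses < 1:
        raise ValueError("a neuron needs at least one synapse")
    state, macs_done, trace = "IDLE", 0, []
    while True:
        trace.append((state, COMMANDS[state]))
        if state == "ACCUMULATE":
            macs_done += 1                 # one MAC issued this clock
        state = next_state(state, macs_done, n_synapses)
        if state == "IDLE":                # evaluation finished
            return trace

# Usage: trace the states for a neuron with 3 synapses.
trace = run_neuron(3)
```

Written this way, the next-state function and the command table correspond directly to the two processes a synthesizable VHDL state machine is usually split into.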

4. THE COMMUNICATION MODEL
In the system architecture, the hardware module and the DSP, which may operate
at different clock rates, communicate by sending and receiving data via
communication channels. This communication may be performed in half- or
full-duplex mode. The channel communication protocol that we use is based on
FIFO communication semantics with block-oriented communication [22]. The main
units of our communication model are a FIFO controller and two FIFO buffers, for
control and for input/output data, which ensure bidirectional communication. The
FIFO buffer states (full or empty) are made visible to the controller through the
full and empty status signals, as shown in Figure 6.

Figure 6 : Block Diagram of the Communication Unit
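The flag-based handshake can be illustrated with a behavioral Python model: a bounded FIFO exposes `full` and `empty`, the producer stalls while `full` is set, and the consumer reads only while `empty` is clear. The depth and the 2:1 write/read rate in the simulation are assumptions standing in for the different clock rates of the DSP and the hardware module; a real dual-clock FIFO also needs clock-domain synchronization of its pointers, which is not modeled here.

```python
from collections import deque

class Fifo:
    """Bounded FIFO exposing `full` and `empty` status flags."""

    def __init__(self, depth):
        self.depth = depth
        self.words = deque()

    @property
    def full(self):
        return len(self.words) == self.depth

    @property
    def empty(self):
        return not self.words

    def write(self, word):
        if self.full:
            return False                  # producer must stall this cycle
        self.words.append(word)
        return True

    def read(self):
        return None if self.empty else self.words.popleft()

def transfer_block(block, depth, writes_per_tick=2, reads_per_tick=1):
    """Move one block through a FIFO with mismatched producer/consumer rates."""
    fifo, received, i, stalls = Fifo(depth), [], 0, 0
    while len(received) < len(block):
        for _ in range(writes_per_tick):  # producer side (faster)
            if i < len(block):
                if fifo.write(block[i]):
                    i += 1
                else:
                    stalls += 1           # `full` flag was asserted
        for _ in range(reads_per_tick):   # consumer side (slower)
            if not fifo.empty:
                received.append(fifo.read())
    return received, stalls

# Usage: a 20-word block through a 4-word FIFO.
received, stalls = transfer_block(list(range(20)), depth=4)
```

The block arrives complete and in order even though the producer is repeatedly throttled by the `full` flag.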

After verifying their functional correctness, the communication units (FIFO
buffers and FIFO controller) are synthesized together with the logic synthesis tool
Galileo in order to obtain the gate-level structure of the full communication model.
The logic synthesis tool starts with two kinds of information: an RTL specification
given in VHDL and a functional unit library, which can include complex functional
units. Synthesis results for the communication unit are obtained for the Xilinx 4000
target technology [23]. Area optimization was selected during synthesis.
The schematic synthesis result is presented in Figure 7.

Figure 7 : Schematic synthesis result of the Communication Unit from Galileo

5. CONCLUSION
An Acoustic Echo Canceller based on Neural Networks and a FAP algorithm
has been introduced. This structure allows a large set of trade-offs between
convergence rate, residual error, tracking capacity, and arithmetic complexity.
Hence, the proposed structure has the potential for solving other difficult
nonlinear adaptive signal processing tasks, such as system identification where
nonlinearity and nonstationarity are both important factors. To investigate the
feasibility of a practical implementation, a hardware/software implementation has
been discussed. The use of VHDL and logic synthesis provides flexibility in the
implementation of the proposed AEC: modifying the design only involves
changing the text of the VHDL source.
References
[1] E. Hansler, "The Hands-Free Telephone Problem: A Second Annotated
Bibliography Update", in Int. Workshop Acoustic Echo Noise Control, 1995.
[2] S. M. Kuo, J. Chen, "Analysis of finite length acoustic echo cancellation
system", Speech Communication, 16, 1995, pp. 255-260.
[3] M. Montazeri, "Une famille d'algorithmes adaptatifs comprenant les algorithmes
NLMS et RLS, Application à l'annulation d'écho acoustique", Thèse,
Université de Paris-Sud, France, Sept. 1994.
[4] S. Haykin, "Adaptive Filter Theory", Prentice Hall, Toronto, 1991.
[5] K. Ozeki, T. Umeda, "An Adaptive Filtering Algorithm Using an Orthogonal
Projection to an Affine Subspace and Its Properties", Electronics and
Communications in Japan, Vol. 67-A, No. 2, 1984.
[6] A. N. Birkett, R. A. Goubran, "Nonlinear loudspeaker compensation for hands-
free acoustic echo cancellation", Electronics Letters, Vol. 32, No. 12, 1996.
[7] F. X. Y. Gao, W. M. Snelgrove, "Adaptive Linearization of a Loudspeaker",
ICASSP, 1991.
[8] D. Hammerstrom, "A VLSI Architecture for High-Performance, Low-Cost, On-
Chip Learning", IJCNN, February 28, 1990, pp. II-537-544.
[9] Model Technology, "V-System/Windows, User's Manual: VHDL", June 1995.
[10] Galileo: HDL Synthesis Manual, Atlantic, 1995.
[11] A. N. Birkett, R. A. Goubran, "Acoustic Echo Cancellation Using Neural
Networks Structures", IEEE ICASSP'95.
[12] Y. Haneda, S. Makino, Y. Kaneda, "Common Acoustical Pole and Zero
Modelling of Room Transfer Functions", IEEE Transactions on Speech and
Audio Processing, Vol. 2, No. 2, April 1994, pp. 320-328.
[13] B. Cohen, D. Saad, E. Marom, "Efficient Training of Recurrent Neural
Network with Time Delays", Neural Networks, Vol. 10, No. 1, 1997.
[14] K. S. Narendra, K. Parthasarathy, "Identification and Control
of Dynamical Systems Using Neural Networks", IEEE Transactions on Neural
Networks, Vol. 1, No. 1, March 1990, pp. 4-27.
[15] D. R. Hush, B. G. Horne, "Progress in Supervised Neural Networks", IEEE
Signal Processing Magazine, Vol. 10, No. 1, pp. 8-39, 1993.
[16] J. Benesty, F. Amand, A. Gilloire, Y. Grenier, "Adaptive filtering
algorithms for stereophonic acoustic echo cancellation", ICASSP'95.
[17] S. L. Gay, S. Tavathia, "The Fast Affine Projection Algorithm", ICASSP'95.
[18] M. Tanaka, Y. Kaneda, S. Makino, J. Kojima, "Fast Projection Algorithm
and Step Size Control", ICASSP'95, pp. 945-948.
[19] S. Oh, D. Linebarger, B. Priest, B. Raghothaman, "A Fast Affine
Projection Algorithm for an Acoustic Echo Canceller Using a Fixed-Point DSP
Processor", ICASSP'97, pp. 4121-4124.
[20] D. Mueller, D. Hammerstrom, "A neural network systems component", Int.
Conf. on Neural Networks, 1993.
[21] A. Elhalwani, P. Le Scan, "VLSI Architecture of the Generalized Multi Delay
Frequency-Domain Algorithm for Acoustic Echo Cancellation", ICASSP'95.
[22] A. Baganne, J. L. Philippe, E. Martin, "A Co-design Methodology for
Telecommunication Systems: A Case Study of an Acoustic Echo Canceller",
1997 IEEE Workshop on Signal Processing Systems, SiPS'97, pp. 273-282.
[23] XILINX, "The Programmable Logic Data Book", 1994.
