Compression Techniques For Digital Hearing Aids

Garini Nikoleta
September 15, 2009

Preface
This thesis is based upon studies conducted during October 2008 to August 2009 at
the Department of Electrical and Computer Engineering of the University of Patras. It
deals with some basic issues related to Digital Hearing Aids and more specifically, with the
matter of compression in hearing aid devices.
There has been an explosion in the number of digital hearing aids on the market in the
last five years. At last count, there are 22 manufacturers with digital hearing aids marketed
under 40 different model names. Manufacturers are moving toward their third or fourth
generation of digital products.
The first chapter is a general introduction to hearing aids. It refers briefly to the human
auditory system and the exact problems faced by people with hearing impairment. It also
presents the underlying theory behind compression and its major role in decreasing the range
of sound levels in the environment to better match the dynamic range of a hearing-impaired
person. Compression systems are used to achieve specific aims and different compression
parameters are needed for each rationale.
Chapter 2 presents different approaches to compression in the frequency domain, among them
Multiband Compression, Wide Dynamic Range Compression and Output Limiting Compression. Classic frequency-domain compression uses FFT processing, and both the ideal and
practical FFT systems are described. To approximate the non-uniform frequency
resolution of the human auditory system, warped compression systems are used for speech
enhancement.
Chapter 3 is dedicated to the theory of Multirate Filter Banks and to the Polyphase Decomposition as an efficient way of implementing them. An alternative prototype filter design is
described in detail and is proposed because it provides a minimum combined approximation
error.
Chapter 4 explains how the time-domain post filter, whose gain coefficients are adapted in the frequency domain, is approximated by an all-pole filter of lower degree. A way of
eliminating sharp zeros in the filter's frequency response is suggested, and simulation results
provide an evaluation of the proposed technique.
Appendices A, B, C and D serve as reference and provide Matlab code and some
useful proofs and derivations.

Acknowledgements
This master's thesis was completed during my graduate studies in the inter-departmental
program Signal Processing & Communication Systems of the Department of Computer
Engineering and Informatics at the University of Patras. Its main aim is to enhance and
evaluate some compression techniques applied to Digital Hearing Aids.
I am deeply thankful to my Professor George Moustakides for his advice, for his unique
support and for the pleasant environment he has offered me at the Department of Electrical
and Computer Engineering at the University of Patras. His enthusiasm about the project
and his experience helped me to cope with issues in Digital Signal Processing that seemed
difficult to me at first.
The master's thesis evaluation was performed by Nikolaos P. Galatsanos, Professor at
the Department of Electrical and Computer Engineering at the University of Patras, and
Professor Emmanouil Psarakis of the Department of Computer Engineering and Informatics
at the University of Patras.
Garini Nikoleta
Patras, 2009

Contents
1 Introduction to Hearing Aids
    1.1 Description of Human Auditory System and Acoustic Measurements
        1.1.1 Cochlear Tuning and Frequency Selectivity
        1.1.2 Linear Amplifiers and Gains
        1.1.3 Sound Pressure Level and Absolute Threshold of Hearing
    1.2 Problems Faced by Hearing-impaired People
    1.3 Compression In Hearing Aids
        1.3.1 Compression's major role: Reducing Signal's Dynamic Range
        1.3.2 Basic Characteristics of a Compressor
        1.3.3 Rationales for use of Compressors

2 Approaches for Compression in Frequency-Domain
    2.1 Multiband Compression and FFT Processing
    2.2 Frequency-Domain Compression
        2.2.1 Describing Ideal and Practical FFT system
        2.2.2 Side-Branch Architecture
    2.3 Warped Compression System
        2.3.1 Concerns in designing Compression Systems
        2.3.2 Digital Frequency Warping
        2.3.3 Compressor using frequency warping
    2.4 Warped Low-Delay Post Filter
        2.4.1 Warped Post Filter for Speech Enhancement
        2.4.2 Warped Low Delay Post Filter

3 Filter Banks and Prototype Filter Structures
    3.1 Multirate Systems and Filter Banks
    3.2 Uniform and Non-Uniform DFT Filter Banks
    3.3 Polyphase Representation of Filter Banks
        3.3.1 Basic concept of Polyphase Decomposition
        3.3.2 Why the name Polyphase Decomposition?
        3.3.3 Polyphase Implementation of Uniform DFT filter banks
    3.4 Efficient Non-Uniform Filter Bank Equalizer
    3.5 Prototype Filter Design
        3.5.1 Different Parameters of prototype filter
        3.5.2 Realization of different prototype filter structures
    3.6 Conclusions

4 Low Delay Time-Domain Post Filter
    4.1 Uniform Auto-Regressive Low-Delay Post Filter
    4.2 Allpass transformed Auto-Regressive Low-Delay Post Filter
        4.2.1 Approximation of the warped post filter
        4.2.2 Approximation of the uniform post filter
    4.3 Simulation Results
    4.4 Low Delay FIR Filter Design
    4.5 Elimination of Deep Nulls
    4.6 Conclusions

A The DFT and IDFT Matrices
B Solve min-max problem using Linear Programming
C Proofs & Derivations
D Matlab code

List of Figures
1.1 The anatomy of the peripheral auditory system [3].
1.2 Frequency Threshold Tuning curves.
1.3 Tuning curves for five neurons [6].
1.4 Results of a masking experiment [7].
1.5 Human auditory thresholds as a function of frequency. Sounds that fall in the shaded region below the curve are below threshold and therefore inaudible.
1.6 Saturation Sound Pressure Level Frequency Response of a hearing-aid [1].
1.7 Audiogram with different speech sounds.
1.8 Decreased Dynamic Range for hearing-impaired people.
1.9 Input/Output curves showing effects of Output Limiting Compression (left) and Wide Dynamic Range Compression (right).
1.10 The effects of a compressor on a signal. Only the middle portion of the input is above the compressor's threshold. Note the overshoot when the signal level increases (it takes some time for the gain to decrease), and the attenuation when the input signal returns to the first level (and the gain increases). The release time is generally longer than the attack time.

2.1 Block diagram of a multi-channel compression system.
2.2 Block diagram of an ideal frequency-domain compression system using 128-point FFT and sampling rate 16 kHz.
2.3 Side-branch compression architecture.
2.4 Group delay in samples for a single all-pass filter having the warping parameter a = 0.5756 [12].
2.5 Block diagram of a compression system using frequency warping for both frequency analysis and filtered signal synthesis.
2.6 Diagram of a warped post-filter for speech enhancement.
2.7 Diagram of a warped low-delay post-filter for speech enhancement.

3.1 Block Diagram of an M-band analysis-synthesis filter bank.
3.2 Typical filter responses of digital filter banks.
3.3 The simplest example of a uniform-DFT filter bank.
3.4 Schematic of the relation between h(n) and its l-th polyphase component.
3.5 A prototype lowpass response of an M-th band filter.
3.6 Implementation of the uniform DFT bank using polyphase decomposition.
3.7 Polyphase decomposition of the uniform DFT filter bank with decimation by a factor of M.
3.8 Polyphase network (PPN) realization of a DFT analysis-synthesis filter bank for a prototype filter of length L + 1 = 2M.
3.9 System of filter-bank equalizer.
3.10 Filter-bank summation method for time-varying spectral gain factors $W_i(k')$ adapted at a reduced sampling rate.
3.11 Polyphase network implementation of the FBE for the direct-form filter.

4.1 Approximation of a uniform FIR filter by a uniform AR filter.
4.2 Basic concept of approximation: the two signals y and e must have the same statistical characteristics.
4.3 Approximation of the uniform post-filter by a warped AR filter.
4.4 Network for calculation of the (P + 1)-warped impulse autocorrelation coefficients hs hs ().
4.5 Approximation of the uniform time-domain post-filter by an allpass transformed AR filter.
4.6 Magnitude Response in dB and approximation of an FIR filter by a uniform AR filter.
4.7 Magnitude Response in dB and approximation of a uniform FIR post-filter by an allpass transformed AR filter.
4.8 Magnitude Response in dB and approximation of a warped FIR post-filter by a warped AR filter.
4.9 Introduction of an all-pole filter so as to eliminate deep nulls in FIR filter frequency response.
4.10 Magnitude and Phase Response of a filter with one sharp zero at a specified frequency.
4.11 Magnitude and Phase Response of a filter after the elimination of its sharp zero.

List of Tables
3.1 The maximum combined error for 32 frequency bands.
3.2 The maximum combined error for 64 frequency bands.
3.3 The maximum combined error for different compression factors $M_c$.
4.1 The maximum approximation error for different filter lengths in symmetrical and non-symmetrical case.

Chapter 1

Introduction to Hearing Aids


Hearing aids are devices that partially overcome auditory deficits and are normally employed
to compensate for hearing loss in hearing-impaired people. The main objective of a hearing
aid is to fit the dynamic range of speech and everyday sounds into the restricted dynamic
range of the impaired ear. To better understand this device and its
function, we need to explain how sound is perceived by the human auditory system and
exactly which problems hearing-impaired people encounter.
Some sounds are totally inaudible. Others can be detected because part of their
spectra is audible, but may not be correctly identified because other parts of their spectra, typically those parts at high frequencies, remain inaudible. The range of levels between the
weakest sound that can be heard and the most intense sound that can be tolerated is smaller
for a person with hearing impairment than for a normal listener. To compensate for this,
hearing aids amplify weak sounds more than they amplify intense sounds [1].
The most common type of hearing loss is sensorineural hearing loss in which the root
cause lies in the vestibulocochlear nerve, the inner ear or central processing centers of the
brain. People with sensorineural hearing loss usually experience abnormal perception of
loudness named loudness recruitment, i.e. a slight increase in sound intensity above the
threshold of hearing can be unbearably loud for them but the very-low-intensity sounds
are inaudible. Sensorineural impairment diminishes the ability of a person to detect and
analyze energy at one frequency in the presence of energy at other frequencies. Similarly,
a person with hearing loss has decreased ability to hear a signal that rapidly follows, or is
rapidly followed by a different signal. This decreased frequency and temporal resolution
makes it more likely for a hearing-impaired person that noise will mask speech.
To alleviate such difficulties, different types of compression hearing aids have been proposed. The choice of compression algorithm is system dependent, since the
processing core of the hearing aid constrains the set of algorithms that can be run. Apart from compression, the
main processing stages to configure are Noise Reduction and Feedback Cancellation. Noise Reduction is an important stage in hearing aid signal processing
because hearing-impaired people have to understand speech in background noise. The third
problem to solve when fitting the hearing aid to the patient is Feedback Cancellation. Acoustic
feedback is produced when sound leaks from the loudspeaker back to the microphone. It
often causes the hearing aid to howl and limits the maximum gain that can be used without
instability, reducing the sound quality when the gain is close to that limit.

CHAPTER 1. INTRODUCTION TO HEARING AIDS

1.1 Description of Human Auditory System and Acoustic Measurements

In order to hear a sound, the auditory system must accomplish three basic tasks. First it
must deliver the acoustic stimulus to the receptors; second, it must transduce the stimulus
from pressure changes into electrical signals; and third, it must process these electrical
signals so that they can efficiently indicate the qualities of the sound source such as pitch,
loudness and location. The human ear can be divided into three fairly distinct components
according to both anatomical position and function: the outer ear, which is responsible
for gathering sound energy and funnelling it to the eardrum, the middle ear which acts as
a mechanical transformer and the inner ear where the auditory receptors (hair cells) are
located [2]. Fig. 1.1 shows the anatomy of the human ear in detail:

Figure 1.1: The anatomy of the peripheral auditory system [3].


Sound waves enter the auditory canal, travel through it and hit the tympanic membrane
(eardrum). This wave information travels across the air-filled middle ear cavity via a series of
delicate bones (malleus, incus and stapes), which convert the lower-pressure eardrum sound
vibrations into higher-pressure sound vibrations at the oval window. Higher pressure is
necessary because the inner ear beyond the oval window contains liquid rather than air.
Subsequently, the sound information is converted from a waveform into nerve impulses in the
cochlea.
In the inner ear, the cochlea is a tube coiled up into a spiral, divided along its length by
two membranes: Reissner's membrane and the basilar membrane. It has three fluid-filled
sections: the scala media, which contains the Organ of Corti that transforms mechanical waves into electric
signals in neurons, the scala tympani and the scala vestibuli [4]. The Organ of Corti has hair cells,
which are columnar cells with a bundle of 100 to 200 specialized mechanosensors for hearing
at the top (cilia) that transform the fluid waves into nerve signals. Atop the longest cilia
rests the tectorial membrane, which moves back and forth with each cycle of sound, tilting
the cilia and allowing electric current into the hair cell. The sound information travels down
the vestibulocochlear nerve and is further processed until it eventually reaches the thalamus,
from where it is relayed to the cortex.

1.1.1 Cochlear Tuning and Frequency Selectivity

Hair cells are the sensory receptors of both the auditory and the vestibular systems and
transform mechanical energy into neural signals. They are mainly classified as inner hair
cells and outer hair cells; the latter are over three times more numerous and affect the response
of the basilar membrane. The mechanical properties of the basilar membrane affect the way it
responds to sounds of different frequencies.
It is known that the location of the peak of the traveling wave on the basilar membrane
is determined by the frequency of the originating sound. When a certain frequency sound
stimulates a point on the membrane, it responds by moving and hair cells at that site are
stimulated by the force that this movement creates. Therefore, groups of hair cells only
respond if certain frequencies are present in the originating sound [5].
Each place on the basilar membrane is tuned to a particular characteristic frequency. As
a whole, the basilar membrane behaves as a bank of overlapping bandpass filters (auditory
filters). In this way, it extracts quite detailed information about the spectral decomposition
of sounds and performs a partial spectral (Fourier) analysis of the sound, with each place
on it being most sensitive to a different frequency component. The frequency sensitivity of
a hair cell can be displayed as a tuning curve and the phenomenon is known as cochlear
tuning.
Frequency Threshold Tuning curves can be obtained by finding the level of a pure tone
required to produce a just-measurable increase in the firing rate of a neuron, as a function
of the frequency of the pure tone. These curves are equivalent to the tuning curves on the
basilar membrane. They are characteristically V-shaped, as shown in Fig. 1.2, and their tip
represents the frequency at which the cell is most sensitive:

Figure 1.2: Frequency Threshold Tuning curves.


The closer the frequency of the tone to the characteristic frequency of the neuron, the
lower is the level required. The important point is that frequency selectivity in the auditory
nerve is very similar to that on the basilar membrane, since each nerve fiber innervates a
single inner hair cell. Because of the difficulties involved in measuring the vibration of the
basilar membrane directly, most of our knowledge derives from auditory nerve recording.
An example is illustrated in Fig. 1.3, which shows frequency threshold tuning curves recorded
from the auditory nerve of a chinchilla (by Ruggero and Semple, 1992). Five curves are shown
depicting the tuning properties of five neurons with characteristic frequencies ranging from
about 500 Hz to 16 kHz. The tuning curves for those five neurons are plotted on a linear and
a logarithmic axis.

Figure 1.3: Tuning curves for five neurons [6].

Frequency Selectivity is one of the most important topics in hearing, because the
nature of auditory perception is largely determined by the ear's ability to separate out the
different frequency components of sounds. It refers to the ability of the auditory system to
resolve the sinusoidal components of a complex sound. Frequency selectivity can
be measured at all stages of the auditory system from the basilar membrane to the auditory
cortex, as well as in our perceptions.
The perception of a sound depends not only on its own frequency and intensity but also
on other sounds present at the same time. For example, typical classroom sounds, created
by movement, coughing and rustling of papers, make the instructor's voice difficult to hear.
This phenomenon is called masking. Technically speaking, masking is defined as the rise in
threshold of one tone (test tone) due to the presence of another (masker) tone. It is known
that a signal is most easily masked by a sound having frequency components close to those
of the signal. This led to the idea that our ability to separate the components of a complex
sound depends on the frequency-resolving power of the basilar membrane. It also led to the
idea that masking reflects the limits of frequency selectivity and provides a way to quantify
it. Fig. 1.4 illustrates the results of a masking experiment. The line indicates the amount
that the threshold is raised in the presence of a masking noise centered at 410 Hz. So for a
410 Hz tone, the threshold is raised by about 60 dB above absolute threshold.

Figure 1.4: Results of a masking experiment [7].


As far as the perception of intensity is concerned, the human ear has incredible absolute
sensitivity and dynamic range. The most intense sound we can hear without immediate
damage to the ear is at least 140 dB above the faintest sound we can just detect. This
corresponds to an intensity ratio of 100,000,000,000,000:1 (that is, 10^14:1). The Absolute Threshold is the
smallest value of some stimulus that a listener can detect. In order to investigate our
perceptual capabilities, it is useful to generate an absolute threshold curve, which relates
the frequency of a signal to the intensity at which it can be detected by the ear. Fig. 1.5
shows a plot of the thresholds of hearing for a range of frequencies.
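
The relation between a level difference in decibels and the corresponding intensity ratio quoted above can be checked numerically. The following is a minimal Python sketch; the function names are illustrative and are not taken from the thesis.

```python
import math


def db_to_intensity_ratio(level_db: float) -> float:
    """Convert a level difference in dB to a ratio of sound intensities."""
    return 10.0 ** (level_db / 10.0)


def intensity_ratio_to_db(ratio: float) -> float:
    """Convert a ratio of sound intensities to a level difference in dB."""
    return 10.0 * math.log10(ratio)
```

For instance, `db_to_intensity_ratio(140.0)` gives the ratio of about 10^14 between the most intense tolerable sound and the faintest detectable one.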

Figure 1.5: Human auditory thresholds as a function of frequency. Sounds that fall in the
shaded region below the curve are below threshold and therefore inaudible.

The smallest detectable change in intensity, which is a matter of Intensity Discrimination,
is measured using a variety of psychophysical methods and various stimuli. Although the
difference threshold depends on several factors including duration, intensity and the kinds
of stimuli on which the measurement is made, Weber's law holds for most stimuli. In other
words, the smallest detectable change is a constant fraction of the intensity of the stimulus.
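
Weber's law can be written compactly as follows. The symbols are introduced here only for illustration and do not come from the thesis: $I$ is the stimulus intensity, $\Delta I$ the just-detectable increment and $k$ the (stimulus-dependent) Weber fraction.

```latex
\frac{\Delta I}{I} = k
\quad\Longrightarrow\quad
\Delta L = 10\log_{10}\!\left(\frac{I+\Delta I}{I}\right) = 10\log_{10}(1+k)\ \text{dB}
```

Expressed as a level difference, the smallest detectable change is therefore the same number of decibels at every intensity for which the law holds.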

1.1.2 Linear Amplifiers and Gains

Amplifiers inside hearing aids can be classified as linear and nonlinear. Linear amplifiers
multiply the input signal by a fixed amount regardless of its magnitude. The behavior of a
linear amplifier is not affected by how many signals it is amplifying at the same time. For
example, if signal C is amplified by 30 dB when it is the only signal present in the input,
then it will still be amplified by 30 dB even when several other signals are simultaneously
being amplified by the device [1].
The gain of any device relates the amplitude of the signal coming out of the device to
the amplitude of the signal going into the device. Gain is thus calculated as the output
amplitude divided by the input amplitude or, in decibels, as the Output Level minus the Input Level,
both expressed in dB SPL (Sound Pressure Level). To fully describe the gain of a linear amplifier,
it is necessary to state its gain at every frequency within the frequency range of interest.
This is referred to as the gain-frequency response or gain curve. Thus, the degree of
amplification is represented as a graph of Gain versus frequency (Gain-Frequency Response)
or a graph of Output Level versus Input Level (I-O curve), which shows the dependence of
Output Sound Pressure Level on Input Sound Pressure Level for a particular signal or
frequency. It should be noted that the highest level produced by a hearing aid is known as the
Saturation Sound Pressure Level (SSPL).
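
The two equivalent ways of computing gain described above can be sketched as follows. This is a minimal Python illustration under the usual convention that levels are 20 times the base-10 logarithm of an amplitude ratio; the function names and the numerical values in the usage note are hypothetical.

```python
import math


def gain_db_from_amplitudes(output_amplitude: float, input_amplitude: float) -> float:
    """Gain in dB computed from the ratio of output to input amplitude."""
    return 20.0 * math.log10(output_amplitude / input_amplitude)


def gain_db_from_levels(output_level_db_spl: float, input_level_db_spl: float) -> float:
    """Gain in dB computed as output level minus input level (both in dB SPL)."""
    return output_level_db_spl - input_level_db_spl


def linear_amplifier_output_level(input_level_db_spl: float, gain_db: float) -> float:
    """A linear amplifier adds the same gain at every input level (below saturation)."""
    return input_level_db_spl + gain_db
```

For example, an amplitude ratio of 10 corresponds to a gain of 20 dB, and a linear amplifier with 30 dB of gain maps a 50 dB SPL input to 80 dB SPL and a 60 dB SPL input to 90 dB SPL.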

1.1.3 Sound Pressure Level and Absolute Threshold of Hearing

All amplifiers become nonlinear when the input or output signals exceed a certain level.
This happens because amplifiers are unable to handle signals larger than the voltage of the
battery that powers the amplifier. As with gain, the SSPL varies with frequency and a
useful measure is the SSPL Response curve.

Figure 1.6: Saturation Sound Pressure Level Frequency Response of a hearing-aid [1].

The Absolute Threshold of Hearing is the minimum sound level of a pure tone that
an average ear with normal hearing can hear in a noiseless environment. It corresponds to the
sound that can just be heard by the listener. Because it is not a discrete point, it is
defined as the level at which a response is elicited a specified percentage of the time. It is
expressed in dB SPL and can be measured using psychophysical methods.
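
Since thresholds and output levels are quoted in dB SPL, it may help to see how a sound pressure maps to that scale. The sketch below assumes the standard reference pressure of 20 micropascals for airborne sound, which is not stated in the thesis; the function names are illustrative.

```python
import math

# Standard reference pressure for dB SPL in air (assumption, not from the thesis).
P_REF_PA = 20e-6


def pressure_to_db_spl(pressure_pa: float) -> float:
    """Convert an RMS sound pressure in pascals to a level in dB SPL."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


def db_spl_to_pressure(level_db_spl: float) -> float:
    """Convert a level in dB SPL back to an RMS sound pressure in pascals."""
    return P_REF_PA * 10.0 ** (level_db_spl / 20.0)
```

With this convention, a pressure equal to the reference maps to 0 dB SPL, and each tenfold increase in pressure adds 20 dB.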
For a hearing-impaired person, the threshold of hearing differs from that of a normal
listener, and one way to determine the hearing loss is the audiogram. An Audiogram is
a chart depicting hearing test results:

Figure 1.7: Audiogram with different speech sounds.


A hearing test, which is performed by an audiologist in a sound-insulated room, determines a person's hearing sensitivity at different frequencies and diagnoses the exact type
of hearing loss. Generally, prolonged exposure to excessive sound levels may cause hearing
defects. Specifically, there are two types of hearing loss:
• Sensorineural hearing loss occurs when the hair cells of the cochlea are damaged or worn
out. Typical causes are the aging process and excessive exposure to noise. This is the
most common type of hearing loss and as yet there is no cure, though hearing aids
can help.
• Conductive hearing loss occurs when the sound is not being transmitted through the
ear canal and middle ear to the inner ear. Common causes are wax in the ear canal,
fluid in the middle ear or damage to the middle ear bones. This type of hearing loss
can often be successfully treated with medication or surgery.

1.2 Problems Faced by Hearing-impaired People

The following problems are those that are mainly related to the most common type of
hearing loss, sensorineural hearing loss [1]:
Decreased Audibility
Hearing-impaired people do not hear some sounds at all. People with a severe
hearing loss may not hear any speech sounds unless they are shouted, while those with a
moderate loss are likely to hear some sounds and not others. In particular, softer
phonemes, usually consonants, may not be heard. For example, the sequence of sounds "i e a a r"
might have originated as "pick the black harp" but could have been heard as "kick the
cat hard". For people with hearing impairment, essential parts of some phonemes are
not audible and they recognize sounds by noting which frequencies contain the most
energy. In general, the high-frequency components of speech are weaker than the low-frequency
components. Thus, hearing-impaired people usually miss high-frequency
information.
To overcome these problems, a hearing aid has to provide more amplification at frequencies where speech has the weakest components (usually high frequencies). Hence,
hearing aids provide different amounts of gain in different frequency regions.
Decreased Dynamic Range
Unfortunately, it is not always appropriate to amplify soft sounds by the amount
needed to make them audible. Sensorineural hearing loss increases the threshold of
hearing much more than the threshold of loudness discomfort. The dynamic range of
the ear is the level difference between the threshold of discomfort and the threshold of audibility, and it
is smaller for a hearing-impaired person. The reduced dynamic range of people with
hearing loss is depicted in Fig. 1.8.
If the sounds of the environment are to fit within the restricted dynamic range of a
patient, then the hearing aid must amplify weak sounds more than it amplifies intense
sounds. This is the main target of compression.
Decreased Frequency Resolution
People with sensorineural loss have difficulty separating sounds of different
frequencies, which are represented at different places within the cochlea. The decreased
frequency resolution of these people is due to the loss of the ability of the outer hair cells
to increase the sensitivity of the cochlea at the tuning frequencies (the frequencies to which the
affected part of the cochlea is tuned). The essence is that even when a speech component and a
noise component have different frequencies, if these are close enough the cochlea will
have a single broad region of activity rather than two finely tuned separate regions.
The brain is thus unable to untangle the signal from the noise.

Figure 1.8: Decreased Dynamic Range for hearing-impaired people.

If frequency resolution is sufficiently decreased, relatively intense low-frequency parts
of speech may mask the weaker higher-frequency components. This is referred to as
upward spread of masking, and a well-prescribed hearing aid will minimize it by
making sure that there is no frequency region in which speech is much louder than
in the remaining regions.
Even normal-hearing people have poorer resolution at high intensity levels than at
lower levels. The difficulty that hearing-impaired people have in separating sounds is partly
caused by the damaged hair cells in the cochlea and partly by their need to listen
at elevated levels. In general, frequency resolution gradually decreases as hearing loss
increases.
Some ways to minimize problems caused by decreased frequency resolution are noise
reduction in the hearing aid system, the use of directional microphones to focus on
wanted sounds and suppress those coming from unwanted directions and, lastly, appropriate variation of gain with frequency so that low-frequency parts of speech or
noise will not mask high-frequency parts.
Decreased Temporal Resolution
Weaker sounds may often be masked by intense
sounds that immediately precede or follow them and this affects speech intelligibility.
As hearing loss gets worse, the ability to hear weak sounds during brief gaps in a
more intense masker gradually decreases. To compensate for this decreased temporal
resolution, hearing aids perform Fast-Acting Compression, where the gain is rapidly
increased during weak sounds and decreased during intense sounds. The main disadvantage of this method is that unwanted weak background noises are made more audible.
Taken together, the deficits described above can cause a severe reduction in intelligibility. A hearing-impaired person
needs a better Signal-to-Noise Ratio (SNR) than a normal listener if both are
to understand the same amount of speech. More specifically, the average SNR deficit
associated with a moderate hearing loss is estimated to be about 4 dB, while for
a severe hearing loss it is about 10 dB.
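
The requirement stated under Decreased Dynamic Range, that weak sounds must be amplified more than intense sounds, can be illustrated with a simple mapping. The sketch below assumes a straight-line mapping in dB from the range of environmental levels onto the listener's range between threshold and discomfort. This is one possible illustration rather than the fitting rule used in the thesis, and all names and numbers are hypothetical.

```python
def fit_to_dynamic_range(input_db: float,
                         env_min_db: float, env_max_db: float,
                         threshold_db: float, discomfort_db: float) -> float:
    """Map an input level linearly (in dB) from the environmental range
    onto the listener's range between threshold and discomfort."""
    slope = (discomfort_db - threshold_db) / (env_max_db - env_min_db)
    return threshold_db + slope * (input_db - env_min_db)


def required_gain_db(input_db: float,
                     env_min_db: float, env_max_db: float,
                     threshold_db: float, discomfort_db: float) -> float:
    """Gain needed at a given input level to realise the mapping above."""
    output_db = fit_to_dynamic_range(input_db, env_min_db, env_max_db,
                                     threshold_db, discomfort_db)
    return output_db - input_db
```

With hypothetical environmental levels of 30 to 100 dB SPL and a listener whose usable range is 65 to 100 dB SPL, the required gain falls from 35 dB for the weakest sounds to 0 dB for the most intense ones.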

1.3

Compression In Hearing Aids

Compression's major role is to decrease the range of sound levels in the environment so as to
better match the dynamic range of a person with hearing impairment. The compressor may
be most active at low, mid or high sound levels or it may vary its gain across a wide range
of sound levels, in which case it is known as a Wide Dynamic Range Compressor (WDRC).
A compressor can react to a change in Input Level within only a few thousandths of a second
or it can be so slow as to take many tens of seconds to fully react. The degree to which a
compressor finally reacts to a change in input level may be represented in an Input-Output
Diagram or in a Gain-Input Diagram.
Amplification may be either linear or nonlinear. For sounds of a given frequency, a linear
amplifier amplifies by the same amount no matter what the level of the signal is, or what
other sounds are simultaneously present. In this case, the problem is that intense sounds
become more intense and thus annoying. The solution is to set a compression threshold,
which is the input level above which the compressor begins to operate and which is clearly
visible on the Input-Output Diagram.
Another measure, which is related to the slope of the curves on the Input-Output or
Gain-Input Diagrams, is the compression ratio, which describes the variation in Output Level that
corresponds to a given variation in Input Level.
Benefits of compression can be summarized as follows:
• It can make low-level speech more intelligible, by increasing gain and hence audibility.
• It can make high-level sounds more comfortable and less distorted.
• In mid-level environments, it offers little advantage relative to a well-fitted linear aid,
  but once the Input Level varies from this, its advantages become evident.
However, the two most important disadvantages are:
• Greater likelihood of feedback oscillation and
• excessive amplification of unwanted lower-level background noises.

1.3.1

Compression's major role: Reducing the Signal's Dynamic Range

The rationale for compression is to compensate for the reduced dynamic range found in the
impaired ear and the increased growth of loudness (recruitment) that accompanies hearing
loss. In fact, a compressor is an amplifier that automatically turns its gain down as the
input signal level rises.
There are three basic ways by which the dynamic range of signals can be reduced:
• Low-Level Compression, where after amplification lower levels come closer together
  while the spacing of upper levels is not affected.
• Wide Dynamic-Range Compression (WDRC), in which compression is applied more
  gradually over a wide range of input levels.
• Compression Limiting or High-Level Compression, where low-level sounds are amplified
  linearly, but the inputs from moderate to intense sounds are squashed into a
  narrower range of outputs. Its name is due to the fact that the output is not allowed
  to exceed a set limit.

Salient features of Output Limiting Compression and Wide Dynamic Range Compression are shown in Fig. 1.9. Output Limiting Compression has two main features: a high
compression kneepoint and a high compression ratio. On the other hand, WDRC is associated with low compression thresholds (below 55 dB SPL) and low compression ratios (less
than 5:1) [8].

Figure 1.9: Input/Output curves showing the effects of Output Limiting Compression (left) and
Wide Dynamic Range Compression (right).

1.3.2

Basic Characteristics of a Compressor

A compressor is intrinsically a dynamic device: it changes the gain according to changes in
the input signal level. Sound in the environment is changing constantly in intensity over
time and a compression hearing aid has to respond to these changes. The dynamic aspects
of compression as well as its static characteristics must be taken into account.
We refer below to some of its basic characteristics, such as attack and release times, compression
threshold and compression ratio.
Dynamic Compression Characteristics
The attack and release times are the lengths of time it takes for a compression circuit to
respond to changes in input SPL. Specifically, for any type of compressor
the attack time is the time taken for the compressor to react to an increase in signal level,
while the release time is the time taken to react to a decrease in input level. In Fig. 1.10, we
observe two waveforms, the input and the output of a compressor, and we indicate
the attack and release transitions that follow a rise and a fall, respectively, in signal
level.
Most attack and release times are set to achieve the best compromise between two undesirable extremes. Times that are too short will cause the gain to fluctuate rapidly and
this may be perceived as pumping by the listener. Short attack times (i.e. less than
10 ms) prevent sudden, transient sounds from becoming too loud for the listener. In general,
release times need to be longer than attack times to prevent a fluttering perception on
the part of the listener [8].
Although the attack and release times could be made to have extremely short values
(close to zero), the consequences are undesirable. If the release time is too short, the gain
will vary during each voice pitch period and the compressor will thus distort the waveform.

Figure 1.10: The effects of a compressor on a signal. Only the middle portion of the input
is above the compressor's threshold. Note the overshoot when the signal level increases
(it takes some time for the gain to decrease), and the attenuation when the input signal
returns to the first level (and the gain increases). The release time is generally longer than
the attack time.

On the other hand, if the attack time is extremely short and the release time long, distortion
will be minimal. However, very brief sounds, like clicks, will cause a decrease in gain and
the gain will stay low for a long time afterwards. Suitable values for attack times in hearing
aids are usually around 5 ms, while release times are rarely less than 20 ms. In addition,
the attack and release times have a major effect on how compressors affect the levels of the
different syllables of speech.
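The role of the two time constants can be illustrated with a short sketch. It is only a minimal model under stated assumptions: the level estimate is smoothed by a one-pole filter whose coefficient switches between an attack and a release value, the time constants are interpreted as exponential time constants (standardized measurement procedures define attack and release times differently), and the 50 ms release time and the function name are hypothetical choices.

```python
import math

def smooth_level(levels_db, fs, attack_ms=5.0, release_ms=50.0):
    """Track a level (dB) with separate attack and release time constants.

    Assumption: one-pole smoothing; a time constant is the time needed to
    cover about 63% of a step, which is not a standardized definition.
    """
    a_att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    estimate = levels_db[0]
    tracked = []
    for level in levels_db:
        # Rising level -> fast (attack) coefficient, falling -> slow (release).
        coeff = a_att if level > estimate else a_rel
        estimate = coeff * estimate + (1.0 - coeff) * level
        tracked.append(estimate)
    return tracked

# A 30 dB step up followed by a step down, sampled at 16 kHz.
fs = 16000
levels = [50.0] * 10 + [80.0] * 800 + [50.0] * 800
tracked = smooth_level(levels, fs)
```

With these values the estimate covers most of the upward step within 5 ms, whereas 5 ms after the downward step it has barely moved, which corresponds to the slow gain recovery described above.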
It should be noted that, apart from attack and release times, the parameters of a compression system also include the number of frequency channels and the compression ratios in
each frequency band. It may well be that the optimum compressor adjustment is a function
of the type and amount of background noise, interference and the characteristics of the individual hearing loss. A serious challenge is to identify different sound environments
for the purpose of adjusting compression or other signal-processing system parameters.
Static Compression Characteristics
The attack and release times tell us how quickly a compressor operates; we need different
terms to tell us by how much a compressor decreases the gain as level rises. When
specifying these gain changes, we assume that the compressor has had time
to fully react to variations in signal level and thus, we study the static characteristics that
are applicable to steady signals. The Sound Pressure Level above which the hearing aid begins
compression is referred to as the compression threshold. Another significant characteristic
is the compression ratio, which is defined as the change in Input Level needed to produce
a 1 dB change in Output Level. Compression ratios can have any value greater than 1:1;
values less than 1:1 are also possible but correspond to dynamic-range expanders rather than
compressors. In hearing aids with WDRC, compression ratios in the range of 1.5:1 to 3:1
are very common.
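The static characteristics defined above can be written as a simple input-output rule. The sketch below assumes a single kneepoint with linear amplification below the compression threshold; the 30 dB linear gain, the 50 dB SPL threshold and the function names are illustrative assumptions, not values taken from a particular hearing aid.

```python
def output_level_db(input_db, gain_db=30.0, threshold_db=50.0, ratio=2.0):
    """Static input-output rule: linear below the threshold, compressed above.

    Above the threshold, a change of `ratio` dB at the input produces a
    1 dB change at the output (the definition of the compression ratio).
    """
    if input_db <= threshold_db:
        return input_db + gain_db
    return threshold_db + gain_db + (input_db - threshold_db) / ratio

def gain_db_at(input_db, **params):
    """Gain-Input view of the same rule (output level minus input level)."""
    return output_level_db(input_db, **params) - input_db

# With a 2:1 ratio, raising the input from 70 to 90 dB SPL (20 dB)
# raises the output by only 10 dB.
delta_out = output_level_db(90.0) - output_level_db(70.0)
```

Plotting `output_level_db` against the input level gives the Input-Output Diagram, while `gain_db_at` gives the Gain-Input Diagram of the same compressor.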
As far as the Dynamic-Range Compressor is concerned, it involves several engineering
tradeoffs. It is very important to realize that there is no unique best compressor design.

Each system involves tradeoffs between processing complexity, frequency resolution, time
delay and quantization noise. The most important processing concerns are the system
frequency resolution and the processing time delay. Most digital compression systems use
multiple frequency bands. For any given processing approach, increased frequency resolution
comes at the price of increased processing delay.
Interaction between static and dynamic aspects of compression
With incoming sounds, the attack/release times of a hearing aid interact with the compression ratio and these interactions affect the sound quality for the listener. Fast attack/release
times have the effect of temporarily reducing the ratio or amount of compression for any
given sound stimulus. In general, a combination of short attack/release times (e.g. 10 ms)
and high compression ratios (e.g. 10:1) causes distortion. If the same short attack/release
times are combined with low compression ratios (e.g. 2:1), then the sound quality is not
quite so compromised.
Dynamic and static aspects of compression are found in predictable combinations today.
Syllabic Compression, with its relatively short attack and release times, is most often
associated with Wide Dynamic Range Compression hearing aids that have a low compression
threshold (or kneepoint) and a low compression ratio of less than 5:1. It is also sometimes
encountered with Output Limiting Compression hearing aids in which thresholds and ratios
of compression are high.

1.3.3

Rationales for use of Compressors

The following section tries to outline several theoretical reasons why compressors should be
included in hearing aids:
Avoiding discomfort, distortion and damage
As the input to the hearing aid increases, its output cannot be allowed to keep on getting
bigger. There are two reasons why the maximum must be limited. Firstly, if excessively
intense signals are presented to the hearing aid wearer, the resulting loudness will cause
discomfort. Thus, this loudness discomfort level, which is subjective for each wearer, provides
an upper limit to the hearing aid SSPL. Secondly, excessively intense signals may cause
further damage to the aid wearer's residual hearing capability.
These two reasons explain why the maximum output must be limited, but this limiting
could be achieved with either peak clipping or compression limiting. The reason for preferring compression limiting over peak clipping in nearly all cases is that, although both create
distortion, the type of distortion created by
peak clipping is far more objectionable than the type created by compression limiting.
When compression limiting is used to control the SSPL of a hearing aid, a high compression ratio is needed so that the output SPL does not rise significantly for very intense input
levels. The attack time must be short so that gain decreases rapidly enough to prevent
loudness discomfort. As with all compressors, the release time must not be so short that
it starts distorting the waveform. If a hearing aid does not include a compression limiter,
peak clipping will occur once the input signal becomes sufficiently intense. If the hearing aid
contains Wide Dynamic Range Compression, the input level needed to cause peak clipping
may be so high that peak clipping seldom occurs.

Reducing intersyllabic and interphonemic intensity differences
The most intense speech sounds, like some vowels, are about 30 dB more intense than the
weakest sounds, like some unvoiced consonants. For people with reduced dynamic range,
even when the range is adequate to hear weak phonemes without intense ones being too loud, it
is likely for the weaker phonemes to be temporally masked by the stronger ones. A possible
solution is to include a fast-acting compressor; in such a case the compression is called
syllabic compression or phonemic compression.
A problem that might appear in fast compression is that it alters the intensity relationships between different phonemes and syllables. In some cases, the hearing aid wearer uses
the relative intensities of sounds to help identify them and thus, even if altering relative
intensities increases their audibility it decreases the intelligibility of some speech sounds.
Another possible problem is the effect of compression on brief weak sounds that follow
closely after sustained intense sounds. If the release time is longer than the gap between
the intense and the weak sound, then gain will still be decreased when the brief weak sound
arrives. Hence, such weak sounds will be less audible than they would be in case of linear
amplification.
The most severe problem of fast-acting compression is that if the gain is fast enough to
increase when a soft phoneme occurs, it will also be fast enough to increase during pauses between
words. This matters when there is background noise which is less intense than speech:
the compressor increases its gain during the noise and decreases it during speech. This
disadvantage has to be weighed against the advantages of fast-acting compression.
One important observation is that compressors intended to decrease the intensity differences between syllables must have compression thresholds low enough for the compression to
be active across a range of input levels, and compression ratios high
enough to significantly decrease the dynamic range while leaving some intensity differences intact. Attack and release times must be short
enough that the gain can vary appreciably from one syllable or phoneme to the next,
but not so short as to create a significant amount of distortion in the waveform.
Reducing differences in long-term level
As well as changing the intersyllabic relationships, the fast-acting compressor decreases
the mean level difference between soft and intense speech. An alternative use of
compression is to decrease the longer-term dynamic range without changing the intensity
relationships between syllables that follow each other closely in time. This is achieved by
using attack and release times longer than the typical duration of syllables.
Normalizing loudness
Normalizing the perception of loudness is possibly the most popular rationale for using
compression. It is known that sensorineural hearing loss greatly affects loudness perception.
The principle of loudness normalization is as follows: For any input level and frequency,
give the hearing aid the gain needed for the wearer to report the loudness to be the same
as that which a person with normal hearing would report. The required amount of gain at
each input level can be deduced from graphs that depict the loudness of sounds at different levels.
Loudness can be measured in several ways, but only subjectively.
The most common way of achieving loudness normalization is with separate compressors located in each channel of a multichannel hearing aid. Alternatively, a hearing aid
may contain only two channels and have a compressor in only the high-frequency channel.
However, it is possible to combine a compressor with a filter that alters its shape with
input level, so that even a single-channel hearing aid can have a level-dependent frequency
response.
Maximizing intelligibility
Multichannel Compression can be used to achieve in each frequency region the amount
of audibility that maximizes intelligibility, subject to some constraint about overall loudness. Although the overall loudness of broadband sounds may be well normalized, such an
approach will result in loudness not being normalized in any frequency region.
Reducing noise
The interfering effect of background noise is the single biggest problem faced by hearing aid
wearers. There are several assumptions made so that compression will decrease the effects
of noise:
• Noise usually has a greater low-frequency emphasis than does speech and thus, the
  low-frequency parts of speech are more likely to be masked and hence convey little
  information.
• Low-frequency parts of noise may cause upward spread of masking and so mask the
  high-frequency parts of speech.
• Low-frequency parts of noise contribute most to the loudness of noise.
• Noise is more of a problem in high-level environments than in low-level environments.
Consequently, if the low-frequency parts of the noise cause masking and excessive loudness and, at the same time, the low-frequency parts of speech do not convey useful information,
then an increase in comfort and an improvement in intelligibility can be achieved by decreasing
low-frequency gain in high-level environments. Hearing aids aimed at noise reduction have
often been marketed as Automatic Signal Processing Devices. An additional benefit of such
devices is that the aid wearer's own voice has a greater low-frequency emphasis and a greater
overall level at the hearing aid microphone than the voices of other people. Consequently,
low-frequency compression can help give the wearer's own voice a more acceptable tonal
quality than would occur with linear amplification.
Although the noise reduction discussed so far aims to minimize only low-frequency noise,
more advanced multichannel hearing aids can decrease noise or signal in any frequency region
where the SNR is estimated to be particularly poor. This type of hearing aid estimates the SNR
within each channel by taking advantage of the fluctuations in level that are characteristic
of speech, in comparison to the more constant level of background noise. In each channel,
the envelope is analyzed by a speech/non-speech detector, where higher-level parts are assumed
to represent the peaks of the speech signal and lower-level parts represent background noise.
The speech/non-speech detector combines these estimates of signal level and noise level
to estimate the SNR in each channel and thus, the appropriate gain for each channel is
calculated.
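The envelope-based estimate described above can be sketched as follows. This is a rough illustration only: taking the 90th and 10th percentiles of the channel envelope as the speech-peak and noise-floor estimates is an assumption made here, and the function name is hypothetical; actual detectors are more elaborate.

```python
def estimate_channel_snr_db(envelope_db, peak_fraction=0.9, valley_fraction=0.1):
    """Rough SNR estimate (dB) for one channel from its envelope in dB.

    Assumption: high envelope values are speech peaks and low values are
    the background noise floor; the percentile choices are illustrative.
    """
    ordered = sorted(envelope_db)
    last = len(ordered) - 1
    peak = ordered[round(peak_fraction * last)]
    valley = ordered[round(valley_fraction * last)]
    return peak - valley

# A steady envelope (noise only) versus a strongly fluctuating one.
steady = [60.0] * 100
fluctuating = [40.0, 70.0] * 50
snr_steady = estimate_channel_snr_db(steady)
snr_fluctuating = estimate_channel_snr_db(fluctuating)
```

A channel whose envelope barely fluctuates yields an estimate near 0 dB and would be a candidate for gain reduction, whereas a strongly modulated channel yields a high estimate and would keep its gain.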

Chapter 2

Approaches for Compression in Frequency-Domain

Current hearing aid devices employ a set of bandpass filters with different gains. The
number of filters in a given hearing aid may vary and their center frequencies are
most commonly set at 250 Hz, 500 Hz, 750 Hz, 1 kHz, 1.5 kHz, 2 kHz and 4 kHz, covering the
audible frequency range of the human ear [9]. The gains at these specified frequencies
are programmable and set according to the specific audiogram of a patient. We therefore understand that existing methods for compression in hearing aids are related to the
frequency domain, in which gain calculation takes place.
First, a standard clinical procedure is followed to measure the patient's audiogram,
which normally shows the hearing loss at specified frequencies (e.g. 250 Hz, 500 Hz, 1 kHz,
2 kHz and 4 kHz). After that, a target is generated depending upon how much gain the
patient needs in order to compensate for the hearing loss at the specified frequencies. Finally,
the target is programmed into the hearing aid, with which the patient is subjected to some
tests under different sound situations. Mapping of the audiogram to the target is a basic
process to preselect the required gains of the hearing aid for the user. It is thus desirable to
make this first gain setting as close to the real requirement as possible to save effort and time in
the subsequent process.

2.1

Multiband Compression and FFT Processing

Multi-channel dynamic-range compression is a basic part of digital hearing aids. The design of a digital compressor involves many considerations, including frequency resolution,
processing group delay, quantization noise, and algorithm complexity. A multichannel
compressor combines a filter bank with compression in each frequency band. In most
implementations, compressors operate independently in each channel but there are some
systems where compression gains can be grouped across adjacent bands. The compressor
output involves the response of each frequency band to the signal present in that band and
even some simple signals might cause complicated responses. The system output is finally
produced by adding compressed signals in each band as shown in Fig. 2.1:
Figure 2.1: Block diagram of a multi-channel compression system.

Through multiband compression, hearing aids separate the input signal into different
frequency bands and each subband signal goes through a different channel. Each channel has
its own compressor and the amount of compression is different at each frequency depending
on the patient's hearing loss or the input signal level. The amount of compression is greater for
higher compression ratios and lower compression thresholds. Furthermore, a disadvantage
of single-channel compression compared with multichannel compression is that in the former, when
gain is reduced, the signal is attenuated across all frequencies, while
multichannel compression avoids this problem.
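The difference between the two schemes can be made concrete with a small numerical sketch. It assumes the same simple static rule in every channel (30 dB linear gain, 50 dB SPL threshold, 2:1 ratio) and three illustrative band levels; all names and values are assumptions chosen only for illustration.

```python
import math

def compression_gain_db(level_db, linear_gain_db=30.0, threshold_db=50.0, ratio=2.0):
    """Gain (dB) of a static compressor driven by the given level."""
    excess = max(level_db - threshold_db, 0.0)
    return linear_gain_db - excess * (1.0 - 1.0 / ratio)

def overall_level_db(band_levels_db):
    """Broadband level obtained by summing the band powers."""
    return 10.0 * math.log10(sum(10.0 ** (lv / 10.0) for lv in band_levels_db))

def single_channel_gains(band_levels_db):
    """One compressor driven by the broadband level: same gain in all bands."""
    gain = compression_gain_db(overall_level_db(band_levels_db))
    return [gain] * len(band_levels_db)

def multichannel_gains(band_levels_db):
    """Independent compressor in each band."""
    return [compression_gain_db(lv) for lv in band_levels_db]

# Intense low-frequency band, moderate mid band, weak high-frequency band.
levels = [80.0, 60.0, 45.0]
single = single_channel_gains(levels)
multi = multichannel_gains(levels)
```

In the single-channel case the intense low-frequency band pulls the gain down for the weak high-frequency band as well, whereas in the multichannel case the weak band keeps its full linear gain.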
The Fast Fourier Transform (FFT) provides a convenient way of calculating the spectrum of a signal and extracting its frequency information. The usefulness of the FFT to digital
hearing aid processing is obvious given the fact that all hearing aid processing is dependent
on the frequency of the signal, such as increasing the compression ratio at high frequencies.
The healthy cochlea can be viewed as a biological Fourier Transform since it separates sound
into frequency regions along the basilar membrane, with high-frequency sounds vibrating
the basal end and low frequencies vibrating the apical end [10].
The FFT frequency analysis technique provides great sound quality; however, it poses
other challenges for effective implementation. The FFT technique is based on a uniform
spacing of frequency components while the auditory system is based on a logarithmic spacing. The human ear's ability to resolve sounds is best modeled by a system in which the
bandwidth of frequency analysis is nearly constant at low frequencies and increases in proportion to frequency at higher frequencies (auditory Bark scale). This is due to the logarithmic frequency
coding on the basilar membrane.
The hearing instrument channels can be matched to the auditory system channels, but this is
accomplished at the expense of processing efficiency. For example, in an FFT-based system,
the uniformly spaced bands can be combined to provide bandwidths similar to those of the auditory
system. This approach can provide an excellent representation of the auditory system; however,
it requires a high-resolution FFT to achieve the necessary low-frequency resolution. While
this approach can provide excellent sound quality, the required processing can delay the
output. If this processing delay is too long, the user may perceive it negatively
(e.g. as an echo).

2.2

Frequency-Domain Compression

Filter banks represent one approach to time-domain processing. The input sequence is
convolved with the filters one sample at a time and the resulting output sequence is formed
by summing the filter outputs. An alternative approach is to divide the signal into short
segments, transform each segment into the frequency domain using an FFT, compute the
compression gains from the computed input spectrum and apply them to the signal, and
then inverse transform to return to the time domain.

2.2.1

Describing Ideal and Practical FFT system

Ideal FFT system
In Figure 2.2, a block diagram of a frequency-domain compressor is shown, where the sampling
rate is 16 kHz and the FFT size is set to 128 samples [11]:

Figure 2.2: Block diagram of an ideal frequency-domain compression system using 128-point
FFT and sampling rate 16kHz.
Initially, the input fills a data buffer and is windowed and zero-padded. Using the overlap-add
method, the FFT of the block is calculated and the power spectrum is estimated at
a 125 Hz frequency spacing. The power estimates in the desired frequency bands are
computed from individual frequency bins at low frequencies and from combined frequency bins at
higher frequencies. In this way, an approximation to human auditory frequency analysis is
achieved. For each block of data, the power spectrum is thus computed and a sequence of
signal samples is produced at the block sampling rate.
Compressor gains in each band are computed for the FFT system and afterwards, the
FFT of the input signal is multiplied by the compressor gains to give the compressed signal
in the frequency domain. The compressed signal is finally inverse transformed to give the time
sequence and all sequences are combined using the overlap-add technique.
The frequency-domain compressor can be considered to be a filtering operation; the
spectrum of the input signal is multiplied by the spectrum of the compression filter to
give the spectrum of the compressed output signal. However, the compression filter is
designed in the frequency domain, so the length of its impulse response is not known and
can lead to temporal aliasing. Consequently, the length of the filter response must be chosen
appropriately so as to eliminate temporal aliasing.
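The processing chain described above can be sketched in a few lines. The sampling rate (16 kHz) and FFT size (128) follow the text; the 64-sample segment, the 50% overlap, the Hann window, the band layout, the threshold (in dB relative to an arbitrary reference) and all function names are assumptions made for illustration. The sketch does not limit the length of the compression filter, so the temporal aliasing discussed above is not eliminated here.

```python
import numpy as np

FS = 16000      # sampling rate from the text (Hz)
NFFT = 128      # FFT size from the text, i.e. 125 Hz bin spacing
SEG = 64        # assumed input segment length, zero-padded to NFFT
HOP = SEG // 2  # assumed 50% overlap between segments

# Assumed band layout: single bins up to 2 kHz, combined bins above.
EDGES = list(range(0, 17)) + [20, 24, 32, 40, 48, 65]

def band_gains(spectrum, threshold_db, ratio):
    """Linear gain per FFT bin, derived from the power in each band."""
    power = np.abs(spectrum) ** 2
    gains = np.ones(len(spectrum))
    for lo, hi in zip(EDGES[:-1], EDGES[1:]):
        level_db = 10.0 * np.log10(np.sum(power[lo:hi]) + 1e-12)
        excess = max(level_db - threshold_db, 0.0)
        gain_db = -excess * (1.0 - 1.0 / ratio)  # static compression rule
        gains[lo:hi] = 10.0 ** (gain_db / 20.0)
    return gains

def fft_compress(x, threshold_db=0.0, ratio=2.0):
    """Window, zero-pad, FFT, apply band gains, inverse FFT, overlap-add."""
    # Periodic Hann window: overlapping copies sum to one at 50% overlap.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(SEG) / SEG)
    y = np.zeros(len(x) + NFFT)
    for start in range(0, len(x) - SEG + 1, HOP):
        segment = x[start:start + SEG] * window
        spectrum = np.fft.rfft(segment, n=NFFT)  # zero-pads to NFFT
        spectrum *= band_gains(spectrum, threshold_db, ratio)
        y[start:start + NFFT] += np.fft.irfft(spectrum, n=NFFT)
    return y[:len(x)]

# A 1 kHz tone; with ratio 1 the chain should be transparent.
t = np.arange(1024) / FS
tone = np.sin(2.0 * np.pi * 1000.0 * t)
unchanged = fft_compress(tone, ratio=1.0)
compressed = fft_compress(tone, threshold_db=0.0, ratio=4.0)
```

With a ratio of 1:1 all gains are unity and the overlap-add sum reproduces the input away from the block edges, which is a convenient sanity check before any compression is applied.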
Practical FFT system
The FFT system with temporal aliasing eliminated requires a total of four FFTs: a forward
FFT for the input segment, an inverse FFT for the compression gains, a forward FFT for
the truncated compression impulse response, and an inverse FFT for the filtered segment.
A practical digital hearing aid, in general, will not have the signal-processing capability
to perform four FFTs. The DSP may not be fast enough, or the battery drain may be
too great. One solution to this problem is to provide circuitry on the DSP chip that is
dedicated to computing the FFT or to exploit special properties of the FFT and digital
filters to design a transform with a reduced operations count.
An additional solution is to compromise on the compression filter design to reduce the
number of FFTs needed. The shorter the impulse response, the smoother the frequency
response. Thus, smoothing the compression-gain frequency response is equivalent to an
approximate truncation of the impulse response. The smoothing does not produce an exact
truncation, so some residual temporal aliasing distortion is possible. A careful selection of
the input segment length, FFT size, and frequency-domain smoothing will result in temporal
aliasing distortion that cannot be perceived under most listening conditions.
Furthermore, the time delay of the FFT compressor depends on the size of the input
buffer and the size of the FFT. The FFT cannot be computed until the input buffer is filled,
so there is a processing delay while the input segment is accumulated. The compression
frequency response is also specified as a real number greater than zero in each frequency
band. A frequency response that is purely real has a corresponding impulse response that
is linear-phase. One way in which the delay can be adjusted is by changing
the size of the input segment and/or that of the FFT. A shorter input segment means that
the input buffer will be filled sooner, with a corresponding reduction in the overall delay.
However, a shorter input buffer means that the FFTs will have to be computed more often,
and the processing capacity of the DSP or the battery drain will need to be increased. The
other option is to use a smaller FFT. If the input buffer size is halved and the FFT size
halved, then the delay will also be halved without an increase in the computational or
power requirements. However, the frequency resolution for a smaller FFT is reduced.
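The tradeoff between delay and frequency resolution can be checked with simple arithmetic. The sketch below assumes the 16 kHz sampling rate used in this chapter; the 64-sample buffer size and the function names are illustrative assumptions.

```python
def bin_spacing_hz(fs_hz, fft_size):
    """Frequency spacing between adjacent FFT bins."""
    return fs_hz / fft_size

def buffer_delay_ms(fs_hz, buffer_size):
    """Time needed to accumulate one input buffer."""
    return 1000.0 * buffer_size / fs_hz

FS = 16000

# Assumed reference design: 64-sample buffer with a 128-point FFT.
reference = (buffer_delay_ms(FS, 64), bin_spacing_hz(FS, 128))

# Halving both sizes halves the buffering delay but doubles the bin spacing.
halved = (buffer_delay_ms(FS, 32), bin_spacing_hz(FS, 64))
```

Under these assumptions the buffering delay drops from 4 ms to 2 ms, while the bin spacing grows from 125 Hz to 250 Hz, which is the loss of frequency resolution mentioned above.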

2.2.2

Side-Branch Architecture

In Figure 2.3, we observe the block diagram of the side-branch compression architecture,
which has the advantage of combining the low quantization noise of an FIR filter bank with the
efficiency of spectral gain calculation using the FFT:

Figure 2.3: Side-branch compression architecture.


The side-branch system separates the input signal filtering from the frequency analysis and calculation of the compression gains. Increasing the number of taps in the FIR
filter allows for finer adjustment of the compressor frequency response at the expense of
increased processing complexity and system delay. Another way of viewing the side-branch
compressor is that it is an FFT system in which the compression filter is transformed into
the time domain, with the filtering then performed via time-domain convolution rather than
frequency-domain multiplication.

In this implementation, the input signal fills a K/2-sample buffer and the present K/2
samples are appended to the previous K/2 samples to give a total of K samples which
are then windowed to provide the input to the K-point FFT. The signal power spectrum is
computed from the FFT bins. Frequency bands are peak-detected and compressor gains are
computed from the peak-detector outputs. The compression gains are inverse transformed
to give the impulse response of the compression filter. Because the gains as a function of
frequency are real, the impulse response has even symmetry and yields a linear-phase filter.
The impulse response can be windowed if desired to smooth its frequency response. The
K/2 most-recent input samples are then convolved with the K-point FIR filter to produce
the final output.
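The step from compression gains to a linear-phase FIR filter can be sketched as follows. The sketch assumes that real gains are available for FFT bins 0 to K/2 (the peak detection and gain computation are omitted), uses a Hann window to smooth the response, and processes the whole signal with one filter instead of block by block; K = 16 and the function names are illustrative assumptions.

```python
import numpy as np

def compression_fir(bin_gains):
    """Turn real gains for bins 0..K/2 into a K-tap linear-phase FIR filter."""
    k = 2 * (len(bin_gains) - 1)
    # Real gains give a real impulse response with circular even symmetry.
    h = np.fft.irfft(bin_gains, n=k)
    # Rotate the peak to the centre so that the filter becomes causal.
    h = np.roll(h, k // 2)
    # Optional smoothing window (zero at n = 0, symmetric about n = K/2).
    h *= np.hanning(k + 1)[:k]
    return h

def apply_fir(x, h):
    """Time-domain convolution, as in the signal path of the side branch."""
    return np.convolve(x, h)[:len(x)]

# Flat gains should give a pure delay of K/2 samples.
flat = compression_fir(np.ones(9))
ramp = np.arange(32, dtype=float)
delayed = apply_fir(ramp, flat)

# Frequency-dependent gains still give a symmetric (linear-phase) filter.
shaped = compression_fir(np.array([1.0, 1.0, 1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.3]))
```

Because the filter is symmetric about its centre tap for any set of real gains, its delay stays at K/2 samples while the compressor changes the gains.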

2.3

Warped Compression System

Hearing losses are typically frequency-dependent, so the compressor is designed to provide different amounts of dynamic-range compression in different frequency regions. The
solution is a multichannel system, such as a filter bank with different degrees of compression in each channel. The design of a multichannel compressor involves a fundamental
tradeoff between frequency resolution and time delay.
For any given processing approach, increased frequency resolution comes at the price of
increased processing delay. Compared to conventional digital processing algorithms, the use
of digital frequency warping inherently gives frequency resolution on an auditory frequency
scale and reduces the amount of processing delay for a specified degree of low-frequency
resolution. The processing delay of a frequency-warped compressor, which is described in
a following section, is frequency-dependent, with greater delay at low frequencies than at
high frequencies. Consequently, the design of a frequency-warped compressor must take into account
the frequency resolution, the overall system processing delay and the delay variation
across frequency. The target is to design a compression system that avoids audible artifacts
caused by the system delay and has good frequency resolution on a critical-band frequency
scale [12].

2.3.1

Concerns in designing Compression Systems

Frequency Resolution
The main concern in designing a multichannel compressor is to match the system's frequency
resolution to that of the human auditory system. Digital frequency analysis typically provides constant-bandwidth frequency resolution. However, the resolution of the human auditory system is more accurately modeled by a filter bank having a nearly constant bandwidth
at low frequencies and a bandwidth proportional to frequency at higher frequencies. This mismatch between
digital and auditory analysis can be greatly reduced by replacing the conventional uniform
frequency analysis by a warped frequency analysis. Frequency warping uses a conformal
mapping so as to reallocate frequency samples close to the Bark frequency scale and is
described in more detail in a following section.
Overall Processing Delay
A second concern in designing a compression system for a hearing aid is the overall processing delay, which might cause coloration effects when the hearing-aid wearer is talking. When
talking, the talker's own voice reaches the cochlea with minimal delay via bone conduction
and through the hearing-aid vent. This signal interacts with the delayed and amplified signal produced by the hearing aid to produce a comb-filtered spectrum at the cochlea.
Delays as short as 3 to 6 ms that are constant across frequency are detectable, and overall
delays in the range of 15 to 20 ms can be judged as disturbing or objectionable.
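The comb-filter effect can be illustrated by adding a direct path to a delayed copy of itself. The sketch below is a simplified model: it assumes a single delayed path with a frequency-independent gain (equal to the direct path by default), and the 5 ms delay and function names are illustrative assumptions.

```python
import cmath
import math

def comb_magnitude(freq_hz, delay_ms, delayed_gain=1.0):
    """Magnitude of a direct path plus one delayed path at a given frequency."""
    tau = delay_ms / 1000.0
    return abs(1.0 + delayed_gain * cmath.exp(-2j * math.pi * freq_hz * tau))

def notch_frequencies(delay_ms, count=3):
    """First notch frequencies (Hz): odd multiples of 1/(2*tau)."""
    tau = delay_ms / 1000.0
    return [(2 * m + 1) / (2.0 * tau) for m in range(count)]

# With a 5 ms delay the notches are spaced 200 Hz apart.
notches = notch_frequencies(5.0)
depth_at_notch = comb_magnitude(100.0, 5.0)
height_at_peak = comb_magnitude(200.0, 5.0)
```

For a 5 ms delay the first notches fall near 100 Hz, 300 Hz and 500 Hz, so the notch spacing (200 Hz) is the inverse of the delay; longer delays place the notches closer together.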
The overall processing delay is due to several factors. Certain aspects of the overall system delay, such as the A/D and D/A converter delays, are not affected by signal processing
since they are fixed by the hardware. The total software processing delay is the sum of
the time required to fill the input buffer, the group delay inherent in frequency-domain or
time-domain filtering and the time needed to execute the code before the output signal is
available.
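The comb-filter effect described above can be illustrated with a minimal sketch. It assumes an idealized two-path model (a direct path plus one delayed path with a linear gain), which is a simplification and not the thesis's measurement setup; the function names are hypothetical.

```python
import numpy as np

def comb_notch_frequencies(delay_s, f_max_hz):
    """Notch frequencies of |1 + g*exp(-j*2*pi*f*delay)| up to f_max_hz.

    The direct and delayed paths cancel where the delayed path lags by
    half a cycle, i.e. at f = (2m + 1) / (2 * delay), m = 0, 1, 2, ...
    """
    first = 1.0 / (2.0 * delay_s)
    count = int(np.floor((f_max_hz / first - 1.0) / 2.0)) + 1
    return first * (2 * np.arange(max(count, 0)) + 1)

def combined_magnitude(f_hz, delay_s, gain=1.0):
    """Magnitude of the direct path plus a delayed path with linear gain."""
    return np.abs(1.0 + gain * np.exp(-2j * np.pi * f_hz * delay_s))

# Example: a 5 ms processing delay places notches 200 Hz apart.
notches = comb_notch_frequencies(0.005, 1000.0)
```

Under this model a longer delay packs the notches more densely, which is one way to see why longer delays are judged more disturbing.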

2.3.2

Digital Frequency Warping

Frequency warping uses a conformal mapping to give a non-uniform spacing of frequency
samples around the unit circle in the complex z-plane. It is achieved by replacing the unit
delays in a digital filter with first-order all-pass filters. The all-pass filter is given by:

$$A(z) = \frac{z^{-1} - a}{1 - a z^{-1}} \qquad (2.1)$$

where a is the warping parameter. The value of the warping parameter that gives the
closest fit to the Bark frequency scale is a = 0.5756 for a sampling rate of 16 kHz. For this
choice of parameters, the group delay exceeds one sample at low frequencies and is less than
one sample at high frequencies, as illustrated in Fig. 2.4.

Figure 2.4: Group delay in samples for a single all-pass filter having the warping parameter
a=0.5756 [12].
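The group-delay behaviour shown in Fig. 2.4 can be reproduced with a short sketch. It uses the standard closed-form group delay of a first-order all-pass section with a real pole; the function name is illustrative and not taken from the thesis.

```python
import numpy as np

def allpass_group_delay(omega, a):
    """Group delay in samples of A(z) = (z^-1 - a) / (1 - a z^-1), real |a| < 1."""
    return (1.0 - a**2) / (1.0 + a**2 - 2.0 * a * np.cos(omega))

a = 0.5756                                # warping parameter quoted for 16 kHz
omega = np.linspace(0.0, np.pi, 512)      # normalized frequency in rad/sample
tau = allpass_group_delay(omega, a)       # tau[0] is the delay at DC, tau[-1] at Nyquist
```

At DC the expression reduces to (1 + a)/(1 - a) and at the Nyquist frequency to (1 - a)/(1 + a), so for a > 0 the delay is above one sample at low frequencies and below one sample at high frequencies.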
The transfer function of the warped FIR filter is the weighted sum of the outputs of
each all-pass section:

$$W(z) = \sum_{k=0}^{K} b_k A^{k}(z) \qquad (2.2)$$

for a filter with K + 1 taps. Forcing the real filter coefficients $\{b_k\}$ to have even symmetry for
an unwarped FIR filter yields a linear-phase filter, in which the filter delay is independent
of the coefficients as long as the symmetry is preserved. This symmetry property guarantees
that no phase modification will occur as the compressor changes gain in response to the
incoming signal. In a binaural fitting (hearing aids on both ears), the coefficient symmetry
also ensures that identical amounts of group delay are introduced at the two ears, thus
preserving the interaural phase differences used for sound localization.

Figure 2.5: Block diagram of a compression system using frequency warping for both frequency analysis and filtered signal synthesis.
Frequency warping can be used to design both finite impulse response (FIR) and infinite
impulse response (IIR) filters. Improved frequency resolution in a conventional FIR filter
requires increasing the filter length, which leads to a further increase in group delay. Similarly, improved frequency resolution in a warped FIR filter requires an increase in the
number of all-pass filter sections, which also leads to a rise in filter delay. There is therefore
a trade-off between frequency resolution and group delay for both conventional and warped
filters, although the warped filter has less delay at low frequencies than a conventional filter
with the same low-frequency resolution.

2.3.3

Compressor using frequency warping

A dynamic-range compression system using warped frequency analysis is presented in
Fig. 2.5.
The compressor combines a warped FIR filter and a warped FFT. The input signal x(n)
is passed through a cascade of all-pass filters, with the output of the k-th all-pass stage
given by $p_k(n)$. The sequence of delayed samples $\{p_k(n)\}$ is then windowed and its FFT
is calculated. Because the data sequence is windowed, the spectrum is
smoothed in the warped frequency domain, giving smoothly overlapping frequency bands.
The result of the FFT is a spectrum sampled on a Bark frequency scale. The algorithm can be
implemented on a sample-by-sample basis or using block data processing, where
the compression gains are updated once per block (block processing is typically used) [12].
The compression gains are computed from the warped power spectrum and are purely real
numbers, so the inverse FFT that gives the warped time-domain filter results in a filter with real
and even-symmetric coefficients. The system output is finally calculated by convolving
the delayed samples with the compression gain filter:

$$y(n) = \sum_{k=0}^{K} g_k(n)\, p_k(n) \qquad (2.3)$$

where {gk (n)} are the compression filter coefficients.
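As a rough illustration of the warped tapped delay line and of Eq. (2.3), the following sketch runs a cascade of first-order all-pass sections sample by sample and forms the weighted sum of the stage outputs. It assumes fixed (time-invariant) coefficients and zero initial state, and it omits the windowing, FFT and gain computation; the function names are hypothetical.

```python
import numpy as np

def warped_delay_line(x, K, a):
    """Return p[n, k], the output of the k-th all-pass stage at time n (p[:, 0] = x)."""
    x = np.asarray(x, dtype=float)
    p = np.zeros((len(x), K + 1))
    prev = np.zeros(K + 1)              # stage outputs at time n - 1
    for n, xn in enumerate(x):
        cur = np.empty(K + 1)
        cur[0] = xn
        for k in range(1, K + 1):
            # A(z) = (z^-1 - a) / (1 - a z^-1) as a difference equation:
            # p_k(n) = a * (p_k(n-1) - p_{k-1}(n)) + p_{k-1}(n-1)
            cur[k] = a * (prev[k] - cur[k - 1]) + prev[k - 1]
        p[n] = cur
        prev = cur
    return p

def warped_fir_output(p, g):
    """Eq. (2.3) with fixed coefficients: y(n) = sum_k g_k p_k(n)."""
    return p @ np.asarray(g, dtype=float)
```

With a = 0 every all-pass section reduces to a unit delay, so the structure falls back to a conventional FIR filter; this is a convenient sanity check for an implementation.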


Comparing a conventional FIR system to a warped compression system of the same
length, the latter will require more computational resources because of the all-pass filters in
the tapped delay line. Nevertheless, in many cases the warped FIR filter is shorter than the
conventional FIR filter needed to achieve the same degree of auditory frequency resolution.

2.4

Warped Low-Delay Post Filter

We live in a noisy world. In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the
signal of interest is usually contaminated by background noise and reverberation. Therefore, the microphone signal has to be cleaned with digital signal processing tools before
it is played out, transmitted or stored. As a result, most digital hearing aids today are
equipped with speech enhancement systems.
By speech enhancement, we mean the improvement in intelligibility and/or quality of a
degraded speech signal which includes not only noise reduction but also dereverberation and
separation of independent signals. Speech enhancement is a very difficult problem because
the nature and characteristics of noise signals can change dramatically in time and the
performance measure can also be defined differently for each application. To measure the
performance, two perceptual criteria are widely used: quality (subjective) and intelligibility (objective). In general, it is not possible to improve both quality and intelligibility at
the same time and quality is usually improved at the expense of sacrificing intelligibility [13].

2.4.1

Warped Post Filter for Speech Enhancement

For the suppression of background noise, a noise reduction system has to achieve a high
quality for the enhanced speech without causing a significant signal delay. A high signal
delay can cause coloration effects while the hearing-aid user is talking. In such a case,
the talker's own voice reaches the cochlea with minimal delay (via bone conduction and
through the hearing-aid vent) and interacts with the delayed and amplified signal produced by
the hearing aid. This leads to perceptually annoying artifacts.
In order to achieve these two conflicting goals, the main focus is the development of a
post-filter in the considered noise reduction system sketched in Fig. 2.6 [14].
The calculation of the filter coefficients is done in the frequency domain, while the actual filtering is performed in the time domain. For the adaptation of the filter coefficients
in the frequency domain, the noisy input speech signal x(k) is transformed into the spectral domain by means of a frequency-warped Discrete Fourier Transform (DFT) analysis filter
bank, which is described in more detail in Chapter 3. It should be mentioned that a filter bank with non-uniform (approximately Bark-scaled) frequency resolution incorporates
a perceptual model of the human auditory system. Thus, fewer frequency
channels are needed than with a uniform filter bank.


Figure 2.6: Diagram of a warped post-filter for speech enhancement.


The M spectral coefficients $X_i(k')$ are calculated by the DFT polyphase network (PPN)
analysis filter bank with downsampling, and the calculation of the M spectral gain factors
$W_i(k')$ is done at decimated time instants:
 
$$k' = \left\lfloor \frac{k}{r} \right\rfloor \cdot r \qquad (2.4)$$
where r denotes the downsampling rate. The gains $W_i(k')$ are estimated by a common spectral
speech estimator (such as the Wiener filter) and are real and bounded by $W_{thres} < W_i(k') < 1$.
The lower limit of the gains $W_{thres}$ (noise floor) can be achieved by restricting the a
priori SNR to a lower threshold.
After calculating the spectral weighting coefficients, we obtain the (L + 1) time-domain
weighting coefficients $w_n(k')$ by the periodic Generalized Discrete Fourier Transform
(GDFT) of size M instead of the inverse DFT:

$$w_n(k') = \sum_{i=0}^{M-1} W_i(k')\, e^{-j\frac{2\pi}{M} i (n - n_0)} \qquad (2.5)$$

where the variable $n_0$ ensures coefficients with non-zero phase. The choice $n_0 = L/2$ (with
L even) yields coefficients with the linear-phase property:

$$w_n(k') = w_{L-n}(k') \qquad (2.6)$$

As a consequence, the FIR filter coefficients are obtained by the following relationship:

$$h_s(n, k') = h(n)\, w_n(k') \qquad (2.7)$$

and also have the linear-phase property. Here, h(n) denotes the real finite impulse response
of the prototype filter of the analysis filter bank. Finally, the coefficients of the warped
time-domain post-filter are $h_s(n, k')$ and its transfer function is $H_s(\tilde{z}, k')$, where $\tilde{z} = A(z)$
denotes the z-variable changed by the all-pass transformation. The actual filtering of the noisy
speech x(k) is done by this warped time-domain filter and the output signal $y(k) = \hat{s}(k)$ is
the denoised speech.
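The step from spectral gains to time-domain weights can be sketched numerically. The snippet below evaluates the GDFT sum of Eq. (2.5) directly, assuming the negative-exponent sign convention and $n_0 = L/2$; the function name and the example gain values are illustrative only.

```python
import numpy as np

def gdft_weights(W, L):
    """w_n = sum_i W_i exp(-j 2 pi i (n - n0) / M) for n = 0..L, with n0 = L/2."""
    W = np.asarray(W)
    M = len(W)
    n0 = L // 2                                   # L is assumed to be even
    n = np.arange(L + 1)
    i = np.arange(M)
    kernel = np.exp(-2j * np.pi * np.outer(n - n0, i) / M)
    return kernel @ W

# Real gains with the symmetry W_i = W_{M-i}, as obtained for a real input signal.
W = np.array([1.0, 0.8, 0.5, 0.3, 0.2, 0.3, 0.5, 0.8])
w = gdft_weights(W, L=16)                         # 17 weights, real and symmetric
```

For real, symmetric gains the weights come out real and satisfy $w_n = w_{L-n}$, which is the linear-phase property of Eq. (2.6); multiplying them element-wise with a symmetric prototype h(n) then gives a linear-phase $h_s(n, k')$ as in Eq. (2.7).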

2.4.2

Warped Low Delay Post Filter

A modification of the resulting time-domain post filter is needed to achieve a reduced
signal delay with almost no loss of subjective speech quality. In order to further reduce the
signal delay, the original post filter of degree L is approximated by an IIR filter of lower
degree P, as depicted in Fig. 2.7.

Figure 2.7: Diagram of a warped low-delay post-filter for speech enhancement.


Specifically, this IIR filter is an all-pole or Auto-Regressive (AR) filter since, firstly, it has the
minimum-phase property and hence is always stable and, secondly, it can achieve a very
low signal delay of only a few samples. This low-delay post filter decreases the
delay and the computational complexity (due to its lower degree) without a noticeable decrease
in the perceived speech quality. The approximation of the original filter by an AR filter is
done by methods taken from Parametric Spectrum Analysis and is described in Chapter 4,
where a comprehensive concept for the low-delay post filter is developed.
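To give a feel for what such an approximation involves, here is a minimal sketch of one standard technique from parametric spectrum analysis, the autocorrelation (Yule-Walker) method. This is an assumption made for illustration: the procedure actually used in Chapter 4 is not shown in this section, and the function name is hypothetical.

```python
import numpy as np

def ar_approximation(h, P):
    """Fit gain / (1 + a_1 z^-1 + ... + a_P z^-P) to the impulse response h."""
    h = np.asarray(h, dtype=float)
    # Autocorrelation lags r[0..P] of the impulse response.
    r = np.array([np.dot(h[: len(h) - k], h[k:]) for k in range(P + 1)])
    idx = np.arange(P)
    R = r[np.abs(idx[:, None] - idx[None, :])]    # P x P symmetric Toeplitz matrix
    a = np.linalg.solve(R, -r[1:])                # Yule-Walker normal equations
    gain = np.sqrt(r[0] + np.dot(a, r[1:]))       # prediction-error energy
    return gain, np.concatenate(([1.0], a))

# Example: a long FIR response that decays like 0.5**n is captured by one pole.
gain, a = ar_approximation(0.5 ** np.arange(200), P=2)
```

The autocorrelation method is known to return a minimum-phase (and therefore stable) denominator, which matches the stability argument given above.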

Chapter 3

Filter Banks and Prototype Filter Structures

3.1

Multirate Systems and Filter Banks

If a sequence x(n) is bandlimited, then it is possible to decimate it by use of appropriate
multirate techniques. The desire to reduce the sampling rate whenever possible is
understandable since it usually reduces the processing as well as the storage requirements.
However, if x(n) is not bandlimited but has most of its energy in the low-frequency region,
some kind of data rate reduction is still feasible even though the signal cannot be decimated without
aliasing. This is achieved by a technique called subband decomposition, in which the
average number of bits per sample is reduced while the average number of samples per unit
time is unchanged. Subband decomposition is implemented using filter banks [15].
A digital filter bank is a collection of digital filters having a common input or a common
output, as shown in Fig. 3.1.
The left system is called an analysis bank and its filters $H_k(z)$ are the analysis filters.
An input signal is split into M signals $x_k(n)$, typically called subband signals. The
system on the right part of the figure is a synthesis bank and $G_k(z)$ are the synthesis filters,
which combine the M subband signals into a single signal $\hat{x}(n)$. The frequency responses can
be marginally overlapping, non-overlapping or very much overlapping, depending on the
application, as depicted in Fig. 3.2.

Figure 3.1: Block Diagram of an M-band analysis-synthesis filter bank.


32

CHAPTER 3. FILTER BANKS AND PROTOTYPE FILTER STRUCTURES

33

Figure 3.2: Typical filter responses of digital filter banks.


Digital filter banks are an integral part of many speech and audio processing algorithms
used in today's communication systems, such as digital hearing aids. They are usually
employed for adaptive subband filtering, for example to perform multichannel dynamic-range compression in digital hearing aids or acoustic echo cancellation. Another common
task is speech enhancement by noise reduction. It should be noted that the choice of
the filter bank affects the system's performance in terms of signal quality, computational
complexity and signal delay.
Due to the requirements of restricted computational power (restricted battery capacity
and small size of the chip set) and low overall processing delay, the algorithmic signal
delay of the filter bank used for signal enhancement must be considerably lower than the
tolerable processing delay (i.e. the latency between the analog input and output signal of
the system). Furthermore, a filter bank with non-uniform time-frequency resolution, similar
to that of the human auditory system, is desirable to perform multichannel dynamic-range
compression and noise reduction with a lower number of frequency bands.

3.2

Uniform and Non-Uniform DFT Filter Banks

A common choice for many applications is the uniform DFT analysis-synthesis filter bank
(AS FB). The DFT filter bank is based on the M × M DFT matrix $\mathbf{W}$ that has elements
$[\mathbf{W}]_{km} = W^{km}$, where $W = e^{-j2\pi/M}$ and k, m indicate the row and the column respectively
(see Appendix A). Figure 3.3 shows a simple example of a uniform DFT filter
bank [15].

Figure 3.3: The simplest example of a uniform-DFT filter bank.

The input sequence x(n) passes through a delay chain and M sequences $s_i(n)$ are generated so that $s_i(n) = x(n - i)$. The matrix $\mathbf{W}^{*}$ represents the conjugate of $\mathbf{W}$ and the
outputs of the analysis filters are

$$x_k(n) = \sum_{i=0}^{M-1} s_i(n)\, W^{-ki}. \qquad (3.1)$$

As a result, for every value of time n we compute the set of M signals $x_k(n)$ from the
set of M signals $s_i(n)$. In the z-domain, one can write:

$$X_k(z) = \sum_{i=0}^{M-1} S_i(z)\, W^{-ki} = \sum_{i=0}^{M-1} X(z)\, z^{-i}\, W^{-ki} = \sum_{i=0}^{M-1} (zW^{k})^{-i}\, X(z). \qquad (3.2)$$

We can thus write:

$$X_k(z) = H_k(z)\, X(z) \qquad (3.3)$$

where

$$H_k(z) \triangleq H_0(zW^{k}) \qquad (3.4)$$

and

$$H_0(z) = 1 + z^{-1} + \ldots + z^{-(M-1)} \qquad (3.5)$$

A filter bank in which the filters are related as in Equation (3.4) is called a uniform DFT
filter bank.
Summarizing, the system presented is equivalent to an analysis bank with analysis filters
$H_k(z)$ with frequency response:

$$H_k(e^{j\omega}) = H_0\!\left(e^{j(\omega - \frac{2\pi k}{M})}\right) \qquad (3.6)$$

which is a shifted version of a single filter $H_0(e^{j\omega})$ which we call the prototype filter. Consequently, we have a bank of M filters which are uniformly shifted versions of $H_0(z)$ with a
large amount of overlap in their frequency responses.
Lastly, we can think of the uniform DFT filter bank as a spectrum analyzer. The k-th output $x_k(n)$ is the spectrum computed based on the most recent M samples of the
input sequence x(n). Since $x_k(n)$ is the output of $H_k(e^{j\omega})$, it dominantly represents the
portion of $X(e^{j\omega})$ around the region $\omega = 2\pi k/M$. The resolution of the spectrum analyzer
can be improved by increasing M. This operation is interpreted by the sliding window
mechanism, where a sequence x(n) is multiplied by a window in order to reduce the sidelobe
level of the frequency response. This is equivalent to inserting window multipliers, one for each branch i, just after the
delay chain. As time advances, the window slides past the data, computing the new M DFT
coefficients, and thus it helps to localize the time-domain data before computation of the
Fourier transform.
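The equivalence between the delay-chain structure of Eq. (3.1) and the bank of modulated filters $H_k(z) = H_0(zW^k)$ can be checked numerically. The sketch below assumes zero initial state in the delay chain; the function names are illustrative.

```python
import numpy as np

def uniform_dft_bank(x, M):
    """x_k(n) = sum_i s_i(n) W^{-ki}, with s_i(n) = x(n - i) and W = exp(-j 2 pi / M)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    s = np.zeros((M, N), dtype=complex)
    for i in range(M):
        s[i, i:] = x[: N - i]                         # delay chain, zero initial state
    k = np.arange(M)
    W_conj = np.exp(2j * np.pi * np.outer(k, k) / M)  # matrix with elements W^{-ki}
    return W_conj @ s                                 # row k holds x_k(n)

def analysis_filter(k, M):
    """Impulse response of H_k(z) = H_0(z W^k): h_k(i) = W^{-ki}, i = 0..M-1."""
    return np.exp(2j * np.pi * k * np.arange(M) / M)
```

Row k of the result should match the input convolved with `analysis_filter(k, M)`, which is the statement of Eqs. (3.3) to (3.5) in the time domain.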

3.3

Polyphase Representation of Filter Banks

3.3.1

Basic concept of Polyphase Decomposition

The polyphase representation can be used to efficiently implement the uniform DFT filter
bank mentioned above [15]. To explain the basic concept, consider a filter

$$H(z) = \sum_{n=-\infty}^{\infty} h(n)\, z^{-n}$$

Separating the even-numbered coefficients from the odd-numbered ones, its transfer function
is written:

$$H(z) = \sum_{n=-\infty}^{\infty} h(2n)\, z^{-2n} + z^{-1} \sum_{n=-\infty}^{\infty} h(2n+1)\, z^{-2n} \qquad (3.7)$$

$$\phantom{H(z)} = E_0(z^2) + z^{-1} E_1(z^2) \qquad (3.8)$$

where

$$E_0(z) = \sum_{n=-\infty}^{\infty} h(2n)\, z^{-n} \qquad (3.9)$$

and

$$E_1(z) = \sum_{n=-\infty}^{\infty} h(2n+1)\, z^{-n} \qquad (3.10)$$
We should note that these representations hold whether H(z) is FIR or IIR, causal or
noncausal. Extending this idea, for any given integer M we can always decompose
H(z) as:

$$H(z) = \sum_{n=-\infty}^{\infty} h(nM)\, z^{-nM} + z^{-1} \sum_{n=-\infty}^{\infty} h(nM+1)\, z^{-nM} + \ldots + z^{-(M-1)} \sum_{n=-\infty}^{\infty} h(nM+M-1)\, z^{-nM} \qquad (3.11)$$

which is compactly written as:

$$H(z) = \sum_{l=0}^{M-1} z^{-l} E_l(z^{M}) \qquad \text{(Type 1 Polyphase)} \qquad (3.12)$$

The above equation is called the Type 1 Polyphase Representation and $E_l(z^{M})$ are the polyphase
components of H(z), which depend on the choice of M:

$$E_l(z) = \sum_{n=-\infty}^{\infty} e_l(n)\, z^{-n} \quad \text{and} \quad e_l(n) \triangleq h(Mn + l), \quad 0 \le l \le M-1 \qquad (3.13)$$
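The Type 1 decomposition can be made concrete with a short sketch that splits an FIR impulse response into its M polyphase components and uses them to compute a decimated filter output. It assumes a causal FIR filter and zero initial state; the function names are hypothetical.

```python
import numpy as np

def polyphase_components(h, M):
    """Type 1 components e_l(n) = h(Mn + l), l = 0..M-1 (h zero-padded to a multiple of M)."""
    h = np.asarray(h, dtype=float)
    pad = (-len(h)) % M
    h = np.concatenate((h, np.zeros(pad)))
    return h.reshape(-1, M).T                # row l holds e_l(n)

def decimate_polyphase(x, h, M):
    """Filter x with h and keep every M-th output sample, using the polyphase branches."""
    x = np.asarray(x, dtype=float)
    E = polyphase_components(h, M)
    n_out = (len(x) + M - 1) // M
    y = np.zeros(n_out)
    for l in range(M):
        # Branch input x(Mm - l): x delayed by l samples, then downsampled by M.
        xl = np.concatenate((np.zeros(l), x))[::M][:n_out]
        y += np.convolve(xl, E[l])[:n_out]
    return y
```

Each branch filter runs at the low rate, which is where the computational saving of the polyphase form comes from; the result should equal filtering at the full rate and then discarding all but every M-th sample.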


Figure 3.4: Schematic of the relation between h(n) and its l-th polyphase component.

Figure 3.5: A prototype lowpass response of an M-th band filter.


The relation between h(n) and its l-th polyphase component is sketched in Fig. 3.4.
A variation of (3.12) is given by:

$$H(z) = \sum_{l=0}^{M-1} z^{-(M-1-l)} R_l(z^{M}) \qquad \text{(Type 2 Polyphase)} \qquad (3.14)$$

where $R_l(z)$ are permutations of $E_l(z)$:

$$R_l(z) = E_{M-1-l}(z) \qquad (3.15)$$

3.3.2

Why the name Polyphase Decomposition?

We assume an M-th band filter with a response as shown in Fig. 3.5.
We know that the impulse response $e_l(n)$ of the polyphase component $E_l(z)$ is obtained
by decimating h(n + l), which means that the polyphase component $E_l(e^{j\omega})$ is an aliased
version of $e^{j\omega l} H(e^{j\omega})$. In the summation:

$$H(z) = \sum_{l=0}^{M-1} z^{-l} E_l(z^{M}) \qquad (3.16)$$

there are M terms, each with magnitude 1/M, which add up to approximately unity. So,
the M terms $z^{-l} E_l(z^{M})$ are almost in phase. But for an M-th band filter, $E_0(z)$ is constant,
which leads to the conclusion that the phase responses of $z^{-l} E_l(z^{M})$ are nearly zero in the
passband. In other words, $E_l(e^{j\omega})$ tries to approximate $e^{j\omega l/M}$ and therefore the phase
response $\phi_l(\omega)$ of the l-th polyphase component tries to approximate $\omega l/M$ for each l.
This is the motivation for the use of the term polyphase decomposition, and the main
reason for its use is its computational efficiency in Multirate Signal Processing.

3.3.3

Polyphase Implementation of Uniform DFT filter banks

A set of M filters is said to be a uniform-DFT filter bank if they are related according to
the relation:

$$H_k(z) \triangleq H_0(zW^{k}) \qquad (3.17)$$

The prototype filter $H_0(z)$ can be expressed as:

$$H_0(z) = \sum_{l=0}^{M-1} z^{-l} E_l(z^{M}) \qquad (3.18)$$

based on its Type 1 Polyphase Representation. Hence, the k-th filter can be expressed as:

$$H_k(z) = H_0(zW^{k}) = \sum_{l=0}^{M-1} (z^{-1} W^{-k})^{l}\, E_l(z^{M}) \qquad (3.19)$$

since $(zW^{k})^{M} = z^{M}$. We obtain the output of $H_k(z)$:

$$X_k(z) = \sum_{l=0}^{M-1} W^{-kl} \left( z^{-l} E_l(z^{M})\, X(z) \right) \qquad (3.20)$$

as shown in Fig. 3.6.

Figure 3.6: Implementation of the uniform DFT bank using polyphase decomposition.


If $H_0(z)$ is an FIR filter of order N, (N + 1) multiplications and (N − M + 1) additions are
required to implement the polyphase components. The presence of $E_l(z)$ permits the use
of a prototype $H_0(z)$ with a length greater than M (L + 1 > M). As a consequence, the
prototype and hence all M filters can have a sharper cutoff and higher stopband attenuation.
By using the Noble identities and considering filters $D_i(z)$ in place of $E_i(z)$, the polyphase
uniform-DFT filter bank structure of Figure 3.6 with decimators is redrawn in Figure
3.7.
This structure is even more efficient since it requires M times fewer multiplications and additions per unit time (MPUs and APUs)
than the structure shown in Figure 3.6. In our case, a warped DFT analysis filter bank
with downsampling is efficiently implemented using polyphase decomposition.
A common approach to realize algorithms for digital signal processing that are based
on spectral-domain filtering is to employ a DFT analysis-synthesis filter bank (AS FB) as
illustrated in Figure 3.8.
This filter bank is efficiently realized by a polyphase network with the DFT calculated
by the FFT. The subband filters have transfer functions:
Hi (z) =

L
X
l=0

i
h(l)EM
(z l )z l

(3.21)


Figure 3.7: Polyphase decomposition of the uniform DFT filter bank with decimation by a
factor of M.
and
Gi (z) =

L
X

i
g(l)EM
(z l+1 )z l

i = 0, 1, . . . , M 1

(3.22)

l=0

where h(n) and g(n) denote the finite impulse responses of the analysis and synthesis
prototype filters, respectively. A common choice for the FIR filter degree is L = M − 1, but a
higher degree can also be taken to increase the frequency selectivity of the subband filters.
In order to design a warped DFT analysis filter bank, which is the case in our implementation, we replace all the delay elements in Figure 3.7 with all-pass filters. The all-pass transformation makes it possible to design a non-uniform filter bank whose frequency bands approximate the
Bark frequency scale with great accuracy. Since a filter bank with perfect signal reconstruction is assumed, the aliasing distortions due to the subsampling operations
(the AS filter bank is a linear periodically time-variant system) can be reduced by lowering the subsampling rate R and
by using subband filters of higher degree with narrow transition bands and high stopband
attenuation.

3.4

Efficient Non-Uniform Filter Bank Equalizer

An alternative filter bank concept to that of the conventional AS FB sketched in Figure
3.7 was proposed by Löllmann and Vary to apply adaptive subband processing with a
significantly lower signal delay in digital hearing aids [16].
The whole system of subband processing and multiplication with the gains in the time domain is illustrated in Figure 3.9.
Instead of using synthesis filters, we assume that the output y(n) is taken every $n_0 = L/2$
samples such that the whole system after the prototype filter stage is:

W0 0
0
W0
..
2


..

W1 ejn0 M
0 W1 . . .
H .
.

F 1 = F
F
(3.23)
= FW
..
..

.
.

.
.
.
..
.
.
.
0
.
2
WM 1 ej(M 1)n0 M
0
0 WM 1
0


Figure 3.8: Polyphase network (PPN) realization of a DFT analysis-synthesis filter bank
for a prototype filter of length L + 1 = 2M.
where $W_n$ are the spectral-domain coefficients adapted at a reduced sampling rate. As
a result, the new coefficients $w_n'$ of Fig. 3.9 are obtained by applying the DFT to the data vector
$\mathbf{W}$:

$$w_n' = \sum_{k=0}^{M-1} \left( W_k\, e^{j n_0 \frac{2\pi}{M} k} \right) e^{-j\frac{2\pi}{M} k n} = \sum_{k=0}^{M-1} W_k\, e^{-j\frac{2\pi}{M} k (n - n_0)} \qquad (3.24)$$

Since samples x(k) result in x(kM) after downsampling by a factor M, these downsampled versions x(kM) pass through the subband filters $E_l(z)$, each of which has two coefficients in the case of a
prototype of length L + 1 = 2M. More specifically, the low-delay filter
bank of Figure 3.10 will be considered [17].
This filter bank has M subbands and the impulse response $h_i(n)$ of the i-th subband
filter is obtained by a modulation of the prototype lowpass filter with real impulse response
h(n) of length L + 1 ≥ M according to:

$$h_i(n) = \begin{cases} h(n)\,\Phi(i,n), & i = 0, 1, \ldots, M-1;\ n = 0, 1, \ldots, L \\ 0, & \text{else.} \end{cases} \qquad (3.25)$$
The choice of the general modulation sequence $\Phi(i,n)$ and of the prototype filter affects the
spectral selectivity and time-frequency resolution of the filter bank. $\Phi(i,n)$ can also be
regarded as the transformation kernel of the filter bank and it is periodic so that:

$$\Phi(i, n + mM) = \Phi(i,n)\,\rho(m), \qquad m \in \mathbb{Z} \qquad (3.26)$$

where the sequence $\rho(m)$ depends on the chosen transform; for many transforms (including the DFT), it is given by $\rho(m) = 1\ \forall\, m$.

Figure 3.9: System of filter-bank equalizer.

Figure 3.10: Filter-bank summation method for time-varying spectral gain factors $W_i(k')$
adapted at a reduced sampling rate.

As shown in Figure 3.10, the input-output relation for the filter bank is given in the
z-domain as follows:

$$Y(z) = \sum_{i=0}^{M-1} W_i \left( \sum_{n=0}^{L} h_i(n)\, z^{-n} \right) X(z) \qquad (3.27)$$

Hence, the overall transfer function is obtained by inserting Equation (3.25) for $h_i(n)$
into Equation (3.27). This yields:

$$F_0(z) = \sum_{n=0}^{L} h(n) \sum_{i=0}^{M-1} W_i\, \Phi(i,n)\, z^{-n} \qquad (3.28)$$

where

$$w_n' = \sum_{i=0}^{M-1} W_i\, \Phi(i,n) \qquad (3.29)$$

are the time-domain weighting factors evaluated by a spectral transformation (i.e. the GDFT) of
the gain factors. As a result, the filter bank equalizer is a single filter:

$$F_0(z) = \sum_{n=0}^{L} h_s(n)\, z^{-n} \qquad (3.30)$$

whose impulse response $h_s(n)$ is the product of the impulse response h(n) of the prototype
filter and the weighting factors $w_n$ adapted in the spectral domain:

$$h_s(n, k') = h(n)\, w_n(k'), \qquad n = 0, 1, \ldots, L \qquad (3.31)$$
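The collapse of the filter-bank summation into a single time-domain filter can be verified with a small numerical sketch. It assumes the GDFT kernel $\Phi(i,n) = e^{-j 2\pi i (n - n_0)/M}$ with $n_0 = L/2$ and fixed gains $W_i$; the function names are illustrative and the time variation of the gains is ignored.

```python
import numpy as np

def gdft_kernel(M, L):
    """Phi(i, n) = exp(-j 2 pi i (n - n0) / M) for i = 0..M-1, n = 0..L, n0 = L/2."""
    n0 = L // 2
    i = np.arange(M)[:, None]
    n = np.arange(L + 1)[None, :]
    return np.exp(-2j * np.pi * i * (n - n0) / M)

def fbe_filter(h, W):
    """Single equivalent filter h_s(n) = h(n) w_n, with w_n = sum_i W_i Phi(i, n)."""
    h = np.asarray(h, dtype=float)
    Phi = gdft_kernel(len(W), len(h) - 1)
    return h * (np.asarray(W) @ Phi)

def filter_bank_summation(x, h, W):
    """Reference: filter x with every h_i(n) = h(n) Phi(i, n), weight by W_i and sum."""
    h = np.asarray(h, dtype=float)
    Phi = gdft_kernel(len(W), len(h) - 1)
    return sum(Wi * np.convolve(x, h * Phi[i]) for i, Wi in enumerate(W))
```

Because every step is linear, convolving the input once with `fbe_filter(h, W)` should give the same output as running all M subband filters and summing, which is the content of Eqs. (3.27) to (3.31).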

It has already been mentioned that the calculation of the spectral gain factors $W(k')$
can be done by a common spectral speech estimator and that the GDFT of size M provides
(L + 1) weighting factors $w_n(k')$ with non-zero phase. The name Filter Bank
Equalizer points out that this kind of time-domain filtering has been developed
from the low-delay filter bank, which can be regarded as a filter bank used as an equalizer.
Furthermore, to obtain the so-called non-uniform Filter Bank Equalizer, one should
employ digital frequency warping by means of an all-pass transformation, where the delay
elements of the discrete subband filters are replaced by all-pass filters:

$$z^{-1} \rightarrow H_A(z) \qquad (3.32)$$

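An all-pass filter of first order with pole $\alpha = \rho e^{j\gamma}$ has the frequency response

$$H_A(e^{j\omega}) = \frac{1 - \alpha^{*} e^{j\omega}}{e^{j\omega} - \alpha} = e^{-j\varphi_\alpha(\omega)} \qquad (3.33)$$

and its phase response $\varphi_\alpha(\omega)$ is given by:

$$\varphi_\alpha(\omega) = -\omega + 2 \arctan\!\left( \frac{\sin\omega - \rho\sin\gamma}{\cos\omega - \rho\cos\gamma} \right) \qquad (3.34)$$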

The frequency response of the all-pass transformed filter bank equalizer (APT FBE) is
derived from Equation (3.30) by using Equations (3.32) and (3.33):

$$F(e^{j\omega}) = \sum_{n=0}^{L} h(n)\, w_n\, e^{-j n \varphi_\alpha(\omega)} \qquad (3.35)$$

Thus, the all-pass transformation causes a frequency mapping $\omega \rightarrow \varphi_\alpha(\omega)$. This frequency
mapping (warping) is solely determined by the all-pass pole $\alpha$ according to Equation (3.34).
The uniform FBE with transfer function $F_0(z)$ is included as the special case $\alpha = 0$ since
then $H_A(z) = 1/z$.
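The frequency mapping can be explored numerically with the sketch below. It assumes the all-pass form $H_A(z) = (1 - \alpha^{*} z)/(z - \alpha)$ with pole $\alpha = \rho e^{j\gamma}$ and uses the four-quadrant arctangent so that the phase stays continuous on $[0, \pi]$ for a real pole; the function name is hypothetical.

```python
import numpy as np

def allpass_phase(omega, alpha):
    """phi_alpha(omega) such that H_A(e^{j omega}) = exp(-j phi_alpha(omega))."""
    rho, gamma = np.abs(alpha), np.angle(alpha)
    return -omega + 2.0 * np.arctan2(np.sin(omega) - rho * np.sin(gamma),
                                     np.cos(omega) - rho * np.cos(gamma))

omega = np.linspace(0.0, np.pi, 101)
phi = allpass_phase(omega, 0.5)          # warped frequency axis for a real pole at 0.5
```

For a real pole between 0 and 1 the mapping runs monotonically from 0 to $\pi$ and rises faster than $\omega$ at low frequencies, so more of the warped axis is spent on the low-frequency region; with $\alpha = 0$ the mapping is the identity.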
On the one hand, the FBE needs more multiplications than the corresponding AS FB due
to the time-domain filtering at the full sampling rate. On the other hand, the computation of
the gain factors in the spectral domain is decoupled from the actual filtering in the time domain. Therefore, no aliasing effects occur and, since no synthesis filter bank is needed,
the signal delay is reduced [18].

3.5

Prototype Filter Design

Signal reconstruction, which is the main target of many signal processing schemes, is
generally accomplished by the synthesis filter bank, which consists of upsampling operations and interpolating bandpass filters. A filter bank achieves perfect signal reconstruction
with a delay of $d_0$ samples if:

$$v(n) = y(n - d_0) \qquad (3.36)$$

for spectral gains $W_i = 1\ \forall\, i$.
The objective of the prototype lowpass filter design is to achieve perfect reconstruction,
and the FBE generally meets this condition as long as the following two requirements are
fulfilled:
1) the general modulation sequence has the property:

$$\sum_{i=0}^{M-1} \Phi(i,n) = \begin{cases} C, & C \neq 0;\ n = n_0 \\ 0, & n \neq n_0 \end{cases} \qquad n, n_0 \in \{0, 1, \ldots, M-1\} \qquad (3.37)$$

which is a condition met by the transformation kernel of the GDFT.
2) the prototype lowpass filter is a generalized M-th band filter with impulse response

$$h(n) = \begin{cases} \dfrac{1}{C\,\rho(m_c)}; & n = n_0 + m_c M,\ \rho(m_c) \neq 0,\ m_c \in \mathbb{Z} \\ 0; & n = n_0 + mM,\ m \in \mathbb{Z}\setminus\{m_c\} \\ \text{arbitrary}; & \text{else} \end{cases} \qquad (3.38)$$

[17]. A suitable M-th band filter is given by:

$$h(n) = \frac{1}{C\,\rho(m_c)}\, \frac{\sin\!\left(\frac{2\pi}{M}(n - d_0)\right)}{\frac{2\pi}{M}(n - d_0)}\, \mathrm{win}_L(n) \qquad (3.39)$$

where a commonly used window sequence is defined by:

$$\mathrm{win}_L(n) = \begin{cases} \beta + (\beta - 1)\cos\!\left(\frac{2\pi}{L} n\right); & 0 \le n \le L;\ 0.5 \le \beta \le 1 \\ 0; & \text{else} \end{cases} \qquad (3.40)$$

A rectangular window achieves a least-squares approximation error, but other window sequences such as the Kaiser window, the Hann window ($\beta = 0.5$) and the Hamming window
($\beta = 0.54$) are often preferred to influence properties of the filter such as the transition bandwidth or the sidelobe attenuation.
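A windowed-sinc prototype of this kind can be sketched in a few lines. The example assumes $d_0 = n_0 = L/2$ with L even, and lumps the factor $1/(C\rho(m_c))$ into a hypothetical `scale` argument set to 1; these choices are illustrative and not prescribed by the text.

```python
import numpy as np

def window(L, beta):
    """win_L(n) = beta + (beta - 1) cos(2 pi n / L) for n = 0..L, as in Eq. (3.40)."""
    n = np.arange(L + 1)
    return beta + (beta - 1.0) * np.cos(2.0 * np.pi * n / L)

def mth_band_prototype(M, L, beta=0.54, scale=1.0):
    """Windowed sinc of Eq. (3.39) with d0 = L/2; `scale` stands in for 1 / (C rho(m_c))."""
    d0 = L / 2.0
    n = np.arange(L + 1)
    # np.sinc(t) = sin(pi t) / (pi t), so np.sinc(2 (n - d0) / M) equals
    # sin(2 pi (n - d0) / M) / (2 pi (n - d0) / M), with the value 1 at n = d0.
    return scale * np.sinc(2.0 * (n - d0) / M) * window(L, beta)

h = mth_band_prototype(M=8, L=32)       # Hamming-type window, 33 coefficients
```

The resulting impulse response is symmetric about n = L/2, equals `scale` at the centre tap and vanishes at the taps $n_0 + mM$ for $m \neq 0$, which is the M-th band property required by Eq. (3.38).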

3.5.1

Different Parameters of prototype filter

The polyphase network (PPN) implementation of the FBE described above is shown in
Figure 3.11.

Figure 3.11: Polyphase network implementation of the FBE for the direct-form filter.

The transposed direct form of the filter is derived from its direct-form representation
by transposition of the signal flow graph, following two rules: 1) branch nodes as well
as the system input and output are interchanged; 2) all signal directions are reversed.
Delay elements might be inserted in each branch of the time-domain filter to account for
the execution time needed to calculate the time-domain weighting factors $w_l(n')$. These weighting
factors are calculated by a separate network similar to that of the figure, with the difference that the downsampling is performed directly after the delay elements. Hence, the PPN
realization of the transposed direct form requires a slightly higher algorithmic complexity
compared to that of the direct-form realization. The polyphase network decomposition can
be performed for both FIR filters and IIR filters [17].
The use of a suitable prototype filter is of interest to us
since the polyphase representation makes it possible to improve the spectral selectivity of the
subband filters in order to reduce the cross-talk between adjacent frequency bins.
We consider the PPN implementation of the filter bank used in our system as an alternative
to the FFT in order to obtain a better way of estimating the energy in each subband and thus
of calculating the gains.
In terms of computational complexity, the cost becomes M log M (due to the FFT computation) + 2M
(due to the computations in the M subband filters) for the polyphase representation of the filter
bank, in contrast to M log M for the FFT alone, where M is the number of subbands (length of the FFT).
For an FFT length equal to 64 (M = 64) and a prototype filter of length L + 1 = 2M = 128,
each filter performs two multiplications and one addition (see section ).
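As a worked example of these operation counts, and assuming that the logarithm is taken to base 2 (the text does not state the base), the case M = 64 gives:

$$M \log_2 M = 64 \cdot 6 = 384, \qquad M \log_2 M + 2M = 384 + 128 = 512$$

Under that assumption the polyphase front end adds about one third to the cost of the FFT alone (512 against 384 operations per transform).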

3.5.2

Realization of different prototype filter structures

Idea of Filter Compression in Frequency-Domain


The main target in implementing the prototype filter used in the polyphase
decomposition is to achieve a sharper cutoff and a reduced approximation error. This has led
to the idea of implementing two filters instead of one, and in this way the
above requirements were finally met.
Assume that we have a filter with impulse response $h_n$ with L coefficients $h_0, h_1, \ldots, h_{L-1}$.
By interpolating zeros between the filter coefficients, the impulse response expands in the
time domain while the frequency response is compressed correspondingly. For example, by
expanding in time by a factor of 2 we get:
$h_0, 0, h_1, 0, \ldots, 0, h_{L-1}$.

Types of filters          Length   Delay   Transition band edges         Combined error
1st scheme                128      32      [π/32, 1.2π/32]               0.2427
2nd scheme, 1st filter    120      7       [4·π/32, 4·1.2π/32]           0.1118
2nd scheme, 2nd filter    8        4       [π/32, 0.5π − 1.2π/32]        0.0067

Table 3.1: The maximum combined error for 32 frequency bands.

Types of filters          Length   Delay   Transition band edges         Combined error
1st scheme                128      32      [π/64, 1.2π/64]               0.3219
2nd scheme, 1st filter    120      7       [4·π/64, 4·1.2π/64]           0.2132
2nd scheme, 2nd filter    8        4       [π/64, 0.5π − 1.2π/64]        0.0024

Table 3.2: The maximum combined error for 64 frequency bands.
The compression that occurs in the frequency domain leads to parasitic components
that must be cut off using a lowpass filter in cascade. More specifically, we assume
M = 64 frequency bands and a filter whose frequency response is centered around zero. By
interpolating, we compress by a factor $M_c = 2$ in the frequency domain and a parasitic
component appears around $\pi$. In practice, we desire a sharper cutoff and thus
we assume that the passband frequency is $\omega_p = \pi/M = \pi/64$ and the stopband frequency is
$\omega_s = \frac{\pi}{64}(1 + \delta)$, where $\delta$ is a sufficiently small quantity (e.g. $\delta = 0.2$).
The compression by a factor $M_c$ changes the passband and stopband frequencies, since
the former becomes $\omega_p' = \frac{\pi}{64} M_c$ and the latter becomes $\omega_s' = \frac{\pi}{64} M_c (1 + \delta)$. Moreover,
the transition band increases by a factor $M_c$ and the corresponding delay becomes $M_c \times$
delay. As a result, the frequency response for a compression factor $M_c = 4$ has 3 parasitic
components up to $2\pi$ and one half around $\pi$. The lowpass filter needed in cascade with the
first filter must have a cutoff frequency $\frac{2\pi}{64} - \frac{\pi}{64}(1 + \delta)$.
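The effect of zero insertion on the frequency response can be checked with a minimal sketch. It evaluates the frequency response directly from its definition; the function names are illustrative.

```python
import numpy as np

def insert_zeros(h, Mc):
    """Place Mc - 1 zeros between consecutive coefficients: h0, 0, ..., 0, h1, ..."""
    h = np.asarray(h, dtype=float)
    out = np.zeros((len(h) - 1) * Mc + 1)
    out[::Mc] = h
    return out

def freq_response(h, omega):
    """H(e^{j omega}) = sum_n h(n) exp(-j omega n), evaluated on the grid omega."""
    n = np.arange(len(h))
    return np.exp(-1j * np.outer(omega, n)) @ np.asarray(h, dtype=float)
```

The expanded filter has the response $H(e^{j M_c \omega})$, that is, the original response squeezed by the factor $M_c$ and repeated every $2\pi/M_c$. Those repetitions are the parasitic components that the second, lowpass filter in the cascade has to remove.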
A comparison is required between the above implementation, which includes the two filters in cascade with an overall signal delay below the desired delay, and a single prototype filter of the same computational complexity. Tables 3.1 and 3.2 show the obtained results for $M = 32$ and $M = 64$ frequency bands respectively.
The combined error contained in Tables 3.1 and 3.2 combines the real and imaginary part approximation errors. In addition, we observed that by relaxing the condition for a very narrow transition band, and hence an extremely sharp cutoff, the approximation error becomes smaller. Encouraging results are obtained when considering the symmetric case of a prototype filter, since we are only interested in a low signal delay for the time-domain FIR filter of gains rather than in a low delay prototype filter. In such a case, and assuming a more relaxed condition for the bandwidth of the transition band (i.e. taking some space from the passband and considering it as $[\frac{\pi}{M}(1 - \epsilon),\ \frac{\pi}{M}(1 + \epsilon)]$), we obtained some useful results using the suggested scheme.


$M_c$   Filter design             Length   Delay   Transition edges                            Error         Combined error
-       1st scheme                128      64      $[0.8\pi/64,\ 1.2\pi/64]$                   0.2416        0.2416
2       2nd scheme, 1st filter    120      30      $[0.8\pi/32,\ 1.2\pi/32]$                   0.1363        0.1363
2       2nd scheme, 2nd filter    8        4       $[0.8\pi/64,\ \pi - 1.2\pi/64]$             9.3139e-08
3       2nd scheme, 1st filter    120      20      $[3 \cdot 0.8\pi/64,\ 3 \cdot 1.2\pi/64]$   0.0913        0.0920
3       2nd scheme, 2nd filter    8        4       $[0.8\pi/64,\ \pi - 1.2\pi/64]$             7.4511e-4
4       2nd scheme, 1st filter    120      15      $[0.8\pi/16,\ 1.2\pi/16]$                   0.0667        0.0262
4       2nd scheme, 2nd filter    8        4       $[0.8\pi/64,\ 0.5\pi - 1.2\pi/64]$          0.0024

Table 3.3: The maximum combined error for different compression factors Mc .

The proposed scheme (second scheme) gives better results in terms of the estimated combined approximation error for every value of the compression factor $M_c$. More specifically, as the compression factor $M_c$ increases, the combined error decreases, and for $M_c = 4$ the suggested scheme gives an approximation error that is better by an order of magnitude, and thus by two orders of magnitude in energy, compared to the original (first) scheme.
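As a quick consistency check on the delay figures of Table 3.3, the overall delay of the second scheme can be computed under the assumption stated earlier that the delay of the expanded first filter is $M_c$ times its design delay and that the delays of the two cascaded filters add. The helper name `cascade_delay` is hypothetical.

```python
def cascade_delay(first_filter_delay, compression_factor, lowpass_delay):
    """Delay of the expanded first filter plus the delay of the lowpass filter."""
    return compression_factor * first_filter_delay + lowpass_delay


# (M_c, delay of the first filter) pairs taken from Table 3.3;
# the second (lowpass) filter has a delay of 4 samples in every row.
for mc, first_delay in [(2, 30), (3, 20), (4, 15)]:
    print(mc, cascade_delay(first_delay, mc, 4))
```

Under this assumption each configuration reaches the same overall delay of 64 samples as the first scheme, so the comparison of the combined errors is made at equal delay.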

3.6 Conclusions

This chapter gave an introduction to multirate systems and filter banks and presented uniform DFT filter banks. The non-uniform DFT filter bank results from an allpass transformation of the (discrete) subband filters so that it approximates the frequency resolution of the human auditory system.
The Polyphase Representation of filter banks was described in detail as a way to implement filter banks efficiently. Its superiority over FFT was shown by simulation results when using a scheme that expands the coefficients in the time-domain and thus compresses the response in the frequency domain. The combined approximation error obtained with the approximate minimax method was lower in every tested case when the prototype is implemented as a frequency-compressed filter in cascade with a lowpass filter that cuts off the parasitic components that appear in the frequency-domain as a result of compression. By using the proposed design for the prototype filter, a sharper cutoff and reduced cross-talk between adjacent frequency bins can be achieved. This leads to a better estimation of each subband energy and thus to a more accurate calculation of the gains.

Chapter 4

Low Delay Time-Domain Post Filter
As described in the last section of Chapter 2, our main focus is to realize the time-domain
post filter of gains by using a warped auto-regressive low-delay post filter. The approximation is done by methods taken from parametric spectrum analysis and for simplicity, the
approximation of the uniform AR filter is treated first [14].

4.1 Uniform Auto-Regressive Low-Delay Post Filter

We consider the filter of finite impulse response $h_s(n)$ which has transfer function:
\[ H_s(z) = \sum_{n=0}^{L} h_s(n)\, z^{-n} \tag{4.1} \]
This FIR filter is initially approximated by a uniform auto-regressive filter with infinite impulse response (IIR) $h_{AR}(n)$ and transfer function:
\[ H_{AR}(z) = \frac{a_0}{1 - \sum_{n=1}^{P} a_n z^{-n}} \tag{4.2} \]

The cascade of the original FIR filter with transfer function Hs (z) and the inverse of the
AR filter is sketched in Figure 4.1.

Figure 4.1: Approximation of a uniform FIR filter by a uniform AR filter.


CHAPTER 4. LOW DELAY TIME-DOMAIN POST FILTER


The input $x(k)$ to the original filter is a white noise sequence $x(k) = n_w(k)$ with variance $E\{x^2(k)\} = \sigma_w^2$. As a result, the output $y(k)$ of the FIR filter is colored noise and the output of the inverse AR filter of order $P$ is:
\[ e(k) = \frac{1}{a_0} \left( y(k) - \sum_{n=1}^{P} a_n\, y(k-n) \right). \tag{4.3} \]

We know from the theory that for a linear causal system $H(z)$ excited by a white noise sequence $w(n)$, the output $x(n)$ is determined according to:
\[ x(n) = \sum_{k=0}^{\infty} h_k\, w(n-k). \tag{4.4} \]
If $x(n)$ is a wide sense stationary (WSS) process, its power spectral density, which is the Fourier transform of its autocorrelation function $\phi_{xx}(m)$, is given by:
\[ \Phi_{xx}(e^{j\omega}) = \sum_{m=-\infty}^{\infty} \phi_{xx}(m)\, e^{-j\omega m} \tag{4.5} \]
and hence:
\[ \Phi_{xx}(e^{j\omega}) = |H(e^{j\omega})|^2\, \sigma_w^2. \tag{4.6} \]

In order to obtain the input white noise sequence $w(n)$, the inverse procedure, which is called noise whitening, is realized: the signal $x(n)$ is now the input to the system with transfer function $1/H(z)$ and the produced output is $w(n)$. As a consequence, in order to achieve a good approximation of the original FIR filter $H_s(z)$ of Figure 4.1 by an allpole filter $H_{AR}(z)$, the power of the error signal $e(k)$ has to be minimized since:
\[ \Phi_e(e^{j\omega}) = \left| H(e^{j\omega}) \left[ 1 - A(e^{j\omega}) \right] \right|^2 \Phi_w(e^{j\omega}) \tag{4.7} \]
where the power spectral density of white noise is $\Phi_w(e^{j\omega}) = 1$.
The power of the error signal is:
\[ E\{e_k^2\} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi_e(e^{j\omega})\, d\omega = \frac{1}{2\pi} \int_{-\pi}^{\pi} |H(e^{j\omega})|^2\, |1 - A(e^{j\omega})|^2\, d\omega \tag{4.8} \]

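The $P$ unknown coefficients $a_n$ of the filter $A(z) = \sum_{n=1}^{P} a_n z^{-n}$ can be computed by minimizing the power with respect to the filter coefficients:
\[ \frac{\partial E\{e_k^2\}}{\partial a_\lambda} = 0, \qquad \lambda = 1, 2, \ldots, P. \tag{4.9} \]
This leads to:
\[ \phi_{yy}(\lambda) = \sum_{n=1}^{P} a_n\, \phi_{yy}(\lambda - n), \qquad \lambda = 1, 2, \ldots, P. \tag{4.10} \]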
Since the power spectral density of the output signal $y(k)$ is $\Phi_{yy}(e^{j\omega}) = \Phi_{hh}(e^{j\omega})\, \sigma_w^2$, in the time-domain it leads to:
\[ \phi_{yy}(\lambda) = \phi_{hh}(\lambda)\, \sigma_w^2 \tag{4.11} \]
For the AR filter coefficients given by Equation (4.10), the power (variance) of the output error signal amounts to:
\[ E\{e_k^2\} = \sigma_e^2 = \frac{1}{a_0^2} \left[ \phi_{yy}(0) - \sum_{n=1}^{P} a_n\, \phi_{yy}(n) \right] \tag{4.12} \]


The proof of the above equation is shown in Appendix C.
By normalizing the energy of the input signal $x(k)$ and the output error signal $e(k)$, we determine the scaling factor $a_0$. Hence, by the requirement $\sigma_e^2 = \sigma_w^2$, the scaling factor $a_0$ reads:
\[ a_0^2 = \frac{1}{\sigma_w^2} \left[ \phi_{yy}(0) - \sum_{n=1}^{P} a_n\, \phi_{yy}(n) \right] = \phi_{hh}(0) - \sum_{n=1}^{P} a_n\, \phi_{hh}(n) \tag{4.13} \]
or equivalently
\[ a_0 = \sqrt{ \phi_{hh}(0) - \sum_{n=1}^{P} a_n\, \phi_{hh}(n) }. \tag{4.14} \]

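From Equation (4.10), together with (4.11), and taking the $P$ different values $\lambda = 1, 2, \ldots, P$, we obtain:
\[ \begin{aligned}
\phi_{hh}(0)\, a_1 + \phi_{hh}(-1)\, a_2 + \ldots + \phi_{hh}(-P+1)\, a_P &= \phi_{hh}(1) \\
&\ \ \vdots \\
\phi_{hh}(P-1)\, a_1 + \phi_{hh}(P-2)\, a_2 + \ldots + \phi_{hh}(0)\, a_P &= \phi_{hh}(P)
\end{aligned} \]
This set of equations might be written in the following matrix form:
\[ \begin{bmatrix}
\phi_{hh}(1) & \phi_{hh}(0) & \cdots & \phi_{hh}(-P+1) \\
\vdots & \vdots & \ddots & \vdots \\
\phi_{hh}(P) & \phi_{hh}(P-1) & \cdots & \phi_{hh}(0)
\end{bmatrix}
\begin{bmatrix} 1 \\ -a_1 \\ \vdots \\ -a_P \end{bmatrix}
=
\begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} \tag{4.15} \]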

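Taking also into account that the scaling factor is
\[ a_0^2 = \phi_{hh}(0) - \sum_{n=1}^{P} a_n\, \phi_{hh}(n) \tag{4.16} \]
the augmented normal equations to determine the $(P + 1)$ coefficients $a_n$ are finally given by:
\[ \begin{bmatrix}
\phi_{hh}(0) & \phi_{hh}(-1) & \cdots & \phi_{hh}(-P) \\
\phi_{hh}(1) & \phi_{hh}(0) & \cdots & \phi_{hh}(-P+1) \\
\vdots & \vdots & \ddots & \vdots \\
\phi_{hh}(P) & \phi_{hh}(P-1) & \cdots & \phi_{hh}(0)
\end{bmatrix}
\begin{bmatrix} 1 \\ -a_1 \\ \vdots \\ -a_P \end{bmatrix}
=
\begin{bmatrix} a_0^2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{4.17} \]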
The system of Equations (4.17) contains a correlation matrix that has Toeplitz structure
and can hence be efficiently inverted using the Levinson-Durbin algorithm. In addition, for
a Toeplitz correlation matrix that is positive definite, the AR filter is always stable which
is exactly the case.
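The Levinson-Durbin recursion mentioned above can be sketched as follows. This is a minimal illustration and not the thesis implementation: the function names `impulse_autocorrelation` and `levinson_durbin` and the three-tap example response are hypothetical, and real-valued coefficients are assumed so that the autocorrelation is even in the lag.

```python
import math


def impulse_autocorrelation(h, max_lag):
    """phi_hh(lag) = sum_l h[l] * h[l + lag] for lag = 0..max_lag (real h)."""
    return [
        sum(h[l] * h[l + lag] for l in range(len(h) - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(phi, order):
    """Solve phi(lam) = sum_n a_n phi(lam - n), lam = 1..order.

    Returns the list [a_1, ..., a_P] and the scaling factor a_0, where
    a_0^2 = phi(0) - sum_n a_n phi(n) is the final prediction error.
    """
    a = []
    error = phi[0]
    for m in range(1, order + 1):
        acc = phi[m] - sum(a[n] * phi[m - 1 - n] for n in range(m - 1))
        k = acc / error                      # reflection coefficient
        a = [a[n] - k * a[m - 2 - n] for n in range(m - 1)] + [k]
        error *= 1.0 - k * k                 # updated prediction error
    return a, math.sqrt(error)


h_s = [1.0, 0.5, 0.25]                       # hypothetical FIR post filter
phi_hh = impulse_autocorrelation(h_s, 2)
coeffs, a0 = levinson_durbin(phi_hh, 2)
print(coeffs, a0)
```

The recursion needs on the order of $P^2$ operations instead of the $P^3$ of a general matrix inversion, which is the reason the Toeplitz structure matters here.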
There are several reasons why the AR filter approximation has been chosen instead of the more general AR Moving Average (ARMA) approximation. First, the AR filter has the minimum phase property since all poles and zeros are within the unit circle and thus, it is always stable and a low signal delay can be achieved. Moreover, it leads to a good approximation of the magnitude response of the original filter but not of the phase response. Nevertheless, this is tolerable for the hearing-aid application since the human ear is relatively insensitive towards minor phase modifications [14].

Figure 4.2: Basic concept of approximation: the two signals y and e must have the same statistical characteristics.
In a more compact form, the objective of the described approximation is to determine the filter coefficients $a_n$ such that if the input signal $x(k)$ is a white noise sequence, then the output signal $e(k)$ is a white noise sequence with the same signal characteristics.
In other words, as depicted in Figure 4.2 we desire that the two signals y(k) and y (k)
have the same statistical characteristics rather than being identical.
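For the calculation of the AR filter coefficients, the system that we have solved results from the main idea of minimizing the energy of the output error signal and thus, minimizing the quantity
\[ \int_{-\pi}^{\pi} \left| H(e^{j\omega}) \left( 1 - A(e^{j\omega}) \right) \right|^2 d\omega = 0. \tag{4.18} \]
By defining the quantities $A = [a_1\ a_2\ \ldots\ a_P]^T$ and $E = [e^{-j\omega}\ e^{-j2\omega}\ \ldots\ e^{-jP\omega}]^T$ the above equation can equivalently be written as follows:
\[ A^T \left( \int_{-\pi}^{\pi} |H(e^{j\omega})|^2\, \mathrm{Re}\{E^{*} E^{T}\}\, d\omega \right) A - 2 \left( \int_{-\pi}^{\pi} |H(e^{j\omega})|^2\, \mathrm{Re}\{E^{T}\}\, d\omega \right) A + \int_{-\pi}^{\pi} |H(e^{j\omega})|^2\, d\omega = 0. \tag{4.19} \]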
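Taking the derivative with respect to $A$, we finally obtain:
\[ \left( \int_{-\pi}^{\pi} |H(e^{j\omega})|^2\, \mathrm{Re}\{E E^{H}\}\, d\omega \right) A = \int_{-\pi}^{\pi} |H(e^{j\omega})|^2\, \mathrm{Re}\{E\}\, d\omega \tag{4.20} \]
We define $\Phi = \int_{-\pi}^{\pi} |H(e^{j\omega})|^2\, \mathrm{Re}\{E E^{H}\}\, d\omega$ and $b = \int_{-\pi}^{\pi} |H(e^{j\omega})|^2\, \mathrm{Re}\{E\}\, d\omega$ and the solution of the system gives the vector $A$ of the unknown AR filter coefficients:
\[ A = \Phi^{-1} b \tag{4.21} \]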

In this way, the scaling factor $a_0$, which plays the role of the minimization constant, is
\[ a_0 = \sqrt{ \min_{A} \int_{-\pi}^{\pi} |H(e^{j\omega})|^2\, |1 - E^{T} A|^2\, d\omega } \tag{4.22} \]
and eventually, the AR filter has a transfer function
\[ H_{AR}(z) = \frac{a_0}{1 - a_1 z^{-1} - \ldots - a_P z^{-P}} \tag{4.23} \]

4.2 Allpass transformed Auto-Regressive Low-Delay Post Filter

4.2.1 Approximation of the warped post filter

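It has already been mentioned that the allpass transformation of a discrete filter is achieved by substituting all delay elements by allpass filters:
\[ z^{-1} \rightarrow H_A(z) \]
where $H_A(z)$ is the transfer function of a (causal) real allpass filter of first order
\[ H_A(z) = \frac{z^{-1} - \alpha}{1 - \alpha z^{-1}}, \qquad |\alpha| < 1;\ \alpha \in \mathbb{R}. \tag{4.24} \]
Its corresponding frequency response is given by
\[ H_A(e^{j\omega}) = \frac{e^{-j\omega} - \alpha}{1 - \alpha e^{-j\omega}} = e^{-j\varphi(\omega)} \tag{4.25} \]
where the phase response $\varphi(\omega)$ is given by the following relation
\[ \varphi(\omega) = \omega + 2 \arctan\left( \frac{\alpha \sin\omega}{1 - \alpha\cos\omega} \right) \tag{4.26} \]
Hence, the allpass transformed FIR post-filter has a transfer function
\[ \tilde{H}_s(z) = H_s\left( H_A(z)^{-1} \right). \tag{4.27} \]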
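The allpass relations can be checked numerically with a small sketch. It assumes the first-order allpass $H_A(z) = (z^{-1} - \alpha)/(1 - \alpha z^{-1})$ with a real warping parameter $|\alpha| < 1$; the function names `allpass_response` and `warped_phase` are hypothetical.

```python
import cmath
import math


def allpass_response(omega, alpha):
    """H_A(e^{j omega}) for H_A(z) = (z^-1 - alpha) / (1 - alpha z^-1)."""
    z_inv = cmath.exp(-1j * omega)
    return (z_inv - alpha) / (1.0 - alpha * z_inv)


def warped_phase(omega, alpha):
    """phi(omega) = omega + 2 arctan(alpha sin(omega) / (1 - alpha cos(omega)))."""
    return omega + 2.0 * math.atan(
        alpha * math.sin(omega) / (1.0 - alpha * math.cos(omega))
    )


alpha = 0.5                                   # hypothetical warping parameter
for omega in (0.1, 1.0, 2.5):
    response = allpass_response(omega, alpha)
    # The magnitude stays at one and the response equals exp(-j phi(omega)).
    print(abs(response), abs(response - cmath.exp(-1j * warped_phase(omega, alpha))))
```

For $\alpha = 0$ the phase reduces to $\varphi(\omega) = \omega$, i.e. the allpass becomes a unit delay and the warped filter coincides with the uniform one.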

The frequency warping or allpass transformation is marked by the tilde notation. This warped post-filter shall be approximated by a warped AR filter with transfer function
\[ \tilde{H}_{AR}(z) = \frac{\tilde{a}_0}{1 - \sum_{n=1}^{P} \tilde{a}_n H_A(z)^n} \tag{4.28} \]
and thus, its frequency response is denoted by
\[ \tilde{H}_{AR}(e^{j\omega}) = \frac{\tilde{a}_0}{1 - \sum_{n=1}^{P} \tilde{a}_n e^{-j\varphi(\omega) n}} \tag{4.29} \]

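In this case, we get an illustration of the approximation of a warped FIR filter by a warped AR filter similar to that shown in Figure 4.1 and the power of the corresponding error signal that has to be minimized leads to:
\[ \int |H(e^{j\varphi(\omega)})|^2\, |1 - \tilde{E}^{T} \tilde{A}|^2\, d\varphi(\omega) = 0 \tag{4.30} \]
where $\tilde{A} = [\tilde{a}_1\ \tilde{a}_2\ \ldots\ \tilde{a}_P]^T$ and $\tilde{E} = [e^{-j\varphi(\omega)}\ e^{-j2\varphi(\omega)}\ \ldots\ e^{-jP\varphi(\omega)}]^T$.
Taking the derivative with respect to $\tilde{A}$ yields the system
\[ \left( \int |H(e^{j\varphi(\omega)})|^2\, \mathrm{Re}\{\tilde{E}\tilde{E}^{H}\}\, d\varphi(\omega) \right) \tilde{A} = \int |H(e^{j\varphi(\omega)})|^2\, \mathrm{Re}\{\tilde{E}\}\, d\varphi(\omega) \tag{4.31} \]
or
\[ \tilde{A} = \tilde{\Phi}^{-1} \tilde{b} \tag{4.32} \]
where
\[ \tilde{\Phi} = \int |H(e^{j\varphi(\omega)})|^2\, \mathrm{Re}\{\tilde{E}\tilde{E}^{H}\}\, d\varphi(\omega) \tag{4.33} \]
and
\[ \tilde{b} = \int |H(e^{j\varphi(\omega)})|^2\, \mathrm{Re}\{\tilde{E}\}\, d\varphi(\omega) \tag{4.34} \]

Figure 4.3: Approximation of the uniform post-filter by a warped AR filter.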

The value of the scaling factor $\tilde{a}_0$ is determined in a similar way as in the case of the uniform approximation. It should be noted that the allpass transformation leads to delayless feedback loops. Moreover, the coefficients of the frequency warped post-filter $h_s(n) = h(n) w_n$ are obtained from an allpass transformed analysis filter bank, which has higher computational complexity than a uniform filter bank. In the case of a multichannel controlled post-filter, the adaptation of the filter coefficients requires more than one frequency warped analysis filter bank, which leads to high computational complexity.

4.2.2 Approximation of the uniform post filter

To face the above mentioned problem, a different filter approximation is proposed which
is that of a uniform time-domain post-filter approximated by an allpass transformed AR
filter. We calculate the filter coefficients hs (n) by means of one or more DFT analysis
filter banks and thus, provide a uniform post-filter. This uniform time-domain filter is then
approximated by a warped AR filter so as to exploit the benefits of a Bark scaled frequency
resolution. The obtained approximation problem is depicted in Figure 4.3.
For the following derivation, it is useful to express the convolution of a sequence $y(k)$ with an allpass chain of length $\lambda$ as follows
\[ H_A(z)^{\lambda}\, Y(z) \;\longleftrightarrow\; D_y^{[\lambda]}(k) = h_A^{[\lambda]}(k) * y(k) \tag{4.35} \]
where $h_A^{[\lambda]}(k)$ is the impulse response of an allpass chain of length $\lambda$, that is
\[ h_A^{[\lambda]}(k) = \underbrace{h_A(k) * h_A(k) * \ldots * h_A(k)}_{\lambda\ \text{times}} \tag{4.36} \]
and $*$ denotes convolution. Specifically, $h_A^{[0]}(k) = \delta(k)$ and the unit delay $z^{-1}$ is included as a special case for the warping parameter $\alpha = 0$.
By applying the definition (4.35), the output of the inverse warped AR filter with transfer function $\tilde{H}_{AR}(z)^{-1}$ is expressed in the time-domain as follows:
\[ e(k) = \frac{1}{\tilde{a}_0} \left( y(k) - \sum_{n=1}^{P} \tilde{a}_n D_y^{[n]}(k) \right) \tag{4.37} \]


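Similarly to Sec. 4.1, the minimization of the power of the error signal $e(k)$
\[ \frac{\partial E\{e_k^2\}}{\partial \tilde{a}_\lambda} = 0; \qquad \lambda = 1, 2, \ldots, P \tag{4.38} \]
leads to
\[ E\{y(k) D_y^{[\lambda]}(k)\} = \sum_{n=1}^{P} \tilde{a}_n E\{D_y^{[n]}(k) D_y^{[\lambda]}(k)\}; \qquad \lambda = 1, 2, \ldots, P \tag{4.39} \]
By defining the warped autocorrelation sequence as
\[ \tilde{\phi}_{yy}(n, \lambda) = E\{D_y^{[n]}(k) D_y^{[\lambda]}(k)\} \tag{4.40} \]
Equation (4.39) is written as
\[ \tilde{\phi}_{yy}(0, \lambda) = \sum_{n=1}^{P} \tilde{a}_n\, \tilde{\phi}_{yy}(n, \lambda). \tag{4.41} \]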

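The warped autocorrelation sequence $\tilde{\phi}_{yy}(n, \lambda)$ can also be formulated as follows
\[ \tilde{\phi}_{yy}(n, \lambda) = \sigma_w^2 \sum_{l=0}^{\infty} D_{h_s}^{[n]}(l)\, D_{h_s}^{[\lambda]}(l) \tag{4.42} \]
where
\[ \tilde{\phi}_{h_s h_s}(n, \lambda) = \sum_{l=0}^{\infty} D_{h_s}^{[n]}(l)\, D_{h_s}^{[\lambda]}(l) \]
is the warped impulse autocorrelation that has the important property
\[ \tilde{\phi}_{h_s h_s}(n, \lambda) = \tilde{\phi}_{h_s h_s}(n + m, \lambda + m) \tag{4.43} \]
for a real allpass transformation. This property is proved in Appendix C.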


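As a result, the infinite summation of Equation (4.42) can now be expressed by
\[ \tilde{\phi}_{h_s h_s}(n - n, \lambda - n) = \sum_{l=0}^{\infty} D_{h_s}^{[n-n]}(l)\, D_{h_s}^{[\lambda-n]}(l) = \sum_{l=0}^{L} h_s(l)\, D_{h_s}^{[\lambda-n]}(l) \tag{4.44} \]
or
\[ \tilde{\phi}_{h_s h_s}(0, \lambda - n) = \tilde{\phi}_{h_s h_s}(\lambda - n) \tag{4.45} \]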

This warped impulse autocorrelation sequence is calculated by the network shown in Fig. 4.4.
For example, the first autocorrelation coefficient $\tilde{\phi}_{h_s h_s}(0)$ is calculated by
\[ \tilde{\phi}_{h_s h_s}(0) = \sum_{l=0}^{L} h_s(l) \left[ h_A^{[0]}(l) * h_s(l) \right] = \sum_{l=0}^{L} h_s^2(l) \tag{4.46} \]
Similarly, the second warped autocorrelation coefficient is given by
\[ \tilde{\phi}_{h_s h_s}(1) = \sum_{l=0}^{L} h_s(l) \left[ h_A^{[1]}(l) * h_s(l) \right] \tag{4.47} \]
and in the same way, we obtain the $(P + 1)$ warped impulse autocorrelation coefficients.
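The network for the warped impulse autocorrelation can be sketched as a chain of first-order allpass sections whose outputs are correlated with $h_s(l)$. The sketch below is an illustration only: it assumes the allpass $H_A(z) = (z^{-1} - \alpha)/(1 - \alpha z^{-1})$, and the names `allpass_filter` and `warped_impulse_autocorrelation` are hypothetical. Because the sum in (4.44) only runs over $l = 0, \ldots, L$, the (infinitely long) allpass outputs only need to be computed for the first $L + 1$ samples.

```python
def allpass_filter(x, alpha):
    """Apply H_A(z) = (z^-1 - alpha) / (1 - alpha z^-1) with zero initial state.

    Difference equation: y(k) = -alpha * x(k) + x(k-1) + alpha * y(k-1).
    The output is truncated to the length of the input.
    """
    y = []
    prev_x = 0.0
    prev_y = 0.0
    for sample in x:
        out = -alpha * sample + prev_x + alpha * prev_y
        y.append(out)
        prev_x, prev_y = sample, out
    return y


def warped_impulse_autocorrelation(h, order, alpha):
    """Return [phi(0), ..., phi(order)] with phi(lam) = sum_l h(l) * D_h^[lam](l)."""
    chain_output = list(h)              # D_h^[0] = h
    phi = []
    for _ in range(order + 1):
        phi.append(sum(a * b for a, b in zip(h, chain_output)))
        chain_output = allpass_filter(chain_output, alpha)   # next chain stage
    return phi


h_s = [1.0, 0.5, 0.25]                  # hypothetical uniform post filter
# With alpha = 0 the allpass is a unit delay and the ordinary impulse
# autocorrelation is obtained.
print(warped_impulse_autocorrelation(h_s, 2, 0.0))
print(warped_impulse_autocorrelation(h_s, 2, 0.4))
```

The resulting coefficients can then be used in place of $\phi_{hh}(\lambda)$ in the normal equations of Sec. 4.1.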

Figure 4.4: Network for calculation of the $(P + 1)$ warped impulse autocorrelation coefficients $\tilde{\phi}_{h_s h_s}(\lambda)$.

Taking into account Equations (4.39) and (4.42), we obtain:
\[ \tilde{\phi}_{h_s h_s}(\lambda) = \sum_{n=1}^{P} \tilde{a}_n\, \tilde{\phi}_{h_s h_s}(\lambda - n) \tag{4.48} \]
and the scaling factor $\tilde{a}_0$ is determined as in Sec. 4.1:
\[ \tilde{a}_0 = \sqrt{ \tilde{\phi}_{h_s h_s}(0) - \sum_{n=1}^{P} \tilde{a}_n\, \tilde{\phi}_{h_s h_s}(n) }. \tag{4.49} \]

As far as the system of equations is concerned, it is given by the system of the augmented normal Equations (4.17) by replacing $\phi_{hh}(\lambda)$ with $\tilde{\phi}_{h_s h_s}(\lambda)$. Therefore, the AR filter coefficients are determined as for the uniform case.
To obtain an overall view of the present approximation, we must take into consideration the frequency warping caused by the allpass transformation of the AR filter since
\[ \tilde{\omega} = \varphi(\omega) \tag{4.50} \]
The power of the error signal $e(k)$ for the warped frequency axis $\tilde{\omega}$ amounts to:
\[ \tilde{\sigma}_e^2 = \frac{1}{2\pi} \int_0^{2\pi} E(\tilde{\omega})\, d\tilde{\omega} \tag{4.51} \]
The partial derivative of $\tilde{\omega}$ with respect to $\omega$ gives:
\[ \frac{\partial \tilde{\omega}}{\partial \omega} = \frac{1 - \alpha^2}{1 - 2\alpha\cos\omega + \alpha^2} \tag{4.52} \]
as shown in Appendix C. It finally yields:
\[ \tilde{\sigma}_e^2 = \frac{1}{2\pi} \int_0^{2\pi} E(\omega)\, \frac{1 - \alpha^2}{1 - 2\alpha\cos\omega + \alpha^2}\, d\omega \tag{4.53} \]
noting that the lower and upper limits become $\tilde{\omega}_0 = \varphi(0) = 0$ and $\tilde{\omega}_{2\pi} = \varphi(2\pi) = 2\pi$ respectively. Consequently, the output signal $e(k)$ is weighted by a Laguerre filter [21] as depicted in Fig. 4.5 since the transfer function of a Laguerre filter is
\[ H_L(z) = \frac{\sqrt{1 - \alpha^2}}{1 - \alpha z^{-1}} \tag{4.54} \]

Figure 4.5: Approximation of the uniform time-domain post-filter by an allpass transformed AR filter.
The error signal $e(k)$ is now weighted and thus, does not have an approximately flat spectrum as in the case of a uniform AR filter approximation. Hence, in order to obtain an error signal with approximately flat spectrum, an inverse Laguerre filter must be applied to the error signal $e(k)$. The overall filter is eventually a cascade of the warped AR filter and a Laguerre filter:
\[ \tilde{H}_{ARL}(z) = \frac{\tilde{a}_0}{1 - \sum_{n=1}^{P} \tilde{a}_n H_A(z)^n} \cdot \frac{\sqrt{1 - \alpha^2}}{1 - \alpha z^{-1}} \tag{4.55} \]
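The link between the warping and the Laguerre weighting can be verified numerically. The sketch below assumes the allpass phase $\varphi(\omega) = \omega + 2\arctan\big(\alpha\sin\omega / (1 - \alpha\cos\omega)\big)$ and the Laguerre filter $H_L(z) = \sqrt{1-\alpha^2}/(1 - \alpha z^{-1})$; the function names are hypothetical. It checks that the derivative of the warped frequency equals the squared magnitude of the Laguerre filter.

```python
import cmath
import math


def warped_phase(omega, alpha):
    """Warped frequency phi(omega) of the first-order allpass."""
    return omega + 2.0 * math.atan(
        alpha * math.sin(omega) / (1.0 - alpha * math.cos(omega))
    )


def phase_derivative(omega, alpha):
    """Closed form (1 - alpha^2) / (1 - 2 alpha cos(omega) + alpha^2)."""
    return (1.0 - alpha ** 2) / (1.0 - 2.0 * alpha * math.cos(omega) + alpha ** 2)


def laguerre_response(omega, alpha):
    """H_L(e^{j omega}) = sqrt(1 - alpha^2) / (1 - alpha e^{-j omega})."""
    return math.sqrt(1.0 - alpha ** 2) / (1.0 - alpha * cmath.exp(-1j * omega))


alpha, omega, step = 0.5, 1.0, 1e-5       # hypothetical test values
numeric = (warped_phase(omega + step, alpha) - warped_phase(omega - step, alpha)) / (2 * step)
print(numeric, phase_derivative(omega, alpha), abs(laguerre_response(omega, alpha)) ** 2)
```

The three printed values agree up to the accuracy of the finite difference, which is the sense in which the error spectrum is weighted by $|H_L(e^{j\omega})|^2$.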

4.3 Simulation Results

To ease the treatment of the warped AR filter, the approximation of the uniform post-filter
by a uniform AR filter was first described. The obtained results are shown in Fig. 4.6.
The approximation, as explained in the theoretical analysis, is done using the Mean Square Error (MSE) criterion since our aim is to minimize the power of the output error signal $e(k)$. In the figure, we observe that the approximating AR filter follows the pronounced peaks of the response while losing some of the finer fluctuations.
Furthermore, we considered the approximation of a warped and a uniform time-domain
post-filter by an allpass transformed AR low delay post-filter as shown in Fig. 4.7 and
Fig. 4.8 respectively. From the obtained figures, we observe that a good approximation
is achieved in almost all cases and especially the one we are interested in which is that
of approximating a warped post-filter by a warped AR filter of lower degree P (P = 20).

Figure 4.6: Magnitude Response in dB and approximation of an FIR filter by a uniform AR filter. (Plot title: "FIR filter approximated by a uniform AR filter"; vertical axis: magnitude in dB.)
Both AR filters provide a good approximation for the magnitude response of the original
time-domain filter despite their lower filter degrees [22].
It should also be mentioned that the warped AR filter has a higher frequency resolution
for the lower bands than the uniform AR filter and hence, it provides a better approximation
for the lower frequencies than the uniform AR filter and vice versa for the high frequencies.
In contrast to the warped FIR filter, the frequency warping does not lead to audible phase
modifications and the problem of designing a phase equalizer does not arise. Some other evaluation measures and simulation results are presented later in Chapter 5. Finally, we observed that the approximation improves when the gains of the original FIR filter are windowed with a Hamming window.

4.4 Low Delay FIR Filter Design

Non-symmetric Low Delay FIR filter

In case of a symmetric filter of length $L + 1 = 2M$, the delay is $(L - 1)/2$. For a filter with 128 taps ($M = 64$), the delay in the symmetric case is about 64 samples, which is very high for speech processing applications since the output of the filter at time $n$ refers to time $(n - 64)$.
In order to further reduce the signal delay, the following idea was proposed by George V. Moustakides in [19].
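Assume a filter of length $L = 2N + 1$ and its coefficients $h_0, h_1, \ldots, h_{L-1}$. We can define two sequences $\alpha_n$ and $\beta_n$ of the same length as $h_n$, of even and odd symmetry respectively, which also sum up to $h_n$:
\[ \begin{array}{ccccccccc}
\alpha_N & \alpha_{N-1} & \cdots & \alpha_1 & \alpha_0 & \alpha_1 & \cdots & \alpha_{N-1} & \alpha_N \\
+ & + & & + & + & + & & + & + \\
\beta_N & \beta_{N-1} & \cdots & \beta_1 & 0 & -\beta_1 & \cdots & -\beta_{N-1} & -\beta_N \\
\| & \| & & \| & \| & \| & & \| & \| \\
h_0 & h_1 & \cdots & h_{N-1} & h_N & h_{N+1} & \cdots & h_{2N-1} & h_{2N}
\end{array} \tag{4.56} \]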

This operation is similar to that of separating a function into a sum of an even and an odd function.

Figure 4.7: Magnitude Response in dB and approximation of a uniform FIR post-filter by an allpass transformed AR filter. (Plot title: "Uniform FIR approximated by a warped AR filter"; vertical axis: dB.)

In such a case of an odd-tapped filter, the frequency response is written:
\[ H(e^{j\omega}) = e^{-jN\omega} \left\{ (\alpha_0 + 2\alpha_1\cos\omega + \ldots + 2\alpha_N\cos N\omega) + j(2\beta_1\sin\omega + \ldots + 2\beta_N\sin N\omega) \right\} \tag{4.57} \]
An expression of the frequency response of the filter is:
\[ H(e^{j\omega}) = R(e^{j\omega})\, e^{-j\varphi(\omega)} \tag{4.58} \]
where $R(e^{j\omega})$ approximates the ideal frequency response $D(e^{j\omega})$. From the two above equations and by expressing $R(e^{j\omega}) = R_r(e^{j\omega}) + jR_i(e^{j\omega})$, it yields:
\[ R_r(e^{j\omega}) = \alpha_0 + 2\alpha_1\cos\omega + \ldots + 2\alpha_N\cos N\omega \]
\[ R_i(e^{j\omega}) = 2\beta_1\sin\omega + \ldots + 2\beta_N\sin N\omega \]
\[ \varphi(\omega) = N\omega = \frac{L-1}{2}\,\omega \]
On the other hand, an even-tapped filter ($L = 2N$) might be expressed:
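\[ \begin{array}{cccccccc}
\alpha_N & \alpha_{N-1} & \cdots & \alpha_1 & \alpha_1 & \cdots & \alpha_{N-1} & \alpha_N \\
+ & + & & + & + & & + & + \\
\beta_N & \beta_{N-1} & \cdots & \beta_1 & -\beta_1 & \cdots & -\beta_{N-1} & -\beta_N \\
\| & \| & & \| & \| & & \| & \| \\
h_0 & h_1 & \cdots & h_{N-1} & h_N & \cdots & h_{2N-2} & h_{2N-1}
\end{array} \tag{4.59} \]
and its frequency response $H(e^{j\omega})$ is given by:
\[ H(e^{j\omega}) = e^{-j(N-0.5)\omega} \left\{ \left( 2\alpha_1\cos\frac{\omega}{2} + \ldots + 2\alpha_N\cos\frac{(2N-1)\omega}{2} \right) + j\left( 2\beta_1\sin\frac{\omega}{2} + \ldots + 2\beta_N\sin\frac{(2N-1)\omega}{2} \right) \right\} \tag{4.60} \]
In this case where $L = 2N$, $R(e^{j\omega})$ and the phase response of the filter become:
\[ R_r(e^{j\omega}) = 2\alpha_1\cos\frac{\omega}{2} + \ldots + 2\alpha_N\cos\frac{(2N-1)\omega}{2} \]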
\[ R_i(e^{j\omega}) = 2\beta_1\sin\frac{\omega}{2} + \ldots + 2\beta_N\sin\frac{(2N-1)\omega}{2} \]
\[ \varphi(\omega) = (N - 0.5)\,\omega = \frac{L-1}{2}\,\omega \]
As a result, an FIR filter has linear phase which leads to a delayed output of $(L - 1)/2$ samples:
\[ H(e^{j\omega}) = e^{-j\omega\frac{L-1}{2}} R(e^{j\omega}) \tag{4.61} \]

Figure 4.8: Magnitude Response in dB and approximation of a warped FIR post-filter by a warped AR filter. (Plot title: "Warped FIR approximated by warped AR filter"; vertical axis: dB.)
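The even/odd decomposition of an odd-length filter, (4.56) and (4.57), can be illustrated by the following sketch. It assumes the convention $h_{N-n} = \alpha_n + \beta_n$ and $h_{N+n} = \alpha_n - \beta_n$; the function names and the five-tap example are hypothetical.

```python
import cmath
import math


def split_even_odd(h):
    """Split h (odd length 2N + 1) into even part alpha[0..N] and odd part beta[0..N]."""
    n_half = (len(h) - 1) // 2
    alpha = [0.5 * (h[n_half - n] + h[n_half + n]) for n in range(n_half + 1)]
    beta = [0.5 * (h[n_half - n] - h[n_half + n]) for n in range(n_half + 1)]
    return alpha, beta               # beta[0] is always zero


def response_from_parts(alpha, beta, omega):
    """Evaluate e^{-jN omega} (R_r + j R_i) from the even and odd parts."""
    n_half = len(alpha) - 1
    r_real = alpha[0] + 2.0 * sum(
        alpha[n] * math.cos(n * omega) for n in range(1, n_half + 1)
    )
    r_imag = 2.0 * sum(beta[n] * math.sin(n * omega) for n in range(1, n_half + 1))
    return cmath.exp(-1j * n_half * omega) * complex(r_real, r_imag)


def direct_response(h, omega):
    """Evaluate H(e^{j omega}) = sum_k h[k] e^{-j omega k} directly."""
    return sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(h))


h = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical non-symmetric filter, N = 2
alpha, beta = split_even_odd(h)
print(alpha, beta)
print(abs(response_from_parts(alpha, beta, 0.7) - direct_response(h, 0.7)))
```

For a symmetric filter all $\beta_n$ vanish and $R(e^{j\omega})$ is real, which recovers the usual linear-phase case.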

Since the filter length $L$ usually has a high value, the signal delay becomes significant, which is a drawback for real-time applications. An idea to reduce the delay is to approximate $e^{jM\omega}D(e^{j\omega})$ with $R(e^{j\omega})$ instead of approximating the ideal frequency response $D(e^{j\omega})$ with $R(e^{j\omega})$. This results in:
\[ H(e^{j\omega}) \approx e^{-j\left(\frac{L-1}{2} - M\right)\omega} D(e^{j\omega}) \tag{4.62} \]
and thus, the delay becomes:
\[ D = \frac{L-1}{2} - M \tag{4.63} \]
for every $M$. By choosing the proper values of $M$, the delay becomes significantly low and appropriate for real-time applications. So, the problem that must be solved in case of a filter of length $L + 1 = 2M$ is:
1) to approximate the real part of $H(e^{j\omega})$
\[ \mathrm{Re}\{H(e^{j\omega})\} = \alpha_0 + 2\alpha_1\cos\omega + \ldots + 2\alpha_N\cos N\omega \]
with
\[ \cos(M\omega)\, D(e^{j\omega}) \]
and
2) to approximate the imaginary part of $H(e^{j\omega})$
\[ \mathrm{Im}\{H(e^{j\omega})\} = 2\beta_1\sin\omega + \ldots + 2\beta_N\sin N\omega \]
with
\[ \sin(M\omega)\, D(e^{j\omega}). \]
In other words:
\[ e^{-j\omega\frac{L-1}{2}} R(e^{j\omega}) \approx e^{-j\left(\frac{L-1}{2} - M\right)\omega} D(e^{j\omega}) \tag{4.64} \]

Filter                      Delay   Error
symmetric, L = 64           32      3.6531e-03
symmetric, L = 128          64      1.7574e-05
non-symmetric, L = 128      32      3.9822e-04

Table 4.1: The maximum approximation error for different filter lengths in symmetrical and non-symmetrical case.

This is achieved by solving the following min-max problem:
\[ \min_{\alpha_i, \beta_i}\ \max_{\omega}\ W(\omega) \left| H(e^{j\omega}) - e^{jM\omega} D(e^{j\omega}) \right| \tag{4.65} \]
where
- $\alpha_i$ and $\beta_i$ are the filter parameters which must be computed and summed so as to obtain the real coefficients $h_i$
- $W(\omega)$ is a weight function that controls the maximum allowed error in the passband and the stopband (it is not defined in the transition band, since we don't care about its value in this region)
- $H(e^{j\omega})$ is the approximating function and
- $e^{jM\omega}D(e^{j\omega}) = \cos(M\omega)D(e^{j\omega}) + j\sin(M\omega)D(e^{j\omega})$ is the function that must be approximated so as to achieve a signal delay reduced by $M$ samples.
Summarizing, the general min-max problem yields the parameters $\alpha_i$ and $\beta_i$ that minimize the distance between the two functions $D(e^{j\omega})$ and $R(e^{j\omega})$ in the general case. Since there is no algorithm that efficiently solves the general min-max problem, we solve the approximate min-max problem instead: the general problem is separated into two smaller problems that can be solved using the Remez algorithm (for the symmetric case).
In order to obtain numerical results for a filter with predefined specifications (passband, stopband and weighting function $W(\omega)$), we used the Matlab command fminimax to solve the complex min-max problem, and we solved the approximate min-max problem by solving the two separate problems mentioned above and minimizing their sum:
\[ \min_{\alpha_0, \ldots, \alpha_N}\ \max_{\omega}\ W(\omega) \left| R_r(e^{j\omega}) - D_r(e^{j\omega}) \right| + \min_{\beta_1, \ldots, \beta_N}\ \max_{\omega}\ W(\omega) \left| R_i(e^{j\omega}) - D_i(e^{j\omega}) \right| \tag{4.66} \]
This is called the approximate min-max problem.


The results from Table 4.1 show that it is worth it to use a filter of high order in almost
every case since we obtain reduced maximum approximation error. Comparing the cases
of a symmetric and non-symmetric high-order filter, for the non-symmetric filter of delay
D = 32 the maximum approximation error is worse than the symmetric one but it has
reduced signal delay, reduced ripple (very close to zero) and a sharper cutoff region which
leads to reduced leakage between adjacent frequency bins.

It should be mentioned that the parameters $\alpha_n$ and $\beta_n$ obtained by the min-max method, and thus the real coefficients $h_n$, give a really good approximation of the magnitude response $|H(e^{j\omega})|$ but not always a satisfying approximation of the phase response, especially outside the passband. The behavior of the phase response in the stopband is insignificant, since the amplitude in this region is very close to zero.
Solve min-max problem using Linear Programming
It has already been mentioned that the approximate min-max method solves the complex
min-max problem by separating it into two subproblems of a real part and an imaginary
part approximation (see equation 3.49). Linear Programming refers to the problem of
maximizing or minimizing a linear function subject to linear constraints [20]. The linear function
\[ f(x_1, x_2, \ldots, x_n) = c_1 x_1 + \ldots + c_n x_n + d \]
is called the objective function and the general problem is defined as follows:
\[ \min_{x}\ c^{T} x \quad \text{subject to} \quad Ax \le b \]
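In our case, the problem is expressed as follows:
\[ \min_{\alpha_0, \ldots, \alpha_N}\ \max_{\omega \in J}\ W(\omega) \left| D_r(\omega) - \alpha_0 - 2\alpha_1\cos(\omega) - \ldots - 2\alpha_N\cos(N\omega) \right| \tag{4.67} \]
and
\[ \min_{\beta_1, \ldots, \beta_N}\ \max_{\omega \in J}\ W(\omega) \left| D_i(\omega) - 2\beta_1\sin(\omega) - \ldots - 2\beta_N\sin(N\omega) \right| \tag{4.68} \]
where the region $J$ includes the passband and stopband regions since these are the zones we are interested in.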
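Defining
\[ \epsilon = \max_{\omega \in J}\ W(\omega) \left| D_r(\omega) - \alpha_0 - 2\alpha_1\cos(\omega) - \ldots - 2\alpha_N\cos(N\omega) \right| \]
as the max approximation error and $\delta$ such that
\[ \max_{\omega \in J}\ W(\omega) \left| D_r(\omega) - \alpha_0 - 2\alpha_1\cos(\omega) - \ldots - 2\alpha_N\cos(N\omega) \right| \le \delta, \]
the equivalent problem under solution is:
\[ \min_{\alpha_0, \ldots, \alpha_N,\ \delta}\ \delta \]
Denoting the values at the optimum by the subscript opt, we have $\delta_{opt} \ge \epsilon_{opt}$ since $\epsilon$ is the maximum approximation error and every feasible $\delta$ bounds it, while $\delta_{opt} \le \epsilon_{opt}$ since $\delta = \epsilon_{opt}$ is itself feasible.
Thus, we finally obtain $\delta_{opt} = \epsilon_{opt}$.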
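The required inequality in the Linear Programming problem is:
\[ W(\omega) \left| D_r(\omega) - \alpha_0 - 2\alpha_1\cos(\omega) - \ldots - 2\alpha_N\cos(N\omega) \right| \le \delta \;\Longleftrightarrow\; -\delta \le W(\omega) \left[ D_r(\omega) - \alpha_0 - 2\alpha_1\cos(\omega) - \ldots - 2\alpha_N\cos(N\omega) \right] \le \delta \tag{4.69} \]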


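The right inequality yields:
\[ -\delta + \alpha_0 (-W(\omega)) + \alpha_1 (-2W(\omega)\cos(\omega)) + \ldots + \alpha_N (-2W(\omega)\cos(N\omega)) \le -W(\omega) D_r(\omega) \tag{4.70} \]
and by sampling at $K$ different frequencies $\omega_0, \omega_1, \ldots, \omega_{K-1} \in J$, we obtain the system of inequalities:
\[ \begin{bmatrix}
-1 & -W(\omega_0) & -2W(\omega_0)\cos(\omega_0) & \cdots & -2W(\omega_0)\cos(N\omega_0) \\
-1 & -W(\omega_1) & -2W(\omega_1)\cos(\omega_1) & \cdots & -2W(\omega_1)\cos(N\omega_1) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-1 & -W(\omega_{K-1}) & -2W(\omega_{K-1})\cos(\omega_{K-1}) & \cdots & -2W(\omega_{K-1})\cos(N\omega_{K-1})
\end{bmatrix}
\begin{bmatrix} \delta \\ \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_N \end{bmatrix}
\le
\begin{bmatrix} -W(\omega_0)D_r(\omega_0) \\ -W(\omega_1)D_r(\omega_1) \\ \vdots \\ -W(\omega_{K-1})D_r(\omega_{K-1}) \end{bmatrix} \tag{4.71} \]
Similarly, from the left part of the inequality the resulting system is the following:
\[ \begin{bmatrix}
-1 & W(\omega_0) & 2W(\omega_0)\cos(\omega_0) & \cdots & 2W(\omega_0)\cos(N\omega_0) \\
-1 & W(\omega_1) & 2W(\omega_1)\cos(\omega_1) & \cdots & 2W(\omega_1)\cos(N\omega_1) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-1 & W(\omega_{K-1}) & 2W(\omega_{K-1})\cos(\omega_{K-1}) & \cdots & 2W(\omega_{K-1})\cos(N\omega_{K-1})
\end{bmatrix}
\begin{bmatrix} \delta \\ \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_N \end{bmatrix}
\le
\begin{bmatrix} W(\omega_0)D_r(\omega_0) \\ W(\omega_1)D_r(\omega_1) \\ \vdots \\ W(\omega_{K-1})D_r(\omega_{K-1}) \end{bmatrix} \tag{4.72} \]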
Consequently, for simplicity of programming we solved the min-max problem with the help of linear programming rather than a Remez-like algorithm (see Appendix B). Unfortunately, this approach tends to be slow for large filter lengths and can sometimes fail to converge. Despite that, the idea extends to an even-tapped filter, and one can obtain a filter with the desired amplitude and delay.
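The construction of the inequality systems (4.71) and (4.72) for the real-part problem can be sketched as follows. This is only an illustration of how the rows are assembled, not the code used for the thesis results: the names `build_lp_constraints`, `weighted_error` and `is_feasible`, the frequency grid and the coefficient values are hypothetical, and the resulting $A$, $b$ would still have to be passed to an LP solver together with the objective $c = [1, 0, \ldots, 0]^T$.

```python
import math


def build_lp_constraints(omegas, weights, desired, order):
    """Return (A, b) with A x <= b for x = [delta, alpha_0, ..., alpha_N]."""
    rows, rhs = [], []
    for omega, weight, target in zip(omegas, weights, desired):
        basis = [weight] + [
            2.0 * weight * math.cos(n * omega) for n in range(1, order + 1)
        ]
        rows.append([-1.0] + [-value for value in basis])   # W (D_r - R_r) <= delta
        rhs.append(-weight * target)
        rows.append([-1.0] + basis)                          # -W (D_r - R_r) <= delta
        rhs.append(weight * target)
    return rows, rhs


def weighted_error(omega, weight, target, alpha):
    """W(omega) * (D_r(omega) - alpha_0 - 2 alpha_1 cos(omega) - ...)."""
    approx = alpha[0] + 2.0 * sum(
        alpha[n] * math.cos(n * omega) for n in range(1, len(alpha))
    )
    return weight * (target - approx)


def is_feasible(rows, rhs, x, tol=1e-12):
    """Check A x <= b row by row."""
    return all(
        sum(a * v for a, v in zip(row, x)) <= bound + tol
        for row, bound in zip(rows, rhs)
    )


omegas = [0.0, 0.5, 1.0, 2.5, 3.0]       # hypothetical samples of the region J
weights = [1.0] * 5
desired = [1.0, 1.0, 1.0, 0.0, 0.0]
alpha = [0.4, 0.3, 0.05]                 # hypothetical coefficients, N = 2
A, b = build_lp_constraints(omegas, weights, desired, 2)
delta = max(abs(weighted_error(w, W, D, alpha)) for w, W, D in zip(omegas, weights, desired))
print(is_feasible(A, b, [delta] + alpha), is_feasible(A, b, [0.9 * delta] + alpha))
```

For fixed coefficients the smallest feasible $\delta$ is exactly the maximum weighted error, so minimizing $\delta$ over all variables solves the min-max problem on the sampled grid.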
Another approach to a Low Delay FIR filter
Since the linear-phase property is desired in FIR filter design problems, the signal delay is an
important algorithm characteristic. Most hearing-instrument users receive processed sound
together with unprocessed sound leaking directly into the ear canal. At low frequencies,
these components may have similar amplitudes. Interference between these components
can cause noticeable effects if the processed signal is delayed by more than about 5-10 ms.
For a listener with a severe hearing loss, who cannot hear the unprocessed sound, the only problem is the asynchrony between speech sounds and lip movements. Then, delays of about 50 ms are acceptable. The processing delay is finally influenced by several factors in
the algorithm implementation [23].
The filtering performed by the algorithm always introduces some group delay regardless of implementation. The FIR filter proposed for our problem is derived from the frequency compression gains, which provide real and symmetric filter coefficients by using the Inverse Fast Fourier Transform.
Assume the frequency gains
\[ g_0, g_1, \ldots, g_{N-1} \]
where $N = 64$ in our case and the property
\[ g_n = g_{N-n}^{*} \]
holds for the general case of complex gains.
Our aim is to obtain the filter coefficients
\[ h_0, h_1, \ldots, h_{N-1} \]
such that the filter frequency response has amplitude $g_n$ and linear phase with slope $k$, which represents the desired filter delay (for example, $k = 16$):
\[ H(e^{j\omega_n}) = g_n e^{-jk\omega_n} = g_n e^{-jk\frac{2\pi}{N}n} \tag{4.73} \]
In this case, the symmetry property is also true because
\[ g_n e^{-jk\frac{2\pi}{N}n} = g_{N-n}^{*}\, e^{jk\frac{2\pi}{N}(N-n)} \]
\[ \Longleftrightarrow\quad e^{-jk\frac{2\pi}{N}n} = e^{jk2\pi}\, e^{-jk\frac{2\pi}{N}n} \]
which is true since $e^{jk2\pi} = 1$.
The Matlab code is presented in Appendix D and gives a 64-tap filter of group delay equal to 16. Hence, there is an acceptable range of signal delay fluctuations of $\pm 1$ samples.

Figure 4.9: Introduction of an all-pole filter so as to eliminate deep nulls in FIR filter frequency response.
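The construction in (4.73) can be sketched in a few lines. This is not the Matlab code of Appendix D but a hedged Python illustration with a plain inverse DFT; the function name `gains_to_filter` and the eight gain values are hypothetical, and real gains with $g_n = g_{N-n}$ and an integer delay $k$ are assumed.

```python
import cmath
import math


def gains_to_filter(gains, delay):
    """Inverse DFT of g_n * exp(-j * delay * 2*pi*n / N), returned as complex values."""
    size = len(gains)
    spectrum = [
        g * cmath.exp(-2j * math.pi * delay * n / size) for n, g in enumerate(gains)
    ]
    return [
        sum(spectrum[n] * cmath.exp(2j * math.pi * n * m / size) for n in range(size)) / size
        for m in range(size)
    ]


gains = [1.0, 0.8, 0.5, 0.2, 0.1, 0.2, 0.5, 0.8]   # hypothetical gains, g_n = g_{N-n}
h = gains_to_filter(gains, 2)

# The imaginary parts vanish (up to rounding), so the filter is real, and the
# coefficients are symmetric around index `delay` in the circular sense.
print(max(abs(value.imag) for value in h))
print([round(value.real, 4) for value in h])
```

The linear phase term only shifts the zero-phase impulse response circularly by $k$ samples, which is why the magnitude of the DFT of the coefficients still equals the gains.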

4.5 Elimination of Deep Nulls

An AR filter has no zeros outside the origin and is hence less suitable to approximate filters having (many) zeros within their frequency response. Löllmann and Vary proposed that the spectral weighting coefficients for speech enhancement should be bounded according to the equation
\[ W_{thres} < W_i(k') < 1; \qquad i = 0, 1, \ldots, M \tag{4.74} \]
This noise floor ensures that the AR filter does not have to approximate sharp zeros (deep nulls).
Our idea was to introduce a filter with transfer function $\frac{1}{B(z)}$ in cascade with the AR post-filter so as to eliminate the deep nulls in the frequency response, as illustrated in Figure 4.9. As a result, the AR filter required in order to approximate the time-domain filter $H_s(z)$ is now $H_{AR}(z)$, where the inverse transfer function is given by
\[ H_{AR}^{-1}(z) = \frac{1}{B(z)} \left( 1 - A(z) \right) \tag{4.75} \]

One thing that matters is the order of the polynomial $B(z)$ and whether the correction at a specific deep null frequency is right. The first thing to do is to find the frequencies at which sharp zeros exist. In case of one or two deep nulls at frequencies $\omega_1$ and $\omega_1, \omega_2$ respectively, we need to solve the following system:
- 1st case: if there is one deep null at frequency $\omega_1$ then we must solve the equation
\[ H_s(e^{j\omega_1}) \approx \frac{1}{1 - \beta_1 e^{-j\omega_1}} \tag{4.76} \]
- 2nd case: if there are two deep nulls at frequencies $\omega_1$ and $\omega_2$ then we must solve the system of equations
\[ H_s(e^{j\omega_1}) \approx \frac{1}{1 - \beta_1 e^{-j\omega_1} - \beta_2 e^{-j2\omega_1}} \tag{4.77} \]
\[ H_s(e^{j\omega_2}) \approx \frac{1}{1 - \beta_1 e^{-j\omega_2} - \beta_2 e^{-j2\omega_2}} \tag{4.78} \]
By solving the two linear systems, we find the coefficient $\beta_1$ of the first-order filter and the coefficients $\beta_1$ and $\beta_2$ of the second-order filter $B(z)$.
It should be noted that for the above analysis, we made the assumption that there are at most two frequencies at which sharp zeros are met. Finally, the AR filter that approximates the new FIR filter after the elimination of deep nulls has frequency response
\[ H_{ARN}(e^{j\omega}) = \frac{a_0 \left( 1 - \sum_{n=1}^{2} \beta_n e^{-j\omega n} \right)}{1 - \sum_{n=1}^{P} a_n e^{-j\omega n}} \tag{4.79} \]
The simplest case of a filter frequency response after the elimination of one deep null is shown in Figures 4.10 and 4.11.
Figure 4.10: Magnitude and Phase Response of a filter with one sharp zero at a specified frequency. (Plot title: "FIR Filter with Deep Null"; panels: Magnitude (dB) and Phase (degrees) versus Normalized Frequency ($\times\pi$ rad/sample).)
We observed that putting a lower threshold on the spectral gain values creates more intense ripples than the proposed method of eliminating deep nulls. In addition, the AR filter obtained after adding one term to the numerator stays stable and its group delay ranges from 1 to 6 samples at most.

4.6

Conclusions

A comprehensive concept for the time-domain FIR post-filter approximated by an IIR


allpole filter was developed. The uniform post-filter was approximated either by a warped
or a uniform AR filter. The uniform AR filter is of interest due to its low computational

CHAPTER 4. LOW DELAY TIME-DOMAIN POST FILTER

63

FIR Filter after eliminating Deep Null

Magnitude (dB)

28
26
24
22
20

0.2

0.4
0.6
0.8
Normalized Frequency ( rad/sample)

0.2

0.4
0.6
0.8
Normalized Frequency ( rad/sample)

Phase (degrees)

1000

2000

3000

Figure 4.11: Magnitude and Phase Response of a filter after the elimination of its sharp
zero.
complexity since no frequency warping is employed. Despite that, the warped AR filter
is mostly preferred since it provides a better approximation at low frequencies than the
uniform AR filter. This higher frequency resolution for the lower bands is favorable for
perceptual based speech enhancement applications.
The third case of the proposed approximations is the approximation of the warped FIR
post-filter by a warped AR filter which has a very high overall computational complexity
but perceptual speech quality can be improved in cost of computational complexity.
An idea of eliminating deep nulls appearing at one or two frequencies was proposed in
place of putting a lower threshold (noise floor) at spectral gain coefficients i.e.by restricting
the a priori SNR value depending on the spectral speech estimator. This idea employs an
allpole filter of order one or two at most depending on the number of deep null frequencies
(we assume that there are one or two frequencies with deep nulls) and changes the AR filter
required to approximated the original time-domain filter after having eliminated deep nulls.

Appendix A

The DFT and IDFT Matrices


A matrix of special interest in Digital Signal Processing is the Discrete Fourier Transform
(DFT) matrix. This is an $N \times N$ matrix defined as $\mathbf{W}_N = [W_N^{km}]$ where $W_N = e^{-j2\pi/N}$. In
other words, the entry at the $k$-th row and $m$-th column is equal to $e^{-j2\pi km/N}$, with
$0 \le k, m \le N-1$. Evidently, this is a symmetric (but complex) matrix. For example:

$$\mathbf{W}_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

and

$$\mathbf{W}_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{bmatrix}$$

The subscripts $N$ on $\mathbf{W}$ are usually omitted if they are clear from the context. The
matrix $\mathbf{W}$ also satisfies the property $\mathbf{W}^{\dagger}\mathbf{W} = N\mathbf{I}$, where $\mathbf{W}^{\dagger}$ is the transpose-conjugate of
$\mathbf{W}$, so that $\mathbf{W}/\sqrt{N}$ is unitary. Given a finite-length sequence $x(n)$, $0 \le n \le N-1$, suppose we
define the vector $\mathbf{x} = [x(0)\ x(1)\ \ldots\ x(N-1)]^T$ and compute the vector $\mathbf{X} = \mathbf{W}\mathbf{x}$.
Then the components of $\mathbf{X}$ are said to form the DFT coefficients of the sequence $x(n)$. The
sequence $x(n)$ is the inverse DFT of the sequence $X(k)$, and the matrix $\mathbf{W}^{-1} = \mathbf{W}^{\dagger}/N$ is called the
IDFT matrix.
The DFT and IDFT relations are more commonly written as

$$X(k) = \sum_{m=0}^{N-1} x(m)\,W^{km}$$

and

$$x(m) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\,W^{-km}.$$
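The definitions above can be checked numerically. The following MATLAB sketch builds the DFT matrix for N = 4 and compares it with the built-in `fft`; the test sequence `x` is an arbitrary example.

```matlab
N = 4;
[k, m] = ndgrid(0:N-1, 0:N-1);       % zero-based row and column indices
W = exp(-1i*2*pi*k.*m/N);            % DFT matrix with entries W_N^(k*m)

x  = [1; 2; 3; 4];                   % example sequence x(0), ..., x(3)
X  = W*x;                            % DFT coefficients X = W*x
xr = (W'/N)*X;                       % IDFT, using inv(W) = W'/N

errUnitary = norm(W'*W - N*eye(N));  % close to 0, since W'*W = N*I
errFft     = norm(X - fft(x));       % close to 0, matches the built-in FFT
errInverse = norm(xr - x);           % close to 0, the sequence is recovered
```

For this input the coefficients are X = [10, -2+2j, -2, -2-2j], and the matrix `W` reproduces the 4 x 4 example given above.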


Appendix B

Solve min-max problem using Linear Programming

The complex min-max problem for the FIR filter design cannot be solved optimally with the
classical Remez algorithm. The approach proposed here is therefore based on
Linear Programming, which approximates the problem without resorting to a
Remez-like algorithm. The function used is firappPM and is presented in the following
Matlab source file:
Listing B.1: MATLAB Code

function B = firappPM(N, F, A, D, W)
% FIR approximate Parks McClellan
%
% The function returns a length N+1 FIR filter using the approximate
% minmax method. N must be even.
%
% The desired frequency response is described by F, A and D.
% F is a vector of frequency band edges in pairs, in ascending order
% between 0 and 1. 1 corresponds to the Nyquist frequency or half the
% sampling frequency. At least one frequency band must have a non-zero
% width. A is a real vector the same size as F which specifies the
% desired amplitude of the frequency response of the resultant filter B.
% D is the desired delay. In our case it will be 32.
% W is a weight function following the logic of the amplitude response
% A. Normally it is [1 1 1 1], but if we want the error in the second band
% to be 10 times smaller than in the first then we select W = [1 1 10 10].
%
% We are going to solve the minimax problem with the help of linear
% programming and not using a Remez-like algorithm. This is for simplicity
% of programming. Unfortunately the algorithm tends to be slow for large
% filter lengths while it can sometimes fail to converge.

N_a = N/2 + 1;
N_b = N/2;
Delay = N_b - D;

% Interval sampling
maxs = N_b*10; maxs1 = ceil(maxs*(F(2)+F(3))/2); maxs2 = maxs - maxs1;
om = [(0:maxs1)'*F(2)/maxs1; F(3) + (0:maxs2)'*(1-F(3))/maxs2]*pi;
Ampl = [A(2)*ones(maxs1+1,1); A(4)*ones(maxs2+1,1)];
W = [W(1)*ones(maxs1+1,1); W(3)*ones(maxs2+1,1)];
% Each constraint appears twice (+ and -) to bound the absolute error
Des_a = cos(om*Delay).*Ampl.*W; Des_a = [Des_a; -Des_a];
Des_b = sin(om*Delay).*Ampl.*W; Des_b = [Des_b; -Des_b];

% Solve real part (symmetric part of the impulse response)
C = zeros(2*length(om), N_a+1);
C(:,1) = [W; -W];
for i = 1:N_a-1
    CC = 2*cos(i*om).*W;
    C(:,i+1) = [CC; -CC];
end
C(:,end) = -[ones(length(om),1); ones(length(om),1)];  % column of the error bound
f = [zeros(N_a,1); 1];                                 % minimise the error bound only
disp('Real part approximation')
delta_a = linprog(f, C, Des_a);
err_a = abs(Des_a - C(:,1:end-1)*delta_a(1:end-1));
B = delta_a(1:end-1);
max_real_error = delta_a(end)
B = [B(end:-1:2); B];

% Solve imaginary part (antisymmetric part of the impulse response)
C = zeros(2*length(om), N_b+1);
for i = 1:N_b
    CC = 2*sin(i*om).*W;
    C(:,i) = [CC; -CC];
end
C(:,end) = -[ones(length(om),1); ones(length(om),1)];
f = [zeros(N_b,1); 1];
disp('Imaginary part approximation')
delta_b = linprog(f, C, Des_b);
err_b = abs(Des_b - C(:,1:end-1)*delta_b(1:end-1));
BB = delta_b(1:end-1);
max_imaginary_error = delta_b(end)
% Taps after the centre carry -BB so that the imaginary part equals +2*sum(BB.*sin)
B = B + [BB(end:-1:1); 0; -BB];
max_combined_error = sqrt(max(err_a.^2 + err_b.^2))

[h, w] = freqz(B, 1, 1024);
figure(1)
plot(w/pi, abs(h))
title('Filter amplitude response');
figure(2)
plot(w(2:end)/pi, -diff(unwrap(angle(h)))/(w(2)-w(1))); % plot group delay
axis([0 1 0 D*2]) % delay = 32 in passband [0 0.6]
title('Filter group delay response');
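For reference, the way a weighted min-max fit turns into a linear program can be written out as follows. This is the standard reformulation and is only a sketch of what the listing sets up for the real part; the symbols $\mathbf{c}(\omega)$, $d(\omega)$, $\delta$ and $K$ are notation introduced here, not taken from the thesis.

```latex
% Unknowns: the coefficient vector x and the error bound delta.
% c(w) = [1, 2cos(w), ..., 2cos((N/2) w)]^T,  d(w) = A(w) cos(w * Delay)
\min_{\mathbf{x},\,\delta}\ \delta
\quad\text{subject to}\quad
\begin{aligned}
 W(\omega_i)\bigl(\mathbf{c}^T(\omega_i)\,\mathbf{x} - d(\omega_i)\bigr) &\le \delta,\\
-W(\omega_i)\bigl(\mathbf{c}^T(\omega_i)\,\mathbf{x} - d(\omega_i)\bigr) &\le \delta,
\end{aligned}
\qquad i = 1,\dots,K .

% Stacked in the form  C z <= b  expected by an LP solver, with z = [x; delta]:
\begin{bmatrix} \mathbf{W}\mathbf{C}_0 & -\mathbf{1} \\ -\mathbf{W}\mathbf{C}_0 & -\mathbf{1} \end{bmatrix}
\begin{bmatrix} \mathbf{x} \\ \delta \end{bmatrix}
\le
\begin{bmatrix} \mathbf{W}\mathbf{d} \\ -\mathbf{W}\mathbf{d} \end{bmatrix},
\qquad
\mathbf{f} = \begin{bmatrix} \mathbf{0} \\ 1 \end{bmatrix}.
```

Each frequency sample contributes two inequalities, which is why the constraint matrix has twice as many rows as there are grid points. The imaginary part is handled in the same way with sine terms in place of the cosine terms.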

Appendix C

Proofs & Derivations


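Proof of Eq.(4.12):
The squared error signal reads:

$$e_k^2 = \frac{1}{\alpha_0^2}\left( y(k) - \sum_{n=1}^{P} \alpha_n\, y(k-n) \right)^{2}
= \frac{1}{\alpha_0^2}\left( y(k)y(k) - 2\sum_{n=1}^{P} \alpha_n\, y(k-n)y(k) + \left( \sum_{n=1}^{P} \alpha_n\, y(k-n) \right)^{2} \right)$$

Since

$$\left( \sum_{n=1}^{P} \alpha_n\, y(k-n) \right)^{2} = \sum_{n=1}^{P}\sum_{m=1}^{P} \alpha_n \alpha_m\, y(k-n)\,y(k-m)$$

taking the expectation value, we get:

$$E\left\{ \left( \sum_{n=1}^{P} \alpha_n\, y(k-n) \right)^{2} \right\}
= \sum_{n=1}^{P} \alpha_n \sum_{m=1}^{P} \alpha_m\, E\{ y(k-n)\,y(k-m) \}
= \sum_{n=1}^{P} \alpha_n \sum_{m=1}^{P} \alpha_m\, \varphi_{yy}(n-m)
= \sum_{n=1}^{P} \alpha_n\, \varphi_{yy}(n)$$

according to Equation (4.10).
Thus, with $E\{y(k)y(k)\} = \varphi_{yy}(0)$ and $E\{y(k-n)y(k)\} = \varphi_{yy}(n)$, we finally obtain:

$$E\{e_k^2\} = \sigma_e^2 = \frac{1}{\alpha_0^2}\left( \varphi_{yy}(0) - \sum_{n=1}^{P} \alpha_n\, \varphi_{yy}(n) \right) \tag{C.1}$$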
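The result can be checked numerically. The MATLAB sketch below uses a hypothetical autocorrelation sequence $\varphi_{yy}(n) = \rho^{|n|}$ and assumes that Equation (4.10) is the usual set of normal equations $\sum_m \alpha_m \varphi_{yy}(n-m) = \varphi_{yy}(n)$; it then compares the full expansion of the error variance with the closed form (C.1).

```matlab
% Hypothetical autocorrelation phi_yy(n) = rho^|n|, for illustration only.
rho = 0.8; P = 3; alpha0 = 1;
phi = rho.^(0:P);                 % phi_yy(0), ..., phi_yy(P)
R   = toeplitz(phi(1:P));         % R(n,m) = phi_yy(n-m)
r   = phi(2:P+1).';               % r(n)   = phi_yy(n), n = 1..P
alpha = R\r;                      % normal equations R*alpha = r

% Full expansion of E{e_k^2} versus the closed form (C.1)
varFull  = (phi(1) - 2*alpha.'*r + alpha.'*R*alpha)/alpha0^2;
varShort = (phi(1) - alpha.'*r)/alpha0^2;
```

Both expressions agree because the quadratic term collapses to `alpha.'*r` once the normal equations hold. For this particular sequence the solution is `alpha = [rho; 0; 0]` and the error variance equals `1 - rho^2`.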


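Proof of Eq.(4.42):
In the z-domain, Parseval's theorem reads:

$$\sum_{k=-\infty}^{\infty} h(k)\,h^{*}(k) = \frac{1}{2\pi j}\oint_C H(z)\,H^{*}\!\left(\frac{1}{z^{*}}\right)\frac{dz}{z} \tag{C.2}$$

By applying this relation to Equation (4.41), and using $H_A(z)\,H_A(z^{-1}) = 1$ for the allpass $H_A(z)$, we obtain:

$$\begin{aligned}
\sum_{l=0}^{\infty} D_{h_s}^{[n]}(l)\,D_{h_s}^{[\nu]}(l)
&= \frac{1}{2\pi j}\oint_C H_A^{n}(z)\,H_s(z)\,H_A^{\nu}(z^{-1})\,H_s(z^{-1})\,\frac{dz}{z} \\
&= \frac{1}{2\pi j}\oint_C H_A^{n-\nu}(z)\,H_s(z)\,H_s(z^{-1})\,\frac{dz}{z} \\
&= \frac{1}{2\pi j}\oint_C H_A^{n-\nu+m-m}(z)\,H_s(z)\,H_s(z^{-1})\,\frac{dz}{z} \\
&= \frac{1}{2\pi j}\oint_C H_A^{n+m}(z)\,H_s(z)\,H_A^{\nu+m}(z^{-1})\,H_s(z^{-1})\,\frac{dz}{z} \\
&= \sum_{l=0}^{\infty} D_{h_s}^{[n+m]}(l)\,D_{h_s}^{[\nu+m]}(l).
\end{aligned}$$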
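The key step of this proof, that shifting both indices by the same amount leaves the sum unchanged, can be illustrated on the unit circle. The MATLAB sketch below is only a numerical illustration: it assumes a first-order allpass $H_A(z) = (z^{-1} - a)/(1 - a z^{-1})$, and the coefficient `a` and the short filter `hs` are hypothetical.

```matlab
a  = 0.5;                                % allpass coefficient (hypothetical)
hs = [1, 0.5, -0.3];                     % short FIR filter H_s(z) (hypothetical)
w  = 2*pi*(0:4095)/4096;                 % frequency grid on the unit circle
zi = exp(-1i*w);                         % z^{-1} evaluated on the unit circle
HA = (zi - a)./(1 - a*zi);               % assumed first-order allpass H_A(z)
Hs = hs(1) + hs(2)*zi + hs(3)*zi.^2;     % H_s(z)

% Frequency-domain counterpart of sum_l D^[n](l) D^[nu](l)
c = @(n, nu) real(mean((HA.^n.*Hs).*conj(HA.^nu.*Hs)));

lhs = c(3, 1);                           % indices (n, nu) = (3, 1)
rhs = c(3 + 2, 1 + 2);                   % both indices shifted by m = 2
allpassErr = max(abs(abs(HA) - 1));      % close to 0: |H_A| = 1 on the unit circle
```

Because the allpass has unit magnitude at every frequency, the extra factors introduced by the shift cancel point by point, so `lhs` and `rhs` coincide up to rounding.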

Appendix D

Matlab code
Solution of system of equations 4.17:
The system of equations (4.17) is solved using the following Matlab code:

Listing D.1: MATLAB Code

% calculation of 1st part Re{ExH}
P = 20;
N = 1024; % no of sampling points
w1 = linspace(0, pi, N);
E1 = zeros(N, P);
Es = zeros(N, P);
for j = 1:N
    for k = 1:P
        E1(j,k) = exp(sqrt(-1)*k*w1(j));
    end
    Es(j,:) = real(E1(j,:))*abs(FIRf(j)).^2;
end
ReEH = sum(Es);
b = ReEH';   % column vector for the right-hand side
% calculation of 2nd part
E = zeros(P, N);
EEH = cell(0);
EEHxH2 = zeros(P, P);
clear i j k
for j = 1:N
    for k = 1:P
        E(k,j) = exp(sqrt(-1)*k*w1(j));
    end
    EEH{j} = E(:,j)*E(:,j)';
    EEH{j} = abs(FIRf(j)).^2*real(EEH{j}); % Toeplitz symmetrical structure
end
% accumulate the per-frequency matrices into their running sum
for j = 1:N-1
    EEHxH2 = EEH{j} + EEH{j+1};
    EEH{j+1} = EEHxH2;
end
A = EEHxH2;
% system solution
x = A\b
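The same linear system can be assembled without explicit loops. The following MATLAB sketch is an assumed vectorised equivalent of the listing above, not the thesis code; `FIRf` is replaced by a hypothetical stand-in response and the sizes are reduced for readability.

```matlab
P = 4; N = 256;
w1   = linspace(0, pi, N);
FIRf = 1 + 0.5*exp(-1i*w1);       % hypothetical stand-in for the FIR frequency response
S    = abs(FIRf).^2;              % |H(e^{jw})|^2 on the frequency grid

k = (1:P).';
E = exp(1i*k*w1);                 % P x N matrix with entries e^{j*k*w}
b = real(E)*S.';                  % b(k)   = sum_j S(j)*cos(k*w_j)
A = real(E*diag(S)*E');           % A(k,l) = sum_j S(j)*cos((k-l)*w_j)
x = A\b;                          % AR coefficients

toeplitzErr = norm(A - toeplitz(A(:,1)));   % close to 0: A is symmetric Toeplitz
```

Since the entries of `A` depend only on the index difference `k-l`, the matrix is symmetric Toeplitz, which is the structure noted in the comment of the listing.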


It should be mentioned that the coefficients of the FIR filter that must be approximated
are obtained by the m-files Gainvector and FIRfreq, presented as follows:

Listing D.2: MATLAB Code

% VECTOR OF FREQUENCY COMPRESSION GAINS
function gainvector = Gainvector
M = 64;
g = zeros(M-1, 1);
addit = zeros(1, M); addit(1:end) = 10;
gainvector = rand(1, M) + addit;
% symmetric gains give a real impulse response
gainvector(2:end) = gainvector(2:end) + gainvector(end:-1:2);
g0 = sum(gainvector(2:end).*(-1).^[1:63]);
gainvector(1) = g0;
for k = 1:M
    if (k ~= 33)
        g(k) = gainvector(k)*(-1)^(k-1);
    end
end
% zero alternating sum of the gains, so that the last FIR coefficient vanishes
gainvector(33) = -sum(g);

Listing D.3: MATLAB Code

% FIR Post-Filter
gainvector = Gainvector;
a = ifft(gainvector);
FIR = a(mod((0:63)+33, 64)+1);   % circular reordering of the impulse response
N = 1024; % no of sampling points
w1 = linspace(0, pi, N);
H1 = zeros(1, N);
FIRf = zeros(1, N);
K = 63; % filter coeffs are 63 (eliminating the last zero)
for j = 1:N
    for k = 1:K
        H1(k) = FIR(k).*exp(-sqrt(-1)*(k-1)*w1(j));
    end
    FIRf(j) = sum(H1);   % frequency response at w1(j)
end
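Two properties of the gain vector matter for the post-filter: symmetric gains give a real impulse response, and a zero alternating sum of the gains makes one tap vanish. The MATLAB sketch below illustrates both on hypothetical random gains; setting bin `M/2+1` from the alternating sum is the assumption made here about how the zero tap is obtained.

```matlab
M = 64;
G = 10 + rand(1, M);                      % hypothetical positive gains
G(2:end) = G(2:end) + G(end:-1:2);        % enforce G(k) = G(M+2-k)
s = (-1).^(0:M-1);                        % e^{j*pi*k} for the bins k = 0..M-1
G(M/2+1) = -(sum(G.*s) - G(M/2+1));       % make the alternating sum of the gains zero

a   = ifft(G);                            % impulse response, real up to rounding
FIR = a(mod((0:M-1) + M/2 + 1, M) + 1);   % same circular reordering as in the listing

imagPart = max(abs(imag(a)));             % close to 0
lastTap  = abs(FIR(end));                 % close to 0, so 63 coefficients remain
```

The last reordered coefficient is the impulse response sample at index M/2, which equals the alternating sum of the gains divided by M. This is why only 63 coefficients need to be kept.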


Regarding the approximation problem, the approximated filter might be either a uniform
filter (Gainvector) or a warped filter (warped).

Listing D.4: MATLAB Code

alpha = 0.5; % warping factor
L = 64; % DFT points
N = 1024;
w1 = linspace(0, pi, N);
argtan1 = sin(w1)./(cos(w1) - alpha);
warpphase1 = -w1 + 2*atan(argtan1);   % warped frequency of the allpass
H1 = zeros(1, N);
warpedFIR1 = zeros(1, N);
K = 63;
for j = 1:N
    for k = 1:K
        H1(k) = FIR(k).*exp(-1i*(k-1)*warpphase1(j));
    end
    warpedFIR1(j) = sum(H1);   % warped frequency response at w1(j)
end
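The warped frequency used above can be related directly to an allpass phase. The MATLAB sketch below assumes the first-order allpass $H_A(z) = (z^{-1} - \alpha)/(1 - \alpha z^{-1})$ and checks that the warped frequency is minus its phase; `atan2` is used here instead of `atan` so that the mapping stays continuous over the whole band.

```matlab
alpha = 0.5;                                   % warping factor, as in the listing
w     = linspace(0, pi, 257);                  % uniform frequency grid
zi    = exp(-1i*w);                            % z^{-1} on the unit circle
HA    = (zi - alpha)./(1 - alpha*zi);          % assumed first-order allpass

warp  = -w + 2*atan2(sin(w), cos(w) - alpha);  % warped frequency on [0, pi]
err   = max(abs(exp(-1i*warp) - HA));          % close to 0: HA = exp(-1i*warp)
isMonotonic = all(diff(warp) > 0);             % the mapping preserves frequency order
```

With a positive warping factor the mapping stretches the low frequencies, with slope $(1+\alpha)/(1-\alpha)$ near zero, which is the source of the higher resolution in the lower bands. Replacing `atan2` by `atan` changes `warp` only by multiples of $2\pi$, so the complex exponentials in the listing are unaffected.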


Low Delay FIR filter:

In order to obtain a time-domain FIR post-filter with a low signal delay, we apply the
idea presented in section 4.4, Low Delay FIR Filter Design, to a 64-tap FIR filter
with a desired delay of 16 samples. The results are satisfactory: the obtained delay lies
between 14 and 17 samples (fluctuations of up to 1-2 samples).

Listing D.5: MATLAB Code

% Filter of length M with group delay n0
gainvector = Gainvector;
gainvector(17) = 0.01;
M = 64;
n0 = 16; % desired group delay
wf = zeros(M, 1);
for l = 1:M
    wf(l) = 2*pi*(l-1)/M;   % DFT bin frequencies
end
w = zeros(length(gainvector), 1);
for k = 1:length(gainvector)
    w(k) = gainvector(k)*exp(-1i*n0*wf(k));   % linear phase for a delay of n0 samples
end
Wfilt = ifft(w);
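The effect of the linear-phase factor can be seen in isolation: multiplying the gains by $e^{-j n_0 \omega_k}$ before the inverse DFT delays the zero-phase impulse response circularly by $n_0$ samples. The MATLAB sketch below demonstrates this on hypothetical symmetric gains.

```matlab
M  = 64; n0 = 16;
G  = 10 + rand(1, M);                   % hypothetical gains
G(2:end) = G(2:end) + G(end:-1:2);      % symmetric gains: zero-phase prototype
wf = 2*pi*(0:M-1)/M;                    % DFT bin frequencies

h0 = ifft(G);                           % zero-phase impulse response
h  = ifft(G.*exp(-1i*n0*wf));           % linear-phase term applied before the IDFT

shiftErr = max(abs(h - circshift(h0, [0, n0])));   % close to 0
```

The main lobe of the impulse response therefore sits at sample `n0` rather than in the middle of the filter, which is what yields a delay of roughly 16 samples for a 64-tap filter.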


Elimination of Deep Nulls:

To handle sharp zeros in the frequency response, we proposed
applying a filter B(z) that eliminates the sharp zeros existing at specified frequencies.
The Matlab code written for this problem is the following:

Listing D.6: MATLAB Code

% ELIMINATION OF DEEP NULLS
gainvector = Gainvector;
dpn = 17; % deep null position
gainvector(dpn) = 0.001; % value of frequency gain at that position
avg = sum(gainvector)/length(gainvector);
M = 64;
hDesGrpdel % script that defines Wfilt (cf. Listing D.5)
FIR = real(Wfilt); % coeffs of real time-domain filter
nullpos = zeros(M, 1);
nullfreq = zeros(M, 1);
Fw = zeros(M, 1);
fn = 0;
% find deep nulls
for m = 1:M
    if ((gainvector(m) <= 0.01) && (gainvector(m) > 10e-14))
        fprintf('Deep null at position:%d\n', m)
        nullpos(m) = m;
        fprintf('Frequency:%f\n', 2*pi*(m-1)/M)
        nullfreq(m) = 2*pi*(m-1)/M;
    end
    if (nullpos(m) ~= 0)
        fn = nullfreq(m);
    end
    % coefficient of B(z) that lifts the null gain to the mean gain
    b1 = (1 - gainvector(dpn)./avg)*exp(1i*fn);
end
FIRnull = gainvector;
FIRnull(dpn) = gainvector(dpn)/(1 - b1*exp(-1i*fn));
wfn = zeros(M, 1); wnull = zeros(M, 1);
n0 = 16;
for l = 1:M
    wfn(l) = 2*pi*(l-1)/M;
end
for k = 1:M
    % FIR frequency response (without deep null)
    wnull(k) = FIRnull(k)*exp(-1i*n0*wfn(k));
end
Wnull = ifft(wnull);
FIRn = real(Wnull);
% new frequency response FIRn (elimination of deep null)
figure(1)
freqz(FIR, 1, 64)
title('FIR Filter with Deep Null');
figure(2)
freqz(FIRn, 1, 64)
title('FIR Filter after eliminating Deep Null');
% AR filter approximating FIRn
H = wnull.';   % row vector, to match Aallpole
P = 20;
ReExH64 % solves linear system to find AR coefficients
filtA = zeros(1, P);
Aallpole = zeros(1, M);
Af = zeros(1, M);
for j = 1:M
    for k = 1:P
        % find coeffs x from ReExH64; A(e^{jw}) = sum_k x(k)*e^{-j*k*w}
        filtA(k) = x(k)*exp(-1i*k*wf1(j));
    end
    Aallpole(j) = sum(filtA);
    if (nullpos(j) ~= 0)
        Af(j) = (1 - b1*exp(-1i*fn))/(1 - Aallpole(j));
    else
        Af(j) = 1/(1 - Aallpole(j));
    end
end
HxA = H - H.*Aallpole;
approxHAa = abs(HxA).^2;
approxHA = (1/M)*sum(approxHAa);
a0 = sqrt(approxHA);
Aftot = a0*Af; % frequency response of AR filter
[Hn, W] = freqz(FIRn, 1, 64);
[Num, Den] = invfreqz(Aftot, W, 1, P);
figure(3)
plot(W, 10*log10(abs(Aftot)), 'r')
title('AR Filter Amplitude response');
figure(4)
grpdelay(Num, Den, W)
title('AR filter Group Delay response');

