Garini Nikoleta
September 15, 2009
Preface
This thesis is based upon studies conducted from October 2008 to August 2009 at
the Department of Electrical and Computer Engineering of the University of Patras. It
deals with some basic issues related to Digital Hearing Aids and more specifically, with the
matter of compression in hearing aid devices.
There has been an explosion in the number of digital hearing aids on the market in the
last five years. At last count, there were 22 manufacturers with digital hearing aids marketed
under 40 different model names. Manufacturers are moving toward their third or fourth
generation of digital products.
The first chapter is a general introduction to hearing aids. It refers briefly to the human
auditory system and the exact problems faced by people with hearing impairment. It also
presents the underlying theory behind compression and its major role in decreasing the range
of sound levels in the environment to better match the dynamic range of a hearing-impaired
person. Compression systems are used to achieve specific aims and different compression
parameters are needed for each rationale.
Chapter 2 contains different approaches for frequency compression. Some of them are
Multiband Compression, Wide Dynamic Range Compression and Output Limiting Compression. The classic frequency-domain compression uses FFT processing and the ideal and
practical FFT systems are described. In order to approximate the non-uniform frequency
resolution of the human auditory system, warped compression systems are used for speech
enhancement.
Chapter 3 is dedicated to the theory of Multirate Filter Banks and the Polyphase Decomposition as an efficient way of implementing them. A different prototype filter design is
thoroughly described and is proposed since it provides a minimum combined approximation
error.
Chapter 4 explains the approximation of the time-domain post filter with gain coefficients being adapted at the frequency domain by an allpole filter of lower degree. A way of
eliminating sharp zeros in the filters frequency response is suggested and simulation results
provide us an evaluation of the proposed technique.
Appendices A, B, C and D serve as reference and provide Matlab code and some
useful proofs and derivations.
Acknowledgements
This master's thesis was completed during my graduate studies in the inter-departmental
program Signal Processing & Communication Systems of the Department of Computer
Engineering and Informatics at the University of Patras. Its main target is to enhance and
evaluate some compression techniques applied to Digital Hearing Aids.
I am deeply thankful to my Professor George Moustakides for his advice, for his unique
support and for the pleasant environment he has offered me at the Department of Electrical
and Computer Engineering at the University of Patras. His enthusiasm about the project
and his experience helped me to cope with issues in Digital Signal Processing that seemed
to me difficult at first.
The master thesis evaluation was performed by Nikolaos P. Galatsanos, Professor at
the Department of Electrical and Computer Engineering at the University of Patras, and
Professor Emmanouil Psarakis of the Department of Computer Engineering and Informatics
at the University of Patras.
Garini Nikoleta
Patras, 2009
Contents
1 Introduction to Hearing Aids
1.1 Description of Human Auditory System and Acoustic Measurements
1.1.1 Cochlear Tuning and Frequency Selectivity
1.1.2 Linear Amplifiers and Gains
1.1.3 Sound Pressure Level and Absolute Threshold of Hearing
1.2 Problems Faced by Hearing-impaired People
1.3 Compression In Hearing Aids
1.3.1 Compression's major role: Reducing Signal's Dynamic Range
1.3.2 Basic Characteristics of a Compressor
1.3.3 Rationales for use of Compressors
D Matlab code
Chapter 1
Introduction to Hearing Aids
1.1 Description of Human Auditory System and Acoustic Measurements
In order to hear a sound, the auditory system must accomplish three basic tasks. First it
must deliver the acoustic stimulus to the receptors; second, it must transduce the stimulus
from pressure changes into electrical signals; and third, it must process these electrical
signals so that they can efficiently indicate the qualities of the sound source such as pitch,
loudness and location. The human ear can be divided into three fairly distinct components
according to both anatomical position and function: the outer ear, which is responsible
for gathering sound energy and funnelling it to the eardrum, the middle ear which acts as
a mechanical transformer and the inner ear where the auditory receptors (hair cells) are
located [2]. Fig. 1.1 shows the detailed anatomy of the human ear:
1.1.1 Cochlear Tuning and Frequency Selectivity
Hair cells are the sensory receptors of both the auditory and the vestibular systems and
transform mechanical energy into neural signals. They are classified as inner hair cells
and outer hair cells; the latter are over three times more numerous and affect the response
of the basilar membrane. Mechanical properties of the basilar membrane affect the way it
responds to sounds of different frequencies.
It is known that the location of the peak of the traveling wave on the basilar membrane
is determined by the frequency of the originating sound. When a certain frequency sound
stimulates a point on the membrane, it responds by moving and hair cells at that site are
stimulated by the force that this movement creates. Therefore, groups of hair cells only
respond if certain frequencies are present in the originating sound [5].
Each place on the basilar membrane is tuned to a particular characteristic frequency. As
a whole, the basilar membrane behaves as a bank of over-lapping bandpass filters (auditory
filters). In this way, it extracts quite detailed information about the spectral decomposition
of sounds and performs a partial spectral/Fourier analysis of the sound, with each place
on it being most sensitive to a different frequency component. The frequency sensitivity of
a hair cell can be displayed as a tuning curve and the phenomenon is known as cochlear
tuning.
Frequency Threshold Tuning curves can be obtained by finding the level of a pure tone
required to produce a just-measurable increase in the firing rate of a neuron, as a function
of the frequency of the pure tone. These curves are equivalent to the tuning curves on the
basilar membrane; they are characteristically V-shaped as shown in Fig. 1.2 and their peak
represents the frequency at which the cell is most sensitive:
Figure 1.5: Human auditory thresholds as a function of frequency. Sounds that fall in the
shaded region below the curve are below threshold and therefore inaudible.
The smallest detectable change in intensity, a matter of Intensity Discrimination,
is measured using a variety of psychophysical methods and various stimuli. Although the
difference threshold depends on several factors including duration, intensity and the kinds
of stimuli on which the measurement is made, Weber's law holds for most stimuli. In other
words, the smallest detectable change is a constant fraction of the intensity of the stimulus.
1.1.2 Linear Amplifiers and Gains
Amplifiers inside hearing aids can be classified as linear and nonlinear. Linear amplifiers
multiply the input signal by a fixed amount regardless of its magnitude. The behavior of a
linear amplifier is not affected by how many signals it is amplifying at the same time. For
example, if signal C is amplified by 30dB when it is the only signal present in the input,
then it will still be amplified by 30dB even when several other signals are simultaneously
being amplified by the device [1].
The gain of any device relates the amplitude of the signal coming out of the device to
the amplitude of the signal going into the device. Gain is thus calculated as the output
amplitude divided by the input amplitude or as the Output Level minus the Input Level
expressed in dB SPL (Sound Pressure Level). To fully describe the gain of a linear amplifier,
it is necessary to state its gain at every frequency within the frequency range of interest.
This is referred to as the gain-frequency response or gain curve. Thus, the degree of
amplification is represented as a graph of Gain versus frequency (Gain-Frequency Response)
or a graph of Output Level versus Input Level (I/O curve) which shows the dependence of
Output Sound Pressure Level on Input Sound Pressure Level for a particular signal or
frequency. It should be noted that the highest level produced by a hearing aid is known as
Saturation Sound Pressure Level (SSPL).
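The gain arithmetic above can be illustrated with a short sketch (in Python for convenience; the thesis's own code in Appendix D is Matlab, and the levels used here are made-up examples):

```python
import math

def gain_db(input_level_db, output_level_db):
    # Gain in dB is Output Level minus Input Level (both in dB SPL).
    return output_level_db - input_level_db

def gain_db_from_amplitudes(a_in, a_out):
    # Equivalently, 20*log10 of the output/input amplitude ratio.
    return 20.0 * math.log10(a_out / a_in)

# A linear 30 dB amplifier maps a 60 dB SPL input to a 90 dB SPL output,
# no matter what other signals are simultaneously present.
print(gain_db(60.0, 90.0))  # 30.0
```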
1.1.3 Sound Pressure Level and Absolute Threshold of Hearing
All amplifiers become nonlinear when the input or output signals exceed a certain level.
This happens because amplifiers are unable to handle signals larger than the voltage of the
battery that powers the amplifier. As with gain, the SSPL varies with frequency and a
useful measure is the SSPL Response curve.
Figure 1.6: Saturation Sound Pressure Level Frequency Response of a hearing-aid [1].
The Absolute Threshold of Hearing is the minimum sound level of a pure tone that
an average ear with normal hearing can detect in a noiseless environment. Because it is
not a discrete point, it is defined as the level at which a response is elicited a specified
percentage of the time. It is expressed in dB SPL and can be measured using psychophysical
methods.
For a hearing-impaired person, the threshold of hearing is different from that of a normal
listener and a way to determine the hearing loss is the acoustic audiogram. An audiogram is
a chart depicting hearing test results:
Sensorineural hearing loss is the most common type of hearing loss and as yet there is
no cure, though hearing aids can help.
Conductive hearing loss occurs when the sound is not being transmitted through the
ear canal and middle ear to the inner ear. Common causes are wax in the ear canal,
fluid in the middle ear or damage to the middle ear bones. This type of hearing loss
can often be successfully treated with medication or surgery.
1.2 Problems Faced by Hearing-impaired People
The following problems are those that are mainly related to the most common type of
hearing loss, sensorineural hearing loss [1]:
Decreased Audibility
While hearing-impaired people do not hear some sounds at all, people with a severe
hearing loss may not hear any speech sound unless it is shouted, and those with a
moderate loss are more likely to hear some sounds and not others. In particular, softer
phonemes, usually consonants, may not be heard: for example, the sequence of sounds
"i e a a r" might have originated as "pick the black harp" but could have been heard
as "kick the cat hard". For people with hearing impairment, essential parts of some
phonemes are not audible and they recognize sounds by noting which frequencies contain
the most energy. In general, the high-frequency components of speech are weaker than the
low-frequency components. Thus, hearing-impaired people usually miss high-frequency
information.
To overcome these problems, a hearing aid has to provide more amplification at frequencies where speech has the weakest components (usually high frequencies). Hence,
hearing aids provide different amounts of gain in different frequency regions.
Decreased Dynamic Range
Unfortunately, it is not always appropriate to amplify soft sounds by the amount
needed to make them audible. Sensorineural hearing loss increases the threshold of
hearing much more than the threshold of loudness discomfort. The dynamic range of
the ear is the level difference between discomfort and threshold of audibility and it
is less in the case of a hearing-impaired person. Reduced dynamic range for people with
hearing loss is depicted in Fig. 1.8.
If the sounds of the environment are to fit within the restricted dynamic range of a
patient, then the hearing aid must amplify weak sounds more than it amplifies intense
sounds. This is the main target of compression.
Decreased Frequency Resolution
People with sensorineural loss face difficulty separating sounds of different
frequencies, which are represented at different places within the cochlea. Their decreased
frequency resolution is due to the loss of the ability of outer hair cells
to increase the sensitivity of the cochlea at the tuning frequencies (frequencies to which the
affected part of the cochlea is tuned). The essence is that even when a speech and a
noise component have different but closely spaced frequencies, the cochlea will
have a single broad region of activity rather than two finely tuned separate regions.
The brain is thus unable to untangle the signal from the noise.
1.3 Compression In Hearing Aids
Compression's major role is to decrease the range of sound levels in the environment so as to
better match the dynamic range of a person with hearing impairment. The compressor may
be most active at low, mid or high sound levels or it may vary its gain across a wide range
of sound levels, in which case it is known as Wide Dynamic Range Compressor (WDRC).
A compressor can react to a change in Input Level within only a few thousandths of a second
or it can be as slow as to spend many tens of seconds to fully react. The degree to which a
compressor finally reacts to a change in input level may be represented as an Input-Output
Diagram or as a Gain-Input Diagram.
Compression may be either linear or nonlinear. For sounds of a given frequency, a linear
compressor amplifies by the same amount no matter what the level of the signal is, or what
other sounds are simultaneously present. In this case, the problem is that intense sounds
become more intense and thus annoying. The solution is to introduce a compression threshold,
which is the input level above which the compressor begins to act; it is clearly visible on the
Input-Output Diagram.
Another measure, related to the slope of the curves on the I/O or Gain-Input
diagrams, is the compression ratio, which describes the variation in Output Level that
corresponds to a variation in Input Level.
Benefits of compression can be summarized as follows:
- It can make low-level speech more intelligible, by increasing gain and hence audibility.
- It can make high-level sounds more comfortable and less distorted.
- In mid-level environments, it offers little advantage relative to a well-fitted linear aid,
  but once the Input Level varies from this, its advantages become evident.
However, the two most important disadvantages are:
- greater likelihood of feedback oscillation, and
- excessive amplification of unwanted lower-level background noises.
1.3.1 Compression's major role: Reducing Signal's Dynamic Range
The rationale for compression is to compensate for the reduced dynamic range found in the
impaired ear and the increased growth of loudness (recruitment) that accompanies hearing
loss. In fact, a compressor is an amplifier that automatically turns its gain down as the
input signal level rises.
There are three basic ways by which the dynamic range of signals can be reduced:
- Low-Level Compression, where after amplification lower levels come closer together
  while the spacing of upper levels is not affected.
- Wide Dynamic-Range Compression (WDRC), in which compression is applied more
  gradually over a wide range of input levels.
- Compression Limiting or High-Level Compression, where low-level sounds are amplified
  linearly, but the inputs from moderate to intense sounds are squashed into a
  narrower range of outputs. Its name is due to the fact that the output is not allowed
  to exceed a set limit.
Salient features of Output Limiting Compression and Wide Dynamic Range Compression
are shown in Fig. 1.9. Output Limiting Compression has two main features: a high
compression kneepoint and a high compression ratio. On the other hand, WDRC is associated
with low compression thresholds (below 55 dB SPL) and low compression ratios (less
than 5:1) [8].
Figure 1.9: Input/Output curves showing effects of Output Limiting Compression (left) and
Wide Dynamic Range Compression (right).
1.3.2 Basic Characteristics of a Compressor
Figure 1.10: The effects of a compressor on a signal. Only the middle portion of the input
is above the compressor's threshold. Note the overshoot when the signal level increases
(it takes some time for the gain to decrease), and the attenuation when the input signal
returns to the first level (and the gain increases). The release time is generally longer than
the attack time.
On the other hand, if the attack time is extremely short and the release time long, distortion
will be minimal. However, very brief sounds, like clicks, will cause a decrease in gain and
the gain will stay low for a long time afterwards. Suitable values for attack times in hearing
aids are usually around 5 ms, while release times are rarely less than 20 ms. In addition,
the attack and release times have a major effect on how compressors affect the levels of the
different syllables of speech.
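The attack/release behaviour described above can be sketched with a one-pole peak detector; this is a generic textbook structure, not the thesis's implementation, and the 5 ms / 50 ms time constants are merely illustrative:

```python
import math

def peak_detector(x, fs, attack_ms=5.0, release_ms=50.0):
    # One-pole peak detector: when the level rises, the fast "attack"
    # coefficient is used; when it falls, the slow "release" coefficient.
    a_att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    env, out = 0.0, []
    for s in x:
        mag = abs(s)
        coeff = a_att if mag > env else a_rel
        env = coeff * env + (1.0 - coeff) * mag
        out.append(env)
    return out

fs = 16000
# 50 ms of silence, a 100 ms burst at amplitude 1, then 100 ms of silence.
x = [0.0] * 800 + [1.0] * 1600 + [0.0] * 1600
env = peak_detector(x, fs)
```

With these constants the envelope rises to within about 1 % of the burst level in a few tens of milliseconds, but takes several times longer to decay, mirroring the overshoot and slow recovery behaviour described for Fig. 1.10.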
It should be noted that apart from attack and release times, the parameters in a compression system are also the number of frequency channels and the compression ratios in
each frequency band. It may well be that the optimum compressor adjustment is a function
of the type and amount of background noise, interference and the characteristics of individual hearing loss. A serious matter to deal with is to identify different sound environments
for the purpose of adjusting compression or other signal-processing system parameters.
Static Compression Characteristics
The attack and release times tell us how quickly a compressor operates; we need different
terms to tell us by how much a compressor decreases the gain as level rises. In specifying
these gain changes, we assume that the compressor has had time to fully react to variations
in signal level; thus, we study its static characteristics.
The Sound Pressure Level above which the hearing aid begins
compression is referred to as the compression threshold. Another significant characteristic
is the compression ratio which is defined as the change in Input Level needed to produce
a 1 dB change in Output Level. Compression ratios can have any value greater than 1:1;
values less than 1:1 are also possible but correspond to dynamic-range expanders rather than
compressors. In hearing aids with WDRC, compression ratios in the range of 1.5:1 to 3:1
are very common.
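The static characteristics (threshold and ratio) define a piecewise-linear I/O curve in dB, which can be sketched as follows; the 45 dB threshold, 2:1 ratio and 20 dB linear gain are illustrative values, not fitting targets from the thesis:

```python
def compressor_output_db(input_db, threshold_db=45.0, ratio=2.0, gain_db=20.0):
    # Below the compression threshold the aid behaves as a linear amplifier.
    if input_db <= threshold_db:
        return input_db + gain_db
    # Above it, `ratio` dB of input change yields only 1 dB of output change.
    return threshold_db + gain_db + (input_db - threshold_db) / ratio

# 10 dB of input change above threshold produces only 5 dB of output change (2:1).
print(compressor_output_db(55.0) - compressor_output_db(45.0))  # 5.0
```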
The design of a dynamic-range compressor involves several engineering
trade-offs. It is very important to realize that there is no unique best compressor design.
Each system involves trade-offs between processing complexity, frequency resolution, time
delay and quantization noise. The most important processing concerns are the system
frequency resolution and the processing time delay. Most digital compression systems use
multiple frequency bands. For any given processing approach, increased frequency resolution
comes at the price of increased processing delay.
Interaction between static and dynamic aspects of compression
With incoming sounds, the attack/release times of a hearing aid interact with the compression
ratio and these interactions affect the sound quality for the listener. Fast attack/release
times have the effect of temporarily reducing the ratio or amount of compression for any
given sound stimulus. In general, a combination of short attack/release times (e.g. 10 ms)
and high compression ratios (e.g. 10:1) causes distortion. If the same short attack/release
times are combined with low compression ratios (e.g. 2:1) then the sound quality is not
quite so compromised.
Dynamic and static aspects of compression are found in predictable combinations today.
Syllabic Compression, with its relatively short attack and release times, is most often
associated with Wide Dynamic Range Compression hearing aids that have a low compression
threshold (or kneepoint) and low compression ratio of less than 5:1. It is also sometimes
encountered with Output Limiting Compression hearing aids in which thresholds and ratios
of compression are high.
1.3.3 Rationales for use of Compressors
The following section tries to outline several theoretical reasons why compressors should be
included in hearing aids:
Avoiding discomfort, distortion and damage
As the input to the hearing aid increases, its output cannot be allowed to grow without
limit. There are two reasons why the maximum must be limited. Firstly, if excessively
intense signals are presented to the hearing aid wearer, the resulting loudness will cause
discomfort. Thus, this loudness discomfort level which is subjective for each wearer provides
an upper limit to the hearing aid SSPL. Secondly, excessively intense signals may cause
further damage to the aid wearer's residual hearing capability.
These two reasons explain why the maximum output must be limited, but this limiting
could be achieved with either peak clipping or compression limiting. Compression limiting is
preferred over peak clipping in nearly all cases because, although both create distortion, the
type of distortion created by peak clipping is far more objectionable.
When compression limiting is used to control the SSPL of a hearing aid, a high compression ratio is needed so that the output SPL does not rise significantly for very intense input
levels. The attack time must be short so that gain decreases rapidly enough to prevent
loudness discomfort. As with all compressors, the release time must not be so short that
it starts distorting the waveform. If a hearing aid does not include a compression limiter,
peak clipping will occur once the input signal becomes sufficiently intense. If the hearing aid
contains Wide Dynamic Range Compression, the input level needed to cause peak clipping
may be so high that peak clipping seldom occurs.
input level, so that even a single-channel hearing aid can have a level-dependent frequency
response.
Maximizing intelligibility
Multichannel Compression can be used to achieve in each frequency region the amount
of audibility that maximizes intelligibility, subject to some constraint on overall loudness.
Although the overall loudness of broadband sounds may be well normalized, such an
approach will result in loudness not being normalized in any single frequency region.
Reducing noise
The interfering effect of background noise is the single biggest problem faced by hearing aid
wearers. There are several assumptions made so that compression will decrease the effects
of noise:
- Noise usually has a greater low-frequency emphasis than does speech; thus, the
  low-frequency parts of speech are more likely to be masked and hence convey little
  information.
- Low-frequency parts of noise may cause upward spread of masking and so mask the
  high-frequency parts of speech.
- Low-frequency parts of noise contribute most to the loudness of noise.
- Noise is more of a problem in high-level environments than in low-level environments.
Consequently, if the low-frequency parts of the noise cause masking and excessive loudness
and, at the same time, low-frequency parts of speech do not convey useful information,
then increased comfort and improved intelligibility can be achieved by decreasing
low-frequency gain in high-level environments. Hearing aids aimed at noise reduction have
often been marketed as Automatic Signal Processing Devices. An additional benefit of such
devices is that the aid wearer's own voice has a greater low-frequency emphasis and a greater
overall level at the hearing aid microphone than the voices of other people. Consequently,
low-frequency compression can help give the wearer's own voice a more acceptable tonal
quality than would occur with linear amplification.
Although the noise reduction discussed so far aims to minimize only low-frequency noise,
more advanced multichannel hearing aids can decrease noise or signal in any frequency region
where the SNR is estimated to be particularly poor. This type of hearing aid estimates the
SNR within each channel by taking advantage of the fluctuations in level that are characteristic
of speech, in comparison to the more constant level of background noise. In each channel,
the envelope is analyzed by a speech/non-speech detector, where higher-level parts are assumed
to represent the peaks of the speech signal and lower-level parts represent background noise.
The speech/non-speech detector combines these estimates of signal level and noise level
to estimate the SNR in each channel and thus, the appropriate gain for each channel is
calculated.
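A deliberately crude sketch of this per-channel SNR estimate, assuming the channel envelope is already expressed in dB and taking the top and bottom 10 % of samples as speech peaks and noise floor (a parameter choice of ours, not the thesis's):

```python
def channel_snr_db(envelope_db):
    # Speech fluctuates while noise is steadier: treat the highest envelope
    # levels as speech peaks and the lowest as the noise floor, then
    # estimate SNR as the difference of the two averages.
    s = sorted(envelope_db)
    n = max(len(s) // 10, 1)       # bottom/top 10 % of samples (assumed)
    noise = sum(s[:n]) / n
    speech = sum(s[-n:]) / n
    return speech - noise

# Channel envelope alternating between a 40 dB noise floor and 70 dB peaks:
print(channel_snr_db([40.0] * 50 + [70.0] * 50))  # 30.0
```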
Chapter 2
2.1
Multi-channel dynamic-range compression is a basic part of digital hearing aids. The design of a digital compressor involves many considerations, including frequency resolution,
processing group delay, quantization noise, and algorithm complexity. A multichannel
compressor combines a filter bank with compression in each frequency band. In most
implementations, compressors operate independently in each channel but there are some
systems where compression gains can be grouped across adjacent bands. The compressor
output involves the response of each frequency band to the signal present in that band and
even some simple signals might cause complicated responses. The system output is finally
produced by summing the compressed signals from all bands, as shown in Fig. 2.1:
Through multiband compression, hearing aids separate the input signal into different
frequency bands and each subband signal goes through a different channel. Each channel has
its own compressor and the amount of compression is different at each frequency depending
on the patient's hearing loss or input signal level. The amount of compression is greater for
higher compression ratios and lower compression thresholds. Furthermore, a disadvantage
of single-channel compression over multichannel compression is that in the former, when
2.2 Frequency-Domain Compression
Filter banks represent one approach, operating in the time domain. The input sequence is
convolved with the filters one sample at a time and the resulting output sequence is formed
by summing the filter outputs. An alternative approach is to divide the signal into short
segments, transform each segment into the frequency domain using an FFT, compute the
compression gains from the computed input spectrum and apply them to the signal, and
then inverse transform to return to the time domain.
2.2.1 Ideal FFT system
Figure 2.2: Block diagram of an ideal frequency-domain compression system using a 128-point
FFT and a 16 kHz sampling rate.
Initially, the input fills a data buffer, is windowed and zero-padded. Using the overlap-add
method, the FFT of the block is calculated and the power spectrum is estimated at
a 125 Hz frequency spacing. These power estimates in the desired frequency bands are
computed from individual frequency bins at low frequencies and combined frequency bins at
higher frequencies. In this way, an approximation to human auditory frequency analysis is
achieved. For each block of data, the power spectrum is thus computed and a sequence of
band power estimates is produced at the block sampling rate.
Compressor gains in each band are computed for the FFT system and afterwards, the
FFT of the input signal is multiplied by the compressor gains to give the compressed signal
in the frequency domain. The compressed signal is finally inverse transformed to give the
time sequence and all sequences are combined using the overlap-add technique.
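The block-by-block procedure can be sketched at toy sizes (4-sample hop, 8-point FFT, rectangular window, flat gains) rather than the 128-point system described above; with a flat gain the overlap-add output must simply be a scaled copy of the input, which makes the sketch easy to check:

```python
import cmath

def fft(x):
    # Radix-2 decimation-in-time FFT (len(x) must be a power of two).
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    t = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + t[k] for k in range(n // 2)] + \
           [even[k] - t[k] for k in range(n // 2)]

def ifft(X):
    y = fft([v.conjugate() for v in X])
    return [v.conjugate() / len(X) for v in y]

HOP, N = 4, 8   # new samples per block and FFT size (toy values)

def process_block(block, gains):
    X = fft(list(block) + [0.0] * (N - HOP))   # zero-pad to the FFT size
    return [v.real for v in ifft([X[k] * gains[k] for k in range(N)])]

# Flat 0.5 gain in every bin: the output should simply be half the input.
gains = [0.5] * N
x = [float(i % 5) for i in range(16)]
y = [0.0] * (len(x) + N)
for start in range(0, len(x), HOP):
    for i, v in enumerate(process_block(x[start:start + HOP], gains)):
        y[start + i] += v                      # overlap-add the filtered blocks
```

The zero-padding leaves room for the compression filter's impulse response, which is what prevents the temporal aliasing discussed below; a real system would also apply an analysis window and use frequency-dependent gains.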
The frequency-domain compressor can be considered to be a filtering operation; the
spectrum of the input signal is multiplied by the spectrum of the compression filter to
give the spectrum of the compressed output signal. However, the compression filter is
designed in the frequency domain, so the length of its impulse response is not controlled
and temporal aliasing can occur. Consequently, the length of the filter response must be
chosen appropriately so as to eliminate temporal aliasing.
Practical FFT system
The FFT system with temporal aliasing eliminated requires a total of four FFTs: a forward
FFT for the input segment, an inverse FFT for the compression gains, a forward FFT for
the truncated compression impulse response, and an inverse FFT for the filtered segment.
A practical digital hearing aid, in general, will not have the signal-processing capability
to perform four FFTs. The DSP may not be fast enough, or the battery drain may be
too great. One solution to this problem is to provide circuitry on the DSP chip that is
dedicated to computing the FFT or to exploit special properties of the FFT and digital
filters to design a transform with a reduced operations count.
An additional solution is to compromise on the compression filter design to reduce the
number of FFTs needed. The shorter the impulse response, the smoother the frequency
response. Thus smoothing the compression-gain frequency response is equivalent to an
approximate truncation of the impulse response. The smoothing does not produce an exact
truncation, so some residual temporal aliasing distortion is possible. A careful selection of
the input segment length, FFT size, and frequency-domain smoothing will result in temporal
aliasing distortion that cannot be perceived under most listening conditions.
Furthermore, the time delay of the FFT compressor depends on the size of the input
buffer and the size of the FFT. The FFT cannot be computed until the input buffer is filled,
so there is a processing delay while the input segment is accumulated. The compression
frequency response is also specified as a real number greater than zero in each frequency
band. A frequency response that is purely real has a corresponding impulse response that
is linear-phase. Another possible way by which the delay can be adjusted is by changing
the size of the input segment and/or that of the FFT. A shorter input segment means that
the input buffer will be filled sooner, with a corresponding reduction in the overall delay.
However, a shorter input buffer means that the FFTs will have to be computed more often,
and the processing capacity of the DSP or the battery drain will need to be increased. The
other option is using a smaller FFT. If the input buffer size is halved and the FFT size
halved, then the delay will also be halved without an increase in the computational or
power requirements. However, the frequency resolution for a smaller FFT is reduced.
2.2.2 Side-Branch Architecture
In Figure 2.3, we observe the block diagram of the side-branch compression architecture,
which has the advantage of combining the low quantization noise of an FIR filter bank with
the efficiency of spectral gain calculation using the FFT:
In this implementation, the input signal fills a K/2-sample buffer and the present K/2
samples are appended to the previous K/2 samples to give a total of K samples, which
are then windowed to provide the input to the K-point FFT. The signal power spectrum is
computed from the FFT bins. Frequency bands are peak-detected and compressor gains are
computed from the peak-detector outputs. The compression gains are inverse transformed
to give the impulse response of the compression filter. Because the gains as a function of
frequency are real, the impulse response has even symmetry and yields a linear-phase filter.
The impulse response can be windowed if desired to smooth its frequency response. The
K/2 most-recent input samples are then convolved with the K-point FIR filter to produce
the final output.
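The block described above can be sketched as follows. The block size and the simple power-law gain rule are illustrative assumptions standing in for the peak-detector and compression-rule stages; the point of the sketch is that real gains as a function of frequency always yield an even-symmetric (linear-phase) impulse response.

```python
# Minimal one-block sketch of the side-branch gain computation described above
# (pure-Python DFT for clarity; block size, input values and the power-law
# gain rule are illustrative assumptions).
import cmath

K = 8
x = [0.5, -0.2, 0.8, 0.1, -0.4, 0.3, 0.0, 0.6]          # K windowed input samples

X = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / K) for n in range(K))
     for k in range(K)]
power = [abs(Xk) ** 2 for Xk in X]

# Real, positive gain per bin (mild compression; stand-in for the
# peak-detector / compression-rule stage).
gains = [(p + 1e-12) ** -0.1 for p in power]

# Inverse DFT of the real gains -> impulse response of the compression filter.
h = [sum(gains[k] * cmath.exp(2j * cmath.pi * k * n / K) for k in range(K)).real / K
     for n in range(K)]

# Real gains as a function of frequency give an even-symmetric impulse response.
symmetric = all(abs(h[n] - h[(K - n) % K]) < 1e-9 for n in range(K))
print(symmetric)
```

With unity gains the same inverse transform collapses to a unit impulse, i.e. the compression filter becomes an identity, which is a convenient sanity check for an implementation.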
2.3
Hearing losses are typically frequency-dependent, so the compressor is designed to provide different amounts of dynamic-range compression in different frequency regions. The solution is a multichannel system, such as a filter bank with different degrees of compression in each channel. The design of a multichannel compressor involves a fundamental tradeoff between frequency resolution and time delay.
For any given processing approach, increased frequency resolution comes at the price of
increased processing delay. Compared to conventional digital processing algorithms, the use of digital frequency warping inherently gives frequency resolution on an auditory frequency scale and reduces the processing delay for a specified degree of low-frequency resolution. The processing delay of a frequency-warped compressor, described in a following section, is frequency-dependent, with greater delay at low frequencies than at high frequencies. Consequently, the design of a frequency-warped compressor must take into account the frequency resolution, the overall system processing delay and the delay variation across frequency. The target is to design a compression system that avoids audible artifacts caused by the system delay and has good frequency resolution on a critical-band frequency scale [12].
2.3.1 Frequency Resolution
The main concern in designing a multichannel compressor is to match the system's frequency resolution to that of the human auditory system. Digital frequency analysis typically provides constant-bandwidth frequency resolution. However, the resolution of the human auditory system is more accurately modeled by a filter bank whose bandwidths are nearly constant at low frequencies but grow in proportion to frequency at high frequencies. This mismatch between digital and auditory analysis can be greatly reduced by replacing the conventional uniform frequency analysis with a warped frequency analysis. Frequency warping uses a conformal mapping to reallocate frequency samples close to the Bark frequency scale and is described in more detail in a following section.
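A minimal sketch of such a conformal mapping, using the phase function of a first-order allpass section with the warping parameter a = 0.5756 quoted for Figure 2.4 (the exact mapping used in [12] may differ in detail):

```python
# Sketch of the conformal frequency mapping used for warping: the phase
# function of a first-order allpass section with warping parameter a
# (a = 0.5756 is the value quoted for Figure 2.4).
import math

def warp(omega, a=0.5756):
    """Warped frequency: omega + 2*atan(a*sin(omega) / (1 - a*cos(omega)))."""
    return omega + 2.0 * math.atan2(a * math.sin(omega), 1.0 - a * math.cos(omega))

# The mapping stretches low frequencies (slope (1+a)/(1-a) at omega = 0,
# about 3.7 here) while fixing the endpoints 0 and pi.
eps = 1e-6
slope0 = (warp(eps) - warp(0.0)) / eps
print(round(slope0, 2), warp(math.pi))
```

The low-frequency stretch is what buys finer Bark-like resolution at low frequencies without lengthening the filter, at the price of the frequency-dependent delay discussed above.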
Overall Processing Delay
A second concern in designing a compression system for a hearing aid is the overall processing delay, which can cause coloration effects when the hearing-aid wearer is talking. When talking, the talker's own voice reaches the cochlea with minimal delay via bone conduction
and through the hearing-aid vent. This signal interacts with the delayed and amplified signal produced by the hearing aid to produce a comb-filtered spectrum at the cochlea. Delays as short as 3 to 6 ms that are constant across frequency are detectable, and overall delays in the range of 15 to 20 ms can be judged as disturbing or objectionable.
The overall processing delay is due to several factors. Certain aspects of the overall system delay, such as the A/D and D/A converter delays, are not affected by signal processing
since they are fixed by the hardware. The total software processing delay is the sum of
the time required to fill the input buffer, the group delay inherent in frequency-domain or
time-domain filtering and the time needed to execute the code before the output signal is
available.
2.3.2
Figure 2.4: Group delay in samples for a single all-pass filter having the warping parameter
a=0.5756 [12].
The transfer function of the warped FIR filter is the weighted sum of the outputs of
each allpass section:
W(z) = \sum_{k=0}^{K} b_k A_k(z) \qquad (2.2)

for a filter with K + 1 taps. Forcing the real filter coefficients {b_k} to have even symmetry, as for an unwarped FIR filter, yields a linear-phase filter, in which the filter delay is independent
Figure 2.5: Block diagram of a compression system using frequency warping for both frequency analysis and filtered signal synthesis.
of the coefficients as long as symmetry is preserved. This symmetry property guarantees
that no phase modification will occur as the compressor changes gain in response to the
incoming signal. In a binaural fitting (hearing aids on both ears), the coefficient symmetry also ensures that identical amounts of group delay are introduced at the two ears, thus preserving the interaural phase differences used for sound localization.
Frequency warping can be used to design both finite impulse response (FIR) and infinite impulse response (IIR) filters. Improved frequency resolution in a conventional FIR filter requires increasing the filter length, which leads to a further increase in group delay. Similarly, improved frequency resolution in a warped FIR filter requires an increase in the number of allpass filter sections, which also raises the filter delay. There is therefore a tradeoff between frequency resolution and group delay for both conventional and warped filters, although the warped filter has less delay at low frequencies than a conventional filter with the same low-frequency resolution.
2.3.3
the compression gains are updated once per block (block processing is typically used) [12]. The compression gains are computed from the warped power spectrum and are pure real numbers, so the inverse FFT that gives the warped time-domain filter results in a filter with real and even-symmetric coefficients. The system output is finally calculated by convolving the delayed samples with the compression gain filter:

y(n) = \sum_{k=0}^{K} g_k(n) p_k(n) \qquad (2.3)
2.4
We live in a noisy world. In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by background noise and reverberation. Therefore, the microphone signal has to be cleaned with digital signal processing tools before it is played out, transmitted or stored. As a result, digital hearing aids nowadays are mostly equipped with speech enhancement systems.
By speech enhancement, we mean the improvement in intelligibility and/or quality of a
degraded speech signal which includes not only noise reduction but also dereverberation and
separation of independent signals. Speech enhancement is a difficult problem because the nature and characteristics of noise signals can change dramatically over time, and the performance measure can be defined differently for each application. To measure performance, two perceptual criteria are widely used: quality (subjective) and intelligibility (objective). In general, it is not possible to improve both quality and intelligibility at the same time, and quality is usually improved at the expense of intelligibility [13].
2.4.1
For the suppression of background noise, a noise reduction system has to achieve high quality for the enhanced speech without causing a significant signal delay. A high signal delay can cause coloration effects while the hearing-aid user is talking. In such a case, the talker's own voice reaches the cochlea with minimal delay (via bone conduction and through the hearing-aid vent) and interacts with the delayed and amplified signal produced by the hearing aid. This leads to perceptually annoying artifacts.
In order to achieve these two conflicting goals, the main focus is the development of a
post-filter in the considered noise reduction system sketched in Fig. 2.6 [14]:
The calculation of the filter coefficients is done in the frequency domain, while the actual filtering is performed in the time domain. For the adaptation of the filter coefficients in the frequency domain, the noisy input speech signal x(k) is transformed into the spectral domain by means of a frequency-warped Discrete Fourier Transform (DFT) analysis filter bank, which is described in more detail in Chapter 3. It should be mentioned that a filter bank with non-uniform (approximately Bark-scaled) frequency resolution incorporates a perceptual model of the human auditory system. Thus, a lower number of frequency channels can be used compared to a uniform filter bank.
w_n(k') = \sum_{i=0}^{M-1} W_i(k') \, e^{j\frac{2\pi}{M} i (n - n_0)} \qquad (2.5)

where the variable n_0 ensures coefficients with non-zero phase. The choice n_0 = L/2 (with L even) yields coefficients with the linear-phase property:

w_n(k') = w_{L-n}(k') \qquad (2.6)
As a consequence, the FIR filter coefficients are obtained by the following relationship:

h_s(n, k') = h(n) \, w_n(k') \qquad (2.7)

and also have the linear-phase property. Here, h(n) denotes the real finite impulse response of the prototype filter of the analysis filter bank. Finally, the coefficients of the warped time-domain post-filter are h_s(n, k'), and its transfer function is H_s(\tilde{z}, k'), where \tilde{z} denotes the z-variable changed by the all-pass transformation. The actual filtering of the noisy speech x(k) is done by this warped time-domain filter, and the output signal y(k) = ŝ(k) is the denoised speech.
2.4.2
Chapter 3
3.2
A common choice for many applications is the uniform DFT analysis-synthesis filter bank (AS FB). The DFT filter bank is based on the M × M DFT matrix with elements [W^{km}], where W ≜ e^{-j2π/M} and k, m indicate the row and the column, respectively (see Appendix A). Figure 3.3 shows a simple example of a uniform DFT filter bank [15].
The input sequence x(n) passes through a delay chain, generating M sequences s_i(n) = x(n - i). These are combined to give:

x_k(n) = \sum_{i=0}^{M-1} s_i(n) \, W^{-ki}. \qquad (3.1)
As a result, for every time instant n we compute the set of M signals x_k(n) from the set of M signals s_i(n). In the z-domain, one can write:

X_k(z) = \sum_{i=0}^{M-1} S_i(z) W^{-ki} = \sum_{i=0}^{M-1} X(z) \, z^{-i} W^{-ki} = \sum_{i=0}^{M-1} (zW^k)^{-i} X(z). \qquad (3.2)

Hence,

X_k(z) = H_k(z) \, X(z) \qquad (3.3)

where

H_k(z) ≜ H_0(zW^k) \qquad (3.4)

and

H_0(z) = 1 + z^{-1} + \ldots + z^{-(M-1)}. \qquad (3.5)
A filter bank in which the filters are related as in Equation (3.4) is called a uniform DFT
filter bank.
Summarizing, the system presented is equivalent to an analysis bank with analysis filters H_k(z) with frequency response:

H_k(e^{jω}) = H_0(e^{j(ω - 2πk/M)}) \qquad (3.6)

which is a shifted version of a single filter H_0(e^{jω}), called the prototype filter. Consequently, we have a bank of M filters which are uniformly shifted versions of H_0(z), with a large amount of overlap in their frequency responses.
Lastly, we can think of the uniform DFT filter bank as a spectrum analyzer. The k-th output x_k(n) is the spectrum computed from the most recent M samples of the input sequence x(n). Since x_k(n) is the output of H_k(e^{jω}), it dominantly represents the portion of the input spectrum around the frequency ω = 2πk/M.
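The shift property of Equation (3.6) can be checked numerically; the prototype length and the test frequencies below are arbitrary.

```python
# Numerical check of Equation (3.6): each analysis filter H_k is a uniformly
# shifted version of the prototype H_0(z) = 1 + z^-1 + ... + z^-(M-1).
import cmath, math

M = 8

def H0(w):
    return sum(cmath.exp(-1j * w * i) for i in range(M))

def Hk(k, w):
    # H_k(z) = H_0(z W^k) with W = exp(-j 2 pi / M): coefficients W^{-ki}
    W = cmath.exp(-2j * cmath.pi / M)
    return sum(W ** (-k * i) * cmath.exp(-1j * w * i) for i in range(M))

shifted = all(abs(Hk(k, w) - H0(w - 2 * math.pi * k / M)) < 1e-9
              for k in range(M)
              for w in [0.1, 0.5, 1.3, 2.9])
print(shifted)
```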
3.3
3.3.1
The polyphase representation can be used to efficiently implement the uniform DFT filter bank mentioned above [15]. To explain the basic concept, consider a filter

H(z) = \sum_{n=-\infty}^{\infty} h(n) z^{-n}.

Separating the even-numbered coefficients from the odd-numbered ones, its transfer function is written:

H(z) = \sum_{n=-\infty}^{\infty} h(2n) z^{-2n} + z^{-1} \sum_{n=-\infty}^{\infty} h(2n+1) z^{-2n} \qquad (3.7)
     = E_0(z^2) + z^{-1} E_1(z^2) \qquad (3.8)

where

E_0(z) = \sum_{n=-\infty}^{\infty} h(2n) z^{-n} \qquad (3.9)

and

E_1(z) = \sum_{n=-\infty}^{\infty} h(2n+1) z^{-n}. \qquad (3.10)
We should note that these representations hold whether H(z) is FIR or IIR, causal or noncausal. Extending this idea, given any integer M we can always decompose H(z) as:

H(z) = \sum_{n=-\infty}^{\infty} h(nM) z^{-nM} + z^{-1} \sum_{n=-\infty}^{\infty} h(nM+1) z^{-nM} + \ldots + z^{-(M-1)} \sum_{n=-\infty}^{\infty} h(nM+M-1) z^{-nM} \qquad (3.11)

which is compactly written as:

H(z) = \sum_{l=0}^{M-1} z^{-l} E_l(z^M) \qquad \text{(Type 1 Polyphase)} \qquad (3.12)
The above equation is called the Type 1 Polyphase Representation, and the E_l(z^M) are the polyphase components of H(z), which depend on the choice of M:

E_l(z) = \sum_{n=-\infty}^{\infty} e_l(n) z^{-n}, \qquad e_l(n) ≜ h(nM + l). \qquad (3.13)
Figure 3.4: Schematic of the relation between h(n) and its l-th polyphase component.
Similarly, H(z) can be written as:

H(z) = \sum_{l=0}^{M-1} z^{-(M-1-l)} R_l(z^M) \qquad \text{(Type 2 Polyphase)} \qquad (3.14)

where R_l(z) = E_{M-1-l}(z).
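Both decompositions are easy to verify numerically. A sketch of the Type 1 identity (3.8) for M = 2, with arbitrary illustrative coefficients:

```python
# Time-domain check of the Type 1 polyphase identity (3.8),
# H(z) = E0(z^2) + z^-1 E1(z^2), for an arbitrary FIR filter.

h = [0.2, -0.5, 1.0, 0.3, -0.1, 0.4]          # arbitrary FIR coefficients
e0 = h[0::2]                                   # even-indexed taps -> E0
e1 = h[1::2]                                   # odd-indexed taps  -> E1
x = [1.0, 0.0, -2.0, 0.5, 3.0, -1.0, 0.25, 0.0, 0.0, 0.0]

def fir(coeffs, step, delay, sig):
    """y[n] = sum_m coeffs[m] * sig[n - delay - m*step] (zero outside)."""
    return [sum(c * (sig[n - delay - m * step] if n - delay - m * step >= 0 else 0)
                for m, c in enumerate(coeffs))
            for n in range(len(sig))]

y_direct = fir(h, 1, 0, x)
y_poly = [a + b for a, b in zip(fir(e0, 2, 0, x),   # E0(z^2)
                                fir(e1, 2, 1, x))]  # z^-1 E1(z^2)
match = all(abs(a - b) < 1e-12 for a, b in zip(y_direct, y_poly))
print(match)
```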
3.3.2
An M-th band filter is characterized by the property that its 0-th polyphase component is a constant:

E_0(z) = c \qquad (3.15)

In the passband of such a filter, H(e^{jω}) ≈ 1. Writing H(z) in its Type 1 polyphase form,

H(z) = \sum_{l=0}^{M-1} z^{-l} E_l(z^M) \qquad (3.16)

there are M terms, each with magnitude 1/M, which add up to approximately unity. So the M terms z^{-l} E_l(z^M) are almost in phase; since E_0(z) is constant for an M-th band filter, we conclude that the phase responses of the terms z^{-l} E_l(z^M) are nearly zero in the passband. In other words, E_l(e^{jω}) tries to approximate (1/M) e^{jωl/M}, and therefore the phase response of the l-th polyphase component tries to approximate ωl/M, for each l. This is the motivation for the term polyphase decomposition, and the main reason for its use is its computational efficiency in multirate signal processing.
3.3.3
A set of M filters is said to form a uniform DFT filter bank if they are related according to:

H_k(z) ≜ H_0(zW^k) \qquad (3.17)
Figure 3.6: Implementation of the uniform DFT bank using polyphase decomposition.
The prototype filter H_0(z) can be expressed as:

H_0(z) = \sum_{l=0}^{M-1} z^{-l} E_l(z^M) \qquad (3.18)

based on its Type 1 polyphase representation. Hence, the k-th filter can be expressed as:

H_k(z) = H_0(zW^k) = \sum_{l=0}^{M-1} (z^{-1} W^{-k})^{l} E_l(z^M) \qquad (3.19)

since W^{kM} = 1, so that the k-th subband signal is:

X_k(z) = \sum_{l=0}^{M-1} W^{-kl} \left( z^{-l} E_l(z^M) X(z) \right). \qquad (3.20)

This shows that all M subband signals can be obtained by feeding the M polyphase branch outputs into a single DFT.
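A sketch of the resulting decimated polyphase/DFT analysis bank of Figure 3.7, using a pure-Python DFT; the prototype coefficients and the input are illustrative, and the prototype length 2M follows the text below.

```python
# Sketch of the decimated polyphase/DFT analysis bank of Figure 3.7: the M
# subband outputs at the reduced rate equal the DFT across the M polyphase
# branch outputs (prototype of length 2M, i.e. two taps per branch).
import cmath

M = 4
L1 = 2 * M                                  # prototype length L + 1 = 2M
h = [0.1 * (i + 1) for i in range(L1)]      # illustrative prototype coefficients
x = [0.5, -1.0, 0.3, 0.8, -0.2, 0.6, -0.7, 0.4, 0.9, -0.3, 0.1, 0.2]

def xa(n):                                  # zero-padded signal access
    return x[n] if 0 <= n < len(x) else 0.0

def subband_direct(k, m):
    # y_k(m) = sum_n h(n) W^{-kn} x(mM - n): DFT-modulated analysis filter
    W = cmath.exp(-2j * cmath.pi / M)
    return sum(h[n] * W ** (-k * n) * xa(m * M - n) for n in range(L1))

def subband_polyphase(k, m):
    # branch l: u_l(m) = sum_r h(rM + l) x((m - r)M - l); then DFT over l
    W = cmath.exp(-2j * cmath.pi / M)
    u = [sum(h[r * M + l] * xa((m - r) * M - l) for r in range(L1 // M))
         for l in range(M)]
    return sum(W ** (-k * l) * u[l] for l in range(M))

match = all(abs(subband_direct(k, m) - subband_polyphase(k, m)) < 1e-9
            for k in range(M) for m in range(4))
print(match)
```

The equality holds because W^{-k(rM+l)} = W^{-kl}: the modulation factor is the same for every tap of a branch, so it can be pulled out and realized once, by the DFT.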
In terms of the prototype impulse responses, the analysis and synthesis subband filters of this structure are:

H_i(z) = \sum_{l=0}^{L} h(l) \, Φ(i, l) \, z^{-l} \qquad (3.21)

Figure 3.7: Polyphase decomposition of the uniform DFT filter bank with decimation by a factor of M.

and

G_i(z) = \sum_{l=0}^{L} g(l) \, Φ(i, l) \, z^{-l}, \qquad i = 0, 1, \ldots, M-1 \qquad (3.22)

where h(n) and g(n) denote the finite impulse responses of the analysis and synthesis prototype filters, respectively, and Φ(i, l) is the modulation kernel introduced below. A common choice for the FIR filter degree is L = M - 1, but a higher degree can also be taken to increase the frequency selectivity of the subband filters.
In order to design a warped DFT analysis filter bank, which is the case in our implementation, we replace all the delay elements in Figure 3.7 with allpass filters. The allpass transformation allows the design of a non-uniform filter bank whose frequency bands approximate the Bark frequency scale with great accuracy. The aliasing distortions due to the subsampling operations can be reduced by lowering the subsampling rate R and by using subband filters of higher degree with narrow transition bands and high stopband attenuation, since a filter bank with perfect signal reconstruction is assumed (the AS filter bank is linear periodically time-variant).
3.4
\mathbf{w}' = \mathbf{F}^{-1} \operatorname{diag}\!\left(W_0,\; W_1 e^{-j n_0 \frac{2\pi}{M}},\; \ldots,\; W_{M-1} e^{-j (M-1) n_0 \frac{2\pi}{M}}\right) \qquad (3.23)
Figure 3.8: Polyphase network (PPN) realization of a DFT analysis-synthesis filter bank
for a prototype filter of length L + 1 = 2M.
where the W_n are the spectral-domain coefficients adapted at a reduced sampling rate and F denotes the M × M DFT matrix. As a result, the new coefficients w'_n of Fig. 3.9 are obtained by employing a DFT on the data vector W:

w'_n = \sum_{k=0}^{M-1} \left( W_k \, e^{-j n_0 \frac{2\pi}{M} k} \right) e^{j \frac{2\pi}{M} k n} = \sum_{k=0}^{M-1} W_k \, e^{j \frac{2\pi}{M} k (n - n_0)} \qquad (3.24)
Since the samples x(k) become x(kM) after downsampling by a factor of M, these downsampled versions pass through the subband filters E_l(z), each of which, for a prototype of length L + 1 = 2M, has two coefficients. More specifically, the low-delay filter bank of Figure 3.10 will be considered [17].
This filter bank has M subbands, and the impulse response h_i(n) of the i-th subband filter is obtained by a modulation of the prototype lowpass filter with real impulse response h(n) of length L + 1 ≥ M according to:

h_i(n) = \begin{cases} h(n)\,Φ(i, n), & i = 0, 1, \ldots, M-1;\; n = 0, 1, \ldots, L \\ 0, & \text{else.} \end{cases} \qquad (3.25)

The choice of the general modulation sequence and the prototype filter affects the spectral selectivity and time-frequency resolution of the filter bank. Φ(i, n) can also be regarded as the transformation kernel of the filter bank and is periodic, so that:

Φ(i, n + mM) = Φ(i, n)\,λ(m), \qquad m \in \mathbb{Z} \qquad (3.26)
Figure 3.10: Filter-bank summation method for time-varying spectral gain factors W_i(k') adapted at a reduced sampling rate.
where the sequence λ(m) depends on the chosen transform; for many transforms (including the DFT), it is given by λ(m) = 1 for all m.
As shown in Figure 3.10, the input-output relation of the filter bank is given in the z-domain as follows:

Y(z) = \sum_{i=0}^{M-1} W_i \left( \sum_{n=0}^{L} h_i(n) z^{-n} \right) X(z) \qquad (3.27)
Hence, the overall transfer function is obtained by inserting Equation (3.25) for h_i(n) into Equation (3.27). This yields:

F_0(z) = \sum_{n=0}^{L} h(n) \left( \sum_{i=0}^{M-1} W_i\,Φ(i, n) \right) z^{-n} \qquad (3.28)

where

w'_n = \sum_{i=0}^{M-1} W_i\,Φ(i, n) \qquad (3.29)
so that:

F_0(z) = \sum_{n=0}^{L} h_s(n) z^{-n} \qquad (3.30)

whose impulse response h_s(n) is the product of the impulse response h(n) of the prototype filter and the weighting factors w_n adapted in the spectral domain:

h_s(n, k') = h(n) \, w_n(k'), \qquad n = 0, 1, \ldots, L \qquad (3.31)
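A sketch of how the weighting factors and the post-filter taps of Equation (3.31) come about for the DFT kernel; the sizes, the prototype and the gain values are illustrative assumptions, chosen so that the linear-phase symmetry of Equation (2.6) can be checked.

```python
# Sketch of Equations (3.29)/(3.31) for the DFT kernel: time-domain weighting
# factors w_n obtained from real spectral gains W_k with phase term n0 = L/2,
# which yields the linear-phase symmetry w_n = w_{L-n}.
# (M, L, prototype and gain values are illustrative assumptions.)
import cmath

M, L = 8, 6
n0 = L // 2
gains = [1.0, 0.8, 0.5, 0.3, 0.2, 0.3, 0.5, 0.8]   # real, with W_k = W_{M-k}

def w(n):
    v = sum(gains[k] * cmath.exp(2j * cmath.pi * k * (n - n0) / M)
            for k in range(M))
    return v.real                                   # imaginary part cancels

h = [0.05, 0.2, 0.5, 0.7, 0.5, 0.2, 0.05]          # symmetric prototype h(n), n = 0..L
hs = [h[n] * w(n) for n in range(L + 1)]           # weighted post-filter taps

symmetric = all(abs(hs[n] - hs[L - n]) < 1e-9 for n in range(L + 1))
print(symmetric)
```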
It has already been mentioned that the calculation of the spectral gain factors W(k') can be done by a common spectral speech estimator, and the GDFT of size M provides (L+1) weighting factors w_n(k') with non-zero phase. The name Filter Bank Equalizer points out that this kind of time-domain filtering has been developed from the low-delay filter bank, which can be regarded as a filter bank used as an equalizer. Furthermore, to obtain the so-called Non-uniform Filter Bank Equalizer, one employs digital frequency warping by means of an allpass transformation, where the delay elements of the discrete subband filters are replaced by allpass filters:
z^{-1} \rightarrow H_A(z) \qquad (3.32)

H_A(e^{jω}) = \frac{e^{-jω} - α}{1 - α e^{-jω}} = e^{-j φ_α(ω)} \qquad (3.33)

φ_α(ω) = ω + 2 \arctan\!\left( \frac{α \sin ω}{1 - α \cos ω} \right) \qquad (3.34)
The frequency response of the allpass-transformed filter bank equalizer (APT FBE) is derived from Equation (3.30) by using Equations (3.32) and (3.33):

F(e^{jω}) = \sum_{n=0}^{L} h(n) \, w_n \, e^{-j n φ_α(ω)} \qquad (3.35)
Thus, the allpass transformation causes a frequency mapping φ_α(ω). This frequency mapping (warping) is solely determined by the allpass pole α according to Equation (3.34). The uniform FBE with transfer function F_0(z) is included as the special case α = 0, since then H_A(z) = z^{-1}.
On the one hand, the FBE needs more multiplications than the corresponding AS FB, due to the time-domain filtering at the full sampling rate. On the other hand, the computation of the gain factors in the spectral domain is decoupled from the actual filtering in the time domain. Therefore, no aliasing effects occur, and since no synthesis filter bank is needed, the signal delay is reduced [18].
3.5
Signal reconstruction, which is the main target of many signal processing schemes, is generally accomplished by the synthesis filter bank, which consists of the upsampling operations and interpolating bandpass filters. A filter bank achieves perfect signal reconstruction with a delay of d_0 samples if:

v(n) = y(n - d_0) \qquad (3.36)

for spectral gains W_i = 1 for all i.
The objective of the prototype lowpass filter design is to achieve perfect reconstruction, and the FBE generally meets this condition as long as the following two requirements are fulfilled:
1) the general modulation sequence has the property:

\sum_{i=0}^{M-1} Φ(i, n) = \begin{cases} C, & C \neq 0;\; n = n_0;\; n, n_0 \in \{0, 1, \ldots, M-1\} \\ 0, & n \neq n_0 \end{cases} \qquad (3.37)

which is a condition met by the transformation kernel of the GDFT.
2) a generalized M-th band filter with impulse response

h(n) = \begin{cases} \frac{1}{C\,λ(m_c)}, & n = n_0 + m_c M,\; λ(m_c) \neq 0,\; m_c \in \mathbb{Z} \\ 0, & n = n_0 + mM,\; m \in \mathbb{Z} \setminus \{m_c\} \\ \text{arbitrary}, & \text{else} \end{cases} \qquad (3.38)

is used as the prototype lowpass filter [17]. A suitable M-th band filter is given by:

h(n) = \frac{1}{C\,λ(m_c)} \cdot \frac{\sin\!\left( \frac{\pi}{M}(n - d_0) \right)}{\frac{\pi}{M}(n - d_0)} \cdot \mathrm{win}_L(n) \qquad (3.39)
A rectangular window achieves a least-squares approximation error, but other window sequences such as the Kaiser window, the Hann window (β = 0.5) and the Hamming window (β = 0.54) are often preferred to influence properties of the filter such as the transition bandwidth or the sidelobe attenuation.
3.5.1
The polyphase network (PPN) implementation of the FBE described above is shown in
Figure 3.11.
The transposed direct form of the filter is derived from its direct-form representation by transposition of the signal flow graph, following these rules: 1) adders and branch nodes, as well as the system input and output, are interchanged; 2) all signal directions are reversed. Delay elements can be inserted in each branch of the time-domain filter to account for the execution time needed to calculate the time-domain weighting factors w_l(n'). These weighting factors are calculated by a separate network similar to that of the figure, with the difference that the downsampling is performed directly after the delay elements. Hence, the PPN realization of the transposed direct form requires a slightly higher algorithmic complexity
Figure 3.11: Polyphase network implementation of the FBE for the direct-form filter.
compared to that of the direct form realization. The polyphase network decomposition can
be performed for both FIR filters and IIR filters [17].
We are interested in the utilization of a suitable prototype filter, since the polyphase representation allows us to improve the spectral selectivity of the subband filters in order to reduce the cross-talk between adjacent frequency bins. We consider the PPN implementation of the filter bank used in our system as an alternative to the FFT, in order to obtain a better way of estimating the energy in each subband and thus of calculating the gains.
In terms of computational complexity, the polyphase representation of the filter bank requires M log M operations (for the FFT computation) plus 2M (for the computations in the M subband filters), in contrast with M log M for the FFT alone, where M is the number of subbands (the FFT length). For an FFT length equal to 64 (M = 64) and a prototype filter of length L + 1 = 2M = 128, each subband filter performs two multiplications and one addition (see section ).
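The same accounting in numbers (a loose operation count, not a cycle-accurate one):

```python
# The complexity comparison above, in numbers, per block of M output samples
# (loose operation count; illustrative accounting only).
import math

M = 64
fft_only = M * math.log2(M)          # plain FFT analysis
ppn = M * math.log2(M) + 2 * M       # FFT plus two taps per polyphase branch
print(fft_only, ppn)
```

For M = 64 this gives 384 versus 512 operations, i.e. the improved subband selectivity of the PPN costs roughly a third more work per block.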
3.5.2
[Table: maximum combined approximation error for prototype filters of length 128 and delay 32 with different transition-band edges. The reported combined errors are 0.2427, 0.1118 and 0.0067 for edges on the order of π/32, and 0.3219, 0.2132 and 0.0024 for edges on the order of π/64.]
[Table: comparison of the two filter-design schemes for compression factors M_c = 2, 3, 4, with a prototype of length 128 and delay 64; each scheme combines a first filter of length 120 and a second filter of length 8 with different transition edges. The resulting maximum combined errors are 0.2416 (first scheme) and 0.1363, 0.0920 and 0.0262 (second scheme, for M_c = 2, 3 and 4, respectively).]
Table 3.3: The maximum combined error for different compression factors M_c.
The proposed (second) scheme gives better results in terms of the estimated combined approximation error for every value of the compression factor M_c. More specifically, as the compression factor M_c increases, the combined error decreases, and for M_c = 4 the proposed scheme improves the approximation error by an order of magnitude, and thus the error energy by two orders of magnitude, compared to the original (first) scheme.
3.6 Conclusions
Chapter 4
4.1
We consider the filter with finite impulse response h_s(n), which has transfer function:

H_s(z) = \sum_{n=0}^{L} h_s(n) z^{-n} \qquad (4.1)

This FIR filter is initially approximated by a uniform auto-regressive (AR) filter with infinite impulse response h_{AR}(n) and transfer function:

H_{AR}(z) = \frac{α_0}{1 - \sum_{n=1}^{P} α_n z^{-n}} \qquad (4.2)

The cascade of the original FIR filter with transfer function H_s(z) and the inverse of the AR filter is sketched in Figure 4.1.
The input x(k) to the original filter is a white noise sequence x(k) = n_w(k) with variance E{x²(k)} = σ_w². As a result, the output y(k) of the FIR filter is colored noise, and the output of the inverse AR filter of order P is:

e(k) = \frac{1}{α_0} \left( y(k) - \sum_{n=1}^{P} α_n y(k-n) \right). \qquad (4.3)
We know from theory that for a linear causal system H(z) excited by a white noise sequence w(k), the output x(n) is determined according to:

x(n) = \sum_{k=0}^{\infty} h_k \, w(n-k). \qquad (4.4)
If x(n) is a wide-sense stationary (WSS) process, its power spectral density, which is the Fourier transform of its autocorrelation function φ_{xx}, is given by:

Φ_{xx}(e^{jω}) = \sum_{m=-\infty}^{\infty} φ_{xx}(m) e^{-jωm} \qquad (4.5)

and hence:

Φ_{xx}(e^{jω}) = |H(e^{jω})|^2 \, σ_w^2. \qquad (4.6)
In order to obtain the input white noise sequence w(n), the inverse procedure, called noise whitening, is applied: the signal x(n) is now the input to the system with transfer function 1/H(z), and the produced output is w(n). As a consequence, in order to achieve a good approximation of the original FIR filter H_s(z) of Figure 4.1 by an allpole filter H_{AR}(z), the power of the error signal e(k) has to be minimized, since:

Φ_e(e^{jω}) = |H(e^{jω})[1 - A(e^{jω})]|^2 \, Φ_w(e^{jω}) \qquad (4.7)

where the power spectral density of the white noise is Φ_w(e^{jω}) = 1 and A(z) = \sum_{n=1}^{P} α_n z^{-n}. The power of the error signal is:

E\{e_k^2\} = \frac{1}{2π} \int_{-π}^{π} Φ_e(e^{jω}) \, dω = \frac{1}{2π} \int_{-π}^{π} |H(e^{jω})|^2 |1 - A(e^{jω})|^2 \, dω \qquad (4.8)
The P unknown coefficients α_n of the filter A(z) can be computed by minimizing this power with respect to the filter coefficients:

\frac{\partial E\{e_k^2\}}{\partial α_λ} = 0, \qquad λ = 1, 2, \ldots, P \qquad (4.9)

which leads to:

φ_{yy}(λ) = \sum_{n=1}^{P} α_n \, φ_{yy}(λ - n), \qquad λ = 1, 2, \ldots, P. \qquad (4.10)
Since the power spectral density of the output signal y(k) is Φ_{yy}(e^{jω}) = Φ_{hh}(e^{jω}) σ_w^2, in the time domain this leads to:

φ_{yy}(λ) = φ_{hh}(λ) \, σ_w^2 \qquad (4.11)
For the AR filter coefficients given by Equation (4.10), the power (variance) of the output error signal amounts to:

E\{e_k^2\} = σ_e^2 = \frac{1}{α_0^2} \left[ φ_{yy}(0) - \sum_{n=1}^{P} α_n φ_{yy}(n) \right] \qquad (4.12)

Setting σ_w^2 = 1 and requiring a unit-variance error signal, σ_e^2 = 1, gives:

α_0^2 = φ_{hh}(0) - \sum_{n=1}^{P} α_n φ_{hh}(n) \qquad (4.13)

or equivalently:

α_0 = \sqrt{ φ_{hh}(0) - \sum_{n=1}^{P} α_n φ_{hh}(n) }. \qquad (4.14)
From Equation (4.10), written for the P values λ = 1, 2, \ldots, P, we obtain:

φ_{hh}(λ) - \sum_{n=1}^{P} α_n φ_{hh}(λ - n) = 0, \qquad λ = 1, \ldots, P \qquad (4.15)

and, from Equations (4.12) and (4.14):

α_0^2 = φ_{hh}(0) - \sum_{n=1}^{P} α_n φ_{hh}(n). \qquad (4.16)
Combining these, the augmented normal equations that determine the (P+1) coefficients are finally given by:

\begin{bmatrix}
φ_{hh}(0) & φ_{hh}(1) & \cdots & φ_{hh}(P) \\
φ_{hh}(1) & φ_{hh}(0) & \cdots & φ_{hh}(P-1) \\
\vdots & \vdots & \ddots & \vdots \\
φ_{hh}(P) & φ_{hh}(P-1) & \cdots & φ_{hh}(0)
\end{bmatrix}
\begin{bmatrix} 1 \\ -α_1 \\ \vdots \\ -α_P \end{bmatrix}
=
\begin{bmatrix} α_0^2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (4.17)
The system of Equations (4.17) contains a correlation matrix with Toeplitz structure, which can hence be efficiently inverted using the Levinson-Durbin algorithm. In addition, for a Toeplitz correlation matrix that is positive definite, the AR filter is always stable, which is the case here.
There are several reasons why the AR filter approximation has been chosen instead of the more general AR Moving Average (ARMA) approximation. First, the AR filter has the minimum-phase property, since all its poles lie within the unit circle; thus, it is always stable and a low signal delay can be achieved. Moreover, it leads to a good
Figure 4.2: Basic concept of approximation: the two signals y and e must have the same
statistical characteristics.
approximation of the magnitude response of the original filter, but not of the phase response. Nevertheless, this is tolerable for the hearing-aid application, since the human ear is relatively insensitive to minor phase modifications [14].
In a more compact form, the objective of the described approximation is to determine the filter coefficients α_n such that if the input signal x(k) is a white noise sequence, then the output signal e(k) is also a white noise sequence with the same signal characteristics. In other words, as depicted in Figure 4.2, we desire that the two signals y(k) and ŷ(k) have the same statistical characteristics rather than being identical.
For the calculation of the AR filter coefficients, the system that we have solved results from the idea of minimizing the energy of the output error signal, i.e., minimizing the quantity:

\int_{-π}^{π} |H(e^{jω})|^2 \, |1 - A(e^{jω})|^2 \, dω \qquad (4.18)

with A(e^{jω}) = \mathbf{E}^T \mathbf{A}, where \mathbf{A} = [α_1 \; α_2 \; \ldots \; α_P]^T and \mathbf{E} = [e^{-jω} \; e^{-j2ω} \; \ldots \; e^{-jPω}]^T. \qquad (4.19)

Taking the derivative with respect to \mathbf{A}, we finally obtain:

\left( \int_{-π}^{π} |H(e^{jω})|^2 \, \mathrm{Re}\{\mathbf{E}\mathbf{E}^H\} \, dω \right) \mathbf{A} = \int_{-π}^{π} |H(e^{jω})|^2 \, \mathrm{Re}\{\mathbf{E}\} \, dω \qquad (4.20)

We define Φ = \int_{-π}^{π} |H(e^{jω})|^2 \mathrm{Re}\{\mathbf{E}\mathbf{E}^H\} dω and \mathbf{b} = \int_{-π}^{π} |H(e^{jω})|^2 \mathrm{Re}\{\mathbf{E}\} dω, and the solution of the system gives the vector \mathbf{A} of the unknown AR filter coefficients:

\mathbf{A} = Φ^{-1} \mathbf{b} \qquad (4.21)

The scaling factor α_0, which plays the role of the minimization constant, is then:

α_0 = \sqrt{ \min_{\mathbf{A}} \frac{1}{2π} \int_{-π}^{π} |H(e^{jω})|^2 |1 - A(e^{jω})|^2 \, dω } \qquad (4.22)

so that the resulting AR filter is:

H_{AR}(z) = \frac{α_0}{1 - α_1 z^{-1} - \ldots - α_P z^{-P}}. \qquad (4.23)
4.2
4.2.1
It has already been mentioned that the allpass transformation of a discrete filter is achieved by substituting all delay elements by allpass filters:

z^{-1} \rightarrow H_A(z)

where H_A(z) is the transfer function of a (causal) real allpass filter of first order:

H_A(z) = \frac{z^{-1} - α}{1 - α z^{-1}}, \qquad |α| < 1; \; α \in \mathbb{R}. \qquad (4.24)

Its frequency response has unit magnitude and can be written as:

H_A(e^{jω}) = \frac{e^{-jω} - α}{1 - α e^{-jω}} = e^{-j φ_α(ω)} \qquad (4.25)

with the phase function:

φ_α(ω) = ω + 2 \arctan\!\left( \frac{α \sin ω}{1 - α \cos ω} \right). \qquad (4.26)
The frequency warping or allpass transformation is marked by the tilde notation. This warped post-filter shall be approximated by a warped AR filter with transfer function:

\tilde{H}_{AR}(z) = \frac{α_0}{1 - \sum_{n=1}^{P} \tilde{α}_n H_A^n(z)} \qquad (4.28)

whose frequency response is:

\tilde{H}_{AR}(e^{jω}) = \frac{α_0}{1 - \sum_{n=1}^{P} \tilde{α}_n e^{-j φ_α(ω) n}} \qquad (4.29)
In this case, we get an approximation of a warped FIR filter by a warped AR filter similar to that shown in Figure 4.1, and the power of the corresponding error signal that has to be minimized leads to:

\int |H(e^{j φ_α(ω)})|^2 \, |1 - \tilde{\mathbf{E}}^T \tilde{\mathbf{A}}|^2 \, dφ_α(ω) \rightarrow \min \qquad (4.30)

where \tilde{\mathbf{A}} = [\tilde{α}_1 \; \tilde{α}_2 \; \ldots \; \tilde{α}_P]^T and \tilde{\mathbf{E}} = [e^{-jφ_α(ω)} \; e^{-j2φ_α(ω)} \; \ldots \; e^{-jPφ_α(ω)}]^T. Taking the derivative with respect to \tilde{\mathbf{A}} yields the system:

\left( \int |H(e^{jφ_α(ω)})|^2 \, \mathrm{Re}\{\tilde{\mathbf{E}}\tilde{\mathbf{E}}^H\} \, dφ_α(ω) \right) \tilde{\mathbf{A}} = \int |H(e^{jφ_α(ω)})|^2 \, \mathrm{Re}\{\tilde{\mathbf{E}}\} \, dφ_α(ω) \qquad (4.31)

or

\tilde{\mathbf{A}} = \tilde{Φ}^{-1} \tilde{\mathbf{b}} \qquad (4.32)
with

\tilde{Φ} = \int |H(e^{jφ_α(ω)})|^2 \, \mathrm{Re}\{\tilde{\mathbf{E}}\tilde{\mathbf{E}}^H\} \, dφ_α(ω) \qquad (4.33)

and

\tilde{\mathbf{b}} = \int |H(e^{jφ_α(ω)})|^2 \, \mathrm{Re}\{\tilde{\mathbf{E}}\} \, dφ_α(ω). \qquad (4.34)
The value of the scaling factor α_0 is determined in a similar way as in the case of the uniform approximation. It should be noted that the allpass transformation leads to delay-free feedback loops. Moreover, the coefficients of the frequency-warped post-filter h_s(n) = h(n)w_n are obtained from an allpass-transformed analysis filter bank, which has a higher computational complexity than a uniform filter bank. In the case of a multichannel-controlled post-filter, the adaptation of the filter coefficients requires more than one frequency-warped analysis filter bank, which leads to a high computational complexity.
4.2.2
To address the above-mentioned problem, a different filter approximation is proposed: a uniform time-domain post-filter approximated by an allpass-transformed AR filter. We calculate the filter coefficients h_s(n) by means of one or more DFT analysis filter banks and thus obtain a uniform post-filter. This uniform time-domain filter is then approximated by a warped AR filter so as to exploit the benefits of a Bark-scaled frequency resolution. The resulting approximation problem is depicted in Figure 4.3.
For the following derivation, it is useful to express the convolution of a sequence y(k) with a chain of λ allpass filters as follows:

D_y^{[λ]}(k) ≜ (h_A^{[λ]} * y)(k) \qquad (4.35)

h_A^{[λ]}(k) ≜ (\underbrace{h_A * h_A * \ldots * h_A}_{λ \text{ times}})(k) \qquad (4.36)

where * denotes convolution and h_A(k) is the impulse response of H_A(z). In particular, h_A^{[0]}(k) = δ(k), and the unit delay z^{-1} is included as the special case of the warping parameter α = 0.
By applying definition (4.35), the output of the inverse warped AR filter with transfer function \tilde{H}_{AR}(z)^{-1} is expressed in the time domain as follows:

e(k) = \frac{1}{α_0} \left( y(k) - \sum_{n=1}^{P} \tilde{α}_n D_y^{[n]}(k) \right) \qquad (4.37)
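The operator D_y^[n] can be sketched directly from the allpass difference equation; the signal and coefficient values below are illustrative, and the α = 0 case reduces to a plain delay, as noted above.

```python
# Sketch of the allpass-chain operator D_y^[n] of Equations (4.35)-(4.37):
# y(k) passed through n cascaded first-order allpass sections
# H_A(z) = (z^-1 - a)/(1 - a z^-1). For a = 0 this reduces to a pure
# n-sample delay (the unwarped special case).

def allpass(sig, a):
    out = []
    prev_in = prev_out = 0.0
    for v in sig:
        o = -a * v + prev_in + a * prev_out     # o[k] = -a v[k] + v[k-1] + a o[k-1]
        out.append(o)
        prev_in, prev_out = v, o
    return out

def D(sig, n, a):
    for _ in range(n):
        sig = allpass(sig, a)
    return sig

y = [1.0, -0.5, 0.25, 0.8, -0.3, 0.1]
# Warped inverse-AR output, Eq. (4.37), with illustrative coefficients:
a, a0, alphas = 0.5, 1.0, [0.4, -0.1]
e = [(yk - sum(al * D(y, n + 1, a)[k] for n, al in enumerate(alphas))) / a0
     for k, yk in enumerate(y)]

delayed = D(y, 2, 0.0)                          # a = 0 -> plain 2-sample delay
print(delayed[:4])
```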
Similarly to Section 4.1, the minimization of the power of the error signal e(k),

\frac{\partial E\{e_k^2\}}{\partial \tilde{α}_λ} = 0, \qquad λ = 1, 2, \ldots, P \qquad (4.38)

leads to:

E\{y(k) D_y^{[λ]}(k)\} = \sum_{n=1}^{P} \tilde{α}_n \, E\{D_y^{[n]}(k) D_y^{[λ]}(k)\}, \qquad λ = 1, 2, \ldots, P \qquad (4.39)

Defining the warped autocorrelation terms

\tilde{φ}_{yy}(n, λ) ≜ E\{D_y^{[n]}(k) D_y^{[λ]}(k)\} \qquad (4.40)

this can be written as:

\tilde{φ}_{yy}(0, λ) = \sum_{n=1}^{P} \tilde{α}_n \, \tilde{φ}_{yy}(n, λ). \qquad (4.41)
In analogy, the warped autocorrelation of the post-filter impulse response is:

\tilde{φ}_{h_s h_s}(n, λ) = \sum_{l=0}^{\infty} D_{h_s}^{[n]}(l) \, D_{h_s}^{[λ]}(l) \qquad (4.42)

which depends only on the difference λ - n:

\tilde{φ}_{h_s h_s}(n, λ) = \sum_{l=0}^{\infty} D_{h_s}^{[λ-n]}(l) \, D_{h_s}^{[0]}(l) \qquad (4.43)
= \sum_{l=0}^{L} h_s(l) \, D_{h_s}^{[λ-n]}(l) \qquad (4.44)

or

\tilde{φ}_{h_s h_s}(0, λ - n) = \tilde{φ}_{h_s h_s}(λ - n). \qquad (4.45)

In particular,

\tilde{φ}_{h_s h_s}(0) = \sum_{l=0}^{L} h_s(l) \left[ h_A^{[0]} * h_s \right](l) = \sum_{l=0}^{L} h_s^2(l) \qquad (4.46)

\tilde{φ}_{h_s h_s}(1) = \sum_{l=0}^{L} h_s(l) \left[ h_A^{[1]} * h_s \right](l) \qquad (4.47)

and in the same way, we obtain all (P+1) warped impulse autocorrelation coefficients.
Figure 4.4: Network for calculation of the (P+1) warped impulse autocorrelation coefficients \tilde{φ}_{h_s h_s}(λ).
Taking into account Equations (4.39) and (4.42), we obtain:

\tilde{φ}_{h_s h_s}(λ) = \sum_{n=1}^{P} \tilde{α}_n \, \tilde{φ}_{h_s h_s}(λ - n), \qquad λ = 1, 2, \ldots, P. \qquad (4.48)

The corresponding system of equations is given by the augmented normal Equations (4.17) with φ_{hh}(λ) replaced by \tilde{φ}_{h_s h_s}(λ). Therefore, the AR filter coefficients are determined as in the uniform case.
To obtain an overall view of the present approximation, we must take into consideration the frequency warping caused by the allpass transformation of the AR filter, since:

\tilde{ω} = φ_α(ω) \qquad (4.50)

The power of the error signal ẽ(k) over the warped frequency axis \tilde{ω} amounts to:

σ_{\tilde{e}}^2 = \frac{1}{2π} \int_0^{2π} Φ_{\tilde{e}}(\tilde{ω}) \, d\tilde{ω} \qquad (4.51)
The partial derivative of \tilde{ω} with respect to ω gives:

\frac{d\tilde{ω}}{dω} = \frac{1 - α^2}{1 - 2α \cos ω + α^2} \qquad (4.52)

as shown in Appendix C. This finally yields:

σ_{\tilde{e}}^2 = \frac{1}{2π} \int_0^{2π} Φ_{\tilde{e}}(φ_α(ω)) \, \frac{1 - α^2}{1 - 2α \cos ω + α^2} \, dω \qquad (4.53)

noting that the upper and lower limits become \tilde{ω}_0 = φ_α(0) = 0 and \tilde{ω}_{2π} = φ_α(2π) = 2π,
respectively. Consequently, the output signal e(k) is weighted by a Laguerre filter [21] as depicted in Fig. 4.5, since the transfer function of a Laguerre filter is

H_L(z) = \frac{\sqrt{1 - a^2}}{1 - a z^{-1}}.    (4.54)

The error signal e(k) is now weighted and thus does not have an approximately flat spectrum, as in the case of a uniform AR filter approximation. Hence, in order to obtain an error signal with an approximately flat spectrum, an inverse Laguerre filter must be applied to the error signal e(k). The overall filter is eventually a cascade of the warped AR filter and a Laguerre filter:

\tilde{H}_{AR_L}(z) = \frac{\alpha_0}{1 - \sum_{n=1}^{P} \alpha_n H_A(z)^n} \cdot \frac{\sqrt{1 - a^2}}{1 - a z^{-1}}    (4.55)
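The warping function and the derivative used above can be checked numerically. The following sketch (Python/NumPy, our own illustration) compares the closed-form expression of Eq. (4.52) against a finite-difference derivative of the allpass phase, assuming H_A(z) = (z^{-1} - a)/(1 - a z^{-1}):

```python
import numpy as np

lam = 0.5                                   # warping coefficient a
omega = np.linspace(0.0, np.pi, 4001)

# warped frequency: Omega~ = phi(Omega) = -arg H_A(e^{j Omega})
warped = omega + 2.0*np.arctan(lam*np.sin(omega)/(1.0 - lam*np.cos(omega)))

# closed-form derivative d(Omega~)/d(Omega), cf. Eq. (4.52)
deriv = (1.0 - lam**2)/(1.0 - 2.0*lam*np.cos(omega) + lam**2)

numeric = np.gradient(warped, omega)        # finite-difference check
assert np.max(np.abs(numeric - deriv)) < 1e-3
assert np.isclose(warped[0], 0.0) and np.isclose(warped[-1], np.pi)
```

The check also confirms the integration limits of Eq. (4.53): the warping maps 0 to 0 and \pi to \pi, so the frequency axis is only redistributed, not stretched overall.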
4.3 Simulation Results
To ease the treatment of the warped AR filter, the approximation of the uniform post-filter by a uniform AR filter was first examined. The obtained results are shown in Fig. 4.6. The approximation, as explained in the theoretical analysis, is done using the Mean Square Error (MSE) criterion, since our aim is to minimize the output error signal e(k). We may observe in the figure that the approximating AR filter tries to catch the outliers while slightly missing the fluctuations.

Furthermore, we considered the approximation of a warped and a uniform time-domain post-filter by an allpass transformed AR low delay post-filter, as shown in Fig. 4.7 and Fig. 4.8 respectively. From the obtained figures, we observe that a good approximation is achieved in almost all cases, especially the one we are interested in, namely the approximation of a warped post-filter by a warped AR filter of lower degree P (P = 20).
4.4

An odd-tapered filter (L = 2N + 1) can be expressed through a symmetric part \beta_i and an antisymmetric part \gamma_i of its coefficients:

[h_0,\ h_1,\ \ldots,\ h_{N-1},\ h_N,\ h_{N+1},\ \ldots,\ h_{2N}] = [\beta_N + \gamma_N,\ \beta_{N-1} + \gamma_{N-1},\ \ldots,\ \beta_1 + \gamma_1,\ \beta_0,\ \beta_1 - \gamma_1,\ \ldots,\ \beta_N - \gamma_N]    (4.56)
This operation is similar to that of separating a function into a sum of an even and an odd part. The frequency response of the filter then becomes

H(e^{j\Omega}) = e^{-jN\Omega} \big\{ (\beta_0 + 2\beta_1\cos\Omega + \ldots + 2\beta_N\cos N\Omega) + j\,(2\gamma_1\sin\Omega + \ldots + 2\gamma_N\sin N\Omega) \big\}    (4.57)

or

H(e^{j\Omega}) = e^{-jN\Omega} R(e^{j\Omega})    (4.58)

where R(e^{j\Omega}) approximates the ideal frequency response D(e^{j\Omega}). From the two above equations and by expressing R(e^{j\Omega}) = R_r(e^{j\Omega}) + jR_i(e^{j\Omega}), it yields:
R_r(e^{j\Omega}) = \beta_0 + 2\beta_1\cos\Omega + \ldots + 2\beta_N\cos N\Omega
R_i(e^{j\Omega}) = 2\gamma_1\sin\Omega + \ldots + 2\gamma_N\sin N\Omega

and the phase response is \varphi(\Omega) = -N\Omega = -\frac{L-1}{2}\Omega.
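This decomposition is easy to verify numerically. A short sketch (Python/NumPy, our own check with arbitrary \beta and \gamma values) builds the odd-tapered coefficients of Eq. (4.56) and confirms Eqs. (4.57)-(4.58):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                                   # filter length L = 2N + 1
beta = rng.standard_normal(N + 1)       # symmetric part beta_0 .. beta_N
gamma = rng.standard_normal(N + 1)      # antisymmetric part, gamma_0 unused
gamma[0] = 0.0

# h = [beta_N+gamma_N, ..., beta_1+gamma_1, beta_0, beta_1-gamma_1, ..., beta_N-gamma_N]
h = np.concatenate([(beta + gamma)[:0:-1], [beta[0]], (beta - gamma)[1:]])

omega = np.linspace(0.0, np.pi, 257)
H = (h[None, :]*np.exp(-1j*np.outer(omega, np.arange(2*N + 1)))).sum(axis=1)

# reference: H(e^{j Omega}) = e^{-j N Omega} (R_r + j R_i)
Rr = beta[0] + 2.0*sum(beta[i]*np.cos(i*omega) for i in range(1, N + 1))
Ri = 2.0*sum(gamma[i]*np.sin(i*omega) for i in range(1, N + 1))
assert np.allclose(H, np.exp(-1j*N*omega)*(Rr + 1j*Ri))
```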
On the other hand, an even-tapered filter (L = 2N) might be expressed as:

[h_0,\ h_1,\ \ldots,\ h_{N-1},\ h_N,\ \ldots,\ h_{2N-2},\ h_{2N-1}] = [\beta_N + \gamma_N,\ \beta_{N-1} + \gamma_{N-1},\ \ldots,\ \beta_1 + \gamma_1,\ \beta_1 - \gamma_1,\ \ldots,\ \beta_{N-1} - \gamma_{N-1},\ \beta_N - \gamma_N]    (4.59)

so that

H(e^{j\Omega}) = e^{-j\frac{(2N-1)\Omega}{2}} \Big\{ \big(2\beta_1\cos\tfrac{\Omega}{2} + \ldots + 2\beta_N\cos\tfrac{(2N-1)\Omega}{2}\big) + j\big(2\gamma_1\sin\tfrac{\Omega}{2} + \ldots + 2\gamma_N\sin\tfrac{(2N-1)\Omega}{2}\big) \Big\}    (4.60)
In this case, where L = 2N, R(e^{j\Omega}) and the phase response of the filter become:

R_r(e^{j\Omega}) = 2\beta_1\cos\tfrac{\Omega}{2} + \ldots + 2\beta_N\cos\tfrac{(2N-1)\Omega}{2}
R_i(e^{j\Omega}) = 2\gamma_1\sin\tfrac{\Omega}{2} + \ldots + 2\gamma_N\sin\tfrac{(2N-1)\Omega}{2}

and the phase response is \varphi(\Omega) = -(N - 0.5)\Omega = -\frac{L-1}{2}\Omega.

As a result, such an FIR filter has linear phase, which leads to a delayed output of (L - 1)/2 samples:

H(e^{j\Omega}) = e^{-j\frac{L-1}{2}\Omega} R(e^{j\Omega})    (4.61)
Since the filter length L is usually large, the signal delay becomes significant, which is a drawback for real-time applications. An idea to reduce the delay is to approximate e^{jM\Omega} D(e^{j\Omega}) with R(e^{j\Omega}) instead of approximating the ideal frequency response D(e^{j\Omega}) with R(e^{j\Omega}). This results in:

H(e^{j\Omega}) \approx e^{-j\left(\frac{L-1}{2} - M\right)\Omega} D(e^{j\Omega})    (4.62)
Delay    Error
32       3.6531e-03
64       1.7574e-05
32       3.9822e-04

Table 4.1: The maximum approximation error for different filter lengths in the symmetrical and non-symmetrical case.
with the target function

e^{jM\Omega} D(e^{j\Omega}) = \cos(M\Omega) D(e^{j\Omega}) + j\sin(M\Omega) D(e^{j\Omega}).    (4.63)

In other words:

e^{-j\frac{L-1}{2}\Omega} R(e^{j\Omega}) \approx e^{-j\left(\frac{L-1}{2} - M\right)\Omega} D(e^{j\Omega})    (4.64)
The general min-max problem then reads

\min_{\beta_i, \gamma_i} \max_{\Omega} W(\Omega) \left| H(e^{j\Omega}) - e^{jM\Omega} D(e^{j\Omega}) \right|    (4.65)
where
- \beta_i and \gamma_i are the filter parameters which must be computed and summed so as to obtain the real coefficients h_i,
- W(\Omega) is a weight function that sets the maximum allowed error in the passband and the stopband (it is not defined in the transition band, since we don't care about its value in this region),
- H(e^{j\Omega}) is the approximating function, and
- e^{jM\Omega} D(e^{j\Omega}) = \cos(M\Omega) D(e^{j\Omega}) + j\sin(M\Omega) D(e^{j\Omega}) is the function that must be approximated so as to achieve a reduced signal delay of M samples.
Summarizing, the general min-max problem helps us find the parameters \beta_i and \gamma_i that minimize the distance between the two functions D(e^{j\Omega}) and R(e^{j\Omega}) in the general case. We know that there is no algorithm that efficiently solves the general min-max problem; hence, we try to solve the approximate min-max problem described above by separating the general problem into two smaller ones that can be solved using the Remez algorithm (for the symmetric case).
In order to obtain numerical results for a filter with predefined specifications (passband, stopband and weighting function W(\Omega)), we used the Matlab command fminimax to solve the complex min-max problem, and the approximate min-max problem by solving the two mentioned separate problems and minimizing their sum:

\min_{\beta_0, \ldots, \beta_N} \max_{\Omega} W(\Omega) \left| R_r(e^{j\Omega}) - D_r(e^{j\Omega}) \right| + \min_{\gamma_1, \ldots, \gamma_N} \max_{\Omega} W(\Omega) \left| R_i(e^{j\Omega}) - D_i(e^{j\Omega}) \right|    (4.66)
It should be mentioned that the parameters \beta_n and \gamma_n obtained by the min-max method, and thus the real coefficients h_n, give a very good approximation of the magnitude response |H(e^{j\Omega})| but not always a satisfying approximation of the phase response, especially outside the passband. The phase response's behavior in the stopband is insignificant, since the amplitude in this region is very close to zero.
Solving the min-max problem using Linear Programming
It has already been mentioned that the approximate min-max method solves the complex min-max problem by separating it into two subproblems of a real part and an imaginary part approximation (see Equation 3.49). Linear Programming refers to the problem of maximizing or minimizing a linear function subject to linear constraints [20]. The linear function is called the objective function,

f(x_1, x_2, \ldots, x_n) = c_1 x_1 + \ldots + c_n x_n + d,

and the general problem is defined as follows:

\min_x c^T x, \quad \text{subject to} \quad A x \le b.
In our case, the problem is expressed as follows:

\min_{\beta_0, \ldots, \beta_N} \max_{\Omega \in J} W(\Omega) \left| D_r(\Omega) - \beta_0 - 2\beta_1\cos(\Omega) - \ldots - 2\beta_N\cos(N\Omega) \right|    (4.67)

and

\min_{\gamma_1, \ldots, \gamma_N} \max_{\Omega \in J} W(\Omega) \left| D_i(\Omega) - 2\gamma_1\sin(\Omega) - \ldots - 2\gamma_N\sin(N\Omega) \right|    (4.68)

where the region J includes the passband and stopband regions, since these are the zones we are interested in.
Defining

\delta = \max_{\Omega \in J} W(\Omega) \left| D_r(\Omega) - \beta_0 - 2\beta_1\cos(\Omega) - \ldots - 2\beta_N\cos(N\Omega) \right|,

the real-part problem becomes the minimization of \delta over \beta_0, \ldots, \beta_N, \delta. But \delta = \delta_{opt}, since \delta is the maximum approximation error and \delta \ge \delta_{opt}; thus, at the minimum we finally obtain \delta = \delta_{opt}. The required inequality in the Linear Programming problem is:

W(\Omega) \left| D_r(\Omega) - \beta_0 - 2\beta_1\cos(\Omega) - \ldots - 2\beta_N\cos(N\Omega) \right| \le \delta \iff
-\delta \le W(\Omega) \left[ D_r(\Omega) - \beta_0 - 2\beta_1\cos(\Omega) - \ldots - 2\beta_N\cos(N\Omega) \right] \le \delta    (4.69)
Sampling the frequency region J at the points \Omega_0, \ldots, \Omega_k, the right part of the inequality yields the following system:

\begin{pmatrix} -1 & -W(\Omega_0) & -2W(\Omega_0)\cos(\Omega_0) & \cdots & -2W(\Omega_0)\cos(N\Omega_0) \\ \vdots & \vdots & \vdots & & \vdots \\ -1 & -W(\Omega_k) & -2W(\Omega_k)\cos(\Omega_k) & \cdots & -2W(\Omega_k)\cos(N\Omega_k) \end{pmatrix} \begin{pmatrix} \delta \\ \beta_0 \\ \vdots \\ \beta_N \end{pmatrix} \le \begin{pmatrix} -W(\Omega_0) D_r(\Omega_0) \\ \vdots \\ -W(\Omega_k) D_r(\Omega_k) \end{pmatrix}    (4.71)

Similarly, from the left part of the inequality the system that yields is the following:

\begin{pmatrix} -1 & W(\Omega_0) & 2W(\Omega_0)\cos(\Omega_0) & \cdots & 2W(\Omega_0)\cos(N\Omega_0) \\ \vdots & \vdots & \vdots & & \vdots \\ -1 & W(\Omega_k) & 2W(\Omega_k)\cos(\Omega_k) & \cdots & 2W(\Omega_k)\cos(N\Omega_k) \end{pmatrix} \begin{pmatrix} \delta \\ \beta_0 \\ \vdots \\ \beta_N \end{pmatrix} \le \begin{pmatrix} W(\Omega_0) D_r(\Omega_0) \\ \vdots \\ W(\Omega_k) D_r(\Omega_k) \end{pmatrix}    (4.72)
Consequently, for simplicity of programming, we solved the min-max problem with the help of linear programming and not using a Remez-like algorithm (see Appendix B). Unfortunately, the specific algorithm tends to be slow for large filter lengths, while it can sometimes fail to converge. Despite that, the idea extends to an even-tapered filter, and one can acquire a filter with the desired amplitude and delay.
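For illustration, the real-part subproblem can be solved with an LP solver in a few lines. The sketch below (Python with scipy.optimize.linprog instead of the Matlab linprog used in Appendix B; the function name and the toy desired response are ours) builds the stacked constraints of Eqs. (4.71)-(4.72), with the \delta variable placed last:

```python
import numpy as np
from scipy.optimize import linprog

def minimax_real(om, Dr, Wt, N):
    """min_{beta, delta} delta  s.t.  Wt(om)|Dr(om) - R_r(om)| <= delta,
    with R_r(om) = beta_0 + 2*sum_n beta_n cos(n*om)."""
    basis = np.ones((om.size, N + 1))
    for n in range(1, N + 1):
        basis[:, n] = 2.0*np.cos(n*om)
    WB = Wt[:, None]*basis
    ones = np.ones((om.size, 1))
    A_ub = np.block([[WB, -ones], [-WB, -ones]])   # two-sided constraints
    b_ub = np.concatenate([Wt*Dr, -Wt*Dr])
    c = np.zeros(N + 2); c[-1] = 1.0               # minimize delta
    bounds = [(None, None)]*(N + 1) + [(0.0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:-1], res.x[-1]

# toy example: fit a frequency ramp on a passband grid
om = np.linspace(0.0, 0.6*np.pi, 200)
beta, delta = minimax_real(om, om/np.pi, np.ones_like(om), 6)
R = beta[0] + sum(2.0*beta[n]*np.cos(n*om) for n in range(1, 7))
```

At the optimum, the maximum weighted error equals \delta (the constraint is binding), which provides a quick correctness check of the formulation.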
Another approach to a Low Delay FIR filter
Since the linear-phase property is desired in FIR filter design problems, the signal delay is an important algorithm characteristic. Most hearing-instrument users receive processed sound together with unprocessed sound leaking directly into the ear canal. At low frequencies, these components may have similar amplitudes. Interference between these components can cause noticeable effects if the processed signal is delayed by more than about 5-10 ms. For a listener with a severe hearing loss, who cannot hear the unprocessed sound, the only problem is the asynchrony between speech sounds and lip movements; in that case, delays of about 50 ms are acceptable. The processing delay is finally influenced by several factors in the algorithm implementation [23].
The filtering performed by the algorithm always introduces some group delay regardless of implementation. The FIR filter proposed for our problem results from the frequency compression gains, which provide real and symmetric filter coefficients by using the Inverse Fast Fourier Transform.
Assume the frequency gains

g_0, g_1, \ldots, g_{N-1}

where N = 64 in our case, with the symmetry property g_n = g_{N-n}, such that the filter frequency response has amplitude g_n and a linear phase corresponding to the desired filter delay k (for example, k = 16):

w_n = g_n\, e^{-jk\frac{2\pi n}{N}}, \qquad n = 0, 1, \ldots, N-1    (4.73)

Figure 4.9: Introduction of an all-pole filter so as to eliminate deep nulls in the FIR filter frequency response.
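A minimal sketch of this construction (Python/NumPy; the random gain values are only placeholders for the actual compression gains) shows that the symmetry g_n = g_{N-n} together with an integer delay k yields real filter coefficients:

```python
import numpy as np

N, k = 64, 16                         # DFT length, desired delay in samples
rng = np.random.default_rng(1)
g = rng.uniform(0.1, 1.0, N)
g[1:] = 0.5*(g[1:] + g[:0:-1])        # enforce the symmetry g_n = g_{N-n}

n = np.arange(N)
w = g*np.exp(-1j*k*2.0*np.pi*n/N)     # Eq. (4.73): amplitude g_n, linear phase
h = np.fft.ifft(w)                    # conjugate-symmetric w -> real h
assert np.allclose(h.imag, 0.0, atol=1e-12)
h = h.real

assert np.allclose(np.abs(np.fft.fft(h)), g)   # prescribed gains reproduced
```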
4.5
An AR filter has no zeros outside the origin and is hence less suitable for approximating filters having (many) zeros within their frequency response. Löllmann and Vary proposed that the spectral weighting coefficients for speech enhancement should be bounded according to

W_{thres} < W_i(k') < 1, \qquad i = 0, 1, \ldots, M    (4.74)

This noise floor ensures that the AR filter does not have to approximate sharp zeros (deep nulls).
Our idea was to introduce a filter with transfer function 1/B(z) in cascade with the AR post-filter so as to eliminate the deep nulls in the frequency response, as illustrated in Figure 4.9. As a result, the AR filter required in order to approximate the time-domain filter H_s(z) is now H_{AR}(z), where the inverse transfer function is given by

H_{AR}^{-1}(z) = \frac{1}{B(z)}\, \big( 1 - A(z) \big)    (4.75)
One thing that matters is the order of the polynomial B(z) and whether the correction at a specific deep-null frequency is right. The first thing to do is to find the frequencies at which sharp zeros exist. In the case of one or two deep nulls at frequencies \Omega_1 and \Omega_1, \Omega_2 respectively, we need to solve the following systems:

- 1st case: if there is one deep null at frequency \Omega_1, then we must solve the equation

H_s(e^{j\Omega_1}) \approx \frac{1}{1 - b_1 e^{-j\Omega_1}}    (4.76)

- 2nd case: if there are two deep nulls at frequencies \Omega_1 and \Omega_2, then we must solve the system of equations

H_s(e^{j\Omega_1}) \approx \frac{1}{1 - b_1 e^{-j\Omega_1} - b_2 e^{-j2\Omega_1}}    (4.77)

H_s(e^{j\Omega_2}) \approx \frac{1}{1 - b_1 e^{-j\Omega_2} - b_2 e^{-j2\Omega_2}}    (4.78)

By solving the two linear systems, we find the appropriate coefficient b_1 of the first filter and the coefficients b_1 and b_2 of the second-order filter B(z).
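The two-null case amounts to a 2x2 linear system. The following sketch (Python/NumPy; Hs here is a purely illustrative toy response, and the complex system is solved directly, whereas the thesis ultimately seeks real coefficients) demonstrates the solve and the resulting exact match at the two null frequencies:

```python
import numpy as np

def null_correction_coeffs(Hs, om1, om2):
    """Solve Hs(e^{j om_i}) = 1/(1 - b1 e^{-j om_i} - b2 e^{-j 2 om_i}),
    i = 1, 2, for the coefficients b1, b2 (cf. Eqs. 4.77-4.78)."""
    A = np.array([[np.exp(-1j*om1), np.exp(-2j*om1)],
                  [np.exp(-1j*om2), np.exp(-2j*om2)]])
    c = np.array([1.0 - 1.0/Hs(om1), 1.0 - 1.0/Hs(om2)])
    return np.linalg.solve(A, c)

Hs = lambda om: 0.05 + np.abs(np.sin(om))   # toy response, small near 0 and pi
om1, om2 = 0.3, 2.5                         # assumed deep-null frequencies
b1, b2 = null_correction_coeffs(Hs, om1, om2)

for om in (om1, om2):
    B = 1.0 - b1*np.exp(-1j*om) - b2*np.exp(-2j*om)
    assert np.isclose(1.0/B, Hs(om))        # 1/B matches Hs at both nulls
```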
It should be noted that for the above analysis, we made the assumption that there are at most two frequencies at which sharp zeros are met. Finally, the AR filter that approximates the new FIR filter after the elimination of deep nulls has frequency response

H_{AR_N}(e^{j\Omega}) = \alpha_0\, \frac{1 - \sum_{n=1}^{2} b_n e^{-jn\Omega}}{1 - \sum_{n=1}^{P} \alpha_n e^{-jn\Omega}}    (4.79)

The simplest case of a filter frequency response after the elimination of one deep null is shown in Figures 4.10 and 4.11.
Figure 4.10: Magnitude and Phase Response of a filter with one sharp zero at a specified
frequency.
We observed that putting a lower threshold on the spectral gain values creates more intense ripples than the proposed method of eliminating deep nulls. In addition, the AR filter that results after adding one term to the numerator stays stable, and its group delay is at most 1 to 6 samples.
4.6 Conclusions
Figure 4.11: Magnitude and Phase Response of a filter after the elimination of its sharp
zero.
The approximation of the uniform post-filter by a uniform AR filter has the lowest computational complexity, since no frequency warping is employed. Despite that, the warped AR filter is mostly preferred, since it provides a better approximation at low frequencies than the uniform AR filter. This higher frequency resolution for the lower bands is favorable for perceptually based speech enhancement applications.

The third of the proposed approximations is the approximation of the warped FIR post-filter by a warped AR filter, which has a very high overall computational complexity; perceptual speech quality can be improved at the cost of computational complexity.

An idea for eliminating deep nulls appearing at one or two frequencies was proposed in place of putting a lower threshold (noise floor) on the spectral gain coefficients, i.e., restricting the a priori SNR value depending on the spectral speech estimator. This idea employs an allpole filter of order one or two at most, depending on the number of deep-null frequencies (we assume that there are one or two frequencies with deep nulls), and changes the AR filter required to approximate the original time-domain filter after the deep nulls have been eliminated.
Appendix A
W_4 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{pmatrix}

The subscript N on W is usually omitted if it is clear from the context. The matrix W also satisfies the property W W^\dagger = N I, where W^\dagger is the transpose-conjugate of W, so that W/\sqrt{N} is unitary. Given a finite-length sequence x(n), 0 \le n \le N-1, suppose we define the vector x = [x(0)\ x(1)\ \ldots\ x(N-1)]^T and compute the vector X = W x. Then the components of X are said to form the DFT coefficients of the sequence x(n). The sequence x(n) is the inverse DFT of the sequence X(k), and the matrix W^{-1} is called the IDFT matrix.
The DFT and IDFT relations are more commonly written as

X(k) = \sum_{m=0}^{N-1} x(m)\, W^{km}

and

x(m) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, W^{-km}.
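A quick numerical check of these relations (Python/NumPy, our own illustration):

```python
import numpy as np

N = 4
k, m = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
W = np.exp(-2j*np.pi*k*m/N)                       # DFT matrix, W = e^{-j 2 pi / N}

assert np.allclose(W @ W.conj().T, N*np.eye(N))   # W W^H = N I

x = np.array([1.0, 2.0, 3.0, 4.0])
X = W @ x                                         # DFT coefficients of x
assert np.allclose(X, np.fft.fft(x))
assert np.allclose(np.linalg.inv(W) @ X, x)       # the IDFT matrix recovers x
```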
Appendix B
FIR a p p r o x i m a t e Parks M c C l e l l a n
%
%
%
%
%
%
%
%
%
%
The d e s i r e d f r e q u e n c y r e s p o n s e i s d e s c r i b e d by F ,A and D.
F i s a v e c t o r o f f r e q u e n c y band e d g e s i n p a i r s , i n a s c e n d i n g o r d e r
b e t w e e n 0 and 1 . 1 c o r r e s p o n d s t o t h e N y q u i s t f r e q u e n c y or h a l f t h e
s a m p l i n g f r e q u e n c y . At l e a s t one f r e q u e n c y band must have a nonz e r o
w i d t h . A i s a r e a l v e c t o r t h e same s i z e as F which s p e c i f i e s t h e
d e s i r e d a m p l i t u d e o f t h e f r e q u e n c y r e s p o n s e o f t h e r e s u l t a n t f i l t e r B.
D i s t h e d e s i r e d d e l a y . In our c a s e i t w i l l be 3 2 .
W i s a weight function f o l l o w i n g the l o g i c of the amplitude response
A. Normally i t i s [ 1 1 1 1 ] , b u t i f we want t h e e r r o r i n t h e s e c o n d band
t o be 10 t i m e s s m a l l e r than i n t h e f i r s t t h e n we s e l e c t W=[1 1 10 1 0 ] .
%
%
%
%
We a r e g o i n g t o s o l v e t h e minimax problem w i t h t h e h e l p o f l i n e a r
programming and n o t u s i n g a Remezl i k e a l g o r i t h m . This i s f o r s i m p l i c i t y
o f programming . U n f o r t u n a t e l y t h e a l g o r i t h m t e n d s t o be s l o w f o r l a r g e
f i l t e r l e n g t h s w h i l e i t can sometimes f a i l c o n v e r g i n g .
N a = N/2 + 1 ;
N b = N/ 2 ;
Delay = N b D;
% I n t e r v a l sampling
maxs = N b 1 0 ; maxs1 = c e i l ( maxs (F(2)+F ( 3 ) ) / 2 ) ; maxs2 = maxsmaxs1 ;
om = [ [ 0 : maxs1 ] F ( 2 ) / maxs1 ; F ( 3 ) + [ 0 : maxs2 ] (1 F ( 3 ) ) / maxs2 ] pi ;
Ampl = [A( 2 ) o n e s ( maxs1 + 1 , 1 ) ; A( 4 ) o n e s ( maxs2 + 1 , 1 ) ] ;
W = [W( 1 ) o n e s ( maxs1 + 1 , 1 ) ;W( 3 ) o n e s ( maxs2 + 1 , 1 ) ] ;
Des a = cos (om Delay ) . Ampl . W; Des a = [ Des a ; Des a ] ;
f = [zeros(N_a, 1); 1];
disp('Real part approximation')
delta_a = linprog(f, C, Des_a);
err_a = abs(Des_a - C(:, 1:end-1)*delta_a(1:end-1));
B = delta_a(1:end-1);
max_real_error = delta_a(end)
B = [B(end:-1:2); B];
% Solve imaginary part
C = zeros(2*length(om), N_b+1);
for i = 1:N_b
    CC = 2*sin(i*om).*W;
    C(:, i) = [CC; -CC];
end
C(:, end) = [-ones(length(om),1); -ones(length(om),1)];
f = [zeros(N_b, 1); 1];
disp('Imaginary part approximation')
delta_b = linprog(f, C, Des_b);
err_b = abs(Des_b - C(:, 1:end-1)*delta_b(1:end-1));
BB = delta_b(1:end-1);
max_imaginary_error = delta_b(end)
B = B + [-BB(end:-1:1); 0; BB];
max_combined_error = sqrt(max(err_a.^2 + err_b.^2))
[h, w] = freqz(B, 1, 1024);
figure(1)
plot(w/pi, abs(h))
title('Filter amplitude response');
figure(2)
plot(w(2:end)/pi, -diff(phase(h))/(w(2) - w(1))); % plot group delay
axis([0 1 0 D*2]) % delay = 32 in passband [0 0.6]
title('Filter group delay response');
Appendix C

Proof of Eq. (C.1):
Expanding the squared error signal gives

e^2(k) = \frac{1}{\alpha_0^2} \left[ y(k)\, y(k) - 2 \sum_{n=1}^{P} \alpha_n\, y(k-n)\, y(k) + \left( \sum_{n=1}^{P} \alpha_n\, y(k-n) \right)^2 \right].

Since

\left( \sum_{n=1}^{P} \alpha_n\, y(k-n) \right)^2 = \sum_{n=1}^{P} \sum_{m=1}^{P} \alpha_n \alpha_m\, y(k-n)\, y(k-m),

taking expectations yields

E\left\{ \left( \sum_{n=1}^{P} \alpha_n\, y(k-n) \right)^2 \right\} = \sum_{n=1}^{P} \alpha_n \sum_{m=1}^{P} \alpha_m\, \varphi_{yy}(n-m) = \sum_{n=1}^{P} \alpha_n\, \varphi_{yy}(n),

where the last step uses the normal equations. It finally yields

\sigma_e^2 = \frac{1}{\alpha_0^2} \left[ \varphi_{yy}(0) - \sum_{n=1}^{P} \alpha_n\, \varphi_{yy}(n) \right].    (C.1)
Proof of Eq. (4.42):
In the z-domain, Parseval's theorem reads:

\sum_{k} h(k)\, h^*(k) = \frac{1}{2\pi j} \oint_C H(z)\, H^*\!\left( \frac{1}{z^*} \right) \frac{dz}{z}    (C.2)

Hence

\sum_{l} D h_s^{[n]}(l)\, D h_s^{[\lambda]}(l) = \frac{1}{2\pi j} \oint_C H_A^{n}(z) H_s(z)\, H_A^{\lambda}(z^{-1}) H_s(z^{-1})\, \frac{dz}{z}
= \frac{1}{2\pi j} \oint_C H_A^{n-\lambda}(z) H_s(z)\, H_s(z^{-1})\, \frac{dz}{z}
= \frac{1}{2\pi j} \oint_C H_A^{n-\lambda+m-m}(z) H_s(z)\, H_s(z^{-1})\, \frac{dz}{z}
= \frac{1}{2\pi j} \oint_C H_A^{n+m}(z) H_s(z)\, H_A^{\lambda+m}(z^{-1}) H_s(z^{-1})\, \frac{dz}{z}
= \sum_{l} D h_s^{[n+m]}(l)\, D h_s^{[\lambda+m]}(l),

where the allpass property H_A(z^{-1}) = H_A^{-1}(z) has been used.
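The shift-invariance property proven above can also be verified numerically. The sketch below (Python with SciPy; the warping coefficient, the test impulse response and the truncation length are our choices) filters h_s through cascaded allpass sections and checks that the correlation sum depends only on the index difference:

```python
import numpy as np
from scipy.signal import lfilter

lam = 0.5                                   # allpass/warping coefficient
hs = np.array([1.0, -0.4, 0.2, 0.1])

def Dhs(n, length=4096):
    """hs filtered through n allpass sections H_A(z) = (z^-1 - lam)/(1 - lam z^-1)."""
    x = np.zeros(length)
    x[:hs.size] = hs
    for _ in range(n):
        x = lfilter([-lam, 1.0], [1.0, -lam], x)
    return x

def corr(n, l):
    return np.dot(Dhs(n), Dhs(l))

# the sum only depends on n - lambda, as the Parseval argument shows
assert np.isclose(corr(3, 1), corr(4, 2), atol=1e-10)
assert np.isclose(corr(2, 0), corr(5, 3), atol=1e-10)
```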
Appendix D

Matlab code

Solution of the system of equations (4.17):
The solution of the system of equations (4.17) is done using the following Matlab code:

% calculation of 1st part Re{ExH}
P = 20;
N = 1024; % no of sampling points
w1 = linspace(0, pi, N);
E1 = zeros(N, P);
Es = zeros(N, P);
for j = 1:N
    for k = 1:P
        E1(j,k) = exp(-sqrt(-1)*k*w1(j));
    end
    Es(j,:) = real(E1(j,:))*abs(FIRf(j)).^2;
end
ReEH = sum(Es);
b = ReEH';
% calculation of 2nd part
E = zeros(P, N);
EEH = cell(0);
EEHxH2 = zeros(P, P);
clear i j k
for j = 1:N
    for k = 1:P
        E(k,j) = exp(-sqrt(-1)*k*w1(j));
    end
    EEH{j} = E(:,j)*E(:,j)';
    EEH{j} = abs(FIRf(j)).^2*real(EEH{j}); % Toeplitz symmetrical structure
end
for j = 1:N-1
    EEHxH2 = EEH{j} + EEH{j+1};
    EEH{j+1} = EEHxH2;
end
A = EEHxH2;
% system solution
x = A\b
It should be mentioned that the coefficients of the FIR filter that must be approximated are obtained by the m-files Gainvector and FIRfreq, presented as follows:

% FIR Post-Filter
gainvector = Gainvector;
a = ifft(gainvector);
FIR = a(mod((0:63)+33, 64)+1);
N = 1024; % no of sampling points
w1 = linspace(0, pi, N);
H1 = zeros(1, N);
FIRf = zeros(1, N);
K = 63; % filter coeffs are 63 (eliminating the last zero)
for j = 1:N
    for k = 1:K
        H1(k) = FIR(k).*exp(-sqrt(-1)*(k-1)*w1(j));
    end
    FIRf(j) = sum(H1);
end
Regarding the approximation problem, the approximated filter might be either a uniform (Gainvector) or a warped filter (warped):

alpha = 0.5; % warping factor
L = 64; % DFT points
N = 1024;
w1 = linspace(0, pi, N);
arg_tan1 = sin(w1)./(cos(w1) - alpha);
warpphase1 = w1 + 2*atan(arg_tan1);
H1 = zeros(1, N);
warpedFIR1 = zeros(1, N);
K = 63;
for j = 1:N
    for k = 1:K
        H1(k) = FIR(k).*exp(-1i*(k-1)*warpphase1(j));
    end
    warpedFIR1(j) = sum(H1);
end
% Filter of length M with group delay n0
gainvector = Gainvector;
gainvector(17) = 0.01;
M = 64;
n0 = 16; % desired group delay
wf = zeros(M, 1);
for l = 1:M
    wf(l) = 2*pi*(l-1)/M;
end
w = zeros(length(gainvector), 1);
for k = 1:length(gainvector)
    w(k) = gainvector(k)*exp(-1i*n0*wf(k));
end
W_filt = ifft(w);
for j = 1:M
    for k = 1:P
        % find coeffs x from ReExH64
        filtA(k) = x(k)*exp(-1i*n0*wf1(j));
    end
    A_allpole(j) = sum(filtA);
    if (nullpos(j) ~= 0)
        Af(j) = (1 - b1*exp(-1i*fn))/(1 - A_allpole(j));
    else
        Af(j) = 1/(1 - A_allpole(j));
    end
end
HxA = H - H.*A_allpole;
approxHAa = abs(HxA).^2;
approxHA = (1/M)*sum(approxHAa);
a0 = sqrt(approxHA);
Af_tot = a0*Af; % frequency response of AR filter
[Hn, W] = freqz(FIRn, 1, 64);
[Num, Den] = invfreqz(Af_tot, W, 1, P);
figure(3)
plot(W, 10*log10(abs(Af_tot)), 'r')
title('AR Filter Amplitude response');
figure(4)
grpdelay(Num, Den, W)
title('AR filter Group Delay response');
Bibliography
[1] Harvey Dillon, Hearing Aids. Thieme Medical Publishers, 1st edition, May 2001.
[2] http://openlearn.open.ac.uk/file.php/3373/formats/print.htm
[3] http://mail.pittsfield.net/teachersites/WhelihanKathleen/
[4] http://dissertations.ub.rug.nl/FILES/faculties/science/1996/p.w.j.hengel/c1.pdf
[5] http://openlearn.open.ac.uk/mod/resource/view.php?id=263164
[6] http://books.google.gr/books?id=wpYSS8o0PeoC&printsec=frontcover#v=onepage&q=&f=false
[7] http://openlearn.open.ac.uk/mod/resource/view.php?id=263208
[8] Robert E. Sandlin, Hearing Aid Amplification: Technical and Clinical Considerations. Second Edition. Singular Thomson Learning.
[9] R. Gao, S. Basseas, D. T. Bargiotas, L. H. Tsoukalas, Next-generation hearing prosthetics, IEEE Robotics and Automation Magazine, March 2003.
[10] Brent Edwards and Dave Smriga, Better Hearing Through DSP, GN ReSound North America.
[11] K. E. Hecox, M. J. Williamson, K. L. Cummins, Adaptive Programmable Signal Processing and Filtering for Hearing Aids, 1991.
[12] James M. Kates and Kathryn Hoberg Arehart, Multichannel Dynamic-Range Compression Using Digital Frequency Warping, EURASIP Journal on Applied Signal Processing, vol. 2005, no. 18, pp. 3003-3014, 2005. doi:10.1155/ASP.2005.3003
[13] J. Benesty, S. Makino, and J. Chen, Eds., Speech Enhancement, Springer, New York, NY, USA, 2005.
[14] Heinrich W. Löllmann and Peter Vary, Post-Filters for Speech Enhancement in Hearing Aids. Final Report, January 2006 - January 2007, RWTH Aachen.
[15] P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice Hall, PTR, 1993.
[16] H. W. Löllmann and P. Vary, Efficient Non-Uniform Filter-Bank Equalizer, Institute of Communication Systems and Data Processing, RWTH Aachen University, D-52056 Aachen, Germany.
[17] H. W. Löllmann and P. Vary, Low Delay Filter-Banks for Speech and Audio Processing, Institute of Communication Systems and Data Processing, RWTH Aachen University, Aachen, Germany.
[18] H. W. Löllmann and P. Vary, Low Delay Filter for Adaptive Noise Reduction, Institute of Communication Systems and Data Processing, RWTH Aachen University, D-52056 Aachen, Germany.
[19] George V. Moustakides, Basic Techniques in Digital Signal Processing. Tziola Editions, 2004.
[20] Thomas S. Ferguson, Linear Programming: A Concise Introduction.
[21] H. H. Dam, A. Cantoni, S. Nordholm, and K. L. Teo, Digital Laguerre Filter Design With Maximum Passband-to-Stopband Energy Ratio Subject to Peak and Group Delay Constraints, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 53, no. 5, May 2006.
[22] H. W. Löllmann and P. Vary, A Warped Low Delay Filter for Speech Enhancement, Institute of Communication Systems and Data Processing, RWTH Aachen University, D-52056 Aachen, Germany.
[23] FP6-004171 HEARCOM, Hearing in the Communication Society, D-5-1: Sub-set of signal enhancement techniques operational on PC system, Integrated project, Editor: Arne Leijon.