Multimedia Signal Processing Laboratory

Report Abstracts
P. Kabal
ITU-T G.723.1 Speech Coder: A Matlab Implementation
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 2c, 54 pp., Aug. 2011
(initial version Nov. 2003)
Matlab code: G.723.1-v2r1b.tar.gz
This report documents the details of the processing steps in the ITU-T G.723.1 Speech Coder. This report
accompanies an implementation of that coder in Matlab. The Matlab implementation was designed to
facilitate experimentation and research using a practical speech coder as a base.
P. Kabal
Minimum Mean-Square Error Filtering: Autocorrelation/Covariance, General Delays and Multirate Systems
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 0.985, 215 pp., April 2011
These notes examine procedures for solving for minimum mean-square error filters. The stochastic case
and the block-based (least-squares) analyses are covered in a single formalism. The filtering is analyzed
in more generality than in many expositions, allowing for configurations with general filter delays and
flexible windows for the least squares problem. The important linear prediction problem is examined in
detail. For the equally spaced delay case, a rich set of results ensues. Several topics are covered that are
missing from many textbooks: affine estimation (non-zero means), cyclostationary signals (for multirate
signals), fractionally spaced equalizers, joint process estimation in relation to the Levinson algorithm, and
an approximate formulation for linearly constrained filters.
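A minimal sketch of the block-based (least-squares) case with general delays, under assumptions that are ours rather than the report's: the desired signal is estimated from input samples at an arbitrary, non-contiguous set of delays, only samples for which every delayed input exists are used, and an optional constant column gives an affine estimator for non-zero means. The function name `ls_filter` and the test signal are hypothetical.

```python
import numpy as np

def ls_filter(x, d, delays, affine=False):
    """Least-squares weights for d[n] ~ sum_k w[k] x[n - delays[k]] (+ b if affine).

    Uses only n for which every delayed input exists (a covariance-style window)."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    n = np.arange(max(delays), len(x))
    X = np.column_stack([x[n - D] for D in delays])   # one column per delay
    if affine:
        X = np.column_stack([X, np.ones(len(n))])     # constant term for non-zero means
    w, *_ = np.linalg.lstsq(X, d[n], rcond=None)
    return w

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
delays = [0, 2, 5]                                    # non-contiguous delays
d = np.zeros_like(x)
d[5:] = 0.8 * x[5:] - 0.5 * x[3:-2] + 0.25 * x[:-5] + 1.0
w = ls_filter(x, d, delays, affine=True)              # three weights followed by the offset
```

Because the desired signal here is an exact noise-free affine function of the delayed inputs, the solution should reproduce the generating coefficients to within rounding.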
P. Kabal
Frequency Domain Representations of Sampled and Wrapped Signals
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 1.5, 16 pp., March 2011
(initial version Jan. 2008)
These notes examine the relationships between frequency domain representations of discrete-time and
wrapped signals derived from a continuous-time signal. The first part of these notes develops the
relationships that allow periodic signals to be analyzed within the framework of the Fourier transform.
The second part examines the relationships between the Fourier series, the
Discrete-Time Fourier Transform (DTFT) and the Discrete Fourier Transform (DFT).
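One of these relationships is easy to check numerically: sampling the DTFT of a finite sequence at N equally spaced frequencies gives the N-point DFT of the sequence wrapped (time-aliased) modulo N. The sketch below assumes a real sequence starting at n = 0; the helper names `dtft` and `wrap` are illustrative.

```python
import numpy as np

def dtft(x, w):
    """Evaluate X(e^{jw}) = sum_n x[n] e^{-jwn} for a finite sequence starting at n = 0."""
    n = np.arange(len(x))
    return np.exp(-1j * np.outer(w, n)) @ x

def wrap(x, N):
    """Time-alias x modulo N: xw[m] = sum_r x[m + rN]."""
    L = -(-len(x) // N) * N                 # pad length up to a multiple of N
    xp = np.concatenate([x, np.zeros(L - len(x))])
    return xp.reshape(-1, N).sum(axis=0)

rng = np.random.default_rng(1)
x = rng.standard_normal(37)                 # sequence longer than the DFT size
N = 16
wk = 2 * np.pi * np.arange(N) / N
X_sampled = dtft(x, wk)                     # DTFT samples at w_k = 2*pi*k/N
X_dft = np.fft.fft(wrap(x, N))              # DFT of the wrapped sequence
```

The two arrays agree because e^{-j2πk(m+rN)/N} = e^{-j2πkm/N}: every sample x[m + rN] lands on the same DFT term as x[m].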
P. Kabal
The Equivalence of ADPCM and CELP Coding
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 1.2, 14 pp., March 2011
(initial version April 2010)
This document examines coding schemes that differentially code signals while at the same time
controlling the frequency characteristics of the coding (quantization) error. We show that a (vector
quantized) version of an Adaptive Differential Pulse Code Modulation (ADPCM) system using noise
feedback to shape the quantization noise can be converted to an equivalent system which is in the form
of a Code Excited Linear Prediction (CELP) system. While this equivalence is known by, or at least not a
surprise to, the signal processing cognoscenti, it is not widely appreciated by many others. We also try to
add a historical perspective on the development of these systems.
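The noise-shaping mechanism underlying this equivalence can be illustrated with a simplified scalar sketch (no prediction and no vector quantization, so it is not the coder analyzed in the report): past quantization errors are filtered by F(z) = f1 z^-1 + f2 z^-2 + ... and subtracted from the quantizer input, so the reconstruction error becomes the quantization error filtered by 1 - F(z). The feedback taps and step size are arbitrary assumptions.

```python
import numpy as np

def noise_feedback_quantize(x, f, step):
    """Uniform quantizer with error feedback F(z) = f[0] z^-1 + f[1] z^-2 + ..."""
    y = np.zeros(len(x))
    q_err = np.zeros(len(x))                 # q_err[n] = y[n] - v[n]
    for n in range(len(x)):
        fb = sum(f[k] * q_err[n - 1 - k] for k in range(len(f)) if n - 1 - k >= 0)
        v = x[n] - fb                        # quantizer input
        y[n] = step * np.round(v / step)
        q_err[n] = y[n] - v
    return y, q_err

rng = np.random.default_rng(2)
x = rng.standard_normal(2000)
f = [0.9, -0.3]                              # hypothetical feedback taps
y, q_err = noise_feedback_quantize(x, f, step=0.25)
# Output error predicted by the analysis: (1 - F(z)) applied to q_err.
fb_all = np.concatenate([[0.0], np.convolve(q_err, f)[: len(x) - 1]])
shaped = q_err - fb_all
```

Since y[n] - x[n] = q_err[n] - (feedback term), the spectrum of the coding error follows |1 - F(e^{jw})|^2 even though the quantizer itself sees an essentially white error.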
P. Kabal
Minimum Phase & All-Pass Filters
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 2.1, 28 pp., March 2011
(initial version Nov. 2007)
This document analyzes minimum-phase and all-pass filters. The analysis allows for complex-valued filter
coefficients. The properties of the frequency responses (amplitude, phase, and group delay) of these
filters are discussed.
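As a small numerical sketch of these ideas (not code from the report), assume an FIR filter specified by its possibly complex zeros. Reflecting each zero outside the unit circle to its conjugate-reciprocal position, with a gain of |z_k| per reflected zero, gives the minimum-phase filter with the same magnitude response; the ratio of the original response to the minimum-phase response is then all-pass. The helper names are illustrative.

```python
import numpy as np

def min_phase_from_zeros(zeros, gain=1.0):
    """FIR coefficients (powers of z^-1) of the minimum-phase filter with the
    same magnitude response as gain * prod_k (1 - z_k z^-1)."""
    zeros = np.asarray(zeros, dtype=complex)
    outside = np.abs(zeros) > 1
    new_zeros = zeros.copy()
    new_zeros[outside] = 1 / np.conj(zeros[outside])    # reflect into the unit circle
    new_gain = gain * np.prod(np.abs(zeros[outside]))   # keeps |H(e^jw)| unchanged
    return new_gain * np.poly(new_zeros)

def freq_response(b, w):
    n = np.arange(len(b))
    return np.exp(-1j * np.outer(w, n)) @ b

zeros = [1.5 * np.exp(0.7j), 0.5 - 0.2j, -2.0]   # complex-valued filter, mixed phase
h = np.poly(zeros)                               # original FIR coefficients
h_min = min_phase_from_zeros(zeros)
w = np.linspace(0, 2 * np.pi, 512, endpoint=False)
H, H_min = freq_response(h, w), freq_response(h_min, w)
A = H / H_min                                    # all-pass factor
```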

P. Kabal
Time Windows for Linear Prediction of Speech
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 2a, 43 pp., Nov. 2009
(initial version Oct. 2003)
This report examines the time windows used for linear prediction (LP) analysis of speech. The goal of
windowing is to create frames of data each of which will be used to calculate an autocorrelation sequence.
Several factors enter into the choice of window. The time and spectral properties of Hamming and Hann
windows are examined. We also consider windows based on Discrete Prolate Spheroidal Sequences
including multiwindow analysis. Multiwindow analysis biases the estimation of the correlation more than
single window analysis. Windows with frequency responses based on the ultraspherical polynomials are
discussed. This family of windows includes Dolph-Chebyshev and Saramäki windows. This report also
considers asymmetrical windows as used in modern speech coders. The frequency response of these
windows is poor relative to conventional windows. Finally, the presence of a "pedestal" in the time window
(as in the case of a Hamming window) is shown to be deleterious to the time evolution of the LP
parameters.
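For context, the sketch below shows where the window enters: an analysis frame is multiplied by the window and the autocorrelation values for LP analysis are computed from the result. The frame length and order are assumed values. NumPy's symmetric Hamming window ends at 0.08 (the pedestal), while its Hann window tapers to zero.

```python
import numpy as np

def windowed_autocorr(frame, window, order):
    """r[0..order] of the windowed frame (autocorrelation method of LP)."""
    xw = np.asarray(frame, dtype=float) * window
    N = len(xw)
    return np.array([np.dot(xw[: N - k], xw[k:]) for k in range(order + 1)])

N, order = 240, 10                        # assumed: 30 ms at 8 kHz, 10th-order LP
rng = np.random.default_rng(3)
frame = rng.standard_normal(N)
w_hamm, w_hann = np.hamming(N), np.hanning(N)
r_hamm = windowed_autocorr(frame, w_hamm, order)
r_hann = windowed_autocorr(frame, w_hann, order)
print(w_hamm[0], w_hann[0])               # end values: Hamming pedestal vs. Hann taper
```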

P. Kabal
FIR Filters: Frequency-Weighted and Minimum-Phase Designs
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 1.6, 32 pp., Nov. 2007
(initial version Sept. 2004)
Matlab code: FilterDesign-M-v2r0.tar.gz
P. Kabal
Improving the Presentation of Matlab Plots
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, June 2006
Matlab code: Matlab-Plot-v1r3.tar.gz
This document describes a number of strategies which go towards the goal of producing publication
quality plots from Matlab. One finds much to criticize in the quality of plots that are reproduced in today's
journals. This is because authors supply the plots without a clear view of how they
will be processed to produce the final plot on the printed page. We give some guidelines and supply
Matlab routines that streamline the application of these guidelines.
P. Kabal
Matlab Plots in Microsoft Word
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Jan. 2006
This report looks at different options for inserting plots generated from Matlab into a Microsoft Word
document. For publication quality output, it is important to control the size of the graphic that will appear
in the final document. The graphic should be drawn at its final size in Matlab. Scaling in Word is
undesirable, as it not only scales the plot, but also the text on the graphic. This report outlines a
procedure that sets the size of the figure and the font size in Matlab. Once set, the graphic can be
imported into Word with no further scaling.
Results indicate that the PostScript format is the best option for good quality graphics. Graphics imported
using cut and paste from Matlab (EMF or bitmap format) are noticeably inferior in quality.

P. Kabal
Windows for Transform Processing
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Dec. 2005
This report examines the time windows used in processing of signals in a transformed domain. The goal of
windowing is to create frames of data, each of which will be used to calculate a transformed sequence.
The transform coefficients are then modified (for instance, filtering for noise reduction) or coded
(transform coding). The modified transform coefficients are then applied to an inverse transform and
windowed again before creating an output signal using addition of the overlapped blocks. It is the analysis
window (before the transform) and the synthesis window (after the inverse transform) that are examined
in this report. The requirement for perfect reconstruction (when the transform coefficients are not
modified) is developed. This gives a condition on the product of the analysis and synthesis windows. An
argument is given to show that if additive noise is introduced in the transform domain, the windowing
should be equally apportioned between these windows, i.e. the analysis and synthesis windows should be
the same. The windowing requirements for systems implementing block-by-block filtering of the input
signal in the transform domain are also examined.
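The perfect-reconstruction condition can be checked with a short sketch under assumed parameters: frame length N = 64, a hop of N/2, and identical analysis and synthesis windows equal to the square root of a periodic Hann window. The window product is then the Hann window, whose copies at 50% overlap sum to one, so with unmodified DFT coefficients the overlap-added output equals the input except in the first and last half-frame.

```python
import numpy as np

N, hop = 64, 32
n = np.arange(N)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)     # periodic Hann window
wa = ws = np.sqrt(hann)                          # same analysis and synthesis window

# Perfect-reconstruction condition: overlapped window products sum to one.
pr_ok = np.allclose(hann[:hop] + hann[hop:], 1.0)

def analysis_synthesis(x, wa, ws, hop):
    """Window, DFT, (unmodified) inverse DFT, window again, overlap-add."""
    N = len(wa)
    y = np.zeros(len(x))
    for start in range(0, len(x) - N + 1, hop):
        X = np.fft.rfft(wa * x[start:start + N])   # coefficients could be modified here
        y[start:start + N] += ws * np.fft.irfft(X, N)
    return y

rng = np.random.default_rng(4)
x = rng.standard_normal(1024)
y = analysis_synthesis(x, wa, ws, hop)
```

Choosing equal analysis and synthesis windows here follows the report's argument for apportioning the windowing equally when noise is added in the transform domain.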

P. Kabal
Ill-Conditioning and Bandwidth Expansion in Linear Prediction of Speech
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Feb. 2003
This report examines schemes that modify linear prediction (LP) analysis for speech signals. First,
techniques which improve the conditioning of the LP equations are examined. White noise compensation
for the correlations is justified from the point of view of reducing the range of values which the predictor
coefficients can take on. A number of other techniques which modify the correlations are investigated
(highpass noise, selective power spectrum modification). The efficacy of these procedures is measured
over a large speech database. The results show that white noise compensation is the method of choice: it
is both effective and simple.
Other methods that prematurely terminate the iterative solution of the correlation equations (the Durbin
recursion) to circumvent problems of ill-conditioning are also investigated.
The report also considers the bandwidth expansion of digital filters which have resonances. In speech
coding such resonances correspond to the formant frequencies. Bandwidth expansion of the LP filter
serves to avoid unnatural sharp resonances that may be artefacts of pitch and formant interaction. Lag
windowing of the correlation values has been used with the aim of both bandwidth expansion and helping
the conditioning of the LP equations. Experiments show that the benefit for conditioning is minimal. This
report also discusses bandwidth expansion of the prediction coefficients after LP analysis using radial
scaling of the z-transform. A simple new formula is given which can be used to estimate the bandwidth
expansion.
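A compact sketch of the chain discussed above, with assumed parameters: white noise compensation scales r[0] by (1 + ε) with ε = 10^-4 (a noise floor about 40 dB down), the Durbin recursion solves the correlation equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p, and radial scaling replaces a_k by γ^k a_k, which moves every pole of 1/A(z) inward by the factor γ. The bandwidth comment uses the common narrow-band approximation B ≈ -(Fs/π) ln γ, which is not necessarily the formula derived in the report.

```python
import numpy as np

def levinson_durbin(r, order):
    """Durbin recursion for A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.
    Returns (a, reflection coefficients, final prediction error energy)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        km = -acc / err
        a[1:m] = a[1:m] + km * a[m - 1:0:-1]     # symmetric update of earlier coefficients
        a[m] = km
        k[m - 1] = km
        err *= 1.0 - km * km
    return a, k, err

def bandwidth_expand(a, gamma):
    """Radial scaling a[k] -> gamma^k a[k], i.e. A(z) -> A(z / gamma)."""
    return a * gamma ** np.arange(len(a))

# Assumed test signal: a sharply resonant AR(2) process.
rng = np.random.default_rng(5)
pole = 0.98 * np.exp(0.4j)
e = rng.standard_normal(2400)
x = np.zeros(len(e))
for n in range(2, len(x)):
    x[n] = e[n] + 2 * pole.real * x[n - 1] - abs(pole) ** 2 * x[n - 2]

order, eps, gamma = 10, 1e-4, 0.994
xw = x * np.hamming(len(x))
r = np.correlate(xw, xw, "full")[len(xw) - 1 : len(xw) + order]   # r[0..order]
r[0] *= 1.0 + eps                  # white noise compensation, about 40 dB down
a, k, err = levinson_durbin(r, order)
a_bw = bandwidth_expand(a, gamma)
# Narrow-band approximation: each pole bandwidth grows by about -(fs/pi) ln(gamma) Hz,
# roughly 15 Hz for fs = 8000 Hz and gamma = 0.994.
```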
R. Der
Stable Symmetric Distributions and Their Role in the Signal Separation Problem
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Feb. 2003
This report examines the problem of blind source separation when the sources are drawn from a
stable class. We show that cost functions extremising any marginal property of a mixture of signals are
constant over the set of symmetric stable distributions, and thus cannot solve the blind source separation
problem in full generality. These distributions are non-pathological, but have infinite variance. The
notable exception is the Gaussian distribution, for which the separation problem is inherently
undetermined. For finite variance signals, the use of marginal statistics for blind signal separation is
justified.
P. Kabal
An Examination and Interpretation of ITU-R BS.1387: Perceptual Evaluation of Audio Quality
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, May 2002 (updated Dec. 2003)
This report examines the standard which describes a method for the objective measure of perceived audio
quality (ITU-R Recommendation BS.1387). This standard uses a number of psycho-acoustical measures
which are combined to give a measure of the quality difference between two instances of a signal (a
reference and a test signal). Many aspects of the standard are under-specified. This report examines
alternate interpretations. It also looks at efficiency issues in the implementation of computationally
intensive parts of the algorithm.
Matlab code: PQevalAudio-v1r0.tar.gz
R. Der
Blind Signal Separation
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Sept. 2001
Blind Signal Separation is the task of separating signals when only their mixtures are observed. Recently,
Independent Component Analysis has become a favourite method of researchers for attacking this
problem. We review the techniques, from cumulant-based algorithms to Infomax to second-order
statistics, from feedback to feedforward architectures, from the instantaneous to the convolutional
problem. A new method for reducing the whitening effect on speech, known to occur in feedforward
architectures, is introduced. The procedure also possesses significant stabilization properties, being based
on performing the filter update in the LP-residual domain of speech. Experimental tests are conducted,
and the algorithms compared.

P. Kabal
Generating Gaussian Pseudo-Random Deviates
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Oct. 2000
This report examines low-complexity methods to generate pseudo-random Gaussian (normal) deviates.
We introduce a new method based on modelling the Gaussian probability density function using piecewise
linear segments. This approach is shown to be both efficient and accurate. It does not require the
calculation of transcendental functions.
All of the methods considered map one or more uniform distributions to create the Gaussian deviates.
This report investigates the effect of the use of discrete variates, particularly in the tails of the Gaussian
distribution. In addition, we give a new interpretation of the method of aliases that suggests its
application to non-uniform quantization.
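The method of aliases (Walker's method) maps two uniform deviates to a sample of a discrete distribution: one selects a cell, the other decides between the cell and its alias. A minimal sketch using Vose's table construction follows; the probabilities are a discretized Gaussian chosen only for illustration, and nothing here reproduces the piecewise-linear generator of the report.

```python
import numpy as np

def build_alias_table(p):
    """Walker/Vose alias table for probabilities p (summing to 1)."""
    n = len(p)
    prob = np.asarray(p, dtype=float) * n     # scaled so an average cell holds 1
    accept = np.zeros(n)
    alias = np.zeros(n, dtype=int)
    small = [i for i in range(n) if prob[i] < 1.0]
    large = [i for i in range(n) if prob[i] >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        accept[s] = prob[s]                   # keep cell s with this probability...
        alias[s] = l                          # ...otherwise return its alias l
        prob[l] -= 1.0 - prob[s]              # l fills the remainder of cell s
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:                   # leftovers are (numerically) full cells
        accept[i] = 1.0
        alias[i] = i
    return accept, alias

def alias_sample(accept, alias, u1, u2):
    """Map two arrays of uniform deviates in [0, 1) to indices."""
    i = np.minimum((u1 * len(accept)).astype(int), len(accept) - 1)
    return np.where(u2 < accept[i], i, alias[i])

grid = np.linspace(-3, 3, 25)
p = np.exp(-0.5 * grid ** 2)
p /= p.sum()                                  # discretized Gaussian (illustrative)
accept, alias = build_alias_table(p)
rng = np.random.default_rng(9)
idx = alias_sample(accept, alias, rng.random(200000), rng.random(200000))
```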
P. Kabal
Formatting a Thesis with LaTeX
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, March 2000 (updated June 2005)
Thesis Macros: ThesisStyle.zip
This report describes the use of LaTeX to format a thesis. A number of topics are covered: content and
organization of the thesis, LaTeX macros for controlling the thesis layout, formatting mathematical
expressions, generating bibliographic references, importing figures and graphs, generating graphs in
Matlab, and formatting tables. The LaTeX macros used to format a thesis (and this document) are
described. As well, Matlab procedures are shown to illustrate methods that can be used to format graphs
in a form suitable for inclusion in a LaTeX document.
P. Kabal
Matlab Plots in Microsoft Word
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, March 2000
Superseded by the version of Jan. 2006
P. Kabal
Measuring Speech Activity
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, August 1999
This report discusses the algorithm described in ITU-T Recommendation P.56 for measuring the active
speech level. Method B in P.56 determines a speech activity factor representing the fraction of time that
the signal is considered to be active speech (as opposed to background idle noise) and the corresponding
active level for the speech part of the signal. The basic algorithm generates an envelope value at each
sample time. The envelope values are compared with a discrete set of thresholds. The (approximate)
active speech level is determined by interpolating in the log domain between the threshold values. In this
report we assess the effects on the speech active level due to interpolation. Recommendation P.56 allows
for sampling rates as low as 600 Hz. Results for subsampled data are compared with those calculated at
the full speech sampling rate.
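The Method B procedure summarized above can be sketched as follows. This is a simplified reading, not the reference implementation: the signal is assumed scaled to |x| ≤ 1, the thresholds are powers of two (6.02 dB apart), and the parameters (30 ms smoothing time constant, 0.2 s hangover, 15.9 dB margin) are the values we understand P.56 to specify; check them against the Recommendation before relying on the sketch. The active level is found where the gap between the candidate active level and the threshold crosses the margin, interpolating in dB. The function name `p56_active_level` is hypothetical.

```python
import numpy as np

def p56_active_level(x, fs, margin_db=15.9, tau=0.03, hangover=0.2):
    """Simplified P.56 Method B sketch.
    Returns (activity factor, active level in dB relative to a full scale of 1.0)."""
    x = np.asarray(x, dtype=float)
    energy = float(np.sum(x * x))
    if energy == 0.0:
        return 0.0, float("-inf")
    c = 2.0 ** np.arange(-15, 1)            # thresholds, 6.02 dB apart (assumed range)
    g = np.exp(-1.0 / (tau * fs))           # one-pole smoothing coefficient
    hang_len = int(round(hangover * fs))    # hangover in samples
    count = np.zeros(len(c))                # active-sample count per threshold
    hold = np.full(len(c), hang_len)        # no hangover credit at the start
    p = q = 0.0
    for v in np.abs(x):
        p = g * p + (1.0 - g) * v           # two-stage exponential envelope
        q = g * q + (1.0 - g) * p
        above = q >= c
        in_hang = ~above & (hold < hang_len)
        count[above | in_hang] += 1
        hold[above] = 0
        hold[in_hang] += 1
    prev = None                             # (level dB, gap dB) at the previous threshold
    for j in range(len(c)):
        if count[j] == 0:
            break
        level = 10 * np.log10(energy / count[j])   # level if count[j] were the active time
        gap = level - 20 * np.log10(c[j])
        if gap <= margin_db:
            if prev is not None:                   # interpolate in dB between thresholds
                level0, gap0 = prev
                level = level0 + (level - level0) * (gap0 - margin_db) / (gap0 - gap)
            return energy / (len(x) * 10 ** (level / 10)), level
        prev = (level, gap)
    if prev is None:
        return 0.0, float("-inf")
    level = prev[0]                                # margin never reached: last valid value
    return energy / (len(x) * 10 ** (level / 10)), level

fs = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)    # 1 s full-duration test tone
activity, level_db = p56_active_level(tone, fs)
```

For a tone that is present throughout, the activity should be close to one (slightly less because of the envelope onset) and the active level close to the tone's RMS level of about -3 dB.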

C. C. Chu and P. Kabal
Codebook Excited Linear Prediction of Speech: Performance in the Presence of Channel Errors
Technical Report 88-10, INRS-Telecommunications, University of Quebec, March 1988.
P. Kabal
Code Excited Linear Prediction Coding of Speech at 4.8 kb/s
Technical Report 87-36, INRS-Telecommunications, University of Quebec, July 1987
This report describes a software implementation of an algorithm for digital coding of speech at low bit
rates. The reconstruction of the speech signal is accomplished by exciting a cascade of a formant
synthesis filter and a pitch synthesis filter with an excitation waveform. The excitation waveform is
selected from a dictionary of waveforms using a frequency weighted mean-square error criterion. At
transmission rates in the neighborhood of 5 kb/s, this scheme produces speech with better quality than
any other known scheme.
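The synthesis cascade can be sketched as follows, assuming a one-tap pitch synthesis filter 1/(1 - β z^-M) followed by an all-pole formant filter 1/A(z) with A(z) = 1 + a1 z^-1 + ... + ap z^-p. The lag, tap and coefficients are placeholders, not values from the report.

```python
import numpy as np

def pitch_synthesis(e, beta, M):
    """Long-term (pitch) synthesis: v[n] = e[n] + beta * v[n - M]."""
    v = np.zeros(len(e))
    for n in range(len(e)):
        v[n] = e[n] + (beta * v[n - M] if n >= M else 0.0)
    return v

def formant_synthesis(v, a):
    """Short-term (formant) synthesis 1/A(z): s[n] = v[n] - sum_{k>=1} a[k] s[n - k]."""
    s = np.zeros(len(v))
    for n in range(len(v)):
        s[n] = v[n] - sum(a[k] * s[n - k] for k in range(1, len(a)) if n - k >= 0)
    return s

a = np.array([1.0, -1.2, 0.8])          # hypothetical 2nd-order A(z)
excitation = np.zeros(160)
excitation[0] = 1.0                     # single pulse as the selected waveform
v = pitch_synthesis(excitation, beta=0.5, M=40)
s = formant_synthesis(v, a)
```

Filtering the output s back through A(z) recovers the pitch-filtered excitation v, which is a convenient check that the all-pole recursion is implemented correctly.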
J.-L. Moncet and P. Kabal
Codeword Selection for CELP Coders
Technical Report 87-35, INRS-Telecommunications, University of Quebec, July 1987
This report describes the algorithm used for selecting an excitation waveform for a CELP coder operating
at 5 kb/s. Each candidate waveform is used to synthesize a segment of speech. A frequency weighted
error criterion is used to find the waveform which regenerates the best output speech. The synthesis
operation uses both a pitch synthesis filter and a formant synthesis filter. The pitch synthesis filter is
optimized to give the best output speech. This optimization offers a significant improvement over a
procedure which uses a pitch filter chosen by analyzing the input speech. Simplified sequential versions of
this strategy also give good quality speech. The quantization of the parameters is also considered.
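A minimal sketch of the selection step, assuming the target is already frequency weighted, the pitch contribution has already been removed from it, and the weighted synthesis filter is represented by its impulse response h (zero-state response only). For a codeword c with filtered version y = h * c, the optimal gain is <t, y>/<y, y> and the error energy falls by <t, y>^2/<y, y>, so the search maximizes that ratio. The random codebook, the filter, and the name `search_codebook` are hypothetical.

```python
import numpy as np

def search_codebook(target, codebook, h):
    """Return (index, gain) minimizing ||target - gain * (h * c)||^2 over codewords c."""
    L = len(target)
    best, best_score, best_gain = -1, -np.inf, 0.0
    for i, c in enumerate(codebook):
        y = np.convolve(c, h)[:L]           # zero-state response to the codeword
        energy = np.dot(y, y)
        if energy == 0.0:
            continue
        corr = np.dot(target, y)
        score = corr * corr / energy        # error reduction with the optimal gain
        if score > best_score:
            best, best_score, best_gain = i, score, corr / energy
    return best, best_gain

rng = np.random.default_rng(5)
L = 40                                          # subframe length (assumed)
codebook = rng.standard_normal((128, L))        # hypothetical 7-bit Gaussian codebook
h = 0.8 ** np.arange(L)                         # placeholder weighted impulse response
target = 0.7 * np.convolve(codebook[17], h)[:L] + 0.01 * rng.standard_normal(L)
index, gain = search_codebook(target, codebook, h)
```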
C. C. Chu and P. Kabal
Coding of LPC Parameters for Low Bit Rate Speech Coders
Technical Report 87-19, INRS-Telecommunications, University of Quebec, March 1987
This report summarizes the results of a study of the use of line spectral frequencies (LSFs) for the low bit
rate coding of the linear predictive (LPC) parameters for use in a speech coder. Different forms of
quantization using LSFs for the LPC coefficients are examined. An LSF based scheme allows the quantizer
to take into account the perceptual impact of spectral distortion. One of the schemes considered takes
advantage of the frame-to-frame correlation of the LSF parameters. The LSF based coding scheme is
compared to a quantization based on a reflection coefficient representation. The application of this
quantization scheme to the low rate, 4800 bits/sec, Code Excited Linear Predictive (CELP) coder is
considered. In this context, adaptive gain factors for the differential (frame-to-frame) LSF quantizer are
useful. In addition, a frame-to-frame interpolation scheme is proposed. With these modifications, LSF
coding of 10 LPC parameters requires 1150 bits/sec.
L. Barbeau, D. Bernardi, C. C. Chu, P. Kabal, J.-L. Moncet, and D. O'Shaughnessy
Speech Enhancement in the Presence of Interfering Music and Noise
Technical Report 87-09, INRS-Telecommunications, University of Quebec, Jan. 1987
This report summarizes the results of speech enhancement experiments for a signal consisting of speech
in the presence of interfering music and noise. Filtering was applied to remove hum and high frequency
components. The composite signal was then frequency equalized to flatten the noise spectrum. A
reference recording of the same passage of music which interferes with the original recording was
obtained and time aligned with the composite recording.
The time-aligned reference music was processed through an adaptive filter and then subtracted from the
composite recording. This results in a noticeable reduction and muffling of the music level. Before
music cancellation, the music tended to dominate the composite signal; after cancellation, the speech
generally has a higher level than the music.
A number of other techniques were also investigated. The most successful of these is spectral subtraction.
This involves suppressing those frequency components present in the music from the composite signal.
This has the effect of suppressing the music, but since the desired speech component also contains the
same frequency components, the speech quality is also affected.
The adaptive filtering approach has the least subjective effect on the speech components but does not
completely suppress the music. The speech components are considerably more intelligible after music
cancellation has been carried out. Spectral subtraction lends a somewhat unnatural quality to the
resultant signal, but does render more complete suppression of the music. The speech is slightly muffled.
The intelligibility of the speech can be judged to be about the same as, or better than, that for the adaptive
filtering approach.
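The adaptive-filter approach is a classical adaptive noise canceller. A minimal sketch, assuming an NLMS update (the report does not say which adaptation rule was used), a sinusoid standing in for speech, white noise standing in for the aligned reference music, and a short hypothetical FIR path between the reference and the recording:

```python
import numpy as np

def nlms_cancel(primary, reference, taps=8, mu=0.05, eps=1e-8):
    """Adaptive noise canceller: subtract an adaptively filtered reference from the
    primary (composite) signal; the error is the enhanced output."""
    w = np.zeros(taps)
    buf = np.zeros(taps)                     # newest reference sample first
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        e = primary[n] - np.dot(w, buf)      # composite minus music estimate
        w += mu * e * buf / (eps + np.dot(buf, buf))   # normalized LMS update
        out[n] = e
    return out, w

rng = np.random.default_rng(6)
n = np.arange(20000)
speech = 0.3 * np.sin(2 * np.pi * 0.013 * n)      # stand-in for the speech
music_ref = rng.standard_normal(len(n))           # stand-in for the aligned reference
path = np.array([0.6, -0.3, 0.1])                 # hypothetical reference-to-recording path
interference = np.convolve(music_ref, path)[: len(n)]
out, w = nlms_cancel(speech + interference, music_ref)
```

Because the desired signal is uncorrelated with the reference, the filter converges toward the reference-to-recording path and subtracts only the music estimate, which is consistent with the observation above that this approach has the least effect on the speech.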

R. P. Ramachandran and P. Kabal
The Computation of Line Spectral Frequencies Using Chebyshev Polynomials
Technical Report 85-27, INRS-Telecommunications, University of Quebec, Sept. 1985
Line spectral frequencies provide an alternate parameterization of the analysis and synthesis filters used
in linear predictive coding (LPC) of speech. In this paper, a new method of converting between the direct
form predictor coefficients and line spectral frequencies is presented. Both even and odd order LPC
systems are considered. The system polynomial for the analysis filter is converted to two even order
symmetric polynomials with interlacing roots on the unit circle. The line spectral frequencies are given by
the positions of the roots of these two auxiliary polynomials. The response of each of these polynomials
on the unit circle is expressed as a series expansion in Chebyshev polynomials. The line spectral
frequencies are found using an iterative root finding algorithm which searches for real roots of a real
function. The algorithm developed is simple in structure and is designed to constrain the maximum
number of evaluations of the series expansions. The method is highly accurate and can be used in a form
that avoids the storage of trigonometric tables or the computation of trigonometric functions. The
reconversion of line spectral frequencies to predictor coefficients uses an efficient algorithm derived by
expressing the root factors as an expansion in Chebyshev polynomials.
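A sketch of the Chebyshev-series idea for even-order predictors follows. It forms the two symmetric auxiliary polynomials, removes their trivial roots at z = -1 and z = +1, writes each response on the unit circle as a Chebyshev series in x = cos ω, and locates sign changes on a grid followed by bisection. The grid size, the bisection count, and the use of NumPy's `chebval` are our assumptions; the report's search strategy, which limits the number of series evaluations, is not reproduced, and odd orders and the reconversion to predictor coefficients are omitted.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def lsf_chebyshev(a, ngrid=256, nbisect=40):
    """Line spectral frequencies (radians, ascending) of an even-order
    A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p."""
    a = np.asarray(a, dtype=float)
    p = len(a) - 1
    if p % 2:
        raise ValueError("this sketch handles even orders only")
    ext = np.concatenate([a, [0.0]])
    P = ext + ext[::-1]              # A(z) + z^-(p+1) A(1/z): root at z = -1
    Q = ext - ext[::-1]              # A(z) - z^-(p+1) A(1/z): root at z = +1
    gp, gq = np.zeros(p + 1), np.zeros(p + 1)
    gp[0], gq[0] = P[0], Q[0]
    for k in range(1, p + 1):
        gp[k] = P[k] - gp[k - 1]     # divide out (1 + z^-1)
        gq[k] = Q[k] + gq[k - 1]     # divide out (1 - z^-1)
    m = p // 2
    # Symmetric G of degree 2m: G(e^jw) = e^{-jmw} [g_m + 2 sum_k g_{m-k} cos(kw)],
    # and cos(kw) = T_k(x) with x = cos(w), giving a Chebyshev series in x.
    series = [np.concatenate([[g[m]], 2 * g[m - 1::-1]]) for g in (gp, gq)]
    xs = np.cos(np.linspace(0.0, np.pi, ngrid + 1))   # grid from x = 1 down to x = -1
    roots = []
    for c in series:
        vals = C.chebval(xs, c)
        for i in range(ngrid):
            if vals[i] * vals[i + 1] < 0:             # sign change brackets a root
                lo, hi, flo = xs[i], xs[i + 1], vals[i]
                for _ in range(nbisect):
                    mid = 0.5 * (lo + hi)
                    fmid = C.chebval(mid, c)
                    if fmid * flo > 0:
                        lo, flo = mid, fmid
                    else:
                        hi = mid
                roots.append(0.5 * (lo + hi))
    return np.sort(np.arccos(roots))

poles = np.array([0.9, 0.85, 0.8, 0.7, 0.6]) * np.exp(1j * np.array([0.3, 0.8, 1.4, 2.0, 2.6]))
a = np.real(np.poly(np.concatenate([poles, poles.conj()])))   # hypothetical 10th-order A(z)
lsf = lsf_chebyshev(a)
```

A grid that is too coarse can miss two roots of the same polynomial falling in one grid cell; the interlacing of the two polynomials' roots keeps such pairs apart for well-behaved predictors.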
P. Kabal and B. Sayar
Rounding and Scaling in Fixed-Point FFT Implementations
Technical Report 85-24, INRS-Telecommunications, University of Quebec, June 1985
The calculation of the discrete Fourier transform using a fast Fourier transform (FFT) algorithm with
fixed-point arithmetic is considered. The input data is scaled to prevent overflow and to maintain accuracy.
New conditions on the magnitudes of the input components to avoid overflow during the computation of
the FFT are derived. Particular emphasis is placed on an implementation using a digital signal processing
architecture based on a 16-bit fixed-point representation for the data and the provision for double
precision accumulation of sums and products. Simulation results to assess the error performance (signal-
to-noise ratio) are presented for various forms of the implementation. Algorithm variants as well as
different rounding options are compared. Execution times for implementations based on a single chip
signal processor (the Texas Instruments TMS320) are also given. These show that a considerable increase
in accuracy can be obtained with only a small penalty in execution time, by applying an alternating form
of rounding rather than truncation.
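The effect of the rounding mode can be explored with a small simulation. The sketch below assumes a radix-2 decimation-in-time FFT on Q15 integers with a scale factor of 1/2 at every stage and products formed in a wide accumulator before a single shift back to Q15, and compares round-half-up with truncation. Note that round-half-up is itself biased for the 1-bit stage shift (odd values always round up), which is one motivation for alternatives; the alternating rounding scheme of the report is not reproduced here. The function names are hypothetical.

```python
import numpy as np

def shift_right(v, s, rounding):
    """Arithmetic right shift by s bits: round-half-up if rounding, else truncate (floor)."""
    if rounding:
        return (v + (1 << (s - 1))) >> s
    return v >> s

def fixed_point_fft(xr, xi, rounding=True):
    """Radix-2 decimation-in-time FFT on Q15 integer data (N a power of 2).
    Each stage scales by 1/2, so the output approximates FFT(x) / N.
    Twiddle products use a wide (int64) accumulator and are shifted back once."""
    N = len(xr)
    bits = N.bit_length() - 1
    rev = np.array([int(format(i, f"0{bits}b")[::-1], 2) for i in range(N)])
    yr = np.asarray(xr, dtype=np.int64)[rev]
    yi = np.asarray(xi, dtype=np.int64)[rev]
    ang = -2.0 * np.pi * np.arange(N // 2) / N
    wr = np.clip(np.round(32768 * np.cos(ang)), -32768, 32767).astype(np.int64)
    wi = np.clip(np.round(32768 * np.sin(ang)), -32768, 32767).astype(np.int64)
    m = 2
    while m <= N:
        half = m // 2
        k = np.arange(half) * (N // m)          # twiddle index for this stage
        yr, yi = yr.reshape(-1, m), yi.reshape(-1, m)
        ar, ai = yr[:, :half], yi[:, :half]
        br, bi = yr[:, half:], yi[:, half:]
        tr = shift_right(br * wr[k] - bi * wi[k], 15, rounding)
        ti = shift_right(br * wi[k] + bi * wr[k], 15, rounding)
        yr = np.hstack([shift_right(ar + tr, 1, rounding), shift_right(ar - tr, 1, rounding)]).ravel()
        yi = np.hstack([shift_right(ai + ti, 1, rounding), shift_right(ai - ti, 1, rounding)]).ravel()
        m *= 2
    return yr, yi

rng = np.random.default_rng(7)
N = 256
x = np.round(32768 * rng.uniform(-0.5, 0.5, N)).astype(np.int64)
ref = np.fft.fft(x.astype(float)) / N

def snr_db(rounding):
    yr, yi = fixed_point_fft(x, np.zeros(N, dtype=np.int64), rounding)
    err = (yr + 1j * yi) - ref
    return 10 * np.log10(np.sum(np.abs(ref) ** 2) / np.sum(np.abs(err) ** 2))

snr_round, snr_trunc = snr_db(True), snr_db(False)
```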

P. Kabal and R. Rabipour
Adaptive Transform Coding (ATC) of Speech - Phase II
Technical Report 83-07, INRS-Telecommunications, University of Quebec, April 1983
P. Kabal
Quantizers for the Gamma Distribution and Other Symmetrical Distributions
Technical Report 83-08, INRS-Telecommunications, University of Quebec, April 1983
C. Side
Un Système d'Analyse et de Synthèse de Parole par Prédiction Linéaire pour un Taux de Transmission inférieur à 2400 BPS [A Linear Prediction Speech Analysis and Synthesis System for a Transmission Rate Below 2400 bps]
Technical Report 82-02, INRS-Telecommunications, University of Quebec, February 1982
P. Kabal
Adaptive Transform Coding of Speech at 9.6 kb/s
Technical Report 82-06, INRS-Telecommunications, University of Quebec, May 1982
P. Kabal
Feasibility Study of a Hardware Implementation of a 4.8 kb/s RELP Speech Coder
Technical Report 81-08, INRS-Telecommunications, University of Quebec, May 1981
This report investigates the feasibility of a hardware implementation of a speech coder based on Residual
Excited Linear Prediction (RELP) at 4.8 kb/s. To this end, the basic RELP algorithm has been restructured
and simplified to be compatible with real-time processing. In addition, the computations have been
implemented with integer arithmetic using 16 bit precision, augmented with the judicious use of double
precision accumulation. An architecture based on a microprocessor supplemented with a peripheral
processor built around a high speed multiplier/accumulator is proposed. This arrangement can be the
basis for a simple, cost-effective and flexible implementation of a hardware RELP coder.

A. Roset
Application of Quadrature Mirror Filters to Split Band Voice Coding Process
Technical Report 80-03, INRS-Telecommunications, University of Quebec, January 1980
This report discusses an application of quadrature mirror filters to an 8-band sub-band coder. This system
allows us to take advantage of the differences in the long-term power and in the just-noticeable noise in
each band.
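As an illustration of the two-band building block (an 8-band split can be built as a three-level tree), the sketch below uses the QMF relations H1(z) = H0(-z), G0(z) = H0(z), G1(z) = -H1(z), which cancel aliasing exactly. With the 2-tap prototype assumed here the reconstruction is perfect with a one-sample delay; longer QMF prototypes used in practice only approximate this. The function names are illustrative.

```python
import numpy as np

def qmf_analysis(x, h0):
    """Split x into low and high bands, each decimated by 2."""
    h1 = h0 * (-1.0) ** np.arange(len(h0))      # H1(z) = H0(-z)
    return np.convolve(x, h0)[::2], np.convolve(x, h1)[::2]

def qmf_synthesis(d0, d1, h0):
    """Upsample each band by 2, filter, and sum."""
    h1 = h0 * (-1.0) ** np.arange(len(h0))
    g0, g1 = h0, -h1                            # G0 = H0, G1 = -H1: aliasing cancels
    u0 = np.zeros(2 * len(d0))
    u0[::2] = d0
    u1 = np.zeros(2 * len(d1))
    u1[::2] = d1
    return np.convolve(u0, g0) + np.convolve(u1, g1)

h0 = np.array([1.0, 1.0]) / np.sqrt(2)          # 2-tap (Haar) QMF prototype
rng = np.random.default_rng(8)
x = rng.standard_normal(101)
d0, d1 = qmf_analysis(x, h0)
y = qmf_synthesis(d0, d1, h0)                   # y[n] = x[n - 1]
```

In a sub-band coder the two decimated bands d0 and d1 would be quantized separately between analysis and synthesis, which is where band-dependent bit allocation enters.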
P. Kabal
Minimum Mean Square Error Quantizers
Technical Report 80-09, INRS-Telecommunications, University of Quebec, May 1980
This report discusses the design of quantizers which minimize the mean square error for a signal with a
given probability density function. Tables of optimal non-uniform quantizers are given for signals with
Gaussian, Laplace (exponential) and gamma distributions. These figures correct values given previously in
the literature. An appendix documents a program for calculating an optimal quantizer for an empirically
derived tabulated probability density.
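A sketch of the classical Lloyd-Max iteration for a density tabulated on a uniform grid, in the spirit of the appendix described above but not its code: decision thresholds are set midway between output levels, and each output level is moved to the centroid of its cell. For a unit-variance Gaussian, the 2-level result should approach ±0.7979 (that is, ±√(2/π)), and the 4-level result should approach Max's published values: outputs ±0.4528 and ±1.510, thresholds 0 and ±0.9816. The grid and the iteration limits are assumptions.

```python
import numpy as np

def lloyd_max(x, pdf, levels, iters=2000, tol=1e-9):
    """MMSE quantizer for a density tabulated on the uniform grid x.
    Returns (decision thresholds, reconstruction levels)."""
    y = np.linspace(x[0], x[-1], levels + 2)[1:-1]     # initial output levels
    for _ in range(iters):
        t = 0.5 * (y[:-1] + y[1:])                     # thresholds: midpoints
        cell = np.searchsorted(t, x)                   # cell index of each grid point
        mass = np.bincount(cell, weights=pdf, minlength=levels)
        moment = np.bincount(cell, weights=pdf * x, minlength=levels)
        y_new = moment / mass                          # outputs: cell centroids
        converged = np.max(np.abs(y_new - y)) < tol
        y = y_new
        if converged:
            break
    return 0.5 * (y[:-1] + y[1:]), y

x = np.linspace(-8.0, 8.0, 80001)
gauss = np.exp(-0.5 * x ** 2)          # normalization cancels in the centroids
t2, y2 = lloyd_max(x, gauss, 2)
t4, y4 = lloyd_max(x, gauss, 4)
```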
M. Belleau and P. Kabal
Optimal Quantizers in Linear Predictive Coding of Speech
Technical Report 80-23, INRS-Telecommunications, University of Quebec, May 1980
D. C. Stevenson and P. Kabal
Comparative Evaluation of Residual-Excited Linear Prediction and Sub-Band Coding for Speech Transmission at 9.6 kb/s
Technical Report 79-14, INRS-Telecommunications, University of Quebec, October 1979
P. Kabal
Simulation of Digital Coding Techniques for Speech Transmission at 9.6 kb/s
Technical Report 78-08, INRS-Telecommunications, University of Quebec, December 1978
Speech transmission at 9.6 kb/s is of significant interest because that is the highest rate currently
attainable over analog voice lines. Two methods of speech coding, residual-excited linear prediction (RELP)
and sub-band coding (SBC), are simulated and evaluated.
