Report Abstracts
P. Kabal
ITU-T G.723.1 Speech Coder: A Matlab Implementation
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 2c, 54 pp., Aug. 2011
(initial version Nov. 2003)
Matlab code: G.723.1-v2r1b.tar.gz
This report documents the details of the processing steps in the ITU-T G.723.1 Speech Coder. This report
accompanies an implementation of that coder in Matlab. The Matlab implementation was designed to
facilitate experimentation and research using a practical speech coder as a base.
P. Kabal
Minimum Mean-Square Error Filtering: Autocorrelation/Covariance, General Delays and Multirate Systems
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 0.985, 215 pp., April 2011
These notes examine procedures for solving for minimum mean-square error filters. The stochastic case
and the block-based (least-squares) analyses are covered in a single formalism. The filtering is analyzed
in more generality than in many expositions, allowing for configurations with general filter delays and
flexible windows for the least squares problem. The important linear prediction problem is examined in
detail. For the equally spaced delay case, a rich set of results ensues. Several topics are covered that are
missing from many textbooks: affine estimation (non-zero means), cyclostationary signals (for multirate
signals), fractionally spaced equalizers, joint process estimation in relation to the Levinson algorithm, and
an approximate formulation for linearly constrained filters.
P. Kabal
Frequency Domain Representations of Sampled and Wrapped Signals
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 1.5, 16 pp., March 2011
(initial version Jan. 2008)
These notes examine the relationships between frequency domain representations of discrete-time and
wrapped signals derived from a continuous-time signal. The first part of these notes develops the
relationships for periodic signals which allow for the analysis of periodic signals within the framework of
the Fourier transform. The second part examines the relationships between the Fourier series, the
Discrete-Time Fourier Transform (DTFT) and the Discrete Fourier Transform (DFT).
P. Kabal
The Equivalence of ADPCM and CELP Coding
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 1.2, 14 pp., March 2011
(initial version April 2010)
This document examines coding schemes which differentially code signals while at the same time
controlling the frequency characteristics of the coding (quantization) error. We show that a (vector
quantized) version of an Adaptive Differential Pulse Code Modulation (ADPCM) system using noise
feedback to shape the quantization noise can be converted to an equivalent system which is in the form
of a Code Excited Linear Prediction (CELP) system. While this equivalence is known by, or at least not a
surprise to, the signal processing cognoscenti, it is not widely appreciated by many others. We also try to
add a historical perspective on the development of these systems.
P. Kabal
Minimum Phase & All-Pass Filters
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 2.1, 28 pp., March 2011
(initial version Nov. 2007)
This document analyzes minimum-phase and all-pass filters. The analysis allows for complex-valued filter
coefficients. The properties of the frequency responses (amplitude, phase, and group delay) of these
filters are discussed.
P. Kabal
Time Windows for Linear Prediction of Speech
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 2a, 43 pp., Nov. 2009
(initial version Oct. 2003)
This report examines the time windows used for linear prediction (LP) analysis of speech. The goal of
windowing is to create frames of data each of which will be used to calculate an autocorrelation sequence.
Several factors enter into the choice of window. The time and spectral properties of Hamming and Hann
windows are examined. We also consider windows based on Discrete Prolate Spherical Sequences
including multiwindow analysis. Multiwindow analysis biases the estimation of the correlation more than
single window analysis. Windows with frequency responses based on the ultraspherical polynomials are
discussed. This family of windows includes Dolph-Chebyshev and Saramäki windows. This report also
considers asymmetrical windows as used in modern speech coders. The frequency response of these
windows is poor relative to conventional windows. Finally, the presence of a "pedestal" in the time window
(as in the case of a Hamming window) is shown to be deleterious to the time evolution of the LP
parameters.
P. Kabal
FIR Filters: Frequency-Weighted and Minimum-Phase Designs
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 1.6, 32 pp., Nov. 2007
(initial version Sept. 2004)
Matlab code: FilterDesign-M-v2r0.tar.gz
P. Kabal
Improving the Presentation of Matlab Plots
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, June 2006
Matlab code: Matlab-Plot-v1r3.tar.gz
This document describes a number of strategies which go towards the goal of producing publication
quality plots from Matlab. One finds much to criticize in the quality of plots that are reproduced in today's
journals. This is due to the fact that the authors supply the plots without having a clear view of how they
will be processed to produce the final plot on the printed page. We give some guidelines and supply
Matlab routines that streamline the application of these guidelines.
P. Kabal
Matlab Plots in Microsoft Word
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Jan. 2006
This report looks at different options for inserting plots generated from Matlab into a Microsoft Word
document. For publication quality output, it is important to control the size of the graphic that will appear
in the final document. The graphic should be drawn at its final size in Matlab. Scaling in Word is
undesirable, as it not only scales the plot, but also the text on the graphic. This report outlines a
procedure that sets the size of the figure and the font size in Matlab. Once set, the graphic can be
imported into Word with no further scaling.
Results indicate that the PostScript format is the best option for good quality graphics. Graphics imported
using cut and paste from Matlab (EMF or bitmap format) are noticeably inferior in quality.
P. Kabal
Windows for Transform Processing
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Dec. 2005
This report examines the time windows used in processing of signals in a transformed domain. The goal of
windowing is to create frames of data, each of which will be used to calculate a transformed sequence.
The transform coefficients are then modified (filtering for instance for noise reduction) or coded
(transform coding). The modified transform coefficients are then applied to an inverse transform and
windowed again before creating an output signal using addition of the overlapped blocks. It is the analysis
window (before the transform) and the synthesis window (after the inverse transform) that are examined
in this report. The requirement for perfect reconstruction (when the transform coefficients are not
modified) is developed. This gives a condition on the product of the analysis and synthesis windows. An
argument is given to show that if additive noise is introduced in the transform domain, the windowing
should be equally apportioned between these windows, i.e. the analysis and synthesis windows should be
the same. The windowing requirements for systems implementing block-by-block filtering of the input
signal in the transform domain are also examined.
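As a minimal illustration of the perfect-reconstruction condition (a sketch under stated assumptions, not code from the report): with 50% overlap, using the square root of a periodic Hann window for both analysis and synthesis gives a window product whose overlapped copies sum to one. The function names, the frame length and the choice of window are assumptions made for this example.

```python
import math

def sqrt_hann(n_len):
    """Square root of a periodic Hann window of length n_len (assumed window choice)."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_len))
            for n in range(n_len)]

def overlap_add_gain(analysis, synthesis, hop):
    """Sum of the shifted analysis*synthesis products at each position within one hop.

    Assumes hop divides the window length. With unmodified transform
    coefficients, reconstruction is perfect when every returned value equals 1.
    """
    n_len = len(analysis)
    product = [a * s for a, s in zip(analysis, synthesis)]
    return [sum(product[n + k * hop] for k in range(n_len // hop))
            for n in range(hop)]

N = 16                      # illustrative frame length
w = sqrt_hann(N)            # same window for analysis and synthesis
gain = overlap_add_gain(w, w, N // 2)
print(max(abs(g - 1.0) for g in gain))   # deviation from unity gain
```

Splitting the window equally between the two stages, as here, matches the equal-apportioning argument summarized above; other window pairs with the same product would also satisfy the reconstruction condition.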
P. Kabal
Ill-Conditioning and Bandwidth Expansion in Linear Prediction of Speech
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Feb. 2003
This report examines schemes that modify linear prediction (LP) analysis for speech signals. First,
techniques which improve the conditioning of the LP equations are examined. White noise compensation
for the correlations is justified from the point of view of reducing the range of values which the predictor
coefficients can take on. A number of other techniques which modify the correlations are investigated
(highpass noise, selective power spectrum modification). The efficacy of these procedures is measured
over a large speech database. The results show that white noise compensation is the method of choice: it
is both effective and simple.
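White noise compensation can be sketched in a few lines. The example below is an illustrative sketch, not the report's code: it scales the zero-lag correlation by (1 + eps) before solving the LP equations with a standard Levinson-Durbin recursion. The function names and the value of eps are assumptions made for this example.

```python
import math

def white_noise_compensate(r, eps=1e-4):
    """Scale the zero-lag correlation by (1 + eps); eps is an assumed value."""
    return [r[0] * (1.0 + eps)] + list(r[1:])

def levinson_durbin(r, order):
    """Solve the autocorrelation equations for predictor coefficients.

    Returns (p, err) where x_hat[n] = sum_k p[k-1] * x[n-k] and err is the
    final prediction error energy.
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err              # reflection coefficient
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= (1.0 - k_m * k_m)
    return a[1:], err

# Correlations of a pure sinusoid: an ill-conditioned case for order 2.
r_sine = [math.cos(0.3 * k) for k in range(3)]
p_plain, _ = levinson_durbin(r_sine, 2)
p_comp, _ = levinson_durbin(white_noise_compensate(r_sine, 0.01), 2)
print(p_plain, p_comp)
```

For the sinusoidal correlations, the uncompensated second coefficient sits at the edge of the stable range (close to -1), while the compensated solution pulls it back inside, which is the reduced coefficient range referred to above.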
Other methods to prematurely terminate the iterative solution of the correlation equations (Durbin
recursion) to circumvent problems of ill-conditioning are also investigated.
The report also considers the bandwidth expansion of digital filters which have resonances. In speech
coding such resonances correspond to the formant frequencies. Bandwidth expansion of the LP filter
serves to avoid unnatural sharp resonances that may be artefacts of pitch and formant interaction. Lag
windowing of the correlation values has been used with the aim of both bandwidth expansion and helping
the conditioning of the LP equations. Experiments show that the benefit for conditioning is minimal. This
report also discusses bandwidth expansion of the prediction coefficients after LP analysis using radial
scaling of the z-transform. A simple new formula is given which can be used to estimate the bandwidth
expansion.
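Radial scaling of the z-transform amounts to multiplying the k-th prediction coefficient by gamma**k, which moves each pole of the synthesis filter from z to gamma*z. The sketch below is illustrative only. The bandwidth estimate it uses is the common textbook approximation -(fs/pi) ln(gamma), not the new formula given in the report, and the function names and numerical values are assumptions.

```python
import math

def bandwidth_expand(a, gamma):
    """Scale coefficient a[k] of A(z) = sum_k a[k] z^-k by gamma**k (a[0] = 1)."""
    return [c * gamma ** k for k, c in enumerate(a)]

def approx_expansion_hz(gamma, fs):
    """Textbook estimate of the added 3 dB bandwidth in Hz (an approximation)."""
    return -math.log(gamma) * fs / math.pi

# Second-order A(z) with a complex-conjugate pole pair at radius 0.98, angle 0.5 rad.
radius, angle = 0.98, 0.5
a = [1.0, -2.0 * radius * math.cos(angle), radius ** 2]

expanded = bandwidth_expand(a, 0.994)
new_radius = math.sqrt(expanded[2])      # pole radius after scaling
print(new_radius, approx_expansion_hz(0.994, 8000.0))
```

The pole radius shrinks from 0.98 to 0.98 * 0.994 while the pole angle, and hence the resonance frequency, is unchanged.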
R. Der
Stable Symmetric Distributions and Their Role in the Signal Separation Problem
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Feb. 2003
This report examines the problem of blind source separation when the sources are distributed from a
stable class. We show that cost functions extremizing any marginal property of a mixture of signals are
constant over the set of symmetric stable distributions, and thus cannot solve the blind source separation
problem in full generality. These distributions are non-pathological, but have infinite energy. The
notable exception is the Gaussian distribution, for which the separation problem is inherently
undetermined. For finite variance signals, the use of marginal statistics for blind signal separation is
justified.
P. Kabal
An Examination and Interpretation of ITU-R BS.1387: Perceptual Evaluation of Audio Quality
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, May 2002 (updated Dec. 2003)
This report examines the standard which describes a method for the objective measure of perceived audio
quality (ITU-R Recommendation BS.1387). This standard uses a number of psycho-acoustical measures
which are combined to give a measure of the quality difference between two instances of a signal (a
reference and a test signal). Many aspects of the standard are under-specified. This report examines
alternate interpretations. It also looks at efficiency issues in the implementation of computationally
intensive parts of the algorithm.
Matlab code: PQevalAudio-v1r0.tar.gz
R. Der
Blind Signal Separation
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Sept. 2001
Blind Signal Separation is the task of separating signals when only their mixtures are observed. Recently,
Independent Component Analysis has become a favourite method of researchers for attacking this
problem. We review the techniques, from cumulant-based algorithms to Infomax to second-order
statistics, from feedback to feedforward architectures, from the instantaneous to the convolutional
problem. A new method for reducing the whitening effect on speech, known to occur in feedforward
architectures, is introduced. The procedure also possesses significant stabilization properties, being based
on performing the filter update in the LP-residual domain of speech. Experimental tests are conducted,
and the algorithms compared.
P. Kabal
Generating Gaussian Pseudo-Random Deviates
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Oct. 2000
This report examines low-complexity methods to generate pseudo-random Gaussian (normal) deviates.
We introduce a new method based on modelling the Gaussian probability density function using piecewise
linear segments. This approach is shown to be both efficient and accurate. It does not require the
calculation of transcendental functions.
All of the methods considered map one or more uniform distributions to create the Gaussian deviates.
This report investigates the effect of the use of discrete variates, particularly in the tails of the Gaussian
distribution. In addition, we give a new interpretation of the method of aliases that suggests its
application to non-uniform quantization.
P. Kabal
Formatting a Thesis with LaTeX
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, March 2000 (updated June 2005)
Thesis Macros: ThesisStyle.zip
This report describes the use of LaTeX to format a thesis. A number of topics are covered: content and
organization of the thesis, LaTeX macros for controlling the thesis layout, formatting mathematical
expressions, generating bibliographic references, importing figures and graphs, generating graphs in
Matlab, and formatting tables. The LaTeX macros used to format a thesis (and this document) are
described. As well, Matlab procedures are shown to illustrate methods that can be used to format graphs
in a form suitable for inclusion in a LaTeX document.
P. Kabal
Matlab Plots in Microsoft Word
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, March 2000
Superseded by the version of Jan. 2006
P. Kabal
Measuring Speech Activity
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, August 1999
This report discusses the algorithm described in ITU-T Recommendation P.56 for measuring the active
speech level. Method B in P.56 determines a speech activity factor representing the fraction of time that
the signal is considered to be active speech (as opposed to background idle noise) and the corresponding
active level for the speech part of the signal. The basic algorithm generates an envelope value at each
sample time. The envelope values are compared with a discrete set of thresholds. The (approximate)
active speech level is determined by interpolating in the log domain between the threshold values. In this
report we assess the effects on the speech active level due to interpolation. Recommendation P.56 allows
for sampling rates as low as 600 Hz. Results for subsampled data are compared with those calculated at
the full speech sampling rate.
A. Roset
Application of Quadrature Mirror Filters to Split Band Voice Coding Process
Technical Report 80-03, INRS-Telecommunications, University of Quebec, January 1980
This report discusses an application of quadrature mirror filters for an 8 sub-band coder; this system
allows us to take advantage of the differences in the long term power and of the just noticeable noise in
each band.
P. Kabal
Minimum Mean Square Error Quantizers
Technical Report 80-09, INRS-Telecommunications, University of Quebec, May 1980
This report discusses the design of quantizers which minimize the mean square error for a signal with a
given probability density function. Tables of optimal non-uniform quantizers are given for signals with
Gaussian, Laplace (exponential) and gamma distributions. These figures correct values given previously in
the literature. An appendix documents a program for calculating an optimal quantizer for an empirically
derived tabulated probability density.
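For readers unfamiliar with the design problem, the following sketch applies the classic Lloyd-Max iteration to a unit-variance Gaussian density. It is an illustration under stated assumptions and may differ from the procedure and program documented in the report. The function names, the starting levels and the iteration count are hypothetical choices.

```python
import math

def pdf(x):
    """Unit-variance Gaussian probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    """Unit-variance Gaussian cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def centroid(lo, hi):
    """Conditional mean of a unit Gaussian on the interval (lo, hi)."""
    return (pdf(lo) - pdf(hi)) / (cdf(hi) - cdf(lo))

def midpoints(levels):
    """Decision thresholds halfway between adjacent output levels."""
    return [0.5 * (levels[i] + levels[i + 1]) for i in range(len(levels) - 1)]

def lloyd_max_gaussian(n_levels, n_iter=500):
    """Alternate the two necessary conditions for a minimum mean-square error quantizer."""
    # Assumed starting point: uniformly spaced levels on [-2, 2].
    levels = [-2.0 + 4.0 * (i + 0.5) / n_levels for i in range(n_levels)]
    for _ in range(n_iter):
        edges = [-math.inf] + midpoints(levels) + [math.inf]
        levels = [centroid(edges[i], edges[i + 1]) for i in range(n_levels)]
    return levels, midpoints(levels)

levels, thresholds = lloyd_max_gaussian(4)
print(levels, thresholds)
```

Each pass sets the thresholds to the midpoints of the current output levels and then moves each level to the centroid of its decision interval. A fixed iteration count keeps the sketch short; a practical design would stop on a convergence test.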
M. Belleau and P. Kabal
Optimal Quantizers in Linear Predictive Coding of Speech
Technical Report 80-23, INRS-Telecommunications, University of Quebec, May 1980
D. C. Stevenson and P. Kabal
Comparative Evaluation of Residual-Excited Linear Prediction and Sub-Band Coding for Speech Transmission at 9.6 kb/s
Technical Report 79-14, INRS-Telecommunications, University of Quebec, October 1979
P. Kabal
Simulation of Digital Coding Techniques for Speech Transmission at 9.6 kb/s
Technical Report 78-08, INRS-Telecommunications, University of Quebec, December 1978
Speech transmission at 9.6 kb/s is of significant interest because that is the highest rate currently
attainable over analog voice lines. Two methods of speech coding, residual-excited linear prediction (RELP)
and sub-band coding (SBC), are simulated and evaluated.