The general problem of fundamental frequency estimation is to take a portion of signal and find its dominant
frequency of repetition. Difficulties arise because (i) not all signals are periodic; (ii) those that are periodic may
be changing in fundamental frequency over the time of interest; (iii) signals may be contaminated with noise, or
even with periodic signals of other fundamental frequencies; (iv) signals that are periodic with interval T are also
periodic with interval 2T, 3T, etc., so we need to find the smallest periodic interval, or equivalently the highest
fundamental frequency; and (v) even signals of constant fundamental frequency may be changing in other ways
over the interval of interest.
A reliable way of obtaining an estimate of the dominant fundamental frequency for long, clean, stationary speech
signals is to use the cepstrum. The cepstrum is a Fourier analysis of the logarithmic amplitude spectrum of the
signal. If the log amplitude spectrum contains many regularly spaced harmonics, then the Fourier analysis of the
spectrum will show a peak corresponding to the spacing between the harmonics: i.e. the fundamental frequency.
Effectively we are treating the signal spectrum as another signal, then looking for periodicity in the spectrum itself.
The cepstrum is so-called because it turns the spectrum inside-out. The x-axis of the cepstrum has units of
quefrency, and peaks in the cepstrum (which relate to periodicities in the spectrum) are called rahmonics.
To obtain an estimate of the fundamental frequency from the cepstrum we look for a peak in the quefrency
region corresponding to typical speech fundamental frequencies. Here is an example:
http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html 1/6
6/6/2014 Introduction to Computer Programming with MATLAB
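% x is assumed to hold a section of vowel (column vector) and fs its
% sampling rate in Hz
ms1=round(fs/1000);          % lag of 1ms (=1000Hz) in samples
ms20=round(fs/50);           % lag of 20ms (=50Hz) in samples
%
% plot waveform
t=(0:length(x)-1)/fs;        % times of sampling instants
subplot(3,1,1);
plot(t,x);
legend('Waveform');
xlabel('Time (s)');
ylabel('Amplitude');
%
% do fourier transform of windowed signal
Y=fft(x.*hamming(length(x)));
%
% plot spectrum of bottom 5000Hz
hz5000=floor(5000*length(Y)/fs);
f=(0:hz5000)*fs/length(Y);
subplot(3,1,2);
plot(f,20*log10(abs(Y(1:length(f)))+eps));
legend('Spectrum');
xlabel('Frequency (Hz)');
ylabel('Magnitude (dB)');
%
% cepstrum is DFT of log spectrum
C=fft(log(abs(Y)+eps));
%
% plot between 1ms (=1000Hz) and 20ms (=50Hz)
% (a lag of k samples is stored at index k+1)
q=(ms1:ms20)/fs;
subplot(3,1,3);
plot(q,abs(C(ms1+1:ms20+1)));
legend('Cepstrum');
xlabel('Quefrency (s)');
ylabel('Amplitude');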
To search for the index of the peak in the cepstrum between 1 and 20ms, and then convert back to hertz, use:
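% a lag of k samples is stored at index k+1, so position fx in the
% searched range corresponds to a lag of ms1+fx-1 samples
[c,fx]=max(abs(C(ms1+1:ms20+1)));
fprintf('Fx=%gHz\n',fs/(ms1+fx-1));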
The cepstrum works best when the fundamental frequency is not changing too rapidly, when the fundamental
frequency is not too high and when the signal is relatively noise-free. A disadvantage of the cepstrum is that it
involves computationally-expensive frequency-domain processing.
The cepstrum looks for periodicity in the log spectrum of the signal, whereas our perception of pitch is more
strongly related to periodicity in the waveform itself. A means to estimate fundamental frequency from the
waveform directly is to use autocorrelation. The autocorrelation function for a section of signal shows how well
the waveform shape correlates with itself at a range of different delays. We expect a periodic signal to correlate
well with itself at very short delays and at delays corresponding to multiples of pitch periods.
This code plots the autocorrelation function for a section of speech signal:
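A minimal sketch of such a plot, written without toolbox functions. No recording is assumed here, so a synthetic signal stands in for the speech section: a 125Hz pulse train passed through a single resonance at 500Hz. The sampling rate, the resonance settings and all variable names are illustrative assumptions.

```matlab
% synthetic stand-in for a voiced speech section (an assumption, not real
% speech): 125Hz pulse train through one resonance at 500Hz, 100Hz wide
fs=16000;                          % sampling rate (Hz)
e=zeros(fs/2,1);                   % 0.5s of excitation
e(1:fs/125:end)=1;                 % one pulse every 8ms (=125Hz)
r=exp(-pi*100/fs);                 % pole radius for a 100Hz bandwidth
y=filter(1,[1 -2*r*cos(2*pi*500/fs) r^2],e);
x=y(4001:4480);                    % 30ms section, after the start-up transient
%
% autocorrelation at delays of 0..25ms, scaled so that zero delay gives 1
maxlag=round(0.025*fs);
ac=zeros(maxlag+1,1);
for k=0:maxlag
    ac(k+1)=sum(x(1:end-k).*x(1+k:end));   % delay of k samples is at index k+1
end
ac=ac/ac(1);
%
% plot waveform
subplot(2,1,1);
plot((0:length(x)-1)/fs,x);
legend('Waveform');
xlabel('Time (s)');
ylabel('Amplitude');
%
% plot autocorrelation function
subplot(2,1,2);
plot((0:maxlag)/fs,ac);
legend('Autocorrelation');
xlabel('Delay (s)');
ylabel('Correlation coeff.');
```

Because each product is summed over only the overlapping samples, the peaks shrink as the delay grows even for a perfectly periodic signal.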
You can see that the autocorrelation function peaks at zero delay and at delays corresponding to 1 period,
2 periods, etc. We can estimate the fundamental frequency by looking for a peak in the delay interval
corresponding to the normal pitch range in speech, say between 2ms (=500Hz) and 20ms (=50Hz). For example:
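A minimal sketch of this peak search, run on a synthetic 125Hz test signal in place of real speech. The test signal and the names ms2, ms20, ac and lag are assumptions made for illustration.

```matlab
% synthetic test signal (an assumption, not real speech): 125Hz pulse
% train through one resonance at 500Hz, 100Hz wide
fs=16000;
e=zeros(fs/2,1);
e(1:fs/125:end)=1;
r=exp(-pi*100/fs);
y=filter(1,[1 -2*r*cos(2*pi*500/fs) r^2],e);
x=y(4001:4480);                    % 30ms section
%
% search range in samples
ms2=round(fs/500);                 % delay of 2ms (=500Hz)
ms20=round(fs/50);                 % delay of 20ms (=50Hz)
%
% normalised autocorrelation at delays 0..20ms (delay k is at index k+1)
ac=zeros(ms20+1,1);
for k=0:ms20
    ac(k+1)=sum(x(1:end-k).*x(1+k:end));
end
ac=ac/ac(1);
%
% find the largest peak between 2ms and 20ms and convert its delay to hertz
[pk,ix]=max(ac(ms2+1:ms20+1));
lag=ms2+ix-1;                      % delay of the peak in samples
fx=fs/lag;
fprintf('Fx=%gHz\n',fx);
```

On this test signal the estimate should come out close to the 125Hz used to build it. The peak height pk is also worth keeping, since a low value suggests the section is not strongly periodic.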
The autocorrelation approach works best when the signal is of low, regular pitch and when the spectral content
of the signal is not changing too rapidly. The autocorrelation method is prone to pitch halving errors, where a
delay of two pitch periods is chosen by mistake and the reported frequency is half the true value. It can also be
influenced by periodicity in the signal caused by formant resonances, particularly for female voices where F1 can
be lower in frequency than Fx.
Estimation of formant frequencies is generally more difficult than estimation of fundamental frequency. The
problem is that formant frequencies are properties of the vocal tract system and need to be inferred from the
speech signal rather than just measured. The spectral shape of the vocal tract excitation strongly influences the
observed spectral envelope, such that we cannot guarantee that all vocal tract resonances will cause peaks in the
observed spectral envelope, nor that all peaks in the spectral envelope are caused by vocal tract resonances.
The dominant method of formant frequency estimation is based on modelling the speech signal as if it were
generated by a particular kind of source and filter:
This type of analysis is called source-filter separation, and in the case of formant frequency estimation, we are
interested only in the modelled system and the frequencies of its resonances. To find the best matching system
we use a method of analysis called Linear Prediction. Linear prediction models the signal as if it were
generated by a signal of minimum energy being passed through a purely-recursive IIR filter.
We will demonstrate the idea by using LPC to find the best IIR filter from a section of speech signal and then
plotting the filter's frequency response.
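A minimal sketch of this step, using the autocorrelation method of linear prediction and only base MATLAB functions (the Hamming window and the normal equations are written out by hand rather than calling toolbox routines). The input is a synthetic vowel-like signal with known resonances at 500Hz and 1500Hz; that signal, the order p=4 and all variable names are assumptions made for illustration.

```matlab
% synthetic stand-in for a vowel (an assumption, not real speech): 125Hz
% pulse train through resonances at 500Hz and 1500Hz, each 100Hz wide
fs=8000;
e=zeros(fs/2,1);
e(1:fs/125:end)=1;
r=exp(-pi*100/fs);                          % pole radius for 100Hz bandwidth
y=filter(1,[1 -2*r*cos(2*pi*500/fs) r^2],e);
y=filter(1,[1 -2*r*cos(2*pi*1500/fs) r^2],y);
x=y(2001:2240);                             % 30ms section
n=(0:length(x)-1)';
x=x.*(0.54-0.46*cos(2*pi*n/(length(x)-1))); % Hamming window
%
% autocorrelation at lags 0..p
p=4;                                        % two poles per expected resonance
R=zeros(p+1,1);
for k=0:p
    R(k+1)=sum(x(1:end-k).*x(1+k:end));
end
%
% solve the normal equations for the predictor coefficients; A holds the
% recursive (denominator) coefficients of the all-pole filter 1/A(z)
A=[1; -(toeplitz(R(1:p))\R(2:p+1))];
%
% plot frequency response of the filter
nfft=1024;
H=1./fft(A,nfft);                           % response at nfft points round the unit circle
f=(0:nfft/2)'*fs/nfft;
plot(f,20*log10(abs(H(1:nfft/2+1))));
legend('LP filter');
xlabel('Frequency (Hz)');
ylabel('Gain (dB)');
```

The response should show one peak near each of the two resonances built into the signal. For real speech the order is usually set higher; a common rule of thumb is about 2+fs/1000.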
To find the formant frequencies from the filter, we need to find the locations of the resonances that make up the
filter. This involves treating the filter coefficients as a polynomial and solving for the roots of the polynomial.
Details are outside the scope of this course (see readings). Here is the code to find estimated formant
frequencies from the LP filter:
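A minimal sketch of the root-solving step. So that the answer is known in advance, the filter polynomial A is built by hand here from two resonances (500Hz and 1500Hz, each 100Hz wide) rather than estimated from speech; those values and the variable names are assumptions made for illustration.

```matlab
% build a recursive filter polynomial with two known resonances
% (hypothetical values, standing in for coefficients found by LP analysis)
fs=8000;
r=exp(-pi*100/fs);                 % pole radius for a 100Hz bandwidth
A=conv([1 -2*r*cos(2*pi*500/fs) r^2],[1 -2*r*cos(2*pi*1500/fs) r^2]);
%
% find roots of the polynomial, keeping one of each complex-conjugate pair
rts=roots(A);
rts=rts(imag(rts)>0);
%
% convert root angle to frequency and root radius to bandwidth
F=angle(rts)*fs/(2*pi);            % resonance frequencies (Hz)
B=-log(abs(rts))*fs/pi;            % resonance bandwidths (Hz)
[F,order]=sort(F);
B=B(order);
%
% report the resonances in order of increasing frequency
for i=1:length(F)
    fprintf('Formant %d: frequency %.0fHz, bandwidth %.0fHz\n',i,F(i),B(i));
end
```

Each complex-conjugate pair of roots describes one resonance: the angle of the root gives its frequency and the distance from the unit circle gives its bandwidth. With coefficients estimated from real speech, roots with very low frequency or very wide bandwidth are usually not treated as formants.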
Reading
A mathematical treatment can be found in: J. Makhoul. Linear prediction: A tutorial review. Proceedings of the
IEEE, 63(4):561-580, April 1975.
A good set of lecture notes on the mathematics of speech signal processing can be found at:
http://mi.eng.cam.ac.uk/~ajr/SA95/SpeechAnalysis.html
Exercises
For these exercises, use the editor window to enter your code, and save your answers to files under your
account on the central server. When you save the files, give them the file extension of ".m". Run your programs
from the command window.
1. Write a program (ex101.m) that asks the user for a filename (WAV file), and plots a fundamental frequency
track against time for the whole signal using the cepstrum method. Use a window size of 30ms and an output
frame rate of 100 Fx samples/second.
2. Modify the previous exercise to use the autocorrelation method (ex102.m). In this program set the
fundamental frequency value to zero when the peak of the autocorrelation function is less than 0.5. This acts
as a crude voicing decision.
3. Write a program (ex103.m) that asks the user for a filename (WAV file), and produces a scatter plot of
formant peaks against time. Use a window size of 30ms and an output frame rate of 100 per second.
4. (Homework) Write a GUI program which asks you to say a vowel and plots the location of the vowel on a
Vowel Quadrilateral diagram (effectively F1 vs F2).