
EE 275 Lab, October 27, 2007

Lab 9a. Linear Predictive Coding for Speech Processing


Figure 1: Block diagram of the simplified model of speech production. An impulse train generator (controlled by the pitch period) and a random noise generator feed a voiced/unvoiced switch; the selected excitation drives a time-varying digital filter H(z), whose vocal tract parameters shape the speech output.
Sections 0.4 and 0.5 contain the lab experiment and the lab report requirements.
0.1 Basic Principles of Linear Predictive Analysis
The basic discrete-time model of speech production is shown above. The composite spectral effects of radiation, vocal tract and glottal excitation are represented by a time-varying digital filter. For short periods when the parameters are considered stationary, we have a time-invariant system. The steady-state transfer function H(z) of the filter part of the model is
H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - a_1 z^{-1} - a_2 z^{-2} - a_3 z^{-3} - \cdots - a_p z^{-p}}    (1)
The vocal tract system is excited by the signal u[n], which will be an impulse train for voiced speech or random noise for unvoiced speech. Thus, the parameters of this speech model are: the voiced/unvoiced classification, the pitch period for voiced speech, the gain parameter G and the coefficients {a_k} of the filter. These are the parameters that are transmitted in coded speech.
There are many methods for estimating the pitch period and the voiced/unvoiced classification. They are not discussed here and are not implemented in this demo. What is implemented is a method for determining the filter coefficients (lattice filter coefficients, referred to as reflection coefficients). It is these filter coefficients that are transmitted, along with a residual signal, instead of the parameters referred to above.
We consider the simplified all-pole model of Figure 1, equation (1), as the natural representation of non-nasal voiced sounds. (For nasals and fricatives, the acoustic theory calls for both poles and zeros in the vocal tract transfer function H(z).) In practice, if the filter order p is high enough, the all-pole model provides a fairly good representation of almost all speech sounds. The major advantage of the all-pole model is that the gain parameter G and the filter coefficients a_k can be estimated in a straightforward and computationally efficient way using the method of linear predictive analysis.
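To make the model concrete, the following MATLAB sketch (not part of the demo; all numerical values are invented purely for illustration) drives a low-order all-pole filter with the two kinds of excitation of Figure 1:

    % Illustrative synthesis from the all-pole model of equation (1).
    % All parameter values below are made up for demonstration only.
    fs  = 8000;                      % sampling rate in Hz (assumed)
    N   = 1600;                      % 0.2 s of output
    G   = 0.5;                       % gain parameter (illustrative)
    a   = [1.27 -0.81];              % coefficients a_k: one resonance near 1 kHz
    den = [1 -a];                    % denominator of H(z) = G/(1 - a_1 z^-1 - a_2 z^-2)

    % Voiced excitation: impulse train with a 10 ms (80-sample) pitch period
    u_voiced = zeros(N,1);  u_voiced(1:80:end) = 1;
    % Unvoiced excitation: white noise
    u_unvoiced = randn(N,1);

    s_voiced   = filter(G, den, u_voiced);     % "vowel-like" buzz
    s_unvoiced = filter(G, den, u_unvoiced);   % "fricative-like" hiss
    % soundsc([s_voiced; s_unvoiced], fs);     % uncomment to listen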
0.2 Linear Prediction Analysis & Synthesis Filters
We assume that speech is modeled as shown in Figure 1. The speech s[n] is related to the excitation u[n] by

s[n] = \sum_{k=1}^{p} a_k s[n-k] + G u[n]    (2)
To obtain the model coefficients, we resort to the following: assume that you are trying to predict the signal s[n] at time n from its previous values at times n-1, n-2, etc. A linear predictor with prediction coefficients \alpha_k is defined as a system whose output is

\tilde{s}[n] = \sum_{k=1}^{p} \alpha_k s[n-k]    (3)
The transfer function of the p-th order linear predictor of equation (3) is the polynomial

P(z) = \sum_{k=1}^{p} \alpha_k z^{-k}
The prediction error e[n] is defined as

e[n] = s[n] - \tilde{s}[n] = s[n] - \sum_{k=1}^{p} \alpha_k s[n-k]    (4)

Equivalently,

E(z) = A(z) S(z)

where

A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}
Comparing equations (2) and (4), it is seen that when the speech signal obeys the model of (2) exactly, then \alpha_k = a_k exactly. Then e[n] = G u[n] and E(z) = G U(z). Thus the prediction error filter A(z) will be the inverse filter of the system H(z) of (1). That is,

E(z) = G U(z) = A(z) S(z)

Hence,

H(z) = \frac{S(z)}{U(z)} = \frac{G}{A(z)}
So we have A(z), the analysis filter, and H(z), the synthesis filter.
The basic problem of linear prediction analysis is to determine the set of predictor coefficients \alpha_k directly from the speech signal. Because of the non-stationary nature of speech, the coefficients are determined for short segments of speech over which the signal is considered approximately stationary. They are found by minimizing the mean-square prediction error. The resulting parameters are then taken to be the parameters of the system function H(z), which is then used for the synthesis of that speech segment. The method of determining these coefficients is outlined below.
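Before turning to that, here is a small MATLAB illustration of the analysis/synthesis pair on a single frame. It is a sketch only: it assumes a speech vector s sampled at 8 kHz already in the workspace, an assumed order p = 10, and the Signal Processing Toolbox functions hamming and lpc.

    % One-frame round trip through A(z) and 1/A(z).
    p     = 10;                          % predictor order (assumed)
    frame = s(1:160);                    % one 20 ms frame of 8 kHz speech
    frame = frame(:) .* hamming(160);    % Hamming-windowed, forced to a column

    A = lpc(frame, p);        % prediction-error filter: A = [1 -alpha_1 ... -alpha_p]
    e = filter(A, 1, frame);  % analysis,  E(z) = A(z) S(z): the residual
    r = filter(1, A, e);      % synthesis, S(z) = E(z)/A(z): reconstruction

    max(abs(r - frame))       % differs from zero only by round-off error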
0.3 Minimum Mean-Square Error and the Orthogonality Principle
We consider the linear prediction problem of equation (3) as predicting a random variable from a set of other random variables. Given the random variables (x_1, x_2, \ldots, x_n), we wish to find n constants a_1, a_2, a_3, \ldots, a_n such that we form a linear estimate of a random variable s by the sum

\hat{s} = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n    (5)
This is typically done by ensuring that the mean-square value

P = E\{|s - (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n)|^2\}

of the resulting error

\varepsilon = s - \hat{s} = s - (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n)
is minimum. We do this by setting

\frac{\partial P}{\partial a_i} = E\{2[s - (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n)](-x_i)\} = 0    (6)
which yields the so-called Yule-Walker equations: setting i = 1, 2, \ldots, n in equation (6) we get

R_{11} a_1 + R_{12} a_2 + \cdots + R_{1n} a_n = R_{01}
R_{21} a_1 + R_{22} a_2 + \cdots + R_{2n} a_n = R_{02}
R_{31} a_1 + R_{32} a_2 + \cdots + R_{3n} a_n = R_{03}
\vdots
R_{n1} a_1 + R_{n2} a_2 + \cdots + R_{nn} a_n = R_{0n}    (7)
where

R_{ji} = E\{x_i x_j^*\}, \qquad R_{0j} = E\{s\, x_j^*\}
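In practice, for a windowed speech frame x these correlations are estimated from the frame itself (the autocorrelation method), and the system (7) can then, in principle, be solved directly. The MATLAB sketch below does exactly that, assuming a frame x (for example the frame variable of the earlier sketch) and an order p, and using xcorr from the Signal Processing Toolbox; it is for illustration only, since the recursion discussed next does the same job far more cheaply.

    % Estimate the autocorrelation lags and solve the normal equations (7)
    % by a direct (brute-force) matrix solve.
    r = xcorr(x, p, 'biased');     % lags -p..p of the windowed frame
    r = r(p+1:end);                % keep lags 0..p
    R = toeplitz(r(1:p));          % matrix of R_ij (depends only on |i-j|)
    b = r(2:p+1);                  % right-hand side R_0j
    alpha = R \ b(:);              % predictor coefficients alpha_1 .. alpha_p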
If the data x_i are linearly independent, then the determinant of the coefficient matrix R_{ij} is positive. Equation (7) is solved for the unknown coefficients a_k, k = 1, 2, \ldots, n (the \alpha_k of equation (3)) by using the so-called Levinson-Durbin algorithm. Accordingly, the problem essentially consists of determining, for a short segment of speech, the matrix of correlation coefficients R_{ij} and then inverting the matrix to obtain the prediction coefficients, which are then transmitted. All this often has to be done in real time.
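As an illustration, here is a sketch of the Levinson-Durbin recursion (the function name lev_durbin is ours, not MATLAB's; the built-in levinson performs the same computation, returning the polynomial A = [1 -alpha]):

    function [alpha, k, E] = lev_durbin(r, p)
    % Levinson-Durbin recursion for the normal equations (7)
    % (autocorrelation method).
    %   r     : autocorrelation lags, r(1) = R(0), ..., r(p+1) = R(p)
    %   alpha : predictor coefficients alpha_1 .. alpha_p of equation (3)
    %   k     : reflection (PARCOR / lattice) coefficients
    %   E     : prediction-error power after the final order update
    r = r(:);                 % force a column
    alpha = [];               % zeroth-order predictor (empty)
    k = zeros(p, 1);
    E = r(1);                 % zeroth-order error power R(0)
    for m = 1:p
        if m == 1
            k(m) = r(2) / E;
        else
            k(m) = (r(m+1) - alpha * r(m:-1:2)) / E;
        end
        alpha = [alpha - k(m)*fliplr(alpha), k(m)];   % order update of the coefficients
        E = (1 - k(m)^2) * E;                         % error power shrinks each step
    end
    % The prediction-error filter of the text is A(z) = 1 - sum_k alpha_k z^{-k},
    % i.e. A = [1 -alpha] in MATLAB polynomial form.
    end

Applied to the lag vector r of the previous sketch, lev_durbin(r, p) should reproduce the directly solved coefficients; note that MATLAB's built-ins may define the reflection coefficients with the opposite sign convention.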
0.4 MATLAB LPC DEMO
Run the Demo as per instructions in Lab 9.
Demo Description
The demo consists of two parts: analysis and synthesis. The analysis portion is found in the transmitter section of the system.
Analysis Section:
In this simulation, the speech signal is divided into frames of 20 ms (160 samples at the 8 kHz sampling rate), with an overlap of 10 ms (80 samples). Each frame is windowed using a Hamming window and analyzed to obtain the filter coefficients. The original speech signal is then passed through an analysis filter, which is an all-zero filter: a so-called lattice filter whose coefficients are the reflection coefficients obtained in the previous step. The output of this filter is called the residual signal. This is what is transmitted, along with the filter coefficients. Here, the analysis section output is simply connected to the synthesis portion.
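The following MATLAB sketch approximates what the analysis side does per frame. It is not the demo's actual code: it assumes 8 kHz speech in a vector s, an assumed order p = 10, and the Signal Processing Toolbox functions hamming, lpc and poly2rc.

    % Frame-by-frame LPC analysis: 20 ms frames, 10 ms hop, Hamming window.
    p    = 10;                          % predictor order (assumed)
    N    = 160;  hop = 80;              % 20 ms frame, 10 ms overlap at 8 kHz
    win  = hamming(N);
    nFrm = floor((length(s) - N) / hop) + 1;

    K     = zeros(p, nFrm);             % reflection coefficients, one column per frame
    resid = zeros(N, nFrm);             % residual signal, one column per frame

    for m = 1:nFrm
        idx   = (m-1)*hop + (1:N);
        frame = s(idx);  frame = frame(:) .* win;
        A           = lpc(frame, p);        % prediction-error filter A(z)
        K(:, m)     = poly2rc(A);           % lattice (reflection) coefficients
        resid(:, m) = filter(A, 1, frame);  % residual: windowed speech through A(z)
    end
    % K and resid stand in for what the demo transmits to the synthesis section.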
Synthesis Section:
The residual signal is passed through a synthesis filter, which is the inverse of the analysis filter. The output of the synthesis filter is the reconstructed original speech signal.
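A matching sketch of the synthesis side, continuing with K and resid from above: each residual frame is filtered through the all-pole synthesis filter 1/A(z), where A(z) is rebuilt from the transmitted reflection coefficients.

    % Frame-by-frame synthesis: rebuild A(z) from the reflection coefficients
    % and run the residual through the inverse (all-pole) filter 1/A(z).
    recon = zeros(N, nFrm);
    for m = 1:nFrm
        A           = rc2poly(K(:, m));           % reflection coefficients -> A(z)
        recon(:, m) = filter(1, A, resid(:, m));  % synthesis filter H(z) = 1/A(z)
    end
    % Each column of recon matches the corresponding windowed analysis frame
    % (up to round-off), since the analysis and synthesis filters are exact inverses.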
0.5 LAB REPORT
Give a brief description of what exactly is happening in the analysis and synthesis portions of the MATLAB LPC speech analysis and synthesis demo. Observe the residual signal and the filter coefficients generated in the analysis section that are then transmitted to the synthesis section.
Ref: MATLAB Help, Linear Predicting & Coding of Speech.
Class notes: mirchand/ee276-2003
