
Digital Speech Processing: Lecture 13

Linear Predictive Coding (LPC): Introduction

LPC Methods

LPC methods are the most widely used methods in speech coding, speech synthesis, speech recognition, speaker recognition and verification, and speech storage.

LPC methods provide extremely accurate estimates of speech parameters, and do so extremely efficiently.

Basic idea of Linear Prediction: the current speech sample can be closely approximated as a linear combination of past samples, i.e.,

$s(n) \approx \sum_{k=1}^{p} \alpha_k\, s(n-k)$  for some order $p$ and coefficients $\alpha_k$
LPC Methods

LP is based on speech production and synthesis models:
- speech can be modeled as the output of a linear, time-varying system, excited by either quasi-periodic pulses or noise
- assume that the model parameters remain constant over the speech analysis interval

LP provides a robust, reliable and accurate method for estimating the parameters of the linear system (the combined vocal tract, glottal pulse, and radiation characteristic for voiced speech).

For periodic signals with period $N_p$, it is obvious that $s(n) \approx s(n-N_p)$, but that is not what LP is doing; it is estimating $s(n)$ from the $p$ ($p \ll N_p$) most recent values of $s(n)$ by linearly predicting its value.

For LP, the predictor coefficients (the $\alpha_k$'s) are determined (computed) by minimizing the sum of squared differences (over a finite interval) between the actual speech samples and the linearly predicted ones.

LPC Methods

LP methods have been used in control and information theory, where they are called methods of system estimation and system identification.

They are used extensively in speech under a group of names including:
1. covariance method
2. autocorrelation method
3. lattice method
4. inverse filter formulation
5. spectral estimation formulation
6. maximum likelihood method
7. inner product method

Basic Principles of LP

$s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\,u(n)$

The time-varying digital filter

$H(z) = \dfrac{S(z)}{G\,U(z)} = \dfrac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}}$

represents the combined effects of the glottal pulse shape, the vocal tract impulse response, and radiation at the lips.

The system is excited by an impulse train for voiced speech, or a random noise sequence for unvoiced speech.

This all-pole model is a natural representation for non-nasal voiced speech, but it also works reasonably well for nasals and unvoiced sounds. A simple synthesis sketch of this model is given below.
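As a rough illustration (not part of the original slides), the all-pole production model above can be simulated directly; the coefficient values, gain, and pitch period below are made-up placeholders:

% Minimal sketch of the all-pole speech production model, assuming
% hypothetical model coefficients a_k, gain G, and pitch period Np.
a = [1.2; -0.5];                       % assumed a_1, a_2 (stable example values)
G = 0.1;  Np = 80;                     % assumed gain and pitch period in samples
u = zeros(800, 1);  u(1:Np:end) = 1;   % impulse-train excitation (voiced case)
s = filter(G, [1; -a], u);             % s(n) = sum a_k s(n-k) + G u(n)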

LP Basic Equations

A pth-order linear predictor is a system of the form

$\tilde{s}(n) = \sum_{k=1}^{p} \alpha_k\, s(n-k) \quad\Longleftrightarrow\quad P(z) = \dfrac{\tilde{S}(z)}{S(z)} = \sum_{k=1}^{p} \alpha_k z^{-k}$

The prediction error, $e(n)$, is of the form

$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k)$

The prediction error is the output of a system with transfer function

$A(z) = \dfrac{E(z)}{S(z)} = 1 - P(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}$

If the speech signal obeys the production model exactly, and if $\alpha_k = a_k,\ 1 \le k \le p$, then $e(n) = G\,u(n)$ and $A(z)$ is an inverse filter for $H(z)$, i.e.,

$H(z) = \dfrac{1}{A(z)}$

LP Estimation Issues

- need to determine $\{\alpha_k\}$ directly from speech such that they give good estimates of the time-varying spectrum
- need to estimate $\{\alpha_k\}$ from short segments of speech
- need to minimize the mean-squared prediction error over short segments of speech
- resulting $\{\alpha_k\}$ assumed to be the actual $\{a_k\}$ in the speech production model
- => intend to show that all of this can be done efficiently, reliably, and accurately for speech

Solution for {α_k}

The short-time average prediction squared-error is defined as

$E_n = \sum_m e_n^2(m) = \sum_m \big(s_n(m) - \tilde{s}_n(m)\big)^2 = \sum_m \Big(s_n(m) - \sum_{k=1}^{p} \alpha_k\, s_n(m-k)\Big)^2$

- select a segment of speech $s_n(m) = s(m+n)$ in the vicinity of sample $n$
- the key issue to resolve is the range of $m$ for the summation (to be discussed later)

We can find the values of $\alpha_k$ that minimize $E_n$ by setting

$\dfrac{\partial E_n}{\partial \alpha_i} = 0, \quad i = 1, 2, \ldots, p$

giving the set of equations

$\sum_m s_n(m-i)\Big[s_n(m) - \sum_{k=1}^{p} \hat{\alpha}_k\, s_n(m-k)\Big] = 0, \quad 1 \le i \le p$

$\sum_m s_n(m-i)\, \tilde{e}_n(m) = 0, \quad 1 \le i \le p$

where $\hat{\alpha}_k$ are the values of $\alpha_k$ that minimize $E_n$ (from now on we just use $\alpha_k$ rather than $\hat{\alpha}_k$ for the optimum values).

The prediction error $e_n(m)$ is orthogonal to the signal $s_n(m-i)$ for delays $i$ of 1 to $p$.

Solution for {α_k}

Defining

$\phi_n(i,k) = \sum_m s_n(m-i)\, s_n(m-k)$

we get

$\sum_{k=1}^{p} \alpha_k\, \phi_n(i,k) = \phi_n(i,0), \quad i = 1, 2, \ldots, p$

leading to a set of p equations in p unknowns that can be solved in an efficient manner for the $\{\alpha_k\}$.

The minimum mean-squared prediction error has the form

$E_n = \sum_m s_n^2(m) - \sum_{k=1}^{p} \alpha_k \sum_m s_n(m)\, s_n(m-k)$

which can be written in the form

$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k)$

Process:
1. compute $\phi_n(i,k)$ for $1 \le i \le p,\ 0 \le k \le p$
2. solve the matrix equation for the $\alpha_k$

- need to specify the range of $m$ used to compute $\phi_n(i,k)$
- need to specify $s_n(m)$

Autocorrelation Method

Assume $s_n(m)$ exists for $0 \le m \le L-1$ and is exactly zero everywhere else (i.e., a window of length L samples) (Assumption #1):

$s_n(m) = s(m+n)\, w(m), \quad 0 \le m \le L-1$

where $w(m)$ is a finite-length window of length L samples.

If $s_n(m)$ is non-zero only for $0 \le m \le L-1$, then

$e_n(m) = s_n(m) - \sum_{k=1}^{p} \alpha_k\, s_n(m-k)$

is non-zero only over the interval $0 \le m \le L-1+p$, giving

$E_n = \sum_{m=-\infty}^{\infty} e_n^2(m) = \sum_{m=0}^{L-1+p} e_n^2(m)$

- at values of $m$ near 0 (i.e., $m = 0, 1, \ldots, p-1$) we are predicting the signal from zero-valued samples outside the window range => $e_n(m)$ will be (relatively) large
- at values of $m$ near $L$ (i.e., $m = L, L+1, \ldots, L+p-1$) we are predicting zero-valued samples (outside the window range) from non-zero samples => $e_n(m)$ will be (relatively) large
- for these reasons, we normally use windows that taper the segment to zero (e.g., a Hamming window)

(Figure: a segment of length L selected at sample n; the windowed segment occupies $0 \le m \le L-1$ and the prediction error extends over $0 \le m \le L+p-1$.)

The Autocorrelation Method

$s_n[m] = s[m+n]\, w[m]$

$R_n[k] = \sum_{m=0}^{L-1-k} s_n[m]\, s_n[m+k], \quad k = 1, 2, \ldots, p$

(Figure: windowed segment $s_n[m] = s[m+n]w[m]$ and prediction error $e_n[m] = s_n[m] - \sum_{k=1}^{p}\alpha_k s_n[m-k]$; large errors occur at both ends of the window.)

Autocorrelation Method

For the calculation of $\phi_n(i,k)$: since $s_n(m) = 0$ outside the range $0 \le m \le L-1$,

$\phi_n(i,k) = \sum_{m=0}^{L-1+p} s_n(m-i)\, s_n(m-k), \quad 1 \le i \le p,\ 0 \le k \le p$

which is equivalent to the form

$\phi_n(i,k) = \sum_{m=0}^{L-1-(i-k)} s_n(m)\, s_n(m+i-k), \quad 1 \le i \le p,\ 0 \le k \le p$

There are $L - |i-k|$ non-zero terms in the computation of $\phi_n(i,k)$ for each value of $i$ and $k$; it can easily be shown that

$\phi_n(i,k) = f(i-k) = R_n(i-k), \quad 1 \le i \le p,\ 0 \le k \le p$

where $R_n(i-k)$ is the short-time autocorrelation of $s_n(m)$ evaluated at $i-k$, with

$R_n(k) = \sum_{m=0}^{L-1-k} s_n(m)\, s_n(m+k)$

Autocorrelation Method

Since $R_n(k)$ is an even function, we have

$\phi_n(i,k) = R_n(|i-k|), \quad 1 \le i \le p,\ 0 \le k \le p$

Thus the basic equation becomes

$\sum_{k=1}^{p} \alpha_k\, R_n(|i-k|) = R_n(i), \quad 1 \le i \le p$

with the minimum mean-squared prediction error of the form

$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k) = R_n(0) - \sum_{k=1}^{p} \alpha_k\, R_n(k)$

Expressed in matrix form this is

$\begin{bmatrix} R_n(0) & R_n(1) & \cdots & R_n(p-1) \\ R_n(1) & R_n(0) & \cdots & R_n(p-2) \\ \vdots & \vdots & & \vdots \\ R_n(p-1) & R_n(p-2) & \cdots & R_n(0) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} R_n(1) \\ R_n(2) \\ \vdots \\ R_n(p) \end{bmatrix}$

i.e., $R\,\alpha = r$, with solution $\alpha = R^{-1} r$.

$R$ is a $p \times p$ Toeplitz matrix (symmetric, with all elements along each diagonal equal), so there exist more efficient algorithms for solving for the $\{\alpha_k\}$ than simple matrix inversion. A small numerical sketch is given below.
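As a brief illustration (not in the original slides), the autocorrelation method for one frame can be sketched as follows; the frame length L, order p, and input signal are assumptions, and the direct Toeplitz solve stands in for the more efficient Levinson-Durbin recursion covered later:

% Hypothetical sketch of the autocorrelation method for one frame.
L = 301;  p = 10;                          % assumed frame length and model order
w = 0.54 - 0.46*cos(2*pi*(0:L-1)'/(L-1));  % Hamming window
x = s(1:L) .* w;                           % windowed segment s_n(m); s assumed a column vector of speech samples
R = zeros(p+1, 1);
for k = 0:p
    R(k+1) = sum(x(1:L-k) .* x(1+k:L));    % short-time autocorrelation R_n(k)
end
alpha = toeplitz(R(1:p)) \ R(2:p+1);       % solve R*alpha = r
G2 = R(1) - alpha' * R(2:p+1);             % squared model gain (see the gain discussion later)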

Covariance Method

There is a second basic approach to defining the speech segment $s_n(m)$ and the limits on the sums, namely to fix the interval over which the mean-squared error is computed (Assumption #2):

$E_n = \sum_{m=0}^{L-1} e_n^2(m) = \sum_{m=0}^{L-1} \Big(s_n(m) - \sum_{k=1}^{p} \alpha_k\, s_n(m-k)\Big)^2$

giving

$\phi_n(i,k) = \sum_{m=0}^{L-1} s_n(m-i)\, s_n(m-k), \quad 1 \le i \le p,\ 0 \le k \le p$

Changing the summation index gives

$\phi_n(i,k) = \sum_{m=-i}^{L-1-i} s_n(m)\, s_n(m+i-k), \quad 1 \le i \le p,\ 0 \le k \le p$

$\phi_n(i,k) = \sum_{m=-k}^{L-1-k} s_n(m)\, s_n(m+k-i), \quad 1 \le i \le p,\ 0 \le k \le p$

The key difference from the autocorrelation method is that the limits of summation include terms before $m = 0$, so the window effectively extends p samples backwards, from $s(n-p)$ to $s(n+L-1)$.

Covariance Method

Since we are extending the window backwards (to include the p samples before $m = 0$), we don't need to taper it using a Hamming window, since there is no transition at the window edges.

(Figure: the covariance-method analysis interval covers $0 \le m \le L-1$, with the p additional samples $-p \le m \le -1$ available for prediction.)

Covariance Method

We cannot use the autocorrelation formulation here: $\phi_n(i,k)$ is a true cross-correlation. We need to solve a set of equations of the form

$\sum_{k=1}^{p} \alpha_k\, \phi_n(i,k) = \phi_n(i,0), \quad i = 1, 2, \ldots, p$

$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k)$

In matrix form:

$\begin{bmatrix} \phi_n(1,1) & \phi_n(1,2) & \cdots & \phi_n(1,p) \\ \phi_n(2,1) & \phi_n(2,2) & \cdots & \phi_n(2,p) \\ \vdots & \vdots & & \vdots \\ \phi_n(p,1) & \phi_n(p,2) & \cdots & \phi_n(p,p) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} \phi_n(1,0) \\ \phi_n(2,0) \\ \vdots \\ \phi_n(p,0) \end{bmatrix}$

i.e., $\Phi\,\alpha = \psi$ or $\alpha = \Phi^{-1}\psi$. A small sketch of how $\Phi$ and $\psi$ are formed is given below.
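As an illustration (not in the original slides), forming the covariance-method quantities for one frame might look like the following sketch; here seg is an assumed column vector holding the L + p samples s(n-p), ..., s(n+L-1), with L and p as in the earlier sketch:

% Hypothetical sketch of the covariance method for one frame.
% seg(1:p) are the p samples before the analysis interval; seg(p+1:p+L) is the interval itself.
Phi = zeros(p, p);                       % phi_n(i,k), 1 <= i,k <= p
psi = zeros(p, 1);                       % phi_n(i,0)
for i = 1:p
    for k = 1:p
        Phi(i, k) = sum(seg(p+1-i:p+L-i) .* seg(p+1-k:p+L-k));
    end
    psi(i) = sum(seg(p+1-i:p+L-i) .* seg(p+1:p+L));
end
alpha = Phi \ psi;                       % symmetric positive-definite system (Cholesky applies, see later)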

Covariance Method

We have $\phi_n(i,k) = \phi_n(k,i)$, i.e., the matrix is symmetric but not Toeplitz; its diagonal elements are related as

$\phi_n(i+1, k+1) = \phi_n(i,k) + s_n(-i-1)\,s_n(-k-1) - s_n(L-1-i)\,s_n(L-1-k)$

e.g., $\phi_n(2,2) = \phi_n(1,1) + s_n(-2)\,s_n(-2) - s_n(L-2)\,s_n(L-2)$

- all terms $\phi_n(i,k)$ have a fixed number of terms (L terms) contributing to the computed values
- $\Phi$ is a covariance matrix, leading to a specialized solution for the $\{\alpha_k\}$ called the Covariance Method

Summary of LP

- use a pth-order linear predictor to predict $s(n)$ from the p previous samples
- minimize the mean-squared error, $E_n$, over an analysis window of duration L samples
- the solution for the optimum predictor coefficients, $\{\alpha_k\}$, is based on solving a matrix equation

Two solutions have evolved:
- autocorrelation method: the signal is windowed by a tapering window in order to minimize discontinuities at the beginning (predicting speech from zero-valued samples) and end (predicting zero-valued samples from speech samples) of the interval; the matrix $\phi_n(i,k)$ is shown to be an autocorrelation function; the resulting autocorrelation matrix is Toeplitz and can be readily solved using standard matrix solutions
- covariance method: the signal is extended by p samples outside the normal range of $0 \le m \le L-1$ to include the p samples occurring prior to $m = 0$; this eliminates large errors in predicting the signal from values prior to $m = 0$ (they are available) and eliminates the need for a tapering window; the resulting matrix of correlations is symmetric but not Toeplitz, leading to a different method of solution and a somewhat different set of optimal prediction coefficients, $\{\alpha_k\}$

LPC Summary

1. Speech Production Model:

$s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\,u(n)$

$H(z) = \dfrac{S(z)}{G\,U(z)} = \dfrac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}}$

2. Linear Prediction Model:

$\tilde{s}(n) = \sum_{k=1}^{p} \alpha_k\, s(n-k), \qquad P(z) = \dfrac{\tilde{S}(z)}{S(z)} = \sum_{k=1}^{p} \alpha_k z^{-k}$

$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k), \qquad A(z) = \dfrac{E(z)}{S(z)} = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}$

3. LPC Minimization:

$E_n = \sum_m e_n^2(m) = \sum_m \Big(s_n(m) - \sum_{k=1}^{p} \alpha_k\, s_n(m-k)\Big)^2$

$\dfrac{\partial E_n}{\partial \alpha_i} = 0, \quad i = 1, 2, \ldots, p$

$\sum_m s_n(m-i)\,s_n(m) = \sum_{k=1}^{p} \alpha_k \sum_m s_n(m-i)\,s_n(m-k)$

$\phi_n(i,k) = \sum_m s_n(m-i)\,s_n(m-k)$

$\sum_{k=1}^{p} \alpha_k\, \phi_n(i,k) = \phi_n(i,0), \quad i = 1, 2, \ldots, p$

LPC Summary

4. Autocorrelation Method:

$s_n(m) = s(m+n)\,w(m), \quad 0 \le m \le L-1$

$e_n(m) = s_n(m) - \sum_{k=1}^{p} \alpha_k\, s_n(m-k), \quad 0 \le m \le L-1+p$

- $s_n(m)$ defined for $0 \le m \le L-1$; $e_n(m)$ defined for $0 \le m \le L-1+p$
- large errors for $0 \le m \le p-1$ and for $L \le m \le L+p-1$

$E_n = \sum_{m=0}^{L-1+p} e_n^2(m)$

$\phi_n(i,k) = R_n(i-k) = \sum_{m=0}^{L-1-(i-k)} s_n(m)\, s_n(m+i-k)$

$\sum_{k=1}^{p} \alpha_k\, R_n(|i-k|) = R_n(i), \quad 1 \le i \le p$

$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k) = R_n(0) - \sum_{k=1}^{p} \alpha_k\, R_n(k)$

Resulting matrix equation:

$\begin{bmatrix} R_n(0) & R_n(1) & \cdots & R_n(p-1) \\ R_n(1) & R_n(0) & \cdots & R_n(p-2) \\ \vdots & \vdots & & \vdots \\ R_n(p-1) & R_n(p-2) & \cdots & R_n(0) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} R_n(1) \\ R_n(2) \\ \vdots \\ R_n(p) \end{bmatrix}$

i.e., $R\,\alpha = r$ or $\alpha = R^{-1}r$; the matrix equation is solved using the Levinson or Durbin method.

LPC Summary

5. Covariance Method:

Fix the interval for the error signal:

$E_n = \sum_{m=0}^{L-1} e_n^2(m) = \sum_{m=0}^{L-1} \Big(s_n(m) - \sum_{k=1}^{p} \alpha_k\, s_n(m-k)\Big)^2$

- need the signal from $s(n-p)$ to $s(n+L-1)$, i.e., $L + p$ samples

$\sum_{k=1}^{p} \alpha_k\, \phi_n(i,k) = \phi_n(i,0), \quad i = 1, 2, \ldots, p$

$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k)$

Expressed as a matrix equation:

$\begin{bmatrix} \phi_n(1,1) & \phi_n(1,2) & \cdots & \phi_n(1,p) \\ \phi_n(2,1) & \phi_n(2,2) & \cdots & \phi_n(2,p) \\ \vdots & \vdots & & \vdots \\ \phi_n(p,1) & \phi_n(p,2) & \cdots & \phi_n(p,p) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} \phi_n(1,0) \\ \phi_n(2,0) \\ \vdots \\ \phi_n(p,0) \end{bmatrix}$

i.e., $\Phi\,\alpha = \psi$ or $\alpha = \Phi^{-1}\psi$, with $\Phi$ a symmetric matrix.

Computation of Model Gain

It is reasonable to expect the model gain, G, to be determined by matching the signal energy with the energy of the linearly predicted samples.

From the basic model equations we have

$G\,u(n) = s(n) - \sum_{k=1}^{p} a_k\, s(n-k)$  (model)

whereas for the prediction error we have

$e(n) = s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k)$  (best fit to model)

When $\alpha_k = a_k$ (i.e., a perfect match to the model), then $e(n) = G\,u(n)$.

Since it is virtually impossible to guarantee that $\alpha_k = a_k$, we cannot use this simple matching property for determining the gain; instead we use an energy matching criterion (energy in the error signal = energy in the excitation):

$G^2 \sum_{m=0}^{L-1+p} u^2(m) = \sum_{m=0}^{L-1+p} e^2(m) = E_n$

Gain Assumptions

Assumptions about the excitation are needed to solve for G:
- voiced speech: $u(n) = \delta(n)$, with L on the order of a single pitch period; the predictor order, p, is large enough to model the glottal pulse shape, vocal tract impulse response, and radiation
- unvoiced speech: $u(n)$ is a zero-mean, unity-variance, stationary white noise process

Solution for Gain (Voiced)

For voiced speech the excitation is $G\,\delta(n)$ with output $\tilde{h}(n)$ (since it is the impulse response of the system):

$\tilde{h}(n) = \sum_{k=1}^{p} \alpha_k\, \tilde{h}(n-k) + G\,\delta(n), \qquad \tilde{H}(z) = \dfrac{G}{A(z)} = \dfrac{G}{1 - \sum_{k=1}^{p} \alpha_k z^{-k}}$

with autocorrelation $\tilde{R}(m)$ (of the impulse response) satisfying the relations shown below:

$\tilde{R}(m) = \sum_{n=0}^{\infty} \tilde{h}(n)\, \tilde{h}(m+n), \quad 0 \le m < \infty$

$\tilde{R}(m) = \sum_{k=1}^{p} \alpha_k\, \tilde{R}(|m-k|), \quad 1 \le m < \infty$

$\tilde{R}(0) = \sum_{k=1}^{p} \alpha_k\, \tilde{R}(k) + G^2, \quad m = 0$

Solution for Gain (Voiced)

Since $\tilde{R}(m)$ and $R_n(m)$ satisfy relations of identical form, it follows that

$\tilde{R}(m) = c\, R_n(m), \quad 0 \le m \le p$

where c is a constant to be determined. Since the total energies in the signal ($R_n(0)$) and in the impulse response ($\tilde{R}(0)$) must be equal, the constant c must be 1, and we obtain the relation

$G^2 = R_n(0) - \sum_{k=1}^{p} \alpha_k\, R_n(k)$

Since $\tilde{R}(m) = R_n(m),\ 0 \le m \le p$, and the energy of the impulse response equals the energy of the signal, the first $p+1$ coefficients of the autocorrelation of the impulse response of the model are identical to the first $p+1$ coefficients of the autocorrelation function of the speech signal. This condition is called the autocorrelation matching property of the autocorrelation method.

Solution for Gain (Unvoiced)

For unvoiced speech the input is white noise with zero mean and unity variance, i.e.,

$E[u(n)\,u(n-m)] = \delta(m)$

If we excite the system with input $G\,u(n)$ and call the output $\tilde{g}(n)$, then

$\tilde{g}(n) = \sum_{k=1}^{p} \alpha_k\, \tilde{g}(n-k) + G\,u(n)$

Letting $\tilde{R}(m)$ denote the autocorrelation of $\tilde{g}(n)$ gives

$\tilde{R}(m) = E[\tilde{g}(n)\, \tilde{g}(n-m)] = \sum_{k=1}^{p} \alpha_k\, E[\tilde{g}(n-k)\,\tilde{g}(n-m)] + E[G\,u(n)\,\tilde{g}(n-m)] = \sum_{k=1}^{p} \alpha_k\, \tilde{R}(m-k), \quad m \ne 0$

since $E[G\,u(n)\,\tilde{g}(n-m)] = 0$ for $m > 0$, because $u(n)$ is uncorrelated with any signal prior to $u(n)$.

(Since the autocorrelation function of the output is the convolution of the autocorrelation function of the impulse response with the autocorrelation function of the white noise input, $E[\tilde{g}[n]\,\tilde{g}[n-m]] = \tilde{R}[m] * \delta[m] = \tilde{R}[m]$.)

Solution for Gain (Unvoiced)

For m = 0 we get

$\tilde{R}(0) = \sum_{k=1}^{p} \alpha_k\, \tilde{R}(k) + G\,E[u(n)\,\tilde{g}(n)] = \sum_{k=1}^{p} \alpha_k\, \tilde{R}(k) + G^2$

since $E[u(n)\,\tilde{g}(n)] = E[u(n)\,(G\,u(n) + \text{terms prior to } n)] = G$.

Since the energy in the signal must equal the energy in the response to $G\,u(n)$, we get $\tilde{R}(m) = R_n(m)$ and

$G^2 = R_n(0) - \sum_{k=1}^{p} \alpha_k\, R_n(k)$

Frequency Domain Interpretations of Linear Predictive Analysis

The Resulting LPC Model

The final LPC model consists of the LPC parameters, $\{\alpha_k\},\ k = 1, 2, \ldots, p$, and the gain, G, which together define the system function

$\tilde{H}(z) = \dfrac{G}{1 - \sum_{k=1}^{p} \alpha_k z^{-k}}$

with frequency response

$\tilde{H}(e^{j\omega}) = \dfrac{G}{1 - \sum_{k=1}^{p} \alpha_k e^{-j\omega k}} = \dfrac{G}{A(e^{j\omega})}$

and with the gain determined by matching the energy of the model to the short-time energy of the speech signal.

LPC Spectrum

LP analysis is seen to be a method of short-time spectrum estimation with removal of the excitation fine structure (a form of wideband spectrum analysis).

LP Short-Time Spectrum Analysis

Define the speech segment as:

$s_n[m] = s[m+n]\, w[m]$

The discrete-time Fourier transform of this windowed segment is:

$S_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} s[m+n]\, w[m]\, e^{-j\omega m}$

The LP model spectrum is

$\tilde{H}(e^{j\omega}) = \dfrac{G}{1 - \sum_{k=1}^{p} \alpha_k\, e^{-j\omega k}}$

with

$G^2 = E_n = \sum_m (e_n(m))^2 = R_n(0) - \sum_{k=1}^{p} \alpha_k\, R_n(k)$

In MATLAB (autolpc is a course-toolbox LP analysis routine, not a built-in function):

x = s .* hamming(301);        % windowed segment (301-sample Hamming window)
X = fft(x, 1000);             % short-time Fourier transform
[A, G, r] = autolpc(x, 10);   % p = 10 LP analysis; A, G, r assumed to be A(z) coefficients, gain, autocorrelation
H = G ./ fft(A, 1000);        % LP model spectrum G / A(e^{jw})

(Figure: (a) voiced speech segment obtained using a Hamming window; (b) corresponding short-time autocorrelation function used in LP analysis, heavy line showing the values used; (c) corresponding short-time log magnitude Fourier transform and short-time log magnitude LPC spectrum, FS = 16 kHz.)

The short-time Fourier transform and the LP spectrum are linked via the short-time autocorrelation.
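As a possible continuation of the MATLAB snippet above (not in the original slides), the two spectra can be compared directly; the variable names and the 16 kHz sampling rate follow the snippet and figure caption:

% Compare the short-time log magnitude spectrum with the LP model spectrum.
Fs = 16000;                              % sampling rate assumed from the figure (FS = 16 kHz)
f = (0:499) * Fs / 1000;                 % frequencies of the first 500 DFT bins
plot(f, 20*log10(abs(X(1:500))), f, 20*log10(abs(H(1:500))));
xlabel('Frequency (Hz)'); ylabel('Log magnitude (dB)');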

LP Short-Time Spectrum Analysis

(Figure: (a) unvoiced speech segment obtained using a Hamming window; (b) corresponding short-time autocorrelation function used in LP analysis, heavy line showing the values used; (c) corresponding short-time log magnitude Fourier transform and short-time log magnitude LPC spectrum, FS = 16 kHz.)

Frequency Domain Interpretation of Mean-Squared Prediction Error

The LP spectrum provides a basis for examining the properties of the prediction error (or, equivalently, the excitation of the vocal tract).

The mean-squared prediction error at sample n is:

$E_n = \sum_{m=0}^{L+p-1} e_n^2[m]$

which, by Parseval's theorem, can be expressed as:

$E_n = \dfrac{1}{2\pi} \int_{-\pi}^{\pi} |E_n(e^{j\omega})|^2\, d\omega = \dfrac{1}{2\pi} \int_{-\pi}^{\pi} |S_n(e^{j\omega})|^2\, |A(e^{j\omega})|^2\, d\omega$

where $S_n(e^{j\omega})$ is the Fourier transform of $s_n[m]$ and $A(e^{j\omega})$ is the corresponding prediction error frequency response

$A(e^{j\omega}) = 1 - \sum_{k=1}^{p} \alpha_k\, e^{-j\omega k}$
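As a quick numerical sanity check (not in the original slides) of the Parseval relation above, assuming a windowed frame x of length L and predictor coefficients alpha of order p from one of the earlier sketches:

% Time-domain vs. frequency-domain computation of the mean-squared prediction error.
N = 4096;                                                    % DFT length, N >= L + p
e = filter([1; -alpha], 1, [x; zeros(p, 1)]);                % e_n[m] for 0 <= m <= L+p-1
En_time = sum(e.^2);
En_freq = mean(abs(fft(x, N) .* fft([1; -alpha], N)).^2);    % (1/N) * sum |S_n(e^jw) A(e^jw)|^2
% En_time and En_freq should agree to within numerical precision.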

Frequency Domain Interpretation of Mean-Squared Prediction Error

The LP spectrum is of the form:

$\tilde{H}(e^{j\omega}) = \dfrac{G}{A(e^{j\omega})}$

Thus we can express the mean-squared error as:

$E_n = \dfrac{G^2}{2\pi} \int_{-\pi}^{\pi} \dfrac{|S_n(e^{j\omega})|^2}{|\tilde{H}(e^{j\omega})|^2}\, d\omega$

Since $E_n = G^2$, we see that minimizing the total squared prediction error is equivalent to finding the gain and predictor coefficients such that the integral of the ratio of the energy spectrum of the speech segment to the magnitude squared of the frequency response of the model linear system is unity.

Thus $|S_n(e^{j\omega})|^2$ can be interpreted as a frequency-domain weighting function: LP weights frequencies where $|S_n(e^{j\omega})|^2$ is large more heavily than those where $|S_n(e^{j\omega})|^2$ is small.

LP Interpretation Example 1

(Figure: much better spectral matches to STFT spectral peaks than to STFT spectral valleys, as predicted by the spectral interpretation of error minimization.)

LP Interpretation Example 2

(Figure: note the small differences in spectral shape between the STFT, the autocorrelation-method spectrum, and the covariance-method spectrum when using a short window duration, L = 51 samples.)

Effects of Model Order

The autocorrelation function, $R_n[m]$, of the speech segment, $s_n[m]$, and the autocorrelation function, $\tilde{R}[m]$, of the impulse response, $\tilde{h}[m]$, corresponding to the system function, $\tilde{H}(z)$, are equal for the first $(p+1)$ values. Thus, as $p \to \infty$, the autocorrelation functions are equal for all values, and thus:

$\lim_{p \to \infty} |\tilde{H}(e^{j\omega})|^2 = |S_n(e^{j\omega})|^2$

Thus if p is large enough, the frequency response of the all-pole model, $\tilde{H}(e^{j\omega})$, can approximate the signal spectrum with arbitrarily small error.

Effects of Model Order

(Figures: Fourier transform of the segment and LP spectra for various model orders.)

- as p increases, more details of the spectrum are preserved
- need to choose a value of p that represents the spectral effects of the glottal pulse, vocal tract and radiation, and nothing else

Linear Prediction Spectrogram

The speech spectrogram was previously defined as:

$20 \log |S_r[k]| = 20 \log \Big| \sum_{m=0}^{L-1} s[rR+m]\, w[m]\, e^{-j(2\pi/N)km} \Big|$

for a set of times $t_r = rRT$ and a set of frequencies $F_k = kF_S/N,\ k = 1, 2, \ldots, N/2$, where R is the time shift (in samples) between adjacent STFTs, T is the sampling period, $F_S = 1/T$ is the sampling frequency, and N is the size of the discrete Fourier transform used to compute each STFT estimate.

Similarly, we can define the LP spectrogram as an image plot of:

$20 \log |\tilde{H}_r[k]| = 20 \log \left| \dfrac{G_r}{A_r(e^{j(2\pi/N)k})} \right|$

where $G_r$ and $A_r(e^{j(2\pi/N)k})$ are the gain and the prediction error polynomial at analysis time rR.

(Figure: LP spectrogram with L = 81, R = 3, N = 1000, 40 dB dynamic range.)
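A rough sketch (not in the original slides) of how such an LP spectrogram could be computed, using the frame parameters from the figure caption and an assumed model order p; s is assumed to be a column vector holding the speech signal:

% Hypothetical LP spectrogram computation.
L = 81;  R = 3;  N = 1000;  p = 12;                 % L, R, N from the figure; p assumed
w = 0.54 - 0.46*cos(2*pi*(0:L-1)'/(L-1));           % Hamming window
nFrames = floor((length(s) - L)/R) + 1;
LPgram = zeros(N/2, nFrames);
for rFrame = 1:nFrames
    x = s((rFrame-1)*R + (1:L)) .* w;               % windowed frame at time rR
    Rx = zeros(p+1, 1);
    for k = 0:p
        Rx(k+1) = sum(x(1:L-k) .* x(1+k:L));        % short-time autocorrelation
    end
    alpha = toeplitz(Rx(1:p)) \ Rx(2:p+1);          % autocorrelation-method solution
    G = sqrt(Rx(1) - alpha' * Rx(2:p+1));           % model gain
    A = fft([1; -alpha], N);
    LPgram(:, rFrame) = 20*log10(abs(G ./ A(1:N/2)));
end
imagesc(LPgram); axis xy;                           % image plot of 20*log|H~_r[k]|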

Comparison to Other Spectrum Analysis Methods

(Figure: spectra of the synthetic vowel /IY/: (a) narrowband spectrum using a 40 msec window; (b) wideband spectrum using a 10 msec window; (c) cepstrally smoothed spectrum; (d) LPC spectrum from a 40 msec section using a p = 12 order LPC analysis.)

(Figure: natural speech spectral estimates using cepstral smoothing (solid line) and linear prediction analysis (dashed line).)

Note the fewer (spurious) peaks in the LP analysis spectrum, since LP used p = 12, which restricted the spectral match to a maximum of 6 resonance peaks.

Note the narrow bandwidths of the LP resonances versus the cepstrally smoothed resonances.

Selective Linear Prediction

It is possible to apply LP methods to selected parts of the spectrum:
- 0-4 kHz for voiced sounds, using a predictor of order p1
- 4-8 kHz for unvoiced sounds, using a predictor of order p2

The key idea is to map the frequency region $\{f_A, f_B\}$ linearly to $\{0, 0.5\}$, or, equivalently, to map the region $\{2\pi f_A, 2\pi f_B\}$ linearly to $\{0, \pi\}$, via the transformation

$\omega' = \dfrac{\pi\,(\omega - 2\pi f_A)}{2\pi f_B - 2\pi f_A}$

We must modify the calculation of the autocorrelation accordingly to give:

$R'_n(m) = \dfrac{1}{2\pi} \int_{-\pi}^{\pi} |S'_n(e^{j\omega'})|^2\, e^{j\omega' m}\, d\omega'$

(Figure: a 0-10 kHz region modeled using p = 28 shows no discontinuity in the model spectrum at 5 kHz, whereas separate selective LP models, the 0-5 kHz region using p1 = 14 and the 5-10 kHz region using p2 = 5, show a discontinuity in the model spectra at 5 kHz.)

Solutions of LPC Equations

Covariance Method (Cholesky Decomposition Method)

LPC Solutions: Covariance Method

For the covariance method we need to solve the matrix equation

$\sum_{k=1}^{p} \alpha_k\, \phi_n(i,k) = \phi_n(i,0), \quad i = 1, 2, \ldots, p$

or, in matrix notation, $\Phi\,\alpha = \psi$.

$\Phi$ is a positive definite, symmetric matrix with $(i,j)$ element $\phi_n(i,j)$, and $\alpha$ and $\psi$ are column vectors with elements $\alpha_i$ and $\phi_n(i,0)$.

The solution of the matrix equation is called the Cholesky decomposition, or square-root method:

$\Phi = V\,D\,V^t$, where V is a lower triangular matrix with 1's on the main diagonal and D is a diagonal matrix.

LPC Solutions: Covariance Method

We can readily determine the elements of V and D by solving for the $(i,j)$ elements of the matrix equation, as follows:

$\phi_n(i,j) = \sum_{k=1}^{j} V_{ik}\, d_k\, V_{jk}, \quad 1 \le j \le i-1$

giving

$V_{ij}\, d_j = \phi_n(i,j) - \sum_{k=1}^{j-1} V_{ik}\, d_k\, V_{jk}, \quad 1 \le j \le i-1$

and for the diagonal elements

$\phi_n(i,i) = \sum_{k=1}^{i} V_{ik}\, d_k\, V_{ik}$

giving

$d_i = \phi_n(i,i) - \sum_{k=1}^{i-1} V_{ik}^2\, d_k, \quad i \ge 2$

with

$d_1 = \phi_n(1,1)$

Cholesky Decomposition Example

Consider an example with $p = 4$ and matrix elements $\phi_n(i,j) = \phi_{ij}$:

$\begin{bmatrix} \phi_{11} & \phi_{21} & \phi_{31} & \phi_{41} \\ \phi_{21} & \phi_{22} & \phi_{32} & \phi_{42} \\ \phi_{31} & \phi_{32} & \phi_{33} & \phi_{43} \\ \phi_{41} & \phi_{42} & \phi_{43} & \phi_{44} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ V_{21} & 1 & 0 & 0 \\ V_{31} & V_{32} & 1 & 0 \\ V_{41} & V_{42} & V_{43} & 1 \end{bmatrix} \begin{bmatrix} d_1 & 0 & 0 & 0 \\ 0 & d_2 & 0 & 0 \\ 0 & 0 & d_3 & 0 \\ 0 & 0 & 0 & d_4 \end{bmatrix} \begin{bmatrix} 1 & V_{21} & V_{31} & V_{41} \\ 0 & 1 & V_{32} & V_{42} \\ 0 & 0 & 1 & V_{43} \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Cholesky Decomposition Example

Solve the matrix equation for $d_1, V_{21}, V_{31}, V_{41}, d_2, V_{32}, V_{42}, d_3, V_{43}, d_4$:

$d_1 = \phi_{11}$

$V_{21} d_1 = \phi_{21} \Rightarrow V_{21} = \phi_{21}/d_1; \quad V_{31} d_1 = \phi_{31} \Rightarrow V_{31} = \phi_{31}/d_1; \quad V_{41} d_1 = \phi_{41} \Rightarrow V_{41} = \phi_{41}/d_1$

$d_2 = \phi_{22} - V_{21}^2 d_1$

$V_{32} d_2 = \phi_{32} - V_{31} d_1 V_{21} \Rightarrow V_{32} = (\phi_{32} - V_{31} d_1 V_{21})/d_2$

$V_{42} d_2 = \phi_{42} - V_{41} d_1 V_{21} \Rightarrow V_{42} = (\phi_{42} - V_{41} d_1 V_{21})/d_2$

Iterate the procedure to solve for $d_3, V_{43}, d_4$.

LPC Solutions: Covariance Method

Now we need to solve for $\alpha$ using a multi-step procedure, $V D V^t \alpha = \psi$:
- step 1: write this as $V Y = \psi$ with $Y = D V^t \alpha$
- step 2: then $D V^t \alpha = Y$, or $V^t \alpha = D^{-1} Y$
- step 3: from V (which is now known), solve for the column vector Y using a simple recursion of the form

$Y_i = \psi_i - \sum_{j=1}^{i-1} V_{ij}\, Y_j, \quad 2 \le i \le p$

with initial condition $Y_1 = \psi_1$

- step 4: then solve for $\alpha$ by back substitution (next slide)

Cholesky Decomposition Example

Continuing the example, we solve for Y:

$\begin{bmatrix} 1 & 0 & 0 & 0 \\ V_{21} & 1 & 0 & 0 \\ V_{31} & V_{32} & 1 & 0 \\ V_{41} & V_{42} & V_{43} & 1 \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix} = \begin{bmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{bmatrix}$

First, solving for $Y_1$ through $Y_4$ we get:

$Y_1 = \psi_1$
$Y_2 = \psi_2 - V_{21} Y_1$
$Y_3 = \psi_3 - V_{31} Y_1 - V_{32} Y_2$
$Y_4 = \psi_4 - V_{41} Y_1 - V_{42} Y_2 - V_{43} Y_3$

LPC Solutions: Covariance Method

Now we can solve for $\alpha$ using the recursion

$\alpha_i = Y_i/d_i - \sum_{j=i+1}^{p} V_{ji}\, \alpha_j, \quad 1 \le i \le p-1$

with initial condition $\alpha_p = Y_p/d_p$; the calculation proceeds backwards from $i = p-1$ down to $i = 1$.

Cholesky Decomposition Example

Next solve for $\alpha$ from the equation $V^t \alpha = D^{-1} Y$:

$\begin{bmatrix} 1 & V_{21} & V_{31} & V_{41} \\ 0 & 1 & V_{32} & V_{42} \\ 0 & 0 & 1 & V_{43} \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{bmatrix} = \begin{bmatrix} Y_1/d_1 \\ Y_2/d_2 \\ Y_3/d_3 \\ Y_4/d_4 \end{bmatrix}$

giving the results

$\alpha_4 = Y_4/d_4$
$\alpha_3 = Y_3/d_3 - V_{43}\,\alpha_4$
$\alpha_2 = Y_2/d_2 - V_{32}\,\alpha_3 - V_{42}\,\alpha_4$
$\alpha_1 = Y_1/d_1 - V_{21}\,\alpha_2 - V_{31}\,\alpha_3 - V_{41}\,\alpha_4$

completing the solution.

Covariance Method Minimum Error

The minimum mean-squared error can be written in the form

$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k) = \phi_n(0,0) - \alpha^t \psi$

Since $\alpha^t \psi = Y^t D^{-1} V^{-1} \psi = Y^t D^{-1} Y$, we can write this as

$E_n = \phi_n(0,0) - Y^t D^{-1} Y = \phi_n(0,0) - \sum_{k=1}^{p} Y_k^2 / d_k$

This computation of $E_n$ can be carried out for all values of LP order from 1 to p, so we can understand how the LP order reduces the mean-squared error.
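As an illustration (not in the original slides), the LDL^t (Cholesky / square-root) solution above can be sketched as follows, assuming Phi, psi, and phi00 = phi_n(0,0) have been formed as in the earlier covariance-method sketch:

% Hypothetical sketch of the Cholesky (LDL^t) solution of Phi*alpha = psi.
p = size(Phi, 1);
V = eye(p);  d = zeros(p, 1);
for i = 1:p
    d(i) = Phi(i,i) - sum(V(i,1:i-1).^2 .* d(1:i-1)');                    % diagonal elements d_i
    for j = i+1:p
        V(j,i) = (Phi(j,i) - sum(V(j,1:i-1).*V(i,1:i-1).*d(1:i-1)')) / d(i);
    end
end
Y = V \ psi;                        % step 1: forward substitution, V*Y = psi
alpha = V' \ (Y ./ d);              % steps 2-4: back substitution, V'*alpha = D^{-1}*Y
En = phi00 - sum(Y.^2 ./ d);        % minimum mean-squared error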

Solutions of LPC Equations

Autocorrelation Method via the Levinson-Durbin Algorithm

Levinson-Durbin Algorithm 1

The autocorrelation equations (at each frame n) are:

$\sum_{k=1}^{p} \alpha_k\, R[|i-k|] = R[i], \quad 1 \le i \le p$

or $R\,\alpha = r$, i.e.,

$\begin{bmatrix} R[0] & R[1] & R[2] & \cdots & R[p-1] \\ R[1] & R[0] & R[1] & \cdots & R[p-2] \\ R[2] & R[1] & R[0] & \cdots & R[p-3] \\ \vdots & \vdots & \vdots & & \vdots \\ R[p-1] & R[p-2] & R[p-3] & \cdots & R[0] \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} R[1] \\ R[2] \\ R[3] \\ \vdots \\ R[p] \end{bmatrix}$

R is a positive definite, symmetric Toeplitz matrix.

The set of optimum predictor coefficients satisfies:

$R[i] - \sum_{k=1}^{p} \alpha_k\, R[|i-k|] = 0, \quad 1 \le i \le p$

with minimum mean-squared prediction error

$E^{(p)} = R[0] - \sum_{k=1}^{p} \alpha_k\, R[k]$

Levinson-Durbin Algorithm 2

By combining the last two equations we get a larger matrix equation of the form:

$\begin{bmatrix} R[0] & R[1] & R[2] & \cdots & R[p] \\ R[1] & R[0] & R[1] & \cdots & R[p-1] \\ R[2] & R[1] & R[0] & \cdots & R[p-2] \\ \vdots & \vdots & \vdots & & \vdots \\ R[p] & R[p-1] & R[p-2] & \cdots & R[0] \end{bmatrix} \begin{bmatrix} 1 \\ -\alpha_1^{(p)} \\ -\alpha_2^{(p)} \\ \vdots \\ -\alpha_p^{(p)} \end{bmatrix} = \begin{bmatrix} E^{(p)} \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$

The expanded $(p+1) \times (p+1)$ matrix is still Toeplitz, and the equation can be solved iteratively by incorporating a new correlation value at each iteration and solving for the next higher-order predictor in terms of the new correlation value and the previous predictor.

Levinson-Durbin Algorithm 3

We show how the ith-order solution can be derived from the (i-1)st-order solution; i.e., given $\alpha^{(i-1)}$, the solution to $R^{(i-1)}\,a^{(i-1)} = e^{(i-1)}$, we derive the solution to $R^{(i)}\,a^{(i)} = e^{(i)}$.

The (i-1)st solution can be expressed as:

$\begin{bmatrix} R[0] & R[1] & \cdots & R[i-1] \\ R[1] & R[0] & \cdots & R[i-2] \\ \vdots & \vdots & & \vdots \\ R[i-1] & R[i-2] & \cdots & R[0] \end{bmatrix} \begin{bmatrix} 1 \\ -\alpha_1^{(i-1)} \\ \vdots \\ -\alpha_{i-1}^{(i-1)} \end{bmatrix} = \begin{bmatrix} E^{(i-1)} \\ 0 \\ \vdots \\ 0 \end{bmatrix}$

Levinson-Durbin Algorithm 4

Appending a 0 to the vector $a^{(i-1)}$ and multiplying by the matrix $R^{(i)}$ gives a new set of $(i+1)$ equations of the form:

$\begin{bmatrix} R[0] & R[1] & \cdots & R[i-1] & R[i] \\ R[1] & R[0] & \cdots & R[i-2] & R[i-1] \\ \vdots & \vdots & & \vdots & \vdots \\ R[i-1] & R[i-2] & \cdots & R[0] & R[1] \\ R[i] & R[i-1] & \cdots & R[1] & R[0] \end{bmatrix} \begin{bmatrix} 1 \\ -\alpha_1^{(i-1)} \\ \vdots \\ -\alpha_{i-1}^{(i-1)} \\ 0 \end{bmatrix} = \begin{bmatrix} E^{(i-1)} \\ 0 \\ \vdots \\ 0 \\ \gamma^{(i-1)} \end{bmatrix}$

where the new quantities $\gamma^{(i-1)} = R[i] - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} R[i-j]$ and $R[i]$ have been introduced.

Levinson-Durbin Algorithm 5

Key step: since the Toeplitz matrix has a special symmetry, we can reverse the order of the equations (first equation last, last equation first), giving:

$\begin{bmatrix} R[0] & R[1] & \cdots & R[i-1] & R[i] \\ R[1] & R[0] & \cdots & R[i-2] & R[i-1] \\ \vdots & \vdots & & \vdots & \vdots \\ R[i-1] & R[i-2] & \cdots & R[0] & R[1] \\ R[i] & R[i-1] & \cdots & R[1] & R[0] \end{bmatrix} \begin{bmatrix} 0 \\ -\alpha_{i-1}^{(i-1)} \\ \vdots \\ -\alpha_1^{(i-1)} \\ 1 \end{bmatrix} = \begin{bmatrix} \gamma^{(i-1)} \\ 0 \\ \vdots \\ 0 \\ E^{(i-1)} \end{bmatrix}$

Levinson-Durbin Algorithm 6

To get the equation into the desired form (a single non-zero component, $E^{(i)}$, in the right-hand-side vector), we combine the two sets of equations (with a multiplicative factor $k_i$), giving:

$R^{(i)} \left( \begin{bmatrix} 1 \\ -\alpha_1^{(i-1)} \\ \vdots \\ -\alpha_{i-1}^{(i-1)} \\ 0 \end{bmatrix} - k_i \begin{bmatrix} 0 \\ -\alpha_{i-1}^{(i-1)} \\ \vdots \\ -\alpha_1^{(i-1)} \\ 1 \end{bmatrix} \right) = \begin{bmatrix} E^{(i-1)} - k_i\,\gamma^{(i-1)} \\ 0 \\ \vdots \\ 0 \\ \gamma^{(i-1)} - k_i\,E^{(i-1)} \end{bmatrix}$

Choose $k_i$ so that the vector on the right has only a single non-zero entry, i.e.,

$k_i = \dfrac{\gamma^{(i-1)}}{E^{(i-1)}} = \dfrac{R[i] - \sum_{j=1}^{i-1} \alpha_j^{(i-1)}\, R[i-j]}{E^{(i-1)}}$

Levinson-Durbin Algorithm 7

The first element of the right-hand-side vector is now:

$E^{(i)} = E^{(i-1)} - k_i\,\gamma^{(i-1)} = E^{(i-1)}\,(1 - k_i^2)$

With this choice of $k_i$, the vector of ith-order predictor coefficients is:

$\begin{bmatrix} 1 \\ -\alpha_1^{(i)} \\ -\alpha_2^{(i)} \\ \vdots \\ -\alpha_i^{(i)} \end{bmatrix} = \begin{bmatrix} 1 \\ -\alpha_1^{(i-1)} \\ -\alpha_2^{(i-1)} \\ \vdots \\ 0 \end{bmatrix} - k_i \begin{bmatrix} 0 \\ -\alpha_{i-1}^{(i-1)} \\ -\alpha_{i-2}^{(i-1)} \\ \vdots \\ 1 \end{bmatrix}$

yielding the updating procedure

$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\,\alpha_{i-j}^{(i-1)}, \quad j = 1, 2, \ldots, i-1$
$\alpha_i^{(i)} = k_i$

The $k_i$ parameters are called PARCOR coefficients.

The final solution for order p is:

$\alpha_j = \alpha_j^{(p)}, \quad 1 \le j \le p$

with prediction error

$E^{(p)} = E^{(0)} \prod_{m=1}^{p} (1 - k_m^2) = R[0] \prod_{m=1}^{p} (1 - k_m^2)$

If we use normalized autocorrelation coefficients $r[k] = R[k]/R[0]$, we get normalized errors of the form:

$V^{(i)} = \dfrac{E^{(i)}}{R[0]} = 1 - \sum_{k=1}^{i} \alpha_k^{(i)}\, r[k] = \prod_{m=1}^{i} (1 - k_m^2)$

where $0 < V^{(i)} \le 1$ and $-1 < k_i < 1$. A compact sketch of the full recursion is given after this slide.
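A compact sketch of the recursion just described (not in the original slides); the function name and interface are illustrative, with R assumed to hold R[0], ..., R[p] in R(1), ..., R(p+1):

% Hypothetical Levinson-Durbin recursion for the autocorrelation method.
function [alpha, E, k] = levdur_sketch(R, p)
    E = R(1);                               % E^(0) = R[0]
    alpha = zeros(p, 1);                    % predictor coefficients
    k = zeros(p, 1);                        % PARCOR coefficients
    for i = 1:p
        gamma = R(i+1);                     % gamma^(i-1) = R[i] - sum_j alpha_j^(i-1) R[i-j]
        for j = 1:i-1
            gamma = gamma - alpha(j) * R(i-j+1);
        end
        k(i) = gamma / E;
        prev = alpha;                       % alpha^(i-1)
        alpha(i) = k(i);                    % alpha_i^(i) = k_i
        for j = 1:i-1
            alpha(j) = prev(j) - k(i) * prev(i-j);   % alpha_j^(i) update
        end
        E = E * (1 - k(i)^2);               % E^(i) = E^(i-1)(1 - k_i^2)
    end
end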

Levinson-Durbin Algorithm

Equivalently, in terms of the prediction error polynomials:

$A^{(i)}(z) = A^{(i-1)}(z) - k_i\, z^{-i} A^{(i-1)}(z^{-1})$

Autocorrelation Example

Consider a simple p = 2 solution of the form

$\begin{bmatrix} R(0) & R(1) \\ R(1) & R(0) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \begin{bmatrix} R(1) \\ R(2) \end{bmatrix}$

with solution

$E^{(0)} = R(0)$
$k_1 = R(1)/R(0)$
$\alpha_1^{(1)} = R(1)/R(0)$
$E^{(1)} = \dfrac{R^2(0) - R^2(1)}{R(0)}$
$k_2 = \dfrac{R(2)R(0) - R^2(1)}{R^2(0) - R^2(1)}$
$\alpha_2^{(2)} = \dfrac{R(2)R(0) - R^2(1)}{R^2(0) - R^2(1)}$
$\alpha_1^{(2)} = \dfrac{R(1)R(0) - R(1)R(2)}{R^2(0) - R^2(1)}$

with final coefficients

$\alpha_1 = \alpha_1^{(2)}, \qquad \alpha_2 = \alpha_2^{(2)}$

Prediction Error as a Function of p

$V_n = \dfrac{E_n}{R_n[0]} = 1 - \sum_{k=1}^{p} \alpha_k\, \dfrac{R_n[k]}{R_n[0]}$

$E^{(i)}$ = prediction error for a predictor of order i.

The model order is usually determined by the following rule of thumb:
- Fs/1000 poles for the vocal tract
- 2-4 poles for radiation
- 2 poles for the glottal pulse
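As a quick illustration (not in the original slides), the p = 2 closed-form solution above can be checked against the recursion sketch given earlier, using made-up autocorrelation values:

% Hypothetical check of the p = 2 example with the levdur_sketch function above.
R = [1.0; 0.8; 0.5];                     % assumed values of R(0), R(1), R(2)
[alpha, E, k] = levdur_sketch(R, 2);     % alpha(1), alpha(2) should match the closed-form expressions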

Autocorrelation Method Properties

- mean-squared prediction error is always non-zero and decreases monotonically with increasing model order
- autocorrelation matching property: model and data autocorrelations match up to order p
- spectrum matching property: favors the peaks of the short-time Fourier transform
- minimum-phase property: the zeros of A(z) are inside the unit circle
- Levinson-Durbin recursion: an efficient algorithm for finding the prediction coefficients; the PARCOR coefficients and the mean-squared error are by-products
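The minimum-phase property can be checked numerically (a sketch, not in the original slides), assuming alpha was obtained with the autocorrelation method:

% Check that all zeros of A(z) = 1 - sum alpha_k z^-k lie inside the unit circle.
z = roots([1; -alpha]);
all(abs(z) < 1)          % expected to return logical 1 (true)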
