
Digital Speech Processing: Lecture 13

Linear Predictive Coding (LPC): Introduction

LPC Methods

LPC methods are the most widely used methods in speech coding, speech synthesis, speech recognition, speaker recognition and verification, and speech storage.

LPC methods provide extremely accurate estimates of speech parameters, and do so extremely efficiently.

Basic idea of Linear Prediction: the current speech sample can be closely approximated as a linear combination of past samples, i.e.,

$s(n) \approx \sum_{k=1}^{p} \alpha_k\, s(n-k)$  for some order $p$ and coefficients $\alpha_k$
LPC Methods

LP is based on speech production and synthesis models:
- speech can be modeled as the output of a linear, time-varying system, excited by either quasi-periodic pulses or noise
- assume that the model parameters remain constant over the speech analysis interval

LP provides a robust, reliable and accurate method for estimating the parameters of the linear system (the combined vocal tract, glottal pulse, and radiation characteristic for voiced speech).

For periodic signals with period $N_p$, it is obvious that $s(n) \approx s(n-N_p)$, but that is not what LP is doing; it is estimating $s(n)$ from the $p$ ($p \ll N_p$) most recent values of $s(n)$ by linearly predicting its value.

For LP, the predictor coefficients (the $\alpha_k$'s) are determined (computed) by minimizing the sum of squared differences (over a finite interval) between the actual speech samples and the linearly predicted ones.

LPC Methods

LP methods have been used in control and information theory, where they are called methods of system estimation and system identification.

They are used extensively in speech under a group of names including:
1. covariance method
2. autocorrelation method
3. lattice method
4. inverse filter formulation
5. spectral estimation formulation
6. maximum likelihood method
7. inner product method

Basic Principles of LP

$s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\,u(n)$

The time-varying digital filter

$H(z) = \dfrac{S(z)}{G\,U(z)} = \dfrac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}}$

represents the combined effects of the glottal pulse shape, the vocal tract impulse response, and radiation at the lips.

The system is excited by an impulse train for voiced speech, or a random noise sequence for unvoiced speech.

This all-pole model is a natural representation for non-nasal voiced speech, but it also works reasonably well for nasals and unvoiced sounds. A simple synthesis sketch of this model is given below.
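As a rough illustration (not part of the original slides), the all-pole production model above can be simulated directly; the coefficient values, gain, and pitch period below are made-up placeholders:

% Minimal sketch of the all-pole speech production model, assuming
% hypothetical model coefficients a_k, gain G, and pitch period Np.
a = [1.2; -0.5];                       % assumed a_1, a_2 (stable example values)
G = 0.1;  Np = 80;                     % assumed gain and pitch period in samples
u = zeros(800, 1);  u(1:Np:end) = 1;   % impulse-train excitation (voiced case)
s = filter(G, [1; -a], u);             % s(n) = sum a_k s(n-k) + G u(n)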

LP Basic Equations

A pth-order linear predictor is a system of the form

$\tilde{s}(n) = \sum_{k=1}^{p} \alpha_k\, s(n-k) \quad\Longleftrightarrow\quad P(z) = \dfrac{\tilde{S}(z)}{S(z)} = \sum_{k=1}^{p} \alpha_k z^{-k}$

The prediction error, $e(n)$, is of the form

$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k)$

The prediction error is the output of a system with transfer function

$A(z) = \dfrac{E(z)}{S(z)} = 1 - P(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}$

If the speech signal obeys the production model exactly, and if $\alpha_k = a_k,\ 1 \le k \le p$, then $e(n) = G\,u(n)$ and $A(z)$ is an inverse filter for $H(z)$, i.e.,

$H(z) = \dfrac{1}{A(z)}$

LP Estimation Issues

- need to determine $\{\alpha_k\}$ directly from speech such that they give good estimates of the time-varying spectrum
- need to estimate $\{\alpha_k\}$ from short segments of speech
- need to minimize the mean-squared prediction error over short segments of speech
- resulting $\{\alpha_k\}$ assumed to be the actual $\{a_k\}$ in the speech production model
- => intend to show that all of this can be done efficiently, reliably, and accurately for speech

Solution for {α_k}

The short-time average prediction squared-error is defined as

$E_n = \sum_m e_n^2(m) = \sum_m \big(s_n(m) - \tilde{s}_n(m)\big)^2 = \sum_m \Big(s_n(m) - \sum_{k=1}^{p} \alpha_k\, s_n(m-k)\Big)^2$

- select a segment of speech $s_n(m) = s(m+n)$ in the vicinity of sample $n$
- the key issue to resolve is the range of $m$ for the summation (to be discussed later)

We can find the values of $\alpha_k$ that minimize $E_n$ by setting

$\dfrac{\partial E_n}{\partial \alpha_i} = 0, \quad i = 1, 2, \ldots, p$

giving the set of equations

$\sum_m s_n(m-i)\Big[s_n(m) - \sum_{k=1}^{p} \hat{\alpha}_k\, s_n(m-k)\Big] = 0, \quad 1 \le i \le p$

$\sum_m s_n(m-i)\, \tilde{e}_n(m) = 0, \quad 1 \le i \le p$

where $\hat{\alpha}_k$ are the values of $\alpha_k$ that minimize $E_n$ (from now on we just use $\alpha_k$ rather than $\hat{\alpha}_k$ for the optimum values).

The prediction error $e_n(m)$ is orthogonal to the signal $s_n(m-i)$ for delays $i$ of 1 to $p$.

Solution for {α_k}

Defining

$\phi_n(i,k) = \sum_m s_n(m-i)\, s_n(m-k)$

we get

$\sum_{k=1}^{p} \alpha_k\, \phi_n(i,k) = \phi_n(i,0), \quad i = 1, 2, \ldots, p$

leading to a set of p equations in p unknowns that can be solved in an efficient manner for the $\{\alpha_k\}$.

The minimum mean-squared prediction error has the form

$E_n = \sum_m s_n^2(m) - \sum_{k=1}^{p} \alpha_k \sum_m s_n(m)\, s_n(m-k)$

which can be written in the form

$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k)$

Process:
1. compute $\phi_n(i,k)$ for $1 \le i \le p,\ 0 \le k \le p$
2. solve the matrix equation for the $\alpha_k$

- need to specify the range of $m$ used to compute $\phi_n(i,k)$
- need to specify $s_n(m)$

Autocorrelation Method

Assume $s_n(m)$ exists for $0 \le m \le L-1$ and is exactly zero everywhere else (i.e., a window of length L samples) (Assumption #1):

$s_n(m) = s(m+n)\, w(m), \quad 0 \le m \le L-1$

where $w(m)$ is a finite-length window of length L samples.

If $s_n(m)$ is non-zero only for $0 \le m \le L-1$, then

$e_n(m) = s_n(m) - \sum_{k=1}^{p} \alpha_k\, s_n(m-k)$

is non-zero only over the interval $0 \le m \le L-1+p$, giving

$E_n = \sum_{m=-\infty}^{\infty} e_n^2(m) = \sum_{m=0}^{L-1+p} e_n^2(m)$

- at values of $m$ near 0 (i.e., $m = 0, 1, \ldots, p-1$) we are predicting the signal from zero-valued samples outside the window range => $e_n(m)$ will be (relatively) large
- at values of $m$ near $L$ (i.e., $m = L, L+1, \ldots, L+p-1$) we are predicting zero-valued samples (outside the window range) from non-zero samples => $e_n(m)$ will be (relatively) large
- for these reasons, we normally use windows that taper the segment to zero (e.g., a Hamming window)

(Figure: a segment of length L selected at sample n; the windowed segment occupies $0 \le m \le L-1$ and the prediction error extends over $0 \le m \le L+p-1$.)

The Autocorrelation Method

$s_n[m] = s[m+n]\, w[m]$

$R_n[k] = \sum_{m=0}^{L-1-k} s_n[m]\, s_n[m+k], \quad k = 1, 2, \ldots, p$

(Figure: windowed segment $s_n[m] = s[m+n]w[m]$ and prediction error $e_n[m] = s_n[m] - \sum_{k=1}^{p}\alpha_k s_n[m-k]$; large errors occur at both ends of the window.)

Autocorrelation Method

For the calculation of $\phi_n(i,k)$: since $s_n(m) = 0$ outside the range $0 \le m \le L-1$,

$\phi_n(i,k) = \sum_{m=0}^{L-1+p} s_n(m-i)\, s_n(m-k), \quad 1 \le i \le p,\ 0 \le k \le p$

which is equivalent to the form

$\phi_n(i,k) = \sum_{m=0}^{L-1-(i-k)} s_n(m)\, s_n(m+i-k), \quad 1 \le i \le p,\ 0 \le k \le p$

There are $L - |i-k|$ non-zero terms in the computation of $\phi_n(i,k)$ for each value of $i$ and $k$; it can easily be shown that

$\phi_n(i,k) = f(i-k) = R_n(i-k), \quad 1 \le i \le p,\ 0 \le k \le p$

where $R_n(i-k)$ is the short-time autocorrelation of $s_n(m)$ evaluated at $i-k$, with

$R_n(k) = \sum_{m=0}^{L-1-k} s_n(m)\, s_n(m+k)$

Autocorrelation Method

Since $R_n(k)$ is an even function, we have

$\phi_n(i,k) = R_n(|i-k|), \quad 1 \le i \le p,\ 0 \le k \le p$

Thus the basic equation becomes

$\sum_{k=1}^{p} \alpha_k\, R_n(|i-k|) = R_n(i), \quad 1 \le i \le p$

with the minimum mean-squared prediction error of the form

$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k) = R_n(0) - \sum_{k=1}^{p} \alpha_k\, R_n(k)$

Expressed in matrix form this is

$\begin{bmatrix} R_n(0) & R_n(1) & \cdots & R_n(p-1) \\ R_n(1) & R_n(0) & \cdots & R_n(p-2) \\ \vdots & \vdots & & \vdots \\ R_n(p-1) & R_n(p-2) & \cdots & R_n(0) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} R_n(1) \\ R_n(2) \\ \vdots \\ R_n(p) \end{bmatrix}$

i.e., $R\,\alpha = r$, with solution $\alpha = R^{-1} r$.

$R$ is a $p \times p$ Toeplitz matrix (symmetric, with all elements along each diagonal equal), so there exist more efficient algorithms for solving for the $\{\alpha_k\}$ than simple matrix inversion. A small numerical sketch is given below.
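As a brief illustration (not in the original slides), the autocorrelation method for one frame can be sketched as follows; the frame length L, order p, and input signal are assumptions, and the direct Toeplitz solve stands in for the more efficient Levinson-Durbin recursion covered later:

% Hypothetical sketch of the autocorrelation method for one frame.
L = 301;  p = 10;                          % assumed frame length and model order
w = 0.54 - 0.46*cos(2*pi*(0:L-1)'/(L-1));  % Hamming window
x = s(1:L) .* w;                           % windowed segment s_n(m); s assumed a column vector of speech samples
R = zeros(p+1, 1);
for k = 0:p
    R(k+1) = sum(x(1:L-k) .* x(1+k:L));    % short-time autocorrelation R_n(k)
end
alpha = toeplitz(R(1:p)) \ R(2:p+1);       % solve R*alpha = r
G2 = R(1) - alpha' * R(2:p+1);             % squared model gain (see the gain discussion later)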

Covariance Method

There is a second basic approach to defining the speech segment $s_n(m)$ and the limits on the sums, namely to fix the interval over which the mean-squared error is computed (Assumption #2):

$E_n = \sum_{m=0}^{L-1} e_n^2(m) = \sum_{m=0}^{L-1} \Big(s_n(m) - \sum_{k=1}^{p} \alpha_k\, s_n(m-k)\Big)^2$

giving

$\phi_n(i,k) = \sum_{m=0}^{L-1} s_n(m-i)\, s_n(m-k), \quad 1 \le i \le p,\ 0 \le k \le p$

Changing the summation index gives

$\phi_n(i,k) = \sum_{m=-i}^{L-1-i} s_n(m)\, s_n(m+i-k), \quad 1 \le i \le p,\ 0 \le k \le p$

$\phi_n(i,k) = \sum_{m=-k}^{L-1-k} s_n(m)\, s_n(m+k-i), \quad 1 \le i \le p,\ 0 \le k \le p$

The key difference from the autocorrelation method is that the limits of summation include terms before $m = 0$, so the window effectively extends p samples backwards, from $s(n-p)$ to $s(n+L-1)$.

Covariance Method

Since we are extending the window backwards (to include the p samples before $m = 0$), we don't need to taper it using a Hamming window, since there is no transition at the window edges.

(Figure: the covariance-method analysis interval covers $0 \le m \le L-1$, with the p additional samples $-p \le m \le -1$ available for prediction.)

Covariance Method

We cannot use the autocorrelation formulation here: $\phi_n(i,k)$ is a true cross-correlation. We need to solve a set of equations of the form

$\sum_{k=1}^{p} \alpha_k\, \phi_n(i,k) = \phi_n(i,0), \quad i = 1, 2, \ldots, p$

$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k)$

In matrix form:

$\begin{bmatrix} \phi_n(1,1) & \phi_n(1,2) & \cdots & \phi_n(1,p) \\ \phi_n(2,1) & \phi_n(2,2) & \cdots & \phi_n(2,p) \\ \vdots & \vdots & & \vdots \\ \phi_n(p,1) & \phi_n(p,2) & \cdots & \phi_n(p,p) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} \phi_n(1,0) \\ \phi_n(2,0) \\ \vdots \\ \phi_n(p,0) \end{bmatrix}$

i.e., $\Phi\,\alpha = \psi$ or $\alpha = \Phi^{-1}\psi$. A small sketch of how $\Phi$ and $\psi$ are formed is given below.
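As an illustration (not in the original slides), forming the covariance-method quantities for one frame might look like the following sketch; here seg is an assumed column vector holding the L + p samples s(n-p), ..., s(n+L-1), with L and p as in the earlier sketch:

% Hypothetical sketch of the covariance method for one frame.
% seg(1:p) are the p samples before the analysis interval; seg(p+1:p+L) is the interval itself.
Phi = zeros(p, p);                       % phi_n(i,k), 1 <= i,k <= p
psi = zeros(p, 1);                       % phi_n(i,0)
for i = 1:p
    for k = 1:p
        Phi(i, k) = sum(seg(p+1-i:p+L-i) .* seg(p+1-k:p+L-k));
    end
    psi(i) = sum(seg(p+1-i:p+L-i) .* seg(p+1:p+L));
end
alpha = Phi \ psi;                       % symmetric positive-definite system (Cholesky applies, see later)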

Covariance Method

We have $\phi_n(i,k) = \phi_n(k,i)$, i.e., the matrix is symmetric but not Toeplitz; its diagonal elements are related as

$\phi_n(i+1, k+1) = \phi_n(i,k) + s_n(-i-1)\,s_n(-k-1) - s_n(L-1-i)\,s_n(L-1-k)$

e.g., $\phi_n(2,2) = \phi_n(1,1) + s_n(-2)\,s_n(-2) - s_n(L-2)\,s_n(L-2)$

- all terms $\phi_n(i,k)$ have a fixed number of terms (L terms) contributing to the computed values
- $\Phi$ is a covariance matrix, leading to a specialized solution for the $\{\alpha_k\}$ called the Covariance Method

Summary of LP

- use a pth-order linear predictor to predict $s(n)$ from the p previous samples
- minimize the mean-squared error, $E_n$, over an analysis window of duration L samples
- the solution for the optimum predictor coefficients, $\{\alpha_k\}$, is based on solving a matrix equation

Two solutions have evolved:
- autocorrelation method: the signal is windowed by a tapering window in order to minimize discontinuities at the beginning (predicting speech from zero-valued samples) and end (predicting zero-valued samples from speech samples) of the interval; the matrix $\phi_n(i,k)$ is shown to be an autocorrelation function; the resulting autocorrelation matrix is Toeplitz and can be readily solved using standard matrix solutions
- covariance method: the signal is extended by p samples outside the normal range of $0 \le m \le L-1$ to include the p samples occurring prior to $m = 0$; this eliminates large errors in predicting the signal from values prior to $m = 0$ (they are available) and eliminates the need for a tapering window; the resulting matrix of correlations is symmetric but not Toeplitz, leading to a different method of solution and a somewhat different set of optimal prediction coefficients, $\{\alpha_k\}$

LPC Summary

1. Speech Production Model:

$s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\,u(n)$

$H(z) = \dfrac{S(z)}{G\,U(z)} = \dfrac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}}$

2. Linear Prediction Model:

$\tilde{s}(n) = \sum_{k=1}^{p} \alpha_k\, s(n-k), \qquad P(z) = \dfrac{\tilde{S}(z)}{S(z)} = \sum_{k=1}^{p} \alpha_k z^{-k}$

$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k), \qquad A(z) = \dfrac{E(z)}{S(z)} = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}$

3. LPC Minimization:

$E_n = \sum_m e_n^2(m) = \sum_m \Big(s_n(m) - \sum_{k=1}^{p} \alpha_k\, s_n(m-k)\Big)^2$

$\dfrac{\partial E_n}{\partial \alpha_i} = 0, \quad i = 1, 2, \ldots, p$

$\sum_m s_n(m-i)\,s_n(m) = \sum_{k=1}^{p} \alpha_k \sum_m s_n(m-i)\,s_n(m-k)$

$\phi_n(i,k) = \sum_m s_n(m-i)\,s_n(m-k)$

$\sum_{k=1}^{p} \alpha_k\, \phi_n(i,k) = \phi_n(i,0), \quad i = 1, 2, \ldots, p$

LPC Summary

4. Autocorrelation Method:

$s_n(m) = s(m+n)\,w(m), \quad 0 \le m \le L-1$

$e_n(m) = s_n(m) - \sum_{k=1}^{p} \alpha_k\, s_n(m-k), \quad 0 \le m \le L-1+p$

- $s_n(m)$ defined for $0 \le m \le L-1$; $e_n(m)$ defined for $0 \le m \le L-1+p$
- large errors for $0 \le m \le p-1$ and for $L \le m \le L+p-1$

$E_n = \sum_{m=0}^{L-1+p} e_n^2(m)$

$\phi_n(i,k) = R_n(i-k) = \sum_{m=0}^{L-1-(i-k)} s_n(m)\, s_n(m+i-k)$

$\sum_{k=1}^{p} \alpha_k\, R_n(|i-k|) = R_n(i), \quad 1 \le i \le p$

$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k) = R_n(0) - \sum_{k=1}^{p} \alpha_k\, R_n(k)$

Resulting matrix equation:

$\begin{bmatrix} R_n(0) & R_n(1) & \cdots & R_n(p-1) \\ R_n(1) & R_n(0) & \cdots & R_n(p-2) \\ \vdots & \vdots & & \vdots \\ R_n(p-1) & R_n(p-2) & \cdots & R_n(0) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} R_n(1) \\ R_n(2) \\ \vdots \\ R_n(p) \end{bmatrix}$

i.e., $R\,\alpha = r$ or $\alpha = R^{-1}r$; the matrix equation is solved using the Levinson or Durbin method.

LPC Summary

5. Covariance Method:

Fix the interval for the error signal:

$E_n = \sum_{m=0}^{L-1} e_n^2(m) = \sum_{m=0}^{L-1} \Big(s_n(m) - \sum_{k=1}^{p} \alpha_k\, s_n(m-k)\Big)^2$

- need the signal from $s(n-p)$ to $s(n+L-1)$, i.e., $L + p$ samples

$\sum_{k=1}^{p} \alpha_k\, \phi_n(i,k) = \phi_n(i,0), \quad i = 1, 2, \ldots, p$

$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k)$

Expressed as a matrix equation:

$\begin{bmatrix} \phi_n(1,1) & \phi_n(1,2) & \cdots & \phi_n(1,p) \\ \phi_n(2,1) & \phi_n(2,2) & \cdots & \phi_n(2,p) \\ \vdots & \vdots & & \vdots \\ \phi_n(p,1) & \phi_n(p,2) & \cdots & \phi_n(p,p) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} \phi_n(1,0) \\ \phi_n(2,0) \\ \vdots \\ \phi_n(p,0) \end{bmatrix}$

i.e., $\Phi\,\alpha = \psi$ or $\alpha = \Phi^{-1}\psi$, with $\Phi$ a symmetric matrix.

Computation of Model Gain

It is reasonable to expect the model gain, G, to be determined by matching the signal energy with the energy of the linearly predicted samples.

From the basic model equations we have

$G\,u(n) = s(n) - \sum_{k=1}^{p} a_k\, s(n-k)$  (model)

whereas for the prediction error we have

$e(n) = s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k)$  (best fit to model)

When $\alpha_k = a_k$ (i.e., a perfect match to the model), then $e(n) = G\,u(n)$.

Since it is virtually impossible to guarantee that $\alpha_k = a_k$, we cannot use this simple matching property for determining the gain; instead we use an energy matching criterion (energy in the error signal = energy in the excitation):

$G^2 \sum_{m=0}^{L-1+p} u^2(m) = \sum_{m=0}^{L-1+p} e^2(m) = E_n$

Gain Assumptions

Assumptions about the excitation are needed to solve for G:
- voiced speech: $u(n) = \delta(n)$, with L on the order of a single pitch period; the predictor order, p, is large enough to model the glottal pulse shape, vocal tract impulse response, and radiation
- unvoiced speech: $u(n)$ is a zero-mean, unity-variance, stationary white noise process

Solution for Gain (Voiced)

For voiced speech the excitation is $G\,\delta(n)$ with output $\tilde{h}(n)$ (since it is the impulse response of the system):

$\tilde{h}(n) = \sum_{k=1}^{p} \alpha_k\, \tilde{h}(n-k) + G\,\delta(n), \qquad \tilde{H}(z) = \dfrac{G}{A(z)} = \dfrac{G}{1 - \sum_{k=1}^{p} \alpha_k z^{-k}}$

with autocorrelation $\tilde{R}(m)$ (of the impulse response) satisfying the relations shown below:

$\tilde{R}(m) = \sum_{n=0}^{\infty} \tilde{h}(n)\, \tilde{h}(m+n), \quad 0 \le m < \infty$

$\tilde{R}(m) = \sum_{k=1}^{p} \alpha_k\, \tilde{R}(|m-k|), \quad 1 \le m < \infty$

$\tilde{R}(0) = \sum_{k=1}^{p} \alpha_k\, \tilde{R}(k) + G^2, \quad m = 0$

Solution for Gain (Voiced)

Since $\tilde{R}(m)$ and $R_n(m)$ satisfy relations of identical form, it follows that

$\tilde{R}(m) = c\, R_n(m), \quad 0 \le m \le p$

where c is a constant to be determined. Since the total energies in the signal ($R_n(0)$) and in the impulse response ($\tilde{R}(0)$) must be equal, the constant c must be 1, and we obtain the relation

$G^2 = R_n(0) - \sum_{k=1}^{p} \alpha_k\, R_n(k)$

Since $\tilde{R}(m) = R_n(m),\ 0 \le m \le p$, and the energy of the impulse response equals the energy of the signal, the first $p+1$ coefficients of the autocorrelation of the impulse response of the model are identical to the first $p+1$ coefficients of the autocorrelation function of the speech signal. This condition is called the autocorrelation matching property of the autocorrelation method.

Solution for Gain (Unvoiced)

For unvoiced speech the input is white noise with zero mean and unity variance, i.e.,

$E[u(n)\,u(n-m)] = \delta(m)$

If we excite the system with input $G\,u(n)$ and call the output $\tilde{g}(n)$, then

$\tilde{g}(n) = \sum_{k=1}^{p} \alpha_k\, \tilde{g}(n-k) + G\,u(n)$

Letting $\tilde{R}(m)$ denote the autocorrelation of $\tilde{g}(n)$ gives

$\tilde{R}(m) = E[\tilde{g}(n)\, \tilde{g}(n-m)] = \sum_{k=1}^{p} \alpha_k\, E[\tilde{g}(n-k)\,\tilde{g}(n-m)] + E[G\,u(n)\,\tilde{g}(n-m)] = \sum_{k=1}^{p} \alpha_k\, \tilde{R}(m-k), \quad m \ne 0$

since $E[G\,u(n)\,\tilde{g}(n-m)] = 0$ for $m > 0$, because $u(n)$ is uncorrelated with any signal prior to $u(n)$.

(Since the autocorrelation function of the output is the convolution of the autocorrelation function of the impulse response with the autocorrelation function of the white noise input, $E[\tilde{g}[n]\,\tilde{g}[n-m]] = \tilde{R}[m] * \delta[m] = \tilde{R}[m]$.)

Solution for Gain (Unvoiced)

For m = 0 we get

$\tilde{R}(0) = \sum_{k=1}^{p} \alpha_k\, \tilde{R}(k) + G\,E[u(n)\,\tilde{g}(n)] = \sum_{k=1}^{p} \alpha_k\, \tilde{R}(k) + G^2$

since $E[u(n)\,\tilde{g}(n)] = E[u(n)\,(G\,u(n) + \text{terms prior to } n)] = G$.

Since the energy in the signal must equal the energy in the response to $G\,u(n)$, we get $\tilde{R}(m) = R_n(m)$ and

$G^2 = R_n(0) - \sum_{k=1}^{p} \alpha_k\, R_n(k)$

Frequency Domain Interpretations of Linear Predictive Analysis

The Resulting LPC Model

The final LPC model consists of the LPC parameters, $\{\alpha_k\},\ k = 1, 2, \ldots, p$, and the gain, G, which together define the system function

$\tilde{H}(z) = \dfrac{G}{1 - \sum_{k=1}^{p} \alpha_k z^{-k}}$

with frequency response

$\tilde{H}(e^{j\omega}) = \dfrac{G}{1 - \sum_{k=1}^{p} \alpha_k e^{-j\omega k}} = \dfrac{G}{A(e^{j\omega})}$

and with the gain determined by matching the energy of the model to the short-time energy of the speech signal.

LPC Spectrum

LP analysis is seen to be a method of short-time spectrum estimation with removal of the excitation fine structure (a form of wideband spectrum analysis).

LP Short-Time Spectrum Analysis

Define the speech segment as:

$s_n[m] = s[m+n]\, w[m]$

The discrete-time Fourier transform of this windowed segment is:

$S_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} s[m+n]\, w[m]\, e^{-j\omega m}$

The LP model spectrum is

$\tilde{H}(e^{j\omega}) = \dfrac{G}{1 - \sum_{k=1}^{p} \alpha_k\, e^{-j\omega k}}$

with

$G^2 = E_n = \sum_m (e_n(m))^2 = R_n(0) - \sum_{k=1}^{p} \alpha_k\, R_n(k)$

In MATLAB (autolpc is a course-toolbox LP analysis routine, not a built-in function):

x = s .* hamming(301);        % windowed segment (301-sample Hamming window)
X = fft(x, 1000);             % short-time Fourier transform
[A, G, r] = autolpc(x, 10);   % p = 10 LP analysis; A, G, r assumed to be A(z) coefficients, gain, autocorrelation
H = G ./ fft(A, 1000);        % LP model spectrum G / A(e^{jw})

(Figure: (a) voiced speech segment obtained using a Hamming window; (b) corresponding short-time autocorrelation function used in LP analysis, heavy line showing the values used; (c) corresponding short-time log magnitude Fourier transform and short-time log magnitude LPC spectrum, FS = 16 kHz.)

The short-time Fourier transform and the LP spectrum are linked via the short-time autocorrelation.
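As a possible continuation of the MATLAB snippet above (not in the original slides), the two spectra can be compared directly; the variable names and the 16 kHz sampling rate follow the snippet and figure caption:

% Compare the short-time log magnitude spectrum with the LP model spectrum.
Fs = 16000;                              % sampling rate assumed from the figure (FS = 16 kHz)
f = (0:499) * Fs / 1000;                 % frequencies of the first 500 DFT bins
plot(f, 20*log10(abs(X(1:500))), f, 20*log10(abs(H(1:500))));
xlabel('Frequency (Hz)'); ylabel('Log magnitude (dB)');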

LP Short-Time Spectrum Analysis

(Figure: (a) unvoiced speech segment obtained using a Hamming window; (b) corresponding short-time autocorrelation function used in LP analysis, heavy line showing the values used; (c) corresponding short-time log magnitude Fourier transform and short-time log magnitude LPC spectrum, FS = 16 kHz.)

Frequency Domain Interpretation of Mean-Squared Prediction Error

The LP spectrum provides a basis for examining the properties of the prediction error (or, equivalently, the excitation of the vocal tract).

The mean-squared prediction error at sample n is:

$E_n = \sum_{m=0}^{L+p-1} e_n^2[m]$

which, by Parseval's theorem, can be expressed as:

$E_n = \dfrac{1}{2\pi} \int_{-\pi}^{\pi} |E_n(e^{j\omega})|^2\, d\omega = \dfrac{1}{2\pi} \int_{-\pi}^{\pi} |S_n(e^{j\omega})|^2\, |A(e^{j\omega})|^2\, d\omega$

where $S_n(e^{j\omega})$ is the Fourier transform of $s_n[m]$ and $A(e^{j\omega})$ is the corresponding prediction error frequency response

$A(e^{j\omega}) = 1 - \sum_{k=1}^{p} \alpha_k\, e^{-j\omega k}$
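As a quick numerical sanity check (not in the original slides) of the Parseval relation above, assuming a windowed frame x of length L and predictor coefficients alpha of order p from one of the earlier sketches:

% Time-domain vs. frequency-domain computation of the mean-squared prediction error.
N = 4096;                                                    % DFT length, N >= L + p
e = filter([1; -alpha], 1, [x; zeros(p, 1)]);                % e_n[m] for 0 <= m <= L+p-1
En_time = sum(e.^2);
En_freq = mean(abs(fft(x, N) .* fft([1; -alpha], N)).^2);    % (1/N) * sum |S_n(e^jw) A(e^jw)|^2
% En_time and En_freq should agree to within numerical precision.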

Frequency Domain Interpretation of Mean-Squared Prediction Error

The LP spectrum is of the form:

$\tilde{H}(e^{j\omega}) = \dfrac{G}{A(e^{j\omega})}$

Thus we can express the mean-squared error as:

$E_n = \dfrac{G^2}{2\pi} \int_{-\pi}^{\pi} \dfrac{|S_n(e^{j\omega})|^2}{|\tilde{H}(e^{j\omega})|^2}\, d\omega$

Since $E_n = G^2$, we see that minimizing the total squared prediction error is equivalent to finding the gain and predictor coefficients such that the integral of the ratio of the energy spectrum of the speech segment to the magnitude squared of the frequency response of the model linear system is unity.

Thus $|S_n(e^{j\omega})|^2$ can be interpreted as a frequency-domain weighting function: LP weights frequencies where $|S_n(e^{j\omega})|^2$ is large more heavily than those where $|S_n(e^{j\omega})|^2$ is small.

LP Interpretation Example 1

(Figure: much better spectral matches to STFT spectral peaks than to STFT spectral valleys, as predicted by the spectral interpretation of error minimization.)

LP Interpretation Example 2

(Figure: note the small differences in spectral shape between the STFT, the autocorrelation-method spectrum, and the covariance-method spectrum when using a short window duration, L = 51 samples.)

Effects of Model Order

The autocorrelation function, $R_n[m]$, of the speech segment, $s_n[m]$, and the autocorrelation function, $\tilde{R}[m]$, of the impulse response, $\tilde{h}[m]$, corresponding to the system function, $\tilde{H}(z)$, are equal for the first $(p+1)$ values. Thus, as $p \to \infty$, the autocorrelation functions are equal for all values, and thus:

$\lim_{p \to \infty} |\tilde{H}(e^{j\omega})|^2 = |S_n(e^{j\omega})|^2$

Thus if p is large enough, the frequency response of the all-pole model, $\tilde{H}(e^{j\omega})$, can approximate the signal spectrum with arbitrarily small error.

Effects of Model Order

(Figures: Fourier transform of the segment and LP spectra for various model orders.)

- as p increases, more details of the spectrum are preserved
- need to choose a value of p that represents the spectral effects of the glottal pulse, vocal tract and radiation, and nothing else

Linear Prediction Spectrogram

The speech spectrogram was previously defined as:

$20 \log |S_r[k]| = 20 \log \Big| \sum_{m=0}^{L-1} s[rR+m]\, w[m]\, e^{-j(2\pi/N)km} \Big|$

for a set of times $t_r = rRT$ and a set of frequencies $F_k = kF_S/N,\ k = 1, 2, \ldots, N/2$, where R is the time shift (in samples) between adjacent STFTs, T is the sampling period, $F_S = 1/T$ is the sampling frequency, and N is the size of the discrete Fourier transform used to compute each STFT estimate.

Similarly, we can define the LP spectrogram as an image plot of:

$20 \log |\tilde{H}_r[k]| = 20 \log \left| \dfrac{G_r}{A_r(e^{j(2\pi/N)k})} \right|$

where $G_r$ and $A_r(e^{j(2\pi/N)k})$ are the gain and the prediction error polynomial at analysis time rR.

(Figure: LP spectrogram with L = 81, R = 3, N = 1000, 40 dB dynamic range.)
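A rough sketch (not in the original slides) of how such an LP spectrogram could be computed, using the frame parameters from the figure caption and an assumed model order p; s is assumed to be a column vector holding the speech signal:

% Hypothetical LP spectrogram computation.
L = 81;  R = 3;  N = 1000;  p = 12;                 % L, R, N from the figure; p assumed
w = 0.54 - 0.46*cos(2*pi*(0:L-1)'/(L-1));           % Hamming window
nFrames = floor((length(s) - L)/R) + 1;
LPgram = zeros(N/2, nFrames);
for rFrame = 1:nFrames
    x = s((rFrame-1)*R + (1:L)) .* w;               % windowed frame at time rR
    Rx = zeros(p+1, 1);
    for k = 0:p
        Rx(k+1) = sum(x(1:L-k) .* x(1+k:L));        % short-time autocorrelation
    end
    alpha = toeplitz(Rx(1:p)) \ Rx(2:p+1);          % autocorrelation-method solution
    G = sqrt(Rx(1) - alpha' * Rx(2:p+1));           % model gain
    A = fft([1; -alpha], N);
    LPgram(:, rFrame) = 20*log10(abs(G ./ A(1:N/2)));
end
imagesc(LPgram); axis xy;                           % image plot of 20*log|H~_r[k]|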

Comparison to Other Spectrum Analysis Methods

(Figure: spectra of the synthetic vowel /IY/: (a) narrowband spectrum using a 40 msec window; (b) wideband spectrum using a 10 msec window; (c) cepstrally smoothed spectrum; (d) LPC spectrum from a 40 msec section using a p = 12 order LPC analysis.)

(Figure: natural speech spectral estimates using cepstral smoothing (solid line) and linear prediction analysis (dashed line).)

Note the fewer (spurious) peaks in the LP analysis spectrum, since LP used p = 12, which restricted the spectral match to a maximum of 6 resonance peaks.

Note the narrow bandwidths of the LP resonances versus the cepstrally smoothed resonances.

Selective Linear Prediction

It is possible to apply LP methods to selected parts of the spectrum:
- 0-4 kHz for voiced sounds, using a predictor of order p1
- 4-8 kHz for unvoiced sounds, using a predictor of order p2

The key idea is to map the frequency region $\{f_A, f_B\}$ linearly to $\{0, 0.5\}$, or, equivalently, to map the region $\{2\pi f_A, 2\pi f_B\}$ linearly to $\{0, \pi\}$, via the transformation

$\omega' = \dfrac{\pi\,(\omega - 2\pi f_A)}{2\pi f_B - 2\pi f_A}$

We must modify the calculation of the autocorrelation accordingly to give:

$R'_n(m) = \dfrac{1}{2\pi} \int_{-\pi}^{\pi} |S'_n(e^{j\omega'})|^2\, e^{j\omega' m}\, d\omega'$

(Figure: a 0-10 kHz region modeled using p = 28 shows no discontinuity in the model spectrum at 5 kHz, whereas separate selective LP models, the 0-5 kHz region using p1 = 14 and the 5-10 kHz region using p2 = 5, show a discontinuity in the model spectra at 5 kHz.)

Solutions of LPC Equations

Covariance Method (Cholesky Decomposition Method)

LPC Solutions: Covariance Method

For the covariance method we need to solve the matrix equation

$\sum_{k=1}^{p} \alpha_k\, \phi_n(i,k) = \phi_n(i,0), \quad i = 1, 2, \ldots, p$

or, in matrix notation, $\Phi\,\alpha = \psi$.

$\Phi$ is a positive definite, symmetric matrix with $(i,j)$ element $\phi_n(i,j)$, and $\alpha$ and $\psi$ are column vectors with elements $\alpha_i$ and $\phi_n(i,0)$.

The solution of the matrix equation is called the Cholesky decomposition, or square-root method:

$\Phi = V\,D\,V^t$, where V is a lower triangular matrix with 1's on the main diagonal and D is a diagonal matrix.

LPC Solutions: Covariance Method

We can readily determine the elements of V and D by solving for the $(i,j)$ elements of the matrix equation, as follows:

$\phi_n(i,j) = \sum_{k=1}^{j} V_{ik}\, d_k\, V_{jk}, \quad 1 \le j \le i-1$

giving

$V_{ij}\, d_j = \phi_n(i,j) - \sum_{k=1}^{j-1} V_{ik}\, d_k\, V_{jk}, \quad 1 \le j \le i-1$

and for the diagonal elements

$\phi_n(i,i) = \sum_{k=1}^{i} V_{ik}\, d_k\, V_{ik}$

giving

$d_i = \phi_n(i,i) - \sum_{k=1}^{i-1} V_{ik}^2\, d_k, \quad i \ge 2$

with

$d_1 = \phi_n(1,1)$

Cholesky Decomposition Example

Consider an example with $p = 4$ and matrix elements $\phi_n(i,j) = \phi_{ij}$:

$\begin{bmatrix} \phi_{11} & \phi_{21} & \phi_{31} & \phi_{41} \\ \phi_{21} & \phi_{22} & \phi_{32} & \phi_{42} \\ \phi_{31} & \phi_{32} & \phi_{33} & \phi_{43} \\ \phi_{41} & \phi_{42} & \phi_{43} & \phi_{44} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ V_{21} & 1 & 0 & 0 \\ V_{31} & V_{32} & 1 & 0 \\ V_{41} & V_{42} & V_{43} & 1 \end{bmatrix} \begin{bmatrix} d_1 & 0 & 0 & 0 \\ 0 & d_2 & 0 & 0 \\ 0 & 0 & d_3 & 0 \\ 0 & 0 & 0 & d_4 \end{bmatrix} \begin{bmatrix} 1 & V_{21} & V_{31} & V_{41} \\ 0 & 1 & V_{32} & V_{42} \\ 0 & 0 & 1 & V_{43} \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Cholesky Decomposition Example

Solve the matrix equation for $d_1, V_{21}, V_{31}, V_{41}, d_2, V_{32}, V_{42}, d_3, V_{43}, d_4$:

$d_1 = \phi_{11}$

$V_{21} d_1 = \phi_{21} \Rightarrow V_{21} = \phi_{21}/d_1; \quad V_{31} d_1 = \phi_{31} \Rightarrow V_{31} = \phi_{31}/d_1; \quad V_{41} d_1 = \phi_{41} \Rightarrow V_{41} = \phi_{41}/d_1$

$d_2 = \phi_{22} - V_{21}^2 d_1$

$V_{32} d_2 = \phi_{32} - V_{31} d_1 V_{21} \Rightarrow V_{32} = (\phi_{32} - V_{31} d_1 V_{21})/d_2$

$V_{42} d_2 = \phi_{42} - V_{41} d_1 V_{21} \Rightarrow V_{42} = (\phi_{42} - V_{41} d_1 V_{21})/d_2$

Iterate the procedure to solve for $d_3, V_{43}, d_4$.

LPC Solutions: Covariance Method

Now we need to solve for $\alpha$ using a multi-step procedure, $V D V^t \alpha = \psi$:
- step 1: write this as $V Y = \psi$ with $Y = D V^t \alpha$
- step 2: then $D V^t \alpha = Y$, or $V^t \alpha = D^{-1} Y$
- step 3: from V (which is now known), solve for the column vector Y using a simple recursion of the form

$Y_i = \psi_i - \sum_{j=1}^{i-1} V_{ij}\, Y_j, \quad 2 \le i \le p$

with initial condition $Y_1 = \psi_1$

- step 4: then solve for $\alpha$ by back substitution (next slide)

Cholesky Decomposition Example

Continuing the example, we solve for Y:

$\begin{bmatrix} 1 & 0 & 0 & 0 \\ V_{21} & 1 & 0 & 0 \\ V_{31} & V_{32} & 1 & 0 \\ V_{41} & V_{42} & V_{43} & 1 \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix} = \begin{bmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{bmatrix}$

First, solving for $Y_1$ through $Y_4$ we get:

$Y_1 = \psi_1$
$Y_2 = \psi_2 - V_{21} Y_1$
$Y_3 = \psi_3 - V_{31} Y_1 - V_{32} Y_2$
$Y_4 = \psi_4 - V_{41} Y_1 - V_{42} Y_2 - V_{43} Y_3$

LPC Solutions: Covariance Method

Now we can solve for $\alpha$ using the recursion

$\alpha_i = Y_i/d_i - \sum_{j=i+1}^{p} V_{ji}\, \alpha_j, \quad 1 \le i \le p-1$

with initial condition $\alpha_p = Y_p/d_p$; the calculation proceeds backwards from $i = p-1$ down to $i = 1$.

Cholesky Decomposition Example

Next solve for $\alpha$ from the equation $V^t \alpha = D^{-1} Y$:

$\begin{bmatrix} 1 & V_{21} & V_{31} & V_{41} \\ 0 & 1 & V_{32} & V_{42} \\ 0 & 0 & 1 & V_{43} \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{bmatrix} = \begin{bmatrix} Y_1/d_1 \\ Y_2/d_2 \\ Y_3/d_3 \\ Y_4/d_4 \end{bmatrix}$

giving the results

$\alpha_4 = Y_4/d_4$
$\alpha_3 = Y_3/d_3 - V_{43}\,\alpha_4$
$\alpha_2 = Y_2/d_2 - V_{32}\,\alpha_3 - V_{42}\,\alpha_4$
$\alpha_1 = Y_1/d_1 - V_{21}\,\alpha_2 - V_{31}\,\alpha_3 - V_{41}\,\alpha_4$

completing the solution.

Covariance Method Minimum Error

The minimum mean-squared error can be written in the form

$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k) = \phi_n(0,0) - \alpha^t \psi$

Since $\alpha^t \psi = Y^t D^{-1} V^{-1} \psi = Y^t D^{-1} Y$, we can write this as

$E_n = \phi_n(0,0) - Y^t D^{-1} Y = \phi_n(0,0) - \sum_{k=1}^{p} Y_k^2 / d_k$

This computation of $E_n$ can be carried out for all values of LP order from 1 to p, so we can understand how the LP order reduces the mean-squared error.
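As an illustration (not in the original slides), the LDL^t (Cholesky / square-root) solution above can be sketched as follows, assuming Phi, psi, and phi00 = phi_n(0,0) have been formed as in the earlier covariance-method sketch:

% Hypothetical sketch of the Cholesky (LDL^t) solution of Phi*alpha = psi.
p = size(Phi, 1);
V = eye(p);  d = zeros(p, 1);
for i = 1:p
    d(i) = Phi(i,i) - sum(V(i,1:i-1).^2 .* d(1:i-1)');                    % diagonal elements d_i
    for j = i+1:p
        V(j,i) = (Phi(j,i) - sum(V(j,1:i-1).*V(i,1:i-1).*d(1:i-1)')) / d(i);
    end
end
Y = V \ psi;                        % step 1: forward substitution, V*Y = psi
alpha = V' \ (Y ./ d);              % steps 2-4: back substitution, V'*alpha = D^{-1}*Y
En = phi00 - sum(Y.^2 ./ d);        % minimum mean-squared error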

Solutions of LPC Equations

Autocorrelation Method via the Levinson-Durbin Algorithm

Levinson-Durbin Algorithm 1

The autocorrelation equations (at each frame n) are:

$\sum_{k=1}^{p} \alpha_k\, R[|i-k|] = R[i], \quad 1 \le i \le p$

or $R\,\alpha = r$, i.e.,

$\begin{bmatrix} R[0] & R[1] & R[2] & \cdots & R[p-1] \\ R[1] & R[0] & R[1] & \cdots & R[p-2] \\ R[2] & R[1] & R[0] & \cdots & R[p-3] \\ \vdots & \vdots & \vdots & & \vdots \\ R[p-1] & R[p-2] & R[p-3] & \cdots & R[0] \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} R[1] \\ R[2] \\ R[3] \\ \vdots \\ R[p] \end{bmatrix}$

R is a positive definite, symmetric Toeplitz matrix.

The set of optimum predictor coefficients satisfies:

$R[i] - \sum_{k=1}^{p} \alpha_k\, R[|i-k|] = 0, \quad 1 \le i \le p$

with minimum mean-squared prediction error

$E^{(p)} = R[0] - \sum_{k=1}^{p} \alpha_k\, R[k]$

Levinson-Durbin Algorithm 2

By combining the last two equations we get a larger matrix equation of the form:

$\begin{bmatrix} R[0] & R[1] & R[2] & \cdots & R[p] \\ R[1] & R[0] & R[1] & \cdots & R[p-1] \\ R[2] & R[1] & R[0] & \cdots & R[p-2] \\ \vdots & \vdots & \vdots & & \vdots \\ R[p] & R[p-1] & R[p-2] & \cdots & R[0] \end{bmatrix} \begin{bmatrix} 1 \\ -\alpha_1^{(p)} \\ -\alpha_2^{(p)} \\ \vdots \\ -\alpha_p^{(p)} \end{bmatrix} = \begin{bmatrix} E^{(p)} \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$

The expanded $(p+1) \times (p+1)$ matrix is still Toeplitz, and the equation can be solved iteratively by incorporating a new correlation value at each iteration and solving for the next higher-order predictor in terms of the new correlation value and the previous predictor.

Levinson-Durbin Algorithm 3

We show how the ith-order solution can be derived from the (i-1)st-order solution; i.e., given $\alpha^{(i-1)}$, the solution to $R^{(i-1)}\,a^{(i-1)} = e^{(i-1)}$, we derive the solution to $R^{(i)}\,a^{(i)} = e^{(i)}$.

The (i-1)st solution can be expressed as:

$\begin{bmatrix} R[0] & R[1] & \cdots & R[i-1] \\ R[1] & R[0] & \cdots & R[i-2] \\ \vdots & \vdots & & \vdots \\ R[i-1] & R[i-2] & \cdots & R[0] \end{bmatrix} \begin{bmatrix} 1 \\ -\alpha_1^{(i-1)} \\ \vdots \\ -\alpha_{i-1}^{(i-1)} \end{bmatrix} = \begin{bmatrix} E^{(i-1)} \\ 0 \\ \vdots \\ 0 \end{bmatrix}$

Levinson-Durbin Algorithm 4

Appending a 0 to the vector $a^{(i-1)}$ and multiplying by the matrix $R^{(i)}$ gives a new set of $(i+1)$ equations of the form:

$\begin{bmatrix} R[0] & R[1] & \cdots & R[i-1] & R[i] \\ R[1] & R[0] & \cdots & R[i-2] & R[i-1] \\ \vdots & \vdots & & \vdots & \vdots \\ R[i-1] & R[i-2] & \cdots & R[0] & R[1] \\ R[i] & R[i-1] & \cdots & R[1] & R[0] \end{bmatrix} \begin{bmatrix} 1 \\ -\alpha_1^{(i-1)} \\ \vdots \\ -\alpha_{i-1}^{(i-1)} \\ 0 \end{bmatrix} = \begin{bmatrix} E^{(i-1)} \\ 0 \\ \vdots \\ 0 \\ \gamma^{(i-1)} \end{bmatrix}$

where the new quantities $\gamma^{(i-1)} = R[i] - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} R[i-j]$ and $R[i]$ have been introduced.

Levinson-Durbin Algorithm 5

Key step: since the Toeplitz matrix has a special symmetry, we can reverse the order of the equations (first equation last, last equation first), giving:

$\begin{bmatrix} R[0] & R[1] & \cdots & R[i-1] & R[i] \\ R[1] & R[0] & \cdots & R[i-2] & R[i-1] \\ \vdots & \vdots & & \vdots & \vdots \\ R[i-1] & R[i-2] & \cdots & R[0] & R[1] \\ R[i] & R[i-1] & \cdots & R[1] & R[0] \end{bmatrix} \begin{bmatrix} 0 \\ -\alpha_{i-1}^{(i-1)} \\ \vdots \\ -\alpha_1^{(i-1)} \\ 1 \end{bmatrix} = \begin{bmatrix} \gamma^{(i-1)} \\ 0 \\ \vdots \\ 0 \\ E^{(i-1)} \end{bmatrix}$

Levinson-Durbin Algorithm 6

To get the equation into the desired form (a single non-zero component, $E^{(i)}$, in the right-hand-side vector), we combine the two sets of equations (with a multiplicative factor $k_i$), giving:

$R^{(i)} \left( \begin{bmatrix} 1 \\ -\alpha_1^{(i-1)} \\ \vdots \\ -\alpha_{i-1}^{(i-1)} \\ 0 \end{bmatrix} - k_i \begin{bmatrix} 0 \\ -\alpha_{i-1}^{(i-1)} \\ \vdots \\ -\alpha_1^{(i-1)} \\ 1 \end{bmatrix} \right) = \begin{bmatrix} E^{(i-1)} - k_i\,\gamma^{(i-1)} \\ 0 \\ \vdots \\ 0 \\ \gamma^{(i-1)} - k_i\,E^{(i-1)} \end{bmatrix}$

Choose $k_i$ so that the vector on the right has only a single non-zero entry, i.e.,

$k_i = \dfrac{\gamma^{(i-1)}}{E^{(i-1)}} = \dfrac{R[i] - \sum_{j=1}^{i-1} \alpha_j^{(i-1)}\, R[i-j]}{E^{(i-1)}}$

Levinson-Durbin Algorithm 7

The first element of the right-hand-side vector is now:

$E^{(i)} = E^{(i-1)} - k_i\,\gamma^{(i-1)} = E^{(i-1)}\,(1 - k_i^2)$

With this choice of $k_i$, the vector of ith-order predictor coefficients is:

$\begin{bmatrix} 1 \\ -\alpha_1^{(i)} \\ -\alpha_2^{(i)} \\ \vdots \\ -\alpha_i^{(i)} \end{bmatrix} = \begin{bmatrix} 1 \\ -\alpha_1^{(i-1)} \\ -\alpha_2^{(i-1)} \\ \vdots \\ 0 \end{bmatrix} - k_i \begin{bmatrix} 0 \\ -\alpha_{i-1}^{(i-1)} \\ -\alpha_{i-2}^{(i-1)} \\ \vdots \\ 1 \end{bmatrix}$

yielding the updating procedure

$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\,\alpha_{i-j}^{(i-1)}, \quad j = 1, 2, \ldots, i-1$
$\alpha_i^{(i)} = k_i$

The $k_i$ parameters are called PARCOR coefficients.

The final solution for order p is:

$\alpha_j = \alpha_j^{(p)}, \quad 1 \le j \le p$

with prediction error

$E^{(p)} = E^{(0)} \prod_{m=1}^{p} (1 - k_m^2) = R[0] \prod_{m=1}^{p} (1 - k_m^2)$

If we use normalized autocorrelation coefficients $r[k] = R[k]/R[0]$, we get normalized errors of the form:

$V^{(i)} = \dfrac{E^{(i)}}{R[0]} = 1 - \sum_{k=1}^{i} \alpha_k^{(i)}\, r[k] = \prod_{m=1}^{i} (1 - k_m^2)$

where $0 < V^{(i)} \le 1$ and $-1 < k_i < 1$. A compact sketch of the full recursion is given after this slide.
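A compact sketch of the recursion just described (not in the original slides); the function name and interface are illustrative, with R assumed to hold R[0], ..., R[p] in R(1), ..., R(p+1):

% Hypothetical Levinson-Durbin recursion for the autocorrelation method.
function [alpha, E, k] = levdur_sketch(R, p)
    E = R(1);                               % E^(0) = R[0]
    alpha = zeros(p, 1);                    % predictor coefficients
    k = zeros(p, 1);                        % PARCOR coefficients
    for i = 1:p
        gamma = R(i+1);                     % gamma^(i-1) = R[i] - sum_j alpha_j^(i-1) R[i-j]
        for j = 1:i-1
            gamma = gamma - alpha(j) * R(i-j+1);
        end
        k(i) = gamma / E;
        prev = alpha;                       % alpha^(i-1)
        alpha(i) = k(i);                    % alpha_i^(i) = k_i
        for j = 1:i-1
            alpha(j) = prev(j) - k(i) * prev(i-j);   % alpha_j^(i) update
        end
        E = E * (1 - k(i)^2);               % E^(i) = E^(i-1)(1 - k_i^2)
    end
end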

Levinson-Durbin Algorithm

Equivalently, in terms of the prediction error polynomials:

$A^{(i)}(z) = A^{(i-1)}(z) - k_i\, z^{-i} A^{(i-1)}(z^{-1})$

Autocorrelation Example

Consider a simple p = 2 solution of the form

$\begin{bmatrix} R(0) & R(1) \\ R(1) & R(0) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \begin{bmatrix} R(1) \\ R(2) \end{bmatrix}$

with solution

$E^{(0)} = R(0)$
$k_1 = R(1)/R(0)$
$\alpha_1^{(1)} = R(1)/R(0)$
$E^{(1)} = \dfrac{R^2(0) - R^2(1)}{R(0)}$
$k_2 = \dfrac{R(2)R(0) - R^2(1)}{R^2(0) - R^2(1)}$
$\alpha_2^{(2)} = \dfrac{R(2)R(0) - R^2(1)}{R^2(0) - R^2(1)}$
$\alpha_1^{(2)} = \dfrac{R(1)R(0) - R(1)R(2)}{R^2(0) - R^2(1)}$

with final coefficients

$\alpha_1 = \alpha_1^{(2)}, \qquad \alpha_2 = \alpha_2^{(2)}$

Prediction Error as a Function of p

$V_n = \dfrac{E_n}{R_n[0]} = 1 - \sum_{k=1}^{p} \alpha_k\, \dfrac{R_n[k]}{R_n[0]}$

$E^{(i)}$ = prediction error for a predictor of order i.

The model order is usually determined by the following rule of thumb:
- Fs/1000 poles for the vocal tract
- 2-4 poles for radiation
- 2 poles for the glottal pulse
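As a quick illustration (not in the original slides), the p = 2 closed-form solution above can be checked against the recursion sketch given earlier, using made-up autocorrelation values:

% Hypothetical check of the p = 2 example with the levdur_sketch function above.
R = [1.0; 0.8; 0.5];                     % assumed values of R(0), R(1), R(2)
[alpha, E, k] = levdur_sketch(R, 2);     % alpha(1), alpha(2) should match the closed-form expressions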

Autocorrelation Method Properties

- mean-squared prediction error is always non-zero and decreases monotonically with increasing model order
- autocorrelation matching property: model and data autocorrelations match up to order p
- spectrum matching property: favors the peaks of the short-time Fourier transform
- minimum-phase property: the zeros of A(z) are inside the unit circle
- Levinson-Durbin recursion: an efficient algorithm for finding the prediction coefficients; the PARCOR coefficients and the mean-squared error are by-products
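The minimum-phase property can be checked numerically (a sketch, not in the original slides), assuming alpha was obtained with the autocorrelation method:

% Check that all zeros of A(z) = 1 - sum alpha_k z^-k lie inside the unit circle.
z = roots([1; -alpha]);
all(abs(z) < 1)          % expected to return logical 1 (true)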
