
Biometrics and Speech Processing

DSP Tools and Techniques

MELE0021

POWER SPECTRUM ESTIMATION


ESTIMATION OF AUTOCORRELATION FUNCTIONS
Suppose that a discrete-time wide-sense stationary random process x_n has the
unknown autocorrelation function
R(m) = E(x_n x_{n+m})
and that an estimate of R(m) using an observation of N random variables x_0, x_1, ..., x_{N-1} is desired.
One estimate for R(m) is

R_N(m) = (1/N) [x_0 x_m + x_1 x_{m+1} + ... + x_{N-1-m} x_{N-1}]

or

R_N(m) = (1/N) Σ_{n=0}^{N-1-m} x_n x_{n+m},   for m = 0, 1, 2, ..., N-1,

which averages together all possible products of samples separated by a lag of m.
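A minimal NumPy sketch of this estimator (the function name sample_autocorr is our own, not part of the notes):

```python
import numpy as np

def sample_autocorr(x, m):
    # Biased sample autocorrelation R_N(m) = (1/N) * sum_{n=0}^{N-1-|m|} x[n] x[n+|m|]
    x = np.asarray(x, dtype=float)
    N = len(x)
    m = abs(m)            # R_N is even in m, so only |m| matters
    if m >= N:
        return 0.0        # no pairs with lag |m| >= N: the estimate is defined as zero
    return float(np.dot(x[:N - m], x[m:])) / N
```

For example, with x = [1, 2, 3, 4] and m = 2 this averages the two products 1·3 and 2·4 over N = 4, giving 2.75.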


Notice that for a large m, there are very few possible products.

[Figure: for N = 4 and lag m = 2, only two products x_n x_{n+2} are available.]

In fact for |m| ≥ N there are no possible pairs with lag m available, and so we arbitrarily
estimate the autocorrelation function at those lags to be zero:
R_N(m) = 0,   for |m| ≥ N
This autocorrelation function estimator R_N(m) is sometimes called the sample
autocorrelation function since it is based on an average of sample products. Its
subscript N emphasizes that it uses a realisation which is N samples in length.
One property of R_N(m) is that it is an even function of m [i.e. R_N(m) = R_N(-m)], just as
the true R(m) is.
The question of the goodness of the estimator RN(m) immediately arises. A good
estimator should have an expected value that is close to the true value of the quantity
being estimated. The difference between the true value of the quantity and expected

value of its estimator is called the bias of the estimator. The estimator is said to be
unbiased if the bias is zero.
R_N(m) = (1/N) Σ_{n=0}^{N-1-|m|} x_n x_{n+m}
R_N(m) is regarded as a random variable for any given lag m, since it is a function of
the particular realisation (its value changes with different realisations of the random
process x_n). The expected value of R_N(m) is

E[R_N(m)] = E[ (1/N) Σ_{n=0}^{N-1-|m|} x_n x_{n+m} ]

Since the summation and expectation are linear operations, we may interchange them:

E[R_N(m)] = (1/N) Σ_{n=0}^{N-1-|m|} E[x_n x_{n+m}]

E[R_N(m)] = ((N - |m|)/N) R(m) = (1 - |m|/N) R(m)
where R(m) is the true autocorrelation function.


Thus the sample autocorrelation function is biased, because its mean is not the true
autocorrelation R(m) at lag m. It can be said, however, that R_N is asymptotically
unbiased, because the |m|/N term vanishes as N → ∞.
We could easily get unbiased estimates by dividing the sum in the above equation by
(N - |m|) rather than by N. The bias shown in the above equation is really not
important because, as will be seen, N >> |m| is needed for the variance to be small, and
the bias is correspondingly small in such cases. A good estimator should have a small
variance in addition to having a small bias.
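A quick Monte-Carlo check of this bias (our own sketch, not part of the notes), using an AR(1) process scaled so that its true autocorrelation R(m) = ρ^|m| is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, N, m, trials = 0.9, 32, 8, 20000

# Stationary AR(1): x[n] = rho*x[n-1] + w[n], scaled so that R(m) = rho**|m|
x = np.empty((trials, N))
x[:, 0] = rng.standard_normal(trials)                    # stationary start, variance 1
w = np.sqrt(1 - rho**2) * rng.standard_normal((trials, N))
for n in range(1, N):
    x[:, n] = rho * x[:, n - 1] + w[:, n]

# Biased estimator R_N(m) on each realisation, then average over realisations
RN = np.einsum('tn,tn->t', x[:, :N - m], x[:, m:]) / N
print(RN.mean())      # close to (1 - m/N) * rho**m, not to rho**m itself
```

The sample mean of R_N(m) tracks (1 - |m|/N) R(m), as derived above, rather than R(m).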
For most reasonable signals it can be shown that

Var{R_N(m)} = (2/N) Σ_{r=0}^{N-1-|m|} [1 - (|m| + r)/N] [R²(r) + R(r + m) R(r - m)]

This expression for the variance of R_N(m) in terms of the true autocorrelation
function R(m) is exact for a Gaussian process.

For large N, the only dependence of Var{R_N(m)} on N is the 1/N factor to the left of
the sum in the above equation. Therefore Var{R_N(m)} approaches zero for large N.

Periodogram
Suppose that x_n is a wide-sense stationary, discrete-time random process. The
power-spectral density estimate of x_n is defined as the discrete Fourier transform of
the autocorrelation function estimate R_N(m):

S_N(ω) = Σ_{m=-∞}^{+∞} R_N(m) e^{-jωm}

This definition relating the two estimates is motivated by the fact that the true power
spectral density and autocorrelation function obey a similar discrete Fourier
transform relation:

S(ω) = Σ_{m=-∞}^{+∞} R(m) e^{-jωm}

This spectral-density estimate, defined in terms of the sample autocorrelation function,
can be related directly to the observed data. It can be shown that S_N(ω) has the
following relationship to the discrete Fourier transform X_N(ω) of the data samples:

S_N(ω) = (1/N) |Σ_{n=0}^{N-1} x_n e^{-jωn}|² = (1/N) |X_N(ω)|²

To show this, we first define a truncated signal x_n^N which equals x_n when x_n is
defined and is zero for other values, namely

x_n^N = x_n,   n = 0, 1, 2, ..., N-1
x_n^N = 0,     otherwise

This is necessary in order to handle the infinite sum of

S_N(ω) = Σ_{m=-∞}^{+∞} R_N(m) e^{-jωm}

with finite data samples.

In this way, R_N(m) is given by

R_N(m) = (1/N) Σ_{n=-∞}^{+∞} x_n^N x_{n+m}^N
and S_N(ω) is given by

S_N(ω) = (1/N) Σ_{m=-∞}^{+∞} Σ_{n=-∞}^{+∞} x_n^N x_{n+m}^N e^{-jωm}

Interchanging the order of summation,

S_N(ω) = (1/N) Σ_{n=-∞}^{+∞} Σ_{m=-∞}^{+∞} x_n^N x_{n+m}^N e^{-jωm}

When a factor of 1 = e^{jωn} e^{-jωn} is introduced into the above equation, it becomes

S_N(ω) = (1/N) Σ_{n=-∞}^{+∞} x_n^N e^{jωn} Σ_{m=-∞}^{+∞} x_{n+m}^N e^{-jω(n+m)}

Changing the summation variable to k = m + n in the second sum, we see that

S_N(ω) = (1/N) X_N(ω) Σ_{n=-∞}^{+∞} x_n^N e^{jωn} = (1/N) X_N(ω) X_N*(ω)

S_N(ω) = (1/N) |X_N(ω)|²

in which the transform of the original and the truncated signal is

X_N(ω) = Σ_{n=-∞}^{+∞} x_n^N e^{-jωn} = Σ_{n=0}^{N-1} x_n e^{-jωn}

This completes the proof that the spectral estimate S_N(ω), defined as the discrete
Fourier transform of R_N(m), can also be computed directly from the data. In fact it is
generally faster to compute S_N(ω) as

S_N(ω) = (1/N) |Σ_{n=0}^{N-1} x_n e^{-jωn}|²

and then compute R_N(m) by taking the inverse discrete Fourier transform of S_N(ω),
rather than by direct use of the equation

R_N(m) = (1/N) Σ_{n=0}^{N-1-|m|} x_n x_{n+m}
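The equivalence can be checked numerically. One practical caveat (our addition, not in the notes): the relation above is stated for the DTFT, so when a finite FFT is used one zero-pads to at least 2N so that the inverse FFT returns the linear rather than the circular autocorrelation:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
N = len(x)

# Direct use of the lag-domain sum R_N(m) = (1/N) sum x[n] x[n+m]
R_direct = np.array([np.dot(x[:N - m], x[m:]) / N for m in range(N)])

# Via the periodogram: zero-pad to 2N so the inverse FFT gives the
# linear (not circular) autocorrelation
S = np.abs(np.fft.fft(x, 2 * N))**2 / N
R_fft = np.fft.ifft(S).real[:N]
```

Both routes produce the same R_N(m), but for long records the FFT route is far cheaper than the O(N²) direct sum.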

This spectral density estimate is called the periodogram. The periodogram is seen to be
the magnitude squared of the discrete Fourier transform of the data, divided by N. The
power spectrum of a random signal is thus unrelated to the phase angle of the complex
discrete Fourier transform X_N(ω). The factor 1/N in the above equation helps S_N
converge as N → ∞, because as N increases the |X_N(ω)|² part of S_N(ω) will keep
getting larger. It will be seen that even with this factor (i.e. 1/N), S_N(ω) does not
approach S(ω) for large N. Now the spectral density estimate must be examined to
find out how well it approximates the true spectral density.
If the bias and variance of S_N both approach zero as N (the number of data samples)
approaches infinity, then S_N is a good estimate of the true S(ω).
The mean value of S_N is given by

E{S_N(ω)} = Σ_{m=-∞}^{+∞} E{R_N(m)} e^{-jωm}

but

E{R_N(m)} = (1 - |m|/N) R(m)

therefore

E{S_N(ω)} = Σ_{m=-N}^{+N} R(m) (1 - |m|/N) e^{-jωm}

(1 - |m|/N) is a triangular function of m.
N

v_m^N = 1 - |m|/N,   |m| ≤ N
v_m^N = 0,           |m| > N
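The transform of this triangular lag window has the closed form of the Fejér kernel, V^N(ω) = (1/N)(sin(Nω/2)/sin(ω/2))², which can be verified numerically (our own check, not part of the notes):

```python
import numpy as np

N = 16
m = np.arange(-(N - 1), N)
v = 1 - np.abs(m) / N                        # triangular lag window v_m^N

omega = np.linspace(0.1, np.pi, 50)          # avoid omega = 0, where sin(omega/2) = 0
V = np.array([np.sum(v * np.exp(-1j * w * m)).real for w in omega])
fejer = (np.sin(N * omega / 2) / np.sin(omega / 2))**2 / N
```

The direct sum V and the Fejér closed form agree to machine precision; note that V^N(ω) ≥ 0 everywhere, which is why E{S_N(ω)} can never go negative.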

The mean of the periodogram can thus be rewritten as an infinite sum:

E{S_N(ω)} = Σ_{m=-∞}^{+∞} R(m) v_m^N e^{-jωm}

This sum is the discrete Fourier transform of the product of two functions of m, R(m) and v_m^N.
It is apparent that S_N(ω) is a biased estimate of S(ω): E{S_N(ω)} in the above
equation differs from the true

S(ω) = Σ_{m=-∞}^{+∞} R(m) e^{-jωm}

by the presence of the v_m^N term.

This term is a window function and appears because of the necessity of using a finite
set of data samples.
This corresponds to the convolution of S(ω) with a spectral window given by the
transform of v_m^N. This spectral estimate is thus biased:
E{S_N(ω)} = S(ω) * V^N(ω)

At any particular frequency ω_a,

E{S_N(ω_a)} = [S * V^N](ω_a)

and the bias, B_N(ω_a), is the difference between the true power spectral density and
the mean value of its estimate:
B_N(ω_a) = S(ω_a) - E{S_N(ω_a)}
For large N, the spectral window V^N(ω) will have a tall, narrow main peak and its
side lobes will be very small.
In such cases E{S_N(ω)} will become almost equal to the true value of the spectral
density S(ω) at frequencies where S(ω) is smooth. Therefore at these frequencies
B_N(ω) vanishes for large N. But in order to achieve an equivalent bias at a frequency
where the true power spectral density has a narrower peak, N must be much larger, so
that the main lobe of V^N(ω) becomes narrower than the narrowest peak in S(ω).
[Figure: a true S(ω) with both a narrow peak and a smooth region; the smooth region is estimated accurately, while the narrow peak demands a narrower spectral-window main lobe, i.e. a larger N.]

Therefore it is concluded that in order for the periodogram S_N(ω) to have a small
bias, the number of data samples must be sufficiently large.

Variance of Periodogram
The variance of S_N(ω) is generally not small, even for large N. As an example, if the
data x_n come from a Gaussian process, it can be shown that (see Jenkins & Watts,
Spectral Analysis and its Applications, Holden-Day, San Francisco)

lim_{N→∞} Var[S_N(ω)] = S²(ω)

i.e. the variance of S_N(ω) approaches the square of the true spectrum at each
frequency ω. Using the ratio of mean to standard deviation as a kind of signal-to-noise ratio,

SNR = E{S_N(ω)} / √(Var{S_N(ω)}) = S(ω) / S(ω) = 1

It is seen that the signal (true spectrum) is only as big as the noise [the uncertainty
in S_N(ω)]. One way to get a good spectral estimate is to average together several
periodograms. Given 1000 data samples, we could compute 10 separate periodograms
of length M = 100 each. The i-th periodogram would then be given by

S_{100,i}(ω) = (1/100) |Σ_{n=100(i-1)}^{100i-1} x_n e^{-jωn}|²,   for i = 1, 2, ..., 10

e.g. for i = 1 and i = 2:

S_{100,1}(ω) = (1/100) |Σ_{n=0}^{99} x_n e^{-jωn}|²

S_{100,2}(ω) = (1/100) |Σ_{n=100}^{199} x_n e^{-jωn}|²

This is of course done by cutting the original data into 10 equal-length segments. An
average periodogram can then be computed from I individual periodograms of
length M. For example, when I = 10,

S_M^I(ω) = (1/I) Σ_{i=1}^{I} S_{M,i}(ω)

S_100^10(ω) = (1/10) Σ_{i=1}^{10} S_{100,i}(ω)

Since the 10 estimates are identically distributed periodograms, the averaged spectral
estimate will have the same mean value as any of the individual estimates. However, the
averaged estimate will have a smaller variance. If the I periodograms were
statistically independent, the variance of the averaged estimate would be

Var S_M^I(ω) = (1/I) Var S_M(ω) ≈ (1/I) [S(ω)]²
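A numerical illustration of this variance reduction (our own sketch; non-overlapping segment averaging like this is often called Bartlett's method), using white noise, for which the true S(ω) = 1 at every frequency:

```python
import numpy as np

rng = np.random.default_rng(2)
N, I = 1000, 10
M = N // I
x = rng.standard_normal(N)                   # white noise: true S(omega) = 1

# One long periodogram vs. the average of I short ones
S_long = np.abs(np.fft.fft(x))**2 / N
segs = x.reshape(I, M)
S_avg = np.mean(np.abs(np.fft.fft(segs, axis=1))**2 / M, axis=0)

print(S_long.var(), S_avg.var())             # the averaged estimate fluctuates far less
```

Both estimates hover around the true value 1, but the bin-to-bin fluctuation of the averaged estimate is roughly I times smaller, at the cost of coarser frequency resolution.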

The Welch method


In this method the data segments can be overlapped, so that each of the I segments
can have more than N/I samples. The resulting subsidiary periodograms are more
statistically dependent, so there is less than 1/I variance reduction; but the reduced
bias may yield a net improvement in the estimates. Alternatively, by overlapping data
segments one may obtain more than I segments, each with N/I samples. Thus the data
segments may be represented as

x_i(n) = x(n + iD),   n = 0, 1, ..., M-1;   i = 0, 1, ..., L-1

where iD is the starting point of the i-th sequence.


For D = M, the segments do not overlap. But if, for example, D = M/2, there is a 50%
overlap between successive data segments, and L = 2I segments are obtained. Each
such segment is then windowed (using a length-M data window such as a triangular
or Hamming window) prior to computing the periodogram. The window reduces the
sidelobe frequency leakage at the expense of resolution. The result is a modified
periodogram. The Welch power spectrum estimate is the average of these modified
periodograms:

S_w(ω) = (1/L) Σ_{i=0}^{L-1} S_i(ω)
Although the subsidiary Periodograms are more statistically dependent, the increased
number of segments averaged reduces the variance.
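A compact NumPy sketch of the Welch estimate described above (the function name and the window-power normalisation U = Σw², which makes the estimate unbiased for white noise, are our choices; SciPy's scipy.signal.welch provides a full implementation):

```python
import numpy as np

def welch_psd(x, M, D):
    # Welch: Hamming-windowed segments x_i(n) = x(n + i*D) of length M,
    # each modified periodogram normalised by the window power U = sum(w**2)
    x = np.asarray(x, dtype=float)
    w = np.hamming(M)
    U = np.sum(w**2)
    starts = range(0, len(x) - M + 1, D)
    periodograms = [np.abs(np.fft.fft(x[s:s + M] * w))**2 / U for s in starts]
    return np.mean(periodograms, axis=0)
```

With D = M//2 (50% overlap) this averages roughly 2N/M - 1 modified periodograms; for unit-variance white noise the resulting estimate hovers around the true S(ω) = 1.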
