DIGITAL SIGNAL PROCESSING
with selected topics
ADAPTIVE SYSTEMS
TIME-FREQUENCY ANALYSIS
SPARSE SIGNAL PROCESSING
Ljubiša Stanković
2015
ISBN-13: 978-1514179987
ISBN-10: 1514179989
All rights reserved. Printed and bound in the United States of America.
To
my parents
Božo and Cana,
my wife Snežana,
and our
Irena, Isidora, and Nikola.
Contents
I Review 19
Chapter 1 Continuous-Time Signals and Systems 21
1.1 Continuous-Time Signals 21
1.2 Periodic Signals and Fourier Series 24
1.2.1 Fourier Series of Real-Valued Signals 28
1.2.2 Linear Systems 33
1.3 Fourier Transform 35
1.3.1 Fourier Transform and Linear Time-Invariant
Systems 37
1.3.2 Properties of the Fourier Transform 37
1.3.3 Relationship Between the Fourier Series and
the Fourier Transform 40
1.4 Fourier Transform and Stationary Phase Method 42
1.5 Laplace Transform 48
1.5.1 Linear Systems Described by Differential Equa-
tions 51
1.5.2 Table of the Laplace Transform 52
1.6 Butterworth Filter 53
Index 811
Preface

This book grew out of the author's teaching and research in signal processing. It is written for students
and engineers as a first book in digital signal processing, assuming
that a reader is familiar with the basic mathematics, including integrals, differential calculus, and linear algebra. Although a review of continuous-time
analysis is presented in the first chapter, a prerequisite for the presented
content is a basic knowledge about continuous-time signal processing.
The book consists of three parts. After an introductory review part, the basic principles of digital signal processing are presented within Part two of the book. This part starts with Chapter two, which deals with basic definitions, transforms, and properties of discrete-time signals. The sampling theorem, providing the essential relation between continuous-time and discrete-time signals, is presented in this chapter as well. The discrete Fourier transform and its applications to signal processing are the topic of the third chapter. Other common discrete transforms, like the cosine, sine, Walsh-Hadamard, and Haar transforms, are also presented in this chapter. The z-transform, as a powerful tool for the analysis of discrete-time systems, is the topic of Chapter four. Various methods for transforming a continuous-time system into a corresponding discrete-time system are derived and illustrated in Chapter five. Chapter six is dedicated to the forms of discrete-time system realizations. Basic definitions and properties of random discrete-time signals are given in Chapter seven. Systems to process random discrete-time signals are considered in this chapter as well. Chapter seven concludes with a short study of quantization effects.
The presentation is supported by numerous illustrations and examples. Chapters within Part two are followed by a number of solved and unsolved problems for practice. Theory is explained in a simple way, with the necessary mathematical rigor and with simple examples throughout.
London,
July 2013 - July 2015.
Author
Introduction
A signal is a physical or symbolic representation of information. Signal theory
and processing are the areas dealing with the efficient generation,
description, transformation, transmission, reception, and interpretation of
information. In the beginning, the most common physical processes used
for these purposes were the electric signals, for example, varying current
or electromagnetic waves. Signal theory is most commonly studied within electrical engineering. Signal theory and processing are strongly related to applied mathematics and information theory. Examples of signals include
speech, music, image, video, medical, biological, geophysical, sonar, radar,
biomedical, car engine, financial, and molecular data. In terms of signal gen-
eration, the main topics are in sensing, acquisition, synthesis, and reproduc-
tion of information. Various mathematical transforms, representations, and
algorithms are used for describing signals. Signal transformations are a set
of methods for decomposition, filtering, estimation and detection. Modu-
lation, demodulation, detection, coding, and compression are the most im-
portant aspects of the signal transmission. In the process of interpretation,
various approaches may be used, including adaptive and learning-based
tools and analysis.
Mathematically, signals are represented by functions of one or more variables. Examples of one-dimensional signals are speech and music signals.
A typical example of a two-dimensional signal is an image, while a video sequence is an example of a three-dimensional signal. Some signals, for example,
geophysical, medical, biological, radar, or sonar, may be represented and
interpreted as one-dimensional, two-dimensional, or multidimensional.
Signals may be continuous functions of independent variables, for example, functions of time and/or space. Independent variables may also be discrete, with the signal values being defined only over an ordered set
Figure 1 Illustration of a continuous-time signal x(t), its discrete-time version x(n), and its digital version xd(n) (with values quantized to 4 bits).
Figure 2 Illustration of an analog system (impulse response ha(t), input x(t), output y(t)) and a digital system used to process an analog signal.
Review
Chapter 1
Continuous-Time Signals and Systems
Most real-world signals are continuous-time signals. In many applications, the result of signal processing is presented and interpreted in the continuous-time domain.
Throughout the course of digital signal processing, the results will be dis-
cussed and related to the continuous-time forms of signals and their param-
eters. This is the reason why the first chapter is dedicated to a review of
signals and transforms in the continuous-time domain. This review will be
of help in establishing proper correspondence and notation for the presen-
tation that follows in the next chapters.
The unit-step (Heaviside) signal is defined as u(t) = 1 for t ≥ 0 and u(t) = 0 elsewhere. In the Heaviside function definition, the value of u(0) = 1/2 is also used.
Note that the independent variable t is continuous, while the signal itself is
not a continuous function. It has a discontinuity at t = 0.
The boxcar signal (rectangular window) is formed as b(t) = u(t +
1/2) − u(t − 1/2), that is, b(t) = 1 for −1/2 ≤ t < 1/2 and b(t) = 0 else-
where. A signal obtained by multiplying the unit-step signal by t is called
the ramp signal, with notation R(t) = tu(t).
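These basic signals can be sketched in code; a minimal Python sketch (the function names u, b, and R follow the text's notation; the convention u(0) = 1 is used here so that b(t) = 1 on the half-open interval −1/2 ≤ t < 1/2, while u(0) = 1/2 is the alternative mentioned above):

```python
import numpy as np

def u(t):
    # Heaviside unit-step signal, with the u(0) = 1 convention
    return np.where(np.asarray(t) >= 0, 1.0, 0.0)

def b(t):
    # boxcar signal (rectangular window): b(t) = u(t + 1/2) - u(t - 1/2)
    return u(t + 0.5) - u(t - 0.5)

def R(t):
    # ramp signal: R(t) = t * u(t)
    return t * u(t)
```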
$$u(t)=\int_{-\infty}^{\infty}\delta(\tau)u(t-\tau)\,d\tau=\int_{-\infty}^{t}\delta(\tau)\,d\tau$$
or
$$\frac{du(t)}{dt}=\delta(t). \tag{1.4}$$
A sinusoidal signal, with amplitude A, frequency Ω0, and initial phase ϕ, is a signal of the form x(t) = A sin(Ω0 t + ϕ). A signal x(t) is periodic, with period T, if
$$x(t+T)=x(t). \tag{1.6}$$
The sinusoidal signal is periodic with period T = 2π/Ω0. Fig. 1.1 depicts basic continuous-time signals.
Figure 1.1 Continuous-time signals: (a) unit-step signal, (b) impulse signal, (c) boxcar signal, and (d) sinusoidal signal.
Example 1.2. Find the periods of signals: x1 (t) = sin(2πt/36), x2 (t) = cos(4πt/15 +
2), x3 (t) = exp( j0.1t), x4 (t) = x1 (t) + x2 (t), and x5 (t) = x1 (t) + x3 (t).
⋆Periods are calculated according to (1.6). For x1 (t) the period follows
from 2πT1 /36 = 2π, as T1 = 36. Similarly, T2 = 15/2 and T3 = 20π. The
period of x4 (t) is the smallest interval containing T1 and T2 . It is T4 = 180 (5
periods of x1 (t) and 24 periods of x2 (t)). For signal x5 (t), with component periods T1 = 36 and T3 = 20π, there is no common interval T5 containing both T1 and T3 an integer number of times (their ratio is irrational). Thus, the signal x5 (t) is not periodic.
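The conclusions of Example 1.2 can be verified numerically; a minimal sketch (signal names as in the example) checking that x4 repeats after T4 = 180 while x5 does not:

```python
import numpy as np

x1 = lambda t: np.sin(2 * np.pi * t / 36)          # period 36
x2 = lambda t: np.cos(4 * np.pi * t / 15 + 2)      # period 15/2
x3 = lambda t: np.exp(1j * 0.1 * t)                # period 20*pi
x4 = lambda t: x1(t) + x2(t)                       # common period 180
x5 = lambda t: x1(t) + x3(t)                       # not periodic

t = np.linspace(0, 50, 1000)
x4_periodic = np.allclose(x4(t + 180), x4(t))      # True
x5_periodic = np.allclose(x5(t + 180), x5(t))      # False: 36/(20*pi) is irrational
```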
• Signal energy
$$E_x=\int_{-\infty}^{\infty}|x(t)|^2\,dt, \tag{1.9}$$
• Signal average power
$$P_{AV}=\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}|x(t)|^2\,dt.$$
A periodic signal can be expanded into a Fourier series if the Dirichlet conditions are met: (1) the signal x (t) has a finite number of
discontinuities within the period T; (2) it has a finite average value in the
period T; and (3) the signal has a finite number of maxima and minima.
Since the signal analysis deals with real-world physical signals, rather than
with mathematical generalizations, these conditions are almost always met.
$$\left\langle e^{j2\pi mt/T},e^{j2\pi nt/T}\right\rangle=\frac{1}{T}\int_{-T/2}^{T/2}e^{j2\pi mt/T}e^{-j2\pi nt/T}\,dt=\begin{cases}1, & m=n\\[4pt] \dfrac{\sin(\pi(m-n))}{\pi(m-n)}=0, & m\neq n.\end{cases}$$
It means that the inner product of any two different basis functions is zero
(orthogonal set), while the inner product of a function with itself is 1 (normal
set). In the case of orthonormal set of basis functions, it is easy to show that
the weighting coefficients Xn can be calculated as projections of x (t) onto
the basis functions e j2πnt/T ,
$$X_n=\left\langle x(t),e^{j2\pi nt/T}\right\rangle=\frac{1}{T}\int_{-T/2}^{T/2}x(t)e^{-j2\pi nt/T}\,dt. \tag{1.12}$$
This relation follows after a simple multiplication of both sides of (1.11) by e^{−j2πmt/T} and an integration over the period, (1/T)∫_{−T/2}^{T/2}(·)dt.
Example 1.3. Show that the Fourier series coefficients Xn of a periodic signal x (t)
can be obtained by minimizing the mean square error between the signal and
∑nN=− N Xn e j2πnt/T within the period T.
⋆The error of a finite-term approximation is
$$e(t)=x(t)-\sum_{n=-N}^{N}X_n e^{j2\pi nt/T},$$
with the mean square error within the period
$$I=\frac{1}{T}\int_{-T/2}^{T/2}|e(t)|^2\,dt.$$
From ∂I/∂X_m^* = 0 follows
$$\frac{1}{T}\int_{-T/2}^{T/2}e^{-j2\pi mt/T}\Big(x(t)-\sum_{n=-N}^{N}X_n e^{j2\pi nt/T}\Big)dt=0$$
and, using the orthonormality of the basis functions,
$$X_m=\frac{1}{T}\int_{-T/2}^{T/2}x(t)e^{-j2\pi mt/T}\,dt. \tag{1.13}$$
$$\frac{\partial F(z)}{\partial z^*}=\frac{\partial |f(z)|^2}{\partial z^*}=2e^{-j\alpha}f(z).$$
In our case,
$$|f(z)|^2=\left|a+jb+e^{j\alpha}(x+jy)\right|^2=(a+x\cos\alpha-y\sin\alpha)^2+(b+x\sin\alpha+y\cos\alpha)^2.$$
For the minimization of a function of two variables x and y we need partial
derivatives
$$\frac{\partial |f(z)|^2}{\partial x}=2\cos\alpha\,(a+x\cos\alpha-y\sin\alpha)+2\sin\alpha\,(b+x\sin\alpha+y\cos\alpha)=2\,\mathrm{Re}\{e^{-j\alpha}f(z)\} \tag{1.14}$$
and
$$\frac{\partial |f(z)|^2}{\partial y}=2\,\mathrm{Im}\{e^{-j\alpha}f(z)\}. \tag{1.15}$$
Therefore, all calculations with two real-valued equations (1.14) and (1.15)
are the same as using one complex valued relation
$$\frac{\partial |f(z)|^2}{\partial x}+j\frac{\partial |f(z)|^2}{\partial y}=\Big(\frac{\partial}{\partial x}+j\frac{\partial}{\partial y}\Big)F(x,y)=\frac{\partial F(z)}{\partial z^*}.$$
Since the signal and the basis functions are periodic with period T, in
all previous integrals, we can use
$$\frac{1}{T}\int_{-T/2}^{T/2}x(t)e^{-j2\pi nt/T}\,dt=\frac{1}{T}\int_{-T/2+\Lambda}^{T/2+\Lambda}x(t)e^{-j2\pi nt/T}\,dt \tag{1.16}$$
for any Λ.
Example 1.5. Calculate the Fourier series coefficients of a periodic signal x (t)
defined as
$$x(t)=\sum_{n=-\infty}^{\infty}x_0(t+2n)$$
with
x0 (t) = u(t + 1/4) − u(t − 1/4). (1.18)
Figure 1.2 Periodic signal (left) and its Fourier series coefficients (right).
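The coefficients of Example 1.5 can be computed numerically from (1.12); a minimal sketch (the helper `fs_coeff` is ours), compared against the closed form obtained by direct evaluation of the integral for this signal, X_n = sin(πn/4)/(πn) with X_0 = 1/4:

```python
import numpy as np

# periodic boxcar: period T = 2, x0(t) = u(t + 1/4) - u(t - 1/4)
T = 2.0
dt = 1e-5
t = np.arange(-T / 2, T / 2, dt)
x0 = np.where(np.abs(t) < 0.25, 1.0, 0.0)

def fs_coeff(n):
    # Riemann-sum approximation of (1.12)
    return np.sum(x0 * np.exp(-2j * np.pi * n * t / T)) * dt / T

X0 = fs_coeff(0)
X1 = fs_coeff(1)
X3 = fs_coeff(3)
```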
Figure 1.3 Illustration of signal reconstruction by using a finite Fourier series with: (a) coefficients Xn within −1 ≤ n ≤ 1, (b) coefficients Xn within −2 ≤ n ≤ 2, (c) coefficients Xn within −6 ≤ n ≤ 6, and (d) coefficients Xn within −30 ≤ n ≤ 30.
For a real-valued signal x (t) the Fourier series coefficients can be written as
$$X_n=\frac{1}{T}\int_{-T/2}^{T/2}x(t)\cos\Big(\frac{2\pi nt}{T}\Big)dt-j\frac{1}{T}\int_{-T/2}^{T/2}x(t)\sin\Big(\frac{2\pi nt}{T}\Big)dt=\frac{A_n-jB_n}{2}. \tag{1.20}$$
where An/2 and −Bn/2 are the real and imaginary parts of Xn. Since X_n^* = X_{−n} holds for real-valued signals, the values of An and Bn are
$$A_n=X_n+X_{-n}=\frac{2}{T}\int_{-T/2}^{T/2}x(t)\cos\Big(\frac{2\pi nt}{T}\Big)dt,$$
$$B_n=\frac{X_n-X_{-n}}{-j}=\frac{2}{T}\int_{-T/2}^{T/2}x(t)\sin\Big(\frac{2\pi nt}{T}\Big)dt. \tag{1.21}$$
$$A_n=H_n+H_{-n},\qquad B_n=H_n-H_{-n}.$$
The coefficients calculated by (1.24) are the Hartley series coefficients. For a real-valued and even signal x (t) = x (−t) this transform reduces to
real-valued and even signal x (t) = x (−t) this transform reduces to
$$C_n=\frac{1}{T}\int_{-T/2}^{T/2}x(t)\cos\Big(\frac{2\pi nt}{T}\Big)dt.$$
Figure 1.4 Reconstruction of a signal using the Fourier series. The reconstructed signal is denoted by x M (t), where M indicates the number of coefficients used in the reconstruction.
with X0 = 1/8. Note that the relation between the Fourier coefficients in (a) and (b) is 2X_{2n}^{(b)} = X_n^{(a)}. The reconstruction is presented in Fig.1.5.
(c) For the signal xc(t) extended with its reversed version follows
$$C_n=\int_{0}^{1/2}te^{-j2\pi nt}\,dt+\int_{1/2}^{1}(1-t)e^{-j2\pi nt}\,dt=2\int_{0}^{1/2}t\cos(2\pi nt)\,dt=\frac{(-1)^n-1}{2\pi^2 n^2}.$$
Figure 1.5 Reconstruction of a periodic signal, with a zero interval extension before using the Fourier series.
Figure 1.6 Reconstruction of a periodic signal after an even extension before using the Fourier series (cosine Fourier series).
1.2.2 Linear Systems
A system transforms one signal (input signal) into another signal (output signal). Assume that x (t) is the input signal. The system transformation will be denoted by an operator T {◦}. The output signal can be written as
$$y(t)=T\{x(t)\}.$$
A system is linear if, for any two signals x1 (t) and x2 (t) and arbitrary constants a1 and a2, it holds that
$$T\{a_1x_1(t)+a_2x_2(t)\}=a_1T\{x_1(t)\}+a_2T\{x_2(t)\}.$$
A system is time-invariant if y(t) = T {x (t)} implies
$$T\{x(t-t_0)\}=y(t-t_0)$$
for any t0.
Linear time-invariant (LTI) systems are fully described by their re-
sponse to the impulse signal. If we know the impulse response of these
systems,
h(t) = T {δ(t)},
then for arbitrary signal x (t) at the input, the output can be calculated, by
using (1.3), as
$$y(t)=T\{x(t)\}=T\Big\{\int_{-\infty}^{\infty}x(\tau)\delta(t-\tau)\,d\tau\Big\}\overset{\text{Linearity}}{=}\int_{-\infty}^{\infty}x(\tau)T\{\delta(t-\tau)\}\,d\tau\overset{\text{Time-invariance}}{=}\int_{-\infty}^{\infty}x(\tau)h(t-\tau)\,d\tau.$$
x(t) ∗t h(t) = h(t) ∗t x(t). (1.29)
Example 1.7. Find the convolution of signals x(t) = u(t + 1) − u(t − 1) and h(t) = e^{−t}u(t).
⋆By using the convolution definition, we get
$$y(t)=\int_{-\infty}^{\infty}x(\tau)h(t-\tau)\,d\tau=\int_{-1}^{1}1\cdot e^{-(t-\tau)}u(t-\tau)\,d\tau=\int_{t-1}^{t+1}e^{-\lambda}u(\lambda)\,d\lambda$$
$$=\begin{cases}\int_{t-1}^{t+1}e^{-\lambda}\,d\lambda=e^{-t}(e-1/e), & t\geq 1\\[4pt] \int_{0}^{t+1}e^{-\lambda}\,d\lambda=1-e^{-(t+1)}, & -1\leq t<1\\[4pt] 0, & t<-1.\end{cases}$$
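The closed-form result of Example 1.7 can be checked numerically; a minimal sketch (variable names are ours, and the Riemann-sum scaling of `np.convolve` is the standard approximation of the convolution integral):

```python
import numpy as np

# x(t) = u(t+1) - u(t-1), h(t) = exp(-t) u(t), sampled on a common grid
dt = 1e-3
t = np.arange(-2, 6, dt)
x = np.where((t >= -1) & (t < 1), 1.0, 0.0)
h = np.where(t >= 0, np.exp(-t), 0.0)

# discrete approximation of the convolution integral;
# output sample k corresponds to time 2*t[0] + k*dt
y = np.convolve(x, h) * dt

i = int(round((2.0 - 2 * t[0]) / dt))   # index of t = 2 (in the t >= 1 region)
y_at_2 = y[i]
y_exact = np.exp(-2.0) * (np.e - 1 / np.e)
```

The numeric value at t = 2 should agree with e^{−t}(e − 1/e) up to the discretization error.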
A linear time-invariant system is stable, in the bounded-input bounded-output sense, for any bounded input |x(t)| ≤ Mx, if its impulse response is absolutely integrable,
$$\int_{-\infty}^{\infty}|h(\tau)|\,d\tau<\infty, \tag{1.30}$$
since
$$|y(t)|=\Big|\int_{-\infty}^{\infty}x(t-\tau)h(\tau)\,d\tau\Big|\leq\int_{-\infty}^{\infty}|x(t-\tau)||h(\tau)|\,d\tau\leq M_x\int_{-\infty}^{\infty}|h(\tau)|\,d\tau<\infty,$$
if (1.30) holds.
It can be shown that the absolute integrability of the impulse response is also a necessary condition for a linear time-invariant system to be stable.
1.3 Fourier Transform
The Fourier series has been introduced and presented for periodic signals,
with a period T. Assume now that we extend the period to infinity, while not
changing the signal. This case corresponds to the analysis of an aperiodic
signal x (t). Its transform, the Fourier series coefficients normalized by the
period, is given by
$$\lim_{T\to\infty}X_nT=\lim_{T\to\infty}\int_{-T/2}^{T/2}x(t)e^{-j2\pi nt/T}\,dt=\int_{-\infty}^{\infty}x(t)e^{-j\Omega t}\,dt, \tag{1.31}$$
with Ω = 2πn/T. The transform
$$X(\Omega)=\int_{-\infty}^{\infty}x(t)e^{-j\Omega t}\,dt, \tag{1.32}$$
is called the Fourier transform (FT) of a signal x (t). For the existence of the Fourier transform it is sufficient that the signal is absolutely integrable. There are some signals that do not satisfy this condition, whose Fourier transform exists in the form of generalized functions, such as the delta function.
The inverse Fourier transform (IFT) can be obtained by multiplying
both sides of (1.32) by e jΩτ and integrating over Ω,
$$x(t)=\frac{1}{2\pi}\int_{-\infty}^{\infty}X(\Omega)e^{j\Omega t}\,d\Omega. \tag{1.33}$$
Example 1.8. Calculate the Fourier transform of x (t) = Ae−at u(t), a > 0.
⋆According to the Fourier transform definition, we get
$$X(\Omega)=\int_{0}^{\infty}Ae^{-at}e^{-j\Omega t}\,dt=\frac{A}{a+j\Omega}.$$
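The result of Example 1.8 can be verified numerically; a minimal sketch (the trapezoidal-sum helper `ft` is ours), comparing a direct evaluation of (1.32) with the closed form A/(a + jΩ):

```python
import numpy as np

# FT of x(t) = A exp(-a t) u(t), approximated on a truncated grid
A, a = 1.0, 1.0
dt = 1e-3
t = np.arange(0, 40, dt)        # exp(-40) is negligible, so truncation is safe
x = A * np.exp(-a * t)

def ft(Om):
    # trapezoidal approximation of the Fourier transform integral
    f = x * np.exp(-1j * Om * t)
    return (np.sum(f) - 0.5 * (f[0] + f[-1])) * dt

X1 = ft(1.0)
X1_exact = A / (a + 1j * 1.0)
X5 = ft(5.0)
X5_exact = A / (a + 1j * 5.0)
```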
Example 1.9. Calculate the Fourier transform of x (t) = sign(t).
⋆The signal can be considered as the limit, as a → 0+, of x_a(t) = −e^{at}u(−t) + e^{−at}u(t), whose Fourier transform is
$$X_a(\Omega)=-\int_{-\infty}^{0}e^{at}e^{-j\Omega t}\,dt+\int_{0}^{\infty}e^{-at}e^{-j\Omega t}\,dt=\frac{2\Omega}{j(a^2+\Omega^2)}. \tag{1.35}$$
It results in
$$X(\Omega)=\lim_{a\to 0^+}X_a(\Omega)=\frac{2}{j\Omega}. \tag{1.36}$$
Example 1.10. Find the Fourier transform of δ(t), x (t) = 1 and u(t).
⋆The Fourier transform of δ(t) is
$$\mathrm{FT}\{\delta(t)\}=\int_{-\infty}^{\infty}\delta(t)e^{-j\Omega t}\,dt=1. \tag{1.38}$$
By duality,
$$\mathrm{FT}\{1\}=2\pi\delta(\Omega).$$
Finally,
$$\mathrm{FT}\{u(t)\}=\mathrm{FT}\Big\{\frac{\mathrm{sign}(t)+1}{2}\Big\}=\frac{1}{j\Omega}+\pi\delta(\Omega). \tag{1.39}$$
For an input complex harmonic signal x(t) = Ae^{j(Ω0 t+ϕ)}, the output of a linear time-invariant system is
$$y(t)=x(t)*_t h(t)=\int_{-\infty}^{\infty}Ae^{j(\Omega_0(t-\tau)+\varphi)}h(\tau)\,d\tau=Ae^{j(\Omega_0 t+\varphi)}\int_{-\infty}^{\infty}h(\tau)e^{-j\Omega_0\tau}\,d\tau=H(\Omega_0)x(t), \tag{1.40}$$
where
$$H(\Omega)=\int_{-\infty}^{\infty}h(t)e^{-j\Omega t}\,dt \tag{1.41}$$
is the Fourier transform of h(t). A linear time-invariant system does not change the form of an input complex harmonic signal Ae^{j(Ω0 t+ϕ)}: after passing through the system it remains a complex harmonic signal with the same frequency Ω0. The amplitude of the input signal x(t) is scaled by |H(Ω0)| and the phase is shifted by arg{H(Ω0)}.
1. Linearity
$$\mathrm{FT}\{a_1x_1(t)+a_2x_2(t)\}=a_1X_1(\Omega)+a_2X_2(\Omega),$$
where X1 (Ω) and X2 (Ω) are the Fourier transforms of the signals x1 (t) and x2 (t), respectively.
2. Realness
The Fourier transform of a signal is real (i.e., X ∗ (Ω) = X (Ω)), if
x ∗ (−t) = x (t),
since
$$X^*(\Omega)=\int_{-\infty}^{\infty}x^*(t)e^{j\Omega t}\,dt\overset{t\to-t}{=}\int_{-\infty}^{\infty}x^*(-t)e^{-j\Omega t}\,dt=X(\Omega), \tag{1.43}$$
if x ∗ (−t) = x (t).
3. Modulation
$$\mathrm{FT}\{x(t)e^{j\Omega_0 t}\}=\int_{-\infty}^{\infty}x(t)e^{j\Omega_0 t}e^{-j\Omega t}\,dt=X(\Omega-\Omega_0) \tag{1.44}$$
and
$$\mathrm{FT}\{2x(t)\cos(\Omega_0 t)\}=X(\Omega-\Omega_0)+X(\Omega+\Omega_0).$$
4. Shift in time
$$\mathrm{FT}\{x(t-t_0)\}=\int_{-\infty}^{\infty}x(t-t_0)e^{-j\Omega t}\,dt=X(\Omega)e^{-jt_0\Omega}. \tag{1.45}$$
5. Time-scaling
$$\mathrm{FT}\{x(at)\}=\int_{-\infty}^{\infty}x(at)e^{-j\Omega t}\,dt=\frac{1}{|a|}X\Big(\frac{\Omega}{a}\Big). \tag{1.46}$$
6. Convolution
$$\mathrm{FT}\{x(t)*_t h(t)\}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x(\tau)h(t-\tau)e^{-j\Omega t}\,d\tau\,dt \tag{1.47}$$
$$\overset{t-\tau\to u}{=}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x(\tau)h(u)e^{-j\Omega(\tau+u)}\,d\tau\,du=X(\Omega)H(\Omega).$$
7. Multiplication
$$\mathrm{FT}\{x(t)h(t)\}=\int_{-\infty}^{\infty}x(t)\Big[\frac{1}{2\pi}\int_{-\infty}^{\infty}H(\theta)e^{j\theta t}\,d\theta\Big]e^{-j\Omega t}\,dt \tag{1.48}$$
$$=\frac{1}{2\pi}\int_{-\infty}^{\infty}H(\theta)X(\Omega-\theta)\,d\theta=\frac{1}{2\pi}X(\Omega)*_\Omega H(\Omega)=\frac{1}{2\pi}H(\Omega)*_\Omega X(\Omega).$$
9. Differentiation
$$\mathrm{FT}\Big\{\frac{dx(t)}{dt}\Big\}=\mathrm{FT}\Big\{\frac{d}{dt}\Big(\frac{1}{2\pi}\int_{-\infty}^{\infty}X(\Omega)e^{j\Omega t}\,d\Omega\Big)\Big\}=j\Omega X(\Omega). \tag{1.50}$$
10. Integration
The Fourier transform of
$$\int_{-\infty}^{t}x(\tau)\,d\tau$$
follows by noting that this integral is the convolution x(t) ∗t u(t). Then,
$$\mathrm{FT}\Big\{\int_{-\infty}^{t}x(\tau)\,d\tau\Big\}=\mathrm{FT}\{x(t)\}\,\mathrm{FT}\{u(t)\} \tag{1.51}$$
$$=X(\Omega)\Big[\frac{1}{j\Omega}+\pi\delta(\Omega)\Big]=\frac{1}{j\Omega}X(\Omega)+\pi X(0)\delta(\Omega).$$
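The Fourier transform properties above lend themselves to quick numerical checks. A minimal sketch of the differentiation property FT{dx/dt} = jΩX(Ω) on a Gaussian test signal of our own choosing (both sides approximated by the same Riemann sum):

```python
import numpy as np

dt = 1e-3
t = np.arange(-10, 10, dt)
x = np.exp(-t**2)                # smooth, rapidly decaying test signal
dx = -2 * t * np.exp(-t**2)      # its analytic derivative

Om = 2.0
e = np.exp(-1j * Om * t)
X = np.sum(x * e) * dt           # numeric X(Omega)
Xd = np.sum(dx * e) * dt         # numeric FT of dx/dt
```

The two numeric quantities should satisfy Xd ≈ jΩX to high accuracy, since the discretization errors of the two sums nearly cancel.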
It can be written as
$$X_a(\Omega)=X(\Omega)+jX_h(\Omega),$$
where Xh (Ω) is the Fourier transform of the Hilbert transform of the signal
x (t). From Example 1.9 with the signal x (t) = sign(t) and the duality prop-
erty of the Fourier transform pair, obviously the inverse Fourier transform
of sign(Ω) is j/(πt). Therefore, the analytic part of a signal, in the time
domain, reads as
$$x_a(t)=x(t)+jx_h(t)=x(t)+x(t)*_t\frac{j}{\pi t}=x(t)+j\,\frac{1}{\pi}\,\mathrm{p.v.}\!\int_{-\infty}^{\infty}\frac{x(\tau)}{t-\tau}\,d\tau, \tag{1.54}$$
where p.v. stands for the Cauchy principal value of the considered integral.
1.3.3 Relationship Between the Fourier Series and the Fourier Transform
Consider an aperiodic signal x (t), with the Fourier transform X (Ω). As-
sume that the signal is of a limited duration (i.e., x (t) = 0 for |t| > T0 /2).
Then,
$$X(\Omega)=\int_{-T_0/2}^{T_0/2}x(t)e^{-j\Omega t}\,dt. \tag{1.55}$$
The periodic signal x p (t) can be expanded into Fourier series with the
coefficients
$$X_n=\frac{1}{T}\int_{-T/2}^{T/2}x_p(t)e^{-j2\pi nt/T}\,dt. \tag{1.56}$$
$$\int_{-T/2}^{T/2}x_p(t)e^{-j2\pi nt/T}\,dt=\int_{-T_0/2}^{T_0/2}x(t)e^{-j\Omega t}\,dt\Big|_{\Omega=2\pi n/T}$$
or
$$X_n=\frac{1}{T}X(\Omega)\Big|_{\Omega=2\pi n/T}. \tag{1.57}$$
It means that the Fourier series coefficients are the samples of the Fourier
transform, divided by T. The only condition in the derivation of this relation
is that the signal duration is shorter than the period of periodic extension
(i.e., T > T0 ). The sampling interval in frequency is
$$\Delta\Omega=\frac{2\pi}{T},\qquad \Delta\Omega<\frac{2\pi}{T_0}.$$
It should be smaller than 2π/T0 , where T0 is the signal x (t) duration. This
is a form of the sampling theorem in the frequency domain. The sampling
theorem in the time domain will be discussed later.
In order to write the Fourier series coefficients in the Fourier transform
form, note that a periodic signal x p (t), formed by a periodic extension of
x (t) with period T, can be written as
$$x_p(t)=\sum_{n=-\infty}^{\infty}x(t+nT)=x(t)*_t\sum_{n=-\infty}^{\infty}\delta(t+nT). \tag{1.58}$$
since
$$\mathrm{FT}\Big\{\sum_{n=-\infty}^{\infty}\delta(t+nT)\Big\}=\sum_{n=-\infty}^{\infty}\int_{-\infty}^{\infty}\delta(t+nT)e^{-j\Omega t}\,dt=\sum_{n=-\infty}^{\infty}e^{j\Omega nT}=\frac{2\pi}{T}\sum_{n=-\infty}^{\infty}\delta\Big(\Omega-\frac{2\pi}{T}n\Big). \tag{1.60}$$
1.4 Fourier Transform and Stationary Phase Method
When a signal
x (t) = A(t)e jφ(t) (1.61)
is not of a simple analytic form, it may be possible, in some cases, to obtain an approximate expression for its Fourier transform by using the method of stationary phase.
The method of stationary phase states that if the phase function φ(t) is monotonous and the amplitude A(t) is a sufficiently smooth function, then
$$\int_{-\infty}^{\infty}A(t)e^{j\varphi(t)}e^{-j\Omega t}\,dt\simeq\sqrt{\frac{2\pi j}{|\varphi''(t_0)|}}\,A(t_0)e^{j\varphi(t_0)}e^{-j\Omega t_0}, \tag{1.62}$$
where t0 is the stationary phase point, the instant where the instantaneous frequency
$$\Omega_i(t)=\varphi'(t)$$
is equal to Ω, that is, φ′(t0) = Ω.
Around this point the phase can be expanded into a Taylor series as
$$\varphi(t)-\Omega t=[\varphi(t_0)-\Omega t_0]+[\varphi'(t_0)-\Omega](t-t_0)+\frac{1}{2}\varphi''(t_0)(t-t_0)^2+\dots$$
$$\int_{-\infty}^{\infty}A(t)e^{j(\varphi(t)-\Omega t)}\,dt\cong A(t_0)e^{j(\varphi(t_0)-\Omega t_0)}\int_{-\infty}^{\infty}e^{j\frac{1}{2}\varphi''(t_0)t^2}\,dt,$$
where A(t) ≅ A(t0) is also used. With
$$\int_{-\infty}^{\infty}e^{j\frac{1}{2}at^2}\,dt=\sqrt{\frac{2\pi j}{|a|}}$$
the stationary phase point, for a signal with the amplitude A(t) = exp(−(t² − 1)t²) and the phase φ(t) = 4πt² + 10πt, follows from φ′(t0) = Ω as
$$8\pi t_0+10\pi=\Omega,\qquad t_0=\frac{\Omega-10\pi}{8\pi},$$
and
$$\varphi''(t_0)=8\pi. \tag{1.63}$$
The amplitude of X(Ω) is
$$|X(\Omega)|\simeq A(t_0)\sqrt{\Big|\frac{2\pi}{\varphi''(t_0)}\Big|}=\sqrt{\frac{2\pi}{8\pi}}\exp\big(-(t_0^2-1)t_0^2\big)$$
$$=\frac{1}{2}\exp\Big(-\Big[\Big(\frac{\Omega-10\pi}{8\pi}\Big)^2-1\Big]\Big(\frac{\Omega-10\pi}{8\pi}\Big)^2\Big). \tag{1.64}$$
The signal, stationary phase approximation of the Fourier transform and the
numerical value of the Fourier transform amplitudes are shown in Fig.1.7
Figure 1.7 The signal (top), along with the stationary phase method approximation of its Fourier transform and the Fourier transform obtained by a numeric calculation (bottom).
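The quality of the stationary phase approximation (1.62) can be examined numerically. A minimal sketch on a chirp of our own choosing (A(t) = exp(−t²/32), φ(t) = 4πt², so φ′(t) = 8πt and φ″ = 8π), comparing a direct Riemann-sum evaluation of the Fourier transform with the stationary phase amplitude:

```python
import numpy as np

# chirp signal: slowly varying Gaussian amplitude, quadratic phase
dt = 5e-4
t = np.arange(-25, 25, dt)
x = np.exp(-t**2 / 32) * np.exp(1j * 4 * np.pi * t**2)

Om = 8 * np.pi                                   # evaluate the spectrum here
X_num = np.sum(x * np.exp(-1j * Om * t)) * dt    # numeric Fourier transform

# stationary phase amplitude: t0 = Omega/(8*pi), phi''(t0) = 8*pi, so (1.62)
# gives |X| ~ sqrt(2*pi/(8*pi)) * A(t0)
t0 = Om / (8 * np.pi)
X_sp = np.sqrt(2 * np.pi / (8 * np.pi)) * np.exp(-t0**2 / 32)
```

For this signal the approximation is very accurate, because the amplitude varies much more slowly than the phase.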
and
$$\varphi''(t_0)=2N(2N-1)a\Big(\frac{\Omega}{2Na}\Big)^{(2N-2)/(2N-1)}. \tag{1.65}$$
The amplitude and phase of X(Ω), according to (1.62), are
$$|X(\Omega)|^2\simeq A^2(t_0)\Big|\frac{2\pi}{\varphi''(t_0)}\Big|=A^2\Big(\Big(\frac{\Omega}{2Na}\Big)^{1/(2N-1)}\Big)\,\Big|\frac{2\pi}{(2N-1)\Omega}\Big(\frac{\Omega}{2aN}\Big)^{1/(2N-1)}\Big| \tag{1.66}$$
$$\arg\{X(\Omega)\}\simeq\varphi(t_0)-\Omega t_0+\pi/4=\frac{1-2N}{2N}\,\Omega\Big(\frac{\Omega}{2aN}\Big)^{1/(2N-1)}+\pi/4.$$
Similarly, for a signal given by its Fourier transform X(Ω) = B(Ω)e^{jθ(Ω)}, the method of stationary phase states that if the Fourier transform phase function θ(Ω) is monotonous and the amplitude B(Ω) is a sufficiently smooth function, then
$$x(t)=\frac{1}{2\pi}\int_{-\infty}^{\infty}B(\Omega)e^{j\theta(\Omega)}e^{j\Omega t}\,d\Omega\simeq\sqrt{\frac{j}{2\pi|\theta''(\Omega_0)|}}\,B(\Omega_0)e^{j\theta(\Omega_0)}e^{j\Omega_0 t}, \tag{1.68}$$
where Ω0 is the stationary phase point, −θ′(Ω0) = t, and
$$t_g=-\theta'(\Omega)$$
is the group delay.
Example 1.13. Consider a system with transfer function
$$H(\Omega)=\exp(-\Omega^2)\,e^{-j(a\Omega^2/2+b\Omega)},\quad a>0.$$
⋆The stationary phase point follows from −θ′(Ω0) = t as
$$a\Omega_0+b=t,\qquad \Omega_0=\frac{t-b}{a},$$
and
$$\theta''(\Omega_0)=-a.$$
The impulse response is
$$h(t)\simeq\sqrt{\frac{j}{2\pi|\theta''(\Omega_0)|}}\exp(-\Omega_0^2)\,e^{-ja\Omega_0^2/2-jb\Omega_0+j\Omega_0 t}=\exp\Big(-\Big(\frac{t-b}{a}\Big)^2\Big)e^{j((t-b)^2/(2a)+\pi/4)}\sqrt{\frac{1}{2\pi a}}.$$
Example 1.14. For a system with frequency response H (Ω) = | H (Ω)| e j0 the im-
pulse response is h(t). Find the impulse response of the systems with transfer
functions shown in Fig.1.8 with:
(a) Ha (Ω) = |H(Ω)| e^{−j4Ω},
(b) Hb (Ω) = |H(Ω)| e^{−j2πΩ²}, and
(c) Hc (Ω) = |H(Ω)| [3/4 + (1/4) cos(2πΩ²)] e^{j0}.
(a)
$$h_a(t)=\frac{1}{2\pi}\int_{-\infty}^{\infty}H(\Omega)e^{-j4\Omega}e^{j\Omega t}\,d\Omega=h(t-4).$$
(b)
$$h_b(t)=\frac{1}{2\pi}\int_{-\infty}^{\infty}H(\Omega)e^{-j2\pi\Omega^2}e^{j\Omega t}\,d\Omega.$$
Figure 1.8 Frequency response of systems (amplitude, top row, and phase, middle row) with corresponding impulse responses (amplitude, bottom row).
1.5 Laplace Transform
$$X(s)=\mathcal{L}\{x(t)\}=\int_{-\infty}^{\infty}x(t)e^{-st}\,dt, \tag{1.69}$$
where s = σ + jΩ is complex. For example, for the signal x(t) = e^{−at}u(t), the Laplace transform X(s) = 1/(s + a) exists if
$$\lim_{t\to\infty}e^{-(s+a)t}=0,$$
that is, if σ + a > 0, or σ > −a. Therefore, the region of convergence of this Laplace transform is the region where σ > −a. The point s = −a is the pole of X(s).
Ljubiša Stanković Digital Signal Processing 49
$$\mathrm{FT}\{x(t)e^{-\sigma t}\}=\int_{-\infty}^{\infty}x(t)e^{-\sigma t}e^{-j\Omega t}\,dt=\int_{-\infty}^{\infty}x(t)e^{-st}\,dt=X(s). \tag{1.70}$$
The inverse Laplace transform is
$$x(t)=\frac{1}{2\pi j}\lim_{T\to\infty}\int_{\gamma-jT}^{\gamma+jT}X(s)e^{st}\,ds.$$
Since the Laplace transform will be used to describe linear systems governed by linear differential equations, we consider only the relation of the signal derivatives to the corresponding forms in the Laplace domain. In general, the Laplace transform of the first derivative dx(t)/dt of a signal x(t) is
$$\int_{-\infty}^{\infty}\frac{dx(t)}{dt}e^{-st}\,dt=s\int_{-\infty}^{\infty}x(t)e^{-st}\,dt=sX(s).$$
This relation follows by integration by parts of the first integral, with the assumption that the values of x(t)e^{−st} are zero as t → ±∞.
In many applications it is assumed that the systems are causal, with corresponding causal signals used in the calculations. In these cases x(t) = 0 for t < 0, i.e., x(t) = x(t)u(t). Then the so-called one-sided (unilateral) Laplace transform is used. Its definition is
$$X(s)=\int_{0}^{\infty}x(t)e^{-st}\,dt.$$
For a causal signal, the derivative is
$$\frac{d(x(t)u(t))}{dt}=\frac{dx(t)}{dt}u(t)+x(0)\delta(t),$$
so that
$$\int_{0}^{\infty}\frac{dx(t)}{dt}e^{-st}\,dt=x(t)e^{-st}\Big|_{0}^{\infty}+s\int_{0}^{\infty}x(t)e^{-st}\,dt=sX(s)-x(0).$$
In general, for the nth derivative,
$$\int_{0}^{\infty}\frac{d^nx(t)}{dt^n}e^{-st}\,dt=s^n\int_{0}^{\infty}x(t)e^{-st}\,dt-s^{n-1}x(0)-s^{n-2}x'(0)-\dots-x^{(n-1)}(0)$$
$$=s^nX(s)-s^{n-1}x(0)-s^{n-2}x'(0)-\dots-x^{(n-1)}(0).$$
For the integral of a causal signal,
$$\mathcal{L}\Big\{\int_{0}^{t}x(\tau)\,d\tau\Big\}=\mathcal{L}\{u(t)*_t x(t)\}=\frac{1}{s}X(s),$$
since $\mathcal{L}\{u(t)\}=\int_{0}^{\infty}e^{-st}\,dt=1/s$.
The initial and final values of the signal are x (0) = lims→∞ sX (s) and
x (∞) = lims→0 sX (s), respectively.
1.5.1 Linear Systems Described by Differential Equations

After we have established the relation between the Laplace transform and signal derivatives, we may use it to analyze systems described by differential equations. Consider a causal system described by
$$a_N\frac{d^Ny(t)}{dt^N}+\dots+a_1\frac{dy(t)}{dt}+a_0y(t)=b_M\frac{d^Mx(t)}{dt^M}+\dots+b_1\frac{dx(t)}{dt}+b_0x(t),$$
with zero initial conditions, x(0) = x′(0) = ... = x^{(N−1)}(0) = 0. The Laplace transform of both sides of this differential equation gives the system transfer function
$$H(s)=\frac{Y(s)}{X(s)}=\frac{b_Ms^M+\dots+b_1s+b_0}{a_Ns^N+\dots+a_1s+a_0}.$$
1.6 Butterworth Filter

Figure 1.9 Squared amplitude of the frequency response of a Butterworth filter of order N.
$$|H(j\Omega)|^2=\frac{1}{1+\left(\frac{\Omega}{\Omega_c}\right)^{2N}}.$$
Since |H(jΩ)|² = H(jΩ)H(−jΩ),
$$H(j\Omega)H(-j\Omega)=\frac{1}{1+\left(\frac{j\Omega}{j\Omega_c}\right)^{2N}}$$
and
$$H(s)H(-s)=\frac{1}{1+\left(\frac{s}{j\Omega_c}\right)^{2N}}\quad\text{for } s=j\Omega.$$
The transfer function H(s) is formed of the N poles
$$s_0,\ s_1,\ \dots,\ s_{N-1}$$
within the left side of the s plane, where Re{s} < 0, i.e., π/2 < αk < 3π/2. The symmetric poles with Re{s} > 0 are the poles of H(−s). They are not used in the filter design.
Example 1.18. Design a lowpass Butterworth filter with:
(a) N = 3 with Ωc = 1,
(b) N = 4 with Ωc = 3.
⋆(a) The poles for N = 3 with Ωc = 1 have the phases
$$\alpha_k=\frac{2\pi k+\pi}{6}+\frac{\pi}{2},\quad k=0,1,2,$$
with
$$H(s)=\frac{c}{\left(s+\frac{1}{2}-j\frac{\sqrt{3}}{2}\right)\left(s+\frac{1}{2}+j\frac{\sqrt{3}}{2}\right)(s+1)}=\frac{1}{(s^2+s+1)(s+1)}.$$
(b) The poles for N = 4 with Ωc = 3 have the phases
$$\alpha_k=\frac{2\pi k+\pi}{8}+\frac{\pi}{2},\quad k=0,1,2,3,$$
with
$$H(s)=\frac{c}{(s^2+2.296s+9)(s^2+5.543s+9)}=\frac{81}{(s^2+2.296s+9)(s^2+5.543s+9)}.$$
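The pole phases used in Example 1.18 can be computed and checked in code; a minimal sketch (the helper `butter_poles` is ours) verifying that the N = 3, Ωc = 1 poles indeed give the denominator (s² + s + 1)(s + 1) = s³ + 2s² + 2s + 1:

```python
import numpy as np

def butter_poles(N, Oc):
    # left-half-plane Butterworth poles: s_k = Oc * exp(j*alpha_k),
    # alpha_k = (2*pi*k + pi)/(2*N) + pi/2, for k = 0, ..., N-1
    k = np.arange(N)
    alpha = (2 * np.pi * k + np.pi) / (2 * N) + np.pi / 2
    return Oc * np.exp(1j * alpha)

# polynomial with these poles as roots (coefficients of the denominator)
den = np.poly(butter_poles(3, 1.0)).real
```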
In practice we usually do not know the filter order but, rather, its passband frequency Ωp and stopband frequency Ωs, with a maximal attenuation in the passband ap [dB] and a minimal attenuation in the stopband as [dB], as shown in Fig.1.11. Based on these values we can calculate the order N and the critical frequency Ωc needed for the filter design.
Figure 1.11 Specification of the Butterworth filter parameters in the passband and stopband.
The passband and stopband conditions are
$$\frac{1}{1+\left(\frac{\Omega_p}{\Omega_c}\right)^{2N}}\geq A_p^2 \tag{1.72}$$
and
$$\frac{1}{1+\left(\frac{\Omega_s}{\Omega_c}\right)^{2N}}\leq A_s^2.$$
The nearest greater integer is taken for the filter order N. Then we can use either of the relations in (1.72), with the equality sign, to calculate Ωc. If we choose the first one, then Ωc will satisfy |H(jΩp)|² = Ap², while if we use the second relation, the value of Ωc will satisfy |H(jΩs)|² = As². These two values differ; however, both of them are within the defined criteria for the transfer function.
The relation a = 20 log A, or A = 10^{a/20}, should be used for the attenuation given in [dB].
All other filter forms, like bandpass and highpass, may be obtained from a lowpass filter with appropriate signal modulations. These modulations will be discussed for discrete-time filter forms in Chapter five.
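The design procedure above can be sketched in code; a minimal Python sketch (the helper `butter_design` is ours), solving (1.72) for N and taking Ωc from the passband condition with equality:

```python
import numpy as np

def butter_design(Op, Os, ap_db, as_db):
    # gains from attenuations in dB: A = 10^(-a/20), so A^2 = 10^(-a/10)
    Ap2 = 10 ** (-ap_db / 10)
    As2 = 10 ** (-as_db / 10)
    # from (1.72): (Os/Op)^(2N) >= (1/As^2 - 1)/(1/Ap^2 - 1)
    N = int(np.ceil(np.log((1 / As2 - 1) / (1 / Ap2 - 1)) /
                    (2 * np.log(Os / Op))))
    # critical frequency from the passband relation with the equality sign
    Oc = Op / (1 / Ap2 - 1) ** (1 / (2 * N))
    return N, Oc

# example specification of our own choosing
N, Oc = butter_design(Op=1.0, Os=2.0, ap_db=1.0, as_db=20.0)
```

With the passband condition met exactly, the stopband condition is then satisfied with margin, as the text notes.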
Part II
Chapter 2
Discrete-Time Signals and Transforms
The discrete-time signal is obtained from a continuous-time signal by discretization in time. A continuous-time signal is converted into a sequence of numbers, defining the discrete-time signal. The basic definitions
of discrete-time signals and their transforms are presented in this chapter.
The key fact in the conversion from a continuous-time signal into a sequence
of numbers is that these two signal representations are equivalent under cer-
tain conditions. The discrete-time signal may contain the same information
as the original continuous-time signal. The sampling theorem is fundamen-
tal for this relation between two signal forms. It is presented in this chapter,
after basic definitions of discrete-time signals and systems are introduced.
Figure 2.1 Signal discretization: continuous-time signal (left) and corresponding discrete-time signal (right).
The context will always be clear, so that there is no doubt which kind of signal is considered. The notation x[n] is sometimes used in the literature for discrete-time signals, instead of x(n).
Any discrete-time signal can be written as a sum of shifted and scaled impulses, as illustrated in Fig.2.3.
The discrete unit-step signal is defined by
$$u(n)=\begin{cases}1, & n\geq 0\\ 0, & n<0.\end{cases} \tag{2.5}$$
Figure 2.2 Illustration of discrete-time signals: (a) unit-step signal, (b) discrete-time impulse signal, (c) boxcar signal b(n) = u(n + 2) − u(n − 3), and (d) discrete-time sinusoid.
Figure 2.3 A signal x(n) decomposed into the shifted and scaled impulses −2δ(n + 2), 3δ(n), and −δ(n − 1).
The impulse and unit-step signals are related by
$$\delta(n)=u(n)-u(n-1),\qquad u(n)=\sum_{k=-\infty}^{n}\delta(k).$$
A signal x(n) is periodic, with period N, if
$$x(n+N)=x(n). \tag{2.7}$$
A signal is even if x(n) = x(−n). Any signal can be written as x(n) = xe(n) + xo(n), where xe(n) and xo(n) are its even and odd parts, respectively.
⋆For a signal x(n) we can form its even and odd parts as
$$x_e(n)=\frac{x(n)+x(-n)}{2},\qquad x_o(n)=\frac{x(n)-x(-n)}{2}.$$
Summing these two parts, the signal x(n) is reconstructed. Note that xo(0) = 0.
A signal is Hermitian if
x (n) = x ∗ (−n).
$$P_{AV}=\lim_{N\to\infty}\frac{1}{2N+1}\sum_{n=-N}^{N}|x(n)|^2=\left\langle |x(n)|^2\right\rangle, \tag{2.9}$$
where ⟨|x(n)|²⟩ is used to denote an average over a large number of signal values, as N → ∞. The average power of signals with a finite energy (energy signals) is PAV = 0. For power signals (when 0 < PAV < ∞) the energy is infinite, Ex → ∞.
Example 2.3. The energy of signal x (n) is Ex = 10. The energy of its even part is
Exe = 3. Find the energy of its odd part.
⋆The energy of the signal is
$$E_x=\sum_{n=-\infty}^{\infty}|x(n)|^2=\sum_{n=-\infty}^{\infty}|x_e(n)+x_o(n)|^2=\sum_{n=-\infty}^{\infty}[x_e(n)+x_o(n)][x_e(n)+x_o(n)]^*$$
$$=\sum_{n=-\infty}^{\infty}|x_e(n)|^2+\sum_{n=-\infty}^{\infty}|x_o(n)|^2+\sum_{n=-\infty}^{\infty}[x_o(n)x_e^*(n)+x_e(n)x_o^*(n)].$$
The terms xo(n)xe*(n) and xe(n)xo*(n) in the last sum correspond to odd signals, whose sum over all n is zero,
$$\sum_{n=-\infty}^{\infty}[x_o(n)x_e^*(n)+x_e(n)x_o^*(n)]=0.$$
For the signals xe(n) and xo(n), satisfying the previous relation, we say that they are orthogonal. Therefore, for the energies Ex, Exe, and Exo,
$$E_x=E_{x_e}+E_{x_o}$$
holds, and the energy of the odd part is Exo = Ex − Exe = 10 − 3 = 7.
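The orthogonality of the even and odd parts, and the resulting energy decomposition, can be checked numerically; a minimal sketch on a random test signal of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(-8, 9)                  # symmetric support around n = 0
x = rng.standard_normal(n.size)

xe = (x + x[::-1]) / 2                # even part: (x(n) + x(-n))/2
xo = (x - x[::-1]) / 2                # odd part:  (x(n) - x(-n))/2

Ex  = np.sum(x**2)
Exe = np.sum(xe**2)
Exo = np.sum(xo**2)
cross = np.sum(xo * xe)               # cross term, should vanish
```

As in Example 2.3, if Ex = 10 and Exe = 3, the decomposition gives Exo = 7.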
A discrete system T {·} is linear if, for any two signals x1(n) and x2(n) and any two constants a1 and a2,
$$T\{a_1x_1(n)+a_2x_2(n)\}=a_1T\{x_1(n)\}+a_2T\{x_2(n)\}$$
holds. A system is time-invariant if y(n) = T {x(n)} implies
$$T\{x(n-n_0)\}=y(n-n_0)$$
for any n0.
For any input signal x (n) the signal at the output of a linear time-
invariant discrete system can be calculated if we know the output to the
impulse signal. The output to the impulse signal, h(n) = T {δ(n)}, is the
impulse response.
Figure 2.4 Input signal and impulse response.
x(n) ∗n h(n) = h(n) ∗n x(n). (2.15)
Example 2.4. Calculate discrete-time convolution of signals x (n) and h(n) shown
in Fig. 2.4.
⋆By definition, according to Fig. 2.5, we have
$$y(0)=\sum_{k=-\infty}^{\infty}x(k)h(-k)=1-1+2=2,$$
$$y(1)=\sum_{k=-\infty}^{\infty}x(k)h(1-k)=-1-1+1+4=3.$$
Figure 2.5 Signals for the calculation of the outputs y(0), y(1), and y(2).
Example 2.5. Calculate the convolution of signals x (n) = n[u(n) − u(n − 10)] and
h ( n ) = u ( n ).
⋆The convolution is
$$y(n)=\sum_{k=-\infty}^{\infty}k[u(k)-u(k-10)]u(n-k)=\sum_{0\leq k\leq 9,\ k\leq n}k$$
$$=\begin{cases}\sum_{k=0}^{n}k=\dfrac{n(n+1)}{2}, & 0\leq n\leq 9\\[6pt] \sum_{k=0}^{9}k=45, & n>9\end{cases}$$
$$=\frac{n(n+1)}{2}[u(n)-u(n-10)]+45\,u(n-10).$$
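The closed-form result of Example 2.5 can be checked directly with a discrete convolution; a minimal sketch (the step signal is truncated to a finite length, so the output is valid only within that range):

```python
import numpy as np

x = np.arange(10.0)      # x(n) = n for 0 <= n <= 9, zero elsewhere
h = np.ones(40)          # u(n), truncated to 40 samples
y = np.convolve(x, h)    # y(n), valid for 0 <= n < 40

y5 = y[5]                # should be 5*6/2 = n(n+1)/2
y20 = y[20]              # should be the constant 45 for n > 9
```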
Therefore |y(n)| < ∞ if (2.16) holds. It can be shown that the absolute convergence of the impulse response sum is also a necessary condition for a linear time-invariant discrete system to be stable.
x (n∆t)∆t −→ x (n)
Ω∆t −→ ω, (2.19)
x (n) = Ae−α|n|
2.2.1 Properties
With respect to the signal shift and modulation the Fourier transform of discrete-time signals behaves in the same way as the Fourier transform of continuous-time signals,
FT{x(n − n0)} = e^{−jωn0} X(e^{jω})
and
FT{x(n)e^{jω0 n}} = X(e^{j(ω−ω0)}). (2.27)
Example 2.9. The Fourier transform of a discrete-time signal x (n) is X (e jω ).
Find the Fourier transform of y(n) = x (2n).
⋆For y(n) = x(2n) the Fourier transform is
FT{x(2n)} = ∑_{n=−∞}^{∞} x(2n)e^{−jωn}
= ∑_{n=−∞}^{∞} [x(n) + (−1)^n x(n)]/2 · e^{−jωn/2}
= (1/2) ∑_{n=−∞}^{∞} [x(n) + e^{−jnπ}x(n)]e^{−jωn/2}
= (1/2)[X(e^{jω/2}) + X(e^{j(ω/2+π)})] = (1/2)[X(e^{jω/2}) + X(e^{j(ω+2π)/2})]. (2.28)
The period of this Fourier transform is 2π. Period of X (e jω/2 ) is 4π.
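Relation (2.28) can be verified numerically; the random finite-length test signal below is an assumed example:

```python
import numpy as np

# Numerical check of (2.28): FT{x(2n)} = [X(e^{jw/2}) + X(e^{j(w/2+pi)})]/2,
# for an assumed finite-length random test signal.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)

def dtft(sig, w):
    # Fourier transform of a discrete-time signal at frequency w
    n = np.arange(len(sig))
    return np.sum(sig * np.exp(-1j * w * n))

w = 1.3                              # an arbitrary test frequency
lhs = dtft(x[::2], w)                # Fourier transform of x(2n)
rhs = 0.5 * (dtft(x, w / 2) + dtft(x, w / 2 + np.pi))
assert np.isclose(lhs, rhs)
```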
Example 2.10. Calculate the Fourier transform of the discrete-time signal (rectan-
gular window),
w R ( n ) = u ( N + n ) − u ( n − N − 1 ). (2.29)
Write the Fourier transform of a Hann(ing) window
1
w H (n) = [1 + cos(nπ/N )] [u( N + n) − u(n − N − 1)] .
2
⋆By definition
W_R(e^{jω}) = ∑_{n=−N}^{N} e^{−jωn} = e^{jωN} (1 − e^{−jω(2N+1)})/(1 − e^{−jω}) = sin(ω(2N + 1)/2)/sin(ω/2). (2.30)
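A quick numerical check of the closed form (2.30) and of its first zero:

```python
import numpy as np

# Check of (2.30): the sum over -N <= n <= N equals sin(w(2N+1)/2)/sin(w/2).
N = 4
w = 0.7                              # any w that is not a multiple of 2*pi
n = np.arange(-N, N + 1)
direct = np.sum(np.exp(-1j * w * n))
closed = np.sin(w * (2 * N + 1) / 2) / np.sin(w / 2)
assert np.isclose(direct, closed)

# The first zero of the transform is at w = 2*pi/(2N+1)
w0 = 2 * np.pi / (2 * N + 1)
assert np.isclose(np.sum(np.exp(-1j * w0 * n)), 0)
```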
Figure 2.7 Discrete-time signal in a form of rectangular window of the widths 2 N + 1 = 9 and
2N + 1 = 17 samples (top and middle), and a Hann(ing) window with 2N + 1 = 17 (bottom).
The time domain values are on the left while the Fourier transforms of these discrete-time
signals are on the right.
As the window width increases in the time domain, the main lobe width in the Fourier domain narrows. The first zero value of the Fourier transform of a rectangular window is at ω(2N + 1)/2 = π, i.e., at ω = 2π/(2N + 1), where 2N + 1 is the signal duration. In the case of a Hann(ing) window the main lobe is wider than for the rectangular window of the same width, but its sidelobes decay much faster, with greatly reduced oscillations in the Fourier transform, Fig. 2.7.
FT{x(n) ∗_n h(n)} = ∑_{n=−∞}^{∞} ∑_{k=−∞}^{∞} x(k)h(n − k)e^{−jnω} = X(e^{jω})H(e^{jω}), (2.33)
where
H(e^{jω}) = ∑_{n=−∞}^{∞} h(n)e^{−jωn}.
Example 2.11. Find the output of a discrete linear time-invariant system with
frequency response H (e jω ) if the input signals are:
(a) x (n) = Ae jω0 n and (b) x (n) = A cos(ω0 n + ϕ). What is the output if
the impulse response h(n) is real-valued?
⋆(a) The output is
y(n) = ∑_{k=−∞}^{∞} h(k)x(n − k) = ∑_{k=−∞}^{∞} h(k)Ae^{jω0(n−k)}
= Ae^{jω0 n} ∑_{k=−∞}^{∞} h(k)e^{−jω0 k} = Ae^{jω0 n}H(e^{jω0})
= A|H(e^{jω0})| e^{j(ω0 n + arg{H(e^{jω0})})}.
(b) The signal can be written as
x(n) = A cos(ω0 n + φ) = (A/2)e^{j(ω0 n+φ)} + (A/2)e^{−j(ω0 n+φ)},
so that
y(n) = (A/2)|H(e^{jω0})| e^{j(ω0 n+φ)+j arg{H(e^{jω0})}} + (A/2)|H(e^{−jω0})| e^{−j(ω0 n+φ)+j arg{H(e^{−jω0})}}.
For a real-valued h(n),
H(e^{jω}) = H^*(e^{−jω})
and
H(e^{jω}) = ∑_{n=−∞}^{∞} h(n) cos(ωn) − j ∑_{n=−∞}^{∞} h(n) sin(ωn),
|H(e^{jω})|² = |H(e^{−jω})|²,
arg{H(e^{jω})} = arctan( −∑_{n=−∞}^{∞} h(n) sin(ωn) / ∑_{n=−∞}^{∞} h(n) cos(ωn) ) = −arg{H(e^{−jω})}.
The two terms in y(n) are then complex conjugates, so the output reduces to
y(n) = A|H(e^{jω0})| cos(ω0 n + φ + arg{H(e^{jω0})}).
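The sinusoid-in, sinusoid-out behavior of a real-valued LTI system can be checked numerically; the 3-point moving average used below is an assumed example system, not one from the text:

```python
import numpy as np

# Steady-state response of a real-valued LTI system to A*cos(w0*n + phi):
# y(n) = A|H(e^{jw0})| cos(w0*n + phi + arg{H(e^{jw0})}).
# The 3-point moving average is an assumed example system.
h = np.array([1.0, 1.0, 1.0]) / 3
w0, A, phi = np.pi / 8, 2.0, 0.3
H = np.sum(h * np.exp(-1j * w0 * np.arange(len(h))))   # H(e^{jw0})

n = np.arange(200)
x = A * np.cos(w0 * n + phi)
y = np.convolve(x, h)[:len(n)]                         # system output
y_pred = A * np.abs(H) * np.cos(w0 * n + phi + np.angle(H))

# The two agree after the short transient at the start
assert np.allclose(y[10:], y_pred[10:])
```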
∑_{n=−∞}^{∞} x(n)y^*(n) = ∑_{n=−∞}^{∞} [ (1/2π) ∫_{−π}^{π} X(e^{jω})e^{jωn} dω ] y^*(n) (2.35)
= (1/2π) ∫_{−π}^{π} X(e^{jω}) ( ∑_{n=−∞}^{∞} e^{−jωn}y(n) )^* dω = (1/2π) ∫_{−π}^{π} X(e^{jω})Y^*(e^{jω}) dω.
P_xx(e^{jω}) = lim_{N→∞} (1/(2N + 1)) |X_N(e^{jω})|², (2.37)
P_xx(e^{jω}) = lim_{N→∞} (1/(2N + 1)) ∑_{n=−N}^{N} ∑_{m=−N}^{N} x(n)x^*(m)e^{−jω(n−m)}. (2.38)
P_xx(e^{jω}) = lim_{N→∞} (1/(2N + 1)) ∑_{k=−2N}^{2N} (2N + 1 − |k|)r(k)e^{−jωk}
with a period 2Ω0 . It is very important to note that X p (Ω) = X (Ω) for
|Ω| < Ω0 if
Ω0 > Ω m .
In this case, it is possible to transform X(Ω) into X_p(Ω) and back without losing any information.
Of course, that would not be the case if Ω0 > Ωm did not hold. In the periodic extension of X(Ω), overlapping (aliasing) would then occur in X_p(Ω), and the extension would not be reversible; it would not be possible to recover X(Ω) from X_p(Ω). The periodic extension is illustrated in Fig. 2.8.
The periodic function X p (Ω) can be expanded into Fourier series with
coefficients
X_{−n} = (1/(2Ω0)) ∫_{−Ω0}^{Ω0} X_p(Ω)e^{jπΩn/Ω0} dΩ = (1/(2Ω0)) ∫_{−∞}^{∞} X(Ω)e^{jπΩn/Ω0} dΩ.
The integration limits are extended to the infinity since X (Ω) = X p (Ω)
within the basic period interval and X (Ω) = 0 outside this interval.
Figure 2.8 The Fourier transform of a signal, with X (Ω) = 0 for |Ω| > Ωm (top) and its
periodically extended version, with period 2Ω0 > 2Ωm (bottom).
Comparing these coefficients with the inverse Fourier transform
x(t) = (1/2π) ∫_{−∞}^{∞} X(Ω)e^{jΩt} dΩ, (2.41)
we conclude that X_{−n} = x(n∆t)∆t, with ∆t = π/Ω0.
x(t) = (1/2π) ∫_{−∞}^{∞} X(Ω)e^{jΩt} dΩ = (1/2π) ∫_{−Ω0}^{Ω0} X_p(Ω)e^{jΩt} dΩ (2.44)
= (1/2π) ∫_{−Ω0}^{Ω0} ( ∑_{n=−∞}^{∞} X_n e^{jπnΩ/Ω0} ) e^{jΩt} dΩ
= (1/2π) ∫_{−Ω0}^{Ω0} ( ∑_{n=−∞}^{∞} x(−n∆t)∆t e^{jπnΩ/Ω0} ) e^{jΩt} dΩ,
resulting, after the integration, in
x(t) = ∑_{n=−∞}^{∞} x(n∆t) sin(π(t − n∆t)/∆t) / (π(t − n∆t)/∆t). (2.45)
The signal x (t), for any t, is expressed in terms of its samples x (n∆t).
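Relation (2.45) can be illustrated numerically; the bandlimited test signal below and the truncation of the infinite sum are assumptions of the sketch:

```python
import numpy as np

# Reconstruction formula (2.45) for an assumed bandlimited test signal
# x(t) = cos(2t) + sin(t), with Omega_m = 2 and Omega_0 = 4 > Omega_m.
dt = np.pi / 4                      # sampling interval, dt = pi/Omega_0
n = np.arange(-2000, 2001)          # truncation of the infinite sum
samples = np.cos(2 * n * dt) + np.sin(n * dt)

def reconstruct(t):
    # np.sinc(x) = sin(pi*x)/(pi*x), i.e. sin(pi(t-n*dt)/dt)/(pi(t-n*dt)/dt)
    return np.sum(samples * np.sinc((t - n * dt) / dt))

for t in [0.1, 0.77, 2.5]:          # points between the sampling instants
    assert abs(reconstruct(t) - (np.cos(2 * t) + np.sin(t))) < 1e-2
```

The residual error comes only from truncating the sum; with more terms the reconstruction approaches the signal values arbitrarily closely.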
Example 2.13. The last relation can be used to prove that X (Ω) = X (e jω ) with
Ω∆t = ω and |ω | < π for the signals sampled at the rate satisfying the
sampling theorem.
⋆Starting from
X(Ω) = ∫_{−∞}^{∞} x(t)e^{−jΩt} dt,
the signal x(t), satisfying the sampling theorem, can be written in terms of its samples, according to the third row of (2.44), as
x(t) = (1/2π) ∫_{−Ω0}^{Ω0} ( ∑_{n=−∞}^{∞} x(n∆t)∆t e^{−j∆tnθ} ) e^{jθt} dθ.
It follows
X(Ω) = ∫_{−∞}^{∞} (1/2π) ∫_{−Ω0}^{Ω0} ( ∑_{n=−∞}^{∞} x(n∆t)∆t e^{−j∆tnθ} ) e^{jθt} dθ e^{−jΩt} dt
= ∑_{n=−∞}^{∞} x(n∆t)∆t ∫_{−Ω0}^{Ω0} δ(θ − Ω)e^{−j∆tnθ} dθ
= ∑_{n=−∞}^{∞} x(n∆t)∆t e^{−j∆tnΩ} for |Ω| < Ω0, (2.46)
resulting in
X(Ω) = ∑_{n=−∞}^{∞} x(n)e^{−jωn} for |ω| < π.
Example 2.14. If the highest frequency in a signal x (t) is Ωm1 and the highest
frequency in a signal y(t) is Ωm2 what should be the sampling interval for the
signal x (t)y(t) and for the signal x (t − t1 )y∗ (t − t2 )? The highest frequency
Ωm in a signal is used in the sense that the Fourier transform of the signal is
zero for |Ω| > Ωm .
X_p(Ω) = ... + 2/(1 + (Ω + 20π)²) + 2/(1 + Ω²) + 2/(1 + (Ω − 20π)²) + ...
Thus, the value of X_p(Ω) at the period ending points ±10π will approximately be X_p(±10π) ≈ 2/(1 + 100π²) ≈ 0.002. Comparing this with the maximum value X_p(0) = 2, the expected error due to the discretization of this signal (since it does not strictly satisfy the sampling theorem) will be on the order of 0.1%.
(b) The discrete-time signal obtained by sampling x (t) = exp(− |t|)
with ∆t = 0.1 is x (n) = 0.1e−0.1|n| . Its Fourier transform is already calculated
with A = 0.1 and α = 0.1, eq.(2.22). The result is
X(e^{jω}) = 0.1 (1 − e^{−0.2}) / (1 − 2e^{−0.1} cos(ω) + e^{−0.2}). (2.48)
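Result (2.48) is easy to confirm by direct summation of the defining series:

```python
import numpy as np

# Check of (2.48): Fourier transform of x(n) = 0.1*exp(-0.1|n|) by summation.
n = np.arange(-3000, 3001)          # truncation; the remaining tail is negligible
x = 0.1 * np.exp(-0.1 * np.abs(n))

w = 0.4                              # an arbitrary test frequency
direct = np.sum(x * np.exp(-1j * w * n))
closed = 0.1 * (1 - np.exp(-0.2)) / (1 - 2 * np.exp(-0.1) * np.cos(w) + np.exp(-0.2))
assert np.isclose(direct, closed)
```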
y(n) = (1/2) cos(nπ/4 + π/4)∆t,
corresponding to the continuous-time signal
y(t) = (1/2) cos((π/(4∆t))t + π/4) = (1/2) cos(25πt + π/4).
2.4 PROBLEMS
Problem 2.1. Check the periodicity and find the period of signals:
(a) x (n) = sin(2πn/32), (b) x (n) = cos(9πn/82), (c) x (n) = e jn/32 , and (d)
x (n) = sin(πn/5) + cos(5πn/6) − sin(πn/4).
Problem 2.2. Check the linearity and time-invariance of the discrete system
described by equation
y(n) = x (n) + 2.
Figure 2.9 Problem 2.7, impulse response h(n) (left) and Problem 2.14, discrete signal x (n)
(right).
Problem 2.5. Find the convolution of signals x (n) = e−|n| and h(n) = u(n +
5) − u ( n − 6).
Problem 2.6. A discrete system consists of systems with impulse responses
h1 (n) = e− an u(n), h2 (n) = e−bn u(n), and h3 (n) = u(n). Find the impulse
response of the resulting system for:
(a) Systems h1 (n), h2 (n), and h3 (n) connected in parallel,
(b) System h1 (n) connected in parallel with a cascade of systems h2 (n)
and h3 (n).
Problem 2.7. Consider three causal linear time-invariant systems in cascade, with impulse responses h1(n), h2(n), and h2(n), respectively. The impulse response of the second and the third system is h2(n) = u(n) − u(n − 2), while the impulse response of the whole system,
h(n) = h1(n) ∗_n h2(n) ∗_n h2(n),
is shown in Fig. 2.9 (left). Find h1(n).
Find
S = ∑_{n=0}^{∞} ne^{−n/2}.
H(e^{jω}) ≈ jω for small ω,
i.e.,
dH(e^{jω})/dω |_{ω=0} = j and d²H(e^{jω})/dω² |_{ω=0} = 0.
Problem 2.11. Find the Fourier transform of the following discrete-time
signal (triangular window)
w_T(n) = (1 − |n|/(N + 1))[u(n + N) − u(n − N − 1)],
with N being an even number.
Problem 2.12. Find the value of the integral
I = (1/2π) ∫_{−π}^{π} sin²((N + 1)ω/2) / sin²(ω/2) dω.
w(n) = w H (n + N ) + w H (n) + w H (n − N )
Problem 2.14. A discrete-time signal x (n) is given in Fig. 2.9 (right). Without
calculating its Fourier transform X (e jω ) find
Using this Fourier transform find the center of gravity of signal x (n) =
e−n/4 u(n) defined by
n_g = ∑_{n=−∞}^{∞} nx(n) / ∑_{n=−∞}^{∞} x(n).
(a) h(n) = sin(nπ/3)/(nπ), with h(0) = 1/3,
(b) h(n) = sin²(nπ/3)/(nπ)²,
(c) h(n) = sin((n − 2)π/4)/((n − 2)π).
Show that the frequency response of the system with h(n) = sin(nπ/3)/nπ
is H (e jω ) = 1 for |ω | ≤ π/3 and H (e jω ) = 0 for π/3 < |ω | < π. Find the
frequency responses in other two cases. Find the systems output to the input
signal x (n) = sin(nπ/6).
where xh (n) is the Hilbert transform of x (n). Find the impulse response of
the system that transforms a signal x (n) into its Hilbert transform (Hilbert
transformer).
Problem 2.20. For a signal whose Fourier transform is zero for frequencies
Ω ≥ Ωm = 2π f m = π/∆t show that
x(t) = ∫_{−∞}^{∞} x(τ) sin(π(t − τ)/∆t) / (π(t − τ)) dτ.
Problem 2.21. Sampling of a signal is done twice, with the sampling interval ∆t = 2π/Ωm that is twice as large as the interval required by the sampling theorem (∆t = π/Ωm is required). After the first sampling, the discrete-time signal x1(n) = ∆tx(n∆t) is formed, while after the second sampling the signal x2(n) = ∆tx(n∆t + a) is formed. Show that we can reconstruct the continuous-time signal x(t) based on x1(n) and x2(n) if a ≠ k∆t, that is, if the samples x1(n) and x2(n) do not overlap in continuous time.
Problem 2.23. Show that the relation among the amplitudes of a signal
x (n) and its even and odd parts xe (n) = [ x (n) + x (−n)]/2 and xo (n) =
[ x (n) − x (−n)]/2 is
A_s(n) ≤ |x_e(n)| + |x_o(n)| ≤ √2 A_s(n)
with A_s(n) > 0 defined by A_s²(n) = [|x(n)|² + |x(−n)|²]/2.
2.5 SOLUTIONS
T{x(n − N)} = x(n − N) + 2 = y(n − N).
h(n) = T {δ(n)}.
It can be written as
h(n) = T {u(n) − u(n − 1)}.
∑_{n=−∞}^{∞} |h(n)| = 1 + ∑_{n=1}^{∞} 2^{−n} = 1 + 2^{−1}/(1 − 2^{−1}) = 2.
The system is stable since the sum of absolute values of impulse response is
finite.
y(0) = ∑_{k=−∞}^{∞} x(k)x(−k) = x(0)x(0) = 1
y(1) = ∑_{k=−∞}^{∞} x(k)x(1 − k) = x(0)x(1) + x(1)x(0) = 2
y(−1) = ∑_{k=−∞}^{∞} x(k)x(−1 − k) = 0
y(2) = ∑_{k=−∞}^{∞} x(k)x(2 − k) = 3
...
with
u((n − k) + 5) = { 1, for k ≤ n + 5; 0, for k > n + 5 }
and
u((n − k) − 6) = { 1, for k ≤ n − 6; 0, for k > n − 6 }
we get
u((n − k) + 5) − u((n − k) − 6) = { 1, for n − 6 < k ≤ n + 5; 0, elsewhere }.
Since
|k| = { k, for k ≥ 0; −k, for k < 0 },
we have three cases:
1) For n + 5 ≤ 0, i.e., n ≤ −5, we have k ≤ 0 for all terms. Therefore |k| = −k,
y(n) = ∑_{k=n−5}^{n+5} e^k = e^{n−5} (1 − e^{11})/(1 − e) = e^n (e^{−5} − e^{6})/(1 − e)
= e^n (e^{−5.5} − e^{5.5})/(e^{−0.5} − e^{0.5}) = e^n sinh 5.5 / sinh 0.5.
2) For n − 5 ≥ 0, the lowest k = n − 5 is nonnegative. Then k ≥ 0 for all terms and |k| = k, with
y(n) = ∑_{k=n−5}^{n+5} e^{−k} = e^{−n+5} (1 − e^{−11})/(1 − e^{−1}) = e^{−n} (e^{5} − e^{−6})/(1 − e^{−1})
= e^{−n} (e^{5.5} − e^{−5.5})/(e^{0.5} − e^{−0.5}) = e^{−n} sinh 5.5 / sinh 0.5.
3) For −5 < n < 5, index k can assume positive and negative values. The convolution is split into two sums as
y(n) = ∑_{k=n−5}^{n+5} e^{−|k|} = ∑_{k=n−5}^{−1} e^k + ∑_{k=0}^{n+5} e^{−k} = ∑_{k=1}^{5−n} e^{−k} + ∑_{k=0}^{n+5} e^{−k}
= e^{−1} (1 − e^{−(5−n)})/(1 − e^{−1}) + (1 − e^{−(n+6)})/(1 − e^{−1})
= e^{−1/2} (1 − e^{n−5})/(e^{1/2} − e^{−1/2}) + e^{1/2} (1 − e^{−(n+6)})/(e^{1/2} − e^{−1/2})
= (1/(e^{0.5} − e^{−0.5})) (e^{−0.5} − e^{n−5.5} + e^{0.5} − e^{−n−5.5})
= (−e^{−5.5}(e^n + e^{−n}) + e^{−0.5} + e^{0.5})/(e^{0.5} − e^{−0.5})
= (cosh 0.5 − e^{−5.5} cosh n)/sinh 0.5.
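The three closed-form cases can be checked against the direct sum y(n) = ∑_{k=n−5}^{n+5} e^{−|k|}:

```python
import numpy as np

# Check of the three-case closed form for y(n) = sum_{k=n-5}^{n+5} e^{-|k|}.
def y_direct(n):
    k = np.arange(n - 5, n + 6)
    return np.sum(np.exp(-np.abs(k)))

def y_closed(n):
    if n <= -5:
        return np.exp(n) * np.sinh(5.5) / np.sinh(0.5)
    if n >= 5:
        return np.exp(-n) * np.sinh(5.5) / np.sinh(0.5)
    return (np.cosh(0.5) - np.exp(-5.5) * np.cosh(n)) / np.sinh(0.5)

for n in range(-10, 11):
    assert np.isclose(y_direct(n), y_closed(n))
```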
where
h23(n) = ∑_{m=−∞}^{∞} h3(m)h2(n − m) = h2(n) ∗ h3(n).
The impulse response of the whole system is
h(n) = h1 (n) + h23 (n) = h1 (n) + h2 (n) ∗ h3 (n),
with
h2(n) ∗ h3(n) = ∑_{m=−∞}^{∞} e^{−b(n−m)}u(n − m)u(m)
= u(n) ∑_{m=0}^{n} e^{−b(n−m)} = e^{−bn} (1 − e^{b(n+1)})/(1 − e^b) u(n) = (e^{−bn} − e^b)/(1 − e^b) u(n).
It follows
H(e^{jω}) = ∑_{n=0}^{∞} ne^{−n/2}e^{−jωn} = e^{−(1/2+jω)} / (1 − e^{−(1/2+jω)})².
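The sum S = ∑_{n≥0} ne^{−n/2} from the earlier problem follows from this transform at ω = 0, which is easy to confirm numerically:

```python
import numpy as np

# S = sum_{n>=0} n e^{-n/2} is H(e^{jw}) evaluated at w = 0,
# S = e^{-1/2}/(1 - e^{-1/2})^2.
S_closed = np.exp(-0.5) / (1 - np.exp(-0.5)) ** 2
S_direct = sum(n * np.exp(-n / 2) for n in range(200))   # tail is negligible
assert np.isclose(S_closed, S_direct)
```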
The output for a real-valued h(n) is
y(n) = 5|H(e^{jπ/10})| sin(πn/5 + arg{H(e^{jπ/10})})
− 3|H(e^{jπ/6})| cos(πn/3 + π/6 + arg{H(e^{jπ/6})})
= 14.1587 sin(πn/5 − 1.1481) − 5.7339 cos(πn/3 + π/6 − 1.6605).
Solution 2.10. For the impulse response h(n) the frequency response conditions reduce to
a + 2b = 1/2
a + 4b = 0,
resulting in a = 1, b = −1/4 and
h(n) = δ(n + 1) − δ(n − 1) − (1/4)(δ(n + 2) − δ(n − 2)).
Solution 2.11. Note that
w_T(n) = (1/(N + 1)) w_R(n) ∗_n w_R(n),
so that
W_T(e^{jω}) = (1/(N + 1)) W_R(e^{jω})W_R(e^{jω}) = (1/(N + 1)) sin²(ω(N + 1)/2) / sin²(ω/2).
X(e^{jω}) = sin(ω(N + 1)/2) / sin(ω/2).
This signal is the rectangular window, x(n) = u(n + N/2) − u(n − N/2 − 1). Its energy is
w_H(n) = (1/2)[1 + cos(nπ/N)][u(N + n) − u(n − N − 1)].
For 0 ≤ n ≤ N − 1,
w(n) = w_H(n) + w_H(n − N)
= (1/2)[1 + cos(nπ/N)] + (1/2)[1 + cos((n − N)π/N)]
= 1 + (1/2) cos(nπ/N) + (1/2) cos(nπ/N − π) = 1.
The same holds for −N ≤ n ≤ −1, when
w(n) = w_H(n + N) + w_H(n) = 1.
For
w(n) = ∑_{k=−K}^{K} w_H(n + kN)
we get
w(n) = { 0, for n < −(K + 1)N
  { (1/2)[1 + cos((n + KN)π/N)], for −(K + 1)N + 1 ≤ n ≤ −KN − 1
  { 1, for −KN ≤ n ≤ KN − 1
  { (1/2)[1 + cos((n − KN)π/N)], for KN ≤ n ≤ (K + 1)N − 1
  { 0, for n > (K + 1)N − 1
with
W(e^{jω}) = W_H(e^{jω}) ∑_{k=−K}^{K} e^{−jωkN} = W_H(e^{jω}) e^{jωKN} (1 − e^{−jω(2K+1)N})/(1 − e^{−jωN})
= W_H(e^{jω}) sin(ω(2K + 1)N/2) / sin(ωN/2).
Similar results hold for the Hamming and triangular window. The results
can be generalized for shifts of N/2, N/4,...
For very large K the variations of the second term in W(e^{jω}) are much faster than the variations of W_H(e^{jω}). Thus, for large K the Fourier transform W(e^{jω}) approaches the Fourier transform of a rectangular window of the width (2K + 1)N.
Solution 2.14. Based on the definition of the Fourier transform of discrete-
time signals,
X(e^{j0}) = ∑_{n=−∞}^{∞} x(n) = 7,
X(e^{jπ}) = ∑_{n=−∞}^{∞} x(n)(−1)^n = 1,
∫_{−π}^{π} X(e^{jω}) dω = 2πx(0) = 4π,
Re{X(e^{jω})} = (1/2)[X(e^{jω}) + X^*(e^{jω})].
The inverse Fourier transform of Re{X(e^{jω})} is
y(n) = (1/2)(x(n) + x^*(−n)).
Solution 2.15. The Fourier transform of y(n) is
Y(e^{jω}) = ∑_{n=−∞}^{∞} ne^{−n/4}u(n)e^{−jωn} = j d/dω ( ∑_{n=0}^{∞} e^{−n/4}e^{−jωn} )
= j d/dω ( 1/(1 − e^{−1/4−jω}) ) = e^{−1/4−jω} / (1 − e^{−1/4−jω})².
is
h(n) = (1/2π) ∫_{−π/3}^{π/3} e^{jωn} dω = e^{jωn}/(2jπn) |_{−π/3}^{π/3} = sin(πn/3)/(πn).
The value of frequency response at the input signal frequency ω = ±π/6 is
H (e± jπ/6 ) = 1. The output signal is, y(n) = sin(nπ/6).
(b) The frequency response, in this case, is (1/(2π)) H(e^{jω}) ∗_ω H(e^{jω}), resulting in y(n) = 0.25 sin(nπ/6).
(c) Output signal in this case is y(n) = sin((n − 2)π/6) = sin(nπ/6 − π/3).
X(e^{jω}) = (π/100) ∑_{k=−∞}^{∞} [δ(ω − 0.2π + 2kπ)e^{jπ/4} + δ(ω + 0.2π + 2kπ)e^{−jπ/4}]
+ (π/(j100)) ∑_{k=−∞}^{∞} [δ(ω − 0.9π + 2kπ) − δ(ω + 0.9π + 2kπ)].
X(e^{jω}) = (π/50) ∑_{k=−∞}^{∞} [δ(ω − 0.4π + 2kπ)e^{jπ/4} + δ(ω + 0.4π + 2kπ)e^{−jπ/4}]
+ (π/(j50)) ∑_{k=−∞}^{∞} [δ(ω − 1.8π + 2kπ) − δ(ω + 1.8π + 2kπ)].
Figure 2.11 Illustration of the system output with various sampling intervals (a)-(c).
X(e^{jω}) = (π/50)[δ(ω − 0.4π)e^{jπ/4} + δ(ω + 0.4π)e^{−jπ/4}]
+ (π/(j50))[δ(ω − 1.8π + 2π) − δ(ω + 1.8π − 2π)]
= (π/50)[δ(ω − 0.4π)e^{jπ/4} + δ(ω + 0.4π)e^{−jπ/4}]
+ (π/(j50))[δ(ω + 0.2π) − δ(ω − 0.2π)].
Figure 2.12 Illustration of the aliasing caused frequency change, from signal sin (90πt) to
signal − sin(10πt).
Figure 2.13 Frequency and impulse response of the discrete-time Hilbert transformer.
Figure 2.14 Problem 2.19: illustration of the Fourier transform periodic extension.
x(t) = e^{j4Ω1 t} ∑_{n=−∞}^{∞} x(n∆t) sin(π(t − n∆t)/∆t)/(π(t − n∆t)/∆t), with ∆t = π/Ω1.
Solution 2.20. For a signal whose Fourier transform is zero for frequencies Ω ≥ Ωm = 2πf_m = π/∆t, the relation X(Ω) = X(Ω)H(Ω) holds, where
H(Ω) = { 1, for |Ω| < π/∆t; 0, for |Ω| ≥ π/∆t }.
The impulse response of H(Ω) is
h(t) = (1/2π) ∫_{−π/∆t}^{π/∆t} e^{jΩt} dΩ = sin(πt/∆t)/(πt).
In the time domain,
x(t) = ∫_{−∞}^{∞} x(τ)h(t − τ) dτ = ∫_{−∞}^{∞} x(τ) sin(π(t − τ)/∆t)/(π(t − τ)) dτ.
Relation (1.60),
(2π/∆t) ∑_{k=−∞}^{∞} δ(Ω − 2πk/∆t) = FT{ ∑_{n=−∞}^{∞} δ(t + n∆t) } = FT{ ∑_{n=−∞}^{∞} δ(t − n∆t) },
is used.
x(t) = x_p(t) ∗_t h(t) = ∫_{−∞}^{∞} x(τ) ∑_{n=−∞}^{∞} δ(τ − n∆t)h(t − τ)∆t dτ
= ∑_{n=−∞}^{∞} x(n∆t)h(t − n∆t)∆t = ∑_{n=−∞}^{∞} x(n∆t) sin(π(t − n∆t)/∆t)/(π(t − n∆t)/∆t), (2.52)
with a reduction of the sampling interval to ∆t = π/(Ωm + ∆Ωm/2) with respect to ∆t = π/Ωm.
Solution 2.21. The Fourier transforms of discrete-time signals, in continu-
ous frequency notation, are periodically extended versions of X (Ω) with the
period 2π/∆t,
X1(Ω) = ∑_{n=−∞}^{∞} X(Ω + 2πn/∆t),
X2(Ω) = ∑_{n=−∞}^{∞} X(Ω + 2πn/∆t)e^{j(Ω+2πn/∆t)a}.
Figure 2.15 Smoothed filter in the sampling theorem illustration (first two graphs) versus
original sampling theorem relation within filtering framework.
Similarly, for negative frequencies within the basic period, −Ωm < Ω < 0, it follows that
X(Ω) = (X1(Ω)e^{j2πa/∆t} − X2(Ω)e^{−jΩa}) / (e^{j2πa/∆t} − 1), for a ≠ k∆t.
with
Ω0 = (1/∆t) arccos( (x(t0 + ∆t) + x(t0 − ∆t)) / (2x(t0)) ).
The condition for a unique solution is 0 ≤ Ω0∆t ≤ π, limiting the approach to small values of ∆t.
In addition, here we will discuss the discrete complex-valued signal. For a complex sinusoid x(n) = A exp(j2πk0 n/N + jφ0), with two available samples x(n1) = A exp(jφ(n1)) and x(n2) = A exp(jφ(n2)), from
x(n1)/x(n2) = exp(j2πk0(n1 − n2)/N)
follows
2πk0(n1 − n2)/N = φ(n1) − φ(n2) + 2kπ,
where k is an arbitrary integer. Then
k0 = (φ(n1) − φ(n2))N/(2π(n1 − n2)) + kN/(n1 − n2). (2.53)
k0 = 5 + 16k/4,
|x_e(n)|² + |x_o(n)|² = (|x(n)|² + |x(−n)|²)/2 = A_s²(n).
Obviously |x_e(n)|² ≤ A_s²(n) and |x_o(n)|² ≤ A_s²(n). Replacing |x_o(n)| = √(A_s²(n) − |x_e(n)|²) into |x_e(n)| + |x_o(n)| we get
|x_e(n)| + |x_o(n)| = |x_e(n)| + √(A_s²(n) − |x_e(n)|²).
2.6 EXERCISE
Exercise 2.1. Calculate the convolution of signals x (n) = n[u(n) − u(n − 3)]
and h(n) = δ(n + 1) + 2δ(n) − δ(n − 2).
Exercise 2.2. Find the convolution of signals x (n) = e−|n| and h(n) = u(3 −
n ) u (3 + n ).
Exercise 2.3. The output of a linear time-invariant discrete system to the input signal x(n) = u(n) is y(n) = (1/3^n + n)u(n). Find the impulse response h(n). Is the system stable?
Exercise 2.4. For the signal x(n) = nu(5 − n)u(n + 5) find the values of X(e^{j0}), X(e^{jπ}), ∫_{−π}^{π} X(e^{jω})dω, and ∫_{−π}^{π} |X(e^{jω})|²dω without the Fourier transform calculation. Check the results by calculating the Fourier transform.
Exercise 2.5. For a signal x (n) at an instant m a signal y(n) = x (m −
n) x ∗ (m + n) is formed. Show that the Fourier transform of y(n) is real-
valued. What is the Fourier transform of y(n) if x (n) = A exp( jan2 /4 +
j2ω0 n)? Find the Fourier transform of z(m) = x (m − n) x ∗ (m + n) for a given
n.
Note: The Fourier transform of y(n) is the Wigner distribution of x (n)
for a given m, while the Fourier transform of z(m) is the Ambiguity function
of x (n) for a given n.
Exercise 2.6. For a signal x (n) with Fourier transform X (e jω ) find the
Fourier transform of x (2n). Find the Fourier transform of y1 (2n) = x (2n)
and y1 (2n + 1) = 0. What is the Fourier transform of x (2n + 1) and what is
the Fourier transform of y2 (2n) = 0 and y2 (2n + 1) = x (2n + 1). Check the
result by showing that Y1 (e jω ) + Y2 (e jω ) = X (e jω ).
Exercise 2.7. For a real-valued signal find the relation between the Fourier
transform of signal X (e jω ) and the Hartley transform
H(e^{jω}) = ∑_{n=−∞}^{∞} x(n)[cos(ωn) + sin(ωn)].
Write this relation if the signal is real-valued and even, x (n) = x (−n).
Exercise 2.8. Systems with impulse responses h1 (n), h2 (n) and h3 (n) are
connected in cascade. If the impulse responses h2 (n) = h3 (n) = u(n) −
u(n − 2) and the resulting impulse response is h(n) = δ(n) + 5δ(n − 1) +
10δ(n − 2) + 11δ(n − 3) + 8δ(n − 4) + 4δ(n − 5) + δ(n − 6). Find the impulse
response h1 (n).
for k = 0, 1, 2, ..., N − 1.
In order to establish the relation between the DFT with the Fourier
transform of discrete-time signals, consider a discrete-time signal x (n) of
limited duration. Assume that nonzero samples of x (n) are within 0 ≤ n ≤
N0 − 1. Its Fourier transform is
X(e^{jω}) = ∑_{n=0}^{N0−1} x(n)e^{−jωn}.
The DFT values can be considered as the frequency domain samples of the
Fourier transform of discrete-time signals, taken at ∆ω = 2π/N. There are
N frequency samples within the period −π ≤ ω < π,
X(k) = X(e^{j2πk/N}) = X(e^{jω})|_{ω=k∆ω=2πk/N}. (3.2)
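Relation (3.2) can be checked directly; the random test signal with N0 = N below is an assumed example:

```python
import numpy as np

# The DFT values X(k) are samples of X(e^{jw}) at w = 2*pi*k/N, eq. (3.2),
# for an assumed random test signal with N0 = N.
rng = np.random.default_rng(1)
x = rng.standard_normal(8)
N = len(x)
n = np.arange(N)

X_dft = np.fft.fft(x)                    # X(k), k = 0, ..., N-1
for k in range(N):
    X_dtft = np.sum(x * np.exp(-1j * 2 * np.pi * k / N * n))
    assert np.isclose(X_dft[k], X_dtft)
```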
108 Discrete Fourier Transform
X_k = (1/T) ∫_0^T x_p(t)e^{−j2πkt/T} dt.
Assuming that the sampling theorem is satisfied, the integral can be re-
placed by a sum (in the sense of Example 2.13)
X_k = (1/T) ∑_{n=0}^{N−1} x(n∆t)e^{−j2πkn∆t/T} ∆t
with x p (t) = x (t) within 0 ≤ t < T. Using T/∆t = N, x (n∆t)∆t = x (n) and
X (k ) = TXk this sum can be written as
X(k) = ∑_{n=0}^{N−1} x(n)e^{−j2πkn/N}. (3.3)
Therefore, the relation between the DFT and the Fourier series coeffi-
cients is
X (k ) = TXk . (3.4)
Sampling the Fourier transform of a discrete-time signal corresponds to
the periodical extension of the original discrete-time signal in time by the
period N. The period N in time is equal to the number of samples of the
Fourier transform within one period in frequency. We can conclude that this
periodic extension in time (discretization in frequency) will not influence
the possibility to recover the original signal if the original discrete-time
signal duration was not longer than N (the number of samples in the Fourier
transform of discrete-time signal).
The inverse DFT is obtained by multiplying both sides of the DFT
definition (3.1) by e j2πkm/N and summing over k
∑_{k=0}^{N−1} X(k)e^{j2πmk/N} = ∑_{n=0}^{N−1} x(n) ∑_{k=0}^{N−1} e^{j2πk(m−n)/N},
with
∑_{k=0}^{N−1} e^{j2πk(m−n)/N} = (1 − e^{j2π(m−n)})/(1 − e^{j2π(m−n)/N}) = Nδ(m − n),
for 0 ≤ m, n ≤ N − 1. The inverse discrete Fourier transform (IDFT) of signal
x (n) is
x(n) = (1/N) ∑_{k=0}^{N−1} X(k)e^{j2πnk/N}, (3.5)
for 0 ≤ n ≤ N − 1.
The signal calculated by using the IDFT is, by definition, periodic with
the period N since
x(n + N) = (1/N) ∑_{k=0}^{N−1} X(k)e^{j2π(n+N)k/N} = x(n).
Therefore the DFT of a signal x (n) calculated using the signal samples
within 0 ≤ n ≤ N − 1 assumes that the signal x (n) is periodically extended
with period N as
IDFT{DFT{x(n)}} = ∑_{m=−∞}^{∞} x(n + mN),
with ∑_{m=−∞}^{∞} x(n + mN) = x(n) for 0 ≤ n ≤ N − 1.
The values of this periodic extension within the basic period are equal to x(n). This is a circular extension of the signal x(n). The notation x(n mod N) is also used for this kind of signal extension, assuming that the initial DFT was calculated for the signal samples x(n) within 0 ≤ n ≤ N − 1.
In literature it is quite common to use the same notation for both x (n)
and IDFT{DFT{ x (n)}} having in mind that any DFT calculation with N
signal samples implicitly assumes a periodic extension of the original signal
x (n) with period N. Thus, we will use this kind of notation, except in the
cases when we want to emphasize a difference in the results when the
inherent periodicity in the signal (when the DFT is used) is not properly
taken into account.
Example 3.1. For the signals x (n) = 2 cos(2πn/8) for 0 ≤ n ≤ 7 and x (n) =
2 cos(2πn/16) for 0 ≤ n ≤ 7 plot the periodic signals IDFT {DFT{ x (n)}} with
N = 8 without calculating the DFTs.
IDFT{DFT{x(n)}} = ∑_{m=−∞}^{∞} x(n + 8m).
Example 3.3. For a signal x (n) whose values are x (0) = 1, x (1) = 1/2, x (2) = −1,
and x (3) = 1/2 find the DFT with N = 4. What is the IDFT for n = −2?
Figure 3.2 Signals x (n) = 2 cos(2πn/8) for 0 ≤ n ≤ 7 (left) and x (n) = 2 cos(2πn/16) for
0 ≤ n ≤ 7 (right) along with their periodic extensions IDFT {DFT{ x (n)}} with N = 8.
The IDFT is
x(n) = (1/4) ∑_{k=0}^{3} [1 + cos(2πk/4) + (−1)^{k+1}]e^{j2πnk/4},
for 0 ≤ n ≤ 3. The DFT and IDFT inherently assume the signal and its Fourier transform periodicity. Thus the result for n = −2 is
x(−2) = (1/4) ∑_{k=0}^{3} X(k)e^{j2π(−2)k/4} = (1/4) ∑_{k=0}^{3} X(k)e^{j2π(4−2)k/4} = x(4 − 2) = x(2) = −1.
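Both the DFT values and the circular behavior of the inversion in Example 3.3 can be confirmed numerically:

```python
import numpy as np

# Example 3.3 numerically: DFT of x = (1, 1/2, -1, 1/2) with N = 4,
# and the circular (periodic) nature of the IDFT at n = -2.
x = np.array([1.0, 0.5, -1.0, 0.5])
X = np.fft.fft(x)
k = np.arange(4)
assert np.allclose(X, 1 + np.cos(2 * np.pi * k / 4) + (-1.0) ** (k + 1))

idft_m2 = np.sum(X * np.exp(1j * 2 * np.pi * (-2) * k / 4)) / 4
assert np.isclose(idft_m2, x[2])         # x(-2) = x(4 - 2) = -1
```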
Example 3.4. Assume that there is a routine to calculate the DFT of x (n) for
0 ≤ n ≤ N − 1 as X (k ) = DFT{ x (n)} = R{ x (n)}. How to use it to calculate the
DFT of a signal x (n) whose values are given within − N/2 ≤ n ≤ N/2 − 1?
⋆A periodic extension of the signal x (n) is assumed when the DFT
is calculated. It means that in the DFT calculation the signal x (n), defined
within − N/2 ≤ n ≤ N/2 − 1, will be extended with the period N. Here, we
112 Discrete Fourier Transform
Here, we have used the property that for a signal y(n), periodic with a period N, ∑_{n=0}^{N−1} y(n) = ∑_{n=M}^{M+N−1} y(n) holds for any M. (Generalize the result for the DFT calculation and inversion for a signal x(n) defined within M ≤ n ≤ M + N − 1, using the given routine R{x(n)}.)
where
W_N^k = e^{−j2πk/N}
is used to simplify the notation, especially in graphical illustrations.
The number of additions to calculate a DFT is N − 1 for each X(k) in (3.1). Since there are N DFT coefficients, the total number of additions is N(N − 1). From the matrix form (3.6) we can see that multiplications are not needed for the calculation of X(0). There is no need for multiplication in the first term of each coefficient calculation as well. If we neglect the fact that some other terms in matrix (3.6) may also assume the values 1, −1, j, or −j, then the number of multiplications is (N − 1)². The order of the number of multiplications and the number of additions for the DFT calculation is N².
The inverse DFT in a matrix form is
x = W⁻¹X. (3.9)
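The matrix formulation can be sketched directly; the cosine test signal is an assumed example:

```python
import numpy as np

# DFT as a matrix-vector product X = W x with W[k, n] = e^{-j2*pi*kn/N},
# and the inverse x = W^{-1} X = (1/N) W^H X.
N = 8
n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)

x = np.cos(2 * np.pi * n / N)            # an assumed test signal
X = W @ x
assert np.allclose(X, np.fft.fft(x))
assert np.allclose(W.conj().T @ X / N, x)   # inverse DFT
```

Since W^H W = N I, the inverse matrix is simply W^H/N, which is what makes the matrix form convenient.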
Most of the DFT properties can be derived in the same way as in the Fourier
transform and the Fourier transform of discrete-time signals.
IDFT{X(k)e^{−j2πkn0/N}} = (1/N) ∑_{k=0}^{N−1} X(k)e^{−j2πkn0/N}e^{j2πkn/N}
= (1/N) ∑_{k=0}^{N−1} X(k)e^{j2πk(n−n0)/N} = x(n − n0). (3.10)
For a signal satisfying x^*(n) = x(N − n) the DFT is real-valued,
X(k) = X^*(k),
since
∑_{n=0}^{N−1} x(n)e^{−j2πnk/N} = ∑_{n=0}^{N−1} x^*(n)e^{j2πnk/N} = ∑_{n=0}^{N−1} x^*(N − n)e^{j2π(N−n)k/N}.
Similarly, for a real-valued signal x(n),
X^*(k) = X(N − k).
Calculate the convolution x (n) ∗ x (n). Extend signals with period N = 7 and
calculate the circular convolution (corresponding to the DFT based convolu-
tion calculation with N = 7, which is longer than the signal duration). Com-
pare the results. What value of N should be used for the period so that the
direct convolution corresponds to one period of the circular convolution?
⋆Signal x (n) and its reversed version x (−n), along with the
shifted signal used in the convolution calculation, are presented in Fig.3.3.
In the circular (DFT) calculation, for example, at n = 0, the con-
volution value is
x_p(n) ∗ x_p(n)|_{n=0} = ∑_{m=0}^{6} x_p(m)x_p(0 − m) = 1 + 1 + 1 = 3.
In addition to the term x (0) x (0) = 1 which exists in the aperiodic convolution,
two terms for m = 3 and m = 4 appeared due to the periodic extension of
Figure 3.3 Illustration of the discrete-time signal convolution and circular convolution for
signals whose length is 5 and the circular convolution is calculated with N = 7.
the signal. They make the circular convolution value differ from the convolution of the original aperiodic signals. The same situation occurs for n = 1 and n = 2. For n = 3, 4, and 5 the correct result for the aperiodic convolution is obtained using the circular convolution. It can be concluded that if the signals in the circular convolution were separated by at least two more zero values (if the period N were N ≥ 9) this difference would not occur, Fig. 3.4 for N = 9. Then one period of the circular convolution, 0 ≤ n ≤ N − 1, would correspond to the original aperiodic convolution.
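The effect can be reproduced with the DFT, using the length-5 signal of ones from the example:

```python
import numpy as np

# Circular convolution via the DFT versus aperiodic convolution for the
# length-5 signal of ones from the example.
x = np.ones(5)
linear = np.convolve(x, x)               # aperiodic convolution, length 9

def circular(x, N):
    return np.real(np.fft.ifft(np.fft.fft(x, N) * np.fft.fft(x, N)))

c7 = circular(x, 7)                      # N = 7: wrap-around (aliasing in time)
assert np.isclose(c7[0], 3)              # 1 + two aliased terms, as in the text
c9 = circular(x, 9)                      # N = 9 >= 5 + 5 - 1: no wrap-around
assert np.allclose(c9, linear)
```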
Figure 3.4 Illustration of the discrete-time signal circular convolution for signals whose
length is 5 and the circular convolution is calculated with N = 9.
The duration of the input signal x(n) may be much longer than the duration of the impulse response h(n). For example, an input signal may have tens of thousands of samples (M), while the duration of the impulse response of a discrete system is, for example, tens of samples (L), M ≫ L. A direct convolution would be calculated (after the first L − 1 output samples) as
y(n) = ∑_{m=n−L+1}^{n} x(m)h(n − m).
For each output sample, L multiplications would be used. For a direct DFT
application in the convolution calculation we should wait until the end of
the signal and then zero-pad both the input signal and the impulse response
To calculate the convolutions y_k(n) = x_k(n) ∗_n h(n), the signals x_k(n) and h(n) need be of duration N + L − 1 only. These convolutions can be calculated after each N ≪ M input signal samples. The duration of the output sequence y_k(n) is N + L − 1. Since y_k(n), k = 0, 1, ..., K − 1, are calculated with step N in time, they overlap, although the input signals x_k(n) are nonoverlapping. For two successive y_k(n) and y_{k+1}(n) and L ≤ N, the L − 1 samples within kN + N ≤ n < kN + N + L − 1 overlap. This is taken into account by summing the overlapped output samples in y(n) after the individual convolutions y_k(n) = x_k(n) ∗_n h(n) are calculated using the DFTs, Fig. 3.5.
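This block-wise procedure (overlap-add) can be sketched as follows; the signal lengths and block size are assumptions of the sketch:

```python
import numpy as np

# Overlap-add sketch: the long input is processed in nonoverlapping blocks
# of N samples, each block is convolved with h(n) using DFTs of length
# N + L - 1, and the overlapping output samples are summed.
rng = np.random.default_rng(2)
x = rng.standard_normal(2048)            # long input signal (M samples)
h = rng.standard_normal(32)              # short impulse response (L samples)
N, L = 256, len(h)
nfft = N + L - 1

y = np.zeros(len(x) + L - 1)
for k in range(0, len(x), N):
    xk = x[k:k + N]                      # block of N input samples
    yk = np.fft.ifft(np.fft.fft(xk, nfft) * np.fft.fft(h, nfft)).real
    y[k:k + nfft] += yk                  # sum the overlapped output samples

assert np.allclose(y, np.convolve(x, h))
```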
ω = (2π/N)k or Ω = (2π/(N∆t))k, for 0 ≤ k ≤ N/2 − 1, (3.15)
and the other part being a shifted version of the negative frequencies (in the original aperiodic signal)
ω = (2π/N)(k − N) or Ω = (2π/(N∆t))(k − N), for N/2 ≤ k ≤ N − 1. (3.16)
Illustration of the frequency correspondence to the frequency index in the DFT is given in Fig. 3.6.
Figure 3.5 Illustration of the convolution calculation when the input signal duration is much longer than the duration of the system impulse response.
Figure 3.6 Relation between the frequency in continuous-time and the DFT frequency index.
This relation holds when the sampling theorem is satisfied; then we see that by increasing N in the DFT calculation, the density of sampling (interpolation) in the Fourier transform of the original signal increases. The DFT interpolation by zero padding the signal in the time domain is illustrated in Fig. 3.7.
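A small numerical check of this property (our own sketch, using a direct DFT): the DFT of the zero-padded signal contains the original DFT values at every second index, with new, interpolated spectrum samples in between.

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [1.0, 2.0, -1.0, 0.5]            # an arbitrary 4-sample signal
X = dft(x)                            # 4 samples of the spectrum
X2 = dft(x + [0.0] * len(x))          # zero-padded to 8 samples: twice denser grid

# every second sample of the denser grid coincides with the original DFT
matches = [abs(X2[2 * k] - X[k]) < 1e-9 for k in range(len(x))]
```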
The same holds for the frequency domain. If we calculate the DFT with N samples and then add, for example, N zeros after the region corresponding to the highest frequencies, then the IDFT of this 2N-point DFT will interpolate the original signal in time. All zero values in the frequency domain should be inserted between the two parts (regions) of the original DFT corresponding to the positive and negative frequencies.
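The frequency-domain zero-padding can be sketched as follows (our own illustration; it assumes an even N and a signal whose bin at k = N/2 is negligible, so the split between positive and negative frequencies is unambiguous):

```python
import cmath, math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]

def interpolate2(x):
    """Twofold time interpolation: insert N zeros between the positive- and
    negative-frequency halves of the DFT, then take a 2N-point IDFT."""
    X = dft(x)
    N = len(x)
    Z = X[:N // 2] + [0j] * N + X[N // 2:]     # 2N-point spectrum
    return [2 * v.real for v in idft(Z)]        # factor 2 preserves the amplitude

x = [math.cos(2 * math.pi * 2 * n / 16) for n in range(16)]
y = interpolate2(x)   # y(2n) reproduces x(n); odd samples are interpolated
```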
Figure 3.7 Discrete-time signal and its DFT (top two subplots). Discrete-time signal zero-padded and its DFT interpolated (two subplots in the middle). The zero-padding (interpolation) factor was 2. Discrete-time signal zero-padded and its DFT interpolated (two bottom subplots). The zero-padding (interpolation) factor was 4. According to the duality property, the same holds if X(k) were a signal in discrete time and x(−n) its Fourier transform.
Example 3.6. The Hann(ing) window for a signal within −N/2 ≤ n ≤ N/2 − 1 is

w(n) = (1/2)[1 + cos(2πn/N)], for −N/2 ≤ n ≤ N/2 − 1.   (3.18)

If the original signal values are within 0 ≤ n ≤ N − 1, then the Hann(ing) window form is

w(n) = (1/2)[1 − cos(2πn/N)], for 0 ≤ n ≤ N − 1.   (3.19)

Present the zero-padded forms of the Hann(ing) windows with 2N samples.
⋆The zero-padded forms of the Hann(ing) windows used for windowing data within the intervals −N/2 ≤ n ≤ N/2 − 1 and 0 ≤ n ≤ N − 1 are shown in Fig. 3.8. The DFTs of the windows (3.18) and (3.19) are W(k) = N[δ(k) + δ(k − 1)/2 + δ(k + 1)/2]/2 and W(k) = N[δ(k) − δ(k − 1)/2 − δ(k + 1)/2]/2, respectively. After the presented zero-padding, the realness property of the window DFT, wpz(n) = wpz(2N − n), is preserved (for an even N in the case −N/2 ≤ n ≤ N/2 − 1 and for an odd N for data within 0 ≤ n ≤ N − 1).
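The stated DFT of the window (3.19) is easy to check numerically (our own sketch, with a direct DFT):

```python
import cmath, math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

N = 8
w = [0.5 * (1 - math.cos(2 * math.pi * n / N)) for n in range(N)]  # window (3.19)
W = dft(w)
# Expected from the text: W(0) = N/2, W(1) = W(N-1) = -N/4, all other bins zero
```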
"∞ "∞
x (t) = 1
2π X (Ω)e jΩt dΩ, X (Ω) = x (t)e− jΩt dt.
−∞ −∞
∞
x p (t) = ∑ x (t + mT )
m=−∞
∞ T/2
"
x p (t) = ∑ Xn e j2πnt/T
, Xn = 1
T x (t)e− j2πnt/T dt,
n=−∞
− T/2
1
Xn = X (Ω)|Ω=2πn/T .
T
Figure 3.8 Zero-padding of the Hann(ing) windows used to window data within − N/2 ≤
n ≤ N/2 − 1 and 0 ≤ n ≤ N − 1.
X(e^{jω}) = ∑_{m=−∞}^{∞} X(Ω + 2πm/∆t)|_{Ω=ω/∆t}.
The Fourier transform of the discrete-time signal is a periodic extension X(e^{jω}), ω = Ω∆t, of the Fourier transform X(Ω) of a continuous-time signal. There is no overlapping (aliasing) if the width of the
Fourier transform of the original continuous-time signal is shorter
than the extension period 2π/∆t.
4. Discrete-time periodic signal (discrete Fourier transform)

x_p(n) = ∑_{m=−∞}^{∞} x(n + mN) = x_p(t)|_{t=n∆t},

x_p(n) = (1/N) ∑_{k=0}^{N−1} X(k)e^{j2πnk/N},  X(k) = ∑_{n=0}^{N−1} x(n)e^{-j2πnk/N},
Figure 3.9 Aperiodic continuous-time signal and its Fourier transform (first row). Discrete-
time signal and its Fourier transform (second row). Periodic continuous-time signal and its
Fourier series coefficients (third row). Periodic discrete-time signal and its discrete Fourier
transform (DFT), (fourth row).
differences between the DFT and inverse DFT calculation are in the sign of
the exponent and the division of the final result by N.
Here we will present an algorithm based on splitting the signal x (n),
with N samples, into two signals x (n) for 0 ≤ n ≤ N/2 − 1 and x (n) for
N/2 ≤ n ≤ N − 1, whose duration is N/2. It is assumed that N is an even
number. By definition, a DFT of a signal with N samples is
DFT_N{x(n)} = X(k) = ∑_{n=0}^{N−1} x(n)e^{-j2πnk/N}

= ∑_{n=0}^{N/2−1} x(n)e^{-j2πnk/N} + ∑_{n=N/2}^{N−1} x(n)e^{-j2πnk/N}

= ∑_{n=0}^{N/2−1} [x(n) + x(n + N/2)(−1)^k] e^{-j2πnk/N}
For an even number k = 2r, it follows that

DFT_{N/2}{g(n)} = X(2r) = ∑_{n=0}^{N/2−1} g(n)e^{-j2πnr/(N/2)}

with

g(n) = x(n) + x(n + N/2).

For an odd number k = 2r + 1, it follows that

DFT_{N/2}{h(n)} = X(2r + 1) = ∑_{n=0}^{N/2−1} h(n)e^{-j2πnr/(N/2)}

where

h(n) = (x(n) − x(n + N/2))e^{-j2πn/N}.
In this way, we split one DFT of N elements into two DFTs of N/2 elements. Having in mind that the direct calculation of a DFT with N elements requires an order of N^2 operations, this reduces the calculation complexity, since N^2 > (N/2)^2 + (N/2)^2. An illustration of this calculation, with N = 8, is shown in Fig. 3.10. We can continue and split the N/2-point DFTs into N/4-point DFTs, and so on. A complete calculation scheme is shown in Fig. 3.11. We can conclude that in the FFT algorithms an order of N log2 N operations is required. Here it is assumed that log2 N = p is an integer, i.e., N = 2^p. This is a decimation-in-frequency algorithm.
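One stage of this decimation-in-frequency split can be verified numerically. The sketch below (our own helper names) computes the even- and odd-indexed DFT outputs from the two N/2-point DFTs and compares with the direct definition:

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def dif_stage(x):
    """One decimation-in-frequency stage: an N-point DFT as two N/2-point DFTs."""
    N = len(x)
    g = [x[n] + x[n + N // 2] for n in range(N // 2)]             # even-index outputs
    h = [(x[n] - x[n + N // 2]) * cmath.exp(-2j * cmath.pi * n / N)
         for n in range(N // 2)]                                   # odd-index outputs
    X = [0j] * N
    X[0::2], X[1::2] = dft(g), dft(h)                              # X(2r), X(2r+1)
    return X

x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]
```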
[Figure 3.10: flow graph splitting the N = 8 DFT into two N = 4 DFTs, with twiddle factors W_8^0, W_8^1, W_8^2, W_8^3.]

[Figure 3.11: complete decimation-in-frequency FFT flow graph for N = 8; the outputs appear in the order X(0), X(4), X(2), X(6), X(1), X(5), X(3), X(7).]
The total number of additions is N_additions = N log2 N.
For the number of multiplications, we can see that in the first stage there are (N/2 − 1) multiplications. In the second stage there are 2(N/4 − 1) multiplications. In the next stage there would be 4(N/8 − 1) multiplications. Finally, in the last stage there would be 2^{p−1}(N/2^p − 1) = (N/2)(N/N − 1) = 0 multiplications (N = 2^p or p = log2 N). The total number of multiplications in this algorithm is

N_multiplicat. = (N/2 − 1) + 2(N/4 − 1) + 4(N/8 − 1) + ... + 2^{p−1}(N/2^p − 1)
= (N/2 − 1) + (N/2 − 2) + (N/2 − 4) + ... + (N/2 − 2^{p−1})
= pN/2 − (1 + 2 + 2^2 + ... + 2^{p−1}) = pN/2 − (1 − 2^p)/(1 − 2)
= (N/2) log2 N − (N − 1) = (N/2)[log2 N − 2] + 1.
If the multiplications by j and −j were excluded, the number of multiplications would be additionally reduced.
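The count above can be cross-checked by recursion: each decimation-in-frequency stage of size N uses N/2 − 1 nontrivial twiddle multiplications (the n = 0 factor is 1), so count(N) = N/2 − 1 + 2·count(N/2). A sketch under this counting convention (our own function names):

```python
import math

def dif_mult_count(N):
    """Nontrivial twiddle multiplications of the radix-2 DIF FFT (W^0 not counted)."""
    if N <= 2:
        return 0
    return N // 2 - 1 + 2 * dif_mult_count(N // 2)

def formula(N):
    """Closed form from the text: (N/2)[log2 N - 2] + 1."""
    p = int(math.log2(N))
    return N // 2 * (p - 2) + 1

counts = {N: (dif_mult_count(N), formula(N)) for N in (4, 8, 16, 64, 1024)}
```

For example, for N = 8 both give 5 multiplications.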
Example 3.7. Consider a signal x (n) within 0 ≤ n ≤ N − 1. Assume that N is an
even number. Show that the DFT of x (n) can be calculated as two DFTs, one
using the even samples of x (n) and the other using odd samples of x (n).
⋆By definition

X(k) = ∑_{n=0}^{N−1} x(n)e^{-j2πkn/N}

= ∑_{m=0}^{N/2−1} x(2m)e^{-j2πk2m/N} + ∑_{m=0}^{N/2−1} x(2m + 1)e^{-j2πk(2m+1)/N}

= ∑_{m=0}^{N/2−1} x_e(m)e^{-j2πkm/(N/2)} + e^{-j2πk/N} ∑_{m=0}^{N/2−1} x_o(m)e^{-j2πkm/(N/2)},   (3.20)
where xe (m) = x (2m) and xo (m) = x (2m + 1) are even and odd samples of the
signal, respectively. Thus, a DFT of N elements is split into two DFTs of N/2
elements. Two DFTs of N/2 elements require an order of 2 ( N/2)2 = N 2 /2
operations. It is less than N 2 . In this way, if N/2 is an even number, we can
continue and split two DFTs of N/2 elements into four DFTs of N/4 elements,
and so on. This is a decimation-in-time algorithm, Fig.3.12.
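The split (3.20) can be checked in the same spirit (our own sketch; it exploits the N/2-periodicity of the half-length DFTs):

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def dit_stage(x):
    """N-point DFT from the DFTs of the even and odd samples, eq. (3.20)."""
    N = len(x)
    Xe, Xo = dft(x[0::2]), dft(x[1::2])
    return [Xe[k % (N // 2)] + cmath.exp(-2j * cmath.pi * k / N) * Xo[k % (N // 2)]
            for k in range(N)]

x = [0.5, -1.0, 2.0, 1.0, 0.0, 3.0, -2.0, 1.5]
```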
[Figure 3.12: decimation-in-time FFT flow graph for N = 8; the inputs appear in the order x(0), x(4), x(2), x(6), x(1), x(5), x(3), x(7).]
X(3k) = ∑_{m=0}^{M−1} g(m)e^{-j2πmk/M}

with g(m) = x(m) + x(m + M) + x(m + 2M),

X(3k + 1) = ∑_{m=0}^{M−1} r(m)e^{-j2πmk/M}

with r(m) = [x(m) + a x(m + M) + a^2 x(m + 2M)]e^{-j2πm/(3M)},

X(3k + 2) = ∑_{m=0}^{M−1} p(m)e^{-j2πmk/M}

with p(m) = [x(m) + a^2 x(m + M) + a x(m + 2M)]e^{-j2π2m/(3M)}, where a = e^{-j2π/3}. Thus, a DFT of N = 3M elements is split into three DFTs of N/3 = M elements. Three DFTs of N/3 elements require an order of 3(N/3)^2 = N^2/3 operations. If, for example, M = N/3 is an even number, we can continue and split the three DFTs of N/3 elements into six DFTs of N/6 elements, and so on.
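The radix-3 split can be verified numerically for N = 9 (our own sketch; the subsignals use x(m + 2M), with a = e^{−j2π/3}):

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def radix3_stage(x):
    """Split an N = 3M point DFT into three M-point DFTs."""
    N = len(x)
    M = N // 3
    a = cmath.exp(-2j * cmath.pi / 3)
    g = [x[m] + x[m + M] + x[m + 2 * M] for m in range(M)]
    r = [(x[m] + a * x[m + M] + a * a * x[m + 2 * M])
         * cmath.exp(-2j * cmath.pi * m / N) for m in range(M)]
    p = [(x[m] + a * a * x[m + M] + a * x[m + 2 * M])
         * cmath.exp(-4j * cmath.pi * m / N) for m in range(M)]
    X = [0j] * N
    X[0::3], X[1::3], X[2::3] = dft(g), dft(r), dft(p)   # X(3k), X(3k+1), X(3k+2)
    return X

x = [float(v) for v in (1, -2, 3, 0, 5, 1, -1, 2, 4)]
```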
A periodic signal x(t), with a period T, can be reconstructed if its Fourier series has a limited number of nonzero coefficients, so that X_k = 0 for |k| > k_m, corresponding to frequencies greater than Ω_m = 2πk_m/T. The periodic signal can be reconstructed from the samples taken at ∆t < π/Ω_m = 1/(2f_m). The number of samples within the period is N = T/∆t.

The reconstructed signal is

x(t) = ∑_{n=0}^{N−1} x(n∆t) sin[(n − t/∆t)π] / (N sin[(n − t/∆t)π/N])

for an odd N, and

x(t) = ∑_{n=0}^{N−1} x(n∆t) e^{j(n−t/∆t)π/N} sin[(n − t/∆t)π] / (N sin[(n − t/∆t)π/N])

for an even N.
Example 3.9. Samples of a signal x(t) are taken with step ∆t = 1. The obtained discrete-time values are x(n) = [0, 2.8284, −2, 2.8284, 0, −2.8284, 2, −2.8284] for 0 ≤ n ≤ N − 1 with N = 8. Assuming that the signal satisfies the sampling theorem, find its value at t = 1.5. Check the accuracy if the original signal values were known, x(t) = 3 sin(3πt/4) + sin(πt/4).
⋆Using the reconstruction formula for an even N we get

x(1.5) = ∑_{n=0}^{7} x(n) e^{j(n−1.5)π/8} sin[(n − 1.5)π] / (8 sin[(n − 1.5)π/8]) = −0.2242.
n =0
This result is equal to the original signal value. The calculation is repeated for 0 ≤ t ≤ 8, with step 0.01. The reconstructed values of x(t) are presented in Fig. 3.13.
x(t) = ∑_{k=−k_m}^{k_m} X_k e^{j2πkt/T}.   (3.21)

x(n∆t) = ∑_{k=−k_m}^{k_m} X_k e^{j2πkn/N}.

x(n∆t)∆t = (T/N) ∑_{k=−k_m}^{k_m} X_k e^{j2πkn/N} = (T/N) ∑_{k=−(N−1)/2}^{(N−1)/2} X_k e^{j2πkn/N}.
With x(n∆t)∆t = x(n) and TX_k = X(k) this form reduces to the DFT and the inverse DFT

x(n) = (1/N) ∑_{k=−(N−1)/2}^{(N−1)/2} X(k)e^{j2πkn/N},  X(k) = ∑_{n=0}^{N−1} x(n)e^{-j2πkn/N}.
x(t) = (1/T) ∑_{k=−(N−1)/2}^{(N−1)/2} ∑_{n=0}^{N−1} x(n)e^{-j2πkn/N} e^{j2πkt/T} ∆t = (1/N) ∑_{n=0}^{N−1} ∑_{k=−(N−1)/2}^{(N−1)/2} x(n∆t)e^{j2πk(t/T − n/N)}

= (1/N) ∑_{n=0}^{N−1} x(n∆t) e^{-j2π(t/T − n/N)(N−1)/2} (1 − e^{j2π(t/T − n/N)N}) / (1 − e^{j2π(t/T − n/N)})

= ∑_{n=0}^{N−1} x(n∆t) sin[π(t − n∆t)/∆t] / (N sin[π(t − n∆t)/(N∆t)]).
This is the reconstruction formula that can be used to calculate x (t) for any
t based on the signal values at x (n∆t) with ∆t < π/Ωm = 1/(2 f m ).
In a similar way the reconstruction formula for an even number of
samples N can be obtained.
The sampling theorem reconstruction formula of aperiodic signals
follows as a special case as N → ∞, since for a small argument
sin[π(t − n∆t)/(N∆t)] → π(t − n∆t)/(N∆t)

and

x(t) → ∑_{n=−∞}^{∞} x(n∆t) sin[π(t − n∆t)/∆t] / (π(t − n∆t)/∆t).
Example 3.10. For a signal x(t) whose period is T it is known that the signal has components corresponding to the nonzero Fourier series coefficients at k_1, k_2, ..., k_K. What is the minimal number of signal samples needed to reconstruct the signal? What condition should the sampling instants and the frequencies satisfy for the reconstruction?
⋆The signal x (t) can be reconstructed by using the Fourier series
(1.11). In calculations, a finite number of K nonzero terms will be used,
x(t) = ∑_{m=1}^{K} X_{k_m} e^{j2πk_m t/T}.
Since there are K unknown values X_{k_1}, X_{k_2}, ..., X_{k_K}, the minimal number of equations needed to calculate them is K. The equations are written for K time instants,

∑_{m=1}^{K} X_{k_m} e^{j2πk_m t_i/T} = x(t_i), for i = 1, 2, ..., K,

or

ΦX = y,  X = Φ^{−1}y,

where

Φ = [ e^{j2πk_1 t_1/T}  e^{j2πk_2 t_1/T}  ...  e^{j2πk_K t_1/T}
      e^{j2πk_1 t_2/T}  e^{j2πk_2 t_2/T}  ...  e^{j2πk_K t_2/T}
      ...               ...               ...  ...
      e^{j2πk_1 t_K/T}  e^{j2πk_2 t_K/T}  ...  e^{j2πk_K t_K/T} ].

The reconstruction condition is det(Φ) ≠ 0 for the selected time instants t_i and given frequency indices k_i.
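The system ΦX = y can be solved numerically. The sketch below (all names and the K = 3 test values are our own illustration) recovers the hidden Fourier coefficients with a small complex Gaussian elimination:

```python
import cmath

def solve(Phi, y):
    """Solve Phi X = y by Gauss-Jordan elimination with partial pivoting."""
    K = len(y)
    A = [row[:] + [y[i]] for i, row in enumerate(Phi)]
    for c in range(K):
        piv = max(range(c, K), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(K):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[r][K] / A[r][r] for r in range(K)]

T = 1.0
ks = [1, 3, 5]                        # known nonzero Fourier-series indices
Xtrue = [2.0, -1.0, 0.5j]             # hidden coefficients (for the test)

def x(t):
    return sum(X * cmath.exp(2j * cmath.pi * k * t / T) for X, k in zip(Xtrue, ks))

ts = [0.05, 0.31, 0.72]               # K sampling instants with det(Phi) != 0
Phi = [[cmath.exp(2j * cmath.pi * k * t / T) for k in ks] for t in ts]
Xest = solve(Phi, [x(t) for t in ts])
```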
Assume that the signal x(t) is sampled with ∆t. The discrete-time form of this signal is

x(n) = Ae^{jω0n}∆t,

with ω0 = Ω0∆t.
In order to compute the DFT of this signal, we will assume a value of N and calculate

X(k) = ∑_{n=0}^{N−1} Ae^{jω0n} e^{-j2πnk/N} ∆t.

X(k) = A ∑_{n=0}^{N−1} e^{jω0n}e^{-j2πnk/N} ∆t = A (1 − e^{jω0N}e^{-j2πk}) / (1 − e^{jω0}e^{-j2πk/N}) ∆t   (3.24)

= Ae^{j(N−1)(ω0−2πk/N)/2} sin(N(ω0 − 2πk/N)/2) / sin((ω0 − 2πk/N)/2) ∆t   (3.25)

|X(k)| = |A| |sin(N(ω0 − 2πk/N)/2) / sin((ω0 − 2πk/N)/2)| ∆t.   (3.26)

For ω0 = 2πk0/N,

X(k) = A ∑_{n=0}^{N−1} e^{j2πk0n/N} e^{-j2πnk/N} ∆t = NAδ(k − k0)∆t.   (3.27)
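Equation (3.27), the no-leakage case of an on-grid frequency, can be checked numerically (our own sketch, with ∆t = 1):

```python
import cmath

N, k0, A = 16, 3, 2.0
x = [A * cmath.exp(2j * cmath.pi * k0 * n / N) for n in range(N)]
X = [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
     for k in range(N)]
# Expected: X(k0) = N*A, all other bins zero (no leakage on the frequency grid)
```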
Figure 3.14 Sinusoid x (n) = cos(8πn/64) and its DFT with N = 64 (top row) and sinusoid
x (n) = cos(8.8πn/64) and its DFT absolute value, with N = 64 (bottom row).
X(k) = ∑_{n=0}^{N−1} w(n)Ae^{jω0n} e^{-j2πnk/N} ∆t = W(e^{j(2πk/N − ω0)}) ∆t,

w(n) = (1/2)[1 − cos(2nπ/N)][u(n) − u(n − N − 1)].

X_H(k) = ∑_{n=0}^{N−1} (1/2)[1 − cos(2nπ/N)] Ae^{jω0n}e^{-j2πnk/N} ∆t

= (A/2) ∑_{n=0}^{N−1} [1 − (1/2)e^{j2nπ/N} − (1/2)e^{-j2nπ/N}] e^{jω0n}e^{-j2πnk/N} ∆t

= (1/2)[X_R(k) − (1/2)X_R(k − 1) − (1/2)X_R(k + 1)],

A = (1/(N∆t))[X(k0) + X(k0 + 1) + X(k0 − 1)].
Figure 3.15 Sinusoid x (n) = cos(8πn/64) multiplied by a Hann(ing) window and its DFT
with N = 64 (top row) and sinusoid x (n) = cos(8.8πn/64) multiplied by a Hann(ing) window
and its DFT absolute value, with N = 64 (bottom row).
3.7.2 Displacement

A relation between the maximum DFT value and a few surrounding values of the windowed DFT is used to calculate a correction, the displacement bin of the estimated frequency. If we apply a window function w(n) in the DFT calculation, we get

X(k) = W(e^{j(2πk/N − ω0)}) ∆t.

The position of the DFT maximum is detected as

k̂0 = arg max_{0≤k≤N−1} |X(k)|,
by
X0 = | X (k̂0 )|
and two neighboring samples by
∂X(k̂0 + d)/∂d = 0,

Ω0 = (2π/(N∆t))(k̂0 + d)   (3.32)

for 0 ≤ k̂0 ≤ N/2 − 1, and Ω0 = (2π/(N∆t))((k̂0 + d) − N) for N/2 ≤ k̂0 ≤ N − 1.
Example 3.11. A sinusoidal signal x (t) = A exp( jΩ0 t) is sampled with a sampling
interval ∆t = 1/128 and N0 = 64 samples are considered. Prior to the DFT
calculation, the signal is zero padded four times. The DFT maximum is
detected at k̂0 = 95. The maximum DFT value is X (95) = 0.9. Neighboring
values are X (96) = 0.7 and X (94) = 0.3. Calculate the displacement bin d and
estimate the value of Ω0 .
Figure 3.16 Illustration of the displacement bin correction for the true maximum position calculation based on three neighboring values (full range, left; zoomed graph, right).
The DFT of a signal satisfies many desirable properties. Its calculation is also simple and efficient using the FFT algorithm. In the DFT calculation, a periodic extension of the signal is assumed and embedded in the discrete transform. However, in the DFT case the periodic signal extension will, in general, introduce a significant signal change (corresponding to discontinuities in continuous time) at the period ending points, Fig. 3.17 (first and second row). This change significantly worsens the convergence of the DFT coefficients and increases the number of coefficients needed in the signal reconstruction. In order to reduce this effect and to improve the convergence of the signal transform coefficients, the signal could be extended in an appropriate way.
The discrete cosine transforms (DCT) and discrete sine transforms
(DST) are used to analyze real-valued discrete signals, periodically extended
to produce even or odd signal forms, respectively. However, this extension
is not straightforward for discrete-time signals. Consider a discrete-time
signal of duration N, when x (n) assumes nonzero values for 0 ≤ n ≤ N − 1.
If we try with a direct extension (using all signal values) and form a periodic
signal y(n), whose basic period is of duration 2N, as
y(n) = x(n) for 0 ≤ n ≤ N − 1, and y(n) = x(2N − n − 1) for N ≤ n ≤ 2N − 1,

the obtained signal is not even, Fig. 3.17 (third row). It is obvious that y(n) does not satisfy the condition y(n) = y(−n) = y(2N − n), required for a real-valued DFT.
The same holds for an odd extension, Fig. 3.17 (fourth row),

y(n) = x(n) for 0 ≤ n ≤ N − 1, and y(n) = −x(2N − n − 1) for N ≤ n ≤ 2N − 1.
Thus we have not achieved one of our goals, to have a real-valued transform after a real-valued signal periodic extension. However, from Fig. 3.17 (third and fourth row) we can see that the signals y(n) are even (or odd) with respect to the vertical line at n = −1/2. Thus, if we add zeros between the samples of y(n) and assume that the position which was at n = −1/2 in the initial signal is the new coordinate origin n = 0 in the new signal z(n), then these signals will be even and odd, respectively, Fig. 3.17 (last two rows).
This is just one of possible extensions to make the original discrete-
time signal even (or odd). Several forms of the DCT and DST are defined
based on other ways of getting an even (odd) signal extension.
The most commonly used is the so-called DCT-II, or just DCT. It will be presented here. The signal extension for this transform corresponds to the even extension z(n) described above.
Figure 3.17 Illustration of a signal x (n), its periodic extension corresponding to the DFT, an
even and odd discrete-time signal extension corresponding to the DCT and DST of type II.
There are two main advantages of this transform over the standard DFT
calculation. The DCT coefficients are real-valued for a real-valued signal.
This transform can produce a better energy concentration than the DFT. In
order to understand why a better energy concentration can be obtained we
will compare the DCT to the standard DFT
X(k) = ∑_{n=0}^{N−1} x(n)e^{-j2πnk/N}, 0 ≤ k ≤ N − 1.
Only N terms of the transform are used and the DCT values are obtained.
Since the basis functions are orthogonal, the inverse DCT is obtained by multiplying both sides of the DCT by cos(2π(2m + 1)k/(4N)) and summing over 0 ≤ k ≤ N − 1,

∑_{n=0}^{N−1} 2x(n) ∑_{k=0}^{N−1} w_k cos(2π(2n + 1)k/(4N)) cos(2π(2m + 1)k/(4N))
= ∑_{k=0}^{N−1} w_k C(k) cos(2π(2m + 1)k/(4N)),
we get

x(n) = (1/N) ∑_{k=0}^{N−1} w_k C(k) cos(2π(2n + 1)k/(4N)).   (3.34)
k =0
A symmetric relation, with the same coefficients in the time and frequency domains, is

C(k) = v_k ∑_{n=0}^{N−1} x(n) cos(2π(2n + 1)k/(4N))

x(n) = ∑_{k=0}^{N−1} v_k C(k) cos(2π(2n + 1)k/(4N)),

where v_0 = √(1/N) and v_k = √(2/N) for k ≠ 0.
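The symmetric (orthonormal) DCT pair can be verified by a round trip (our own sketch, with v_0 = √(1/N) and v_k = √(2/N) otherwise):

```python
import math

def dct(x):
    """Symmetric (orthonormal) DCT-II."""
    N = len(x)
    v = [math.sqrt((1.0 if k == 0 else 2.0) / N) for k in range(N)]
    return [v[k] * sum(x[n] * math.cos(2 * math.pi * (2 * n + 1) * k / (4 * N))
                       for n in range(N)) for k in range(N)]

def idct(C):
    """Inverse with the same coefficients v_k."""
    N = len(C)
    v = [math.sqrt((1.0 if k == 0 else 2.0) / N) for k in range(N)]
    return [sum(v[k] * C[k] * math.cos(2 * math.pi * (2 * n + 1) * k / (4 * N))
                for k in range(N)) for n in range(N)]

x = [1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 2.0, 0.0]
roundtrip = idct(dct(x))
```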
In a similar way the discrete sine transforms are defined. The most common form is the DST of type II (DST-II), whose definition is

S(k) = ∑_{n=0}^{N−1} 2x(n) sin(2π(2n + 1)(k + 1)/(4N))

with

z(2n + 1) = y(n)
z(2n) = 0.
with N terms of the transform being used. The DST is the imaginary part of
this DFT.
Example 3.12. Consider a signal
x (n) = cos(2π (2n + 1)/64) + 0.75 cos(7π (2n + 1)/64).
Calculate its DFT with N = 32. Plot the periodic extension of this signal. Plot
the even extension y(n) of x (n). Calculate the DFT (the DCT) of such a signal
and discuss the results.
⋆Signal x (n), along with its extended versions and corresponding
transforms, is presented in Fig.3.18. Better energy concentration in the DCT
is due to the introduced symmetry in y(n). The artificial discontinuity in the
DFT, which causes its slow convergence, is eliminated in the DCT.
Here we will present two discrete signal transforms that can be calculated without using multiplications. One of them will also be used to explain the basic principle of the wavelet transform calculation.

Let us consider a two-sample signal x(n), with N = 2. The corresponding two-sample DFT is

X(k) = ∑_{n=0}^{1} x(n)e^{-j2πnk/2} = x(0) + (−1)^k x(1).

It can be calculated without using multiplications: X(0) = x(0) + x(1) and X(1) = x(0) − x(1). Now we can show that it is possible to define basis functions for any signal duration in such a way that multiplications are not used in the signal transformation. These transform values will be denoted by H(k). For the two-sample signal case
Figure 3.19 Signal and its periodic extensions, corresponding to: the DFT (second row), the
cosine transform (third row), and the sine transform (fourth row). Positive frequencies for the
DFT are shown.
Example 3.14. For the signal shown in Fig. 3.20, calculate the two-sample DFT for each pair of signal samples.
Figure 3.20 Original signal x(n) and its two-sample lowpass part yL(n) and highpass part yH(n).
for 0 ≤ n ≤ N/2 − 1.

In some cases the smoothed version yL(n), with a half of the samples of the original signal, (3.20), is quite a good representative of the original signal, so there is no need to use corrections. Note that for many instants the correction is zero as well.

There are two possibilities to continue and apply the two-point DFT scheme to a signal with N samples. One of them is further splitting of both yL(n) and yH(n) into their lowpass and highpass parts. It leads to the discrete Walsh-Hadamard transform, Fig. 3.21. In the other case the splitting is done for the lowpass part only, while the highpass correction is kept as it is. It leads to the Haar wavelet transform, Fig. 3.22. These two forms will be explained in detail next.
Figure 3.21 Illustration of the procedure leading to the Walsh-Hadamard transform calculation.
Let us continue the idea of splitting both (lowpass and highpass) parts of the signal and define a transformation of a four-sample signal. For this signal, form two auxiliary two-sample signals yL(n) and yH(n) as

yL(0) = x(0) + x(1),  yL(1) = x(2) + x(3)   (3.37)
yH(0) = x(0) − x(1),  yH(1) = x(2) − x(3).   (3.38)
Figure 3.22 Illustration of the procedure leading to the Haar wavelet transform calculation.
H(0) = yL(0) + yL(1) = x(0) + x(1) + x(2) + x(3),
H(1) = yL(0) − yL(1) = x(0) + x(1) − x(2) − x(3),
H(2) = yH(0) + yH(1) = x(0) − x(1) + x(2) − x(3),
H(3) = yH(0) − yH(1) = x(0) − x(1) − x(2) + x(3).
By replacing the values of yL(n) and yH(n) with the signal values x(n), we get the transformation equation

[H(0)]   [1  1  1  1] [x(0)]        [x(0)]
[H(1)] = [1  1 -1 -1] [x(1)] = T_4 [x(1)],   (3.40)
[H(2)]   [1 -1  1 -1] [x(2)]        [x(2)]
[H(3)]   [1 -1 -1  1] [x(3)]        [x(3)]
It would correspond to the [H(0), H(4), H(2), H(6), H(1), H(5), H(3), H(7)]^T order of coefficients in the Walsh transform with dyadic ordering (3.41).

Recursive construction of a Hadamard transform matrix H_2N is easy using the Kronecker product of T_2, defined by (3.36), and H_N,

H_2N = T_2 ⊗ H_N = [H_N  H_N; H_N  −H_N].

The order [H(0), H(1), H(3), H(2), H(6), H(7), H(5), H(4)]^T in (3.41) would correspond to a Walsh transform with sequency ordering.

Calculation of the Walsh-Hadamard transform requires only additions. For an N-order transform the number of additions is (N − 1)N.
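The Kronecker recursion H_2N = T_2 ⊗ H_N can be sketched as follows (our own helper names; this produces the natural Hadamard ordering of the rows, not the dyadic or sequency ordering):

```python
def kron_step(H):
    """One Kronecker step: H_2N = [[H, H], [H, -H]]."""
    top = [row + row for row in H]
    bottom = [row + [-v for v in row] for row in H]
    return top + bottom

H2 = [[1, 1], [1, -1]]          # the two-sample transform T_2
H4 = kron_step(H2)
H8 = kron_step(H4)

def wht(x, H):
    """Walsh-Hadamard transform: a matrix-vector product with only additions."""
    return [sum(h * v for h, v in zip(row, x)) for row in H]
```

Since the Sylvester-type Hadamard matrix is symmetric and H·H = N·I, applying the transform twice returns N times the original signal.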
Consider again two pairs of signal samples, x (0), x (1) and x (2), x (3). The
high frequency parts of these pairs are calculated as y H (n) = x (2n) − x (2n +
1), for n = 0, 1. They are used in the Haar transform without any further
modification. Since they represent highpass Haar transform coefficients
they will be denoted, in this case, by W (2) = y H (0) = x (0) − x (1) and
W (3) = y H (1) = x (2) − x (3). The lowpass coefficients of these pairs are
y L (0) = x (0) + x (1) and y L (1) = x (2) + x (3). The highpass and lowpass
parts of these signals are calculated as y LH (0) = [ x (0) + x (1)] − [ x (2) + x (3)]
and y LL (0) = [ x (0) + x (1)] + [ x (2) + x (3)]. For a four-sample signal the
transformation ends here with W (1) = y LH (0) and W (0) = y LL (0). Note
that the order of coefficients is such that the lowest frequency coefficient
corresponds to the transform index k = 0. Matrix form for a four-sample
signal is
⎡ ⎤ ⎡ ⎤⎡ ⎤
W (0) 1 1 1 1 x (0)
⎢ W (1) ⎥ ⎢ 1 1 −1 −1 ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥ ⎢ x (1) ⎥ .
⎣ W (2) ⎦ = ⎣ 1 −1 0 0 ⎦ ⎣ x (2) ⎦
W (3) 0 0 1 −1 x (3)
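The structure above, differences of pairs kept while the recursion continues on the sums only, can be sketched recursively (an unnormalized illustration with our own function name):

```python
def haar(x):
    """Unnormalized Haar transform, matching the four-sample matrix above:
    pair differences are kept, and the transform recurses on the pair sums."""
    if len(x) == 1:
        return x[:]
    low = [x[2 * n] + x[2 * n + 1] for n in range(len(x) // 2)]   # sums
    high = [x[2 * n] - x[2 * n + 1] for n in range(len(x) // 2)]  # kept differences
    return haar(low) + high

W = haar([1.0, 2.0, 3.0, 4.0])
# W(0) is the total sum; the lowest-frequency coefficient has index k = 0
```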
Specifically, for the highest N/2 coefficients the Haar transform does only one addition (of two signal values) for each coefficient. For the next N/4 coefficients the Haar wavelet uses 4 signal values with 3 additions, and so on. The total number of additions for a Haar transform is

N_additions = (N/2)(2 − 1) + (N/4)(4 − 1) + (N/8)(8 − 1) + ... + (N/N)(N − 1).

For N of the form N = 2^m we can write

N_additions = N log2 N − N(1/2 + 1/2^2 + 1/2^3 + ... + 1/2^m)
= N log2 N − N(1/2)(1 − 1/2^m)/(1 − 1/2) = N log2 N − (N − 1) = N[log2 N − 1] + 1.
Calculate its Haar and Walsh-Hadamard transform with N = 16. Discuss the
results.
⋆Signal x(n) is presented in Fig. 3.23. In full analogy with (3.43), a Haar transformation matrix of order N = 16 is formed. For example, the higher coefficients are just two-sample signal transforms.

Although there are some short duration pulses (x(2), x(3), x(13)), the Haar transform coefficients W(2), W(3), ..., W(8), W(10), W(11), W(12), W(13), W(15) are zero-valued, Fig. 3.23. This is the result of its property to decompose the high-frequency signal region into short duration (two-sample) basis functions. Then a short duration pulse is contained in the high-frequency part of only one Haar coefficient. That is not the case in the Fourier transform (or Walsh-Hadamard transform), where a single delta pulse will cause all coefficients to be nonzero, Fig. 3.24. The transformation matrix T16 is obtained from T8 using (3.42).

The property that high-frequency coefficients are well localized in time and represent short duration signal components is used in image compression, where adding high-frequency coefficients adds details to an image, with the important property that one detail in the image corresponds to one (or a few) nonzero coefficients. Reconstruction with the Haar transform with a different number of coefficients is presented in Fig. 3.23. As explained, it can be considered as "zooming" a signal toward the details as the higher frequency coefficients are added. Since half of the coefficients are zero-valued
Figure 3.23 Signal x (n) and its discrete Haar transform H (k ). Reconstructed signals: using
H (0) presented by x0 (n), using two coefficients H (0) and H (1) denoted by x0−1 (n), using
H (0), H (1), and H (9) denoted by x0−1,9 (n), and using H (0), H (1), H (9), and H (14) denoted
by x0−1,9,14 (n). Vertical axes scales for the signal and transform are different.
Figure 3.25 The Haar wavelet transform (second row) and the Walsh-Hadamard transform
(third row) for high frequency long duration signals (first row). Vertical axes scales for the
signal and transform are different.
3.10 PROBLEMS
Problem 3.1. Calculate the DFT of signals using the smallest possible value
of N: a) x (n) = δ(n), b) x (n) = δ(n) + δ(n − 1) − 2jδ(n − 2) + 2jδ(n − 3) +
δ(n − 4), and c) x (n) = an (u(n) − u(n − 10)).
Problem 3.2. If the signals g(n) and f(n) are real-valued, show that their DFTs, G(k) and F(k), can be obtained from the DFT Y(k) of the signal y(n) = g(n) + j f(n).
Problem 3.3. The relationship between the DFT index and the continuous
signal frequency is given by
Ω = 2πk/(N∆t) for 0 ≤ k ≤ N/2 − 1,
Ω = 2π(k − N)/(N∆t) for N/2 ≤ k ≤ N − 1.
Problem 3.7. Find the signal whose DFT is Y (k ) = | X (k )|2 and X (k ) is the
DFT of x (n) = u(n) − u(n − 3) with period N = 10.
Problem 3.8. What is the relation between the discrete Hartley transform
(DHT) of real-valued signals
H(k) = ∑_{n=0}^{N−1} x(n)[cos(2πnk/N) + sin(2πnk/N)]
and the DFT? Express the DHT in terms of the DFT and the DFT in terms of
the DHT.
Problem 3.9. Show that the DCT of a signal x(n) with N samples, defined by

C(k) = ∑_{n=0}^{N−1} 2x(n) cos((2πk/(2N))(n + 1/2)),

can be written as

C(k) = Re{e^{-jπk/(2N)} ∑_{n=0}^{N−1} y(n)e^{-j2πkn/N}} = Re{e^{-jπk/(2N)} DFT{y(n)}}.
z(2n + 1) = y(n)
z(2n) = 0.

(a) What are the real and imaginary parts of Z(k) = DFT{z(n)}? How are they related to the DCT and DST of x(n)? (b) The signal x(n) is applied as an input to a system with an impulse response h(n) such that h(n) is of duration shorter than N, defined within 0 ≤ n ≤ N − 1, and x(n) ∗n h(n) is also within 0 ≤ n ≤ N − 1. The DCT of the output signal is calculated. How is it related to the DCT and DST of x(n)?
yk(n + (N − 1)) = ∑_{m=0}^{N−1} x(n + m)e^{-j2πmk/N}

so that its value yk(N − 1) at the last instant of the signal duration is equal to the DFT of the signal, for a given k,

yk(N − 1) = ∑_{m=0}^{N−1} x(m)e^{-j2πmk/N} = DFT{x(n)} = X(k).

Note that the system is causal since yk(n) uses only x(n) at instant n and previous instants.

Show that the output signal yk(n) is related to the previous output value yk(n − 1) by the equation
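The sliding update consistent with the definition above is the well-known sliding-DFT recursion, yk(n) = (yk(n − 1) + x(n) − x(n − N))e^{j2πk/N}; our sketch below (the book's own final equation is not reproduced here) implements it and checks that yk(N − 1) equals X(k):

```python
import cmath

def sliding_dft_bin(x, k, N):
    """Track y_k(n) recursively: y_k(n) = (y_k(n-1) + x(n) - x(n-N)) e^{j2 pi k/N}."""
    w = cmath.exp(2j * cmath.pi * k / N)
    y = 0j
    out = []
    for n, xn in enumerate(x):
        old = x[n - N] if n >= N else 0.0   # sample leaving the window
        y = (y + xn - old) * w
        out.append(y)
    return out

x = [1.0, -2.0, 3.0, 0.5, 2.0, -1.0, 0.0, 4.0]
N, k = 8, 3
yk = sliding_dft_bin(x, k, N)
Xk = sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / N) for m in range(N))
```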
3.11 SOLUTIONS
Solution 3.1. The DFT assumes that the signals are periodic. In order to calculate the DFT we first have to assume a period of the signal. The period N should be greater than or equal to the duration of the signal, so that the signal values do not overlap. Larger values of N will increase the density of the frequency domain samples, but will also increase the computation time.

a) For this signal any N ≥ 1 is acceptable, producing

X(k) = 1, k = 0, 1, ..., N − 1,

with period N.
b) We may use any N ≥ 5. Using N = 5 we get

X(k) = ∑_{n=0}^{4} x(n)e^{-j2πnk/5} = 1 + e^{-j2πk/5} − 2je^{-j4πk/5} + 2je^{-j6πk/5} + e^{-j8πk/5}
= 1 + 2 cos(2πk/5) − 4 sin(4πk/5).
c) For a period N ≥ 10,

X(k) = ∑_{n=0}^{9} (ae^{-j2πk/N})^n = (1 − a^{10}e^{-j2πk(10/N)}) / (1 − ae^{-j2πk/N}).
Solution 3.2. From y(n) = g(n) + j f(n), the real and imaginary parts g(n) and f(n) can be obtained using

DFT{y*(n)} = Y*(N − k).

G(k) = [Y(k) + Y*(N − k)]/2 and F(k) = [Y(k) − Y*(N − k)]/(2j).
X_1(k) = ∑_{n=0}^{N−1} x(n)(−1)^n e^{-j2πnk/N}.

For 0 ≤ k ≤ N/2 − 1,

X_1(k) = ∑_{n=0}^{N−1} x(n)e^{-jπn}e^{-j2πnk/N} = ∑_{n=0}^{N−1} x(n)e^{-j2πn(k+N/2)/N} = X(k + N/2).

For N/2 ≤ k ≤ N − 1,

X_1(k) = ∑_{n=0}^{N−1} x(n)e^{jπn}e^{-j2πnk/N} = ∑_{n=0}^{N−1} x(n)e^{-j2πn(k−N/2)/N} = X(k − N/2).
Y(k) = ∑_{n=0}^{N−1} y(n)e^{-j2πnk/N} = ∑_{n=0}^{N−1} [x(n) + (−1)^n x(n)]e^{-j2πnk/N}
= ∑_{n=0}^{N−1} [x(n) + x(n)e^{-jπn}]e^{-j2πnk/N} = X(k) + X(k + N/2)

and

Z(k) = ∑_{n=0}^{N−1} z(n)e^{-j2πnk/N} = ∑_{n=0}^{N−1} [x(n) − (−1)^n x(n)]e^{-j2πnk/N}
= X(k) − X(k + N/2).

Obviously Y(k) + Z(k) = 2X(k).
Solution 3.5. For the convolution calculation using the DFT, the minimal number N is N = K + L − 1 = 4, where K = 2 is the duration of x(n) and L = 3 is the duration of h(n). With N = 4, it follows that

X(k) = 1 − e^{-j2πk/4}
H(k) = 2 − e^{-j2πk/4} + 2e^{-j4πk/4}
Y(k) = X(k)H(k) = (1 − e^{-j2πk/4})(2 − e^{-j2πk/4} + 2e^{-j4πk/4})
= 2 − 3e^{-j2πk/4} + 3e^{-j4πk/4} − 2e^{-j6πk/4}.

The signal is y(n) = 2δ(n) − 3δ(n − 1) + 3δ(n − 2) − 2δ(n − 3).
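The result can be checked with a direct N = 4 circular convolution, with x(n) = {1, −1} and h(n) = {2, −1, 2} read from X(k) and H(k) (a small sketch of ours):

```python
def circ_conv(x, h, N):
    """Circular convolution of zero-padded x(n) and h(n) over period N."""
    xp = x + [0.0] * (N - len(x))
    hp = h + [0.0] * (N - len(h))
    return [sum(xp[m] * hp[(n - m) % N] for m in range(N)) for n in range(N)]

y = circ_conv([1.0, -1.0], [2.0, -1.0, 2.0], 4)
# With N = K + L - 1 = 4 the circular convolution equals the linear one
```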
Solution 3.6. The circular convolution y(n) = x(n) ∗ h(n) has the DFT Y(k) = X(k)H(k) with

X(k) = ∑_{n=0}^{N−1} [e^{j4πn/N} + (1/(2j))e^{j2πn/N} − (1/(2j))e^{-j2πn/N}] e^{-j2πnk/N}
= Nδ(k − 2) + (N/(2j))δ(k − 1) − (N/(2j))δ(k + 1)

and

H(k) = ∑_{n=0}^{N−1} [(1/2)e^{j4πn/N} + (1/2)e^{-j4πn/N} + e^{j2πn/N}] e^{-j2πnk/N}
= (N/2)δ(k − 2) + (N/2)δ(k + 2) + Nδ(k − 1).

The value of Y(k) is

Y(k) = (N^2/2)δ(k − 2) + (N^2/(2j))δ(k − 1).
Since

IDFT{X*(k)} = (1/N) ∑_{k=0}^{N−1} X*(k)e^{j2πnk/N} = (1/N)[∑_{k=0}^{N−1} X(k)e^{-j2πnk/N}]*
= (1/N)[∑_{k=0}^{N−1} X(k)e^{j2πk(N−n)/N}]* = x*(N − n)
we get
Thus,
∑_{n=0}^{N−1} x(n) cos(2πnk/N) = [X(k) + X(N − k)]/2 = [H(k) + H(N − k)]/2

∑_{n=0}^{N−1} x(n) sin(2πnk/N) = [X(N − k) − X(k)]/(2j) = [H(k) − H(N − k)]/2.

2H(k) = X(k) + X(N − k) − j[X(N − k) − X(k)].

2X(k) = H(k) + H(N − k) − j[H(k) − H(N − k)].
Solution 3.9. We can split the DCT sum into an even and an odd part,

C(k) = ∑_{n=0}^{N−1} 2x(n) cos((2πk/(2N))(n + 1/2))
= ∑_{n=0}^{N/2−1} 2x(2n) cos((2πk/(2N))(2n + 1/2)) + ∑_{n=0}^{N/2−1} 2x(2n + 1) cos((2πk/(2N))(2n + 1 + 1/2)).
N/2−1
2πk 1
∑ 2x (2n + 1) cos(
2N
(2n + 1 + ))
2
n =0
N/2−1
2πk 1
= ∑ 2x ( N − 2m − 1) cos(
2N
( N − 2m − 1 + )).
2
m =0
Shifting now the summation index in this sum for N/2 + m = n follows
N/2−1
2πk 1
∑ 2x ( N − 2m − 1) cos(
2N
( N − 2m − 1 + ))
2
m =0
N −1
2πk 1
= ∑ 2x (2N − 2n − 1) cos(
2N
(2N − 2n − )).
2
n= N/2
Ljubiša Stanković Digital Signal Processing 163
Now we can go back to the DCT and to replace the second sum, to get
N/2−1
2πk 1
C (k ) = ∑ 2x (2n) cos(
2N
(2n + ))
2
n =0
N −1 N −1
2πk 1 2πk 1
+ ∑ 2x (2N − 2n − 1) cos(
2N
(2n + )) = ∑ y(n) cos(
2 2N
(2n + ))
2
n= N/2 n =0
and
$$Z(k) = \mathrm{DFT}\{z(n)\} = e^{-j\pi k/(2N)}\sum_{n=0}^{N-1} 2x(n)e^{-j2\pi nk/(2N)}$$
$$Z(k)e^{j\pi k/(2N)} = Y(k) = 2X(k/2).$$
b) If the signal x(n) is input to a system, then the DCT of the output can be calculated in a similar way. It has been assumed that x(n), h(n), and x(n) ∗n h(n) are all zero-valued outside 0 ≤ n ≤ N − 1 (it means that the durations of x(n) and h(n) should be such that their convolution remains within 0 ≤ n ≤ N − 1). Then for a signal zh(n) related to xh(n) = x(n) ∗n h(n) in the same way as z(n) is related to x(n) in a), we can write
$$\mathrm{DFT}\{z_h(n)\}e^{j\pi k/(2N)} = 2X_h\!\left(\frac{k}{2}\right) = 2X\!\left(\frac{k}{2}\right)H\!\left(\frac{k}{2}\right) = Y(k)H\!\left(\frac{k}{2}\right).$$
Then
$$C_h(k) = \mathrm{DCT}\{x_h(n)\} = \mathrm{Re}\left\{Y(k)H\!\left(\frac{k}{2}\right)e^{-j\pi k/(2N)}\right\}$$
$$= \mathrm{Re}\{Y(k)e^{-j\pi k/(2N)}\}\,\mathrm{Re}\left\{H\!\left(\frac{k}{2}\right)\right\} - \mathrm{Im}\{Y(k)e^{-j\pi k/(2N)}\}\,\mathrm{Im}\left\{H\!\left(\frac{k}{2}\right)\right\} = C(k)\,\mathrm{Re}\left\{H\!\left(\frac{k}{2}\right)\right\} + S(k)\,\mathrm{Im}\left\{H\!\left(\frac{k}{2}\right)\right\}.$$
The system output is x(n) ∗n h(n) = xh(n) = IDCT{Ch(k)}, (3.34). The transform H(k/2) is the DFT of h(n) zero-padded by a factor of 2. Only the first half of the DFT samples is then used.
Solution 3.11. For the signal yk (n) we may write
$$y_k(n) = \sum_{m=0}^{N-1} x(n-N+1+m)e^{-j2\pi mk/N}.$$
For 0 ≤ n ≤ N − 1
Therefore H(2r) is
$$H(2r) = \sum_{n=0}^{N/2-1} g(n)\left[\cos\frac{2\pi rn}{N/2} + \sin\frac{2\pi rn}{N/2}\right]$$
where g(n) = x(n) + x(n + N/2). This is a DHT of g(n) with N/2 samples.
Note: For odd frequency indices k = 2r + 1 we can write
$$H(2r+1) = \sum_{n=0}^{N-1} x(n)\left[\cos\frac{2\pi(2r+1)n}{N} + \sin\frac{2\pi(2r+1)n}{N}\right].$$
It can be rewritten as
$$H(2r+1) = \sum_{n=0}^{N/2-1} f(n)\left[\cos\frac{2\pi nr}{N/2} + \sin\frac{2\pi nr}{N/2}\right]$$
where
$$f(n) = \left[x(n) - x\!\left(n+\frac{N}{2}\right)\right]\cos\frac{2\pi n}{N} + \left[x\!\left(\frac{N}{2}-n\right) - x(N-n)\right]\sin\frac{2\pi n}{N}.$$
This is again a DHT of a signal f(n) with N/2 samples.
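The even-index identity above is easy to verify numerically. A sketch, with an arbitrary random test signal (the signal and its length are illustrative, not from the text):

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform: H(k) = sum_n x(n) * [cos + sin](2*pi*n*k/N)."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    arg = 2 * np.pi * k * n / N
    return (x * (np.cos(arg) + np.sin(arg))).sum(axis=1)

rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(N)

H = dht(x)                      # full N-point DHT
g = x[:N // 2] + x[N // 2:]     # g(n) = x(n) + x(n + N/2)
H_even = dht(g)                 # N/2-point DHT of g(n)

print(np.allclose(H[0::2], H_even))
```

The even bins H(2r) of the length-N DHT coincide with the length-N/2 DHT of g(n), which is the basis of a radix-2 fast Hartley transform.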
where |X| is the vector whose elements are the DFT values | X (k )|, k =
0, 1, ..., 15. Maximal value is at k = 3, meaning that the frequency estimation
without the displacement bin would be (2π · 3)/16 = 1.1781, while the true
frequency is (2π · 2√3)/16 = 1.3603. The error is 13.4%.
For the zero-padded signal (interpolated DFT), with a factor of 4,
$$|X(k)| = \left|\sum_{n=0}^{15} e^{j4\pi\sqrt{3}n/16}e^{-j2\pi nk/64}\right| = \left|\sum_{n=0}^{15} e^{j2\pi(8\sqrt{3}-k)n/64}\right| = \left|\frac{\sin\!\left(\pi(8\sqrt{3}-k)/4\right)}{\sin\!\left(\pi(8\sqrt{3}-k)/64\right)}\right|.$$
The maximal value is obtained for $k = \lfloor 8\sqrt{3}\rceil = 14$, where $\lfloor\cdot\rceil$ denotes the nearest integer value. Then
$$|X(14)| = \left|\frac{\sin\!\left(\pi(8\sqrt{3}-14)/4\right)}{\sin\!\left(\pi(8\sqrt{3}-14)/64\right)}\right| = 15.9662,$$
$$|X(15)| = \left|\frac{\sin\!\left(\pi(8\sqrt{3}-15)/4\right)}{\sin\!\left(\pi(8\sqrt{3}-15)/64\right)}\right| = 13.9412,$$
$$|X(13)| = \left|\frac{\sin\!\left(\pi(8\sqrt{3}-13)/4\right)}{\sin\!\left(\pi(8\sqrt{3}-13)/64\right)}\right| = 14.8249,$$
$$d = 0.5\,\frac{|X(15)|-|X(13)|}{2|X(14)|-|X(15)|-|X(13)|} = -0.1395.$$
The true frequency index would be $8\sqrt{3} = 13.8564$, with the true frequency $2\pi\cdot 13.8564/64 = 1.3603$. The correct value of the frequency index is shifted
(Figure: the signal x(n) for n = 0, 1, . . . , 15.)
from the nearest integer k = 14 (on the frequency grid) for 14 − 13.8564 =
−0.1436, when the interpolation is done. Thus, the obtained displacement
bin value −0.1395 is close to the true shift value −0.1436. The estimated
frequency, using the displacement bin, is 1.3608. As compared to the true
frequency the error is 0.03%.
If the displacement formula is applied to the DFT values, without interpolation, we would get d = 0.3356, while 2√3 = 3.4641 is displaced from the nearest integer by 0.4641.
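The displacement-bin estimation above can be reproduced numerically; a sketch of the same computation:

```python
import numpy as np

# Signal x(n) = exp(j 4 pi sqrt(3) n / 16), zero-padded by a factor of 4.
n = np.arange(16)
x = np.exp(1j * 4 * np.pi * np.sqrt(3) * n / 16)

X = np.abs(np.fft.fft(x, 64))      # interpolated (zero-padded) DFT magnitude
k = np.argmax(X)                   # maximum position on the dense grid
d = 0.5 * (X[k + 1] - X[k - 1]) / (2 * X[k] - X[k + 1] - X[k - 1])
omega = 2 * np.pi * (k + d) / 64   # frequency estimate with displacement bin

print(k, round(float(d), 4), round(float(omega), 4))
```

The printed values can be compared with k = 14, d = −0.1395, and the true frequency 1.3603 quoted in the text.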
3.12 EXERCISE
Exercise 3.1. Find the DFT of x (n) = δ(n) − δ(n − 3) with N = 4 and N = 8.
Exercise 3.2. Calculate the DFT of signal x (n) = sin(nπ/4) for 0 ≤ n < N
with N = 8 and N = 16.
Exercise 3.3. For a real-valued signal the DFT is calculated with N = 8 and
the following DFT values are known: X (0) = 1, X (2) = 2 − j, X (5) = j,
X (7) = 3. Find the remaining values. What are the values of x (0) and
∑7n=0 x (n)?
Exercise 3.4. Signal x (n) is presented in Fig. 3.26. Find X (0), X (4), and X (8),
where X (k ) is the DFT of x (n) calculated with N = 16.
Exercise 3.5. Prove that for an arbitrary real-valued signal x (n), defined for
0 ≤ n < N, where N is an even integer, the DFT value X ( N/2) is real-valued.
(Figure: the DFT values X(k) for k = 0, 1, . . . , 15.)
5. Find x(0).
6. Calculate $\sum_{n=0}^{15} x(n)$.
7. Calculate $\sum_{n=0}^{15} |x(n)|^2$.
8. Calculate $\sum_{n=0}^{15} (-1)^n x(n)$.
The Fourier transform of discrete signals and the DFT are used for direct signal processing and calculations. A transform that generalizes these transforms, in the same way as the Laplace transform generalizes the Fourier transform of continuous signals, is the z-transform. This transform provides an efficient tool for the qualitative analysis and design of discrete systems.
$$X(z) = \sum_{n=-\infty}^{\infty} x(n)z^{-n}, \qquad (4.1)$$
$$x(n) = a^nu(n) + b^nu(n),$$
where a and b are complex numbers, |a| < |b|. Find the z-transform of this signal and its region of convergence.
(Figure: regions of convergence in the z-plane for the poles a and b.)
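A numerical sketch of the region-of-convergence statement: the series for x(n) = a^n u(n) + b^n u(n) reproduces the closed form (which follows from the geometric series) only at points with |z| greater than the larger pole magnitude |b|. The values of a, b, and the test point z are illustrative:

```python
import numpy as np

a, b = 0.3, 0.6   # assumed pole values with |a| < |b|
z = 0.9           # test point inside the region of convergence |z| > 0.6

n = np.arange(0, 400, dtype=float)
partial = np.sum((a**n + b**n) * z**(-n))           # truncated z-transform sum
closed = 1 / (1 - a / z) + 1 / (1 - b / z)          # z/(z-a) + z/(z-b)

print(np.isclose(partial, closed))
```

For |z| < |b| the terms (b/z)^n grow without bound, so the same sum diverges there.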
4.2.1 Linearity
$$\mathcal{Z}\{ax(n) + by(n)\} = \sum_{n=-\infty}^{\infty}[ax(n)+by(n)]z^{-n} = aX(z) + bY(z)$$
with the region of convergence being at least the intersection of the regions
of convergence of X (z) and Y (z). In special cases the region can be larger
than the intersection of the regions of convergence of X (z) and Y (z) if
some poles, defining the region of convergence, cancel out in the linear
combination of transforms.
4.2.2 Time-Shift
$$\mathcal{Z}\{x(n-n_0)\} = \sum_{n=-\infty}^{\infty}x(n-n_0)z^{-n} = \sum_{n=-\infty}^{\infty}x(n)z^{-(n+n_0)} = X(z)z^{-n_0}.$$
Example 4.3. For a causal signal x (n) = x (n)u(n) find the z-transform of x (n +
n0 )u(n), for n0 ≥ 0.
$$\mathcal{Z}\{x(n+n_0)u(n)\} = \sum_{n=0}^{\infty}x(n+n_0)z^{-n} = z^{n_0}\left[\sum_{n=0}^{\infty}x(n)z^{-n} - x(0) - x(1)z^{-1} - \dots - x(n_0-1)z^{-n_0+1}\right]$$
$$= z^{n_0}\left[X(z) - x(0) - x(1)z^{-1} - \dots - x(n_0-1)z^{-n_0+1}\right].$$
For n0 = 1 follows Z{ x (n + 1)u(n)} = zX (z) − x (0). Note that for this signal
x ( n + n0 ) u ( n ) ̸ = x ( n + n0 ) u ( n + n0 ).
4.2.4 Differentiation
$$\mathcal{Z}\{n(n+1)\dots(n+N-1)x(n)u(n)\} = (-1)^N z^N\frac{d^NX(z)}{dz^N}.$$
$$\mathcal{Z}\{x(n)*y(n)\} = \mathcal{Z}\left\{\sum_{m=-\infty}^{\infty}x(m)y(n-m)\right\} = \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty}x(m)y(n-m)z^{-n} = \sum_{l=-\infty}^{\infty}\sum_{m=-\infty}^{\infty}x(m)y(l)z^{-m-l} = X(z)Y(z)$$
with the region of convergence being at least the intersection of the regions of convergence of X(z) and Y(z). In the case of a product of two z-transforms it may happen that some poles cancel out, so that the resulting region of convergence is larger than the intersection of the individual regions of convergence.
$$= \lim_{N\to\infty}[x(N+1)-x(0)].$$
Thus,
$$\lim_{N\to\infty}[x(N+1)-x(0)] = zX(z) - x(0) - X(z),$$
identify possible regions of convergence and find the inverse z-transform for
each of them.
⋆Obviously the z-transform has the poles z1 = 1/2 and z2 = 1/3. Since
there are no poles in the region of convergence there are three possibilities
to define the region of convergence: 1) |z| > 1/2, 2) 1/3 < |z| < 1/2, and 3)
|z| < 1/3. The signals are obtained by using power series expansion for each
case.
1) For the region of convergence |z| > 1/2 the z-transform should be
written in the form
$$X(z) = \frac{1}{1-\frac{1}{2z}} - \frac{1}{3z\left(1-\frac{1}{3z}\right)}.$$
Both of these terms converge for |z| > 1/2. The resulting power series expansion of X(z) is
$$X(z) = \sum_{n=0}^{\infty}\frac{1}{2^n}z^{-n} - \frac{1}{3z}\sum_{n=0}^{\infty}\frac{1}{3^n}z^{-n} = \sum_{n=0}^{\infty}\frac{1}{2^n}z^{-n} - \sum_{n=1}^{\infty}\frac{1}{3^n}z^{-n}.$$
The inverse z-transform for this region of convergence is
$$x(n) = \frac{1}{2^n}u(n) - \frac{1}{3^n}u(n-1).$$
2) For 1/3 < |z| < 1/2 the z-transform should be written in the form
$$X(z) = \frac{-2z}{1-2z} - \frac{1}{3z\left(1-\frac{1}{3z}\right)}.$$
The corresponding geometric series are
$$\frac{1}{1-2z} = \sum_{n=0}^{\infty}(2z)^n = \sum_{n=-\infty}^{0}2^{-n}z^{-n} \quad \text{for } |2z|<1 \text{ or } |z|<\frac{1}{2}$$
$$\frac{1}{1-\frac{1}{3z}} = \sum_{n=0}^{\infty}\left(\frac{1}{3z}\right)^n = \sum_{n=0}^{\infty}\frac{1}{3^n}z^{-n} \quad \text{for } \left|\frac{1}{3z}\right|<1 \text{ or } |z|>\frac{1}{3}.$$
They converge for 1/3 < |z| < 1/2. The resulting power series expansion is
$$X(z) = -2z\sum_{n=-\infty}^{0}2^{-n}z^{-n} - \frac{1}{3z}\sum_{n=0}^{\infty}\frac{1}{3^n}z^{-n} = -\sum_{n=-\infty}^{-1}\frac{1}{2^n}z^{-n} - \sum_{n=1}^{\infty}\frac{1}{3^n}z^{-n}.$$
The inverse z-transform for this region of convergence is
$$x(n) = -\frac{1}{2^n}u(-n-1) - \frac{1}{3^n}u(n-1).$$
3) For |z| < 1/3 we can write
$$X(z) = \frac{-2z}{1-2z} + \frac{1}{1-3z}.$$
The corresponding geometric series are
$$\frac{1}{1-2z} = \sum_{n=0}^{\infty}(2z)^n = \sum_{n=-\infty}^{0}2^{-n}z^{-n} \quad \text{for } |2z|<1 \text{ or } |z|<\frac{1}{2}$$
$$\frac{1}{1-3z} = \sum_{n=0}^{\infty}(3z)^n = \sum_{n=-\infty}^{0}3^{-n}z^{-n} \quad \text{for } |3z|<1 \text{ or } |z|<\frac{1}{3}.$$
The inverse z-transform for this region of convergence is
$$x(n) = -\frac{1}{2^n}u(-n-1) + \frac{1}{3^n}u(-n).$$
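The region-of-convergence cases above can be checked numerically: each candidate sequence must reproduce X(z) = 2z/(2z−1) − 1/(3z−1) at test points inside its own region. A sketch for the first two cases (the test points z = 1 and z = 0.4 are illustrative):

```python
import numpy as np

def X(z):
    # closed form of the transform with poles at z = 1/2 and z = 1/3
    return 2 * z / (2 * z - 1) - 1 / (3 * z - 1)

n = np.arange(1, 200, dtype=float)

# 1) |z| > 1/2: x(n) = 2^-n u(n) - 3^-n u(n-1); test at z = 1
z = 1.0
s1 = 1.0 + np.sum(2.0**-n * z**-n) - np.sum(3.0**-n * z**-n)

# 2) 1/3 < |z| < 1/2: x(n) = -2^-n u(-n-1) - 3^-n u(n-1); test at z = 0.4
z = 0.4
s2 = -np.sum(2.0**n * z**n) - np.sum(3.0**-n * z**-n)

print(np.isclose(s1, X(1.0)), np.isclose(s2, X(0.4)))
```

The truncated sums converge quickly because the summed ratios are strictly inside the unit interval for each chosen test point.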
$$X(z) = e^{a/z} = 1 + \frac{a}{z} + \frac{1}{2!}\left(\frac{a}{z}\right)^2 + \frac{1}{3!}\left(\frac{a}{z}\right)^3 + \dots$$
follows
$$x(n) = \delta(n) + a\delta(n-1) + \frac{a^2}{2!}\delta(n-2) + \frac{a^3}{3!}\delta(n-3) + \dots = \frac{a^n}{n!}u(n).$$
The series converges for any z except z = 0.
$$X(z) = \frac{z^2+1}{(z-1/2)(z^2-3z/4+1/8)}$$
find the signal x(n) if the region of convergence is |z| > 1/2.
⋆The denominator of X(z) will be rewritten in the form
$$X(z) = \frac{z^2+1}{(z-1/2)(z-z_1)(z-z_2)} = \frac{z^2+1}{(z-1/2)^2(z-1/4)}$$
where z1 = 1/2 and z2 = 1/4. Writing X(z) in the form of partial fractions,
$$X(z) = \frac{A}{(z-\frac{1}{2})^2} + \frac{B}{z-\frac{1}{2}} + \frac{C}{z-\frac{1}{4}},$$
or from
$$z^2+1 = A\left(z-\frac{1}{4}\right) + B\left(z-\frac{1}{2}\right)\left(z-\frac{1}{4}\right) + C\left(z-\frac{1}{2}\right)^2. \qquad (4.4)$$
For z = 1/4 we get 17/16 = C/16 or C = 17. The value z = 1/2 gives
$$\frac{1}{4}+1 = A\left(\frac{1}{2}-\frac{1}{4}\right).$$
For the region of convergence |z| > 1/2 and a parameter |a| ≤ 1/2 holds
$$\frac{1}{z-a} = \frac{1}{z\left(1-\frac{a}{z}\right)} = z^{-1}\left(1 + az^{-1} + a^2z^{-2} + \dots\right) = \sum_{n=1}^{\infty}a^{n-1}z^{-n}.$$
In general the inversion is calculated by using the Cauchy relation from complex analysis,
$$\frac{1}{2\pi j}\oint_C z^{m-1}dz = \delta(m),$$
where C is any closed contour line within the region of convergence. The complex plane origin is within the contour. By multiplying both sides of X(z) by z^{m−1}, after integration along the closed contour within the region of convergence we get
$$\frac{1}{2\pi j}\oint_C z^{m-1}X(z)dz = \sum_{n=-\infty}^{\infty}\frac{1}{2\pi j}\oint_C x(n)z^{m-n-1}dz = x(m).$$
The integral is evaluated using the residue theorem, where zi are the poles of z^{n−1}X(z) within the integration contour C that is in the region of convergence and k is the pole order. If the signal is causal, n ≥ 0, and all poles of z^{n−1}X(z) within the contour C are simple (first-order poles with k = 1) then, for a given instant n,
$$x(n) = \sum_{z_i}\left[z^{n-1}X(z)(z-z_i)\right]\Big|_{z=z_i}. \qquad (4.5)$$
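The contour-integral inversion can also be carried out numerically, sampling the contour densely. A sketch for the assumed example X(z) = z/(z − 1/2), whose causal inverse is 0.5^n u(n):

```python
import numpy as np

def X(z):
    return z / (z - 0.5)            # x(n) = 0.5^n u(n), ROC |z| > 1/2

M = 256                             # number of contour sample points
theta = 2 * np.pi * np.arange(M) / M
zc = np.exp(1j * theta)             # unit-circle contour, inside the ROC

def x(m):
    # With z = e^{j theta}, dz = j z dtheta, so (1/2pi j) * contour integral
    # of z^{m-1} X(z) dz reduces to the mean of z^m X(z) over the samples.
    return np.mean(zc**m * X(zc)).real

print(x(0), x(3))
```

The discretization introduces only an aliasing error of order 0.5^M, negligible here.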
Y ( z ) = X ( z ) H ( z ).
It means that the z-transform exists at |z| = 1, i.e., that the unit circle |z| = 1 belongs to the region of convergence.
Figure 4.3 Regions of convergence (gray) with corresponding signals. Poles are denoted by
"x".
$$|H(e^{j\omega})| = |H(z)|\big|_{z=e^{j\omega}},$$
where the transfer function has zeros $z_{0i}$ and poles $z_{pi}$. For the amplitude of the frequency response we may write
$$|H(e^{j\omega})| = \left|\frac{B_0}{A_0}\right|\frac{T_{O_1}T_{O_2}\dots T_{O_M}}{T_{P_1}T_{P_2}\dots T_{P_N}}$$
Example 4.9. Plot the frequency response of the causal notch filter with the transfer
function
$$H(z) = \frac{z-e^{j\pi/3}}{z-0.95e^{j\pi/3}}$$
with
$$|H(e^{j\omega})| = \frac{T_{O_1}}{T_{P_1}}.$$
Figure 4.4 Poles and zeros of a first-order notch filter (left). The frequency response of this
notch filter (right).
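The notch behavior of Example 4.9 can be checked by evaluating H(z) on the unit circle; a sketch:

```python
import numpy as np

# First-order notch filter H(z) = (z - e^{j pi/3}) / (z - 0.95 e^{j pi/3}).
def H(omega):
    z = np.exp(1j * omega)
    z0 = np.exp(1j * np.pi / 3)
    return (z - z0) / (z - 0.95 * z0)

print(np.abs(H(np.pi / 3)))     # zero on the unit circle: response vanishes
print(np.abs(H(0.0)))           # away from the notch the response stays near 1
```

Because the zero lies exactly on the unit circle while the pole is just inside it, the response dips to zero at ω = π/3 and is close to unity elsewhere.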
$$Y(z) = \frac{B_0 + B_1z^{-1} + \dots + B_Mz^{-M}}{1 + A_1z^{-1} + \dots + A_Nz^{-N}}X(z).$$
$$y(n) - \frac{5}{6}y(n-1) + \frac{1}{6}y(n-2) = x(n). \qquad (4.6)$$
If the input signal is x(n) = (1/4^n)u(n) find the output signal.
$$Y(z) = \frac{1}{1-\frac{5}{6}z^{-1}+\frac{1}{6}z^{-2}}X(z).$$
The z-transform of the input signal is X(z) = 1/(1 − ¼z^{−1}) for |z| > 1/4. The output signal z-transform is
$$Y(z) = \frac{z^3}{(z-\frac{1}{2})(z-\frac{1}{3})(z-\frac{1}{4})}.$$
For a causal system the region of convergence is |z| > 1/2. The output signal
is the inverse z-transform of Y (z). For n > 0 it is
$$y(n) = \sum_{z_i=1/2,\,1/3,\,1/4}\left[z^{n-1}Y(z)(z-z_i)\right]\Big|_{z=z_i}$$
$$= \frac{z^{n+2}}{(z-\frac{1}{3})(z-\frac{1}{4})}\bigg|_{z=1/2} + \frac{z^{n+2}}{(z-\frac{1}{2})(z-\frac{1}{4})}\bigg|_{z=1/3} + \frac{z^{n+2}}{(z-\frac{1}{2})(z-\frac{1}{3})}\bigg|_{z=1/4} = 6\frac{1}{2^n} - \frac{8}{3^n} + \frac{3}{4^n}.$$
For n = 0 there is no pole at z = 0. Thus, the above expression holds for n = 0 as well. The output signal is
$$y(n) = \left(\frac{6}{2^n} - \frac{8}{3^n} + \frac{3}{4^n}\right)u(n).$$
Note: This kind of solution assumes the initial values from the system causal-
ity and x (n) as y(0) = x (0) = 1 and y(1) − 5y(0)/6 = x (1), i.e., y(1) =
13/12.
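The closed-form output above can be cross-checked by simulating the causal recursion directly; a sketch:

```python
import numpy as np

# Simulate y(n) = (5/6) y(n-1) - (1/6) y(n-2) + x(n), x(n) = (1/4)^n u(n),
# and compare with the residue-based result y(n) = 6/2^n - 8/3^n + 3/4^n.
N = 25
y = np.zeros(N)
for n in range(N):
    y[n] = 0.25**n
    if n >= 1:
        y[n] += (5 / 6) * y[n - 1]
    if n >= 2:
        y[n] -= (1 / 6) * y[n - 2]

n = np.arange(N)
closed = 6 / 2.0**n - 8 / 3.0**n + 3 / 4.0**n
print(np.allclose(y, closed))
```

The recursion reproduces, in particular, the initial values y(0) = 1 and y(1) = 13/12 noted in the text.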
Find its impulse response and discuss its behavior in terms of the system
coefficients.
⋆For the impulse response calculation the input signal is x(n) = δ(n) with X(z) = 1. Then we have
$$Y(z) = \frac{B_0 + B_1z^{-1}}{1 + A_1z^{-1}}.$$
The pole of this system is z = −A1. There are two possibilities for the region of convergence: |z| > |A1| and |z| < |A1|. For a causal system the region of convergence is |z| > |A1|. Thus, the z-transform Y(z) can be expanded into a geometric series with q = −A1z^{−1}, |A1/z| < 1,
$$Y(z) = \left(B_0 + B_1z^{-1}\right)\left(1 - A_1z^{-1} + A_1^2z^{-2} - A_1^3z^{-3} + \dots + (-A_1z^{-1})^n + \dots\right) = B_0 + B_0\sum_{n=1}^{\infty}(-A_1)^nz^{-n} + B_1\sum_{n=1}^{\infty}(-A_1)^{n-1}z^{-n}$$
with
$$y(n) = B_0\delta(n) + (-A_1)^{n-1}(-A_1B_0 + B_1)u(n-1).$$
We can conclude that, in general, the impulse response has an infinite duration for any A1 ≠ 0. It is a result of the recursive relation between the output y(n) and its previous value(s) y(n − 1). Systems of this kind are referred to as infinite impulse response (IIR) systems or recursive systems. If the value of the coefficient A1 is A1 = 0 then there is no recursion and
$$y(n) = B_0\delta(n) + B_1\delta(n-1).$$
Then we have a system with a finite impulse response (FIR). This kind of system produces the output to a signal x(n) as
$$y(n) = B_0x(n) + B_1x(n-1).$$
They are called moving average (MA) systems. Systems without recursion are always stable since a finite sum of finite signal values is always finite. Systems that would contain only x(n) and the output recursions, in this case,
$$y(n) + A_1y(n-1) = B_0x(n),$$
are auto-regressive (AR) systems or all-pole systems. Systems of this kind could be unstable, due to the recursion. In our case the system is obviously unstable if |A1| > 1. Systems (4.7) are in general auto-regressive moving average (ARMA) systems.
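The closed-form causal impulse response derived above is easy to verify by running the recursion itself. A sketch, with illustrative coefficient values:

```python
# Impulse response of y(n) + A1*y(n-1) = B0*x(n) + B1*x(n-1), by recursion,
# compared with y(n) = B0*delta(n) + (-A1)^(n-1) * (-A1*B0 + B1) * u(n-1).
B0, B1, A1 = 1.0, 0.5, 0.8   # assumed coefficients, A1 != 0 (IIR case)

N = 30
h = []
for n in range(N):
    yn = B0 * (1.0 if n == 0 else 0.0) + B1 * (1.0 if n == 1 else 0.0)
    if n >= 1:
        yn -= A1 * h[n - 1]   # recursion on the previous output value
    h.append(yn)

closed = [B0] + [(-A1)**(n - 1) * (-A1 * B0 + B1) for n in range(1, N)]
print(all(abs(a - b) < 1e-12 for a, b in zip(h, closed)))
```

Setting `A1 = 0` in the same code collapses the response to the two-sample FIR case.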
If the region of convergence were |z| < |A1| then the function Y(z) would be expanded into a series with q = −z/A1, |z/A1| < 1, as
$$Y(z) = \frac{B_0 + B_1z^{-1}}{A_1z^{-1}\left(\frac{z}{A_1}+1\right)} = \left(\frac{B_0}{A_1}z + \frac{B_1}{A_1}\right)\sum_{n=0}^{\infty}\left(-\frac{z}{A_1}\right)^n = -B_0\sum_{n=-\infty}^{-1}(-A_1)^nz^{-n} + \frac{B_1}{A_1}\sum_{n=-\infty}^{0}(-A_1)^nz^{-n}$$
with
$$y(n) = -B_0(-A_1)^nu(-n-1) + \frac{B_1}{A_1}(-A_1)^nu(-n).$$
This system would be stable if |1/A1| < 1 and unstable if |1/A1| > 1, having in mind that y(n) is nonzero for n < 0. This is an anticausal system since its impulse response satisfies h(n) = 0 for n ≥ 1. Here, we have just introduced the notions. These systems will be considered in Chapter 5 in detail.
$$y_i(n) = C_i\lambda_i^n.$$
From
$$y(0) = C_1 + C_2 + 1 = 1$$
$$y(1) = \frac{C_1}{2} + \frac{C_2}{3} + 4 = 5$$
the constants C1 = 6 and C2 = −6 follow. The final solution is
$$y(n) = \left(\frac{6}{2^n} - \frac{6}{3^n} + 3n + 1\right)u(n).$$
Note: The z-transform based solution would assume y(0) = x (0) =
11/6 and y(1) = 5y(0)/6 + x (1) = 157/36. The solution with the initial
conditions y(0) = 1 and y(1) = 5 could be obtained from this solution with
appropriate changes of the first two samples of the input signal in order to
take into account the previous system state and to produce the given initial
conditions y(0) = 1 and y(1) = 5 .
If multiple polynomial roots are obtained, for example λi = λi+1 , then
yi (n) = λin and yi+1 (n) = nλin .
Consider a periodic signal x(n) with a period N and its DFT values X(k),
$$x(n) = \frac{1}{N}\sum_{k=0}^{N-1}X(k)e^{j2\pi nk/N}. \qquad (4.12)$$
$$\frac{1}{N}X(k_0)\,z^{N-1}\,\frac{1-e^{j2\pi k_0}z^{-N}}{1-e^{j2\pi k_0/N}z^{-1}}\bigg|_{z=e^{j2\pi k_0/N}} = \frac{1}{N}X(k_0)\lim_{z\to e^{j2\pi k_0/N}}\frac{z^N-e^{j2\pi k_0}}{z-e^{j2\pi k_0/N}} = X(k_0).$$
Figure 4.5 Zeros and the pole in Z{ xk (n)} (left), the pole in 1/ (1 − e j2πk0 n/N z−1 ) for k ̸= k0
(middle), and the pole in 1/ (1 − e j2πk0 n/N z−1 ) for k = k0 (right). Illustration is for N = 16.
$$y_k(N-1) = X(k)\delta(k-k_0).$$
It requires just one additional complex multiplication for the last instant and for one frequency. The total number of multiplications is 2N + 4, which is reduced with respect to the 4N real multiplications needed previously. The total number of additions is 4N + 2, which is increased. However, the time needed for a multiplication is much longer than the time needed for an addition, so the overall efficiency is improved. The efficiency is improved even further having in mind that (4.18) is the same for the calculation of X(k0) and X(−k0) = X(N − k0).
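The single-bin recursive computation described here is, in essence, the well-known Goertzel algorithm: one real second-order recursion, followed by a single complex correction at the end. A sketch (the test signal is an arbitrary random sequence):

```python
import numpy as np

def goertzel(x, k):
    """Single DFT bin X(k) via a real recursion plus one complex correction."""
    N = len(x)
    w = 2 * np.pi * k / N
    c = 2 * np.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for xn in x:                      # one real multiplication by c per sample
        s = xn + c * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    s = c * s_prev - s_prev2          # one extra step with a zero input sample
    return s - np.exp(-1j * w) * s_prev

rng = np.random.default_rng(1)
x = rng.standard_normal(32)
k0 = 5
print(np.isclose(goertzel(x, k0), np.fft.fft(x)[k0]))
```

Only the final step uses complex arithmetic, which is where the multiplication count saving relative to the direct DFT sum comes from.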
$$X(s) = \int_{-\infty}^{\infty}x(t)e^{-st}dt \cong \sum_{n=-\infty}^{\infty}x(n\Delta t)e^{-sn\Delta t}\Delta t = \sum_{n=-\infty}^{\infty}x(n)e^{-sn\Delta t}$$
with x (n) = x (n∆t)∆t. Comparing this relation with the z-transform defini-
tion we can conclude that the Laplace transform of x (t) corresponds to the
z-transform of its samples with
z = exp(s∆t),
that is,
X (s) ↔ X (z)|z=exp(s∆t) . (4.19)
A point s = σ + jΩ from the Laplace domain maps into the point
z = re jω with r = eσ∆t and ω = Ω∆t. Points from the left half-plane in the
s domain, σ < 0, map to the interior of unit circle in the z domain, r < 1.
Example 4.14. A causal discrete-time signal x(n) has the Fourier transform X(e^{jω}). Write its z-transform in terms of the Fourier transform of the discrete-time signal, i.e., write the z-transform value based on its values on the unit circle.
⋆The signal can be expressed in terms of its Fourier transform as
$$x(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi}X(e^{j\omega})e^{j\omega n}d\omega$$
so that
$$X(z) = \sum_{n=0}^{\infty}x(n)z^{-n} = \frac{1}{2\pi}\int_{-\pi}^{\pi}X(e^{j\omega})\sum_{n=0}^{\infty}e^{j\omega n}z^{-n}d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{X(e^{j\omega})}{1-e^{j\omega}z^{-1}}d\omega,$$
$$X(k) = X(e^{j\omega})\big|_{\omega=2\pi k/N} = X(z)\big|_{z=e^{j2\pi k/N}} = \sum_{n=0}^{N-1}x(n)z^{-n}\Big|_{z=e^{j2\pi k/N}}.$$
Example 4.15. Consider a discrete-time signal with N samples different from zero
within 0 ≤ n ≤ N − 1. Show that all values of X (z), for any z, can be calculated
based on its N samples on the unit circle in the z-plane.
⋆If the signal has N nonzero samples, then it can be expressed in terms of its DFT as
$$X(k) = \sum_{n=0}^{N-1}x(n)e^{-j2\pi nk/N} \quad \text{and} \quad x(n) = \frac{1}{N}\sum_{k=0}^{N-1}X(k)e^{j2\pi nk/N}.$$
Thus, the z-transform of x(n), using only the values of the IDFT where the original signal is nonzero, 0 ≤ n ≤ N − 1, is
$$X(z) = \frac{1}{N}\sum_{k=0}^{N-1}\sum_{n=0}^{N-1}X(k)e^{j2\pi nk/N}z^{-n} = \frac{1}{N}\sum_{k=0}^{N-1}X(k)\frac{1-z^{-N}e^{j2\pi k}}{1-z^{-1}e^{j2\pi k/N}}$$
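The interpolation formula of Example 4.15 can be verified numerically: the N unit-circle samples X(k) determine X(z) at any point in the plane. A sketch, with an arbitrary random signal and an arbitrary test point z:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8
x = rng.standard_normal(N)
Xk = np.fft.fft(x)                 # the N samples of X(z) on the unit circle

z = 1.5 - 0.7j                     # arbitrary test point, z != e^{j 2 pi k/N}
k = np.arange(N)
# Note e^{j 2 pi k} = 1, so the numerator reduces to (1 - z^-N).
interp = np.sum(Xk * (1 - z**-N) / (1 - z**-1 * np.exp(2j * np.pi * k / N))) / N
direct = np.sum(x * z**-np.arange(N))

print(np.isclose(interp, direct))
```

This is the z-plane analogue of periodic sinc interpolation of a band-limited spectrum from its samples.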
Figure 4.6 Illustration of the z-transform relation with the Laplace transform (left), the Fourier
transform of discrete signals (middle), and the DFT (right).
4.7 PROBLEMS
Problem 4.1. Find the z-transform and the region of convergence for the
following signals:
(a) x(n) = δ(n − 2),
(b) x(n) = a^{|n|},
(c) x(n) = (1/2^n)u(n) + (1/3^n)u(n).
Problem 4.2. Find the z-transform and the region of convergence for the
following signals:
(a) x(n) = δ(n + 1) + δ(n) + δ(n − 1),
(b) x(n) = (1/2^n)[u(n) − u(n − 10)].
Problem 4.3. Using the z-transform property that
$$Y(z) = -z\frac{dX(z)}{dz}$$
corresponds to
$$y(n) = nx(n)u(n)$$
in the discrete-time domain, with the same region of convergence for X (z)
and Y (z), find a causal signal whose z-transform is
(a) X(z) = e^{a/z}, |z| > 0.
(b) X(z) = ln(1 + az^{−1}), |z| > |a|.
Problem 4.4. (a) How is the z-transform of x(−n) related to the z-transform of x(n)?
(b) If the signal x(n) is real-valued show that its z-transform satisfies
X ( z ) = X ∗ ( z ∗ ).
Problem 4.5. If X (z) is the z-transform of a signal x (n) find the z-transform
of
$$y(n) = \sum_{k=-\infty}^{\infty}x(k)x(n+k).$$
$$X(z) = \frac{z+1}{(2z-1)(3z+2)}.$$
$$H(z) = \frac{3-\frac{5}{6}z^{-1}}{\left(1-\frac{1}{4}z^{-1}\right)\left(1-\frac{1}{3}z^{-1}\right)}.$$
Problem 4.10. Find the impulse response of a causal system whose transfer
function is
$$H(z) = \frac{z+2}{(z-2)z^2}.$$
Problem 4.11. Find the inverse z-transform of
$$X(z) = \frac{z^2}{z^2+1}.$$
$$y(n) - y(n-1) + \frac{5}{16}y(n-2) - \frac{1}{32}y(n-3) = 3x(n) - \frac{7}{4}x(n-1) + \frac{3}{16}x(n-2).$$
$$y(n) = x(n) - \frac{3}{4}x(n-1) + \frac{1}{8}x(n-2)$$
has a finite output duration for the infinite-duration input x(n) = (1/4^n)u(n).
Using the z-transform find the output to the input signal x (n) = u(n) −
u ( n − 6) .
$$y(n) - \frac{11}{6}y(n-1) + \frac{1}{2}y(n-2) = 2x(n) - \frac{3}{2}x(n-1)$$
x (n + 2) + 3x (n + 1) + 2x (n) = 0
with the initial condition x (0) = 0 and x (1) = 1. Signal x (n) is causal.
$$x(n+1) = x(n) + a^n$$
to the input signal x(n) = (1/3^n)u(n) by a direct solution of the difference equation in the discrete-time domain and by using the z-transform. The
initial conditions are y(n) = 0 for n < 0.
Problem 4.19. The first backward difference is defined as
∇ x ( n ) = x ( n ) − x ( n − 1 ),
∇ m x ( n ) = ∇ m −1 x ( n ) − ∇ m −1 x ( n − 1 ).
∆x (n) = x (n + 1) − x (n),
∆ m x ( n ) = ∆ m −1 x ( n + 1 ) − ∆ m −1 x ( n ).
sampled at ∆t = 1/60.
Problem 4.21. Plot the frequency response of the discrete system (comb
filter)
$$H(z) = \frac{1-z^{-N}}{1-rz^{-N}}$$
with r = 0.9999 and r^{1/N} ≅ 1. Show that this system has the same transfer function as
function as
4.8 SOLUTIONS
Solution 4.1. (a) The z-transform is X(z) = z^{−2}, for any z ≠ 0.
(b) For this signal
$$X(z) = \sum_{n=-\infty}^{\infty}a^{|n|}z^{-n} = \sum_{n=-\infty}^{-1}a^{-n}z^{-n} + \sum_{n=0}^{\infty}a^nz^{-n} = \frac{(1-a^2)z}{(1-az)(z-a)}$$
for |z| < 1/a and |z| > a. If |a| < 1 then the region of convergence is a < |z| < 1/a.
(c) In this case
$$X(z) = \sum_{n=0}^{\infty}\frac{1}{2^n}z^{-n} + \sum_{n=0}^{\infty}\frac{1}{3^n}z^{-n} = \frac{1}{1-\frac{1}{2}z^{-1}} + \frac{1}{1-\frac{1}{3}z^{-1}} = \frac{2-\frac{5}{6}z^{-1}}{\left(1-\frac{1}{2}z^{-1}\right)\left(1-\frac{1}{3}z^{-1}\right)} = \frac{z\left(2z-\frac{5}{6}\right)}{\left(z-\frac{1}{2}\right)\left(z-\frac{1}{3}\right)}$$
for |z| > 1/2 and |z| > 1/3. The region of convergence is |z| > 1/2.
Solution 4.2. (a) The z-transform is
$$X(z) = \sum_{n=-\infty}^{\infty}\left(\delta(n+1)+\delta(n)+\delta(n-1)\right)z^{-n} = z + 1 + z^{-1}.$$
(Figure 4.7: zero locations z0i = e^{j2πi/10}/2 and the pole at z = 1/2 in the z-plane.)
The z-transform is
$$X(z) = \sum_{n=-\infty}^{\infty}x(n)z^{-n} = \sum_{n=0}^{9}\frac{1}{2^n}z^{-n} = \sum_{n=0}^{9}(2z)^{-n} = \frac{1-(2z)^{-10}}{1-(2z)^{-1}} = \frac{z^{-10}}{z^{-1}}\,\frac{z^{10}-\left(\frac{1}{2}\right)^{10}}{z-\frac{1}{2}} = \frac{z^{10}-\left(\frac{1}{2}\right)^{10}}{z^9\left(z-\frac{1}{2}\right)}.$$
The expression for X (z) is written in this way in order to find the region of
convergence, observing the zero-pole locations in the z-plane, Fig.4.7. Poles
are at z p1 = 0 and z p2 = 1/2. Zeros are z0i = e j2iπ/10 /2, Fig.4.7. Since the z-
transform has a zero at z0 = 1/2, it will cancel out the pole z p2 = 1/2. The
resulting region of convergence will include the whole z plane, except the
point at z = 0.
Solution 4.3. (a) For X(z) = e^{a/z} holds
$$-z\frac{dX(z)}{dz} = -z\left(-\frac{a}{z^2}\right)e^{a/z} = \frac{a}{z}X(z),$$
so that
$$nx(n)u(n) = ax(n-1)u(n)$$
since Z[nx(n)] = −z dX(z)/dz and z^{−1}X(z) = Z[x(n−1)]. It means that
$$x(n) = \frac{a}{n}x(n-1).$$
With x(0) = 1 it means that
$$x(1) = a,\quad x(2) = \frac{a^2}{2},\quad x(3) = \frac{a^3}{2\cdot 3},\ \dots$$
or
$$x(n) = \frac{a^n}{n!}u(n).$$
(b) For X(z) = ln(1 + az^{−1}),
$$\mathcal{Z}[nx(n)] = -z\frac{dX(z)}{dz} = \frac{az^{-1}}{1+az^{-1}}.$$
Therefore
$$nx(n) = a(-a)^{n-1}u(n-1),$$
producing
$$x(n) = \frac{-(-a)^n}{n}u(n-1).$$
Solution 4.4. (a) The z-transform of the signal x(−n) is
$$X_1(z) = \sum_{n=-\infty}^{\infty}x(-n)z^{-n} = \sum_{m=-\infty}^{\infty}x(m)z^{m} = X(1/z).$$
$$Y(z) = X(z)X\!\left(\frac{1}{z}\right).$$
Solution 4.6. A direct expansion of the given transform into a power series, within the region of convergence, will be used. In order to find the signal x(n) whose z-transform is X(z) = 1/(2 − 3z), it should be written in the form of a power series with respect to z^{−1}. Since the condition |3z/2| < 1 does not correspond to the region of convergence given in the problem formulation we have to rewrite X(z) as
$$X(z) = -\frac{1}{3z}\,\frac{1}{1-\frac{2}{3z}}.$$
Now the condition |2/(3z)| < 1, that is |z| > 2/3, corresponds to the problem formulation region of convergence. In order to obtain the inverse z-transform, write
$$X(z) = -\frac{1}{3z}X_1(z),$$
where
$$X_1(z) = \frac{1}{1-\frac{2}{3z}} = \sum_{n=0}^{\infty}\left(\frac{2}{3z}\right)^n = \sum_{n=0}^{\infty}\left(\frac{2}{3}\right)^nz^{-n}.$$
Due to the definition of the z-transform,
$$X(z) = \sum_{n=-\infty}^{\infty}x(n)z^{-n}, \qquad (4.21)$$
it follows that
$$x(n) = -\frac{1}{3}\left(\frac{2}{3}\right)^{n-1}u(n-1).$$
Solution 4.7. Since the signal is causal the region of convergence is outside
the pole with the largest radius (outside the circle passing through this pole).
and
$$\frac{B}{3z}\,\frac{1}{1+\frac{2}{3z}} = \frac{B}{3z}\sum_{n=0}^{\infty}\left(-\frac{2}{3z}\right)^n = \frac{B}{3}\sum_{n=0}^{\infty}\left(-\frac{2}{3}\right)^nz^{-n-1}, \quad |z|>\frac{2}{3}.$$
$$H(z) = \frac{3-\frac{5}{6}z^{-1}}{\left(1-\frac{1}{4}z^{-1}\right)\left(1-\frac{1}{3}z^{-1}\right)} = \frac{A}{1-\frac{1}{4}z^{-1}} + \frac{B}{1-\frac{1}{3}z^{-1}}$$
with A = 1, B = 2.
(a) The region of convergence must contain |z| = 1, for a stable system. It is |z| > 1/3. From
$$H(z) = \frac{1}{1-\frac{1}{4}z^{-1}} + \frac{2}{1-\frac{1}{3}z^{-1}} = \sum_{n=0}^{\infty}\left(\frac{1}{4}\right)^nz^{-n} + 2\sum_{n=0}^{\infty}\left(\frac{1}{3}\right)^nz^{-n}, \quad |z|>\frac{1}{3} \text{ and } |z|>\frac{1}{4},$$
follows
$$h(n) = \left(4^{-n} + 2\times 3^{-n}\right)u(n).$$
(b) The region of convergence is 1/4 < |z| < 1/3. The first term in H(z) is the same as in (a), since it converges for |z| > 1/4. It corresponds to the signal 4^{−n}u(n). The second term must be rewritten in such a way that its geometric series converges for |z| < 1/3. Then
$$\frac{2}{1-\frac{1}{3}z^{-1}} = -2\frac{3z}{1-3z} = -2\sum_{n=1}^{\infty}(3z)^n = -2\sum_{m=-\infty}^{-1}(3z)^{-m}, \quad |z|<\frac{1}{3}.$$
The impulse response is
$$h(n) = 4^{-n}u(n) - 2\times 3^{-n}u(-n-1).$$
c) For an anticausal system the region of convergence is |z| < 1/4. Now the second term in H(z) is the same as in (b). For |z| < 1/4 the first term in H(z) should be written as
$$\frac{1}{1-\frac{1}{4}z^{-1}} = -\frac{4z}{1-4z} = -\sum_{n=1}^{\infty}(4z)^n = -\sum_{m=-\infty}^{-1}(4z)^{-m}, \quad |z|<\frac{1}{4}.$$
The signal corresponding to this term is −4^{−n}u(−n−1). The impulse response of the anticausal discrete system with the given transfer function is
$$h(n) = -\left(4^{-n} + 2\times 3^{-n}\right)u(-n-1).$$
can be written as
$$H(z) = \frac{1}{(1-4z)\left(z-\frac{\sqrt{3}}{4}+j\frac{1}{4}\right)\left(z-\frac{\sqrt{3}}{4}-j\frac{1}{4}\right)}$$
with poles z1 = 1/4, z2 = √3/4 − j/4, and z3 = √3/4 + j/4. Since |z2| = |z3| = 1/2 the possible regions of convergence are: 1) |z| < 1/4, 2) 1/4 < |z| < 1/2, and 3) |z| > 1/2. In the first two cases the system is neither causal nor stable, while in the third case the system is causal and stable since |z| = 1 and |z| → ∞ belong to the region of convergence.
The output to x(n) = 2cos²(nπ/2) = 1 + cos(nπ) = 1 + (−1)^n is
$$y(n) = H(e^{j\omega})\big|_{\omega=0}\times 1 + H(e^{j\omega})\big|_{\omega=\pi}\times(-1)^n = H(z)\big|_{z=1} + H(z)\big|_{z=-1}(-1)^n = -0.8681 + 0.0945(-1)^n.$$
Solution 4.10. The transfer function can be written as
$$H(z) = \frac{z+2}{z^2(z-2)} = \frac{A}{z-2} + \frac{B}{z} + \frac{C}{z^2}.$$
From
$$Az^2 + Bz(z-2) + C(z-2) = z+2,$$
$$(A+B)z^2 + (-2B+C)z - 2C = z+2,$$
follows A + B = 0, −2B + C = 1, −2C = 2, with the solution A = 1, B = −1, C = −1. Thus,
$$H(z) = \frac{z^{-1}}{1-2z^{-1}} - \frac{1}{z} - \frac{1}{z^2}.$$
The region of convergence for a causal system is |z| > 2. The inverse z-
transform for a causal system is the system impulse response
$$h(n) = 2^{n-1}u(n-1) - \delta(n-2) - \delta(n-1) = \delta(n-2) + 2^{n-1}u(n-3).$$
For the region of convergence defined by |z| > 1 the signal is causal and
$$x(n) = \frac{1}{2}\left[1+(-1)^n\right]j^nu(n) = \frac{1}{2}\left[1+(-1)^n\right]e^{j\pi n/2}u(n).$$
For n = 4k, where k ≥ 0 is an integer, x (n) = 1 , while for n = 4k + 2 the
signal values are x (n) = −1. For other n the signal is x (n) = 0.
For |z| < 1 the inverse z-transform is
$$x(n) = -\frac{1}{2}\left[1+(-1)^n\right]j^nu(-n-1).$$
Solution 4.12. The transfer function of this system is
$$H(z) = \frac{3-\frac{7}{4}z^{-1}+\frac{3}{16}z^{-2}}{1-z^{-1}+\frac{5}{16}z^{-2}-\frac{1}{32}z^{-3}} = \frac{3-\frac{7}{4}z^{-1}+\frac{3}{16}z^{-2}}{\left(1-\frac{1}{2}z^{-1}+\frac{1}{16}z^{-2}\right)\left(1-\frac{1}{2}z^{-1}\right)} = \frac{1}{1-\frac{1}{4}z^{-1}} + \frac{1}{\left(1-\frac{1}{4}z^{-1}\right)^2} + \frac{1}{1-\frac{1}{2}z^{-1}}.$$
For a causal system the region of convergence is outside of the pole z = 1/2, that is |z| > 1/2. Since
$$\frac{1}{\left(1-\frac{1}{4}z^{-1}\right)^2} = \frac{d}{da}\left(\frac{z}{1-az^{-1}}\right)\bigg|_{a=1/4} = \frac{d}{da}\sum_{n=0}^{\infty}a^nz^{-(n-1)}\bigg|_{a=1/4} = \sum_{n=0}^{\infty}na^{n-1}z^{-(n-1)}\bigg|_{a=1/4} = \sum_{n=0}^{\infty}(n+1)\frac{1}{4^n}z^{-n},$$
the impulse response is
$$h(n) = \left[\frac{1}{4^n} + (n+1)\frac{1}{4^n} + \frac{1}{2^n}\right]u(n) = \left[\frac{n+2}{4^n} + \frac{1}{2^n}\right]u(n).$$
The transfer function of the system
$$y(n) = x(n) - \frac{3}{4}x(n-1) + \frac{1}{8}x(n-2)$$
is
$$H(z) = 1 - \frac{3}{4}z^{-1} + \frac{1}{8}z^{-2} = \left(1-\frac{1}{4}z^{-1}\right)\left(1-\frac{1}{2}z^{-1}\right).$$
The z-transform of the input signal x(n) = (1/4^n)u(n) is
$$X(z) = \frac{1}{1-\frac{1}{4}z^{-1}},$$
with the region of convergence |z| > 1/4. The output signal z-transform is
$$Y(z) = H(z)X(z) = 1 - \frac{1}{2}z^{-1},$$
so the output y(n) = δ(n) − (1/2)δ(n−1) has a finite duration.
$$H(z) = \frac{1}{1-\frac{1}{3}z^{-1}}$$
and
$$X(z) = 1 + z^{-1} + z^{-2} + z^{-3} + z^{-4} + z^{-5} = \frac{1-z^{-6}}{1-z^{-1}}.$$
The z-transform of the output signal is
$$Y(z) = \frac{1-z^{-6}}{(1-z^{-1})\left(1-\frac{1}{3}z^{-1}\right)} = Y_1(z) - Y_1(z)z^{-6}$$
with
$$Y_1(z) = \frac{1}{(1-z^{-1})\left(1-\frac{1}{3}z^{-1}\right)} = \frac{3/2}{1-z^{-1}} - \frac{1/2}{1-\frac{1}{3}z^{-1}}.$$
Its inverse is
$$y_1(n) = \left[\frac{3}{2} - \frac{1}{2}\left(\frac{1}{3}\right)^n\right]u(n).$$
Figure 4.8 Poles and zeros of the system (left), input signal z-transform (middle), and the
z-transform of the output signal (right).
The output signal transform does not have a pole at z = 3/2 since this pole is canceled out. The output signal is
$$y(n) = \frac{1}{3^n}u(n) - \frac{3}{2}\,\frac{1}{3^{n-1}}u(n-1).$$
with
$$X(z) = \frac{z}{z^2+3z+2} = \frac{1}{1+z^{-1}} - \frac{1}{1+2z^{-1}}.$$
The inverse z-transform of X(z) is
$$x(n) = \left[(-1)^n - (-2)^n\right]u(n).$$
Solution 4.17. The z-transforms of the left and right side of the equation are
$$zX(z) - zx(0) = X(z) + \frac{z}{z-a},$$
so that, with x(0) = 0,
$$X(z) = \frac{z}{(z-a)(z-1)} = \frac{1}{1-a}\left[\frac{1}{z-1} - \frac{a}{z-a}\right].$$
The inverse z-transform is
$$x(n) = \frac{1}{1-a}\left[u(n-1) - a^nu(n-1)\right] = \frac{1-a^n}{1-a}u(n-1)$$
or
$$x(n) = \sum_{k=0}^{n-1}a^k, \quad n>0.$$
with
$$\lambda_{1,2} = \frac{\sqrt{2}}{4} \pm j\frac{\sqrt{2}}{4}.$$
The homogeneous solution is
$$y_h(n) = C_1\left(\frac{\sqrt{2}}{4}+j\frac{\sqrt{2}}{4}\right)^n + C_2\left(\frac{\sqrt{2}}{4}-j\frac{\sqrt{2}}{4}\right)^n = C_1\frac{1}{2^n}e^{jn\pi/4} + C_2\frac{1}{2^n}e^{-jn\pi/4}.$$
A particular solution is of the form of the input signal x(n) = (1/3^n)u(n). It is y_p(n) = A(1/3^n)u(n). The constant A is obtained by replacing this signal into (4.20),
$$A\frac{1}{3^n} - \frac{\sqrt{2}}{2}A\frac{1}{3^{n-1}} + \frac{1}{4}A\frac{1}{3^{n-2}} = \frac{1}{3^n}$$
$$A\left(1 - \frac{3\sqrt{2}}{2} + \frac{9}{4}\right) = 1.$$
Its value is A = 0.886. The general solution is
$$y(n) = y_h(n) + y_p(n) = C_1\frac{1}{2^n}e^{jn\pi/4} + C_2\frac{1}{2^n}e^{-jn\pi/4} + 0.886\frac{1}{3^n}.$$
Since the system is causal with y(n) = 0 for n < 0, the constants C1 and C2 may be obtained from the initial conditions following from y(n) − (√2/2)y(n−1) + (1/4)y(n−2) = x(n) as y(0) = x(0) = 1 and y(1) = (√2/2)y(0) + x(1) = √2/2 + 1/3,
$$C_1 + C_2 + 0.886 = 1 \qquad (4.23)$$
$$C_1\frac{\sqrt{2}+j\sqrt{2}}{4} + C_2\frac{\sqrt{2}-j\sqrt{2}}{4} + \frac{0.886}{3} = \frac{\sqrt{2}}{2} + \frac{1}{3},$$
as C1 = 0.057 − j0.9967 = 0.9984 exp(−j1.5137) = C2*. The final solution is
$$y(n) = 2\times 0.9984\,\frac{1}{2^n}\cos\left(\frac{n\pi}{4}-1.5137\right) + 0.886\frac{1}{3^n}.$$
For the z-domain we write
$$Y(z) - \frac{\sqrt{2}}{2}Y(z)z^{-1} + \frac{1}{4}Y(z)z^{-2} = X(z)$$
with
$$Y(z) = \frac{1}{1-\frac{\sqrt{2}}{2}z^{-1}+\frac{1}{4}z^{-2}}\,\frac{1}{1-\frac{1}{3}z^{-1}} = \frac{z^3}{\left(z-\frac{\sqrt{2}+j\sqrt{2}}{4}\right)\left(z-\frac{\sqrt{2}-j\sqrt{2}}{4}\right)\left(z-\frac{1}{3}\right)}.$$
Using, for example, the residue-based inversion of the z-transform,
$$y(n) = \sum_{z_i}\left[z^{n-1}Y(z)(z-z_i)\right]\Big|_{z=z_i}, \quad z_{1,2,3} = \frac{\sqrt{2}\pm j\sqrt{2}}{4},\ \frac{1}{3},$$
$$y(n) = \frac{z^{n+2}}{\left(z-\frac{\sqrt{2}-j\sqrt{2}}{4}\right)\left(z-\frac{1}{3}\right)}\Bigg|_{z=\frac{\sqrt{2}+j\sqrt{2}}{4}} + \frac{z^{n+2}}{\left(z-\frac{\sqrt{2}+j\sqrt{2}}{4}\right)\left(z-\frac{1}{3}\right)}\Bigg|_{z=\frac{\sqrt{2}-j\sqrt{2}}{4}} + \frac{z^{n+2}}{\left(z-\frac{\sqrt{2}+j\sqrt{2}}{4}\right)\left(z-\frac{\sqrt{2}-j\sqrt{2}}{4}\right)}\Bigg|_{z=1/3}$$
$$= 2\times 0.9984\,\frac{1}{2^n}\cos\left(\frac{n\pi}{4}-1.5137\right) + 0.886\frac{1}{3^n},$$
for n ≥ 1. For n = 0 there is no additional pole at z = 0, so the previous result holds for n ≥ 0 as well.
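The closed-form solution can be cross-checked against the recursion itself. A sketch (the constants 0.9984, 1.5137, and 0.886 are rounded in the text, so only about three-digit agreement is expected):

```python
import numpy as np

# Simulate y(n) = (sqrt(2)/2) y(n-1) - (1/4) y(n-2) + (1/3)^n, causal,
# and compare with 2*0.9984*(1/2^n)*cos(n*pi/4 - 1.5137) + 0.886*(1/3^n).
N = 30
y = np.zeros(N)
for n in range(N):
    y[n] = (1 / 3) ** n
    if n >= 1:
        y[n] += (np.sqrt(2) / 2) * y[n - 1]
    if n >= 2:
        y[n] -= 0.25 * y[n - 2]

n = np.arange(N)
closed = 2 * 0.9984 * 0.5**n * np.cos(n * np.pi / 4 - 1.5137) + 0.886 / 3.0**n
print(np.max(np.abs(y - closed)))
```

The residual is dominated by the rounding of the quoted constants, not by the method.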
Solution 4.19. The z-transform of the first backward difference is
$$\mathcal{Z}[\nabla x(n)] = (1-z^{-1})X(z).$$
For the second backward difference,
$$\mathcal{Z}[\nabla^2 x(n)] = (1-z^{-1})^2X(z).$$
In the same way, for the forward differences we get
$$\mathcal{Z}[\Delta^m x(n)] = (z-1)^mX(z) - z\sum_{j=0}^{m-1}(z-1)^{m-j-1}\Delta^jx(0).$$
The values of TP1 and TO1, and TP2 and TO2, are almost the same for any ω except ω = ±π/4, where the distance to the transfer function zero is 0, while the distance to the corresponding pole is small but finite. Based on this analysis the amplitude of the frequency response is presented in Fig. 4.9.
Figure 4.9 Location of zeros and poles for a second order system.
The input discrete-time signal is
This system will filter out signal components at ω = ±π/4. The output
discrete-time signal is
The zeros of the transfer function satisfy
$$z_o^N = 1 = e^{-j2\pi m},$$
that is,
$$z_{om} = e^{j2\pi m/N}, \quad m = 0, 1, \dots, N-1.$$
Similarly, the poles are $z_{pm} = r^{1/N}e^{j2\pi m/N}$, m = 0, 1, ..., N − 1. The frequency response of the comb filter is
$$H(z) = \prod_{m=0}^{N-1}\frac{z-z_{om}}{z-z_{pm}} = \prod_{m=0}^{N-1}\frac{z-e^{j2\pi m/N}}{z-r^{1/N}e^{j2\pi m/N}}.$$
Since r^{1/N} ≅ 1,
$$|H(e^{j\omega})| \cong 1 \quad \text{for } z \ne e^{j2\pi m/N}$$
$$|H(e^{j\omega})| = 0 \quad \text{for } z = e^{j2\pi m/N}.$$
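The comb-filter behavior is easy to confirm by direct evaluation of H(z) = (1 − z^{−N})/(1 − rz^{−N}) on the unit circle. A sketch, with N = 8 chosen only for illustration:

```python
import numpy as np

N, r = 8, 0.9999   # zeros exactly on the unit circle, poles just inside it

def H(omega):
    z = np.exp(1j * omega)
    return (1 - z**-N) / (1 - r * z**-N)

w_zero = 2 * np.pi * 3 / N   # one of the notch frequencies e^{j 2 pi m/N}
w_mid = np.pi / N            # halfway between two neighboring notches

print(np.abs(H(w_zero)), np.abs(H(w_mid)))
```

At the notch frequencies the numerator vanishes while the denominator stays of order 1 − r, producing deep, narrow notches; between them the response stays essentially at unity.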
4.9 EXERCISE
Exercise 4.1. Find the z-transform and the region of convergence for the
following signals:
(a) x (n) = δ(n − 3) − δ(n + 3),
(b) x (n) = u(n) − u(n − 20) + 3δ(n),
(c) x (n) = (1/3)^{|n|} + (1/2)^n u(n),
(d) x (n) = 3n u(−n) + 2−n u(n),
(e) x (n) = n(1/3)n u(n).
(f) x (n) = cos(nπ/2).
Exercise 4.2. Find the z-transform and the region of convergence for the
signals:
(a) x (n) = 3^n u(n) − (−2)^n u(n) + n^2 u(n),
(b) x (n) = ∑nk=0 2k 3n−k ,
(c) x (n) = ∑nk=0 3k .
Exercise 4.3. Find the inverse z-transform of:
(a) $X(z) = \frac{z^{-8}}{1-z} + 3$, if X(z) is the z-transform of a causal signal x(n).
(b) $X(z) = \frac{z+2}{(z-2)z^2}$, if X(z) is the z-transform of a causal signal x(n).
(c) $X(z) = \frac{6z^2+3z-2}{6z^2-5z+1}$, if X(z) is the z-transform of an unlimited-duration signal x(n). Find $\sum_{n=-\infty}^{\infty}x(n)$ in this case.
Find the response of the system described by the difference equation
$$y(n) - \frac{3}{4}y(n-1) + \frac{1}{8}y(n-2) = x(n) \tag{4.24}$$
to the input signal x(n) = nu(n) by:
(a) a direct solution in the time domain,
(b) using the z-transform.
The initial conditions are y(n) = 0 for n < 0, that is, y(0) = x(0) = 0 and y(1) = 3y(0)/4 + x(1) = 1.
Exercise 4.10. A causal discrete system is described by the difference equation
$$y(n) - \frac{5}{6}y(n-1) + \frac{1}{6}y(n-2) = x(n). \tag{4.25}$$
If the input signal is $x(n) = (1/4)^n u(n)$, find the output signal if the initial value of the output was y(0) = 2.
Hint: Since y(0) does not follow from (4.25), the system output was obviously "preloaded" before the input was applied. This fact can be taken into account by changing the input signal at n = 0 so that it produces the initial output: $x(n) = (1/4)^n u(n) + \delta(n)$. Now the initial conditions are y(0) = 2 and y(1) = 5/3 + 1/4 = 23/12, and we can apply the z-transform with this new input signal.
Solve the difference equation
$$x(n+2) - x(n+1) + \frac{1}{2}x(n) = 0$$
with initial conditions x(0) = 0 and x(1) = 1/2. The signal x(n) is causal.
and r = 0.9999 plot the amplitude of the frequency response and find the
output to the signal
Transformation of continuous-time systems into corresponding discrete-time systems is of high importance. Some discrete-time systems are designed and realized in order to replace or perform as equivalents of continuous-time systems. It is quite common to design a continuous-time system with desired properties, since the design procedures in this domain are simpler and well developed. In the next step the obtained continuous-time system is transformed into an appropriate discrete-time system.
Consider an Nth order linear continuous-time system described by a
differential equation with constant coefficients
218 From Continuous to Discrete Systems
Δt t n
Figure 5.1 Sampling of the impulse response for the impulse invariance method.
h(n) = hc (n∆t)∆t.
Obviously this relation can be used only if the sampling theorem is satisfied for the sampling interval ∆t. It means that the frequency response of the continuous-time system satisfies the condition H(Ω) = 0 for |Ω| ≥ Ω_m, and ∆t < π/Ω_m. Otherwise the discrete-time version will not correspond to the continuous-time version of the frequency response. Here, the discrete-time system frequency response is related to a periodically extended form of the continuous-time system frequency response H(Ω) as
$$\sum_{k=-\infty}^{\infty}H(\Omega + 2k\pi/\Delta t) = H(e^{j\omega}), \quad \Omega = \omega/\Delta t.$$
$$H(s) = \frac{a_N s^N + \ldots + a_1 s + a_0}{b_M s^M + \ldots + b_1 s + b_0} = \frac{k_1}{s-s_1} + \frac{k_2}{s-s_2} + \cdots + \frac{k_M}{s-s_M}, \tag{5.1}$$
where only simple poles of the transfer function are assumed. The case of
multiple poles will be discussed later. The inverse Laplace transform of a
causal system, described by the previous transfer function, is
$$h_c(t) = k_1 e^{s_1 t}u(t) + k_2 e^{s_2 t}u(t) + \cdots + k_M e^{s_M t}u(t).$$
$$h(n) = h_c(n\Delta t)\Delta t = k_1\Delta t\,e^{s_1 n\Delta t}u(n) + k_2\Delta t\,e^{s_2 n\Delta t}u(n) + \ldots + k_M\Delta t\,e^{s_M n\Delta t}u(n),$$
since u(n∆t) = u(n). The z-transform of the impulse response h(n) of the
discrete-time system is
$$H(z) = \frac{k_1\Delta t}{1-e^{s_1\Delta t}z^{-1}} + \frac{k_2\Delta t}{1-e^{s_2\Delta t}z^{-1}} + \cdots + \frac{k_M\Delta t}{1-e^{s_M\Delta t}z^{-1}}. \tag{5.2}$$
By comparing (5.1) and (5.2) it can be concluded that the terms in the
transfer functions are transformed from the continuous-time to the discrete-
time case as
$$\frac{k_i}{s-s_i} \rightarrow \frac{k_i\Delta t}{1-e^{s_i\Delta t}z^{-1}}. \tag{5.3}$$
If a multiple pole, of an (m + 1)th order, exists in the continuous-time system transfer function, then it holds
$$\frac{k_i}{(s-s_i)^{m+1}} = \frac{1}{m!}\frac{d^m}{ds_i^m}\frac{k_i}{s-s_i},$$
so the corresponding discrete-time term follows by applying the same derivative to the mapped simple-pole term,
$$\frac{1}{m!}\frac{d^m}{ds_i^m}\left\{\frac{k_i\Delta t}{1-e^{s_i\Delta t}z^{-1}}\right\}. \tag{5.4}$$
The poles are mapped as $s_i \rightarrow e^{s_i\Delta t}$.
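As a quick numerical sanity check of the mapping (5.3) (a sketch with arbitrarily chosen k, s_i and ∆t, not from the book), the sampled continuous-time term and the impulse response of the mapped discrete-time term, computed through its first-order recursion, coincide:

```python
import numpy as np

# Impulse invariance for one partial-fraction term k/(s - s_i):
# h_c(t) = k e^{s_i t} u(t), so h(n) = h_c(n dt) dt must equal the impulse
# response of k dt / (1 - e^{s_i dt} z^{-1}), realized by the recursion
# y(n) = e^{s_i dt} y(n-1) + k dt x(n).
k, s_i, dt = 2.0, -0.5 + 1.0j, 0.1
N = 50

h_sampled = k * np.exp(s_i * dt * np.arange(N)) * dt

h_mapped = np.zeros(N, dtype=complex)
y_prev = 0.0
for n in range(N):
    x = 1.0 if n == 0 else 0.0                    # x(n) = delta(n)
    y_prev = np.exp(s_i * dt) * y_prev + k * dt * x
    h_mapped[n] = y_prev

print(np.max(np.abs(h_sampled - h_mapped)))       # numerically zero
```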
Figure 5.2 Mapping of the s-domain (the imaginary axis s = jΩ, in strips of width 2π/∆t) onto the z-domain (z = e^{jω}) in the impulse invariance method.
These transforms assume that the discrete-time impulse response at n = 0 equals $h_c(t)|_{t=+0}$. Recall that the theory of Fourier transforms in this case states that the inverse Fourier transform satisfies $\mathrm{IFT}\{H(j\Omega)\} = h_c(t)$ where the signal $h_c(t)$ is continuous, and $\mathrm{IFT}\{H(j\Omega)\} = \left(h_c(t)|_{t=-0} + h_c(t)|_{t=+0}\right)/2$ at the discontinuity points, in this case at t = 0. This special case of a discontinuity at t = 0 can easily be detected by mapping H(s) into H(z) and checking, for a causal system, whether the following relation is satisfied
with
$$k_1 = H(s)(s+1)|_{s=-1} = -1, \qquad k_2 = H(s)\left(s+\frac{1}{2}\right)\Bigg|_{s=-1/2} = 2.$$
Thus, we get
$$H(s) = \frac{-1}{s+1} + \frac{2}{s+\frac{1}{2}}.$$
According to (5.3) the discrete-time system is
$$H(z) = \frac{-1}{1-e^{-1}z^{-1}} + \frac{2}{1-e^{-1/2}z^{-1}}.$$
Since $\lim_{z\to\infty}H(z) = 1$, there is obviously a discontinuity in the impulse response, and the resulting transfer function should be corrected as
$$H(z) = \frac{-1}{1-e^{-1}z^{-1}} + \frac{2}{1-e^{-1/2}z^{-1}} - \frac{1}{2}.$$
Impulse and frequency responses of the systems with uncorrected and cor-
rected discontinuity effect are presented in Fig.5.3.
with
k1 = H (s)(s + 1/2)|s=−1/2 = −7,
k2 = 27/8,
Figure 5.3 Impulse responses of systems in continuous and discrete-time domains (top). Am-
plitude of the frequency response of systems in continuous and discrete-time domains (bot-
tom). System without discontinuity correction (left) and system with discontinuity correction
(right).
$$k_3 = H(s)(s+1)^2\big|_{s=-1} = 5/4.$$
The coefficient k4 follows, for example, from
H (0) = 1 = 2k1 + 3k2 + k3 + k4 ,
as
k4 = 29/8.
Thus, we get
$$H(s) = \frac{-7}{s+\frac{1}{2}} + \frac{27/8}{s+\frac{1}{3}} + \frac{5/4}{(s+1)^2} + \frac{29/8}{s+1}.$$
According to (5.3) and (5.4) the discrete-time system is
$$H(z) = \frac{-7}{1-e^{-1/2}z^{-1}} + \frac{27/8}{1-e^{-1/3}z^{-1}} + \frac{d}{ds_i}\left\{\frac{5/4}{1-e^{s_i}z^{-1}}\right\}\Bigg|_{s_i=-1} + \frac{29/8}{1-e^{-1}z^{-1}}$$
$$= \frac{-7z}{z-e^{-1/2}} + \frac{27z/8}{z-e^{-1/3}} + \frac{5e^{-1}z/4}{(z-e^{-1})^2} + \frac{29z/8}{z-e^{-1}}.$$
Figure 5.4 Pole-zero locations in the s-domain and the z-domain using the impulse invariance
method.
we can easily see that the poles are mapped according to s pi → es pi ∆t , Fig.5.4,
while there is no direct correspondence among zeros of the transfer functions.
Impulse responses of continuous-time system and discrete-time system are
presented in Fig.5.5.
"∞ ∞
X (s) = x (t)e−st dt ∼
= ∑ x (n)e−sn∆t = X (z)|z=es∆t .
−∞ n=−∞
This approximation leads to a relation between the Laplace domain and the
z-domain in the form of
z = es∆t .
Figure 5.5 Impulse responses of systems in continuous and discrete-time domains (top).
Amplitude of the frequency response of systems in continuous and discrete-time domains
(middle). Amplitude of the frequency response of systems in continuous and discrete-time
domains in logarithmic scale (bottom).
If we use this relation to map all zeros and poles of a continuous system
transfer function
z0i = es0i ∆t
z pi = es pi ∆t ,
Figure 5.6 Illustration of the zeros and poles mapping in the matched z−transform method.
$$y(t) = \frac{dx(t)}{dt}, \qquad y(n\Delta t) \cong \frac{x(n\Delta t)-x((n-1)\Delta t)}{\Delta t}.$$
In the Laplace transform domain the continuous-time first derivative corresponds to
$$Y(s) = sX(s), \tag{5.5}$$
while in the z-transform domain the first backward difference gives
$$Y(z) = \frac{1-z^{-1}}{\Delta t}X(z). \tag{5.6}$$
Based on (5.5) and (5.6) we can conclude that a mapping of the correspond-
ing differentiation operators from the continuous-time to the discrete-time
domain is
$$s = \frac{1-z^{-1}}{\Delta t}. \tag{5.7}$$
With a normalized discretization step ∆t = 1 this mapping is of the form
s = 1 − z −1 .
"t −∆t
t"
y(t) = x (t)dt ∼
= x (t)dt + x (n∆t)∆t.
−∞ −∞
y(n∆t) ∼
= y(n∆t − ∆t) + x (n∆t)∆t.
The Laplace and the z-transform domain forms of the previous integral equations are
$$Y(s) = \frac{1}{s}X(s), \qquad Y(z) = \frac{\Delta t}{1-z^{-1}}X(z).$$
$$1-s \rightarrow z^{-1}. \tag{5.8}$$
Now we will consider the region that corresponds to the imaginary axis and
the left semi-plane of the s-domain (containing poles of a stable system),
Fig.5.7(left). The aim is to find the corresponding region in the z-domain.
If we start from the s-domain and the region in Fig. 5.7 (left), the first mapping is to reverse the s-domain to −s and shift it by +1, as
$$1-s \rightarrow p.$$
The boundary of the resulting region, Re{p} = 1, is then mapped by p → 1/z. With z = x + jy,
$$\mathrm{Re}\{p\} = \mathrm{Re}\left\{\frac{1}{z}\right\},$$
$$1 = \mathrm{Re}\left\{\frac{1}{x+jy}\right\} = \mathrm{Re}\left\{\frac{1}{x+jy}\,\frac{x-jy}{x-jy}\right\},$$
Figure 5.7 Illustration of the differentiation based mapping of the left s−semi-plane with the
imaginary axis (left), translated and reversed p−domain (middle), and the z−domain (right).
resulting in
$$1 = \frac{x}{x^2+y^2},$$
or in
$$\left(x-\frac{1}{2}\right)^2 + y^2 = \left(\frac{1}{2}\right)^2. \tag{5.9}$$
Therefore, the imaginary axis in the s-plane is mapped onto a circle defined
by (5.9), Fig.5.7(right) in the z-plane. From the mapping relation 1 − s → z−1
it is easy to conclude that the origin s = 0 + j0 maps into z = 1 and that
s = 0 ± j∞ maps into z = ±0, according to 1/ (1 − s) → z.
Mapping of the imaginary axis into the z-domain can also be analyzed from
$$\sigma + j\Omega \rightarrow \frac{1-(re^{j\omega})^{-1}}{\Delta t} = \frac{1-r^{-1}\cos\omega}{\Delta t} + j\frac{r^{-1}}{\Delta t}\sin\omega.$$
For σ = 0 it follows that
$$1-r^{-1}\cos\omega = 0, \quad \text{that is,} \quad r = \cos\omega, \tag{5.10}$$
with
$$\Omega = \frac{r^{-1}}{\Delta t}\sin\omega = \frac{\tan\omega}{\Delta t}.$$
Obviously ω = 0 maps to Ω = 0 (with Ω ∼ = ω/∆t for small ω), and ω =
±π/2 maps into Ω → ±∞. Thus, the whole imaginary axis maps onto
−π/2 ≤ ω ≤ π/2. These values of ω could be used within the basic period.
Relation (5.10), with −π/2 ≤ ω ≤ π/2, is a circle defined by (5.9) if we
replace $r = \sqrt{x^2+y^2}$ and $\cos\omega = x/\sqrt{x^2+y^2}$, with σ < 0 (the semi-plane with negative real values) being mapped into r < cos ω (the interior of the circle).
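A short numerical check (not from the book) that the mapping $1-s \rightarrow z^{-1}$ indeed sends the imaginary axis onto the circle (5.9):

```python
import numpy as np

# The backward-difference mapping z^{-1} = 1 - s (dt = 1) sends the
# imaginary axis s = j Omega onto the circle (x - 1/2)^2 + y^2 = (1/2)^2.
Omega = np.linspace(-50, 50, 2001)
z = 1 / (1 - 1j * Omega)             # z obtained from 1 - s -> z^{-1}

lhs = (z.real - 0.5) ** 2 + z.imag ** 2
print(np.max(np.abs(lhs - 0.25)))    # numerically zero for every Omega
```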
Example 5.4. A continuous-time system is described by the differential equation
$$y''(t) + \frac{3}{4}y'(t) + \frac{1}{8}y(t) = x(t),$$
with zero initial conditions and the transfer function
$$H(s) = \frac{1}{s^2+\frac{3}{4}s+\frac{1}{8}}.$$
⋆ A discrete-time system transfer function is obtained by replacing $s = (1-z^{-1})/\Delta t$ in H(s) as
$$H(z) = \frac{1}{\left(\frac{1-z^{-1}}{\Delta t}\right)^2 + \frac{3}{4}\frac{1-z^{-1}}{\Delta t} + \frac{1}{8}} = \frac{(\Delta t)^2}{1+\frac{3}{4}\Delta t+\frac{1}{8}(\Delta t)^2 - \left[2+\frac{3}{4}\Delta t\right]z^{-1} + z^{-2}},$$
with the difference equation
$$y(n) = B_0 x(n) + A_1 y(n-1) + A_2 y(n-2),$$
where, for ∆t = 1/2,
$$B_0 = \frac{(\Delta t)^2}{1+\frac{3}{4}\Delta t+\frac{1}{8}(\Delta t)^2} = 0.1778, \quad A_1 = \frac{2+\frac{3}{4}\Delta t}{1+\frac{3}{4}\Delta t+\frac{1}{8}(\Delta t)^2} = 1.6889, \quad A_2 = \frac{-1}{1+\frac{3}{4}\Delta t+\frac{1}{8}(\Delta t)^2} = -0.7111.$$
For x(t) = u(t), in the continuous-time case
$$Y(s) = H(s)X(s) = \frac{1}{s\left(s^2+\frac{3}{4}s+\frac{1}{8}\right)} = \frac{8}{s} + \frac{8}{s+\frac{1}{2}} - \frac{16}{s+\frac{1}{4}},$$
with
$$y(t) = \left[8 + 8e^{-t/2} - 16e^{-t/4}\right]u(t).$$
The results of the difference equation for y(n) are compared with the exact
solution y(t) in Fig.5.8. The agreement is high. It could be additionally
improved by reducing the sampling interval, for example, to ∆t = 1/8.
Figure 5.8 Exact solution of the difference equation y(t) in solid line and the discrete-time system output y(n) in large dots for ∆t = 1/2 and in small dots for ∆t = 1/8.
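The difference equation of Example 5.4 can be simulated directly; the sketch below (an illustration, with the coefficients recomputed for ∆t = 1/2) compares its step response with the exact solution y(t):

```python
import numpy as np

# Difference equation from Example 5.4 (backward-difference mapping, dt = 1/2)
# compared with the exact step response y(t) = 8 + 8 e^{-t/2} - 16 e^{-t/4}.
dt = 0.5
D = 1 + 0.75 * dt + 0.125 * dt ** 2
B0 = dt ** 2 / D                      # 0.1778
A1 = (2 + 0.75 * dt) / D              # 1.6889
A2 = -1 / D                           # -0.7111

N = 40
x = np.ones(N)                        # x(n) = u(n)
y = np.zeros(N)
for n in range(N):
    y[n] = (B0 * x[n]
            + A1 * (y[n - 1] if n >= 1 else 0.0)
            + A2 * (y[n - 2] if n >= 2 else 0.0))

t = np.arange(N) * dt
y_exact = 8 + 8 * np.exp(-t / 2) - 16 * np.exp(-t / 4)

print(np.max(np.abs(y - y_exact)))    # small discretization error
```

Both responses settle at the same steady-state value 8; reducing ∆t shrinks the transient error, as stated in the example.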
"t −∆t
t"
x (n∆t) + x ((n − 1)∆t)
y(t) = x (t)dt ∼
= x (t)dt + ∆t
2
−∞ −∞
x ( n ) + x ( n − 1)
y ( n ) = y ( n − 1) + ∆t.
2
In the Laplace and the z-transform domain, these relations have the forms
1
Y (s) = X (s)
s
∆t 1 + z−1
Y (z) = X ( z ).
2 1 − z −1
Comparing these two forms, the corresponding mapping is
$$s \rightarrow \frac{2}{\Delta t}\frac{1-z^{-1}}{1+z^{-1}}, \tag{5.11}$$
known as the bilinear transform.
y ( n ) = x ( n ) − x ( n − 1 ).
The same signal samples can be used for the first-order forward derivative approximation
y ( n − 1 ) = x ( n ) − x ( n − 1 ).
If we assume that the difference x (n) − x (n − 1) fits better to the mean
of y(n) and y(n − 1) than to any single one of them, then the derivative
approximation by using the difference equation
y ( n ) + y ( n − 1)
= x ( n ) − x ( n − 1 ),
2
produces the bilinear transform.
In order to prove that the imaginary axis in the s-domain corresponds to the unit circle in the z-domain, we may simply replace z = e^{jω} into (5.11) and obtain
$$2\frac{1-e^{-j\omega}}{1+e^{-j\omega}} = 2\frac{e^{j\omega/2}-e^{-j\omega/2}}{e^{j\omega/2}+e^{-j\omega/2}} = 2j\tan\left(\frac{\omega}{2}\right) \rightarrow s\Delta t.$$
For s = σ + jΩ it follows that σ = 0 and
$$\Omega = \frac{2}{\Delta t}\tan\left(\frac{\omega}{2}\right) \cong \frac{\omega}{\Delta t}, \quad \text{for } |\omega| \ll 1.$$
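A small check of the warping relation (illustrative values ∆t = 1, ω = π/2, not from the book): pre-warping a design frequency with $\Omega_d = (2/\Delta t)\tan(\omega/2)$ places it exactly at the desired discrete frequency after the bilinear transform:

```python
import numpy as np

# Bilinear frequency warping Omega = (2/dt) tan(omega/2): a design frequency
# pre-warped this way lands exactly on the desired discrete frequency omega.
dt = 1.0
omega_desired = 0.5 * np.pi                        # desired discrete frequency
Omega_d = (2 / dt) * np.tan(omega_desired / 2)     # pre-warped analog frequency

# where the bilinear transform actually sends z = e^{j omega}:
z = np.exp(1j * omega_desired)
s = (2 / dt) * (1 - 1 / z) / (1 + 1 / z)

print(s.imag, Omega_d)     # both equal 2.0 for omega = pi/2, dt = 1
```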
From
$$z = \frac{1+\frac{s\Delta t}{2}}{1-\frac{s\Delta t}{2}}$$
and
$$|z| = \sqrt{\frac{\left(1+\frac{\sigma\Delta t}{2}\right)^2 + \left(\frac{\Omega\Delta t}{2}\right)^2}{\left(1-\frac{\sigma\Delta t}{2}\right)^2 + \left(\frac{\Omega\Delta t}{2}\right)^2}}$$
it may easily be concluded that σ < 0 maps into |z| < 1, since $1+\frac{\sigma\Delta t}{2} < 1-\frac{\sigma\Delta t}{2}$ for σ < 0.
The bilinear transform mapping can be derived by using a series of complex plane mappings. Since
$$z = \frac{1+\frac{s\Delta t}{2}}{1-\frac{s\Delta t}{2}} = \frac{2}{1-\frac{s\Delta t}{2}} - 1,$$
we can write
$$1-\frac{s\Delta t}{2} \rightarrow p_1, \qquad \frac{1}{p_1} \rightarrow p_2, \qquad 2p_2-1 \rightarrow z.$$
Figure 5.9 Bilinear mapping illustration through a series of elementary complex plane mappings.
and to stop all other possible signal components. The parameters are Q =
0.01, Ω1 = π/4, and Ω2 = 3π/5. The signal is sampled with ∆t = 1 and
the discrete-time signal x (n) is formed. Using the bilinear transform, design
the discrete system that corresponds to the continuous-time system with the
transfer function H (s).
⋆ To begin, just use the bilinear transform relation
$$s \rightarrow 2\frac{1-z^{-1}}{1+z^{-1}} \tag{5.12}$$
and map H(s) to H_B(z) without any pre-modification. The result is presented in the first two subplots of Fig. 5.10. The discrete frequencies are shifted, since the bilinear transform (5.12) made a nonlinear frequency mapping from the
$$= \frac{0.016569(1+z^{-1})^2}{4.65327z^{-2} - 6.6272z^{-1} + 4.7195} + \frac{0.0551(1+z^{-1})^2}{11.4677z^{-2} + 7.1556z^{-1} + 11.6879}$$
$$= \frac{0.003567(1+z^{-1})^2}{(z^{-1}-1.0071e^{j0.25\pi})(z^{-1}-1.0071e^{-j0.25\pi})} + \frac{0.0048(1+z^{-1})^2}{(z^{-1}-1.0096e^{j0.6\pi})(z^{-1}-1.0096e^{-j0.6\pi})}$$
Figure 5.10 Amplitude of the continuous-time system with transfer function H (s) and the
amplitude of the transfer function HB (z) of the discrete-time system obtained by the bilinear
transform (first two subplots). A premodified system to take into account the frequency map-
ping nonlinearity in the bilinear transform Hd (s) and the amplitude of the transfer function
H (z) of the discrete-time system obtained by the bilinear transform of Hd (s) (last two subplots).
The properties of the presented methods are summarized below.

Method                 | H(s)|_{s=jΩ} → H(z)|_{z=e^{jω}}   | Sampling theorem condition
Impulse invariance     | Yes, Ω = ω/∆t                      | Yes
Matched z-transform    | No                                 | No
First-order difference | No                                 | No
Bilinear transform     | Yes, Ω = tan(ω/2)/(∆t/2)           | No
Figure 5.11 Lowpass filter frequency response: ideal case (left) and Butterworth type (right).
The residues
$$k_i = H(s)(s-s_i)|_{s=s_i}$$
are
$$k_0 = \frac{-0.3628+j0.1503}{\Delta t}, \quad k_1 = \frac{0.3628-j0.8758}{\Delta t}, \quad k_2 = \frac{0.3628+j0.8758}{\Delta t}, \quad k_3 = \frac{-0.3628-j0.1503}{\Delta t}.$$
Using the impulse invariance method we get the transfer function of the
discrete-time fourth order Butterworth filter
$$H(z) = \frac{k_0\Delta t}{1-e^{s_0\Delta t}z^{-1}} + \frac{k_1\Delta t}{1-e^{s_1\Delta t}z^{-1}} + \frac{k_2\Delta t}{1-e^{s_2\Delta t}z^{-1}} + \frac{k_3\Delta t}{1-e^{s_3\Delta t}z^{-1}}$$
$$= \frac{-0.3628+j0.1503}{1-e^{\omega_c(-0.3827+j0.9239)}z^{-1}} + \frac{0.3628-j0.8758}{1-e^{\omega_c(-0.9239+j0.3827)}z^{-1}} + \frac{0.3628+j0.8758}{1-e^{\omega_c(-0.9239-j0.3827)}z^{-1}} + \frac{-0.3628-j0.1503}{1-e^{\omega_c(-0.3827-j0.9239)}z^{-1}}.$$
With ω_d = Ω_d∆t = 0.8284,
$$H(z) = \frac{\omega_d^4}{\left[4\left(\frac{1-z^{-1}}{1+z^{-1}}\right)^2 + 2\omega_d\,0.7654\,\frac{1-z^{-1}}{1+z^{-1}} + \omega_d^2\right]\left[4\left(\frac{1-z^{-1}}{1+z^{-1}}\right)^2 + 2\omega_d\,1.8478\,\frac{1-z^{-1}}{1+z^{-1}} + \omega_d^2\right]}$$
$$= \frac{0.4710}{\left[4\left(\frac{1-z^{-1}}{1+z^{-1}}\right)^2 + 1.2626\,\frac{1-z^{-1}}{1+z^{-1}} + 0.6863\right]\left[4\left(\frac{1-z^{-1}}{1+z^{-1}}\right)^2 + 3.0481\,\frac{1-z^{-1}}{1+z^{-1}} + 0.6863\right]}$$
$$= \frac{0.4710\left(1+z^{-1}\right)^4}{\left(3.4237z^{-2} - 6.6274z^{-1} + 5.9484\right)\left(1.6382z^{-2} - 6.6274z^{-1} + 7.7704\right)}$$
$$= \frac{0.084\left(1+z^{-1}\right)^4}{\left(z^{-2} - 1.9357z^{-1} + 1.7343\right)\left(z^{-2} - 4.0455z^{-1} + 4.7433\right)}$$
$$= \frac{0.084z^{-4} + 0.336z^{-3} + 0.504z^{-2} + 0.336z^{-1} + 0.084}{z^{-4} - 5.9810z^{-3} + 14.3z^{-2} - 16.1977z^{-1} + 8.2263}.$$
The transfer function (amplitude and phase) of the continuous-time
filter and the discrete-time filters obtained by using the impulse invariance
method and the bilinear transform are presented in Fig.5.12, within one
Figure 5.12 Amplitude and phase of the fourth order Butterworth filter frequency response
obtained by using the impulse invariance method and bilinear transform.
$$s = \frac{2}{\Delta t}\frac{1-z^{-1}}{1+z^{-1}}$$
as
$$H(z) = \frac{1.1327^3}{\left(2\frac{1-z^{-1}}{1+z^{-1}} + 1.1327\right)\left(\left(2\frac{1-z^{-1}}{1+z^{-1}}\right)^2 + 2.2653\,\frac{1-z^{-1}}{1+z^{-1}} + 1.1327^2\right)}$$
$$= \frac{1.4533(1+z^{-1})^3}{\left(-0.8673z^{-1}+3.1327\right)\left(3.0177z^{-2} - 5.434z^{-1} + 7.54\right)}$$
$$= \frac{-0.5553z^{-3} - 1.6658z^{-2} - 1.6658z^{-1} - 0.5553}{z^{-3} - 5.4127z^{-2} + 9.0028z^{-1} - 9.0249}$$
$$= \frac{0.0615z^3 + 0.1846z^2 + 0.1846z + 0.0615}{z^3 - 0.9975z^2 + 0.5998z - 0.1108}.$$
The corresponding difference equation is
y(n) = 0.9975y(n − 1) − 0.5998y(n − 2) + 0.1108y(n − 3)
+ 0.0615x (n) + 0.1846x (n − 1) + 0.1846x (n − 2) + 0.0615x (n − 3).
A continuous-time signal component $\exp(j\Omega_0 t)$ whose frequency satisfies
$$K\pi \le \Omega_0\Delta t < (K+1)\pi$$
will, after sampling, result in a component within the basic period of the Fourier transform of the discrete-time signal, corresponding to the continuous-time signal $\exp\left(j\left(\Omega_0 - \frac{K\pi}{\Delta t}\right)t\right)$. This effect is known as aliasing. The most obvious visual effect appears when a wheel rotating at f_0 = 25 Hz, Ω_0 = 50π, is sampled in a video sequence at ∆t = 1/50 s. Then Ω_0∆t = π corresponds to exp(j(Ω_0 t − 50πt)) = e^{j0}, i.e., the wheel looks like a static (nonmoving) object.
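The aliasing effect can be reproduced numerically; the sketch below uses illustrative frequencies (26 Hz sampled at 50 Hz) rather than the wheel example:

```python
import numpy as np

# Aliasing check: sampled at 1/dt = 50 Hz, a complex exponential at 26 Hz
# produces exactly the same samples as one at 26 - 50 = -24 Hz, i.e. an
# apparent slow backward rotation.
dt = 1 / 50
n = np.arange(100)
x_fast = np.exp(1j * 2 * np.pi * 26 * n * dt)
x_alias = np.exp(1j * 2 * np.pi * (26 - 50) * n * dt)

print(np.allclose(x_fast, x_alias))   # True: the samples are identical
```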
HH (e jω ) = H (e j(ω −π ) ).
Figure 5.15 Amplitude of frequency response of a lowpass Butterworth filter (left) and a filter
obtained from the lowpass Butterworth filter when z is replaced by −z (right).
Figure 5.17 Bandpass system realization using corresponding lowpass systems and signal
modulation.
$$y(n) = h_B(n)*x(n) = \sum_{m=-\infty}^{\infty}h_B(m)x(n-m) = 2\sum_{m=-\infty}^{\infty}\cos(\omega_0 m)h(m)x(n-m)$$
$$= 2\sum_{m=-\infty}^{\infty}\cos(\omega_0 n + \omega_0 m - \omega_0 n)h(m)x(n-m)$$
$$= 2\sum_{m=-\infty}^{\infty}\left[\cos(\omega_0 n)\cos(\omega_0 m-\omega_0 n) - \sin(\omega_0 n)\sin(\omega_0 m-\omega_0 n)\right]h(m)x(n-m)$$
$$= 2\cos(\omega_0 n)\sum_{m=-\infty}^{\infty}\cos(\omega_0(n-m))x(n-m)h(m) + 2\sin(\omega_0 n)\sum_{m=-\infty}^{\infty}\sin(\omega_0(n-m))x(n-m)h(m).$$
$$H_s(z) = H(z)H_A(z) = \frac{z+2}{(z-\frac{1}{2})(z-\frac{1}{3})(z-2)}\cdot\frac{z-\frac{1}{a}e^{j\theta}}{1-\frac{1}{a}e^{-j\theta}z}e^{-j2\theta}.$$
For a = 1/2 and θ = 0 we get
$$H_s(z) = \frac{z+2}{(z-\frac{1}{2})(z-\frac{1}{3})(z-2)}\cdot\frac{z-2}{1-2z} = -\frac{z+2}{2(z-\frac{1}{2})^2(z-\frac{1}{3})}.$$
This system has the same frequency response amplitude as the initial system,
$$\left|H_s(e^{j\omega})\right| = \left|H(e^{j\omega})H_A(e^{j\omega})\right| = \left|H(e^{j\omega})\right|.$$
where 0 < a_i < 1 and θ_i, i = 1, 2, ..., N, are arbitrary constants and phases. The resulting frequency response amplitude is
$$\left|H_A(e^{j\omega})\right| = 1.$$
This system can be used for multiple pole cancellation and phase correction.
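A quick numerical confirmation that a first-order allpass section has unit magnitude at every frequency (illustrative values a = 0.7, θ = 0.3, not from the book):

```python
import numpy as np

# First-order allpass H_A(z) = (z^{-1} - a e^{-j theta}) / (1 - a e^{j theta} z^{-1}):
# its magnitude is identically 1 on the unit circle.
a, theta = 0.7, 0.3

omega = np.linspace(-np.pi, np.pi, 500)
zinv = np.exp(-1j * omega)
H_A = (zinv - a * np.exp(-1j * theta)) / (1 - a * np.exp(1j * theta) * zinv)

print(np.max(np.abs(np.abs(H_A) - 1)))   # numerically zero
```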
The inverse system of a system H(z) is
$$H_i(z) = \frac{1}{H(z)}.$$
It is obvious that
$$H(z)H_i(z) = 1, \qquad h(n)*h_i(n) = \delta(n).$$
This kind of system can be used to reverse the signal distortion. For ex-
ample, assume that the Fourier transform of a signal x (n) is distorted dur-
ing transmission by a transfer function H (z), i.e., the received signal z-
transform is R(z) = H (z) X (z). In that case the distortion can be compen-
sated by processing the received signal using the inverse system. The output
signal is obtained as
1
Y (z) = R ( z ) = X ( z ).
H (z)
The system Hi (z) = 1/H (z) should be stable as well. It means that the poles
of the inverse system should be within the unit circle. The poles of the
inverse system are equal to the zeros of H (z).
A system H(z) whose poles and zeros are both within the unit circle is called a minimum phase system.
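The compensation idea can be sketched numerically. In the toy example below (an illustration with a hypothetical first-order distortion H(z) = 1 − 0.5z⁻¹, chosen minimum phase so its inverse is stable), the inverse system recursion recovers the transmitted signal exactly:

```python
import numpy as np

# Distortion compensation with an inverse system: distort x(n) by
# H(z) = 1 - 0.5 z^{-1}, then apply H_i(z) = 1/H(z), realized by the
# recursion y(n) = r(n) + 0.5 y(n-1).
rng = np.random.default_rng(0)
x = rng.standard_normal(200)

# distortion: r(n) = x(n) - 0.5 x(n-1)
r = x - 0.5 * np.concatenate(([0.0], x[:-1]))

# inverse system recursion
y = np.zeros_like(r)
for n in range(len(r)):
    y[n] = r[n] + (0.5 * y[n - 1] if n >= 1 else 0.0)

print(np.max(np.abs(y - x)))   # numerically zero: x(n) is recovered
```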
If a system is the minimum phase system (with all poles and zeros
within |z| < 1) then this system has a minimum group delay out of all
systems with the same amplitude of the frequency response. Thus, any
nonminimum phase system will have a more negative phase compared
to the minimum phase system. The negative part of the phase is called
the phase-lag function. The name minimum phase system comes from the
minimum phase-lag function.
In order to prove this statement, consider a system H(z) with the same amplitude of the frequency response as a minimum phase system H_min(z). Its frequency response can be written as
$$H(z) = H_{min}(z)H_A(z) = H_{min}(z)\frac{z^{-1}-ae^{-j\theta}}{1-ae^{j\theta}z^{-1}}.$$
Here we assumed a first-order allpass system, without any loss of generality, since the same proof can be used for any number of allpass systems that multiply H_min(z). Since 0 < a < 1 and the system H_min(z) is stable, the system H(z) has a zero at |z| = 1/a > 1.
The phases of the systems are related as
$$\arg\{H_A(e^{j\omega})\} = \arg\left\{\frac{e^{-j\omega}-ae^{-j\theta}}{1-ae^{j\theta}e^{-j\omega}}\right\} = \arg\left\{e^{-j\omega}\frac{1-ae^{-j\theta}e^{j\omega}}{1-ae^{j\theta}e^{-j\omega}}\right\}$$
$$= -\omega + \arg\{1-ae^{-j\theta}e^{j\omega}\} - \arg\{1-ae^{j\theta}e^{-j\omega}\} = -\omega - 2\arctan\frac{a\sin(\omega-\theta)}{1-a\cos(\omega-\theta)}.$$
The group delay of the allpass part is
$$\tau_{gA}(\omega) = -\frac{d\arg\{H_A(e^{j\omega})\}}{d\omega} = 1 + 2\frac{a\cos(\omega-\theta)-a^2}{1-2a\cos(\omega-\theta)+a^2} = \frac{1-a^2}{1-2a\cos(\omega-\theta)+a^2} = \frac{1-a^2}{\left|1-ae^{j(\omega-\theta)}\right|^2}.$$
Since the group delays are related as
$$\tau_g(\omega) = \tau_{g\,min}(\omega) + \tau_{gA}(\omega)$$
and $\tau_{gA}(\omega) > 0$, it follows that
$$\tau_g(\omega) \ge \tau_{g\,min}(\omega).$$
For the phase of the allpass part, at ω = 0,
$$\arg\{H_A(e^{j0})\} = \arg\left\{\frac{1-ae^{-j\theta}}{1-ae^{j\theta}}\right\} = 0, \tag{5.16}$$
so that
$$\arg\{H_A(e^{j\omega})\} = -\int_0^{\omega}\tau_{gA}(\omega)d\omega \le 0, \tag{5.17}$$
since $\tau_{gA}(\omega) > 0$ for $0 \le \omega < \pi$.
We can conclude that the minimum phase systems satisfy the following
conditions.
1. A minimum phase system is the system of minimum group delay out of all systems with the same amplitude of frequency response. A system containing one or more allpass parts, with uncompensated zeros outside of the unit circle, will have a larger delay than the system which does not contain zeros outside the unit circle.
2. The phase of a minimum phase system is higher (less negative) than the phase of any other system with the same amplitude of frequency response since, according to (5.17),
$$\arg\{H(e^{j\omega})\} = \arg\{H_{min}(e^{j\omega})\} + \arg\{H_A(e^{j\omega})\} \le \arg\{H_{min}(e^{j\omega})\}.$$
This proves that the phase of any system, arg{H(e^{jω})}, is always lower than the phase of the minimum phase system, arg{H_min(e^{jω})}, having the same amplitude of the frequency response.
3. Since the group delay is minimal, the energy of the impulse response of a minimum phase system is concentrated at its beginning,
$$\sum_{m=0}^{n}|h_{min}(m)|^2 \ge \sum_{m=0}^{n}|h(m)|^2.$$
This relation may be proven in a similar way as the minimum phase property, by considering the outputs of a minimum phase system and a system H(z) = H_min(z)H_A(z).
Example 5.12. A system has the squared amplitude of the frequency response equal to
$$\left|H(e^{j\omega})\right|^2 = \frac{\left(2\cos(\omega)+\frac{5}{2}\right)^2}{(12\cos(\omega)+13)(24\cos(\omega)+25)}.$$
Find the corresponding minimum phase system.
⋆ For the system we can write
$$\left|H(e^{j\omega})\right|^2 = H(e^{j\omega})H^*(e^{j\omega}) = H(e^{j\omega})H(e^{-j\omega}).$$
In the z-domain the system with this amplitude of the frequency response (with real-valued coefficients) satisfies
$$H(z)H^*\left(\frac{1}{z^*}\right)\Bigg|_{z=e^{j\omega}} = H(z)H\left(\frac{1}{z}\right)\Bigg|_{z=e^{j\omega}} = \left|H(e^{j\omega})\right|^2 = H(e^{j\omega})H(e^{-j\omega}).$$
In this sense
$$\left|H(e^{j\omega})\right|^2 = \frac{\left(e^{j\omega}+e^{-j\omega}+\frac{5}{2}\right)^2}{(6e^{j\omega}+6e^{-j\omega}+13)(12e^{j\omega}+12e^{-j\omega}+25)}$$
and
$$H(z)H\left(\frac{1}{z}\right) = \frac{\left(z+\frac{5}{2}+z^{-1}\right)^2}{(6z+13+6z^{-1})(12z+25+12z^{-1})} = \frac{\left(z^2+\frac{5}{2}z+1\right)^2}{(6z^2+13z+6)(12z^2+25z+12)}$$
$$= \frac{1}{36}\frac{(z+2)^2(z+\frac{1}{2})^2}{(z+\frac{2}{3})(z+\frac{3}{2})(z+\frac{3}{4})(z+\frac{4}{3})} = \frac{1}{36}\frac{(\frac{1}{z}+\frac{1}{2})^2(z+\frac{1}{2})^2}{(z+\frac{2}{3})(\frac{1}{z}+\frac{2}{3})(z+\frac{3}{4})(\frac{1}{z}+\frac{3}{4})}.$$
The minimum phase system, with the desired amplitude of the frequency response, is the part of $H(z)H^*(1/z^*)$ with zeros and poles inside the unit circle,
$$H(z) = \frac{1}{6}\frac{(z+\frac{1}{2})^2}{(z+\frac{2}{3})(z+\frac{3}{4})}.$$
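The factorization in Example 5.12 can be verified numerically: the minimum phase H(z) found above reproduces the prescribed squared amplitude (a small check, not from the book):

```python
import numpy as np

# Check that H(z) = (1/6)(z + 1/2)^2 / ((z + 2/3)(z + 3/4)) reproduces
# |H|^2 = (2 cos w + 5/2)^2 / ((12 cos w + 13)(24 cos w + 25)).
omega = np.linspace(-np.pi, np.pi, 400)
z = np.exp(1j * omega)

H = (1 / 6) * (z + 0.5) ** 2 / ((z + 2 / 3) * (z + 3 / 4))
target = (2 * np.cos(omega) + 2.5) ** 2 / (
    (12 * np.cos(omega) + 13) * (24 * np.cos(omega) + 25))

print(np.max(np.abs(np.abs(H) ** 2 - target)))   # numerically zero
```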
5.6 PROBLEMS
$$H(s) = -\frac{2s}{s^2+2s+2}.$$
$$H(s) = \frac{1+4s}{(s+1/2)(s+1)^3}.$$
$$H(s) = \frac{2Q\Omega_1}{s^2+2\Omega_1 Qs+\Omega_1^2+Q^2}$$
x (t) = A1 cos(Ω1 t + ϕ1 )
and to stop all other possible signal components. The parameters are Q =
0.01, Ω1 = π/2. The signal is sampled with ∆t = 1 and a discrete-time signal x(n) is formed. Using the bilinear transform, design the discrete system that corresponds to the continuous-time system with the transfer function H(s).
Problem 5.7. (a) By using the bilinear transform find the transfer function of the second-order Butterworth filter with f_{ac} = 4 kHz. The sampling interval is ∆t = 50 µs.
(b) Translate the discrete-time transfer function to obtain a highpass filter.
Find its corresponding critical frequency in the continuous-time domain.
Problem 5.8. Design a discrete-time lowpass Butterworth filter for the sam-
pling frequency 1/∆t = 10 kHz. The passband should be from 0 to 1 kHz,
maximal attenuation in the passband should be 3 dB and the attenuation
should be more than 10 dB for frequencies above 2 kHz.
Problem 5.9. Using the impulse invariance method design a Butterworth
filter with the passband frequency ω p = 0.1π and stopband frequency
ωn = 0.3π in the discrete domain. Maximal attenuation in the passband
region should be less than 2dB, and the minimal attenuation in the stopband
should be 20dB.
Problem 5.10. A highpass filter can be obtained from a lowpass one by using H_H(s) = H(1/s). Using the bilinear transform with ∆t = 2, we can transform the continuous-time domain function into the discrete domain using the relation s = (z − 1)/(z + 1). If we have a design of a lowpass filter, how should its coefficients be changed in order to get a highpass filter?
Problem 5.11. For filtering of a continuous-time signal, a discrete-time filter is used. Find the corresponding continuous-time filter frequencies if the discrete-time filter is: a) a lowpass with ω_p = 0.15π, b) a bandpass within 0.2π ≤ ω ≤ 0.25π, c) a highpass with ω_p = 0.35π. Consider the cases ∆t = 0.001 s and ∆t = 0.1 s.
What should be the starting frequencies to design these systems in the
continuous-time domain if the impulse invariance method is used and what
are the design frequencies if the bilinear transform is used?
Problem 5.12. A transfer function of the first-order lowpass system is
$$H(z) = \frac{1-\alpha}{1-\alpha z^{-1}}.$$
Problem 5.13. Using allpass systems, find stable systems with the same amplitude of the frequency response as the systems:
(a)
$$H_1(z) = \frac{2-3z^{-1}+2z^{-2}}{1-4z^{-1}+4z^{-2}},$$
(b)
$$H_2(z) = \frac{z}{(4-z)(1/3-z)}.$$
Problem 5.14. The z-transform
Problem 5.15. A signal x(n) has passed through a medium whose influence can be described by the transfer function
$$H(z) = \frac{(4-z)(\frac{1}{3}-z)(z^2-\sqrt{2}z+\frac{1}{4})}{z-\frac{1}{2}}.$$
5.7 SOLUTIONS
Solution 5.1. The transfer function is
$$H(s) = \frac{\frac{1}{LC}}{s^2+s\frac{R}{L}+\frac{1}{LC}} = \frac{25}{s^2+8s+25} = \frac{25}{(s+4+j3)(s+4-j3)},$$
which can be written as
$$H(s) = \frac{j\frac{25}{6}}{s+4+j3} + \frac{-j\frac{25}{6}}{s+4-j3}.$$
The poles are mapped using
$$s_i \rightarrow z_i = e^{s_i}.$$
$$H(z) = \frac{j\frac{25}{6}}{1-e^{-(4+j3)}z^{-1}} + \frac{-j\frac{25}{6}}{1-e^{-(4-j3)}z^{-1}} = \frac{\frac{25}{3}e^{-4}\sin(3)z^{-1}}{1-2e^{-4}\cos(3)z^{-1}+e^{-8}z^{-2}},$$
with the corresponding difference equation
$$y(n) = \frac{25}{3}e^{-4}\sin(3)x(n-1) + 2e^{-4}\cos(3)y(n-1) - e^{-8}y(n-2).$$
The output signal values can be calculated for any input signal using
this difference equation. For x (n) = δ(n) the impulse response would follow.
The impulse response can be obtained in a closed form, from the inverse z-transform of H(z), as
$$h(n) = \frac{25}{6}e^{-4n}\left(je^{-j3n}-je^{j3n}\right)u(n) = \frac{25}{3}e^{-4n}\sin(3n)u(n).$$
Solution 5.2. The system is not of lowpass type: for s → ∞ we get H(s) → 1. Thus, the impulse invariance method cannot be used. The bilinear transform can be used. It produces
$$H(z) = \frac{4\frac{(1-z^{-1})^2}{(1+z^{-1})^2} - 6\frac{1-z^{-1}}{1+z^{-1}} + 3}{4\frac{(1-z^{-1})^2}{(1+z^{-1})^2} + 6\frac{1-z^{-1}}{1+z^{-1}} + 3} = \frac{13z^{-2}-2z^{-1}+1}{z^{-2}-2z^{-1}+13}.$$
Solution 5.3. For the system described by the differential equation
$$y''(t) + \frac{3}{2}y'(t) + \frac{1}{2}y(t) = x(t)$$
the transfer function is
$$H(s) = \frac{1}{s^2+\frac{3}{2}s+\frac{1}{2}}.$$
The corresponding discrete system is obtained using
$$s \rightarrow \frac{1-z^{-1}}{\Delta t} = 10(1-z^{-1})$$
as
$$H(z) = \frac{1}{100(1-z^{-1})^2 + \frac{3}{2}10(1-z^{-1}) + \frac{1}{2}} = \frac{1}{\frac{231}{2} - 215z^{-1} + 100z^{-2}},$$
with the difference equation
$$y(n) = \frac{2}{231}x(n) + \frac{430}{231}y(n-1) - \frac{200}{231}y(n-2).$$
Solution 5.4. The transfer function can be written as
$$H(s) = -\frac{1+j}{s+1-j} - \frac{1-j}{s+1+j}.$$
Using the bilinear transform with ∆t = 1 we get
$$H(z) = -2\frac{1-z^{-2}}{5-2z^{-1}+z^{-2}}.$$
Solution 5.5. (a) The transfer function
$$H(s) = \frac{1+4s}{(s+1/2)(s+1)^3}$$
can be written as
$$H(s) = \frac{k_1}{s+1/2} + \frac{k_2}{s+1} + \frac{k_3}{(s+1)^2} + \frac{k_4}{(s+1)^3}$$
with $k_1 = H(s)(s+1/2)|_{s=-1/2} = -8$ and $k_4 = H(s)(s+1)^3\big|_{s=-1} = 6$. By equating the coefficients of $s^3$ to 0 we get the relation $k_1+k_2 = 0$. A similar relation follows for the coefficients of $s^2$, $3k_1+5k_2/2+k_3 = 0$, or $k_1/2+k_3 = 0$. Then $k_2 = 8$ and $k_3 = 4$.
With
$$\frac{k_i}{s-s_i} \rightarrow \frac{k_i}{1-e^{s_i}z^{-1}}$$
and
$$\frac{1}{m!}\frac{d^m}{ds_i^m}\frac{k_i}{s-s_i} \rightarrow \frac{1}{m!}\frac{d^m}{ds_i^m}\left\{\frac{k_i}{1-e^{s_i}z^{-1}}\right\}$$
we get the discrete system
$$H(z) = \frac{-8}{1-e^{-1/2}z^{-1}} + \frac{8}{1-e^{-1}z^{-1}} + \frac{d}{ds_1}\left(\frac{4}{1-e^{s_1}z^{-1}}\right)\Bigg|_{s_1=-1} + \frac{1}{2}\frac{d^2}{ds_1^2}\left(\frac{6}{1-e^{s_1}z^{-1}}\right)\Bigg|_{s_1=-1}$$
$$= \frac{-8}{1-e^{-1/2}z^{-1}} + \frac{8}{1-e^{-1}z^{-1}} + \frac{4e^{-1}z^{-1}}{(1-e^{-1}z^{-1})^2} + \frac{3e^{-2}z^{-2}+3e^{-1}z^{-1}}{(1-e^{-1}z^{-1})^3}$$
$$= \frac{-5.83819z^{-3} - 9.68722z^{-2} + 22.0531z^{-1}}{(z^{-1}-e)^3(z^{-1}-e^{1/2})}.$$
(b) Using the bilinear transform, $s \rightarrow 2\frac{1-z^{-1}}{1+z^{-1}}$ with ∆t = 1, the discrete system is
$$H(z) = \frac{1+8\frac{1-z^{-1}}{1+z^{-1}}}{\left(2\frac{1-z^{-1}}{1+z^{-1}}+\frac{1}{2}\right)\left(2\frac{1-z^{-1}}{1+z^{-1}}+1\right)^3} = \frac{-14z^{-4}-24z^{-3}+12z^{-2}+40z^{-1}+18}{3z^{-4}-32z^{-3}+126z^{-2}-216z^{-1}+135}.$$
Solution 5.6. Since we use the bilinear transform, we have to pre-modify the system according to
$$\Omega_d = \frac{2}{\Delta t}\tan\left(\frac{\Omega_1\Delta t}{2}\right) = 2.0 = 0.6366\pi.$$
The frequency value is shifted from Ω1 = 0.5π to Ωd = 0.6366π. The modified system is
$$H_d(s) = \frac{2Q\Omega_d}{s^2+2\Omega_d Qs+\Omega_d^2+Q^2}.$$
Now, using $s = 2\frac{1-z^{-1}}{1+z^{-1}}$, the corresponding discrete system is obtained,
$$H(z) = \frac{2Q\Omega_d}{\left(2\frac{1-z^{-1}}{1+z^{-1}}\right)^2 + 2\Omega_d Q\left(2\frac{1-z^{-1}}{1+z^{-1}}\right) + \Omega_d^2 + Q^2}.$$
$$H_a(s) = \frac{s_1 s_2}{(s-s_1)(s-s_2)} = \frac{4\pi^2 f_c^2}{s^2+2\pi f_c\sqrt{2}\,s+4\pi^2 f_c^2}.$$
$$H(z) = \frac{1.0548(1+z^{-1})^2}{5.1066 - 1.8874z^{-1} + z^{-2}}.$$
$$H(z) = \frac{1.0548(1-z^{-1})^2}{5.1066 + 1.8874z^{-1} + z^{-2}}.$$
Solution 5.8. For the continuous-time system the design frequencies are
f p = 1 kHz
f s = 2 kHz.
They correspond to
$$\Omega_p = 2\pi\cdot10^3\ \text{rad/s}, \qquad \Omega_s = 4\pi\cdot10^3\ \text{rad/s},$$
that is, to the discrete-time frequencies
$$\omega_p = 0.2\pi, \qquad \omega_s = 0.4\pi.$$
The frequencies for the filter design, which will be mapped to ω_p and ω_s by using the bilinear transform, are
$$\Omega_{pd} = \frac{2}{\Delta t}\tan(0.2\pi/2) = \frac{0.6498}{\Delta t}, \qquad \Omega_{sd} = \frac{2}{\Delta t}\tan(0.4\pi/2) = \frac{1.4531}{\Delta t}.$$
The Butterworth filter order is
$$N = \frac{1}{2}\,\frac{\log\frac{10^{0.1a_s}-1}{10^{0.1a_p}-1}}{\log\frac{\Omega_{sd}}{\Omega_{pd}}} = 1.368.$$
We assume N = 2.
Mapping this system into the discrete-time domain by using the bilinear
transform,
$$s = \frac{2}{\Delta t}\frac{1-z^{-1}}{1+z^{-1}},$$
produces
$$H(z) = \frac{0.067569(1+z^{-1})^2}{1-1.14216z^{-1}+0.412441z^{-2}}.$$
Solution 5.9. The Butterworth filter order is
$$N = \frac{1}{2}\,\frac{\log\frac{10^{0.1a_s}-1}{10^{0.1a_p}-1}}{\log\frac{\Omega_s}{\Omega_p}} = 2.335,$$
so N = 3 is assumed. The critical frequency is
$$\Omega_c = \frac{\Omega_p}{\sqrt[2N]{10^{0.1a_p}-1}} = 0.109345\pi = 0.3435.$$
The transfer function terms are mapped according to
$$\frac{k_i}{s-s_{pi}} \rightarrow \frac{\Delta t\,k_i}{1-e^{s_{pi}\Delta t}z^{-1}}.$$
The discrete-time system transfer function is
$$H(z) = \frac{-0.0253z^{-2} - 0.0318z^{-1}}{-1.98774 + 4.61093z^{-1} - 3.68033z^{-2} + z^{-3}}.$$
Solution 5.10. The transfer function is
$$H_H(s) = H\left(\frac{1}{s}\right)$$
with $s = \frac{2}{\Delta t}\frac{1-z^{-1}}{1+z^{-1}} = \frac{2}{\Delta t}\frac{z-1}{z+1}$ and ∆t = 2. The corresponding lowpass filter would be
$$H_L(z) = H(s)|_{s=\frac{z-1}{z+1}} = H\left(\frac{z-1}{z+1}\right).$$
The discrete highpass filter is
$$H_H(z) = H_H(s)|_{s=\frac{z-1}{z+1}} = H\left(\frac{1}{s}\right)\Bigg|_{s=\frac{z-1}{z+1}} = H\left(\frac{z+1}{z-1}\right).$$
Obviously HH (z) = HL (−z). It means that a discrete highpass system can be
realized by replacing z with −z in the transfer function. For ∆t ̸= 2 a scaling
is present as well.
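The z → −z lowpass-to-highpass relation can be illustrated with the first-order lowpass system from Problem 5.12 (α chosen arbitrarily for this sketch):

```python
import numpy as np

# Replacing z by -z turns a lowpass response into a highpass one, since
# H(-e^{j omega}) = H(e^{j(omega - pi)}).
alpha = 0.8

def H(z):   # first-order lowpass (1 - alpha)/(1 - alpha z^{-1})
    return (1 - alpha) / (1 - alpha / z)

omega = np.linspace(-np.pi, np.pi, 401)
z = np.exp(1j * omega)
low = np.abs(H(z))      # peaks at omega = 0
high = np.abs(H(-z))    # the same shape shifted by pi: peaks at omega = pi

print(abs(H(np.exp(1j * 0))))         # lowpass gain 1 at omega = 0
print(abs(H(-np.exp(1j * np.pi))))    # highpass gain 1 at omega = pi
```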
2 − 3z−1 + 2z−2
H1 (z) =
(1 − 2z−1 )2
is not stable since it has a second-order pole at z = 2. This system may be
stabilized, keeping the same amplitude of the frequency response, using a
second-order allpass system with zero at z = 2
HA(z) = ( (z⁻¹ − 1/2) / (1 − (1/2)z⁻¹) )².
The stabilized system is
H1(z) = (2 − 3z⁻¹ + 2z⁻²) / (z⁻¹ − 2)².
Causal system H2 (z) has a pole at z = 4. It can be stabilized by using allpass
system
HA(z) = (z⁻¹ − 1/4) / (1 − (1/4)z⁻¹) = (4 − z)/(4z − 1).
H(z) = (z − 1/4)(z + 1/2) / ((z + 4/5)(z − 3/7)).
Hi(z) = (z − 1/2) / ((4 − z)(1/3 − z)(z − 1.2071)(z − 0.2071)).
These poles have to be compensated, keeping the same amplitude, by using
two first-order allpass systems. The resulting system transfer function is
Hi(z) · (z − 4)/(1 − 4z) · (z − 1.2071)/(1 − 1.2071z)
= (z − 1/2) / ((1/3 − z)(z − 0.2071)(1 − 4z)(1 − 1.2071z)).
5.8 EXERCISE
Exercise 5.1. A continuous-time system is described by the transfer function
H(s) = (s + 2) / (4s² + s + 1).
What is the corresponding discrete-time system obtained with ∆t = 1 by using the impulse invariance method and the bilinear transform?
Exercise 5.2. A continuous system is described by a differential equation
1
y′′ (t) + 6y′ (t) − y(t) = x (t) + x ′ (t)
2
with zero initial conditions. What is the corresponding transfer function
of a discrete system obtained by using the first-order backward difference
approximation with ∆t = 1?
Exercise 5.3. (a) A continuous system
H(s) = 2QΩ₀ / (s² + 2Ω₀Qs + Ω₀² + Q²)
is used to process the input signal
x(t) = A cos(Ω₀t + ϕ),
sampled with the sampling interval ∆t = 10⁻³ s. What would be the corresponding continuous-time output signal after an ideal D/A converter?
Exercise 5.4. (a) By using the bilinear transform find the transfer function
of a third-order Butterworth filter with f ac = 3.4 kHz. The sampling step is
∆t = 40 µs.
(b) Translate the discrete transfer function to obtain a bandpass system
with corresponding central frequency f ac = 12.5 kHz in the continuous
domain.
2 − 5z−1 + 2z−2
H1 (z) = ,
1 − 4z−1 + z−2
z−1
H2 (z) = .
(2 − z)(1/4 − z)
Exercise 5.7. The z-transform
R(z) = (z − 1/3)(z⁻¹ − 1/3) / ((z + 1/2)(z⁻¹ + 1/2))
can be written as
R(z) = H(z) H*(1/z*).
Find H (z) for the minimum phase system. If h(n) is the impulse response
of H (z) and h1 (n) is the impulse response of
H1(z) = H(z) (z⁻¹ − a₁e^(−jθ₁)) / (1 − a₁e^(jθ₁)z⁻¹)
show that |h(0)| ≤ |h1 (0)| for any θ1 and | a1 | < 1. All systems are causal.
Exercise 5.8. A signal x(n) has passed through a medium whose influence can
be described by the transfer function
H(z) = (1 − z/3)(1 − 5z)(z² − z + 3/4) / (z² − 2/3)
and the signal r(n) = x(n) ∗ h(n) is obtained. Find a causal and stable system to process r(n) in order to obtain |Y(e^jω)| = |X(e^jω)|.
Chapter 6
Realization of Discrete Systems
Linear discrete-time systems are described by a difference equation relating the output signal with the input signal at the considered instant and the previous values of the output and input signals.
signal. The transfer function can be written in various forms producing dif-
ferent system realizations. Some of them will be presented next. Symbols
that are used in the realizations are presented in Fig.6.1.
[Figure: realization symbols — a gain a (x(n) → ax(n)), a delay z⁻¹ (x(n) → x(n−1)), a branch, an adder (x(n) + y(n)), a subtractor (x(n) − y(n)), and a multiplier (x(n)y(n)).]
Figure 6.1 Symbols and their function in the realization of discrete-time systems.
[Figure: direct realization I, with the feedforward coefficients B0, B1, B2, ..., B_M acting on x(n), x(n−1), x(n−2), ... and the feedback coefficients A1, A2, ..., A_N acting on y(n−1), y(n−2), ..., each through its own chain of delay elements z⁻¹.]
and
H2(z) = 1 / (1 − A1z⁻¹ − ... − A_N z^(−N)).
The overall transfer function is H(z) = H1(z)H2(z) = H2(z)H1(z). It means that these two blocks can interchange their positions. After the
positions are interchanged, then by using the same delay systems, we get
the resulting system in the direct realization II form, presented in Fig.6.4.
This system uses a reduced number of delay blocks in the realization.
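A minimal sketch of the two direct forms (not the book's code): both realize H(z) = H1(z)H2(z), so their outputs agree for the same input; direct realization II simply shares one delay line through an intermediate signal w(n). The coefficients below are those of Example 6.1 that follows.

```python
import numpy as np

def direct_form_I(B, A, x):
    """y(n) = sum_k B_k x(n-k) + sum_{k>=1} A_k y(n-k), zero initial conditions."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = sum(B[k] * x[n - k] for k in range(len(B)) if n >= k)
        y[n] += sum(A[k] * y[n - k] for k in range(1, len(A)) if n >= k)
    return y

def direct_form_II(B, A, x):
    """Shared delay line: w(n) = x(n) + sum_{k>=1} A_k w(n-k), y(n) = sum_k B_k w(n-k)."""
    w = np.zeros(len(x))
    y = np.zeros(len(x))
    for n in range(len(x)):
        w[n] = x[n] + sum(A[k] * w[n - k] for k in range(1, len(A)) if n >= k)
        y[n] = sum(B[k] * w[n - k] for k in range(len(B)) if n >= k)
    return y

B = [1.0, -1 / 2, 1 / 3]         # feedforward coefficients B_k
A = [0.0, 0.0, 1 / 2, -1 / 6]    # feedback coefficients A_k (A[0] unused)
x = np.random.default_rng(0).standard_normal(64)
```

Running both forms on the same random input produces identical outputs, which is the interchangeability argument in numerical form.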
Example 6.1. Find the transfer function of a discrete system presented in Fig.6.5.
⋆The system can be recognized as a direct realization II form. After
its blocks are separated and interchanged the system in a form presented in
Fig.6.6 is obtained.
The output of the first block is
y1(n) = x(n) − (1/2)x(n − 1) + (1/3)x(n − 2). (6.2)
Its transfer function is
H1(z) = 1 − (1/2)z⁻¹ + (1/3)z⁻².
[Figure: direct realization II, where the feedback coefficients A1, ..., A_N and the feedforward coefficients B0, B1, ..., B_M share a single chain of delay elements. Figure 6.5: the analyzed system, with coefficients −1/2, 1/3 in the feedforward part and 1/2, −1/6 in the feedback part.]
The transfer function of the second block is
H2(z) = 1 / (1 − (1/2)z⁻² + (1/6)z⁻³).
[Figure 6.6: the same system with the two blocks interchanged, coefficients −1/2, 1/3, 1/2, and −1/6.]
The difference equation for the whole system is obtained after y1 (n) from
(6.2) is replaced into (6.3)
y(n) = (1/2)y(n − 2) − (1/6)y(n − 3) + x(n) − (1/2)x(n − 1) + (1/3)x(n − 2).
The system transfer function is
H(z) = H1(z)H2(z) = (1 − (1/2)z⁻¹ + (1/3)z⁻²) / (1 − (1/2)z⁻² + (1/6)z⁻³).
For a first-order system
H(z) = 1/(1 + A1z⁻¹) = 1/(1 − zp1 z⁻¹),
the error in coefficient A1 is the same as the error in the system pole zp1. If the coefficient is quantized with a step ∆ then the error in the pole location is of the order of ∆. The same holds for the system zeros.
For a second-order system with real-valued coefficients and a pair of
complex-conjugated poles
H(z) = 1/(1 + A1z⁻¹ + A2z⁻²) = 1/((1 − zp1 z⁻¹)(1 − zp2 z⁻¹))
the relation between the coefficients and the real and imaginary parts of the
poles z p1/2 = x p ± jy p is
H(z) = 1 / (1 − 2xp z⁻¹ + (xp² + yp²)z⁻²)
A1 = −2xp
A2 = xp² + yp².
The error in coefficient A1 defines the error in the real part of poles x p .
When the coefficient A2 assumes discrete values A2 = m∆, with A1 such that xp = n∆, then the imaginary part of the poles may assume the values yp = ±√(A2 − xp²) = ±√(m∆ − n²∆²), with n² ≤ mN. For small n, i.e., for a small real part of a pole, yp ≅ ±√(∆m). For N discretization levels, assuming that the poles are within the unit circle, xp² + yp² ≤ 1, the first discretization step is changed from the order of 1/N to the order of 1/√N. The error, in this case, could be
significantly increased. The changes in y p due to the discretization of A2
may be large.
The quantization of x p and y p as a result of quantization of − A1 /2
and A2 = x2p + y2p is shown in Fig.6.7 for the case of N = 16 and N = 32
quantization levels. We see that the error in y p , when it assumes small
values, can be very large. We can conclude that the poles close to the unit
circle with larger imaginary values y p are less sensitive to the errors. The
highest error could appear if a second order real-valued pole (with y p = 0)
were implemented by using a second order system.
We have concluded that the poles close to the real axis (small y p )
are sensitive to the error in coefficients even in the second order systems.
The sensitivity increases with the system order, since the higher powers in
polynomial increase the maximal possible error.
Consider a general form of a polynomial in the transfer function, written in two forms,
P(z) = z^M + A1 z^(M−1) + A2 z^(M−2) + ... + A_M
Figure 6.7 Quantization of the real and imaginary parts x p = Re{z p } and y p = Im{z p } of
poles (zeros) as a result of the quantization in 16 levels (left) and 32 levels (right) of the
coefficients A1 = −2x p and A2 = x2p + y2p .
and
P(z) = (z − z1 )(z − z2 )...(z − z M ).
If the coefficients A1 , A2 , ..., A M are changed for small ∆A1 , ∆A2 , ..., ∆A M
(due to quantization) then the pole position (without loss of generality and
for notation simplicity consider the pole z1 ) is changed for
∆z1 ≅ [(∂z1/∂A1)∆A1 + (∂z1/∂A2)∆A2 + ... + (∂z1/∂A_M)∆A_M]|z=z1. (6.4)
Since there is no direct relation between z1 and A1, we will find ∂z1/∂Ai using
∂P(z)/∂Ai |z=z1 = −(∂P(z)/∂z)|z=z1 · (∂z1/∂Ai).
From this relation it follows
∂z1/∂Ai = (∂P(z)/∂Ai)|z=z1 / (−(∂P(z)/∂z)|z=z1) = z1^(M−i) / (−(z1 − z2)(z1 − z3)...(z1 − z_M)).
The coefficients ∂z1 /∂Ai|z=z1 could be large, especially in the case when
there are close poles, with a small distance (zi − zk ).
with
P(z) = (z − 12/27)(z − 7/29)(z − 111/132)(z − 95/101)
≅ (z − 0.4444)(z − 0.2414)(z − 0.8409)(z − 0.9406).
In the realization of this system the coefficients are rounded to two decimal
positions, with absolute error up to 0.005. Find the poles of the system with
rounded coefficients.
⋆The system denominator is
P(z) ≅ z⁴ − 2.4673z³ + 2.1200z² − 0.7336z + 0.0849,
with poles approximately 0.2045, 0.5370, 0.7285, and 1.0000 after the rounding.
The poles of the function with rounded coefficients can differ significantly
from the original pole values. Maximal error in poles is 0.115. One pole is on
the unit circle making the system with rounded coefficients unstable, in this
case.
Note that if the system is written as a product of the first-order func-
tions in the denominator and each pole value is rounded to two decimals
H(z) = 1 / ((z − 7/29)(z − 12/27)(z − 111/132)(z − 95/101))
P(z) ≅ (z − 0.24)(z − 0.44)(z − 0.84)(z − 0.94),
the poles will differ from the original ones by no more than 0.005.
If the poles are grouped into second-order terms (as should be done if the poles were complex-conjugate, in order to avoid calculation with complex-valued coefficients), then
P(z) ≅ (z² − 0.6858z + 0.1073)(z² − 1.7815z + 0.7910).
After rounding the coefficients of these second-order factors of P(z) to two decimals, we will get poles much closer to the original ones than with the rounded fourth-order polynomial.
The sensitivity analysis for this example can be done for each pole.
Assume that the poles are denoted as z1 = 12/27, z2 = 7/29, z3 = 111/132
and z4 = 95/101. Then
∆z1 ≅ 0.0878.
The true error is ∆z1 = 0.0926. A small difference is due to the linear approx-
imation, assuming small ∆Ai . The obtained result is a good estimate of an
order of error for the pole z1. The error in z1 is about 18.5 times greater than the maximal error in the coefficients Ai, which is of the order of 0.005.
Commonly, real-valued signals are processed and the poles and zeros of the transfer function appear in complex-conjugate pairs. In that case it is better to group these pairs into second-order systems to avoid complex calculations.
The transfer function is of the form
H(z) = (B00 + B10z⁻¹ + B20z⁻²)/(1 − A10z⁻¹ − A20z⁻²) × ... × (B0K + B1Kz⁻¹ + B2Kz⁻²)/(1 − A1Kz⁻¹ − A2Kz⁻²)
= H0(z)H1(z)...HK(z),
where
Hi(z) = (B0i + B1iz⁻¹ + B2iz⁻²)/(1 − A1iz⁻¹ − A2iz⁻²)
are second-order systems with real-valued coefficients. The whole system
may be realized as a cascade of lower-order (first or second-order) systems,
Fig.6.9. Of course, if there are some real-valued poles then there is no need
to group them. It is better to keep the realization order of the subsystems as
low as possible.
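A sketch of the cascade idea (illustrative second-order sections, not from the book): filtering through the sections one after another is equivalent to filtering with the overall polynomials obtained by convolving the section numerators and denominators. Note that the sign convention here uses denominators 1 + a1 z⁻¹ + a2 z⁻², i.e. a_k = −A_k.

```python
import numpy as np

def filt(b, a, x):
    # y(n) = sum_k b_k x(n-k) - sum_{k>=1} a_k y(n-k), zero initial conditions
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = sum(b[k] * x[n - k] for k in range(len(b)) if n >= k)
        y[n] -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n >= k)
    return y

# two hypothetical stable second-order sections (b, a)
sections = [([1.0, 0.4, 0.1], [1.0, -0.5, 0.2]),
            ([1.0, -0.3, 0.0], [1.0, 0.1, -0.3])]

x = np.zeros(40)
x[0] = 1.0                                       # impulse input
y = x
for b, a in sections:                            # cascade: section by section
    y = filt(b, a, y)

B = np.convolve(sections[0][0], sections[1][0])  # overall numerator
A = np.convolve(sections[0][1], sections[1][1])  # overall denominator
```

The cascade output equals the output of the single higher-order filter, while the per-section coefficients stay far better conditioned, which is the motivation for this structure.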
[Figure 6.9: cascade realization of the second-order subsystems H0(z), ..., HK(z) with coefficients A10, A20, B10, B20, ..., A1K, A2K, B1K, B2K.]
Y(z) = H(z)R(z) = H(z)(X(z) − H2(z)Y(z)),
with the equivalent transfer function
He(z) = Y(z)/X(z) = H(z)/(1 + H(z)H2(z)).
[Figure 6.10: feedback realization, with H(z) in the direct path and H2(z) in the feedback path.]
Figure 6.11 Complete second-order subsystem with complex-conjugate pair of poles realized
using the first-order systems.
Qi(z) = ypL z⁻¹ / (1 − 2xpL z⁻¹ + (xpL² + ypL²)z⁻²)
= ypL z⁻¹ / ((1 − xpL z⁻¹)² + ypL² z⁻²)
= ypL z⁻¹ · 1/(1 − xpL z⁻¹)² · 1/(1 + (ypL z⁻¹/(1 − xpL z⁻¹))²)
= H(z)H2(z) / (1 + H²(z)),
where
H(z) = ypL z⁻¹/(1 − xpL z⁻¹) and H2(z) = 1/(1 − xpL z⁻¹).
[Figure: the corresponding first-order structure with coefficients xp and yp.]
Figure 6.12 First-order system for the realization of the second-order system with complex-
conjugate pair of poles.
The error in one coefficient (real or imaginary part of a pole) does not
influence the other coefficients. However if an error in the signal calculation
happens in one cascade, then it will propagate as an input to the following
cascades. In that sense, it is best to order the cascades so that the lowest probability of an error appears in the early cascades. From the analysis of the error we can conclude that the cascades with the poles and zeros close to the origin are more sensitive to the error and should be used in later stages.
H(z) = 1.4533(1 + z⁻¹)³ / ((−0.8673z⁻¹ + 3.1327)(3.0177z⁻² − 5.434z⁻¹ + 7.54))
= 0.0615 × (1 + z⁻¹)/(1 − 0.2769z⁻¹) × (1 + 2z⁻¹ + z⁻²)/(1 − 0.7207z⁻¹ + 0.4002z⁻²)
⋆(a) Realization of the system H(z), when both the first and the second-order subsystems can be used, is done according to the system transfer function as in Fig.6.13.
(b) For the first-order systems the realization should be done based on
H(z) = 0.0615 × (1 + z⁻¹)/(1 − 0.2769z⁻¹) × (1 + z⁻¹) × (1 + z⁻¹) × 1/(1 − 0.7207z⁻¹ + 0.4002z⁻²),
[Figure 6.13: cascade of a first-order section (coefficient 0.2769) and a second-order section (coefficients 0.7207, −0.4002).]
with
1/(1 − 0.7207z⁻¹ + 0.4002z⁻²)
= 1/((1 − (0.3603 + j0.5199)z⁻¹)(1 − (0.3603 − j0.5199)z⁻¹))
= 1/(1 − 2 × 0.3603z⁻¹ + (0.3603² + 0.5199²)z⁻²)
= 1/((1 − 0.3603z⁻¹)² + 0.5199²z⁻²) = 1/(1 − 0.3603z⁻¹)² × 1/(1 + (0.5199z⁻¹/(1 − 0.3603z⁻¹))²).
In this way the system can be written and realized in terms of the first-order subsystems
H(z) = 0.0615 × (1 + z⁻¹)/(1 − 0.2769z⁻¹) × (1 + z⁻¹)/(1 − 0.3603z⁻¹)
× (1 + z⁻¹)/(1 − 0.3603z⁻¹) × 1/(1 + (0.5199z⁻¹/(1 − 0.3603z⁻¹)) × (0.5199z⁻¹/(1 − 0.3603z⁻¹))).
[Figure: realization using the first-order subsystems, with coefficients 0.0615, 0.2769, 0.3603, and yp = 0.5199.]
In the case of a parallel realization the error in one subsystem does not
influence the other subsystems. If an error in the signal calculation appears
in one parallel subsystem, then it will influence the output signal, but will
not influence the outputs of other parallel subsystems.
[Figure: parallel realization, with second-order subsystems defined by B00, A10, B10, A20, B20; B01, A11, B11, A21, B21; ...; B0K, A1K, B1K, A2K, B2K.]
z-1
1.1078 -1 0.2542
z
-0.5482
0.7256
+ +
-1
z
0.9246 -0.084
z-1
-0.2343
+ + +
1.1078 1 0.9246 0.0858
z-1 z
-1
[Figure: direct realization of a system with coefficients B0, A1, B1, A2, B2, ..., A_N, B_M.]
By comparing the transfer functions, it follows that the inverse realization has the same transfer function as the original realization.
[Figure: the inverse realization, with the input and output interchanged, y(n) appearing at the former input and x(n) at the former output.]
in the previous section. Systems without recursions, whose output signal is a linear combination of the input signal and its delayed versions only, are the FIR systems. These systems are always stable. The FIR systems can also have a linear phase.
A linear phase
arg{H(e^jω)} = arctan(Im{H(e^jω)}/Re{H(e^jω)}) = −ωq (6.5)
is also acceptable in these systems. They will have a constant group delay
τg = −d(arg{H(e^jω)})/dω = q
and will not distort the impulse response with respect to the zero-phase system. The impulse response will only be delayed in time by q.
Example 6.6. Consider an input signal of the form
x(n) = ∑_{m=1}^{M} A_m e^(j(ω_m n + θ_m)).
In this case the phase delay
τϕ = −arg{H(e^jω)}/ω = q
and the group delay τg are the same. In general, the group delay and the phase delay are different. The group delay, as a notion dual to the instantaneous frequency, is introduced and discussed in the first chapter.
H(e^jω) = ∑_{n=0}^{N−1} h(n)e^(−jωn) = ∑_{n=0}^{N−1} h(n)cos(ωn) − j ∑_{n=0}^{N−1} h(n)sin(ωn). (6.6)
Combining the linear phase condition (6.5) with form (6.6), we get
∑_{n=0}^{N−1} h(n)[sin(ωq)cos(ωn) − cos(ωq)sin(ωn)] = 0,
or
∑_{n=0}^{N−1} h(n) sin(ω(n − q)) = 0. (6.7)
This equation is satisfied for
q = (N − 1)/2
with
h(n) = h(N − 1 − n), 0 ≤ n ≤ N − 1.
Since the Fourier transform is unique, this is the unique solution for the linear phase condition. It is illustrated for an even and odd N in Fig.6.20. From the symmetry condition it is easy to conclude that there is no causal linear phase system with an infinite impulse response.
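The symmetry condition is easy to verify numerically: for any symmetric h(n) (an illustrative N = 7 response below, not from the book), H(e^jω)e^(jωq) with q = (N − 1)/2 is purely real, so the phase is exactly −ωq.

```python
import numpy as np

h = np.array([1.0, -2.0, 3.0, 5.0, 3.0, -2.0, 1.0])   # h(n) = h(N-1-n), N = 7
N = h.size
q = (N - 1) / 2
w = np.linspace(-np.pi, np.pi, 257)
H = np.array([np.sum(h * np.exp(-1j * wk * np.arange(N))) for wk in w])
# removing the linear phase e^{-jwq} must leave a real-valued function
residual = (H * np.exp(1j * w * q)).imag
```

The residual imaginary part vanishes to machine precision, confirming the constant group delay q = 3 for this length-7 example.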
6.2.2 Windows
When a system obtained from the design procedure is an IIR system and the requirement is to implement it as an FIR system, in order to get a linear phase or to guarantee the system stability (when small changes of coefficients are possible), then the most obvious way is to truncate the desired impulse response hd(n) of the resulting IIR system. The impulse response of the FIR system is
h(n) = hd(n) for 0 ≤ n ≤ N − 1, and h(n) = 0 elsewhere.
This form can be written as
h ( n ) = h d ( n ) w ( n ),
where
w(n) = 1 for 0 ≤ n ≤ N − 1, and w(n) = 0 elsewhere,
Figure 6.20 Impulse response of a system with a linear phase for an even and odd N.
is the rectangular window function. In the Fourier domain the desired im-
pulse response truncation by a window function will mean a convolution of
the desired frequency response with the frequency response of the window
function
H (e jω ) = Hd (e jω ) ∗ W (e jω ).
Without loss of generality, assume that the most significant values of hd (n)
are within − N/2 ≤ n ≤ N/2 − 1. The impulse response hc (n) can assume
nonzero values only within − N/2 ≤ n ≤ N/2 − 1. Therefore,
e² = ∑_{n=−N/2}^{N/2−1} |hd(n) − hc(n)|² + ∑_{n=−∞}^{−N/2−1} |hd(n)|² + ∑_{n=N/2}^{∞} |hd(n)|².
Since the last two terms do not depend on hc(n) and all three terms are nonnegative, the error e² is minimal if hc(n) = hd(n) for −N/2 ≤ n ≤ N/2 − 1. The causal impulse response is then
h(n) = hc(n − N/2).
w(n) = (1/(N/2)) [u(n) − u(n − N/2)] ∗n [u(n) − u(n − N/2)]
Figure 6.21 Impulse response of a FIR system obtained by truncating the desired IIR response (a), (b), using two rectangular windows of different widths (c)-(f), and using a Hann(ing) window (g), (h).
Ljubiša Stanković Digital Signal Processing 291
It loses the continuity property (in the continuous-time domain). Its con-
vergence for very large values of ω will be slower than in the Hann(ing)
window case. However, as it will be shown later, its coefficients are derived
in such a way that the first side-lobe is canceled out at its mid point. Then
the immediate convergence, after the main lobe, is much better than in the
Hann(ing) window case.
Other windows are derived with various constraints. Some of them
will be reviewed in Part three of this book as well.
Suppose that the desired system frequency response is given in the fre-
quency domain. If we want to get an N point FIR system that approximates
the desired frequency response then it can be obtained by sampling the fre-
quency response Hd (e jω ) at
ω = 2πk/N, k = 0, 1, 2, ..., N − 1,
H(k) = Hd(e^jω)|ω=2πk/N
h(n) = IDFT{H(k)}.
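A sketch of this design path for the triangular Hd(e^jω) = π − |ω| used later in Example 6.9 (N = 8 here): sample Hd, take the IDFT, and confirm that the FIR frequency response interpolates the samples exactly.

```python
import numpy as np

N = 8
k = np.arange(N)
# samples of Hd(e^{jw}) = pi - |w| at w = 2 pi k / N, indexed over 0..N-1
Hk = np.where(k < N / 2, np.pi * (1 - 2 * k / N), np.pi * (2 * k / N - 1))
h = np.fft.ifft(Hk).real            # real-valued, since Hd is even
H_check = np.fft.fft(h)
```

Between the sampling points the response deviates from Hd, which is the frequency-sampling design error discussed below; at the points themselves the match is exact.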
Figure 6.22 Realization of a FIR system with N samples in time, obtained by sampling the desired frequency response with N samples. A direct sampling (left) and the sampling with a smoothed transition (right).
Figure 6.23 A Hann(ing) window for smoothing the frequency response in the frequency
domain (left) and in the time domain (right).
The FIR systems can be realized in the same way as the IIR systems pre-
sented in the previous section, without using the recursive coefficients. A
common way of presenting a direct realization of FIR system is shown in
Fig.6.24. It is often referred to as an adder with weighted coefficients h(n).
A realization of a linear phase FIR system that uses the coefficient symmetry h(0) = h(N − 1), h(1) = h(N − 2), ... is shown in Fig.6.25.
Realization of a frequency sampled FIR filter may be done using the relation between the z-transform and the DFT of a signal. If we want to realize a FIR system with N nonzero samples, then it can be expressed in terms of the DFT of the frequency response (samples of the transfer function H(z) along the unit circle) as follows. For a FIR filter we
may write
H(k) = ∑_{n=0}^{N−1} h(n)e^(−j2πnk/N)
h(n) = (1/N) ∑_{k=0}^{N−1} H(k)e^(j2πnk/N).
The transfer function is then
H(z) = (1/N) ∑_{k=0}^{N−1} ∑_{n=0}^{N−1} H(k)e^(j2πnk/N) z^(−n) = (1/N) ∑_{k=0}^{N−1} H(k) (1 − z^(−N)e^(j2πk)) / (1 − z⁻¹e^(j2πk/N))
Example 6.8. For a system whose impulse response is the Hamming window function of the length N = 32,
h(n) = 0.52 + 0.48 cos((n − 16)π/16), 0 ≤ n ≤ 31,
present the realization based on the frequency sampling.
The DFT values are H(0) = 0.52 × 32, H(1) = −0.24 × 32, H(31) = H(−1) = −0.24 × 32, and H(k) = 0 for other k within 0 ≤ k ≤ 31. Therefore, the realization is a cascade of
H1(z) = (1 − z⁻³²)/32
and a system H2(z) + H3(z), where
H2(z) = H(0)/(1 − z⁻¹)
and
H3(z) = −2H(1) (1 − cos(π/16)z⁻¹) / (1 − 2cos(π/16)z⁻¹ + z⁻²).
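The three nonzero DFT values of this impulse response are easy to confirm (a numeric check of the example, not additional theory):

```python
import numpy as np

n = np.arange(32)
h = 0.52 + 0.48 * np.cos((n - 16) * np.pi / 16)
H = np.fft.fft(h)
# only the bins k = 0, 1 and 31 should be nonzero
```

Having only three nonzero H(k) is precisely what makes the frequency-sampling realization of this filter so short.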
Example 6.9. For a system whose frequency response Hd ( jΩ) in the continuous-
time domain is
Hd ( jΩ) = π − |Ω|
for |Ω| ≤ π, with corresponding Hd (e jω ) in the discrete-time domain (∆t = 1
is assumed, Fig.6.26) find the FIR filter impulse response with N = 7 and
N = 8 using:
(a) Sampling the desired frequency response Hd (e jω ) in the frequency
domain.
(b) Calculating hd (n) = IFT{ Hd (e jω )} and taking its N the most signif-
icant values, h(n) = hd (n) for − N/2 ≤ n ≤ N/2 − 1 and h(n) = 0 elsewhere.
(c) Comment the error in both cases.
⋆(a) The sampling in the frequency domain is illustrated in Fig.6.26. The values of the FIR system, in this case, are the samples of Hd(e^jω),
H(k) = Hd(e^jω)|ω=2πk/N = π(1 − 2k/N) for 0 ≤ k < N/2, and π(2k/N − 1) for N/2 ≤ k ≤ N − 1.
The impulse response is
h(n) = IDFT{H(k)} = (1/N) ∑_{k=0}^{N−1} H(k)e^(j2πnk/N).
For N = 7,
h(n) = π/7 + (10π/49)cos(2πn/7) + (6π/49)cos(2·2πn/7) + (2π/49)cos(3·2πn/7), 0 ≤ n ≤ 6.
For N = 8,
h(n) = π/8 + (3π/16)cos(2πn/8) + (π/8)cos(2·2πn/8) + (π/16)cos(3·2πn/8), 0 ≤ n ≤ 7.
It is shown in Fig.6.26 (third row). The frequency response of the FIR filter is
H (e jω ) = FT{h(n)}.
Its values are equal to the desired frequency response at the sampling points,
H(e^jω)|ω=2πk/N = Hd(e^jω)|ω=2πk/N.
or for N = 8,
h(n) = (1 − cos(nπ))/(πn²) for −4 ≤ n ≤ 3, and h(n) = 0 elsewhere.
The frequency response of this FIR filter is
H (e jω ) = FT{h(n)}.
Figure 6.26 Design of a FIR filter by frequency sampling of the desired frequency response.
It is shown in Fig.6.27.
(c) The error in the frequency sampling case (a) is zero at the desired frequency points. However, since the desired impulse response is of infinite duration, the frequency-domain sampling causes aliasing of the impulse response, resulting in an error outside the sampling points. For the case of windowing the impulse response (b), the aliasing is avoided since the impulse response is truncated. However, the truncation causes an error in the resulting frequency response. In this case the error distribution is not the same as in case (a). The mean square error Er
Figure 6.27 Design of a FIR filter by windowing the impulse response of an IIR filter.
is calculated and presented in Fig.6.28, along with the errors in the absolute
value of the frequency responses. As expected from the theory, the impulse
response truncation produced lower mean square error in the estimation.
6.3 PROBLEMS
Problem 6.1. For the system with the transfer function
H(z) = 16(z + 1)z² / ((4z² − 2z + 1)(4z + 3))
Figure 6.28 Error in the case of the frequency response sampling (top, Er = 0.008092) and the IIR impulse response truncation (bottom, Er = 0.0018945).
Plot the direct realization I and II, parallel and cascade realization.
Problem 6.3. Find the transfer function of a discrete system presented in
Fig.6.29.
Problem 6.4. Find the transfer function of a discrete system presented in
Fig.6.30.
Problem 6.5. For the system
H(z) = (4z² / (4z² − 2z + 1)) × ((4z + 4)/(4z + 3)),
300 Realization of Discrete Systems
x(n) y(n)
+ + + +
z-1 z-1
+ + + +
2 1/2 1/3
z-1 z
-1
-1 -1
z z
x(n) y(n)
+ + + + +
-1 -1
z z
+ + + +
2 -1 1/2 -1 1/3
z z
plot the cascade and parallel realization. Write down the difference equation
which describes this system.
Problem 6.7. For the system defined by the transfer function
H(z) = (1 + z⁻²) / (1 + 2z⁻¹ + 2z⁻² + z⁻³)
plot the cascade realization.
Problem 6.8. System is defined by
y(n) + (1/4)y(n − 1) + w(n) + (1/2)w(n − 1) = (2/3)x(n)
[Figure 6.31: a system composed of two blocks H1(z) and H2(z) in a loop, with coefficients r sin θ, r cos θ, and −r sin θ.]
y(n) − (5/4)y(n − 1) + 2w(n) − 2w(n − 1) = −(5/3)x(n),
4 3
where x (n) is the input signal, y(n) is the output, and w(n) is a signal within
the system. What is the frequency and impulse response of the system?
Problem 6.9. For the system presented in Fig.6.31 find the transfer function.
Problem 6.10. Show that the FIR system
H(z) = (1 + 2z − z² + 4z³ − z⁴ + 2z⁵ + z⁶) / z⁶
has a linear phase function. Find its group delay.
Problem 6.11. Let h(n) be an impulse response of a causal system with the
Fourier transform H (e jω ). A real-valued output signal y1 (n) = x (n) ∗ h(n)
of this system is reversed, r (n) = y1 (−n), and passed through the same
system, resulting in the output signal y2 (n) = r (n) ∗ h(n). The final output
is reversed again y(n) = y2 (−n). Find the phase of the frequency response
function of the overall system.
Problem 6.12. For a system whose frequency response in the continuous-
time domain is
Hd(jΩ) = 2 for |Ω| < π/2, 1 for π/2 < |Ω| < 3π/4, and 0 elsewhere,
302 Realization of Discrete Systems
x(n) y(n)
+ +
-1 -1
z z
+
-1/4 -1
z
+
1/8
z-1
-3/16
6.4 SOLUTIONS
Solution 6.1. In order to plot the direct form of realization, the transfer function should be written in a form suitable for this type of realization,
H(z) = 16(z + 1)z² / ((4z² − 2z + 1)(4z + 3)) = (1 + z⁻¹) / ((1 − (1/2)z⁻¹ + (1/4)z⁻²)(1 + (3/4)z⁻¹))
= (1 + z⁻¹) / (1 + (1/4)z⁻¹ − (1/8)z⁻² + (3/16)z⁻³).
[Figures: the direct realization II with coefficients −1/4, 1/8, −3/16, and the cascade realization with coefficients 1/2, −1/4, and −3/4.]
For the cascade realization,
H(z) = (1 + z⁻¹) / ((1 − (1/2)z⁻¹ + (1/4)z⁻²)(1 + (3/4)z⁻¹))
= (1 + z⁻¹)/(1 − (1/2)z⁻¹ + (1/4)z⁻²) × 1/(1 + (3/4)z⁻¹) = H1(z)H2(z).
[Figure: the parallel realization with coefficients 22/19, 1/19, 1/2, −1/4, −3/19, and −3/4.]
For the parallel realization, write
H(z) = (1 + z⁻¹) / ((1 − (1/2)z⁻¹ + (1/4)z⁻²)(1 + (3/4)z⁻¹)) = (Az⁻¹ + B)/(1 − (1/2)z⁻¹ + (1/4)z⁻²) + C/(1 + (3/4)z⁻¹).
H(z) = Y(z)/X(z) = (1 + z⁻¹ + z⁻²) / (1 − z⁻¹ + z⁻² + 3z⁻³).
[Figures: the direct realizations I and II of this system, with feedback coefficients 1, −1, and −3.]
H(z) = (1 + z⁻¹ + z⁻²)/(1 − 2z⁻¹ + 3z⁻²) × 1/(1 + z⁻¹) = H1(z)H2(z).
For the parallel realization,
H(z) = (1/6)/(1 + z⁻¹) + ((1/2)z⁻¹ + 5/6)/(1 − 2z⁻¹ + 3z⁻²).
[Figure: a realization with coefficients 2, −1, and −3.]
where H1 (z) denotes the first block. It can be considered as a direct realiza-
tion II, with
y1(n) = 2y1(n − 1) + (1/3)y1(n − 2) + x(n) + (1/2)x(n − 1) − (1/3)x(n − 2),
presented in Fig.6.39. Using the z-transform properties, its transfer function is
H1(z) = Y1(z)/X(z) = (1 + (1/2)z⁻¹ − (1/3)z⁻²) / (1 − 2z⁻¹ − (1/3)z⁻²).
Now consider the second block whose transfer function is H2 (z). This
block can be considered as a parallel realization of two blocks, H2 (z) =
H21 (z) + H22 (z) where
H21 (z) = 1.
The second transfer function is the transfer function corresponding to a
direct realization II, of a subsystem described by
y2(n) = y2(n − 1) + y2(n − 2) + x1(n) + (1/3)x1(n − 1) − (1/4)x1(n − 2).
Thus, the transfer function of this subsystem is
H22(z) = (1 + (1/3)z⁻¹ − (1/4)z⁻²)/(1 − z⁻¹ − z⁻²).
It means that
H2(z) = H21(z) + H22(z) = 1 + (1 + (1/3)z⁻¹ − (1/4)z⁻²)/(1 − z⁻¹ − z⁻²).
The transfer function of the whole system is
H(z) = H1(z)H2(z) = (1 + (1/2)z⁻¹ − (1/3)z⁻²)/(1 − 2z⁻¹ − (1/3)z⁻²) × (1 + (1 + (1/3)z⁻¹ − (1/4)z⁻²)/(1 − z⁻¹ − z⁻²)).
H1(z) = (1 + (1/2 + 1)z⁻¹ − (1/3)z⁻²) / (1 − 2z⁻¹ − (1/3)z⁻²).
Previous relation holds since the upper delay block (above the obvious
direct realization II block) has the same input and output as the first delay
block below it.
The block with transfer function H2 (z) can be considered as a parallel
realization of two blocks, similarly as in the previous example, with H21(z) and H22(z), defined by
H21(z) = (1 + (1/3)z⁻¹ − (1/4)z⁻²)/(1 − z⁻¹ − z⁻²),
and
H22(z) = z⁻¹.
Hence, the transfer function of the right block is
H2(z) = H21(z) + H22(z) = (1 + (1/3)z⁻¹ − (1/4)z⁻²)/(1 − z⁻¹ − z⁻²) + z⁻¹.
Now, the resulting transfer function can be written in the form
H1(z) = (1 − 1.8z⁻¹ + 1.45z⁻²)/(1 − 1.7z⁻¹ + 1.285z⁻²)
H2(z) = (1 − 0.2z⁻¹ + 0.02z⁻²)/(1 − 0.1z⁻¹ + 0.125z⁻²)
since the zero-pole pairs with small values of imaginary parts should come
later. They are more sensitive to the quantization of coefficients and they
will more probably cause this kind of error. Larger imaginary parts of roots
are less sensitive to these effects. The cascade realization is presented in
Fig.6.40.
Solution 6.6. For a cascade realization, the form of the transfer function is
H(z) = 1/(1 − (1/2)z⁻¹ + (1/4)z⁻²) × (1 + z⁻¹)/(1 + (3/4)z⁻¹).
[Figures: the cascade realization of Solution 6.5 with coefficients 1.7, −1.8, 0.1, −0.2, and the cascade realization of Solution 6.6 with coefficients 1/2, −1/4, and −3/4.]
For the direct realization,
H(z) = (1 + z⁻¹) / (1 + (1/4)z⁻¹ − (1/8)z⁻² + (3/16)z⁻³),
with the difference equation
y(n) = x(n) + x(n − 1) − (1/4)y(n − 1) + (1/8)y(n − 2) − (3/16)y(n − 3).
[Figure: the parallel realization with coefficients 1/2, 1/19, −1/4, −3/19, and −3/4.]
Solution 6.7. The transfer function can be factored as
H(z) = H1(z)H2(z) = 1/(1 + z⁻¹) × (1 + z⁻²)/(1 + z⁻¹ + z⁻²).
This form corresponds to the cascade realization presented in Fig.6.43.
Solution 6.8. The z-transforms of these equations are
Y(z)(1 + (1/4)z⁻¹) + W(z)(1 + (1/2)z⁻¹) = (2/3)X(z)
Y(z)(1 − (5/4)z⁻¹) + 2W(z)(1 − z⁻¹) = −(5/3)X(z).
By eliminating W(z) we get
Y(z)[(2 + (1/2)z⁻¹)(1 − z⁻¹) − (1 − (5/4)z⁻¹)(1 + (1/2)z⁻¹)]
= X(z)[(4/3)(1 − z⁻¹) + (5/3)(1 + (1/2)z⁻¹)].
The transfer function is
H(z) = Y(z)/X(z) = (3 − (1/2)z⁻¹) / (1 − (3/4)z⁻¹ + (1/8)z⁻²),
with the difference equation
y(n) − (3/4)y(n − 1) + (1/8)y(n − 2) = 3x(n) − (1/2)x(n − 1).
The frequency response is
H(e^jω) = (3 − (1/2)e^(−jω)) / (1 − (3/4)e^(−jω) + (1/8)e^(−j2ω)).
Based on
H(z) = Y(z)/X(z) = (3 − (1/2)z⁻¹)/(1 − (3/4)z⁻¹ + (1/8)z⁻²) = 4/(1 − (1/2)z⁻¹) − 1/(1 − (1/4)z⁻¹),
the impulse response is h(n) = (4(1/2)ⁿ − (1/4)ⁿ)u(n).
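The partial-fraction result can be verified by running the difference equation with an impulse input and comparing with the closed-form response h(n) = (4(1/2)ⁿ − (1/4)ⁿ)u(n):

```python
import numpy as np

L = 20
h = np.zeros(L)
for n in range(L):
    x0 = 1.0 if n == 0 else 0.0              # x(n) = delta(n)
    x1 = 1.0 if n == 1 else 0.0              # x(n-1)
    h[n] = 3 * x0 - 0.5 * x1
    if n >= 1:
        h[n] += 0.75 * h[n - 1]              # + (3/4) y(n-1)
    if n >= 2:
        h[n] -= 0.125 * h[n - 2]             # - (1/8) y(n-2)

nn = np.arange(L)
h_closed = 4 * 0.5**nn - 0.25**nn
```

The recursion and the closed form agree sample by sample, confirming the residues 4 and −1 at the poles 1/2 and 1/4.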
H2(z) = −(r sin θ)z⁻¹ / (1 − r cos θ z⁻¹).
For the feedback holds
It produces
h ( n ) = h ( N − 1 − n ), 0 ≤ n ≤ N − 1
with N = 7, which implies phase function linearity. Thus, the group delay q
is
N−1
q= = 3.
2
Solution 6.11. We have that:
Y1 (e jω ) = H (e jω ) X (e jω )
R(e jω ) = Y1∗ (e jω ) = H ∗ (e jω ) X ∗ (e jω )
Y2 (e jω ) = R(e jω ) H (e jω ) = H ∗ (e jω ) H (e jω ) X ∗ (e jω )
Y (e jω ) = Y2∗ (e jω ) = H (e jω ) H ∗ (e jω ) X (e jω ).
So we get
Y (e jω ) = | H (e jω )|2 X (e jω ).
Obviously, the phase function of the system is equal to zero, for all ω.
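A sketch of this filter–reverse–filter–reverse scheme (illustrative h(n), not from the book): the equivalent impulse response is the convolution h(n) ∗ h(−n), which is symmetric, and its magnitude response is |H(e^jω)|², matching the result above.

```python
import numpy as np

h = np.array([1.0, 2.0, 0.5])        # illustrative causal impulse response
g = np.convolve(h, h[::-1])          # overall response: h(n) * h(-n), shifted
Hmag = np.abs(np.fft.fft(h, 64))
Gmag = np.abs(np.fft.fft(g, 64))
```

The symmetry of g makes the overall phase zero (up to the bookkeeping delay of the reversals), while the magnitude is squared, exactly as derived in the solution.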
Solution 6.12. (a) Values of the FIR filter, obtained by sampling the frequency response in the frequency domain, are
H(k) = Hd(e^jω)|ω=2πk/N.
The impulse response is
h(n) = IDFT{H(k)} = (1/N) ∑_{k=0}^{N−1} H(k)e^(j2πnk/N).
H (e jω ) = FT{h(n)}.
Its values are equal to the desired frequency response at the sampling points,
H(e^jω)|ω=2πk/N = Hd(e^jω)|ω=2πk/N.
(b) The inverse Fourier transform of the desired frequency response is
hd(n) = IFT{Hd(e^jω)} = sin(nπ/2)/(πn) + sin(3nπ/4)/(πn).
Using the N = 15 most significant samples in the discrete-time domain we get
h(n) = hd(n) for −7 ≤ n ≤ 7, and h(n) = 0 elsewhere,
or for N = 16,
h(n) = hd(n) for −8 ≤ n ≤ 7, and h(n) = 0 elsewhere.
The frequency response of this FIR filter is
H(e^jω) = FT{h(n)}.
It is shown in Fig.6.45.
(c) The errors along with the mean square absolute errors Er are
presented in Fig.6.46.
Figure 6.44 Design of a FIR filter by frequency sampling of the desired frequency response.
6.5 EXERCISE
Exercise 6.1. For a system described by the difference equation
y(n) = x(n) − (1/2)x(n − 1) + (1/3)x(n − 2) + y(n − 1) − (1/4)y(n − 2) − y(n − 3),
plot the direct realization I and II, parallel and cascade realization.
Ljubiša Stanković Digital Signal Processing 315
Figure 6.45 FIR filter design using N the most significant values of the impulse response.
Exercise 6.2. For a system whose transfer function is
H(z) = (z² − 2)/((z − 1)(z − 2)),
plot the direct I and II realization, the cascade realization, and the parallel realization.
Exercise 6.3. For a system whose transfer function is
H(z) = (3z⁻² + 6)/(z⁻³ − 2z⁻² + 3z⁻¹ − 6),
a) plot the direct realizations I and II, the cascade realization, and the parallel realization;
b) find ∑_{n=−∞}^{∞} h(n), where h(n) is the impulse response of the system.
Er = 0.037954 for the frequency response sampling and Er = 0.028921 for the IIR impulse response truncation.
Figure 6.46 Error in the case of the frequency response sampling (top) and the IIR impulse
response truncation (bottom), along with the corresponding mean square error (Er ) value.
Exercise 6.4. Find the impulse response of the discrete system presented in
Fig.6.47.
Exercise 6.5. Using the impulse invariance method with the sampling step ∆t = 0.1, transform the analog system given by the transfer function
H(s) = (1 + 5s)/(8 + 2s + 5s²)
into a discrete system, and plot the direct and cascade realization of the system. Is the obtained discrete system stable?
Exercise 6.6. Using the bilinear transform with the sampling step ∆t = 1, transform the system given by the transfer function
H(s) = (2 + s)/(8 + 2s + 5s²)
into a discrete system, and plot the direct and cascade realization of the system. Is the obtained discrete system stable?
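The bilinear transform in Exercise 6.6 can be sketched as polynomial algebra: substitute s = (2/∆t)(1 − z⁻¹)/(1 + z⁻¹) and clear denominators. The resulting coefficients and the stability check below are this sketch's own output under that substitution, not values quoted by the book:

```python
# Bilinear transform of H(s) = (2 + s)/(8 + 2s + 5s^2) with dt = 1,
# via the substitution s -> (2/dt)(1 - z^-1)/(1 + z^-1).

def poly_mul(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_pow(p, n):
    out = [1.0]
    for _ in range(n):
        out = poly_mul(out, p)
    return out

def bilinear(b, a, dt=1.0):
    # b, a: analog coefficients in ascending powers of s
    c = 2.0 / dt
    order = len(a) - 1
    def subst(coeffs):
        out = [0.0] * (order + 1)
        for i, ci in enumerate(coeffs):
            term = poly_mul(poly_pow([1.0, -1.0], i),        # (1 - z^-1)^i
                            poly_pow([1.0, 1.0], order - i))  # (1 + z^-1)^(order-i)
            for k, v in enumerate(term):
                out[k] += ci * (c ** i) * v
        return out
    return subst(b), subst(a)

bz, az = bilinear([2.0, 1.0], [8.0, 2.0, 5.0], dt=1.0)
# H(z) = (bz[0] + bz[1] z^-1 + ...)/(az[0] + az[1] z^-1 + ...)
```

Since the analog poles lie in the left half-plane, the bilinear transform maps them inside the unit circle, so the discrete system is stable.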
Exercise 6.7. Using the bilinear transform with the sampling step ∆t = 0.2, transform the analog system given by the transfer function
H(s) = (3s + 6)/((s + 1)(s + 3))
into a discrete system, and plot the direct realization II of the discrete system.

[Figure 6.47, referenced in Exercise 6.4: a block diagram of the discrete system with adders, delay elements z⁻¹, and multiplier coefficients 4, −1, −5, 0, 1/2, and 2.]
RANDOM signals are signals that cannot be described by deterministic mathematical functions. Their values are not known in advance. These signals can be described by stochastic tools only. Here we will restrict the analysis to discrete-time random signals. The first-order and the second-order statistics will be considered.
μx = (1/N)(x(1) + x(2) + ... + x(N)).
Example 7.1. Consider a random signal x (n) whose one realization is given
in Table 7.1. Find the mean value of this signal. Find how many samples of
the signal are within the intervals [1, 10], [11, 20],...,[91, 100]. Plot the number
of occurrences of signal x (n) samples within these intervals as a function of
the interval range.
⋆The realization of signal x (n) defined in Table 7.1 is presented in
Fig.7.1.
320 Discrete-Time Random Signals
Table 7.1
A realization of random signal
54 62 58 51 70 43 99 52 57 57
56 53 38 61 28 69 87 41 72 72
23 26 66 47 69 71 69 81 68 68
31 55 52 23 60 34 83 39 66 66
37 12 54 42 67 95 89 67 42 42
35 55 54 55 49 77 18 64 73 73
67 56 42 66 50 47 49 25 50 50
61 84 48 67 71 74 35 59 60 60
40 77 52 63 57 42 44 64 36 36
66 39 50 31 11 75 45 62 60 60
[Figure 7.1: the realization of the random signal x(n) given in Table 7.1, with the mean value mean(x) indicated.]
μx = (1/100) ∑_{n=1}^{100} x(n) = 55.76.
Figure 7.2 Histogram of random signal x (n) with 10 intervals [10i + 1, 10i + 10], i = 0, 1, 2, ..., 9.
From Table 7.1 or the graph in Fig. 7.1 we can count that, for example,
there is no signal sample whose value is within the interval [1, 10]. Within
[11, 20] there are two signal samples (x (42) = 12 and x (95) = 11). In a similar
way, the number of signal samples within other intervals are counted and
presented in Fig.7.2. This kind of random signal presentation is called a
histogram of x (n), with defined intervals.
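The counting performed in this example can be sketched as follows; the short sample list here is an assumed stand-in, not the full Table 7.1:

```python
# Counting occurrences in decade intervals [1,10], [11,20], ..., [91,100],
# as in Example 7.1 (shown on a short assumed sample).

def decade_histogram(values):
    counts = [0] * 10
    for v in values:
        counts[(v - 1) // 10] += 1      # interval index for 1 <= v <= 100
    return counts

sample = [54, 62, 58, 51, 70, 43, 99, 52, 57, 12]  # assumed sample values
counts = decade_histogram(sample)
mean = sum(sample) / len(sample)
```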
Example 7.2. For the signal x (n) from the previous example assume that a new
random signal y(n) is formed as
y(n) = int{(x(n) + 5)/10},
where int {◦} denotes the nearest integer. It means that y(n) = 1 for 1 ≤
x (n) ≤ 10, y(n) = 2 for 11 ≤ x (n) ≤ 20, ..., y(n) = i for 10(i − 1) + 1 ≤ x (n) ≤
10i up to i = 10. Plot the new signal y(n). What is the set of possible values of
y(n)? Present on a graph how many times each of the possible values of y(n)
appeared in this signal realization. Find the mean value of the new signal
y(n) and discuss the result.
⋆ The signal y(n) is shown in Fig.7.3. This signal assumes values from
the set {2, 3, 4, 5, 6, 7, 8, 9, 10}.
For the signal y(n), instead of histogram we can plot a diagram of the
number of occurrences of each value that y(n) can assume. It is presented in
[Figure 7.3: the signal y(n) with the mean value mean(y) indicated.]
μy = (1/100) ∑_{n=1}^{100} y(n) = 6.13.
The mean value can also be written, by grouping the same values of y(n), as
μy = (1/100)(1 · n₁ + 2 · n₂ + 3 · n₃ + ... + 10 · n₁₀) = 1 · (n₁/N) + 2 · (n₂/N) + 3 · (n₃/N) + ... + 10 · (n₁₀/N),
where N = 100 is the total number of signal values and ni is the number
showing how many times each of the values i appeared in y(n). If there is a
sufficient number of occurrences for each outcome value i then
Py(i) = nᵢ/N
can be considered as the probability that the value i appears. In that sense
Figure 7.4 Number of appearances of each possible value of y(n) (left) and the probabilities
that the random signal y(n) takes a value i = 1, 2, . . . , 10 (right).
with
∑_{i=1}^{10} Py(i) = 1.
In general, the mean for each signal sample could be different. For
example, if the signal values represent the highest daily temperature during
a year then the mean value is highly dependent on the considered sample.
In order to calculate the mean value of temperature, we have to have several
realizations of these random signals (measurements over M years), denoted
by { xi (n)}, where argument n = 1, 2, 3, ... is the cardinal number of the day
within a year and i = 1, 2, ..., M is the index of realization (year index). The
mean value is then calculated as
μx(n) = (1/M)(x₁(n) + x₂(n) + ... + x_M(n)) = (1/M) ∑_{i=1}^{M} xᵢ(n),    (7.2)
for each n. In this case we have a set (a signal) of mean values {µ x (n)}, for
n = 1, 2, ..., 365.
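Equation (7.2) in code, applied to small assumed realizations rather than the temperature data:

```python
# Ensemble mean over M realizations, one value per time index n (Eq. (7.2)).

def ensemble_mean(realizations):
    M = len(realizations)
    N = len(realizations[0])
    return [sum(x[n] for x in realizations) / M for n in range(N)]

xs = [[10, 20, 30],
      [14, 18, 34],
      [12, 22, 32]]          # assumed: M = 3 realizations, N = 3 samples each
mu = ensemble_mean(xs)       # one mean value per sample index n
```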
Example 7.3. Consider a signal x (n) with realizations given in Table 7.2. Its values
are equal to the monthly average of maximal daily temperatures in a city
measured from year 2001 to 2015. Find the mean temperature for each month
over the considered period of years. What is the mean value of temperature
over all months and years? What is the mean temperature for each year?
Table 7.2
Average of maximal temperatures value within months over 15 years, 2001-2015.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
10 4 18 17 22 29 30 28 27 17 17 5
6 7 11 23 22 32 35 33 22 26 22 8
10 11 10 16 21 26 32 31 23 19 17 4
3 11 13 19 22 26 34 29 26 22 12 9
7 10 13 21 27 29 30 34 24 20 16 11
7 11 17 17 27 25 37 34 33 22 14 14
7 12 13 19 23 32 34 38 21 21 12 10
12 5 9 20 21 37 34 34 27 22 20 7
7 12 13 23 27 33 29 31 25 21 6 11
8 12 10 17 27 33 38 32 23 20 15 9
8 10 13 24 23 33 33 31 27 21 16 8
4 6 15 18 25 26 27 33 23 23 13 11
3 6 16 17 27 28 30 32 29 24 12 10
11 12 14 18 22 29 34 34 23 21 20 11
6 13 8 22 22 29 30 34 23 18 15 8
⋆The signal for years 2001 to 2007 is presented in Fig.7.5. The mean temperature for the nth month, over the considered years, is
μx(n) = (1/15) ∑_{i=1}^{15} x_{20i}(n),
where the notation 20i is symbolic in the sense, 2001, 2002, ... 2015, for
i = 01, 02, ..., 15. The mean-value signal µ x (n) is presented in the last subplot
of Fig. 7.5. The mean value over all months and years is
μx = (1/(15 · 12)) ∑_{n=1}^{12} ∑_{i=1}^{15} x_{20i}(n) = 19.84.
The mean temperature for the year 20i is
μx(20i) = (1/12) ∑_{n=1}^{12} x_{20i}(n).
Figure 7.5 Several realizations of a random signal x20i (n), for i = 01, 02, ..., 07 and the mean
value µ x (n) for each sample (month) over 15 available realizations.
3) The sum of probabilities that x(n) takes any value ξᵢ over the set A of all possible values of ξ is a certain event. Its probability is 1,
∑_{ξ∈A} P_{x(n)}(ξ) = 1.
Example 7.4. Consider a random signal whose values are equal to the numbers ap-
pearing in a die tossing. The set of possible signal values is ξ i ∈ {1, 2, 3, 4, 5, 6}.
Find
Probability { x (n) = 2 or x (n) = 5}
and
Probability { x (n) = 2 and x (n + 1) = 5} .
⋆Events that x (n) = 2 and x (n) = 5 are obviously mutually exclusive. Thus,
Probability{x(n) = 2 or x(n) = 5} = P_{x(n)}(2) + P_{x(n)}(5) = 1/6 + 1/6 = 1/3.
The events that x (n) = 2 and x (n + 1) = 5 are statistically independent. In
this case
Probability{x(n) = 2 and x(n + 1) = 5} = P_{x(n)}(2) P_{x(n+1)}(5) = (1/6)(1/6) = 1/36.
Example 7.5. Assume that a signal x (n) length is N and that the number of samples
disturbed by an extremely high noise is I. The observation set of signal
samples is taken as a set of M < N randomly positioned signal samples. What
is the probability that within M randomly selected signal samples there are
no samples affected by the high noise? If N = 128, I = 16, and M = 32 find
how many sets of M samples without high noise can be expected in 1000
realizations (trials).
⋆Probability that the first randomly chosen sample is not affected by
the high noise could be calculated as a priori probability,
P(1) = (N − I)/N,
since there are N samples in total and N − I of them are noise-free. The probability that the first randomly chosen sample is not affected by the high noise and that, at the same time, the second randomly chosen sample is not affected by the high noise is equal to the product of their probabilities,
P(2) = ((N − I)/N) · ((N − 1 − I)/(N − 1)).
Here we used the so-called conditional probability property, stating that the probability that both events A and B occur is
P(AB) = P(A)P(B/A),
where P( A) is the probability that event A occurs, while P( B/A) denotes the
probability that event B occurs subject to the condition that event A already
occurred.
P(32) = 0.0112.
It means that if we repeat the whole procedure 1000 times (1000 realizations)
we can expect
P(32) × 1000 = 11.2,
i.e., about 11 realizations when none of M signal samples is disturbed by the
high noise.
The mean value is calculated as a sum over the set of possible ampli-
tudes, weighted by the corresponding probabilities,
μx(n) = E{x(n)} = ∑_{i=1}^{∞} ξᵢ P_{x(n)}(ξᵢ).    (7.4)

For continuous-amplitude random signals, with the probability density function p_{x(n)}(ξ),
∫_{−∞}^{∞} p_{x(n)}(ξ)dξ = 1.
The probability of an event that a value of signal x(n) is within a ≤ x(n) < b is
Probability{a ≤ x(n) < b} = ∫_{a}^{b} p_{x(n)}(ξ)dξ.
F(χ) = Probability{x(n) < χ} = ∫_{−∞}^{χ} p_{x(n)}(ξ)dξ.

μx(n) = E{x(n)} = ∫_{−∞}^{∞} ξ p_{x(n)}(ξ)dξ.    (7.6)
7.1.3 Median
If N is an even number then the median is defined as the mean value of the two samples of the sorted sequence s(n) nearest to (N − 1)/2,
median{x(n)} = [s(N/2) + s(N/2 + 1)]/2, for an even N.
The median will not be influenced by a possible small number of big outliers
(signal values being significantly different from the values of the rest of
data).
[Figure: the sorted signal values sort(x) with the median median(x) indicated.]
In some cases the number of big outliers is small. Thus the median will neglect many signal values that could produce a good estimate of the mean value. In such cases, the best choice would be to use not only the mid-value in the sorted signal, but several samples of the signal around its median, and to calculate their mean, for odd N, as
LSmean{x(n)} = (1/(2L + 1)) ∑_{i=−L}^{L} s((N + 1)/2 + i).
With L = (N − 1)/2 all signal values are used and LSmean{x(n)} is the standard mean of the signal. With L = 0 the value of LSmean{x(n)} is the median.
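The LSmean can be sketched directly from its definition; the data below are an assumed example with one large outlier:

```python
# L-smoothed mean (LSmean): average of the 2L+1 sorted values around the
# median, for odd N. L = 0 gives the median; L = (N-1)//2 the ordinary mean.

def ls_mean(x, L):
    s = sorted(x)
    mid = (len(s) - 1) // 2             # 0-based index of the middle sample
    return sum(s[mid - L: mid + L + 1]) / (2 * L + 1)

x = [3, 100, 1, 2, 4]                   # assumed data, N = 5, one outlier
```

Here ls_mean(x, 0) ignores the outlier entirely, while ls_mean(x, 2) is the ordinary mean dominated by it.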
7.1.4 Variance
For random signals that take values from a discrete set, with known probabilities, the variance is defined as
σx²(n) = E{|x(n) − μx(n)|²} = ∑ᵢ |ξᵢ − μx(n)|² P_{x(n)}(ξᵢ).
For a random signal x (n) whose values are available in M realizations the
variance can be estimated as a mean square deviation of the signal values
from their corresponding mean values µ x (n),
σx²(n) = (1/M)[|x₁(n) − μx(n)|² + ... + |x_M(n) − μx(n)|²].
For a small number of samples, this estimate tends to produce lower values
of the standard deviation. Thus, an adjusted version, the sample standard
deviation, is also used. It reads
σx(n) = sqrt{(1/(M − 1))[|x₁(n) − μx(n)|² + ... + |x_M(n) − μx(n)|²]}.
This form confirms the fact that in the case when only one sample is
available, M = 1, we should not be able to estimate the standard deviation.
For the case of random signals whose amplitude is continuous the variance, in terms of the probability density function p_{x(n)}(ξ), is
σx²(n) = ∫_{−∞}^{∞} |ξ − μx(n)|² p_{x(n)}(ξ)dξ.
Table 7.3
Random signal z(n)
55 57 56 54 59 52 66 54 56 56
55 55 51 56 48 59 63 52 59 59
47 48 58 53 58 59 59 61 58 58
49 55 54 47 56 50 62 51 58 58
50 44 55 50 58 58 63 58 52 52
50 55 55 55 53 60 46 57 59 59
58 55 58 58 54 53 54 48 54 54
57 62 53 58 59 60 50 56 56 56
51 60 54 57 55 52 52 57 50 50
58 51 54 49 44 60 52 57 56 56
[Figure: the realization of the random signal z(n) given in Table 7.3, with the mean value mean(z) indicated.]
Example 7.7. For the signal x (n) from Example 7.1 calculate the mean and vari-
ance. Compare it with the mean and variance of the signal z(n) given in Table
7.3.
⋆The mean value and variance for signal x (n) are µ x = 55.76 and
σx2 = 314.3863. The standard deviation is σx = 17.7309. It is a measure of signal
value deviations from the mean value. For the signal z(n) the mean value
is µz = 55.14 (very close to µ x ), while the variance is σz2 = 18.7277 and the
standard deviation is σz = 4.3275. Deviations of z(n) from the mean value
are much smaller. If signals x (n) and z(n) were measurements of the same
physical value, then the individual measurements from z(n) would be much
more reliable than the individual measurements from x (n).
Example 7.8. A random signal x(n) can take values from the set {0, 1, 2, 3, 4, 5}. It is known that for k = 0, 1, 2, 3, 4 the probability of x(n) = k is twice as high as the probability of x(n) = k + 1. Find the probabilities P{x(n) = k}. Find the mean value and variance of the signal.
⋆Assume that P{ x (n) = 5} = A. Then the probabilities that x (n) takes
a value k are
k 0 1 2 3 4 5
P{ x (n) = k} 32A 16A 8A 4A 2A A
Constant A can be found from ∑k P{ x (n) = k } = 1. It results in A = 1/63.
Now we have
μ_{x(n)} = ∑_k kP{x(n) = k} = 19/21
and
σ²_{x(n)} = ∑_k (k − 19/21)² P{x(n) = k} = 626/441.
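A cross-check of Example 7.8 with exact rational arithmetic:

```python
# Mean and variance of the discrete distribution in Example 7.8:
# P{x = k} is proportional to 2^(5-k) for k = 0..5, normalized by A = 1/63.
from fractions import Fraction

A = Fraction(1, 63)
P = {k: A * 2 ** (5 - k) for k in range(6)}
mu = sum(k * P[k] for k in P)                   # mean value
var = sum((k - mu) ** 2 * P[k] for k in P)      # variance
```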
Example 7.9. Consider a real-valued random signal x (n) with samples whose
values are uniformly distributed over interval −1 ≤ x (n) ≤ 1. a) Find the
mean value and variance of the signal samples. b) Signal y(n) is obtained as
y(n) = x2 (n). Find the mean value and variance of signal y(n).
⋆Since the random signal x(n) is uniformly distributed, its probability density function is of the form
p_{x(n)}(ξ) = A for |ξ| ≤ 1, and p_{x(n)}(ξ) = 0 for |ξ| > 1.
Constant A = 1/2 is obtained from ∫_{−∞}^{∞} p_{x(n)}(ξ)dξ = 1. Now we have
μ_{x(n)} = ∫_{−∞}^{∞} ξ p_{x(n)}(ξ)dξ = ∫_{−1}^{1} (1/2)ξ dξ = 0,
σ²_{x(n)} = ∫_{−∞}^{∞} (ξ − μ_{x(n)})² p_{x(n)}(ξ)dξ = ∫_{−1}^{1} (1/2)ξ² dξ = 1/3.
b) The probability density function of y(n) = x²(n) is p_{y(n)}(ζ) = 1/(2√ζ) for 0 < ζ ≤ 1, so that
μ_{y(n)} = ∫_{0}^{1} ξ (1/(2√ξ)) dξ = 1/3,
σ²_{y(n)} = ∫_{0}^{1} (ξ − 1/3)² (1/(2√ξ)) dξ = 4/45.
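A numeric cross-check of Example 7.9, using midpoint Riemann sums over the uniform density instead of closed-form integration (the grid size K is an assumption of this sketch):

```python
# x uniform on [-1,1]: mu = 0, var = 1/3; y = x^2: mu = 1/3, var = 4/45.
# Midpoint Riemann sums over the density p(x) = 1/2 on [-1, 1].

K = 20000
dx = 2.0 / K
xs = [-1.0 + (i + 0.5) * dx for i in range(K)]   # midpoints of the grid
p = 0.5                                          # uniform density value

mu_x = sum(x * p * dx for x in xs)
var_x = sum((x - mu_x) ** 2 * p * dx for x in xs)
mu_y = sum(x ** 2 * p * dx for x in xs)                  # E{x^2}
var_y = sum((x ** 2 - mu_y) ** 2 * p * dx for x in xs)   # E{(x^2 - 1/3)^2}
```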
where p_{x(n),y(m)}(ξ, ζ) is the joint probability density function. The probability of an event a ≤ x(n) < b and c ≤ y(m) < d is
Probability{a ≤ x(n) < b, c ≤ y(m) < d} = ∫_{a}^{b} ∫_{c}^{d} p_{x(n),y(m)}(ξ, ζ)dξdζ.
F(θ) = P{s(n) < θ} = Probability{−∞ < a(n) < ∞, −∞ < a(n) + b(n) < θ}
= ∫_{−∞}^{∞} ∫_{−∞}^{θ−ζ} p_{a(n),b(n)}(ξ, ζ)dξdζ = ∫_{−∞}^{∞} p_{b(n)}(ζ) ∫_{−∞}^{θ−ζ} p_{a(n)}(ξ)dξ dζ.

p_{s(n)}(θ) = dF(θ)/dθ = (d/dθ) ∫_{−∞}^{∞} p_{b(n)}(ζ) ∫_{−∞}^{θ−ζ} p_{a(n)}(ξ)dξ dζ
= ∫_{−∞}^{∞} p_{b(n)}(ζ) p_{a(n)}(θ − ζ)dζ = p_{b(n)}(θ) ∗_θ p_{a(n)}(θ),
p x ( n ) ( θ ) = p c ( n ) ( θ ) ∗ θ p b ( n ) ( θ ) ∗ θ p a ( n ) ( θ ),
p_{x(n)}(θ) = (θ + 3)²/16 for −3 ≤ θ ≤ −1,
p_{x(n)}(θ) = (3 − θ²)/8 for −1 < θ ≤ 1,
p_{x(n)}(θ) = (θ − 3)²/16 for 1 < θ ≤ 3,
p_{x(n)}(θ) = 0 for |θ| > 3.
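The convolution property behind this result can be checked numerically by discretizing the densities (the step dx below is an assumption of this sketch):

```python
# The density of a sum of independent variables is the convolution of their
# densities: convolve three uniform densities on [-1,1] numerically and
# compare with the piecewise formula for p_x(theta).

dx = 0.01
n = int(2 / dx) + 1                     # samples of one uniform density
u = [0.5] * n                           # p = 1/2 on [-1, 1]

def conv(a, b, dx):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj * dx  # discretized convolution integral
    return out

p3 = conv(conv(u, u, dx), u, dx)        # density of the sum, on [-3, 3]
center = (len(p3) - 1) // 2             # index of theta = 0
```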
The mean value and variance can be calculated from p_{x(n)}(θ) or in a direct way.
r_xx(n, m) = E{x(n)x*(m)} = (1/M) ∑_{i=1}^{M} xᵢ(n)xᵢ*(m).    (7.8)
µ x (n) = E{ x (n)} = µ x
r xx (n, m) = E{ x (n) x ∗ (m)} = r xx (n − m). (7.14)
A signal is stationary in the strict sense (SSS) if all order statistics are
invariant to a shift in time. The relations introduced for the second-order
statistics may be extended to the higher-order statistics. For example, the
third-order moment of a signal x (n) is defined by
μx(n) = lim_{M→∞} (1/M)(x₁(n) + x₂(n) + ... + x_M(n))
= lim_{N→∞} (1/N)(xᵢ(n) + xᵢ(n − 1) + ... + xᵢ(n − N + 1)).
Consider the signal
x(n) = ∑_{k=1}^{K} a_k e^{j(ω_k n + θ_k)},
where θ_k are random variables uniformly distributed over −π < θ_k ≤ π. All random variables are statistically independent. Frequencies ω_k satisfy −π < ω_k ≤ π for each k. Find the mean value, the autocorrelation, and the power spectral density of x(n).
⋆The mean value is
μx = ∑_{k=1}^{K} a_k E{e^{j(ω_k n + θ_k)}} = ∑_{k=1}^{K} a_k (1/(2π)) ∫_{−π}^{π} e^{j(ω_k n + θ_k)} dθ_k = 0.
The autocorrelation is
r_xx(n) = E{ ∑_{k=1}^{K} a_k e^{j(ω_k(n+m)+θ_k)} ∑_{l=1}^{K} a_l e^{−j(ω_l m+θ_l)} } = ∑_{k=1}^{K} a_k² e^{jω_k n},
while the power spectral density for −π < ω ≤ π is
S_xx(e^{jω}) = FT{r_xx(n)} = 2π ∑_{k=1}^{K} a_k² δ(ω − ω_k).
Recall that the average power of a signal x(n) has been defined as
P_AV = lim_{N→∞} (1/(2N + 1)) ∑_{n=−N}^{N} |x(n)|².
P_xx(e^{jω}) = lim_{N→∞} (1/(2N + 1)) E{|X_N(e^{jω})|²}    (7.19)
= lim_{N→∞} (1/(2N + 1)) E{ |∑_{n=−N}^{N} x(n)e^{−jωn}|² }.
Different notation is used since the previous two definitions, (7.16) and
(7.19) of power spectral density, will not produce the same result, in general.
We can write
P_xx(e^{jω}) = lim_{N→∞} (1/(2N + 1)) E{ ∑_{m=−N}^{N} ∑_{n=−N}^{N} x(m)x*(n)e^{−jω(m−n)} }.
P_xx(e^{jω}) = lim_{N→∞} ∑_{k=−2N}^{2N} r_xx(k)e^{−jωk} = FT{r_xx(n)} = S_xx(e^{jω}).
7.3 NOISE
The power spectral density of this kind of noise is constant (as is the case with white light). If this property is not satisfied, the power spectral density is not constant, and such a noise is referred to as colored.
Regarding the distribution of the noise ε(n) amplitudes, the most common types of noise in signal processing are: uniform, binary, Gaussian, and impulsive noise.
p_{ε(n)}(ξ) = 1/∆, for −∆/2 ≤ ξ < ∆/2    (7.21)
Figure 7.8 A realization of uniform noise (left) with probability density function (right) with
∆ = 0.5.
σε² = ∫_{−∆/2}^{∆/2} ξ² p_{ε(n)}(ξ)dξ = ∆²/12.
με = ∑_{ξ=−1,1} ξP_x(ξ) = (−1)(1 − p) + 1 · p = 2p − 1.
The variance is
σε² = (−1 − (2p − 1))²(1 − p) + (1 − (2p − 1))² p = 4p(1 − p).
that is when p = 1/2. Then we get µε = 0 and σε2 = 1.
Example 7.12. Consider a set of N → ∞ balls. Equal number of balls is marked with
1 (or white) and 0 (or black). A random signal x (n) corresponds to drawing of
four balls in a row. It has four values x (0), x (1), x (2), and x (3). Signal values
x (n) are equal to the marks on the drawn balls. Write all possible realizations
of x (n). If k is the number of appearances of value 1 in the signal, write the
probabilities for each value of k.
x(0) 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
x(1) 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
x(2) 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
x(3) 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
k    0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4
The probabilities of k = 0, 1, 2, 3, 4 appearances of value 1 are 1/16, 4/16, 6/16, 4/16, and 1/16, respectively, corresponding to the terms of
(a + b)⁴ = (4 choose 0)a⁴ + (4 choose 1)a³b + (4 choose 2)a²b² + (4 choose 3)ab³ + (4 choose 4)b⁴
with a = 1/2 and b = 1/2. For the case when N is a finite number see Problem 7.6.
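The enumeration in Example 7.12 can be reproduced by brute force:

```python
# Enumerate all 2^4 equally likely outcomes of drawing four balls and count
# how often k ones appear; the counts 1, 4, 6, 4, 1 over 16 match the
# binomial coefficients in the expansion of (a + b)^4.
from itertools import product

counts = [0] * 5
for outcome in product([0, 1], repeat=4):
    counts[sum(outcome)] += 1           # k = number of ones in the outcome

probs = [c / 16 for c in counts]        # P{k ones}, k = 0..4
```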
7.6.
An interesting form of the random variable that can assume only two
possible values {−1, 1} or {No, Yes} or { A, B} is the binomial random
variable. It has been introduced through the previous simple example. In
general, if a signal x (n) assumes value B from the set { A, B} with probability
p, then the probability that there is exactly k values of B in a sequence of N
samples of x (n) is
P(k) = (N choose k) pᵏ(1 − p)^{N−k} = (N!/(k!(N − k)!)) pᵏ(1 − p)^{N−k}.
μy = E{y} = ∑_{k=0}^{N} kP(k) = ∑_{k=0}^{N} k (N!/(k!(N − k)!)) pᵏ(1 − p)^{N−k}.
Since the first term in the summation is 0 we will shift the summation by one and reindex it to
μy = E{y} = ∑_{k=0}^{N−1} (k + 1) (N(N − 1)!/((k + 1)!(N − (k + 1))!)) p^{k+1}(1 − p)^{N−(k+1)}
= Np ∑_{k=0}^{N−1} ((N − 1)!/(k!((N − 1) − k)!)) pᵏ(1 − p)^{(N−1)−k}.
1 = (p + (1 − p))^{N−1} = ∑_{k=0}^{N−1} ((N − 1) choose k) pᵏ(1 − p)^{(N−1)−k} = ∑_{k=0}^{N−1} ((N − 1)!/(k!((N − 1) − k)!)) pᵏ(1 − p)^{(N−1)−k},
µy = E{y} = N p.
As we could have written from the beginning, the expected value of the number of appearances of an event B, whose probability is p, in N realizations is E{y} = Np. This derivation was performed not only to prove this fact, but because it leads us to the next step in deriving the variance of y, using the expected value of the product of y and y − 1,
E{y(y − 1)} = ∑_{k=0}^{N} k(k − 1)P(k) = ∑_{k=0}^{N} k(k − 1) (N!/(k!(N − k)!)) pᵏ(1 − p)^{N−k}.
Since the first two terms are 0 we can reindex the summation into
E{y(y − 1)} = ∑_{k=0}^{N−2} (k + 2)(k + 1) (N!/((k + 2)!(N − 2 − k)!)) p^{k+2}(1 − p)^{N−2−k}
= N(N − 1)p² ∑_{k=0}^{N−2} ((N − 2)!/(k!(N − 2 − k)!)) pᵏ(1 − p)^{N−2−k}.
The relation
∑_{k=0}^{N−2} ((N − 2)!/(k!(N − 2 − k)!)) pᵏ(1 − p)^{N−2−k} = (p + (1 − p))^{N−2} = 1
is used to get
E{y(y − 1)} = N ( N − 1) p2 .
The variance of y follows from
σy² = E{y²} − μy² = E{y(y − 1)} + E{y} − μy² = N(N − 1)p² + Np − N²p² = Np(1 − p).
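The derived binomial moments can be cross-checked by direct summation over the distribution (N = 20 and p = 0.3 are assumed example values):

```python
# Direct numeric check of the binomial moments: E{y} = Np and, via
# E{y(y-1)} = N(N-1)p^2, the variance Np(1-p).
from math import comb

def binomial_moments(N, p):
    P = [comb(N, k) * p ** k * (1 - p) ** (N - k) for k in range(N + 1)]
    mean = sum(k * P[k] for k in range(N + 1))
    e_yy1 = sum(k * (k - 1) * P[k] for k in range(N + 1))
    var = e_yy1 + mean - mean ** 2
    return mean, var

mean, var = binomial_moments(N=20, p=0.3)
```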
Figure 7.9 A realization of Gaussian noise (left) with probability density function (right).

Probability{|ε(n)| < λ} = (1/(σε√(2π))) ∫_{−λ}^{λ} e^{−ξ²/(2σε²)} dξ = erf(λ/(√2 σε)),    (7.23)
where
erf(λ) = (2/√π) ∫_{0}^{λ} e^{−ξ²} dξ
is the error function.
Commonly used probabilities that the absolute value of the noise is
within the standard deviation, two standard deviations (two-sigma rule),
or three standard deviations are:
Probability{−σε < ε(n) < σε} = erf(1/√2) = 0.6827,    (7.24)
Probability{−2σε < ε(n) < 2σε} = erf(√2) = 0.9545,
Probability{−3σε < ε(n) < 3σε} = erf(3/√2) = 0.9973.
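These probabilities follow directly from the standard-library error function:

```python
# The sigma-rule probabilities (7.24) computed with math.erf.
from math import erf, sqrt

p1 = erf(1 / sqrt(2))      # within one standard deviation
p2 = erf(2 / sqrt(2))      # within two standard deviations
p3 = erf(3 / sqrt(2))      # within three standard deviations
```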
Figure 7.10 Probability density function with intervals corresponding to −σε < ε(n) < σε ,
−2σε < ε(n) < 2σε, and −3σε < ε(n) < 3σε. The value σε = 1 is used.

P = (1/(1.031√(2π))) ∫_{−2.5}^{2.5} e^{−ξ²/(2·1.031²)} dξ = erf(2.5/(√2 · 1.031)) = 0.9847.
p_{x(n), n≠n₀}(ξ) = (1/(σε√(2π))) e^{−ξ²/(2σε²)}.
The probability that any of these samples is smaller than a value λ could be defined by using (7.23). The probability that all N − 1 of them are smaller than λ is
P_{N−1}(λ) = Probability{all N − 1 values of x(n) < λ, n ≠ n₀} = [0.5 + 0.5 erf(λ/(√2 σε))]^{N−1}.
Example 7.15. Random signal x(n) is a Gaussian noise with the mean μx = 1 and variance σx² = 1. A random sequence y(n) is obtained by omitting samples from signal x(n) that are either negative or higher than 1. Find the probability density function of sequence y(n). Find its μy and σy.
⋆The probability density function for the sequence y(n) is
p_{y(n)}(ζ) = B (1/√(2π)) e^{−(ζ−1)²/2} for 0 < ζ ≤ 1, and 0 otherwise.
Constant B can be calculated from ∫_{−∞}^{∞} p_{y(n)}(ζ)dζ = 1, resulting in
B = 2/erf(1/√2).
Now we have
μ_{y(n)} = (2/erf(1/√2)) ∫_{0}^{1} ζ (1/√(2π)) e^{−(ζ−1)²/2} dζ = 1 − √2(1 − e^{−1/2})/(√π erf(1/√2)) ≈ 0.54
and
σ²_{y(n)} = (2/erf(1/√2)) ∫_{0}^{1} (ζ − μ_{y(n)})² (1/√(2π)) e^{−(ζ−1)²/2} dζ ≈ 0.08.
Example 7.16. Consider a random signal x (n) that can assume values
{No, Yes} with probabilities 1 − p and p. If a random realization of this
signal is available with N = 1000 samples and we obtained that the event
Yes appeared 555 times find the interval where the true p will be with
probability of 0.95. Denote by y the number of observed Yes values divided
by N. We can assume that the mean value estimates for various realizations
are Gaussian distributed.
⋆This is a binomial random variable with the mean p and the variance
σy² = p(1 − p)/N ≈ (555/1000)(1 − 555/1000)/1000 = 0.2470/1000,
σy = 0.0157.
With ξ = ρ cos α and ζ = ρ sin α (the Jacobian of the polar coordinate transformation is J = |ρ|) we get
P{√(ε_r²(n) + ε_i²(n)) < χ} = (1/(σ²π)) ∫_{0}^{2π} ∫_{0}^{χ} e^{−ρ²/σ²} ρ dρ dα
= (2/σ²) ∫_{0}^{χ} e^{−ρ²/σ²} ρ dρ = ∫_{0}^{χ²/σ²} e^{−λ} dλ = (1 − e^{−χ²/σ²})u(χ) = F_{|ε(n)|}(χ).
p_{|ε(n)|}(ξ) = dF_{|ε(n)|}(ξ)/dξ = (2ξ/σ²) e^{−ξ²/σ²} u(ξ).    (7.27)

p_y(ξ) = (2ξ/σ²) e^{−ξ²/σ²} u(ξ).
The probability that y(n) ≥ A is
P{ξ > A} = 1 − P{ξ ≤ A} = e^{−A²/σ²}.
This noise is used to model disturbances when strong impulses occur more often than in the case of a Gaussian noise. Due to possible stronger pulses, the probability density function decays toward ±∞ more slowly than in the case of Gaussian noise.
The Laplacian noise has the probability density function
p_{ε(n)}(ξ) = (1/(2α)) e^{−|ξ|/α}.
Figure 7.11 The Gaussian and Laplacian noise histograms (with 10000 realizations), with
corresponding probability density function (dots).
The Cauchy distributed noise ε(n) is a random signal that can be obtained as a ratio of two independent Gaussian random signals ε₁(n) and ε₂(n), i.e., as
ε(n) = ε₁(n)/ε₂(n).
In the case of noisy signals the noise can be added to the signal s(n). Then we have
x(n) = s(n) + ε(n).
This is an additive noise. A noise can also be multiplicative; for a deterministic signal s(n),
x(n) = (1 + ε(n))s(n).
In this case both the mean and the variance of x(n) are signal dependent.
X(k) = ∑_{n=0}^{N−1} (s(n) + ε(n)) e^{−j2πkn/N} = S(k) + Ξ(k).    (7.29)
µ X (k ) = E{ X (k )} = S(k ). (7.30)
we get
σX2 (k ) = σε2 N. (7.32)
If the deterministic signal is a complex sinusoid, s(n) = Ae^{jω₀n}, with a frequency adjusted to the grid, ω₀ = 2πk₀/N, then its DFT is
S(k) = ANδ(k − k₀).
The peak signal-to-noise ratio is a relevant parameter for the DFT-based estimation of frequency.
Figure 7.12 Illustration of a signal x (n) = cos(6πn/64) and its DFT (top row); the same signal
corrupted with additive zero-mean real-valued Gaussian noise of variance σε2 = 1/4, along
with its DFT (bottom row).
SNR_in = E_x/E_ε = (∑_{n=0}^{N−1} |x(n)|²) / (∑_{n=0}^{N−1} E{|ε(n)|²}) = NA²/(Nσε²) = A²/σε².    (7.35)
If the maximal DFT value is detected then only its value could be used for
the signal reconstruction (equivalent to the notch filter at k = k0 being used).
The DFT of output signal is then
Y ( k ) = X ( k ) δ ( k − k 0 ).
SNR_out = E_x/E_ε = (∑_{n=0}^{N−1} |x(n)|²) / (∑_{n=0}^{N−1} E{|(Ξ(k₀)/N) e^{j2πk₀n/N}|²})
= NA²/(N · Nσε²/N²) = N(A²/σε²) = N · SNR_in.
Taking 10 log(◦) of both sides we get the signal-to-noise ratio relation in dB,
SNR_out [dB] = SNR_in [dB] + 10 log N.
Example 7.18. If the DFT of a noisy signal s(n) + ε(n) is calculated using a window
function w(n), find its mean and variance. Noise is white, rεε = σε2 δ(n), with
zero-mean.
⋆Here,
X(k) = ∑_{n=0}^{N−1} w(n)[s(n) + ε(n)] e^{−j2πkn/N}.
Example 7.19. The DFT definition, for a given frequency index k, can be under-
stood as
X(k) = ∑_{n=0}^{N−1} (s(n) + ε(n)) e^{−j2πkn/N} = N mean_{n=0,1,...,N−1}{(s(n) + ε(n)) e^{−j2πkn/N}}    (7.38)
The median-based estimate
X(k) = N median_{n=0,1,...,N−1}{(s(n) + ε(n)) e^{−j2πkn/N}}    (7.39)
can produce better results than (7.38). Calculate the value X(0) using (7.38) and estimate it by (7.39) for s(n) = exp(j4πn/N) with N = 8 and noise ε(n) = 2001δ(n) − 204δ(n − 3). Which one is closer to the noise-free DFT value?
⋆If we can expect strong impulsive noise then the mean value will be highly sensitive to this noise. The median-based calculation is less sensitive to strong impulsive noise.
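Example 7.19 can be checked numerically; following the book's solutions, the median of a complex sequence is taken here as the median of the real parts plus j times the median of the imaginary parts:

```python
# X(0) for s(n) = exp(j4*pi*n/N), N = 8, with impulsive noise
# 2001*delta(n) - 204*delta(n-3). The mean-based DFT (7.38) is wrecked by
# the impulses; the median-based estimate (7.39) is not.
import cmath
import statistics

N = 8
x = [cmath.exp(4j * cmath.pi * n / N) for n in range(N)]
x[0] += 2001
x[3] -= 204

k = 0
v = [x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)]
X0_mean = N * (sum(v) / N)                        # ordinary DFT value
X0_med = N * complex(statistics.median(z.real for z in v),
                     statistics.median(z.imag for z in v))
```

The noise-free X(0) is 0; the mean-based value lands at 1797 while the median-based estimate stays at 0.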
Consider a set of data x (n), for 0 ≤ n ≤ N − 1. Assume that this set of data
are noisy samples of signal s(n) = Ae j2πk0 n/N . Additive noise ε(n) is white,
complex-valued Gaussian with zero-mean independent real and imaginary
parts and variance σε2 . The aim is to find the signal s(n) parameters from
the noisy observations x (n). Since the signal form is known we look for
a solution of the same form, using the model be j2πkn/N where b and k are
parameters that have to determined, and α = {b, k } is the set of parameters.
Parameter b is complex-valued. It includes amplitude and initial phase of
the model. For each value of x (n) we may define an error as a difference of
the given value x (n) and the assumed model, at the considered instant n,
Since the noise is Gaussian, the probability density function of the error is
p(e(n, α)) = (1/(2πσε²)) e^{−|e(n,α)|²/(2σε²)}.
The joint probability density function for all samples from the data set is
equal to the product of individual probability density functions
p_e(e(0, α), e(1, α), ..., e(N − 1, α)) = (1/(2πσε²)^N) e^{−∑_{n=0}^{N−1}|e(n,α)|²/(2σε²)}.
ϵ(α) = ∑_{n=0}^{N−1} |e(n, α)|² = ∑_{n=0}^{N−1} |x(n) − be^{j2πkn/N}|².    (7.42)
b = (1/N) ∑_{n=0}^{N−1} x(n)e^{−j2πkn/N} = mean{x(n)e^{−j2πkn/N}} = X(k)/N.
A specific value of parameter k that minimizes ϵ(α) and gives the estimate
of the signal frequency index k0 is obtained by replacing the obtained b back
into relation (7.42) defining ϵ(α),
ϵ(α) = ∑_{n=0}^{N−1} |x(n) − be^{j2πkn/N}|² = ∑_{n=0}^{N−1} |x(n)|² − N|b|².
If the additive noise were, for example, Laplacian then the probability density function would be p(e(n, α)) = (1/(2σε)) e^{−|e(n,α)|/σε}, and the solution of the ϵ(α) = ∑_{n=0}^{N−1} |e(n, α)| minimization would follow from
X(k) = N median{x(n)e^{−j2πkn/N}}.
where ε(n) is a complex zero-mean Gaussian white noise with independent real and imaginary parts, with variance σε². Its DFT is
X(k) = ∑_{n=0}^{N−1} (s(n) + ε(n)) e^{−j2πkn/N} = NAδ(k − k₀) + Ξ(k),
with σ_X²(k) = σε²N and E{Ξ(k)} = 0. The real and imaginary parts of the DFT X(k₀) at the signal position k = k₀ are Gaussian random variables, with total variance σε²N.
Next, we will find the probability that a DFT value of noise at any
k ̸= k0 is higher than the signal DFT value at k = k0 . This case corresponds
q(ξ) = (2ξ/(σε²N)) e^{−ξ²/(σε²N)}, ξ ≥ 0.
The DFT at a noise-only position takes a value greater than Ξ with probability
Q(Ξ) = ∫_{Ξ}^{∞} (2ξ/(σε²N)) e^{−ξ²/(σε²N)} dξ = exp(−Ξ²/(σε²N)).    (7.44)
The probability that a DFT of noise only is lower than Ξ is [1 − Q(Ξ)]. The total number of noise-only points in the DFT is M = N − 1. The probability that M independent DFT noise-only values are lower than Ξ is [1 − Q(Ξ)]^M. The probability that at least one of M DFT noise-only values is greater than Ξ is
G(Ξ) = 1 − [1 − Q(Ξ)]^M.
The probability density function for the absolute DFT values at the
position of the signal (whose real and imaginary parts are described by
(7.43)) is Rice-distributed
p(ξ) = (2ξ/(σε²N)) e^{−(ξ² + N²A²)/(σε²N)} I₀(2NAξ/(σε²N)), ξ ≥ 0.    (7.46)
The probability of the error event is then
P_E = ∫_{0}^{∞} G(ξ)p(ξ)dξ = ∫_{0}^{∞} (1 − [1 − exp(−ξ²/(σε²N))]^M) (2ξ/(σε²N)) e^{−(ξ² + N²A²)/(σε²N)} I₀(2NAξ/(σε²N)) dξ.    (7.47)
R_yy(z) = R_xx(z)H(z)H*(1/z*).
The input signal is a zero-mean white noise ε(n) with variance σε2 . Find the
cross-correlation of the input and output signal and the autocorrelation of
the output signal. For a = −1 find the power spectral density of the output
signal.
⋆The system transfer function is H(z) = 1 + az⁻¹ + a²z⁻².
Since the input signal is a white noise of variance σε2 its autocorrelation, by
definition, is
r xx (n) = rεε (n) = σε2 δ(n).
The power spectral density of the input signal is
S_xx(ω) = ∑_{n=−∞}^{∞} r_xx(n)e^{−jωn} = σε².
The z-transform of the autocorrelation function of the output signal, for a linear time-invariant system, gives
S_yy(ω) = R_yy(e^{jω}) = σε²(1 + a² + a⁴ + 2a(1 + a²) cos ω + 2a² cos(2ω)),
while the z-transform of the cross-correlation of the input and output signal
is
Ryx (z) = H (z) R xx (z) = (1 + az−1 + a2 z−2 )σε2 .
Its inverse z-transform is the cross-correlation,
$$r_{yx}(n) = \sigma_\varepsilon^2\left(\delta(n) + a\,\delta(n-1) + a^2\,\delta(n-2)\right).$$
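The cross-correlation result $r_{yx}(n) = \sigma_\varepsilon^2 h(n)$ is easy to verify numerically. A minimal sketch, with an illustrative value of $a$ and sample count:

```python
import numpy as np

# Numerical check that r_yx(n) = sigma^2 * h(n) for white noise through
# H(z) = 1 + a z^-1 + a^2 z^-2, the system of this example.
rng = np.random.default_rng(1)
a, L = 0.5, 200_000
eps = rng.standard_normal(L)            # white noise, sigma_eps^2 = 1
h = np.array([1.0, a, a**2])
y = np.convolve(eps, h)[:L]             # y(n) = sum_k h(k) eps(n-k)
# estimate r_yx(n) = E{y(m) eps(m-n)} for lags n = 0, 1, 2
r_yx = np.array([np.mean(y[n:] * eps[:L - n]) for n in range(3)])
```

The three estimated lags approach $1$, $a$, and $a^2$ (here $1$, $0.5$, $0.25$) as the number of samples grows.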
with the input signal x (n) = ε(n), µε = 0 and rεε (n) = δ(n), find:
a) Mean value µy (n) and autocorrelation ryy (n) of the output signal,
b) Power spectral density functions Syy (ω ) and Syx (ω ).
⋆a) The mean value of the output signal is
$$\mu_y = \mu_x H(e^{j0}) = \mu_\varepsilon H(e^{j0}) = 0.$$
$$R_{yy}(z) = \frac{1}{(1 - 0.9z^{-1})(1 - 0.4z^{-1})(1 - 0.9z)(1 - 0.4z)}$$
or
$$R_{yy}(z) = \frac{25}{8}\left[\frac{z}{(z - 0.4)(z - 1/0.4)} - \frac{z}{(z - 0.9)(z - 1/0.9)}\right].$$
The inverse z-transform of $R_{yy}(z)$ is
$$r_{yy}(n) = \frac{25}{8}\left[\frac{0.9}{0.19}(0.9)^{|n|} - \frac{0.4}{0.84}(0.4)^{|n|}\right].$$
The power spectral density of the output signal is
$$S_{yy}(\omega) = R_{yy}(z)\big|_{z = e^{j\omega}} = \frac{1}{(1.16 - 0.8\cos\omega)(1.81 - 1.8\cos\omega)},$$
while the cross-power spectral density function Syx (ω ) can be defined as the
value of Ryx (z) at z = e jω
Example 7.22. A white noise ε(n) with variance σε2 and zero mean is an input to
a linear time-invariant system. If the impulse response of the system is h(n)
show that
$$E\{x(n)y(n)\} = h(0)\sigma_\varepsilon^2$$
and
$$\sigma_y^2 = \sigma_\varepsilon^2 \sum_{n=-\infty}^{\infty} |h(n)|^2 = \sigma_\varepsilon^2 E_h.$$
⋆Since $y(n) = \sum_{k=-\infty}^{\infty} h(k)x(n-k)$ and
$$r_{xx}(n) = \sigma_\varepsilon^2\,\delta(n),$$
we get
$$E\{x(n)y(n)\} = \sum_{k=-\infty}^{\infty} h(k)\,\sigma_\varepsilon^2\,\delta(k) = h(0)\sigma_\varepsilon^2.$$
The variance of the output signal is defined by $\sigma_y^2 = E\{y(n)y^*(n)\} - E\{y(n)\}E\{y^*(n)\}$, or
$$\sigma_y^2 = E\left\{\sum_{k=-\infty}^{\infty} h(k)x(n-k) \sum_{k=-\infty}^{\infty} h^*(k)x^*(n-k)\right\} - E\left\{\sum_{k=-\infty}^{\infty} h(k)x(n-k)\right\} E\left\{\sum_{k=-\infty}^{\infty} h^*(k)x^*(n-k)\right\}.$$
Thus, with the zero-mean input, we get
$$\sigma_y^2 = \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} h(k)h^*(l)\, E\{x(n-k)x^*(n-l)\} = \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} h(k)h^*(l)\, r_{xx}(l-k).$$
Since $r_{xx}(n) = \sigma_\varepsilon^2\delta(n)$, i.e., $r_{xx}(l-k) = \sigma_\varepsilon^2\delta(l-k)$, only the terms with $l = k$ remain in the double summation for the variance $\sigma_y^2$, producing
$$\sigma_y^2 = \sigma_\varepsilon^2 \sum_{k=-\infty}^{\infty} |h(k)|^2 = \sigma_\varepsilon^2 E_h.$$
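The relation $\sigma_y^2 = \sigma_\varepsilon^2 E_h$ can be checked by simulation. A minimal sketch with an arbitrary illustrative FIR impulse response:

```python
import numpy as np

# Simulation check of sigma_y^2 = sigma_eps^2 * sum |h(n)|^2 = sigma_eps^2 * E_h.
rng = np.random.default_rng(2)
h = np.array([0.7, -0.3, 0.15, 0.05])   # illustrative impulse response
sigma2 = 2.0                            # input white-noise variance
eps = rng.standard_normal(400_000) * np.sqrt(sigma2)
y = np.convolve(eps, h, mode="valid")   # steady-state output samples
var_y = np.var(y)
Eh = np.sum(h**2)                       # energy of the impulse response
```

With these values $\sigma_\varepsilon^2 E_h = 2 \times 0.605 = 1.21$, and the sample variance of the output converges to it.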
366 Discrete-Time Random Signals
$$H(z) = \frac{G}{(1 - r_1 e^{j\omega_1} z^{-1})(1 - r_2 e^{j\omega_2} z^{-1})\cdots(1 - r_{N_p} e^{j\omega_{N_p}} z^{-1})} = \frac{G}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_{N_p} z^{-N_p}}$$
when the input is a white noise. The amplitudes of the poles ri are inside
(and close to) the unit circle. The discrete-time domain description of this
system is
For k = 0 it follows
The previous equations are known as the Yule-Walker equations. The matrix
form of this system is
$$\begin{bmatrix} r_{yy}(0) & r_{yy}(1) & \cdots & r_{yy}(N_p) \\ r_{yy}(1) & r_{yy}(0) & \cdots & r_{yy}(N_p - 1) \\ \vdots & \vdots & \ddots & \vdots \\ r_{yy}(N_p) & r_{yy}(N_p - 1) & \cdots & r_{yy}(0) \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ a_2 \\ \vdots \\ a_{N_p} \end{bmatrix} = \begin{bmatrix} G^2 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{7.57}$$
The autocorrelation is estimated from the available samples as
$$r_{yy}(k) = \frac{1}{N-k}\sum_{n=0}^{N-1-k} y(n+k)y(n) \quad \text{for } 0 \le k \le N-1, \tag{7.59}$$
and ryy (k ) = ryy (−k ) for − N + 1 ≤ k < 0. These values are then used in
(7.57) for the autoregressive spectral estimation.
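The Yule-Walker procedure can be sketched in a few lines. The model order, coefficients, gain, and data length below are illustrative; the rows of (7.57) for $k \ge 1$ are solved for the AR coefficients:

```python
import numpy as np

# Yule-Walker sketch: estimate AR(2) coefficients from the autocorrelation
# of a simulated y(n) + a1*y(n-1) + a2*y(n-2) = G*e(n).
rng = np.random.default_rng(3)
a_true = np.array([-1.2, 0.5])          # [a1, a2]; poles inside the unit circle
G, L = 1.0, 100_000
e = rng.standard_normal(L)
y = np.zeros(L)
for n in range(2, L):
    y[n] = -a_true[0] * y[n - 1] - a_true[1] * y[n - 2] + G * e[n]
# biased autocorrelation estimates r_yy(0), r_yy(1), r_yy(2)
r = np.array([np.dot(y[k:], y[:L - k]) / L for k in range(3)])
# rows k = 1, 2 of (7.57): r(k) + a1*r(k-1) + a2*r(k-2) = 0
R = np.array([[r[0], r[1]],
              [r[1], r[0]]])
a_est = np.linalg.solve(R, -r[1:3])
```

The estimated coefficients `a_est` approach the true values as the data length grows; the spectral estimate then follows by evaluating $|G/(1 + a_1 e^{-j\omega} + a_2 e^{-j2\omega})|^2$.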
Next, we comment on the estimated autocorrelation within the basic definition of the power spectral density, Section 7.2.3. Relation (7.59) corresponds to the unbiased estimate of the autocorrelation function. The power spectral density, according to (7.17), is calculated as $S_{yy}(\omega) = \mathrm{FT}\{r_{yy}(k)\}$.
Since the autocorrelation estimates for a large k use only a small
number of signal samples in averaging, they are not reliable. It is common
to apply a triangular (Bartlett) window function (w(k ) = ( N − |k |)/N) to
reduce the weight of these estimates in the Fourier transform calculation
$$w(k)\, r_{yy}(k) = w(k)\frac{1}{N-k}\sum_{n=0}^{N-1-k} y(n+k)y(n) = \frac{N-k}{N}\cdot\frac{1}{N-k}\sum_{n=0}^{N-1-k} y(n+k)y(n) = \frac{1}{N}\sum_{n=0}^{N-1-k} y(n+k)y(n). \tag{7.60}$$
within 0 ≤ n ≤ 127, where ϕ1 and ϕ2 are random variables. Plot the power
spectral density calculated using:
(a) The Fourier transform of ryy (k )
$$S_{yy}(\omega) = \mathrm{FT}\{r_{yy}(k)\} = \sum_{k=-N+1}^{N-1} r_{yy}(k)\, e^{-j\omega k}.$$
(b) The segment spectra
$$Y_i(e^{j\omega}) = \frac{1}{M}\sum_{n=0}^{M-1} y(iR + n)\, e^{-j\omega n}$$
for $i = 0, 1, \ldots, 6$, averaged over these intervals (Welch periodogram)
$$\hat S_{yy}(\omega) = \frac{1}{K}\sum_{i=0}^{K-1} \left|Y_i(e^{j\omega})\right|^2.$$
Figure 7.13 Spectral analysis of sinusoidal signals with random phases (normalized values).
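The averaged (Welch) periodogram above can be sketched directly. Segment length $M = 32$ and hop $R = 16$ give $K = 7$ segments of a 128-sample signal, matching $i = 0, \ldots, 6$; the sinusoid frequency and noise level are illustrative:

```python
import numpy as np

# Welch-style averaged periodogram, following the segment/average definition.
rng = np.random.default_rng(4)
L, M, R = 128, 32, 16
n = np.arange(L)
y = np.cos(2 * np.pi * 0.25 * n + 2 * np.pi * rng.random()) \
    + 0.1 * rng.standard_normal(L)            # sinusoid with random phase + noise
K = (L - M) // R + 1                          # number of segments
S = np.zeros(M)
for i in range(K):
    seg = y[i * R : i * R + M]
    S += np.abs(np.fft.fft(seg) / M) ** 2     # |Y_i(e^{jw})|^2 on the DFT grid
S /= K
peak_bin = int(np.argmax(S[: M // 2]))        # 0.25 cycles/sample -> bin M/4 = 8
```

Averaging over segments trades frequency resolution ($M < L$) for a lower variance of the spectral estimate, which is the point illustrated in Fig. 7.13.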
x ( n ) = s ( n ) + ε ( n ),
where s(n) is a known function with the Fourier transform S(e jω ) and ε(n)
is a white noise with power spectral density σε2 . The problem is to find a
system with a maximal output if the input x (n) contains the signal s(n).
The output signal is used to test the hypothesis H1 : presence of the signal
s(n) in x (n).
The output of a system with impulse response $h(n)$ and frequency response $H(e^{j\omega})$, to the input $x(n)$, is of the form
$$y(n) = y_s(n) + y_\varepsilon(n),$$
where $y_s(n)$ and $y_\varepsilon(n)$ are the system outputs to the inputs $s(n)$ and $\varepsilon(n)$, respectively. For the output signal $y_s(n)$,
$$Y_s(e^{j\omega}) = H(e^{j\omega})\, S(e^{j\omega}).$$
Its value at an instant $n_0$ is
$$y_s(n_0) = \frac{1}{2\pi}\int_{-\pi}^{\pi} H(e^{j\omega})\, S(e^{j\omega})\, e^{j\omega n_0}\, d\omega,$$
with
$$|y_s(n_0)|^2 = \left|\frac{1}{2\pi}\int_{-\pi}^{\pi} H(e^{j\omega})\, S(e^{j\omega})\, e^{j\omega n_0}\, d\omega\right|^2.$$
The aim is to maximize the output signal at an instant $n_0$ if the input signal contains $s(n)$. According to the Schwarz inequality (for its discrete form see Section 10.3.3),
$$\left|\frac{1}{2\pi}\int_{-\pi}^{\pi} H(e^{j\omega})\, S(e^{j\omega})\, e^{j\omega n_0}\, d\omega\right|^2 \le \frac{1}{2\pi}\int_{-\pi}^{\pi}\left|S(e^{j\omega})\right|^2 d\omega\, \cdot\, \frac{1}{2\pi}\int_{-\pi}^{\pi}\left|H(e^{j\omega})\right|^2 d\omega,$$
Consider the signal
$$s(n) = e^{-2(n/128)^2}\cos\!\left(8\pi(n/128)^2 + \pi n/8\right).$$
Two cases are presented in Fig. 7.14: 1) when the input signal contains $s(n)$, and 2) when it does not. We can see that the output of the matched filter has an easily detectable peak at $n = 0$ when the input signal contains $s(n)$. There is no such peak in $y(n)$ when the input signal $x(n)$ is noise only.
Figure 7.14 Illustration of the matched filter: signal $s(n)$; input noisy signal $x(n) = s(n) + \varepsilon(n)$ containing the signal $s(n)$; input signal $x(n) = \varepsilon(n)$ not containing $s(n)$. The corresponding matched filter outputs $y(n) = x(n) * s(-n)$ are presented below the input signal subplots.
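The experiment of Fig. 7.14 can be reproduced in a few lines. The noise level and seed below are illustrative; the matched-filter output is computed as the correlation of $x(n)$ with $s(n)$, i.e., $x(n) * s(-n)$:

```python
import numpy as np

# Matched-filter sketch with the chirp s(n) used above; the peak of |y(n)|
# is expected at the zero-lag position when s(n) is present in the input.
rng = np.random.default_rng(5)
n = np.arange(-128, 129)
s = np.exp(-2 * (n / 128) ** 2) * np.cos(8 * np.pi * (n / 128) ** 2 + np.pi * n / 8)
x = s + rng.standard_normal(n.size)   # hypothesis H1: signal present
y = np.correlate(x, s, mode="same")   # correlation, i.e. x(n) * s(-n)
center = n.size // 2                  # zero-lag index for mode="same"
peak = int(np.argmax(np.abs(y)))
```

Repeating with `x = rng.standard_normal(n.size)` (noise only) removes the dominant peak, which is the detection test illustrated in the figure.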
Assume that the input signal is $x(n)$ and that it contains information about the desired signal $d(n)$. The output signal is $y(n) = h(n) *_n x(n)$. The task here is to find the impulse response $h(n)$ of a system such that the difference between the desired signal and the output signal, denoted as the error
$$e(n) = d(n) - y(n),$$
This relation states that the expected value of the product of the error signal $e(n) = d(n) - y(n)$ and the input signal $x^*(n-k)$ is zero,
$$E\{2e(n)x^*(n-k)\} = 0$$
for any $k$. Signals satisfying this relation are said to be orthogonal to each other.
Relation (7.62) can be written as
$$E\left\{\sum_{m=-\infty}^{\infty} h(m)x(n-m)\, x^*(n-k)\right\} = E\{d(n)x^*(n-k)\}$$
or
$$\sum_{m=-\infty}^{\infty} h(m)\, r_{xx}(k-m) = r_{dx}(k).$$
Taking the z-transform of both sides, we get
$$H(z) = \frac{R_{dx}(z)}{R_{xx}(z)}.$$
For the special case when the input signal is the desired signal $d(n)$ with additive noise,
$$x(n) = d(n) + \varepsilon(n),$$
where $\varepsilon(n)$ is uncorrelated with the desired signal, the optimal Wiener filtering relation follows as
$$H(z) = \frac{R_{dd}(z)}{R_{dd}(z) + R_{\varepsilon\varepsilon}(z)}$$
since
$$r_{dx}(k) = E\{d(n)x^*(n-k)\} = E\{d(n)[d^*(n-k) + \varepsilon^*(n-k)]\} = r_{dd}(k).$$
On the unit circle,
$$H(e^{j\omega}) = \frac{S_{dd}(\omega)}{S_{dd}(\omega) + S_{\varepsilon\varepsilon}(\omega)}.$$
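The frequency response of the Wiener filter is easy to evaluate on a grid. A minimal sketch, where the chosen $S_{dd}(\omega)$ (proportional to the spectrum of an autocorrelation of the form $a^{|n|}$ with $a = 0.6$) and the unit white-noise spectrum are illustrative assumptions:

```python
import numpy as np

# Wiener filter on a frequency grid: H = S_dd / (S_dd + S_ee).
w = np.linspace(-np.pi, np.pi, 512, endpoint=False)
S_dd = 1.0 / (1.36 - 1.2 * np.cos(w))   # lowpass-shaped signal PSD, peak at w = 0
S_ee = np.ones_like(w)                  # white noise, sigma^2 = 1
H = S_dd / (S_dd + S_ee)                # 0 <= H < 1: close to 1 where signal dominates
```

The filter passes the frequencies where $S_{dd}(\omega) \gg S_{\varepsilon\varepsilon}(\omega)$ and attenuates those where the noise dominates, which is the qualitative behavior shown later in Fig. 7.18.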
Example 7.24. A signal x (n) = d(n) + ε(n) is processed by an optimal filter. Power
spectral density of d(n) is Sdd (ω ). If the signal d(n) and the additive noise
ε(n), whose power spectral density is Sεε (ω ), are independent find the output
signal-to-noise ratio.
⋆For this signal and noise, according to (7.56), we have
$$S_{yy}(e^{j\omega}) = \left|H(e^{j\omega})\right|^2 S_{xx}(e^{j\omega}) = \left|\frac{S_{dd}(\omega)}{S_{dd}(\omega) + S_{\varepsilon\varepsilon}(\omega)}\right|^2 S_{xx}(e^{j\omega}) = \frac{S_{dd}^2(\omega)}{S_{dd}(\omega) + S_{\varepsilon\varepsilon}(\omega)}.$$
The optimal prediction system follows with the input signal $x(n) = d(n-1) + \varepsilon(n-1)$ and the desired signal $d(n)$. The transfer function of the optimal predictor is obtained from
$$r_{dx}(k) = E\{d(n)x^*(n-k)\} = E\{d(n)[d^*(n-1-k) + \varepsilon^*(n-1-k)]\} = r_{dd}(k+1)$$
as
$$H(z) = \frac{zS_{dd}(z)}{S_{dd}(z) + S_{\varepsilon\varepsilon}(z)}$$
since
$$\sum_{k=-\infty}^{\infty} r_{dd}(k+1)z^{-k} = \sum_{k=-\infty}^{\infty} r_{dd}(k)z^{-k+1} = zS_{dd}(z).$$
The optimal smoothing is the case when the desired signal is $d(n)$ and we can use its future value(s). It follows with $x(n) = d(n+1) + \varepsilon(n+1)$ as
$$H(z) = \frac{z^{-1}S_{dd}(z)}{S_{dd}(z) + S_{\varepsilon\varepsilon}(z)}.$$
Example 7.25. The input signal is x (n) = s(n) + ε(n), where d(n) = s(n) is the
desired signal and ε(n) is a noise. If the autocorrelation functions of the signal
and noise are rss (n) = 4−|n| and rεε (n) = 2δ(n), respectively, and the cross-
correlation of the signal and noise is rsε (n) = δ(n), design the optimal filter.
⋆The optimal filter transfer function is
$$H(z) = \frac{R_{dx}(z)}{R_{xx}(z)},$$
where
$$R_{dx}(z) = R_{ss}(z) + R_{s\varepsilon}(z), \quad R_{s\varepsilon}(z) = 1, \quad R_{\varepsilon\varepsilon}(z) = 2.$$
The optimal systems with FIR filters will be presented within the
introductory part of the chapter dealing with adaptive discrete systems.
x (n) = x (n∆t)∆t.
Figure 7.15 Illustration of a continuous signal and its discrete-time and digital version.
step ∆) can be modeled as uniform noise with values between −∆/2 and
∆/2.
-Quantization of the results of arithmetic operations. It depends on how the calculations are performed.
-Quantization of the coefficients in the algorithm. Usually this kind
of error is neglected in analysis since it is deterministic (comments on the
errors in the coefficients are given in the chapter dealing with realizations
of discrete systems).
In order to make appropriate analysis, common assumptions are:
1) random variables corresponding to the quantization errors are un-
correlated, i.e., the quantization error is a white noise process with a uniform
distribution,
2) the error sources are uncorrelated with one another, and
3) all the errors are uncorrelated with the input signal and, conse-
quently, with all signals in the system.
For registers with b bits the digital signal values xQ (n) are coded into binary
format.
378 Discrete-Time Random Signals
Assume that registers with b bits are used and that all input signals
are normalized to the range 0 ≤ x (n) < 1. The binary numbers are written
within the register as
a −1 a −2 a −3 ... a−b .
The maximal number that can be written within this format is 0.111...11, representing $1 - 2^{-b}$. The common number of bits $b$ ranges from 8 to 24.
To reduce the number of digits of a signal to $b$ bits, rounding or truncation is used. An example of quantization with $b = 4$ bits is presented in Fig. 7.15, where the maximal value of $x_d(n) = x_Q(n)$ is denoted by 1111, meaning $2^{-1} + 2^{-2} + 2^{-3} + 2^{-4} = 15/16$.
For the case with positive and negative numbers, one extra bit is used
for the sign. The registers are now with b + 1 bits. The first bit is the sign bit
and the remaining b bits represent the signal absolute value
s a −1 a −2 a −3 ... a−b .
1 0 1 1 0 0 1 0
The decimal point assumes the position just before the first digit. The values of $x_Q(n)$ in this register are
$$0 \le x_Q(n) \le \frac{255}{256}$$
with the quantization step $1/256$.
The quantization error is
$$e(n) = x(n) - x_Q(n).$$
For rounding, the maximal absolute error can be a half of the last digit weight,
$$-\frac{1}{2}2^{-b} \le x(n) - x_Q(n) < \frac{1}{2}2^{-b}$$
or
$$-\frac{1}{2}\Delta \le x(n) - x_Q(n) < \frac{1}{2}\Delta,$$
where $\Delta = 2^{-b}$. We can also write
$$|e(n)| \le 2^{-(b+1)} = \frac{1}{2}\Delta.$$
In the example from Fig. 7.15, the quantization step is obviously $2^{-4} = 1/16$ and the error is within $|e(n)| \le \frac{1}{2}\cdot\frac{1}{16}$.
The error values are equally probable within the defined interval. The probability density function is
$$p_e(\xi) = \begin{cases} \dfrac{1}{\Delta} & \text{for } -\frac{1}{2}\Delta \le \xi < \frac{1}{2}\Delta \\[1mm] 0 & \text{elsewhere.} \end{cases}$$
Its mean value is
$$\mu_e = E\{e(n)\} = \int_{-\Delta/2}^{\Delta/2} \xi\, p_e(\xi)\, d\xi = 0.$$
Its variance is
$$\sigma_e^2 = \frac{1}{\Delta}\int_{-\Delta/2}^{\Delta/2} (\xi - \mu_e)^2\, d\xi = \frac{1}{12}\Delta^2.$$
When truncation is used, the error is within
$$0 \le e(n) < \Delta,$$
with mean value
$$\mu_e = E\{e(n)\} = \frac{\Delta}{2}$$
and variance
$$\sigma_e^2 = \frac{1}{\Delta}\int_0^\Delta \left(\xi - \frac{\Delta}{2}\right)^2 d\xi = \frac{1}{12}\Delta^2.$$
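Both error models are easy to verify empirically. A minimal sketch (the number of bits and the sample count are illustrative; `e_round` and `e_trunc` are our names for the two error sequences):

```python
import numpy as np

# Empirical check of the quantization-error models: rounding gives a
# zero-mean error with variance Delta^2/12; truncation gives mean Delta/2
# (for positive values) with the same variance.
rng = np.random.default_rng(6)
b = 7
Delta = 2.0 ** (-b)
x = rng.random(500_000)                        # values in [0, 1)
e_round = x - np.round(x / Delta) * Delta      # e(n) = x(n) - x_Q(n), rounding
e_trunc = x - np.floor(x / Delta) * Delta      # e(n) = x(n) - x_Q(n), truncation
```

The sample statistics converge to $\mu_e = 0$, $\sigma_e^2 = \Delta^2/12$ for rounding, and to $\mu_e = \Delta/2$, $\sigma_e^2 = \Delta^2/12$ for truncation.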
Example 7.27. The DFT of a signal x (n) is calculated by using its quantized
version
xQ (n) = Q[ x (n)] = x (n) + e(n).
Quantization is done in an A/D convertor with b + 1 = 8 bits using round-
ing. The DFT is calculated on a high precision computer with N = 1024
signal samples. Find the mean and variance of the calculated DFT.
⋆The DFT of the quantized signal is
$$X_Q(k) = \sum_{n=0}^{N-1} [x(n) + e(n)]\, e^{-j2\pi kn/N}.$$
Its mean is
$$\mu_{X_Q}(k) = E\{X_Q(k)\} = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi kn/N} = X(k).$$
The variance is
$$\sigma_{X_Q}^2(k) = \sum_{n_1=0}^{N-1}\sum_{n_2=0}^{N-1} \sigma_e^2\, \delta(n_1 - n_2)\, e^{-j2\pi k(n_1 - n_2)/N} = \sigma_e^2 N = \frac{1}{12}\Delta^2 N = \frac{1}{12}2^{-2b} N = \frac{1}{12}2^{-14}\cdot 2^{10} = \frac{1}{192}.$$
The noise in the DFT is a sum of many independent noises from the
input signal and coefficients. Thus it is Gaussian distributed with standard
deviation σXQ = 0.072. It may significantly influence the signal DFT values,
especially if they are not well concentrated or if there are signal components
with small amplitudes.
Example 7.28. How does the input quantization error influence the result of:
(a) the weighted sum
$$X_s = \sum_{n=0}^{N-1} a_n x(n),$$
(b) the product
$$X_P = \prod_{n=0}^{N-1} x(n)?$$
⋆If the quantized values $x_Q(n) = Q[x(n)] = x(n) + e(n)$ of the signal $x(n)$ are used in the calculation instead of the true values, then:
(a) The estimator of the weighted sum is
$$\hat X_s = \sum_{n=0}^{N-1} a_n x_Q(n) = \sum_{n=0}^{N-1} a_n x(n) + \sum_{n=0}^{N-1} a_n e(n),$$
with error
$$e_{X_s} = \sum_{n=0}^{N-1} a_n e(n).$$
It is Gaussian distributed, since it is a sum of many small errors $e(n)$. It has been assumed that the weighting coefficients are such that many signal values influence the result with similar weights. The mean value is
$$\mu_{X_s} = E\{e_{X_s}\} = \sum_{n=0}^{N-1} a_n E\{e(n)\} = 0,$$
while the variance is
$$\sigma_{X_s}^2 = \sum_{n=0}^{N-1} a_n^2\, \mathrm{var}\{e(n)\} = \frac{1}{12}\Delta^2\sum_{n=0}^{N-1} a_n^2.$$
(b) The estimator of the product is
$$\hat X_P = \prod_{n=0}^{N-1} \left(x(n) + e(n)\right).$$
Assuming that the individual errors are small, so that all higher-order error terms containing $e(n)e(m)$, $e(n)e(m)e(l), \ldots$ can be neglected, we get
$$\hat X_P \cong \prod_{n=0}^{N-1} x(n) + \sum_{m=0}^{N-1}\prod_{\substack{n=0 \\ n \neq m}}^{N-1} x(n)\, e(m).$$
The mean value is zero if rounding is used. The variance is signal dependent,
$$\sigma_{X_P}^2 = \sum_{m=0}^{N-1}\prod_{\substack{n=0 \\ n \neq m}}^{N-1} x^2(n)\, \mathrm{var}\{e(n)\} = \frac{1}{12}\Delta^2 \sum_{m=0}^{N-1}\prod_{\substack{n=0 \\ n \neq m}}^{N-1} x^2(n).$$
In the quantization of results after the basic arithmetic operations are performed, we can distinguish two cases. One is fixed-point arithmetic; in that case the register assumes that the decimal point is at a fixed place, and all data are written with respect to this position. In floating-point arithmetic, numbers are written in the sign-mantissa-exponent format, and the quantization error is produced on the mantissa only.
Fixed-point arithmetic assumes that the decimal point position is fixed. A common assumption is that all input values and intermediate results are normalized so that $0 \le x(n) < 1$, or $-1 < x(n) < 1$ if a sign bit is used. The result of a multiplication is quantized again,
$$Q[x_Q(n)x_Q(m)] = x_Q(n)x_Q(m) + e(n,m),$$
where $e(n,m)$ is the quantization error satisfying all the previous properties, with
$$-\frac{1}{2}\Delta \le e(m,n) \le \frac{1}{2}\Delta.$$
Example 7.29. Find the mean of the quantization error for
$$r(n) = \sum_{m=0}^{N-1} x(n+m)x(n-m).$$
For a complex-valued signal, the real and imaginary parts are quantized separately,
$$x_Q(n) = Q[x(n)] = Q[\mathrm{Re}\{x(n)\}] + j\,Q[\mathrm{Im}\{x(n)\}] = x(n) + e_r(n) + je_i(n).$$
Since the real and imaginary parts are independent, with the same variance, the variance of the quantization error for a complex-valued signal is
$$\sigma_e^2 = 2\cdot\frac{1}{12}\Delta^2 = \frac{1}{6}\Delta^2.$$
One way to prevent overflow in the summation of $N$ normalized values is to scale them by $N$,
$$X_N = \frac{x(0)}{N} + \frac{x(1)}{N} + \cdots + \frac{x(N-1)}{N}.$$
Then we are sure that no result will be outside the interval $(-1, 1)$. By dividing the signal samples by $N$, an additive quantization noise is introduced,
$$\hat X_N = \frac{x(0)}{N} + e(0) + \frac{x(1)}{N} + e(1) + \cdots + \frac{x(N-1)}{N} + e(N-1).$$
The variance of the equivalent noise $e(0) + e(1) + \cdots + e(N-1)$ is
$$\sigma_e^2 = \frac{1}{12}\Delta^2 N = \frac{1}{12}2^{-2b} N.$$
Since the variance of $x(n)/N$ is $\sigma_x^2/N^2$, the variance of $\hat X_N$ is
$$\sigma_{X_N}^2 = N\frac{\sigma_x^2}{N^2} + \frac{1}{12}\Delta^2 N.$$
The ratio of the variances corresponding to the signal and noise in the result is
$$\mathrm{SNR} = \frac{N\frac{\sigma_x^2}{N^2}}{\frac{1}{12}\Delta^2 N} = \frac{\sigma_x^2}{N^2\frac{1}{12}\Delta^2} = \frac{\sigma_x^2}{N^2\frac{1}{12}2^{-2b}}$$
or, in [dB], with $N = 2^m$,
$$\mathrm{SNR} = 10\log\left(\frac{\sigma_x^2}{N^2\frac{1}{12}2^{-2b}}\right) = 20\log\sigma_x - 20\log N - 20\log 2^{-b} + 10\log 12 = 20\log\sigma_x - 6.02(m - b) + 10.8,$$
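The dB form can be checked against the exact ratio directly; the values of $\sigma_x$, $b$, and $m$ below are illustrative:

```python
import numpy as np

# Check of the dB form of the SNR against the exact ratio, for N = 2^m.
sigma_x, b, m = 0.25, 15, 10
N = 2 ** m
snr_exact = 10 * np.log10(sigma_x**2 / (N**2 * (1 / 12) * 2.0 ** (-2 * b)))
snr_db = 20 * np.log10(sigma_x) - 6.02 * (m - b) + 10.8
```

The two expressions agree to within the rounding of the constants $6.02 \approx 20\log_{10}2$ and $10.8 \approx 10\log_{10}12$.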
An improvement is obtained if the scaling is done gradually, by $1/2$ in each addition. The sum of two quantized terms is
$$\frac{x(n)}{2} + e(n) + \frac{x(n+1)}{2} + e(n+1) = \frac{x(n) + x(n+1)}{2} + e_n^{(2)}.$$
The error
$$e_n^{(2)} = e(n) + e(n+1)$$
has the variance
$$\mathrm{var}\left\{e_n^{(2)}\right\} = \frac{1}{12}\Delta^2 + \frac{1}{12}\Delta^2 = \frac{1}{6}\Delta^2.$$
After each division by 2, the result is shifted in the register to the right and a quantization error is created. Thus the error model, due to the addition quantization, is (shown for $N = 8$)
$$\hat X_N = \frac{\dfrac{\frac{x(0)}{2} + \frac{x(1)}{2} + e_0^{(2)}}{2} + \dfrac{\frac{x(2)}{2} + \frac{x(3)}{2} + e_2^{(2)}}{2} + e_0^{(4)}}{2} + \frac{\dfrac{\frac{x(4)}{2} + \frac{x(5)}{2} + e_4^{(2)}}{2} + \dfrac{\frac{x(6)}{2} + \frac{x(7)}{2} + e_6^{(2)}}{2} + e_4^{(4)}}{2} + e_0^{(8)} \tag{7.63}$$
$$\hat X_N = \frac{x(0)}{N} + \frac{x(1)}{N} + \cdots + \frac{x(N-1)}{N} + \frac{e_0^{(2)} + e_2^{(2)} + \cdots + e_{N-2}^{(2)}}{N/2} + \frac{e_0^{(4)} + \cdots + e_{N-4}^{(4)}}{N/4} + \cdots + \frac{e_0^{(N)}}{N/N}.$$
Each of these errors has the variance
$$\sigma_e^2 = \frac{1}{6}\Delta^2 = \frac{1}{6}2^{-2b}.$$
Note that the noises in the first stage are divided by $N/2$, due to the divisions by 2 in the next stages of the summation, so their variance is reduced by a factor of $N^2/4$. The total variance is
$$\sigma_{X_N}^2 = \frac{\sigma_x^2}{N} + \frac{1}{6}\Delta^2\frac{2}{N} + \frac{1}{6}\Delta^2\frac{4}{N} + \cdots + \frac{1}{6}\Delta^2\frac{2^m}{N} \tag{7.64}$$
$$= \frac{\sigma_x^2}{N} + \frac{1}{6}\Delta^2\frac{2}{N}\left(1 + 2 + \cdots + 2^{m-1}\right) = \frac{\sigma_x^2}{N} + \frac{1}{6}\Delta^2\frac{2}{N}\cdot\frac{1 - 2^m}{1 - 2}$$
$$= \frac{\sigma_x^2}{N} + \frac{1}{6}\Delta^2\frac{2}{N}(N - 1) = \frac{\sigma_x^2}{N} + \frac{1}{3}\Delta^2\left(1 - \frac{1}{N}\right).$$
The error due to rounding after each division by 2 takes the values
$$e_d \in \{-\Delta/2,\ 0,\ \Delta/2\},$$
with probabilities $P_d(\pm\Delta/2) = 1/4$ and $P_d(0) = 1/2$. The mean value of this kind of error is zero, provided that the rounding is done in such a way that it takes the values $\pm\Delta/2$ with equal probability (various tie-breaking algorithms for rounding exist). Its variance is
$$\mathrm{var}\left\{e_n^{(i)}\right\} = 2\,\mathrm{var}\{e_d\} = 2\left[\frac{1}{4}\left(-\frac{\Delta}{2}\right)^2 + \frac{1}{4}\left(\frac{\Delta}{2}\right)^2\right] = \frac{1}{4}\Delta^2, \quad \text{for } i > 2.$$
The total variance of $\hat X_N$ is then of the form
$$\sigma_{X_N}^2 = \frac{\sigma_x^2}{N} + \frac{1}{6}\Delta^2\frac{2}{N} + \frac{1}{4}\Delta^2\frac{4}{N} + \cdots + \frac{1}{4}\Delta^2\frac{2^m}{N} = \frac{\sigma_x^2}{N} + \frac{1}{2}\Delta^2\left(1 - \frac{4}{3N}\right),$$
instead of (7.64). The signal-to-noise ratio is
$$\mathrm{SNR} = \frac{\frac{\sigma_x^2}{N}}{\frac{1}{2}\Delta^2\left(1 - \frac{4}{3N}\right)} \cong 2\sigma_x^2\, 2^{2(b - m/2)}.$$
where $e_i(n)$ is the input signal quantization error and $e_m(n)$ is the multiplication quantization error. The variances for complex-valued signals are
$$\mathrm{var}\{e_i(n)\} = 2\cdot\frac{1}{12}\Delta^2 = \frac{1}{6}\Delta^2, \quad \mathrm{var}\{e_m(n)\} = 4\cdot\frac{1}{12}\Delta^2 = \frac{1}{3}\Delta^2.$$
In addition, we have to ensure that the additions do not produce an overflow. If we use the calculation scheme, presented for $N = 8$, as
$$\hat X(k) = \frac{\dfrac{\frac{y(0)}{2} + \frac{y(1)}{2} + e_0^{(2)}}{2} + \dfrac{\frac{y(2)}{2} + \frac{y(3)}{2} + e_2^{(2)}}{2} + e_0^{(4)}}{2} + \frac{\dfrac{\frac{y(4)}{2} + \frac{y(5)}{2} + e_4^{(2)}}{2} + \dfrac{\frac{y(6)}{2} + \frac{y(7)}{2} + e_6^{(2)}}{2} + e_4^{(4)}}{2} + e_0^{(8)},$$
then in each addition the terms should be divided by 2. This division introduces a quantization error. In the first step,
$$\frac{y(n)}{2} + e(n) + \frac{y(n+1)}{2} + e(n+1) = \frac{1}{2}\left\{[x(n) + e_i(n)]W_N^{nk} + e_m(n) + [x(n+1) + e_i(n+1)]W_N^{(n+1)k} + e_m(n+1)\right\} + e(n) + e(n+1).$$
The total error in this step is
$$e_n^{(2)} = \frac{e_i(n)W_N^{nk} + e_m(n) + e_i(n+1)W_N^{(n+1)k} + e_m(n+1)}{2} + e(n) + e(n+1),$$
with variance
$$\mathrm{var}\left\{e_n^{(2)}\right\} = \frac{1}{4}\left[\frac{1}{6}\Delta^2 + \frac{1}{3}\Delta^2 + \frac{1}{6}\Delta^2 + \frac{1}{3}\Delta^2\right] + 2\cdot\frac{1}{6}\Delta^2 = \frac{7}{12}\Delta^2.$$
In all other steps, within the errors $e_0^{(4)}$ to $e_0^{(N)}$, only the addition errors appear. Their variance, for complex-valued terms, is
$$\mathrm{var}\left\{e_n^{(i)}\right\} = 2\cdot\frac{1}{6}\Delta^2.$$
If the FFT is calculated using fixed-point arithmetic and the signal is uniformly distributed within $-1 < x(n) < 1$ with variance $\sigma_x^2$, then, in order to avoid overflow, the signal can be divided by $N$ at the input and the standard FFT used, as in Fig. 7.16. An improvement in the SNR is achieved if the scaling is done not on the input signal $x(n)$ by $N$, but by $1/2$ in each butterfly, as shown in Fig. 7.17. The improvement is due to the fact that the quantization errors appearing in the early butterfly stages are halved in the subsequent stages and thus reduced at the output, as in (7.63). An improvement of the order of $N$ is obtained in the output signal-to-noise ratio.
Fixed-point arithmetic is simple, but can be inefficient if signal values within a wide range of amplitudes are expected. For example, if we can expect the signal values
$$x_Q(n_1) = 1011111110101.010$$
$$x_Q(n_2) = 0.0000000000110101,$$
then fixed-point arithmetic would obviously require large registers so that both values can be stored without losing their significant digits. However, we can represent these signal values in exponential form. The exponential format of a number is then written within the register as
$$s_n\ s_e\ e_1 e_2 e_3 e_4 e_5 e_6 e_7\ m_{-1} m_{-2} m_{-3} \ldots m_{-b}$$
where:
$s_n$ is the sign of the number (1 for a positive number and 0 for a negative number),
$s_e$ is the sign of the exponent (1 for a positive exponent and 0 for a negative exponent),
$e_1 e_2 \ldots e_7$ is the binary format of the exponent, and
$m_{-1} m_{-2} \ldots m_{-b}$ is the mantissa; since the integer part is always 1, it is omitted.
Figure 7.16 FFT calculation scheme obtained by decimation in frequency for N = 8 with
signal being divided by N in order to avoid overflow when the fixed point arithmetic is used.
Figure 7.17 FFT calculation scheme obtained by decimation in frequency for N = 8 with
signal being divided in each butterfly by 1/2 in order to avoid overflow when the fixed point
arithmetic is used.
Within this format, the previous signal value xQ (n1 ), with a register of
19 bits in total, is
1 1 0 0 0 1 1 0 0 0 1 1 1 1 1 1 1 0 1
while x Q (n2 ) is
1 0 0 0 0 1 0 1 1 1 0 1 0 1 0 0 0 0 0 .
If the exponent cannot be written within the defined number of bits (here 7), the computer has to stop the calculation and indicate "overflow", that is, the number cannot fit into the register. The mantissa values are simply rounded to the available number of bits. In implementations based on floating-point arithmetic, the quantization affects the mantissa only. The relative error in the mantissa is again
$$|e(n)| \le 2^{-(b+1)} = \frac{1}{2}\Delta.$$
The error in the signal is multiplied by the exponent. Since the exponent value is of the order of the signal itself, the error behaves as a multiplicative uniform noise. Thus, for the floating-point representation, multiplicative errors appear. Floating-point additions also produce quantization errors, which are represented by a multiplicative noise. During additions, the number of bits may increase; this increase requires a mantissa shift, which causes a multiplicative error.
In addition to the IEEE standard, where the total number of bits is 32 (23 for the mantissa), we will mention two standard formats for telephone signal coding. The µ-law pulse-code modulation (PCM) is used in North America, and the A-law PCM is used in European telephone networks. Both use 8-bit representations with a sign bit, 3 exponent bits, and 4 mantissa bits,
$$s\ e_1 e_2 e_3\ m_1 m_2 m_3 m_4.$$
The µ-law encoding takes a 14-bit signed signal value (in its two's complement representation) as input, adds 33 (binary 100001), and converts it to an 8-bit value. The value represented by a µ-law code is
$$(-1)^s\left[2^{e+1}(m + 16.5) - 33\right].$$
0 0 0 0 0 0 0 0 .
0 0 1 1 1 1 1 0 .
The quantization step for this range of numbers is $2^{e+1} = 2^4 = 16$. It means that the previous possible number is 439, while the next possible number would be 471. It is the last number with $2^{e+1} = 16$.
if the quantization error is caused by floating-point registers with $b$ bits for the mantissa. What is the mean value? Write the model for
$$y(n) = x(n) + x(n+1),$$
where $e(n, n+1)$ is the multiplicative noise modeling the addition error.
7.9 PROBLEMS
Problem 7.1. The signal $x_{20i}(n)$, for $i = 01, 02, \ldots, 15$, is the monthly average of the maximal daily temperatures in a city, measured from year 2001 to 2015. The values are given in Table 7.2. If we can assume that the signal for an individual month is Gaussian, find the probability that the average of the maximal daily temperatures: (a) in January is lower than 2, (b) in January is higher than 12.
Find the probability density function p(ξ ) and the probability that x (n) <
2.5.
where a and b are constants. Find the relation between a and b. What is the
cumulative probability distribution function for a = 1?
$$p_x(\xi) = \frac{\lambda}{2}e^{-\lambda|\xi|}, \quad \lambda > 0.$$
where $\varepsilon(n)$ is a Gaussian noise with mean $\mu_\varepsilon = 0$ and variance $\sigma_\varepsilon^2$, $A > 0$ is a constant, and $N_x$ is a nonempty set of discrete time instants. A threshold-based criterion is used to detect whether an arbitrary time instant $n$ belongs to the set $N_x$:
$$n \in N_x \text{ if } x(n) > T,$$
where $T$ is the threshold. Find the threshold $T$ if the probability of false detection is 0.01.
Problem 7.12. Signal x (n) is a random Gaussian sequence with mean
µ x = 5 and variance σx2 = 1. Signal y(n) is a random Gaussian sequence,
independent from x (n), with mean µy = 1 and variance σy2 = 1. If we
consider N = 1000 samples of these signals find the expected number of
time instants where x (n) > y(n) holds.
Problem 7.13. Let x (n) and y(n) be independent real-valued white Gaus-
sian random variables with means µ x = µy = 0 and variances σx2 and σy2 .
Show that the random variable
$$z = \frac{1}{M}\sum_{n=1}^{M} x(n)y(n)$$
has zero mean and variance $\sigma_x^2\sigma_y^2/M$.
If the input signal is white noise x (n) = ε(n), with the autocorrelation
function rεε (n) = σε2 δ(n), find the autocorrelation and the power spectral
density of the output signal.
Problem 7.18. Consider a linear time-invariant system whose input is
x (n) = ε(n)u(n)
h ( n ) = a n u ( n ),
Problem 7.22. The spectrogram is one of the most commonly used tools in
time-frequency analysis. Its form is
$$S_x(n,k) = \left|\sum_{i=0}^{N-1} x(n+i)\, w(i)\, e^{-j\frac{2\pi}{N}ik}\right|^2,$$
where the signal is x (n) = s(n) + ε(n), with s(n) being the desired deter-
ministic signal and ε(n) a complex-valued, zero-mean white Gaussian noise
with variance σε2 and independent and identically distributed (i.i.d.) real
and imaginary parts. The window function is $w(i)$. Using a rectangular window of width $N$, find:
a) the mean of Sx (n, k ),
b) the variance of Sx (n, k ).
Note: For a Gaussian random signal ε(n), it holds
$$E\{\varepsilon(l)\varepsilon^*(m)\varepsilon^*(n)\varepsilon(p)\} = E\{\varepsilon(l)\varepsilon^*(m)\}E\{\varepsilon^*(n)\varepsilon(p)\} + E\{\varepsilon(l)\varepsilon^*(n)\}E\{\varepsilon^*(m)\varepsilon(p)\} + E\{\varepsilon(l)\varepsilon(p)\}E\{\varepsilon^*(m)\varepsilon^*(n)\}.$$
where the signal is x (n) = s(n) + ε(n), with s(n) being the desired deter-
ministic signal and ε(n) complex-valued, zero-mean white Gaussian noise
with variance σε2 and independent and identically distributed (i.i.d.) real
and imaginary parts. Find:
a) the mean value of Wx (n, ω ),
b) the variance of Wx (n, ω ).
Use the previous problem note. Write the variance form for an FM
signal when |s(n)| = A.
Problem 7.24. A random signal $s(n)$ carries information. Its autocorrelation function is $r_{ss}(n) = 4(0.5)^{|n|}$. A noise with autocorrelation $r_{\varepsilon\varepsilon}(n) = 2\delta(n)$ is added to the signal. Find the optimal filter for:
Figure 7.18 Power spectral densities of the signal $S_{dd}(e^{j\omega})$ and input noise $S_{\varepsilon\varepsilon}(e^{j\omega})$, along with the frequency response of the optimal filter $H(e^{j\omega})$.
Problem 7.26. The power spectral densities of the signal Sdd (e jω ) and input
noise Sεε (e jω ) are given in Fig.7.18. Show that the frequency response of the
optimal filter H (e jω ) is presented in Fig.7.18(bottom). Find the SNR at the
input and output of the optimal filter.
Problem 7.27. Find the mean of quantization error of the Wigner distribu-
tion (its pseudo form over-sampled in frequency)
$$W_x(n,k) = \sum_{m=0}^{N-1} x(n+m)x(n-m)\, e^{-j2\pi mk/N}$$
7.10 SOLUTIONS
Solution 7.1. (a) The mean value for January, Table 7.2, is
µ x (1) = 7.2667.
The probability that the January average is lower than 2 is
$$P(x(1) < 2) = \frac{1}{\sigma_x(1)\sqrt{2\pi}}\int_{-\infty}^{2} e^{-\frac{(\xi - \mu_x(1))^2}{2\sigma_x^2(1)}}\, d\xi = 0.5\left[1 - \operatorname{erf}\!\left(\frac{7.2667 - 2}{2.7115\sqrt{2}}\right)\right] = 0.0260.$$
Similarly,
$$P(x(1) > 12) = \frac{1}{\sigma_x(1)\sqrt{2\pi}}\int_{12}^{\infty} e^{-\frac{(\xi - \mu_x(1))^2}{2\sigma_x^2(1)}}\, d\xi = 0.5\left[1 - \operatorname{erf}\!\left(\frac{12 - 7.2667}{2.7115\sqrt{2}}\right)\right] = 0.0404.$$
The probability of x (n) < 2.5 is P( x (n) < 2.5) = F (2.5) = 0.75.
Therefore,
$$\int_{-\infty}^{\infty} ae^{-b|\xi|}\, d\xi = a\left[\int_{-\infty}^{0} e^{b\xi}\, d\xi + \int_{0}^{\infty} e^{-b\xi}\, d\xi\right] = \frac{2a}{b} = 1,$$
resulting in $b = 2a$. For $a = 1$, the probability density function is $p_x(\xi) = e^{-2|\xi|}$ for $-\infty < \xi < \infty$. The probability distribution function, for $\chi \le 0$, is
$$F_x(\chi) = \int_{-\infty}^{\chi} p_x(\xi)\, d\xi = \int_{-\infty}^{\chi} e^{2\xi}\, d\xi = \frac{e^{2\chi}}{2}.$$
The probability that the first drawn ball is marked 0 is
$$P_0 = \frac{5}{10}.$$
If the first ball was 0, then 9 balls remain for the second draw, 4 of them marked 0. The probability that $x(1) = 0$ if $x(0) = 0$ is
$$P_1 = \frac{4}{9}.$$
If $x(0) = 0$ and $x(1) = 0$, there are 8 remaining balls, 3 of them marked 0. The probability that $x(2) = 0$, with $x(0) = 0$ and $x(1) = 0$, is
$$P_2 = \frac{3}{8}.$$
Continuing in the same way,
$$P(k = 0) = \frac{5}{10}\cdot\frac{4}{9}\cdot\frac{3}{8}\cdot\frac{2}{7}.$$
Solution 7.7. Let us find the probability that $x(n) < \xi$ for an arbitrary $\xi$. Consider the case $\xi < 0$,
$$P\{x(n) < \xi\} = \frac{0.2}{2}\left[1 + \operatorname{erf}\!\left(\frac{\xi - 3}{2}\right)\right].$$
It has been taken into account that the considered sample is Gaussian (with probability 0.2), along with the probability that the sample value is smaller than $\xi$. For $\xi > 0$ we should take into account that the signal assumes $x(n) = 0$ with probability 80%, as well as that in the remaining 20% of cases the Gaussian random value could be smaller than $\xi$. So we get
$$P\{x(n) < \xi\} = 0.8 + \frac{0.2}{2}\left[1 + \operatorname{erf}\!\left(\frac{\xi - 3}{2}\right)\right].$$
Now we have
$$P\{x(n) < \xi\} = \begin{cases} \dfrac{0.2}{2}\left[1 + \operatorname{erf}\!\left(\dfrac{\xi - 3}{2}\right)\right] & \text{for } \xi < 0 \\[3mm] 0.8 + \dfrac{0.2}{2}\left[1 + \operatorname{erf}\!\left(\dfrac{\xi - 3}{2}\right)\right] & \text{for } \xi > 0, \end{cases}$$
with the probability density function
$$p_{y(n)}(\xi) = \frac{d}{d\xi}P\{x(n) < \xi\} = \frac{0.2}{2\sqrt{\pi}}\, e^{-\frac{(\xi - 3)^2}{4}} + 0.8\,\delta(\xi).$$
The mean and variance are
$$\mu_{y(n)} = \int_{-\infty}^{\infty} \xi\, p_{y(n)}(\xi)\, d\xi = 0.2\times 3 + 0.8\times 0 = 0.6,$$
$$\sigma_{y(n)}^2 = \int_{-\infty}^{\infty} (\xi - 0.6)^2\, p_{y(n)}(\xi)\, d\xi = 0.2\times 7.76 + 0.8\times(0.6)^2 = 1.84.$$
with $2000 \times 4.7\times 10^{-3} = 9.4 \approx 9$ samples among the considered 2000 expected to have an amplitude higher than 3.
Solution 7.9. If we are in a position to use a reduced set of signal samples for processing, then the ideal scenario would be to eliminate the signal samples with higher noise values and to keep for processing the samples with lower noise values. For the case of $N$ signal samples and processing based on $M$ samples, we can find the interval of amplitudes $A$ for the lowest $M$ noisy samples. The probability that $|x(n)| < A\sigma_\varepsilon$ is
$$P\{|x(n)| < A\sigma_\varepsilon\} = \frac{1}{\sigma_\varepsilon\sqrt{2\pi}}\int_{-A\sigma_\varepsilon}^{A\sigma_\varepsilon} e^{-\xi^2/(2\sigma_\varepsilon^2)}\, d\xi.$$
The value of $A$ follows from
$$\frac{1}{\sqrt{2\pi}}\int_{-A}^{A} e^{-\xi^2/2}\, d\xi = \operatorname{erf}\!\left(\frac{A}{\sqrt{2}}\right) = \frac{M}{N}.$$
The calculation of the value of $A$ is easily related to the inverse of the $\operatorname{erf}(x)$ function, denoted by $\operatorname{erfinv}(x)$. For a given $M/N$, the amplitude is $A = \sqrt{2}\operatorname{erfinv}(M/N)$. For example, for $M = N/2$, half of the lowest noise samples will be within the interval $[-0.6745\sigma_\varepsilon,\ 0.6745\sigma_\varepsilon]$, since $A = \sqrt{2}\operatorname{erfinv}(0.5) = 0.6745$.
The probability density function of the new noise is
$$p_y(\xi) = \begin{cases} \dfrac{k}{\sigma_\varepsilon\sqrt{2\pi}}\, e^{-\xi^2/(2\sigma_\varepsilon^2)} & \text{for } |\xi| < A\sigma_\varepsilon \\[1mm] 0 & \text{for } |\xi| \ge A\sigma_\varepsilon. \end{cases}$$
The constant $k$ is obtained from the condition that $\int_{-\infty}^{\infty} p_y(\xi)\, d\xi = 1$. It is $k = N/M$. The variance of this new noise, formed from the Gaussian noise after the largest $N - M$ values are removed, is much lower than the variance of the whole noise. It is
$$\sigma_y^2 = \frac{N}{M}\frac{1}{\sigma_\varepsilon\sqrt{2\pi}}\int_{-\sqrt{2}\operatorname{erfinv}(M/N)\,\sigma_\varepsilon}^{\sqrt{2}\operatorname{erfinv}(M/N)\,\sigma_\varepsilon} \xi^2\, e^{-\xi^2/(2\sigma_\varepsilon^2)}\, d\xi. \tag{7.66}$$
Now we have
µy(n) = 0
"A 2
1 1 − (ζ )2
σy2(n) = ζ 2 B C √ e 2σx dζ
A σx 2π
−A
erf √
σx 2
⎛ ⎞
√ − A2
A 2e 2σx2
⎜ C⎟
= σx2 ⎝1 − √ B ⎠.
σx π erf A
√
σx 2
By denoting β = A/(√2σ_x), the variance σ²_y(n) can be written as

σ²_y(n) = σ_x²(1 − 2β e^(−β²)/(√π erf(β))).
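The closed form can be verified against direct numerical integration of (7.66). A minimal sketch assuming σ_ε = σ_x = 1 and only the standard library (the helper names truncated_var and closed_form are ours); for M/N = 0.5 both should give about 0.1426:

```python
import math

def _erfinv(y):
    """Bisection inverse of erf on (0, 1)."""
    lo, hi = 0.0, 10.0
    while hi - lo > 1e-12:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if math.erf(mid) < y else (lo, mid)
    return 0.5 * (lo + hi)

def truncated_var(ratio, steps=200001):
    """Variance of unit Gaussian noise after keeping the M lowest-magnitude
    samples out of N (ratio = M/N), by trapezoidal integration of (7.66)."""
    A = math.sqrt(2.0) * _erfinv(ratio)
    h = 2 * A / (steps - 1)
    total = 0.0
    for i in range(steps):
        xi = -A + i * h
        f = xi * xi * math.exp(-xi * xi / 2) / math.sqrt(2 * math.pi)
        total += f * (0.5 if i in (0, steps - 1) else 1.0)
    return total * h / ratio          # factor N/M = 1/ratio

def closed_form(ratio):
    """sigma^2 (1 - 2 beta exp(-beta^2) / (sqrt(pi) erf(beta))), beta = A/sqrt(2)."""
    beta = _erfinv(ratio)
    return 1 - 2 * beta * math.exp(-beta**2) / (math.sqrt(math.pi) * math.erf(beta))
```

Note that this also matches the result σ_y² = 0.1426σ_ε² quoted in Exercise 7.4 below.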
P_F = P{ε(n) > T} = 1/2 − (1/2)erf(T/(√2σ_ε)),

where erfinv(·) is the inverse erf function. Note that the threshold does not depend on A.
since signals are mutually independent. Probability that x (n) > y(n) can be
obtained by integrating p x(n),y(n) (ξ, ζ ) over the region ξ > ζ. It is
P{x(n) > y(n)} = ∫_{−∞}^{∞} (1/√(2π)) e^(−(ξ−5)²/2) [ ∫_{−∞}^{ξ} (1/√(2π)) e^(−(ζ−1)²/2) dζ ] dξ ≈ 0.99766.
For 1000 instants we expect that x (n) > y(n) is satisfied in about 998
instants.
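Since x(n) − y(n) is Gaussian with mean 4 and variance 2, the double integral reduces to a single erf evaluation, which a short sketch can confirm by simulation (the variable names are ours):

```python
import math
import random

# x ~ N(5, 1), y ~ N(1, 1), independent; x - y ~ N(4, 2), so
# P{x > y} = P{x - y > 0} = (1 + erf((5 - 1) / sqrt(2 * 2))) / 2
p_exact = 0.5 * (1.0 + math.erf((5.0 - 1.0) / math.sqrt(2.0 * 2.0)))
print(round(p_exact, 5))  # 0.99766

# Monte Carlo cross-check
random.seed(1)
hits = sum(random.gauss(5, 1) > random.gauss(1, 1) for _ in range(100000))
```

With 100 000 trials the empirical frequency hits/100000 agrees with 0.99766 to within sampling error.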
z = (1/M) Σ_{n=1}^{M} x(n)y(n),

with the variance

σ_z² = E[z²] = (1/M²) Σ_{n=1}^{M} Σ_{m=1}^{M} E[x(n)y(n)x(m)y(m)] = (1/M²) Σ_{n=1}^{M} Σ_{m=1}^{M} E[x(n)x(m)] E[y(n)y(m)]
= (1/M²) Σ_{n=1}^{M} E[x²(n)] E[y²(n)] = (1/M²) Σ_{n=1}^{M} σ_x²σ_y² = (1/M)σ_x²σ_y².
Solution 7.14. Probability that the random variable is within −∞ < ξ < ∞
is
1 = ∫_{−∞}^{∞} p_ε(n)(ξ)dξ = ∫_{−∞}^{∞} a/(1 + ξ²) dξ = a arctan(ξ)|_{−∞}^{∞} = aπ,

resulting in a = 1/π. The mean value is

µ_ε = (1/π) ∫_{−∞}^{∞} ξ/(1 + ξ²) dξ = 0,
H(z) = 1/(1 − 0.5z⁻¹).

Y(z) = H(z)X(z) = a/(1 − 0.5z⁻¹),  |z| > 1/2.

Y(z) = a Σ_{n=0}^{∞} (1/2)ⁿ z⁻ⁿ.
The mean value and autocorrelation of the output signal y(n) are

µ_y(n) = E{y(n)} = ∫_{−∞}^{∞} y(n)p(a)da = 9·2^(−(n+1)) u(n),

r_yy(n, m) = E{y(n)y*(m)} = (61/3) 2^(−(n+m)) u(n)u(m).
The output signal y(n) is not WSS.
R_xx(z) = Σ_{n=−∞}^{∞} r_xx(n)z⁻ⁿ = 1,
S_xx(ω) = 1.

For z = e^(jω) we get

R_εεh(e^(jω)) = S_εε(ω)H(e^(jω)) = H(e^(jω)),

resulting in

r_εεh(n) = h(n) = (2/π) sin²(nπ/2)/n  for n ≠ 0,  and 0 for n = 0.
It is easy to conclude that the cross-correlation function is antisymmetric
r xy (−n) = −r xy (n).
x_a(n) = ε_a(n) = x(n) + jx_h(n) = x(n) + j Σ_{k=−∞}^{∞} h(k)x(n − k).

X_a(e^(jω)) = X(e^(jω)) + jH(e^(jω))X(e^(jω)).

X_a(e^(jω))/X(e^(jω)) = H_a(e^(jω)) = 1 + jH(e^(jω)) = 1 + sgn(ω) = { 2 for ω > 0;  1 for ω = 0;  0 for ω < 0 }.
r_yy(n) = (a^|n|/(1 − a²))σ_ε².

The power spectral density of the output signal is

S_yy(ω) = R_yy(e^(jω)) = σ_ε²/((1 − ae^(jω))(1 − ae^(−jω))) = σ_ε²/(1 − 2a cos ω + a²).
Solution 7.18. The mean of y(n) is

µ_y(n) = E{ Σ_{k=−∞}^{∞} h(k)x(n − k) } = Σ_{k=0}^{∞} a^k E{ε(n − k)}u(n − k)
= Σ_{k=0}^{n} a^k µ_ε = µ_ε (1 − a^(n+1))/(1 − a) u(n).
The variance is

σ_y²(n) = E{[y(n) − µ_y(n)]²} = E{y²(n)} − µ_y²(n)
= Σ_{k1=0}^{n} Σ_{k2=0}^{n} a^(k1) a^(k2) E{ε(n − k1)ε(n − k2)} u(n) − µ_ε² ((1 − a^(n+1))/(1 − a))² u(n)

σ_y²(n) = σ_ε² (1 − a^(2(n+1)))/(1 − a²) u(n).
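The closed-form variance can be cross-checked by a Monte Carlo simulation of y(n) = Σ_{k=0}^{n} a^k ε(n − k) with unit-variance white Gaussian ε(n). A sketch (the function name simulate_var is ours, and the tolerance is only a sampling-error margin):

```python
import random

def simulate_var(a, n, trials=100000, seed=7):
    """Monte Carlo variance of y(n) = sum_{k=0}^{n} a^k eps(n-k), eps ~ N(0, 1)."""
    rng = random.Random(seed)
    vals = [sum(a**k * rng.gauss(0.0, 1.0) for k in range(n + 1))
            for _ in range(trials)]
    m = sum(vals) / trials
    return sum((v - m)**2 for v in vals) / trials

a, n = 0.5, 5
mc = simulate_var(a, n)
theory = (1 - a**(2 * (n + 1))) / (1 - a**2)   # sigma_eps^2 = 1
```

For a = 0.5 and n = 5 the theoretical value is (1 − 0.5¹²)/0.75 ≈ 1.333, and the simulated estimate agrees to within a few percent.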
Solution 7.19. The mean value is

µ_x = µ_ε + Σ_{k=1}^{N} a_k E{e^(j(ω_k n + θ_k))} = µ_ε,

since

E{e^(j(ω_k n + θ_k))} = (1/2π) ∫_{−π}^{π} e^(j(ω_k n + θ_k)) dθ_k = 0.

The autocorrelation is

r_xx(n) = σ_ε² δ(n) + µ_ε² + Σ_{k=1}^{N} a_k² e^(jω_k n),

with the power spectral density

S_xx(e^(jω)) = FT{r_xx(n)} = σ_ε² + 2πµ_ε² δ(ω) + 2π Σ_{k=1}^{N} a_k² δ(ω − ω_k).
Solution 7.20. For the optimal filtering d(n) = s(n). The cross-correlation of the input and desired signal is equal to the autocorrelation of s(n). Its z-transform is

R_dx(z) = R_ss(z) = (−15z/4)/((z − 1/4)(z − 4)).

R_xx(z) = (−15z/4)/((z − 1/4)(z − 4)) + 1 = (z² − 8z + 1)/((z − 1/4)(z − 4)).
Using the fact that the signal s(n) is deterministic and the noise ε(n) is zero-mean white stationary, we get

E{S_x(n, k)} = Σ_{i1=0}^{N−1} Σ_{i2=0}^{N−1} s(n + i1)s*(n + i2) e^(−j(2π/N)(i1−i2)k)
+ Σ_{i1=0}^{N−1} Σ_{i2=0}^{N−1} E{ε(n + i1)ε*(n + i2)} e^(−j(2π/N)(i1−i2)k)

or

E{S_x(n, k)} = S_s(n, k) + σ_ε² Σ_{i1=0}^{N−1} Σ_{i2=0}^{N−1} δ(i1 − i2) e^(−j(2π/N)(i1−i2)k)
= S_s(n, k) + σ_ε² Σ_{i=0}^{N−1} 1 = S_s(n, k) + Nσ_ε².
E{x(n + i1)x*(n + i2)x*(n + i3)x(n + i4)}
= s(n + i1)s*(n + i2)s*(n + i3)s(n + i4)
+ s(n + i1)s*(n + i2)r_εε(i4 − i3) + s(n + i1)s*(n + i3)r_εε(i4 − i2)
+ s*(n + i2)s(n + i4)r_εε(i1 − i3) + s*(n + i3)s(n + i4)r_εε(i1 − i2)
+ E{ε(n + i1)ε*(n + i2)ε*(n + i3)ε(n + i4)}.
The facts that the odd-order moments of a Gaussian zero-mean noise are zero, and that r_εε*(k) = r_ε*ε(k) = 0 for a complex-valued noise with i.i.d. samples, are used. According to the relation from the note it holds
The signal is deterministic and not correlated with the white noise ε(n), where r_εε(2k) is the autocorrelation function of the additive noise ε(n). The noise variance is σ_ε². Then
E{x(n + k1)x*(n − k1)x*(n + k2)x(n − k2)}
= s(n + k1)s*(n − k1)s*(n + k2)s(n − k2) + s(n + k1)s*(n − k1)r_εε(−2k2)
+ s(n + k1)s*(n + k2)r_εε(k2 − k1) + s*(n − k1)s(n − k2)r_εε(k1 − k2)
+ s*(n + k2)s(n − k2)r_εε(2k1) + r_εε(2k1)r_εε(−2k2) + r_εε²(k1 − k2).
σ² = Σ_{k1=−L}^{L} Σ_{k2=−L}^{L} [s(n + k1)s*(n + k2)r_εε(k2 − k1) + r_εε²(k1 − k2)
+ s*(n − k1)s(n − k2)r_εε(k1 − k2)] e^(−jω(k1−k2)) = σ_ε² Σ_{k=−L}^{L} (2|s(n + k)|² + σ_ε²).
a) For the optimal filtering d(n) = s(n). The cross-correlation of the desired and input signal is

r_dx(n) = E{d(k)x(k − n)} = E{s(k)[s(k − n) + ε(k − n)]} = r_ss(n) = 4(0.5)^|n|.
H(z) = R_dx(z)/R_xx(z) = [3z/((2z − 1)(2 − z))] / [3z/((2z − 1)(2 − z)) + 1] = 3z/(−2z² + 8z − 2).
and from

R_dx(z) = Σ_{n=−∞}^{∞} 4(0.5)^|n+1| z⁻ⁿ = zR_ss(z) = 3z²/((2z − 1)(2 − z))

follows

H(z) = 3z²/(−2z² + 8z − 2).
R_dx(z) = Σ_{n=−∞}^{∞} 4(0.5)^|n−1| z⁻ⁿ = z⁻¹R_ss(z) = 3/((2z − 1)(2 − z))

with

H(z) = 3/(−2z² + 8z − 2).
Solution 7.25. For the optimal filter d(n) = s(n). The cross-correlation function is

r_dx(n) = E{s(k)[s*(k − n) + ε*(k − n)]} = r_ss(n) + r_sε(n) = 3(0.9)^|n| + 2δ(n).
H(e^(jω)) = S_dd(e^(jω)) / (S_dd(e^(jω)) + S_εε(e^(jω))).

H(e^(jω)) = (1 − ω/2)/(1 − ω/2 + (1 − |ω − 2|))
= (1 − ω/2)/(1 − ω/2 + (1 + (ω − 2))) = (2 − ω)/ω.
SNR_o = [(1/2π) ∫_{−π}^{π} S_dd(e^(jω))|H(e^(jω))|² dω] / [(1/2π) ∫_{−π}^{π} S_εε(e^(jω))|H(e^(jω))|² dω]
= [3/2 + 2 ∫_{1}^{2} (1 − ω/2)|(2 − ω)/ω|² dω] / [2 ∫_{1}^{2} (1 + (ω − 2))|(2 − ω)/ω|² dω]
= (10 − 12 ln 2)/(16 ln 2 − 11) = 18.6181,

or 12.7 [dB].
It has been assumed that the errors in two different signal samples are not
correlated E{e(n + m)e(n − m)} = 0 for m ̸= 0 and that the signal and error
are not correlated, E{ x (n + m)e(n − m)} = 0 for any m and n.
7.11 EXERCISES
Exercise 7.1. Signal x(n) is equal to the monthly average of maximal daily temperatures in a city, measured from year 2001 to 2015. If we can assume that the signal for an individual month is Gaussian, find the probability that the average of maximal temperatures: (a) in July is lower than 25, (b) in August is higher than 39.
Exercise 7.2. Random signal x (n) is such that x (n) = x1 (n) with probability
p. In all other cases x (n) is x2 (n). If the mean and variance of x1 (n) and
x2 (n) are µ x1 , σx21 and µ x2 , σx22 , respectively, find the mean and the variance
of x (n).
Result: µ_x = pµ_x1 + (1 − p)µ_x2 and

σ_x² = p[E{x1²(n)} − µ_x²] + (1 − p)[E{x2²(n)} − µ_x²]
= p[σ_x1² + µ_x1² − µ_x²] + (1 − p)[σ_x2² + µ_x2² − µ_x²]
= pσ_x1² + (1 − p)σ_x2² + p(1 − p)(µ_x1 − µ_x2)².
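The mixture mean and variance formulas above can be checked by simulation. A sketch assuming Gaussian components (the chosen numbers p = 0.3, µ_x1 = 1, σ_x1² = 1, µ_x2 = 3, σ_x2² = 4 are ours, purely for illustration):

```python
import random

def mixture_stats(p, mu1, sigma1, mu2, sigma2, trials=200000, seed=3):
    """Empirical mean and variance of a two-component Gaussian mixture."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu1, sigma1) if rng.random() < p else rng.gauss(mu2, sigma2)
          for _ in range(trials)]
    m = sum(xs) / trials
    v = sum((x - m)**2 for x in xs) / trials
    return m, v

p, mu1, var1, mu2, var2 = 0.3, 1.0, 1.0, 3.0, 4.0
m, v = mixture_stats(p, mu1, var1**0.5, mu2, var2**0.5)
mu_theory = p * mu1 + (1 - p) * mu2                                    # 2.4
var_theory = p * var1 + (1 - p) * var2 + p * (1 - p) * (mu1 - mu2)**2  # 3.94
```

The empirical values agree with the formulas to within sampling error.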
Exercise 7.3. Find the mean and variance of a white uniform noise whose
values are within the interval − a ≤ x (n) ≤ a. If that signal is an input to
the FIR system with impulse response h(n) = 1 for 1 ≤ n ≤ N and h(n) = 0
elsewhere, find the mean and variance of the output signal.
Exercise 7.4. Consider a signal x(n) equal to a Gaussian zero-mean noise with variance σ_ε². A new noise y(n) is formed by using the values of x(n) lower than the median value. Find the mean and variance of this new noise y(n). Result: σ_y² = 0.1426σ_ε².
Exercise 7.5. A system is described by the difference equation

y(n) − (1/2)y(n − 1) = x(n),

with the input signal

x(n) = ε(n)u(n),

where µ_ε = 0 and r_εε(n) = σ_ε²δ(n). Find the mean value and the autocorrelation r_yy(n, m) of the output signal. What is the cross-correlation between the input and output signal r_yx(n, m)? Show that for n → ∞ the output signal tends to a WSS signal.
Ljubiša Stanković Digital Signal Processing 419
Exercise 7.6. (a) Calculate the DFT value X (4) for x (n) = exp( j4πn/N ) with
N = 16.
(b) Calculate the DFT of a noisy signal x (n) + ε(n), where the noise is
ε(n) = 1001δ(n) − 899δ(n − 3) + 561δ(n − 11) − 32δ(n − 14).
(c) Estimate the DFT using noisy signal x (n) + ε(n) and
X_R(k) = N median_{n=0,1,...,N−1} { Re[(x(n) + ε(n))e^(−j2πkn/N)] }
+ jN median_{n=0,1,...,N−1} { Im[(x(n) + ε(n))e^(−j2πkn/N)] }.
Exercise 7.7. The power spectral densities of the signal S_dd(e^(jω)) and input noise S_εε(e^(jω)) are given in Fig. 7.19 for two cases, one in the left subplots and the other in the right subplots. Show that the frequency response of the optimal filter H(e^(jω)) is as presented in Fig. 7.19 (bottom subplot for both cases of signal and noise). Find the SNR at the input and output of the optimal filter in both cases.
Exercise 7.8. Find the transfer function of an optimal filter for the signal x(n) = s(n) + ε(n), where ε(n) is a white noise with the autocorrelation r_εε(n) = Nδ(n), and s(n) is a random signal obtained as the output of a first-order linear system driven by a white noise, with the autocorrelation r_ss(n) = a^|n|, 0 < a < 1. The signal and noise are not correlated.
Exercise 7.10. Find the power spectral densities of signals whose autocorrelation functions are:
a) r_xx(n) = δ(n) + 2 cos(0.πn),
b) r_xx(n) = −4δ(n + 1) + 7δ(n) − 4δ(n − 1),
c) r_xx(n) = 2a cos(ω₀n) + Σ_{k=0}^{∞} σ²(1/2)^k δ(n − k).
Figure 7.19 Power spectral densities of the signal |S(e^(jω))|² and input noise S_εε(e^(jω)), along with the frequency response of an optimal filter H(e^(jω)). Two cases are presented, one in the left subplots and the other in the right subplots.
Part III
Selected Topics
Chapter 8
Adaptive Systems
8.1 INTRODUCTION
Classic systems for signal processing are designed to satisfy properties de-
fined in advance. Their parameters are time-invariant. Adaptive systems
change their parameters or form in order to achieve the best possible performance. These systems are characterized by the ability to observe variations in the input signal behavior and to react to these changes by adapting their parameters in order to improve the desired performance of the output signal. Adaptive systems have the ability to "learn", so that they can appropriately adapt their performance when the system environment changes. By definition, adaptive systems are time-variant. These systems are often nonlinear as well. These two facts make the design and analysis of adaptive systems more difficult than in the case of classical time-invariant systems.
Adaptive systems are the topic of this chapter.
Consider an adaptive system with one input and one output signal, as in Figure 8.1. In addition to the algorithm that transforms the input signal into the output signal, the adaptive system has a part that tracks the system performance and implements appropriate system changes. This control system takes into account the input signal, the output signal, and some additional information that can help in making a decision on how the system parameters should change.
The architecture of an adaptive system whose task is to transform the input signal so that it is as close as possible to a reference (desired) signal is shown in the following figure.
Authors: Ljubiša Stanković, Miloš Daković
[Figure: adaptive system with output y(n), desired signal d(n), error signal e(n) = d(n) − y(n), and an adaptation rule that can use other data.]
y(n) = a x(n) + b x(n − 1) + c y(n − 1) + d y(n − 2)

H(z) = (a + bz⁻¹)/(1 − cz⁻¹ − dz⁻²).
The stability condition requires that the system poles are within the unit circle. The pole values are

z_{1,2} = (c ± √(c² + 4d))/2.
Consider two cases:
Case I: Poles are complex-valued, that is, c² + 4d < 0. In this case the parameter d must be negative. In addition, it has to satisfy the inequality |c| < 2√(−d). The poles of the system are

z_{1,2} = c/2 ± j√(−c² − 4d)/2.

They are within the unit circle if |z_{1,2}| < 1. Since |z_{1,2}|² = c²/4 + (−c² − 4d)/4 = −d, this means √(−d) < 1, or −1 < d < 0.
Parameters a and b do not influence the system stability. They can take any value. The conditions for the system to be stable in this case are

d > −1
|c| < 2√(−d).
Case II: Poles of the system are real-valued, c² + 4d ≥ 0. The poles are

z₁ = (c + √(c² + 4d))/2,  z₂ = (c − √(c² + 4d))/2.

The stability condition |z_{1,2}| < 1 leads to

2d + c² ± c√(4d + c²) < 2.
Figure 8.3 Region of system coefficient values where the system is stable.
c² + 4d ≥ 0
2d + c² + c√(4d + c²) < 2
2d + c² − c√(4d + c²) < 2

reduce to

|d| < 1
|c| < 1 − d.
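Taken together, the two cases combine into the triangle d > −1, |c| < 1 − d (the complex-pole region lies inside it, since 2√(−d) ≤ 1 − d for −1 < d < 0). This can be checked numerically by computing the pole magnitudes of z² − cz − d on a grid; a sketch under that assumption (function names are ours):

```python
import cmath

def poles(c, d):
    """Poles of H(z) = (a + b z^-1)/(1 - c z^-1 - d z^-2): roots of z^2 - c z - d."""
    disc = cmath.sqrt(c * c + 4 * d)
    return (c + disc) / 2, (c - disc) / 2

def stable(c, d):
    """True when both poles lie strictly inside the unit circle."""
    return all(abs(z) < 1 for z in poles(c, d))

def in_region(c, d):
    """Combined stability triangle derived in the text: d > -1 and |c| < 1 - d."""
    return d > -1 and abs(c) < 1 - d
```

Scanning a grid of (c, d) values away from the region boundary shows that stable(c, d) and in_region(c, d) always agree.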
A constant A_y exists if the system coefficients are bounded, |h_i(n)| < A_h for any i and n. Then

|y(n)| < A_y = N A_x A_h

and the system is stable.
The proof of system stability in this case is simpler than in the case of the recursive system from the previous example. The stability condition is also simple here: it is sufficient that the system coefficients are bounded.
y(n) = h₀x(n) + h₁x(n − 1) + · · · + h_{N−1}x(n − N + 1) = Σ_{i=0}^{N−1} h_i x(n − i).
Description and analysis of this system is quite simple. The system is linear.
In addition, the system with finite impulse response is always stable, for
any finite coefficient values. Finally, the realization of these systems is very
simple. In the case of adaptive systems the coefficients hi change their values
in time. This simple system is called a linear adaptive adder. Taking into account the time-variant nature of the coefficients, the system is described by

y(n) = h₀(n)x(n) + h₁(n)x(n − 1) + · · · + h_{N−1}(n)x(n − N + 1) = Σ_{i=0}^{N−1} h_i(n)x(n − i).
Vector notation will be introduced for the description and analysis of this system. Vector X(n) commonly consists of the current value of the input x(n) and its N − 1 past values, while the elements of vector H(n) are the system coefficients h_i(n) at the current instant n. The output signal can be written as a product of these two vectors
y(n) = X T (n)H(n) = H T (n)X(n) (8.1)
where (·)T denotes the vector transpose operation. The output y(n) is a
scalar.
In general the input vector X(n) may not be formed using the de-
layed samples of the input signal x (n). It can be understood, in gen-
eral case, as a vector whose elements are N independent input signals
x₀(n), x₁(n), . . . , x_{N−1}(n),

X(n) = [x₀(n)  x₁(n)  · · ·  x_{N−1}(n)]ᵀ_{N×1}.
This system has N inputs and one output (multiple input single output
system - MISO system). It is called a linear adaptive combinator, Figure 8.5.
The linear adaptive adder is just a special case of linear adaptive combinator
with xi (n) = x (n − i ) for i = 0, 1, . . . , N − 1.
e ( n ) = d ( n ) − y ( n ).
ε = E[e2 (n)],
where E[·] denotes the expected value. For the previous example with 6
values of error we get: ε = 0 for the first case, ε = 400 in the second case,
and ε = 0.01 in the third case. We see that this kind of measure meets our
expectation about the measure behavior.
In general, a function J(e) is used to define the deviation of the error signal e(n) from the ideal case. This is a cost function. It should be nonnegative. It should also have a minimum where the error signal achieves its lowest possible value (in the ideal case, 0), while local minima should not exist. From the previous illustration we can conclude that one possible form of the cost function is the mean square error function
J_LS(e) = (1/L) Σ_{k=0}^{L−1} e²(n − k).
This measure corresponds to the least square (LS) criterion in the analysis.
Consider now the squared error signal in the linear adaptive adder

e²(n) = (d(n) − y(n))² = (d(n) − Hᵀ(n)X(n))²
= d²(n) − 2d(n)Hᵀ(n)X(n) + Hᵀ(n)X(n)Xᵀ(n)H(n).
¹ A simple modification of the expected value of the error that would produce the correct conclusion is the expected absolute value of the error, E[|e(n)|]. However, the absolute value is not a differentiable function (at e(n) = 0). The algorithms for its minimization would be complex. Therefore it will not be used here (it will be the main form of the minimization function in the chapter dealing with sparse signals).
In the mean square error ε = E[e2 (n)] calculation we should take into
account that the signals d(n) and x (n) are random, while the coefficients
of the system H(n) are deterministic variables
ε = E[e2 (n)] =
= E[d2 (n) − 2d(n)H T (n)X(n) + H T (n)X(n)X T (n)H(n)] =
= E[d2 (n)] − 2H T (n)E[d(n)X(n)] + H T (n)E[X(n)X T (n)]H(n). (8.2)
E[d(n)X(n)] = E[d(n)[x(n)  x(n − 1)  · · ·  x(n − N + 1)]ᵀ]
= [E[d(n)x(n)]  E[d(n)x(n − 1)]  · · ·  E[d(n)x(n − N + 1)]]ᵀ.

Its elements E[d(n)x(m)] are the cross-correlations of the reference and input signals. They will be denoted by r_dx(n, m) = E[d(n)x(m)]. For stationary random signals r_dx(n, m) is a function of the time index difference only, E[d(n)x(m)] = r_dx(n − m). The previous relation can be rewritten in the form

E[d(n)X(n)] = [r_dx(n, n)  r_dx(n, n − 1)  · · ·  r_dx(n, n − N + 1)]ᵀ = [r_dx(0)  r_dx(1)  · · ·  r_dx(N − 1)]ᵀ = r_dx.    (8.3)
The task of an adaptive system is to find the coefficients in vector H(n) that
will produce the minimal mean square error ε. In (8.5) we have a vector
rdx of cross-correlations between the reference and input signal and the
r_xx(i) = E[x(n)x(n − i)] = (1/M) Σ_{k=0}^{M−1} x(n − k)x(n − k − i)    (8.6)
Figure 8.6 Mean square error ε(h₀, h₁) as a function of the system coefficients h₀ and h₁. The optimal coefficient values are denoted by h₀* and h₁*.
or in matrix form

RH = r_dx.

The solution of this matrix equation,

H* = R⁻¹r_dx,

produces the optimal system coefficients, denoted by h₀* and h₁*. This is the Wiener optimal filter.
6h₀ + 5h₁ = −3
5h₀ + 8h₁ = −2,

with the solution

h₀* = −14/23 and h₁* = 3/23.
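The 2 × 2 system of normal equations can be solved exactly with Cramer's rule, which confirms the fractions above. A short sketch (the variable names are ours):

```python
from fractions import Fraction as F

# normal equations from the example: 6 h0 + 5 h1 = -3,  5 h0 + 8 h1 = -2
a11, a12, b1 = F(6), F(5), F(-3)
a21, a22, b2 = F(5), F(8), F(-2)

det = a11 * a22 - a12 * a21          # 23
h0 = (b1 * a22 - a12 * b2) / det     # -14/23
h1 = (a11 * b2 - b1 * a21) / det     # 3/23
print(h0, h1)
```

Using exact rational arithmetic avoids any rounding in the check.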
It is a stationary point of ε(h₀, h₁). A stationary point can be a minimum, a maximum, or neither of these two (a saddle point). To check what kind of stationary point the previous solution (h₀*, h₁*) is, we have to find the second-order partial derivatives of ε(h₀, h₁). They are

∂²ε/∂h₀² = 6,  ∂²ε/∂h₀∂h₁ = 5,  ∂²ε/∂h₁² = 8.
The stationary point is a minimum of the function if for h₀ = h₀* and h₁ = h₁* holds

∂²ε/∂h₀² > 0  and  (∂²ε/∂h₀²)(∂²ε/∂h₁²) > (∂²ε/∂h₀∂h₁)².

In the considered case these inequalities hold (6 > 0 and 6 · 8 > 5²). Therefore, the function ε(h₀, h₁) has a minimum at h₀* = −14/23 and h₁* = 3/23. The minimum value is

ε(h₀*, h₁*) = 28/23.
Example 8.4. The input signal x (n) is a zero-mean white noise with variance 1.
The reference signal is d(n) = 12 x (n − 2). Find the optimal coefficients of the
fourth order system.
⋆The optimal coefficients are the solution of
H∗ = R−1 rdx ,
r_dx(i) = E[d(n)x(n − i)] = (1/2)E[x(n − 2)x(n − i)] = (1/2)r_xx(i − 2) = (1/2)δ(i − 2).

Therefore

r_dx = [0  0  1/2  0]ᵀ.

Since R = I for the unit-variance white input, H* = r_dx.
Example 8.5. Signal x (n) is observed. The autocorrelation function values r xx (0) =
1, r xx (1) = 0.8, r xx (2) = 0.4 and r xx (3) = 0.1 are obtained by averaging. Find
the parameters of the optimal system that will predict values of x (n) one-step
ahead. The reference signal is d(n) = x (n + 1). Find the first and second order
system, with N = 1 and N = 2. In both cases calculate the value of minimal
error ε min .
⋆For the first-order system, N = 1, we have H = [h₀], R = [r_xx(0)] = [1] and r_dx = [E[d(n)x(n)]] = [E[x(n + 1)x(n)]] = [r_xx(1)] = [0.8]. The optimal value of the system coefficient is h₀* = 0.8, with the minimal error ε_min = r_xx(0) − h₀*r_dx(0) = 1 − 0.8² = 0.36. For the second-order system, N = 2, the optimal one-step ahead predictor is

y(n) = (4/3)x(n) − (2/3)x(n − 1).
Note that by using data given in the example it was possible to calcu-
late the coefficients of a third-order ( N = 3) one-step ahead prediction system
as well.
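The predictor coefficients follow from the normal equations R H = r_dx built from the autocorrelation values. A minimal sketch for N = 1 and N = 2, solved by Cramer's rule (the function name predictor is ours):

```python
def predictor(r, N):
    """One-step-ahead predictor coefficients for N = 1 or N = 2, given the
    autocorrelation values r = [r(0), r(1), r(2)].
    N = 2 solves [[r0, r1], [r1, r0]] [h0, h1]^T = [r1, r2]^T."""
    if N == 1:
        return [r[1] / r[0]]
    det = r[0] * r[0] - r[1] * r[1]
    h0 = (r[1] * r[0] - r[1] * r[2]) / det
    h1 = (r[0] * r[2] - r[1] * r[1]) / det
    return [h0, h1]

print(predictor([1, 0.8, 0.4], 1))  # first-order: [0.8]
```

For r = [1, 0.8, 0.4] and N = 2 this reproduces the coefficients 4/3 and −2/3 from the example.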
Rq = λq.    (8.10)

The eigenvector can be normalized as q₀ = q/||q||. From Rq = λq it follows

Rq − λq = 0
(R − λI)q = 0.    (8.11)
R = [ 1  0.9;  0.9  1 ].
Rq0 = λ0 q0
Rq1 = λ1 q1 .
we may write
RQ = QΛ
or
R = QΛQ−1
Λ=Q−1 RQ.
R = QΛQ T
Λ=Q T RQ
The same matrix relations can be written for any order N of autocorrelation
matrix R.
Example 8.7. For the autocorrelation matrix R defined by
R = [ 3  1  1;  1  3  1;  1  1  3 ]
(R − λ₀I)q₀ = 0, that is, for λ₀ = 5,

[ −2  1  1;  1  −2  1;  1  1  −2 ] [q₀₀  q₀₁  q₀₂]ᵀ = 0.
Since the rank of the system matrix is 2 the system does not have a unique
solution. One equation is omitted. Solving two remaining equations for two
unknowns we get q00 = q01 = q02 = α, where α is an arbitrary scalar. The
solution is

q₀ = [q₀₀  q₀₁  q₀₂]ᵀ = [α  α  α]ᵀ.
The value of the scalar α is found in such a way as to normalize the intensity of q₀. It follows that α = 1/√3, or

q₀ = [1/√3  1/√3  1/√3]ᵀ.
For the second and third eigenvector we use λ = 2:

(R − 2I)q = 0

[ 1  1  1;  1  1  1;  1  1  1 ] [q₁₀  q₁₁  q₁₂]ᵀ = 0,
with the solution q₁₂ = −q₁₀ − q₁₁. Therefore we may take two variables, q₁₀ = α and q₁₁ = β, as arbitrary. The solution is

q = [α  β  −α − β]ᵀ.
We are interested in orthogonal vectors. For the second vector we will use α = β. After normalization we get

q₁ = [1/√6  1/√6  −2/√6]ᵀ.
The third vector should be orthogonal to q₁, meaning that their scalar product is zero,

⟨q, q₁⟩ = (1/√6)α + (1/√6)β − (2/√6)(−α − β) = 0,

which reduces to α + β = 0, or α = −β. With α = −β, after normalization, the third eigenvector is obtained in the form

q₂ = [−1/√2  1/√2  0]ᵀ.
Matrices Q and Λ contain the eigenvectors and eigenvalues, respectively,

Q = [ 1/√3   1/√6  −1/√2
      1/√3   1/√6   1/√2
      1/√3  −2/√6   0    ]

Λ = [ 5  0  0
      0  2  0
      0  0  2 ].
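The computed eigenvectors and eigenvalues can be verified directly: each column q_k of Q must satisfy Rq_k = λ_k q_k, and the columns must be orthonormal. A minimal sketch (numpy.linalg.eig would do the same job; the helper name matvec is ours):

```python
import math

R = [[3, 1, 1], [1, 3, 1], [1, 1, 3]]
lams = [5, 2, 2]
s3, s6, s2 = math.sqrt(3), math.sqrt(6), math.sqrt(2)
Q = [[1 / s3, 1 / s6, -1 / s2],
     [1 / s3, 1 / s6,  1 / s2],
     [1 / s3, -2 / s6, 0.0]]

def matvec(M, v):
    """Product of a 3x3 matrix with a length-3 vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

q1 = [Q[i][1] for i in range(3)]
print([round(x, 4) for x in matvec(R, q1)])  # equals 2 * q1
```

This confirms R = QΛQᵀ with Λ = diag(5, 2, 2).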
The mean square error of a linear adaptive adder has been defined by (8.5)
as
ε = σd2 − 2H T (n)rdx + H T (n)RH(n). (8.12)
Its minimization produced the optimal coefficients
H∗ = R−1 rdx .
The minimal value of the mean square error is obtained for H(n) = H∗ in
(8.12) as
The error (8.12) can be expressed in terms of ε min , the autocorrelation matrix
R, and optimal coefficients H∗ . The value of σd2 is calculated using (8.13) and
replaced in (8.12),
ε = ε_min + (H − H*)ᵀQΛQᵀ(H − H*).

With the coordinate change

V = Qᵀ(H − H*)

the error becomes

ε = ε_min + VᵀΛV.

The contours of constant error are ellipses,

v₀²/((ε − ε_min)/λ₀) + v₁²/((ε − ε_min)/λ₁) = 1.
E[ x1 (n) x2 (n)] = 0
Figure 8.7 Coordinate system change by translating the origin to the optimal point and by using coordinate axes defined by the eigenvectors of the autocorrelation matrix.
where µ/2 defines the step in the steepest descent direction. In general the iterations are defined by

H_{n+1} = H_n + (µ/2)(−∂ε/∂H)|_{H=H_n}

with

∂ε/∂H = −2r_dx + 2RH.

The iterative relation is then

H_{n+1} = H_n + µ(r_dx − RH_n).
Figure 8.8 Steepest descent method illustration. The smallest step µ is used in case (a), a larger step is presented in case (b), and the largest µ corresponds to case (c). The steepest descent method converges in cases (a) and (b), while it diverges in case (c). The contour plot of the error function is presented in all cases. Iterations are marked with dots and numbers 0, 1, 2, . . . , where 0 is the starting iteration.
the coefficient values in the next iterations are denoted by 1, 2, 3, . . . We can see that the iterative procedure converges toward the optimal coefficient values h₀*, h₁*. A larger step µ is used in the case presented in Figure 8.8(b). The iterative algorithm convergence is faster than in the previous case. In the third case, presented in Figure 8.8(c), a very large step µ is used. The step is here too large and the iterative algorithm does not converge to the optimal coefficient values. Note that the convergence in all of these cases does not depend on the initial position (initial value of the system coefficients).
The range of step values µ for which the steepest descent iterative algorithm converges can be determined analytically. The optimal coefficient values are obtained as the solution of the equation RH* = r_dx. Consider the deviation of the system coefficient vector H_{n+1} in the (n + 1)th iteration from
(1 − µλ₀)² = (1 − µλ_{N−1})²
µ(λ₀² − λ²_{N−1}) = 2(λ₀ − λ_{N−1})
µ = 2/(λ₀ + λ_{N−1}).

In this case, for k = 0, 1, . . . , N − 1 holds

(1 − µλ₀)² ≥ (1 − µλ_k)²
(1 − µλ₀)² − (1 − µλ_k)² ≥ 0
µ(λ₀ − λ_k)(µ(λ₀ + λ_k) − 2) ≥ 0
2µ(λ₀ − λ_k)((λ₀ + λ_k)/(λ₀ + λ_{N−1}) − 1) ≥ 0
2µ(λ₀ − λ_k)(λ_k − λ_{N−1})/(λ₀ + λ_{N−1}) ≥ 0,
µ_opt = 2/(λ_max + λ_min).

When all eigenvalues are equal, λ_k = λ, this reduces to

µ_opt = 1/λ.
EH = A₀·0ⁿ + A₁·0ⁿ + · · · + A_{N−1}·0ⁿ = 0.

It means that the steepest descent method, in this special case, will reach the optimal system coefficients H* in one iteration step.
Assume that the cross-correlation vector of the input and reference signal is

r_dx = [3.8  1.9]ᵀ.
After 141 iterations the norm of the coefficient deviation is below 0.01. Using a larger step, µ₂ = 1, we get

H₁ = [3.8  1.9]ᵀ,  H₂ = [2.09  −1.52]ᵀ,  H₃ = [5.168  0.019]ᵀ,  . . . ,  H₆₉ = [10.994  −7.992]ᵀ
Example 8.9. Consider an adaptive system of the second order, described by the difference equation

y(n) = h₀(n)x(n) + h₁(n)x(n − 1),

where h₀(n) and h₁(n) are real-valued varying system parameters. The input signal x(n) is stationary with the autocorrelation function r_xx(m) = 5δ(m) + 3δ(m² − 1). The reference signal is d(n), with the cross-correlation between the input and reference signal r_dx(m) = δ(m) + (1/2)δ(m − 1). The system is adapted by using the steepest descent method with step µ. The initial conditions for the system coefficients are h₀(0) = 0 and h₁(0) = 0. Find the
optimal system coefficients in the sense of the minimal mean square error, where the error is e(n) = d(n) − y(n). Find the coefficient values as a function of the iteration (time) index n. Find the range of the step µ for which the coefficients converge toward the optimal values. For the cases when the system coefficients converge, find the number of iterations after which the mean square deviation of the coefficients from the optimal values is lower than 10⁻⁶.
⋆The system is of the second order. Its autocorrelation matrix and cross-correlation vector are

R = [ r_xx(0)  r_xx(1);  r_xx(1)  r_xx(0) ] = [ 5  3;  3  5 ]

r_dx = [r_dx(0)  r_dx(1)]ᵀ = [1  1/2]ᵀ.
The inverse of R is

R⁻¹ = (1/16)[ 5  −3;  −3  5 ],

with the optimal coefficients of the system

H* = R⁻¹r_dx = (1/16)[ 5  −3;  −3  5 ] [1  1/2]ᵀ = [7/32  −1/32]ᵀ.
In order to get the coefficients h₀(n) and h₁(n) as functions of the iteration (time) index n, we will use the iteration relation for the steepest descent method

H_{n+1} = H_n + µ(r_dx − RH_n)

with the initial condition H₀ = [0  0]ᵀ. The system of equations is

h₀(n + 1) = h₀(n) + µ(1 − 5h₀(n) − 3h₁(n))
h₁(n + 1) = h₁(n) + µ(1/2 − 3h₀(n) − 5h₁(n)).
Expressing h₁(n) from the first equation,

h₁(n) = ((1 − 5µ)/(3µ))h₀(n) − (1/(3µ))h₀(n + 1) + 1/3,

and replacing it into the second equation, we get

((1 − 5µ)/(3µ))h₀(n + 1) − (1/(3µ))h₀(n + 2) + 1/3
= (1 − 5µ)[((1 − 5µ)/(3µ))h₀(n) − (1/(3µ))h₀(n + 1) + 1/3] + µ/2 − 3µh₀(n),
with the initial conditions h₀(0) = 0 and h₀(1) = h₀(0) + µ(1 − 5h₀(0) − 3h₁(0)) = µ.
The solution of this equation is

h₀(n) = −(3/32)(1 − 8µ)ⁿ − (1/8)(1 − 2µ)ⁿ + 7/32.

From the relationship between h₁(n) and h₀(n) follows

h₁(n) = −(3/32)(1 − 8µ)ⁿ + (1/8)(1 − 2µ)ⁿ − 1/32.
Consider the limit values as n → ∞. They are finite if |1 − 8µ| ≤ 1 and |1 − 2µ| ≤ 1. Using positive values of the step µ we get µ ≤ 1/4. For µ < 1/4 the limit values are equal to the optimal system coefficient values. For µ > 1/4 the coefficients tend to infinity. In the limit case µ = 1/4, for a large n (so that the term with (1 − 2µ)ⁿ can be neglected) the coefficients are approximately equal to
h₀(n) ≈ 7/32 − (3/32)(−1)ⁿ
h₁(n) ≈ −1/32 − (3/32)(−1)ⁿ.

They assume an oscillatory form, with oscillations around the optimal values of the system coefficients.
The number of iterations needed to get the mean square deviation of the coefficients below 10⁻⁶ follows from

[(h₀(n) − h₀*)² + (h₁(n) − h₁*)²]/2 < 10⁻⁶

or

(9/1024)(1 − 8µ)²ⁿ + (1/64)(1 − 2µ)²ⁿ < 10⁻⁶.
This inequality does not have a closed-form solution. For a given step µ (0 < µ < 1/4) the minimal number of iterations n can be found numerically. Solutions for some possible values of the step µ are:

µ: 0.01  0.1  0.15  0.18  0.19  0.2  0.21  0.22  0.24  0.248
n:  239   22    14    11    11   10    12    17    55    282
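The table entries can be reproduced by simply running the steepest descent recursion and counting iterations until the mean square deviation drops below 10⁻⁶. A minimal sketch for this example (the function name msd_iterations is ours):

```python
def msd_iterations(mu, tol=1e-6):
    """Iterate H <- H + mu (r_dx - R H) for R = [[5, 3], [3, 5]],
    r_dx = [1, 1/2], H0 = 0; return the first n at which the mean square
    deviation from H* = [7/32, -1/32] falls below tol."""
    h0 = h1 = 0.0
    h0s, h1s = 7 / 32, -1 / 32
    n = 0
    while ((h0 - h0s) ** 2 + (h1 - h1s) ** 2) / 2 >= tol:
        h0, h1 = (h0 + mu * (1.0 - 5 * h0 - 3 * h1),
                  h1 + mu * (0.5 - 3 * h0 - 5 * h1))
        n += 1
        if n > 100000:          # guard against a divergent step
            break
    return n

print(msd_iterations(0.2))  # 10
```

Note the simultaneous (tuple) update, matching H_{n+1} = H_n + µ(r_dx − RH_n).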
From this table we can conclude that small values of the step µ should not be used, since the convergence is very slow. Based on the values from the table we can conclude that the optimal step is around µ = 0.2. Next we will find this value based on an analytical consideration of the coefficients. Assume an arbitrary value of the variable n and use the equality for the mean square error

(9/1024)(1 − 8µ)²ⁿ + (1/64)(1 − 2µ)²ⁿ = 10⁻⁶

that is,

(9/1024)((1 − 8µ)²)ⁿ + (1/64)((1 − 2µ)²)ⁿ = 10⁻⁶.
This formula provides the relation between n and µ. Finding the value of µ that produces the minimal n is not simple. Note that the left side of the previous equation consists of two positive terms. Assume that, for a sufficiently large n, the terms are of the same order. It results in

(1 − 8µ)² = (1 − 2µ)²
µ(5µ − 1) = 0,

or µ = 0.2. For this value of the step µ the number of iterations is

n = log((1024/25)·10⁻⁶)/log(9/25) ≈ 9.888 ≈ 10.
These values of µ and n correspond to the numerically obtained ones presented in the table. For µ < 0.2 the second term dominates in the mean square deviation relation. The number of iterations can then be determined from

(1/64)((1 − 2µ)²)ⁿ = 10⁻⁶

as

n = log(64·10⁻⁶)/log((1 − 2µ)²).
For µ = 0.15 we get n ≈ 13.537. This result is in agreement with the numerical one, n = 14. For µ > 0.2 the first term is dominant and

n = log((1024/9)·10⁻⁶)/log((1 − 8µ)²).

For µ = 0.22 the value n ≈ 16.54 follows. It corresponds to the numerical result n = 17.
Example 8.10. Analyze the convergence of the steepest descent method using the eigenvalues of the autocorrelation matrix from the previous example.
⋆The autocorrelation matrix is

R = [ 5  3;  3  5 ].
Its eigenvalues are λ_max = 8 and λ_min = 2, so the algorithm converges for

µ < 2/λ_max = 2/8 = 1/4 = 0.25.

The optimal rate of convergence is achieved for

µ = 2/(λ_max + λ_min) = 2/(8 + 2) = 1/5 = 0.2.

For a system with the eigenvalues λ_{max,min} = (9 ± √33)/8 the corresponding bound is

µ < 2/λ_max = 16/(9 + √33) ≈ 1.085.

The optimal rate of convergence is achieved if

µ = 2/(λ_max + λ_min) = 8/9 ≈ 0.889.
Statistical properties of the signals are not fast-varying. For each next instant
n we may use the system coefficients obtained at the previous instant n − 1
(in K iterations) as the initial values
H 0 ( n ) = H K ( n − 1 ).
Assume that only one iteration is done for each time instant n. With K = 1 it
follows
For notation simplicity, the index denoting the number of iterations will be
omitted (since it has been assumed that it is 1). Then we can write
In the LMS algorithm the autocorrelation matrix R(n + 1) and the cross-
correlation vector rdx (n + 1) are approximated by their instantaneous values
The difference d(n) − y(n) is the error signal e(n). A common form of the LMS algorithm reads

H(n + 1) = H(n) + µe(n)X(n).    (8.19)

In each time instant the coefficients of the adaptive system are changed with respect to their previous values in the direction of the input signal vector X(n). The intensity of the change is determined by the step µ and the error signal e(n).
For a system of order N the LMS algorithm is numerically very efficient. At each instant n it needs only N + 1 multiplications and N additions.
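The update (8.19) is short enough to state directly in code. The following is a minimal sketch of LMS-based identification of an unknown FIR system (discussed at the end of this section); the unknown coefficients [0.5, −0.3, 0.2], the adaptive order N = 4, and the function name lms_identify are our illustrative choices:

```python
import random

def lms_identify(unknown, N, mu=0.02, steps=5000, seed=11):
    """LMS adaptation H(n+1) = H(n) + mu e(n) X(n), where the desired signal
    d(n) is produced by an unknown FIR system driven by the same white input."""
    rng = random.Random(seed)
    M = len(unknown)
    x = [0.0] * max(N, M)      # delay line: x[0] = x(n), x[1] = x(n-1), ...
    h = [0.0] * N              # adaptive coefficients
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]
        d = sum(unknown[i] * x[i] for i in range(M))   # reference signal
        y = sum(h[i] * x[i] for i in range(N))         # adaptive output
        e = d - y                                      # error signal
        h = [h[i] + mu * e * x[i] for i in range(N)]   # eq. (8.19)
    return h

h = lms_identify([0.5, -0.3, 0.2], N=4)
```

Since N ≥ M and the reference is noiseless here, the adapted coefficients settle at [0.5, −0.3, 0.2, 0], illustrating the identification scenario lim_{n→∞} H(n) = [a₀ a₁ . . . a_{M−1} 0 . . . 0].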
Consider stationary signals, when the matrix R(n) and the vector r_dx(n) are time-invariant. Then the LMS algorithm converges "in mean" toward the optimal solution,
lim_{n→∞} E[H(n)] = H∗
under the same conditions as in the steepest descent case. The step µ in the LMS algorithm should be such that

µ < 2/λmax.

Assume that the expected value E[H(n)], for a sufficiently large n, does not depend on n, and that X(n) and H(n) are mutually independent. Then, with E[H(n + 1)] = E[H(n)] = H_LMS, it follows

E[H(n + 1)] = E[H(n)] + µE[e(n)X(n)]

or

H_LMS = H_LMS + µrdx − µRH_LMS.

From this relation we get

H_LMS = R^−1 rdx = H∗.

This proves the statement that the LMS algorithm coefficients converge "in mean" to the optimal system coefficient values.
The convergence in mean does not mean that the LMS algorithm achieves the optimal value in the stationary state. Even the smallest difference between the reference and the output signal will cause the coefficients to fluctuate.
In addition, convergence in mean does not guarantee that the results will converge to the same values. It can be shown that the LMS algorithm will converge with finite variations of the coefficients and the error if the step µ satisfies a more conservative bound,

µ < 2 / Σ_{k=1}^{N} λk,

than the bound µ < 2/λmax required for the convergence "in mean". It is known that the sum of the eigenvalues is equal to the trace of matrix R. As it has been stated for the steepest descent method, the trace can easily be calculated as Tr[R] = N r_xx(0) = N E[|x(n)|^2] = Ex, where Ex is the input signal energy.
can adapt its coefficients, through the iterative procedure, in such a way
that y(n) is as close as possible to d(n). In an ideal case, with N ≥ M, it is
possible to obtain lim_{n→∞} H(n) = [a0 a1 ... a_{M−1} 0 ... 0]^T. In that case e(n) = 0.
The system is identified when the error is equal to zero. The identification
of an unknown system is illustrated in Figure 8.9.
If the unknown system is an infinite impulse response (recursive)
system or if the order of finite impulse response system is greater than the
adaptive system order, then we will get an approximation of the unknown
system, in the sense of minimal mean square error. The error signal will not
vanish as n increases.
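System identification with the LMS algorithm can be sketched in a few lines. This illustration (our own, not the book's simulation code) identifies a short FIR system like the one in Example 8.12, with no measurement noise:

```python
import numpy as np

rng = np.random.default_rng(1)
h_true = np.array([3.0, 2.0, -1.0, 1.0])   # unknown FIR system coefficients
N = 5                                      # adaptive system order
mu = 0.05
H = np.zeros(N)
x = rng.standard_normal(3000)              # white Gaussian input
for n in range(N, len(x)):
    X = x[n - np.arange(N)]                # [x(n), x(n-1), ..., x(n-N+1)]
    d = h_true @ x[n - np.arange(4)]       # reference signal d(n)
    e = d - H @ X                          # error e(n)
    H += mu * e * X                        # LMS update (8.19)
print(np.round(H, 2))  # approaches [3, 2, -1, 1, 0]
```

Since the unknown system is exactly representable by the adaptive system (N = 5 ≥ 4), the error vanishes and the extra coefficient converges to zero.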
It has been assumed that this signal is unknown. Identification of this system
is done using an adaptive system of order N = 3. The identification process
is repeated with an adaptive system of order N = 5. The input to the system
x (n) is Gaussian zero-mean white noise with variance σx2 = 1. The step
µ = 0.05 is used in the adaptive algorithm. Comment on the results.
⋆For the input signal x (n) the reference signal is d(n) = 3x (n) +
2x (n − 1) − x (n − 2) + x (n − 3). This reference signal is used in the adaptive
system of order N = 3 implemented as
y(n) = H^T(n)X(n) = h0(n)x(n) + h1(n)x(n − 1) + h2(n)x(n − 2).
The mean values of the coefficients over 100 ≤ n ≤ 200 are

h̄0 = (1/101) Σ_{n=100}^{200} h0(n) = 2.72,  h̄1 = 2.03,  h̄2 = −0.92.
Figure 8.10 Identification of unknown system from Example 8.12. System order is N = 3 (a-
b), and N = 5 (c-d). The error signal is presented on the left and the system coefficients on the
right.
They are close to the true values of the first three system coefficients (3, 2, and −1), meaning that the LMS algorithm in this case follows the true values "in mean".
For the fifth order adaptive system (N = 5), after about 100 iterations,
the error signal is almost 0. The adaptive system has identified the unknown
system. The final coefficient values in this case are close to the true system coefficients.
Figure 8.11 Identification of unknown system from Example 8.13. System order is N = 5
(a) and (b), and N = 10 (c) and (d). The error signal is presented on the left and the system
coefficients on the right. System coefficients hk (n) are labeled with k.
Example 8.13. Repeat the simulation from Example 8.12 for the case of an unknown system whose transfer function is

H(z) = (1 − (11/8)z^−1) / (1 + (1/4)z^−1 − (15/64)z^−2).

Use the step µ = 0.05 and the adaptive systems of order N = 5 and N = 10.
⋆In this case the unknown system is a system with an infinite impulse
response. In theory, we should have an adaptive system with very large
(infinite) order to identify this system. The identification results with the
adaptive systems of order N = 5 and order N = 10 are shown in Figure 8.11.
We can see that the system with order N = 10 reduces the error to a small
value, achieving a good approximation of the unknown system.
Example 8.14. Consider a simple setup in which we will be able to follow the system behavior in an intuitive way. Assume that the input signal η(n) is a white zero-mean Gaussian noise with variance ση^2 = 1. The desired signal is of the form s(n) = cos(2πn/512) + 0.5 sin(2πn/256 + π/3), with 0 ≤ n ≤ 5000. The noise at the position of signal s(n) is ε(n) = 0.5η(n) − 0.7η(n − 1). Find the optimal coefficients and then the error signal at the output of an LMS based adaptive system from Figure 8.12. Comment on the result with respect to the LMS step µ.
⋆A second-order adaptive system with the input X(n) = [η(n) η(n − 1)]^T will be used. The adaptive system output is y(n) = H^T(n)X(n) = h0(n)η(n) + h1(n)η(n − 1). The reference signal is d(n) = s(n) + ε(n). The input signal
autocorrelation matrix and the cross-correlation vector of the input and
reference signal are
R = [rηη(0) rηη(1); rηη(1) rηη(0)] = [1 0; 0 1]

and

rdx = rεη = [0.5rηη(0); −0.7rηη(0)] = [0.5; −0.7].

The optimal coefficient values are

H∗ = R^−1 rdx = [0.5; −0.7],
producing the output y(n) = h0∗ η (n) + h1∗ η (n − 1) = 0.5η (n) − 0.7η (n − 1), as
expected. The error signal is then e(n) = d(n) − y(n) = s(n).
Next the LMS algorithm is used in the adaptation, at each time instant
n, as H(n + 1) = H(n) + µe(n)X(n) with H(0) = 0. For large n the error
value will not vanish since, in an ideal case e(n) = s(n). Therefore the system
coefficients H(n + 1) will fluctuate with µe(n)X(n) ≠ 0. This means that, in order to reduce these fluctuations, the step µ should be much lower than the bound µ < 2/λmax = 2 required by the convergence condition. The results
with µ = 0.01 and µ = 0.001 are presented in Figure 8.13.
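The trade-off between convergence and coefficient fluctuation in this example can be reproduced with a short simulation. This is an illustrative sketch of the setup (signal, noise model, and step follow the text; the random generator and seed are our choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = np.arange(5000)
s = np.cos(2 * np.pi * n / 512) + 0.5 * np.sin(2 * np.pi * n / 256 + np.pi / 3)
eta = rng.standard_normal(5001)              # white reference noise eta(n)
eps = 0.5 * eta[1:] - 0.7 * eta[:-1]         # correlated noise at the signal position
d = s + eps                                  # reference signal d(n) = s(n) + eps(n)

H = np.zeros(2)                              # second-order adaptive system
mu = 0.01
e = np.empty(5000)
for k in range(5000):
    X = np.array([eta[k + 1], eta[k]])       # X(n) = [eta(n), eta(n-1)]
    e[k] = d[k] - H @ X                      # error -> approaches s(n)
    H += mu * e[k] * X                       # LMS update
print(np.round(H, 2))  # fluctuates around the optimum [0.5, -0.7]
```

After convergence the error output e(n) follows the desired signal s(n), while the coefficients fluctuate around H∗ with a spread controlled by µ.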
Example 8.15. Consider a signal s(n) embedded in high noise ε(n). The signal acquisition is done using two microphones, one close to the source of s(n) and the other far from this source.
Signal s(n) is modelled as a nonstationary zero-mean Gaussian noise with variance σs^2(n) = 3 sin^4(πn/100). Signal ε(n) is a stationary zero-mean white Gaussian noise with variance σε^2 = 300. The noise at the input to the
first and the second microphone is modified by the corresponding system transfer functions.
Figure 8.13 Simulation results for Example 8.14 – Adaptive system for noise cancelation.
System coefficients are given in upper subplots. Lower subplots present error signal and target
signal (black line).
Figure 8.14 Simulation results for Example 8.15 – Adaptive system for noise cancelation.
System coefficients are given in upper subplot. Lower subplot presents error signal (gray line)
and target signal (black line).
Figure 8.16 Simulation results for Example 8.16. Signal with sinusoidal interference (a), signal
without interference (b), system coefficients (c), output signal (d), and Fourier transform of the
final system coefficients, hk (200), k = 0, 1, 2, ...N − 1, (e).
e(n) = d(n) − H^T(n)X(n) = x(n) − y(n − 1).

If the adaptive system is able to adjust its coefficients so that the error is small, with y(n − 1) ≈ d(n) = x(n), then its output will predict the next signal value,

y(n) ≈ x(n + 1).
Consider a signal described by

x(n) = a1x(n − 1) + a2x(n − 2) + · · · + aMx(n − M) + ε(n),

where ε(n) is a zero-mean white noise with variance σε^2. We may expect that the optimal coefficients for one step ahead prediction should be

H∗ = [a1 a2 ... aM 0 ... 0]^T.
The prediction error will depend on the ratio of the recursive part of the signal x(n) and the random part ε(n). For large n the error value will not vanish since, in the ideal case, e(n) = ε(n). The system coefficients H(n + 1) will fluctuate with µe(n)X(n) ≠ 0, causing the so-called excessive mean square error. In order to reduce these fluctuations (this kind of error), the step µ should be much lower than the bound µ < 2/λmax = 2 required by the convergence condition. The excessive mean square error is proportional to the signal energy and the algorithm step, EMSE = µEx/2.
Example 8.17. Consider a third-order adaptive system for signal prediction. Assume that the signal x(n) is a random signal with the autocorrelation function r_xx(m) = σx^2 δ(m). Find the output signal, assuming that the adaptive system has adjusted its coefficients in such a way that they are equal to the optimal ones.
⋆The optimal coefficient values are H∗(n) = R^−1(n)rdx(n) = 0, since the cross-correlation vector rdx(n) is zero for a white input. It means that the output signal is zero.
Example 8.18. Assume that in the previous example the input signal is stationary with the autocorrelation r_xx(m) = 2^−|m|. Find the optimal coefficient values and the form of the optimal predictor.
⋆The autocorrelation matrix of the input signal and its cross-correlation vector with the reference signal are

R = [1 1/2 1/4; 1/2 1 1/2; 1/4 1/2 1]

and

rdx = [E[x(n)x(n − 1)]; E[x(n)x(n − 2)]; E[x(n)x(n − 3)]] = [r_xx(1); r_xx(2); r_xx(3)] = [1/2; 1/4; 1/8].

The optimal coefficient values are

H∗ = R^−1 rdx = [1/2; 0; 0].

The output signal, predicting the input signal value one step ahead, is

y(n) = x̂(n + 1) = (1/2)x(n).
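The optimal predictor above can be verified numerically; a small sketch (the variable names are ours):

```python
import numpy as np

# autocorrelation r_xx(m) = 2**(-|m|), third-order one-step predictor
r = lambda m: 2.0 ** (-abs(m))
R = np.array([[r(0), r(1), r(2)],
              [r(1), r(0), r(1)],
              [r(2), r(1), r(0)]])
r_dx = np.array([r(1), r(2), r(3)])   # cross-correlation with d(n) = x(n)
H_opt = np.linalg.solve(R, r_dx)      # H* = R^{-1} r_dx
print(H_opt)  # predictor y(n) = x(n)/2, i.e. H* = [1/2, 0, 0]
```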
Example 8.19. Consider the signal x(n) = −0.1x(n − 1) + 0.72x(n − 2) + ε(n), where ε(n) is a zero-mean white noise with variance σε^2 = 0.5. Find the optimal values of the system coefficients for one step ahead prediction with a second-order adaptive system. Plot the adaptation coefficients for the second-order LMS algorithm with µ = 0.1 and µ = 0.01. Calculate and plot the average of the prediction square error over 100 realizations, in dB, for both cases. What is the convergence bound for µ?
Repeat the calculation for x(n) = (1/2)x(n − 1) + ε(n) and the first-order adaptive system.
⋆For the optimal values of the adaptive prediction system we have to find the autocorrelation matrix of the input signal vector, in this case of X(n) = [x(n − 1) x(n − 2)]^T. The signal x(n) is obtained as the output of a recursive system whose input is ε(n) and whose transfer function is

H(z) = 1/(1 + 0.1z^−1 − 0.72z^−2) = (9/17)/(1 + 0.9z^−1) + (8/17)/(1 − 0.8z^−1).

Its impulse response is

h(n) = [(9/17)(−0.9)^n + (8/17)(0.8)^n]u(n).

Therefore the signal x(n) can be written as

x(n) = h(n) ∗ ε(n).
The autocorrelation function is

r_xx(m) = E[x(n + m)x(n)] = E[ Σ_{k1=0}^{∞} Σ_{k2=0}^{∞} ε(n + m − k1)h(k1)ε(n − k2)h(k2) ] = σε^2 Σ_{k=0}^{∞} h(k)h(k − m).

In particular,

r_xx(0) = σε^2 Σ_{k=0}^{∞} h^2(k) = 1.19,

r_xx(1) = σε^2 Σ_{k=0}^{∞} h(k)h(k − 1) = −0.425,
and r_xx(2) = 0.8993. The optimal coefficient values are

H∗ = [1.19 −0.425; −0.425 1.19]^−1 [−0.425; 0.8993] = [−0.1; 0.72],
since
R(n) = E[X(n)X^T(n)] = E[[x(n − 1) x(n − 2)]^T [x(n − 1) x(n − 2)]] = [r_xx(0) r_xx(1); r_xx(1) r_xx(0)].
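The autocorrelation values and the optimal predictor of this example can be checked by summing the (truncated) impulse-response products numerically; an illustrative sketch:

```python
import numpy as np

# impulse response h(n) = (9/17)(-0.9)^n + (8/17)(0.8)^n, truncated
n = np.arange(400)
h = (9 / 17) * (-0.9) ** n + (8 / 17) * 0.8 ** n
sigma_eps2 = 0.5

def r_xx(m):
    # r_xx(m) = sigma_eps^2 * sum_k h(k) h(k - m), for m >= 0
    return sigma_eps2 * np.sum(h[m:] * h[:len(h) - m])

R = np.array([[r_xx(0), r_xx(1)],
              [r_xx(1), r_xx(0)]])
r_dx = np.array([r_xx(1), r_xx(2)])   # d(n) = x(n), X(n) = [x(n-1), x(n-2)]
H_opt = np.linalg.solve(R, r_dx)
# r_xx(0), r_xx(1), r_xx(2) ≈ 1.19, -0.425, 0.8993
print(np.round(H_opt, 4))             # close to [-0.1, 0.72]
```

The result recovers the AR model coefficients, as expected for a one-step-ahead predictor of this signal.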
Figure 8.18 Coefficients and error in the prediction setup of the second order adaptive LMS
algorithm for µ = 0.1 (left) and µ = 0.01 (right). The results are averaged over 500 realizations.
td = l cos(θ)/c.

The same delay holds for each next antenna, since the antenna array is uniform. The signal at the (k + 1)th antenna is

r_k(t) = s(t − kl cos(θ)/c) e^{jω0(t − kl cos(θ)/c)}.
Since the signal s(t) is narrowband, meaning that its amplitude variations
are slow, we may write
s(t − kl cos(θ)/c) ≅ s(t).
Including this fact, the signal at the (k + 1)th antenna assumes the form
r_k(t) = s(t)e^{jω0t}e^{−j(ω0/c)kl cos(θ)} = s(t)e^{jω0t}e^{−j2π(l/λ)k cos(θ)},
y(n) = Σ_{k=0}^{N−1} h_k(n)x_k(n) = s(nΔt) Σ_{k=0}^{N−1} h_k(n)e^{−j2π(l/λ)k cos(θ)}
     = s(nΔt) Σ_{k=0}^{N−1} h_k(n)e^{−jωk} = s(nΔt) FT_k[h_k(n)]|_{ω=2π(l/λ)cos(θ)}
     = s(nΔt) H^T(n)a(ω)|_{ω=2π(l/λ)cos(θ)},
where a(ω) = [1 e^{−jω} e^{−j2ω} ... e^{−j(N−1)ω}]^T.
The output signal y(n) is equal to the input signal s(nΔt) multiplied by the Fourier transform of the coefficients h_k(n), k = 0, 1, ..., N − 1, at ω = 2π(l/λ)cos(θ).
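The relation between the coefficients and the spatial response can be sketched numerically. This illustration (our own; it assumes half-wavelength spacing l = λ/2 and simple all-ones coefficients, not the adapted ones from the text) evaluates the array factor:

```python
import numpy as np

def array_gain(h, theta, l_over_lambda=0.5):
    # A(theta) = | sum_k h_k * exp(-j * 2*pi*(l/lambda)*k*cos(theta)) |,
    # i.e. the Fourier transform of the coefficients at omega = 2*pi*(l/lambda)*cos(theta)
    k = np.arange(len(h))
    omega = 2 * np.pi * l_over_lambda * np.cos(theta)
    return np.abs(np.sum(h * np.exp(-1j * omega * k)))

h = np.ones(10)  # uniform coefficients: maximum gain broadside (theta = 90 deg)
print(round(array_gain(h, np.pi / 2), 3))        # N = 10 at broadside
print(array_gain(h, np.arccos(0.2)) < 1e-9)      # a null of the uniform array
```

Replacing the all-ones vector by adapted coefficients h_k(n) gives the antenna gain A(θ) discussed below.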
Figure 8.19 Uniform antenna array with adaptive system for interference rejection.
Now we will consider an adaptive setup of this system with the aim to cancel out input interference signals. Assume that several waves with incident angles θ1, θ2, ..., θP arrive at this antenna array. The input signal to each antenna is then

x_k(n) = Σ_{p=1}^{P} s_p(nΔt)e^{−j2π(l/λ)k cos(θp)} = Σ_{p=1}^{P} s_p(nΔt)e^{−jωpk}.

The reference signal is the output of the reference antenna, d(n) = x_N(n). With e(n) = d(n) − H^T(n)X(n) and d(n) = x_N(n) we can write

e(n) = [−H^T(n) 1][X^T(n) x_N(n)]^T.
The adaptive coefficients will approach values such that their Fourier transform has a notch-filter-like form. Then all the input signals will be canceled out and the error e(n) will be zero-valued (assuming that the order of the system is appropriate for the number of input signals from different directions). It was assumed that the desired signal was not present (switched off) during the adaptation process; otherwise it would be canceled out as well. When the system ends its adaptation we can then switch on our desired signal from a direction that does not correspond to one of the interferences. It will pass through the system, while all interfering signals are canceled out.
This kind of system is simulated using an adaptive system of order N = 10, with four interfering signals with the directions of arrival θ1 = 30°, θ2 = 75°, θ3 = 90°, and θ4 = 120°. Note that the ability to cancel out a number of disturbances depends on the system order and the positions of the angles of arrival. With, for example, 10 coefficients we will not be able to achieve an arbitrary number of arbitrarily positioned zeros in the Fourier transform.
The antenna system gain is

A(θ) = |y(n)/s(nΔt)| = |H^T(n)a(ω)|_{ω=2π(l/λ)cos(θ)}| = |FT_k[h_k(n)]|_{ω=2π(l/λ)cos(θ)}|,

or in decibels,

a(θ) = 20 log10 A(θ) [dB].

It is calculated for angles 0° ≤ θ ≤ 180° and presented in Figure 8.20. The
antenna system is adjusted to cancel out the interference (gain of the system
is here below −25dB). Signals from other directions will pass unattenuated
through this system, with a gain of about 5 dB. A radiation plot of this system is presented in Figure 8.21.
Figure 8.20 Antenna system gain for various incident angles. Interference incident angles are marked with arrows.
Figure 8.21 Radiation plot of the antenna system, with the interference incident angles θ = 30°, 75°, 90°, and 120°.
In this case the input to the microphone is an acoustic signal. This is the desired signal in the adaptive system. In addition to this signal there are interference signals coming from speakers. These signals come to the microphone over a direct path and one or more reflected paths. The adaptive system has the task of canceling out the influence of this interference. The system for adaptive acoustic echo cancellation is presented in Figure 8.22. This kind of adaptive system is used in hands-free devices and in systems for audio communication over the internet.
Example 8.20. Consider a system as in Figure 8.22. Assume that the signal from the microphone is sampled with frequency fs = 11025 Hz. The speed of acoustic signal propagation is c = 330 m/s. The speaker is at the distance r0 = 27 cm from the microphone, meaning that the direct component reaches the microphone with a delay of fs r0/c ≈ 9 samples. The system is in a room whose dimensions are such that the reflected components passing paths longer than 3 m can be neglected. From this fact we can conclude that the maximal delay is 100 samples. The intensity of the reflected components is inversely proportional to the propagation path. With these assumptions the impulse response of the system that transfers a signal x(n) from the speaker to the input of the microphone can be modelled as

h_echo(n) = 1 for n = 9,
            wn/n for 10 ≤ n ≤ 100,
            0 for other values of n.
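A sketch of this impulse-response model (the random reflection weights wn are an assumption here; the book leaves them unspecified):

```python
import numpy as np

rng = np.random.default_rng(3)

def echo_response(direct_delay=9, max_delay=100):
    """Speaker-to-microphone path model: a unit direct component plus
    reflections whose intensity decays as 1/n; weights w_n are assumed
    to be random reflection coefficients (an illustrative choice)."""
    h = np.zeros(max_delay + 1)
    h[direct_delay] = 1.0                      # direct component at n = 9
    n = np.arange(direct_delay + 1, max_delay + 1)
    h[n] = rng.standard_normal(len(n)) / n     # w_n / n for 10 <= n <= 100
    return h

h_echo = echo_response()
print(h_echo[9], len(h_echo))  # direct component 1.0, 101 samples in total
```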
The simulation results show the voice signal s(n), the microphone signal d(n), and the output signal e(n), together with the echo rejection ratio [dB] as a function of time for the steps µ = 0.00001, 0.00005, 0.00020, and 0.00050.
The number of arithmetic operations is the same as in the sign LMS algorithm.
For an arbitrary step µ it is still possible to avoid multiplications by applying sign functions to both the error and the signal vector X(n). The sign-sign LMS is defined by

H(n + 1) = H(n) + µ sign(e(n)) sign(X(n)).

Note that the change of the system coefficients in each iteration is ±µ. It prevents the system from achieving a stationary state (the coefficients oscillate). To avoid this effect it is possible to define a sign function with a "dead zone" as

sign_D(α) = (sign(α − D) + sign(α + D))/2.

The function sign_D(α) is equal to −1 for α < −D and 1 for α > D, while it is 0 for |α| < D. The value of this function at the discontinuity points is 1/2.
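The dead-zone sign function and a sign-sign update step can be sketched as follows (our own illustration; for D = 0 the function reduces to the ordinary sign):

```python
import numpy as np

def sign_D(a, D):
    # dead-zone sign: -1 for a < -D, 0 for |a| < D, +1 for a > D;
    # equals 1/2 in magnitude at the discontinuity points, as in the text
    return (np.sign(a - D) + np.sign(a + D)) / 2

def sign_sign_lms_step(H, X, d, mu, D=0.0):
    e = d - H @ X
    # the coefficient change is restricted to 0 or +/- mu: no multiplications
    return H + mu * sign_D(e, D) * sign_D(X, D), e

print(sign_D(np.array([-2.0, 0.0, 0.5, 2.0]), 1.0))  # -1, 0, 0, 1
```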
Example 8.21. Consider the adaptive system described in Example 8.12 (page 461). Simulate the system using the signed error LMS, signed regressor LMS, and sign-sign LMS. Use the adaptive system of order N = 5 with the step µ = 0.05.
⋆Simulation results are presented in Figure 8.25. We can conclude that the convergence is slower when the sign is applied to the error function (in both cases) than when the sign is applied to the signal vector only. If the error signal is kept in its original form, then as the error approaches zero the system coefficients approach their stationary values, without oscillations. This is not the case for the sign error LMS form.
The block LMS differs from the standard LMS in the sense that the coefficients are not modified at each instant n, but after every K instants. The time index is of the form n = pK + m, where K is the block length, p is the block index, and m is the index of a sample within a block, 0 ≤ m < K. For adaptive systems with large N the computation time of the standard LMS algorithm can be reduced by using the block LMS. It can be implemented with the FFT algorithms (fast block LMS algorithm). The coefficients are adapted in such a way that all coefficient modifications within a block are added up and the final coefficient modification is done according to

H(p + 1) = H(p) + µ Σ_{m=0}^{K−1} e(pK + m)X(pK + m).
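The block update can be sketched directly from this formula (an illustrative implementation, not the FFT-based fast version; the identified system is an arbitrary example):

```python
import numpy as np

def block_lms(x, d, N, mu, K):
    """Block LMS: updates accumulated over a block of K samples,
    coefficients modified once per block."""
    H = np.zeros(N)
    xbuf = np.zeros(N)
    for p in range(len(x) // K):              # block index p
        acc = np.zeros(N)                     # accumulated modification
        for m in range(K):                    # sample index m within the block
            n = p * K + m
            xbuf = np.roll(xbuf, 1); xbuf[0] = x[n]
            e = d[n] - H @ xbuf
            acc += e * xbuf
        H = H + mu * acc                      # H(p+1) = H(p) + mu * sum(e X)
    return H

rng = np.random.default_rng(4)
x = rng.standard_normal(4000)
d = np.convolve(x, [1.0, -0.5], mode="full")[:len(x)]  # unknown system
print(np.round(block_lms(x, d, N=2, mu=0.01, K=50), 2))  # approaches [1, -0.5]
```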
Example 8.22. Consider the system from Example 8.15 (page 465). The simulations will be repeated with the block LMS using the block size K = 50.
⋆The results of the simulation are presented in Figure 8.26. Note that the coefficients change at the end of each block. Deviations of the adaptive system coefficients are lower than in the LMS algorithm. The input signal-to-noise ratio is −25 dB, while this ratio at the output is 11 dB.
The normalized LMS update relation is

H(n + 1) = H(n) + (µnorm/(1 + X^T(n)X(n)))e(n)X(n),

with µnorm < 2. In practice a constant α is included in the denominator, so that the step is

µ = µnorm/(α + X^T(n)X(n)). (8.20)
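A minimal sketch of a normalized LMS step with the normalization (8.20) (names and the toy identification task are ours):

```python
import numpy as np

def nlms_step(H, X, d, mu_norm=1.0, alpha=1e-3):
    # normalized LMS: effective step mu = mu_norm / (alpha + X^T X), as in (8.20)
    e = d - H @ X
    return H + mu_norm / (alpha + X @ X) * e * X, e

# the normalization makes adaptation insensitive to the input scale:
# identify d(n) = 5*x(n) from a very loud input
rng = np.random.default_rng(5)
H = np.zeros(1)
for _ in range(200):
    x = 100 * rng.standard_normal(1)   # large input energy
    H, e = nlms_step(H, x, 5 * x[0])
print(np.round(H, 3))  # ≈ [5.]
```

With a plain LMS the same step value would diverge for such input energies; the normalization keeps the effective step within its bound.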
The basic idea of a variable step size (VSS) variant of the LMS algorithm is to change the step µ during the iterations. The step should not be so large as to cause divergence, but also not so small that the algorithm cannot track possible changes in the coefficients. Various variable step size algorithms have been derived in the literature, using the error signal, the input signal, the reference signal, and the output signal in the considered and previous instants. Previous values of the step µ are also used in the algorithms.
The normalized LMS, in the case that the signal energy changes during
the considered interval, may be considered as a variable step size LMS. A
form of the normalized LMS was used to define an interesting and efficient
variable step size algorithm called generalized normalized gradient descent
(GNGD) algorithm, with appropriate adaptation of coefficient α in (8.20).
It takes into account most of the above-mentioned signals. The adaptation formula for the coefficient α is derived in the form

α(n + 1) = α(n) − ρµ (e(n)e(n − 1)X^T(n)X(n − 1)) / (e(n − 1) + X^T(n)X(n))^2, (8.21)

where e(n) = d(n) − H^T(n)X(n) is the error signal and ρ < 1 is a constant.
A simple form of the variable step size LMS can be obtained using the standard LMS calculated with two (or several) step values µ. One, µmin, should be sufficiently small that the coefficient deviations in the steady state are small, and the other, µmax, sufficiently large that the convergence is fast when a change of the coefficients is detected. The crucial decision in this simple algorithm is when to use the LMS with the small step and when the LMS with the large step. One possible criterion is based on the energy of the error signal: the algorithm will switch to the larger step after a few instants, when the energy of the error exceeds the threshold α.
The transition period in this case could be much shorter if a variable step size LMS based on the weighting coefficients bias-variance trade-off is used. In this algorithm the difference between the coefficients is compared with the expected standard deviation of the coefficients (with a constant κ) for the considered steps µ,

|h_k(n, µmin) − h_k(n, µmax)| ≷ κ(σµmin + σµmax).

If the difference is small (within a confidence interval of a few (σµmin + σµmax)) then the system is assumed to be in the stationary state and the small step µmin should be used. Otherwise the system is in a transition and the large step µmax should be used. Standard deviations of the coefficients can be calculated based on the input signal energy and the used steps µ.
Example 8.23. Consider the system from Example 8.15 (page 465). The simulation will be repeated with the variable step size LMS. In this case it has been assumed that at the instant n = 200 the transfer function H1(z) changes. Consider two ways of changing the step µ. In the first case use

µ(n) = (αµmin + Ee(n)µmax)/(α + Ee(n)),

where µmin = 0.00005, µmax = 0.0005, and α = 25, while Ee(n) is the average energy (power) of the error signal over the previous K = 50 instants,

Ee(n) = (1/K) Σ_{k=n−K+1}^{n} e^2(k).

In the second case use only the two steps µmin and µmax, with the switching criterion Ee(n) ≷ α.
⋆The results of the simulation are presented in Figure 8.27. At the beginning, the algorithm uses the maximal possible step size µmax. Then the step decreases. At n = 200 there is an abrupt change in the considered system and the adaptive system adjusts its step to the new circumstances.
The results using the second way of step size change, with only the two steps µmin and µmax, are presented in Figure 8.28. On the coefficients plot, a gray shade indicates the region where the system uses the larger step µ(n) = µmax. Within the remaining time intervals the lower step value µ(n) = µmin is used.
where x(n) is a zero-mean Gaussian random signal with variance σx^2 = 0.6. Using the constant-step LMS with µ = 1 and µ = 0.1, identify the system.
Compare the identification result with the normalized LMS (8.20) using
adaptive α defined by the generalized normalized gradient descent (GNGD)
algorithm, (8.21).
Figure 8.27 LMS algorithm with variable step (Example 8.23, first case).
When the input signal x(n) (or/and the coefficients of an adaptive system h(n), or its reference signal d(n)) are complex-valued, then the complex LMS algorithm should be used. In this case the squared absolute value of the error is minimized.
Figure 8.28 LMS algorithm with variable step (Example 8.23, second case).
Figure 8.29 Averaged square error in dB for the constant LMS with µ = 0.1 and µ = 1 and the
variable step size generalized normalized gradient descent (GNGD) algorithm.
H(n + 1) = H(n) + µe(n)X∗(n).
e(i|n) = d(i) − y(i|n) = d(i) − X^T(i)H(n).

With the weighted total squared error ε(n) = Σ_{i=1}^{n} λ^{n−i} e^2(i|n), the coefficients H(n) are obtained from

∂ε(n)/∂H(n) = −2 Σ_{i=1}^{n} λ^{n−i} e(i|n)X(i) = 0,

producing

Σ_{i=1}^{n} λ^{n−i}(d(i) − X^T(i)H(n))X(i) = 0

or

Σ_{i=1}^{n} λ^{n−i} d(i)X(i) = Σ_{i=1}^{n} λ^{n−i} X(i)X^T(i)H(n),

that is,

r̃dx(n) = R̃(n)H(n).
This solution is similar to the optimal filter case. The difference is that the cross-correlation vector r̃dx(n) and the autocorrelation matrix R̃(n) are obtained by a weighted averaging,

r̃dx(n) = Σ_{i=1}^{n} λ^{n−i} d(i)X(i),

R̃(n) = Σ_{i=1}^{n} λ^{n−i} X(i)X^T(i).
In order to find the relation between H(n) and H(n − 1) we have to find a
relation between R̃−1 (n) and R̃−1 (n − 1) and between r̃dx (n) and r̃dx (n − 1).
By definition,

R̃(n) = Σ_{i=1}^{n} λ^{n−i} X(i)X^T(i) = λ Σ_{i=1}^{n−1} λ^{(n−1)−i} X(i)X^T(i) + X(n)X^T(n) = λR̃(n − 1) + X(n)X^T(n).
The inverse matrix R̃^−1(n) relation is needed for a recursion. Using the matrix inversion formula for

A = B + ab^T,

where A and B are square matrices of order N, and a and b are column vectors with N elements, we have

A^−1 = B^−1 − B^−1 a(1 + b^T B^−1 a)^−1 b^T B^−1.

With A = R̃(n), B = λR̃(n − 1), and a = b = X(n) this gives

R̃^−1(n) = (1/λ)R̃^−1(n − 1) − (1/λ)R̃^−1(n − 1)X(n)(1 + (1/λ)X^T(n)R̃^−1(n − 1)X(n))^−1 (1/λ)X^T(n)R̃^−1(n − 1).
With the notation µ(n) = X^T(n)R̃^−1(n − 1)X(n), it follows

R̃^−1(n) = (1/λ)R̃^−1(n − 1) − (1/(λ(λ + µ(n))))R̃^−1(n − 1)X(n)X^T(n)R̃^−1(n − 1).

Denoting C(n) = R̃^−1(n) and introducing

g(n) = C(n − 1)X(n)/(λ + µ(n)),

we get

µ(n) = X^T(n)C(n − 1)X(n),

C(n) = (1/λ)C(n − 1) − (1/λ)g(n)X^T(n)C(n − 1).
The relation between the vectors r̃dx(n) and r̃dx(n − 1) is obtained from

r̃dx(n) = Σ_{i=1}^{n} λ^{n−i} d(i)X(i) = λ Σ_{i=1}^{n−1} λ^{(n−1)−i} d(i)X(i) + X(n)d(n) = λr̃dx(n − 1) + X(n)d(n).
For recursive systems (with infinite impulse response) the value of the output signal at the nth instant depends on the input signal at the nth and previous N − 1 instants, x(n), x(n − 1), ..., x(n − N + 1). The output signal also depends on the previous output signal values y(n − 1), y(n − 2), ..., y(n − L),

y(n) = Σ_{k=0}^{N−1} a_k(n)x(n − k) + Σ_{k=1}^{L} b_k(n)y(n − k).

With the notation

X(n) = [x(n) x(n − 1) ... x(n − N + 1)]^T,
Y(n) = [y(n − 1) y(n − 2) ... y(n − L)]^T,
A(n) = [a0(n) a1(n) ... a_{N−1}(n)]^T,
B(n) = [b1(n) b2(n) ... b_L(n)]^T,

and with U(n) = [X^T(n) Y^T(n)]^T and W(n) = [A^T(n) B^T(n)]^T, the output can be written as

y(n) = W^T(n)U(n).

The error signal is

e(n) = d(n) − y(n),

and the coefficients are updated as

W(n + 1) = W(n) + µe(n)G(n).
The elements of the gradient vector G(n) are

α_l(n) = ∂y(n)/∂a_l = x(n − l) + Σ_{k=1}^{L} b_k(n)α_l(n − k),

β_l(n) = ∂y(n)/∂b_l = y(n − l) + Σ_{k=1}^{L} b_k(n)β_l(n − k).

In a vector notation,

G(n) = U(n) + Σ_{k=1}^{L} b_k(n)G(n − k).

Different steps may be used for different coefficients, collected in the diagonal matrix

M = diag(µ1, µ2, ..., µ_{N+L}).
Figure 8.31 Identification of an unknown system (from Example 8.26) using the adaptive
recursive system.
time instant the adaptive system coefficients are changed following the rule

H(n + 1) = H(n) + µ(n)e(n)X(n). (8.22)

Replacing the scalar step µ(n) with a gain matrix G(n) is the first step towards Kalman filters, and (8.22) now becomes

H(n + 1) = H(n) + G(n)e(n)X(n) = H(n) + g(n)e(n). (8.23)
Note that it is assumed that the unknown system is deterministic and non-stationary. Since the weight error vector can be related to the system output error e(n) with e(n) = X^T(n)Ĥ(n) + ν(n), a relation between J_MSE and J_MSD can be found, indicating that the minimization of the MSD also corresponds to the minimization of the MSE. For simplicity of the derivation we will assume that X(n) is deterministic, which is a common assumption in the Kalman filtering literature, although it is usually treated as a zero-mean process with autocorrelation matrix R in the context of adaptive systems. If we introduce the weight error covariance matrix P(n) = E[Ĥ(n)Ĥ^T(n)], then, in order to perform the minimization of J_MSD, starting from (8.23) a recursive relation for the matrix P(n) is established:
Ĥ(n + 1)Ĥ^T(n + 1) = (Ĥ(n) − g(n)X^T(n)Ĥ(n) − g(n)ν(n))(Ĥ(n) − g(n)X^T(n)Ĥ(n) − g(n)ν(n))^T,

P(n + 1) = P(n) − (P(n)X(n)g^T(n) + g(n)X^T(n)P(n)) + g(n)g^T(n)(X^T(n)P(n)X(n) + σν^2).
The optimal learning gain vector g(n), which provides control over both the direction and the amplitude of the adaptation steps in (8.23), is obtained by solving ∂J_MSD(n + 1)/∂g(n) = 0 as

g(n) = G(n)X(n) = P(n)X(n)/(X^T(n)P(n)X(n) + σν^2), (8.27)

which is known as the Kalman gain. Besides the calculation of (8.27), the Kalman filter which estimates the optimal time-invariant and deterministic coefficients for each time instant also includes the coefficient adjustment (8.23) and the covariance update

P(n + 1) = P(n) − g(n)X^T(n)P(n). (8.28)

Note that the previous algorithm steps for σν^2 = 1 can be related to the RLS algorithm equations.
A generalization of the previous approach assumes a time-varying and stochastic weight vector H∗(n),

H∗(n + 1) = F(n)H∗(n) + q(n), (8.29)
d(n) = X^T(n)H∗(n) + ν(n), (8.30)

with the coefficient prediction step

H(n + 1|n) = F(n)H(n|n). (8.32)

Note that the same definition of the weight error vector, Ĥ(n|n) = H∗(n) − H(n|n), holds, as well as of the weight error covariance matrix,

P(n|n) = E[Ĥ(n|n)Ĥ^T(n|n)].

The weight error covariance matrix is updated in the same manner as for the time-invariant deterministic case,

P(n|n) = P(n|n − 1) − g(n)X^T(n)P(n|n − 1), (8.33)

with respect to the new index notation. The general Kalman filter also includes the prediction step of the weight error covariance matrix, which easily follows from its definition:

P(n + 1|n) = E[Ĥ(n + 1|n)Ĥ^T(n + 1|n)] = F(n)P(n|n)F^T(n) + Q. (8.34)

The gain is now

g(n) = G(n)X(n) = P(n|n − 1)X(n)/(X^T(n)P(n|n − 1)X(n) + σν^2). (8.35)
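The gain, correction, and prediction steps can be sketched for the simplest case F(n) = I and Q = qI (our own illustration; the noisy identification task and all parameter values are arbitrary choices):

```python
import numpy as np

def kalman_adaptive(x, d, N, sigma_nu2=1e-2, q=0.0):
    """Kalman filter used as an adaptive algorithm: gain as in (8.35),
    covariance correction (8.33) and prediction (8.34) with F = I, Q = q*I."""
    H = np.zeros(N)
    P = np.eye(N)                        # weight error covariance P(0)
    xbuf = np.zeros(N)
    for n in range(len(x)):
        xbuf = np.roll(xbuf, 1); xbuf[0] = x[n]
        Px = P @ xbuf
        g = Px / (xbuf @ Px + sigma_nu2)      # Kalman gain g(n)
        e = d[n] - H @ xbuf                   # output error
        H = H + g * e                         # coefficient adjustment (8.23)
        P = P - np.outer(g, Px)               # correction step (8.33)
        P = P + q * np.eye(N)                 # prediction step, Q = q*I (8.34)
    return H

rng = np.random.default_rng(7)
x = rng.standard_normal(400)
d = np.convolve(x, [1.0, 0.5], mode="full")[:len(x)] + 0.1 * rng.standard_normal(400)
print(np.round(kalman_adaptive(x, d, N=2), 2))  # close to [1, 0.5]
```

Setting q > 0 keeps the gain from vanishing, which allows the filter to track time-varying coefficients, at the price of a larger steady-state variance.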
Figure 8.32 Convergence paths of the LMS algorithm and the Kalman filter in the identification of an unknown time-invariant deterministic system. Contour lines are the projections of the MSE surface onto the coefficients plane.
with the training period. It continues through the whole functioning of the neural network.
A neural network can be defined as an artificial cell system capable of accepting, memorizing, and applying empirical knowledge. Knowledge here means that the neural network can respond to an input from the environment in an appropriate way. A neural network is connected to the environment in two ways: through the inputs, where the environment influences the network, and through the outputs, where the network responds to the environment, as illustrated in Figure 8.33.
The basic element of a neural network is the neuron. It is the elementary unit for distributed signal processing in a neural network. The full functionality of a neural network is achieved using a large number of interconnected neurons. Connections among neurons are one-directional (the outputs from one neuron can be used as inputs to other neurons). They are called synapses, in analogy with biological systems.
Possible applications of neural networks include almost all aspects of modern life; text and speech recognition, optimization of a communication channel, financial forecasting, and detection of fraudulent credit card usage are just a few examples.
Of course, there are many situations where the use of neural networks is not justified. In many cases our knowledge about the system that we want to control or observe is sufficient and complete, so the problem can be solved using classical algorithms, with sequential processing on common computers.
An ideal system for neural network realization would use independent hardware for each neuron; the distributed processing would then be most efficient. In the case of single-processor computers, high efficiency is achieved by very fast sequential data processing. Typical examples are computer programs for the recognition of scanned text.
Figure 8.34 Neuron schematic symbol (a) and the model based on network and activation
functions (b).
8.10.1 Neuron
The first step in a neuron design is to define its inputs and outputs. In biological systems the input and output signals of a neuron are electric potentials that can be modeled by real numbers. The same principle is used in artificial neurons. An illustration of a neuron is given in Figure 8.34(a) for the case when it has $N$ inputs $(x_1(n), x_2(n), \dots, x_N(n))$ and one output $y(n)$. The index $n$ may be a time index, but it can also be understood as an ordinal number that identifies the input and output index of a neuron.
A neuron represents an algorithm that transforms $N$ input data into one output signal. It is common to split this algorithm into two parts: 1) a combinatorial process that transforms the $N$ input data into one value $u(n)$, and 2) a process that produces the output signal $y(n)$ based on the value of $u(n)$. This two-phase model of a neuron is presented in Figure 8.34(b). The algorithm/rule used to produce $u(n)$ is called the network function, while the second part, which determines the output value, is the activation function.
The knowledge of a neuron is accumulated and contained in the way the input data are combined, i.e., in the network function.
The basic task of the network function is to combine the input data. The simplest way of combining $N$ input signals is a linear weighted combination with coefficients $w_i$, $i = 1, 2, \dots, N$. This is a linear network function. Because of its simplicity, this type of function is commonly used in neurons.
Ljubiša Stanković Digital Signal Processing 503
The activation function transforms the output value of the network function into an acceptable output value. A common requirement is that the output values have a limited range. Thus, most activation functions have a bounded interval of real numbers as their codomain, for example $[0, 1]$, $[-1, 1]$, or a set of binary digits. Forms of commonly used activation functions are presented in the table below. The most important functions from this set are the unipolar threshold function and the unipolar sigmoid. Some of the activation functions are presented in Figure 8.35 as well.
[Figure 8.35: shapes of the unipolar sigmoid, the unipolar threshold function, and the Gaussian function versus $u$.]
Function                        Formula
Linear                          $f(u) = u$
Linear with a limiter           $f(u) = 1$ for $u > 1$; $f(u) = u$ for $-1 \le u \le 1$; $f(u) = -1$ for $u < -1$
Threshold function (unipolar)   $f(u) = 1$ for $u > 0$; $f(u) = 0$ for $u < 0$
Threshold function (bipolar)    $f(u) = 1$ for $u > 0$; $f(u) = -1$ for $u < 0$
Sigmoid (unipolar)              $f(u) = 1/(1 + \exp(-u))$
Sigmoid (bipolar)               $f(u) = 2/(1 + \exp(-2u)) - 1$
Inverse tangent function        $f(u) = (2/\pi)\arctan(u)$
Gauss function                  $f(u) = \exp(-(u-m)^2/\sigma^2)$
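The functions in the table above can be sketched directly; the implementation below is illustrative (function names are not from the book), and it also checks that the bipolar sigmoid is just the hyperbolic tangent.

```python
import numpy as np

# Activation functions from the table; names are illustrative choices.
def linear_limiter(u):
    return np.clip(u, -1.0, 1.0)

def threshold_unipolar(u):
    return np.where(u > 0, 1.0, 0.0)

def threshold_bipolar(u):
    return np.where(u > 0, 1.0, -1.0)

def sigmoid_unipolar(u):
    return 1.0 / (1.0 + np.exp(-u))

def sigmoid_bipolar(u):
    return 2.0 / (1.0 + np.exp(-2.0 * u)) - 1.0

def gauss(u, m=0.0, sigma=1.0):
    return np.exp(-((u - m) ** 2) / sigma ** 2)

u = np.linspace(-3, 3, 7)
print(sigmoid_unipolar(0.0))                        # 0.5
print(np.allclose(sigmoid_bipolar(u), np.tanh(u)))  # True: bipolar sigmoid is tanh
```

The identity $2/(1+e^{-2u}) - 1 = (1-e^{-2u})/(1+e^{-2u}) = \tanh(u)$ explains the last check.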
Figure 8.36 Neural network topology: acyclic (a) and cyclic (b).
[Figures: a single-layer network with inputs $x_1(n)$, $x_2(n)$, an input layer, and an output layer producing $y_1(n)$, $y_2(n)$, $y_3(n)$; and a multilayer network with inputs $x_1(n)$, $x_2(n)$, hidden layers I, II, and III, and an output layer producing $y_1(n)$.]
1. Data for the network training are acquired. These data consist of input-output pairs. The output data are assumed, estimated, or obtained through experiments. The set of training data pairs is finite. Denote the number of available input-output pairs by K.
2. The network is initialized, commonly by using random parameters of the neurons (if a priori information about the range of their values does not exist). After the initialization, the iterative training procedure is implemented as follows:
where it has been assumed that the neuron has $N$ input data. The weighting coefficients $w_k$ represent the "knowledge" that the network should acquire through the training procedure. This knowledge will then be used in real situations. The vector notation is
$$X(n) = [x_1(n)\ x_2(n)\ \cdots\ x_N(n)]^T_{N\times 1}, \qquad W = [w_1\ w_2\ \cdots\ w_N]^T_{N\times 1}.$$
$$u(n) = W^T X(n) = X^T(n)W.$$
In the case when $d(n) = 1$ and $y(n) = 0$, we have $W_{new} = W_{old} + \mu X(n)$, or
$$W_{new}^T X(n) = W_{old}^T X(n) + \mu X^T(n)X(n) = W_{old}^T X(n) + \mu\|X(n)\|_2^2,$$
where $\|X(n)\|_2^2$ is the squared two-norm of the vector $X(n)$ (the sum of its squared elements). The value of $W^T X(n)$ is increased by $\mu\|X(n)\|_2^2$, which was the aim. If $d(n) = 0$ and $y(n) = 1$, then $W_{new} = W_{old} - \mu X(n)$ holds, meaning that $W^T X(n)$ is reduced by $\mu\|X(n)\|_2^2$.
The coefficient $\mu$ is the learning coefficient. It is positive. The choice of the value of $\mu$ is of great importance for the rate of convergence and the learning process of the network. Larger values may shorten the learning period, but may also affect the convergence of the training process.
Example 8.28. Consider a one-neuron neural network. Assume that the activation function of the neuron is the unipolar threshold function and that the neuron is biased. The network has three inputs and one output. The set of data for the neural network training is
$$X = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \end{bmatrix}, \qquad D = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 \end{bmatrix},$$
where matrix X contains the input data and vector D consists of desired
outputs from the neural network for the considered input data values. Train
the neural network with µ = 0.5.
⋆Since the neuron is biased, one more input is introduced. Its value is always 1. After this modification the matrix of input data is
$$X = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \end{bmatrix}.$$
The initial values of the weighting coefficients are random, for example,
$$W = [w_0\ w_1\ w_2\ w_3]^T = [-1\ \ 1\ \ 1\ \ 0]^T.$$
Now we can start the first epoch of the training process. We will use all input-output data pairs and calculate the output $y(n)$ of the neural network. The output $y(n)$ will be compared with the desired value $d(n)$, and the coefficients $W$ will be appropriately modified for each pair of data. For the first pair of data we have
$$y(1) = f\left(W^T X(1)\right) = f\left([-1\ 1\ 1\ 0]\,[1\ 1\ 1\ 0]^T\right) = f(1) = 1.$$
Since $d(1) = 1$, the error $d(1) - y(1)$ is 0 and the coefficients are not modified.
For the second pair of data,
$$y(2) = f\left(W^T X(2)\right) = 0.$$
The desired value is $d(2) = 1$. Since the error is not zero, the coefficients should be modified as
$$W_{new} = W_{old} + \mu(d(2) - y(2))X(2) = \begin{bmatrix} -1 \\ 1 \\ 1 \\ 0 \end{bmatrix} + 0.5\begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -0.5 \\ 1.5 \\ 1 \\ 0.5 \end{bmatrix}.$$
The next pair of input-output data is then used. After all data pairs have been used, the first epoch of training is finished. A nonzero error appeared in three out of six data pairs. The final value of the coefficients, after the first training epoch, is
$$W_{epoch\,1} = [-1.5\ \ 1\ \ 0.5\ \ 0]^T.$$
With this initial value, the second epoch of training is completed, using the same input-output pairs of data. After the second epoch a nonzero error appeared two times. The final values of the coefficients, after the second epoch, are
$$W_{epoch\,2} = [-1.5\ \ 1\ \ 1\ \ 0]^T.$$
The process is continued in the third epoch. In the fifth epoch we arrive at the situation that the neural network makes no errors. This means that the training is completed and that further epochs are not needed. The final values of the coefficients are
$$W = [-1.5\ \ 1.5\ \ 0.5\ \ 0.5]^T.$$
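The whole training procedure of this example can be reproduced in a few lines. This is an illustrative sketch, not code from the book; it applies the correction rule above to each data pair and stops at the first error-free epoch.

```python
import numpy as np

# Replication of Example 8.28: a biased one-neuron network with the unipolar
# threshold activation, trained with mu = 0.5 until an error-free epoch.
X = np.array([[1, 1, 1, 1, 1, 1],    # bias input, always 1
              [1, 1, 0, 0, 0, 1],
              [1, 0, 1, 1, 0, 0],
              [0, 1, 1, 0, 1, 0]], dtype=float)
D = np.array([1, 1, 0, 0, 0, 0], dtype=float)
W = np.array([-1.0, 1.0, 1.0, 0.0])  # the initial (random) coefficients
mu = 0.5

epochs = 0
while True:
    errors = 0
    for n in range(X.shape[1]):
        y = 1.0 if W @ X[:, n] > 0 else 0.0    # unipolar threshold
        if y != D[n]:
            W = W + mu * (D[n] - y) * X[:, n]  # perceptron correction
            errors += 1
    epochs += 1
    if errors == 0:
        break

print(epochs, W)   # training ends in the fifth epoch, W = [-1.5, 1.5, 0.5, 0.5]
```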
In this kind of neural network the output signal is not binary, but a real number (usually within the interval from 0 to 1). It may be interpreted as the probability that the input data do or do not contain a certain property. In general, any interval of real numbers can be the codomain of the output function. The main difference from the perceptron is that we do not require the neural network to achieve exact precision, $y(n) - d(n) = 0$. In this case the aim is to get a small error in the processing of the input data.
Since the output variable is continuous, the activation function should have this property as well. Consider, for example, the unipolar sigmoid activation function
$$f(u) = \frac{1}{1 + e^{-u}}.$$
A simple way to quantify the difference of the output signal from the desired signal is to use the square error
$$\varepsilon(n) = \frac{1}{2}(d(n) - y(n))^2.$$
The derivative of the sigmoid has a convenient form,
$$f'(u) = \frac{d}{du}\left(\frac{1}{1+e^{-u}}\right) = \frac{e^{-u}}{(1+e^{-u})^2} = \frac{1}{1+e^{-u}}\left(1 - \frac{1}{1+e^{-u}}\right) = f(u)\left(1 - f(u)\right).$$
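A quick numerical sanity check of this identity, using a central finite difference (an illustrative sketch, not from the book):

```python
import numpy as np

# Check that f'(u) = f(u)(1 - f(u)) for the unipolar sigmoid.
def f(u):
    return 1.0 / (1.0 + np.exp(-u))

u = np.linspace(-4.0, 4.0, 9)
h = 1e-6
numeric = (f(u + h) - f(u - h)) / (2 * h)   # central finite difference
analytic = f(u) * (1.0 - f(u))
print(np.max(np.abs(numeric - analytic)))   # tiny: the identity holds
```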
Therefore,
$$\frac{\partial\varepsilon(n)}{\partial w_k} = -(d(n) - y(n))\,y(n)\,(1 - y(n))\,x_k(n),$$
where
$$y(n) = f\!\left(\sum_{k=1}^{N} w_k x_k(n)\right),$$
or, in vector form,
$$W_{new} = W_{old} + \mu\,(d(n) - y(n))\,y(n)\,(1 - y(n))\,X(n).$$
This rule is called the delta rule. Note that the letter $\delta$ is also used for the Dirac delta pulse in some chapters of the book. These two notions have nothing in common.
For the activation function in the form of the bipolar sigmoid,
$$f(u) = \frac{2}{1+e^{-2u}} - 1 = \frac{1-e^{-2u}}{1+e^{-2u}},$$
we would have $f'(u) = 1 - f^2(u)$.
Example 8.29. A neural network consists of one unbiased neuron with two input signals and a sigmoid activation function. The input values are random numbers from the interval $[0, 1]$. Available are $K = 30$ input-output pairs of data. The training of the neural network should be done in 30 epochs with $\mu = 2$.
The data for the network training are obtained as a set of 30 input values of $x_1$ and $x_2$. They are assumed to be random numbers from the interval from 0 to 1 with a uniform probability density function. For each training pair of random numbers $x_1$ and $x_2$ the desired output data is calculated using the formula
$$d = \frac{1}{2} + \frac{x_1 - 2x_2}{3 + x_1^2 + 3x_2^2}.$$
Find the total square error after the first, second, fifth, and thirtieth epoch. What are the coefficient values at the end of the training process? If the input values $x_1 = 0.1$ and $x_2 = 0.8$ are applied to the network after the training process is completed, find the output value $y$ and compare it with the desired result $d$ calculated using the formula.
⋆The coefficients of the neuron are $w_1$ and $w_2$. With the sigmoid activation function the coefficient corrections are
$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}_{new} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}_{old} + \mu\,(d(n) - y(n))\,y(n)\,(1-y(n))\begin{bmatrix} x_1(n) \\ x_2(n) \end{bmatrix},$$
where the index $n$ assumes values from 1 to 30 within one epoch. It denotes the index of the input-output pair of data. The output $y$ is calculated using
$$y(n) = f\left(W^T X(n)\right) = f(w_1 x_1(n) + w_2 x_2(n)).$$
After the training,
$$y = f(w_1 x_1 + w_2 x_2) = 0.1904, \qquad d = \frac{1}{2} + \frac{x_1 - 2x_2}{3 + x_1^2 + 3x_2^2} = 0.1957.$$
The error is small. The task of the neural network in this example was to find a complex, nonlinear relation between the input and output data.
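A sketch of this training loop follows. The random data generation here uses an arbitrary seed, so the exact numbers quoted in the example (0.1904, 0.1957) will not be reproduced, only the qualitative behavior: the total square error decreases over the epochs.

```python
import numpy as np

# One unbiased sigmoid neuron, K = 30 training pairs, 30 epochs, mu = 2,
# following Example 8.29 (the random draws differ from the book's).
rng = np.random.default_rng(1)
K, mu = 30, 2.0
x1, x2 = rng.random(K), rng.random(K)
d = 0.5 + (x1 - 2 * x2) / (3 + x1**2 + 3 * x2**2)   # desired outputs

f = lambda u: 1.0 / (1.0 + np.exp(-u))
w = np.zeros(2)
errs = []
for epoch in range(30):
    total = 0.0
    for n in range(K):
        x = np.array([x1[n], x2[n]])
        y = f(w @ x)
        w = w + mu * (d[n] - y) * y * (1 - y) * x   # delta rule
        total += 0.5 * (d[n] - y) ** 2
    errs.append(total)

print(errs[0], errs[-1])          # error after the first and last epoch
y_test = f(w @ np.array([0.1, 0.8]))
d_test = 0.5 + (0.1 - 2 * 0.8) / (3 + 0.1**2 + 3 * 0.8**2)
print(y_test, d_test)             # trained output vs. the formula value
```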
For the output-layer coefficients,
$$\frac{\partial\varepsilon(n)}{\partial v_m} = -(d(n)-y(n))\,f'\!\left(V^T U(n)\right)u_m(n) = -(d(n)-y(n))\,y(n)(1-y(n))\,u_m(n),$$
while for the hidden-layer coefficients
$$\frac{\partial\varepsilon(n)}{\partial w_{pk}} = -(d(n)-y(n))\,f'\!\left(V^T f\!\left(W^T X(n)\right)\right)v_k\,f'\!\left(W_k^T X(n)\right)x_p(n).$$
The $p$th element of the vector $X(n)$ is denoted by $x_p(n)$, while the $k$th element of the vector $V$ is $v_k$. Taking into account that $u_k(n) = f\!\left(W_k^T X(n)\right)$, we get
$$\frac{\partial\varepsilon(n)}{\partial w_{pk}} = -(d(n)-y(n))\,y(n)(1-y(n))\,v_k\,[u_k(n)(1-u_k(n))]\,x_p(n),$$
where δn2 denotes the learning rule for the considered layer of neurons. In
vector form we can write
This is the modification formula for all coefficients of one neuron in the
hidden layer. The modification can be generalized to all neurons in the
hidden layer
Example 8.30. Consider a two-layer neural network with two neurons in the hidden layer and one neuron in the output layer. The activation function for all neurons is the unipolar sigmoid. The task of this neural network is to find an unknown relation between the input and output data. The step $\mu = 5$ is used in the training process. The data for the training are formed as in Example 8.29, i.e., as a set of $K = 30$ input data $x_1$ and $x_2$ that are uniformly distributed random numbers from the intervals $0 \le x_1 \le 1$, $0 \le x_2 \le 1$. For each training input value of $x_1$ and $x_2$ the desired signal is calculated as
$$d = \frac{1}{2} + \frac{x_1 - 2x_2}{3 + x_1^2 + 3x_2^2}.$$
Find the total square error after the 10th, 100th, and 300th epoch. What are the coefficients of the neurons after the training process? If the values $x_1 = 0.1$ and $x_2 = 0.8$ are input to the trained neural network, find the output $y$ and compare it with the desired result $d$.
⋆The training process is implemented on a computer and the following results are obtained: the total square error after 10 epochs of training is 0.1503. After 100 epochs the total square error is reduced to 0.0036, while the square error after 300 epochs is 0.0003. The final coefficient values in the hidden and output layers, $W$ and $V$, are
$$W = \begin{bmatrix} -0.2911 & 1.8297 \\ 3.4435 & -0.6945 \end{bmatrix}, \qquad V = \begin{bmatrix} -2.6173 \\ 2.5889 \end{bmatrix}.$$
The error is very small. As expected, this result is better than in the case of the one-layer neural network (Example 8.29). However, the calculation process is significantly more demanding.
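A sketch of this two-layer training (backpropagation with the gradients derived above) is given below. The random initialization and data use arbitrary seeds, so the exact error values and final coefficients will differ from those quoted in the example; only the strong error reduction should be observed.

```python
import numpy as np

# Two hidden sigmoid neurons and one sigmoid output neuron, trained with
# the delta-rule gradients derived above (Example 8.30 setup).
rng = np.random.default_rng(2)
K, mu = 30, 5.0
X = rng.random((2, K))
d = 0.5 + (X[0] - 2 * X[1]) / (3 + X[0]**2 + 3 * X[1]**2)

f = lambda u: 1.0 / (1.0 + np.exp(-u))
W = rng.standard_normal((2, 2)) * 0.5   # hidden-layer coefficients
V = rng.standard_normal(2) * 0.5        # output-layer coefficients

def total_error():
    U = f(W @ X)
    y = f(V @ U)
    return 0.5 * np.sum((d - y) ** 2)

e_start = total_error()
for epoch in range(300):
    for n in range(K):
        x = X[:, n]
        u = f(W @ x)                    # hidden-layer outputs u_k(n)
        y = f(V @ u)                    # network output y(n)
        delta = (d[n] - y) * y * (1 - y)
        grad_hidden = delta * V * u * (1 - u)  # uses V before its update
        V = V + mu * delta * u                 # output-layer correction
        W = W + mu * np.outer(grad_hidden, x)  # hidden-layer correction
e_end = total_error()
print(e_start, e_end)   # the total square error decreases substantially
```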
After the training process, we may expect that each of the neurons recognizes one category (belonging to one group) of the input signals. If an uncategorized input signal appears, it means that the estimate of the number of neurons is not good. It should be increased and the training process should be continued. When two neurons adjust to the same category, they produce the same result and one of them can be eliminated. In this way, we may avoid the assumption that the number of categories (groups) or neurons $N$ is known in advance.
Example 8.31. Consider a neural network with two input data and 3 neurons. The task of the neural network is to classify the input data into one of three categories. Each neuron corresponds to one category. The classification decision is made by choosing the neuron with the highest output. The activation function is a bipolar sigmoid.
Simulate the neural network in the case when the input data belong to one of three categories with equal probability. Data from the first category are pairs of Gaussian random variables with means $\bar{x}_1 = 0$ and $\bar{x}_2 = 4$ and variances $\sigma_{x_1}^2 = 4$ and $\sigma_{x_2}^2 = 0.25$. For the data from the second category the mean values and variances of the Gaussian variables are $\bar{x}_1 = 4$, $\bar{x}_2 = -2$, $\sigma_{x_1}^2 = 1$, and $\sigma_{x_2}^2 = 4$. In the third category are the input data with $\bar{x}_1 = -4$, $\bar{x}_2 = -2$, $\sigma_{x_1}^2 = 1$, and $\sigma_{x_2}^2 = 1$. During the training process the step $\mu = 0.5$ is used.
⋆The results achieved by the neural network after 10 and 100 pairs of input data are presented in Figure 8.39. The categories are indicated with different colors. The learning process of the neural network in the classification of the input data is fast.
Figure 8.39 Example of unsupervised training of a neural network. The input data are classified into three categories. Regions obtained by the neural network after 10 and 100 input data pairs are presented in different colors in the plane of the input data.
and a control network that interprets the output data from these neural networks. All networks are trained to solve the same kind of problem, meaning that the same data are used as input in all of them, while the control network decides about the final result, for example using the principle of majority voting.
The mixture of experts is a set of neural networks, where each of them is trained to process one type of input data. The control network, in this case, has to choose one or more experts (neural networks) that are trained for the type of data that appears at the input.
Chapter 9
Time-Frequency Analysis
_________________________________________________
Authors: Ljubiša Stanković, Miloš Daković, Thayaparan Thayananthan
522 Time-Frequency Analysis
[Figure 9.1: a signal $x(t)$, a sliding window $w(\tau)$ positioned at time $t$, and the localized segment $x(t+\tau)w(\tau)$.]
The idea behind the short-time Fourier transform (STFT) is to apply the
Fourier transform to a portion of the original signal, obtained by introduc-
ing a sliding window function w(t) to localize the analyzed signal x (t). The
Fourier transform is calculated for the localized part of the signal. It pro-
duces the spectral content of the portion of the analyzed signal within the
time interval defined by the width of the window function. The STFT (a
time-frequency representation of the signal) is then obtained by sliding the
window along the signal. Illustration of the STFT calculation is presented in
Fig.9.1.
The analytic formulation of the STFT is
$$STFT(t,\Omega) = \int_{-\infty}^{\infty} x(t+\tau)\,w(\tau)\,e^{-j\Omega\tau}\,d\tau. \qquad (9.1)$$
From (9.1) it is apparent that the STFT actually represents the Fourier transform of the signal $x(t)$, truncated by the window $w(\tau)$ centered at the instant $t$ (see Fig. 9.1). From the definition, it is clear that the STFT satisfies properties inherited from the Fourier transform (e.g., linearity). By denoting $x_t(\tau) = x(t+\tau)$, we can conclude that the STFT is the Fourier transform of the signal $x_t(\tau)w(\tau)$.
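A direct discrete implementation of this definition can be sketched as follows; the window choice, hop, and test signal are illustrative assumptions, not taken from the book.

```python
import numpy as np

# STFT(n, k) = sum_m w(m) x(n+m) exp(-j 2 pi m k / N), m = -N/2 .. N/2-1.
def stft(x, N, hop=1):
    m = np.arange(-N // 2, N // 2)
    w = np.hanning(N)                        # a common window choice
    centers = range(N // 2, len(x) - N // 2, hop)
    S = np.array([[np.sum(w * x[n + m] * np.exp(-2j * np.pi * m * k / N))
                   for k in range(N)] for n in centers])
    return S                                 # rows: time n, columns: frequency k

# A pure sinusoid at DFT bin k0 concentrates along the column k = k0.
N, k0 = 32, 5
n = np.arange(256)
x = np.exp(2j * np.pi * k0 * n / N)
S = stft(x, N)
print(np.argmax(np.abs(S[0])))               # 5
```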
Example 9.1. To illustrate the STFT application, let us perform the time-frequency
analysis of the following signal
$$STFT(t,\Omega) = w(t_1-t)e^{-j\Omega(t_1-t)} + w(t_2-t)e^{-j\Omega(t_2-t)} + W(\Omega-\Omega_1)e^{j\Omega_1 t} + W(\Omega-\Omega_2)e^{j\Omega_2 t}, \qquad (9.4)$$
where $W(\Omega)$ is the Fourier transform of the used window. The STFT is depicted in Fig. 9.2 for various window lengths, along with the ideal representation. A wide window $w(t)$ in the time domain is characterized by a narrow Fourier transform $W(\Omega)$, and vice versa. The influence of the window on the results will be studied later.
Figure 9.2 Time-frequency representation of the sum of two delta pulses and two sinusoids obtained by using (a) a wide window, (b) a narrow window, (c) a medium-width window, and (d) the ideal time-frequency representation.
"∞
2
STFT (t, Ω) = e ja(t+τ ) w(τ )e− jΩτ dτ
−∞
"T =
2 2 2 2πj
e ja(t+τ ) w(τ )e− jΩτ dτ ≃ e jat e j(2at−Ω)τ0 e jaτ0 w(τ0 )
2a
−T
* +=
jat2 − j(2at−Ω)2 /4a Ω − 2at πj
=e e w (9.6)
2a a
since
2a(t + τ0 ) = Ω.
In this case, the width of $|STFT(t,\Omega)|$ along frequency does not decrease with the increase of the window $w(\tau)$ width. The width of $|STFT(t,\Omega)|$ around the central frequency $\Omega = 2at$ is
$$D = 4aT,$$
where $2T$ is the window width in the time domain. Note that this relation holds for a wide window $w(\tau)$, such that the stationary phase method may be applied. If the window is narrow with respect to the phase variations of the signal, the STFT width is defined by the width of the Fourier transform of the window. It is proportional to $1/T$. Thus, the overall STFT width can be approximated by the sum of the frequency-variation-caused width and the width of the window's Fourier transform, that is,
$$D_o = 4aT + \frac{2c}{T}, \qquad (9.8)$$
T
where c is a constant defined by the window shape (by using the main lobe
as the window width, it will be shown later that c = 2π for a rectangular
window or c = 4π for a Hann(ing) window). This relation corresponds to
the STFT calculated as a convolution of an appropriately scaled time domain
window whose width is |τ | < 2aT and the frequency domain form of window
W (Ω). The approximation is checked against the exact STFT calculated by
definition. The agreement is almost complete, Fig.9.3. Therefore, there is a
window width T producing the narrowest possible STFT for this signal. It is
obtained by equating the derivative of the overall width to zero,
2c
4a − = 0,
T2
which results in =
c
To = . (9.9)
2a
As expected, for a sinusoid, a → 0, To → ∞. This is just an approximation
of the optimal window, since for narrow windows we may not apply the
stationary phase method (the term 4aT is then much smaller than 2c/T and
may be neglected anyway).
Note that for $a = 1/2$, when the instantaneous frequency is a symmetry line for the time and frequency axes,
$$2 - \frac{2c}{T^2} = 0 \quad\text{or}\quad 2T = \frac{2c}{T},$$
meaning that the optimal window should have equal widths in the time domain, $2T$, and in the frequency domain, $2c/T$ (main lobe width).
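The optimum (9.9) is easy to confirm numerically; in the sketch below, the chirp rate $a$ and the window constant $c$ are arbitrary illustrative values.

```python
import numpy as np

# Minimize the overall STFT width D_o(T) = 4aT + 2c/T on a grid and compare
# with the closed-form optimum T_o = sqrt(c/(2a)) from (9.9).
a, c = 2.0, 2 * np.pi        # illustrative chirp rate; c = 2*pi (rectangular)
T = np.linspace(0.1, 10.0, 100000)
D = 4 * a * T + 2 * c / T
T_grid = T[np.argmin(D)]
T_o = np.sqrt(c / (2 * a))
print(T_grid, T_o)           # the grid minimum agrees with the closed form
```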
Figure 9.3 Exact absolute STFT value of a linear FM signal at $t = 0$ for various window widths $T = 2, 4, 8, 16, \dots, 1024$ (left) and its approximation calculated as an appropriately scaled convolution of the time- and frequency-domain window $w(\tau)$ (right).
"∞ "∞
1
STFT (t, Ω) = X (θ ) e j(t+τ )θ w(τ ) e− jΩτ dθ dτ
2π
−∞ −∞
"∞ @ A
1
= X (θ )W (Ω − θ ) e jtθ dθ = X (Ω)e jtΩ ∗Ω W (Ω). (9.10)
2π
−∞
Figure 9.4 Two different signals $x_1(t) \ne x_2(t)$ with the same amplitudes of their Fourier transforms, i.e., $|X_1(\Omega)| = |X_2(\Omega)|$.
Example 9.3. For illustration, consider two different signals $x_1(t)$ and $x_2(t)$ producing the same amplitude of the Fourier transform, Fig. 9.4,
$$x_1(t) = \sin\left(122\pi\frac{t}{128}\right) - \cos\left(42\pi\frac{t}{128} - \frac{16}{11}\pi\left(\frac{t-128}{64}\right)^2\right)$$
$$- 1.2\cos\left(94\pi\frac{t}{128} - 2\pi\left(\frac{t-128}{64}\right)^2 - \pi\left(\frac{t-120}{64}\right)^3\right)e^{-\left(\frac{t-140}{75}\right)^2}$$
$$- 1.6\cos\left(15\pi\frac{t}{128} - 2\pi\left(\frac{t-50}{64}\right)^2\right)e^{-\left(\frac{t-50}{16}\right)^2}, \qquad (9.11)$$
$$x_2(t) = x_1(255-t).$$
Their spectrograms are presented in Fig. 9.5. From the spectrograms we can follow the time variations of the spectral content. The signals obviously consist of one constant high-frequency component, one linear-frequency component (in the first signal with increasing frequency as time progresses, and in the second signal with decreasing frequency), and two chirps (one appearing at different time instants and the other having different frequency variations).
[Figure 9.5: spectrograms $SPEC_1(t,\Omega)$ (a) and $SPEC_2(t,\Omega)$ (b) of the signals $x_1(t)$ and $x_2(t)$.]
This relation can be theoretically used for the signal within the region
w(τ ) ̸= 0. In practice it is used within the region of significant window w(τ )
values.
If the window is shifted by $R$ for each successive STFT calculation, then a set of values
$$x(t_0 + iR + \tau)\,w(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} STFT(t_0+iR,\Omega)\,e^{j\Omega\tau}\,d\Omega$$
is obtained. If the value of the step $R$ is smaller than the window duration, then the same signal value is used within two (or several) windows. Using the change of variables $iR + \tau = \lambda$ and summing over all overlapping windows, we get
$$x(t_0+\lambda)\sum_i w(\lambda - iR) = \frac{1}{2\pi}\sum_i\int_{-\infty}^{\infty} STFT(t_0+iR,\Omega)\,e^{j\Omega\lambda}e^{-j\Omega iR}\,d\Omega.$$
The values of $i$ in the summation are such that, for given $\lambda$ and $R$, the value $\lambda - iR = \tau$ is within the window $w(\tau)$.
If the sum of the shifted versions of the window is constant (without loss of generality, assume equal to 1), $\sum_i w(\tau - iR) = 1$, then
$$x(t_0+\lambda) = \frac{1}{2\pi}\sum_i\int_{-\infty}^{\infty} STFT(t_0+iR,\Omega)\,e^{j\Omega\lambda}e^{-j\Omega iR}\,d\Omega.$$
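This reconstruction condition is easy to demonstrate in the discrete case. The sketch below is illustrative (the hop, length, and window are arbitrary choices); it uses a periodic Hann window with hop $R = N/2$, for which the shifted windows sum exactly to one, and recovers the signal by inverting each windowed DFT frame and overlap-adding.

```python
import numpy as np

N, R = 32, 16
n = np.arange(N)
w = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)       # periodic Hann, w(0) = 0

# Sum of the shifted windows over one hop interval equals one:
cola = w[:R] + w[R:]
print(np.allclose(cola, 1.0))                   # True

# Reconstruct a random signal by inverting each frame and overlap-adding;
# only fully covered interior samples are compared.
rng = np.random.default_rng(3)
x = rng.standard_normal(256)
y = np.zeros_like(x)
for start in range(0, len(x) - N + 1, R):
    frame = np.fft.fft(x[start:start + N] * w)  # one STFT frame
    y[start:start + N] += np.fft.ifft(frame).real
print(np.allclose(y[N:-N], x[N:-N]))            # True in the interior
```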
9.2 WINDOWS
The window function plays a crucial role in the localization of the signal
in the time-frequency plane. The most commonly used windows will be
presented next.
It is defined by
$$w(\tau) = \begin{cases} 1 - |\tau/T| & \text{for } |\tau| < T, \\ 0 & \text{elsewhere.} \end{cases} \qquad (9.14)$$
Its Fourier transform is
$$W_T(\Omega) = \frac{4\sin^2(\Omega T/2)}{T\,\Omega^2}. \qquad (9.15)$$
The convergence of this function toward zero as $\Omega \to \pm\infty$ is of the $1/\Omega^2$ order. The window is a continuous function of time, with discontinuities in the first derivative at $t = 0$ and $t = \pm T$. The mainlobe of this window function is twice as wide in the frequency domain as that of the rectangular window. Its width follows from $\Omega T/2 = \pi$ as $d_\Omega = 4\pi/T$.
The discrete-time form is
$$w(n) = \left(1 - \frac{2|n|}{N}\right)\left[u(n+N/2) - u(n-N/2)\right],$$
with the Fourier transform
$$W(e^{j\omega}) = \sum_{n=-N/2}^{N/2-1}\left(1 - \frac{2|n|}{N}\right)e^{-j\omega n} = \frac{2}{N}\,\frac{\sin^2(\omega N/4)}{\sin^2(\omega/2)}.$$
Since $\cos(\pi\tau/T) = \left[e^{j\pi\tau/T} + e^{-j\pi\tau/T}\right]/2$, the Fourier transform of this window is related to the Fourier transform of the rectangular window of the same width as
$$W_H(\Omega) = \frac{1}{2}W_R(\Omega) + \frac{1}{4}W_R(\Omega - \pi/T) + \frac{1}{4}W_R(\Omega + \pi/T) = \frac{\pi^2\sin(\Omega T)}{\Omega(\pi^2 - \Omega^2 T^2)}. \qquad (9.17)$$
In the discrete-time domain,
$$W(k) = \frac{N}{2}\delta(k) + \frac{N}{4}\delta(k+1) + \frac{N}{4}\delta(k-1).$$
Multiplication by this window in the time domain therefore corresponds to the frequency-domain smoothing
$$\mathrm{DFT}\{x(n)w(n)\} = \frac{1}{N}\,\mathrm{DFT}\{x(n)\} *_k \mathrm{DFT}\{w(n)\} = \frac{1}{4}X(k+1) + \frac{1}{2}X(k) + \frac{1}{4}X(k-1).$$
Example 9.4. Find the window that corresponds to the frequency-domain smoothing $(X(k+1) + X(k) + X(k-1))/3$, i.e., to
$$\mathrm{DFT}\{x(n)w(n)\} = \frac{1}{N}\,\mathrm{DFT}\{x(n)\} *_k \mathrm{DFT}\{w(n)\} = \frac{1}{3}X(k+1) + \frac{1}{3}X(k) + \frac{1}{3}X(k-1).$$
Example 9.5. Find the formula to calculate the STFT with a Hann(ing) window, if the STFT calculated with a rectangular window is known.
⋆From the frequency-domain STFT definition,
$$STFT(t,\Omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\theta)\,W(\Omega - \theta)\,e^{jt\theta}\,d\theta,$$
and
$$W_H(\Omega) = \frac{1}{2}W_R(\Omega) + \frac{1}{4}W_R(\Omega - \pi/T) + \frac{1}{4}W_R(\Omega + \pi/T),$$
then
$$STFT_H(t,\Omega) = \frac{1}{2}STFT_R(t,\Omega) + \frac{1}{4}STFT_R\!\left(t,\Omega - \frac{\pi}{T}\right) + \frac{1}{4}STFT_R\!\left(t,\Omega + \frac{\pi}{T}\right). \qquad (9.18)$$
For the Hann(ing) window $w(\tau)$ of width $2T$, we may roughly assume that its Fourier transform $W_H(\Omega)$ is nonzero within the mainlobe $|\Omega| < 2\pi/T$ only, since the sidelobes decay very fast. Then we may write $d_\Omega = 4\pi/T$. It means that the STFT is nonzero valued in the shaded regions in Fig. 9.2.
We see that the duration in time of the STFT of a delta pulse is equal to the window width, $d_t = 2T$. The STFTs of two delta pulses (very short duration signals) do not overlap in the time-frequency domain if their distance is greater than the window duration, $|t_1 - t_2| > d_t$. Then, these two pulses can be resolved. Thus, the window width is here a measure of time resolution. Since the Fourier transform of the Hann(ing) window converges fast, we can roughly assume that a measure of duration in frequency is the width of its mainlobe, $d_\Omega = 4\pi/T$. Then we may say that the Fourier transforms of two sinusoidal signals do not overlap in frequency if the condition $|\Omega_1 - \Omega_2| > d_\Omega$ holds. It is important to observe that the product of the window durations in time and frequency is a constant. In this example, considering the time-domain duration of the Hann(ing) window and the width of its mainlobe in the frequency domain, this product is $d_t d_\Omega = 8\pi$. Therefore, if we improve the resolution in the time domain, $d_t$, by decreasing $T$, we inherently increase the value of $d_\Omega$ in the frequency domain. This essentially prevents us from achieving the ideal resolution ($d_t = 0$ and $d_\Omega = 0$) in both domains. A general formulation of this principle, stating that the product of effective window durations in time and in frequency cannot be arbitrarily small, will be presented later.
resulting in
$$a = 25/46 \cong 0.54. \qquad (9.21)$$
This window has several sidelobes, next to the mainlobe, lower than those of the previous two windows. However, since it is not continuous at $t = \pm T$, its decay in frequency, as $\Omega \to \pm\infty$, is not fast. Note that we let the mainlobe be twice as wide as in the rectangular window case, so we cancel not the first but the second sidelobe, at its maximum.
The discrete-time domain form is
$$w(n) = \left[0.54 + 0.46\cos\left(\frac{2\pi n}{N}\right)\right]\left[u(n+N/2) - u(n-N/2)\right]$$
with
$$W(k) = 0.54N\delta(k) + 0.23N\delta(k+1) + 0.23N\delta(k-1).$$
$$w(\tau) = \begin{cases} 0.42 + 0.5\cos(\pi\tau/T) + 0.08\cos(2\pi\tau/T) & \text{for } |\tau| < T, \\ 0 & \text{elsewhere.} \end{cases} \qquad (9.22)$$
"∞
STFT (t, Ω) = x (t + τ )w(τ )e− jΩτ dτ
−∞
∞
≃ ∑ x ((n + m)∆t)w(m∆t)e− jm∆tΩ ∆t.
m=−∞
Figure 9.6 Windows in the time and frequency domains: rectangular window (first row),
triangular (Bartlett) window (second row), Hann(ing) window (third row), Hamming window
(fourth row), and Blackman window (fifth row).
Figure 9.7 The STFT at n = 0 calculated using the Hamming window (left) and the Blackman
window (right) of signals x1 (n) (top) and signal x2 (n) (bottom).
By denoting $x(n) = x(n\Delta t)\Delta t$, we get
$$STFT(n,\omega) = \sum_{m=-\infty}^{\infty} w(m)\,x(n+m)\,e^{-jm\omega}. \qquad (9.23)$$
We will use the same notation for continuous-time and discrete-time signals, $x(t)$ and $x(n)$. However, we hope that this will not cause any confusion, since we will use different sets of variables: for example, $t$ and $\tau$ for continuous time and $n$ and $m$ for discrete time. Also, we hope that the context will always be clear, so that there is no doubt about what kind of signal is considered.
The discretization in time causes periodicity in frequency,
$$STFT(n,\omega) = \sum_{k=-\infty}^{\infty} STFT(n\Delta t,\, \Omega + 2k\Omega_0), \quad\text{with } \omega = \Delta t\,\Omega,$$
so the sampling interval should satisfy
$$\Delta t = \frac{\pi}{\Omega_0} \le \frac{\pi}{\Omega_m}.$$
The DFT-based form is
$$STFT(n,k) = STFT(n,\omega)\big|_{\omega = \frac{2\pi}{N}k} = \sum_{m=-N/2}^{N/2-1} w(m)\,x(n+m)\,e^{-j2\pi mk/N} \qquad (9.24)$$
for a given instant $n$. When DFT routines with indices from 0 to $N-1$ are used, a shifted version of $w(m)x(n+m)$ should be formed for the calculation for $N/2 \le m \le N-1$. It is obtained as $w(m-N)x(n+m-N)$, since in the DFT calculation the periodicity of the signal $w(m)x(n+m)$, with period $N$, is inherently assumed.
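This index shift can be checked against the direct sum; in the sketch below (arbitrary data and window), `np.fft.ifftshift` performs exactly the reordering described above.

```python
import numpy as np

# (9.24) computed directly and via a DFT routine with indices 0..N-1.
N = 16
n0 = 40
rng = np.random.default_rng(5)
x = rng.standard_normal(128)
w = np.hanning(N)

m = np.arange(-N // 2, N // 2)
direct = np.array([np.sum(w * x[n0 + m] * np.exp(-2j * np.pi * m * k / N))
                   for k in range(N)])

seg = w * x[n0 + m]                          # samples ordered m = -N/2..N/2-1
via_fft = np.fft.fft(np.fft.ifftshift(seg))  # reorder to m = 0..N/2-1, -N/2..-1
print(np.allclose(direct, via_fft))          # True
```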
Example 9.7. Consider a signal with $M = 16$ samples, $x(0), x(1), \dots, x(15)$. Write a matrix form for the calculation of a four-sample STFT. Present the nonoverlapping and overlapping cases of the STFT calculation.
or
$$\mathbf{STFT}(n) = \mathbf{W}_4\,\mathbf{x}(n)$$
with $\mathbf{STFT}(n) = [STFT(n,-2)\ STFT(n,-1)\ STFT(n,0)\ STFT(n,1)]^T$, $\mathbf{x}(n) = [x(n-2)\ x(n-1)\ x(n)\ x(n+1)]^T$, and $\mathbf{W}_4$ the DFT matrix of order four with elements $W_4^{mk} = \exp(-j2\pi mk/N)$. Here a rectangular window is assumed. Including the window function, the previous relation can be written as
$$\mathbf{STFT}(n) = \mathbf{W}_4\mathbf{H}_4\,\mathbf{x}(n),$$
with
$$\mathbf{H}_4 = \begin{bmatrix} w(-2) & 0 & 0 & 0 \\ 0 & w(-1) & 0 & 0 \\ 0 & 0 & w(0) & 0 \\ 0 & 0 & 0 & w(1) \end{bmatrix}$$
being a diagonal matrix whose elements are the window values $w(m)$, $\mathbf{H}_4 = \mathrm{diag}(w(m))$, $m = -2, -1, 0, 1$, and
$$\mathbf{W}_4\mathbf{H}_4 = \begin{bmatrix} w(-2)W_4^{4} & w(-1)W_4^{2} & w(0) & w(1)W_4^{-2} \\ w(-2)W_4^{2} & w(-1)W_4^{1} & w(0) & w(1)W_4^{-1} \\ w(-2) & w(-1) & w(0) & w(1) \\ w(-2)W_4^{-2} & w(-1)W_4^{-1} & w(0) & w(1)W_4^{1} \end{bmatrix}.$$
Assuming that the signal values with amplitudes below $1/e^4$ can be neglected, find the sampling rate for the STFT-based analysis of this signal. Write the approximate spectrogram expression for the Hann(ing) window of $N = 32$ samples used in the analysis. What signal will be presented in the time-frequency plane, within the basic frequency period, if the signal is sampled at $\Delta t = 1/128$?
⋆The time interval with significant signal content for the first signal component is $-2 \le t \le 2$, with the frequency content within $-56\pi \le \Omega \le -8\pi$, since the instantaneous frequency is $\Omega(t) = -12\pi t - 32\pi$. For the second component these intervals are $0 \le t \le 2$ and $160\pi \le \Omega \le 224\pi$. The maximal frequency in the signal is $\Omega_m = 224\pi$. Here we have to take into account the possible spreading of the spectrum caused by the lag window. Its width in the time domain is $d_t = 2T = N\Delta t = 32\Delta t$. The width of the mainlobe in the frequency domain, $d_w$, is defined by $32 d_w \Delta t = 4\pi$, or $d_w = \pi/(8\Delta t)$. Thus, taking the sampling interval $\Delta t = 1/256$, we will satisfy the sampling theorem condition in the worst instant case, since $\pi/(\Omega_m + d_w) = 1/256$.
In the case of the Hann(ing) window with N = 32 and ∆t = 1/256, the lag interval is N∆t = 1/8. We will assume that the amplitude variations within the window are small, that is, $w(\tau)e^{-(t+\tau)^2} \cong w(\tau)e^{-t^2}$ for −1/16 < τ ≤ 1/16. Then, according to the stationary phase method, we can write the STFT approximation
$$|STFT(t,\Omega)|^2 = \frac{16}{12\pi}e^{-2t^2}w^2(\Omega+12\pi t+32\pi) + \frac{32}{32\pi}e^{-8(t-1)^2}w^2(\Omega-32\pi t-160\pi).$$
If the signal is sampled at ∆t = 1/128, the second component, whose frequencies exceed the maximal frequency 128π for this sampling interval, is aliased into the basic frequency period, with the approximation
$$|STFT(t,\Omega)|^2 = \frac{16}{12\pi}e^{-2t^2}w^2(\Omega+12\pi t+32\pi) + \frac{32}{32\pi}e^{-8(t-1)^2}w^2(\Omega-32\pi t+96\pi), \tag{9.25}$$
with t = n/128 and Ω = 128ω within −π ≤ ω < π, or −128π ≤ Ω < 128π.
For the rectangular window, the STFT values at an instant n can be calculated recursively from the STFT values at n − 1, as
$$STFT_R(n,k) = \left[STFT_R(n-1,k) + (-1)^k\left(x(n+N/2-1) - x(n-N/2-1)\right)\right]e^{j2\pi k/N}.$$
This recursive formula follows easily from the STFT definition (9.24).
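A sketch of this sliding recursion, assuming the centered rectangular-window definition $STFT(n,k)=\sum_{m=-N/2}^{N/2-1}x(n+m)e^{-j2\pi mk/N}$ (the test signal is an assumption); the update is checked against the direct computation at every step:

```python
import numpy as np

def stft_direct(x, n, N):
    # STFT(n,k) = sum_{m=-N/2}^{N/2-1} x(n+m) exp(-j 2 pi m k / N)
    m = np.arange(-N // 2, N // 2)
    k = np.arange(N)[:, None]
    return np.sum(x[n + m] * np.exp(-2j * np.pi * m * k / N), axis=1)

N = 8
rng = np.random.default_rng(1)
x = rng.standard_normal(64)
k = np.arange(N)

S = stft_direct(x, N // 2, N)          # initialize at the first full window
for n in range(N // 2 + 1, len(x) - N // 2):
    # slide the window: add the entering sample, drop the leaving one
    S = (S + (-1.0) ** k * (x[n + N // 2 - 1] - x[n - N // 2 - 1])) \
        * np.exp(2j * np.pi * k / N)
    assert np.allclose(S, stft_direct(x, n, N))
```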
For other window forms, the STFT can be obtained from the STFT calculated with the rectangular window. For example, according to (9.18), the STFT with a Hann(ing) window, STFT_H(n,k), is related to the STFT with a rectangular window, STFT_R(n,k), as
$$STFT_H(n,k) = \frac{1}{2}STFT_R(n,k) + \frac{1}{4}STFT_R(n,k-1) + \frac{1}{4}STFT_R(n,k+1).$$
Figure 9.8 Recursive implementation of the STFT for the rectangular and other windows.
with the combination coefficients
$$(a_{-1}, a_0, a_1) = \left(\frac{1}{4}, \frac{1}{2}, \frac{1}{4}\right) \text{ for the Hann(ing) window},$$
$$(a_{-1}, a_0, a_1) = (0.23,\, 0.54,\, 0.23) \text{ for the Hamming window},$$
$$(a_{-2}, a_{-1}, a_0, a_1, a_2) = (0.04,\, 0.25,\, 0.42,\, 0.25,\, 0.04) \text{ for the Blackman window}.$$
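The Hann(ing) combination above can be verified at a single time instant; the centered window form $w(m)=0.5+0.5\cos(2\pi m/N)$ and the test signal are assumptions of this sketch:

```python
import numpy as np

N = 16
m = np.arange(-N // 2, N // 2)
w_hann = 0.5 + 0.5 * np.cos(2 * np.pi * m / N)   # centered Hann(ing) window

rng = np.random.default_rng(2)
x = rng.standard_normal(N)

# rectangular-window STFT at one instant, k = 0, ..., N-1
E = np.exp(-2j * np.pi * np.outer(np.arange(N), m) / N)
S_R = E @ x
S_H_direct = E @ (w_hann * x)

# combination of frequency-shifted rectangular STFTs:
# S_H(k) = 0.5 S_R(k) + 0.25 S_R(k-1) + 0.25 S_R(k+1)  (indices modulo N)
S_H = 0.5 * S_R + 0.25 * np.roll(S_R, 1) + 0.25 * np.roll(S_R, -1)
assert np.allclose(S_H, S_H_direct)
```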
[Filter-bank realization of the STFT: each channel modulates x(n) by e^{j2πkn/N}, applies the window w(n), and downsamples by R, producing STFT(n,k) for k = 0, 1, ..., N−1.]
$$STFT(t,\Omega) = \int_{-\infty}^{\infty} x(t+\tau)w(\tau)e^{-j\Omega\tau}\,d\tau = \int_{-\infty}^{\infty} x(t-\tau)w(\tau)e^{j\Omega\tau}\,d\tau = x(t) *_t \left[w(t)e^{j\Omega t}\right],$$
as illustrated in Fig. 9.9. The next STFT can be calculated with the time step R∆t, meaning downsampling in time by a factor 1 ≤ R ≤ N. Two special cases are: no downsampling, R = 1, and nonoverlapping calculation, R = N. The influence of R on the signal reconstruction will be discussed later.
544 Time-Frequency Analysis
Nonoverlapping cases are important and easy to analyze. They also keep the number of the STFT coefficients equal to the number of the signal samples. However, the STFT is commonly calculated using overlapping windows. There are several reasons for introducing overlapped STFT representations. Rectangular windows have poor localization in the frequency domain; the localization is improved by other window forms. In the case of nonrectangular windows, some of the signal samples are weighted in such a way that their contribution to the final representation is small. Then we want to use an additional STFT with a window positioned in such a way that these samples contribute more to the STFT calculation. Also, in parameter estimation and detection, the task is to achieve the best possible estimation or detection at each time instant, instead of using interpolations for the skipped instants when the STFT is calculated with a large step (equal to the window width). Commonly, the overlapped STFTs are calculated using, for example, a rectangular, Hann(ing), Hamming, Bartlett, Kaiser, or Blackman window of a constant width N, with steps N/2, N/4, N/8, ... in time. The computational cost is increased in the overlapped STFTs since more STFTs are calculated. A way of composing STFTs calculated with a rectangular window into an STFT with, for example, the Hann(ing), Hamming, or Blackman window is presented in Fig. 9.8.
If a signal x(n) is of duration M, in some cases, in addition to the overlapping in time, an interpolation in frequency is done, for example, up to the DFT grid with M samples. The overlapped and interpolated STFT of this signal is calculated, using a window w(m) whose width is N ≤ M, as
$$STFT_N(n,k) = \sum_{m=-N/2}^{N/2-1} w(m)x(n+m)e^{-j2\pi mk/M},$$
$$n = N/2+1,\ N/2+2,\ \dots,\ M-N/2,$$
$$k = -M/2,\ -M/2+1,\ \dots,\ -1, 0, 1, \dots,\ M/2-1.$$
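A sketch of this frequency-interpolated STFT at one instant (the Hamming window form $w(m)=0.54+0.46\cos(2\pi m/N)$ and the test signal are assumptions); it is equivalent to zero padding the windowed segment up to M samples before the DFT:

```python
import numpy as np

def stft_interp(x, n, N, M, w):
    # STFT_N(n,k) = sum_{m=-N/2}^{N/2-1} w(m) x(n+m) e^{-j 2 pi m k / M},
    # evaluated on the M-point DFT grid k = -M/2, ..., M/2-1
    m = np.arange(-N // 2, N // 2)
    k = np.arange(-M // 2, M // 2)[:, None]
    return np.sum(w * x[n + m] * np.exp(-2j * np.pi * m * k / M), axis=1)

M, N = 64, 8
rng = np.random.default_rng(3)
x = rng.standard_normal(M)
m = np.arange(-N // 2, N // 2)
w = 0.54 + 0.46 * np.cos(2 * np.pi * m / N)   # centered Hamming window (assumed form)

S = stft_interp(x, n=16, N=N, M=M, w=w)
print(S.shape)                                # (64,)
```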
Example 9.9. The STFT calculation of a signal whose frequency changes linearly is
done by using a rectangular window. Signal samples within 0 ≤ n ≤ M − 1
with M = 64 were available. The nonoverlapping STFT of this signal is
calculated with a rectangular window of the width N = 8 and presented in
Fig.9.10. The nonoverlapping STFT values obtained by using the rectangular
window are shifted in frequency, scaled, and added up, Fig. 9.11, to produce
the STFT with a Hamming window, Fig. 9.12.
Figure 9.10 The STFT of a linear FM signal x(n) calculated using a rectangular window of the width N = 8.

The STFT calculation for the same linear FM signal will be repeated for the overlapping STFT with step R = 1. Results for the rectangular and Hamming window (obtained by a simple matrix calculation from the rectangular window case) are presented in Fig. 9.13. Three window widths are used here. The same procedure is repeated with the windows zero padded up to the widest used window (interpolation in frequency). The results are presented in Fig. 9.14. Note that, with regard to the amount of information, all these figures do not differ from the basic time-frequency representation presented in Fig. 9.10.
Figure 9.11 The STFT of a linear FM signal calculated using a rectangular window (from the previous figure), along with its frequency shifted versions STFT_R(n, k − 1) and STFT_R(n, k + 1). Their weighted sum produces the STFT of the same signal with a Hamming window, STFT_H(n, k).
Figure 9.12 The STFT of a linear FM signal x (n) calculated using the Hamming window with
N = 8. Calculation is illustrated in the previous figure.
[Panels: STFT with rectangular and with Hamming windows, for N = 48, N = 16, and N = 8.]
Figure 9.13 Time-frequency analysis of a linear frequency modulated signal with overlapping
windows of various widths. Time step in the STFT calculation is R = 1.
[Panels: STFT with rectangular and with Hamming windows, for N = 48, N = 16, and N = 8.]
Figure 9.14 Time-frequency analysis of a linear frequency modulated signal with overlapping
windows of various widths. Time step in the STFT calculation is R = 1. For each window
width the frequency axis is interpolated (signal in time is zero padded) up to the total number
of available signal samples M = 64.
Figure 9.15 Illustration of the signal reconstruction from the STFT with nonoverlapping
windows.
$$w(m)x(n_0+iR+m) = \frac{1}{N}\sum_{k=-N/2}^{N/2-1} STFT(n_0+iR,k)e^{j2\pi mk/N}. \tag{9.27}$$
Since R < N, we will get the same signal value within different STFTs, for different i. For example, for N = 8, R = 2, and n0 = 0, we will get the value x(0) for m = 0 and i = 0, but also for m = −2 and i = 1, or m = 2 and i = −1, and so on. Then, in the reconstruction, we should use all these values to get the most reliable reconstruction.
Let us reindex the reconstructed signal values (9.27) by the substitution m = l − iR,
$$w(l-iR)x(n_0+l) = \frac{1}{N}\sum_{k=-N/2}^{N/2-1} STFT(n_0+iR,k)e^{j2\pi lk/N}e^{-j2\pi iRk/N},$$
$$-N/2 \le l-iR \le N/2-1.$$
Figure 9.16 Illustration of the STFT calculation with windows overlapping in order to pro-
duce an inverse STFT whose sum will give the original signal within n0 − N ≤ n < n0 .
$$w(l)x(n_0+l) = \frac{1}{N}\sum_{k=-N/2}^{N/2-1} STFT(n_0,k)e^{j2\pi lk/N}$$
$$w(l-R)x(n_0+l) = \frac{1}{N}\sum_{k=-N/2}^{N/2-1} STFT(n_0+R,k)e^{j2\pi lk/N}e^{-j2\pi Rk/N}$$
$$w(l-2R)x(n_0+l) = \frac{1}{N}\sum_{k=-N/2}^{N/2-1} STFT(n_0+2R,k)e^{j2\pi lk/N}e^{-j2\pi 2Rk/N}$$
$$\vdots$$
$$w(l+R)x(n_0+l) = \frac{1}{N}\sum_{k=-N/2}^{N/2-1} STFT(n_0-R,k)e^{j2\pi lk/N}e^{j2\pi Rk/N}$$
$$w(l+2R)x(n_0+l) = \frac{1}{N}\sum_{k=-N/2}^{N/2-1} STFT(n_0-2R,k)e^{j2\pi lk/N}e^{j2\pi 2Rk/N}$$
$$\vdots$$
Special cases:
Figure 9.17 Signal reconstruction from the STFT for the case N = 8, when the STFT is
calculated with step R = N/2 = 4 and the window satisfies w(m) + w(m − N/2) = 1. This
is the case for the rectangular, Hann(ing), Blackman and triangular windows. The same holds
for the Hamming window up to a constant scaling factor of 1.08.
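A sketch of this reconstruction for N = 8 and step R = N/2 = 4, using the Hann(ing) window, which satisfies w(m) + w(m − N/2) = 1 (the test signal and the interior-only check are assumptions of the sketch):

```python
import numpy as np

N, R, M = 8, 4, 64
m = np.arange(-N // 2, N // 2)
w = 0.5 + 0.5 * np.cos(2 * np.pi * m / N)      # Hann(ing): w(m) + w(m - N/2) = 1

rng = np.random.default_rng(4)
x = rng.standard_normal(M)

# analysis: windowed DFTs at positions n = N/2, N/2 + R, ..., M - N/2
E = np.exp(-2j * np.pi * np.outer(np.arange(N), m) / N)
S = {n: E @ (w * x[n - N // 2: n + N // 2])
     for n in range(N // 2, M - N // 2 + 1, R)}

# synthesis: the inverse DFT of each STFT returns w(m) x(n + m); summing the
# overlapped segments reconstructs x wherever the shifted windows add to one
y = np.zeros(M)
for n, Sn in S.items():
    y[n - N // 2: n + N // 2] += np.real(np.linalg.solve(E, Sn))

# exact in the interior, where every sample is covered by two overlapped windows
assert np.allclose(y[N // 2: M - N // 2], x[N // 2: M - N // 2])
```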
Very efficient realizations for this case are the recursive ones (Fig. 9.8), instead of the direct DFT calculation.
In the analysis of non-stationary signals, our primary interest is not in signal reconstruction with the fewest number of calculation points. Rather, we are interested in tracking the signal's non-stationary parameters, such as the instantaneous frequency. These parameters may vary significantly between neighboring time instants n and n + 1. Quasi-stationarity of the signal within R samples (implicitly assumed when downsampling by a factor of R is done) is then not a good starting point for the analysis. Here, we have to use the time-frequency analysis of the signal at each instant n, without any downsampling.
$$STFT_{N_i}(n_i,k) = \sum_{m=-N_i/2}^{N_i/2-1} x(n_i+m)e^{-j\frac{2\pi}{N_i}mk}. \tag{9.29}$$
The notation $STFT_{N_i}(n,k)$ means that the STFT is calculated using the signal samples within the window $[n_i - N_i/2, n_i + N_i/2 - 1]$ for $-N_i/2 \le k \le N_i/2 - 1$, corresponding to an even number $N_i$ of discrete frequencies from −π to π. For an odd $N_i$, the summation limits are $\pm(N_i-1)/2$. Let us restate that a wide window includes signal samples over a wide time interval, losing the possibility to detect fast changes in time, but achieving a high frequency resolution. A narrow window in the STFT will track time changes, but with a low resolution in frequency. Two extreme cases are $N_i = 1$, when $STFT_1(n,0) = x(n)$, and $N_i = M$, when
$$STFT_M(n,k) = X(k).$$
Figure 9.18 Signal reconstruction when the STFT is calculated with step R = 1.
The matrix form of this STFT is
$$\mathbf{STFT}_{N_i}(n_i) = \mathbf{W}_{N_i}\mathbf{x}_{N_i}(n_i),$$
where $\mathbf{STFT}_{N_i}(n_i)$ and $\mathbf{x}_{N_i}(n_i)$ are column vectors. Their elements are $STFT_{N_i}(n_i,k)$, $k = -N_i/2, \dots, N_i/2-1$, and $x(n_i+m)$, $m = -N_i/2, \dots, N_i/2-1$, respectively, while $\mathbf{W}_{N_i}$ is the DFT matrix with elements $e^{-j2\pi mk/N_i}$, where m is the column index and k is the row index of the matrix. The STFT value $STFT_{N_i}(n_i,k)$ is presented as a block in the time-frequency plane of
value STFTNi (ni , k ) is presented as a block in the time-frequency plane of
the width Ni in the time direction, covering all time instants [ni − Ni /2, ni +
Ni /2 − 1] used in its calculation. The frequency axis can be labeled with the
DFT indices p = − M/2, ..., M/2 − 1 corresponding to the DFT frequencies
2π p/M (dots in Fig.9.19). With respect to this axis labeling, the block
STFTNi (ni , k) will be positioned at the frequency 2πk/Ni = 2π (kM/Ni )/M,
i.e., at p = kM/Ni . The block width in frequency is M/Ni DFT samples.
Therefore the block area in time and DFT frequency is always equal to
the number of all available signal samples M as shown in Fig.9.19 where
M = 16.
Example 9.11. Consider a signal x (n) with M = 16 samples. Write the expression
for calculation of the STFT value STFT4 (2, 1) with a rectangular window.
Indicate graphically the region of time instants used in the calculation and the frequency range, in terms of the DFT frequency values, included in the calculation of STFT₄(2,1).
⋆The STFT value STFT₄(2,1) is
$$STFT_4(2,1) = \sum_{m=-2}^{1} x(2+m)e^{-j\frac{2\pi}{4}m}.$$
Figure 9.19 The nonoverlapping STFTs with: (a) constant window of the width N = 4, (b)
constant window of the width N = 2, (c)-(d) time-varying windows. Time index is presented
on the horizontal axis, while the DFT frequency index is shown on the vertical axis (the STFT
is denoted by S for notation simplicity).
The vector of all the STFT values is
$$\mathbf{STFT} = \begin{bmatrix} \mathbf{W}_{N_0} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{W}_{N_1} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{W}_{N_K} \end{bmatrix}\mathbf{x} = \widetilde{\mathbf{W}}\mathbf{x} = \widetilde{\mathbf{W}}\mathbf{W}_M^{-1}\mathbf{X}, \tag{9.30}$$
where STFT is a column vector containing all STFT vectors $\mathbf{STFT}_{N_i}(n_i)$, i = 0, 1, ..., K, $\mathbf{X} = \mathbf{W}_M\mathbf{x}$ is the DFT of the whole signal x(n), while $\widetilde{\mathbf{W}}$ is an M × M block matrix formed from the smaller DFT matrices $\mathbf{W}_{N_0}, \mathbf{W}_{N_1}, \dots, \mathbf{W}_{N_K}$, as in (9.29). Since the time-varying nonoverlapping STFT corresponds to a decimation-in-time DFT scheme, its calculation is more efficient than the DFT calculation of the whole signal. An illustration of time-varying window STFTs is shown in Fig. 9.19(c), (d). For a signal with M samples, there is a large number of possible nonoverlapping STFTs with a time-varying window, $N_i \in \{1, 2, 3, \dots, M\}$. The exact number will be derived later.
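A sketch of such a time-varying nonoverlapping STFT as a block-diagonal matrix operation (here with the standard 0, ..., N−1 DFT indexing rather than the centered indexing used above; the window widths and test signal are assumptions):

```python
import numpy as np

def dft_matrix(N):
    n = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(n, n) / N)

def time_varying_stft(x, widths):
    # Nonoverlapping STFT with window widths N_0, N_1, ..., N_K summing to len(x);
    # equivalent to multiplying x by the block-diagonal matrix diag(W_N0, ..., W_NK)
    assert sum(widths) == len(x)
    out, pos = [], 0
    for N in widths:
        out.append(dft_matrix(N) @ x[pos:pos + N])
        pos += N
    return np.concatenate(out)

rng = np.random.default_rng(5)
x = rng.standard_normal(16)
S = time_varying_stft(x, [4, 4, 2, 2, 4])       # 16 coefficients, as many as samples

# a single window covering the whole signal reduces to the DFT of x
assert np.allclose(time_varying_stft(x, [16]), np.fft.fft(x))
```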
Example 9.12. Consider a signal x(n) with M = 16 samples, whose values are x = [0.5, 0.5, −0.25, j0.25, 0.25, −j0.25, −0.25, 0.25, −0.25, 0.25, 0.5, 0.5, −j0.5, j0.5, 0, −1]. Some of its nonoverlapping STFTs are calculated according to (9.29) and shown in Fig. 9.19. Different representations can be compared based on concentration measures, for example, µ[STFT_N(n,k)]. The best STFT representation, in this sense, would be the one with the smallest µ[STFT_N(n,k)]. For the considered signal and its four representations shown in Fig. 9.19, the best representation, according to this criterion, is the one shown in Fig. 9.19(b).
Example 9.13. Consider a signal x(n) with M = 8 samples. Its values are x(0) = 0, x(1) = 1, x(2) = 1/2, x(3) = −1/2, x(4) = 1/4, x(5) = −j/4, x(6) = −1/4, and x(7) = j/4.
(a) Calculate the STFTs of this signal with rectangular windows of the widths N = 1, N = 2, N = 4, and N = 8. Use the following STFT definition
$$STFT_N(n,k) = \sum_{m=-N/2}^{N/2-1} x(n+m)e^{-j2\pi mk/N}.$$
For an odd N, the summation limits are ±(N−1)/2. Calculate STFT₁(n,k) for n = 0, 1, 2, 3, 4, 5, 6, 7, then STFT₂(n,k) for n = 1, 3, 5, 7, then STFT₄(n,k) for n = 2, 6, and STFT₈(n,k) for n = 4. For the frequency axis use the notation k = 0, 1, 2, 3, 4, 5, 6, 7.
Figure 9.20 Time-frequency representation in various lattices (grid-lines are shown), with the concentration measure value M = µ[SPEC(n,k)]. The optimal representation, with respect to this measure, is presented with thicker gridlines. The time axis is n = 0, 1, 2, 3, 4, 5, 6, 7 and the frequency axis is k = 0, 1, 2, 3, 4, 5, 6, 7.
for each case. (c) By measuring the concentration for all of them, we will get
[Two time-frequency lattices over time 0 to 5 and frequency 0 to 3π/4; the second indicates the regions occupied by STFT₂(1,0), STFT₂(1,1), STFT₁(2,0), STFT₃(4,0), STFT₃(4,1), and STFT₃(4,2).]
⋆(a) The denoted areas are presented in Fig. 9.22. (b) The STFT values are obtained using
$$STFT_N(n,k) = \sum_{m=-(N-1)/2}^{(N-1)/2} x(n+m)e^{-j2\pi mk/N} \quad \text{(odd } N)$$
or
$$STFT_N(n,k) = \sum_{m=-N/2}^{N/2-1} x(n+m)e^{-j2\pi mk/N} \quad \text{(even } N)$$
Example 9.15. A discrete signal x(n) is considered for 0 ≤ n < M. Find the number of the STFTs of this signal with time-varying windows.
(a) Consider arbitrary window widths from 1 to M.
(b) Consider dyadic windows, that is, windows whose width is 2^m, where m is an integer such that 2^m ≤ M. In this case find the number of time-varying window STFTs for M = 1, 2, 3, ..., 15, 16.
⋆(a) Let us analyze the problem recursively. Denote by F(M) the number of STFTs for a signal with M samples. It is obvious that F(1) = 1, that is, for a one-sample signal there is only one STFT (the signal sample itself). If M > 1, we can use a window of width k = 1, 2, ..., M as the first analysis window, and then analyze the remaining (M − k) samples in all possible ways, so we can write a recursive relation for the total number of the STFTs. If the first window is a one-sample window, then the number of the STFTs is F(M − 1). When the first window is a two-sample window, the total number of the STFTs is F(M − 2), and so on, until the first window is the M-sample window, when F(M − M) = 1. Thus, the total number of the STFTs for all cases is
$$F(M) = F(M-1) + F(M-2) + \ldots + F(1) + 1.$$
We can introduce F(0) = 1 (meaning that if there are no signal samples there is only one way to calculate the time-varying window STFT) and obtain
$$F(M) = F(M-1) + F(M-2) + \ldots + F(1) + F(0) = \sum_{k=1}^{M} F(M-k)$$
and
$$F(M) - F(M-1) = \sum_{k=1}^{M} F(M-k) - \sum_{k=2}^{M} F(M-k) = F(M-1),$$
$$F(M) = 2F(M-1),$$
resulting in $F(M) = 2^{M-1}$.
(b) In a similar way, following the previous analysis, we can write
$$F(M) = F(M-2^0) + F(M-2^1) + F(M-2^2) + \cdots + F(M-2^m) = \sum_{m=0}^{\lfloor \log_2 M\rfloor} F(M-2^m),$$
where $\lfloor\cdot\rfloor$ denotes the integer part of the argument. A closed-form approximation of this count holds with a relative error smaller than 0.4% for 1 ≤ M ≤ 1024. For example, for M = 16 we have 5272 different ways to split the time-frequency plane into non-overlapping time-frequency regions.
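The counting recursions above can be sketched directly; the closed form $F(M)=2^{M-1}$ for arbitrary widths is checked by the recursion itself (the particular values tried here are an illustration, not part of the original derivation):

```python
def count_stfts(M, allowed_widths):
    # F(M) = sum over the first-window width k of F(M - k), with F(0) = 1
    F = [1] + [0] * M
    for n in range(1, M + 1):
        F[n] = sum(F[n - k] for k in allowed_widths if k <= n)
    return F[M]

# arbitrary widths 1, ..., M give F(M) = 2^(M-1)
for M in range(1, 13):
    assert count_stfts(M, range(1, M + 1)) == 2 ** (M - 1)

print(count_stfts(8, [1, 2, 4, 8]))   # dyadic widths, M = 8: 56
```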
The STFT may use a frequency-varying window as well. For a given DFT frequency $p_i$, the window width in time is constant, Fig. 9.23,
$$STFT_{N_i}(n,k_i) = \sum_{m=-N_i/2}^{N_i/2-1} w(m)x(n+m)e^{-j\frac{2\pi}{N_i}mk_i}.$$
$$STFT_4(2,-1) = \sum_{m=-2}^{1} x(2+m)e^{-j2\pi m(-1)/4}.$$
$$STFT(n,k) = \frac{1}{M}\sum_{i=0}^{M-1} P(i)X(k+i)e^{j2\pi in/M}, \tag{9.31}$$
$$\mathbf{STFT}_M(k) = \mathbf{W}_M^{-1}\mathbf{P}_M\mathbf{X}(k).$$
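A numerical sketch of computing an STFT value from the signal's DFT in this way. Phase conventions differ between STFT definitions, so the check below uses the form $STFT_{II}(n,k)=\sum_l x(l)w(l-n)e^{-j2\pi lk/M}$, with P(i) taken as the DFT of the time-reversed window; the window placement and test signal are assumptions:

```python
import numpy as np

M = 32
rng = np.random.default_rng(6)
x = rng.standard_normal(M)
w = np.zeros(M)
w[:8] = np.hanning(8)        # a length-8 window placed on the M-point grid

X = np.fft.fft(x)
P = np.conj(np.fft.fft(w))   # for real w, the DFT of w(-m) is conj(DFT of w)

n, k = 5, 3
i = np.arange(M)
# frequency-domain computation: (1/M) sum_i P(i) X(k+i) e^{j 2 pi i n / M}
S_freq = np.sum(P * X[(k + i) % M] * np.exp(2j * np.pi * i * n / M)) / M
# direct time-domain computation of the same STFT value
S_time = np.sum(x * np.roll(w, n) * np.exp(-2j * np.pi * np.arange(M) * k / M))
assert np.allclose(S_freq, S_time)
```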
Figure 9.23 Time-frequency analysis with the STFT using frequency-varying windows.
$$\mathbf{STFT} = \begin{bmatrix} \mathbf{W}_{N_0}^{-1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{W}_{N_1}^{-1} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{W}_{N_K}^{-1} \end{bmatrix}\mathbf{X}, \tag{9.32}$$
$$STFT_{N_{(i,l)}}(n_i,k_l) = \sum_{m=-N_{(i,l)}/2}^{N_{(i,l)}/2-1} w_{(i,l)}(m)\,x(n_i+m)\,e^{-j\frac{2\pi}{N_{(i,l)}}mk_l}. \tag{9.33}$$
For a graphical representation of the STFT with varying windows, the cor-
responding STFT value should be assigned to each instant n = 0, 1, ..., M − 1
and each DFT frequency p = − M/2, − M/2 + 1, ..., M/2 − 1 within a block.
In the case of a hybrid time–frequency-varying window the matrix form is
obtained from the definition for each STFT value. For example, for the STFT
calculated as in Fig.9.24, for each STFT value an expression based on (9.33)
should be written. Then the resulting matrix STFT can be formed.
There are several methods in the literature that adapt windows or
basis functions to the signal form for each time instant or even for every
considered time and frequency point in the time-frequency plane. Selection
of the most appropriate form of the basis functions (windows) for each time-
frequency point includes a criterion for selecting the optimal window width
(basis function scale) for each point.
The first form of functions having the basic property of wavelets was used by Haar at the beginning of the twentieth century. At the beginning of the 1980s, Morlet introduced a form of basis functions for the analysis of seismic signals, naming them "wavelets". The theory of wavelets was linked to image processing by Mallat in the following years. In the late 1980s, Daubechies presented a whole new class of wavelets that can be implemented in a simple way, using digital filtering ideas. The most important applications of wavelets are found in image processing and compression, pattern recognition, and signal denoising. Here, we will only link the basics of the wavelet transform to time-frequency analysis.
The common STFT is characterized by a constant window and constant time and frequency resolutions at both low and high frequencies. The basic idea behind the wavelet transform, as originally introduced by Morlet, was to vary the resolution with scale (being related to frequency)
[Figure 9.24: A hybrid time-frequency-varying STFT lattice, combining blocks such as STFT4(2,1), STFT8(12,3), and STFT16(8,0) over the time index 0, 1, ..., 15.]
in such a way that a high frequency resolution is obtained for signal components at low frequencies, whereas a high time resolution is obtained for signal components at high frequencies. This kind of resolution change can be relevant for some practical applications, like, for example, seismic signals. It is achieved by introducing a frequency-variable window width. The window width is decreased as frequency increases.
The basis functions in the STFT are $h(\tau-t) = w(\tau-t)e^{j\Omega_0\tau}$, since
$$STFT_{II}(t,\Omega_0) = \int_{-\infty}^{\infty} x(\tau)w(\tau-t)e^{-j\Omega_0\tau}\,d\tau = \left\langle x(\tau),\, w(\tau-t)e^{j\Omega_0\tau}\right\rangle = \left\langle x(\tau),\, h(\tau-t)\right\rangle = \int_{-\infty}^{\infty} x(\tau)h^*(\tau-t)\,d\tau.$$
When the above idea about the wavelet transform is translated into mathematical form and related to the STFT, one gets the definition of the continuous wavelet transform
$$WT(t,a) = \frac{1}{\sqrt{|a|}}\int_{-\infty}^{\infty} x(\tau)\,h^*\!\left(\frac{\tau-t}{a}\right)d\tau, \tag{9.34}$$
where h(t) is a band-pass signal and the parameter a is the scale. This transform produces a time-scale, rather than a time-frequency, signal representation. For the Morlet wavelet the relation between the scale and the frequency is a = Ω₀/Ω. In order to establish a strong formal relationship between the wavelet transform and the STFT, we will choose the basic Morlet wavelet h(t) in the form
$$h(t) = w(t)e^{j\Omega_0 t}, \tag{9.35}$$
which gives
$$WT(t,a) = \frac{1}{\sqrt{|a|}}\int_{-\infty}^{\infty} x(\tau)\,w^*\!\left(\frac{\tau-t}{a}\right)e^{-j\Omega_0\frac{\tau-t}{a}}\,d\tau. \tag{9.36}$$
From the filter theory point of view, the wavelet transform for a given scale a can be considered as the output of a system with impulse response $h^*(-t/a)/\sqrt{|a|}$, i.e.,
$$WT(t,a) = x(t) *_t \frac{1}{\sqrt{|a|}}h^*(-t/a).$$
Figure 9.25 Expansion functions for the wavelet transform (left) and the short-time Fourier
transform (right). Top row presents high scale (low frequency), middle row is for medium scale
(medium frequency) and bottom row is for low scale (high frequency).
Comparing these two band-pass filters from the bandwidth point of view, we can see that, in the case of the STFT, the filtering is done by a system whose impulse response $w^*(-t)e^{j\Omega t}$ has a constant bandwidth, equal to the width of the Fourier transform of w(t).
Constant Q-Factor Transform: The quality factor Q of a band-pass filter, as a measure of the filter selectivity, is defined as
$$Q = \frac{\text{Central Frequency}}{\text{Bandwidth}}.$$
In the STFT the bandwidth is constant, equal to the window Fourier transform width $B_w$. Thus, the factor Q is proportional to the considered frequency,
$$Q = \frac{\Omega}{B_w}.$$
Figure 9.26 Illustration of the wavelet transform (a) of a sum of two delta pulses and two sinusoids, compared with the STFT (b).
The scalogram obviously loses the linearity property, and fits into the cate-
gory of quadratic transforms.
This analysis starts by splitting the signal's spectral content into its high-frequency and low-frequency parts. Within the STFT framework, this can be achieved by a two-sample rectangular window $w(n) = \delta(n) + \delta(n+1)$,
$$STFT(n,0) = \frac{1}{\sqrt{2}}\sum_{m=0}^{1} x(n+m)e^{-j0} = \frac{1}{\sqrt{2}}\left(x(n)+x(n+1)\right) = x_L(n), \tag{9.40}$$
$$x_H(n) = \frac{1}{\sqrt{2}}\left(x(n)-x(n+1)\right). \tag{9.41}$$
The highpass part of the next signal pair, $x_H(n+2) = \left(x(n+2)-x(n+3)\right)/\sqrt{2}$, is calculated and kept as it is. The lowpass part $x_L(n+2) = \left(x(n+2)+x(n+3)\right)/\sqrt{2}$ is considered as a new signal, along with its corresponding previous sample $x_L(n)$.
The spectral content of the lowpass part of the signal is divided, in the same way, into its low and high frequency parts,
$$x_{LL}(n) = \frac{1}{\sqrt{2}}\left(x_L(n)+x_L(n+2)\right) = \frac{1}{2}\left[x(n)+x(n+1)+x(n+2)+x(n+3)\right]$$
$$x_{LH}(n) = \frac{1}{\sqrt{2}}\left(x_L(n)-x_L(n+2)\right) = \frac{1}{2}\left[x(n)+x(n+1)-\left[x(n+2)+x(n+3)\right]\right].$$
The highpass part x_LH(n) is left with resolution four in time, while the lowpass part is further processed in the same way, by dividing the spectral content of x_LL(n) and x_LL(n+4) into its low and high frequency parts. This process is continued until the full length of the signal is reached. The Haar wavelet transformation matrix in the case of a signal with 8 samples is
$$\begin{bmatrix} \sqrt{2}W_1(0,H) \\ \sqrt{2}W_1(2,H) \\ \sqrt{2}W_1(4,H) \\ \sqrt{2}W_1(6,H) \\ 2W_2(0,H) \\ 2W_2(4,H) \\ 2\sqrt{2}W_4(0,H) \\ 2\sqrt{2}W_4(0,L) \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \\ 1 & 1 & -1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & -1 & -1 \\ 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \\ x(4) \\ x(5) \\ x(6) \\ x(7) \end{bmatrix}. \tag{9.42}$$
This kind of signal transformation was introduced by Haar more than a century ago. In this notation, the scale a = 1 values of the wavelet coefficients W₁(2n,H) are equal to the highpass part of the signal calculated using two samples, W₁(2n,H) = x_H(2n). The scale a = 2 wavelet coefficients are W₂(4n,H) = x_LH(4n). In scale a = 4 there is only one highpass and one lowpass coefficient, at n = 0: W₄(8n,H) = x_LLH(8n) and W₄(8n,L) = x_LLL(8n). In this way a signal of any length N = 2^m can be decomposed into Haar wavelet coefficients.
The Haar wavelet transform has the property that its highpass coefficients are equal to zero if the analyzed signal is constant within the analyzed time interval, for the considered scale. If the signal has a large number of constant-valued samples within the analyzed time intervals, then many Haar wavelet transform coefficients are zero-valued. They can be omitted in signal storage or transmission; in recovery, their values are assumed to be zero and the original signal is obtained. The same can be done in the case of noisy signals, when all coefficients below an assumed level of noise can be set to zero and the signal-to-noise ratio in the reconstructed signal improved.
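A sketch of the 8-sample Haar transformation (9.42) applied to a piecewise-constant signal (the test signal is an assumption); since the rows of the matrix are orthogonal, the inverse follows by scaling with the squared row norms:

```python
import numpy as np

# Haar wavelet transformation matrix for 8 samples, as in (9.42)
T = np.array([
    [1, -1,  0,  0,  0,  0,  0,  0],
    [0,  0,  1, -1,  0,  0,  0,  0],
    [0,  0,  0,  0,  1, -1,  0,  0],
    [0,  0,  0,  0,  0,  0,  1, -1],
    [1,  1, -1, -1,  0,  0,  0,  0],
    [0,  0,  0,  0,  1,  1, -1, -1],
    [1,  1,  1,  1, -1, -1, -1, -1],
    [1,  1,  1,  1,  1,  1,  1,  1],
], dtype=float)

x = np.array([2.0, 2, 2, 2, 1, 1, 1, 1])   # piecewise-constant signal
c = T @ x
print(c)    # all highpass coefficients vanish except at the coarsest scale

# rows of T are orthogonal, so x is recovered by scaling with squared row norms
x_rec = T.T @ (c / np.sum(T * T, axis=1))
assert np.allclose(x_rec, x)
```

Only two of the eight coefficients are nonzero for this signal, which is the compression property discussed above.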
Although the presented Haar wavelet analysis is quite simple, we will use it as an example to introduce the filter bank framework of the wavelet transform. Obvious results from the Haar wavelet will be used to introduce other wavelet forms. For the Haar wavelet calculation, two signals x_L(n) and x_H(n) are formed according to (9.40) and (9.41), based on the input signal x(n). The transfer functions of the discrete-time systems producing these two signals are
$$H_L(z) = \frac{1}{\sqrt{2}}\left(1+z\right) \tag{9.43}$$
$$H_H(z) = \frac{1}{\sqrt{2}}\left(1-z\right).$$
Their frequency responses are
$$H_L(e^{j\omega}) = \frac{1}{\sqrt{2}}\left(1+e^{j\omega}\right)$$
$$H_H(e^{j\omega}) = \frac{1}{\sqrt{2}}\left(1-e^{j\omega}\right)$$
with amplitude characteristics $\left|H_L(e^{j\omega})\right| = \sqrt{2}\left|\cos(\omega/2)\right|$ and $\left|H_H(e^{j\omega})\right| = \sqrt{2}\left|\sin(\omega/2)\right|$, presented in Fig. 9.27. As expected, they represent quite rough forms of lowpass and highpass filters. In general, this principle is kept for all wavelet transforms. The basic goal for all of them is to split the frequency content of a signal into its lowpass part and highpass part, providing, in addition, a possibility of simple and efficient signal reconstruction.
After the values representing the lowpass and highpass parts of the signal are obtained, the next values of the signals $x_L(n) = \left[x(n)+x(n+1)\right]/\sqrt{2}$ and $x_H(n) = \left[x(n)-x(n+1)\right]/\sqrt{2}$ are calculated after one time instant is skipped. Therefore, the output signal is downsampled by a factor of two.
[Plot of |H_L(e^{jω})| = |DFT{φ₁(n)}| and |H_H(e^{jω})| = |DFT{ψ₁(n)}|, with |H_L(e^{jω})|² + |H_H(e^{jω})|² = 2.]
Figure 9.27 Amplitude of the Fourier transform of the basic Haar wavelet and scale function, divided by √2.
The downsampled signals are
$$s_L(n) = x_L(2n),\qquad s_H(n) = x_H(2n). \tag{9.44}$$
In general, for a downsampled signal y(n) = x(2n),
$$Y(z) = \frac{1}{2}X(z^{1/2}) + \frac{1}{2}X(-z^{1/2}).$$
This relation can easily be verified using the z-transform definition
$$X(z) = \sum_{n=-\infty}^{\infty} x(n)z^{-n},$$
$$X(z^{1/2}) + X(-z^{1/2}) = \sum_{n=-\infty}^{\infty} x(n)\left[(z^{-1/2})^n + (-z^{-1/2})^n\right] = \sum_{n=-\infty}^{\infty} 2x(2n)z^{-n},$$
$$\mathcal{Z}\{x(2n)\} = Y(z) = \frac{1}{2}X(z^{1/2}) + \frac{1}{2}X(-z^{1/2}). \tag{9.45}$$
For the signals s L (n) = x L (2n) and s H (n) = x H (2n) the system implementa-
tion is presented in Fig.9.28.
Figure 9.28 Signal filtering by a lowpass and a highpass filter followed by downsampling by 2.
For the signals s_L(n) and s_H(n), obtained when x(n) is passed through the lowpass and highpass filters H_L(z) and H_H(z) and then downsampled, the relations
$$S_L(z) = \frac{1}{2}H_L(z^{1/2})X(z^{1/2}) + \frac{1}{2}H_L(-z^{1/2})X(-z^{1/2})$$
$$S_H(z) = \frac{1}{2}H_H(z^{1/2})X(z^{1/2}) + \frac{1}{2}H_H(-z^{1/2})X(-z^{1/2})$$
hold.
9.3.1.2 Upsampling
Let us assume that we are not going to transform the signals s L (n) and
s H (n) any more. The only goal is to reconstruct the signal x (n) based on its
downsampled lowpass and highpass part signals s L (n) and s H (n). The first
step in the signal reconstruction is to restore the original sampling interval
of the discrete-time signal. It is done by upsampling the signals s L (n) and
s H ( n ).
Upsampling of a signal x (n) is described by
Y ( z ) = X ( z2 ),
since
$$X(z^2) = \sum_{n=-\infty}^{\infty} x(n)z^{-2n} = \ldots + x(-1)z^2 + 0\cdot z^1 + x(0) + 0\cdot z^{-1} + x(1)z^{-2} + \ldots \tag{9.46}$$
Upsampling of a signal x(n) is defined by
$$y(n) = \begin{cases} x(n/2) & \text{for even } n \\ 0 & \text{for odd } n \end{cases} = \mathcal{Z}^{-1}\{X(z^2)\}.$$
If the downsampled signal is then upsampled, combining (9.45) with the substitution z → z² gives
$$Y(z) = \frac{1}{2}X\left((z^2)^{1/2}\right) + \frac{1}{2}X\left(-(z^2)^{1/2}\right),$$
$$Y(z) = \frac{1}{2}X(z) + \frac{1}{2}X(-z). \tag{9.47}$$
In the Fourier domain this means
$$Y(e^{j\omega}) = \frac{1}{2}\left(X(e^{j\omega}) + X(e^{j(\omega+\pi)})\right).$$
This form indicates that an aliasing component $X(e^{j(\omega+\pi)})$ appeared in this process.
Figure 9.29 One stage of the filter bank with reconstruction, corresponding to the one stage
of the wavelet transform realization.
$$Y_L(z) = S_L(z^2)G_L(z) = \frac{1}{2}\left[H_L(z)X(z) + H_L(-z)X(-z)\right]G_L(z)$$
$$Y_H(z) = S_H(z^2)G_H(z) = \frac{1}{2}\left[H_H(z)X(z) + H_H(-z)X(-z)\right]G_H(z)$$
$$Y(z) = Y_L(z) + Y_H(z) = \frac{1}{2}\left[H_L(z)G_L(z) + H_H(z)G_H(z)\right]X(z) + \frac{1}{2}\left[H_L(-z)G_L(z) + H_H(-z)G_H(z)\right]X(-z).$$
For the output to satisfy Y(z) = X(z), the aliasing term must vanish. It means that
$$G_H(z) = -\frac{H_L(-z)G_L(z)}{H_H(-z)}$$
$$H_H(z) = -\frac{H_L(z)G_L(-z)}{G_H(-z)}$$
or
HL (z ) GL (z)
[ HH (−z) GH (−z) + HL (−z) GL (−z)] = 2.
HH (−z) GH (−z)
Since the expression within the brackets is equal to 2 (reconstruction condi-
tion (9.48) with z being replaced by −z) then
HL (z ) GL (z)
=1 (9.52)
HH (−z) GH (−z)
HL (e jω ) GL (e jω ) + HH (e jω ) GH (e jω ) = 2 (9.53)
HL (−e jω ) GL (e jω ) + HH (−e jω ) GH (e jω ) = 0.
If the impulse response h_L(n) is orthogonal, as in (9.54), then the last relation is satisfied for

g_L(n) = h_L(−n),

or P(z) + P(−z) = 2 with P(z) = G_L(z) G_L(z^{−1}). Relation (9.48) may also be written for H_L(z) as well. For

h_L(n) = Σ_{k=0}^{K−1} h_k δ(n + k)

we have

g_L(n) = h_L(−n) = Σ_{k=0}^{K−1} h_k δ(n − k)

G_L(e^{jω}) = H_L(e^{−jω}).
g_H(n) = (−1)^n g_L(K − n)

G_H(e^{jω}) = Σ_{n=0}^{K} g_H(n) e^{−jωn} = Σ_{n=0}^{K} (−1)^n g_L(K − n) e^{−jωn}
            = Σ_{m=0}^{K} (−1)^{K−m} g_L(m) e^{−jω(K−m)} = (−1)^K e^{−jωK} Σ_{m=0}^{K} e^{jπm} g_L(m) e^{−j(−ω)m}
            = −e^{−jωK} G_L(e^{−j(ω−π)}) = −e^{−jωK} G_L(−e^{−jω})

or

G_H(e^{jω}) = −e^{−jωK} G_L(−e^{−jω}) = −e^{−jωK} H_L(−e^{jω})

for G_L(e^{jω}) = H_L(e^{−jω}). A similar relation holds for the anticausal impulse response h_H(n):

H_L(e^{jω}) = G_L(e^{−jω})
G_H(e^{jω}) = −e^{−jωK} G_L(−e^{−jω})
H_H(e^{jω}) = −e^{jωK} G_L(−e^{jω}).   (9.56)
The orthogonality condition

Σ_m h_L(m) h_H(m − 2n) = 0

is also satisfied with these forms of the transfer functions for any n, since

H_L(−e^{jω}) G_L(e^{jω}) + H_H(−e^{jω}) G_H(e^{jω}) = 0.

The condition that the reconstruction filter G_L(z) has zero value at z = e^{jπ} = −1 means that its form is G_L(z) = a(1 + z^{−1}). Without additional requirements, this form would produce a = 1/√2 from the reconstruction relation G_L(z) G_L(z^{−1}) + G_L(−z) G_L(−z^{−1}) = 2. The time domain filter form is

g_L(n) = (1/√2)[δ(n) + δ(n − 1)].

It corresponds to the Haar wavelet. All other filter functions can be defined using g_L(n) or G_L(e^{jω}).
The same result would be obtained starting from the filter transfer functions for the Haar wavelet, already introduced as

H_L(z) = (1/√2)(1 + z)
H_H(z) = (1/√2)(1 − z).

The reconstruction filters are obtained from (9.48)-(9.49),

(1/√2)(1 + z) G_L(z) + (1/√2)(1 − z) G_H(z) = 2
(1/√2)(1 − z) G_L(z) + (1/√2)(1 + z) G_H(z) = 0,

as

G_L(z) = (1/√2)(1 + z^{−1})   (9.58)
G_H(z) = (1/√2)(1 − z^{−1})

with

g_L(n) = (1/√2) δ(n) + (1/√2) δ(n − 1)   (9.59)
g_H(n) = (1/√2) δ(n) − (1/√2) δ(n − 1).
The values of the impulse responses in the Haar wavelet transform (relations (9.43) and (9.59)) are:

 n    √2·h_L(n)   √2·h_H(n)        n    √2·g_L(n)   √2·g_H(n)
 0        1           1            0        1           1
−1        1          −1            1        1          −1
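A minimal numerical sketch of the Haar analysis and the two-step reconstruction (an assumed illustration of relations (9.43) and (9.59), not the book's code): the analysis step computes s_L and s_H, and the synthesis step recovers x(n) exactly.

```python
import numpy as np

c = 1/np.sqrt(2)
x = np.array([1.0, 3.0, -2.0, 5.0, 0.0, 4.0, 1.0, -1.0])

# analysis: s_L(n) = [x(2n) + x(2n+1)]/sqrt(2), s_H(n) = [x(2n) - x(2n+1)]/sqrt(2)
sL = c*(x[0::2] + x[1::2])
sH = c*(x[0::2] - x[1::2])

# synthesis: upsample and filter with g_L(n) = c[d(n)+d(n-1)], g_H(n) = c[d(n)-d(n-1)]
y = np.zeros_like(x)
y[0::2] = c*(sL + sH)
y[1::2] = c*(sL - sH)

print(np.allclose(y, x))   # perfect reconstruction
```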
A detailed time domain filter bank implementation of the reconstruction process in the Haar wavelet case is described next. The reconstruction is implemented in two steps:
1) The signals s_L(n) and s_H(n) from (9.44) are upsampled, according to (9.46).
2) These signals are then passed through the reconstruction filters, and the outputs of these filters are summed.
into its lowpass and highpass parts W_2(n, L) = [s_2(n) + s_2(n + 1)]/√2 and W_2(n, H) = [s_2(n) − s_2(n + 1)]/√2, respectively, Fig. 9.31(b). The same calculation is performed in the third and fourth stage, Fig. 9.31(c)-(d).
The Haar wavelet has an impulse response of duration two. In one stage, it corresponds to a two-sample STFT calculated using a rectangular window. Its Fourier transform, presented in Fig. 9.27, is quite a rough approximation of a lowpass and highpass filter. In order to improve the filter performance, the number of filter coefficients should be increased. A fourth order FIR system will be considered. The impulse response of the anticausal fourth order FIR filter is h_L(n) = [h_L(0), h_L(−1), h_L(−2), h_L(−3)] = [h_0, h_1, h_2, h_3].
Figure 9.31 Wavelet transform of a signal with M = 16 samples at the output of stages 1, 2, 3, and 4, respectively. Notation W_a(n, H) is used for the highpass coefficient value after stage (scale) a at an instant n, and notation W_a(n, L) for the corresponding lowpass coefficient value.
The reconstruction conditions

H_L(z) G_L(z) + H_H(z) G_H(z) = 2
H_L(−z) G_L(z) + H_H(−z) G_H(z) = 0

are satisfied if

h_0^2 + h_1^2 + h_2^2 + h_3^2 = 1,

since

H_L(z) G_L(z) + H_H(z) G_H(z)
= (h_0 + h_1 z + h_2 z^2 + h_3 z^3)(h_0 + h_1 z^{−1} + h_2 z^{−2} + h_3 z^{−3})
+ (−h_0 z^3 + h_1 z^2 − h_2 z + h_3)(−h_0 z^{−3} + h_1 z^{−2} − h_2 z^{−1} + h_3)
= 2(h_0^2 + h_1^2 + h_2^2 + h_3^2) = 2.

Its solution produces the fourth order Daubechies wavelet coefficients (D4). Note that this is just one of the possible symmetric solutions of the previous system of equations, Fig. 9.32.
The reconstruction conditions for the fourth order FIR filter, with

H_L(e^{jω}) = h_0 + h_1 e^{jω} + h_2 e^{j2ω} + h_3 e^{j3ω},

are illustrated in Fig. 9.33. We can see that it is a much better approximation of the lowpass and highpass filters than in the Haar wavelet case, Fig. 9.27.
Another way to derive the Daubechies (D4) wavelet coefficients is to use relation (9.55),

P(z) + P(−z) = 2

with

P(z) = G_L(z) H_L(z) = G_L(z) G_L(z^{−1}).

The condition imposed on the transfer function G_L(z) in the D4 wavelet is that its value and the value of its first derivative at z = −1 are zero-valued (smooth
[Figure 9.32: impulse responses g_L(n), g_H(n), h_L(n), and h_H(n) of the D4 filters.]

[Plot: |H_L(e^{jω})| = |DFT{φ_1(n)}| and |H_H(e^{jω})| = |DFT{ψ_1(n)}|, with |H_L(e^{jω})|^2 + |H_H(e^{jω})|^2 = 2.]
Figure 9.33 Amplitude of the Fourier transform of basic Daubechies D4 wavelet and scale
function.
Using

P(z) + P(−z) = 2,

only the terms with even exponents of z will remain in P(z) + P(−z), producing

R(z) = (1/(4√2))^2 ((1 + √3) + (1 − √3) z^{−1}) ((1 + √3) + (1 − √3) z).
All other impulse responses follow from this one (as in the presented table).

Example 9.18. Consider a signal that is a linear function of time,

x(n) = an + b.

Show that the condition

−h_L(−1) + 2h_L(−2) − 3h_L(−3) = 0,

following from

dH_L(e^{jω})/dω |_{ω=π} = 0,
where a_1 = √2 a and b_1 = √2 b + 0.8966a.
The orthogonality of the matrix rows

[h_0  h_1  h_2  h_3  0  0  0  0]
[0  0  h_0  h_1  h_2  h_3  0  0]

is given by

h_2 h_0 + h_3 h_1 = 0.
⋆ The sum of the squares of the first two equations and

h_0 h_2 + h_1 h_3 = 0

follow from each other if h_0 + h_1 + h_2 + h_3 = √2 and −h_0 + h_1 − h_2 + h_3 = 0 are assumed.
The matrix for the D4 wavelet transform calculation in the first stage is of the form

⎡ W_1(0, L) ⎤   ⎡ h_0  h_1  h_2  h_3   0    0    0    0  ⎤ ⎡ x(0) ⎤
⎢ W_1(0, H) ⎥   ⎢ h_3 −h_2  h_1 −h_0   0    0    0    0  ⎥ ⎢ x(1) ⎥
⎢ W_1(2, L) ⎥   ⎢  0    0   h_0  h_1  h_2  h_3   0    0  ⎥ ⎢ x(2) ⎥
⎢ W_1(2, H) ⎥ = ⎢  0    0   h_3 −h_2  h_1 −h_0   0    0  ⎥ ⎢ x(3) ⎥   (9.61)
⎢ W_1(4, L) ⎥   ⎢  0    0    0    0   h_0  h_1  h_2  h_3 ⎥ ⎢ x(4) ⎥
⎢ W_1(4, H) ⎥   ⎢  0    0    0    0   h_3 −h_2  h_1 −h_0 ⎥ ⎢ x(5) ⎥
⎢ W_1(6, L) ⎥   ⎢ h_2  h_3   0    0    0    0   h_0  h_1 ⎥ ⎢ x(6) ⎥
⎣ W_1(6, H) ⎦   ⎣ h_1 −h_0   0    0    0    0   h_3 −h_2 ⎦ ⎣ x(7) ⎦
In the first row of the transformation matrix the coefficients correspond to h_L(n), while the second row corresponds to h_H(n). The first row produces the D4 scaling function, while the second row produces the D4 wavelet function. The coefficients are shifted by 2 in the subsequent rows. As described in the Hann(ing) window reconstruction case, the calculation should be performed in a circular manner, assuming signal periodicity. That is why the coefficients are circularly shifted in the last two rows.
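The circular structure of the transformation matrix in (9.61) can be verified numerically: with the D4 coefficients the matrix is orthogonal, so its transpose is its inverse (a sketch, not the book's code).

```python
import numpy as np

s3 = np.sqrt(3.0)
h0, h1, h2, h3 = np.array([1+s3, 3+s3, 3-s3, 1-s3]) / (4*np.sqrt(2))

# build the 8x8 D4 analysis matrix of (9.61) by circularly shifting the two basic rows
W = np.zeros((8, 8))
low  = np.array([h0, h1, h2, h3, 0, 0, 0, 0])
high = np.array([h3, -h2, h1, -h0, 0, 0, 0, 0])
for b in range(4):
    W[2*b]     = np.roll(low, 2*b)
    W[2*b + 1] = np.roll(high, 2*b)

print(np.allclose(W @ W.T, np.eye(8)))   # rows are orthonormal
```

Orthogonality is why the reconstruction matrices below are simply transposes of the analysis matrices.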
Example 9.21. For the signal x(n) = δ(n − 7), defined within 0 ≤ n ≤ 15, calculate the wavelet transform coefficients using the D4 wavelet/scale function. Repeat the same calculation for the signal x(n) = 2 cos(16πn/N) + 1, with 0 ≤ n ≤ N − 1 and N = 16.
with

[h_3, h_2, h_1, h_0] = [(1 − √3)/(4√2), (3 − √3)/(4√2), (3 + √3)/(4√2), (1 + √3)/(4√2)].

Specifically, W_1(0, H) = 0, W_1(2, H) = 0, W_1(4, H) = −0.4830, W_1(6, H) = −0.2241, W_1(8, H) = 0, W_1(10, H) = 0, W_1(12, H) = 0, and W_1(14, H) = 0.
The lowpass part of the first stage values
Figure 9.34 Daubechies D4 wavelet transform (absolute value) of the signal x (n) = δ(n − 7)
using N = 16 signal samples, 0 ≤ n ≤ N − 1 (left). The Daubechies D4 wavelet transform
(absolute value) of the signal x (n) = 2 cos(2π8n/N ) + 1, 0 ≤ n ≤ N − 1, with N = 16 (right).
The inverse matrix for the D4 wavelet transform for a signal with N = 8 samples would be calculated from the lowest level, in this case a = 2, with coefficients W_2(0, L), W_2(0, H), W_2(4, L), and W_2(4, H). The lowpass part of the signal at level a = 1 would be reconstructed using

⎡ W_1(0, L) ⎤   ⎡ h_0  h_3  h_2  h_1 ⎤ ⎡ W_2(0, L) ⎤
⎢ W_1(2, L) ⎥ = ⎢ h_1 −h_2  h_3 −h_0 ⎥ ⎢ W_2(0, H) ⎥
⎢ W_1(4, L) ⎥   ⎢ h_2  h_1  h_0  h_3 ⎥ ⎢ W_2(4, L) ⎥
⎣ W_1(6, L) ⎦   ⎣ h_3 −h_0  h_1 −h_2 ⎦ ⎣ W_2(4, H) ⎦.
After the lowpass part W_1(0, L), W_1(2, L), W_1(4, L), and W_1(6, L) is reconstructed, it is used with the wavelet coefficients from this stage, W_1(0, H), W_1(2, H), W_1(4, H), and W_1(6, H), to reconstruct the signal as
⎡ x(0) ⎤   ⎡ h_0  h_3   0    0    0    0   h_2  h_1 ⎤ ⎡ W_1(0, L) ⎤
⎢ x(1) ⎥   ⎢ h_1 −h_2   0    0    0    0   h_3 −h_0 ⎥ ⎢ W_1(0, H) ⎥
⎢ x(2) ⎥   ⎢ h_2  h_1  h_0  h_3   0    0    0    0  ⎥ ⎢ W_1(2, L) ⎥
⎢ x(3) ⎥ = ⎢ h_3 −h_0  h_1 −h_2   0    0    0    0  ⎥ ⎢ W_1(2, H) ⎥   (9.62)
⎢ x(4) ⎥   ⎢  0    0   h_2  h_1  h_0  h_3   0    0  ⎥ ⎢ W_1(4, L) ⎥
⎢ x(5) ⎥   ⎢  0    0   h_3 −h_0  h_1 −h_2   0    0  ⎥ ⎢ W_1(4, H) ⎥
⎢ x(6) ⎥   ⎢  0    0    0    0   h_2  h_1  h_0  h_3 ⎥ ⎢ W_1(6, L) ⎥
⎣ x(7) ⎦   ⎣  0    0    0    0   h_3 −h_0  h_1 −h_2 ⎦ ⎣ W_1(6, H) ⎦
This procedure can be continued for a signal of length N = 16 with one more stage. An additional stage would be added for N = 32, and so on.
Example 9.22. For the wavelet transform from the previous example, find its inverse (reconstruct the signal).

⋆ The inversion is done backwards. From W_3(0, H), W_3(0, L), W_3(8, H), and W_3(8, L) we get the signal s_3(n), or W_2(2n, L), as

⎡ W_2(0, L)  ⎤   ⎡ h_0  h_3  h_2  h_1 ⎤ ⎡ W_3(0, L) ⎤
⎢ W_2(4, L)  ⎥ = ⎢ h_1 −h_2  h_3 −h_0 ⎥ ⎢ W_3(0, H) ⎥
⎢ W_2(8, L)  ⎥   ⎢ h_2  h_1  h_0  h_3 ⎥ ⎢ W_3(8, L) ⎥
⎣ W_2(12, L) ⎦   ⎣ h_3 −h_0  h_1 −h_2 ⎦ ⎣ W_3(8, H) ⎦

  ⎡ h_0  h_3  h_2  h_1 ⎤ ⎡  0.4668 ⎤   ⎡ −0.1373 ⎤
= ⎢ h_1 −h_2  h_3 −h_0 ⎥ ⎢ −0.1251 ⎥ = ⎢  0.6373 ⎥.
  ⎢ h_2  h_1  h_0  h_3 ⎥ ⎢ −0.1132 ⎥   ⎢  0      ⎥
  ⎣ h_3 −h_0  h_1 −h_2 ⎦ ⎣ −0.4226 ⎦   ⎣  0      ⎦
Then W_2(4n, L) = s_3(n) are used with the wavelet coefficients W_2(4n, H) to reconstruct W_1(2n, L), or s_2(n), using

⎡ W_1(0, L)  ⎤   ⎡ h_0  h_3   0    0    0    0   h_2  h_1 ⎤ ⎡ W_2(0, L)  ⎤
⎢ W_1(2, L)  ⎥   ⎢ h_1 −h_2   0    0    0    0   h_3 −h_0 ⎥ ⎢ W_2(0, H)  ⎥
⎢ W_1(4, L)  ⎥   ⎢ h_2  h_1  h_0  h_3   0    0    0    0  ⎥ ⎢ W_2(4, L)  ⎥
⎢ W_1(6, L)  ⎥ = ⎢ h_3 −h_0  h_1 −h_2   0    0    0    0  ⎥ ⎢ W_2(4, H)  ⎥.
⎢ W_1(8, L)  ⎥   ⎢  0    0   h_2  h_1  h_0  h_3   0    0  ⎥ ⎢ W_2(8, L)  ⎥
⎢ W_1(10, L) ⎥   ⎢  0    0   h_3 −h_0  h_1 −h_2   0    0  ⎥ ⎢ W_2(8, H)  ⎥
⎢ W_1(12, L) ⎥   ⎢  0    0    0    0   h_2  h_1  h_0  h_3 ⎥ ⎢ W_2(12, L) ⎥
⎣ W_1(14, L) ⎦   ⎣  0    0    0    0   h_3 −h_0  h_1 −h_2 ⎦ ⎣ W_2(12, H) ⎦
The obtained values W_1(n, L), together with the wavelet coefficients W_1(n, H), are used to reconstruct the original signal x(n). The transformation matrix in this case is of order 16 × 16 and is formed using the same structure as the previous transformation matrix.
Although the wavelet realization can be performed using the same basic functions presented in the previous section, here we will consider the equivalent wavelet function h_H(n) and the equivalent scale function h_L(n) at different scales. To this aim, we will analyze the reconstruction part of the system. Assume that in the wavelet analysis of a signal only one coefficient is nonzero. Also assume that this nonzero coefficient is at the output of the all-lowpass branch of the filter structure. It means that the signal is equal to the basic scale function in
[Figure 9.35: the reconstruction filter bank with a delta pulse δ(n) at the input of the lowpass branch, producing the scale functions φ_0(n) = h_L(n), φ_1(n), and φ_2(n).]
the wavelet analysis. The scale function can be found in an inverse way, by reconstructing the signal corresponding to this delta-pulse-like transform. The system of reconstruction filters is shown in Fig. 9.35. Note that in the Haar transform this case would correspond to W_4(0, L) = 1 in (9.42), or in Fig. 9.30. The reconstruction process consists of upsampling the signal and passing it through the reconstruction stages. For example, the output of the third reconstruction stage has the z-transform

Φ_2(z) = G_L(z) G_L(z^2) G_L(z^4).
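The cascade Φ_2(z) = G_L(z)G_L(z^2)G_L(z^4) can be evaluated in the time domain by alternately upsampling and convolving; the Haar reconstruction filter is used below as an assumed example, for which the resulting scale function is constant over 8 samples (a sketch, not the book's code).

```python
import numpy as np

gL = np.array([1.0, 1.0]) / np.sqrt(2)   # Haar reconstruction lowpass (assumed example)

def upsample2(v):
    out = np.zeros(2*len(v))
    out[::2] = v
    return out

# delta pulse through three reconstruction stages: Phi(z) = G_L(z) G_L(z^2) G_L(z^4)
phi = np.array([1.0])
for _ in range(3):
    phi = np.convolve(upsample2(phi), gL)

print(np.allclose(phi[:8], 1/(2*np.sqrt(2))))   # constant Haar scale function
```

Replacing gL by the D4 reconstruction filter would produce the D4 scale function approximations of Fig. 9.37.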
[Figure: the reconstruction filter bank with a delta pulse δ(n) at the input of the highpass branch, producing the wavelet functions ψ_0(n) = h_H(n), ψ_1(n), and ψ_2(n).]
Ψ_2(z) = G_H(z) G_L(z^2) G_L(z^4).

In the Haar transform (9.42) and Fig. 9.30 this case would correspond to W_4(0, H) = 1.
since

⟨ψ_0(n − 2m), ψ_1(n)⟩ = Σ_p g_H(p) ( Σ_n g_H(n − 2m) g_L(n − 2p) ) = 0.
Figure 9.37 The Daubechies D4 wavelet scale function and wavelet calculated using the filter bank relation at different scales: a = 0 (first row), a = 1 (second row), a = 2 (third row), a = 3 (fourth row), a = 10 (fifth row, an approximation of the continuous domain). The amplitudes are scaled by 2^{(a+1)/2} to keep them within the same range. Values ψ_a(n) 2^{(a+1)/2} and φ_a(n) 2^{(a+1)/2} are presented.
Figure 9.38 The Haar wavelet scale function and wavelet calculated using the filter bank relation at different scales. Values are normalized by 2^{(a+1)/2}.
In addition to the conditions H_L(e^{j0}) = √2 and H_L(e^{jπ}) = 0, written as

h_0 + h_1 + h_2 + h_3 + h_4 + h_5 = √2
h_0 − h_1 + h_2 − h_3 + h_4 − h_5 = 0,

the orthogonality conditions

h_0 h_2 + h_1 h_3 + h_2 h_4 + h_3 h_5 = 0
h_0 h_4 + h_1 h_5 = 0

are added. Since the filter order is 6, two orthogonality conditions must be used: one for shift 2 and the other for shift 4. The linear signal cancellation condition is again used, as

−h_1 + 2^2 h_2 − 3^2 h_3 + 4^2 h_4 − 5^2 h_5 = 0.
Another way to form the filter coefficients for a six-sample wavelet is to introduce the condition that the first moment of the scale function is zero, instead of the second order moment of the wavelet function. In this case the symmetric form of the coefficients should be used in the definition:

h_L(−2) + h_L(−1) + h_L(0) + h_L(1) + h_L(2) + h_L(3) = √2
h_L^2(−2) + h_L^2(−1) + h_L^2(0) + h_L^2(1) + h_L^2(2) + h_L^2(3) = 1
−2h_L(−2) + h_L(−1) − h_L(1) + 2h_L(2) − 3h_L(3) = 0
h_L(−2)h_L(0) + h_L(−1)h_L(1) + h_L(0)h_L(2) + h_L(1)h_L(3) = 0
h_L(−2)h_L(2) + h_L(−1)h_L(3) = 0.
It would be easier to relate the wavelet transform to the linear (D4) and higher order interpolations of functions (signals), within intervals of various lengths (corresponding to various wavelet transform scales), than to spectral analysis, where the harmonic basis functions play the central role.
would be used to calculate W(4n, 0), W(4n, 1), W(4n, 2), and W(4n, 3), Fig. 9.40. The asymmetry of the frequency regions is visible.

Note that the STFT analysis of this case, with a Hann(ing) window of N = 8 and a calculation step R = 4, will result in the same number of instants; however, the frequency range will be divided into 8 regions, having a finer grid. This grid is redundant with respect to the signal and to the wavelet transform. Both the signal and the wavelet transform have 16 values (coefficients).
Figure 9.39 Full coverage of the time-frequency plane using the filter bank calculation and systems with impulse responses corresponding to the wavelet transform.

9.3.2 S-Transform
The continuous S-transform is defined by

S_c(t, Ω) = ∫_{−∞}^{+∞} x(τ) (|Ω| / (2π)^{3/2}) e^{−(τ−t)^2 Ω^2 / (8π^2)} e^{−jΩτ} dτ.   (9.65)

With the frequency-dependent window

w(τ, Ω) = (|Ω| / (2π)^{3/2}) e^{−τ^2 Ω^2 / (8π^2)},   (9.67)

the definition of the continuous S-transform can be rewritten as

S_c(t, Ω) = e^{−jΩt} ∫_{−∞}^{+∞} x(t + τ) w(τ, Ω) e^{−jΩτ} dτ.   (9.68)
Figure 9.40 Daubechies functions: scaling function (first row); mother wavelet function (second row); function producing the low-frequency part, in the second stage, of the high-frequency part from the first stage (third row); function producing the high-frequency part, in the second stage, of the high-frequency part from the first stage (fourth row). Time domain forms of the functions are shown on the left, while their spectral content is shown on the right.
After the presentation of the wavelet transform, we will shift our attention back to the frequency of the signal, rather than to its amplitude values. There are signals whose instantaneous frequency variations are known up to an unknown set of parameters. For example, many signals can be expressed as polynomial-phase signals

x(t) = A e^{j(Ω_0 t + a_1 t^2 + a_2 t^3 + ··· + a_N t^{N+1})}.
Show that its LPFT can be completely concentrated along the instantaneous frequency.

⋆ Its LPFT has the form

LPFT_{Ω_1}(t, Ω) = ∫_{−∞}^{∞} x(t + τ) w(τ) e^{−j(Ωτ + Ω_1 τ^2)} dτ
                = e^{j(Ω_0 t + a_1 t^2)} ∫_{−∞}^{∞} w(τ) e^{−j(Ω − Ω_0 − 2a_1 t)τ} e^{−j(Ω_1 − a_1)τ^2} dτ.   (9.71)

For Ω_1 = a_1, the second-order phase term does not introduce any distortion to the local polynomial spectrogram,

|LPFT_{Ω_1 = a_1}(t, Ω)|^2 = |W(Ω − Ω_0 − 2a_1 t)|^2,
Example 9.26. Consider the first-order LPFT of a signal x(t). Show that the second-order moments of the LPFT can be calculated based on the windowed signal moment, the windowed signal's Fourier transform moment, and one more LPFT moment, for any Ω_1 in (9.70), for example for Ω_1 = 1.

⋆ The second-order moment of the first-order LPFT,

LPFT_{Ω_1}(t, Ω) = ∫_{−∞}^{∞} x_t(τ) e^{−j(Ωτ + Ω_1 τ^2)} dτ,

defined by

M_{Ω_1} = (1/2π) ∫_{−∞}^{∞} Ω^2 |LPFT_{Ω_1}(t, Ω)|^2 dΩ   (9.72)

is equal to

M_{Ω_1} = ∫_{−∞}^{∞} | d( x_t(τ) e^{−jΩ_1 τ^2} ) / dτ |^2 dτ,

since the LPFT can be considered as the Fourier transform of x_t(τ) e^{−jΩ_1 τ^2}, that is, LPFT_{Ω_1}(t, Ω) = FT{ x_t(τ) e^{−jΩ_1 τ^2} }, and Parseval's theorem is used. After the derivative calculation,

M_{Ω_1} = ∫_{−∞}^{∞} | dx_t(τ)/dτ − j2Ω_1 τ x_t(τ) |^2 dτ
        = ∫_{−∞}^{∞} ( |dx_t(τ)/dτ|^2 + j2Ω_1 τ x_t^*(τ) dx_t(τ)/dτ − j2Ω_1 τ x_t(τ) dx_t^*(τ)/dτ + |2Ω_1 τ x_t(τ)|^2 ) dτ.
The first term is the moment of X_t(Ω) = FT{x_t(τ)}, since the integral of |dx_t(τ)/dτ|^2 over τ is equal to the integral of |jΩ X_t(Ω)|^2 over Ω, according to Parseval's theorem. Also, we can see that the last term in M_{Ω_1} contains the signal moment

m_x = ∫_{−∞}^{∞} τ^2 |x_t(τ)|^2 dτ,   (9.73)

multiplied by 4Ω_1^2. Then, it is easy to conclude that

M_{Ω_1} − M_0 − 4m_x Ω_1^2 = Ω_1 ∫_{−∞}^{∞} ( j2τ x_t^*(τ) d[x_t(τ)]/dτ − j2τ x_t(τ) d[x_t^*(τ)]/dτ ) dτ.

Note that the last integral does not depend on the parameter Ω_1. Thus, the relation among the LPFT moments at any two values of Ω_1, for example Ω_1 = a and an arbitrary Ω_1, easily follows as the ratio

(M_{Ω_1 = a} − M_0 − 4a^2 m_x) / (M_{Ω_1} − M_0 − 4Ω_1^2 m_x) = a / Ω_1.   (9.74)

For a = 1,

(M_1 − M_0 − 4m_x) / (M_{Ω_1} − M_0 − 4Ω_1^2 m_x) = 1 / Ω_1,   (9.75)

with M_1 = M_{Ω_1 = 1}.

Obviously, the second-order moment, for any Ω_1, can be expressed as a function of the other three moments. In this case the relation reads

M_{Ω_1} = 4Ω_1^2 m_x + Ω_1 (M_1 − M_0 − 4m_x) + M_0.
Example 9.27. Find the position and the value of the minimum of the second-order moment of the LPFT, based on the windowed signal moment, the windowed signal's Fourier transform moment, and the LPFT moment for Ω_1 = 1.

⋆ The minimal value of the second-order moment (meaning the best concentrated LPFT in the sense of the duration measures) can be calculated from

dM_{Ω_1} / dΩ_1 = 0

as

Ω_1 = −(M_1 − M_0 − 4m_x) / (8m_x).

Since m_x > 0, this is a minimum of the function M_{Ω_1}. Thus, in general, there is no need for a direct search for the best concentrated LPFT over all possible values of Ω_1. It can be found based on three moments. The corresponding value of M_{Ω_1} is

M_{Ω_1} = M_0 − (M_1 − M_0 − 4m_x)^2 / (16 m_x).   (9.76)
Note that any two moments, instead of M_0 and M_1, could be used in the derivation.

The STFT in the fractional Fourier transform (FRFT) domain can be expressed as

STFT_α(u, v) = ∫_{−∞}^{∞} X_α(u + τ) w(τ) e^{−jvτ} dτ   (9.81)

STFT_α(u, v) = ∫_{−∞}^{∞} x(t + τ) w(τ) K_α(u, τ) dτ,   (9.82)

meaning that the lag truncation can be applied after the signal rotation or prior to the rotation. The results are similar. A relation for the moments, like (9.75) in the case of the LPFT, can be derived here as well. It states that any FRFT moment can be calculated if we know just any three of its moments.
In vector notation, the STFT can be written as

STFT(ω, n) = ŝ_ω(n) = h^H x(n) = (1/N) a^H(ω) x(n)

with

a^H(ω) = [1  e^{−jω}  e^{−j2ω} ... e^{−jω(N−1)}]   (9.83)
x(n) = [x(n)  x(n+1)  x(n+2) ... x(n+N−1)]^T,

where T denotes the transpose operation and H denotes the conjugate transpose (Hermitian) operation. Normalization of the STFT with N is done, as in the robust signal analysis.

The average power of the output signal ŝ_ω(n) over M samples (ergodicity over M samples around n is assumed), for a frequency ω, is

P(ω) = (1/M) Σ_n |ŝ_ω(n)|^2   (9.84)
     = (1/N^2) a^H(ω) [ (1/M) Σ_n x(n) x^H(n) ] a(ω) = (1/N^2) a^H(ω) R̂_x a(ω),

with

R̂_x = (1/M) Σ_n x(n) x^H(n).
The standard STFT (9.83) can be derived based on the following consideration. Find h as a solution of the problem of minimal energy, min{h^H h}, subject to h^H a(ω) = 1, since for x(n) = A a(ω) we get

h^H x(n) = h^H A a(ω) = A.

Thus, the condition h^H a(ω) = 1 means that the estimate is unbiased with respect to an input sinusoidal signal with amplitude A. Setting

∂/∂h^H { h^H h + λ(h^H a(ω) − 1) } = 0  subject to  h^H a(ω) = 1,
2h = −λ a(ω)  subject to  h^H a(ω) = 1,

results in

h = a(ω) / (a^H(ω) a(ω)) = (1/N) a(ω)   (9.86)

and the estimate (9.83), which is the standard STFT, follows.
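The constraint and the resulting filter (9.86) can be verified on a noise-free sinusoid (a sketch with assumed values of N, ω_0, and A): h = a(ω)/N satisfies h^H a(ω) = 1 and returns the amplitude A exactly.

```python
import numpy as np

N = 16
w0 = 2*np.pi*3/N                       # assumed test frequency
a = np.exp(1j*w0*np.arange(N))         # a(omega_0)
h = a / N                              # (9.86): h = a/(a^H a) = a/N

A = 2.5
x = A * a                              # one snapshot of a noise-free sinusoid
print(np.allclose(np.vdot(h, a), 1.0), np.allclose(np.vdot(h, x), A))
```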
Consider now a different optimization problem, defined by

min_h { (1/M) Σ_n |h^H x(n)|^2 }  subject to  h^H a(ω) = 1,   (9.87)

that is,

min_h { (1/M) Σ_n h^H x(n) x^H(n) h }  subject to  h^H a(ω) = 1.

By denoting

R̂_x = (1/M) Σ_n x(n) x^H(n),

we get
∂/∂h^H { h^H R̂_x h + λ(h^H a(ω) − 1) } = 0  subject to  h^H a(ω) = 1,

which gives the solution

h = −R̂_x^{−1} λ a(ω) / 2  subject to  h^H a(ω) = 1.   (9.88)

The solution can be written in the form

ĥ = R̂_x^{−1} a(ω) / ( a^H(ω) R̂_x^{−1} a(ω) ),   (9.89)

where

R̂_x = (1/M) Σ_n x(n) x^H(n).   (9.90)
The output signal power in these cases corresponds to Capon's form of the STFT, defined by

S_Capon(ω) = (1/M) Σ_n |h^H x(n)|^2 = h^H R̂_x h   (9.91)
           = ( R̂_x^{−1} a(ω) / (a^H(ω) R̂_x^{−1} a(ω)) )^H R̂_x ( R̂_x^{−1} a(ω) / (a^H(ω) R̂_x^{−1} a(ω)) )   (9.92)
           = 1 / ( a^H(ω) R̂_x^{−1} a(ω) ).   (9.93)
where n indicates the time instant of interest and the mean is calculated over the observations y(n) in the corresponding window.

In the realization, the autocorrelation matrix is regularized by a unity matrix I; thus, we use

R̂(n) = (1/(K+1)) Σ_{p=n−K/2}^{n+K/2} x(p) x^H(p) + ρI   (9.96)

instead of R̂_x(n) for the inverse calculation in (9.95) and (9.91). With the eigenvalue decomposition

R̂(n) = V^H(n) Λ(n) V(n),

Capon's spectrogram can be written as

S_Capon(n, ω) = 1 / ( a^H(ω) V^H(n) Λ^{−1}(n) V(n) a(ω) ) = 1 / ( Σ_{k=1}^{N} (1/λ_k) |STFT_k(n, ω)|^2 ),

where

STFT_k(n, ω) = a^H(ω) v_k(n)

is the STFT of the kth eigenvector (column) of the autocorrelation matrix R̂(n), corresponding to the eigenvalue λ_k. If the signal has N − M components, then the first N − M largest eigenvalues λ_k (corresponding to the
smallest values 1/λ_k) will represent the signal space (components), and the remaining M eigenvalues will correspond to the noise space (represented by ρI in the definition of the autocorrelation matrix R̂(n)).
If a frequency ω corresponds to a signal component, then all eigenvectors corresponding to the noise space will be orthogonal to that harmonic, represented by a^H(ω). It means that the spectrograms of all noise-space components will be very small at the frequencies corresponding to the signal frequencies.
The MUSIC STFT is defined based on this fact. It is calculated using the eigenvectors corresponding to the noise space, as

S_MUSIC(n, ω) = 1 / ( a^H(ω) V_M^H V_M a(ω) ) = 1 / ( Σ_{k=N−M+1}^{N} |STFT_k(n, ω)|^2 ).   (9.97)
In this example, with N = 16, the autocorrelation matrix at the instant n = 128 is

R̂(128) = (1/15) Σ_{p=128−7}^{128+7} x(p) x^H(p) + 0.00001 · I,

with

S_Capon(128, ω) = 1 / ( a^H(ω) R̂^{−1}(128) a(ω) ) = 1 / ( Σ_{k=1}^{16} (1/λ_k) |STFT_k(n, ω)|^2 )

and

S_MUSIC(n, ω) = 1 / ( a^H(ω) V_{14}^H V_{14} a(ω) ) = 1 / ( Σ_{k=3}^{16} |STFT_k(n, ω)|^2 ).
x(n) = A e^{j(α_0 n^2 + ω_0 n + φ_0)}
Figure 9.41 (a) The standard STFT using a rectangular window N = 16. The STFT is interpo-
lated in frequency up to 2048 samples. (b) Capon’s spectrogram calculated in 2048 frequency
points. (c) MUSIC spectrogram calculated in 2048 frequency points. (d) Capon’s spectrogram
zoomed to the signal components. (e) MUSIC spectrogram zoomed to the signal components.
(f) Pisarenko spectrogram zoomed to the signal components.
with α as a parameter. The high-resolution form of the LPFT can be used for
efficient processing of close linear frequency-modulated signals, with the
same rate within the considered interval.
Figure 9.42 (a) The standard STFT, (b) the LPFT, (c) Capon’s STFT, and (d) Capon’s LPFT-
based representations of two close almost linear frequency-modulated signals.
Example 9.30. The Capon LPFT form is illustrated on a signal with two close components.
Figure 9.43 The optimal STFT (absolute value, calculated with the optimal window width, using a Hann window) and the Wigner distribution (with a Hann window) of a linear frequency modulated signal.
modulated (LFM) chirp. For simplicity of the analysis, assume that its instantaneous frequency (IF) coincides with the time-frequency plane diagonal. It is obvious that, due to symmetry, both time and frequency resolution are equally important. Therefore, the best STFT would be the one calculated using a constant window whose (equivalent) widths are equal in the time and frequency domains. With such a window, both resolutions will be the same. However, these resolutions could be unacceptably low for many applications. It means that the STFT, including all of its possible time- and/or frequency-varying window forms, would be unacceptable as a time-frequency representation of this signal. The overlapping STFT could be used for better signal tracking, without any effect on the resolution.

A way to improve the time-frequency representation of this signal is to transform the signal into a sinusoid whose constant frequency is equal to the instantaneous frequency value of the linear frequency modulated signal at the considered instant. Then a wide window can be used, with a high frequency resolution. The obtained result is valid for the considered instant only, and the signal transformation procedure should be repeated for each instant of interest.

A simple way to introduce this kind of signal representation is presented next. Consider an LFM signal with the instantaneous frequency
Ω_i(t) = dφ(t)/dt = at + b.

For its quadratic phase,

φ(t + τ/2) − φ(t − τ/2) = τ dφ(t)/dt = τ(at + b) = τ Ω_i(t).

The Wigner distribution is defined by

WD(t, Ω) = ∫_{−∞}^{∞} x(t + τ/2) x^*(t − τ/2) e^{−jΩτ} dτ,   (9.98)

or, in terms of the Fourier transform of the signal,

WD(t, Ω) = (1/2π) ∫_{−∞}^{∞} X(Ω + θ/2) X^*(Ω − θ/2) e^{jθt} dθ.   (9.99)
Figure 9.44 Illustration of the Wigner distribution calculation, for a considered time instant t. Real values of a linear frequency modulated signal (linear chirp) are presented.

The inversion relations are

x(t + τ/2) x^*(t − τ/2) = IFT{WD(t, Ω)} = (1/2π) ∫_{−∞}^{∞} WD(t, Ω) e^{jΩτ} dΩ   (9.100)

|x(t)|^2 = (1/2π) ∫_{−∞}^{∞} WD(t, Ω) dΩ.   (9.101)
Example 9.31. Find the Wigner distribution of the signals: (a) x(t) = δ(t − t_1) and (b) x(t) = exp(jΩ_1 t).

since |a| δ(at) x(t) = δ(t) x(0). From the Wigner distribution definition in terms of the Fourier transform, for x(t) = exp(jΩ_1 t) with X(Ω) = 2πδ(Ω − Ω_1), it follows that

WD(t, Ω) = 2πδ(Ω − Ω_1).
A high concentration of the time-frequency representation is achieved for both of these signals. Note that this fact does not mean that we will be able to achieve an arbitrarily high concentration simultaneously, at a point, in the time-frequency domain.
Example 9.32. Consider a linear frequency modulated signal, x(t) = A e^{jbt^2/2}. Find its Wigner distribution.

The result is

WD(t, Ω) = 2π |A|^2 δ(Ω − bt).

Again, a high concentration along the instantaneous frequency in the time-frequency plane may be achieved for the linear frequency modulated signals.
For a multicomponent signal

x(t) = Σ_{m=1}^{M} x_m(t),

the Wigner distribution is

WD(t, Ω) = Σ_{m=1}^{M} Σ_{n=1}^{M} ∫_{−∞}^{∞} x_m(t + τ/2) x_n^*(t − τ/2) e^{−jΩτ} dτ.

The cross-terms are

WD_ct(t, Ω) = Σ_{m=1}^{M} Σ_{n=1, n≠m}^{M} ∫_{−∞}^{∞} x_m(t + τ/2) x_n^*(t − τ/2) e^{−jΩτ} dτ.
Usually, they are not desirable in time-frequency signal analysis. Cross-terms can mask the presence of auto-terms, which makes the Wigner distribution unsuitable for the time-frequency analysis of multicomponent signals.

For a two-component signal with auto-terms located around (t_1, Ω_1) and (t_2, Ω_2) (see Fig. 9.45), the oscillatory cross-term is located around ((t_1 + t_2)/2, (Ω_1 + Ω_2)/2).
Example 9.34. Analyze the auto-terms and cross-terms for a two-component signal of the form

x(t) = e^{−(t − t_1)^2/2} e^{jΩ_1 t} + e^{−(t + t_1)^2/2} e^{−jΩ_1 t}.
[Figure 9.45: two auto-terms located at (t_1, Ω_1) and (t_2, Ω_2), with an oscillatory cross-term between them.]
where the first and the second terms represent the auto-terms, while the third term is a cross-term. Note that the cross-term is oscillatory in both directions. The oscillation rate along the time axis is proportional to the frequency distance between the components, 2Ω_1, while the oscillation rate along the frequency axis is proportional to the time distance between the components, 2t_1. The oscillatory nature of the cross-terms will be used for their suppression.

The ambiguity function is defined as

AF(θ, τ) = ∫_{−∞}^{∞} x(t + τ/2) x^*(t − τ/2) e^{−jθt} dt.   (9.103)

It is related to the Wigner distribution by a two-dimensional Fourier transform,

AF(θ, τ) = FT2D_{t,Ω}{ WD(t, Ω) },

that is,

WD(t, Ω) = (1/2π) ∫∫_{−∞}^{∞} [ ∫_{−∞}^{∞} x(u + τ/2) x^*(u − τ/2) e^{−jθu} du ] e^{jθt − jΩτ} dτ dθ.

In terms of the Fourier transform of the signal,

AF(θ, τ) = (1/2π) ∫_{−∞}^{∞} X(Ω + θ/2) X^*(Ω − θ/2) e^{jΩτ} dΩ.   (9.104)

From this form we can conclude that the auto-terms of the components, limited in frequency to X_m(Ω) ≠ 0 only for |Ω − Ω_m| < W_m, are located in the ambiguity domain around the τ-axis, within the region |θ/2| < W_m. The cross-terms are within

|θ + Ω_n − Ω_m| < W_m + W_n,

where Ω_m and Ω_n are the frequencies around which the Fourier transforms of the components lie.
Figure 9.46 Auto and cross-terms for two-component signal in the ambiguity domain.
Therefore, all auto-terms are located along and around the ambiguity
domain axis. The cross-terms, for components which do not overlap in
time and frequency simultaneously, are displaced from the ambiguity
axes, Fig. 9.46. This property will be used in the definition of the reduced
interference time-frequency distributions.
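This dislocation can be verified numerically. The sketch below (illustrative grid and parameter values) computes a discrete ambiguity function of the two-Gaussian signal: the auto-terms concentrate at the origin, while a cross-term appears displaced, around Doppler 2Ω1 and lag t1 in this m-indexing (τ = 2m samples):

```python
import numpy as np

N, t1, W1, sig = 256, 16, np.pi / 4, 8.0        # assumed parameters
n = np.arange(N) - N // 2
x = (np.exp(-0.5 * ((n - t1) / sig) ** 2) * np.exp(1j * W1 * n) +
     np.exp(-0.5 * ((n + t1) / sig) ** 2) * np.exp(-1j * W1 * n))

def ambiguity(x):
    """AF(m, theta): DFT over the time index n of x(n+m)*conj(x(n-m)).
    Rows are lag indices m (tau = 2*m samples), columns Doppler bins."""
    N = len(x)
    A = np.zeros((N, N), dtype=complex)
    idx = np.arange(N)
    for m in range(-N // 4, N // 4):
        ok = (idx + m >= 0) & (idx + m < N) & (idx - m >= 0) & (idx - m < N)
        r = np.zeros(N, dtype=complex)
        r[idx[ok]] = x[idx[ok] + m] * np.conj(x[idx[ok] - m])
        A[m % N] = np.fft.fft(r)
    return A

A = np.abs(ambiguity(x))
# The auto-terms peak at the origin (m, theta) = (0, 0) ...
print(np.unravel_index(np.argmax(A), A.shape))
# ... while a cross-term sits at lag index m = t1 and Doppler
# theta = 2*W1 (bin 2*W1*N/(2*pi) = 64), away from the axes.
print(A[t1, 64] > 0.2 * A[0, 0])
```

Because the cross-terms are displaced from the origin, a kernel concentrated around the ambiguity axes can filter them out, which is exactly the reduced-interference idea.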
The ambiguity function of a four-component signal, consisting of two
Gaussian pulses, one sinusoidal and one linear frequency modulated
component, is presented in Fig. 9.47.
Example 9.35. Let us consider signals of the form

$$x_1(t) = e^{-\frac{1}{2}t^2}$$

$$x_2(t) = e^{-\frac{1}{2}(t-t_1)^2} e^{j\Omega_1 t} + e^{-\frac{1}{2}(t+t_1)^2} e^{-j\Omega_1 t}$$
[Plot of $AF(\theta,\tau)$ for $\theta \in [-3,3]$ and $\tau \in [-100,100]$.]
In the ambiguity domain (θ, τ ) auto-terms are located around (0, 0) while
cross-terms are located around (2Ω1 , 2t1 ) and (−2Ω1 , −2t1 ) as presented in
Fig. 9.46.
P2 – Time-shift property
The Wigner distribution of a signal shifted in time
y ( t ) = x ( t − t0 ),
is
WDy (t, Ω) = WDx (t − t0 , Ω).
P3 – Frequency-shift property
For a signal shifted in frequency (modulated), $y(t) = x(t)e^{j\Omega_0 t}$, we have

$$WD_y(t,\Omega) = WD_x(t, \Omega - \Omega_0).$$
P4 – Time marginal property

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} WD(t,\Omega)\, d\Omega = |x(t)|^2.$$
P5 – Frequency marginal property

$$\int_{-\infty}^{\infty} WD(t,\Omega)\, dt = |X(\Omega)|^2.$$
⋆ This property follows from $\frac{1}{2\pi}\int_{-\infty}^{\infty} WD(t,\Omega)\, d\Omega = |x(t)|^2$.
P7 – Frequency moments property
P8 – Scaling
For a scaled version of the signal,

$$y(t) = \sqrt{|a|}\, x(at), \quad a \neq 0,$$

the Wigner distribution is $WD_y(t,\Omega) = WD_x(at, \Omega/a)$.
P9 – Instantaneous frequency

$$\frac{d[x(t+\tau/2)\, x^*(t-\tau/2)]}{d\tau} = \frac{1}{2\pi}\int_{-\infty}^{\infty} j\Omega\, WD(t,\Omega)\, e^{j\Omega\tau}\, d\Omega,$$

so that, at τ = 0,

$$\frac{j}{2\pi}\int_{-\infty}^{\infty} \Omega\, WD(t,\Omega)\, d\Omega = \frac{1}{2}\left[x'(t)x^*(t) - x(t)x^{*\prime}(t)\right] = j\varphi'(t)A^2(t).$$
With the time marginal property $\int_{-\infty}^{\infty} WD(t,\Omega)\, d\Omega = 2\pi A^2(t)$, this
property follows.
P10 – Group delay
For a signal whose Fourier transform is of the form $X(\Omega) = |X(\Omega)|e^{j\Phi(\Omega)}$,
the group delay $t_g(\Omega) = -\Phi'(\Omega)$ satisfies

$$\frac{\int_{-\infty}^{\infty} t\, WD(t,\Omega)\, dt}{\int_{-\infty}^{\infty} WD(t,\Omega)\, dt} = t_g(\Omega) = -\frac{d}{d\Omega}\arg[X(\Omega)] = -\Phi'(\Omega).$$
The proof is the same as in the instantaneous frequency case, using the
frequency domain relations.
P11 – Time constraint
P13 – Convolution

$$WD_y(t,\Omega) = \int_{-\infty}^{\infty} WD_h(t-\tau, \Omega)\, WD_x(\tau, \Omega)\, d\tau$$

for

$$y(t) = \int_{-\infty}^{\infty} h(t-\tau)\, x(\tau)\, d\tau.$$
P14 – Product

$$WD_y(t,\Omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} WD_h(t, \Omega - v)\, WD_x(t, v)\, dv$$

for

$$y(t) = h(t)\, x(t).$$
⋆ The local autocorrelation of y(t) is h(t + τ/2)h∗ (t − τ/2) x (t +
τ/2) x ∗ (t − τ/2). Thus, the Wigner distribution of y(t) is the Fourier trans-
form of the product of local autocorrelations h(t + τ/2)h∗ (t − τ/2) and
x (t + τ/2) x ∗ (t − τ/2). It is a convolution in frequency of the corresponding
Wigner distributions of h(t) and x (t). Property P13 could be proven in the
same way using the Fourier transforms of signals h(t) and x (t).
P15 – Fourier transform property

$$WD_y(t,\Omega) = WD_x\Big(-\frac{\Omega}{c},\, ct\Big)$$

for

$$y(t) = \sqrt{|c|/(2\pi)}\; X(ct), \quad c \neq 0.$$

⋆ Here the signal $y(t)$ is equal to a scaled version of the Fourier
transform of the signal $x(t)$,

$$WD_y(t,\Omega) = \frac{|c|}{2\pi}\int_{-\infty}^{\infty} X\Big(ct+\frac{c\tau}{2}\Big)\, X^*\Big(ct-\frac{c\tau}{2}\Big)\, e^{-j\Omega\tau}\, d\tau$$

$$= \frac{1}{2\pi}\int_{-\infty}^{\infty} X\Big(ct+\frac{\theta}{2}\Big)\, X^*\Big(ct-\frac{\theta}{2}\Big)\, e^{j(-\Omega/c)\theta}\, d\theta. \qquad (9.107)$$
Ljubiša Stanković Digital Signal Processing 635
$$WD_y(t,\Omega) = \int_{-\infty}^{\infty} x\Big(-\frac{\Omega}{c}+\frac{\tau}{2}\Big)\, x^*\Big(-\frac{\Omega}{c}-\frac{\tau}{2}\Big)\, e^{-jct\tau}\, d\tau = WD_x\Big(-\frac{\Omega}{c},\, ct\Big).$$
P16 – Chirp convolution

$$WD_y(t,\Omega) = WD_x\Big(t - \frac{\Omega}{c},\, \Omega\Big)$$

for

$$y(t) = x(t) * \sqrt{|c|}\, e^{jct^2/2}.$$

⋆ With $Y(\Omega) = \mathrm{FT}\{x(t) *_t \sqrt{|c|}\, e^{jct^2/2}\} = \sqrt{2\pi j}\, X(\Omega)\, e^{-j\Omega^2/(2c)}$ and the
signal's Fourier transform-based definition of the Wigner distribution, the proof
of this property reduces to the next one.
P17 – Chirp product

$$WD_y(t,\Omega) = WD_x(t, \Omega - ct)$$

for

$$y(t) = x(t)\, e^{jct^2/2}.$$

The pseudo Wigner distribution is defined by

$$PWD(t,\Omega) = \int_{-\infty}^{\infty} w(\tau/2)\, w^*(-\tau/2)\, x(t+\tau/2)\, x^*(t-\tau/2)\, e^{-j\Omega\tau}\, d\tau \qquad (9.111)$$
where window w(τ ) localizes the considered lag interval. If w(0) = 1, the
pseudo Wigner distribution satisfies the time marginal property. Note that
the pseudo Wigner distribution is smoothed in the frequency direction with
respect to the Wigner distribution
$$PWD(t,\Omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} WD(t,\theta)\, W_e(\Omega-\theta)\, d\theta$$
[Plots of $PWD_1(t,\Omega)$ and $PWD_2(t,\Omega)$, panels (a) and (b).]
Figure 9.49 Pseudo Wigner distribution for sinusoidally frequency modulated signal. Narrow
window (left) and wide window (right).
$$PWD(\Omega,t) = \int_{-2}^{2} e^{j32\cos(\pi(t-\tau/2)/64)}\, e^{-j32\cos(\pi(t+\tau/2)/64)}\, w(\tau)\, e^{-j\Omega\tau}\, d\tau.$$

Expanding the cosines up to the cubic terms (with remainder points $\tau_1$, $\tau_2$),

$$PWD(\Omega,t) = \int_{-2}^{2} e^{j\frac{\pi}{2}\sin(\pi t/64)\,\tau}\, e^{j32\sin(\pi t/64)\,\frac{\pi^3(\tau_1^3+\tau_2^3)}{128^3\cdot 6}}\, w(\tau)\, e^{-j\Omega\tau}\, d\tau.$$

Obviously, $\Big|32\sin(\pi t/64)\,\frac{\pi^3(\tau_1^3+\tau_2^3)}{128^3\cdot 6}\Big| \le 0.081$, since $|\tau_{1,2}| \le 2$. Thus, we may
write

$$PWD(\Omega,t) \cong W\big(\Omega - \tfrac{\pi}{2}\sin(\pi t/64)\big),$$
where $W(\Omega)$ is the Fourier transform of the window $w(\tau)$. For a Hann(ing)
window this approximation holds for wider windows as well, since its values
toward the ending points are small, meaning that the effective window width
is smaller than the window width itself.
$$PWD(t,\Omega) = \sum_{m=-\infty}^{\infty} w\Big(m\frac{\Delta t}{2}\Big)\, w^*\Big(-m\frac{\Delta t}{2}\Big)\, x\Big(t+m\frac{\Delta t}{2}\Big)\, x^*\Big(t-m\frac{\Delta t}{2}\Big)\, e^{-jm\Omega\Delta t}\, \Delta t. \qquad (9.112)$$
Sampling in τ with ∆t = π/Ω0 , Ω0 > Ωm corresponds to the sampling of
signal x (t + τ/2) in τ/2 with ∆t/2 = π/(2Ω0 ).
The discrete-lag pseudo Wigner distribution is the Fourier transform of

$$R(t,m) = w\Big(m\frac{\Delta t}{2}\Big)\, w^*\Big(-m\frac{\Delta t}{2}\Big)\, x\Big(t+m\frac{\Delta t}{2}\Big)\, x^*\Big(t-m\frac{\Delta t}{2}\Big)\, \Delta t,$$

$$PWD(t,\omega) = \sum_{m=-\infty}^{\infty} R(t,m)\, e^{-jm\omega}$$
with ω = Ω∆t. If the sampling interval satisfies the sampling theorem, then
the sum in (9.112) is equal to the integral form (9.111).
A discrete form of the pseudo Wigner distribution, with $N+1$ samples
and $\omega = 2\pi k/(N+1)$, for a given time instant $t$, is

$$PWD(t,k) = \sum_{m=-N/2}^{N/2} R(t,m)\, e^{-j2\pi mk/(N+1)}.$$
With

$$R(n\Delta t, m\Delta t) = w\Big(m\frac{\Delta t}{2}\Big)\, w^*\Big(-m\frac{\Delta t}{2}\Big)\, x\Big(n\Delta t+m\frac{\Delta t}{2}\Big)\, x^*\Big(n\Delta t-m\frac{\Delta t}{2}\Big)\, \Delta t$$

and the normalized notation

$$R(n,m) = w\Big(\frac{m}{2}\Big)\, w^*\Big(-\frac{m}{2}\Big)\, x\Big(n+\frac{m}{2}\Big)\, x^*\Big(n-\frac{m}{2}\Big),$$
$$PWD(n,\omega) = \sum_{m=-\infty}^{\infty} w\Big(\frac{m}{2}\Big)\, w^*\Big(-\frac{m}{2}\Big)\, x\Big(n+\frac{m}{2}\Big)\, x^*\Big(n-\frac{m}{2}\Big)\, e^{-jm\omega}. \qquad (9.113)$$
The notation $x(n+m/2)$, for given $n$ and $m$, should be understood as the
signal value $x((n+m/2)\Delta t)$. In this notation, the discrete-time
pseudo Wigner distribution is periodic in ω with period 2π.
Since various discretization steps are used (here and in the open literature),
we will provide a relation of the discrete indices to the continuous time
and frequency, for each definition, as
$$PWD(t,\Omega)\Big|_{t=n\Delta t,\ \Omega=\frac{2\pi k}{(N+1)\Delta t}} = PWD\Big(n\Delta t,\, \frac{2\pi k}{(N+1)\Delta t}\Big) \rightarrow PWD(n,k).$$
The sign → could be understood as the equality sign in the sense of the sampling
theorem (Example 2.13). Otherwise, it should be considered as a correspondence
sign. The discrete form of (9.111), with $N+1$ samples, is
$$PWD\Big(n\Delta t,\, \frac{2\pi k}{(N+1)\Delta t}\Big) \rightarrow PWD(n,k)$$

$$PWD(n,k) = \sum_{m=-N/2}^{N/2} w\Big(\frac{m}{2}\Big)\, w^*\Big(-\frac{m}{2}\Big)\, x\Big(n+\frac{m}{2}\Big)\, x^*\Big(n-\frac{m}{2}\Big)\, e^{-j2\pi km/(N+1)}, \qquad (9.115)$$
for $-N/2 \le 2k \le N/2$. Since the standard DFT routines are commonly
used for the pseudo Wigner distribution calculation, we may use every
other sample ($2k$) in (9.115) or oversample the pseudo Wigner distribution
in frequency (as has been done in time). Then,
$$PWD\Big(\frac{n\Delta t}{2},\, \frac{2\pi k}{(N+1)\Delta t}\Big) \rightarrow PWD(n,k)$$

$$PWD(n,k) = \sum_{m=-N/2}^{N/2} w(m)\, w^*(-m)\, x(n+m)\, x^*(n-m)\, e^{-j2\pi mk/(N+1)}. \qquad (9.116)$$
$$PWD(n,k) = \sum_{m=-15}^{15} e^{j31\pi((n+m)/124)^2}\, e^{-j31\pi((n-m)/124)^2}\, e^{-j4\pi mk/31}$$

$$= \sum_{m=-15}^{15} e^{j\pi mn/124}\, e^{-j4\pi mk/31} = \frac{\sin\big(\frac{\pi}{8}(n-16k)\big)}{\sin\big(\frac{\pi}{248}(n-16k)\big)}.$$
The argument $k$ at which the pseudo Wigner distribution reaches its maximum for
$n = 62$ follows from $62 - 16k = 0$ as

$$\hat{k} = \arg\max_k PWD(n,k) = \left[\frac{62}{16}\right] = 4,$$
where $[\cdot]$ stands for the nearest integer. Obviously, the exact instantaneous
frequency is not on the discrete frequency grid. The estimated value of the
instantaneous frequency at $t = 1/2$ is $\hat{\Omega} = 4\pi\hat{k}/((N+1)\Delta t) = 16\pi/(31/62) =
32\pi$. The true value is $\Omega_i(1/2) = 31\pi$. When the true frequency is not on
the grid, the estimation can be improved by using interpolation or the
displacement bin, as explained in Chapter 1. The frequency sampling interval
is $\Delta\Omega = 4\pi/((N+1)\Delta t) = 8\pi$, with a maximal absolute estimation error of
$\Delta\Omega/2 = 4\pi$.
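The closed-form Dirichlet-kernel result and the grid-based estimate above can be checked by a direct evaluation of the sums (a sketch; k is scanned over the admissible grid):

```python
import numpy as np

n, m = 62, np.arange(-15, 16)
ks = np.arange(-7, 8)                  # admissible grid: -N/2 <= 2k <= N/2
vals = np.array([np.abs(np.sum(
    np.exp(1j * 31 * np.pi * ((n + m) / 124) ** 2) *
    np.exp(-1j * 31 * np.pi * ((n - m) / 124) ** 2) *
    np.exp(-1j * 4 * np.pi * m * k / 31))) for k in ks])
k_hat = ks[np.argmax(vals)]
print(k_hat)                            # 4, as in the closed form [62/16]
# The peak value matches the Dirichlet-kernel expression.
print(np.isclose(vals.max(),
                 abs(np.sin(np.pi / 8 * (n - 16 * k_hat)) /
                     np.sin(np.pi / 248 * (n - 16 * k_hat)))))
```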
If we used the standard DFT routine (9.116) with N + 1 = 31 and all
available frequency samples, we would get
$$PWD(n,k) = \mathrm{DFT}_{31}\left\{ e^{j31\pi((n+m)/124)^2}\, e^{-j31\pi((n-m)/124)^2} \right\}$$

$$= \sum_{m=-15}^{15} e^{j31\pi((n+m)/124)^2}\, e^{-j31\pi((n-m)/124)^2}\, e^{-j2\pi mk/31} = \frac{\sin\big(\frac{\pi}{8}(n-8k)\big)}{\sin\big(\frac{\pi}{248}(n-8k)\big)}.$$
N = 4 samples. Then, four values of the signal x(m) are used in the calculation.
So, in forming the local autocorrelation function, there are several possibilities.
One is to omit the sample x(−2) and to use an odd number of samples in
this case as well. It is also possible to periodically extend the signal and to
form the product accordingly. Here we can use four product terms, but with the
first one formed as x(−2)x*(−2), that is, as x(−N/2)x*(−N/2). When a lag window
with zero ending value is used (for example, a Hann(ing) window), this term has
no influence on the result. The lag window used must also follow the
symmetry, for example we(m) = cos²(πm/N), when,
$$PWD\Big(\frac{n\Delta t}{2},\, \frac{2\pi k}{N\Delta t}\Big) \rightarrow PWD(n,k)$$

$$PWD(n,k) = \sum_{m=-N/2}^{N/2-1} w_e(m)\, x(n+m)\, x^*(n-m)\, e^{-j2\pi mk/N}$$

$$= \sum_{m=-N/2+1}^{N/2-1} w_e(m)\, x(n+m)\, x^*(n-m)\, e^{-j2\pi mk/N},$$
be replaced by a sum

$$WD(t,\Omega) = \sum_{m=-N}^{N} x\Big(t+m\frac{\Delta t}{2}\Big)\, x^*\Big(t-m\frac{\Delta t}{2}\Big)\, e^{-jm\Omega\Delta t}\, \Delta t$$

$$= \sum_{m=-N/2}^{N/2} x\Big(t+2m\frac{\Delta t}{2}\Big)\, x^*\Big(t-2m\frac{\Delta t}{2}\Big)\, e^{-j2m\Omega\Delta t}\, \Delta t$$

$$+ \sum_{m=-N/2}^{N/2-1} x\Big(t+(2m+1)\frac{\Delta t}{2}\Big)\, x^*\Big(t-(2m+1)\frac{\Delta t}{2}\Big)\, e^{-j(2m+1)\Omega\Delta t}\, \Delta t. \qquad (9.117)$$
The initial sum is split into its even-indexed and odd-indexed parts. Now, let us
assume that the signal is sampled in such a way that a twice wider sampling
interval ∆t is also sufficient to obtain the Wigner distribution (by using every
other signal sample). Then, for the first sum (with an odd number of samples),
the following holds:
$$\sum_{m=-N/2}^{N/2} x(t+m\Delta t)\, x^*(t-m\Delta t)\, e^{-j2m\Omega\Delta t}\, \Delta t = \frac{1}{2} WD(t,\Omega).$$
The factor 1/2 comes from the sampling interval. Now, from (9.117), it follows that
$$\sum_{m=-N/2}^{N/2-1} x\Big(t+(2m+1)\frac{\Delta t}{2}\Big)\, x^*\Big(t-(2m+1)\frac{\Delta t}{2}\Big)\, e^{-j(2m+1)\Omega\Delta t}\, \Delta t = \frac{1}{2} WD(t,\Omega). \qquad (9.118)$$
This is just the discrete Wigner distribution with an even number of samples. If
we denote

$$x\Big(t+(2m+1)\frac{\Delta t}{2}\Big) = x\Big(t+m\Delta t+\frac{\Delta t}{2}\Big) = x_e(t+m\Delta t)$$

$$x\Big(n\Delta t+m\Delta t+\frac{\Delta t}{2}\Big)\sqrt{2\Delta t} = x_e(n+m),$$

then

$$x\Big(t-m\Delta t-\frac{\Delta t}{2}\Big) = x\Big(t-m\Delta t+\frac{\Delta t}{2}-\Delta t\Big)$$

$$x\Big(n\Delta t-m\Delta t+\frac{\Delta t}{2}-\Delta t\Big)\sqrt{2\Delta t} = x_e(n-m-1).$$
$$WD(t,\Omega) = e^{-j\Omega\Delta t} \sum_{m=-N/2}^{N/2-1} x_e(t+m\Delta t)\, x_e^*(t-m\Delta t-\Delta t)\, e^{-j2m\Omega\Delta t}\, (2\Delta t)$$
for any t and Ω (having in mind the sampling theorem). Thus, we may also
write
$$WD\Big(n\Delta t,\, \frac{\pi k}{N\Delta t}\Big) \rightarrow WD(n,k)$$

$$WD(n,k) = e^{-j\pi k/N} \sum_{m=-N/2}^{N/2-1} x_e(n+m)\, x_e^*(n-m-1)\, e^{-j2\pi mk/N}. \qquad (9.119)$$
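A quick numerical check of (9.119) (a sketch with an assumed test sinusoid; the half-sample offset of the lattice is built into x_e):

```python
import numpy as np

N, w0 = 16, np.pi / 4                          # assumed size and frequency
ns = np.arange(-2 * N, 2 * N)
xe = np.exp(1j * w0 * (ns + 0.5))              # xe(n) ~ x(n*dt + dt/2)
c = 2 * N                                      # array index of n = 0

def wd_even(n, k):
    """WD(n,k) of (9.119): even number of samples, with the
    compensating phase factor exp(-j*pi*k/N)."""
    m = np.arange(-N // 2, N // 2)
    s = np.sum(xe[c + n + m] * np.conj(xe[c + n - m - 1]) *
               np.exp(-2j * np.pi * m * k / N))
    return np.exp(-1j * np.pi * k / N) * s

row = np.array([wd_even(0, k) for k in range(N)])
# For xe(n) = exp(j*w0*(n+0.5)) the sum equals N at the bin
# k = w0*N/pi = 4 and vanishes elsewhere; the leading phase factor
# makes the peak value real.
print(np.argmax(np.abs(row)), abs(row[4].imag) < 1e-9)
```

The phase factor exp(−jπk/N) compensates the half-sample shift introduced by the xe(n−m−1) indexing, which is why the peak comes out real-valued.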
With the vector of conjugated signal samples

$$\mathbf{x}_n^- = [x_e^*(n+N/2-1),\ x_e^*(n+N/2-2),\ \ldots,\ x_e^*(n-N/2)]$$

and the corresponding vector $\mathbf{x}_n^+$ of the signal samples, the distribution can
be calculated as

$$WD(n,k) = e^{-j\pi k/N}\, \mathrm{DFT}_N\big\{ x_e(n+m)\, x_e^*(n-m-1) \big\},$$

where

$$x_e(n) \leftrightarrow x(n\Delta t + \Delta t/2)\sqrt{2\Delta t}.$$
To check this statement, consider the time marginal property of this distribution. It is

$$\frac{1}{N}\sum_{k=-N/2}^{N/2-1} WD(n,k)$$

$$= \sum_{m=-N/2}^{N/2-1} x_e(n+m)\, x_e^*(n-m-1)\left[\frac{1}{N}\sum_{k=-N/2}^{N/2-1} e^{-j(2m+1)\pi k/N}\right]$$

$$= \sum_{m=-N/2}^{N/2-1} x_e(n+m)\, x_e^*(n-m-1)\left[\frac{1}{N}\, e^{j(2m+1)\pi/2}\, \frac{1-e^{-j(2m+1)\pi}}{1-e^{-j(2m+1)\pi/N}}\right]$$

$$= \sum_{m=-N/2}^{N/2-1} x_e(n+m)\, x_e^*(n-m-1)\, \delta(2m+1) = \Big|x_e\Big(n-\frac{1}{2}\Big)\Big|^2 = |x(n\Delta t)|^2\, (2\Delta t),$$
where

$$Y(k) = \mathrm{DFT}_N\{y(n)\},$$

the pseudo Wigner distribution (9.6.4), without frequency oversampling, in
the case of an even $N$, can be calculated as
$$WD\Big(n\Delta t,\, \frac{2\pi k}{N\Delta t}\Big) \rightarrow WD(n,k)$$

$$WD(n,k) = e^{-j\pi k/(N/2)} \sum_{m=-N/4}^{N/4-1} \big(R(n,m) + R(n,m+N/2)\big)\, e^{-j2\pi mk/(N/2)}$$
where

$$R(n,m) = x_e(n+m)\, x_e^*(n-m-1).$$
Periodicity in $m$, for a given $n$, with period $N$ is assumed in $R(n,m)$, that
is, $R(n,m+N) = R(n,m) = R(n,m-N)$. $R(n,m+N/2)$ for $-N/4 \le m \le N/4-1$
needs to be calculated using only $R(n,m)$ for $-N/2 \le m \le N/2-1$.
In the case of real-valued signals, in order to avoid the need for
oversampling, as well as to eliminate cross-terms (that will be discussed
later) between positive and negative frequency components, their analytic
part is used in calculations.
$$PWD(t,\Omega) = \frac{1}{\pi}\int_{-\infty}^{\infty} STFT(t,\Omega+\theta)\, STFT^*(t,\Omega-\theta)\, d\theta. \qquad (9.120)$$
$$SM(t,\Omega) = \frac{1}{\pi}\int_{-L_P}^{L_P} P(\theta)\, STFT(t,\Omega+\theta)\, STFT^*(t,\Omega-\theta)\, d\theta, \qquad (9.122)$$
where P(θ) is a finite frequency domain window (we assume a rectangular
form), P(θ) = 0 for |θ| > L_P. The distribution obtained in this way is referred
to as the S-method. Two special cases are the spectrogram, P(θ) = πδ(θ), and
the pseudo Wigner distribution, P(θ) = 1.
The S-method can produce a representation of a multi-component sig-
nal such that the distribution of each component is its Wigner distribution,
avoiding cross-terms, if the STFTs of the components do not overlap in time-
frequency plane.
Consider a signal
$$x(t) = \sum_{m=1}^{M} x_m(t),$$
where x_m(t) are monocomponent signals. Assume that the STFT of each
component lies inside a region $D_m(t,\Omega)$, $m = 1,2,\ldots,M$, and that the
regions $D_m(t,\Omega)$ do not overlap. Denote the length of the $m$-th region along
Ω, for a given $t$, by $2B_m(t)$, and its central frequency by $\Omega_{0m}(t)$. Under these
assumptions, the S-method of x(t) produces the sum of the pseudo Wigner
distributions of the individual signal components,

$$SM_x(t,\Omega) = \sum_{m=1}^{M} PWD_{x_m}(t,\Omega), \qquad (9.123)$$
if the width of the rectangular window $P(\theta)$, for a point $(t,\Omega)$, is defined by

$$L_P(t,\Omega) = \begin{cases} B_m(t) - |\Omega - \Omega_{0m}(t)| & \text{for } (t,\Omega) \in D_m(t,\Omega) \\ 0 & \text{elsewhere.} \end{cases}$$
To prove this consider a point (t, Ω) inside a region Dm (t, Ω). The integra-
tion interval in (9.122), for the m-th signal component is symmetrical with
respect to θ = 0. It is defined by the smallest absolute value of θ for which
Ω + θ or Ω − θ falls outside Dm (t, Ω), i.e.,
A constant-width window $P(\theta)$ produces $SM_x(t,\Omega) = \sum_{m=1}^{M} PWD_{x_m}(t,\Omega)$ if the regions $D_m(t,\Omega)$ for $m =
1,2,\ldots,M$ are at least $2L_P$ apart along the frequency axis, $\big|\Omega_{0p}(t) - \Omega_{0q}(t)\big| >
B_p(t) + B_q(t) + 2L_P$, for each $p$, $q$ and $t$. This is the S-method with a constant
window width. The best choice of $L_P$ is a value such that $P(\theta)$ is wide
enough to enable complete integration over the auto-terms, but narrower
than the distance between the auto-terms, in order to avoid the cross-terms.
If two components overlap for some time instants t, then the cross-term will
appear, but only between these two components and for those time instants.
A discrete form of the S-method (9.122) reads

$$SM_L(n,k) = \sum_{i=-L}^{L} S_N(n,k+i)\, S_N^*(n,k-i). \qquad (9.124)$$
The spectrogram is the initial distribution, $SM_0(n,k) = |S_N(n,k)|^2$, and
$2\,\mathrm{Re}[S_N(n,k+i)S_N^*(n,k-i)]$, $i = 1,2,\ldots,L$, are the correction terms. By changing
the parameter L we can start from the spectrogram (L = 0) and gradually make
the transition toward the pseudo Wigner distribution by increasing L.
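A minimal sketch of this correction procedure (the STFT routine, window, and test signal below are illustrative assumptions, not the book's implementation):

```python
import numpy as np

def stft(x, N):
    """STFT with a Hann window and a hop of one sample."""
    w = np.hanning(N)
    return np.array([np.fft.fft(w * x[h:h + N])
                     for h in range(len(x) - N + 1)])

def s_method(S, L):
    """SM_L(n,k) = |S(n,k)|^2 + 2*Re sum_{i=1..L} S(n,k+i)*conj(S(n,k-i)),
    with circular indexing in frequency."""
    K = S.shape[1]
    k = np.arange(K)
    SM = np.abs(S) ** 2
    for i in range(1, L + 1):
        SM += 2 * np.real(S[:, (k + i) % K] * np.conj(S[:, (k - i) % K]))
    return SM

n = np.arange(256)
x = np.exp(1j * (np.pi / 512) * n ** 2)      # LFM test signal
S = stft(x, 64)
spec = s_method(S, 0)                         # spectrogram (L = 0)
sm = s_method(S, 8)                           # eight correction terms
# The correction terms concentrate the chirp's energy: the peak carries
# a larger share of the total in the S-method than in the spectrogram.
print(sm.max() / sm.sum() > spec.max() / spec.sum())
```

For a single chirp the correction terms add in phase at the auto-term center, so increasing L sharpens the ridge toward the pseudo Wigner distribution without introducing cross-terms.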
For the S-method realization we have to implement the STFT first,
based either on the FFT routines or on recursive approaches suitable for
hardware realizations. After we get the STFT, we have to “correct” the
obtained values, according to (9.124), by adding a few “correction” terms to the
spectrogram values. Note that the S-method is one of the rare quadratic time-
frequency distributions allowing easy hardware realization, based on the
hardware realization of the STFT, presented in the first part, and its “correction”
according to (9.124). There is no need for the analytic signal, since the cross-
terms between negative and positive frequency components are removed in
the same way as the other cross-terms. If we take STFT(n,k) = 0
outside the basic period, i.e., when k < −N/2 or k > N/2 − 1, then there
is no aliasing when the STFT is alias-free (in this way we can calculate the
alias-free Wigner distribution by taking L = N/2 in (9.124)). The calculation
in (9.124) can be performed for the whole matrix of the S-method and
the STFT. This can significantly save time in some matrix-based calculation
tools.
There are two ways to implement the summation in the S-method. The
first one is with a constant L. Theoretically, in order to get the Wigner
distribution for each individual component, the number of correction terms
L should be such that 2L is equal to the width of the widest auto-term. This
will guarantee a cross-term-free distribution for all components which are at
least 2L frequency samples apart.
The second way to implement the S-method is with a time-frequency
dependent L = L(n,k). The summation, for each point (n,k), is performed
as long as the absolute values of S_N(n,k+i) and S_N*(n,k−i) for that (n,k)
are above an assumed reference level (established, for example, as a few
percent of the STFT maximal value). Here, we start with the spectrogram,
L = 0. Consider the correction term S_N(n,k+1)S_N*(n,k−1), i.e., i = 1. If the
STFT values are above the reference level, it is included in the summation.
The next term, with i = 2, is considered in the same way, and so on. The
summation is stopped when an STFT value in a correction term is below the
reference level. This procedure will guarantee a cross-term-free distribution
for components that do not overlap in the STFT.
with

$$(a_1, a_2, a_3) = (-21, -1, 20)$$

and

$$(b_1, b_2, b_3) = (2, -0.75, -2.8),$$

is considered at the instant n = 0. The instantaneous frequencies of the signal components are $k_i = a_i$,
while the normalized squared amplitudes of the components are indicated by
dotted lines in Fig. 9.50. An ideal time-frequency representation of this signal,
at n = 0, would be
Figure 9.50 Analysis of a signal consisting of three LFM components (at the instant n = 0).
(a) The STFT with a cosine window of the width N = 64. (b) The spectrogram. (c) The first
correction term. (d) The S-method (SM) with one correction term. (e) The second correction
term. (f) The S-method with two correction terms. (g) The S-method with three correction
terms. (h) The S-method with five correction terms. (i) The S-method with six correction terms.
(j) The S-method with eight correction terms. (k) The S-method with nine correction terms. (l)
The Wigner distribution (the S-method with L = 31 correction terms).
$$D_i(n,k) = \begin{cases} 1 & \text{when } |STFT_{x_i}(n,k)|^2 \ge R_n \\ 0 & \text{elsewhere} \end{cases}$$
and presented in Fig. 9.51(c). White regions mean that the value of the spectrogram
is below 0.14% of its maximal value at that time instant n, meaning
that the concentration improvement is not performed at these points. The
signal-dependent S-method is given in Fig. 9.51(d). The sensitivity of the method
with respect to the reference level is low.
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} P(t,\Omega)\, d\Omega\, dt = E_x, \qquad (9.125)$$

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} P(t,\Omega)\, d\Omega = |x(t)|^2, \quad \text{and} \qquad (9.126)$$
[Figure: marginal properties of $P(t,\Omega)$; integration over Ω gives $|x(t)|^2$, integration over $t$ gives $|X(\Omega)|^2$.]
$$\int_{-\infty}^{\infty} P(t,\Omega)\, dt = |X(\Omega)|^2, \qquad (9.127)$$
Figure 9.53 Marginal properties and their relation to the ambiguity function.
The parameter σ controls the slope of the kernel function, which affects the
influence of the cross-terms. A small σ eliminates the cross-terms, but it should
not be too small since, due to the finite width of the auto-terms around the θ and
τ axes, the kernel would then distort them as well. Thus, there is a trade-off in
the selection of σ.
Here we will mention some other interesting kernel functions, produc-
ing corresponding distributions, Fig. 9.54.
Born-Jordan distribution

$$c(\theta,\tau) = \frac{\sin\big(\frac{\theta\tau}{2}\big)}{\frac{\theta\tau}{2}},$$

Zhao-Atlas-Marks distribution

$$c(\theta,\tau) = w(\tau)\, |\tau|\, \frac{\sin\big(\frac{\theta\tau}{2}\big)}{\frac{\theta\tau}{2}},$$

Sinc distribution

$$c(\theta,\tau) = \mathrm{rect}\Big(\frac{\theta\tau}{\alpha}\Big) = \begin{cases} 1 & \text{for } |\theta\tau/\alpha| < 1/2 \\ 0 & \text{otherwise,} \end{cases}$$

Butterworth distribution

$$c(\theta,\tau) = \frac{1}{1 + \Big(\frac{\theta\tau}{\theta_c\tau_c}\Big)^{2N}}.$$
Figure 9.54 Kernel functions for: Choi-Williams distribution, Born-Jordan distribution, Sinc
distribution and Zhao-Atlas-Marks distribution.
$$c(\theta,\tau) = \int_{-\infty}^{\infty} w\Big(t-\frac{\tau}{2}\Big)\, w\Big(t+\frac{\tau}{2}\Big)\, e^{-j\theta t}\, dt = AF_w(\theta,\tau).$$
Since the Cohen class is linear with respect to the kernel, it is easy to
conclude that a distribution from the Cohen class is positive if its kernel
can be written as

$$c(\theta,\tau) = \sum_{i=1}^{M} a_i\, AF_{w_i}(\theta,\tau),$$

where $a_i \ge 0$, $i = 1,2,\ldots,M$.
There are several ways to calculate the reduced interference distributions
from the Cohen class. The first method is based on the ambiguity
function (9.131):
1. Calculation of the ambiguity function,
2. Multiplication with the kernel,
3. Calculation of the inverse two-dimensional Fourier transform of this
product.
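These three steps can be sketched as follows. The discretization and the Gaussian-type kernel parameterization c(θ,τ) = exp(−(θτ)²/σ) are assumptions for illustration (the book defines the Choi-Williams kernel elsewhere); with a kernel identically equal to 1, the procedure returns the Wigner distribution exactly:

```python
import numpy as np

N = 128
n = np.arange(N) - N // 2
t1, W1, sig = 16, np.pi / 4, 6.0                 # assumed two-component signal
x = (np.exp(-0.5 * ((n - t1) / sig) ** 2) * np.exp(1j * W1 * n) +
     np.exp(-0.5 * ((n + t1) / sig) ** 2) * np.exp(-1j * W1 * n))

# local autocorrelation r(n, m) = x(n+m)*conj(x(n-m)), FFT order in m
r = np.zeros((N, N), dtype=complex)
idx = np.arange(N)
for m in range(-N // 2, N // 2):
    ok = (idx + m >= 0) & (idx + m < N) & (idx - m >= 0) & (idx - m < N)
    row = np.zeros(N, dtype=complex)
    row[idx[ok]] = x[idx[ok] + m] * np.conj(x[idx[ok] - m])
    r[:, m % N] = row

WD = np.fft.fft(r, axis=1)               # Wigner distribution (FT over lag)
AF = np.fft.fft(r, axis=0)               # step 1: ambiguity function
th = 2 * np.pi * np.fft.fftfreq(N)       # Doppler values, FFT order
mm = np.fft.fftfreq(N) * N               # lag values, FFT order
ker = np.exp(-np.outer(th, mm) ** 2 / 100.0)             # step 2: kernel
CD = np.fft.fft(np.fft.ifft(AF * ker, axis=0), axis=1)   # step 3: 2D FT back

# The kernel suppresses the cross-term midway between the components
# (time n = 0, frequency bin 0) and keeps the auto-term (bin 32).
print(abs(CD[N // 2, 0]) < 0.5 * abs(WD[N // 2, 0]))
print(abs(CD[N // 2 + t1, 32]) > 0.5 * abs(WD[N // 2 + t1, 32]))
```

Since the kernel multiplies the ambiguity function near the origin by approximately 1 and decays away from the axes, the auto-terms pass almost unchanged while the dislocated cross-terms are attenuated.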
The reduced interference distribution may also be calculated by using
(9.132) or (9.134) with appropriate kernel transformations defined by (9.135)
and (9.137). All these methods assume signal oversampling in order to avoid
aliasing effects. Figure 9.55 presents the ambiguity function along with the
(Choi-Williams) kernel. Figure 9.56(a) presents the Choi-Williams distribution
calculated according to the presented procedure. In order to reduce the high side
lobes of the rectangular window, the Choi-Williams distribution is also calculated
with a Hann(ing) window in the kernel definition, c(θ,τ)w(τ), and
presented in Fig. 9.56(b). The pseudo Wigner distribution with a Hann(ing)
window is shown in Fig. 9.48.
For discrete-time signals, there are several ways to calculate a
reduced interference distribution from the Cohen class, based on (9.131),
(9.132), (9.133), or (9.134).
The kernel functions are usually defined in the Doppler-lag domain
(θ, τ ). Thus, here we should use (9.131) with the ambiguity function of a
discrete-time signal

$$AF(\theta, m\Delta t) = \sum_{p=-\infty}^{\infty} x\Big(p\Delta t + m\frac{\Delta t}{2}\Big)\, x^*\Big(p\Delta t - m\frac{\Delta t}{2}\Big)\, e^{-jp\theta\Delta t}\, \Delta t.$$
The signal should be sampled as in the Wigner distribution case. For a given
lag instant m, the ambiguity function can be calculated by using the stan-
dard DFT routines. Another way to calculate the ambiguity function is just
to take the inverse two-dimensional transform of the Wigner distribution.
Note that the corresponding transformation pairs are time ↔ Doppler and
lag ↔ frequency, that is, t ↔ θ and τ ↔ Ω. The relation between discretization
values in the Fourier transform pairs (considered interval, sampling
Figure 9.55 Ambiguity function for signal from Fig.9.4 with the Choi-Williams kernel
$$CD(n\Delta t, k\Delta\Omega) = \frac{1}{2\pi}\sum_{l=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} AF_g(l\Delta\theta, m\Delta t)\, e^{-jkm\Delta t\Delta\Omega}\, e^{jnl\Delta\theta\Delta t}\, \Delta t\, \Delta\theta.$$
Figure 9.56 Choi-Williams distribution: (a) direct calculation, (b) calculation with the kernel
multiplied by a Hann(ing) lag window.
$$CD(n\Delta t, k\Delta\Omega) = \sum_{p=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} c_T(n\Delta t - p\Delta t, m\Delta t)\, x\Big(p\Delta t + m\frac{\Delta t}{2}\Big)\, x^*\Big(p\Delta t - m\frac{\Delta t}{2}\Big)\, e^{-jkm\Delta t\Delta\Omega}\, (\Delta t)^2 \qquad (9.139)$$
with

$$c_T(n\Delta t - p\Delta t, m\Delta t) = \frac{1}{2\pi}\sum_{l=-\infty}^{\infty} c(l\Delta\theta, m\Delta t)\, e^{jnl\Delta\theta\Delta t}\, e^{-jlp\Delta\theta\Delta t}\, \Delta\theta.$$
For the discrete-time signals, it is common to write and use the Cohen
class of distributions in the form
$$CD(n,\omega) = \sum_{p=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} c_T(n-p, m)\, x(p+m)\, x^*(p-m)\, e^{-j2m\omega}, \qquad (9.140)$$

where

$$x(p+m)\, x^*(p-m) = x\Big((p+m)\frac{\Delta t}{2}\Big)\, x^*\Big((p-m)\frac{\Delta t}{2}\Big)\, \Delta t,$$

$$c_T(n-p, m) = c_T\Big((n-p)\frac{\Delta t}{2},\, m\Delta t\Big)\frac{\Delta t}{2},$$

$$CD(n,\omega) \rightarrow CD\Big(n\frac{\Delta t}{2},\, \Omega\Delta t\Big).$$
Here we should mention that the presented kernel functions have
infinite duration along the coordinate axes in (θ, τ); thus, they should be
limited in calculations. Their transforms exist in a generalized sense only.
we can write

$$CD(n,\omega) = \mathbf{x}_n \mathbf{C} \mathbf{x}_n^H,$$

where $\mathbf{x}_n$ is a vector with elements $x(n+n_1)e^{-j\omega n_1}$. We can now perform
the eigenvalue decomposition, finding the solutions of $\det(\mathbf{C} - \lambda\mathbf{I}) = 0$ and
determining the eigenvector matrix $\mathbf{Q}$ that satisfies $\mathbf{Q}\mathbf{Q}^H = \mathbf{I}$ and

$$\mathbf{C} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H,$$
Figure 9.57 Time-frequency representation of a four component signal: (a) the spectrogram,
(b) the Wigner distribution, (c) the Choi-Williams distribution, and (d) the S-method.
Chapter 10
Sparse Signal Processing
_________________________________________________
Authors: Ljubiša Stanković, Miloš Daković, Srdjan Stanković, Irena Orović
Before we start the analysis, we will describe a few widely known examples
that can be interpreted and solved within the context of sparse signal
processing and compressive sensing.
Consider a large set of real numbers X (0), X (1),...,X ( N − 1). Assume
that only one of them is nonzero (or different from a common and known
expected value). We do not know either its position or its value. The aim is to
find the position and the value of this number. This case can easily be related
to many real life examples when we have to find one sample which differs
from other N − 1 samples. The nonzero value (or the difference from the
expected value) will be denoted by X(i). A direct way to find the position
of the nonzero (different) sample would be to perform up to N measurements
and compare each of them with zero (the expected value). However, if N
is very large and there is only one nonzero (different than expected) sample,
we can get the result with just a few observations/measurements. A procedure
with a reduced number of observations/measurements is described next.
Take random numbers as weighting coefficients ai , i = 0, 1, 2, ..., N − 1,
for each sample. Measure the total value of all N weighted samples, with
weights ai , from the set. Since only one is different from the common and
known expected value m (or from zero) we will get the total measured value
$$M = a_1 m + a_2 m + \cdots + a_i(m + X(i)) + \cdots + a_N m.$$
Figure 10.1 There are N bags with coins. One of them, at an unknown position, contains false
coins. False coins differ from the true ones in mass for an unknown X(i) = ∆m. The mass of the
true coins is m. The set of coins for measurement is formed using a1 coins from the first bag, a2 coins
from the second bag, and so on. The total measured value is M = a1m + ... + ai(m + X(i)) + ... +
aNm. The difference of this value from the case if all coins were true is M − MT. Equations for
the cases with one and two bags of false coins are presented (left and right).
weight m of true coins. The goal is to find the position and the difference
in weight of false coins. From each of N bags we will take ai , i = 1, 2, ...N,
coins. Number of coins from the ith bag is denoted by ai . The total measured
weight of all coins from N bags is M, Fig.10.1.
After the expected value is subtracted the observation/measurement
y(0) is obtained
$$y(0) = \sum_{k=0}^{N-1} X(k)\, \psi_k(0). \qquad (10.1)$$
As expected, from one measurement we are not able to solve the problem
and find the position and the value of the nonzero sample.
If we perform one more measurement, y(1), with another set of weighting
coefficients ψk(1), k = 0, 1, ..., N − 1, and get the measured value y(1) =
X(i)ψi(1), the result will be a hyperplane

$$y(1) = \sum_{k=0}^{N-1} X(k)\, \psi_k(1).$$
This measurement will produce a new set of possible solutions for each X(k),

$$X(k) = y(1)/\psi_k(1), \quad k = 0, 1, 2, \ldots, N-1.$$
If these two hyperplanes (sets of solutions) produce only one common value,
then it is the solution of our problem. This will be the case if $\psi_i(0)\psi_k(1) - \psi_i(1)\psi_k(0) \neq 0$
for any $i \neq k$.
In order to prove this statement, assume that two different solutions
X(i) and X(k), for the case of one nonzero coefficient, satisfy the same
measurement hyperplane equations

$$\psi_i(0)X(i) = y(0), \quad \psi_i(1)X(i) = y(1)$$

and

$$\psi_k(0)X(k) = y(0), \quad \psi_k(1)X(k) = y(1).$$

Then

$$\psi_i(0)X(i) = \psi_k(0)X(k)$$

and

$$\psi_i(1)X(i) = \psi_k(1)X(k).$$
then

$$\begin{bmatrix} y(0) \\ y(1) \end{bmatrix} = \begin{bmatrix} \psi_i(0) & \psi_k(0) \\ \psi_i(1) & \psi_k(1) \end{bmatrix}\begin{bmatrix} X(i) \\ 0 \end{bmatrix}$$

$$\begin{bmatrix} y(0) \\ y(1) \end{bmatrix} = \begin{bmatrix} \psi_i(0) & \psi_k(0) \\ \psi_i(1) & \psi_k(1) \end{bmatrix}\begin{bmatrix} 0 \\ X(k) \end{bmatrix}. \qquad (10.2)$$

Subtraction of the previous matrix equations results in

$$\begin{bmatrix} \psi_i(0) & \psi_k(0) \\ \psi_i(1) & \psi_k(1) \end{bmatrix}\begin{bmatrix} X(i) \\ -X(k) \end{bmatrix} = \mathbf{0}.$$
for any $i \neq k$. It also means that $\mathrm{rank}(\mathbf{A}_2) = 2$ for any $\mathbf{A}_2$ being a $2\times 2$
submatrix of the matrix of coefficients (measurement matrix) $\mathbf{A}$. For additional
illustration of this simple problem, see Section 10.5.2.
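A small numerical sketch of this procedure (random weights; the sizes and the position and value of the nonzero sample are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N, i_true, X_true = 1024, 317, 2.5           # assumed test setup
psi0 = rng.standard_normal(N)                # weights of measurement y(0)
psi1 = rng.standard_normal(N)                # weights of measurement y(1)
y0 = psi0[i_true] * X_true                   # only X(i_true) is nonzero
y1 = psi1[i_true] * X_true

sol0 = y0 / psi0                             # hyperplane solutions from y(0)
sol1 = y1 / psi1                             # hyperplane solutions from y(1)
i_hat = int(np.argmin(np.abs(sol0 - sol1)))  # index where both agree
print(i_hat, round(sol0[i_hat], 6))          # recovers position and value
```

With continuous random weights, the two candidate-solution sets coincide (up to rounding) only at the true position, which is the discrete counterpart of the determinant condition above.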
In numerical and practical applications we would not be satisfied if,
for example, $\psi_i(0)\psi_k(1) - \psi_i(1)\psi_k(0) \neq 0$ but $\psi_i(0)\psi_k(1) - \psi_i(1)\psi_k(0) =
\varepsilon$ is close to zero. In this case the theoretical condition for a unique solution
would be satisfied; however, the analysis and possible inversion would be
highly sensitive to any kind of noise, including quantization noise. Thus,
a practical requirement is that the determinant is not just different from
zero, but that it sufficiently differs from zero, so that inversion stability
and robustness to noise are achieved. Inversion stability for a matrix $\mathbf{B}$ is
commonly described by the condition number

$$\mathrm{cond}\{\mathbf{B}\} = \frac{\lambda_{\max}}{\lambda_{\min}},$$
where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and the smallest eigenvalue of the matrix
$\mathbf{B}$ (when $\mathbf{B}^H\mathbf{B} = \mathbf{B}\mathbf{B}^H$)¹. The inversion stability worsens as $\lambda_{\min}$ approaches
zero (when $\lambda_{\min}$ is small compared to $\lambda_{\max}$). For stable and robust
¹ The value of the determinant of a matrix $\mathbf{B}$ is equal to the product of its eigenvalues, $\det\{\mathbf{B}\} =
\lambda_1\lambda_2\cdots\lambda_N$, where $N$ is the order of the square matrix $\mathbf{B}$. Note that the condition number can
be interpreted as a ratio of the two-norms (square roots of energies) of the noise $\boldsymbol{\varepsilon}$ and the signal $\mathbf{x}$
after and before the inversion $\mathbf{y} + \mathbf{y}_\varepsilon = \mathbf{B}^{-1}(\mathbf{x}+\boldsymbol{\varepsilon})$. This number is always greater than or equal to 1.
The best value for this ratio is achieved when $\lambda_{\min}$ is close to $\lambda_{\max}$.
calculations, a requirement

$$\frac{\lambda_{\max}}{\lambda_{\min}} \le 1 + \delta$$

is imposed.
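For example, with numpy (a sketch; for a Hermitian positive definite matrix the eigenvalue ratio coincides with numpy's SVD-based condition number, the example matrix is an arbitrary illustration):

```python
import numpy as np

B = np.array([[2.0, 0.1],
              [0.1, 1.0]])                   # illustrative Hermitian matrix
lam = np.linalg.eigvalsh(B)                  # real eigenvalues, ascending
cond = lam[-1] / lam[0]                      # lambda_max / lambda_min
print(cond >= 1.0, np.isclose(cond, np.linalg.cond(B)))
```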
$$y(0) = \sum_{l=0}^{N-1} X(l)\psi_l(0) = X(i)\psi_i(0) + X(k)\psi_k(0), \qquad (10.3)$$

$$y(1) = \sum_{l=0}^{N-1} X(l)\psi_l(1) = X(i)\psi_i(1) + X(k)\psi_k(1)$$
will result in some X(i) and X(k) for any i and k, as the solution of a system
of two equations with two unknowns. Therefore, with two measurements
we cannot solve the problem and find the positions and the values
of the nonzero coefficients. If two more measurements are performed, then an
additional system of two equations
is formed. Two systems of two equations (10.3) and (10.4) could be solved
for X (i ) and X (k ) for each combination of i and k. If these two systems
produce only one common solution pair X (i ) and X (k ) then this pair is the
solution of our problem. As in the case of one nonzero coefficient, we may
show that the sufficient condition for a unique solution is
$$\det\begin{bmatrix} \psi_{k_1}(0) & \psi_{k_2}(0) & \psi_{k_3}(0) & \psi_{k_4}(0)\\ \psi_{k_1}(1) & \psi_{k_2}(1) & \psi_{k_3}(1) & \psi_{k_4}(1)\\ \psi_{k_1}(2) & \psi_{k_2}(2) & \psi_{k_3}(2) & \psi_{k_4}(2)\\ \psi_{k_1}(3) & \psi_{k_2}(3) & \psi_{k_3}(3) & \psi_{k_4}(3) \end{bmatrix} \neq 0 \qquad (10.5)$$
$$\begin{bmatrix} y(0) \\ y(1) \\ y(2) \\ y(3) \end{bmatrix} = \begin{bmatrix} \psi_{k_1}(0) & \psi_{k_2}(0) & \psi_{k_3}(0) & \psi_{k_4}(0)\\ \psi_{k_1}(1) & \psi_{k_2}(1) & \psi_{k_3}(1) & \psi_{k_4}(1)\\ \psi_{k_1}(2) & \psi_{k_2}(2) & \psi_{k_3}(2) & \psi_{k_4}(2)\\ \psi_{k_1}(3) & \psi_{k_2}(3) & \psi_{k_3}(3) & \psi_{k_4}(3) \end{bmatrix}\begin{bmatrix} X(k_1) \\ X(k_2) \\ 0 \\ 0 \end{bmatrix}$$

and

$$\begin{bmatrix} y(0) \\ y(1) \\ y(2) \\ y(3) \end{bmatrix} = \begin{bmatrix} \psi_{k_1}(0) & \psi_{k_2}(0) & \psi_{k_3}(0) & \psi_{k_4}(0)\\ \psi_{k_1}(1) & \psi_{k_2}(1) & \psi_{k_3}(1) & \psi_{k_4}(1)\\ \psi_{k_1}(2) & \psi_{k_2}(2) & \psi_{k_3}(2) & \psi_{k_4}(2)\\ \psi_{k_1}(3) & \psi_{k_2}(3) & \psi_{k_3}(3) & \psi_{k_4}(3) \end{bmatrix}\begin{bmatrix} 0 \\ 0 \\ X(k_3) \\ X(k_4) \end{bmatrix}.$$
Since (10.5) holds, it follows that X(k1) = X(k2) = X(k3) = X(k4) = 0, meaning that
two independent pairs of solutions with two nonzero coefficients are not
possible.
This approach to solving a problem (and to checking the solution uniqueness)
is illustrative; however, it is not computationally feasible. For example, for
a simple case with N = 1024 and just two nonzero coefficients, in order to
find a solution we have to solve the two systems of equations (10.3) and
(10.4) for each possible combination of i and k and to compare their solutions.
The total number of combinations of two indices out of the total number
of N indices is

$$\binom{N}{2} \sim 10^6.$$
In order to check the solution uniqueness we should calculate a determinant
value for all combinations of four indices k1 , k2 , k3 and k4 out the set of N
values. The number of determinants is ( N4 ) ∼ 1012 . If one determinant of the
forth order is calculated in 10 −5 [sec], then more than 5 days are needed
to calculate all determinants for this quite simple case of two nonzero
coefficients.
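The infeasibility claim above can be checked by simple counting, a sketch using only Python's standard library (the $10^{-5}$ s per determinant is the same assumption as in the text):

```python
from math import comb

# Brute-force uniqueness check for N = 1024 and two nonzero coefficients:
# count the systems to solve and the determinants to evaluate.
N = 1024

pairs = comb(N, 2)    # combinations of (i, k) for systems (10.3)-(10.4)
quads = comb(N, 4)    # fourth-order determinants for the uniqueness check

print(pairs)          # 523776, on the order of 10**6
print(quads)          # 45545029376, about 4.6 * 10**10

# At 1e-5 seconds per fourth-order determinant:
days = quads * 1e-5 / (24 * 3600)
print(round(days, 1)) # roughly 5.3 days
```

The counts confirm that even this toy case already requires days of computation, which motivates the structured reconstruction algorithms presented later in the chapter.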
As a next example, consider a signal described by a weighted sum of K harmonics from a set of possible oscillatory functions $e^{j2\pi kn/N}$, $k = 0, 1, 2, \ldots, N-1$,
$$y(0) = x(n_1) = \sum_{k=0}^{N-1} X(k)\psi_k(n_1)$$
with the weighting coefficients $\psi_k(n_1) = \exp(j2\pi n_1 k/N)/N$. The previous relation is the IDFT. An analysis similar to that in the previous illustrative example can now be performed, assuming for example K = 1 or K = 2. We can find the position and value of the nonzero X(k) using just a few signal samples y(i). This model corresponds to many real-life signals. For example, in Doppler-radar systems the speed of a radar target is transformed into the frequency of a sinusoidal signal. Since the returned signal contains echoes from only one or just a few targets, the signal representing target velocity is a sparse signal in the DFT domain. It can be reconstructed from far fewer samples than the total number N of radar return signal samples, Fig. 10.2.
The signal model with complex-valued sinusoids is specific and very important in engineering applications. We will focus most of our presentation on this model. To illustrate the complexity of the problem we will discuss the simplest possible case, consisting of one complex sinusoid at a frequency $k_0$. Within the previous framework this means that we consider a case with only one nonzero DFT coefficient at an unknown frequency index $k_0$. Assume that two samples/observations $x(n_1) = A\exp(j2\pi k_0 n_1/N)$ and $x(n_2) = A\exp(j2\pi k_0 n_2/N)$ of this signal are available. Note that the signal amplitude A is complex-valued and includes the initial phase. In order to find the unknown position (frequency index), form the ratio
$$\frac{x(n_1)}{x(n_2)} = \exp(j2\pi k_0(n_1 - n_2)/N).$$
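The two-sample estimation of $k_0$ can be sketched as follows. The values of N, $n_1$, $n_2$, and the amplitude are hypothetical choices; a unique answer on the integer grid additionally requires $n_1 - n_2$ to be coprime with N, which this choice satisfies:

```python
import numpy as np

# One complex sinusoid in the DFT domain: x(n) = A * exp(j*2*pi*k0*n/N).
# The ratio x(n1)/x(n2) = exp(j*2*pi*k0*(n1-n2)/N) reveals k0.
N = 16
k0 = 5
A = 0.8 * np.exp(1j * 0.3)                  # complex amplitude (includes phase)
n1, n2 = 3, 4                               # n1 - n2 = -1, coprime with N

x = lambda n: A * np.exp(1j * 2 * np.pi * k0 * n / N)
ratio = x(n1) / x(n2)                       # amplitude A cancels out

# Recover k0*(n1 - n2) mod N from the phase, then solve for k0
phase = np.angle(ratio) * N / (2 * np.pi)   # in (-N/2, N/2]
k0_hat = int(round(phase / (n1 - n2))) % N
print(k0_hat)    # 5
```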
Figure 10.2 (a) Signal in the frequency domain, where it is sparse (velocities of two targets
in Doppler radar signal). (b) Signal in the time domain, where it is dense. (c) Reduced set of
measurements (samples) and (d) its DFT before reconstruction, calculated using the available
samples only. Real parts of signals are presented.
k0 = 5 + 16k/4,
[Figure panels: $p(k) = P(200, k)$; $X(k) = p(k) - p(k-1)$; $x(\xi) = \mathrm{FT}[X(k)]$.]
Figure 10.3 Shepp-Logan model for the computed tomography reconstruction (left), along
with its slice along indicated line (right-top), its derivative (right-middle) and its Fourier
transform.
A signal x(n), whose transformation coefficients are X(k), is sparse in this transformation domain if
$$\operatorname{card}\{\mathbf{X}\} = K \ll N,$$
where $\operatorname{card}\{\mathbf{X}\}$ is the notation for the number of nonzero transformation coefficients in X, i.e., if
$$X(k) = 0 \text{ for } k \notin \{k_1, k_2, \ldots, k_K\} = \mathbb{K}.$$
Counting the nonzero coefficients in a signal representation can be achieved by using the so-called $\ell_0$-norm, denoted by $\|\mathbf{X}\|_0$. The number of nonzero coefficients is
$$\|\mathbf{X}\|_0 = \sum_{k=0}^{N-1} |X(k)|^0 = \operatorname{card}\{\mathbf{X}\},$$
where, by definition, $|X(k)|^0 = 0$ for $|X(k)| = 0$ and $|X(k)|^0 = 1$ for $|X(k)| \neq 0$. This form is referred to as the $\ell_0$-norm (norm-zero) although it does not satisfy the norm properties.

For linear signal transforms the signal can be written as a linear combination of the sparse domain coefficients X(k),
$$x(n) = \sum_{k=0}^{N-1} X(k)\psi_k(n)$$
or
$$\mathbf{x} = \mathbf{\Psi}\mathbf{X},$$
where Ψ is the transformation matrix with elements $\psi_k(n)$, x is the signal column vector, and X is the column vector of transformation coefficients.
M equations
$$\begin{bmatrix} x(n_1) \\ x(n_2) \\ \vdots \\ x(n_M) \end{bmatrix} =
\begin{bmatrix}
\psi_0(n_1) & \psi_1(n_1) & \cdots & \psi_{N-1}(n_1) \\
\psi_0(n_2) & \psi_1(n_2) & \cdots & \psi_{N-1}(n_2) \\
\vdots & \vdots & & \vdots \\
\psi_0(n_M) & \psi_1(n_M) & \cdots & \psi_{N-1}(n_M)
\end{bmatrix}
\begin{bmatrix} X(0) \\ X(1) \\ \vdots \\ X(N-1) \end{bmatrix}$$
or
y = AX
where A is the M × N matrix of measurements/observations/available
signal samples.
The fact that the signal is sparse, with X(k) = 0 for $k \notin \{k_1, k_2, \ldots, k_K\} = \mathbb{K}$, is not included in the measurement matrix A since the positions of the nonzero values are unknown. If the knowledge that X(k) = 0 for $k \notin \{k_1, k_2, \ldots, k_K\} = \mathbb{K}$ were included, then a reduced observation matrix would be obtained as
$$\begin{bmatrix} x(n_1) \\ x(n_2) \\ \vdots \\ x(n_M) \end{bmatrix} =
\begin{bmatrix}
\psi_{k_1}(n_1) & \psi_{k_2}(n_1) & \cdots & \psi_{k_K}(n_1) \\
\psi_{k_1}(n_2) & \psi_{k_2}(n_2) & \cdots & \psi_{k_K}(n_2) \\
\vdots & \vdots & & \vdots \\
\psi_{k_1}(n_M) & \psi_{k_2}(n_M) & \cdots & \psi_{k_K}(n_M)
\end{bmatrix}
\begin{bmatrix} X(k_1) \\ X(k_2) \\ \vdots \\ X(k_K) \end{bmatrix}$$
or
y = AK XK .
Matrix AK would be formed if we knew the positions of nonzero samples
k ∈ {k1 , k2 , ..., k K } = K. It would follow from the measurement matrix A by
omitting the columns corresponding to the zero-valued coefficients X (k ).
Assuming that there are K nonzero coefficients X(k) out of the total of N values, the total number of possible different matrices $\mathbf{A}_K$ is equal to the number of combinations of K out of N, that is, $\binom{N}{K}$.
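The construction of the measurement matrix A and of the reduced matrix $\mathbf{A}_K$ can be sketched as follows; the sample positions $n_i$ and the support set are hypothetical choices, and the column convention $\psi_k(n) = e^{j2\pi nk/N}/N$ follows this chapter's DFT model:

```python
import numpy as np

# Partial IDFT measurement matrix: rows are the available instants n_i,
# columns are the basis functions psi_k(n) = exp(j*2*pi*n*k/N)/N.
N, M = 8, 6
ni = np.array([0, 1, 2, 4, 5, 7])            # hypothetical available samples
K_set = [1, 3]                               # hypothetical nonzero positions

n, k = np.meshgrid(ni, np.arange(N), indexing="ij")
A = np.exp(1j * 2 * np.pi * n * k / N) / N   # M x N measurement matrix
A_K = A[:, K_set]                            # M x K reduced matrix

print(A.shape, A_K.shape)    # (6, 8) (6, 2)
```

For any X supported on `K_set`, the products `A @ X` and `A_K @ X[K_set]` coincide, which is exactly the reduction $\mathbf{y} = \mathbf{A}_K\mathbf{X}_K$ described above.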
In the common signal transform cases (like the DFT) the set of missing/unavailable samples, denoted by $\mathbf{y}_c$, can be defined as well. The union of the sets $\mathbf{y}$ and $\mathbf{y}_c$ is a set containing all signal samples (the complete set of samples/measurements). If x is the complete set of samples then
x = y ∪ yc .
f = B M x.
Since the signal is related to its sparsity domain by x = ΨX, the measure-
ments are related to the sparsity domain form of signal as
f = B M ΨX = AX,
where
A = B M Ψ.
An example of indirect measurements is a linear signal transform
$$f(i) = \sum_{m=0}^{N-1} h(i - m)x(m)$$
with $b_{im} = h(i - m - 1)$. In this case the samples of the output signal (a transform of the original signal) are the measurements, while the sparsity domain is the transformation domain of the input (original) signal. All linear signal transforms can be considered within this framework.
with
$$X(k) = \sum_{n=0}^{N-1} x(n)\phi_k(n).$$
In a matrix form
$$\mathbf{x} = \mathbf{\Psi}\mathbf{X} \quad \text{and} \quad \mathbf{X} = \mathbf{\Phi}\mathbf{x}.$$
i.e., $\mathbf{\Phi} = \mathbf{W}_N$ and $\mathbf{\Psi} = \mathbf{W}_N^{-1} = \frac{1}{N}\mathbf{W}_N^H$. The elements of matrix $\mathbf{W}_N$ are $W_N^{nk} = e^{-j2\pi nk/N}$.

$$\psi_k(n) = \frac{1}{\sqrt{M}} W_N^{-nk} = \frac{1}{\sqrt{M}} e^{j2\pi nk/N}$$
so that its energy over M measurements (the energy of a column of the measurement matrix A) is
$$\langle \psi_k, \psi_k^* \rangle = \sum_{i=1}^{M} |\psi_k(n_i)|^2 = 1.$$
For the common DFT matrix $\psi_k(n) = W_N^{-nk}/N$, with
$$\langle \psi_k, \psi_k^* \rangle = M/N^2.$$
The Bernoulli random matrix, whose elements take the values $1/\sqrt{N}$ and $-1/\sqrt{N}$, is also used in compressive sensing. An interesting class of measurement matrices is the structured random matrices. One type of such matrices is obtained by random sampling of functions that have a sparse expansion in terms of an orthonormal system. The partial DFT matrix is one such example; the randomness is a result of the random sampling positions. Another, more complex, example of such sampling and a structured random matrix will be presented for the case of nonuniform sampling of a signal x(t), with the DFT as its sparsity domain, at the end of this chapter.
It is assumed that $E_\Psi$ is the same for any k. For normalized basis functions, $E_\Psi = 1$ by definition, and
$$\psi_k(n) = \frac{1}{N} W_N^{-nk} = \frac{1}{N} e^{j2\pi nk/N}.$$
Then $E_\Psi = 1/N$ in the DFT case. Note that the unitary property in the DFT is just Parseval's theorem, since $\mathbf{\Psi}\mathbf{X} = \mathbf{x}$ and $\mathbf{\Psi}\mathbf{Y} = \mathbf{y}$. With $E_\Psi = 1/N$ the relation
$$\langle \mathbf{\Psi}\mathbf{X}, \mathbf{\Psi}\mathbf{Y} \rangle = E_\Psi \langle \mathbf{X}, \mathbf{Y} \rangle$$
results in
$$\sum_{n=0}^{N-1} x(n)y^*(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)Y^*(k),$$
i.e.,
$$\frac{\frac{1}{E_\Psi}\|\mathbf{\Psi}\mathbf{X}\|_2^2 - \|\mathbf{X}\|_2^2}{\|\mathbf{X}\|_2^2} = 0.$$
$$-\|\mathbf{X}_K\|_2^2\, \delta_K \le \frac{1}{E_A}\|\mathbf{A}_K\mathbf{X}_K\|_2^2 - \|\mathbf{X}_K\|_2^2 \le \|\mathbf{X}_K\|_2^2\, \delta_K$$
or
$$1 - \delta_K \le \frac{\frac{1}{E_A}\|\mathbf{A}_K\mathbf{X}_K\|_2^2}{\|\mathbf{X}_K\|_2^2} \le 1 + \delta_K$$
for $0 \le \delta_K < 1$. For $\delta_K = 0$ the isometry property holds for $\mathbf{A}_K$.
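For one fixed support, the symmetric constant $\delta_K$ can be evaluated directly from the eigenvalues of $\frac{1}{E_A}\mathbf{A}_K^H\mathbf{A}_K$; a sketch with a randomly chosen partial DFT support (positions are an assumption of the sketch):

```python
import numpy as np

# delta_K for one fixed support from the eigenvalues d_i of (1/E_A) A_K^H A_K:
# delta = max(1 - d_min, d_max - 1).
rng = np.random.default_rng(0)
N, M, K = 64, 32, 4

ni = np.sort(rng.choice(N, M, replace=False))            # random sample positions
k = rng.choice(N, K, replace=False)                      # random support
A_K = np.exp(1j * 2 * np.pi * np.outer(ni, k) / N) / N   # K columns of A
E_A = M / N**2                                           # energy of a column

d = np.linalg.eigvalsh(A_K.conj().T @ A_K / E_A)         # real, nonnegative
delta = max(1 - d.min(), d.max() - 1)
print(0 <= delta)    # True: trace is K, so the mean eigenvalue is 1
```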
10.3.3 Coherence
where
$$\mu(m, k) = \frac{1}{\sum_{i=1}^{M}|\psi_k(n_i)|^2} \sum_{i=1}^{M} \psi_m(n_i)\psi_k^*(n_i) = \frac{1}{E_A}\langle \psi_m, \psi_k^* \rangle \tag{10.12}$$
and $\psi_k$ is the kth column of matrix A, with $E_A = \langle \psi_k, \psi_k^* \rangle$. This index plays an important role in the analysis of measurement matrices.
The coherence index cannot be arbitrarily small for an $M \times N$ matrix A ($M < N$). The Welch bound
$$\mu \ge \sqrt{\frac{N-M}{M(N-1)}} \tag{10.13}$$
holds.
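A quick numerical check of (10.13) on a partial DFT matrix can be sketched as follows (the random sample positions are an assumption of the sketch):

```python
import numpy as np

# Coherence of a partial DFT matrix versus the Welch bound
# mu >= sqrt((N - M) / (M * (N - 1))).
rng = np.random.default_rng(1)
N, M = 32, 8
ni = np.sort(rng.choice(N, M, replace=False))

A = np.exp(1j * 2 * np.pi * np.outer(ni, np.arange(N)) / N)  # column energy E_A = M
G = A.conj().T @ A / M                                       # entries mu(m, k)
mu = np.abs(G - np.eye(N)).max()                             # coherence index

welch = np.sqrt((N - M) / (M * (N - 1)))
print(mu >= welch - 1e-12)    # True
```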
The Welch limit for a matrix A whose columns have energy $E_A$ will be proven next.

Denote the elements of the matrix $\frac{1}{E_A}\mathbf{A}^H\mathbf{A}$ by b(m, k). By definition, the trace of this matrix is the sum of its diagonal elements,
$$\operatorname{Trace}\Big\{\frac{1}{E_A}\mathbf{A}^H\mathbf{A}\Big\} = \sum_{m=1}^{N} b(m, m) = N.$$
The trace and the energy are related to the eigenvalues $\lambda_i$ of $\frac{1}{E_A}\mathbf{A}^H\mathbf{A}$ as
$$\operatorname{Trace}\Big\{\frac{1}{E_A}\mathbf{A}^H\mathbf{A}\Big\} = \sum_{i=1}^{M} \lambda_i$$
$$\Big\|\frac{1}{E_A}\mathbf{A}^H\mathbf{A}\Big\|_2^2 = \sum_{m=1}^{N}\sum_{k=1}^{N} |b(m,k)|^2 = \sum_{i=1}^{M} \lambda_i^2.$$
We may write
$$N^2 = \Big(\operatorname{Trace}\Big\{\frac{1}{E_A}\mathbf{A}^H\mathbf{A}\Big\}\Big)^2 = \Big(\sum_{i=1}^{M}\lambda_i\Big)^2 \le M\sum_{i=1}^{M}\lambda_i^2 = M\Big\|\frac{1}{E_A}\mathbf{A}^H\mathbf{A}\Big\|_2^2. \tag{10.14}$$
Here, Schwartz's inequality
$$\frac{(\lambda_1 + \lambda_2 + \ldots + \lambda_M)^2}{\lambda_1^2 + \lambda_2^2 + \ldots + \lambda_M^2} \le M \tag{10.15}$$
is used. Since the elements b(m, k) are equal to the scalar products (10.12) of the columns $\psi_m(n_i)$ and $\psi_k(n_i)$, then
$$\Big\|\frac{1}{E_A}\mathbf{A}^H\mathbf{A}\Big\|_2^2 = \sum_{m=1}^{N}\sum_{k=1}^{N} |b(m,k)|^2 = \sum_{m=1}^{N}\sum_{k=1}^{N} \Big|\frac{1}{E_A}\langle \psi_m, \psi_k^*\rangle\Big|^2 = \sum_{m=1}^{N}\sum_{k=1}^{N} |\mu(m,k)|^2 \le N + N(N-1)\mu^2. \tag{10.16}$$
Combining (10.14) and (10.16),
$$N^2 \le M\big(N + N(N-1)\mu^2\big),$$
which, solved for $\mu$, gives the Welch bound (10.13).
The equality holds for matrices that form an equiangular tight frame.
From the presented proof for the Welch bound we can see that the two inequalities in (10.14) and (10.16) become equalities if
$$\lambda_1 = \lambda_2 = \ldots = \lambda_M$$
and
$$|\langle \psi_m, \psi_k^* \rangle| = \mu \text{ for all } m \neq k.$$
Since Schwartz's inequality will be used a few more times (in various forms) within this chapter, we will present its proof here. Note that with y(n) = 1 and $x(n) = \lambda_n$ it produces (10.15). The inequality easily follows from
$$0 \le \sum_{n=1}^{M}\sum_{m=1}^{M} \big(x(n)y(m) - x(m)y(n)\big)^2$$
$$= \sum_{n=1}^{M}\sum_{m=1}^{M} x^2(n)y^2(m) - 2\sum_{n=1}^{M}\sum_{m=1}^{M} x(n)y(n)x(m)y(m) + \sum_{n=1}^{M}\sum_{m=1}^{M} x^2(m)y^2(n).$$
Since the first and last sums are equal, Schwartz's inequality follows from
$$2\sum_{n=1}^{M} x^2(n) \sum_{m=1}^{M} y^2(m) - 2\Big(\sum_{n=1}^{M} x(n)y(n)\Big)^2 \ge 0.$$
With y(n) = 1 and x(n) = |x(n)|, Schwartz's inequality can also be written as
$$\Big(\sum_{n=1}^{M} |x(n)|\Big)^2 \le M \sum_{n=1}^{M} |x(n)|^2 \tag{10.17}$$
or
$$\|\mathbf{x}\|_1 \le \sqrt{M}\,\|\mathbf{x}\|_2 \quad \text{or} \quad \|\mathbf{x}\|_2 \ge \frac{1}{\sqrt{M}}\|\mathbf{x}\|_1$$
with
$$\|\mathbf{x}\|_1 = \sum_{n=1}^{M} |x(n)| \quad \text{and} \quad \|\mathbf{x}\|_2 = \sqrt{\sum_{n=1}^{M} |x(n)|^2}.$$
Equality in this relation holds when |x(n)| = Cy(n) = C, i.e., for $|x(1)| = |x(2)| = \ldots = |x(M)|$.
For a K-sparse vector X,
$$\Big(\sum_{i=1}^{K} |X(k_i)|\Big)^2 \le K \sum_{i=1}^{K} |X(k_i)|^2$$
holds, i.e.,
$$\|\mathbf{X}\|_2 \ge \frac{1}{\sqrt{K}}\|\mathbf{X}\|_1.$$
Using $E_A = \sum_{i=1}^{M}|\psi_{n_i}(k)|^2$ and $\mu(k_1,k_2) = \frac{1}{E_A}\sum_{i=1}^{M}\psi_{n_i}(k_1)\psi_{n_i}^*(k_2)$ we get
$$\|\mathbf{A}\mathbf{X}\|_2^2 = E_A\sum_{k=0}^{N-1}|X(k)|^2 + E_A\sum_{k_1=0}^{N-1}\sum_{k_2=k_1+1}^{N-1} 2\operatorname{Re}\{X(k_1)X^*(k_2)\mu(k_1,k_2)\}. \tag{10.19}$$
Since the restricted isometry property reads
$$\Big|\frac{1}{E_A}\|\mathbf{A}\mathbf{X}\|_2^2 - \|\mathbf{X}\|_2^2\Big| \le \delta_K \|\mathbf{X}\|_2^2, \tag{10.20}$$
the value on the right side of the inequality is highly signal dependent. We will find an estimate of its bound. Since
$$\frac{1}{E_A}\|\mathbf{A}\mathbf{X}\|_2^2 - \|\mathbf{X}\|_2^2 = \sum_{k_1=0}^{N-1}\sum_{k_2=k_1+1}^{N-1} 2\operatorname{Re}\{X(k_1)X^*(k_2)\mu(k_1,k_2)\},$$
we can write
$$\delta_K \le \mu \max\left\{\frac{\sum_{k_1=0}^{N-1}\sum_{k_2=k_1+1}^{N-1} 2|X(k_1)X^*(k_2)|}{\sum_{k=0}^{N-1}|X(k)|^2}\right\}.$$
For a signal of sparsity K = 2, with nonzero coefficients $X(k_1)$ and $X(k_2)$,
$$\|\mathbf{A}_2\mathbf{X}\|_2^2 = E_A\sum_{k=0}^{N-1}|X(k)|^2 + E_A\, 2\operatorname{Re}\{X(k_1)X^*(k_2)\mu(k_1,k_2)\}$$
and $\delta_2 \le \mu$, since
$$\frac{|X(k_1)|^2 + |X(k_2)|^2}{|X(k_1)X^*(k_2)|} \ge 2. \tag{10.23}$$
The maximal value in Schwartz’s inequality (10.22) is achieved for | X (k1 )| =
| X (k2 )| and µ = max |µ(k1 , k2 )|. Inequality (10.23) easily reduces to the well
known inequality
$$a + \frac{1}{a} \ge 2$$
for $a > 0$, with $a + \frac{1}{a} = 2$ for $a = 1$. Since the limit value may be achieved for
a specific signal, if our aim is that (10.20) holds for any signal, we may write
δ2 = µ.
For a signal of sparsity K = 3, similarly,
$$\frac{\Big|\frac{1}{E_A}\|\mathbf{A}_3\mathbf{X}\|_2^2 - \|\mathbf{X}\|_2^2\Big|}{\|\mathbf{X}\|_2^2} \le 2\mu \frac{|X(k_1)X^*(k_2)| + |X(k_1)X^*(k_3)| + |X(k_2)X^*(k_3)|}{|X(k_1)|^2 + |X(k_2)|^2 + |X(k_3)|^2}$$
$$= \left(\frac{\big(|X(k_1)| + |X(k_2)| + |X(k_3)|\big)^2}{|X(k_1)|^2 + |X(k_2)|^2 + |X(k_3)|^2} - 1\right)\mu \le (3-1)\mu = 2\mu.$$
In general, for a K-sparse signal,
$$\frac{2\sum_{i=1}^{K}\sum_{j=i+1}^{K} |X(k_i)X^*(k_j)|}{\sum_{i=1}^{K}|X(k_i)|^2} = \frac{\Big(\sum_{i=1}^{K}|X(k_i)|\Big)^2}{\sum_{i=1}^{K}|X(k_i)|^2} - 1 \le (K-1),$$
so that
$$\delta_K \le (K-1)\delta_2 = (K-1)\mu$$
and
$$\Big|\frac{1}{E_A}\|\mathbf{A}\mathbf{X}\|_2^2 - \|\mathbf{X}\|_2^2\Big| \le (K-1)\mu\|\mathbf{X}\|_2^2.$$
In general, this does not mean that there is no lower value of the bound $\delta_K$ for which the restricted isometry inequality is satisfied. This is just an estimate of the upper bound of the constant $\delta_K$. Equality can be checked by examining the imposed inequality conditions.
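The estimate $\delta_K \le (K-1)\mu$ can be verified exhaustively for a small matrix; the sample positions below are hypothetical, and the check runs over every possible support:

```python
import numpy as np
from itertools import combinations

# Verify delta_K <= (K - 1) * mu over all K-element supports of a small
# partial DFT matrix (N = 8, M = 6).
N, M, K = 8, 6, 2
ni = np.array([0, 1, 3, 4, 6, 7])                        # hypothetical positions

A = np.exp(1j * 2 * np.pi * np.outer(ni, np.arange(N)) / N) / N
E_A = M / N**2
G = A.conj().T @ A / E_A                                 # normalized Gram matrix

mu = np.abs(G - np.eye(N)).max()                         # coherence index
delta_K = 0.0
for supp in combinations(range(N), K):
    d = np.linalg.eigvalsh(G[np.ix_(supp, supp)])        # eigenvalues d_i
    delta_K = max(delta_K, 1 - d.min(), d.max() - 1)

print(delta_K <= (K - 1) * mu + 1e-12)    # True (Gershgorin disc argument)
```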
For the DFT matrix, with $|\psi_{n_i}(k)| = |e^{j2\pi n_i k/N}/N| = 1/N$ and $E_A = M/N^2$, from (10.19) we get
$$\|\mathbf{A}\mathbf{X}\|_2^2 = \frac{M}{N^2}\sum_{k=0}^{N-1}|X(k)|^2 + \frac{1}{N^2}\sum_{k_1=0}^{N-1}\sum_{k_2=k_1+1}^{N-1} 2\operatorname{Re}\Big\{X(k_1)X^*(k_2)\sum_{i=1}^{M} e^{j2\pi n_i(k_1-k_2)/N}\Big\}. \tag{10.24}$$
Introducing the notation
$$\alpha = \frac{1}{M\|\mathbf{X}\|_2^2}\sum_{k_1=0}^{N-1}\sum_{k_2=k_1+1}^{N-1} 2\operatorname{Re}\Big\{X(k_1)X^*(k_2)\sum_{i=1}^{M} e^{j2\pi n_i(k_1-k_2)/N}\Big\} \tag{10.25}$$
we can write
$$\frac{N^2}{M}\|\mathbf{A}\mathbf{X}\|_2^2 = \|\mathbf{X}\|_2^2 + \alpha\|\mathbf{X}\|_2^2. \tag{10.26}$$
For M = N it is easy to check that the isometry property $N\|\mathbf{A}\mathbf{X}\|_2^2 = \|\mathbf{X}\|_2^2$ holds.
$$d_{\min} \le \frac{\|\mathbf{B}\mathbf{X}\|_2^2}{\|\mathbf{X}\|_2^2} = \frac{\mathbf{X}^T\mathbf{B}^T\mathbf{B}\mathbf{X}}{\mathbf{X}^T\mathbf{X}} \le d_{\max},$$
where $d_{\min}$ and $d_{\max}$ denote the minimal and maximal eigenvalues of the Gram matrix $\mathbf{B}^T\mathbf{B}$. The eigenvalues of a Gram matrix are real and nonnegative. In our case
$$\mathbf{B}^T\mathbf{B} = \frac{1}{E_A}\mathbf{A}_K^T\mathbf{A}_K.$$
Using this inequality we can write
$$d_{\min} \le \frac{\frac{1}{E_A}\|\mathbf{A}_K\mathbf{X}_K\|_2^2}{\|\mathbf{X}_K\|_2^2} \le d_{\max}$$
or
$$1 - \delta_{\min} \le \frac{\frac{1}{E_A}\|\mathbf{A}_K\mathbf{X}_K\|_2^2}{\|\mathbf{X}_K\|_2^2} \le 1 + \delta_{\max},$$
where the constants $\delta_{\min}$ and $\delta_{\max}$ are defined by $\delta_{\min} = 1 - d_{\min}$ and $\delta_{\max} = d_{\max} - 1$. A symmetric form of the restricted isometry property is commonly used, with
$$\delta_K = \max\{\delta_{\min}, \delta_{\max}\}.$$
A symmetric restricted isometry property inequality
$$1 - \delta_K \le \frac{\frac{1}{E_A}\|\mathbf{A}_K\mathbf{X}_K\|_2^2}{\|\mathbf{X}_K\|_2^2} \le 1 + \delta_K$$
is obtained. It can be related to the condition number of the matrix $\frac{1}{E_A}\mathbf{A}_K^T\mathbf{A}_K$, defined by
$$\operatorname{cond}\Big\{\frac{1}{E_A}\mathbf{A}_K^T\mathbf{A}_K\Big\} = \frac{d_{\max}}{d_{\min}}.$$
Since $1 - \delta_K \le d_{\min} \le d_{\max} \le 1 + \delta_K$, it means
$$\operatorname{cond}\Big\{\frac{1}{E_A}\mathbf{A}_K^T\mathbf{A}_K\Big\} \le \frac{1 + \delta_K}{1 - \delta_K}.$$
Small values of $\delta_K$, close to 0, mean robust and stable invertibility of the Gram matrix. In theory $0 \le \delta_K < 1$ is sufficient.
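The relation between $\delta_K$ and the condition number can be illustrated numerically; the Gaussian measurement matrix and its $1/\sqrt{M}$ scaling (so that the column energy is 1 on average) are assumptions of this sketch:

```python
import numpy as np

# cond{(1/E_A) A_K^T A_K} <= (1 + delta_K) / (1 - delta_K), checked on one
# random Gaussian measurement matrix.
rng = np.random.default_rng(2)
M, K = 32, 4

A_K = rng.standard_normal((M, K)) / np.sqrt(M)   # column energy ~ 1, E_A ~ 1
d = np.linalg.eigvalsh(A_K.T @ A_K)              # Gram-matrix eigenvalues d_i
delta = max(1 - d.min(), d.max() - 1)            # symmetric RIP constant

if delta < 1:                                    # stable invertibility region
    cond = d.max() / d.min()
    print(cond <= (1 + delta) / (1 - delta) + 1e-9)   # True by construction
```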
If the eigenvalues of the matrix $\frac{1}{E_A}\mathbf{A}_K^T\mathbf{A}_K$ are denoted by $d_i$ then, by definition,
$$\det\Big(\frac{1}{E_A}\mathbf{A}_K^T\mathbf{A}_K - d_i\mathbf{I}\Big) = 0$$
and
$$\lambda_i = d_i - 1.$$
In the symmetric case the restricted isometry property bounds $\delta_{\min}$, $\delta_{\max}$ are symmetric for small sparsity, while for large sparsity the value $\delta_{\max}$ dominates. It is common to calculate $\delta_K = \delta_{\max}$ or
$$\delta_K = d_{\max} - 1 = \lambda_{\max} = \max\Big\{\operatorname{eig}\Big(\frac{1}{E_A}\mathbf{A}_K^T\mathbf{A}_K - \mathbf{I}\Big)\Big\}, \tag{10.27}$$
derived in the literature for a large M. Dashed thick vertical lines indicate the values $\sqrt{2}-1$ and $-(\sqrt{2}-1)$ for λ. Later it will be shown that these limits play an important role in the definition of a sufficiently small $\delta_K$. The absolute reconstruction limit $\delta_K = 1$ is achieved first with
$$E\{d_{\max}(M,K)\} = \Big(1 + \sqrt{K/M}\Big)^2 = 2, \quad \text{i.e.,} \quad \sqrt{K/M} \le \sqrt{2} - 1 \text{ for } K \le 0.1716M.$$
We can see that the case K = 16 is the last one whose eigenvalues in 10,000 realizations are within the limits, meaning that M = 1024 observations are sufficient for a unique reconstruction (in the sense of these limits) of a K = 8 sparse signal (for a K-sparse signal the reconstruction requires that all limits and constants are satisfied for a 2K-sparse signal). Note that the presented values are only the mean values. The values $d_{\max}(M,K)$ and $d_{\min}(M,K)$ are random variables. The minimal and maximal values obtained in 10,000 realizations are given in the table.
The limit $\sqrt{2}-1$ in λ, or $\sqrt{2}$ in d, is achieved using (10.28) for K = 0.0358M. For M = 1024 its value is K = 36.6. Therefore this kind of bound estimate is optimistic. The value of the bound determined by the mean value is lower than the bound based on the maximal value of a random variable, as we can see from the table. Calculation of the bounds with a satisfactory probability, taking into account the stochastic nature of the eigenvalue limits, may be found in the literature.
K = 8, λmin = −0.24, λmax = 0.27,
K = 16, λmin = −0.30, λmax = 0.35,
K = 24, λmin = −0.34, λmax = 0.41,
K = 32, λmin = −0.37, λmax = 0.48,
K = 64, λmin = −0.47, λmax = 0.65,
K = 128, λmin = −0.60, λmax = 0.91,
K = 256, λmin = −0.76, λmax = 1.32,
K = 1024, λmin = −0.98, λmax = 3.08.
Limit cases for K/M ≪ 1 and for the case K = M easily follow.
Example 10.4. Write the full DFT transformation matrix for a signal of N = 8
samples.
(a) Show that it satisfies the unitary and isometry property (restricted
isometry property with δ = 0).
(b) Write the measurement matrix A if the number of available signal
samples/measurements in time domain is M = 6.
(c) If the sparsity in the DFT domain is K = 2 what is the form of the
submatrix A2 and the isometry constant δ2 .
(d) Write δ2 in terms of coherence index µ.
(e) Consider the cases K = 3 and K = 4. Comment on the results.
Figure 10.5 Histograms (normalized) of the eigenvalues of $\mathbf{A}_K^T\mathbf{A}_K$ (Wishart matrix) and of the matrix $\mathbf{A}_K^T\mathbf{A}_K - \mathbf{I}$, for N = 2048, M = 1024 and K = 8, 16, 32, 256, 1024. Dashed thick vertical lines show the limits $\sqrt{2}-1$ and $-(\sqrt{2}-1)$ sufficient for unique K/2 signal reconstruction.
x= ΨX
x = [ x (0), x (1), x (2), x (3), x (4), x (5), x (6), x (7)] T
X = [ X (0), X (1), X (2), X (3), X (4), X (5), X (6), X (7)] T
$$\mathbf{\Psi} = \frac{1}{8}\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & W_8^{1} & W_8^{2} & W_8^{3} & W_8^{4} & W_8^{5} & W_8^{6} & W_8^{7} \\
1 & W_8^{2} & W_8^{4} & W_8^{6} & W_8^{8} & W_8^{10} & W_8^{12} & W_8^{14} \\
1 & W_8^{3} & W_8^{6} & W_8^{9} & W_8^{12} & W_8^{15} & W_8^{18} & W_8^{21} \\
1 & W_8^{4} & W_8^{8} & W_8^{12} & W_8^{16} & W_8^{20} & W_8^{24} & W_8^{28} \\
1 & W_8^{5} & W_8^{10} & W_8^{15} & W_8^{20} & W_8^{25} & W_8^{30} & W_8^{35} \\
1 & W_8^{6} & W_8^{12} & W_8^{18} & W_8^{24} & W_8^{30} & W_8^{36} & W_8^{42} \\
1 & W_8^{7} & W_8^{14} & W_8^{21} & W_8^{28} & W_8^{35} & W_8^{42} & W_8^{49}
\end{bmatrix}^{*}$$
where * denotes the complex conjugate and $W_8^{nk} = \exp(-j2\pi nk/8)$. The transformation matrix W is a unitary matrix, according to Parseval's theorem.
As expected for the full DFT matrix, the isometry property is satisfied, since
$$\frac{\big|N\|\mathbf{\Psi}\mathbf{X}\|_2^2 - \|\mathbf{X}\|_2^2\big|}{\|\mathbf{X}\|_2^2} \le \delta \quad \text{with } \delta = 0.$$
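The unitarity and isometry claims of part (a) can be checked numerically; the random test vector in this sketch is arbitrary:

```python
import numpy as np

# Full N = 8 DFT pair: Psi = W^H / N (inverse transform), Phi = W (forward).
# Then Psi^H Psi = I / N, so N * ||Psi X||^2 = ||X||^2 (delta = 0 isometry).
N = 8
n, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
W = np.exp(-1j * 2 * np.pi * n * k / N)   # forward DFT matrix, W8^(nk)
Psi = W.conj().T / N                      # inverse transform matrix

rng = np.random.default_rng(3)
X = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x = Psi @ X

print(np.allclose(N * np.linalg.norm(x)**2, np.linalg.norm(X)**2))   # True
```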
$$n_i \in \{n_1, n_2, n_3, n_4, n_5, n_6\} = \mathbb{M} \subset \mathbb{N} = \{0, 1, 2, 3, 4, 5, 6, 7\}$$
with
y = AX
$$\mathbf{A} = \frac{1}{8}\begin{bmatrix}
1 & W_8^{n_1} & W_8^{2n_1} & W_8^{3n_1} & W_8^{4n_1} & W_8^{5n_1} & W_8^{6n_1} & W_8^{7n_1} \\
1 & W_8^{n_2} & W_8^{2n_2} & W_8^{3n_2} & W_8^{4n_2} & W_8^{5n_2} & W_8^{6n_2} & W_8^{7n_2} \\
1 & W_8^{n_3} & W_8^{2n_3} & W_8^{3n_3} & W_8^{4n_3} & W_8^{5n_3} & W_8^{6n_3} & W_8^{7n_3} \\
1 & W_8^{n_4} & W_8^{2n_4} & W_8^{3n_4} & W_8^{4n_4} & W_8^{5n_4} & W_8^{6n_4} & W_8^{7n_4} \\
1 & W_8^{n_5} & W_8^{2n_5} & W_8^{3n_5} & W_8^{4n_5} & W_8^{5n_5} & W_8^{6n_5} & W_8^{7n_5} \\
1 & W_8^{n_6} & W_8^{2n_6} & W_8^{3n_6} & W_8^{4n_6} & W_8^{5n_6} & W_8^{6n_6} & W_8^{7n_6}
\end{bmatrix}^{*}$$
For the isometry property this matrix is a special case of (10.24), using only $k = k_1$ and $k = k_2$:
$$\|\mathbf{A}_2\mathbf{X}\|_2^2 = \frac{6}{64}\big(|X(k_1)|^2 + |X(k_2)|^2\big) + \frac{2}{64}\operatorname{Re}\Big\{X(k_1)X^*(k_2)\sum_{i=1}^{6} W_8^{-n_ik_1}W_8^{n_ik_2}\Big\}$$
$$\frac{\frac{64}{6}\|\mathbf{A}_2\mathbf{X}\|_2^2 - \big(|X(k_1)|^2 + |X(k_2)|^2\big)}{|X(k_1)|^2 + |X(k_2)|^2} = \frac{\frac{2}{6}\operatorname{Re}\Big\{X(k_1)X^*(k_2)\sum_{i=1}^{6} W_8^{-n_ik_1}W_8^{n_ik_2}\Big\}}{|X(k_1)|^2 + |X(k_2)|^2}.$$
(d) Using the coherence definition
$$\mu(k_1, k_2) = \frac{\langle \psi_{k_1}, \psi_{k_2}^* \rangle}{\langle \psi_{k_1}, \psi_{k_1}^* \rangle},$$
we obtain
$$\mu(k_1, k_2) = \frac{1}{6}\sum_{i=1}^{6} W_8^{-n_ik_1}W_8^{n_ik_2} = \frac{1}{6}\sum_{i=1}^{6} e^{j2\pi n_i(k_1-k_2)/8}.$$
The maximal value in this inequality is achieved for | X (k1 )| = | X (k2 )| and
max |µ(k1 , k2 )| = µ. Having in mind inequality for Re { X (k1 ) X ∗ (k2 )µ(k1 , k2 )}
the overall maximum is achieved for | X (k1 )| = | X (k2 )| with
where r is an integer.
To comment on the results, consider the least squares solution of the system
$$\mathbf{A}_2\mathbf{X} = \mathbf{y}$$
$$\mathbf{A}_2^H\mathbf{A}_2\mathbf{X} = \mathbf{A}_2^H\mathbf{y}$$
$$\mathbf{X} = \big(\mathbf{A}_2^H\mathbf{A}_2\big)^{-1}\mathbf{A}_2^H\mathbf{y}$$
$$\mathbf{X} = \left(\begin{bmatrix}
W_8^{n_1k_1} & W_8^{n_1k_2} \\
W_8^{n_2k_1} & W_8^{n_2k_2} \\
W_8^{n_3k_1} & W_8^{n_3k_2} \\
W_8^{n_4k_1} & W_8^{n_4k_2} \\
W_8^{n_5k_1} & W_8^{n_5k_2} \\
W_8^{n_6k_1} & W_8^{n_6k_2}
\end{bmatrix}^{T}
\begin{bmatrix}
W_8^{n_1k_1} & W_8^{n_1k_2} \\
W_8^{n_2k_1} & W_8^{n_2k_2} \\
W_8^{n_3k_1} & W_8^{n_3k_2} \\
W_8^{n_4k_1} & W_8^{n_4k_2} \\
W_8^{n_5k_1} & W_8^{n_5k_2} \\
W_8^{n_6k_1} & W_8^{n_6k_2}
\end{bmatrix}^{*}\right)^{-1}\mathbf{X}_0$$
where
$$\mathbf{X}_0 = N\mathbf{A}_2^H\mathbf{y} \tag{10.30}$$
and $\mathbf{A}_2^H = \big(\mathbf{A}_2^*\big)^T$. Then, computing $\mathbf{A}_2^H\mathbf{A}_2$, we get
$$\mathbf{X} = \begin{bmatrix} M & M\mu^*(k_1,k_2) \\ M\mu(k_1,k_2) & M \end{bmatrix}^{-1}\mathbf{X}_0, \tag{10.31}$$
with
$$\mu(k_1,k_2) = \frac{1}{M}\sum_{i=1}^{M} e^{j2\pi n_i(k_1-k_2)/N}.$$
Obviously, if
$$\mu(k_1,k_2) = \frac{1}{M}\sum_{i=1}^{M} e^{j2\pi n_i(k_1-k_2)/N} = \pm 1,$$
i.e., when
$$\rho_2 = \mu = \max|\mu(i,k)| = 1,$$
the system does not have a (unique) solution. This means that the measurements y(n) are not independent and that, during the projection of the N-dimensional space of the sparse vector X onto the space of dimension M < N by the linear transformation AX = y, the information about one of the two nonzero coordinates is lost, i.e., it is projected to zero and cannot be recovered.
The inversion robustness in (10.31) is the highest when µ(k1 , k2 ) = 0.
The reconstruction is done in this case using the identity matrix. For values
of µ(k1 , k2 ) increasing toward 1 the determinant value M2 (1 − µ2 (k1 , k2 ))
reduces. It means the results in the reconstruction are multiplied by 1/ ( M2 −
M2 µ2 (k1 , k2 )). If there is noise in the measurements y, i.e., in the initial
estimate X0 = A H y, then the noise in the reconstruction will be increased,
meaning degradation of the signal-to-noise ratio. Therefore the values of
ρ2 = max |µ(i, k )| close to 1 are not desirable in the reconstruction, although
in theory, the reconstruction is possible. Reduction of the value of isometry
constant ρK toward zero will be of crucial importance in the application of
some reconstruction algorithms that will be presented later.
The values of
$$\mu(k_1,k_2) = \frac{1}{M}\sum_{i=1}^{M} e^{j2\pi n_i(k_1-k_2)/N} \tag{10.32}$$
for the DFT matrix are calculated for all possible (k1 , k2 ) and presented in
Fig.10.6. The coherence index value is equal to the maximal absolute value
of µ(k1 , k2 ). Signals of sparsity K = 2 (top), K = 3 (middle), and K = 4
(bottom) are considered for all possible positions of the available samples
ni and nonzero coefficients k i . The restricted isometry constant for this signal
with N = 8 samples and M = 6 observations (available samples) at ni for
i = 1, 2, 3, 4, 5, 6 is also calculated. The restricted isometry property constant $\delta_K$ is calculated by using the eigenvalues of the matrix $\mathbf{\Lambda} = \operatorname{eig}\big(\frac{1}{6}\mathbf{A}_K^H\mathbf{A}_K - \mathbf{I}\big)$ for all possible nonzero positions of X(k), as in (10.27). Then, for example for K = 2, $\delta(k_1,k_2) = \lambda_{\max} = \max\{\mathbf{\Lambda}\}$ is calculated for each possible $\mathbf{A}_K$. Finally $\delta_2 = \max_{k_1,k_2}\delta(k_1,k_2)$. Note that in this case equality in $\delta_K \le (K-1)\mu$ holds for all K, where $\mu = \max_{k_1,k_2}|\mu(k_1,k_2)|$, Fig. 10.6.
(e) The calculation is done for K = 3 and K = 4 as well. The restricted
isometry property is not satisfied for matrix AK in the case K = 4. The
[Figure panels: histograms of $\max|\mu(k_i,k_j)|$ and of $\delta(k_1,\ldots,k_K)$ over all positions $k_i$ and $n_i$, for N = 8, M = 6; $\delta_2 = 0.333$ for K = 2 and $\delta_4 = 1$ for K = 4.]
Figure 10.6 The coherence index value and the restricted isometry constant for signal with
N = 8 samples and M = 6 observations (available samples) at ni for i = 1, 2, 3, 4, 5, 6. Signals of
sparsity K = 2 (top), K = 3 (middle), and K = 4 (bottom) are considered for all possible positions
of the available samples ni and nonzero coefficients k i . The DFT is the transformation matrix.
From the introductory examples we have seen that for a signal of sparsity
K = 1, two samples/measurements may produce full reconstruction. We
have also shown that any two samples/measurements may not be suffi-
cient. The solution is unique if the determinant of any second order linear
system of equations, for these measurements, is nonzero
$$\det\begin{bmatrix} \psi_i(0) & \phi_k(0) \\ \psi_i(1) & \phi_k(1) \end{bmatrix} \neq 0.$$
y = AX (10.33)
$$\operatorname{rank}(\mathbf{A}_{2K}) = \operatorname{rank}(\mathbf{A}_{2K}^T\mathbf{A}_{2K}).$$
The matrix $\mathbf{A}_{2K}^T\mathbf{A}_{2K}$ is the Gram matrix of $\mathbf{A}_{2K}$. For matrices $\mathbf{A}_{2K}$ with complex elements the conjugate transpose (Hermitian transpose) is used, $\mathbf{A}_{2K}^H\mathbf{A}_{2K}$. A way to check whether the rank of $\mathbf{A}_{2K}^T\mathbf{A}_{2K}$ is 2K is to calculate and check
$$\det(\mathbf{A}_{2K}^T\mathbf{A}_{2K}) = d_1d_2\ldots d_{2K} \neq 0$$
$$1 - \delta_{2K} \le \frac{\frac{1}{E_A}\|\mathbf{A}_{2K}\mathbf{X}_{2K}\|_2^2}{\|\mathbf{X}_{2K}\|_2^2} \le 1 + \delta_{2K}$$
with $\delta_{2K} = \max\{1 - d_{\min}, d_{\max} - 1\}$ and $1 - \delta_{2K} \le d_{\min} \le d_{\max} \le 1 + \delta_{2K}$,
$$\operatorname{cond}\Big\{\frac{1}{E_A}\mathbf{A}_{2K}^T\mathbf{A}_{2K}\Big\} \le \frac{1 + \delta_{2K}}{1 - \delta_{2K}}.$$
y = AX and y = AH.
Then
(AX − AH)=0
A(X − H)=0.
0 ≤ δ2K < 1
1 ≤ rank {A} ≤ M.
The rank of matrix A is rank{A} = 3, since we may easily check that the determinant of a matrix formed using the first three columns of A is nonzero. If that determinant were zero, then before concluding that the rank of A is lower than 3 we should try all possible combinations of columns. If all combinations of 3 columns were dependent, then we should check whether $\operatorname{rank}\{\mathbf{A}\} = \operatorname{rank}\{\mathbf{A}\mathbf{A}^T\} = 2$ by forming all possible 2 × 2 submatrices. If any of them had a nonzero determinant then the rank would be 2; otherwise the rank would be one, when only one nonzero element of matrix A exists.
There are several methods for calculating the rank of a matrix without a combinatorial search. The rank calculation can be simplified using the fact that $\operatorname{rank}\{\mathbf{A}\} = \operatorname{rank}\{\mathbf{A}\mathbf{A}^T\}$. Then only the single matrix
$$\mathbf{A}\mathbf{A}^T = \begin{bmatrix} 19 & 13 & 8 \\ 13 & 19 & 18 \\ 8 & 18 & 32 \end{bmatrix}$$
should be checked for the possible rank 3. Note also that $\det\{\mathbf{A}\mathbf{A}^T\} = \lambda_1\lambda_2\lambda_3$, where $\{\lambda_1, \lambda_2, \lambda_3\} = \operatorname{eig}\{\mathbf{A}\mathbf{A}^T\}$. Therefore, for $\operatorname{rank}\{\mathbf{A}\mathbf{A}^T\} = 3$, all
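The rank test through the Gram matrix can be run directly on the 3 × 3 matrix above:

```python
import numpy as np

# rank{A} = rank{A A^T}: check the Gram matrix from the example for rank 3
# via det{A A^T} = lam1 * lam2 * lam3 != 0.
G = np.array([[19., 13., 8.],
              [13., 19., 18.],
              [8., 18., 32.]])

lam = np.linalg.eigvalsh(G)        # real and nonnegative for a Gram matrix
print(np.isclose(np.prod(lam), np.linalg.det(G)))    # True
print(np.linalg.matrix_rank(G))                      # 3, since det = 2516 != 0
```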
spark {A} = 3.
spark {A} = p
It means that
The spark of the measurement matrix is used for a very simple condition for the existence of the sparsest solution of a minimization problem. If
$$\|\mathbf{X}\|_0 = K$$
and
$$K < \frac{1}{2}\operatorname{spark}\{\mathbf{A}\},$$
then the solution X is unique.
In order to prove this statement consider a matrix A whose spark is
spark {A}. Then for a sparse vector X of sparsity K = spark {A} obviously
there exists such a combination of nonzero elements in X so that they
coincide with the dependent columns. Then we can obtain
AX = 0.
Note that for any X of sparsity K < spark{A} the relation AX = 0 cannot hold, since the nonzero elements of X cannot produce a zero result when multiplied by columns that are independent; K < spark{A} means that any K columns of A are independent.
The proof that K < 12 spark {A} means that X, being solution of AX = y,
is unique, will be based on contradiction.
Assume that X is a solution and that it satisfies K < 12 spark {A} but
that there is another solution H such that AH = y which is also sparse with
sparsity lower than the sparsity of X, i.e., lower than 12 spark {A}.
Since
$$\mathbf{A}\mathbf{H} = \mathbf{A}\mathbf{X} = \mathbf{y},$$
it follows that
$$\mathbf{A}(\mathbf{H} - \mathbf{X}) = \mathbf{0},$$
and, by the definition of the spark,
$$\operatorname{spark}\{\mathbf{A}\} \le \|\mathbf{H} - \mathbf{X}\|_0 \le \|\mathbf{X}\|_0 + \|\mathbf{H}\|_0.$$
If there were another solution H such that $\|\mathbf{H}\|_0 < \frac{1}{2}\operatorname{spark}\{\mathbf{A}\}$, then from the last inequality it would follow that $\|\mathbf{X}\|_0 > \frac{1}{2}\operatorname{spark}\{\mathbf{A}\}$. This is a contradiction to the assumption that both solutions H and X have sparsity lower than $\frac{1}{2}\operatorname{spark}\{\mathbf{A}\}$.
The spark of a matrix can be related to the coherence of the matrix. The relation is
$$\operatorname{spark}\{\mathbf{A}\} \ge 1 + \frac{1}{\mu(\mathbf{A})},$$
where μ(A) (or just μ) is the coherence index of matrix A. The proof is based on the positivity of the quadratic form of the matrix $\mathbf{A}^T\mathbf{A}$.
The coherence index value is (10.29)
$$\mu(\mathbf{A}) = \max_{i \neq k}|\mu(i,k)| = \max_{i \neq k}\frac{|\langle \psi_i, \psi_k^* \rangle|}{|\langle \psi_i, \psi_i^* \rangle|},$$
where $\psi_i$ are the columns of matrix A. It is assumed that all columns are of equal energy, $\frac{1}{M}\langle \psi_i, \psi_i^* \rangle = 1$.
The maximal possible value of the spark is spark{A} = M + 1, when every set of M columns is linearly independent. Then
$$K < \frac{1}{2}(M + 1).$$
For a K-sparse signal we must therefore have at least
$$M \ge 2K.$$
spark{A} = M + 1 with a very high probability. However, in the cases of noisy signals or approximately sparse signals, more robust calculations are required, increasing the number of required observations. For a square orthogonal matrix A the coherence index is μ(A) = 0 and for that matrix spark{A} → ∞, by definition.
For the illustrative example from the beginning of this chapter we
had a condition that one false bag can be discovered if we performed two
measurements
$$\begin{bmatrix} y(0) \\ y(1) \end{bmatrix} =
\begin{bmatrix}
\psi_0(0) & \psi_1(0) & \cdots & \psi_{N-1}(0) \\
\psi_0(1) & \psi_1(1) & \cdots & \psi_{N-1}(1)
\end{bmatrix}
\begin{bmatrix} X(0) \\ X(1) \\ \vdots \\ X(N-1) \end{bmatrix}$$
$$\mathbf{y} = \mathbf{A}\mathbf{X}$$
such that $\psi_i(0)\phi_k(1) - \psi_i(1)\phi_k(0) \neq 0$ for any combination of columns i and k. It means that no two columns are dependent, i.e., that
$$\frac{\psi_i(0)}{\psi_k(0)} = \frac{\psi_i(1)}{\psi_k(1)} \text{ does not hold for any } i \neq k.$$
Assuming that there is no all-zero column, then spark{A} = 3, meaning that a signal X of sparsity $K < \frac{1}{2}(2+1)$ can be recovered.
Within this framework we can now consider the case with three mea-
surements
$$\begin{bmatrix} y(0) \\ y(1) \\ y(2) \end{bmatrix} =
\begin{bmatrix}
\psi_0(0) & \psi_1(0) & \cdots & \psi_{N-1}(0) \\
\psi_0(1) & \psi_1(1) & \cdots & \psi_{N-1}(1) \\
\psi_0(2) & \psi_1(2) & \cdots & \psi_{N-1}(2)
\end{bmatrix}
\begin{bmatrix} X(0) \\ X(1) \\ \vdots \\ X(N-1) \end{bmatrix}$$
$$\mathbf{y} = \mathbf{A}\mathbf{X}.$$
$$\frac{\psi_i(0)}{\psi_k(0)} = \frac{\psi_i(1)}{\psi_k(1)} = \frac{\psi_i(2)}{\psi_k(2)} \text{ does not hold for any } i \neq k.$$
In the notation of determinants, it means that
$$\det\begin{bmatrix} \psi_i(0) & \phi_k(0) \\ \psi_i(1) & \phi_k(1) \end{bmatrix} \neq 0 \quad \text{or} \quad \det\begin{bmatrix} \psi_i(1) & \phi_k(1) \\ \psi_i(2) & \phi_k(2) \end{bmatrix} \neq 0.$$
In the terminology of a matrix rank it means that $\operatorname{rank}\{\mathbf{A}_2\} = 2$ for any submatrix $\mathbf{A}_2$ of two columns of A. The matrix $\mathbf{A}_2$ has two columns and M rows. For the rank calculation there is no need for combinations over rows, since
$$\operatorname{rank}\{\mathbf{A}_2\} = \operatorname{rank}\{\mathbf{A}_2^T\mathbf{A}_2\},$$
Since the variance of $\psi_{k_1}(n_i)$ is 1/M, then $E_A = 1$. For Gaussian variables the variance of the random variable $\mu(k_1,k_2)$ is
$$\sigma^2 = M\sigma_\psi^2\sigma_\psi^2 = M\frac{1}{M}\frac{1}{M} = 1/M$$
(see Problem 7.13). As a sum of a large number of random variables, the resulting variable $\mu(k_1,k_2)$ can be considered as Gaussian with variance $\sigma^2 = 1/M$. Since $\delta_{2K} \le (2K-1)\mu$, where $\mu = \max|\mu(k_1,k_2)|$, then using the equality $\delta_{2K} = (2K-1)\mu$ in the estimation, for a given $\delta_{2K}$ all absolute values of $\mu(k_1,k_2)$ should satisfy
$$|\mu(k_1,k_2)| \le \mu = \frac{\delta_{2K}}{2K-1}$$
with a high probability $P = \operatorname{erf}(S/\sqrt{2})$, following the S-sigma rule with
$$S = \frac{\mu}{\sigma} = \frac{\sqrt{M}\,\delta_{2K}}{2K-1}.$$
In order to find P (and the corresponding S), note that there are $\binom{N}{2}$ different values of $\mu(k_1,k_2)$. Assuming that they are independent, the total probability is
$$P = \Big(\operatorname{erf}\big(S/\sqrt{2}\big)\Big)^{\binom{N}{2}}.$$
For $\binom{N}{2} = 2047 \cdot 1024 \sim 10^6$, the value S = 6.5 will produce the above probability of order 0.9999. It means
$$(2K-1) = \frac{\sqrt{M}\,\delta_{2K}}{S} = 2.02.$$
The largest value of K according to this analysis is K = 1. This is a very pessimistic estimation, as compared to the analysis in Fig. 10.5, where we could expect a unique reconstruction, with the same probability, for K = 16/2 = 8. Note that here $M = \frac{S^2}{\delta_{2K}^2}(2K-1)^2$ holds. Calculations closer to the expected results are derived in the literature.
Welch bound
$$\mu \ge \sqrt{\frac{N-M}{M(N-1)}}$$
and the restricted isometry property with
$$\delta_K = (K-1)\sqrt{\frac{N-M}{M(N-1)}}$$
Although the ℓ0 -norm cannot be used in the direct minimization, the algo-
rithms based on the assumption that some coefficients X (k ) are equal to
zero, and the minimization of the number of remaining nonzero coefficients
that can reconstruct sparse signal, may efficiently be used.
$$\min\|\mathbf{X}\|_0 \text{ subject to } \mathbf{y} = \mathbf{A}\mathbf{X},$$
where $\|\mathbf{X}\|_0 = \operatorname{card}\{\mathbf{X}\} = K$. Consider a discrete-time signal x(n). The signal is sparse in a transformation domain defined by the basis functions $\psi_k(n)$, $k = 0, 1, \ldots, N-1$. The number of nonzero transform coefficients K is much lower than the number of original signal samples N, i.e., X(k) = 0 for $k \notin \{k_1, k_2, \ldots, k_K\} = \mathbb{K}$, $K \ll N$. A signal
$$x(n) = \sum_{k \in \{k_1,k_2,\ldots,k_K\}} X(k)\psi_k(n). \tag{10.35}$$
AK XK = y, (10.37)
rank (AK ) = K.
Note that this condition does not guarantee that another set {k1 , k2 , ..., k K } =
K can also have a (unique) solution, for the same set of available samples.
The uniqueness of solution is considered within the previous subsections.
It requires rank (A2K ) = 2K for any submatrix A2K of the measurements
matrix A. It will be addressed for the DFT case again later in this chapter.
System (10.36) is used with $K \ll M \leq N$. Its solution, in the mean squared sense, follows from the minimization of the difference between the available signal values and the values produced by the inverse transform of the reconstructed coefficients, $\min_{X(k)} e^2$, where

$$e^2 = \sum_{n \in \mathbb M}\Big|y(n) - \sum_{k \in \mathbb K} X(k)\psi_k(n)\Big|^2,$$

or, in matrix form,

$$\min\,(\mathbf y - \mathbf A_K \mathbf X_K)^H(\mathbf y - \mathbf A_K \mathbf X_K).$$

From

$$\frac{\partial e^2}{\partial X^*(p)} = -2\sum_{n \in \mathbb M}\Big(y(n) - \sum_{k \in \mathbb K} X(k)\psi_k(n)\Big)\psi_p^*(n) = 0$$

follows

$$\mathbf A_K^H \mathbf y = \mathbf A_K^H \mathbf A_K \mathbf X_K.$$
Ljubiša Stanković Digital Signal Processing 713
Its solution is

$$\mathbf X_K = \left(\mathbf A_K^H \mathbf A_K\right)^{-1}\mathbf A_K^H \mathbf y, \tag{10.41}$$

since

$$\frac{\partial e^2}{\partial \mathbf X_K^H} = -2\mathbf A_K^H \mathbf y + 2\mathbf A_K^H \mathbf A_K \mathbf X_K = 0.$$

For the overdetermined case $\operatorname{pinv}(\mathbf A)\mathbf A = \mathbf I$, while for the underdetermined case $\mathbf A\operatorname{pinv}(\mathbf A) = \mathbf I$. The solution

$$\mathbf X_0 = \operatorname{pinv}(\mathbf A)\mathbf y$$
is unique.
For invertible $\mathbf A\mathbf A^H$ the system is underdetermined. All its solutions can be written in form (10.42) with arbitrary z. It can easily be shown that, in this case, the norm-two (ℓ2-norm) minimization produces the initial transform estimate

$$\hat X(k) = \sum_{n \in \mathbb M} x(n)\varphi_k(n), \tag{10.44}$$
where for the DFT $\varphi_k(n) = \exp(-j2\pi nk/N)$ and $n \in \mathbb M = \{n_1, n_2, \dots, n_M\}$. Since $\varphi_k(n) = N\psi_k^*(n)$, this relation can be written, as in (10.30),

$$\hat{\mathbf X} = N\mathbf A^H \mathbf y.$$
(ii) Set the transform values X(k) to zero at all positions k except the $\hat K$ highest ones. Alternatively:
(ii) Set the transform values X(k) to zero at all positions k where this initial estimate $\hat X(k)$ is below a threshold $T_r$,

$$X(k) = 0 \ \text{ for } k \neq k_i,\ i = 1, 2, \dots, \hat K, \qquad k_i = \arg\{|\hat X(k)| > T_r\}.$$
The system

$$\mathbf A_K \mathbf X_K = \mathbf y$$

is now reduced to a problem with known positions of the nonzero coefficients (considered in the previous section). It is solved in the least squares sense, as in (10.41),

$$\mathbf X_K = \left(\mathbf A_K^H \mathbf A_K\right)^{-1}\mathbf A_K^H \mathbf y. \tag{10.46}$$
where

$$n \in \mathbb M = \{0, 1, 3, 5, 6, 7, 8, 9, 10, 12, 13, 15\}.$$

(2) Detecting, for example, K = 3 positions of maximal DFT values, $k_1$, $k_2$, and $k_3$, and (3) calculating the reconstructed DFT values at $k_1$, $k_2$, and $k_3$ from the system

$$\sum_{i=1}^{3} X(k_i)e^{j2\pi k_i n/16} = x(n),$$
where n ∈ M = {0, 1, 3, 5, 6, 7, 8, 9, 10, 12, 13, 15} are the instants where the
signal is available.
⋆The discrete-time signal x(n), with 0 ≤ n ≤ 15, is shown in Fig. 10.7. The signal is sparse in the DFT domain, since only three DFT values are different from zero (Fig. 10.7, second row). The compressive-sensing signal, with missing samples x(2), x(4), x(11), and x(14) set to 0 for the initial DFT estimation, is shown in Fig. 10.7 (third row). The DFT of the signal, with the missing values set to 0, is calculated and presented in Fig. 10.7 (fourth row). There are three DFT values, at $k_1 = 1$, $k_2 = 6$, and $k_3 = 7$, i.e.,

$$\mathbb K = \{1, 6, 7\},$$

above the assumed threshold, set, for example, at a level of 11. The remaining DFT values are set to 0. This is justified by the assumption that the signal is sparse. Now we form a set of equations for these frequencies $k_1 = 1$, $k_2 = 6$, and $k_3 = 7$ as
$$\sum_{i=1}^{3} X(k_i)e^{j2\pi k_i n/16} = x(n),$$

where $n \in \mathbb M = \{0, 1, 3, 5, 6, 7, 8, 9, 10, 12, 13, 15\}$ are the instants where the signal is available. Since there are more equations than unknowns, the system $\mathbf A_K \mathbf X_K = \mathbf y$ is solved using $\mathbf X_K = \left(\mathbf A_K^H \mathbf A_K\right)^{-1}\mathbf A_K^H \mathbf y$. The obtained reconstructed values are exact, for all frequencies k, as in Fig. 10.7 (second row). They are shown in Fig. 10.7 (fifth row).
If the threshold were lower, for example at 7, then six DFT values at positions

$$\mathbb K = \{1, 6, 7, 12, 14, 15\}$$

would be above it. The system with six unknowns,

$$\sum_{i=1}^{6} X(k_i)e^{j2\pi k_i n/16} = x(n),$$

where $n \in \mathbb M = \{0, 1, 3, 5, 6, 7, 8, 9, 10, 12, 13, 15\}$, will produce the same values for X(1), X(6), and X(7), while the values X(12) = X(14) = X(15) = 0 will be obtained.
If the threshold is so high that only the strongest signal component is included, then the solution is obtained through an iterative procedure, described later, after the noise analysis.
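The reconstruction steps of this example can be sketched as follows. Only the frequencies k = 1, 6, 7 and the available-sample positions are specified above, so the component amplitudes 2, 1.5, and 1 used here are assumptions for illustration:

```python
import numpy as np

N = 16
comps = {1: 2.0, 6: 1.5, 7: 1.0}   # assumed amplitudes at frequencies 1, 6, 7
avail = np.array([0, 1, 3, 5, 6, 7, 8, 9, 10, 12, 13, 15])  # available instants

n = np.arange(N)
x = sum(A * np.exp(2j * np.pi * k * n / N) for k, A in comps.items())

# (1) Initial DFT estimate with the missing samples set to zero.
y = np.zeros(N, dtype=complex)
y[avail] = x[avail]
X_init = np.fft.fft(y)

# (2) Detect the K = 3 positions with maximal initial DFT magnitude.
ks = np.sort(np.argsort(np.abs(X_init))[-3:])

# (3) Solve sum_i X(k_i) e^{j2 pi k_i n/16} = x(n) over the available n in
# the least-squares sense, X_K = (A_K^H A_K)^{-1} A_K^H y.
AK = np.exp(2j * np.pi * np.outer(avail, ks) / N)
XK, *_ = np.linalg.lstsq(AK, x[avail], rcond=None)

X_rec = np.zeros(N, dtype=complex)
X_rec[ks] = XK * N   # X(k) = N*A for a unit-amplitude exponential (DFT scale)
```

Since the signal is exactly 3-sparse and the system is overdetermined, the least-squares step returns the exact DFT values at the detected frequencies.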
Figure 10.7 Original signal in the discrete-time domain (first row); the DFT of the original
signal (second row); signal with four missing samples at n = 2, 4, 11, and 14 set to zero (third
row); the DFT of signal with missing values being set to 0 (fourth row). The reconstructed
signal assuming that the DFT contains components only at frequencies where the initial DFT is
above threshold (fifth row). Absolute values of the DFT and real part of signal are shown.
$$X(k) = \sum_{n \in \mathbb M} x(n)e^{-j2\pi nk/N} = \sum_{n \in \mathbb M}\sum_{p=1}^{K} A_p e^{-j2\pi n(k-k_p)/N}. \tag{10.47}$$

We can distinguish two cases: (1) For $k = k_i \in \{k_1, k_2, \dots, k_K\}$ then, with $M = \operatorname{card}(\mathbb M)$,

$$X(k_i) = A_i M + \sum_{n \in \mathbb M}\sum_{p=1,\,p\neq i}^{K} A_p e^{-j2\pi n(k_i-k_p)/N}.$$
The value of

$$\Xi = \sum_{n \in \mathbb M}\sum_{p=1,\,p\neq i}^{K} A_p e^{-j2\pi n(k_i-k_p)/N} \tag{10.48}$$
Obviously,

$$E\Big\{\sum_{n \in \mathbb M} |A_1|^2\Big\} = |A_1|^2 M.$$
The full set of signal samples would produce the DFT of the original signal. It means that the variables $e^{j2\pi n(k-k_1)/N}$ are not statistically independent for $(k - k_1) \neq 0$. They satisfy

$$e^{-j2\pi m(k-k_1)/N}\sum_{n=0}^{N-1} e^{j2\pi n(k-k_1)/N} = 0$$

and

$$\sum_{n=0}^{N-1} E\{e^{-j2\pi m(k-k_1)/N} e^{j2\pi n(k-k_1)/N}\} = 0. \tag{10.51}$$

Since all values $e^{j2\pi n(k-k_1)/N}$ (with random n) are equally distributed, we may write their expected value B over many realizations of different sets $\mathbb M$; for $n \neq m$ it satisfies

$$(N-1)B + 1 = 0,$$

i.e., $B = -1/(N-1)$.
except at $k_i \in \{k_1, k_2, \dots, k_K\}$, where the values are lower by $|A_i|^2 M\frac{N-M}{N-1}$,

$$\sigma_N^2(k_i) = M\frac{N-M}{N-1}\sum_{p=1,\,p\neq i}^{K} |A_p|^2,$$
since all ith component values are then added up in phase at k = k i , without
random variations.
According to the central limit theorem, for $1 \ll M \ll N$, the real and imaginary parts of the DFT values at the noise-only positions $k \notin \{k_1, k_2, \dots, k_K\}$ can be described by a Gaussian distribution $\mathcal N(0, \sigma_N^2/2)$ with zero mean, and at the signal positions by $\mathcal N(A_p M, \sigma_{S_p}^2/2)$, respectively, where

$$\sigma_{S_p}^2 = \sigma_N^2 - A_p^2 M\frac{N-M}{N-1},$$

according to (10.49).
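The noise-only variance expression $\sigma_N^2 = M\frac{N-M}{N-1}\sum_p|A_p|^2$ can be verified by simulation. A minimal sketch for a single-component signal (N, M, the two frequencies, and the number of trials are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, A1 = 128, 32, 1.0
k1, k = 5, 40                 # signal frequency and a noise-only frequency
n = np.arange(N)
x = A1 * np.exp(2j * np.pi * k1 * n / N)

# DFT value at a noise-only bin, over many random sets of M available samples.
vals = []
for _ in range(5000):
    avail = rng.choice(N, size=M, replace=False)
    vals.append(np.sum(x[avail] * np.exp(-2j * np.pi * k * avail / N)))

var_est = float(np.var(vals))                       # simulated variance
var_theory = M * (N - M) / (N - 1) * abs(A1) ** 2   # sigma_N^2 from the text
```

The simulated variance should agree with the theoretical value within Monte Carlo fluctuations.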
$$x(t) = A_1 e^{j2\pi k_1 t/N} + A_2 e^{j2\pi k_2 t/N} + A_3 e^{j2\pi k_3 t/N} \tag{10.55}$$
Figure 10.8 Initial DFT of a signal with various number of available samples M. Available M
samples are a random subset of N samples taken according to the sampling theorem interval.
Dots represent the original signal DFT values, scaled with M/N to match the mean value of
the DFT calculated using a reduced set of signal samples. The DFT values are presented as a
function of the frequency index.
the signal is estimated using the frequencies from this and the previous step(s). The estimated two components are subtracted from the original signal. The frequency of the next component is then detected, and the estimation-and-subtraction process is continued until the residual energy is negligible. This iterative procedure is the topic of the next subsection.
$$X(k) = \sum_{n \in \mathbb M}\sum_{p=1}^{K} e^{-j2\pi n(k-k_p)/N} = K\sum_{n \in \mathbb M} e^{-j2\pi n(k-k_p)/N}$$
Figure 10.9 Histograms and Gaussian probability density functions for the signal and noise only positions in the initial DFT for a three-component signal with N = 128 and M = 16 (left) and M = 64 (right). The histograms are calculated over 10^5 random realizations of M available samples and random signal frequency positions.
or

$$K < \frac{1}{2}\left(1 + \frac{1}{\mu}\right).$$

Given the several very unlikely worst-case assumptions that have been made, this is a very pessimistic bound for K. Therefore, for a high degree of randomness, a probabilistic approach may be more suitable for the analysis than the spark-based relation.
This kind of analysis will now be repeated for the case of a Gaussian real-valued random matrix. In this case there is no complete set of measurements, so the analysis can be considered a reduced-set-of-measurements analysis.
In this case

$$\mathbf X = \left(\mathbf A^T \mathbf A\right)^{-1}\mathbf A^T \mathbf y,$$

where

$$\mathbf X_0 = \mathbf A^T \mathbf y$$

is the initial estimate. It uses the available reduced set of M measurements y to calculate N values of $\mathbf X_0$. Its value is the same as if a complete transformation matrix existed and all missing measurements (needed to complete the set) were considered as zero. If the initial estimate $\mathbf X_0 = \mathbf A^T \mathbf y$ produces the correct positions of the nonzero values in a K-sparse X, then the solution follows directly, using only the nonzero values of X, denoted by $\mathbf X_K$, and the corresponding measurements submatrix $\mathbf A_K$, as

$$\mathbf X_K = \left(\mathbf A_K^T \mathbf A_K\right)^{-1}\mathbf A_K^T \mathbf y.$$
$$x(n) = \sum_{i=1}^{K} X(k_i)\psi_{k_i}(n) = \sum_{i=1}^{K} A_i\psi_{k_i}(n),$$

with the elements of y being x(n) for $n \in \mathbb M$ and $k_i \in \{k_1, k_2, \dots, k_K\}$. Then the elements of $\mathbf X_0 = \mathbf A^T \mathbf y$ are

$$X_0(k) = \sum_{i=1}^{K} A_i \sum_{n \in \mathbb M}\psi_k(n)\psi_{k_i}(n).$$
Obviously,

$$E\{X_0(k)\} = 0 \ \text{ for } k \neq k_i, \qquad E\{X_0(k)\} = A_i \ \text{ for } k = k_i,$$

since $E\{\sum_{n \in \mathbb M}\psi_k^2(n)\} = 1$. For $k \neq k_i$,

$$\Big|\sum_{n \in \mathbb M}\psi_k(n)\psi_{k_i}(n)\Big| \leq \mu.$$

In the worst case (with unit amplitudes) the signal components assume the lowest possible value

$$X_0(k_i) = 1 - (K-1)\mu,$$

and it should be greater than the highest possible value at a $k \neq k_i$,

$$X_0(k) = K\mu.$$

It should hold that

$$1 - (K-1)\mu > K\mu,$$

i.e.,

$$K < \frac{1}{2}\left(1 + \frac{1}{\mu}\right).$$
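A small numerical check of this coherence-based limit, assuming a Gaussian measurement matrix with unit-energy columns (the dimensions M = 64, N = 256 and the seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 64, 256

# Gaussian measurement matrix, columns normalized to unit energy.
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)

G = np.abs(A.T @ A)
np.fill_diagonal(G, 0)
mu = float(G.max())                      # coherence of the matrix

# Largest integer K strictly satisfying K < (1 + 1/mu)/2.
K_limit = int(np.ceil((1 + 1 / mu) / 2) - 1)
```

For these dimensions the coherence is typically around 0.5, so the worst-case limit allows only a very small sparsity, illustrating how conservative the bound is.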
We can now easily see why the coherence-index-based limit in Example 10.6 produced such a conservative estimate. It calculates the sparsity limit assuming that K of the Gaussian variables $\sum_{n \in \mathbb M}\psi_k(n)\psi_{k_i}(n)$ simultaneously assume the maximal upper limit, and that (K − 1) of them simultaneously assume the lower limit $-\mu$. The eigenvalue-based calculation does not make such an assumption; therefore it is closer to the expected behavior, although it also assumes a specific, worst-case signal form. (Note: Show that any other $A_1 \geq A_2 \geq \dots \geq A_K \geq 0$ produces a more relaxed condition than when all amplitudes are equal, $A_1 - \mu(A_2 + \dots + A_K) > \mu(A_1 + A_2 + \dots + A_K)$.)
A realistic and very simplified probabilistic approach would be based on:
(1) The variance of the sum of K random variables $\sum_{n \in \mathbb M}\psi_k(n)\psi_{k_i}(n)$, corresponding to the signal components $k_i$, which in the worst case is K/M.
$$K \leq \frac{M}{36}.$$

For M = 1024 we get

$$K \leq 28,$$

or 2K < 28 for the unique solution, corresponding to Fig. 10.5.
Algorithm
(i) Calculate the initial transform estimate $\hat X_1(k)$ by using the available/remaining signal values $x_1(n) = x(n)$, and check

$$\epsilon = \frac{\sum_{n \in \mathbb M}|x(n) - \hat x_1(n)|^2}{\sum_{n \in \mathbb M}|x(n)|^2}.$$

If, for example, $\epsilon < 10^{-5}$, stop the calculation and use $x(n) = \hat x_1(n)$. If not, then go to the next step.
(ii) Increase the counter, r = r + 1. Form the signal transform

$$\hat E_r(k) = \sum_{n \in \mathbb M} e_r(n)\varphi_k(n).$$

Set the transform values $\hat E_r(k)$ to zero at all positions k except the highest one, at $k = k_r$. Form the set of r indices as the union of the previous maxima positions and the newly detected position,

$$\hat{\mathbb K}_r = \{\hat{\mathbb K}_{r-1}, k_r\}.$$

Form the matrix $\mathbf A_r$ using the available samples at $n \in \mathbb M$ and the detected indices $k \in \hat{\mathbb K}_r$. Calculate the estimate of the transformation coefficients

$$\hat{\mathbf X}_{K_r} = \left(\mathbf A_r^H \mathbf A_r\right)^{-1}\mathbf A_r^H \mathbf y,$$
and check

$$\epsilon = \frac{\sum_{n \in \mathbb M}|x(n) - \hat x_r(n)|^2}{\sum_{n \in \mathbb M}|x(n)|^2}.$$

If, for example, $\epsilon < 10^{-5}$, stop the calculation and use $x(n) = \hat x_r(n)$.
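The iterative procedure above can be sketched as follows (a matching-pursuit style implementation with the partial DFT basis; the test signal, its two frequencies, and the random set of M = 32 available samples are assumptions for illustration):

```python
import numpy as np

def reconstruct(y, avail, N, max_iter=10, tol=1e-5):
    """Iterative detection of one component per step, with least-squares
    re-estimation on the accumulated support (steps (i)-(ii) above)."""
    # Dictionary of partial DFT columns psi_k(n) = exp(j*2*pi*n*k/N), n in M.
    Afull = np.exp(2j * np.pi * np.outer(avail, np.arange(N)) / N)
    support, e = [], y.copy()
    for _ in range(max_iter):
        # Residual transform estimate E_r(k); keep its highest position k_r.
        Er = Afull.conj().T @ e
        kr = int(np.argmax(np.abs(Er)))
        if kr not in support:
            support.append(kr)
        # Least-squares estimate X_r = (A_r^H A_r)^{-1} A_r^H y on the support.
        Ar = Afull[:, support]
        Xr, *_ = np.linalg.lstsq(Ar, y, rcond=None)
        e = y - Ar @ Xr
        if np.sum(np.abs(e) ** 2) / np.sum(np.abs(y) ** 2) < tol:
            break
    X = np.zeros(N, dtype=complex)
    X[support] = Xr
    return X

# Assumed test case: two components at frequencies 9 and 33, N = 64,
# M = 32 randomly positioned available samples.
rng = np.random.default_rng(3)
N = 64
avail = np.sort(rng.choice(N, size=32, replace=False))
n = np.arange(N)
x = 1.0 * np.exp(2j * np.pi * 9 * n / N) + 0.7 * np.exp(2j * np.pi * 33 * n / N)
X = reconstruct(x[avail], avail, N)
```

For an exactly sparse, noise-free signal the procedure stops after the support is complete, with the exact component amplitudes.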
Assume now that an additive input noise ε(n) exists in the available signal samples. Note that the noise due to missing samples influences the possibility to recover the signal; once the recovery is achieved, the accuracy of the result is related to the additive input noise in the signal samples and to the number of available samples, as will be shown next.
The reconstruction equations (10.36) for noisy samples are

$$x(n) + \varepsilon(n) = \sum_{i=1}^{K} X(k_i)\psi_{k_i}(n), \quad \text{for } n \in \mathbb M,$$

for the detected indices $k \in \{k_1, k_2, \dots, k_K\}$. The matrix form of these equations is

$$\mathbf y + \boldsymbol\varepsilon = \mathbf A_K \mathbf X_K.$$
[Figure: the original signal x(n) and its DFT X(k); the available-samples signal y(n) and the initial DFT estimate X1(k); the reconstructed DFT estimates X2(k), X3(k), X4(k), and X5(k) through the iterations.]
and

$$\mathbf X_{KN} = \left(\mathbf A_K^H \mathbf A_K\right)^{-1}\mathbf A_K^H \boldsymbol\varepsilon,$$

with

$$\|\mathbf X_{KN}\|_2 \leq \left\|\left(\mathbf A_K^H \mathbf A_K\right)^{-1}\right\|_2 \left\|\mathbf A_K^H\right\|_2 \|\boldsymbol\varepsilon\|_2$$

and

$$\frac{\|\mathbf X_{KN}\|_2}{\|\boldsymbol\varepsilon\|_2} \leq \frac{\sqrt{d_{\max}}}{d_{\min}} \leq \frac{\sqrt{1+\delta_K}}{1-\delta_K}.$$
The fact that $\|(\mathbf A_K^H \mathbf A_K)^{-1}\|_2 \leq 1/d_{\min} \leq 1/(1-\delta_K)$ is used, where $d_{\min}$ is the smallest eigenvalue of $\mathbf A_K^H \mathbf A_K$. The norm of $\mathbf A_K^H$ is equal to $\sqrt{\|\mathbf A_K^H \mathbf A_K\|_2}$, meaning $\|\mathbf A_K^H\|_2 \leq \sqrt{d_{\max}} \leq \sqrt{1+\delta_K}$, where $d_{\max}$ is the largest eigenvalue of $\mathbf A_K^H \mathbf A_K$.
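The eigenvalue bound $\|\mathbf X_{KN}\|_2/\|\boldsymbol\varepsilon\|_2 \leq \sqrt{d_{\max}}/d_{\min}$ can be verified directly. A minimal sketch (the random complex matrix and noise vector are arbitrary; the inequality holds for any choice):

```python
import numpy as np

rng = np.random.default_rng(4)
M, K = 32, 4
AK = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2 * M)
eps = rng.standard_normal(M) + 1j * rng.standard_normal(M)

G = AK.conj().T @ AK
d = np.linalg.eigvalsh(G)                # eigenvalues of A_K^H A_K
dmin, dmax = float(d.min()), float(d.max())

# Noise-only part of the least-squares solution, X_KN = (A_K^H A_K)^{-1} A_K^H eps.
XKN = np.linalg.solve(G, AK.conj().T @ eps)
lhs = np.linalg.norm(XKN) / np.linalg.norm(eps)
rhs = np.sqrt(dmax) / dmin               # the bound sqrt(d_max)/d_min
```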
For a small noise, a simplified analysis can be performed. If all signal samples were available, the input signal-to-noise ratio (SNR) would be

$$SNR_i = 10\log\frac{\sum_{n=0}^{N-1}|x(n)|^2}{\sum_{n=0}^{N-1}|\varepsilon(n)|^2} = 10\log\frac{E_x}{E_\varepsilon}.$$

Assume that the noise energy in the available samples is
The true amplitude in the signal transform at the index $k_p$, if all signal samples were used, would be $N A_p$, where $A_p$ is the amplitude of the signal component corresponding to the index $k_p$. To compensate the resulting transform for the known bias in amplitude when only M available samples are used, the coefficients should be multiplied by N/M. In a full recovery, a signal transform coefficient is equal to the coefficient of the original signal with all signal samples used. The noise in the transform coefficients is multiplied by the same factor, so its energy is increased by the factor $(N/M)^2$. Thus
$$SNR = 10\log\frac{\sum_{n=0}^{N-1}|x(n)|^2}{\frac{N^2}{M^2}\sum_{n \in \mathbb M}|\varepsilon(n)|^2}. \tag{10.58}$$

When only the K nonzero coefficients are reconstructed, the remaining noise energy is

$$E_{\varepsilon R} = \frac{K}{N}\frac{N^2}{M^2}\sum_{n \in \mathbb M}|\varepsilon(n)|^2,$$

so that

$$SNR = 10\log\frac{\sum_{n=0}^{N-1}|x(n)|^2}{\frac{KN}{M^2}\sum_{n \in \mathbb M}|\varepsilon(n)|^2}. \tag{10.59}$$

Since the variances of all samples and of the available samples are the same,

$$\frac{1}{M}\sum_{n \in \mathbb M}|\varepsilon(n)|^2 = \frac{1}{N}\sum_{n=0}^{N-1}|\varepsilon(n)|^2. \tag{10.60}$$

Thus, the SNR in the recovered signal is

$$SNR = SNR_i - 10\log\left(\frac{K}{M}\right). \tag{10.61}$$
and the detected frequencies are $\{k_1, k_2, k_3\}$. The input SNR was $SNR_i = 2.6383$ [dB]. The theoretical result for the output SNR, for example with M = N/2 = 128 and K = 3, according to (10.61), is SNR = 18.88 [dB]. For a statistical check of the results, 100 random realizations of the available sample positions were used. The statistical SNR was obtained as SNR = 18.87 [dB], in close agreement with the theory.
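A statistical check of relation (10.61) can be sketched as follows, with the support assumed known so that only the noise influence is measured (the amplitudes, frequencies, noise level, and number of trials are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, K = 256, 128, 3
n = np.arange(N)
amps, freqs = [1.0, 0.8, 0.6], [12, 70, 200]      # assumed components
x = sum(a * np.exp(2j * np.pi * f * n / N) for a, f in zip(amps, freqs))

sigma = 0.5                                       # input noise std per sample
snrs = []
for _ in range(200):
    eps = sigma * (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    avail = rng.choice(N, size=M, replace=False)
    # Least-squares reconstruction on the (known) support, as in (10.41).
    AK = np.exp(2j * np.pi * np.outer(avail, freqs) / N)
    Xk, *_ = np.linalg.lstsq(AK, (x + eps)[avail], rcond=None)
    xr = sum(a * np.exp(2j * np.pi * f * n / N) for a, f in zip(Xk, freqs))
    snrs.append(10 * np.log10(np.sum(np.abs(x) ** 2) / np.sum(np.abs(xr - x) ** 2)))

snr_i = 10 * np.log10(np.sum(np.abs(x) ** 2) / (N * sigma ** 2))
snr_theory = snr_i - 10 * np.log10(K / M)         # relation (10.61)
snr_stat = float(np.mean(snrs))
```

The averaged simulated SNR should stay close to the theoretical prediction, up to Monte Carlo fluctuations and the small bias of averaging in decibels.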
According to the results in Section 10.4.4, the missing samples can be represented by a noise influence. Assume that we use a reconstruction algorithm for a signal of sparsity K on a signal whose DFT coefficients X are not sparse (or not sufficiently sparse). Denote by $\mathbf X_K$ the sparse signal with K nonzero coefficients equal to the largest K coefficients of X. Suppose that the number of components K and the measurements matrix satisfy the reconstruction conditions, so that a reconstruction algorithm can detect (one by one, or at once) the largest K components ($A_1, A_2, \dots, A_K$) and perform the signal reconstruction to get $\mathbf X_R$. The remaining N − K components ($A_{K+1}, A_{K+2}, \dots, A_N$) will be treated as noise in these K largest components. The variance due to a signal component is $|A_i|^2 M(N-M)/(N-1)$. After reconstruction this variance is multiplied by $(N/M)^2$, according to the analysis in the previous subsection, producing

$$|A_i|^2\frac{N^2}{M^2}\frac{M(N-M)}{N-1} \cong |A_i|^2 N\frac{N-M}{M}.$$
The total energy of noise in the reconstructed K largest components $\mathbf X_R$ will be

$$\|\mathbf X_R - \mathbf X_K\|_2^2 = KN\frac{N-M}{M}\sum_{i=K+1}^{N}|A_i|^2.$$
Denoting the energy of the remaining signal, when the K largest components are removed from the original signal, by

$$\|\mathbf X - \mathbf X_K\|_2^2 = N\sum_{i=K+1}^{N}|A_i|^2,$$
we get

$$\|\mathbf X_R - \mathbf X_K\|_2^2 = K\frac{N-M}{M}\|\mathbf X - \mathbf X_K\|_2^2.$$
If the signal is sparse, i.e., $\mathbf X = \mathbf X_K$, then $\|\mathbf X_R - \mathbf X_K\|_2^2 = 0$.
The same result follows if N = M. The error will be zero if a complete DFT
matrix is used in the calculation of any signal component.
Finally, using the Schwartz inequality for $\mathbf X - \mathbf X_K$, having N − K nonzero elements,

$$\|\mathbf X - \mathbf X_K\|_2 \leq \frac{1}{\sqrt{N-K}}\|\mathbf X - \mathbf X_K\|_1,$$

follows

$$\|\mathbf X_K - \mathbf X_R\|_2 \leq \sqrt{\frac{N-M}{M}\frac{K}{N-K}}\,\|\mathbf X - \mathbf X_K\|_1.$$
In the case of an additive input noise with variance $\sigma_\varepsilon^2$, a general expression is obtained in the form

$$\|\mathbf X_R - \mathbf X_K\|_2^2 = K\frac{N-M}{M}\|\mathbf X - \mathbf X_K\|_2^2 + \frac{K}{M}N\sigma_\varepsilon^2.$$
Example 10.12. Consider a nonsparse signal

$$x(n) = e^{j2\pi k_1 n/N} + 0.8e^{j2\pi k_2 n/N} + 0.77e^{j2\pi k_3 n/N} + 0.75e^{j2\pi k_4 n/N} + \sum_{i=5}^{255}\left(\frac{1}{3}\right)^{1+(i-5)/50} e^{j2\pi k_i n/N}$$
Note that a closed form expression for ∥X − XK ∥22 in SNRtheor can be obtained
since we assumed that the amplitudes of disturbing components are coeffi-
cients of a geometric series. One realization is presented in Fig.10.11.
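The closed-form evaluation mentioned in the note can be sketched as follows: since $|A_i|^2 = \frac{1}{9}r^{\,i-5}$ with $r = (1/9)^{1/50}$ forms a geometric series, the term $\|\mathbf X - \mathbf X_K\|_2^2 = N\sum_{i=5}^{255}|A_i|^2$ can be summed in closed form and compared with the direct sum:

```python
import numpy as np

N = 256
# Squared amplitudes of the disturbing components, |A_i|^2 = (1/9) * r^(i-5).
r = (1.0 / 9.0) ** (1.0 / 50.0)
amps2 = [(1.0 / 3.0) ** (2 * (1 + (i - 5) / 50.0)) for i in range(5, 256)]

direct = N * sum(amps2)                                    # term-by-term sum
closed = N * (1.0 / 9.0) * (1.0 - r ** 251) / (1.0 - r)    # geometric series
```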
In the case of an additive complex-valued noise of variance $\sigma_\varepsilon^2 = 2$, the results are

$$SNR_{stat} = 10\log\left(\frac{\|\mathbf X_K\|_2^2}{\|\mathbf X_R - \mathbf X_K\|_2^2}\right) = 17.0593$$

$$SNR_{theor} = 10\log\left(\frac{\|\mathbf X_K\|_2^2}{K\frac{N-M}{M}\|\mathbf X - \mathbf X_K\|_2^2 + \frac{K}{M}N\sigma_\varepsilon^2}\right) = 17.0384.$$
The simulation is repeated with M = 128 and the same noise. The SNR values
are SNRtheor = 14.3345 and SNRstat = 14.4980.
for p → 0. We can expect that the behavior of this measure will not significantly change if p is slightly increased from 0. This kind of concentration measure, with 0 ≤ p ≤ 1, has been used for decades in the optimization of time-frequency representations, as an alternative to measures based on the ratio of higher-order norms.
In compressive sensing the most commonly used sparsity measure is the norm with p = 1, since it is the only convex one among the norms with p within the interval 0 ≤ p ≤ 1. The convex form of the measure enables the application of linear programming in the solution of the minimization problem. Thus the minimization problem formulation with p = 1 is

$$\min \|\mathbf X\|_1 \quad \text{subject to} \quad \mathbf y = \mathbf A\mathbf X,$$

where

$$M_1 = \|\mathbf X\|_1 = \sum_{k=0}^{N-1}|X(k)|.$$
$$\|\mathbf X\|_2^2 = \sum_{k=0}^{N-1}|X(k)|^2.$$

The resulting transform X(k) is then not sparse. This is why this norm is not used as a concentration measure either.
Example 10.13. Minimization in a space with two variables x, y will be illustrated for the cases p = 1, p = 1/2, p = 1/4, and p = 2, using the condition y = ax + b. Note that in the case of p = 1 the result of minimizing the function z = |x| + |y| subject to y = ax + b is the point with the minimal value of z = |x| + |y| on the line where the surface z = |x| + |y| intersects the plane y = ax + b (the plane y = ax + b in x, y, z space is z-independent). Constant values of |x| + |y| are presented by isolines in the first subplot of Fig. 10.12. The minimal value of z is the one where the projection of y = ax + b on z = 0 touches an isoline of z = |x| + |y|. All points on isolines crossing this line correspond to larger values of z = |x| + |y|, while all isolines corresponding to lower values of z = |x| + |y| have no common point with the plane y = ax + b. The minimization of z = |x| + |y| with y = ax + b can also be written as

$$\min(|x| + |y|) = \min\left(|x| + |ax + b|\right).$$

Since this is a sum of two piecewise linear functions, |x| and |ax + b|, its minimum is either at x = 0 or at ax + b = 0, for |a| < 1 or |a| > 1, respectively. Therefore the function z = |x| + |ax + b| will have its minimum at one of these two points. For y = 0.5x + 1 the solution is (0, 1) and for y = 3x − 3 the solution is (1, 0), Fig. 10.12. The solution is the same for p = 1, p = 1/2 (when $z = |x|^{1/2} + |ax + b|^{1/2}$), and p = 1/4. For p = 2 the solution follows as the minimum of $z = x^2 + (ax + b)^2$, which is (−0.4, 0.8) and (0.9, −0.3) for the two considered lines, respectively. This is just a mathematical illustration of a constrained minimization. Due to its low dimensionality it cannot be defined within the measurements-and-sparsity framework (for sparsity K = 1 at least two measurements are required).
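The minimization in this example can be reproduced by a dense search along the constraint line (the search grid is an implementation choice):

```python
import numpy as np

def minimize_on_line(a, b, p, x_grid):
    """Minimize |x|^p + |y|^p subject to y = a*x + b by dense search over x."""
    z = np.abs(x_grid) ** p + np.abs(a * x_grid + b) ** p
    i = int(np.argmin(z))
    return x_grid[i], a * x_grid[i] + b

x_grid = np.linspace(-2.0, 2.0, 40001)           # step 1e-4

sol1a = minimize_on_line(0.5, 1.0, 1, x_grid)    # corner solution, approx (0, 1)
sol1b = minimize_on_line(3.0, -3.0, 1, x_grid)   # corner solution, approx (1, 0)
sol2a = minimize_on_line(0.5, 1.0, 2, x_grid)    # least squares, approx (-0.4, 0.8)
sol2b = minimize_on_line(3.0, -3.0, 2, x_grid)   # least squares, approx (0.9, -0.3)
```

The p = 1 search lands on the corner points (0, 1) and (1, 0), while p = 2 gives the least-squares points (−0.4, 0.8) and (0.9, −0.3), matching the example.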
Figure 10.12 Isolines of $|x|^p + |y|^p = z$ for p = 1, 1/2, 1/4, and 2, along with the constraint lines y = 0.5x + 1 and y = 3x − 3.
$$M_0 = \|\mathbf X\|_0 = \sum_{k=0}^{N-1}|X(k)|^0 = 2$$

$$M_1 = \|\mathbf X\|_1 = \sum_{k=0}^{N-1}|X(k)| = N\left(1 + \frac{4}{5}\right) = \frac{9N}{5}$$

$$M_2 = \sum_{k=0}^{N-1}|X(k)|^2 = N^2\left(1 + \frac{16}{25}\right) = \frac{41N^2}{25}$$

$$X(k) = \sum_{n=0}^{N-1} x(n)e^{-j2\pi nk/N} = N\delta(k-5) + \frac{1}{4}N\delta(k-7) + \frac{1}{5}N\delta(k-2)$$

$$M_2 = \sum_{k=0}^{N-1}|X(k)|^2 = N^2\left(1 + \frac{1}{16} + \frac{1}{25}\right) = \frac{441N^2}{400}$$
$$X(k) = \sum_{n=0}^{N-1}\left(A_1 e^{j10\pi n/N} + A_2 e^{j14\pi n/N}\right)e^{-j2\pi nk/N} + \left(z - A_1 e^{j10\pi 2/N} - A_2 e^{j14\pi 2/N}\right)e^{-j2\pi 2k/N}$$

$$= A_1 N\delta(k-5) + A_2 N\delta(k-7) + Z_0(k), \tag{10.63}$$

with

$$Z_0(k) = \left(z - A_1 e^{j10\pi 2/N} - A_2 e^{j14\pi 2/N}\right)e^{-j2\pi 2k/N} = z_0 e^{-j2\pi 2k/N}.$$
It is obvious that

$$M_0 = \begin{cases} N & \text{for } Z_0(k) \neq 0,\ Z_0(5) \neq -A_1 N,\ \text{and } Z_0(7) \neq -A_2 N\\ N-1 & \text{for } Z_0(k) \neq 0 \text{ and } (Z_0(5) = -A_1 N \text{ or } Z_0(7) = -A_2 N)\\ N-2 & \text{for } Z_0(k) \neq 0 \text{ and } (Z_0(5) = -A_1 N \text{ and } Z_0(7) = -A_2 N)\\ 2 & \text{for } Z_0(k) = 0,\ \text{i.e., for } z = A_1 e^{j10\pi 2/N} + A_2 e^{j14\pi 2/N}.\end{cases}$$
From

$$X(k) = A_1 N\delta(k-5) + A_2 N\delta(k-7) + Z_0(k)$$

follows

$$M_1 = |A_1 N + Z_0(5)| + |A_2 N + Z_0(7)| + \sum_{\substack{k=0\\ k\neq 5,\,k\neq 7}}^{N-1}|Z_0(k)| = \left|A_1 N + z_0 e^{-j2\pi 10/N}\right| + \left|A_2 N + z_0 e^{-j2\pi 14/N}\right| + (N-2)|z_0|.$$

Its minimal value,

$$M_1 = |A_1|N + |A_2|N,$$

is reached for $z_0 = 0$. Even in the worst case, when the phases of $z_0 e^{-j2\pi 10/N}$ and $z_0 e^{-j2\pi 14/N}$ are opposite to those of $A_1 N$ and $A_2 N$, the measure is at least $|A_1|N + |A_2|N - 2|z_0| + (N-2)|z_0|$, which exceeds this minimum by

$$(N-4)|z_0| > 0$$
for any $|z_0| \neq 0$. The minimization result $|z_0| = 0$ is the same as with the ℓ0-norm based measure if N ≥ 5. The minimal requirement for this reconstruction is N = 5: the number of available samples is M = 4 and the signal sparsity is K = 2.
Note that the condition for the ℓ0-norm to fail for N = 4 was $Z_0(5) = z_0 e^{-j2\pi 10/N} = -A_1 N$ and $Z_0(7) = z_0 e^{-j2\pi 14/N} = -A_2 N$. It means that $A_1 = A_2 e^{-j2\pi 4/N}$ should hold. For the ℓ1-norm, the phases of $A_1 N$ and $z_0 e^{-j2\pi 10/N}$, and those of $A_2 N$ and $z_0 e^{-j2\pi 14/N}$, need only be opposite in the worst case. The condition for the ℓ0-norm to fail is just a special case of the ℓ1-norm condition, with $|A_1| = |A_2| = |z_0|/N$. If the condition for the ℓ0-norm to fail is satisfied, then the condition for the ℓ1-norm to fail is satisfied as well. This conclusion, drawn from a very specific example, will be generalized later.
For the energy,

$$M_2 = NE_x = \sum_{k=0}^{N-1}|X(k)|^2 = N\sum_{n=0}^{N-1}|x(n)|^2 = N\Big(\sum_{\substack{n=0\\ n\neq 2}}^{N-1}|x(n)|^2 + |x(2)|^2\Big).$$

Since the value of $\sum_{n=0,\,n\neq 2}^{N-1}|x(n)|^2$ is constant (the available samples are exact and should not be changed), the value of $M_2$ is minimal if

$$|x(2)| = |z| = 0.$$
Therefore, in the ℓ2-norm (energy) based minimization, the missing sample is set so as to produce the minimal energy, that is, to the zero value. The reconstructed DFT using the $M_2$ minimization is

$$X(k) = A_1 N\delta(k-5) + A_2 N\delta(k-7) + \left(-A_1 e^{j10\pi 2/N} - A_2 e^{j14\pi 2/N}\right)e^{-j2\pi 2k/N}.$$
Since this is a direct search method, any valid sparsity measure can be used. From the available samples we can estimate the range limits for the missing samples, for example $A = \max|x(n_i)|$, $i = 1, 2, \dots, M$. In the direct search approach we can vary each missing sample value from −A to A with a step $\Delta x = 2A/(L-1)$, where L is the number of considered values within the selected range. The reconstruction error in each sample is therefore limited by the step $2A/(L-1)$ used in the direct search. The number of analyzed values for the N − M missing samples (variables) is $L^{N-M}$. For any reasonable accuracy the value of L is large, and the number of calculations $L^{N-M}$ is extremely large. One possible way to reduce the number of calculations in the direct search is to use a large step (small L) for a first, rough estimation, and then to reduce the step around the rough estimates of the unavailable/missing values $x(n_{M+1}), x(n_{M+2}), \dots, x(n_N)$. This procedure can be repeated several times, until the desired accuracy is achieved.
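The coarse-to-fine direct search can be sketched as follows, for a single missing sample and the ℓ1 measure of the DFT (the test signal, the interval [−3, 3], and the refinement factor are arbitrary illustrative choices):

```python
import numpy as np

N = 64
n = np.arange(N)
# Assumed sparse test signal (four nonzero DFT bins).
x = np.cos(2 * np.pi * 3 * n / N) + 0.5 * np.sin(2 * np.pi * 8 * n / N)

missing = 10                   # position of the single missing sample
xm = x.copy()

# Coarse-to-fine direct search: try candidate values for the missing sample,
# keep the one with the smallest l1 measure of the DFT, then refine around it.
center, half = 0.0, 3.0
for _ in range(6):
    cand = np.linspace(center - half, center + half, 21)
    costs = [np.sum(np.abs(np.fft.fft(np.where(n == missing, v, xm))))
             for v in cand]
    center = cand[int(np.argmin(costs))]
    half /= 10                 # shrink the interval around the rough estimate

best = center
```

Because the ℓ1 measure is convex in the missing-sample value and strictly minimized at the true value for this sparse signal, the refinement converges to the true sample.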
Example 10.15. Consider a discrete signal

$$x(n) = \cos(2\pi n/N) + 0.5\sin(8\pi n/N) + 0.4\cos(30\pi n/N + \pi/3) - 0.8 \tag{10.65}$$

for n = 0, 1, ..., N − 1, where N = 256 is the number of signal samples. The case of two missing samples $x(n_{N-1})$ and $x(n_N)$ is considered. The direct search is performed over a wide range [−3, 3] with a step of 0.01. The sparsity measure $M_p$ is calculated for p = 0, p = 1/2, p = 1, and p = 2. Results for $M_p/N$ are shown in Fig. 10.13. The measure minimum is located at the true sample values for p ≤ 1 (norms ℓ1 and lower). The measure minimum for p > 1 (the ℓ2-norm, for p = 2) is not located at the true signal values, as expected.
Note that p ≤ 1 locates the sparsity measure minimum accurately at the true missing-sample values. For the ℓ0-norm the value of the measure is constant and equal to N everywhere, except at the exact values of the missing samples. For p = 2, the ℓ2-norm measure has its minimum when the missing signal samples are set to zero, which is not the solution of this problem.
Figure 10.13 Measure as a function of two missing sample values yc (0) = x (n N −1 ) and
yc (1) = x (n N ) corresponding to various norms. True values of missing samples are presented
with lines. For the presentation all measures are normalized to the interval from 2.5 to 4.9.
Figure 10.14 Illustration of the solution for N = 3 and K = 1 for various possible cases.

The solution will be illustrated in the space of variables X(0), X(1), and X(2). Consider the measurements line defined by the two measurements, with direction vector

$$\mathbf p = \begin{vmatrix} \mathbf i_{X(0)} & \mathbf i_{X(1)} & \mathbf i_{X(2)}\\ \psi_0(0) & \psi_1(0) & \psi_2(0)\\ \psi_0(1) & \psi_1(1) & \psi_2(1)\end{vmatrix},$$

where $\mathbf i_{X(k)}$ are the unit vectors along the coordinate axes representing X(k). For sparsity K = 1 the solution is unique if the measurements line is not within
Then the measurements line of system (10.67) will not lie in one of the coordinate planes, meaning that the solution is unique, Fig. 10.14. Note that the values of the components of vector p are equal to the determinants of the system presented and discussed in the first illustrative example, (10.2).
In the ℓ0-norm based minimization, the task is to solve

$$\min \|\mathbf X\|_0 = \sum_{k=0}^{N-1}|X(k)|^0 \quad \text{subject to} \quad \mathbf y = \mathbf A\mathbf X.$$

When the number of zero values of X(k) is maximal, the number of its nonzero values (the sparsity) is minimal.
Example 10.16. Find the minimal sparsity solution for the measurements

$$0.3617X(0) - 0.4942X(1) + 0.3611X(2) = -0.4550$$
$$-0.2991X(0) - 0.4967X(1) + 0.4052X(2) = -0.5105$$

using the combinatorial approach and the ℓ0 sparsity measure.
⋆Start with possible sparsity K = 1. Then we find the solutions of these equations for all possible combinations with one nonzero coefficient: {X(0), X(1) = 0, X(2) = 0}, {X(0) = 0, X(1), X(2) = 0}, and {X(0) = 0, X(1) = 0, X(2)}. For each of these combinations we get a solution of the first and of the second equation. The solution which is the same for both equations is {X(0) = 0, X(1) = 0, X(2) = −1.2600}. It is the solution of the problem. The signal is of sparsity card{X(k)} = 1.
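The combinatorial ℓ0 search of this example can be sketched as follows (the tolerance used to accept a consistent solution is an implementation choice):

```python
import numpy as np
from itertools import combinations

A = np.array([[ 0.3617, -0.4942, 0.3611],
              [-0.2991, -0.4967, 0.4052]])
y = np.array([-0.4550, -0.5105])

def l0_solve(A, y, tol=1e-3):
    """Try supports of increasing size K; accept the first one whose
    least-squares solution reproduces both measurements."""
    M, N = A.shape
    for K in range(1, N + 1):
        for idx in combinations(range(N), K):
            sub = A[:, idx]
            Xs, *_ = np.linalg.lstsq(sub, y, rcond=None)
            if np.linalg.norm(sub @ Xs - y) < tol:
                out = np.zeros(N)
                out[list(idx)] = Xs
                return out
    return None

X = l0_solve(A, y)   # sparsity-1 solution with X(2) approx -1.26
```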
$$\min \|\mathbf X\|_1 = \sum_{k=0}^{N-1}|X(k)| \quad \text{subject to} \quad \mathbf y = \mathbf A\mathbf X,$$

i.e., subject to $y(n_i) = \sum_k X(k)\psi_k(n_i)$ for i = 1, 2, ..., M.
For the graphical illustration we use the three-dimensional signal with transformation coefficients X(k), k = 0, 1, 2. We will also assume that the sparsity is K = 1 and that M = 2 measurements/samples are available. In this case we minimize
has a common point with line (10.68), Fig. 10.15 (left). Since sparsity K = 1 is assumed, the intersection with the measurements line is at a corner of the ℓ1-norm "ball". Considering the values of the minimization function z = |X(0)| + |X(1)| + |X(2)| along the line (10.68), its minimum is achieved at the corner, which is the sparse solution of the problem. It is important to note that, in this case, the solution is the same as if we had used minimization
Figure 10.15 Illustration of the solution with norm-one and norm-1/4 (close to norm-zero) for a three-dimensional case. The lower plots show the view from the direction where the measurements line and the norm-1/4 ball touch.
or X(2) = 0,

$$|\psi_0(0)\psi_2(1) - \psi_2(0)\psi_0(1)| > 0 \quad\text{and}\quad |\psi_0(0)\psi_1(1) - \psi_1(0)\psi_0(1)| > 0,$$

the measurements line in the ℓ1-norm case should not have such a direction as to intersect (go through) the ℓ1-norm "ball" $|X(0)| + |X(1)| + |X(2)| = z_0$. Therefore, in the worst case, the measurements line should intersect the plane X(0) = 0 just outside the thick line $|X(1)| + |X(2)| = z_0$. If a part of the line is in the first octant, this means it should pass above the line $|X(1)| + |X(2)| = z_0$, Fig. 10.16. Several possible measurements lines are presented in Fig. 10.16 (top-left). Their intersections with the X(0) = 0 plane are denoted by numbers from 1 to 7. For the measurements lines 2, 3, or 4, the ℓ1-norm minimization will produce the correct result for X(k), namely (X(0), 0, 0). Line 1 is the critical case, when $z = |X(0)| + |X(1)| + |X(2)|$ is constant along the whole line within the first octant (any value within this interval can be the minimization solution). The value of $z = |X(0)| + |X(1)| + |X(2)|$ will not be minimal at (X(0), 0, 0) for lines 5, 6, and 7; the ℓ1-norm function assumes lower values along these lines than at the point (X(0), 0, 0), as the lines penetrate into the ℓ1-norm "ball".
A unified condition for all possible nonzero values of X(k) is that the measurements line has such direction components $p_{X(0)}$, $p_{X(1)}$, and $p_{X(2)}$ that, along any of the axes X(k), it passes above the minimization ℓ1-norm "ball". It means

$$\frac{\left|p_{X(0)}\right| + \left|p_{X(1)}\right| + \left|p_{X(2)}\right| - \max\{\left|p_{X(0)}\right|, \left|p_{X(1)}\right|, \left|p_{X(2)}\right|\}}{\max\{\left|p_{X(0)}\right|, \left|p_{X(1)}\right|, \left|p_{X(2)}\right|\}} > 1.$$
If this relation is satisfied for the worst case, then it holds for the other directions as well. The line then passes through X(0) = 0 outside the region indicated by the thick lines in Fig. 10.16; this includes lines 2, 3, and 4. The imposed condition is still very close to line 1: if the measurements line is close to line 1, the solution is sensitive to even a small noise.
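The direction condition, and its agreement with a direct ℓ1 minimization along the solution line, can be sketched as follows (the sparse point and the two direction vectors are arbitrary illustrative choices):

```python
import numpy as np

def l1_condition(p):
    """Direction condition: the line direction passes above the l1 ball."""
    p = np.abs(np.asarray(p, dtype=float))
    return (p.sum() - p.max()) / p.max() > 1

def l1_min_on_line(Xs, p, t_grid):
    """Directly minimize ||Xs + t*p||_1 along the solution line."""
    costs = np.abs(Xs[None, :] + t_grid[:, None] * p[None, :]).sum(axis=1)
    return float(t_grid[int(np.argmin(costs))])

Xs = np.array([2.0, 0.0, 0.0])        # the K = 1 sparse solution of y = AX
t_grid = np.linspace(-3.0, 3.0, 6001)

p_good = np.array([1.0, 0.8, 0.9])    # condition satisfied: |p1| + |p2| > |p0|
p_bad = np.array([1.0, 0.2, 0.3])     # condition violated: |p1| + |p2| < |p0|

t_good = l1_min_on_line(Xs, p_good, t_grid)   # t = 0: minimum stays at Xs
t_bad = l1_min_on_line(Xs, p_bad, t_grid)     # t != 0: a nonsparse point wins
```

When the condition holds, the ℓ1 minimum over the line stays at the sparse point; when it fails, the minimum drifts along the line to a nonsparse point.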
For a norm close to the ℓ0-norm, the condition reduces to the already discussed case in which all direction components should only be (slightly) greater than zero,

$$\sqrt{\frac{\left|p_{X(1)}\right|}{\left|p_{X(0)}\right|}} + \sqrt{\frac{\left|p_{X(2)}\right|}{\left|p_{X(0)}\right|}} > 1.$$
In this case all measurements corresponding to lines 1-6 will produce the correct result. Measurement 7 is the only one which will not produce the correct sparse solution, Fig. 10.16 (bottom).
An ideal measurements line would correspond to the case when full isometry is preserved, i.e., when

$$1 - \delta_2 \leq \frac{\|\mathbf A_2\mathbf X_2\|_2^2}{\|\mathbf X_2\|_2^2} \leq 1 + \delta_2$$
Figure 10.16 Minimization function $|X(0)| + |X(1)| + |X(2)| = z_0$ in the first octant of the coordinate system ($X(0), X(1), X(2) > 0$), thick lines. A dot at (0, 1, 1) surrounded by a gray rectangular region belongs to the ideal measurements line.
Figure 10.17 Minimization using the ℓ1-norm, with the solution illustrated for the case when the measurements line crosses through the ℓ1-norm "ball".
Example 10.17. The previous relations are tested on a K = 1 sparse signal with N = 3 possible values of X(k), using two measurements with random Gaussian coefficients $\psi_k(n) = \mathcal N(0, 1/2)$. The reconstruction mean square error for each of 1000 realizations, classified by the measurements line direction, is presented.
-In 791 random realizations the measurements line direction was outside the ℓ1-norm "ball". The error of the reconstruction using ℓ1-norm minimization for these directions is shown in Fig. 10.18 (top). For all cases with the measurements line direction outside the ℓ1-norm "ball" the reconstruction is successful, with a small (computer precision) error.
Figure 10.18 Reconstruction square error in 1000 realizations, classified by the data line direction: error using ℓ1 minimization for directions outside the ℓ1 "ball" (top); error using ℓ1 minimization for directions through the ℓ1 "ball" (second); error using ℓ1/2 minimization for directions through the ℓ1 "ball" (third); error using ℓ1/4 minimization for directions through the ℓ1 "ball" (bottom).
- In 209 random realizations the measurements line direction crossed
the ℓ1-norm "ball", Fig. 10.17. In all these cases the ℓ1-norm based
reconstruction was not successful. The error using ℓ1 minimization
for directions through the ℓ1-norm "ball" is presented in Fig. 10.18 (second).
- All 209 random realizations (when the measurements line direction
crosses the ℓ1-norm "ball") are also considered using the ℓ1/2-norm
minimization. Many of the measurements lines crossing the ℓ1-norm
"ball" do not cross the ℓ1/2-norm "ball". Recovery results for the
directions crossing the ℓ1-norm "ball", using the ℓ1/2-norm minimization,
are presented in Fig. 10.18 (third). As expected, full recovery is achieved
in many realizations.
- Finally, all 209 random realizations when the measurements line
direction crosses the ℓ1-norm "ball" are considered using the ℓ1/4-norm
minimization. The error using ℓ1/4 minimization for directions through the
ℓ1-norm "ball" is given in Fig. 10.18 (bottom). All cases are successfully
recovered, since the ℓ1/4-norm is close to the ℓ0-norm. It would fail only
in the low-probability case when the measurements line passes through (or
very close to) one of the coordinate planes.
- Two specific examples of measurements (illustrating the reconstruction
calculation), with directions inside and outside the "ball", will be given in
detail next. For the direction outside the ℓ1-norm "ball" the measurement
matrix is normalized so that the energy of each column is ∥ψi∥₂² = 1. In that
case the randomness is reduced and ψi(nm) can be considered as the coordinates
of an M-dimensional vector ψi whose ending point is on the M-dimensional
unit sphere. This condition can change the behavior of the measurement
matrix.
Example 10.18. For the normalized set of measurement coefficients (when the column
energies are normalized)

$$\|A_2 X\|_2^2 = |\psi_i(0)X(i) + \psi_k(0)X(k)|^2 + |\psi_i(1)X(i) + \psi_k(1)X(k)|^2$$
$$= \left(|\psi_i(0)|^2 + |\psi_i(1)|^2\right)|X(i)|^2 + \left(|\psi_k(0)|^2 + |\psi_k(1)|^2\right)|X(k)|^2$$
$$+ 2\left[\psi_i(0)\psi_k(0) + \psi_i(1)\psi_k(1)\right]X(i)X(k)$$

and

$$\frac{\|A_2 X\|_2^2 - \|X\|_2^2}{\|X\|_2^2} = 2\left[\psi_i(0)\psi_k(0) + \psi_i(1)\psi_k(1)\right]\frac{X(i)X(k)}{\|X\|_2^2} \le \psi_i(0)\psi_k(0) + \psi_i(1)\psi_k(1) = \mu(i,k).$$

Since

$$\left[\psi_i(0)\psi_k(0) + \psi_i(1)\psi_k(1)\right]^2 + \left[\psi_i(0)\psi_k(1) - \psi_k(0)\psi_i(1)\right]^2 = \psi_i^2(0)\left(\psi_k^2(0) + \psi_k^2(1)\right) + \psi_i^2(1)\left(\psi_k^2(0) + \psi_k^2(1)\right) = 1,$$

it follows that |µ(i, k)| ≤ 1 for any i, k. It means that the normalized matrix (for the three-dimensional
case) will always satisfy the condition that the ℓ1-norm and the ℓ0-norm
solutions are the same (the measurements lines are always outside the ℓ1-norm
"ball").
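The bound |µ(i, k)| ≤ 1 from this example can be verified numerically. This snippet (an illustrative sketch, not from the book) normalizes the columns of a random 2 × 3 matrix and checks that all pairwise inner products stay within [−1, 1], as the Cauchy-Schwarz identity above guarantees.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 2, 3
A = rng.normal(size=(M, N))
A /= np.linalg.norm(A, axis=0)   # normalize column energies: ||psi_i||_2 = 1

# mu(i, k) = <psi_i, psi_k> for every pair of columns
mu = A.T @ A
print(mu)
```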
The ℓ1-norm based minimization produces the same result as the ℓ0-norm based minimization if the restricted
isometry property is satisfied with the constant

$$0 \le \delta_{2K} < \sqrt{2} - 1.$$

Note that other possible upper bounds on the isometry constant have been
derived in the literature. An illustration of the reason why the restricted isometry
condition has to be stricter in the ℓ1-norm based minimization than in
the ℓ0-norm case is presented in the previous section. The proof is outside the scope of the
mathematical tools used in this book.
$$\|X_R - X\|_2 \le C_0 \frac{\|X_K - X\|_1}{\sqrt{K}} \qquad (10.70)$$
Example 10.19. Consider a signal with coefficients X = [X0, a, b], where |b| < |a| <
X0. Consider M = 2 measurements with an idealized measurements line when
δ2K = 0 (in real cases δ2K can be small but not zero), defined by

$$\frac{X(0) - X_0}{-1} = \frac{X(1) - a}{1} = \frac{X(2) - b}{1} = t.$$

Find the result of the minimization problem (10.69) as a function of a and b.
⋆ Replacing X(0) = X0 − t, X(1) = a + t, and X(2) = b + t, where t is
the line parameter, we get the value of the minimization function z = ∥X∥1 along
the measurements line in the form

$$z = |X_0 - t| + |a + t| + |b + t|.$$
The minimum of z is at

$$t_0 = \operatorname{median}\{X_0, -a, -b\},$$

since the function z increases both to the right and to the left of t0. It increases with
rate 1 until the first of X0, −a, −b is reached on each side, and then increases
toward +∞ as t tends toward ±∞. More details about median-based minimization
will be given in the next subsection.
An illustration is presented in Fig. 10.19 for X0 = 2/3, a = 2/9, and b = −1/9, when

$$t_0 = \operatorname{median}\{2/3, -2/9, 1/9\} = 1/9 = -b.$$
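The median solution of this example can be checked by brute force. In the sketch below the values X0 = 2/3, a = 2/9, b = −1/9 are read off from the median expression in the example; the grid resolution is an arbitrary choice.

```python
import numpy as np

X0, a, b = 2 / 3, 2 / 9, -1 / 9

# z(t) = |X0 - t| + |a + t| + |b + t| along the measurements line
t = np.linspace(-1.0, 1.0, 20001)
z = np.abs(X0 - t) + np.abs(a + t) + np.abs(b + t)

t_min = t[np.argmin(z)]           # brute-force minimizer of z(t)
t0 = np.median([X0, -a, -b])      # median-based solution
print(t_min, t0)
```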
In the case of noisy measurements, with

$$\|y - AX\|_2 \le \epsilon,$$

the reconstruction satisfies

$$\|X_R - X\|_2 \le C_0 \frac{\|X_K - X\|_1}{\sqrt{K}} + C_1\epsilon,$$

where C0 and C1 are constants depending on δ2K.
Example 10.20. For Examples 10.3 and 10.4 estimate the maximal signal sparsity
for which the solutions using the ℓ1-norm based minimization and the ℓ0-norm
based minimization are the same.
⋆ The restricted isometry property is satisfied with ρK = √λmax < √2 − 1
for K = 24 in Example 10.3. It means that the uniqueness is guaranteed for
signals of sparsity K/2 = 12. Note that this is a statistical estimate over 10000
realizations. The true bound is slightly lower.
In the case of the DFT matrix in Example 10.4 the restricted isometry
property was satisfied with ρK < √2 − 1 for K = 2 only, meaning that in the
recovery we can guarantee the same solution for sparsity K = 1 only, with
M = 6 out of N = 8 samples.
The order of signal sparsity K such that the signal can be recovered
using M measurements/samples has been derived in the literature as

$$K < C\frac{M}{\log(N/M)} \quad \text{for a Gaussian measurement matrix}$$
$$K < C\frac{M}{\log^6 N} \quad \text{for a partial DFT matrix.}$$
Figure 10.19 Minimization using the ℓ1 -norm and the solution illustration for the case when
the measurements line corresponds to noisy data.
is the same as K < (1 + 1/µ)/2. Note that δ2K = (2K − 1)µ is just a bound
on δ2K. For a matrix A there could be a lower and less restrictive constant
satisfying the restricted isometry property.
In an ideal case the matrix (A_K^T A_K)^{-1} should be the identity matrix for any
combination of K columns. It means that the lines have a vector coordinate of
1 in each direction, and the reconstruction condition would always be satisfied. The
transformation y = AX would correspond to a rotation on a sphere with all
axes equal to 1. Each X(0), X(1), X(2) would be transformed as y = AX, keeping its
amplitude. Since this is not the case, the transform y = AX will change
amplitudes in addition to the rotation. For the (nonsquare) matrix A the
maximal gain of a vector X is obtained in the direction defined by the eigenvector
of the maximal eigenvalue. In reality

$$X_K = \left(A_K^T A_K\right)^{-1} A_K^T y$$

and

$$\|X_K\|_2^2 \le \frac{1}{d_{\min}^2}\|X_0\|_2^2,$$

with d²min = (1 − δk)². The condition δ2 < √2 − 1 here means that
d²min > 0.343, i.e., the gain 1/d²min is smaller than 2.91. It has been assumed that E_A = 1.
Then

$$X(1) = X(0) - b_1$$
$$X(2) = X(0) - b_2$$
$$\vdots$$
$$X(N-1) = X(0) - b_{N-1}. \qquad (10.71)$$

Then, for any values of the coefficients bi, at least one X(k) will be equal to zero,
since at least one of the terms in z is zero at the minimum. It means that the solution will
be of sparsity at most K = N − 1.
In order to prove that the median produces the position of the minimum of (10.72),
assume that the total number of terms N is an odd number. The function
z in (10.72) is a sum of functions of the form |x − a|. The rate (derivative) of
such a function is +1 for x > a and −1 for x < a. If there are N terms, as in
(10.72), then the rate of the function z is +N for x → ∞. Going from x → ∞ back
toward the term with the largest shift, the rate remains +N.
At the position of the largest shift, the rate of this term changes from
+1 to −1, meaning that the overall rate of z is reduced to +(N − 2).
By passing each term, the rate is reduced by an additional 2. It
means that after the kth term the rate is (N − 2k). The rate of z
changes its sign when (N − 2k) = −1. This is the position of the minimum of the
function z. It is k = (N + 1)/2 and it corresponds to the middle coefficient
position, i.e., to the median of the coefficients (shifts).
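The proof above can be illustrated numerically: for an odd number of terms, a brute-force minimization of z(x) = Σ|x − xi| lands on the median of the shifts. The random shifts and grid resolution below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
shifts = rng.uniform(-1.0, 1.0, 7)   # odd number of terms, N = 7

# z(x) = sum_i |x - shifts_i|; its rate goes from -N to +N in steps of 2,
# changing sign at the middle (median) shift
x = np.linspace(-1.5, 1.5, 30001)
z = np.abs(x[:, None] - shifts[None, :]).sum(axis=1)

print(x[np.argmin(z)], np.median(shifts))
```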
Example 10.21. As an example consider the case with N = 7 and M = 6 measurements
AX = y producing an ideal line in a seven-dimensional space of the
form (10.71), with b1 = 0.7, b2 = 0.2, b3 = −0.5, b4 = 1, b5 = 0.8, and b6 = −0.9.
For the data presented in Fig. 10.20 the solution is X(0) = arg min{z} =
median{0, 0.7, 0.2, −0.5, 1, 0.8, −0.9} = 0.2, with the coefficient
X(2) = X(0) − 0.2 = 0 being equal to zero.
If the signal sparsity is K < N/2 then there will exist more than
N/2 values bi = b such that | X (0) − bi | = 0. The solution of minimization
problem then will not depend on other bk ̸= bi = b and will be unique
Figure 10.20 The minimization function z(x) for N = 7 terms. Its rate dz/dx changes from −N = −7 to N = 7 in steps of 2 (−7, −5, −3, −1, 1, 3, 5, 7), and arg{min{z}} = median{x1, x2, x3, x4, x5, x6, x7}.
For P = 1 we get

$$\frac{|a_0| + |a_1| + |a_2| - \max\{|a_0|, |a_1|, |a_2|\}}{\max\{|a_0|, |a_1|, |a_2|\}} > 1.$$
Consider next the case when two degrees of freedom exist (with M =
N − 2 measurements). All coefficients X(k) can be expressed as a function
of, for example, X(0) and X(1). Then the terms of the minimization function
are zero-valued along the lines

$$X(0) = 0$$
$$X(1) = 0$$
$$a_{2,0}X(0) + a_{2,1}X(1) - b_2 = 0$$
$$\vdots$$
$$a_{N-1,0}X(0) + a_{N-1,1}X(1) - b_{N-1} = 0,$$

which follow from AX = y.
[Figure: contours of the two-dimensional minimization function z(x, y), with the minimum at the median position (1, 0).]
$$\begin{bmatrix} x(n_1) \\ x(n_2) \\ \vdots \\ x(n_M) \end{bmatrix} = \begin{bmatrix} \psi_0(n_1) & \psi_1(n_1) & \cdots & \psi_{N-M-1}(n_1) \\ \psi_0(n_2) & \psi_1(n_2) & \cdots & \psi_{N-M-1}(n_2) \\ \vdots & \vdots & & \vdots \\ \psi_0(n_M) & \psi_1(n_M) & \cdots & \psi_{N-M-1}(n_M) \end{bmatrix} \begin{bmatrix} X(0) \\ X(1) \\ \vdots \\ X(N-M-1) \end{bmatrix}$$
$$+ \begin{bmatrix} \psi_{N-M}(n_1) & \psi_{N-M+1}(n_1) & \cdots & \psi_{N-1}(n_1) \\ \psi_{N-M}(n_2) & \psi_{N-M+1}(n_2) & \cdots & \psi_{N-1}(n_2) \\ \vdots & \vdots & & \vdots \\ \psi_{N-M}(n_M) & \psi_{N-M+1}(n_M) & \cdots & \psi_{N-1}(n_M) \end{bmatrix} \begin{bmatrix} X(N-M) \\ X(N-M+1) \\ \vdots \\ X(N-1) \end{bmatrix}$$
or

$$\min \|X\|_1 \quad \text{subject to} \quad \|y - AX\|_2^2 < \varepsilon,$$

subject to the minimal energy of X, i.e., subject to minimal ∥X∥₂². The minimization
of the ridge constraint problem can be reformulated in Lagrangian
form, using a parameter λ, as

$$X = \arg\min_{X}\left\{\|y - AX\|_2^2 + \lambda\|X\|_2^2\right\}.$$
Minimization of F(X) = ∥y − AX∥₂² + λ∥X∥₂² follows from

$$\frac{\partial F(X)}{\partial X^T} = -2A^T y + 2A^T AX + 2\lambda X = 0$$

as

$$X_{\mathrm{ridge}} = \left(A^T A + \lambda I\right)^{-1} A^T y.$$
The parameter λ balances the error and the constraint. Its inclusion makes the
inversion nonsingular even if A^T A is singular. A real-valued matrix A is
assumed; otherwise the Hermitian transpose A^H would be used.
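The closed-form ridge solution above can be sketched numerically. This is a minimal illustration; the matrix sizes and λ = 0.1 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, lam = 4, 6, 0.1                 # underdetermined case: A^T A is singular
A = rng.normal(size=(M, N))
y = rng.normal(size=M)

# ridge solution: X = (A^T A + lambda*I)^(-1) A^T y
X_ridge = np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ y)
print(X_ridge)
```

With M < N the matrix A^T A is rank deficient, so the plain inverse does not exist; the λI term is what makes the system solvable.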
The standard ridge regression minimizes the energy of the solution X(k)
and not its sparsity, Fig. 10.22. That is the reason why the ℓ1-norm
constraint is introduced in the cost function.
Figure 10.22 Minimization with constraint: ridge regression (left), LASSO regression (middle),
and the ℓ1/4-norm, a function closer to the ℓ0-norm (right).
Function ∥X∥1 promotes sparsity. It produces the same results (under certain
conditions) as if ∥X∥p, with p close to 0, were used, Fig. 10.22.
The minimization problem with the ℓ1-norm constraint does not have a
closed-form solution. It is solved iteratively. In order to define an
iterative procedure we add to the function F(X) a nonnegative term, having zero value at
the current solution estimate Xs. This term will not change the minimization solution.
The new function is

$$H(X) = \|y - AX\|_2^2 + \lambda\|X\|_1 + (X - X_s)^T(\alpha I - A^T A)(X - X_s),$$

where α is such that the added term is always nonnegative. It means α >
λmax, where λmax is the largest eigenvalue of A^T A.
The gradient of H(X) is

$$\nabla H(X) = \frac{\partial H(X)}{\partial X^T} = -2A^T y + 2A^T AX + \lambda\,\mathrm{sign}\{X\} + 2(\alpha I - A^T A)(X - X_s).$$

The solution of ∇H(X) = 0 follows from

$$-A^T y + \frac{\lambda}{2}\mathrm{sign}\{X\} - (\alpha I - A^T A)X_s + \alpha X = 0$$

as

$$X + \frac{\lambda}{2\alpha}\mathrm{sign}\{X\} = \frac{1}{\alpha}A^T(y - AX_s) + X_s.$$
In an iterative procedure, the new estimate Xs+1 is obtained from

$$X_{s+1} + \frac{\lambda}{2\alpha}\mathrm{sign}\{X_{s+1}\} = \frac{1}{\alpha}A^T(y - AX_s) + X_s.$$
For a scalar, the equation

$$x + \lambda\,\mathrm{sign}(x) = y$$

has the solution

$$x = \mathrm{soft}(y, \lambda) = \mathrm{sign}(y)\max\{0, |y| - \lambda\}.$$
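The soft-thresholding rule can be sketched as follows (the test values are illustrative):

```python
import numpy as np

def soft(y, lam):
    """Soft thresholding: solves x + lam*sign(x) = y for |y| > lam, else x = 0."""
    return np.sign(y) * np.maximum(0.0, np.abs(y) - lam)

print(soft(np.array([0.7, -0.1, 0.25]), 0.2))
```

Values with magnitude below the threshold are set exactly to zero, which is how the ℓ1 constraint produces sparse solutions.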
The same rule can be applied to each coordinate of the vector Xs+1,

$$X_{s+1} = \mathrm{soft}\left(\frac{1}{\alpha}A^T(y - AX_s) + X_s,\ \frac{\lambda}{2\alpha}\right) \qquad (10.73)$$

or

$$X(k)_{s+1} = \mathrm{soft}\left(\frac{1}{\alpha}(a(k) - b(k)) + X(k)_s,\ \frac{\lambda}{2\alpha}\right),$$

where a(k) and b(k) are the coordinates of the vectors a = A^T y
and b = A^T AXs.
This is the iterative soft-thresholding algorithm (ISTA) for the LASSO
minimization. It can easily be modified into the fast ISTA (FISTA) to improve
convergence. Note that this is just one of the possible solutions of the minimization
problem with the ℓ1-norm.
The Lagrangian constant λ is a balance between the error and the
ℓ1-norm value, while α = 2 max{eig{A^T A}} is commonly used. The
algorithms that solve this kind of problem are implemented as functions
X = lasso(A, y).
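A minimal sketch of the ISTA iteration (10.73). The setup is in the spirit of Example 10.22 but smaller for speed; the sizes, seed, λ, and iteration count are illustrative choices, not the book's exact experiment.

```python
import numpy as np

def soft(x, lam):
    return np.sign(x) * np.maximum(0.0, np.abs(x) - lam)

def ista(A, y, lam=0.01, n_iter=3000):
    """Iterative soft-thresholding (10.73):
    X_{s+1} = soft(A^T (y - A X_s)/alpha + X_s, lam/(2*alpha))."""
    alpha = 2.0 * np.max(np.linalg.eigvalsh(A.T @ A))
    X = np.zeros(A.shape[1])
    for _ in range(n_iter):
        X = soft(A.T @ (y - A @ X) / alpha + X, lam / (2.0 * alpha))
    return X

rng = np.random.default_rng(4)
M, N = 20, 30
A = rng.normal(0.0, 1.0 / np.sqrt(M), (M, N))
X_true = np.zeros(N)
X_true[[5, 12, 17]] = [1.0, -0.75, 0.5]   # K = 3 sparse signal
y = A @ X_true                             # M = 20 exact measurements

X_rec = ista(A, y)
print(np.max(np.abs(X_rec - X_true)))
```

Library implementations (the X = lasso(A, y) functions mentioned above) solve the same problem; FISTA adds a momentum term to speed up convergence.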
Example 10.22. The measurement matrix A is formed as a Gaussian random matrix
of size 40 × 60. Since there are 40 measurements, the random variable
N(0, σ²) with σ² = 1/40 is used. The original sparse signal of total
length N = 60 is X(k) = δ(k − 5) + 0.5δ(k − 12) + 0.9δ(k − 31) − 0.75δ(k −
45) in the transformation domain. It is measured with the matrix A, with the
40 measurements stored in the vector y. All 60 signal values are reconstructed
using these 40 measurements y and the matrix A, in 1000 iterations.
Figure 10.23 A sparse signal with N = 60 and K = 4 reconstructed using a reduced set of
M = 40 observations and LASSO iterative algorithm. The results for λ = 0.01 and λ = 0.0001
are presented.
In the initial iteration X0 = 0 is used. Then, for each next s, the new values
of X are calculated using (10.73), given the data y and the matrix A. The value α =
2 max{eig{A^T A}} is used. The results for λ = 0.01 and λ = 0.0001 are
presented in Fig. 10.23. For the very small λ = 0.0001 the result is not sparse, since
the constraint is too weak.
where y_c^(m) is the vector of missing samples in the mth iteration and M is
the sparsity measure. The gradient of the sparsity measure calculated at y_c = y_c^(m) is
denoted by ∂M/∂y_c|_{y_c = y_c^(m)}, while α is the iteration step. For the algorithm
convergence a convex measure function is required.
A signal x(n) that is sparse in a transformation domain X(k) =
T{x(n)} is used for illustration. As in Example 10.15, it has been assumed
that two samples x(n_{N−1}) and x(n_N) are not available, y_c = (x(n_{N−1}),
x(n_N)). The signal x_a(n) is formed. Its values at the available sample positions
y = (x(n_1), x(n_2), ..., x(n_M)), M = N − 2, are considered as constants. The samples
x(n_{N−1}) and x(n_N), at the positions q_1 = n_{N−1} and q_2 = n_N, are considered
as variables. For various values of x(n_{N−1}) and x(n_N) the sparsity
measure of x_a(n) is calculated as M = ∥T[x_a(n)]∥1 = ∥X_a∥1 and presented
in Fig. 10.24, along with an illustration of the gradient ∂M/∂y_c|_{y_c=0} coordinates
at x(n_{N−1}) = 0, x(n_N) = 0.
Consider a signal x(n) with available samples at n ∈ M. The signal is
sparse in a transformation domain X(k) = T{x(n)}. The DFT will be used
as a study case, X(k) = DFT[x(n)].
As the initial estimate of the reconstructed signal x_a^(0) we will use the values
that would follow as a result of the ℓ2-norm based minimization of the signal
transform. The values of x_a^(0) are

$$x_a^{(0)}(n) = \begin{cases} 0 & \text{for missing samples, } n \in \mathbf{N}_Q \\ x(n) & \text{for available samples, } n \in \mathbf{M}, \end{cases}$$
where NQ is the set of missing sample positions. The available samples are
considered as constants, while the missing samples are changed through
iterations. Denote by x_a^(m) the values of the signal reconstructed after m
iterations. The minimization process can be described as

$$\min \|X_a\|_1 \quad \text{subject to} \quad x_a^{(m)}(n) = x(n) \ \text{for} \ n \in \mathbf{M},$$

where X_a^(m)(k) = DFT[x_a^(m)(n)]. Since the task is to find the position of the minimum
of the function z = ∥X_a∥1 through an iterative procedure, the relation for
772 Sparse Signal Processing
5
|| X ||
a 1
4.5
3.5
2.5
2
1
-1
x(n )
N
-2
x(n ) -3
-3 -1 -2 N-1
1 0
Figure 10.24 Sparsity measure function in the case of two unavailable signal sam-
ples yc = ( x (n N −1 ), x (n N )) with corresponding gradient. Available samples are y =
( x (n1 ), x (n2 ), ..., x (n N −2 )).
$$g(n_i) = \frac{\|X_a^+\|_1 - \|X_a^-\|_1}{2N\Delta}$$
where

$$X_a^+(k) = T\{x_a^+(n)\}, \qquad X_a^-(k) = T\{x_a^-(n)\}$$

and

$$x_a^+(n) = x_a^{(m)}(n) + \Delta\delta(n - n_i)$$
$$x_a^-(n) = x_a^{(m)}(n) - \Delta\delta(n - n_i).$$
Before presenting the algorithm, the basic idea and the parameters in (10.74)
will be discussed. Assume first the simple case when a single signal sample at
n0 ∈ NQ is not available, with card{M} = N − 1. This sample is considered
as a variable. It may assume an arbitrary value x_a(n0) = x(n0) + z(n0),
where z(n0) is a variable representing the deviation from the true signal value at n0.
In order to estimate the finite difference of the sparsity measure

$$\|X_a\|_1 = \sum_{k=0}^{N-1}|X_a(k)|,$$

the signals

$$x_a^+(n) = x(n) + (z(n_0) + \Delta)\delta(n - n_0)$$
$$x_a^-(n) = x(n) + (z(n_0) - \Delta)\delta(n - n_0)$$

are formed, with

$$g(n_0) = \frac{\|X_a^+\|_1 - \|X_a^-\|_1}{2N\Delta}.$$
The pulses δ(n − n0) are uniformly spread over all frequencies in the DFT
domain. Then

$$\|X_a^+\|_1 = \sum_{k=0}^{N-1}|X_a^+(k)| \cong \mu + |z(n_0) + \Delta|N$$
$$\|X_a^-\|_1 = \sum_{k=0}^{N-1}|X_a^-(k)| \cong \mu + |z(n_0) - \Delta|N,$$

where µ = ∥X∥1 is the sparsity measure of the original signal x(n). Therefore
the gradient approximation of the sparsity measure ∥X_a∥1 along the
direction of the variable z(n0) is

$$g(n_0) = \frac{\|X_a^+\|_1 - \|X_a^-\|_1}{2N\Delta} \cong \frac{|z(n_0) + \Delta| - |z(n_0) - \Delta|}{2\Delta}.$$
For deviations from the true signal value smaller than the step, |z(n0)| < ∆,
we get

$$g(n_0) \cong \frac{z(n_0)}{\Delta} \sim z(n_0). \qquad (10.75)$$

It means that the gradient value can be used as an indicator of the signal
value deviation from the correct value (this property will later be used
for the detection of impulsive noise in signal samples as well). For a large
|z(n0)| > ∆,

$$g(n_0) \cong \frac{1}{2}\mathrm{sign}(z(n_0)). \qquad (10.76)$$

In that case the gradient assumes the correct direction toward the minimum
position, with an intensity independent of the deviation.
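Property (10.75) can be checked numerically. The sketch below (with an arbitrary K = 2 real sinusoid, and a deviation z and step ∆ chosen so that |z| < ∆) computes the finite-difference gradient and compares it with z/∆; the agreement is approximate since the K signal bins perturb the estimate slightly.

```python
import numpy as np

N = 32
n = np.arange(N)
x = 3.0 * np.cos(2 * np.pi * 2 * n / N)   # K = 2 sparse signal (DFT bins 2 and 30)
n0, delta, z = 5, 0.5, 0.2                # single missing sample, |z| < delta

def measure(sig):
    return np.sum(np.abs(np.fft.fft(sig)))   # sparsity measure ||X_a||_1

xa = x.copy()
xa[n0] += z                               # x_a(n) = x(n) + z*delta(n - n0)
d = delta * (n == n0)
g = (measure(xa + d) - measure(xa - d)) / (2 * N * delta)
print(g, z / delta)
```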
In order to analyze the influence of ∆ on the solution precision when
z(n0) is very small, assume that the exact solution has been obtained and
that the change of the sparsity measure is tested by changing the sample x(n0)
by ±∆. Then, for a signal x(n) = Σ_{i=1}^{K} Ai e^{j2πnki/N} of sparsity K, the DFTs of
x_a^+(n) = x(n) + ∆δ(n − n0) and x_a^-(n) = x(n) − ∆δ(n − n0) are

$$\|X_a^+\|_1 = \sum_{i=1}^{K}\left|A_i + \Delta e^{-j2\pi n_0 k_i/N}\right| + (N-K)\Delta$$
$$\|X_a^-\|_1 = \sum_{i=1}^{K}\left|A_i - \Delta e^{-j2\pi n_0 k_i/N}\right| + (N-K)\Delta.$$
For the worst-case analysis, assume that the Ai are in phase with e^{−j2πn0ki/N}
and ∆ ≤ |Ai|, when

$$\|X_a^+\|_1 = \sum_{i=1}^{K}|A_i| + K\Delta + (N-K)\Delta = \mu + N\Delta$$
$$\|X_a^-\|_1 = \sum_{i=1}^{K}|A_i| - K\Delta + (N-K)\Delta = \mu + (N-2K)\Delta.$$
The stationary-point bias b then follows from the balance

$$N(\Delta - b) = (N - 2K)(\Delta + b) \qquad (10.77)$$

resulting in

$$b = \frac{K}{N-K}\Delta \cong \frac{K}{N}\Delta \quad \text{for } K \ll N. \qquad (10.78)$$
The bias upper limit can be reduced by using a very small ∆. However,
calculation with a small ∆ would be time consuming (requiring many iterations).
An efficient implementation uses ∆ of the order of the signal
amplitude in the initial iteration. When the algorithm reaches a stationary
point for a given ∆, the mean squared error assumes an almost
constant value. The error then only changes the gradient direction around the
correct point, by almost π. This fact may be used as an indicator
to reduce the step ∆, in order to approach the true signal value with a
given precision. For example, if the signal amplitudes are of the order of 1 and
K/N = 0.1, taking ∆ = 1 in the first iteration will produce the solution with
a precision better than 20 [dB]. Then the step ∆ should be reduced, for
example to ∆ = 0.1. A precision better than 40 [dB] would be obtained, and
so on.
Through a simulation study it has been concluded that an appropriate step
parameter value in (10.74) is related to the finite difference step as α = 2∆.
10.7.2.2 Algorithm
The presented analysis is used as the basic idea for the algorithm, summarized
as follows:
Step 0: Set m = 0 and form the initial signal estimate x_a^(0)(n), defined for n ∈
N as

$$x_a^{(0)}(n) = \begin{cases} 0 & \text{for missing samples, } n \in \mathbf{N}_Q \\ x(n) & \text{for available samples, } n \in \mathbf{M}. \end{cases} \qquad (10.79)$$
Step 1: Set x_p(n) = x_a^(m)(n). This signal is used in Step 3 in order to estimate
the reconstruction precision.
Step 2.1: Set m = m + 1. For each missing sample at n_i ∈ NQ form the signals
x_a^+(n) and x_a^-(n):

$$x_a^+(n) = x_a^{(m)}(n) + \Delta\delta(n - n_i)$$
$$x_a^-(n) = x_a^{(m)}(n) - \Delta\delta(n - n_i). \qquad (10.81)$$
Step 2.2: Estimate the differential of the signal transform measure and correct
the missing samples accordingly. The angle between the gradient-estimate
vectors of two successive iterations is

$$\beta_m = \arccos\frac{\sum_{k=0}^{N-1}G_{m-1}(k)G_m(k)}{\sqrt{\sum_{k=0}^{N-1}G_{m-1}^2(k)}\sqrt{\sum_{k=0}^{N-1}G_m^2(k)}}.$$
If the angle βm is lower than 170° and the maximal allowed number of iterations
is not reached (m < mmax), go to Step 2.1.
Step 3: If the maximal allowed number of iterations is reached, stop the
algorithm. Otherwise calculate

$$T_r = 10\log_{10}\frac{\sum_{n\in \mathbf{N}_Q}|x_p(n) - x_a^{(m)}(n)|^2}{\sum_{n\in \mathbf{N}_Q}|x_a^{(m)}(n)|^2},$$
with X_a^(m)(k) = T{x_a^(m)(n)} and D_{n_i}(k) = T{δ(n − n_i)} = exp(−j2πn_i k/N)
for the DFT and each n_i. Since the D_{n_i}(k) are independent of the iteration
number m, they can be calculated separately from the DFT of the signal.
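The steps above can be sketched in code. This is a simplified illustration, tested on the signal of Example 10.23: the ∆ schedule and the fixed number of iterations per ∆ are arbitrary choices, and the angle-based switching rule of Step 2.2 is replaced by that fixed schedule.

```python
import numpy as np

def reconstruct(x_avail, mask, deltas=(1.0, 0.1, 0.01), n_iter=50):
    """Gradient-based reconstruction of missing samples.

    mask[n] is True for available samples; missing samples start from 0
    (Step 0). Each missing sample is corrected with the finite-difference
    gradient of the measure ||X_a||_1, using the step alpha = 2*delta."""
    N = len(x_avail)
    n = np.arange(N)
    xa = np.where(mask, x_avail, 0.0)
    measure = lambda s: np.sum(np.abs(np.fft.fft(s)))
    for delta in deltas:
        for _ in range(n_iter):
            for ni in np.where(~mask)[0]:
                d = delta * (n == ni)
                g = (measure(xa + d) - measure(xa - d)) / (2 * N * delta)
                xa[ni] -= 2 * delta * g
    return xa

# signal of Example 10.23: x(n) = 3 sin(20 pi n / 8), missing samples {1, 6}
N = 8
n = np.arange(N)
x = 3 * np.sin(20 * np.pi * n / N)
mask = np.ones(N, dtype=bool)
mask[[1, 6]] = False

x_rec = reconstruct(np.where(mask, x, 0.0), mask)
print(np.max(np.abs(x_rec - x)))
```

Reducing ∆ in stages shrinks the stationary-point bias of (10.78), so the final error is of the order of the last ∆.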
Example 10.23. Consider a signal

$$x(n) = 3\sin\left(20\pi\frac{n}{N}\right)$$

with N = 8. The missing samples are n ∈ NQ = {1, 6}. The signal is reconstructed
using a simplified gradient-based algorithm, using Step 0 to Step 2.4, from
(10.79) to (10.83), in 60 iterations. The initial algorithm parameter ∆ = 1 and
the initial values of the missing samples, x(1) = 0 and x(6) = 0, are used. The values
of the missing samples in the first 20 iterations are presented by dots (connected
by a line) in Fig. 10.25. After about 6 iterations the algorithm with ∆ = 1
does not significantly change the missing sample values (zoomed changes
are shown in the lower subplot within the figure). Close to the stationary point
obtained for ∆ = 1 the gradient coordinates are almost zero-valued (with
direction changes of almost π), since the measures lie on a contour with
almost the same measure value (circles). After the step is reduced to ∆ = 0.1 in
the 20th iteration, the algorithm resumes its fast approach toward the exact
value, until a new stationary state. With a new change of ∆ to ∆ = 0.01 the
approach is continued again.
The stationary-state bias for ∆ = 1 is lower than (K/N)∆ = 1/4 (it corresponds
to a bias-caused MSE lower than 15.5 [dB]). With each reduction of ∆
to ∆/10 the bias-caused MSE is lowered by 20 [dB]. The reconstruction result
and the MSE for the estimated missing values x(1) and x(6) are presented
in Fig. 10.26.
The calculation is repeated with the signal

$$x(n) = 3\sin\left(20\pi\frac{n}{N}\right) + 2\cos\left(60\pi\frac{n}{N}\right) + 0.5\sin\left(46\pi\frac{n}{N}\right)$$

and N = 32. The missing samples are n ∈ NQ = {2, 4, 5, 7, 9, 13, 17, 19, 24, 26, 28, 31}.
The result for this case is shown in Fig. 10.27.
Considering the complex number a = X_a^(m)(k)/(∆D_{n_i}(k)), with |a| ≪ 1 for
a large ∆, from the problem geometry it is easy to show that the following
bounds hold:

$$0 \le \big||1 + a| - |1 - a|\big| \le 2|a|.$$

The exact value of this expression depends on the phase of a. Therefore,

$$0 \le \big||X_a^+(k)| - |X_a^-(k)|\big| \le 2|X_a^{(m)}(k)|.$$

The lower limit 0 is obtained if a is imaginary-valued, while the upper limit
2|X_a^(m)(k)| follows if a is real-valued.
It means that the value of the finite difference |X_a^+(k)| − |X_a^-(k)|, which
is used to correct the missing signal samples, does not depend on the value
of the step ∆ if ∆ is large. The missing signal values will be adapted for
[Figure: the reconstructed signal after 5, 15, and 60 iterations, and the reconstruction MSE in [dB] versus the iteration number.]
The signal-to-reconstruction-error ratio (SRR) is defined as

$$\mathrm{SRR} = 10\log\frac{\sum_{n=0}^{N-1}|x(n)|^2}{\sum_{n=0}^{N-1}|x(n) - x_R(n)|^2}. \qquad (10.85)$$
Bright colors indicate the region where the algorithm fully recovered the
missing samples in all realizations, while dark colors indicate the region
where the algorithm could not recover the missing samples in any realization.
In the transition region, for M slightly greater than 2K, there are cases when
the signal recovery is not achieved as well as cases of full signal recovery. The
simulations are done for N = 128 and for N = 64, Fig. 10.28(a),(b). A stopping
criterion with an accuracy of 120 [dB] is used. It corresponds to a precision
in the recovered signal equal to the input sample precision if the samples are acquired by
a 20-bit A/D converter. The case with N = 64 is repeated with additive
input Gaussian noise such that the input signal-to-noise ratio is 20 [dB] in
each realization, Fig. 10.28(c). The reconstruction error in this case is limited
by the input signal-to-noise value. The number of iterations needed to achieve the
required precision is presented in Fig. 10.28(d). We can see that the number
of iterations is well below 100 for the most important region, where the
reconstruction was achieved in all realizations (high values of M and small
values of K, M ≫ K). The number of iterations is quite small in the region
where the reconstruction can be achieved.
An illustration of the algorithm performance with respect to the SRR and
the gradient angle βm in one realization, with K = 6, is presented in Fig. 10.29.
The algorithm reached 120 [dB] accuracy in 47 iterations. From the gradient
angle graph we see that the algorithm step is reduced, ∆ → ∆/√10, in about
every 4 iterations. According to (10.77) the expected MSE improvement by
each reduction of ∆ is 20 log(√10) = 10 [dB].
Figure 10.28 Signal-to-reconstruction-error (SRR) averaged over 100 realizations for various
sparsity K and number of available samples M: (a) The total number of samples is N = 128. (b)
The total number of samples is N = 64. (c) With a Gaussian noise in the input signal, SNR = 20
[dB] and N = 64. (d) Number of iterations to reach the solution with the defined precision.
Figure 10.29 Angle between successive gradient estimations β m and the signal-to-
reconstruction-error ratio (SRR) as a function of the number of iterations in the algorithm for
one signal realization with 6 nonzero DFT coefficients and M = 64.
will be analyzed here. For the available signal positions n ∈ M the value of
z(n) is fixed, z(n) = 0, while z(n) may take an arbitrary value at the positions of
missing samples n = qm ∈ NQ = {q1, q2, ..., qQ}. If x(n) is a K-sparse signal
then the DFT of x_a(n) is

$$X_a(k) = X(k) + Z(k) = N\sum_{i=1}^{K}A_i\delta(k - k_{0i}) + \sum_{m=1}^{Q}z(q_m)e^{-j2\pi q_m k/N}.$$

The positions of the nonzero values in X(k) are k0i ∈ K = {k01, k02, ..., k0K}, with
amplitudes X(k0i) = N Ai. The values of the missing samples of x_a(n) = x(n) +
z(n) for n ∈ NQ are considered as variables. The goal of the reconstruction
process is to get x_a(n) = x(n), i.e., z(n) = 0 for all n ∈ N. This goal should be
achieved by minimizing a sparsity measure of the signal transform X_a(k).
The existence of a unique solution of this problem depends on the number of
missing samples, their positions, and the signal form.
If a signal with transform X(k) of sparsity K is obtained using a
reconstruction method with a set of missing samples, then the reconstruction
X(k) is unique if there is no other signal of the same or lower sparsity that
satisfies the same set of available samples (using the same set of missing
samples as variables).
Example 10.25. Consider the simplest case of one missing sample, at position
n = q. The signal sparsity is K. The signal reconstruction is based on x_a(n) =
x(n) + zδ(n − q), where z indicates an arbitrary deviation from the true signal
value, since the missing sample x(q) is considered as a variable. The DFT of
x_a(n) is

$$X_a(k) = N\sum_{i=1}^{K}A_i\delta(k - k_{0i}) + ze^{-j2\pi kq/N}.$$

The ℓ0-norm of X_a is

$$\mathrm{card}\{X_a\} = \|X_a\|_0 = \sum_{i=1}^{K}\left|NA_i + ze^{-j2\pi k_{0i}q/N}\right|^0 + \sum_{i=K+1}^{N}|z|^0.$$
The possible sparsity of X_a(k) is

$$\|X_a\|_0 = \begin{cases} N & \text{for } |z| \ne 0 \text{ and } z \ne -NA_ie^{j2\pi k_{0i}q/N} \text{ for any } i \\ N-1 & \text{for } |z| \ne 0 \text{ and } z = -NA_ie^{j2\pi k_{0i}q/N} \text{ for one } i \text{ only} \\ \vdots & \vdots \\ N-K & \text{for } |z| \ne 0 \text{ and } z = -NA_ie^{j2\pi k_{0i}q/N} \text{ for } i = 1, \dots, K \\ K & \text{for } |z| = 0. \end{cases} \qquad (10.86)$$
With just one missing value and an arbitrary signal, the minimum of ∥X_a∥0 is
achieved at |z| = 0 only if the signal sparsity is lower than the lowest possible
sparsity with |z| ≠ 0,

$$K < N - K,$$

i.e., K < N/2. For K = N/2 the last two rows of (10.86) produce the
same result, N − K = N/2 and K = N/2. In that case the minimum of ∥X_a∥0
is not unique. Note that this is possible only if the considered signal x(n) has a
very specific form,

$$A_1e^{j2\pi k_{01}q/N} = A_2e^{j2\pi k_{02}q/N} = A_3e^{j2\pi k_{03}q/N} = \dots = A_Ke^{j2\pi k_{0K}q/N} = C. \qquad (10.87)$$
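The counting in (10.86) can be verified directly for a K = 1 signal. The parameters below (length, component bin, missing position) are illustrative choices: a generic z fills every DFT bin, the canceling z removes exactly one bin, and z = 0 restores the sparsity K.

```python
import numpy as np

N, k0, q = 8, 3, 5                  # length, signal bin, missing-sample position
A1 = 1.0
n = np.arange(N)
x = A1 * np.exp(2j * np.pi * k0 * n / N)   # K = 1 sparse signal

def card(z):
    """||X_a||_0 for x_a(n) = x(n) + z*delta(n - q)."""
    xa = x.copy()
    xa[q] += z
    return int(np.sum(np.abs(np.fft.fft(xa)) > 1e-9))

z_cancel = -N * A1 * np.exp(2j * np.pi * k0 * q / N)   # the z that cancels bin k0
print(card(0.0), card(0.5), card(z_cancel))
```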
Example 10.26. Consider a signal x(n) with N = 32 and two missing samples at
qm ∈ NQ = {3, 19}. The signal sparsity is K. In order to simplify the notation,
assume that one DFT value of the reconstructed signal is X(5) = 2.
(a) Show that the limit for the sparsity K (when we can claim that the
reconstructed sparse signal is unique, assuming that all signal amplitudes
may take arbitrary values) is K < 8.
(b) What properties must a signal satisfy in the limit case
K = 8 so that the solution is not unique?
(c) What is the sparsity limit if the missing samples are at qm ∈ NQ =
{5, 9}?
In the worst case for the minimization, Z(k) should have the maximal possible
number of zeros, and these zeros should remain in X_a(k) = X(k) + Z(k). We
conclude that either z3 = z19 or z3 = −z19 should hold (when the sparsity of Z(k) is
16); otherwise the sparsity of Z(k) would be 32. In addition, in the worst case
the nonzero values of Z(k) could cancel out all K components, including the assumed
X(5) = 2. Therefore the maximal number of zeros in X_a(k) with nonzero z(n)
is 16 + K. The sparsity of X_a(k) is then 32 − (16 + K). It should be greater than
the sparsity K of the correct solution, when all z(n) = 0 and X_a(k) = X(k). It
means that

$$32 - (16 + K) > K$$

should hold. This completes the proof that K < 8 should hold.
(b) Since z3 = z19 would produce Z(2k + 1) = 0, it would not be able
to cancel X(5). Therefore for the worst-case analysis we must use z3 = −z19,
with

$$Z(5) = e^{-j2\pi 15/32}(z_3 - z_{19}) = -X(5) = -2.$$

It means z3 = −z19 = −e^{j2π15/32} and

$$Z(k) = \begin{cases} -2e^{-j2\pi(3k-15)/32} & \text{for odd } k \\ 0 & \text{for even } k. \end{cases}$$
The nonzero values are X(k) ≠ 0 for k ∈ {5, k02, k03, k04, k05, k06, k07, k08}.
The values of X(k) must be of opposite sign and equal amplitude to the
corresponding (determined) values of Z(k), resulting in

$$X(k) = \begin{cases} 2e^{-j2\pi(3k-15)/32} & \text{for } k \in \{5, k_{02}, k_{03}, k_{04}, k_{05}, k_{06}, k_{07}, k_{08}\} \\ 0 & \text{elsewhere.} \end{cases} \qquad (10.89)$$
Both of these signals have the same sparsity K = 8 and satisfy the same set
of available samples. However, if the sampled signal x(n) is not a signal of the
very specific form (10.89), then the solution of sparsity K = 8 will be unique
for the given set of available samples. Then z(n) = δ(n − 3) − δ(n − 19) will not
be able to cancel all 8 DFT values of the signal, and the sparsity of X(k) + Z(k)
will be 8 only for z(n) = 0, producing the correct, unique solution. The signal Y(k) =
−Z(2k − 1) is Y(k) = 2e^{−j2π(3(2k−1)−15)/32} = 2e^{−j2π(3k−9)/16}. It is periodic
with period N/Q = 16. The group delay of this signal is n0 = 3, with period 16.
Therefore, within n = 0, 1, ..., 31, the group delays n0 = 3 and n0 + 16 = 19 of Y(k)
correspond to the missing sample positions. The signal must have the form
X(k0m) ∈ {2e^{−j2π(3k−9)/16} | k = 0, 1, ..., N/Q − 1}, with k = 3 corresponding to
k0m = 2k − 1 = 5, producing X(5) = 2.
(c) The influence of missing samples depends strongly on their positions. If the missing samples are at q_m ∈ N_Q = {5, 9}, then

Z(k) = z_5 e^{−j2π5k/32} + z_9 e^{−j2π9k/32} = e^{−j2π5k/32} (z_5 + e^{−j2πk/8} z_9).
for z_4 = −z_7 e^{−j2π3k/32}. In addition, all K nonzero signal values X(k) can be canceled out. The uniqueness relation is then N − 1 − K > K.
(e) If the missing samples are q_m ∈ N_Q = {3, 4, 19}, then this case may be considered as a case with three variables producing two nonzero values in Z(k); however, it can also be considered as {3, 19} ∪ {4}, with z(4) = 0 and the two variables z(3) and z(19) defining the sparsity as in (a). The second case is the worse one, meaning that it defines the resulting sparsity bound K < 8.
Using Test 1 we will find the sparsity limit K for which we are able to claim that the reconstructed sparse signal is unique for any signal form.
- For h = 0 we use Q_{2^0} = Q and get 2^0(Q_{2^0} − 1) − 1 = (Q − 1) − 1 = 9.
- For h = 1, the number Q_{2^1} is the greater value of
or
K < 8.
Test 1 considers a general signal form. It includes the case when the amplitudes of the signal components are related to each other and to the missing sample positions. The specific signal form required to reach the Test 1 bound is analyzed in the example. Since this kind of relation is a zero-probability event, the condition obtained by neglecting the probability that the signal values are dependent on each other and, at the same time, related to the missing sample positions is presented next.
C1: Assume that the amplitudes of the signal components in Test 1 are arbitrary, with arbitrary phases, so that the case when all of them can be related to the values defined by the missing sample positions is a zero-probability event. The reconstruction result is not unique if the inequality

K ≥ N − max_{h=0,1,...,r−1} {2^h (Q_{2^h} − 1)} − 1

holds. The integers Q_{2^h} are calculated in the same way as in Test 1.
Example 10.28. Consider a signal with N = 2^5 = 32 and Q = 9 missing samples at

q_m ∈ N_Q = {2, 3, 8, 13, 19, 22, 23, 28, 30}.

The sparsity limit K for which the reconstructed sparse signal may not be unique is

K ≥ N − max_{h=0,1,2,3,4} {2^h (Q_{2^h} − 1)} − 1 = 32 − max{9, 8, 8, 8, 16} − 1,

that is, K ≥ 15.
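The max term in this bound is easy to evaluate programmatically. The sketch below uses an assumed reading of Q_{2^h} as the largest number of missing samples sharing the same residue modulo 2^h; the function name is illustrative.

```python
from collections import Counter

def c1_bound(N, missing):
    """Smallest K for which C1 declares possible nonuniqueness; N = 2^r."""
    r = N.bit_length() - 1
    # 2^h (Q_{2^h} - 1), where Q_{2^h} is the largest count of missing
    # samples sharing one residue modulo 2^h
    best = max((2 ** h) * (max(Counter(q % (2 ** h) for q in missing).values()) - 1)
               for h in range(r))
    return N - best - 1

print(c1_bound(32, [2, 3, 8, 13, 19, 22, 23, 28, 30]))   # -> 15
```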
holds. The integers Q_{2^h} and S_{2^{r−h}} are calculated as in Test 2. The case when all of the signal components can be related to the values defined by the missing sample positions is considered here.
Example 10.29. Consider a signal with N = 32 and Q = 9 missing samples at

q_m ∈ N_Q = {2, 3, 8, 13, 19, 22, 23, 28, 30}.

Assume that with these missing samples we have reconstructed signals with nonzero DFT values at the positions
a) K = {1, 3, 5, 7, 9, 11, 13, 15, 17, 21, 23, 25, 27, 29, 31},
b) K = {1, 3, 5, 9, 13, 17, 21, 29, 31, 2, 4, 8, 12, 16, 20, 24, 30}.
Example 10.30. Consider a signal with N = 1024 and Q = 512 missing samples at q_m ∈ N_Q = {0, 2, 4, ..., 1022}. The reconstructed signal is at the frequencies: a) K = {3}, b) K = {3, 515}. We can easily check that, in all cases, Test 1, Corollary C1, and Test 2 declare the reconstruction nonunique, although K = 1 or K = 2 is much smaller than the number of available samples N − Q = 512. The answer is obtained almost immediately, since the computational complexity of Test 1, Corollary C1, and Test 2 is of order O(N).
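The nonuniqueness in this example can be seen directly. Shown here on a smaller grid (an assumed N = 16 with the same structure of missing samples, i.e., all even positions missing): two different 1-sparse signals agree on every available odd-indexed sample.

```python
import numpy as np

# With all even-indexed samples missing, the 1-sparse signals with
# spectra at k and k + N/2 (and opposite signs) coincide on odd samples,
# so the available samples cannot distinguish them.
N = 16
n = np.arange(N)
x1 = np.exp(1j * 2 * np.pi * 3 * n / N)               # DFT nonzero at k = 3
x2 = -np.exp(1j * 2 * np.pi * (3 + N // 2) * n / N)   # DFT nonzero at k = 11
print(np.allclose(x1[1::2], x2[1::2]))                # -> True
```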
f = B_M x,

where

A = B_M Ψ.
As a simple case study for this kind of measurement, consider a discrete-time signal x(n) obtained by sampling a continuous-time signal x(t) at nonuniform (or random) positions. Using the results presented in this chapter, we can state that if the signal x(t) satisfies the sampling theorem and its DFT is sparse, then the signal can be reconstructed from a reduced set of samples x(t_i) at {t_1, t_2, ..., t_M} not corresponding to the sampling theorem positions.
Since the DFT is used in the analysis, we can assume that the continuous-time signal is periodically extended with a period T. According to the sampling theorem, the period T is related to the number of samples N, the sampling interval ∆t, and the maximal frequency Ω_m as Ω_m = π/∆t = πN/T. The continuous-time signal can be written as an inverse Fourier series

x(t) = ∑_{k=−N/2}^{N/2−1} X_k e^{j2πkt/T},        (10.92)
x(t) = ∑_{n=0}^{N−1} x(n) e^{j(n−t/∆t)π/N} · sin[(n − t/∆t)π] / (N sin[(n − t/∆t)π/N]).        (10.93)

This relation holds for an even N. A similar relation can be written for an odd N, Section 3.6.
For a signal x(n) sparse in the DFT domain, the number K of nonzero transform coefficients X(k) is much lower than the number of the original signal samples N within T, K ≪ N, i.e., X(k) = N X_k = 0 for k ∉ {k_1, k_2, ..., k_K}. A signal

x(t) = ∑_{i=1}^{K} X_{k_i} e^{j2πk_i t/T}        (10.94)

is available at the instants

t_{n_i} ∈ T_A = {t_{n_1}, t_{n_2}, ..., t_{n_M}},

being a random subset of {t_1, t_2, ..., t_N}, with t_{n_i} = n_i ∆t + ν_{n_i}. The measurement matrix relation is, from (10.92),
⎡ x(t_{n_1}) ⎤   ⎡ e^{−j2πN t_{n_1}/(2T)}  ...  e^{j2π(N−2) t_{n_1}/(2T)} ⎤ ⎡ X_{−N/2}   ⎤
⎢ x(t_{n_2}) ⎥ = ⎢ e^{−j2πN t_{n_2}/(2T)}  ...  e^{j2π(N−2) t_{n_2}/(2T)} ⎥ ⎢ X_{−N/2+1} ⎥
⎢    ...     ⎥   ⎢          ...            ...           ...              ⎥ ⎢    ...     ⎥
⎣ x(t_{n_M}) ⎦   ⎣ e^{−j2πN t_{n_M}/(2T)}  ...  e^{j2π(N−2) t_{n_M}/(2T)} ⎦ ⎣ X_{N/2−1}  ⎦
                                                                                 (10.96)
or, in matrix notation,

f = A X.
The analysis presented in this chapter can be used to solve this problem and to calculate the sparse coefficients X_k from the reduced set of observations f. The measurement matrix in this case is a structured random matrix.

The nonzero positions of the Fourier transform coefficients can be estimated using the available measurements only,

X_0 = A^H f,

or

X_0(k) = N X_{0,k} = ∑_{t_{n_i} ∈ T_A} x(t_{n_i}) e^{−j2πk t_{n_i}/T},        (10.97)

with the mean value

E{X(k)} = M ∑_{p=1}^{K} X_{k_p} δ(k − k_p).
794 Sparse Signal Processing
The variance of this estimator is different from the case when the available signal samples were at the sampling theorem positions. The condition that a DFT coefficient at k ≠ k_p is zero (with zero variance) if M = N samples are used does not hold any more. The total variance can be estimated as a simple sum of variances,

var{X(k)} = ∑_{p=1}^{K} X_{k_p}^2 M [1 − δ(k − k_p)].        (10.98)
f = A_K X_K

X_K = (A_K^H A_K)^{−1} A_K^H f = (A_K^H A_K)^{−1} X_{0K}.
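The detection-plus-least-squares path can be sketched numerically. The toy values below (N = 32, K = 2, frequencies 3 and 7, amplitudes 2) are assumptions for illustration: the sketch forms the initial estimate X_0 = A^H f from nonuniform samples, detects the K strongest positions, and solves the least-squares system on that support.

```python
import numpy as np

# Toy sketch (assumed parameters): nonuniform samples of a K-sparse
# signal, initial estimate X0 = A^H f, support detection, and the
# least-squares solution X_K = (A_K^H A_K)^{-1} A_K^H f.
rng = np.random.default_rng(0)
N, T, K = 32, 32.0, 2
k_true = np.array([3, 7])                 # nonzero Fourier-series positions
X_true = np.array([2.0, 2.0])             # their assumed amplitudes

M = 64                                    # nonuniform instants within [0, T)
t = np.sort(rng.uniform(0, T, M))
f = np.exp(1j * 2 * np.pi * np.outer(t, k_true) / T) @ X_true

k = np.arange(-N // 2, N // 2)            # frequency grid -N/2, ..., N/2-1
A = np.exp(1j * 2 * np.pi * np.outer(t, k) / T)

X0 = A.conj().T @ f                       # initial (matched-filter) estimate
support = np.argsort(np.abs(X0))[-K:]     # positions of the K largest values

AK = A[:, support]                        # least squares on detected support
XK = np.linalg.solve(AK.conj().T @ AK, AK.conj().T @ f)
print(sorted(k[support].tolist()))        # detected frequencies
```

Since the measurement model is exact, the least-squares step recovers the amplitudes to machine precision once the support is correctly detected.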
Example 10.31. Some of the random realizations of the initial DFT (10.97) for the signal (10.55) are given in Fig. 10.30. In contrast to the partial DFT matrix case, the variance of the estimator (10.97) does not tend to zero as M approaches N. However, we can see that the signal frequencies can be detected and used to recover the signal using (10.37) and (10.39), with the known time instants t_i ∈ {t_{n_1}, t_{n_2}, ..., t_{n_M}} and the detected frequencies {k_1, k_2, ..., k_K}.

The results for several random realizations and nonuniform sampling of the signal (10.55), with the signal values recalculated at the sampling theorem positions, are shown in Fig. 10.31. As the number of available samples approaches the total number of samples N, the reconstructed DFT is again noise-free, Fig. 10.31. For the signal defined by (10.55), the variance of the initial DFT is calculated over 100 random realizations of the sets of available samples, for the cases when the signal is sampled according to the sampling theorem and for nonuniform sampling without and with recalculation. The results for the variance are presented in Fig. 10.32. From Fig. 10.32 we can conclude that the recalculation is not efficient for a small number of available samples, M ≪ N. In that case even slightly worse results are obtained than without recalculation, as could be expected, since the recalculated signal with many inserted zeros is not sparse any more. For a large number of available samples (in Fig. 10.32, for M > 5N/8) the recalculation produces better results, approaching the sparse signal without any deviation for N = M.
[Figure panels: DFT magnitudes for M = 16, 64, 128, 192, 224, and 257 available samples; horizontal axis: frequency, vertical axis: signal transform.]

Figure 10.30 DFT of a signal with various numbers of available samples M. The M available samples are taken at random positions within 0 ≤ t_i ≤ T. Dots represent the original signal DFT values, scaled with M/N to match the mean value of the DFT calculated using the reduced set of signal samples.
and

x = B_N^{−1} f_N,        (10.99)

with

b_{ij} = e^{j(j − t_{n_i}/∆t)π/N} · sin[(j − t_{n_i}/∆t)π] / (N sin[(j − t_{n_i}/∆t)π/N]).
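The kernel b_{ij} can be checked numerically. The sketch below (assumed toy values: N = 16 and a single-tone test signal) verifies that the kernel reproduces a bandlimited periodic signal exactly at nonuniform instants, and that inverting B_N, as in (10.99), recovers the uniform-grid samples.

```python
import numpy as np

# Check of the kernel b_ij: for a periodic signal bandlimited to the
# grid's frequency range, interpolation at nonuniform instants is exact,
# and B_N is invertible, so x = B_N^{-1} f_N recovers the grid samples.
N, T = 16, 16.0
dt = T / N
j = np.arange(N)
x_grid = np.cos(2 * np.pi * 3 * j * dt / T)       # samples on the theorem grid

rng = np.random.default_rng(3)
t_nu = j * dt + rng.uniform(-0.4, 0.4, N)         # nonuniform instants

u = j[None, :] - t_nu[:, None] / dt               # (j - t_ni/dt), never an integer here
B = np.exp(1j * u * np.pi / N) * np.sin(u * np.pi) / (N * np.sin(u * np.pi / N))

f_N = B @ x_grid                                  # nonuniform samples
print(np.allclose(f_N, np.cos(2 * np.pi * 3 * t_nu / T)))   # exact interpolation
print(np.allclose(np.linalg.solve(B, f_N), x_grid))         # (10.99)
```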
If a reduced set of available samples is used we know just M < N
of signal samples/measurements (10.91). Each available sample is a linear
[Figure panels: DFT magnitudes for M = 16, 64, 128, 192, 224, and 257 available samples; horizontal axis: frequency, vertical axis: signal transform.]

Figure 10.31 DFT of a signal with various numbers of available samples M. The M available samples are a random subset of N nonuniform samples taken at random positions within the sampling theorem interval. Dots represent the original signal DFT values, scaled with M/N to match the mean value of the DFT calculated using the reduced set of signal samples.
Their positions are assumed to be at the sampling theorem instants, n_i ∆t = t_{n_i} for t_{n_i} ∉ T_A, since the true instants are not known anyway.
[Figure: variance (0 to 450) versus the number of available samples (64 to 256).]

Figure 10.32 Variance of the DFT for several sampling methods and various numbers of available samples M. (1) Line with marks "x": available samples are a subset of samples taken on the sampling theorem grid (solid line: theory, marks "x": statistics). (2) Line with marks "o": randomly positioned M samples taken within 0 ≤ t_i ≤ T (solid line: theory, marks "o": statistics). (3) Marks "+": nonuniform, randomly shifted samples from the sampling theorem grid. (4) Marks "*": nonuniform, randomly shifted available samples recalculated on the sampling theorem grid.
iteration

x_a^{(0)} = [x(t_0)  x(1)  x(t_2)  x(t_3)  x(t_4)  x(5)  x(t_6)  x(t_7)]^T
          = [x(t_0)  0  x(t_2)  x(t_3)  x(t_4)  0  x(t_6)  x(t_7)]^T.
[Figure: available nonuniform samples x(t_{n_i}) at t_0, t_2, t_3, t_4, t_6, t_7 and the missing samples x(1) and x(5), shown over 0 ≤ t ≤ 8.]
x_a^+(q_i) = x_a^{(m)}(q_i) + ∆
x_a^−(q_i) = x_a^{(m)}(q_i) − ∆.
The available samples x(t_{n_i}), t_{n_i} ∈ T_A = {t_{n_1}, t_{n_2}, ..., t_{n_M}}, are unchanged. Since the sparsity domain is the DFT of the signal x = [x(0), x(1), ..., x(N − 1)], the signals x_a^+ and x_a^− are used to recalculate the corresponding signals at the sampling theorem positions, x_1 and x_2, according to (10.99),

x_1 = B_N^{−1} x_a^+

and

x_2 = B_N^{−1} x_a^−.
Sparsity minimization using the DFTs of these signals, X_1(k) = DFT[x_1(n)] and X_2(k) = DFT[x_2(n)], with the estimate of the sparsity measure gradient

g(q_i) = (∑_{k=0}^{N−1} |X_1(k)| − ∑_{k=0}^{N−1} |X_2(k)|) / (2N∆),        (10.100)

reduces this problem to the problem with sampling at the sampling theorem rate. The reconstruction is then based on the same procedure, using steps (10.82)-(10.83) of the presented algorithm.
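For the case when the samples already lie on the sampling theorem grid, the measure-gradient procedure can be sketched compactly. The decreasing step schedule below is a simplification of the algorithm's adaptive step; the test signal and missing-sample positions are assumed toy values, and convergence is expected only when the uniqueness conditions for the sparse solution hold.

```python
import numpy as np

# Hedged sketch of the measure-gradient reconstruction on the grid:
# missing samples are moved in the direction that reduces the l1
# measure of the DFT, cf. (10.100) and the correction step (10.108).
def reconstruct(x_avail, avail, N, deltas=(1.0, 0.2, 0.04, 0.008), iters=300):
    x = np.zeros(N)
    x[avail] = x_avail
    missing = np.setdiff1d(np.arange(N), avail)
    for delta in deltas:
        for _ in range(iters):
            g = np.zeros(N)
            for q in missing:
                xp = x.copy(); xp[q] += delta
                xm = x.copy(); xm[q] -= delta
                g[q] = (np.abs(np.fft.fft(xp)).sum()
                        - np.abs(np.fft.fft(xm)).sum()) / (2 * N * delta)
            x = x - 2 * delta * g
    return x

N = 16
n = np.arange(N)
x_true = np.cos(2 * np.pi * 3 * n / N)       # DFT-sparse test signal, K = 2
avail = np.setdiff1d(n, [2, 5, 11, 13])      # four missing samples
x_rec = reconstruct(x_true[avail], avail, N)
print(np.max(np.abs(x_rec - x_true)))        # reconstruction error
```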
Example 10.32. Consider the signal defined by (10.84), with M samples at the instants t_{n_i} = n_i ∆t + ν_{n_i}, where ν_{n_i} is a uniform random variable, −∆t/2 ≤ ν_{n_i} ≤ ∆t/2. Results for the SRR and the average number of iterations, for various M and sparsities K, are similar to those in Fig. 10.28 and will not be repeated. A particular realization with K = 6 nonzero DFT coefficients, out of N = 128, and M = 16 available samples, within the transition region where the recovery is not always achieved, is considered. The realizations in which the recovery conditions are met, for the given signal and some of the considered sets of available samples, can be detected. The criterion for the detection of a sparse signal after reconstruction is the measure of signal sparsity. In this case, measures closer to the ℓ0-norm should be used. For example, with the ℓ1/4-norm, in the case of a nonsparse reconstruction all transform coefficients are nonzero, with ∑_{k=0}^{N−1} |X(k)/N|^{1/4} ∼ N. For a full recovery of a sparse signal, the number of nonzero coefficients (the measure value) is much lower, since K ≪ N.
Among 100 performed realizations, a possible sparse recovery event is detected when the described sparsity measure of the result is much lower than N. The set of DFT coefficient positions for the detected sparse signal is K = {22, 35, 59, 69, 93, 106}. This sparse reconstruction is checked for uniqueness using Test 1. The missing samples are from the set q_m ∈ N_Q, the set difference of all samples N = {n | 0 ≤ n ≤ 127} and

M = {7, 14, 18, 21, 34, 37, 51, 69, 79, 82, 89, 90, 99, 100, 113, 117}.
h             0     1     2     3     4     5     6
Q_{2^h}      112    58    31    16     8     4     2
S_{2^{7−h}}    0     0     4     5     4     4     2
Note that Q_{2^0} = 112 is the total number of missing samples, while Q_{2^1} is obtained by counting the missing samples at odd and at even positions in N_Q and taking the larger of the two numbers. Since there are 54 missing samples at odd positions and 58 at even positions, Q_{2^1} = 58.

For h = 2 there are 31 missing samples q_m ∈ N_Q with mod(q_m, 4) = 0, 26 missing samples with mod(q_m, 4) = 1, 27 missing samples with mod(q_m, 4) = 2, and 28 missing samples with mod(q_m, 4) = 3, resulting in Q_{2^2} = 31, and so on. We can easily conclude that the samples x(1) and x(65) are missing, meaning that Q_{64} assumes its maximal possible value, Q_{64} = 2.
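The counting just described can be reproduced programmatically; the sketch below recomputes the whole Q_{2^h} row of the table above from the available-sample set.

```python
from collections import Counter

# Q_{2^h}: the largest number of missing samples sharing one residue
# modulo 2^h, recomputed for this example (N = 128, 16 available samples).
N = 128
avail = {7, 14, 18, 21, 34, 37, 51, 69, 79, 82, 89, 90, 99, 100, 113, 117}
missing = [q for q in range(N) if q not in avail]

Q = [max(Counter(q % (2 ** h) for q in missing).values()) for h in range(7)]
print(Q)   # -> [112, 58, 31, 16, 8, 4, 2]
```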
S_{2^{7−6}} = S_{2^1} = ∑_{l=1}^{Q_{64}−1} P_6(l) = ∑_{l=1}^{1} P_6(l) = P_6(1),
[Figure panels: available samples, and the original and reconstructed signal on the sampling interval grid, amplitudes within ±7.5, time 0 to 112.]

Figure 10.34 Available randomly positioned samples x(t_i) (dots) of a sparse signal x(t) (top). Reconstructed signal x_R(n) at the sampling theorem positions (crosses), along with the available samples (dots) (bottom). The continuous-time signal x(t) is presented by a solid line.
_____________________________________________________
This Section presents results from: L. Stankovic, M. Dakovic and S. Vujovic, "Reconstruc-
tion of Sparse Signals in Impulsive Disturbance Environments", preprint, 2014. Adapted for
this book by S. Vujović.
A very simple and intuitive idea is used first to address this kind of noise-elimination problem. A random subset of M signal samples is selected and considered as the available samples/measurements. The number of available samples should be sufficiently large that a signal of the assumed sparsity K can be reconstructed. The signal is then reconstructed. If only nonnoisy samples are selected, a sparse signal will be obtained. Detection of a sparse signal reconstruction event is done by measuring the sparsity of the obtained signal. With a sparsity measure close to the ℓ0-norm, the reconstruction realizations containing disturbed samples will produce a nonsparse signal, with a sparsity measure value close to the total number of samples N. When only uncorrupted samples are used in the reconstruction, the sparsity measure value is of order K, which is much lower than the total number of samples N. A measure of the form

M{X(k)} = ∑_{k=0}^{N−1} |X(k)/N|^p        (10.101)
can be used, with a small p, so that its behavior is similar to the ℓ0-norm. In a finite-precision calculation, a sparse recovery will produce very small (but nonzero) transform coefficient values X(k) at the positions where they should be zero. The value of p should be such that |X(k)|^p at these positions is much lower than |X(k)|^p at the original nonzero signal positions. Robustness to small but nonzero values in X(k) is achieved using p slightly greater than zero, for example p = 1/4. A threshold T_µ within K < T_µ < N can be used to detect a sparse reconstruction event.
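The separation that makes this threshold work can be illustrated numerically. The spectra below are an assumed construction, not the book's signal: a K-sparse spectrum gives a measure value near K, a dense one a value of order N.

```python
import numpy as np

# Sketch of the detection measure (10.101) with p = 1/4: near K for a
# sparse spectrum, of order N for a dense one (illustrative values).
def measure(X, p=0.25):
    return np.sum(np.abs(X / len(X)) ** p)

N, K = 128, 10
rng = np.random.default_rng(1)

X_sparse = np.zeros(N)
X_sparse[rng.choice(N, K, replace=False)] = N    # K coefficients of size N
X_dense = N * rng.normal(size=N) / 4             # all coefficients nonzero

print(measure(X_sparse))                         # K * 1 = 10
print(measure(X_dense) > 50)                     # of order N -> True
```

A threshold T_µ anywhere well between K and N then separates the two cases.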
Now we will estimate the probability that all samples from a randomly chosen subset are uncorrupted. The total number of samples in this randomly chosen subset is M, at the positions n ∈ M. The probability that the first randomly chosen sample is not affected by the described disturbance is (N − I)/N, since there are N samples in total and N − I of them are uncorrupted. Similarly, the probability that both the first and the second chosen sample are not affected by the disturbance is [(N − I)/N] · [(N − I − 1)/(N − 1)]. In general, the probability that all M randomly chosen samples at the positions n ∈ M are not affected by the disturbance is

P(M, N) = ∏_{i=0}^{M−1} (N − I − i)/(N − i).        (10.102)
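The product (10.102) can be evaluated directly; the helper name below is illustrative.

```python
from math import prod

# Direct evaluation of (10.102): the probability that M randomly chosen
# samples out of N avoid all I corrupted positions.
def p_clean(M, N, I):
    return prod((N - I - i) / (N - i) for i in range(M))

# e.g. the setting of Fig. 10.35: N = 128, I = 15, M = 32 available samples
print(p_clean(32, 128, 15))
```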
The probability P(M, N) decreases as the number of terms in the product increases, since (N − I − i)/(N − i) < 1. In order to improve the probability of a sparse
The presented direct search procedure can be used on signals with a small number of corrupted samples, since the number of random realizations
[Figure panels: (a) corrupted signal x(n) + ε(n) versus time; (b) SRR in dB versus realization index; (c) original and reconstructed signal x(n), x_R(n) versus time; (d) sparsity measure M{X(k)} versus realization index.]

Figure 10.35 Reconstruction of a signal with I = 15 out of N = 128 samples affected by an impulsive disturbance. In each realization, 96 randomly chosen samples are removed. The total number of realizations is 200. a) The available corrupted signal; b) the SRR for each of the 200 realizations; c) the original (black line) and the reconstructed (dots) signal for the best realization; d) the sparsity measure for each of the 200 realizations.
In some applications the impulsive noise is much stronger than the signal. The trimmed L-statistics can then be used to eliminate the corrupted signal samples, without any search procedure. The values of the signal samples x(n) are ordered into a nonincreasing sequence. If strong impulsive noise components exist, well above the signal level, then the samples with very large absolute values should be omitted as corrupted.
After these samples are removed, the remaining M < N samples are

y = {x(n_1), ..., x(n_M)}.
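The trimming idea can be sketched with assumed illustrative amplitudes: when the impulses are well above the signal level, discarding the I largest-magnitude samples removes exactly the corrupted ones.

```python
import numpy as np

# Trimmed L-statistics sketch (illustrative values): impulses of
# magnitude >= 20 on a unit-amplitude signal, so the I largest-magnitude
# samples are exactly the corrupted positions.
rng = np.random.default_rng(4)
N, I = 128, 10
n = np.arange(N)
x = np.cos(2 * np.pi * 5 * n / N)                    # |x(n)| <= 1
pos = rng.choice(N, I, replace=False)
eps = np.zeros(N)
eps[pos] = rng.uniform(20, 50, I) * rng.choice([-1, 1], I)

order = np.argsort(-np.abs(x + eps))                 # nonincreasing magnitudes
kept = np.sort(order[I:])                            # drop the I largest
print(np.intersect1d(kept, pos).size)                # -> 0: impulses removed
```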
A criterion that marks some signal samples as probably more corrupted than others is presented next. In this process, no particular distribution or number of corrupted samples is assumed.

Consider a corrupted signal x_ε(n) = x(n) + ε(n). For each time instant we form two signals, x_a^+(n) = x_ε(n) + ∆δ(n − m) and x_a^−(n) = x_ε(n) − ∆δ(n − m), where m = 0, ..., N − 1. Then a difference of the measure values is calculated as

g(m) = ∑_{k=0}^{N−1} |X_a^+(k)| − ∑_{k=0}^{N−1} |X_a^−(k)|,        (10.105)
[Figure panels: (a) SRR in dB and (b) sparsity measure M{X(k)}, both up to 150, for K = 10.]

Figure 10.36 Reconstruction of a sparse signal when the corrupted samples are removed by applying the criterion iteratively. In each iteration, r = 4 samples are removed. a) The SRR during the iterations; b) the sparsity measure during the iterations, for a signal of sparsity K = 10.
[Figure: sorted disturbance sample values (up to 40) versus sorting index (up to 128), for K = 10, with the signal amplitude range 2A marked.]

Figure 10.37 Disturbance values in the signal, sorted according to the introduced significance criterion, with the signal amplitude range 2A.
[Figure: probability (10^0 down to 10^−4, logarithmic scale) versus sparsity K (0 to 35); panel title: uniqueness probability, Q = 72 out of N = 128.]

Figure 10.38 Sparsity limit probability distribution for the worst possible case of a signal with Q = 72 missing out of N = 128 samples, in 100,000 random realizations.
The gradient-based algorithm is applied to the image x(n, m). As the transformation domain, the two-dimensional DCT (in symmetric form) will be used,

C(k, l) = v_k v_l ∑_{m=0}^{N−1} ∑_{n=0}^{N−1} x(m, n) cos[2π(2m + 1)k/(4N)] cos[2π(2n + 1)l/(4N)],

where v_0 = √(1/N) and v_k = √(2/N) for k ≠ 0. Assume that a random set of pixels is available (not corrupted) at (n, m) ∈ M. The goal is to reconstruct
_________________________________________
This section is written by Isidora Stanković.
x_a^+(m, n) = x^{(p)}(m, n) + ∆δ(m − m_i, n − n_i)
x_a^−(m, n) = x^{(p)}(m, n) − ∆δ(m − m_i, n − n_i).        (10.106)
The finite difference of the signal transform measure is calculated as

g(m_i, n_i) = (∥C_a^+(k, l)∥_1 − ∥C_a^−(k, l)∥_1) / (2∆),        (10.107)

where C_a^+(k, l) = DCT[x_a^+(m, n)] and C_a^−(k, l) = DCT[x_a^−(m, n)].
A gradient matrix G_{m,n} is of the same size as the image. At the positions of the available samples, (n, m) ∈ M, this matrix has zero values, G_{m,n} = 0. At the missing sample positions, (n, m) ∈ N_Q, its values are G_{m,n} = g(m, n), calculated using (10.107).
The image values are corrected iteratively as

x_a^{(p)}(m, n) = x_a^{(p−1)}(m, n) − 2∆G_{m,n}.        (10.108)

The change of the step ∆ and the stopping criterion are the same as in the one-dimensional case. The results over 50 iterations are shown in Fig. 10.39, where the reconstructed image after 1, 3, and 50 iterations is presented.
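The symmetric 2D DCT used above separates into one-dimensional transforms, C = D x D^T, with the 1D matrix D[k, m] = v_k cos[2π(2m + 1)k/(4N)]. A quick check (assumed toy size N = 8) confirms that D is orthonormal, so the inverse transform needed in each gradient iteration is simply the transpose.

```python
import numpy as np

# Orthonormality check of the symmetric DCT matrix: D D^T = I, so the
# 2D transform D x D^T is inverted exactly by D^T C D.
N = 8
k, m = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
D = np.cos(2 * np.pi * (2 * m + 1) * k / (4 * N))
D = D * np.where(k == 0, np.sqrt(1 / N), np.sqrt(2 / N))

x = np.random.default_rng(2).normal(size=(N, N))
C = D @ x @ D.T                                   # forward 2D DCT
print(np.allclose(D @ D.T, np.eye(N)))            # rows are orthonormal
print(np.allclose(D.T @ C @ D, x))                # exact inversion
```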