DIGITAL PROCESSING
OF
SPEECH AND IMAGE SIGNALS
RWTH Aachen, WS 2006/7
Literature:
A. V. Oppenheim, R. W. Schafer: Discrete Time Signal Processing,
Prentice Hall, Englewood Cliffs, NJ, 1989.
A. Papoulis: Signal Analysis, McGraw-Hill, New York, NY, 1977.
A. Papoulis: The Fourier Integral and its Applications, McGraw-Hill
Classic Textbook Reissue Series, McGraw-Hill, New York, NY, 1987.
W. K. Pratt: Digital Image Processing, Wiley & Sons Inc, New York,
NY, 1991.
Further reading:
T. K. Moon, W. C. Stirling: Mathematical Methods and Algorithms
for Signal Processing. Prentice Hall, Upper Saddle River, NJ, 2000.
J. R. Deller, J. G. Proakis, J. H. L. Hansen: Discrete-Time Processing
of Speech Signals, Macmillan Publishing Company, New York, NY,
1993.
W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery: Numerical Recipes in C, Cambridge Univ. Press, Cambridge, 1992.
L. Rabiner, B. H. Juang: Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993.
T. Lehmann, W. Oberschelp, E. Pelikan, R. Repges: Bildverarbeitung für die Medizin, Springer Verlag, Berlin, 1997.
L. Berg: Lineare Gleichungssysteme mit Bandstruktur, VEB Deutscher
Verlag der Wissenschaften, Berlin, 1986.
Contents
1 System Theory and Fourier Transform
1.1 Introduction
1.2 Linear time-invariant Systems
1.3 Fourier Transform
1.4 Properties of the Fourier Transform
1.5 Parseval Theorem
1.6 Autocorrelation Function
1.7 Existence of the Fourier Transform
1.8 δ-Function
1.9 Motivation for Fourier Series
1.10 Time Duration and Band Width
2 Discrete Time Systems
3 Spectral analysis
3.1 Features for Speech Recognition
3.2 Short Time Analysis and Windowing
3.3 Autocorrelation Function and Power Spectral Density
3.4 Spectrograms
3.5 Filter Bank Analysis
3.6 Mel-frequency scale
3.7 Cepstrum
3.8 Statistical Interpretation of the Cepstrum Transformation
3.9 Energy in acoustic Vector
4 Fourier Transform and Image Processing
4.1 Spatial Frequencies and Fourier Transform for Images
4.2 Discrete Fourier Transform for Images
4.3 Fourier Transform in Computer Tomography
4.4 Fourier Transform and RST Invariance
5 LPC Analysis
5.1 Principle of LPC Analysis
5.2 LPC: Covariance Method
5.3 LPC: Autocorrelation Method
5.4 LPC: Interpretation in Frequency Domain
5.5 LPC: Generative Model
5.6 LPC: Alternative Representations
List of Figures
2.1 Digital photo
2.2 Gradient image
2.3 Several real cases of Laplace Operator subtraction from original image. a) Original image b) Original image minus Laplace Operator (negative values are set to 0 and values above the grey scale are set to the highest grade of grey)
3.10 Wide-band and narrow-band spectrogram and speech amplitude for the sentence "Every salt breeze comes from the sea."
4.1 TV image (analog)
4.2 Digitized TV image
4.3 Amplitude spectrum of Figure 4.2
4.4 Low-pass filtered
4.5 High-pass filtered
4.6 High-pass enhancement
List of Tables
2.1
2.2
Chapter 1
System Theory and Fourier
Transform
Overview:
1.1 Introduction
1.2 Linear time-invariant Systems
1.3 Fourier Transform
1.4 Properties of Fourier Transform
1.5 Parseval Theorem
1.6 Autocorrelation Function
1.7 Existence of the Fourier Transform
1.8 δ-Function
1.9 Fourier Series
1.10 Duration and Band Width
1.1 Introduction
Figure 1.3: from left to right: original photo, low-pass and high-pass filtered version
Figure 1.4: Phase manipulation for a portion of a speech signal (vowel "o") sampled at 8 kHz, 25 ms analysis window (200 samples), 512-point FFT. Panels: original signal and amplitude spectrum.
Figure 1.5: Phase manipulation for a portion of a speech signal (consonant "n") sampled at 8 kHz, 25 ms analysis window (200 samples), 512-point FFT. Panels: original signal and amplitude spectrum.
Phase settings shown in the figures: $\varphi = 0$, $\varphi = \pi/2$, random phase.
Why Fourier?
Roughly:
Production, description and algorithmic operations on signals (functions or measurement curves over the time axis) can be described very
well in Fourier domain (frequency domain).
Deeper reason:
Production, description and algorithmic operations on signals are largely
based on linear time-invariant (LTI) operations.
Fourier Transform: simple representation of LTI-operations (later:
convolution theorem)
Why continuous?
The real world is continuous.
The computer (digital = time-discrete = sampled) provides a model of the real world.
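Figure: source-filter model of speech production. a) Block diagram: glottal pulses, vocal tract filter, radiation from lips and nose, speech [a:]. b) Corresponding amplitude spectra in dB, $|E(f)|$, $|V(f)|$, $|A(f)|$ and $|S(f)|$, for [a:], with the frequency spacing 1/T marked.
Figure: schematic of the speech organs: muscle force, lung volume, trachea and bronchi, vocal cords, larynx tube, pharynx cavity, velum, nasal cavity, nose output, mouth cavity, tongue hump, mouth output.
Pattern recognition scheme: feature extraction (signal analysis) produces a feature vector (pattern vector); the pattern is compared with reference data (vectors, features); the comparison leads to a decision.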
Examples:
Spoken language
Written numbers (letters)
Cell recognition (red blood cells)
1.2 Linear time-invariant Systems
Examples: speech production, electrical systems.
A system S with impulse response h(t) maps the input signal x(t) to the output signal y(t).
symbolic:
$$\{t \to y(t)\} = S\{t \to x(t)\}$$
simplified:
$$y(t) = S\{x(t)\}$$
Note: the complete time domain of the function is important, not individual positions in time t.
more exact:
$$y = S\{x\}$$
LTI system:
Linear:
Additive: $S\{x_1 + x_2\} = S\{x_1\} + S\{x_2\}$
Homogeneous: $S\{\alpha x\} = \alpha\, S\{x\}, \quad \alpha \in \mathbb{R}$
Time-invariant: $\{t \to y(t - t_0)\} = S\{t \to x(t - t_0)\}, \quad t_0 \in \mathbb{R}$
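Mathematical theorem:
Linearity and time invariance result in the convolution representation.
Output signal y(t) of the LTI system S with input signal x(t):
$$y(t) = \int_{-\infty}^{\infty} x(t-\tau)\, h(\tau)\, d\tau = \int_{-\infty}^{\infty} x(\tau)\, h(t-\tau)\, d\tau = x(t) * h(t)$$
h: impulse response of the system S
Proof sketch: approximate x(t) by rectangles $e_\Delta(t)$ of width $\Delta$ and height $1/\Delta$:
$$x(t) = \lim_{\Delta \to 0} \sum_i x(i\Delta)\, e_\Delta(t - i\Delta)\, \Delta$$
$$y(t) = S\Big\{ \lim_{\Delta \to 0} \sum_i x(i\Delta)\, e_\Delta(t - i\Delta)\, \Delta \Big\}$$
additivity:
$$= \lim_{\Delta \to 0} \sum_i S\{ x(i\Delta)\, e_\Delta(t - i\Delta)\, \Delta \}$$
homogeneity and time invariance, with $h_\Delta(t) = S\{e_\Delta(t)\}$:
$$= \lim_{\Delta \to 0} \sum_i x(i\Delta)\, h_\Delta(t - i\Delta)\, \Delta$$
limiting case $\Delta \to 0$: $\sum_i \Delta \to \int d\tau$, $\; h_\Delta(t) \to h(t)$
result:
$$y(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau$$
h(t): impulse response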
Examples of LTI operations:
Oscillatory systems (electrical or mechanical) with external excitation x(t) and impulse response $h(\tau)$:
$$y(t) = \int_{-\infty}^{\infty} h(t - \tau)\, x(\tau)\, d\tau$$
Sliding average over a window of length T:
$$y(t) := \bar{x}(t) = \frac{1}{T} \int_{-T/2}^{+T/2} x(t + \tau)\, d\tau$$
Differentiator:
$$y(t) := x'(t)$$
[ + further constraints ]
frequency doubling
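1.3 Fourier Transform
Sinusoidal oscillation:
$$x(t) = A \sin(\omega t + \varphi)$$
amplitude $A$
phase / null phase $\varphi$
angular frequency $\omega = 2\pi f$
complex representation, with $j^2 = -1$, $j \in \mathbb{C}$:
$$e^{j\varphi} = \cos\varphi + j \sin\varphi, \quad \varphi \in \mathbb{R}$$
$$\cos\varphi = \frac{e^{j\varphi} + e^{-j\varphi}}{2} \quad \text{and} \quad \sin\varphi = \frac{e^{j\varphi} - e^{-j\varphi}}{2j}$$
dimension:
$$\mathrm{DIM}(\omega) \cdot \mathrm{DIM}(t) = 1, \qquad \mathrm{DIM}(\omega) = \frac{1}{\mathrm{DIM}(t)} = \frac{1}{[\mathrm{sec}]} = [\mathrm{Hz}]$$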
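LTI system with the complex oscillation $x(t) = A\, e^{j(\omega t + \varphi)}$ as input:
$$y(t) = \int_{-\infty}^{\infty} A\, e^{j(\omega (t - \tau) + \varphi)}\, h(\tau)\, d\tau = A\, e^{j(\omega t + \varphi)} \underbrace{\int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}\, d\tau}_{H(\omega) \,=\, F\{h(\tau)\}} = x(t)\, H(\omega)$$
(decomposition into $e^{j\omega\tau}$)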
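Remarks
FT is complex:
$$H(\omega) = \mathrm{Re}\{H(\omega)\} + j\, \mathrm{Im}\{H(\omega)\} = |H(\omega)|\, e^{j\varphi(\omega)}$$
Amplitude (spectrum):
$$|H(\omega)| = \sqrt{\mathrm{Re}\{H(\omega)\}^2 + \mathrm{Im}\{H(\omega)\}^2}$$
Phase (spectrum):
$$\varphi(\omega) = \begin{cases} \arctan \dfrac{\mathrm{Im}\{H(\omega)\}}{\mathrm{Re}\{H(\omega)\}} & \mathrm{Re}\{H(\omega)\} > 0 \\ \arctan \dfrac{\mathrm{Im}\{H(\omega)\}}{\mathrm{Re}\{H(\omega)\}} + \pi & \mathrm{Re}\{H(\omega)\} < 0 \\ \pi/2 & \mathrm{Re}\{H(\omega)\} = 0,\ \mathrm{Im}\{H(\omega)\} > 0 \\ -\pi/2 & \mathrm{Re}\{H(\omega)\} = 0,\ \mathrm{Im}\{H(\omega)\} < 0 \end{cases}$$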
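Examples:
1. Rectangle function
$$h(t) = \begin{cases} 1, & |t| \leq T/2 \\ 0, & |t| > T/2 \end{cases}$$
$$H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt = \int_{-T/2}^{T/2} e^{-j\omega t}\, dt = -\frac{1}{j\omega} \left[ e^{-j\omega T/2} - e^{j\omega T/2} \right] = \frac{2}{\omega} \sin\!\left(\frac{\omega T}{2}\right) = T\, \frac{\sin(\omega T/2)}{\omega T/2}$$
(here: $\mathrm{Im}\{H(\omega)\} = 0$)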
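2. Double-sided exponential
$$h(t) = e^{-\alpha |t|} \quad \text{with } \alpha > 0$$
$$H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt = \int_{0}^{\infty} e^{-(\alpha + j\omega) t}\, dt + \int_{-\infty}^{0} e^{(\alpha - j\omega) t}\, dt$$
$$= \left[ \frac{e^{-(\alpha + j\omega) t}}{-(\alpha + j\omega)} \right]_{0}^{\infty} + \left[ \frac{e^{(\alpha - j\omega) t}}{\alpha - j\omega} \right]_{-\infty}^{0} = 0 + \frac{1}{\alpha + j\omega} + \frac{1}{\alpha - j\omega} - 0$$
$$= \frac{\alpha - j\omega + \alpha + j\omega}{\alpha^2 + \omega^2} = \frac{2\alpha}{\alpha^2 + \omega^2}$$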
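3. Damped oscillations
$$h(t) = e^{-\alpha |t|} \cos(\beta t) \quad \text{with } \alpha > 0$$
$$H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt = \int_{0}^{\infty} e^{-(\alpha + j\omega) t} \cos(\beta t)\, dt + \int_{-\infty}^{0} e^{(\alpha - j\omega) t} \cos(\beta t)\, dt$$
$$= \int_{0}^{\infty} e^{-(\alpha + j\omega) t}\, \frac{e^{j\beta t} + e^{-j\beta t}}{2}\, dt + \int_{-\infty}^{0} e^{(\alpha - j\omega) t}\, \frac{e^{j\beta t} + e^{-j\beta t}}{2}\, dt$$
... (elementary calculation)
$$= \frac{\alpha}{\alpha^2 + (\omega - \beta)^2} + \frac{\alpha}{\alpha^2 + (\omega + \beta)^2}$$
Limiting case:
$$H(\omega)\big|_{\omega = \beta} = \frac{1}{\alpha} + \frac{\alpha}{\alpha^2 + (2\beta)^2}$$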
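4. Cosine oscillation of finite duration, $h(t) = \cos(\beta t)$ for $|t| \leq T/2$ and $h(t) = 0$ otherwise:
$$H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt = \int_{-T/2}^{T/2} \cos(\beta t)\, e^{-j\omega t}\, dt$$
... (elementary calculation)
$$= \frac{T}{2} \left[ \frac{\sin\!\big((\omega - \beta) \frac{T}{2}\big)}{(\omega - \beta) \frac{T}{2}} + \frac{\sin\!\big((\omega + \beta) \frac{T}{2}\big)}{(\omega + \beta) \frac{T}{2}} \right]$$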
Figure: common Fourier transform pairs: rectangle function (from -1/2 to 1/2) and sinc function, triangle function, exponential function $e^{-|x|}$, Gaussian function, unit impulse $\delta(x)$.
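Inverse Fourier transform
$$H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt \qquad\Longleftrightarrow\qquad h(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, e^{j\omega t}\, d\omega$$
Proof: with $H(\omega) = \int h(\tau)\, e^{-j\omega\tau}\, d\tau$,
$$\frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, e^{j\omega t}\, d\omega = \frac{1}{2\pi} \lim_{\Omega, T \to \infty} \int_{-\Omega}^{\Omega} \int_{-T}^{T} h(\tau)\, e^{j\omega (t - \tau)}\, d\tau\, d\omega$$
$$= \frac{1}{2\pi} \lim_{\Omega \to \infty} \lim_{T \to \infty} \int_{-T}^{T} \left[ \int_{-\Omega}^{\Omega} e^{j\omega (t - \tau)}\, d\omega \right] h(\tau)\, d\tau$$
$$= \frac{1}{\pi} \lim_{\Omega \to \infty} \lim_{T \to \infty} \int_{-T}^{T} \frac{\sin(\Omega (t - \tau))}{t - \tau}\, h(\tau)\, d\tau = \lim_{\Omega \to \infty} \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin(\Omega (t - \tau))}{t - \tau}\, h(\tau)\, d\tau = h(t)$$
due to:
$$\lim_{\Omega \to \infty} \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin(\Omega t)}{t}\, h(t)\, dt = h(0)$$
formal expression:
$$h(t) = \int_{-\infty}^{\infty} \underbrace{\left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{j\omega (t - \tau)}\, d\omega \right]}_{=\, \delta(t - \tau)} h(\tau)\, d\tau$$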
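1.4 Properties of the Fourier Transform
1. Symmetry of the transform pair:
$$H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt, \qquad h(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, e^{j\omega t}\, d\omega$$
2. Similarity:
$$F\{h(\alpha t)\} = \int_{-\infty}^{\infty} e^{-j\omega t}\, h(\alpha t)\, dt = \frac{1}{|\alpha|} \int_{-\infty}^{\infty} h(\tau)\, e^{-j \frac{\omega}{\alpha} \tau}\, d\tau = \frac{1}{|\alpha|}\, H\!\left(\frac{\omega}{\alpha}\right), \quad \alpha \in \mathbb{R} \setminus \{0\}$$
Note:
The absolute value appears because the integral boundaries are swapped for $\alpha < 0$.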
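3. Shift:
$$F\{h(t - t_0)\} = \int_{-\infty}^{\infty} h(t - t_0)\, e^{-j\omega t}\, dt = e^{-j\omega t_0} \int_{-\infty}^{\infty} h(t - t_0)\, e^{-j\omega (t - t_0)}\, dt = e^{-j\omega t_0} \int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}\, d\tau = e^{-j\omega t_0}\, H(\omega)$$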
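4. Symmetry (h(t) real):
$h(t) = h(-t)$ results in $\mathrm{Im}\{H(\omega)\} = 0$
$h(t) = -h(-t)$ results in $\mathrm{Re}\{H(\omega)\} = 0$
5. Complex conjugation:
$$\overline{H(\omega)} = \overline{\int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt} = \int_{-\infty}^{\infty} \overline{h(t)}\, e^{j\omega t}\, dt$$
If h(t) is real, so that $\overline{h(t)} = h(t)$, this gives $\overline{H(\omega)} = H(-\omega)$.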
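6. Differentiation:
$$\frac{dh}{dt} = \frac{d}{dt}\, \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, e^{j\omega t}\, d\omega = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, j\omega\, e^{j\omega t}\, d\omega$$
$$F\Big\{ \frac{dh(t)}{dt} \Big\} = j\omega\, F\{h(t)\}$$
7. Integration:
$$F\Big\{ \int_{-\infty}^{t} h(\tau)\, d\tau \Big\} = \frac{1}{j\omega}\, F\{h(t)\}$$
Proof: apply the differentiation rule to $g(t) = \int_{-\infty}^{t} h(\tau)\, d\tau$, since $dg/dt = h(t)$.
8. Modulation principle:
$$F\{h(t) \cos(\omega_0 t)\} = \frac{1}{2} \left[ \int h(t)\, e^{j\omega_0 t}\, e^{-j\omega t}\, dt + \int h(t)\, e^{-j\omega_0 t}\, e^{-j\omega t}\, dt \right]$$
$$= \frac{1}{2} \left[ \int h(t)\, e^{-j(\omega - \omega_0) t}\, dt + \int h(t)\, e^{-j(\omega + \omega_0) t}\, dt \right] = \frac{1}{2} \left[ H(\omega - \omega_0) + H(\omega + \omega_0) \right]$$
and similarly
$$F\{h(t) \sin(\omega_0 t)\} = \frac{1}{2j} \left[ H(\omega - \omega_0) - H(\omega + \omega_0) \right]$$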
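9. Convolution theorem
System: input x(t) with transform $X(\omega)$, impulse response h(t) with transform $H(\omega)$, output y(t) with transform $Y(\omega)$.
Convolution in time domain corresponds to multiplication in frequency domain.
Time domain:
$$y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(t - \tau)\, h(\tau)\, d\tau$$
Frequency domain:
$$Y(\omega) = \int_{-\infty}^{\infty} e^{-j\omega t} \int_{-\infty}^{\infty} h(\tau)\, x(t - \tau)\, d\tau\, dt = \int_{-\infty}^{\infty} h(\tau) \left[ \int_{-\infty}^{\infty} x(t - \tau)\, e^{-j\omega t}\, dt \right] d\tau$$
$$= \int_{-\infty}^{\infty} h(\tau)\, X(\omega)\, e^{-j\omega\tau}\, d\tau \quad \text{(shifting)}$$
$$= X(\omega) \int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}\, d\tau = X(\omega)\, H(\omega)$$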
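Multiplication in time domain: $y(t) = a(t)\, b(t)$
Frequency domain:
$$Y(\omega) = \int_{-\infty}^{\infty} a(t) \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} B(\tilde\omega)\, e^{j\tilde\omega t}\, d\tilde\omega \right] e^{-j\omega t}\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} B(\tilde\omega) \left[ \int_{-\infty}^{\infty} a(t)\, e^{-j(\omega - \tilde\omega) t}\, dt \right] d\tilde\omega$$
$$= \frac{1}{2\pi} \int_{-\infty}^{\infty} A(\omega - \tilde\omega)\, B(\tilde\omega)\, d\tilde\omega = \frac{1}{2\pi}\, A(\omega) * B(\omega)$$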
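Example: oscillator with input x(t) and output y(t), described by the differential equation
$$y''(t) + 2\alpha\, y'(t) + \beta^2\, y(t) = x(t)$$
With $y(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} Y(\omega)\, e^{j\omega t}\, d\omega$:
$$y'(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} Y(\omega)\, j\omega\, e^{j\omega t}\, d\omega, \qquad y''(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} Y(\omega)\, [-\omega^2]\, e^{j\omega t}\, d\omega$$
Inserting into the differential equation:
$$\int_{-\infty}^{+\infty} [-\omega^2 + 2j\alpha\omega + \beta^2]\, Y(\omega)\, e^{j\omega t}\, d\omega = \int_{-\infty}^{+\infty} X(\omega)\, e^{j\omega t}\, d\omega$$
$$\int_{-\infty}^{+\infty} \underbrace{\big\{ [-\omega^2 + 2j\alpha\omega + \beta^2]\, Y(\omega) - X(\omega) \big\}}_{=\,0}\, e^{j\omega t}\, d\omega = 0$$
$$H(\omega) = \frac{Y(\omega)}{X(\omega)} = \frac{1}{-\omega^2 + 2j\alpha\omega + \beta^2}$$
$$h(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} H(\omega)\, e^{j\omega t}\, d\omega, \qquad y(t) = \int_{-\infty}^{+\infty} x(\tau)\, h(t - \tau)\, d\tau$$
Note:
y(t) does not contain the component which corresponds to the homogeneous differential equation of the oscillator.
Scheme: x(t) leads to y(t) by convolution with h(t); equivalently, the Fourier transform of x(t) gives $X(\omega)$, multiplication with $H(\omega) = F\{h(t)\}$ gives $Y(\omega)$, and the inverse Fourier transform gives y(t).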
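1.5 Parseval Theorem
Convolution theorem:
$$F^{-1}\{H(\omega)\, X(\omega)\}(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, X(\omega)\, e^{j\omega\tau}\, d\omega = \int_{-\infty}^{\infty} h(t)\, x(\tau - t)\, dt = (h * x)(\tau)$$
With $x(t) = \overline{h(-t)}$, i.e. $X(\omega) = \overline{H(\omega)}$, and $\tau = 0$:
$$\frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, \overline{H(\omega)}\, d\omega = \frac{1}{2\pi} \int_{-\infty}^{\infty} |H(\omega)|^2\, d\omega = \int_{-\infty}^{\infty} h(t)\, \overline{h(t)}\, dt = \int_{-\infty}^{\infty} |h(t)|^2\, dt = E$$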
1.6 Autocorrelation Function
The autocorrelation function of a time-continuous signal or function h(t) is defined as:
$$R(t) = \int_{-\infty}^{\infty} h(\tau)\, h(t + \tau)\, d\tau$$
which results in
$$R(t) = R(-t)$$
(Wiener-Khinchin Theorem)
$$|H(\omega)|^2 = \int_{-\infty}^{\infty} R(t)\, e^{-j\omega t}\, dt = \int_{-\infty}^{\infty} R(t) \cos(\omega t)\, dt$$
Remark:
The autocorrelation is a special case of the cross correlation between signals x(t) and h(t):
$$C_{h,x}(t) = \int_{-\infty}^{\infty} h(\tau)\, x(t + \tau)\, d\tau$$
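1.7 Existence of the Fourier Transform
$$H(\omega) = \int_{-\infty}^{\infty} e^{-j\omega t}\, h(t)\, dt, \qquad h(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{j\omega t}\, H(\omega)\, d\omega$$
Sufficient conditions:
1. $\int_{-\infty}^{\infty} |h(t)|\, dt < \infty$
2. h(t) has a finite number of jumps, minima and maxima in each finite interval of $\mathbb{R}$
3. h(t) has no infinite jumps
More general conditions are possible (but they form a rather complex set of conditions):
Generalized functions, distributions, definition as functional
Example: δ-function:
Impulse response:
$$y(t) = \int_{-\infty}^{\infty} h(t - \tau)\, \delta(\tau)\, d\tau = h(t) * \delta(t) = h(t)$$
Consequence: $h(t) \equiv 1$ gives
$$\int_{-\infty}^{\infty} \delta(t)\, dt = 1$$
A function like $\delta(t)$ does not exist. But it is possible to define the functional $\delta$ for each function $t \to h(t)$:
$$\delta: [t \to h(t)] \to \delta(h) := h(0)$$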
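1.8 δ-Function
$$\int_{-\infty}^{+\infty} \delta(t)\, h(t)\, dt = h(0) \qquad (1.1)$$
Approximations $\delta_\varepsilon(t)$ with $\delta(t) = \lim_{\varepsilon \to 0} \delta_\varepsilon(t)$:
a) $\delta_\varepsilon(t) = \begin{cases} \frac{1}{2\varepsilon} & t \in [-\varepsilon, +\varepsilon] \\ 0 & \text{otherwise} \end{cases}$
b) $\delta_\varepsilon(t) = \dfrac{1}{\pi}\, \dfrac{\varepsilon}{\varepsilon^2 + t^2}$
c) $\delta_\varepsilon(t) = \dfrac{1}{\pi}\, \dfrac{\sin(t/\varepsilon)}{t}$
d) $\delta_\varepsilon(t) = \dfrac{1}{\sqrt{2\pi\varepsilon^2}}\, e^{-\frac{t^2}{2\varepsilon^2}}$
$$\frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{j\omega t}\, d\omega = \lim_{\Omega \to \infty} \frac{1}{\pi}\, \frac{\sin(\Omega t)}{t} = \delta(t) \qquad (1.2)$$
Fourier transform of the δ-function:
$$F\{\delta(t)\} = \int_{-\infty}^{+\infty} e^{-j\omega t}\, \delta(t)\, dt = 1$$
$$\delta(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{j\omega t}\, F\{\delta(t)\}\, d\omega \quad \text{(general)} \quad = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{j\omega t}\, d\omega \quad \text{(according to (1.2))}$$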
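Fourier transform of the cosine function:
$$\cos(\omega_0 t) = \frac{1}{2} \left[ e^{j\omega_0 t} + e^{-j\omega_0 t} \right] = \frac{1}{2} \left[ \int_{-\infty}^{+\infty} \delta(\omega - \omega_0)\, e^{j\omega t}\, d\omega + \int_{-\infty}^{+\infty} \delta(\omega + \omega_0)\, e^{j\omega t}\, d\omega \right]$$
$$= \frac{1}{2\pi} \int_{-\infty}^{+\infty} \pi \left[ \delta(\omega - \omega_0) + \delta(\omega + \omega_0) \right] e^{j\omega t}\, d\omega$$
$$F\{\cos(\omega_0 t)\} = \pi \left[ \delta(\omega - \omega_0) + \delta(\omega + \omega_0) \right]$$
Note: another derivation:
consider damped oscillations $e^{-\alpha |t|} \cos(\omega_0 t)$ in the limit $\alpha \to 0$.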
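Comb function
Define the comb function (pulse train, sequence of δ-impulses):
$$x(t) = \sum_{n=-\infty}^{+\infty} \delta(t - nT)$$
$$X(\omega) = \int_{-\infty}^{+\infty} x(t)\, e^{-j\omega t}\, dt = \int_{-\infty}^{+\infty} \sum_{n=-\infty}^{+\infty} \delta(t - nT)\, e^{-j\omega t}\, dt = \sum_{n=-\infty}^{+\infty} \int_{-\infty}^{+\infty} \delta(t - nT)\, e^{-j\omega t}\, dt$$
$$= \sum_{n=-\infty}^{+\infty} e^{-j\omega nT} = \ldots = \frac{2\pi}{T} \sum_{n=-\infty}^{+\infty} \delta\!\left(\omega - n \frac{2\pi}{T}\right)$$
in words:
A δ-impulse sequence with period T in time domain produces a δ-impulse sequence with period $\frac{1}{T}$ in frequency domain (i.e. $\frac{2\pi}{T}$ in the ω-frequency domain).
The comb function is transformed to a comb function.
Figure: comb function $\sum_n \delta(t - nT)$ and its transform $\sum_n \delta(\omega - n\, 2\pi/T)$; $\cos(\omega_0 t)$ and $\sin(\omega_0 t)$ with their pairs of δ-impulses at $\pm\omega_0$.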
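1.9 Motivation for Fourier Series
Periodic function with period T:
$$x: \mathbb{R} \to \mathbb{R}, \quad t \to x(t), \qquad x(t) = x(t + kT) \ \text{ for each } t \in \mathbb{R} \text{ and for } k \in \mathbb{Z}$$
Examples:
Constant function:
$$x_0(t) = A_0$$
Harmonic oscillator:
$$x_1(t) = A_1 \cos\!\left(\frac{2\pi}{T} t + \varphi_1\right), \quad A_1 > 0$$
n-th harmonic:
$$x_n(t) = A_n \cos\!\left(n \frac{2\pi}{T} t + \varphi_n\right), \quad A_n > 0$$
therefore
$$x(t) = \sum_{n=0}^{\infty} A_n \cos(n \omega_0 t + \varphi_n) \quad \text{with } \omega_0 = \frac{2\pi}{T},\ A_n \geq 0$$
Another notation:
$$x(t) = \sum_{n=-\infty}^{+\infty} B_n\, e^{j n \omega_0 t} \quad \text{where } B_n \text{ is a complex number}$$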
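Given: periodic function x(t) with period $T = \frac{2\pi}{\omega_0}$.
approach:
$$x(t) = \sum_{n=-\infty}^{+\infty} a_n\, e^{j n \omega_0 t}, \quad a_n \in \mathbb{C}$$
$$\int_{-T/2}^{+T/2} x(t)\, e^{-j m \omega_0 t}\, dt = \sum_{n=-\infty}^{+\infty} a_n \int_{-T/2}^{+T/2} e^{j (n - m) \omega_0 t}\, dt$$
$$\int_{-T/2}^{+T/2} e^{j (n - m) \omega_0 t}\, dt = \begin{cases} T & \text{if } n = m \\ 0 & \text{if } n \neq m \end{cases}$$
Then:
$$\int_{-T/2}^{+T/2} x(t)\, e^{-j m \omega_0 t}\, dt = a_m\, T$$
Result:
$$a_n = \frac{1}{T} \int_{-T/2}^{+T/2} x(t)\, e^{-j n \omega_0 t}\, dt = \frac{1}{T} \int_{-T/2}^{+T/2} x(t) \cos(n \omega_0 t)\, dt - j\, \frac{1}{T} \int_{-T/2}^{+T/2} x(t) \sin(n \omega_0 t)\, dt$$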
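Fourier transform of a periodic function: if
$$x(t) = \sum_{n=-\infty}^{+\infty} a_n\, e^{j n \omega_0 t}, \quad a_n \in \mathbb{C}, \quad \omega_0 = \frac{2\pi}{T}$$
then
$$X(\omega) = \sum_{n=-\infty}^{+\infty} a_n \underbrace{F\{e^{j n \omega_0 t}\}}_{=\, 2\pi \delta(\omega - n\omega_0)} = 2\pi \sum_{n=-\infty}^{+\infty} a_n\, \delta(\omega - n \omega_0)$$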
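1.10 Time Duration and Band Width
1. Similarity principle:
$$F\{h(\alpha t)\} = \frac{1}{|\alpha|}\, H\!\left(\frac{\omega}{\alpha}\right)$$
Figure: h(t) and $H(\omega)$ compared with $h(\alpha t)$ and $\frac{1}{\alpha} H(\frac{\omega}{\alpha})$ for $0 < \alpha < 1$.
For time duration T and band width B:
$$T \cdot B = \text{const.}$$
2. Definition via the areas (h(t) real and symmetric):
$$h(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega) \cos(\omega t)\, d\omega, \qquad \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, d\omega = h(0)$$
define:
$$T := \frac{1}{h(0)} \int_{-\infty}^{\infty} h(t)\, dt, \qquad B := \frac{1}{H(0)} \int_{-\infty}^{\infty} H(\omega)\, d\omega$$
from
$$T = \frac{H(0)}{h(0)} \quad \text{and} \quad B = 2\pi\, \frac{h(0)}{H(0)}$$
follows
$$T \cdot B = 2\pi$$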
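3. In general (h(t) real, normalized to $\int h^2(t)\, dt = 1$):
$$T^2 := \int_{-\infty}^{\infty} h^2(t)\, t^2\, dt, \qquad B^2 := \int_{-\infty}^{\infty} |H(\omega)|^2\, \omega^2\, d\omega = 2\pi \int_{-\infty}^{\infty} [h'(t)]^2\, dt$$
Then:
$$T \cdot B \geq \sqrt{\frac{\pi}{2}}$$
Proof (Schwarz inequality):
$$\underbrace{\left[ \int_{-\infty}^{\infty} h(t)\, h'(t)\, t\, dt \right]^2}_{=\, \frac{1}{4}} \leq \underbrace{\int_{-\infty}^{\infty} h^2(t)\, t^2\, dt}_{=\, T^2} \cdot \underbrace{\int_{-\infty}^{\infty} [h'(t)]^2\, dt}_{=\, \frac{B^2}{2\pi}}$$
From partial integration with $u'(t) = h(t)\, h'(t)$ and $v(t) = t$:
$$\int_{-\infty}^{\infty} [h(t)\, h'(t)]\, t\, dt = \left[ \frac{1}{2} h^2(t)\, t \right]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} \frac{1}{2} h^2(t) \cdot 1\, dt = 0 - \frac{1}{2}$$
The optimum $T \cdot B = \sqrt{\pi/2}$ is reached for the Gaussian function with $\alpha > 0$:
$$h(t) \propto e^{-\frac{1}{2} \alpha t^2}$$
Variance: $\sigma^2 = 1/\alpha$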
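4. Signals of finite duration: let $g(t) \geq 0$ for $0 \leq t \leq T$ and $g(t) = 0$ otherwise. Then
$$|G(\omega)| \leq \int_{-\infty}^{+\infty} |g(t)|\, dt = \int_{-\infty}^{+\infty} g(t)\, dt = G(0) \qquad \text{(because } g(t) \geq 0 \text{)}$$
Define the band width B as:
$$|G(B)|^2 = \frac{G^2(0)}{2}$$
Then:
$$T \cdot B \geq \frac{\pi}{2}$$
Proof:
The following inequalities are valid:
$$a^2 + b^2 \geq \frac{(a - b)^2}{2}, \quad a, b \in \mathbb{R}, \qquad |\sin\alpha| + |\cos\alpha| \geq 1, \quad \alpha \in \mathbb{R}$$
$$\mathrm{Re}\{G(\omega)\} = \int_{0}^{T} g(t) \cos\omega t\, dt, \qquad \mathrm{Im}\{G(\omega)\} = -\int_{0}^{T} g(t) \sin\omega t\, dt$$
For $0 \leq \omega t \leq \frac{\pi}{2}$ holds $\cos\omega t \geq 0$, $\sin\omega t \geq 0$, and therefore:
$$\cos\omega t + \sin\omega t = |\cos\omega t| + |\sin\omega t| \geq 1$$
$$\mathrm{Re}\{G(\omega)\} - \mathrm{Im}\{G(\omega)\} = \int_{0}^{T} g(t)\, [\cos\omega t + \sin\omega t]\, dt \geq \int_{0}^{T} g(t) \cdot 1\, dt = G(0)$$
$$|G(\omega)|^2 = \mathrm{Re}^2\{G(\omega)\} + \mathrm{Im}^2\{G(\omega)\} \geq \frac{[\mathrm{Re}\{G(\omega)\} - \mathrm{Im}\{G(\omega)\}]^2}{2} \geq \frac{1}{2} G^2(0) = |G(B)|^2$$
This holds for all $\omega \leq \frac{\pi}{2T}$, hence $B \geq \frac{\pi}{2T}$.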
Chapter 2
Discrete Time Systems
Overview:
2.1 Motivation and Goal
2.2 Digital Simulation using Discrete Time Systems
2.3 Examples of Discrete Time Systems
2.4 Sampling Theorem and Reconstruction
2.5 Logarithmic Scale and dB
2.6 Quantization
2.7 Fourier Transform and z-Transform
2.8 System Representation and Examples
2.9 Discrete Time Signal Fourier Transform Theorems
2.10 Discrete Fourier Transform (DFT)
2.11 DFT as Matrix Operation
2.12 From continuous FT to Matrix Representation of DFT
2.13 Frequency Resolution and Zero Padding
2.14 Finite Convolution
2.15 Fast Fourier Transform (FFT)
2.16 FFT Implementation
2.1 Motivation and Goal

2.2 Digital Simulation using Discrete Time Systems
Task definition:
Given:
Analog system with input signal x(t) and output signal y(t);
Sampling with sampling period TS
Wanted:
Discrete System with input signal x[n] and output signal y[n], such
that
x[n] = x(nTS )
results in
y[n] = y(nTS )
For which signals is such a digital simulation possible?
The sampling theorem gives (most of) the answer.
Discrete time LTI system (time shifts $n_0$ are whole numbers).
Unit impulse:
$$\delta[n] = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0 \end{cases}$$
The signal x[n] is represented by amplitude-weighted and time-shifted unit impulses $\delta[n]$. The system responds to $\delta[n]$ with h[n]:
$$h[n] = S\{\delta[n]\}$$
Input signal:
$$x[n] = \sum_{k=-\infty}^{\infty} x[k]\, \delta[n - k]$$
Output signal:
$$y[n] = S\Big\{ \sum_{k=-\infty}^{\infty} x[k]\, \delta[n - k] \Big\}$$
Additivity:
$$= \sum_{k=-\infty}^{\infty} S\{ x[k]\, \delta[n - k] \}$$
Homogeneity:
$$= \sum_{k=-\infty}^{\infty} x[k]\, S\{ \delta[n - k] \}$$
Time invariance:
$$= \sum_{k=-\infty}^{\infty} x[k]\, h[n - k]$$
Input signal x[n] and output signal y[n] of a discrete time LTI system are linked through discrete convolution.
h[n] is called the impulse response, as in the continuous time case.
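2.3 Examples of Discrete Time Systems
Difference calculation:
$$y[n] = x[n] - x[n - n_0]$$
1-2-1-averaging:
$$y[n] = 0.5\, x[n - 1] + x[n] + 0.5\, x[n + 1]$$
Sliding window averaging (smoothing):
$$y[n] = \frac{1}{2M + 1} \sum_{k=-M}^{M} x[n - k] = \frac{1}{2M + 1} \sum_{k=-M}^{M} h[k]\, x[n - k] \quad \text{with } h[k] = 1 \text{ for } |k| \leq M$$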
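The sliding window average can be sketched in C as follows. This is only an illustration: the function name `moving_average` and the treatment of samples outside the signal (taken as zero) are assumptions, since the script does not specify the boundary handling.

```c
#include <stddef.h>

/* Sliding window average y[n] = 1/(2M+1) * sum_{k=-M}^{M} x[n-k].
   Assumption of this sketch: samples outside 0..len-1 count as zero. */
void moving_average(const double *x, double *y, size_t len, size_t M)
{
    for (size_t n = 0; n < len; n++) {
        double sum = 0.0;
        for (long k = -(long)M; k <= (long)M; k++) {
            long idx = (long)n - k;           /* index n - k */
            if (idx >= 0 && idx < (long)len)
                sum += x[idx];
        }
        y[n] = sum / (double)(2 * M + 1);
    }
}
```

With M = 1 a constant signal is reproduced in the interior, while the first and last output values are reduced because one neighbour is missing.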
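Gradient image (two-dimensional differences on the pixel grid i, j):
$$|\nabla x[i,j]|^2 = (x[i,j] - x[i+1,j+1])^2 + (x[i,j+1] - x[i+1,j])^2$$
Figure 2.1: Digital photo
Figure 2.2: Gradient image
Laplace operator (the sum of the second differences 1, -2, 1 in i and in j direction, i.e. a mask with weight 1 at the neighbours i-1, i+1, j-1, j+1 and weight -4 at the centre):
$$\nabla^2 x[i,j] = x[i+1,j] + x[i-1,j] + x[i,j+1] + x[i,j-1] - 4\, x[i,j]$$
Image enhancement:
$$y[i,j] = x[i,j] - \nabla^2 x[i,j] = h[i,j] * x[i,j]$$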
Figure 2.3: Several real cases of Laplace Operator subtraction from original image. a)
Original image b) Original image minus Laplace Operator (negative values are set to 0
and values above the grey scale are set to the highest grade of grey)
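2.4 Sampling Theorem and Reconstruction
Inverse Fourier transform of the continuous time signal x(t):
$$x(t) = F^{-1}\{X(\omega)\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega \qquad (2.1)$$
Signal x(t) has limited bandwidth with upper limit $\omega_B$, which means:
$$X(\omega) = 0 \quad \text{for all } |\omega| \geq \omega_B$$
Note: $X(\omega_B) = 0$
$X(\omega)$ in the domain $-\omega_B < \omega < \omega_B$ can be represented as a Fourier series:
$$X(\omega) = \sum_{n=-\infty}^{\infty} a_n \exp\!\left(-j n \pi \frac{\omega}{\omega_B}\right) \qquad (2.2)$$
The coefficients $a_n$ are given by:
$$a_n = \frac{1}{2\omega_B} \int_{-\omega_B}^{\omega_B} X(\omega) \exp\!\left(j n \pi \frac{\omega}{\omega_B}\right) d\omega \qquad (2.3)$$
Comparison of the equations (2.1) and (2.3) shows that the coefficients $a_n$ are given by the values of the inverse Fourier transform x(t) at the points
$$t_n = \frac{n\pi}{\omega_B} \qquad (2.4)$$
The band limitation of $X(\omega)$ has to be considered for the integration limits in (2.1). Result:
$$a_n = \frac{\pi}{\omega_B}\, x\!\left(\frac{n\pi}{\omega_B}\right) \qquad (2.5)$$
Inserting Eq. (2.5) into Eq. (2.2) and then into Eq. (2.1) results in:
$$x(t) = \frac{1}{2\pi} \int_{-\omega_B}^{\omega_B} \sum_{n=-\infty}^{\infty} \frac{\pi}{\omega_B}\, x\!\left(\frac{n\pi}{\omega_B}\right) \exp\!\left(-j n \pi \frac{\omega}{\omega_B}\right) \exp(j\omega t)\, d\omega$$
$$x(t) = \sum_{n=-\infty}^{\infty} x\!\left(\frac{n\pi}{\omega_B}\right) \frac{\sin\!\big(\omega_B (t - \frac{n\pi}{\omega_B})\big)}{\omega_B (t - \frac{n\pi}{\omega_B})} \qquad (2.6)$$
The samples are taken with the sampling period
$$T_S = \frac{\pi}{\omega_B}$$
The sampling period $T_S$ corresponds to the sampling frequency $\omega_S$:
$$\omega_S = \frac{2\pi}{T_S}$$
The derivation also holds for any larger limit $\tilde\omega_B \geq \omega_B$. With $\tilde\omega_B = \frac{\pi}{T_S}$, then:
$$x(t) = \sum_{n=-\infty}^{\infty} x(n T_S)\, \frac{\sin(\pi (t - n T_S)/T_S)}{\pi (t - n T_S)/T_S} \qquad \text{(reconstruction formula)}$$
Note: $\lim_{t \to 0} \frac{\sin(t)}{t} = 1$ (l'Hôpital's rule)
The condition $\tilde\omega_B \geq \omega_B$ results in:
$$T_S \leq \frac{\pi}{\omega_B} \qquad (2.7)$$
$$\omega_S \geq 2\omega_B \qquad (2.8)$$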
Figure 2.4: Ideal reconstruction of a band-limited signal (from Oppenheim, Schafer)
a) original signal b) sampled signal c) reconstructed signal
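Figure: a) spectrum $X(\omega)$ of the band-limited signal; b) periodically repeated spectrum of the sampled signal for $\omega_S > 2\omega_B$, with repetitions at multiples of $\omega_S = \frac{2\pi}{T_S}$.
Sampling procedure
$$x_s(t) = T_s\, x(t) \sum_{n=-\infty}^{+\infty} \delta(t - n T_s)$$
Fourier transform (multiplication in time domain corresponds to convolution in frequency domain):
$$F\{x_s(t)\} = \frac{1}{2\pi}\, X(\omega) * \left[ T_s\, \frac{2\pi}{T_s} \sum_{n=-\infty}^{+\infty} \delta\!\left(\omega - \frac{2\pi n}{T_s}\right) \right] = \int_{-\infty}^{+\infty} X(\tilde\omega) \sum_{n=-\infty}^{+\infty} \delta\!\left(\omega - \tilde\omega - \frac{2\pi n}{T_s}\right) d\tilde\omega = \sum_{n=-\infty}^{+\infty} X\!\left(\omega - n \frac{2\pi}{T_s}\right)$$
The repetitions do not overlap if
$$\omega_B \leq \omega_S - \omega_B, \qquad \text{i.e.} \qquad 2\omega_B \leq \omega_S$$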
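Continuous time convolution and its discrete time counterpart:
$$y(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau$$
$$y[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n - k] \qquad \text{with } h[n] = h(n T_S)$$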
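Important:
In the domain $|\omega| < \omega_S/2$ the Fourier transform of a continuous time signal x(t) is identical with the Fourier transform of the corresponding sampled discrete time signal $x(n T_S)$:
$$X(\omega) = \int_{-\infty}^{\infty} x(t) \exp(-j\omega t)\, dt$$
for $|\omega| \leq \omega_S/2$ is identical to
$$T_S\, X_S(\omega) = T_S \sum_{n=-\infty}^{\infty} x(n T_S) \exp(-j\omega T_S n) = T_S \sum_{n=-\infty}^{\infty} x(n T_S) \exp\!\left(-j 2\pi \frac{\omega}{\omega_S} n\right)$$
Inverse transform:
$$x(n T_S) = \frac{1}{\omega_S} \int_{-\omega_S/2}^{\omega_S/2} X_S(\omega) \exp(j\omega T_S n)\, d\omega$$
One period: $-\frac{\omega_S}{2} \leq \omega \leq \frac{\omega_S}{2}$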
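Frequency normalization
Define the normalized frequency $\omega_N$:
$$\omega_N := 2\pi\, \frac{\omega}{\omega_S} = \omega\, T_S$$
Definition (with the normalized frequency again written as $\omega$):
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n] \exp(-j\omega n)$$
$$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega}) \exp(j\omega n)\, d\omega$$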
2.5 Logarithmic Scale and dB
Why?
Large dynamic range for the amplitude values of a signal
$$x(t) = A \cos\omega t$$
A := amplitude (pressure, velocity, inclination, current, voltage, ...), a linear variable
$A_0$ := reference amplitude, a predefined value for calibration
dB := decibel
$$A[\mathrm{dB}] := 20 \lg \frac{A}{A_0} = 10 \lg \frac{A^2}{A_0^2}, \qquad \lg \equiv \log_{10}$$
$A^2$ = quadratic variable = energy, intensity
3 dB corresponds to a factor of 2 for the intensity
Figure: amplitude spectrum A (left) and logarithmic amplitude spectrum log A (right) over the frequency f / Hz from 0 to 8000 Hz, for the phoneme "s", the phoneme "ae" and a pause.
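2.6 Quantization
Uniform quantization on the range from $-X_{MAX}$ to $X_{MAX}$
Quantisation: $\hat{x} = Q(x)$
B bits correspond to $2^B$ quantisation levels
Boundaries:
$$x_0, x_1, \ldots, x_k, \ldots, x_K \quad \text{where } K = 2^B$$
Step size:
$$\Delta = \frac{2\, X_{MAX}}{2^B}$$
Quantisation error:
$$\sigma_e^2 = \int_{-\infty}^{+\infty} (\hat{x} - x)^2\, p(x)\, dx = \sum_{k=1}^{K} \int_{x_{k-1}}^{x_k} (\hat{x}_k - x)^2\, p(x)\, dx$$
With
$$x_k - x_{k-1} = \Delta = \mathrm{const}(k), \qquad \hat{x}_k = \tfrac{1}{2} (x_{k-1} + x_k)$$
and a uniform distribution p(x):
$$\sigma_e^2 = \sum_{k=1}^{K} \frac{\Delta^2}{12}\, \frac{1}{K} = \frac{\Delta^2}{12} = \frac{X_{MAX}^2}{3 \cdot 2^{2B}}$$
Signal-to-noise ratio with the signal variance $\sigma_x^2$:
$$\mathrm{SNR} = 10 \lg \frac{\sigma_x^2}{\sigma_e^2} = 6.02\, B + 4.77 - 20 \lg \frac{X_{MAX}}{\sigma_x} \ [\mathrm{dB}] = 6.02\, B - 7.27 \ [\mathrm{dB}] \quad \text{for } X_{MAX} = 4\sigma_x$$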
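2.7 Fourier Transform and z-Transform
Complex exponentials $x[n] = e^{j\omega n}$, $-\infty < n < \infty$, are eigenfunctions of discrete time LTI systems ($\omega$ is dimensionless here).
Proof:
$$x[n] = e^{j\omega n}$$
$$y[n] = \sum_{k=-\infty}^{\infty} h[k]\, e^{j\omega (n - k)} = e^{j\omega n} \sum_{k=-\infty}^{\infty} h[k]\, e^{-j\omega k}$$
Define:
$$H(e^{j\omega}) = \sum_{k=-\infty}^{\infty} h[k]\, e^{-j\omega k}$$
Remark:
The Fourier transform of a discrete time signal has already been introduced as a Fourier series during the derivation of the sampling theorem and the reconstruction formula (equation (2.2)).
Result:
$$y[n] = e^{j\omega n}\, H(e^{j\omega})$$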
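z-transform:
Fourier transform of a discrete time signal x[n]:
$$X(e^{j\omega}) = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j\omega n} \qquad \text{(periodic in } \omega \text{)}$$
z-transform:
$$X(z) = \sum_{n=-\infty}^{+\infty} x[n]\, z^{-n}$$
Inverse z-transform:
$$x[n] = \frac{1}{2\pi j} \oint X(z)\, z^{n-1}\, dz$$
formally: $z = e^{j\omega}$, $dz = j z\, d\omega$
$$x[n] = \frac{1}{2\pi} \int_{0}^{2\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega$$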
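Example: truncated geometric series
$$x[n] = \begin{cases} a^n & 0 \leq n \leq N - 1 \\ 0 & \text{otherwise} \end{cases}$$
$$X(z) = \sum_{n=0}^{N-1} a^n z^{-n} = \sum_{n=0}^{N-1} (a z^{-1})^n = \frac{1 - (a z^{-1})^N}{1 - a z^{-1}} = \frac{1}{z^{N-1}}\, \frac{z^N - a^N}{z - a}$$
Fourier transform
The z-transform results in the Fourier transform using the substitution $z = e^{j\omega}$:
$$X(e^{j\omega}) = \frac{1 - a^N e^{-j\omega N}}{1 - a\, e^{-j\omega}}$$
special case for a = 1 (discrete time rectangle):
$$X(e^{j\omega}) = \exp\!\left(-j\omega \frac{N - 1}{2}\right) \frac{\sin(\omega N / 2)}{\sin(\omega / 2)}$$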
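Inverse z-transform:
$$x[k] = \frac{1}{2\pi j} \oint X(z)\, z^{k-1}\, dz$$
Fourier:
$$z = e^{j\omega}, \qquad dz = j\, e^{j\omega}\, d\omega$$
Then:
$$x[n] = \frac{1}{2\pi j} \int_{-\pi}^{+\pi} X(e^{j\omega})\, e^{j\omega (n - 1)}\, j\, e^{j\omega}\, d\omega = \frac{1}{2\pi} \int_{-\pi}^{+\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega$$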
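2.8 System Representation and Examples
Example 1: difference calculation
$$y[n] = x[n] - x[n - n_0], \quad n_0 \text{ integral number}$$
$$Y(e^{j\omega}) = \sum_{n=-\infty}^{\infty} y[n]\, e^{-j\omega n} = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n} - \sum_{n=-\infty}^{\infty} x[n - n_0]\, e^{-j\omega n} = X(e^{j\omega}) - X(e^{j\omega})\, e^{-j\omega n_0}$$
Then follows:
$$H(e^{j\omega}) = \frac{Y(e^{j\omega})}{X(e^{j\omega})} = 1 - e^{-j\omega n_0}$$
Figure: $|H(e^{j\omega})|^2$ over $\omega$.
Example 2: first order recursive system (delay element feeding back y[n-1])
$$y[n] = x[n] + \alpha\, y[n - 1], \qquad h[k] = \alpha^k \text{ for } k \geq 0$$
$$H(e^{j\omega}) = \sum_{k=-\infty}^{+\infty} h[k]\, e^{-j\omega k} = \sum_{k=0}^{+\infty} \alpha^k\, e^{-j\omega k} = \sum_{k=0}^{+\infty} \left( \alpha\, e^{-j\omega} \right)^k = \frac{1}{1 - \alpha\, e^{-j\omega}} \quad \text{for } |\alpha| < 1$$
$$H(e^{j\omega}) = \frac{Y(e^{j\omega})}{X(e^{j\omega})} = \frac{1}{1 - \alpha\, e^{-j\omega}}$$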
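Example 3: linear difference equation
$$y[n] = \sum_{i=0}^{I} b[i]\, x[n - i] - \sum_{j=1}^{J} a[j]\, y[n - j]$$
z-transform:
$$Y(z) = X(z) \sum_{i=0}^{I} b[i]\, z^{-i} - Y(z) \sum_{j=1}^{J} a[j]\, z^{-j}$$
Result:
$$H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{i=0}^{I} b[i]\, z^{-i}}{1 + \sum_{j=1}^{J} a[j]\, z^{-j}} = \sum_{n=-\infty}^{+\infty} h[n]\, z^{-n}$$
in general:
h[n] has an infinite number of non-zero values
= IIR filter: Infinite Impulse Response
special case a[j] = 0 for all j (FIR filter: Finite Impulse Response):
$$h[n] = \begin{cases} b[n] & n = 0, \ldots, I \\ 0 & \text{otherwise} \end{cases}$$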
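The linear difference equation of Example 3 can be evaluated sample by sample. The C sketch below is a minimal illustration; the function name `difference_equation`, the convention that a[0] is unused and the assumption that all samples before n = 0 are zero are choices made here, not taken from the script.

```c
#include <stddef.h>

/* y[n] = sum_{i=0}^{I} b[i] x[n-i] - sum_{j=1}^{J} a[j] y[n-j]
   a[0] is unused; samples before n = 0 are taken as zero (assumption). */
void difference_equation(const double *b, size_t I,
                         const double *a, size_t J,
                         const double *x, double *y, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        double acc = 0.0;
        for (size_t i = 0; i <= I && i <= n; i++)
            acc += b[i] * x[n - i];          /* non-recursive part */
        for (size_t j = 1; j <= J && j <= n; j++)
            acc -= a[j] * y[n - j];          /* recursive part */
        y[n] = acc;
    }
}
```

Feeding a unit impulse into the system with b[0] = 1 and a[1] = -0.5 yields the impulse response 1, 0.5, 0.25, ..., the geometric sequence of Example 2; with J = 0 the output is simply the FIR response b[n].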
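Example 4:
Impulse response as truncated geometric series
$$h[n] = \begin{cases} a^n & 0 \leq n \leq M \\ 0 & \text{otherwise} \end{cases} \qquad a \in \mathbb{R}$$
$$H(z) = \sum_{n=0}^{M} a^n z^{-n} = \frac{1 - a^{M+1} z^{-(M+1)}}{1 - a z^{-1}}$$
system operation:
$$y[n] = \sum_{k=-\infty}^{\infty} h[k]\, x[n - k] = \sum_{k=0}^{M} a^k\, x[n - k]$$
Zeros (a > 0):
$$z_k = a\, e^{j \frac{2\pi k}{M + 1}}, \quad k = 0, 1, \ldots, M$$
Pole:
$$z = a \quad \text{(cancelled by the zero } z_0 = a \text{)}$$
Figure: zeros in the complex plane (Re, Im) for M = 11.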
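Example 5:
Fibonacci numbers
Difference equation:
$$h[n] = h[n - 1] + h[n - 2] \text{ for } n \geq 2, \qquad h[0] = h[1] = 1, \qquad h[n] = 0 \text{ for } n < 0$$
$$H(z) = \sum_{n=0}^{\infty} h[n]\, z^{-n} = 1 + z^{-1} + \sum_{n=0}^{\infty} h[n + 2]\, z^{-(n+2)}$$
$$= 1 + z^{-1} + \sum_{n=0}^{\infty} h[n + 1]\, z^{-(n+2)} + \sum_{n=0}^{\infty} h[n]\, z^{-(n+2)}$$
$$= 1 + z^{-1} + z^{-1} \sum_{n=0}^{\infty} h[n + 1]\, z^{-(n+1)} + z^{-2} \sum_{n=0}^{\infty} h[n]\, z^{-n}$$
$$= 1 + z^{-1} \underbrace{\Big( 1 + \sum_{n=1}^{\infty} h[n]\, z^{-n} \Big)}_{H(z)} + z^{-2} \underbrace{\sum_{n=0}^{\infty} h[n]\, z^{-n}}_{H(z)} = 1 + z^{-1} H(z) + z^{-2} H(z)$$
$$H(z)\, (1 - z^{-1} - z^{-2}) = 1$$
$$H(z) = \frac{1}{1 - z^{-1} - z^{-2}} = \frac{1}{\sqrt{5}} \left[ \frac{a}{1 - a z^{-1}} - \frac{b}{1 - b z^{-1}} \right] \quad \text{where } a = \frac{1 + \sqrt{5}}{2} \text{ and } b = \frac{1 - \sqrt{5}}{2}$$
With the geometric series
$$\frac{1}{1 - \frac{a}{z}} = \sum_{n=0}^{\infty} \left( \frac{a}{z} \right)^n = \sum_{n=0}^{\infty} a^n z^{-n}$$
follows
$$H(z) = \frac{1}{\sqrt{5}} \sum_{n=0}^{\infty} \left( a^{n+1} - b^{n+1} \right) z^{-n} \overset{!}{=} \sum_{n=0}^{\infty} h[n]\, z^{-n}$$
$$h[n] = \frac{1}{\sqrt{5}} \left( a^{n+1} - b^{n+1} \right)$$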
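signal | Fourier transform
1. $\delta[n]$ | $1$
2. $\delta[n - n_0]$ | $e^{-j\omega n_0}$
3. $1 \quad (-\infty < n < \infty)$ | $\sum_{k=-\infty}^{\infty} 2\pi\, \delta(\omega + 2\pi k)$
4. $a^n u[n] \quad (|a| < 1)$ | $\dfrac{1}{1 - a e^{-j\omega}}$
5. $u[n]$ | $\dfrac{1}{1 - e^{-j\omega}} + \sum_{k=-\infty}^{\infty} \pi\, \delta(\omega + 2\pi k)$
6. $(n + 1)\, a^n u[n] \quad (|a| < 1)$ | $\dfrac{1}{(1 - a e^{-j\omega})^2}$
7. $\dfrac{r^n \sin \omega_p (n + 1)}{\sin \omega_p}\, u[n] \quad (|r| < 1)$ | $\dfrac{1}{1 - 2r \cos\omega_p\, e^{-j\omega} + r^2 e^{-j2\omega}}$
8. $\dfrac{\sin \omega_c n}{\pi n}$ | $X(e^{j\omega}) = \begin{cases} 1, & |\omega| < \omega_c \\ 0, & \omega_c < |\omega| \leq \pi \end{cases}$
9. $x[n] = \begin{cases} 1, & 0 \leq n \leq M \\ 0, & \text{otherwise} \end{cases}$ | $\dfrac{\sin[\omega (M + 1)/2]}{\sin(\omega/2)}\, e^{-j\omega M/2}$
10. $e^{j\omega_0 n}$ | $\sum_{k=-\infty}^{\infty} 2\pi\, \delta(\omega - \omega_0 + 2\pi k)$
11. $\cos(\omega_0 n + \varphi)$ | $\sum_{k=-\infty}^{\infty} \left[ \pi e^{j\varphi}\, \delta(\omega - \omega_0 + 2\pi k) + \pi e^{-j\varphi}\, \delta(\omega + \omega_0 + 2\pi k) \right]$
with the unit step
$$u[n] = \begin{cases} 1, & n \geq 0 \\ 0, & n < 0 \end{cases}$$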
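2.9 Discrete Time Signal Fourier Transform Theorems
signal x[n], y[n] | Fourier transform $X(e^{j\omega})$, $Y(e^{j\omega})$
1. $a\, x[n] + b\, y[n]$ | $a X(e^{j\omega}) + b Y(e^{j\omega})$
2. $x[n - n_d]$, $n_d$ is an integral number | $e^{-j\omega n_d}\, X(e^{j\omega})$
3. $e^{j\omega_0 n}\, x[n]$ | $X(e^{j(\omega - \omega_0)})$
4. $x[-n]$ | $X(e^{-j\omega})$; equal to $\overline{X(e^{j\omega})}$ if x[n] is real
5. $n\, x[n]$ | $j\, \dfrac{dX(e^{j\omega})}{d\omega}$
6. $x[n] * y[n]$ | $X(e^{j\omega})\, Y(e^{j\omega})$
7. $x[n]\, y[n]$ | $\dfrac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\theta})\, Y(e^{j(\omega - \theta)})\, d\theta$
8. $x[n] - x[n - 1]$ | $(1 - e^{-j\omega})\, X(e^{j\omega})$, with $|1 - e^{-j\omega}|^2 = 2 (1 - \cos\omega)$
Parseval theorem
9. $\sum_{n=-\infty}^{\infty} |x[n]|^2 = \dfrac{1}{2\pi} \int_{-\pi}^{\pi} |X(e^{j\omega})|^2\, d\omega$
10. $\sum_{n=-\infty}^{\infty} x[n]\, \overline{y[n]} = \dfrac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega})\, \overline{Y(e^{j\omega})}\, d\omega$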
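Proof of theorem 5:
$$X(e^{j\omega}) = \sum_{k=-\infty}^{+\infty} x[k]\, e^{-j\omega k}$$
$$\frac{d}{d\omega} X(e^{j\omega}) = \frac{d}{d\omega} \sum_{k=-\infty}^{+\infty} x[k]\, e^{-j\omega k} = \sum_{k=-\infty}^{+\infty} x[k]\, \frac{d}{d\omega} e^{-j\omega k} = -j \sum_{k=-\infty}^{+\infty} k\, x[k]\, e^{-j\omega k}$$
$$F\{n\, x[n]\} = j\, \frac{d}{d\omega} F\{x[n]\}$$
Proof of theorem 8:
$$F\{x[n] - x[n - 1]\} = \sum_{k=-\infty}^{+\infty} x[k]\, e^{-j\omega k} - \sum_{k=-\infty}^{+\infty} x[k - 1]\, e^{-j\omega k} = \sum_{k=-\infty}^{+\infty} x[k]\, e^{-j\omega k} - \sum_{k=-\infty}^{+\infty} x[k]\, e^{-j\omega k}\, e^{-j\omega} = X(e^{j\omega}) \left( 1 - e^{-j\omega} \right)$$
$$|F\{x[n] - x[n - 1]\}|^2 = |F\{x[n]\}|^2\, |1 - e^{-j\omega}|^2 = |F\{x[n]\}|^2 \cdot 2\, (1 - \cos\omega)$$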
2.10 Discrete Fourier Transform (DFT)
The Fourier transform for discrete time signals and systems has been explained on the previous pages. For discrete time signals with finite length there is also another Fourier representation called Discrete Fourier Transform (DFT).
The DFT plays a central role in digital signal processing.
Decisive reasons:
Fast algorithms exist for the DFT calculation (Fast Fourier Transform, FFT).
Discrete frequencies $\omega_k$ can be better represented in the computer than continuous frequencies $\omega$.
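Assume a discrete time signal x[n] with finite length (see also chapter 3.2 on page 135):
$$x[n] = \begin{cases} x[n] & 0 \leq n \leq N - 1 \\ 0 & \text{otherwise} \end{cases}$$
$$X(e^{j\omega}) = \sum_{n=0}^{N-1} x[n] \exp(-j\omega n)$$
Sampling at the frequencies
$$\omega_k = \frac{2\pi}{N}\, k \quad \text{where } k = 0, 1, \ldots, N - 1$$
Define:
$$X[k] := X(e^{j\omega})\big|_{\omega = \omega_k}$$
Figure: sampling points on the unit circle in the complex plane (Re, Im) for N = 8.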
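DFT:
$$X[k] = \sum_{n=0}^{N-1} x[n] \exp\!\left(-j \frac{2\pi}{N} k n\right), \quad k = 0, 1, \ldots, N - 1$$
Inverse DFT:
$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \exp\!\left(j \frac{2\pi}{N} k n\right), \quad n = 0, 1, \ldots, N - 1$$
Remark:
This equation can be proven by inserting the equation for X[k] in the equation for x[n] and using the orthogonality:
$$\frac{1}{N} \sum_{n=0}^{N-1} \exp\!\left(j \frac{2\pi}{N} k n\right) = \begin{cases} 1 & k = m \cdot N,\ m \text{ is an integral number} \\ 0 & \text{otherwise} \end{cases}$$
Note:
Consider the analogy between the inverse DFT (above) and the inverse Fourier transform of a discrete time signal:
$$x[n] = \frac{1}{2\pi} \int_{0}^{2\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega$$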
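Remarks:
The DFT coefficients X[k] are not an approximation of the discrete time signal Fourier transform $X(e^{j\omega})$. On the contrary:
$$X[k] = X(e^{j\omega})\big|_{\omega = \omega_k}$$
The number of coefficients X[k] depends on the signal length N. A finer sampling of the discrete time signal Fourier transform is possible by appending zeros to the signal x[n] (zero padding).
Figure: signal x[n] for n = 0, ..., N-1.
Sampling $|X(e^{j\omega})|$ at $\omega_k = \frac{2\pi}{N} k$ yields the DFT coefficients X[k], with a symmetric index range:
$$k = -\frac{N}{2} + 1, \ldots, 0, \ldots, \frac{N}{2}$$
Figure: $|X(e^{j\omega})|$ and $|X[k]|$ for k = -N/2+1, ..., N/2.
With the index range of the DFT definition,
$$X[k] = \sum_{n=0}^{N-1} x[n] \exp\!\left(-j \frac{2\pi}{N} k n\right), \quad k = 0, \ldots, N - 1$$
the coefficients correspond to the following frequencies (sampling frequency $f_S$):
k = 0: $f = 0$
$1 \leq k \leq \frac{N}{2} - 1$: $0 < f < \frac{f_S}{2}$
$k = \frac{N}{2}$: $f = \frac{f_S}{2}$ (equivalently $-\frac{f_S}{2}$)
$\frac{N}{2} + 1 \leq k \leq N - 1$: $-\frac{f_S}{2} < f < 0$
For a real signal x[n]:
$$\mathrm{Re}(X[k]) = \mathrm{Re}(X[N - k]), \qquad \mathrm{Im}(X[k]) = -\mathrm{Im}(X[N - k])$$
For the amplitude spectrum $|X[k]|$ the following holds:
$$|X[k]|^2 = |X[N - k]|^2$$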
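Realization of DFT:

#include <math.h>

#define PI 3.14159265358979

/* PI = 3.14159265358979                                  */
/* x:        input signal                                 */
/* N:        length of input signal                       */
/* Xre, Xim: real and imaginary part of DFT coefficients  */
void dft(int N, const float x[], float Xre[], float Xim[])
{
    int k, n;
    for (k = 0; k < N; k++) {
        Xre[k] = 0.0f;
        Xim[k] = 0.0f;
        for (n = 0; n < N; n++) {
            double phi = 2.0 * PI * k * n / N;   /* angle 2 pi k n / N */
            Xre[k] += (float)(x[n] * cos(phi));
            Xim[k] -= (float)(x[n] * sin(phi));
        }
    }
}

Remark:
discrete realization of $\exp(-\frac{2\pi j}{N} k n) = \cos(\frac{2\pi}{N} k n) - j \sin(\frac{2\pi}{N} k n)$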
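2.11 DFT as Matrix Operation
$$X[k] = \sum_{n=0}^{N-1} x[n] \exp\!\left(-\frac{2\pi j}{N} k n\right) = \sum_{n=0}^{N-1} x[n]\, W_N^{kn} \quad \text{where } W_N := \exp\!\left(-\frac{2\pi j}{N}\right)$$
Figure: the powers $W_N^0 = 1$, $W_N^1$, $W_N^2$, $W_N^3$, ... on the unit circle for N = 12.
Periodicity of $W_N$ (unit root):
$$\exp(-j\omega_k) = \exp\!\left(-j \frac{2\pi}{N} k\right) = (W_N)^k, \qquad W_N := \exp\!\left(-j \frac{2\pi}{N}\right)$$
Note:
1. $W_N^r = W_N^{r \bmod N}$
2. $W_N^{kN} = (W_N^N)^k = 1^k = 1, \quad k \in \mathbb{Z}$
3. $W_N^2 = \left[ \exp\!\left(-\frac{2\pi j}{N}\right) \right]^2 = \exp\!\left(-\frac{2\pi j}{N} \cdot 2\right) = \exp\!\left(-\frac{2\pi j}{N/2}\right) = W_{N/2}$
4. $W_N^{N/2} = \exp\!\left(-\frac{2\pi j}{N} \cdot \frac{N}{2}\right) = \exp(-j\pi) = -1$ (N even)
5. $W_N^{r + N/2} = W_N^{N/2}\, W_N^r = -W_N^r$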
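Matrix notation:
$$X[k] = \sum_{n=0}^{N-1} x[n] \exp\!\left(-\frac{2\pi j}{N} k n\right) = \sum_{n=0}^{N-1} W_N^{kn}\, x[n] = \sum_{n=0}^{N-1} \{W_N\}_{kn}\, x[n]$$
where $\{W_N\}_{kn} = W_N^{kn}$ is the element in row k and column n of the DFT matrix.
Inverse DFT:
$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \exp\!\left(\frac{2\pi j}{N} k n\right) = \frac{1}{N} \sum_{k=0}^{N-1} W_N^{-kn}\, X[k] = \sum_{k=0}^{N-1} \{W_N^{-1}\}_{nk}\, X[k], \qquad \{W_N^{-1}\}_{nk} = \frac{1}{N}\, W_N^{-kn}$$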
A real signal of length N is described by N/2 complex Fourier components (due to symmetry).
in words:
The DFT causes no information loss in the signal.
Parseval theorem for DFT
general Fourier:
$$\sum_{n=0}^{N-1} |x[n]|^2 = \frac{1}{2\pi} \int_{-\pi}^{+\pi} |X(e^{j\omega})|^2\, d\omega$$
DFT:
$$\sum_{n=0}^{N-1} |x[n]|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X[k]|^2$$
in words:
Apart from the factor $\frac{1}{N}$, the DFT is a norm conserving (= energy conserving) transformation (mathematical terminology: unitary).
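2.12 From continuous FT to Matrix Representation of DFT
Continuous Fourier transform:
$$X(\omega) = F\{x(t)\} = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \qquad (2.9)$$
Sampling x(t) gives the discrete time signal x[n] (sampling theorem).
This results in the Fourier transform of the discrete time signal x[n]:
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n} \qquad (2.10)$$
Figure: window function with values between 0 and 1 over n = 0, ..., N-1.
Restriction to a finite section y[n], n = 0, ..., N-1:
$$Y(e^{j\omega}) = \sum_{n=0}^{N-1} y[n]\, e^{-j\omega n}$$
DFT:
$$\omega_k = \frac{2\pi k}{N} \quad \text{where } k = 0, \ldots, N - 1$$
$$Y[k] = \sum_{n=0}^{N-1} y[n]\, e^{-\frac{2\pi j}{N} k n}$$
Matrix representation (K = N):
$$\begin{pmatrix} Y[0] \\ \vdots \\ Y[k] \\ \vdots \\ Y[K-1] \end{pmatrix} = \begin{pmatrix} & \vdots & \\ \cdots & e^{-\frac{2\pi j}{N} n k} & \cdots \\ & \vdots & \end{pmatrix} \begin{pmatrix} y[0] \\ \vdots \\ y[n] \\ \vdots \\ y[N-1] \end{pmatrix}$$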
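2.13 Frequency Resolution and Zero Padding
Sampling the Fourier transform of a signal of length N at K > N frequencies
$$\omega_k = \frac{2\pi}{K}\, k, \quad k = 0, \ldots, K - 1$$
gives
$$X(e^{j\omega_k}) = \sum_{n=0}^{N-1} x[n] \exp\!\left(-\frac{2\pi j}{K} k n\right) = \sum_{n=0}^{K-1} \tilde{x}[n] \exp\!\left(-\frac{2\pi j}{K} k n\right)$$
where
$$\tilde{x}[n] = \begin{cases} x[n] & n = 0, \ldots, N - 1 \\ 0 & n = N, \ldots, K - 1 \end{cases}$$
Matrix representation:
$$\begin{pmatrix} X[0] \\ \vdots \\ X[K-1] \end{pmatrix} = \Big( W_K^{nk} \Big) \begin{pmatrix} x[0] \\ \vdots \\ x[N-1] \\ 0 \\ \vdots \\ 0 \end{pmatrix} \qquad \begin{matrix} n = 0 \\ \\ n = N - 1 \\ n = N \\ \\ n = K - 1 \end{matrix}$$
Note:
Zero padding does not introduce any additional information into the signal. It is only a trick so that the DFT, and particularly the FFT (Fast Fourier Transform), can be evaluated with a higher frequency resolution.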
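2.14 Finite Convolution
Impulse response:
$$h[n] \equiv 0 \quad \text{for } n \notin \{0, 1, 2, \ldots, N_h - 1\}$$
Input signal:
$$x[n] \equiv 0 \quad \text{for } n \notin \{0, 1, 2, \ldots, N_x - 1\}$$
Output signal:
$$y[n] = \sum_{k=-\infty}^{\infty} h[k]\, x[n - k] = \sum_{k=0}^{N_h - 1} h[k]\, x[n - k]$$
Figure: h[k] for k = 0, ..., $N_h - 1$ and the mirrored signal x[-k] (n = 0) for k = $-(N_x - 1)$, ..., 0.
Altogether:
$$y[n] = \begin{cases} 0 & n < 0 \\ \ldots & n = 0, 1, \ldots, N_x + N_h - 2 \\ 0 & n > N_x + N_h - 2 \end{cases}$$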
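The finite convolution sum can be sketched in C as follows. The function name `convolve` and the requirement that the caller provides an output buffer of length Nx + Nh - 1 are assumptions of this illustration.

```c
#include <stddef.h>

/* y[n] = sum_{k=0}^{Nh-1} h[k] x[n-k] for n = 0 .. Nx+Nh-2.
   y must have room for Nx + Nh - 1 values. */
void convolve(const double *h, size_t Nh, const double *x, size_t Nx, double *y)
{
    for (size_t n = 0; n < Nx + Nh - 1; n++) {
        y[n] = 0.0;
        for (size_t k = 0; k < Nh; k++) {
            if (n >= k && n - k < Nx)      /* x[n-k] is zero outside 0 .. Nx-1 */
                y[n] += h[k] * x[n - k];
        }
    }
}
```

For h = (1, 1, 1) and x = (1, 2) the output has 3 + 2 - 1 = 4 values, namely (1, 3, 3, 2); outside this range the convolution is zero, in line with the case distinction above.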
Figure 2.13: Example of a linear convolution of two finite length signals: a) two signals h[k] ($N_h - 1 = 12$) and x[k] ($N_x - 1 = 4$);
b) signal x[n-k] for different values of n:
i) n < 0, no overlap with h[k], therefore convolution y[n] = 0
ii) n between 0 and $N_h + N_x - 2$, convolution $\neq 0$
iii) n > $N_h + N_x - 2$, no overlap with h[k], convolution y[n] = 0
c) resulting convolution y[n].
Convolution:
$$y[n] = \sum_{k=-\infty}^{\infty} h[k]\, x[n - k]$$
Fourier:
$$Y(e^{j\omega}) = H(e^{j\omega})\, X(e^{j\omega}), \quad 0 \leq \omega \leq 2\pi$$
Sampling at
$$\omega_k = \frac{2\pi}{N}\, k, \quad k = 0, \ldots, N - 1 \ \text{ for any } N$$
2.15 Fast Fourier Transform (FFT)
Principle of FFT:
Calculation of the DFT can be done by successive decomposition into smaller DFT calculations. In this way, the number of elementary operations (multiplications and additions) is dramatically reduced:
DFT: $N^2$ operations
FFT: $\frac{N}{2}\, \mathrm{ld}\, N$ operations (ld = logarithm to base 2)
Factor of velocity gain for N = 1024:
$$\frac{N^2}{\frac{N}{2}\, \mathrm{ld}\, N} = \frac{2N}{\mathrm{ld}\, N} = \frac{2 \cdot 1024}{10} \approx 200$$
Variants:
radix 2, radix 4
decomposition into prime factors instead of $N = 2^n$
History
1965 Cooley and Tukey
1942 Danielson and Lanczos
1905 Runge
1805 Gauss
Algorithms which are based on a decomposition of the signal x[n] are called decimation-in-time algorithms.
The case $N = 2^{\nu}$ (N a power of two) is considered in the following.
$$X[k] = \sum_{n=0}^{N-1} x[n] \exp\!\left(-j \frac{2\pi}{N} k n\right) = \sum_{n=0}^{N-1} x[n]\, W_N^{nk} \quad \text{where } k = 0, 1, \ldots, N - 1$$
Decomposition of the sum over n into the sums over even and odd n:
$$X[k] = \sum_{r=0}^{N/2 - 1} x[2r]\, W_N^{2rk} + \sum_{r=0}^{N/2 - 1} x[2r + 1]\, W_N^{(2r+1)k} = \sum_{r=0}^{N/2 - 1} x[2r]\, (W_N^2)^{rk} + W_N^k \sum_{r=0}^{N/2 - 1} x[2r + 1]\, (W_N^2)^{rk}$$
Because of
$$W_N^2 = \exp\!\left(-2j \frac{2\pi}{N}\right) = \exp\!\left(-j \frac{2\pi}{N/2}\right) = W_{N/2}$$
for k = 0, ..., N-1 holds:
$$X[k] = \sum_{r=0}^{N/2 - 1} x[2r]\, W_{N/2}^{rk} + W_N^k \sum_{r=0}^{N/2 - 1} x[2r + 1]\, W_{N/2}^{rk} = G[k] + W_N^k\, H[k]$$
Each of the two sums corresponds to a DFT of length N/2.
The first sum is an N/2-DFT of the even indexed signal values x[n], the second sum is an N/2-DFT of the odd indexed values.
The DFT of length N can be obtained by combining the two N/2-DFTs with the factor $W_N^k$.
Complexity:
The complexity O(N 2 ) of one-dimensional FT can be reduced by adequate
resorting values from two FTs with length N2 and complexity O(2 ( N2 )2 ) =
N2
2 . By successive application of this resorting the complexity can be reduced to O(N log N ).
The case N = 23 = 8 is considered in the following.
X[4] can be obtained from H[4] and G[4] according to previous equation.
Because of the DFTlength
N
2
= 4:
And then:
X[4] = G[0] + WN4 H[0]
The values X[5], X[6] and X[7] can be obtained analogously.
Figure 2.14: Flow diagram for the decomposition of one N-point DFT into two N/2-point DFTs with N = 8. The even-indexed inputs x[0], x[2], x[4], x[6] feed the first N/2-point DFT (outputs G[0], ..., G[3]), the odd-indexed inputs x[1], x[3], x[5], x[7] feed the second one (outputs H[0], ..., H[3]); the outputs X[0], ..., X[7] are formed with the factors $W_N^0, \ldots, W_N^7$.
[Figure: flow diagram of the complete FFT for N = 8 with the inputs in the order x[0], x[4], x[2], x[6], x[1], x[5], x[3], x[7] and the outputs X[0], ..., X[7]]
Complexity reduction
Number of complex multiplications in the FFT: $\frac{N}{2} \, \mathrm{ld}\, N$.
Comparison:
Direct application of the DFT definition needs $N^2$ complex multiplications.
Example: $N = 1024 = 2^{10}$
\[ \frac{N^2}{(N/2) \, \mathrm{ld}\, N} \approx 200 \]
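Matrix representation of the DFT for N = 8 with $w = W_8$. Because of $w^8 = 1$, all exponents can be reduced modulo 8:
\[
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & w & w^2 & w^3 & w^4 & w^5 & w^6 & w^7 \\
1 & w^2 & w^4 & w^6 & w^8 & w^{10} & w^{12} & w^{14} \\
1 & w^3 & w^6 & w^9 & w^{12} & w^{15} & w^{18} & w^{21} \\
1 & w^4 & w^8 & w^{12} & w^{16} & w^{20} & w^{24} & w^{28} \\
1 & w^5 & w^{10} & w^{15} & w^{20} & w^{25} & w^{30} & w^{35} \\
1 & w^6 & w^{12} & w^{18} & w^{24} & w^{30} & w^{36} & w^{42} \\
1 & w^7 & w^{14} & w^{21} & w^{28} & w^{35} & w^{42} & w^{49}
\end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & w & w^2 & w^3 & w^4 & w^5 & w^6 & w^7 \\
1 & w^2 & w^4 & w^6 & 1 & w^2 & w^4 & w^6 \\
1 & w^3 & w^6 & w & w^4 & w^7 & w^2 & w^5 \\
1 & w^4 & 1 & w^4 & 1 & w^4 & 1 & w^4 \\
1 & w^5 & w^2 & w^7 & w^4 & w & w^6 & w^3 \\
1 & w^6 & w^4 & w^2 & 1 & w^6 & w^4 & w^2 \\
1 & w^7 & w^6 & w^5 & w^4 & w^3 & w^2 & w
\end{pmatrix}
\]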
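The FFT for N = 8 corresponds to a factorization of the DFT matrix into the sorting matrix $T_S$ and the sparse matrices $T_1$, $T_2$, $T_3$ (one per butterfly stage), applied in the order $T_3 \, T_2 \, T_1 \, T_S$:
$T_S$: sorting matrix (permutation), maps $(x[0], \ldots, x[7])$ to $(x[0], x[4], x[2], x[6], x[1], x[5], x[3], x[7])$
\[ T_1 = \mathrm{diag}(B, B, B, B), \qquad B = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \]
\[ T_2 = \mathrm{diag}(C, C), \qquad C = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & w^2 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -w^2 \end{pmatrix} \]
\[ T_3 = \begin{pmatrix} I_4 & D \\ I_4 & -D \end{pmatrix}, \qquad D = \mathrm{diag}(1, w, w^2, w^3) \]
[Figure: flow diagram of the FFT for N = 8 with inputs x[0], ..., x[7] and outputs X[0], ..., X[7]]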
Butterfly Operation
Signal flow diagram and matrix representation of the FFT are based
on the following basic operation:
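[Figure: flow graph of the butterfly operation with inputs $X_{m-1}[p]$, $X_{m-1}[q]$, factor $W_N^r$ and outputs $X_m[p]$, $X_m[q]$]
For two input values $X_{m-1}[p]$ and $X_{m-1}[q]$, this operation produces
two output values $X_m[p]$ and $X_m[q]$. The output values are
linear combinations of the input values.
Because of the shape of its flow graph, the operation is called
butterfly operation.
\[ \begin{pmatrix} X_m[p] \\ X_m[q] \end{pmatrix} = \begin{pmatrix} 1 & W_N^r \\ 1 & -W_N^r \end{pmatrix} \begin{pmatrix} X_{m-1}[p] \\ X_{m-1}[q] \end{pmatrix} \]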
Bit Reversal
The matrix representation of the FFT uses a sorting matrix, i.e. the
signal which is to be transformed is at first resorted.
Example for N = 8:
n    Binary representation    Reversed    n
0    000                      000         0
1    001                      100         4
2    010                      010         2
3    011                      110         6
4    100                      001         1
5    101                      101         5
6    110                      011         3
7    111                      111         7
Bit reversal is a necessary part of the FFT algorithm.
Bit reversal for $N = 2^3$
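The bit-reversal table can be reproduced with a few lines of Python. This is only an illustrative sketch; the helper names `bit_reverse` and `bit_reversed_order` are assumptions and do not appear in the lecture notes.

```python
def bit_reverse(n, bits):
    """Return n with its `bits`-bit binary representation reversed."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (n & 1)   # append the lowest bit of n to r
        n >>= 1
    return r

def bit_reversed_order(x):
    """Re-sort x (length must be a power of two) into bit-reversed index order."""
    bits = len(x).bit_length() - 1
    return [x[bit_reverse(n, bits)] for n in range(len(x))]
```

Applied to the indices 0, ..., 7, this gives the order 0, 4, 2, 6, 1, 5, 3, 7 from the table.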
2.16
FFT Implementation
Fortran version
C     adapted from: Oppenheim, Schafer p. 608
      SUBROUTINE FFT_DecimationInTime (X, ld_N)
C ****************************************************************************
      REAL      PI
      INTEGER   N_max
      PARAMETER (PI = 3.14159265358979)
      PARAMETER (N_max = 2048)
      COMPLEX   X(N_max)   ! array for input AND output
      COMPLEX   Temp       ! temporary storage
      COMPLEX   W_uni      ! root of unity
      COMPLEX   W_pow      ! powers of W_uni
      INTEGER   N, ld_N, i, ip, iq, ipbeg, j, k, i_exp, istp
      N = 2**ld_N
      IF (N.GT.N_max) STOP
C BIT Reversed Sorting *********************************************************
      j = 1
      DO i = 1, N-1
        IF (i.LT.j) THEN
C         swap X(j) and X(i)
          Temp = X(j)
          X(j) = X(i)
          X(i) = Temp
        ENDIF
        k = N/2
        DO WHILE (k.LT.j)
          j = j - k
          k = k / 2
        ENDDO
        j = j + k
      ENDDO
C End of Bit Reversed Sorting **************************************************
C FFT Butterfly Operations *****************************************************
      DO i = 1, ld_N
        i_exp = 2**i       ! size of one butterfly group in stage i
        istp  = i_exp/2    ! stepsize: distance between the two butterfly inputs
        W_pow = (1.0,0.0)
        W_uni = CMPLX (COS (PI/FLOAT(istp)), -SIN(PI/FLOAT(istp)))
        DO ipbeg = 1, istp
          DO ip = ipbeg, N, i_exp
            iq    = ip + istp
            Temp  = X(iq) * W_pow
C           butterfly: both outputs use the old value of X(ip)
            X(iq) = X(ip) - Temp
            X(ip) = X(ip) + Temp
          ENDDO
          W_pow = W_pow * W_uni
        ENDDO
      ENDDO
C End of FFT Butterfly Operations **********************************************
      RETURN
      END
[Figure: flow diagram of the FFT for N = 8 with the inputs in the order x[0], x[4], x[2], x[6], x[1], x[5], x[3], x[7], the factors $W_N^0, \ldots, W_N^3$ and the outputs X[0], ..., X[7]]
Figure 2.17: Input and output arrays of an FFT. a) The input array contains N complex input values (N is a power of 2) in one real array of length 2N, with alternating real and imaginary parts, ordered from t = 0 to t = N-1. b) The output array contains the complex Fourier spectrum at N frequency values, again with alternating real and imaginary parts. The array begins with the zero frequency and then goes up to the highest frequency, followed by the values for the negative frequencies.
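Number of operations for the convolution of two signals of lengths $N_x$ and $N_h$:
direct convolution: $N_x \cdot N_h$
                          DFT                  FFT
transformation            $(N_x + N_h)^2$      $\frac{N_x + N_h}{2} \log_2 (N_x + N_h)$
multiplication            $N_x + N_h$          $N_x + N_h$
inverse transformation    $(N_x + N_h)^2$      $\frac{N_x + N_h}{2} \log_2 (N_x + N_h)$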
2.17
The Fourier transform plays a significant role for so-called cyclic matrices
(cf. chapter 3.8):
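\[ H = \begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_{N-1} \\
h_{N-1} & h_0 & h_1 & \cdots & h_{N-2} \\
h_{N-2} & h_{N-1} & h_0 & \cdots & h_{N-3} \\
\vdots & & \ddots & \ddots & \vdots \\
h_1 & h_2 & \cdots & h_{N-1} & h_0
\end{pmatrix} \]
so:
\[ H_{mn} = h_{(n-m) \bmod N}, \qquad m, n = 0, \ldots, N-1 \]
The eigenvectors of a cyclic matrix are the columns of the Fourier matrix:
\[ \begin{pmatrix}
w^{0 \cdot 0} & \cdots & w^{n \cdot 0} & \cdots & w^{(N-1) \cdot 0} \\
\vdots & & w^{n \cdot 1} & & \vdots \\
\vdots & & \vdots & & \vdots \\
\vdots & & w^{n (N-1)} & & \vdots
\end{pmatrix}, \qquad \text{where } w = e^{-\frac{2\pi j}{N}} \]
The eigenvalues are given by the DFT of the first row:
\[ \lambda_k = \sum_{n=0}^{N-1} h_n \, e^{-\frac{2\pi j}{N} kn}, \qquad \text{where } k = 0, 1, \ldots, N-1 \]
Symmetric cyclic matrix ($h_n = h_{N-n}$):
\[ H = \begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_2 & h_1 \\
h_1 & h_0 & h_1 & \cdots & h_3 & h_2 \\
h_2 & h_1 & h_0 & \cdots & & h_3 \\
\vdots & & \ddots & \ddots & & \vdots \\
h_1 & h_2 & h_3 & \cdots & h_1 & h_0
\end{pmatrix} \]
In this case the eigenvalues are real:
\[ \lambda_k = \sum_{n=0}^{N-1} h_n \cos\left(\frac{2\pi k n}{N}\right) \]
and the eigenvectors have the components
\[ \cos\left(\frac{2\pi n m}{N}\right), \qquad m = 0, \ldots, N-1 \]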
Application: diagonalisation of covariance matrices, e.g. for coding
of image and speech signals.
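The relation between cyclic matrices and the Fourier transform can be verified numerically. The sketch below is an illustration only: it assumes the index convention $H_{mn} = h_{(n-m) \bmod N}$ and $w = e^{-2\pi j / N}$, and the function names `cyclic_matrix` and `eigen_check` are hypothetical.

```python
import cmath

def cyclic_matrix(h):
    """Build the cyclic matrix with H[m][n] = h[(n - m) mod N]."""
    N = len(h)
    return [[h[(n - m) % N] for n in range(N)] for m in range(N)]

def eigen_check(h, k):
    """Return (H v_k, lambda_k v_k) for v_k[n] = w^(k n), w = exp(-2j*pi/N)."""
    N = len(h)
    H = cyclic_matrix(h)
    w = cmath.exp(-2j * cmath.pi / N)
    v = [w ** (k * n) for n in range(N)]
    lam = sum(h[n] * w ** (k * n) for n in range(N))   # DFT of the first row
    Hv = [sum(H[m][n] * v[n] for n in range(N)) for m in range(N)]
    return Hv, [lam * vn for vn in v]
```

Both returned vectors agree up to rounding errors for every k, i.e. the k-th column of the Fourier matrix is an eigenvector and the k-th DFT coefficient of the first row is the corresponding eigenvalue.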
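Proof:
We will prove that
\[ v_k = \left( \cos\frac{2\pi k \cdot 0}{N}, \; \cos\frac{2\pi k \cdot 1}{N}, \; \ldots, \; \cos\frac{2\pi k (N-1)}{N} \right)^T \]
is an eigenvector of H, where $h_n = h_{N-n}$:
\[ \sum_{n=0}^{N-1} H_{mn} v_{kn} = h_0 \cos\frac{2\pi k m}{N} + \sum_{l=1}^{\frac{N-1}{2}} h_l \left[ \cos\frac{2\pi k (m-l)}{N} + \cos\frac{2\pi k (m+l)}{N} \right] \]
For even N, the term with $l = N/2$ enters the sum only once.
According to the addition theorem:
\[ \cos(x + y) + \cos(x - y) = 2 \cos x \cos y \]
With $x = 2\pi k m / N$ and $y = 2\pi k l / N$ follows:
\[ \sum_{n=0}^{N-1} H_{mn} v_{kn} = h_0 \cos\frac{2\pi k m}{N} + 2 \sum_{l=1}^{\frac{N-1}{2}} h_l \cos\frac{2\pi k m}{N} \cos\frac{2\pi k l}{N} \]
\[ = \underbrace{\left[ h_0 + 2 \sum_{l=1}^{\frac{N-1}{2}} h_l \cos\frac{2\pi k l}{N} \right]}_{\lambda_k} \; \underbrace{\cos\frac{2\pi k m}{N}}_{v_{km}} = \lambda_k \, v_{km} \]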
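with $H_{mn} \in \mathbb{R}$
\[ H = \begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_{N-2} & h_{N-1} \\
h_{-1} & h_0 & h_1 & \cdots & & h_{N-2} \\
h_{-2} & h_{-1} & h_0 & \ddots & & \vdots \\
\vdots & & \ddots & \ddots & \ddots & h_2 \\
h_{2-N} & & & \ddots & h_0 & h_1 \\
h_{1-N} & h_{2-N} & \cdots & h_{-2} & h_{-1} & h_0
\end{pmatrix} \]
i.e.
\[ H_{mn} = h_{|n-m|} \]
\[ H = \begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_{N-2} & h_{N-1} \\
h_1 & h_0 & h_1 & \cdots & & h_{N-2} \\
h_2 & h_1 & h_0 & \ddots & & \vdots \\
\vdots & & \ddots & \ddots & \ddots & h_2 \\
h_{N-2} & & & \ddots & h_0 & h_1 \\
h_{N-1} & h_{N-2} & \cdots & h_2 & h_1 & h_0
\end{pmatrix} \]
Cyclic matrix:
\[ H = \begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_{N-2} & h_{N-1} \\
h_{N-1} & h_0 & h_1 & \cdots & & h_{N-2} \\
h_{N-2} & h_{N-1} & h_0 & \ddots & & \vdots \\
\vdots & & \ddots & \ddots & \ddots & h_2 \\
h_2 & & & \ddots & h_0 & h_1 \\
h_1 & h_2 & \cdots & h_{N-2} & h_{N-1} & h_0
\end{pmatrix} \]
With $h_n = h_{N-n}$ for $n = 0, \ldots, N$, e.g. for N = 8:
\[ H = \begin{pmatrix}
h_0 & h_1 & h_2 & h_3 & h_4 & h_3 & h_2 & h_1 \\
h_1 & h_0 & h_1 & h_2 & h_3 & h_4 & h_3 & h_2 \\
h_2 & h_1 & h_0 & h_1 & h_2 & h_3 & h_4 & h_3 \\
h_3 & h_2 & h_1 & h_0 & h_1 & h_2 & h_3 & h_4 \\
h_4 & h_3 & h_2 & h_1 & h_0 & h_1 & h_2 & h_3 \\
h_3 & h_4 & h_3 & h_2 & h_1 & h_0 & h_1 & h_2 \\
h_2 & h_3 & h_4 & h_3 & h_2 & h_1 & h_0 & h_1 \\
h_1 & h_2 & h_3 & h_4 & h_3 & h_2 & h_1 & h_0
\end{pmatrix} \]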
Chapter 3
Spectral analysis
Overview:
3.1 Features for Speech Recognition
3.2 Short Time Analysis and Windowing
3.3 Autocorrelation function and Power Spectral Density
3.4 Spectrograms
3.5 Filter Bank Analysis
3.6 Mel Scale
3.7 Cepstrum
- Cepstrum Calculation from Filter Bank Output
- Mel Cepstrum according to Davis and Mermelstein
3.8 Statistical Interpretation of Cepstrum Transformation
3.9 Energy in acoustic Vector
3.1
Features for Speech Recognition
[Figure: block diagram of a speech recognition system: speech signal, short-time analysis each 10 ms (using FFT), sequence of acoustic vectors, pattern comparison with the reference model for each word in the vocabulary, decision]
Goal:
Ideally: Real features for the recognition
In practice: Data reduction, i.e. compact description
of the speech signal (amplitude spectrum)
Side effect:
Method also enables coding of speech signals using lowest possible
number of bits
Key words:
Fourier transform: wide band/narrow band, autocorrelation function
Filter bank
Cepstrum
Linear Predictive Coding (LPC) analysis
Fundamental frequency analysis
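3.2
Short Time Analysis and Windowing
Rectangle window:
\[ w[n] = \begin{cases} 1, & |n| \le N/2 \\ 0, & \text{otherwise} \end{cases} \]
Multiplication with the window in the time domain corresponds to a periodic convolution of the spectra in the frequency domain:
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} S(e^{j\theta}) \, W(e^{j(\omega - \theta)}) \, d\theta \]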
[Figure: for each of the rectangle, triangle, Hanning, Hamming, Nuttall, Gauss and Chebyshev windows, two panels: "Window function" (amplitude 0 to 1 over n = 0, ..., N-1) and "Impulse response" (magnitude in dB, 0 to -60 dB, over the frequency range $-0.5 f_s$ to $0.5 f_s$)]
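To make the comparison of window functions concrete, the following Python sketch evaluates three of the windows named above and the magnitude of their spectra in dB. The window formulas (raised cosine with coefficients 0.5/0.5 for Hanning and 0.54/0.46 for Hamming, denominator N-1) are the common textbook definitions and are an assumption here, since the lecture notes only show the plots; all function names are illustrative.

```python
import cmath
import math

def rectangle(N):
    return [1.0] * N

def hanning(N):
    # assumed definition: 0.5 - 0.5 cos(2 pi n / (N - 1))
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]

def hamming(N):
    # assumed definition: 0.54 - 0.46 cos(2 pi n / (N - 1))
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]

def response_db(w, omega):
    """Magnitude of W(e^{j omega}) in dB, normalised to its value at omega = 0."""
    W = abs(sum(wn * cmath.exp(-1j * omega * n) for n, wn in enumerate(w)))
    W0 = abs(sum(w))
    return 20.0 * math.log10(max(W, 1e-12) / W0)
```

Evaluating `response_db` on a grid of frequencies shows the typical trade-off: the rectangle window has the narrowest main lobe but high side lobes, while the Hamming window has a wider main lobe and much lower side lobes.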
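Figure 3.1: Example for the application of the Discrete Fourier Transform (DFT). Panels: Fourier transform $S_C(\Omega)$ of a continuous-time signal; frequency response $H(\Omega)$ of the anti-aliasing low-pass filter; Fourier transform $X_C(\Omega)$ of the filtered signal; Fourier transform $X(e^{j\omega})$ of the sampled signal; Fourier transform $W(e^{j\omega})$ of the window function; Fourier transform $V(e^{j\omega})$ of the windowed signal and sampled values $V[k]$ of the continuous spectrum obtained using the DFT.
Continuous-time signal consisting of two sinusoids:
\[ A_0 \cos(\Omega_0 t) + A_1 \cos(\Omega_1 t), \qquad -\infty < t < \infty \]
Discrete-time signal after sampling:
\[ x[n] = A_0 \cos(\omega_0 n) + A_1 \cos(\omega_1 n), \qquad -\infty < n < \infty \]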
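where $\omega_0 = \Omega_0 T_S$ and $\omega_1 = \Omega_1 T_S$
with the window function w[n]:
\[ v[n] = A_0 \, w[n] \cos(\omega_0 n) + A_1 \, w[n] \cos(\omega_1 n) \]
Intermediate calculations:
\[ v[n] = \frac{A_0}{2} w[n] \exp(j \omega_0 n) + \frac{A_0}{2} w[n] \exp(-j \omega_0 n) + \frac{A_1}{2} w[n] \exp(j \omega_1 n) + \frac{A_1}{2} w[n] \exp(-j \omega_1 n) \]
also modulation principle:
\[ V(e^{j\omega}) = \frac{A_0}{2} W(e^{j(\omega - \omega_0)}) + \frac{A_0}{2} W(e^{j(\omega + \omega_0)}) + \frac{A_1}{2} W(e^{j(\omega - \omega_1)}) + \frac{A_1}{2} W(e^{j(\omega + \omega_1)}) \]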
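Assume:
\[ \Omega_0 = \frac{2\pi}{14} \cdot 10\,\text{kHz}, \qquad \Omega_1 = \frac{4\pi}{15} \cdot 10\,\text{kHz} \]
$1/T_S = 10$ kHz, rectangle window with N = 64, $A_0 = 1$, $A_1 = 0.75$
The windowed signal v[n] for the discrete-time signal x[n] is therefore:
\[ v[n] = \begin{cases} \cos\left(\frac{2\pi}{14} n\right) + 0.75 \cos\left(\frac{4\pi}{15} n\right) & : \; 0 \le n \le 63 \\ 0 & : \; \text{otherwise} \end{cases} \]
[Figure: windowed signal v[n] for n = 0, ..., 63]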
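Example 1:
Leakage Effect
\[ \Omega_0 = \frac{2\pi}{6} \cdot 10^4\,\text{Hz}, \qquad \Omega_1 = \frac{2\pi}{3} \cdot 10^4\,\text{Hz} \]
\[ \omega_0 = \Omega_0 T_S = \frac{2\pi}{6} \cdot 10^4\,\text{Hz} \cdot 10^{-4}\,\text{s} = \frac{2\pi}{6} \]
\[ \omega_1 = \Omega_1 T_S = \frac{2\pi}{3} \cdot 10^4\,\text{Hz} \cdot 10^{-4}\,\text{s} = \frac{2\pi}{3} \]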
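Case 1a (continued): $\omega_0 = \frac{2\pi}{6}$, $\omega_1 = \frac{2\pi}{3}$
[Figure: spectrum $V(\omega)$ with peaks at $\omega_0$ and $\omega_1$]
Case 1b: $\omega_0$ changed from $\frac{2\pi}{6}$ to $\frac{2\pi}{14}$, $\omega_1$ changed from $\frac{2\pi}{3}$ to $\frac{4\pi}{15}$
[Figure: spectrum $V(\omega)$]
Case 1c: $\omega_0 = \frac{2\pi}{14}$, $\omega_1 = \frac{2\pi}{12}$
[Figure: spectrum $V(\omega)$]
Case 1d: $\omega_0 = \frac{2\pi}{14}$, $\omega_1 = \frac{4\pi}{25}$
[Figure: spectrum $V(\omega)$]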
Example 2:
[Figure: Case 2a. a) windowed signal v[n] for n = 0, ..., 63; b) DFT spectrum V(k) for k = 0, ..., 63; c) Fourier spectrum $V(\omega)$]
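Case 2b:
In contrast to case 2a, the frequencies of the sinusoids are changed only slightly.
Windowed signal v[n]:
\[ v[n] = \begin{cases} \cos\left(\frac{2\pi}{16} n\right) + 0.75 \cos\left(\frac{2\pi}{8} n\right) & : \; 0 \le n \le 63 \\ 0 & : \; \text{otherwise} \end{cases} \]
[Figure: Case 2b. a) windowed signal v[n] for n = 0, ..., 63; b) DFT spectrum V(k) for k = 0, ..., 63; c) Fourier spectrum $V(\omega)$]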
Analysis of Example 2:
The appearance of the DFT spectrum is a consequence of the spectral sampling. Although in Case 2b the windowed signal v[n] contains significant frequency components besides $\omega_0$ and $\omega_1$, they do not show in the DFT spectrum of length N = 64.
Using a rectangle window, the DFT of a sinusoidal signal gives sharp spectral lines if the period N of the transformation is an integer multiple of the signal period and no zero padding is applied.
Explanation for the case of a complex exponential function:
Assume the signal x[n]:
\[ x[n] = \frac{1}{N} \exp\left(j \frac{2\pi}{n_0} n\right) \]
Then:
\[ |X[k]| = \frac{1}{N} \left| \frac{\sin\left(\pi (k - N/n_0)\right)}{\sin\left(\pi (k - N/n_0)/N\right)} \right| \]
If $N/n_0$ is an integer, this reduces to
\[ X[k] = \delta\left(k - \frac{N}{n_0}\right) \]
i.e. only the coefficient with $k = N/n_0$ is non-zero.
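The effect can be reproduced with a direct DFT of two windowed sinusoids. The sketch below is illustrative only; it assumes a rectangle window of length N = 64 and the amplitudes $A_0 = 1$, $A_1 = 0.75$ of the examples, and the function names are hypothetical.

```python
import cmath
import math

def dft_magnitude(v):
    """Magnitudes |V[k]| of the DFT of v, computed directly from the definition."""
    N = len(v)
    return [abs(sum(v[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N)]

def two_sinusoids(omega0, omega1, N=64, A0=1.0, A1=0.75):
    """Rectangle-windowed sum of two cosines, n = 0, ..., N-1."""
    return [A0 * math.cos(omega0 * n) + A1 * math.cos(omega1 * n) for n in range(N)]

# Case 2b: both periods (16 and 8 samples) fit an integer number of times into N = 64
sharp = dft_magnitude(two_sinusoids(2 * math.pi / 16, 2 * math.pi / 8))
# Frequencies 2*pi/14 and 4*pi/15: no integer number of periods, leakage occurs
leaky = dft_magnitude(two_sinusoids(2 * math.pi / 14, 4 * math.pi / 15))
```

In `sharp`, only the four coefficients k = 4, 8, 56, 60 are non-zero, whereas in `leaky` the energy spreads over many DFT coefficients.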
Example 2 (continued)
Figure 3.4: a) DFT of length N = 64; b) DFT of length N = 128; c) Fourier spectrum $V(e^{j\omega})$.
Example 3:
Explanation of following illustrations:
Assume: signal of Example 2, Case 2a.
Window: Kaiser window is applied instead of rectangle window.
First: window length L = 64 and DFT length N = 64.
Then: window length L and DFT length N are halved.
Afterwards: for the case L = 32, the DFT length N is gradually
increased up to N = 1024 (Zero Padding).
Finally: DFT spectrum for the case N = 1024 and L = 64.
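The Kaiser window is defined as:
\[ w_K[n] = \begin{cases} \dfrac{I_0\left(\beta \left[1 - \left((n - \alpha)/\alpha\right)^2\right]^{1/2}\right)}{I_0(\beta)} & : \; 0 \le n \le L-1 \\ 0 & : \; \text{otherwise} \end{cases} \]
In this example:
\[ \beta = 0.8 \quad \text{and} \quad \alpha = \frac{L-1}{2} \]
\[ v[n] = w_K[n] \cos\left(\frac{2\pi}{14} n\right) + 0.75 \, w_K[n] \cos\left(\frac{4\pi}{15} n\right) \]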
Example 3: (continued)
DFT length N = 64, window length L = 64
[Figure: windowed signal v(n) for n = 0, ..., 63 and DFT spectrum V(k) for k = 0, ..., 63]
DFT length N = 32, window length L = 32 (N and L halved)
[Figure: windowed signal v(n) for n = 0, ..., 31 and DFT spectrum V(k) for k = 0, ..., 31]
Effect of changing the DFT length N at constant window length L = 32 (zero padding)
[Figures: DFT spectrum V(k) for window length L = 32 and increasing DFT length from N = 32 via N = 128 up to N = 1024]
Increasing the window length (L)
DFT length N = 1024, window length L = 64
[Figure: DFT spectrum V(k)]
[Figure: speech signal (phoneme "a"); amplitude spectrum with rectangle window; amplitude spectrum with Hamming window]
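3.3
Autocorrelation function and Power Spectral Density
Autocorrelation function (ACF):
\[ R[k] = \sum_{n=-\infty}^{\infty} x[n] \, x[n+k] \]
For a signal of finite length N:
\[ R[k] = \sum_{n=0}^{N-1-k} x[n] \, x[n+k] \]
triangular effect: the number of terms in R[k] decreases linearly with |k|
[Figure: number of terms in R[k] as a function of k between -N and N]
Cross correlation:
\[ R_{xy}[k] = \sum_{n=-\infty}^{\infty} x[n] \, y[n+k] \]
In contrast to convolution:
\[ O_{xy}[k] = \sum_{n=-\infty}^{\infty} x[n] \, y[k-n] \]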
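Properties of ACF:
1. $R[k] = R[-k]$
2. $R[k] \le R[0]$
\[ |X(e^{j\omega})|^2 = X(e^{j\omega}) \, X^*(e^{j\omega}) \]
\[ = \sum_{k=-\infty}^{\infty} x[k] \exp(-j\omega k) \sum_{l=-\infty}^{\infty} x[l] \exp(j\omega l) \]
\[ = \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} x[k] \, x[l] \exp(-j\omega (k-l)) \]
\[ = \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} x[k+l] \, x[l] \exp(-j\omega k) \]
\[ = \sum_{k=-\infty}^{\infty} \left[ \sum_{l=-\infty}^{\infty} x[k+l] \, x[l] \right] \exp(-j\omega k) \]
\[ = \sum_{k=-\infty}^{\infty} R[k] \exp(-j\omega k) \]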
Note:
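5. Because of the symmetry $R[k] = R[-k]$, the Fourier transform becomes the cosine transform:
\[ |X(e^{j\omega})|^2 = \sum_{k=-\infty}^{\infty} R[k] \exp(-j\omega k) \]
\[ = \sum_{k=-(N-1)}^{N-1} R[k] \exp(-j\omega k) \]
\[ = R[0] + \sum_{k=1}^{N-1} \left( R[k] \exp(-j\omega k) + R[-k] \exp(j\omega k) \right) \]
\[ = R[0] + 2 \sum_{k=1}^{N-1} R[k] \cos(\omega k) \qquad \text{because } R[k] = R[-k] \]
Moivre formula:
\[ \cos(k\omega) = \cos^k(\omega) - \binom{k}{2} \cos^{k-2}(\omega) \sin^2(\omega) + \binom{k}{4} \cos^{k-4}(\omega) \sin^4(\omega) - \ldots \]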
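The cosine-transform relation between the autocorrelation function and the power spectrum can be checked numerically. The following Python sketch is illustrative; it assumes a real signal of finite length N and uses hypothetical function names.

```python
import cmath
import math

def acf(x):
    """R[k] = sum_{n=0}^{N-1-k} x[n] x[n+k] for k = 0, ..., N-1."""
    N = len(x)
    return [sum(x[n] * x[n + k] for n in range(N - k)) for k in range(N)]

def power_spectrum_from_acf(R, omega):
    """|X(e^{j omega})|^2 = R[0] + 2 sum_{k>=1} R[k] cos(omega k)."""
    return R[0] + 2.0 * sum(R[k] * math.cos(omega * k) for k in range(1, len(R)))

def power_spectrum_direct(x, omega):
    """|X(e^{j omega})|^2 computed from the Fourier transform of x."""
    X = sum(x[n] * cmath.exp(-1j * omega * n) for n in range(len(x)))
    return abs(X) ** 2
```

Both ways of computing the power spectrum give the same value for any frequency; using only the first few ACF coefficients instead of all N yields a smoothed spectrum.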
[Figure: speech signal (phoneme "a") and amplitude spectra obtained with a Hamming window, with a short Hamming window, from 19 ACF coefficients and from 13 ACF coefficients]
[Figure: autocorrelation functions computed with rectangle window and with Hamming window]
Figure 3.7: Signal progression and autocorrelation function of a voiced (left) and an unvoiced (right) speech segment
Figure 3.8: Temporal progression of a speech signal and four autocorrelation coefficients
3.4
Spectrograms
Using DFT
Wide-band:
in frequency domain:
in frequency domain:
Figure 3.9: a) Wide-band spectrogram: short time window, high time resolution (vertical lines), no frequency resolution; for voiced signals it provides information on the formant structure. b) Narrow-band spectrogram: long time window, no time resolution, high frequency resolution (horizontal lines); for voiced signals it provides information on the fundamental frequency (pitch).
Figure 3.10: Wide-band and narrow-band spectrogram and speech amplitude for the sentence "Every salt breeze comes from the sea."
3.5
Filter Bank Analysis
History:
Decomposition of the signal using a bank of band-pass filters and
energy calculation in each frequency band
[Figure: transfer functions of the band-pass filters]
Today digitally:
Digital filters:
\[ y_k[n] = \sum_{m=-\infty}^{\infty} h_k[n-m] \, x[m], \qquad k = 1, \ldots, K \]
DFT/FFT Method:
Window function
Appending zeros for desired resolution (zero padding)
FFT
Energy calculation:
[Figures: transfer functions]
Averaging:
summation should be as smooth as possible over all channels
Form: rectangle, triangle, trapezoid, etc.
Choosing the central frequencies $f_k$:
constant:
$\Delta f_k = \text{const.}$ for all k
e.g. 20 channels with $\Delta f = 200$ Hz for 0 to 4 kHz
constant relative band width:
$\Delta f_k / f_k = \text{const.}$ for all k
frequency groups of the ear (total number 24):
$f < 500$ Hz: $\Delta f = 100$ Hz
$f \ge 500$ Hz: $\Delta f / f = 20\%$
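3.6
Mel-frequency scale
[Figure: Mel frequency $f_{MEL}$ (up to about 2700 MEL) as a function of the frequency f / Hz (up to 7000 Hz)]
A filter bank with constant band-widths can be used on the Mel scale:
[Figure: filter bank with constant band width $\Delta f_{MEL}$ on the Mel axis]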
3.7
Cepstrum
The cepstrum is the Fourier series expansion of the logarithm of the spectrum.
Comparison: the autocorrelation function is a Fourier series of the normal
spectrum.
We consider:
\[ y[n] = \sum_{k=-\infty}^{\infty} h[n-k] \, x[k] \]
Goal:
Separating the kernel h[n] from the input signal x[n].
This problem is also called inversion or deconvolution.
Convolution theorem:
\[ Y(e^{j\omega}) = H(e^{j\omega}) \, X(e^{j\omega}) \]
Logarithm (complex):
\[ \log Y(e^{j\omega}) = \log H(e^{j\omega}) + \log X(e^{j\omega}) \]
Inverse Fourier Transform:
\[ F^{-1}\left\{\log Y(e^{j\omega})\right\} = F^{-1}\left\{\log H(e^{j\omega})\right\} + F^{-1}\left\{\log X(e^{j\omega})\right\} \]
Another notation:
\[ \hat{x}[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(j\omega n) \, \log\left[ \sum_{m} x[m] \exp(-j\omega m) \right] d\omega = C\{x[n]\} \]
Note:
Cepstrum = artificial word derived from "spectrum"
The cepstrum is located in the time domain
\[ \hat{x}[n] = C\{x[n]\} \]
\[ y[n] = \sum_{k=-\infty}^{\infty} h[n-k] \, x[k] \quad \Longrightarrow \quad \hat{y}[n] = \hat{h}[n] + \hat{x}[n] \]
For a linear operation L:
\[ L\{\hat{y}[n]\} = L\{\hat{h}[n]\} + L\{\hat{x}[n]\} \]
With the definition $G_L$ for the concatenation of the cepstrum, the operation
L, and the inverse cepstrum
\[ G_L := C^{-1} \circ L \circ C \]
we obtain
\[ G_L\{h[n] * x[n]\} = G_L\{h[n]\} * G_L\{x[n]\}. \]
Such a transformation $G_L$ acts on h[n] and x[n] separately, and is called
homomorphic (structure preserving).
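Complex cepstrum:
\[ \hat{x}[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(j\omega n) \, \log X(e^{j\omega}) \, d\omega \]
Real cepstrum (magnitude only):
\[ \hat{x}[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(j\omega n) \, \log |X(e^{j\omega})| \, d\omega \]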
[Figure: spectrum over frequency with axis mark 1/T, and cepstrum $F^{-1}(\log |F(\omega)|^2)$ over time]
Example 2: Smoothing
Figure 3.12: Cepstral smoothing: speech signal (vowel "a"), windowed speech signal
(Hamming window), spectrum obtained from the whole cepstrum (blue) and smoothed
spectrum obtained from the first 13 cepstral coefficients (red).
[Figure: speech signal, phoneme "a"]
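[Figure: symmetric continuation of the filter bank outputs $A_{-K+1}, \ldots, A_{-1}, A_0, A_1, \ldots, A_K$]
With the symmetry $A_{-k+1} = A_k$, the inverse Fourier transform of the filter bank outputs becomes a cosine transform:
\[ \frac{1}{2K} \sum_{k=-K+1}^{K} A_k \exp\left(\frac{2\pi j}{2K} nk\right) \]
\[ = \frac{1}{2K} \sum_{k=1}^{K} A_k \left[ \exp\left(\frac{2\pi j}{2K} nk\right) + \exp\left(\frac{2\pi j}{2K} n(-k+1)\right) \right] \]
\[ = \exp\left(\frac{2\pi j}{2K} \, 0.5 \, n\right) \frac{1}{2K} \sum_{k=1}^{K} A_k \left[ \exp\left(\frac{2\pi j}{2K} n(k-0.5)\right) + \exp\left(-\frac{2\pi j}{2K} n(k-0.5)\right) \right] \]
\[ = \exp\left(\frac{2\pi j}{2K} \, 0.5 \, n\right) \frac{1}{K} \sum_{k=1}^{K} A_k \cos\left(\frac{\pi n}{K} (k-0.5)\right) \]
The factor $\exp\left(\frac{2\pi j}{2K} \, 0.5 \, n\right)$ depends on the position of the symmetry axis. Without it:
\[ \frac{1}{K} \sum_{k=1}^{K} A_k \cos\left(\frac{\pi n}{K} (k-0.5)\right) \]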
[Figure: triangular filters k = 1, ..., K on the Mel frequency axis, with marks at 100 MEL and 300 MEL]
Filter bank:
overlapping band-pass filters of triangular shape;
all channels have equal band width, and the filters are positioned equidistantly on the Mel scale.
Calculation of the filter bank outputs:
magnitude of the DFT coefficients,
for each channel, summation of the magnitudes according to the triangular
weight function,
for each channel, logarithm of the sum.
Thus the filter outputs A[k] with k = 1, ..., K are obtained. Using the
filter bank outputs, the cepstrum is calculated using a cosine transform
(see previous description).
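The cosine transform of the filter bank outputs can be sketched in a few lines of Python. This is an illustration only: it assumes the form $\frac{1}{K} \sum_{k=1}^{K} A_k \cos\left(\frac{\pi n}{K}(k - 0.5)\right)$ for the n-th cepstral coefficient, and the function name `cepstrum_from_filter_bank` is hypothetical.

```python
import math

def cepstrum_from_filter_bank(A, num_coeffs):
    """Cosine transform of the logarithmic filter bank outputs A[0..K-1] (= A_1..A_K)."""
    K = len(A)
    return [sum(A[k - 1] * math.cos(math.pi * n * (k - 0.5) / K) for k in range(1, K + 1)) / K
            for n in range(num_coeffs)]
```

For constant filter outputs only the zeroth coefficient is non-zero; in practice only the first few coefficients (for example 13) are kept, which corresponds to a smoothing of the log spectrum.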
3.8
N/2
Assumption: The correlation between the outputs s and p, i.e. the element
$C_{sp}$ of the covariance matrix, does not depend directly on s or p, but only
on their difference. Because the spectrum is periodic, there is no distance
greater than N:
\[ C_{sp} = c_{(s-p) \bmod N} \]
It is further assumed that the correlation is locally symmetric:
\[ C_{s,s+n} = C_{s,s-n} \]
Then:
\[ c_{(s-s-n) \bmod N} = c_{(s-s+n) \bmod N} \]
\[ c_{(-n) \bmod N} = c_{(+n) \bmod N} \]
With $0 \le n \le N$ follows:
\[ c_n = c_{N-n} \]
Example for N = 8:
\[ C = \begin{pmatrix}
c_0 & c_1 & c_2 & c_3 & c_4 & c_3 & c_2 & c_1 \\
c_1 & c_0 & c_1 & c_2 & c_3 & c_4 & c_3 & c_2 \\
c_2 & c_1 & c_0 & c_1 & c_2 & c_3 & c_4 & c_3 \\
c_3 & c_2 & c_1 & c_0 & c_1 & c_2 & c_3 & c_4 \\
c_4 & c_3 & c_2 & c_1 & c_0 & c_1 & c_2 & c_3 \\
c_3 & c_4 & c_3 & c_2 & c_1 & c_0 & c_1 & c_2 \\
c_2 & c_3 & c_4 & c_3 & c_2 & c_1 & c_0 & c_1 \\
c_1 & c_2 & c_3 & c_4 & c_3 & c_2 & c_1 & c_0
\end{pmatrix} \]
3.9
Energy in acoustic Vector
The energy is usually added as zeroth (or first) component to the acoustic
vector.
For the logarithmic energy we have:
\[ \log E = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log |X(e^{j\omega})|^2 \, d\omega \]
logYk2 0
Chapter 4
Fourier Transform and Image
Processing
Overview:
4.1 Spatial Frequencies and Fourier Transform for Images
4.2 Discrete Fourier Transform for Images
4.3 Fourier Transform in Computer Tomography
4.4 Fourier Transform and RST Invariance
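4.1
Spatial Frequencies and Fourier Transform for Images
Convention:
\[ g(x, y) \ge 0 \]
Two-dimensional Fourier transform:
\[ G(f_x, f_y) = \int_{-\infty}^{+\infty} G_y(f_x) \, e^{-2\pi j f_y y} \, dy = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(x, y) \, e^{-2\pi j (f_x x + f_y y)} \, dx \, dy \]
with $G_y(f_x) = \int_{-\infty}^{+\infty} g(x, y) \, e^{-2\pi j f_x x} \, dx$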
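We would like to interpret the two-dimensional FT visually. For this purpose, we consider the exponential factor in the FT and require the following
condition:
\[ e^{2\pi j (f_x x + f_y y)} \overset{!}{=} 1 \]
\[ \Leftrightarrow \quad 2\pi j (f_x x + f_y y) = 2\pi j \, n \qquad \text{for } n \in \mathbb{N} \]
\[ \Leftrightarrow \quad y = -\frac{f_x}{f_y} x + \frac{n}{f_y} \]
[Figure: parallel straight lines in the (x, y) plane with axis intercepts $1/f_x$ and $1/f_y$ and distance L]
\[ L = \frac{1}{\sqrt{f_x^2 + f_y^2}} \qquad \text{spatial period} \]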
Special case:
$|G(f_x, f_y)|$ has a large value only at one point $(u, v) = (f_x, f_y)$ in the
spatial frequency plane
[Figure: $|G(f_x, f_y)|$ in the $(f_x, f_y)$ plane with peaks at $(u, v)$ and $(-u, -v)$]
Since $G(-f_x, -f_y) = G^*(f_x, f_y)$ for a real image g(x, y), we have two
dominant frequency pairs in the Fourier transform integral:
\[ |G(u, v)| \left[ e^{2\pi j (ux + vy)} + e^{-2\pi j (ux + vy)} \right] = 2 |G(u, v)| \cos 2\pi (ux + vy) \]
This function describes a black-white cosine wave pattern with
$(f_x, f_y) = (u, v)$.
Explanation for Figures 4.1 to 4.6 (from Duda & Hart 1973, pp. 310-312):
Figure 4.1: TV image (analog)
Figure 4.2: digitized TV image
- 120 x 120 pixels
- grey values from 0 (black) to 15 (white)
Figure 4.3: Fourier transform of the image from Figure 4.2 (amplitude spectrum)
$\log |G(f_x, f_y)|$: black corresponds to high amplitude
note:
1. strong components along the axes correspond to vertical and horizontal image edges
2. concentration around $(f_x, f_y) = (0, 0)$ corresponds to regions with constant grey values
Figure 4.4: Low-pass filter:
$H(f_x, f_y) = [\cos(\pi f_x) \cos(\pi f_y)]^{16}$, $0 \le H \le 1$
Figure 4.5: High-pass filter:
$H(f_x, f_y) = 1.5 - [\cos(\pi f_x) \cos(\pi f_y)]^{4}$, $0.5 \le H \le 1.5$
Figure 4.6: High-pass enhancement:
$H(f_x, f_y) = 2.0 - [\cos(\pi f_x) \cos(\pi f_y)]^{4}$, $1.0 \le H \le 2.0$
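4.2
Discrete Fourier Transform for Images
\[ G\left(e^{\frac{2\pi i}{N} u}, e^{\frac{2\pi i}{N} v}\right) = \sum_{j=0}^{N-1} \sum_{k=0}^{N-1} g[j, k] \, e^{-\frac{2\pi i}{N} (uj + vk)} \qquad \text{where } u, v = 0, 1, \ldots, N-1 \]
\[ G[u, v] = \sum_{j=0}^{N-1} \sum_{k=0}^{N-1} g[j, k] \, e^{-\frac{2\pi i}{N} (uj + vk)} = \sum_{j=0}^{N-1} e^{-\frac{2\pi i}{N} uj} \sum_{k=0}^{N-1} g[j, k] \, e^{-\frac{2\pi i}{N} vk} \]
Interpretation:
The Fourier transform of the image is first performed row by row, then column
by column.
Using the usual definition of the Fourier matrix W (i.e. $W_{vk} = (e^{-\frac{2\pi i}{N}})^{vk}$),
we obtain the matrix representation of the Fourier transform.
Using the notation:
$g \in \mathbb{R}^{N \times N}$
$W \in \mathbb{C}^{N \times N}$
$G \in \mathbb{C}^{N \times N}$
we obtain
\[ G = W \, g \, W \]
\[ g = \frac{1}{N^2} \, \overline{W} \, G \, \overline{W} \]
where $\overline{W}$ denotes the complex conjugate of W, so that $W^{-1} = \frac{1}{N} \overline{W}$.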
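The row-column interpretation can be verified on a small image. The following Python sketch is illustrative; it assumes a square N x N image stored as a list of rows and the negative sign in the exponent, and the function names are hypothetical.

```python
import cmath

def dft_1d(x):
    """One-dimensional DFT computed directly from the definition."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def dft_2d(g):
    """Two-dimensional DFT: first transform every row, then every column."""
    N = len(g)
    rows = [dft_1d(g[j]) for j in range(N)]               # rows[j][v]: sum over k
    G = [[0j] * N for _ in range(N)]
    for v in range(N):
        col = dft_1d([rows[j][v] for j in range(N)])      # col[u]: sum over j
        for u in range(N):
            G[u][v] = col[u]
    return G

def dft_2d_direct(g):
    """Two-dimensional DFT from the double sum, for comparison."""
    N = len(g)
    return [[sum(g[j][k] * cmath.exp(-2j * cmath.pi * (u * j + v * k) / N)
                 for j in range(N) for k in range(N))
             for v in range(N)] for u in range(N)]
```

The separable version needs 2N one-dimensional transforms of length N instead of one double sum per output value, which is what makes the use of the FFT for images efficient.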
4.3
Fourier Transform in Computer Tomography
Projection of the image along the straight lines
\[ y = ax + b, \qquad a \text{ const.} \]
\[ g_a(b) = \int g(x, ax + b) \, dx \]
One-dimensional Fourier transform of the projection:
\[ G_a(f_b) = \int g_a(b) \, e^{-2\pi j f_b b} \, db \]
\[ = \int \int g(x, ax + b) \, e^{-2\pi j f_b b} \, db \, dx \]
\[ = G(-a f_b, f_b) \]
= Fourier transform $G(f_x, f_y)$ of g(x, y) along
the spatial frequency straight line $(f_x, f_y)$ with $f_y = -\frac{1}{a} f_x$
Remarks:
a. The straight line in the spatial frequency domain, $(f_x, f_y) = (-a f_b, f_b)$, is
orthogonal to $y = ax + b$:
$y = ax + b$ => in the Fourier transform $f_y = -\frac{1}{a} f_x$
The angle between these straight lines is a right angle because
$a \cdot \left(-\frac{1}{a}\right) = -1$.
In general:
$y_1(x) = m_1 x + b_1$
$y_2(x) = m_2 x + b_2$
$y_1(x) \perp y_2(x) \Leftrightarrow m_1 \cdot m_2 = -1$
b. The value $G_a(f_b)$ is independent of the offset b and depends only
on the orientation a of the straight line. Therefore, if we calculate the projection for many different inclinations a and apply the
one-dimensional FT, we obtain the two-dimensional FT of the image
g(x, y).
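4.4
Fourier Transform and RST Invariance
Notation:
\[ z = \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2, \qquad f = \begin{pmatrix} f_x \\ f_y \end{pmatrix} \in \mathbb{R}^2 \]
Translation
$z \to z + z_0$ with translation vector $z_0 = \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} \in \mathbb{R}^2$
Image: $g(z) \to \tilde{g}(z) := g(z + z_0)$
FT: $\tilde{G}(f) = \exp\left(2\pi i \, [f_x x_0 + f_y y_0]\right) G(f)$
Rotation
$z \to D z$ with rotation matrix $D = \begin{pmatrix} \cos\varphi_0 & -\sin\varphi_0 \\ \sin\varphi_0 & \cos\varphi_0 \end{pmatrix}$
Scaling
$z \to \alpha z$ with scaling factor $\alpha > 0$
Image: $g(z) \to \tilde{g}(z) = g(\alpha z)$
FT: $\tilde{G}(f) = \frac{1}{\alpha^2} \, G\left(\frac{f}{\alpha}\right)$
$z \to A z$ where $A \in \mathbb{R}^{2 \times 2}$ invertible
Image: $g(z) \to \tilde{g}(z) = g(Az)$
FT: $\tilde{G}(f) = \ldots = \frac{1}{|\det(A)|} \, G\left((A^{-1})^T f\right)$
Proof: ...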
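Spatial frequency in polar coordinates as a complex number: $f_z = r \, e^{i\varphi} \in \mathbb{C}$
Complex logarithm:
\[ \tilde{f}_z := \ln f_z = \ln r + i\varphi \]
We already know:
a) rotation by angle $\varphi_0$ in the spatial domain
= rotation by angle $\varphi_0$ in the spatial frequency domain
b) scaling with factor $\alpha$ in the spatial domain
= scaling with factors $\frac{1}{\alpha}$ and $\frac{1}{\alpha^2}$ respectively in the spatial frequency domain
After scaling and rotation, $f_z$ becomes $f_z'$:
\[ \tilde{f}_z' = \ln f_z' = \ln r' + i\varphi' = \ln\frac{r}{\alpha} + i(\varphi - \varphi_0) = \ln r + i\varphi - \ln\alpha - i\varphi_0 = \tilde{f}_z - \ln\alpha - i\varphi_0 \]
= translation with the shift vector $(-\ln\alpha - i\varphi_0) \in \mathbb{C}$
in logarithmic polar coordinates of the spatial frequency plane
\[ G(f_x, f_y) = F\{g(x, y)\} \]
[Figure: processing chain: original image in the (x, y) plane, |FFT| in the $(f_x, f_y)$ plane, log-polar representation over $\ln r$, |FFT| of the log-polar representation]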
Warning:
a) Invariant observations are not necessarily good for classification.
b) Observations that are calculated using the two-dimensional Fourier
transform are not complete, i.e. the original image cannot be reconstructed completely.
Chapter 5
LPC Analysis
Overview:
5.1 Principle of LPC Analysis
5.2 LPC: Covariance Method
5.3 LPC: Autocorrelation Method
5.4 LPC: Interpretation in Frequency Domain
5.5 LPC: Generative Model
5.6 LPC: Alternative Representations
5.1
Principle of LPC Analysis
[Figure: signal values over time up to n and the predicted value]
Predicted value:
\[ \hat{x}[n] = \sum_{k=1}^{K} \alpha_k \, x[n-k] \]
Outlook
Starting point: coding in time domain (goal: bit reduction)
Parseval Theorem
For a reliable set of LPC coefficients we calculate the squared error criterion E as the sum of the squared prediction errors e[n]:
\[ E = \sum_{n} e^2[n] = \sum_{n} \left[ x[n] - \sum_{k=1}^{K} \alpha_k \, x[n-k] \right]^2 \]
Setting the derivatives of E with respect to the coefficients $\alpha_l$ to zero gives, for $l = 1, \ldots, K$:
\[ \sum_{k=1}^{K} \alpha_k \sum_{n} x[n-k] \, x[n-l] = \sum_{n} x[n-l] \, x[n] \]
5.2
LPC: Covariance Method
[Figure: signal segment n = 0, ..., N-1 with predicted value]
No window function is applied, such that we obtain the following
summation limits:
\[ \sum_{n} e^2[n] = \sum_{n=0}^{N-1} e^2[n] \]
i.e. we also use signal values x[n] with n < 0 for prediction.
The resulting equation system for the LPC coefficients:
\[ l = 1, \ldots, K: \qquad \sum_{k=1}^{K} \alpha_k \, \varphi(l, k) = \varphi(l, 0) \]
\[ \varphi(l, k) = \sum_{n=0}^{N-1} x[n-l] \, x[n-k] \]
5.3
LPC: Autocorrelation Method
[Figure: windowed signal segment n = 0, ..., N-1]
We consider the signal after multiplication with a suitable window function, usually a Hamming window.
In principle, the summation limits now are
\[ \sum_{n} e^2[n] = \sum_{n=-\infty}^{+\infty} e^2[n]. \]
Since, due to windowing, the signal x[n] is identical to zero outside the
window function, i.e.
\[ x[n] \equiv 0 \qquad \text{for } n < 0 \text{ and } n > N-1, \]
the sum reduces to
\[ \sum_{n=0}^{N+K-1} e^2[n] \]
The prediction error e[n] can become large at the window function
boundaries:
- Beginning: prediction from zeros
- End: prediction of zeros
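\[ \sum_{n} x[n-k] \, x[n-l] = \sum_{n=0}^{N-1-|l-k|} x[n] \, x[n + |l-k|] = R(|l-k|) \]
\[ \sum_{n} x[n] \, x[n-l] = \sum_{n=0}^{N-1-l} x[n] \, x[n+l] = R(l) \]
In this way we obtain the following equation system for the LPC coefficients
$\alpha_k$:
\[ l = 1, \ldots, K: \qquad \sum_{k=1}^{K} \alpha_k \, R(|l-k|) = R(l) \]
or in matrix form:
\[ \begin{pmatrix}
R(0) & R(1) & \cdots & R(K-1) \\
R(1) & R(0) & \cdots & R(K-2) \\
\vdots & \vdots & \ddots & \vdots \\
R(K-1) & R(K-2) & \cdots & R(0)
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_K \end{pmatrix}
=
\begin{pmatrix} R(1) \\ R(2) \\ \vdots \\ R(K) \end{pmatrix} \]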
Note that this equation system is completely determined by the autocorrelation coefficients
$R(0), \ldots, R(k), \ldots, R(K)$.
Hence, the LPC coefficients
$\alpha_1, \ldots, \alpha_k, \ldots, \alpha_K$
are merely a transformation of the autocorrelation coefficients.
The matrix of this equation system has the following properties:
- Toeplitz structure (follows from time invariance)
- solution: Durbin algorithm with complexity $O(K^2)$
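The Toeplitz structure is what allows the solution with $O(K^2)$ operations. The following Python sketch shows the commonly used Levinson-Durbin recursion for the system $\sum_k \alpha_k R(|l-k|) = R(l)$; it is an illustration under these assumptions and not taken from the lecture notes, and the function name `levinson_durbin` is hypothetical.

```python
def levinson_durbin(R, K):
    """Solve sum_k alpha_k R(|l-k|) = R(l), l = 1..K, for a Toeplitz system.

    R is the list [R(0), ..., R(K)]. Returns (alpha, E) where alpha[j] holds
    alpha_{j+1} and E is the remaining squared prediction error.
    """
    alpha = []
    E = R[0]
    for i in range(1, K + 1):
        # reflection coefficient of order i
        k_i = (R[i] - sum(alpha[j] * R[i - 1 - j] for j in range(i - 1))) / E
        # update the coefficients of order i-1 and append the new one
        alpha = [alpha[j] - k_i * alpha[i - 2 - j] for j in range(i - 1)] + [k_i]
        E *= (1.0 - k_i * k_i)
    return alpha, E
```

Each order i reuses the solution of order i-1, so the total cost grows with $K^2$ instead of $K^3$ as for a general linear solver. The returned error equals $R(0) - \sum_k \alpha_k R(k)$.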
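5.4
LPC: Interpretation in Frequency Domain
Prediction error:
\[ e[n] = x[n] - \sum_{k=1}^{K} \alpha_k \, x[n-k] \]
The total error $E_{tot}$ for the squared error criterion becomes:
\[ E_{tot} = \sum_{n=0}^{N+K-1} e^2[n] \]
\[ = \frac{1}{2\pi} \int_{-\pi}^{+\pi} |E(e^{j\omega})|^2 \, d\omega \qquad \text{(Parseval Theorem)} \]
\[ = \frac{1}{2\pi} \int_{-\pi}^{+\pi} \left| 1 - \sum_{k=1}^{K} \alpha_k e^{-j\omega k} \right|^2 |X(e^{j\omega})|^2 \, d\omega \]
\[ = \frac{1}{2\pi} \int_{-\pi}^{+\pi} |P(e^{j\omega})|^2 \, |X(e^{j\omega})|^2 \, d\omega \]
with the predictor polynomial
\[ P(e^{j\omega}) := 1 - \sum_{k=1}^{K} \alpha_k e^{-j\omega k} \]
\[ |P(e^{j\omega})|^2 = \ldots = \sum_{k=0}^{K} B_k \cos(k\omega) \]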
Observations:
These zeros are complex conjugate pairs because $\alpha_k \in \mathbb{R}$.
The zeros can cause minima of $|P(e^{j\omega})|^2$. The minima of $|P(e^{j\omega})|^2$
approximately correspond to the maxima of the smoothed spectrum
$|X(e^{j\omega})|^2$, because minimizing the error integral first of
all requires compensating for the maxima of the signal spectrum.
The LPC analysis can therefore be used to describe the formant structure
of the speech signal.
[Figure: $|P(e^{j\omega})|^2$ and $|X(e^{j\omega})|^2$]
[Figure: prediction error (12 LPC coefficients), LPC spectrum (12 coefficients), spectrum of the prediction error (12 LPC coefficients), LPC spectrum (18 coefficients)]
[Figure: amplitude spectrum (Hamming window) and LPC spectra with 4, 8, 12, 16, 18 and 20 coefficients]
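5.5
LPC: Generative Model
[Figure: recursive filter with coefficients $\alpha_k$ and output x[n]]
\[ e[n] = x[n] - \sum_{k=1}^{K} \alpha_k \, x[n-k] \]
\[ E(z) = X(z) - \sum_{k=1}^{K} \alpha_k \, X(z) \, z^{-k} = X(z) \left[ 1 - \sum_{k=1}^{K} \alpha_k z^{-k} \right] \]
\[ x[n] = e[n] + \sum_{k=1}^{K} \alpha_k \, x[n-k]. \]
\[ X(z) = \frac{E(z)}{1 - \sum_{k=1}^{K} \alpha_k z^{-k}} \]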
Special case:
\[ e[n] = G \, \delta[n] \]
Then for the LPC model spectrum X(z) holds:
\[ X(z) = \frac{G}{1 - \sum_{k=1}^{K} \alpha_k z^{-k}} \]
This spectrum is often interpreted as the LPC model spectrum X(z) of the observed signal. It is reasonable to set (without explanation):
\[ G^2 = R(0) - \sum_{k=1}^{K} \alpha_k R(k) = R(0) \left[ 1 - \sum_{k=1}^{K} \alpha_k \frac{R(k)}{R(0)} \right] \]
This LPC model spectrum does not have any zeros, it has only poles, and
is therefore also called all-pole model.
Remarks:
stability problems when solving the equation system
(truncation error in autocorrelation)
[Figure: LPC model spectra for K = 10, K = 12 and K = 14]
5.6
LPC: Alternative Representations
so far:
G: gain
$\alpha_k$: LPC coefficients
impulse response of generative model
impulse response of squared absolute value of predictor polynomial
cepstrum
poles / zeros of synthesis model / predictor polynomial
= formants / bandwidths
[Figure: acoustic tube model with cross-section areas $A_1, \ldots, A_5$ between glottis and lips]
Chapter 6
Outlook: Wavelet Transform
Overview:
6.1 Motivation: from Fourier to Wavelet Transform
6.2 Definition
6.3 Discrete Wavelet Transform
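6.1
Motivation: from Fourier to Wavelet Transform
Short-time Fourier transform with window function w(t), complex in general:
\[ F(\omega, b) = \int_{-\infty}^{+\infty} f(t) \, w(t - b) \, e^{-j\omega t} \, dt \]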
6.2
Definition
Like the window function for the short-time Fourier transform, the mother wavelet $\psi(t)$ should be localized as much as possible.
Example:
Mexican-Hat Function:
\[ \psi(t) = (1 - t^2) \, e^{-\frac{1}{2} t^2} \]
[Figure: $\psi(t)$]
Z+
tb
f (t)
dt
a
da db F (a, b)
C a2
a
with
\[ C_\psi := \int_{0}^{\infty} \frac{|\Psi(\omega)|^2}{\omega} \, d\omega < \infty \]
F (a, b)
1
=
a
1
=
a
with
Z+
Z+
1
ab (t) =
a
f (t) ab (t) dt
F () ab () d
tb
a
6.3
F (n, m)
\[ \psi_{mn}(t) := a_0^{m/2} \, \psi(a_0^m t - n b_0) \]
L2 (IR).
Note: The scalar product $\langle f(t), g(t) \rangle$ of two functions f(t) and g(t) is
defined as:
\[ \langle f(t), g(t) \rangle = \int f(t) \, g(t) \, dt \]
In this way we obtain the following representation for the discrete wavelet
transform:
\[ F(m, n) = \int_{-\infty}^{+\infty} f(t) \, a_0^{m/2} \, \psi(a_0^m t - n b_0) \, dt \]
\[ f(t) = \frac{1}{C} \sum_{m} \sum_{n} F(m, n) \, \psi_{mn}(t) = \frac{1}{C} \sum_{m} \sum_{n} F(m, n) \, a_0^{m/2} \, \psi(a_0^m t - n b_0) \]
Example:
\[ \psi(t) = \begin{cases} 1 & 0 \le t < \frac{1}{2} \\ -1 & \frac{1}{2} \le t < 1 \\ 0 & \text{otherwise} \end{cases} \]
This defines the Haar basis
$\{ \psi_{mn}(t) \mid m, n \in \mathbb{Z}, \; m > 0 \}$:
\[ \psi_{mn}(t) = \frac{1}{\sqrt{2^{-m}}} \, \psi(2^m t - n) \]
It is easy to see that for increasing m an increasingly finer resolution is obtained and that n determines the localization in time.
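The scaling and shifting of the Haar wavelet can be made concrete with a short Python sketch. It assumes the normalization $\psi_{mn}(t) = 2^{m/2} \, \psi(2^m t - n)$ and approximates the scalar product with a midpoint rule on the interval [0, 1); the function names are hypothetical.

```python
def haar_mother(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 otherwise."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def haar(m, n, t):
    """Scaled and shifted Haar wavelet psi_mn(t) = 2^(m/2) psi(2^m t - n)."""
    return 2.0 ** (m / 2.0) * haar_mother(2.0 ** m * t - n)

def inner_product(f, g, a=0.0, b=1.0, steps=4096):
    """Midpoint-rule approximation of the scalar product <f, g> on [a, b)."""
    dt = (b - a) / steps
    return sum(f(a + (i + 0.5) * dt) * g(a + (i + 0.5) * dt) for i in range(steps)) * dt
```

With these definitions, wavelets of different scale m or different shift n are orthogonal, and each wavelet has unit norm; larger m gives a shorter support and hence a finer time resolution.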
Chapter 7
Coding
The following types of coding are distinguished:
  source coding (data compression)
    goal: transmission (storage) using as few bits as possible, without or
    with few errors
  channel coding
    goal: data transmission (storage) that is as error-free as possible,
    e.g. error-detecting and error-correcting codes
  simultaneous source and channel coding
    goal: simultaneous optimization
The following data types are distinguished:
  discrete alphabet
  continuous signal (audio, video, ...)
Source coding
  lossless coding (compression)
    usually discrete sources, e.g. text compression
  lossy coding
    usually continuous signals
    notation: rate-distortion theory
      distortion, error
      bit rate
[Figure: coding scheme with the signal, the blocks T, Q and C, the transmission, the inverse blocks $C^{-1}$, $Q^{-1}$ and $T^{-1}$, and the reconstructed signal; legend entries T, Q and C]
References:
Ze-Nian Li: CMPT 365 Multimedia Systems. Simon Fraser
University, British Columbia, Canada, fall 1999, Version Jan.2000;
http://www.cs.sfu.ca/CourseCentral/365/li/index_prev.html.
Peter Noll: MPEG Digital Audio Coding. IEEE Signal Processing
Magazine, pp.59-81, Sep. 1997.
Thomas Sikora: MPEG Digital Video-Coding Standards. IEEE Signal
Processing Magazine, pp.82-100, Sep. 1997.
A. Ortega, K. Ramchandran: Rate-Distortion Methods for Image
and Video Compression. IEEE Signal Processing Magazine, pp. 23-50, Nov. 1998.
G. J. Sullivan, Th. Wiegand: Rate-Distortion Optimization for Video
Compression. IEEE Signal Processing Magazine, pp. 74-90, Nov. 1998.
Chapter 8
Image Segmentation and
Contour-Finding
The lecture notes for this chapter are available as a separate document.