Bernd Girod: EE398A Image and Video Compression Transform Coding no. 1
Transform coding - topics
Block-wise transform coding
[Figure: block diagram. Original image block -> transform $A$ -> transform coefficients -> quantization, entropy coding & storage or transmission -> quantized transform coefficients -> inverse transform $A^{-1}$ -> reconstructed block]
Properties of orthonormal transforms
Forward transform
y = Ax
Inverse transform
$x = A^{-1} y = A^T y$
Linearity: x is represented as a linear combination of basis
functions (i.e., columns of $A^T$)
Energy conservation
Interpretation
Vector length (energies) conserved
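A minimal numerical sketch of why lengths are preserved: for an orthonormal $A$, $\|y\|^2 = y^T y = x^T A^T A x = x^T x = \|x\|^2$. The rotation angle and the input vector below are arbitrary values chosen for illustration, and the helper name `rotation` is hypothetical.

```python
import numpy as np

def rotation(alpha):
    """2x2 orthonormal matrix: rotation of the coordinate system by alpha."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, s], [-s, c]])

A = rotation(np.pi / 6)        # assumed angle, for illustration only
x = np.array([3.0, 4.0])       # arbitrary input vector

y = A @ x                      # forward transform y = A x
x_rec = A.T @ y                # inverse transform x = A^T y

energy_x = float(x @ x)        # squared length of x
energy_y = float(y @ y)        # squared length of y, equal up to rounding
```

The same check works for any square matrix whose rows are orthonormal; the rotation is only the smallest nontrivial case.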
2-d orthonormal transform
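$$A = \begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix}$$
[Figure: coordinate axes $(x_1, x_2)$ and rotated axes $(y_1, y_2)$]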
Unequal variances of transform coefficients
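$$R_{YY} = E\left[(y-\mu_Y)(y-\mu_Y)^T\right] = E\left[A(x-\mu_X)(x-\mu_X)^T A^T\right] = A R_{XX} A^T$$
Variances of the coefficients $y_i$ are the diagonal elements of $R_{YY}$
$$\sigma_{Y_i}^2 = \left[R_{YY}\right]_{i,i} = \left[A R_{XX} A^T\right]_{i,i}$$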
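As a hedged illustration of how the transform redistributes variance, the sketch below applies a 45-degree rotation to two equally strong, correlated samples. The covariance matrix and the correlation value `rho` are assumptions made for this example, not values from the slides.

```python
import numpy as np

rho = 0.9                                    # assumed correlation coefficient
R_xx = np.array([[1.0, rho], [rho, 1.0]])    # assumed covariance, unit variances

a = 1.0 / np.sqrt(2.0)
A = np.array([[a, a], [-a, a]])              # rotation by 45 degrees

R_yy = A @ R_xx @ A.T                        # covariance of the coefficients
var_y = np.diag(R_yy)                        # coefficient variances
```

Here the two input variances are equal, while the coefficient variances come out as 1 + rho and 1 - rho; their sum is unchanged because the transform conserves energy.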
Coding gain of orthonormal transform
$$d(R) \cong \varepsilon^2 \sigma_X^2 \, 2^{-2R}$$
$$G_T = \frac{d(R)}{d^{XFORM}(R)}$$
Coding gain of orthonormal transform (cont.)
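$$J = d^{XFORM}(R) + \lambda R = \frac{1}{N}\sum_{n=0}^{N-1} \varepsilon^2 \sigma_{Y_n}^2 \, 2^{-2R_n} + \lambda \frac{1}{N}\sum_{n=0}^{N-1} R_n \;\to\; \min_{R_0, R_1, \ldots, R_{N-1}}$$
Solution by setting $\frac{\partial J}{\partial R_n} = 0$ for all n
Pareto condition: $\frac{\partial d_i}{\partial R_i} = \frac{\partial d_j}{\partial R_j}$ for all i, j, where $d_i$ is the distortion of the individual coefficient i
[Photo: Vilfredo Pareto, economist, 1848-1923]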
Coding gain of orthonormal transform (cont.)
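$d_n(R_n) = d^{XFORM}(R)$ for all n
$R_n = \frac{1}{2}\log_2 \frac{\varepsilon^2 \sigma_{Y_n}^2}{d^{XFORM}}$ for all n
Transform coding gain
$$G_T = \frac{d(R)}{d^{XFORM}(R)} = \frac{\sigma_X^2}{\sqrt[N]{\prod_{n=0}^{N-1} \sigma_{Y_n}^2}} = \frac{\frac{1}{N}\sum_{n=0}^{N-1} \sigma_{Y_n}^2}{\sqrt[N]{\prod_{n=0}^{N-1} \sigma_{Y_n}^2}}$$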
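The bit allocation and the coding gain can be checked numerically. The sketch below assumes the high-rate model $d_n = \varepsilon^2 \sigma_{Y_n}^2 2^{-2R_n}$ with equal distortion for every coefficient, which gives $R_n = R + \frac{1}{2}\log_2(\sigma_{Y_n}^2 / \text{geometric mean})$ for an average rate $R$. The function names and the variance values are hypothetical.

```python
import math

def coding_gain(variances):
    """Arithmetic mean divided by geometric mean of the coefficient variances."""
    n = len(variances)
    arithmetic = sum(variances) / n
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return arithmetic / geometric

def allocate_bits(variances, mean_rate):
    """R_n = mean_rate + 0.5 * log2(var_n / geometric mean); may go negative."""
    n = len(variances)
    log2_gm = sum(math.log2(v) for v in variances) / n
    return [mean_rate + 0.5 * (math.log2(v) - log2_gm) for v in variances]

variances = [16.0, 4.0, 1.0, 0.25]            # assumed coefficient variances
rates = allocate_bits(variances, mean_rate=2.0)
gain = coding_gain(variances)
gain_db = 10.0 * math.log10(gain)

# With eps^2 = 1, every coefficient ends up with the same distortion.
distortions = [v * 2.0 ** (-2.0 * r) for v, r in zip(variances, rates)]
```

Note that this allocation does not prevent negative rates when a variance is far below the geometric mean.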
Reverse water filling
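$R_n = \frac{1}{2}\log_2 \frac{\sigma_{Y_n}^2}{d_n}$ for all n
$$d_n = \begin{cases} \theta, & \text{if } \theta < \sigma_{Y_n}^2 \\ \sigma_{Y_n}^2, & \text{if } \theta \ge \sigma_{Y_n}^2 \end{cases}$$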
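A minimal sketch of reverse water filling, under the assumption that each coefficient receives distortion $\min(\theta, \sigma_{Y_n}^2)$ for a common level $\theta$ and rate $\max(0, \frac{1}{2}\log_2(\sigma_{Y_n}^2/\theta))$. The symbol $\theta$, the bisection search, the function names, and the variance values are choices made for this example.

```python
import math

def rates_for_level(variances, theta):
    """Per-coefficient rates for a given level theta; never negative."""
    return [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]

def reverse_water_filling(variances, mean_rate, iterations=100):
    """Find theta by bisection so that the average rate equals mean_rate."""
    lo, hi = 1e-12, max(variances)         # the mean rate falls as theta rises
    for _ in range(iterations):
        theta = math.sqrt(lo * hi)         # geometric midpoint
        mean = sum(rates_for_level(variances, theta)) / len(variances)
        if mean > mean_rate:
            lo = theta                     # too many bits: raise the level
        else:
            hi = theta
    theta = math.sqrt(lo * hi)
    rates = rates_for_level(variances, theta)
    distortions = [min(theta, v) for v in variances]
    return theta, rates, distortions

variances = [16.0, 4.0, 1.0, 0.25]         # assumed coefficient variances
theta, rates, distortions = reverse_water_filling(variances, mean_rate=0.5)
```

For these values the level settles at 2, so the two weakest coefficients get zero bits and contribute their full variance as distortion.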
Karhunen-Loève Transform (KLT)
KLT maximizes coding gain
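For any orthonormal transform $A$:
$$\det R_{YY} \le \prod_{n=0}^{N-1} \sigma_{Y_n}^2$$
$\det R_{YY} = \det(A R_{XX} A^T) = \det R_{XX}$ does not depend on $A$, and the KLT makes $R_{YY}$ diagonal, so equality holds for the KLT:
$$\left. \prod_{n=0}^{N-1} \sigma_{Y_n}^2 \right|_{KLT} = \det R_{YY} \le \left. \prod_{n=0}^{N-1} \sigma_{Y_n}^2 \right|_{A}$$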
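The claim can be illustrated numerically. The sketch below builds the KLT from the eigenvectors of a covariance matrix and compares its coding gain with that of the identity and of a random orthonormal matrix. The AR(1) covariance model, the values of `N` and `rho`, and the function name are assumptions for this example.

```python
import numpy as np

def coding_gain(R_xx, A):
    """Arithmetic over geometric mean of the diagonal of A R_xx A^T."""
    var_y = np.diag(A @ R_xx @ A.T)
    return var_y.mean() / np.exp(np.log(var_y).mean())

N, rho = 8, 0.95                                   # assumed size and correlation
idx = np.arange(N)
R_xx = rho ** np.abs(idx[:, None] - idx[None, :])  # assumed AR(1) covariance

# KLT: rows of A are eigenvectors of R_xx, so A R_xx A^T is diagonal.
eigvals, eigvecs = np.linalg.eigh(R_xx)
A_klt = eigvecs.T
R_yy = A_klt @ R_xx @ A_klt.T

# Random orthonormal matrix for comparison (QR of a seeded random matrix).
Q, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((N, N)))

g_identity = coding_gain(R_xx, np.eye(N))          # no transform
g_random = coding_gain(R_xx, Q)
g_klt = coding_gain(R_xx, A_klt)
```

The product of the KLT coefficient variances equals the determinant of the covariance matrix, which is the lower bound for every other orthonormal transform.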
Disadvantages of KLT
Various orthonormal transforms
Karhunen-Loève transform [1948/1960]
Haar transform [1910]
Walsh-Hadamard transform [1923]
Slant transform [Enomoto, Shibata, 1971]
Discrete Cosine Transform (DCT)
[Ahmed, Natarajan, Rao, 1974]
[Figure: comparison of 1-d basis functions for block size N=8]
Separable transforms, I
A transform is separable if the transform of a signal block of
size NxN can be expressed by
$$y = A x A^T$$
where $y$ is the NxN block of transform coefficients, $A$ is the orthonormal transform matrix of size NxN, and $x$ is the NxN block of input signal.
Note: when the block is rearranged into a vector of length $N^2$ (form $y = Ax$), the transform matrix for vectors is the Kronecker product $A \otimes A$.
The inverse transform is
$$x = A^T y A$$
Great practical importance: The transform requires 2 matrix
multiplications of size NxN instead of one multiplication of a
vector of size $1 \times N^2$ with a matrix of size $N^2 \times N^2$
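A short sketch comparing the separable form with the equivalent Kronecker-product form on a row-major flattened block. The orthonormal matrix is generated from a seeded random matrix purely for illustration; the block size and variable names are assumptions.

```python
import numpy as np

N = 4
rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((N, N)))   # some orthonormal N x N matrix
x = rng.standard_normal((N, N))                    # N x N input block

y_sep = A @ x @ A.T                                # two N x N matrix products

# One product with the N^2 x N^2 matrix A (x) A on the row-major flattened block.
y_kron = (np.kron(A, A) @ x.reshape(-1)).reshape(N, N)

x_rec = A.T @ y_sep @ A                            # inverse transform
```

The separable form costs about 2N^3 multiplications against N^4 for the Kronecker form, which is 1024 versus 4096 for N = 8.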
Coding gain with 8x8 transforms
[Figure: bar chart of the transform coding gain $G_T$ in dB for the Haar, Hadamard, Slant, DCT and KLT transforms with 8x8 blocks, shown for the test images MRI, Einstein, Mandrill, Cameraman and for all images combined]
Discrete Cosine Transform and
Discrete Fourier Transform
Transform coding of images
using the Discrete Fourier
Transform (DFT):
For stationary image statistics, the energy concentration
properties of the DFT converge to those of the KLT for
large block sizes.
Problem of blockwise DFT coding: blocking effects due to
the circular topology of the DFT and Gibbs phenomena.
Remedy: reflect the image at block boundaries and take the
DFT of the larger symmetric block -> DCT
DCT
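$$a_{ik} = \alpha_i \cos\frac{\pi (2k+1) i}{2N} \quad \text{for } i, k = 0, \ldots, N-1$$
with $\alpha_0 = \sqrt{\frac{1}{N}}$ and $\alpha_i = \sqrt{\frac{2}{N}}$ for $i \ne 0$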
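The definition can be turned into a few lines of code. This is a sketch assuming the orthonormal DCT with $a_{ik} = \alpha_i \cos(\pi(2k+1)i/(2N))$, $\alpha_0 = \sqrt{1/N}$ and $\alpha_i = \sqrt{2/N}$ otherwise; the function name and the constant test block are hypothetical.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT matrix with a[i, k] = alpha_i * cos(pi*(2k+1)*i / (2N))."""
    i = np.arange(N)[:, None]       # row index: basis function (frequency)
    k = np.arange(N)[None, :]       # column index: sample position
    A = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k + 1) * i / (2 * N))
    A[0, :] = np.sqrt(1.0 / N)      # alpha_0 for the constant basis function
    return A

A = dct_matrix(8)
block = np.full((8, 8), 185.0)      # constant block, hypothetical test input
coeffs = A @ block @ A.T            # separable 2-D DCT of the block
```

For a constant 8x8 block only the DC coefficient is nonzero, and it equals 8 times the block mean.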
Amplitude distribution of the DCT
coefficients
Histograms for 8x8 DCT coefficient amplitudes measured for test image
[Lam, Goodman, 2000]
[Figure: test image "Bridge"; plot with horizontal axis x from -2.5 to 2.5]
For a given block variance, coefficient pdfs are Gaussian
Gaussian mixture with exponential variance distribution yields a Laplacian:
$$p_{Y_n}(y) = \int_0^\infty \frac{1}{\sqrt{2\pi v}} \, e^{-\frac{y^2}{2v}} \cdot \frac{1}{\sigma_{Y_n}^2} \, e^{-\frac{v}{\sigma_{Y_n}^2}} \, dv = \frac{1}{\sqrt{2}\,\sigma_{Y_n}} \, e^{-\frac{\sqrt{2}\,|y|}{\sigma_{Y_n}}}$$
Gaussian mixture with half-Gaussian variance distribution yields a pdf very
close to Laplacian [Lam, Goodman, 2000]
Elegant explanation of Laplacian pdfs of DCT coefficients
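The exponential-variance case can be verified by numerical integration. The sketch below assumes a zero-mean Gaussian whose variance $v$ is exponentially distributed with mean $\sigma^2$, and compares the resulting mixture with a Laplacian of variance $\sigma^2$. The function names, the substitution $v = t^2$, and the grid parameters are choices made for this example.

```python
import numpy as np

def mixture_pdf(y, sigma2=1.0, t_max=12.0, steps=200_001):
    """Gaussian pdf of variance v, averaged over v ~ Exponential(mean sigma2).

    The substitution v = t**2 removes the 1/sqrt(v) singularity at v = 0.
    """
    t = np.linspace(1e-9, t_max * np.sqrt(sigma2), steps)
    v = t * t
    gauss = np.exp(-y * y / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)
    weight = np.exp(-v / sigma2) / sigma2
    f = gauss * weight * 2.0 * t                 # dv = 2 t dt
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))  # trapezoid rule

def laplacian_pdf(y, sigma2=1.0):
    """Laplacian pdf with zero mean and variance sigma2."""
    sigma = np.sqrt(sigma2)
    return float(np.exp(-np.sqrt(2.0) * abs(y) / sigma) / (np.sqrt(2.0) * sigma))

ys = [0.0, 0.5, 1.0, 2.0]
max_err = max(abs(mixture_pdf(y) - laplacian_pdf(y)) for y in ys)
```

The two curves agree to within the accuracy of the trapezoid rule at the sampled points.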
Threshold coding, I
Threshold coding, II
Threshold coding, III
Original 8x8 block
198 202 194 179 180 184 196 168
187 196 192 181 182 185 189 174
188 185 193 179 188 188 187 170
184 188 182 187 183 186 195 174
194 193 189 187 180 183 181 185
193 195 193 192 170 189 187 181
181 185 183 180 175 184 185 176
195 185 177 178 170 179 195 175

DCT -> Transformed 8x8 block
1480   26.0   9.5   8.9  -26.4  15.1  -8.1   0.3
11.0    8.3  -8.2   3.8   -8.4  -6.0  -2.8  10.6
-5.5    4.5   9.0   5.3   -8.0   4.0  -5.1   4.9
10.7    9.8   4.9  -8.3   -2.1  -1.9   2.8  -8.1
 1.6    1.4   8.2   4.3    3.4   4.1  -7.9   1.0
-4.5   -5.0  -6.4   4.1   -4.4   1.8  -3.2   2.1
 5.9    5.8   2.4   2.8   -2.0   5.9   3.2   1.1
-3.0    2.5  -1.0   0.7    4.1  -6.1   6.0   5.7

Q -> Quantized coefficients
185  3  1  1 -3  2 -1  0
  1  1 -1  0 -1  0  0  1
  0  0  1  0 -1  0  0  0
  1  1  0 -1  0  0  0 -1
  0  0  1  0  0  0 -1  0
  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0

Zig-zag scan and run-level coding
Mean of Block: 185
(0,3) (0,1) (1,1) (0,1) (0,1) (0,1) (0,-1) (1,1)
(1,1) (0,1) (1,-3) (0,2) (0,-1) (6,1) (0,-1) (0,-1)
(1,-1) (14,1) (9,-1) (0,-1) EOB

Transmission
Mean of Block: 185
(0,3) (0,1) (1,1) (0,1) (0,1) (0,1) (0,-1) (1,1)
(1,1) (0,1) (1,-3) (0,2) (0,-1) (6,1) (0,-1) (0,-1)
(1,-1) (14,1) (9,-1) (0,-1) EOB

Run-level decoding
185  3  1  1 -3  2 -1  0
  1  1 -1  0 -1  0  0  1
  0  0  1  0 -1  0  0  0
  1  1  0 -1  0  0  0 -1
  0  0  1  0  0  0 -1  0
  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0

Scaling and inverse DCT -> Reconstructed 8x8 block
192 201 195 184 177 184 193 174
189 191 195 182 182 187 190 171
188 185 190 181 185 187 189 171
189 188 185 183 183 182 190 175
191 192 186 189 179 182 188 178
190 191 189 190 177 186 184 179
189 188 185 184 175 186 187 179
189 188 178 176 173 183 193 180
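The zig-zag scan and run-level steps of this example can be sketched as follows. The code assumes the conventional zig-zag order that starts at the DC coefficient and moves first to the right, and it represents each nonzero AC coefficient as a (run of preceding zeros, level) pair followed by an end-of-block marker. All function names are hypothetical.

```python
def zigzag_order(n=8):
    """Index pairs (row, col) of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):              # s = row + col of one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # even diagonals run from bottom-left to top-right, odd ones the reverse
        for r in (reversed(rows) if s % 2 == 0 else rows):
            order.append((r, s - r))
    return order

def run_level_encode(block):
    """Return the DC value and (run, level) pairs for the AC part, then 'EOB'."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    dc, pairs, run = scan[0], [], 0
    for level in scan[1:]:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return dc, pairs + ["EOB"]              # trailing zeros are implied by EOB

def run_level_decode(dc, pairs, n=8):
    """Rebuild the n x n block of quantized coefficients."""
    scan = [dc]
    for item in pairs:
        if item == "EOB":
            break
        run, level = item
        scan.extend([0] * run + [level])
    scan.extend([0] * (n * n - len(scan)))
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(scan, zigzag_order(n)):
        block[r][c] = value
    return block

quantized = [
    [185, 3, 1, 1, -3, 2, -1, 0],
    [1, 1, -1, 0, -1, 0, 0, 1],
    [0, 0, 1, 0, -1, 0, 0, 0],
    [1, 1, 0, -1, 0, 0, 0, -1],
    [0, 0, 1, 0, 0, 0, -1, 0],
    [0] * 8, [0] * 8, [0] * 8,
]
dc, pairs = run_level_encode(quantized)
decoded = run_level_decode(dc, pairs)
```

Applied to the quantized block of the example, this sketch produces 19 (run, level) pairs before EOB, whereas the sequence printed in the example lists 20, with one additional (0,1) in sixth position; the remaining pairs agree.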
Detail in a block vs. DCT coefficients
[Figure: three examples, each showing an image block, the DCT coefficients of the block, the quantized DCT coefficients of the block, and the block reconstructed from the quantized coefficients; coefficient axes range from -30 to 30]
Typical DCT coding artifacts
DCT coding with increasingly coarse quantization, block size 8x8
Influence of DCT block size
[Figure: bar chart of the transform coding gain $G_T$ in dB for DCT block sizes 2x2, 4x4, 8x8, 16x16 and 32x32, shown for the test images MRI, Einstein, Mandrill, Cameraman and for all images combined]
Fast DCT algorithm I
DCT matrix factored into sparse matrices
[Arai, Agui, and Nakajima; 1988]
$$y = Ax = S P M_1 M_2 M_3 M_4 M_5 M_6 x$$
[Matrices: $S$ is diagonal with entries $S_0, \ldots, S_7$; $P$ and $M_1, \ldots, M_6$ are sparse 8x8 matrices whose nonzero entries are built from 1, $C_2$, $C_4$ and $C_6$]
Fast DCT algorithm II
Signal flow graph for fast (scaled) 8-DCT [Arai, Agui, Nakajima, 1988]
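[Figure: signal flow graph followed by a scaling stage]
only 5 + 8 multiplications (direct matrix multiplication: 64 multiplications)
Addition: inputs u and v yield u+v or u-v
Multiplication: $a_1 = C_4$, $a_2 = C_2 - C_6$, $a_3 = C_4$, $a_4 = C_6 + C_2$, $a_5 = C_6$
Scaling: $s_0 = \frac{1}{2\sqrt{2}}$, $s_k = \frac{1}{4 C_k}$ for $k = 1, \ldots, 7$
with $C_k = \cos\left(\frac{\pi}{16} k\right)$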
Transform coding: summary
Orthonormal transform: rotation of coordinate system in signal space
Purpose of transform: decorrelation, energy concentration
Bit allocation proportional to logarithm of variance, equal distortion
KLT is optimum, but signal dependent and, hence, without a fast algorithm
DCT shows reduced blocking artifacts compared to DFT
8x8 block size, uniform quantization, zig-zag scan + run-level coding is widely used today (e.g. JPEG, MPEG, ITU-T H.261, H.263)
Fast algorithm for scaled 8-DCT: 5 multiplications, 29 additions
Reading