[Figure: basic communication model. An information source sends the transmitted information over a channel; the receiver gets the received information.]
2
Audio/Video Coding Applications
3
Detailed Communication Model
[Figure: detailed communication model]
Transmitter: information source → data reduction (source coding) → encryption → channel coding
Channel: noise is added
Receiver: channel decoding → decryption → data reconstruction (source decoding) → destination
4
Agenda
• Introduction
– Why compressing?
– Audio & Video basics
– MPEGx, & H.26x Compression Standards Overview
• Conclusion
6
Why compressing?
7
The need for compression
• Audio: Compression needed in spectral domain
8
Digital Audio
DAT (Digital Audio Tape): 48 kHz sampling, 16 bits per sample, 2 channels, 1,536 kbps
9
The need for compression
• Video: Compression needed in spatial domain
10
[Figure: Bit Rate versus Spatial Resolution, bit rate in Mbps]
SQCIF (128*96): 3.69
QCIF (176*144; the chart labels it 174*144): 7.52
CIF (352*288): 30.41
4CIF (704*576): 162.20
16CIF 4:3 (1408*1152): 648.81
16CIF 16:9 (1920*1152): 1061.68
12
The need for compression
• Channels available for A/V transmission
– Analog television channel (compatibility)
• Cable (bandwidth = 8MHz)
• Satellite (Bandwidth = 30-40MHz)
Capacity around 40Mbits/sec
13
Illustrative example
• Ratio between the required bit rate and largest possible bit
rate: 72.99Mbps/56kbps ≈ 1303
– To accomplish the transmission over PSTN, the data must be compressed by
a factor of at least about 1303.
15
The need for compression
[Figure: Video: 166 Mbit/sec and Audio: 1.4 Mbit/sec → Compression → 1.4 Mbit/sec]
16
The need for compression
• MPEG-2 target
– Program stream (DVD)
1 program (video, multichannel audio, ...) → Compression → 3-9 Mbit/sec (variable bitrate)
(but higher quality than MPEG-1)
= motivation for the increase of the capacity of the CD (--> DVD)
n programs (video, multichannel audio, ...) → Compression → about 40 Mbit/sec (constant bitrate)
(DVB-Satellite & DVB-Cable)
17
The need for compression
• Compression extends the playing time of a given storage
device.
18
Principles of Compression
• Compression (or Source Coding) is achieved by
suppressing information:
– redundant information
– irrelevant information
19
Principles of Compression
• Suppression of irrelevant information
lossy compression (Perceptive Coding)
Example: bandwidth limitation, masking in audio
The original signal and the one obtained after encoding and decoding
are different but are perceived as identical
20
Principles of Compression
• Lossless vs. lossy data compression
– Source entropy H(X)
– Rate-Distortion function R(D) or D(R)
[Figure: rate-distortion curve; lossless methods sit at the source entropy H(S)]
22
Principles of Compression
• Reversible (lossless): data files (i.e.: V.42bis standard in
modems, zip files)
23
Principles of Compression
For a Gaussian source $N(0, \sigma^2)$: $D(R) = \sigma^2 2^{-2R}$
[Figure: bit rate versus distortion for simple and complex content, comparing Constant Bit Rate and Constant Quality operation]
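As a rough illustration of the Gaussian rate-distortion relation $D(R) = \sigma^2 2^{-2R}$, the sketch below evaluates the distortion for a few rates. The function names are illustrative and not taken from the slides.

```python
import math

def gaussian_distortion(rate_bits, variance=1.0):
    """Distortion D(R) = variance * 2**(-2R) for a memoryless Gaussian source."""
    return variance * 2.0 ** (-2.0 * rate_bits)

def gaussian_rate(distortion, variance=1.0):
    """Inverse relation R(D) = 0.5 * log2(variance / D), clipped at zero."""
    if distortion >= variance:
        return 0.0  # no bits are needed once the allowed distortion reaches the variance
    return 0.5 * math.log2(variance / distortion)

for rate in (1, 2, 4, 8):
    d = gaussian_distortion(rate)
    snr_db = 10 * math.log10(1.0 / d)
    print(f"R = {rate} bit/sample -> D = {d:.6f}, SNR = {snr_db:.1f} dB")
```

Each extra bit per sample divides the distortion by four, which is the familiar gain of about 6 dB per bit.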
24
Principles of Compression
• Constant Bit Rate systems –
CBR (G.711, G.722, G.729) are better suited for
connection-oriented services.
26
Principles of Compression
28
Principles of Compression
29
Principles of Compression
30
Principles of Compression
• Trade-off Complexity/Quality/Bit Rate
[Figure: codecs placed by complexity, quality and bitrate: speech coding, MPEG Layer 1, MPEG Layer 2, MPEG Layer 3, MPEG AAC, other technique]
32
Principles of Compression
Redundancies:
– Statistical redundancy
– Psychological redundancy (HVS)
33
Quality Measurements
• Objective
– Mean Square Error (MSE)
– Peak Signal-to-Noise-Ratio (PSNR)
– Measure the fidelity to original video
• Subjective
– Human Vision System (HVS) based
– Emphasize audiovisual quality rather than fidelity
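The two objective measures can be sketched as follows. This is a minimal example for 8-bit samples; the sample values and function names are made up for illustration.

```python
import math

def mse(original, decoded):
    """Mean square error between two equally sized lists of samples."""
    if len(original) != len(decoded):
        raise ValueError("signals must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)

def psnr(original, decoded, peak=255):
    """PSNR in dB; peak is the largest possible sample value (255 for 8 bits)."""
    error = mse(original, decoded)
    if error == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / error)

original = [52, 55, 61, 66, 70, 61, 64, 73]
decoded = [53, 55, 60, 66, 71, 61, 64, 72]
print(f"MSE = {mse(original, decoded)}, PSNR = {psnr(original, decoded):.2f} dB")
```

Both figures measure fidelity to the original only; two decoded pictures with the same PSNR can still look different to a viewer, which is why the subjective measures are listed separately.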
34
Quality Measurements
36
Quality Measurements
Speech Coding - Compression vs quality
G.711 (64 Kb/s): MOS 4.10
[Figure: MOS versus bit rate (Kb/s) per standard: 64 PCM (G.711), 32 ADPCM (G.726), 24 ADPCM (G.725)]
38
Audio & Video Basics
39
Audio Basics
• Analog signal sampled at constant rate
– telephone: 8,000 samples/sec
– CD music: 44,100 samples/sec
• Each sample quantized, i.e., rounded
– e.g., 2^8 = 256 possible quantized values
• Each quantized value represented by bits
– 8 bits for 256 values
– 16 bits for 65536 values
• Mono, stereo, or surround?
– 1, 2 or more channels
• Example: 8,000 mono samples/sec, 256 quantized values --> 64kbps
• Receiver converts it back to analog signal:
– some quality reduction
Example rates
• CD: 1.411Mbps
• MP3: 96, 128, 160kbps
• Internet telephony: 5.3 - 13kbps (G.723.1, G.729, and GSM – Global System for Mobile communication)
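The example rates for uncompressed audio follow directly from sampling rate, sample size and channel count. A minimal sketch (the function name is illustrative):

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate in bit/s of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels

telephone = pcm_bit_rate(8_000, 8)            # 8-bit mono telephony
cd = pcm_bit_rate(44_100, 16, channels=2)     # 16-bit stereo CD
dat = pcm_bit_rate(48_000, 16, channels=2)    # 16-bit stereo DAT

print(f"telephone: {telephone / 1e3:.0f} kbps")
print(f"CD: {cd / 1e6:.3f} Mbps")
print(f"DAT: {dat / 1e3:.0f} kbps")
```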
40
Audio Basics:
Speech Coding and compression
• 5 quality ranges (human ear sensitivity: 20Hz to 20kHz):
43
Video Basics
• Operation of analogue television: The image captured by the camera lens
is converted into three monochrome images obtained by applying filters of
the three fundamental (primary) colors –
R (Red), G (Green), B (Blue).
– All kinds of colors are produced by using different proportions of these primary
colors
• Additive Color Mixing on a black surface
• Subtractive Color Mixing on a white surface
– RGB signals thus obtained are available in some cameras, though it is unusual
to work with them
44
Video Basics: Digital Video & Pixels
• Digital video is a sequence of frames, each consisting
of a rectangular grid of picture elements or pixels.
– For good colour video, 8 bits are used per pixel for each of
the RGB colours, resulting in 24 bits per pixel.
46
Video Basics : Digital Video & Pixels
[Figure: image formation in a digital camera, on film, and in the eye]
Source: Digital Image Processing – Gonzalez, Woods. Prentice Hall
47
Video Basics: Sampling & Quantization
48
Video Basics: Scanning
• When an image (frame) appears on the retina of the human
eye, the image is retained for several milliseconds before
decaying.
49
Video Basics: Scanning
51
Spatial and Temporal Sampling of a Video Sequence
Source: H.264 and MPEG-4 Video Compression. Video Coding for next generation multimedia. I.E.G. Richardson. John Wiley & Sons, Ltd. 2003. Chapter 2.
53
Video Basics: Color Format
• RGB is not efficient since it uses equal bandwidth for each
color component.
55
Video Basics: Color Format
• The combination is performed such that:
– One of the new signals, Y, carries all the light (brightness) information of the
image; this signal is called luminance.
– The other two signals, called U and V, correspond to different combinations of
the three original signals, chosen so that they capture all the color information,
which is why these two signals are generically referred to as chrominance.
56
Color Formats Conversion
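$$Y = k_r R + k_g G + k_b B$$
$$C_r = R - Y, \quad C_g = G - Y, \quad C_b = B - Y, \quad C_r + C_g + C_b = \text{constant}$$
• kr, kg, kb are weighting factors
Since $k_g = 1 - k_r - k_b$:
$$Y = k_r R + (1 - k_r - k_b) G + k_b B$$
$$C_r = \frac{0.5}{1 - k_r}(R - Y), \qquad C_b = \frac{0.5}{1 - k_b}(B - Y)$$
Inverse conversion:
$$R = Y + \frac{1 - k_r}{0.5} C_r$$
$$G = Y - \frac{2 k_r (1 - k_r)}{1 - k_r - k_b} C_r - \frac{2 k_b (1 - k_b)}{1 - k_r - k_b} C_b$$
$$B = Y + \frac{1 - k_b}{0.5} C_b$$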
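The conversion between RGB and luminance/chrominance can be sketched as below. The weights kr = 0.299 and kb = 0.114 are an assumption here (the ITU-R BT.601 values); the slides leave kr and kb unspecified, and the function names are illustrative.

```python
KR, KB = 0.299, 0.114  # assumed ITU-R BT.601 weights; kg = 1 - kr - kb

def rgb_to_ycbcr(r, g, b, kr=KR, kb=KB):
    """Luminance Y plus scaled colour differences Cb and Cr."""
    y = kr * r + (1 - kb - kr) * g + kb * b
    cb = 0.5 / (1 - kb) * (b - y)
    cr = 0.5 / (1 - kr) * (r - y)
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr, kr=KR, kb=KB):
    """Inverse conversion; G is recovered from Y and both colour differences."""
    kg = 1 - kb - kr
    r = y + (1 - kr) / 0.5 * cr
    b = y + (1 - kb) / 0.5 * cb
    g = y - 2 * kb * (1 - kb) / kg * cb - 2 * kr * (1 - kr) / kg * cr
    return r, g, b

y, cb, cr = rgb_to_ycbcr(200, 120, 40)
print(f"Y = {y:.2f}, Cb = {cb:.2f}, Cr = {cr:.2f}")
print(ycbcr_to_rgb(y, cb, cr))
```

A grey pixel (R = G = B) gives Cb = Cr = 0, so all of its information sits in Y.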
58
Color Formats Conversion
59
Video Basics: Color Format
http://www.yorku.ca/eye/photopik.htm
61
Video Basics: Color Format
• (Y, Cr, Cb) may use different resolutions 4:n:m: The numbers
indicate the relative sampling rate of each component in the
horizontal direction.
63
Video Basics:
Chrominance Downsampling
• 4:4:4 sampling: the three components
have the same resolution (3n bits per
pixel)
– a sample of each component exists at
every pixel position.
– Preservation of the full fidelity of the
chrominance components.
64
Video Basics:
Chrominance Downsampling
• 4:1:1 sampling: Cb and Cr have the
same vertical resolution as Y, but
quarter the horizontal resolution (1.5n
bits per pixel).
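The bits-per-pixel figures quoted for each sampling format can be checked with a small sketch. The table of subsampling factors and the function name are illustrative; n is the number of bits per sample.

```python
# Chroma subsampling factors as (horizontal divisor, vertical divisor).
CHROMA_SUBSAMPLING = {
    "4:4:4": (1, 1),  # chroma at full resolution
    "4:2:2": (2, 1),  # half horizontal resolution
    "4:1:1": (4, 1),  # quarter horizontal resolution
    "4:2:0": (2, 2),  # half horizontal and half vertical resolution
}

def bits_per_pixel(sampling, n=8):
    """Average bits per pixel: one Y sample plus a share of the Cb and Cr samples."""
    horizontal, vertical = CHROMA_SUBSAMPLING[sampling]
    return n + 2 * n / (horizontal * vertical)

for name in CHROMA_SUBSAMPLING:
    print(name, bits_per_pixel(name))
```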
65
Video Basics: Spatial Resolution Formats
• CIF: Common Interchange (Intermediate) Format - Intermediate format used
in videoconferencing (communication between US & Europe)
[Figure: relative frame sizes of SQCIF, QCIF, CIF, SCIF, 16CIF 4:3 and 16CIF 16:9]
71
Video Basics: Spatial Resolution Formats
72
MPEG, what is it?
76
International Organizations
•ISO (1947): International Organization for Standardization;
•ISO/IEC JTC 1 (1987): Joint Technical Committee 1 of the ISO and the
IEC. It deals with all matters of information technology.
77
International Organizations (Cont’d)
• JPEG - ITU-T T.81, ISO/IEC IS 10918-1 : Joint Photographic Experts Group one of
two sub-groups of ISO/IEC Joint Technical Committee 1, Subcommittee 29,
Working Group 1 (ISO/IEC JTC 1/SC 29/WG 1) - titled as Coding of still pictures.
• MPEG: Moving Picture Experts Group (ISO/IEC JTC 1/SC 29/WG 11) - a working
group of ISO/IEC in charge of the development of standards for coded
representation of digital audio and video and related data.
• JVT: Joint Video Team - a group of video coding experts from ITU-T Study Group
16 (VCEG) and ISO/IEC JTC 1 SC 29 / WG 11 (MPEG), created to develop an
advanced video coding specification.
•Formed in 2001, the JVT’s main result has been ITU-T Rec. H.264 | ISO/IEC 14496-10,
commonly referred to as H.264/MPEG-4-AVC, H.264/AVC, or MPEG-4 Part 10 AVC.
78
MPEG: Moving Picture Experts Group
• Moving Picture Expert Group established in 1988 for the
development of digital video
– Still active (MPEG-21 is currently in development)
79
MPEG: Moving Picture Experts Group
80
List of MPEG standards
• MPEG-1 (ISO 11172)
The standard on which such products as Video CD and MP3 are based
(approved in Nov. 1992)
81
List of MPEG standards (Cont’d)
• MPEG-2 (ISO 13818)
The standard on which such products as Digital Television set top boxes and DVD
are based (approved in 1994, 1996);
– Upward-compatible extension of MPEG-1
– Broadcast oriented (interlaced video)
– Multiple resolutions standardized, from SIF (compatible with MPEG-1) up to
high definition formats, for DVDs and so on.
– Intended for studio-quality audio and video. Broadcast quality HDTV also.
– Various bit rates 4-100Mb/s.(CBR & VBR)
– Useful for all types of applications (business, entertainment, etc.).
82
List of MPEG standards (cont’d)
• MPEG-4 (ISO 14496)
The standard for multimedia for the fixed and mobile web (Version 1 -
approved in Oct. 1998, Version 2 - approved in Dec. 1999, Versions 3, 4, 5)
– Computer Graphics Applications;
83
List of MPEG standards (cont’d)
• MPEG-4 (ISO 14496) …
84
List of MPEG standards (cont’d)
• MPEG-7 (ISO 15938) The standard for description and search of audio and
visual content (approved in Jul. 2001);
85
List of MPEG standards (cont’d)
• MPEG-A (23000) – Application-specific formats, integrating multiple MPEG technologies
• MPEG-V (23005) – Context and media control - interchange with virtual worlds
86
List of ITU-T Standards
• H.261 (1983-1990)
– A standard for video telephony and video conferencing
over PSTN (Public Switched Telephone Network) and wireless
networks.
– Uses either the CIF or QCIF format.
– Uses p x 64kbps where p can be between 1 and 30.
– Originally designed for ISDN usage (Integrated Services Digital
Network).
– Still in use
• Low complexity, low latency
• Mostly as a backward-compatibility feature
• Overtaken by H.263
87
List of ITU-T Standards (cont’d)
• H.263, H.263+, H.263++ (1993-1999)
– Based on H.261 but offers significant improvement on
coding efficiency, employs advanced coding options and
lower resolutions to preserve quality over lower bit rates
channels.
– Uses either the QCIF or S-QCIF formats.
– Uses less than 64kbps.
– PSTN and mobile network: 10 to 24kbps
– Adopted by several videophone terminal standards:
H.324 (PSTN), H.320 (ISDN), H.310 (B-ISDN)
• H.264/AVC (1999-2003)
– Double the coding efficiency in comparison to any other
existing video coding standards
88
Chronological Table of Video Coding Standards
• Conclusion
94
Audio Compression principles
95
Speech Coding and Compression
• Source Coding
– Speech modeling and parameters transmission of
the model (G728, G729, …)
• Hybrid Coding
96
Audio compression
98
Audio Compression
• Sub-band Coding
– Techniques used in Layer I and II of MPEG audio are based
on sub-band coding.
• Transform Coding
– DCT is used in Layer III of MPEG audio.
• Predictive Coding
– Frequency prediction is used in AC-3 and MPEG AAC.
100
Common Audio Formats and Standards
• MPEG Audio
– Layer I
– Layer II
– Layer III
104
Audio compression
– Threshold of audibility
• Compress the bit rate without
affecting the quality perceived
– Frequency masking
by the human ears (based on the
imperfection of human ears)
– Critical bands
112
Audio compression
• Principle 1: Threshold of audibility
Not all frequency components need to be encoded with the
same resolution: $N_{\text{bit}}(f) = (\text{signal}/\text{threshold})_{\text{dB}} / 6$
http://www.audiodesignline.com
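The bit-allocation rule for Principle 1 can be sketched as follows, using roughly 6 dB of signal-to-noise ratio per bit. Rounding up to a whole number of bits and returning zero below the threshold are assumptions of this sketch, not statements from the slides.

```python
import math

DB_PER_BIT = 6.0  # each quantizer bit buys about 6 dB of signal-to-noise ratio

def bits_needed(signal_db, threshold_db):
    """Bits for one frequency component: (signal/threshold) in dB divided by 6."""
    margin_db = signal_db - threshold_db
    if margin_db <= 0:
        return 0  # component lies below the audibility threshold
    return math.ceil(margin_db / DB_PER_BIT)

print(bits_needed(60, 12))  # component 48 dB above the threshold
print(bits_needed(10, 20))  # inaudible component
```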
113
Audio compression
• Principle 2: Frequency masking
Analysis of the incoming signal
http://www.audiodesignline.com
114
Audio compression
• Principle 3: Critical bands
– Human ear may be modelled as a collection of narrow band filters
– Bandwidth of these filters = critical band
– critical band width:
< 100 Hz for the lowest audible frequencies
about 4 kHz for the highest audible frequencies
– The human ear cannot distinguish between two sounds having two different
frequencies in a critical band.
Example : when we hear 50 & 60 Hz at the same time we cannot distinguish
them.
– Consequence:
The noise masking threshold depends solely on the signal energy within a limited
bandwidth domain.
The loudest sound is taken as the representative of the critical band.
The signal must be analysed at 100Hz resolution at low frequencies.
115
Audio compression
• Principle 4: Temporal masking
Temporal masking occurs when a sound raises the audibility
threshold for a brief interval preceding and following the
sound. It guides the selection of the frame duration for frequency analysis
and encoding.
http://www.audiodesignline.com
116
The MPEG encoder
http://www.audiodesignline.com
117
Audio features in MPEG
• MPEG1 :
– Mono/stereo/dual/joint stereo (Possibility Dolby surround)
– Sampling frequencies : 32, 44.1 & 48 kHz
– 3 layers : trade-off complexity/delay versus coding
efficiency of compression
– Various bit rate : trade-off quality versus bit rate
• MPEG2 :
– 5.1 channels
– Sampling frequencies extended to 16, 22.05 & 24 kHz
122
Layer I coding
• The Layer I coding scheme provides a 4:1 compression.
123
Layer II Coding
• The Layer II coder provides a higher compression rate by
making some relatively minor modifications to the Layer I
coding scheme.
130
Layer III Coding - MP3
• One of the problems with the Layer I and Layer II coding
schemes was that with the 32-band decomposition, the
bandwidth of the subbands at lower frequencies is
significantly larger than the critical bands.
131
Layer III Coding - MP3
• Layer III offers almost CD quality with less than 2 bits/sample (enables
transferring music files via Internet over 28.8kbps modems)
132
Layer III Coding - MP3
• First the 32-band subband decomposition used in Layer I and
Layer II is employed.
• The Layer III algorithm specifies two sizes for the MDCT, 6 or
18. This means that the output of each subband can be
decomposed into 18 frequency coefficients or 6 frequency
coefficients.
133
Advanced Audio Coding
134
Advanced Audio Coding
135
Video Compression principles
136
Video Compression
• Two applied techniques for video compression:
137
Video compression
• Result
– 4:2:0 SIF resolution : 30 Mbps
(= 25images/sec * 288lines * 352pixels * 1.5(lum & chrom) * 8bits)
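The raw bit rate above is a straightforward product, sketched here with an illustrative function name. The factor 1.5 is the number of samples per pixel for 4:2:0 (one luminance sample plus half a chrominance sample on average).

```python
def raw_video_bit_rate(width, height, fps, bits_per_sample=8, samples_per_pixel=1.5):
    """Uncompressed video bit rate in bit/s."""
    return width * height * samples_per_pixel * bits_per_sample * fps

sif = raw_video_bit_rate(352, 288, 25)
print(f"4:2:0 SIF at 25 images/sec: {sif / 1e6:.2f} Mbps")
```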
138
Image Codec (e.g. JPEG)
[Figure: image codec: image → model → entropy coding → transmit/store → decoder]
140
Discrete Cosine Transform
DCT
141
Discrete Cosine Transform
• Any 8x8 block of pixels can be
represented as a sum of 64 basis
patterns (black and white patterns)
142
Quantize and zig-zag scanning
Quantize Zigzag
143
Video compression
• Spatial redundancy reduction (DCT example)
Pixel block:
139 144 149 153 155 155 155 155
144 151 153 156 159 156 156 156
150 155 160 163 158 156 156 156
159 161 162 160 160 159 159 159
159 160 161 162 162 155 155 155
161 161 161 161 160 157 157 157
162 162 161 163 162 157 157 157
162 162 161 161 163 158 158 158
DCT coefficients:
1260 -1 -12 -5 2 -2 -3 1
-23 -17 -6 -3 -3 0 0 -1
-11 -9 -2 2 0 -1 -1 0
-7 -2 0 1 1 0 0 0
-1 -1 1 2 0 -1 1 1
2 0 2 0 -1 1 1 -1
-1 0 0 -1 0 2 1 -1
-3 2 -4 -2 2 1 -1 0
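The 8x8 transform in this example can be reproduced with a plain-Python sketch of the orthonormal 2D DCT-II, computed as C·X·Cᵀ. The helper names are illustrative, and a real codec would use a fast algorithm rather than full matrix products.

```python
import math

BLOCK = [
    [139, 144, 149, 153, 155, 155, 155, 155],
    [144, 151, 153, 156, 159, 156, 156, 156],
    [150, 155, 160, 163, 158, 156, 156, 156],
    [159, 161, 162, 160, 160, 159, 159, 159],
    [159, 160, 161, 162, 162, 155, 155, 155],
    [161, 161, 161, 161, 160, 157, 157, 157],
    [162, 162, 161, 163, 162, 157, 157, 157],
    [162, 162, 161, 161, 163, 158, 158, 158],
]

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row u holds the u-th cosine basis vector."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    """Forward 2D DCT: transform the columns, then the rows."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2D DCT; exact because the basis matrix is orthonormal."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

coeffs = dct2(BLOCK)
for row in coeffs:
    print(" ".join(f"{round(value):5d}" for value in row))
```

Almost all of the energy ends up in the top-left (DC) coefficient, which is what makes the later quantization and zigzag steps effective.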
144
Run-Length Encoding
RLE
145
Variable-Length Encoding
VLC
146
Image decoding
• Reverse the stages to recover the image
147
Video coding
• Moving images contain significant temporal redundancy
– Successive frames are very similar
148
Video Encoder
• Video frames
[Figure: video encoder: video frames → motion model (motion estimation, motion compensation) → DCT → Quantize → Zigzag → RLE → VLC → Buffer; motion vectors and headers are added to the output]
Motion Estimation
• Process 16x16 luminance samples at a time (“macroblock”)
151
Motion Estimation
152
Motion Compensation
• Subtract the reference area from the current macroblock
– Difference macroblock
153
Motion Compensation
– In Motion Estimation (ME), each macroblock (MB) of
the Target P-frame is assigned a best matching MB
from the previously coded I or P frame - prediction.
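Motion estimation and the difference macroblock can be sketched with a full-search block matcher that uses the sum of absolute differences (SAD) as its cost. The SAD criterion, the function names and the small synthetic frames are assumptions made for illustration; the slides do not prescribe a search method.

```python
def sad(current, cx, cy, reference, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(current[cy + j][cx + i] - reference[ry + j][rx + i])
               for j in range(size) for i in range(size))

def best_match(current, reference, x, y, size=16, search=15):
    """Full search around (x, y); returns (dx, dy, cost) of the best reference block."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(current, x, y, reference, rx, ry, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best

def difference_block(current, reference, x, y, dx, dy, size=16):
    """Current block minus its motion-compensated prediction."""
    return [[current[y + j][x + i] - reference[y + dy + j][x + dx + i]
             for i in range(size)] for j in range(size)]

# Synthetic 12x12 frames: the current frame is the reference moved
# 2 pixels right and 1 pixel down.
reference = [[r * 16 + c for c in range(12)] for r in range(12)]
current = [[reference[r - 1][c - 2] if r >= 1 and c >= 2 else 0
            for c in range(12)] for r in range(12)]

dx, dy, cost = best_match(current, reference, 4, 4, size=4, search=3)
residual = difference_block(current, reference, 4, 4, dx, dy, size=4)
print("motion vector:", (dx, dy), "SAD:", cost)
```

With a perfect match the difference block is all zeros, so only the motion vector needs to be sent for that block.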
154
Motion Compensation
• MPEG introduces a third frame type — B-frames, and its
accompanying bi-directional motion compensation.
155
B-frame Coding Based on Bidirectional Motion Compensation.
156
Motion Compensation
158
I Frames (Intra)
Intra frames are coded as self-contained,
without reference to other frames
18 KBytes I
18 KBytes I
18 KBytes I
18 KBytes I
18 KBytes I
25 frames per second
72 x 1024 x 8 / 0.16 = 3.7 Mbps (four 18-KByte frames every 0.16 s)
159
P frames (Predictive)
Predictive frames are encoded using
motion compensation based on
previous I or P frame
18 KB I
6 KB P
6 KB P
18 KB I
6 KB P
6 KB P
18 KB I
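Using the frame sizes shown on these slides (18 KBytes per I frame, 6 KBytes per P frame), the saving from predictive frames can be estimated with a short sketch. The 25 frames per second rate comes from the I-frame slide; the function name is illustrative.

```python
def stream_bit_rate(frame_sizes_kbytes, fps=25):
    """Average bit rate in bit/s of a repeating frame pattern (1 KByte = 1024 bytes)."""
    average_bytes = sum(frame_sizes_kbytes) * 1024 / len(frame_sizes_kbytes)
    return average_bytes * 8 * fps

intra_only = stream_bit_rate([18])        # I I I I ...
with_p = stream_bit_rate([18, 6, 6])      # I P P I P P ...

print(f"I frames only: {intra_only / 1e6:.2f} Mbps")
print(f"I P P pattern: {with_p / 1e6:.2f} Mbps")
```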
Bi-directional prediction
Order of presentation: 0 1 2 3 4 5 6 7 8 9
I B B P B B P B B P I B
Prediction
Order of transmission: 0 3 1 2 6 4 5 9 7 8
I P B B P B B P B B I P
163
Synchronisation - Getting data on time
• Synchronisation in the multimedia context refers to the
mechanism that ensures a temporal consistent presentation
of the audio-visual information to the user
164
Streams
• Idea of continuity (pipelining): Carry time information for
clock recovery
165
Requirements for stream transport
• Data information
BER (Bit Error Rate) requirement
No repetition of frames possible → FEC (Forward
Error Correction)
166
MPEG Versions
• MPEG-1
– For video storage in CD-ROM & transmission over T-1 lines (1.5Mbps)
• MPEG-2
– Many options: 352x240 pixel; 720x480 pixel; 1440x1152 pixel;
1920x1080 pixel
– Many profiles (set of coding tools & parameters)
• Main Profile
– I, P & B frames; 720x480 conventional TV
– Very good quality @ 4-6 Mbps
• MPEG-4
– <64kbps to 4Mbps
– Designed to enable viewing, access & manipulation of objects, not only
pixels
– For digital TV, streaming video, mobile multimedia & games
168
MPEG Coding Standard
• Moving Picture Experts Group (MPEG)
– Video and audio compression & multiplexing
– Video display controls
• Fast forward, reverse, random access
• Elements of encoding
– Intra- and inter-frame coding using DCT
– Bidirectional motion compensation
– Group of Picture structure
– Scalability options
169
Video H.26x
• ITU-T video standards for video conferencing: low bit rate,
little motion (less action than in movies).
– H.261: Developed in the late 1980s for ISDN (constant bit rate).
– H.263, H.263+, H.264: more modern and efficient.
170
Video H.26x (Cont’d)
• Subsampling 4:1:1
• Resolutions:
– CIF (Common Interchange Format): 352 x 288
– QCIF (Quarter CIF): 176 x 144
– SCIF (Super CIF): 704 x 576
171
The MPEG model
[Figure: the MPEG model]
Captured signals: audio signal → audio encoder; video signal → video encoder
Both encoders → multiplexer → transmission channel, digital storage medium or network → demultiplexer
Presented signals: audio decoder → audio signal; video decoder → video signal
172
Components of the MPEG standard
• The MPEG standard is composed of 3 main parts :
– Audio : Specifies the compression of audio signals
– Video : Specifies the compression of video signals
– System : specifies how the compressed audio and video signals are
combined in the multiplexed stream (program stream or transport
stream).
174
MPEG in a communication context
• A simple view of MPEG in the communication context
[Figure: MPEG in the communication context]
Audio and video sources → video encoder and audio encoder → ES (Elementary Streams)
ES → multiplexing → TS (Transport Stream, n programs) → adaptation to the channel → cable or satellite
ES → multiplexing → PS (Program Stream, 1 program) → adaptation to the channel → disc
179
JPEG Coding Standard
• Key Components:
– Transform:
• 8×8 DCT
• boundary padding
– Quantization:
• uniform quantization
• DC/AC coefficients
– Coding:
• Zigzag scan
• run length/Huffman coding
180
JPEG Baseline Coder
Tour Example
183 160 94 153 194 163 132 165
183 153 116 176 187 166 130 169
179 168 171 182 179 170 131 167
177 177 179 177 179 165 131 167
178 178 179 176 182 164 130 171
179 180 180 179 183 169 132 169
179 179 180 182 183 170 129 173
180 179 181 179 181 170 130 169
181
Step 1: Transform
• DC level shifting
• 2D DCT
Level-shifted block (sample - 128):
55 32 -34 25 66 35 4 37
55 25 -12 48 59 38 2 41
51 40 43 54 51 42 3 39
49 49 51 49 51 37 3 39
50 50 51 48 54 36 2 43
51 52 52 51 55 41 4 41
51 51 52 54 55 42 1 45
52 51 53 51 53 42 2 41
DCT coefficients (minus signs are shown only where the zigzag sequence of Step 3 confirms them):
313 56 -27 18 78 -60 27 27
-38 -27 13 44 32 1 24 10
-20 -17 10 33 21 6 16 9
-10 8 9 17 9 10 13 1
6 1 6 4 3 7 5 5
2 3 0 3 7 4 0 3
4 4 1 2 9 0 2 4
3 1 0 4 2 1 3 1
182
Step 2: Quantization
Q-table:
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
Why increase from top-left to bottom-right?
DCT coefficients:
313 56 -27 18 78 -60 27 27
-38 -27 13 44 32 1 24 10
-20 -17 10 33 21 6 16 9
-10 8 9 17 9 10 13 1
6 1 6 4 3 7 5 5
2 3 0 3 7 4 0 3
4 4 1 2 9 0 2 4
3 1 0 4 2 1 3 1
Quantized coefficients (Q):
20 5 -3 1 3 -2 1 0
-3 -2 1 2 1 0 0 0
-1 -1 1 1 1 0 0 0
-1 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
183
Step 3: Entropy Coding
20 5 -3 1 3 -2 1 0
-3 -2 1 2 1 0 0 0
-1 -1 1 1 1 0 0 0
-1 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Zigzag Scan
(20,5,-3,-1,-2,-3,1,1,-1,-1,
0,0,1,2,3,-2,1,1,0,0,0,0,0,
0,1,1,0,1,EOB)
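The quantization and zigzag steps of the tour example can be sketched as below. The quantized block is written with the signs implied by the zigzag sequence above; rounding to the nearest integer (halves away from zero) is an assumption of this sketch, and the function names are illustrative.

```python
QUANTIZED = [
    [20, 5, -3, 1, 3, -2, 1, 0],
    [-3, -2, 1, 2, 1, 0, 0, 0],
    [-1, -1, 1, 1, 1, 0, 0, 0],
    [-1, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

def quantize(coefficient, step):
    """Uniform quantization: divide by the Q-table step and round to nearest."""
    level = int(abs(coefficient) / step + 0.5)
    return level if coefficient >= 0 else -level

def zigzag(block):
    """Read a square block along anti-diagonals, alternating direction."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run from bottom-left to top-right
        out.extend(block[r][s - r] for r in rows)
    return out

def zigzag_with_eob(block):
    """Zigzag sequence with the trailing run of zeros replaced by an EOB marker."""
    sequence = zigzag(block)
    while sequence and sequence[-1] == 0:
        sequence.pop()
    return sequence + ["EOB"]

print(quantize(313, 16), quantize(-60, 40))
print(zigzag_with_eob(QUANTIZED))
```

Because the Q-table steps grow towards the bottom-right, the high-frequency coefficients quantize to zero and the zigzag scan gathers them into one long run that the EOB symbol replaces.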
185
H.261 & MPEG1
186
H.261 Coding Standard
• Background:
– Facilitate video conferencing and videophone service over
ISDN
– p×64 kbps
• p=1: videophone;
• p>5: videoconference;
• p=30: VHS-quality;
– Basis of MPEG-1 and MPEG-2
• Features
– Maximum coding delay of 150ms
– Amenable to low-cost VLSI implementation
187
Input Image Formats
CIF QCIF
188
Video Multiplex
• It defines a data structure so that a decoder can
interpret the received bit stream without any
ambiguity
189
Picture and GOB Layers
190
Macroblock and Block Layers
[Figure: a macroblock holds four luminance blocks (Y1, Y2, Y3, Y4) and two chrominance blocks (Cr, Cb)]
191
Compression Modes
• Intra Mode
– Similar to JPEG coding
– Support two compression modes
• Inter Mode
– ME is not specified (MC is optional)
– Usually, 16-by-16 BMA, integer-pel accuracy,
search range [-15,15]
– Support various compression modes
192
H.261 Encoder
[Figure: H.261 encoder block diagram: Intra/Inter switch, 8x8 block DCT, quantizer Q, Huffman VLC, inverse quantizer Q-1, I-DCT, filter, frame memory, motion estimation (motion vector), fixed-length coding, CRC error control, p x 64 output]
• The MPEG-1 standard is also referred to as ISO/IEC 11172. It has five parts:
– 11172-1 Systems,
– 11172-2 Video,
– 11172-3 Audio,
– 11172-4 Conformance, and
– 11172-5 Software.
198
Hierarchical Data Structure
• Sequences are formed by Group Of Pictures (GOP)
200
Hierarchical Data Structure
202
Slices in an MPEG-1 Picture.
203
Video MPEG (MPEG-1)
• Subsampling 4:2:0 (25% more savings than 4:2:2)
204
MPEG-1 Video
• Typical Sequence (360ms): I1 B2 B3 P4 B5 B6 P7 B8 B9 I10
• Order of encoding / decoding : I1 P4 B2 B3 P7 B5 B6 I10 B8 B9
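The reordering from display order to encoding order can be sketched as follows: each B frame is held back until the next I or P frame it depends on has been emitted. The frame labels follow the slide; the function name is illustrative.

```python
def coding_order(display_order):
    """Reorder frames so every B frame follows both of its reference frames."""
    out, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)   # wait for the next I or P frame
        else:
            out.append(frame)         # reference frame goes first
            out.extend(pending_b)     # then the B frames that precede it in display
            pending_b.clear()
    out.extend(pending_b)             # trailing B frames with no later reference
    return out

display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7", "B8", "B9", "I10"]
print(" ".join(coding_order(display)))
```

The decoder performs the inverse reordering, which is one reason B frames add delay and buffering.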
210
Audio MPEG-1
• Mono or stereo, sampled at 32, 44.1 (CD) or 48 (DAT) kHz. At reduced
bit rates it is preferable to sample at 32 kHz.
• Lossy, asymmetric psychoacoustic compression.
• From 32 to 448 kbps per audio channel
• Three layers in ascending order of complexity/quality:
– Layer I: good quality at 192-256 kbps per channel; not used in practice
– Layer II: CD quality at 96-128 kbps per channel
– Layer III: CD quality at 64 kbps per channel
• Each layer introduces new algorithms and includes those of the
layers before it.
• Layer II is used in DAB (Digital Audio Broadcast); Layer III is MP3
214
MPEG-1System
• Responsible for ensuring the synchronization between
audio and video through a system of timestamps
based on a 90kHz clock.
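Timing values on a 90kHz clock can be sketched as below. The helper names are illustrative, and the example assumes a frame rate that divides the clock evenly (25 frames per second gives 3600 ticks per frame).

```python
SYSTEM_CLOCK_HZ = 90_000  # clock rate stated on the slide

def frame_timestamps(frame_count, fps=25):
    """Presentation time of each frame in 90 kHz ticks."""
    if SYSTEM_CLOCK_HZ % fps != 0:
        raise ValueError("this sketch only handles frame rates that divide 90 kHz")
    ticks_per_frame = SYSTEM_CLOCK_HZ // fps
    return [n * ticks_per_frame for n in range(frame_count)]

def ticks_to_seconds(ticks):
    return ticks / SYSTEM_CLOCK_HZ

stamps = frame_timestamps(4)
print(stamps, [ticks_to_seconds(t) for t in stamps])
```

Audio and video units stamped with values from the same clock can be presented together by comparing their tick counts.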
216
Synchronization of audio and video MPEG
217
Prototypical Decoder
ISO/IEC 11172
219
Major Differences from H.261
225
H.263/H.263+ & MPEG2
226
Video Codecs: H.263
• Frame-based coding
• Low Bit rate Coding:
– < 64 kbps (typical)
• Coding Control
– Intra/Inter switch
230
Advanced Coding Modes in H.263
I B P B P …
231
H.263+
• Advanced intra coding mode
• Deblocking filter mode
• Slice structure mode
• Supplemental enhancement information mode
• Temporal, SNR and Spatial scalability mode
• Reference picture resampling mode
• Reduced resolution update mode
235
MPEG-2
• MPEG-2: For higher quality video at a bit-rate of more than 4
Mbps.
244
Video MPEG-2
• Compatible extension of MPEG-1
247
Bit rates of Levels and Profiles MPEG-2
Profiles | Simple | Main | SNR Scalability | Spatial Scalability | High | 4:2:2 (Studio)
Subsampling | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0/2 | 4:2:2
Levels:
High, 1920x1152 (HDTV 16:9) | - | 80Mbps | - | - | 100Mbps | -
High-1440, 1440x1152 (HDTV 4:3) | - | 60Mbps | - | 60Mbps | 80Mbps | -
The table shows the peak bit rate for each combination of profile and level.
248
Five Modes of Predictions
• MPEG-2 defines Frame Prediction and Field Prediction as well
as five prediction modes:
249
Five Modes of Predictions
3. Field Prediction for Frame-pictures:
The top-field and bottom-field of a Frame-picture are treated
separately. Each 16×16 macroblock (MB) from the target Frame-
picture is split into two 16×8 parts, each coming from one field. Field
prediction is carried out for these 16×8 parts.
This mode is good for a finer MC when motion is rapid and irregular.
250
Five Modes of Predictions
5. Dual-Prime for P-pictures:
First, Field prediction from each previous field with the same parity
(top or bottom) is made. Each motion vector mv is then used to derive
a calculated motion vector cv in the field with the opposite parity
taking into account the temporal scaling and vertical shift between
lines in the top and bottom fields. For each MB the pair mv and cv
yields two preliminary predictions. Their prediction errors are
averaged and used as the final prediction error.
This is the only mode that can be used for either Frame-pictures or
Field-pictures.
251
Supporting Interlaced Video
• MPEG-2 must support interlaced video as well since this is one of
the options for digital broadcast TV and HDTV.
252
Audio MPEG-2
• Algorithms:
– Version backward compatible with MPEG-1 Layers I, II and III
– Improved compression system: Advanced Audio Coding (AAC).
Comparable quality to MPEG-1 Layer III at 50-70% of the bit rate. Not
compatible with MPEG-1.
• Channels:
– Stereo version compatible with MPEG-1
• Independent (each channel coded separately)
• Joint (exploits redundancy between channels)
– Support for multi-channel (languages) and 5.1 (5 surround channels)
259
MPEG-2 Scalabilities
• The MPEG-2 scalable coding: A base layer and one or more enhancement
layers can be defined — also known as layered coding.
261
MPEG-2 Scalabilities (Cont’d)
• MPEG-2 supports the following scalabilities:
5. Data Partitioning — quantized DCT coefficients are split into partitions (Separate
headers and payloads apart).
262
Non-Scalable
[Figure: a separately coded bitstream for each of Decoder 1, Decoder 2, Decoder 3 and Decoder 4]
265
PSNR Scalability (Quality)
[Figure: example bitstream]
272
Hybrid Scalability
• Any two of the above three scalabilities can be combined
to form hybrid scalability:
1. Spatial and Temporal Hybrid Scalability.
2. SNR and Spatial Hybrid Scalability.
3. SNR and Temporal Hybrid Scalability.
276
Data Partitioning
• The Base partition contains lower-frequency DCT coefficients,
enhancement partition contains high-frequency DCT
coefficients.
277
Major Differences from MPEG-1
• Better resilience to bit-errors: In addition to Program Stream, a
Transport Stream is added to MPEG-2 bit streams.
• More restricted slice structure: MPEG-2 slices must start and end in
the same macroblock row. In other words, the left edge of a picture
always starts a new slice and the longest slice in MPEG-2 can have
only one row of macroblocks.
278
Major Differences from MPEG-1 (Cont’d)
279
Other Improvements
MPEG-I MPEG-II
280
Videoconference
• Interactive communication through audio, video and
data sharing
• It can be:
– Point to point
– Point to multipoint
– Multipoint to multipoint
282
Requirements / Features of the
videoconference
• Compression / Decompression in real time.
• Mobility disabled.
283
Videoconference Standards
• Videoconferencing systems have been standardized by the
ITU-T (International Telecommunications Union -
Telecommunications sector) in the standards of the series H
(multimedia and audiovisual systems)
284
H.32x Standards
Standard | Physical environment | Service Type | Year approval | Streaming a/v
H.320 | ISDN | Circuit | 1990 | 128 to 384 Kb/s
H.321 | ATM | Circuit | - | -
H.322 | IsoEthernet | TDM | - | -
H.323 | Ethernet | Packet | 1996 | 14.4 - 512 Kb/s
H.324 | analog Modem | Circuit | - | -
The H.32x recommendations are umbrella standards. Each builds on a set of earlier standards to
specify all the services needed in a videoconference,
e.g., G.711 audio coding
H.320 Standard
286
H.323 Standard
• Packet-based multimedia communications systems
287
H.320 & H.323 Standards
ISDN IP
288
H.320 & H.323 Standards
Function | H.323 | H.320
Control: Call Control | H.225.0 | Q.931
Control: System Control | H.245 | H.242
Multiplexing | H.225.0 | H.221
Media: Audio | G.711, G.722, G.723.1, G.728 | G.711, G.722, G.728
Media: Video | H.261, H.263 | H.261, H.263
Data | T.120 | T.120
289
H.32x audio Formats
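Standard | Source rate (Kb/s) | Compression ratio | Coded rate (Kb/s)
G.711 | 64 | 1:1 | 64
G.722 | 224 | 3.5-4.6:1 | 48-64
G.723.1 | 64 | 10:1 | 6.4
G.728 | 64 | 4:1 | 16
G.729 | 64 | 8:1 | 8
MPEG | 706 | 3-11:1 | 64-256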
290
Some Digital Audio Formats
Format | Sampling Freq. (KHz) | # Channels | Capacity per Channel (Kb/s) | Application
PCM (G.711) | 8 | 1 | 64 | Telephony
295
Digital Video Formats
Video Format | Y Size | Color Sampling | Frame Rate (Hz) | Raw Data Rate (Mbps)
296
Compressed video standard resolutions
MPEG-4
MPEG-1
MPEG-2 Low Principal High 1440 High
297
Video compression formats
298
Video compression formats Bit rates
299
Video compression formats
Type Method Format Original Compressed
300
References
• Yun Q. Shi, Huifang Sun, 2008. Image and Video Compression for Multimedia Engineering.
Fundamentals, Algorithms, and Standards. CRC Press.
• Gonzalez, Woods, 2008. Digital Image Processing. Prentice Hall.
• Jae-Beom Lee, Hari Kalva, 2008. The VC-1 and H.264 Video Compression Standards for
Broadband Video Services. Springer.
• H.R. Wu & K.R. Rao, 2006. Digital Video Image Quality and Perceptual Coding. Taylor & Francis
Group, LLC.
• Khalid Sayood, 2005. An introduction to data compression. Morgan Kaufmann Publishers.
• I.E.G. Richardson, 2003. H.264 and MPEG-4 Video Compression. Video Coding for next
generation multimedia. John Wiley & Sons, Ltd.
• Richardson, 2002. Video Codec Design. John Wiley & Sons.
• John Watkinson, 2001. The MPEG Handbook MPEG1, MPEG2, MPEG4. Focal Press.
• Ghanbari, 1999. Video coding: an introduction to standard codecs. IEE Press.
• Riley and Richardson, 1997. Digital Video Communications. pub. Artech House.
• Bhaskaran V, Konstantinides, 1996. Image and video compression standards – algorithms and
architectures. Kluwer academic publishers.
• Netravali, A N and Haskell, B G, 1995. Digital pictures: Representation, Compression and
Standards. 2nd Edition, Plenum Press.
304
References
• www.chiariglione.org/mpeg/
• http://www.mpeg.org
• http://jura1.eng.rgu.ac.uk/ (Digital Video pages)
• http://www.vcodex.com
305