
(IJCNS) International Journal of Computer and Network Security, 1

Vol. 2, No. 4, April 2010

RD-Optimisation analysis for H.264/AVC scalable video coding
Sinzobakwira Issa1, Abdi Risaq M. Jama2 and Othman Omar Khalifa3
1
Olympia College, School of Engineering
Persiaran Raja Chulan, 50200 Kuala Lumpur, Malaysia
Issa10issa@gmail.com
2, 3
International Islamic University Malaysia, Department of Electrical and Computer Engineering
Jalan Gombak, Box: 53100 Kuala Lumpur, Malaysia

Abstract: The growth of multimedia applications and distribution has greatly expanded video transmission over heterogeneous media and interactive delivery platforms with dedicated content requirements. Conventional video coding systems encode video content at given bitrates adapted to a specific function or application. As a result, conventional video coding does not meet the fundamental requirements of state-of-the-art flexible digital media applications. Scalable video coding has emerged as a new approach that is able to satisfy these requirements. In this work, a multi-user scenario was considered in order to obtain optimum performance across multiple streams. A rate-distortion optimized video frame dropping strategy, which can be applied on active network nodes during periods of high traffic intensity, was developed. Scalability is introduced here to provide highly flexible coding and decoding systems. A base layer able to display the original content at a suitable quality was considered, together with the improvement of video quality.

Keywords: Bitrates, PSNR, bandwidth, multi-user scenario and RDO.

1. Introduction

Over the past few decades, starting in the early nineties, remarkable progress has been achieved in the field of video compression. A great deal of effort has been, and still is being, devoted to compressing data, storing it on digital media and distributing it over the web.
It is important to first consider a monochrome digital video sequence, which is a set of individual pictures called frames occurring at predetermined time increments. Each frame can be regarded as a two-dimensional light intensity function f(x, y), where x and y denote spatial coordinates and the value of f at any point (x, y) is proportional to the brightness of the frame, or the gray level, at that point. The standard speed at which these frames are displayed is 30 frames per second.
This representation is called the canonical representation. Its drawback is that it needs very large amounts of memory, which makes it impractical to store, to share on the web or to transmit over a digital channel. A simple example illustrates the scale of the problem.
Consider a 100-minute movie displayed at 30 frames per second with a frame size of 640x480 pixels, each pixel taking 3 bytes of memory. Each second of the movie then requires about 27 MB of memory, so the entire movie needs roughly 162 GB. If this movie were stored on DVDs, then given the current DVD capacity of 4.7 GB, it would require roughly 35 DVDs. Therefore, video needs to be compressed considerably for efficient storage and sharing over the web [1].
Fortunately, video data contains many redundancies that can be eliminated, yielding a reduction in file size, that is, compression.

2. H.264/AVC Scalable Video Coding

2.1 Basic H.264/AVC structure
The H.264/AVC standard has a range of coding tools contributing to its high compression performance, flexibility and robustness. However, the performance improvements come at the cost of significantly higher computational complexity. Therefore, encoder implementations should use the available coding tools effectively to achieve the desired compression performance with the available processing resources.
H.264/AVC is an extremely scalable video codec, delivering excellent quality across the entire bandwidth spectrum, from high definition television to video conferencing and 3G mobile multimedia. Its important differences from earlier standards can be summarized as follows.
• Enhanced motion prediction capability
• Use of a small block-size exact-match transform
• Adaptive in-loop deblocking filter
• Enhanced entropy coding methods

Figure 1. H.264/AVC structure
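The raw-storage estimate given in the Introduction (a 100-minute movie at 640x480 pixels, 3 bytes per pixel and 30 frames per second) can be reproduced with a short calculation. The sketch below is illustrative only; the function name is hypothetical and decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) are assumed. The exact figures come out slightly above the rounded values quoted in the text.

```python
def raw_video_storage(width, height, bytes_per_pixel, fps, minutes):
    """Return (bytes per second, total bytes) for uncompressed video."""
    bytes_per_frame = width * height * bytes_per_pixel
    bytes_per_second = bytes_per_frame * fps
    total_bytes = bytes_per_second * minutes * 60
    return bytes_per_second, total_bytes


# Parameters taken from the example in the Introduction.
per_second, total = raw_video_storage(640, 480, 3, 30, 100)

# Assumed capacity of one DVD in bytes (4.7 GB, decimal units).
DVD_CAPACITY = 4.7e9

print(f"{per_second / 1e6:.1f} MB per second")
print(f"{total / 1e9:.1f} GB for the whole movie")
print(f"{total / DVD_CAPACITY:.1f} DVDs")
```

With exact arithmetic the rate is about 27.6 MB per second and the total about 166 GB, which corresponds to a little over 35 DVDs.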



2.2 Scalable Video Coding

Scalable video coding is desirable in heterogeneous and error-prone environments for various reasons. For example, scalable coding helps streaming servers avoid network congestion by allowing the server to reduce the bitrate of a bitstream whilst still transmitting a useable bitstream.
One application of scalability is to improve error resilience in transport systems that allow different qualities of service. For example, the essential information could be delivered through a channel with high error protection. Scalability can also be used to enable different quality representations depending on the processing power of the playback device. Devices with better processing power can decode and display the full quality version, whereas devices with lower processing power decode the lower quality version.

2.3 Types of SVC
There are three conventional types of scalability: temporal, quality and spatial.
a) Temporal scalability enables adjustment of the picture rate. This is commonly carried out with either disposable pictures or disposable sub-sequences. The picture rate is then adjusted simply by removing these disposable parts from the coded sequence, thus lowering the frame rate.
b) In conventional quality scalability, also known as SNR scalability, an enhancement layer is obtained with pictures having finer quantizers than the corresponding picture in the lower reference layer [3]. In coarse-granularity quality scalability, pictures in enhancement layers may be used as prediction references, and therefore all the enhancement layer pictures in a group of pictures typically have to be discarded as a unit. In fine-granularity scalability, the use of enhancement layer pictures as prediction sources is limited, and therefore finer bitrate steps can be achieved than with coarse-granularity scalability.
c) Finally, spatial scalability is used to create multi-resolution bitstreams that meet different display requirements or constraints, and is very similar to SNR scalability [5].
A spatial enhancement layer enables recovery of the coding loss between an up-sampled version of the reconstructed layer used as a reference by the enhancement layer and a higher resolution version of the original picture.

3. Rate Distortion Optimization

3.1 Lagrangian multiplier method

In H.264/AVC, it is the task of the encoder to encode a given video sequence effectively by selecting among a wide range of modes and parameters.
The encoder aims to achieve optimum rate-distortion performance by choosing the best modes and parameters for a given video. In doing so, the encoder seeks to minimize the distortion of that particular video sequence. Rate-distortion Optimisation (RDO) methods used in video compression are discussed in [6] [2]; they include dynamic programming and Lagrange optimisation methods.
The Lagrange optimisation method, also known as the Lagrange multiplier method, offers computationally less complex (although sometimes sub-optimal) solutions to the optimisation problem. Because it is less complex, a specific form of the Lagrange optimisation method has been used in the rate-distortion optimisation of H.264/AVC [10].

3.2 Constrained Optimisation Problem

Constrained optimization minimizes or maximizes an objective function subject to resource constraints. In the case of video coding standards, constrained optimization can be considered as minimizing the distortion of a given video sequence subject to a limit on the number of bits that can be used to encode that sequence [4].
The mathematical formulation of the constrained optimization problem is as follows.
Let S represent the set of all allowable vectors and let B be an element of S (B ∈ S). The objective function D(B) and the constraint function R(B) are both defined for all B in S. The constrained problem can be presented as:
Given a constraint Rc, find

$$\min_{B \in S} D(B)$$

Subject to

$$R(B) \le R_c$$

The solution B* ∈ S* to the problem satisfies R(B*) ≤ Rc and D(B*) ≤ D(B) for all B in S*, where

$$S^* = \{ B \in S : R(B) \le R_c \}$$

That is, if the solution to the problem is B*, then there is no other B in S satisfying the constraint Rc that results in a smaller value of the objective function than D(B*). The Lagrange multiplier theory offers a way of solving the above constrained problem (i.e. finding B*) by representing it as an unconstrained problem [3].

3.3 Major Theorem

The constrained optimisation problem was presented in the previous section. The Lagrange theory represents the constrained problem as an unconstrained problem as follows:
Theorem: for any λ ≥ 0, let B*(λ) be the solution to the unconstrained problem

$$\min_{B \in S} \{ D(B) + \lambda R(B) \}$$
Then B*(λ) is also a solution of the constrained problem of Section 3.2 when the constraint is taken as Rc = R(B*(λ)).
Proof of the theorem
If B*(λ) is the solution to the unconstrained problem, then for all B in S:

$$D(B^*) + \lambda R(B^*) \le D(B) + \lambda R(B)$$

Therefore,

$$D(B^*) - D(B) \le \lambda \, ( R(B) - R(B^*) )$$

If this is true for all B in S, it is true for the subset of B in S where

$$R(B) \le R(B^*)$$

Now, for the above subset and for any λ ≥ 0, the right-hand side is non-positive, so:

$$D(B^*) - D(B) \le 0$$

Therefore, with the constraint Rc = R(B*(λ)), the solution B* of the unconstrained problem is also the solution of the constrained problem.
It should be noted that the theory does not guarantee a solution for a given constrained problem. It only states that for any λ ≥ 0 of the unconstrained problem, there is a corresponding constrained problem which has the same solution as the unconstrained problem.

3.4 Optimisation problem

Consider a macroblock that the encoder can encode using only one of K possible modes given by the set m = {m1, m2, …, mK}. Let M (M ∈ m) be the mode selected to code the macroblock. In the context of H.264/AVC, these modes could be any allowable combination of macroblock partition modes, Quantisation Parameters (QP), choice of reference frames, etc., so that the K possible modes include all the admissible parameter combinations for the macroblock.
Define the objective function D(M) and the constraint function R(M), where D(M) and R(M) are the distortion and rate of the macroblock that result from selecting a particular coding mode. If the rate constraint is Rc, the constrained problem is defined as:
Find the coding mode M*,

$$M^* = \arg\min_{M \in m} D(M)$$

Subject to

$$R(M) \le R_c$$

This may be written as an unconstrained problem using a Lagrange multiplier:

$$M^* = \arg\min_{M \in m} \{ D(M) + \lambda R(M) \}$$

Where the solution to the unconstrained problem, M*, would satisfy

$$R(M^*) = R_c$$

The optimum coding mode M* (if one exists) can be found by solving the unconstrained problem. That means that when the macroblock is coded in mode M*, it satisfies the target rate (R(M*) = Rc). All the other modes (if they exist) that satisfy R(M) ≤ Rc will have a distortion no lower than D(M*).
The term D(M) + λR(M) in the unconstrained problem is called the Lagrangian rate-distortion cost. The mode that minimises the Lagrangian rate-distortion cost for a particular λ ≥ 0 (and which satisfies the rate constraint of the constrained problem) is selected as the solution mode for the constrained problem.

4. Methodology

4.1 Objective video quality measurement

Objective video quality measurements are used to measure video quality, typically in situations where fast (sometimes online) and repeatable measurements of the distortion, or the difference between the video under test and a reference video, are needed [7].

4.2 PSNR

The Peak Signal to Noise Ratio (PSNR) is the most commonly used objective measure of video quality. PSNR is measured as follows:

$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^n - 1)^2}{\mathrm{MSE}}$$

Where n is the bit depth and MSE is the Mean Squared Error between corresponding pixel values of the original image and the current image of the sequence under test. For an M × N array of pixels, MSE is given by:

$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( P_o(i,j) - P_i(i,j) \right)^2$$

Where Po(i, j) denotes a pixel from the original image and Pi(i, j) denotes the corresponding pixel from the test image. The parameters i and j point to a position in the pixel arrays.
The MSE in itself can be a measure of distortion. However, PSNR is preferred because the log scale provides a more realistic mapping to quality variations. Therefore, PSNR continues to be the most commonly used objective quality measure [5].

5. Implementation
To achieve the objectives, the JSVM video simulation software was used to implement and test the algorithms. Several different H.264/AVC reference software packages exist. JSVM was chosen for this research because of the flexibility it offers in varying parameters.
The JSVM codec is commonly used to test new algorithms in the video community. The use of this reference software
enables realistic comparison of the performance of different algorithms developed by different researchers. The source code is mainly the same as the one used in the C programming language [8].

6. Results analysis

In this part of the simulation, basic parameters such as a frame rate of 30 Hz, 300 frames and a group of pictures of 16 were taken into consideration. A set of test video sequences was used to evaluate the performance: Foreman, Garden, Football, Flower, Claire and Carphone. The PSNR versus bitrate graphs for various groups of pictures were studied under different circumstances. The cases that were taken into consideration are described below.

Initially, the spatial dimension is represented by QCIF and CIF, taken without additional progressive refinement (PR) slices. With additional PR slices, the transform coefficients are refined, which improves the quality of the reconstructed pictures. The results show that the PSNR varies with the quality level.

Figure 2. Sequential scalable coding (Foreman)

In this case, several spatial resolutions or bitrates are provided by the encoded bitstream. The result shows that the PSNR increases with the bitrate.

Figure 3. Single Layer coding

Based on the Lagrangian cost function, if a video frame is to be sent on the outgoing link, it is first placed in the output buffer. Note that, for simplicity, buffer limitations are not considered in these simulations.
If the outgoing link cannot accommodate all the video packets, the node first drops the additional enhancement PR slices one by one. If the link is still overloaded, the spatial enhancement layers are dropped next in the same spirit, i.e., the enhancement layers are scaled out completely and only the base layer is kept. The optimized SVC offers better quality than the unoptimized SVC.

Figure 4. Rate distortion optimization for scalable coding

For single-layer coding at higher bitrates, when the outgoing capacity Rout is larger than the required incoming rate (at 1670 kB/s), the RD-optimized and unoptimized single-layer coding perform the same.
This is expected: at higher bitrates the network link will rarely overflow and very few or no video packets are lost. However, if the outgoing rate is very small, the SVC strategy leads to good improvements in terms of reconstructed video quality. Table 1 shows the improvements obtained for individual video streams for the outgoing link Rout = 600 kbit/s.

Figure 5. Evaluation of SVC and SLC

Table 1: Comparison of different video streams

             Scalable Video Coding          Single layer coding
Sequences    Optimized    Unoptimized      Optimized    Unoptimized
             (dB)         (dB)             (dB)         (dB)
Garden       45.0008      38.5645          42.5682      40.2658
Foreman      34.5545      35.2564          32.5654      30.1254
Football     37.2356      37.0052          36.2545      36.5485
Flower       40.3215      39.0235          37.5468      37.6256
Claire       36.2597      36.4566          31.2564      32.2564
Carphone     41.3255      38.4552          38.2545      39.2545
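The PSNR values reported in Table 1 follow the definition in Section 4.2. As a minimal sketch of that definition, assuming frames are given as plain nested lists of integer pixel values (the function names and the sample 2x2 frames are hypothetical, and this is not the JSVM measurement code), the computation could look as follows:

```python
import math


def mse(original, test):
    """Mean squared error between two equally sized 2-D pixel arrays."""
    rows = len(original)
    cols = len(original[0])
    total = 0
    for i in range(rows):
        for j in range(cols):
            diff = original[i][j] - test[i][j]
            total += diff * diff
    return total / (rows * cols)


def psnr(original, test, bit_depth=8):
    """PSNR in dB: 10 * log10((2^n - 1)^2 / MSE), n being the bit depth."""
    err = mse(original, test)
    if err == 0:
        # Identical images: the ratio is unbounded.
        return math.inf
    peak = (2 ** bit_depth) - 1
    return 10 * math.log10(peak * peak / err)


# Hypothetical 2x2 frames that differ by one gray level in two pixels.
reference = [[100, 100], [100, 100]]
decoded = [[101, 99], [100, 100]]
print(mse(reference, decoded))
print(psnr(reference, decoded))
```

For a whole sequence, one common choice is to average the per-frame PSNR values; the text does not state which averaging was used here.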
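The Lagrangian mode decision of Section 3.4 can also be illustrated in a few lines. The sketch below assumes each candidate mode is described by a hypothetical (name, rate, distortion) record; the mode names and numbers are made up for illustration and do not come from the simulations. It picks the mode minimising D(M) + λR(M), and a simple sweep over λ shows how a rate constraint Rc can be met.

```python
def select_mode(modes, lam):
    """Return the mode with the smallest Lagrangian cost D + lam * R."""
    return min(modes, key=lambda m: m["distortion"] + lam * m["rate"])


def select_mode_under_rate(modes, rate_limit, lambdas):
    """Sweep lam and keep the lowest-distortion solution with R <= rate_limit."""
    best = None
    for lam in lambdas:
        candidate = select_mode(modes, lam)
        if candidate["rate"] > rate_limit:
            continue  # this lam leads to a mode that violates the constraint
        if best is None or candidate["distortion"] < best["distortion"]:
            best = candidate
    return best


# Hypothetical candidate modes for one macroblock (rate in bits).
modes = [
    {"name": "skip", "rate": 10, "distortion": 90.0},
    {"name": "inter16x16", "rate": 40, "distortion": 40.0},
    {"name": "inter8x8", "rate": 70, "distortion": 25.0},
    {"name": "intra4x4", "rate": 120, "distortion": 15.0},
]

for lam in (0, 0.3, 1, 2):
    print(lam, select_mode(modes, lam)["name"])

print(select_mode_under_rate(modes, 80, [0, 0.1, 0.3, 1, 2])["name"])
```

A small λ favours low distortion at a high rate, while a large λ favours cheap modes. As noted in Section 3.3, only those rate constraints that correspond to some λ ≥ 0 can be reached this way, so the sweep may return a mode whose rate is below Rc rather than exactly equal to it.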

7. Recommendations

Although the video coding standards exhibit acceptable quality-compression performance in many visual communication applications, further improvements are desired and more features need to be added, especially for some specific applications. The important considerations for video coding schemes to be used within future networks could be based on compression efficiency, robustness with respect to packet loss, adaptability to different available bandwidths, and adaptability to the memory and computational power of different clients.
Several other communication and networking issues are also relevant, such as scalability, robustness and interactivity.
A network with a single active node was considered in our simulations. This could be extended to more practical situations with a hierarchy of many active network nodes, with rate shaping performed at every node accordingly.
Different values of the Lagrangian multiplier λ could be modeled for more stringent buffer conditions. A reasonable value for λ can be determined when minimizing the Lagrangian cost function, since λ is determined as a function of buffer fullness.
The scalable video coding approach could be further extended to an MCTF-based scalable video codec, which employs an open-loop architecture.

8. Conclusion

The choice of a Scalable Video Coding framework in this context brings technical and economical advantages. Under this framework, network elements can adapt the video streams to the channel conditions and transport the adapted video streams to receivers with acceptable perceptual quality. The advantages of deploying such an adaptive framework are that it can achieve suitable QoS for video over wired and wireless networks, bandwidth efficiency and fairness in sharing resources [11].
The adaptive scalable video coding technology produces bitstreams decodable at different bitrates, requiring different computational power and channel bitrate. In addition, the bitstream is organized with a hierarchical syntax that enables users to easily extract only a subpart of the data contained in the bitstream and still decode the original input video, but at a reduced spatial resolution or frame rate. This process can be applied recursively: once a new bitstream is extracted out of the original, it can undergo successive extractions corresponding to successively lower resolutions.

References
[1] C. S. Kannangara, I. E. G. Richardson, M. Bystrom, J. Solera, Y. Zhao, A. MacLennan and R. Cooney, "Complexity Reduction of H.264 using Lagrange Optimization Methods," IEE VIE 2005, Glasgow, 4-6 April, 2005.
[2] H. Kim and Y. Altunbasak, "Low-complexity macroblock mode selection for H.264/AVC encoders," presented at International Conference on Image Processing, Singapore, 2004.
[3] K. P. Lim, "JVT-I020, Fast INTER Mode Selection." San Diego: ISO/IEC MPEG and ITU-T VCEG Joint Video Team, 2003.
[4] X. Li, "Scalable video compression via over complete motion compensated wavelet coding," Signal Processing: Image Communication, special issue on subband/wavelet interframe video coding, 19:637-651, August 2004.
[5] S.-R. Kang, Y. Zhang, M. Dai, and D. Loguinov, "Multi-layer active queue management and congestion control for scalable video streaming," in Proc. IEEE ICDCS, Tokyo, Japan, Mar. 2004, pp. 768-777.
[6] T. Oelbaum, V. Baroncini, T. K. Tan, and C. Fenimore, "Subjective quality assessment of the emerging AVC/H.264 video coding standard," International Broadcasting Conference (IBC), Sept., 2004.
[7] R. Leung and D. Taubman, "Impact of motion on the random access efficiency of scalable compressed video," Proc. IEEE Int. Conf. Image Processing, 3:169-172, September 2005.
[8] R. Leung and D. Taubman, "Perceptual mappings for visual quality enhancement in scalable video compression," Proc. IEEE Int. Conf. Image Processing, 2:65-68, September 2005.
[9] R. Leung and D. Taubman, "Transform and embedded coding techniques for maximum efficiency and random accessibility in 3-D scalable compression," IEEE Trans. Image Processing, 14(10):1632-1646, October 2005.
[10] R. Leung and D. Taubman, "Minimizing the perceptual impact of visual distortion in scalable wavelet compressed video," Proc. IEEE Int. Conf. Image Processing, October 2006.
[11] R. Leung and D. Taubman, "Perceptual optimization for scalable video compression based on visual masking principles," IEEE Trans. Circuits Syst. Video Technol., submitted in 2006.
[12] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," Joint Video Team document JVT-L033, July, 2004.
[13] ISO/IEC JTC 1/SC 29/WG 11 (MPEG), "Report of the formal verification tests on AVC/H.264," MPEG document N6231, Dec., 2003 (publicly available at http://www.chiariglione.org/mpeg/quality_tests.htm).
[14] T. Schierl, T. Stockhammer and T. Wiegand, "Mobile Video Transmission using Scalable Video Coding (SVC)," IEEE Trans. on Circuits and Systems for Video Technology, Special issue on Scalable Video Coding, scheduled June 2007.
[15] S. Wenger, Y.-K. Wang and T. Schierl, "Transport and Signaling of SVC in IP networks," IEEE Transactions on Circuits and Systems for Video Technology, Special issue on Scalable Video Coding, scheduled for March 2007.
