1 Introduction
1.1 Digital Video
A digital video is a sequence of images presented in rapid succession to produce the effect of continuous
motion. It exploits the spatio‐temporal properties of the human eye: an image persists on the retina for a
short time after exposure (persistence of vision), so images played back in quick succession blend into
apparently continuous motion. In general, the eye cannot differentiate between individual images when they
are played at a rate of 25 per second or higher. Several television standards, such as NTSC and PAL, define
the frame rate of the displayed video; depending on the standard, the frame rate varies from 25 fps to
60 fps. A video file consists of the individual images (also known as frames) and the sequencing
information.
1.2 The Size Barrier
Consider a video played at the rate of 30 images per second. A 640x480 grayscale video in the raw lossless
format requires 640x480x30 bytes per second, which for a 30-minute video is approximately 16 GB. For a
colour video, using three bytes per pixel, that grows to roughly 48 GB, not even including the audio and the
sequencing information; this is almost the size of two Blu-ray discs for a small SD video. For some modern
HD transmissions, frame sizes are as high as 1920x1080, which works out to uncompressed video sizes
greater than 300 GB.
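The arithmetic above can be checked with a short script; this is a sketch whose frame sizes, frame rate and duration simply follow the figures in this section:

```python
def raw_size_bytes(width, height, fps, seconds, bytes_per_pixel):
    """Size of uncompressed video: one sample per pixel per frame."""
    return width * height * bytes_per_pixel * fps * seconds

HALF_HOUR = 30 * 60  # seconds

# 640x480 grayscale at 30 fps for 30 minutes -> ~16.6 GB
gray = raw_size_bytes(640, 480, 30, HALF_HOUR, 1)

# The same video in colour (3 bytes per pixel) -> ~49.8 GB
colour = raw_size_bytes(640, 480, 30, HALF_HOUR, 3)

# 1920x1080 colour at 30 fps for 30 minutes -> ~336 GB
hd = raw_size_bytes(1920, 1080, 30, HALF_HOUR, 3)

print(gray / 1e9, colour / 1e9, hd / 1e9)
```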
1.3 Video Compression
Transmitting videos of such huge sizes is impractical. To reduce videos to manageable proportions, they are
almost never stored or transmitted in the raw format. Even in situations where compression is not strictly
required, the video is still compressed: the human eye is insensitive to higher frequencies and to minute
variations in colour, so transmitting this information would be a waste of resources. Every video is
therefore subjected to some kind of compression, with the method chosen according to the application and the
bandwidth constraints under which the video is used.
Compression may be classified:
I. Based upon reproducibility into
a. Lossless compression
As the name indicates, videos compressed using this method can be reproduced exactly, with no
change in data. Methods that perform lossless compression include Huffman coding and run-length
coding. The amount of compression achieved by these methods is much smaller than that achieved
by lossy methods (discussed next), and it also depends greatly upon the content of the video.
b. Lossy compression
This compression is performed by dropping information that does not significantly affect the
visual appearance of the video. For example, the human eye is insensitive to high frequencies
and does not perceive minor variations in colour, so this information can be dropped while
encoding the video. Methods such as JPEG perform lossy compression.
II. Based upon where the compression is performed into
a. Intraframe compression
This method takes advantage of the spatial redundancy present within each frame of the
video and compresses each frame using one of the compression methods. In general,
indoor videos have a uniform, unchanging background and therefore very high spatial
redundancy, which can be compressed heavily.
b. Interframe compression
This method identifies the temporal redundancies between consecutive frames in a
video and attempts to remove them. Most videos contain few scene changes and
therefore have a great deal of temporal redundancy.
Good video formats usually implement both interframe and intraframe compression techniques.
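As an illustration of the lossless methods mentioned above, here is a minimal run-length coder. This is a sketch of the general technique, not any particular codec's implementation; note how it compresses well only when the input contains long runs, which is why the achieved ratio depends so heavily on content:

```python
def rle_encode(data):
    """Collapse each run of equal values into a [value, count] pair."""
    out = []
    for value in data:
        if out and out[-1][0] == value:
            out[-1][1] += 1          # extend the current run
        else:
            out.append([value, 1])   # start a new run
    return out

def rle_decode(pairs):
    """Expand [value, count] pairs back into the original samples."""
    return [value for value, count in pairs for _ in range(count)]

row = [255] * 10 + [0] * 5                 # high redundancy: 15 samples -> 2 pairs
assert rle_decode(rle_encode(row)) == row  # lossless round trip
```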
1.4 Video encoding formats
A video encoding format is a representation for compressed video. Such a format specifies the
representation of each frame, the sequencing information between frames, and the compression and
decompression methods for interframe and intraframe redundancy.
Although maximum compression is the goal, all formats retain a certain amount of redundancy in order to
maintain performance in environments with frame dropping and data loss. Mechanisms for resisting error
propagation are part of the specification of every video encoding format. These also assist in seeking
within the data; without them, every time we watched a movie we would have to start from the beginning,
unable to cue forward.
Some popular formats for video encoding are:
• WMV
• MPEG-1
• MPEG-4
• ASF
1.5 Container formats
Container formats are different from encoding formats: they hold combinations of encoded video and audio.
They specify the bit rates of the audio and video and help maintain synchronization between the two. Some
containers are designed to hold only one specific combination of audio and video, while others are capable
of holding several combinations (but only one combination at a time). Two popular container formats are
.avi and .wmv; .avi can hold several video formats, including MPEG-4 and MPEG-1.
Some video encoding formats, such as MPEG-1 and MPEG-4, are containers in themselves and are capable of
holding both audio and video.
1.6 Codec
"Codec" is an acronym for coder-decoder. A codec is capable of encoding a set of images into a video and
decoding a video back into a set of images. Each image usually constitutes a frame in the video; however,
several additional frames are added for the reasons discussed earlier.
Each codec works with only one specific video format, but several codecs can exist for a single format;
for example, several independent codecs target the MPEG-4 format. Usually, each multimedia company
provides its own codec for the formats its player supports. Codecs can be implemented in either software
or hardware; software codecs are slower and less expensive, while hardware codecs are much faster.
The specifications for a format are not rigid and allow for some variation. Although codecs implement a
specified format, they may differ in their method of operation, resulting in variations in quality and
performance.
2 Codec Evaluation
With the ever-increasing demand for bandwidth, codec designers tend to be overly aggressive and design
algorithms that can badly degrade the visual quality of the video content. Hence, evaluation criteria for
codec performance are required to verify the quality of the compressed videos.
2.1 Criteria for Comparison
The codecs are compared based on the following criteria:
1. Quality of Video
2. Performance of the codec
2.2 Quality of video
Quality of video corresponds to the look and feel of the video: the resolution, the artifacts, the blurring
and other visual aesthetic components. The quality depends both on the format of the video and on the
codec used to encode to that format. Usually several codecs implement a single format, yet each differs
from the others. Quality also depends on the amount of information in the video being encoded, and it is
not constant throughout the video: clips with more information show more artifacts than scenes with little
movement and few scene changes. Quality can be measured objectively or subjectively.
2.2.1 Objective Quality
Objective quality measures quality in mathematical terms, which makes it easy to compare and evaluate.
Some of the metrics available to measure objective quality are:
a. Mean Square Error (MSE): the second moment of the difference; it describes the variance between the
original frame I and the encoded frame K. For M x N frames,
MSE = (1 / (M * N)) * sum over i,j of [I(i,j) - K(i,j)]^2.
b. Peak Signal to Noise Ratio (PSNR): the ratio between the maximum signal level and the noise.
Mathematically, it is given by PSNR = 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel
value (255 for 8-bit samples).
c. Colour Difference: the absolute difference of the individual colour components between the input frame
and the output frame; it is calculated as the mean of |c_in(i,j) - c_out(i,j)| over each colour component c.
d. Structural Similarity (SSIM) [2]: used to measure the similarity between two images as a number between
0 and 1. It is a function of luminance, contrast and structural similarity, and is independent of the
colour components.
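The first two metrics are straightforward to compute. The sketch below implements them for 8-bit grayscale frames stored as flat lists of pixel values; SSIM, which needs windowed statistics, is omitted:

```python
import math

def mse(original, encoded):
    """Mean squared error between two equally sized frames."""
    assert len(original) == len(encoded)
    return sum((a - b) ** 2 for a, b in zip(original, encoded)) / len(original)

def psnr(original, encoded, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    err = mse(original, encoded)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)

frame = [10, 20, 30, 40]
noisy = [11, 19, 30, 40]   # two pixels off by one -> MSE = 0.5
print(mse(frame, noisy), round(psnr(frame, noisy), 1))
```

Lower MSE and higher PSNR both indicate a closer match to the original frame, which is the orientation of the charts in section 4.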
2.2.2 Subjective Quality
Subjective quality is measured by visually inspecting the encoded video for artifacts, blurring, blocking
and overall quality.
2.3 Performance of the Codec
The performance of the codec is measured as a function of three quantities:
1. Compression ratio of the codec (file size)
2. Speed of encoding (compression)
3. Speed of decoding (decompression)
2.3.1 Compression ratio of the codec
The compression ratio of the codec is measured by encoding a repetitive set of frames with each codec to
yield videos in the different formats. The ratio of the encoded file size to the uncompressed file size
acts as a measure of the capacity for compression. By selecting appropriate frames to compress, we can
measure both the best- and worst-case scenarios.
2.3.2 Encoding and Decoding speed
Encoding and decoding speed vary from codec to codec, and within the same codec for different frames.
The higher the redundancy, the slower the encoding and the smaller the resulting file. By selecting
appropriate frames to compress, we can measure both the best- and worst-case scenarios.
2.4 Bit Rates
Bit rate is measured in kilobits per second (kbps) and represents the amount of data flow per unit time.
It is an important factor in the quality of the video. For example, consider a video with a bit rate of
1000 kbps. For a standard-definition video at 29.97 fps, about 30 frames share those 1000 kilobits,
i.e. about 33 kilobits (roughly 4 kilobytes) per frame. This restricts the amount of data that can be
used to represent a frame. Lower bit rates mean higher compression and lower video quality: more noise,
blocking, discolouration etc.
   Application       Bit rate
a. Video streaming   100-500 kbps
b. SD video          500-2000 kbps
c. HD video          >2000 kbps
By measuring each of the quantities discussed in 2.3 and 2.4, we will be able to identify the appropriate
codec for a specific application.
3 Implementation
3.1 Codecs
The following codecs are being evaluated in this study
Sl. No.  Codec      Designer/Developer  Format  Container
1        WMV2       Microsoft           WMV     wmv
2        Theora     Xiph.org            MPEG-4  avi
3        ASF        Microsoft           ASF     asf
4        MPEG4      MPEG                MPEG-4  mp4
5        QuickTime  Apple               MPEG-4  mov
6        MPEG-1     MPEG                MPEG-1  mpeg
All codecs are part of the ffmpeg library.
3.2 Dataset
The following videos are used to evaluate the codecs; the reason for selecting each video is also
described. All videos are 352x288 pixels in dimension, but may appear stretched in this document.
3.2.1 Quality Measurement
3.2.1.1 Akiyo
Figure 1 A frame from the Akiyo video sequence
This is a 300 frame video in the uncompressed YUV format. This video shows a news reader. It has no
background changes and almost negligible foreground changes.
3.2.1.2 Foreman
Figure 2 A frame from the Foreman video sequence
This is a 300 frame video in the uncompressed YUV format. It has a sudden scene change at the end;
other than that, there is no background change. Only the face shows rich emotion, which can be hard
to compress.
3.2.1.3 Football
Figure 3 A frame from the Football video sequence
This is a 125 frame video in the uncompressed YUV format. It has a constant background and very rapid,
large changes in the foreground as players keep coming into and going out of the frame.
3.2.1.4 Stephan
Figure 4 A frame from the Stephan video sequence
This is a recording of Stephan Edberg’s tennis match. This is 300 frames in length and is also in the
uncompressed YUV format. This video has a fast foreground change as the player runs about, and a
background change as the camera follows him. This would be the hardest kind of natural video to
encode.
3.2.2 Performance Measurement
In order to measure the performance in terms of compression ratio and encoding speed, I have proposed a
set of frame pairs, shown below, that together allow us to measure the best- and worst-case scenarios.
3.2.2.1 Spatial redundancy 100%, temporal redundancy 100%
3.2.2.2 Spatial redundancy 100%, temporal redundancy 0%
3.2.2.3 Spatial redundancy ≈0%, temporal redundancy 100%
3.2.2.4 Spatial redundancy ≈0%, temporal redundancy ≈0%
These pairs of alternating frames incorporate the best and worst case scenarios for compression.
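Such frame pairs can be generated synthetically. The sketch below is my own construction, not part of the original tooling: a uniform frame stands in for 100% spatial redundancy, seeded random noise for roughly 0%, and repeating a frame gives 100% temporal redundancy:

```python
import random

W, H = 352, 288  # same dimensions as the test sequences

def uniform_frame(value=128):
    # Every pixel equal: maximal spatial redundancy.
    return [[value] * W for _ in range(H)]

def noise_frame(seed):
    # Seeded pseudo-random pixels: almost no spatial redundancy.
    rng = random.Random(seed)
    return [[rng.randrange(256) for _ in range(W)] for _ in range(H)]

def frame_pair(spatial_high, temporal_high):
    """Build one pair of alternating frames for a redundancy scenario."""
    first = uniform_frame() if spatial_high else noise_frame(0)
    if temporal_high:
        second = [row[:] for row in first]   # identical content -> full temporal redundancy
    else:
        second = uniform_frame(0) if spatial_high else noise_frame(1)
    return first, second

a, b = frame_pair(spatial_high=True, temporal_high=True)
assert a == b  # scenario with both redundancies at 100%
```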
3.2.3 Bit Rates
In order to cover the entire range of applications, the videos are encoded at the following bit rates:
a. 600 kbps
This is the range at which YouTube plays its videos.
b. 1,000 kbps
This is the bit rate generally used in video conferencing.
c. 3,000 kbps
These bit rates are generally used in optical-disc playback.
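A typical way to produce these encodings with the ffmpeg command-line tool is to build an invocation like the one sketched below. The `-f rawvideo`, `-pix_fmt`, `-s`, `-r` and `-b:v` flags are standard ffmpeg options for raw YUV input and target bit rate, but the file names are placeholders, and exact option support varies with the ffmpeg version:

```python
def ffmpeg_args(source, target, bitrate_kbps, size="352x288", fps=30):
    """Build an ffmpeg argv list for encoding a raw YUV sequence."""
    return [
        "ffmpeg",
        "-f", "rawvideo", "-pix_fmt", "yuv420p",   # describe the raw input
        "-s", size, "-r", str(fps),
        "-i", source,
        "-b:v", f"{bitrate_kbps}k",                # target video bit rate
        target,
    ]

# One command per bit rate under test
for rate in (600, 1000, 3000):
    print(" ".join(ffmpeg_args("akiyo.yuv", f"akiyo_{rate}.mp4", rate)))
```

The argv list could be handed to `subprocess.run` to perform the actual encode.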
4 Results and Discussion
As part of the exercise, I was able to measure most of the evaluation parameters. However, due to issues
with the ffmpeg library, I did not get an accurate measure of the encoding and decoding times.
4.1 Akiyo
[Figures 5-8 plot each objective metric for the wmv2, theora, asf, mpeg4, qt and mpeg1 codecs at 600, 1000 and 3000 kbps.]
Figure 5 Mean Squared Error for Akiyo
Figure 6 PSNR for Akiyo
Figure 7 Absolute Colour Distance for Akiyo
Figure 8 Structural Similarity for Akiyo
[Figures 9-12 plot the same metrics for the Football sequence at 600, 1000 and 3000 kbps.]
Figure 9 Mean Squared Error for Football
Figure 10 PSNR for Football
Figure 11 Absolute Colour Distance for Football
Figure 12 Structural Similarity for Football
[Figures 13-16 plot the same metrics for the Foreman sequence at 600, 1000 and 3000 kbps.]
Figure 13 Mean Squared Error for Foreman
Figure 14 PSNR for Foreman
Figure 15 Absolute Colour Distance for Foreman
Figure 16 Structural Similarity for Foreman
[Figures 17-20 plot the same metrics for the Stephan sequence at 600, 1000 and 3000 kbps.]
Figure 17 Mean Squared Error for Stephan
Figure 18 PSNR for Stephan
Figure 19 Absolute Colour Distance for Stephan
Figure 20 Structural Similarity for Stephan
[Figures 21-24 show, for each codec (wmv2, theora, asf, mpeg-4, Qt, mpeg-1), the encoded file size in KB under the four redundancy scenarios.]
Figure 21 File sizes with high spatial and low temporal redundancy
Figure 22 File sizes with high spatial and temporal redundancy
Figure 23 File sizes with low spatial and high temporal redundancy
Figure 24 File sizes with low spatial and low temporal redundancy
The following are some sample frames from the encoded videos.
Figure 25 Counter-clockwise from the top: a frame from the Stephan video as the original frame, wmv encoded at 600 kbps and
wmv encoded at 3000 kbps
Figure 26 Counter-clockwise from the top: a frame from the Akiyo video as the original frame, wmv encoded at 600 kbps and
wmv encoded at 1000 kbps
In Figure 25, the distortion is clearly visible at 600 kbps but almost negligible at 3000 kbps. In
Figure 26, however, there is no visible distortion even at 600 kbps. This implies that the encoding
process is also sensitive to the content of the video.
5 Conclusion
Selection of a format for encoding or representation depends upon the application which uses the
video. The various criteria to be considered before selecting a format are:
• Application
o Transmission
Videos transmitted and viewed over the internet require a high compression ratio. Quality
can be compromised, as such videos are rarely used for critical applications.
o Video Conferencing
Video conferencing applications have specific quality criteria: the video must be clear,
but the frame rate can be compromised. Surveillance videos also fall into this category.
Encoding and decoding speed are significant here.
o Archiving
Videos used for this purpose do not have significant demands on encoding or
decoding speed. They require higher resolution and quality with lower file sizes.
• Performance Requirements
o Real time video processing for UAVs etc.
The requirements here are fast encoding speed and very little blurring.
o Video Viewing
Video viewing, in general, does not have demanding processing requirements, because
sufficient processing capability is available and the application is not real-time.
• Quality requirements
o Entertainment
o Conferencing
o Surgical procedures
6 Future Work
Possible future work includes:
a. Measuring blurring effects of the codecs
b. Measuring blocking effects and impact on edge detection algorithms
c. Evaluating coding and decoding times.
d. Identifying impact of frame size on coding speed and compression ratio.
7 References
[1] Madhuri Khambete and Madhuri Joshi, "Blur and Ringing Artifact Measurement Image Compression
using Wavelet Transform", Proceedings of World Academy of Science, Engineering and Technology,
Vol. 20, April 2007, ISSN 1307-6884.
[2] Zhou Wang, Alan Conrad Bovik, Hamid Rahim Sheikh, and Eero P. Simoncelli, "Image Quality
Assessment: From Error Visibility to Structural Similarity", IEEE Transactions on Image Processing,
Vol. 13, No. 4, April 2004.