
A Seminar Project on Compression and Decompression

Submitted to: Sushma Rani (HOD)
Submitted by: Prachi Trehan (CS-8th)

Data Compression

Compression is the reduction in size of data in order to save space or transmission time. Data compression is particularly useful in communications because it enables devices to transmit or store the same amount of data in fewer bits.

Advantages of Data Compression


It reduces storage requirements.
The rate of input-output operations in a computing device can be greatly increased due to the shorter representation of data.
Data compression reduces the cost of backup and recovery of data in computer systems by storing the backup of large database files in compressed form.

Types of Data Compression

Data compression is of two types:
Lossless Compression
Lossy Compression

Lossless Compression
In this type, no information is lost during the compression and decompression process.
It typically achieves only about a 2:1 compression ratio; the exact figure depends on the data.
It looks for patterns in strings of bits and then expresses them more concisely.

Lossy compression
In lossy compression, some information is lost during processing.
It provides much higher compression ratios, but the decompressed data is not identical to the original source file.
Its main advantage is that, at suitable settings, the loss is not visible to the eye, i.e. the result is visually lossless.

Techniques of Data Compression

Basic: Run Length Encoding
Statistical: Shannon-Fano Encoding, Huffman Encoding, Arithmetic Encoding
Dictionary: Lempel-Ziv Encoding

Run-length encoding
Represents data using values and run lengths.
Run length is defined as the number of consecutive equal values, e.g. 1110011111:

Values:       1  0  1
Run lengths:  3  2  5

Written as (value, run length) pairs, this is 13 02 15.
Useful for compressing data that contains repeated values.
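The value and run-length idea above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names rle_encode and rle_decode are chosen here and do not come from the slides, and the input is assumed to be a string of symbols.

```python
from itertools import groupby

def rle_encode(data):
    """Collapse each run of equal symbols into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(data)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original string."""
    return "".join(value * length for value, length in pairs)

pairs = rle_encode("1110011111")
print(pairs)              # [('1', 3), ('0', 2), ('1', 5)]
print(rle_decode(pairs))  # 1110011111
```

Note that data without long runs can grow rather than shrink under this scheme, since every run costs one pair.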

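Shannon-Fano Coding

Message   Probability   Code
x1        0.25          00
x2        0.25          01
x3        0.125         100
x4        0.125         101
x5        0.0625        1100
x6        0.0625        1101
x7        0.0625        1110
x8        0.0625        1111

The codes are obtained by repeatedly splitting the message set into two groups of equal (or nearly equal) total probability, appending 0 for the first group and 1 for the second:

x1,x2,x3,x4,x5,x6,x7,x8
  0: x1,x2
    00: x1
    01: x2
  1: x3,x4,x5,x6,x7,x8
    10: x3,x4
      100: x3
      101: x4
    11: x5,x6,x7,x8
      110: x5,x6
        1100: x5
        1101: x6
      111: x7,x8
        1110: x7
        1111: x8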
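The splitting procedure for the eight messages x1 to x8 can be sketched as a short recursive function. This is a minimal sketch under assumptions: the name shannon_fano and the rule "cut where the two halves are closest in total probability" are choices made here, and the input list is assumed to be sorted by decreasing probability.

```python
def shannon_fano(symbols):
    """symbols: list of (name, probability), sorted by decreasing probability.
    Returns a dict mapping each name to its code string."""
    codes = {name: "" for name, _ in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(p for _, p in group)
        # Find the cut that makes the two halves as equal as possible.
        running, best_cut, best_diff = 0.0, 1, None
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)
            if best_diff is None or diff < best_diff:
                best_cut, best_diff = i, diff
        left, right = group[:best_cut], group[best_cut:]
        for name, _ in left:
            codes[name] += "0"
        for name, _ in right:
            codes[name] += "1"
        split(left)
        split(right)

    split(symbols)
    return codes

probabilities = [("x1", 0.25), ("x2", 0.25), ("x3", 0.125), ("x4", 0.125),
                 ("x5", 0.0625), ("x6", 0.0625), ("x7", 0.0625), ("x8", 0.0625)]
for name, code in shannon_fano(probabilities).items():
    print(name, code)
```

For these probabilities every code length equals -log2(probability), so the average code length is 2.75 bits per message.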

Huffman Coding
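Without compression, every character is coded with a fixed length, typically one byte.
E.g. an alphabet of seven characters can be coded with a fixed length of 3 bits each:

Character   A     E     I     S     T     SP    nl
Code        000   001   010   011   100   101   110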

Huffman Coding
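The code of a character can be found by starting at the root of the code tree and recording the path, using a 0 to indicate the left branch and a 1 to indicate the right branch. For instance, S is reached by going left, then right, and finally right, which gives the code 011.

[Figure: binary code tree whose leaves carry the codes 000, 001, 010, 011, 100, 101, 110]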
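The tree above gives every character the same 3-bit length. Huffman's algorithm instead builds the tree from character frequencies, so that frequent characters sit closer to the root. The sketch below assumes a set of frequencies that is purely hypothetical (the slides give none), and the function name huffman_codes is chosen here for illustration.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """freqs: dict mapping symbol -> frequency. Returns dict symbol -> bit string."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), symbol) for symbol, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees into one internal node.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, path):
        if isinstance(node, tuple):      # internal node: 0 = left, 1 = right
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:                            # leaf: a symbol
            codes[node] = path or "0"

    walk(heap[0][2], "")
    return codes

# Hypothetical frequencies for the seven characters used above.
freqs = {"A": 10, "E": 15, "I": 12, "S": 3, "T": 4, "SP": 13, "nl": 1}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes)
print(total_bits)  # 146, versus 3 * 58 = 174 bits with the fixed 3-bit code
```

The individual codes depend on how ties are broken, but the total cost is the same for any valid Huffman tree over these frequencies.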

Arithmetic Coding
We begin with a "current interval" [L, H) initialized to [0, 1). For each symbol, we perform two steps:
(a) We subdivide the current interval into subintervals, one for each possible alphabet symbol. The size of a symbol's subinterval is proportional to the estimated probability that the symbol will be the next symbol in the file, according to the model of the input.
(b) We select the subinterval corresponding to the symbol that actually occurs next in the file, and make it the new current interval.
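Steps (a) and (b) can be traced with exact fractions. The sketch below only narrows the interval; it does not emit the final bit string, and it assumes a fixed (non-adaptive) model. The three-symbol model and the name narrow_interval are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def narrow_interval(message, model):
    """Return the final [low, high) interval for message.
    model: dict symbol -> probability; dict order fixes the subinterval layout."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        cumulative = Fraction(0)
        for candidate, probability in model.items():
            if candidate == symbol:
                # Step (b): keep only this symbol's subinterval.
                high = low + width * (cumulative + probability)
                low = low + width * cumulative
                break
            cumulative += probability  # step (a): skip earlier subintervals
        else:
            raise ValueError(f"symbol {symbol!r} is not in the model")
    return low, high

# Hypothetical model: A takes [0, 1/2), B takes [1/2, 3/4), C takes [3/4, 1).
model = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}
print(narrow_interval("AB", model))   # (Fraction(1, 4), Fraction(3, 8))
print(narrow_interval("ABC", model))  # (Fraction(11, 32), Fraction(3, 8))
```

The width of the final interval equals the product of the symbol probabilities (here 1/2 x 1/4 x 1/4 = 1/32), which is why likely messages need fewer bits to identify.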

Lempel Ziv Encoding


It is a dictionary-based encoding.
LZ is a compression technique that can realize compression ratios of up to 20 to 1. It relies on the fact that, in any document, character strings are going to be repeated.
Let us suppose that, at the sender's end, we wish to send the message:

ABABAAABBCACABABACAC

Lempel Ziv Compression


It has 2 phases:
Building an indexed dictionary
Compressing a string of symbols

Algorithm:
Extract the smallest substring of the remaining uncompressed string that cannot be found in the dictionary.
Store that substring in the dictionary as a new entry and assign it an index value.
The substring is replaced with the index found in the dictionary.
Insert the index and the last character of the substring into the compressed string.
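The algorithm above corresponds to the LZ78 variant of Lempel-Ziv. A minimal sketch follows, assuming that each output item is a pair (index of the longest known prefix, last character), that index 0 stands for the empty prefix, and that a trailing pair with an empty character is emitted when the input ends inside a known substring. The function names are chosen here for illustration.

```python
def lz78_compress(text):
    """Return a list of (prefix_index, last_char) pairs; index 0 = empty prefix."""
    dictionary = {}   # substring -> index, starting at 1
    output = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch                       # keep extending the match
        else:
            output.append((dictionary.get(current, 0), ch))
            dictionary[current + ch] = len(dictionary) + 1
            current = ""
    if current:                                 # input ended inside a known substring
        output.append((dictionary[current], ""))
    return output

def lz78_decompress(pairs):
    """Rebuild the text by replaying the same dictionary construction."""
    entries = [""]                              # entry 0 is the empty prefix
    pieces = []
    for index, ch in pairs:
        entry = entries[index] + ch
        entries.append(entry)
        pieces.append(entry)
    return "".join(pieces)

message = "ABABAAABBCACABABACAC"
pairs = lz78_compress(message)
print(pairs)
print(lz78_decompress(pairs) == message)  # True
```

The receiver never needs the dictionary to be transmitted: it rebuilds the same entries, in the same order, from the pairs themselves.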

Lempel Ziv Compression


The sender would have the following symbol table, assuming that all possible messages consist only of patterns of the characters A, B and C.

Beginning Symbol Table
Index   Entry
0       A
1       B
2       C

At this point the sender and receiver symbol tables would contain:

Index   Sender   Receiver
0       A        A
1       B        B
2       C        C
3       AB       AB
4       BA       BA
5       ABA      ABA
6       ABC      ABC
7       CB       CB
8       BAB      BAB
9       BABA     not yet
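A symbol table that starts with the single characters and grows by one entry per emitted code is characteristic of the LZW variant. The sketch below is an assumption-laden illustration: the slide does not state which message produced this table, so the input "ABABABCBABABA" is chosen here only because it reproduces sender entries 3 to 9, and the name lzw_compress is hypothetical. Every character of the input is assumed to be in the starting alphabet.

```python
def lzw_compress(text, alphabet):
    """Return (list of emitted indices, final sender table)."""
    table = {symbol: index for index, symbol in enumerate(alphabet)}
    output = []
    current = ""
    for ch in text:
        if current + ch in table:
            current += ch                       # longest match so far
        else:
            output.append(table[current])       # send the index of the match
            table[current + ch] = len(table)    # learn the match plus one character
            current = ch
    if current:
        output.append(table[current])
    return output, table

# Assumed input, chosen to reproduce the table shown above.
indices, table = lzw_compress("ABABABCBABABA", "ABC")
print(indices)  # [0, 1, 3, 3, 2, 4, 8, 0]
for entry, index in sorted(table.items(), key=lambda item: item[1]):
    print(index, entry)
```

The receiver can add an entry only after it has seen the first character of the next code, which is why its table is one entry behind the sender's ("not yet" for entry 9).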

JPEG Encoding
JPEG stands for Joint Photographic Experts Group.
JPEG compression is used with .jpg files and can be embedded in .tiff and .eps files.
Used on 24-bit color files.
Works well on photographic images.
Although it is a lossy compression technique, it yields an excellent quality image with high compression rates.

Steps in JPEG Compression


(Optionally) If the color is represented in RGB mode, translate it to YUV.
Divide the file into 8 x 8 blocks.
Transform the pixel information from the spatial domain to the frequency domain with the Discrete Cosine Transform.
Quantize the resulting values by dividing each coefficient by an integer value and rounding off to the nearest integer.
Look at the resulting coefficients in a zigzag order. Do a run-length encoding of the coefficients ordered in this manner. Follow by
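The transform, quantize, zigzag and run-length steps can be traced on a single 8 x 8 block. This is a simplified sketch, not a JPEG codec: it uses one hypothetical quantization step (16) for every coefficient, whereas real JPEG uses a table of per-coefficient steps and shifts pixel values before the transform. All function names are chosen here for illustration.

```python
import math
from itertools import groupby

N = 8  # JPEG works on 8 x 8 blocks

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs

def quantize(coeffs, step):
    """Divide every coefficient by one integer step and round to the nearest integer."""
    return [[round(c / step) for c in row] for row in coeffs]

def zigzag(block):
    """Read an N x N block along its anti-diagonals, alternating direction."""
    order = []
    for s in range(2 * N - 1):
        rows = range(max(0, s - N + 1), min(s, N - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend(block[i][s - i] for i in rows)
    return order

def run_lengths(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

flat_block = [[100] * N for _ in range(N)]   # a block of uniform brightness
encoded = run_lengths(zigzag(quantize(dct_2d(flat_block), step=16)))
print(encoded)  # [(50, 1), (0, 63)]
```

A uniform block has energy only in the first (DC) coefficient, so after the zigzag scan the 63 zeros form one long run. This is why the zigzag order and run-length encoding work well together on smooth image regions.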

MPEG Encoding
Used to compress video.

Basic idea:
Each video is a rapid sequence of a set of frames. Each frame is a spatial combination of pixels, or a picture.
Compressing video = spatially compressing each frame + temporally compressing a set of frames.

MPEG Encoding
Spatial Compression
Each frame is spatially compressed by JPEG.

Temporal Compression
Redundant frames are removed.
For example, in a static scene in which someone is talking, most frames are the same except for the segment around the speaker's lips, which changes from one frame to the next.
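The talking-head example can be illustrated with simple frame differencing: store the first frame in full and, for each later frame, only the pixels that changed. This is a deliberately simplified sketch and not the actual MPEG scheme, which uses predicted frames and motion compensation. Frames are assumed to be non-empty lists of equal-length pixel lists, and the function names are hypothetical.

```python
def temporal_encode(frames):
    """Return (first_frame, changes), where changes holds, for every later frame,
    a list of (position, new_value) pairs relative to the previous frame."""
    changes = []
    for previous, current in zip(frames, frames[1:]):
        changes.append([(i, new)
                        for i, (old, new) in enumerate(zip(previous, current))
                        if old != new])
    return frames[0], changes

def temporal_decode(first_frame, changes):
    """Rebuild all frames by applying each change list to the previous frame."""
    frames = [list(first_frame)]
    for frame_changes in changes:
        frame = list(frames[-1])
        for position, value in frame_changes:
            frame[position] = value
        frames.append(frame)
    return frames

# Three tiny 4-pixel "frames": only one pixel changes, once.
frames = [[5, 5, 5, 5], [5, 5, 6, 5], [5, 5, 6, 5]]
first, changes = temporal_encode(frames)
print(changes)                                    # [[(2, 6)], []]
print(temporal_decode(first, changes) == frames)  # True
```

An unchanged frame costs an empty list, which is the sense in which redundant frames are removed.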

Audio Compression
Used for speech or music.
Speech: compress a 64 kbps digitized signal
Music: compress a 1.411 Mbps signal

Two categories of techniques:
Predictive encoding
Perceptual encoding
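The two uncompressed bit rates follow from sample rate x bits per sample x channels. The parameters below are the standard ones for telephone-quality speech and CD audio; they are not stated on the slides and are supplied here as an assumption. The function name pcm_bit_rate is chosen for illustration.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate in bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels

speech = pcm_bit_rate(8_000, 8, 1)    # telephone speech: 8 kHz, 8 bits, mono
music = pcm_bit_rate(44_100, 16, 2)   # CD audio: 44.1 kHz, 16 bits, stereo
print(speech)  # 64000    -> 64 kbps
print(music)   # 1411200  -> about 1.411 Mbps
```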

Audio Encoding
Predictive Encoding
Only the differences between samples are encoded, not the whole sample values.
Several standards: GSM (13 kbps), G.729 (8 kbps), and G.723.1 (6.3 or 5.3 kbps)
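Encoding differences instead of whole samples can be sketched as plain delta coding. This is the simplest form of the idea; the standards listed above use far more elaborate predictors, and the sample values and function names below are hypothetical.

```python
def delta_encode(samples):
    """Keep the first sample, then store each sample as its difference from the previous one."""
    if not samples:
        return []
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    samples = []
    total = 0
    for delta in deltas:
        total += delta
        samples.append(total)
    return samples

samples = [100, 102, 105, 105, 103]   # hypothetical audio samples
deltas = delta_encode(samples)
print(deltas)                # [100, 2, 3, 0, -2]
print(delta_decode(deltas))  # [100, 102, 105, 105, 103]
```

Because neighbouring audio samples are usually close in value, the differences are small numbers that need fewer bits than the samples themselves.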

Perceptual Encoding: MP3


CD-quality audio needs at least 1.411 Mbps and cannot be sent over the Internet without compression.
MP3 (MPEG audio layer 3) uses a perceptual encoding technique to compress audio.

