Data Compression

A Seminar Project on Compression and Decompression
Submitted to: Sushma Rani (HOD)
Submitted by: Prachi Trehan (CS-8th)
Data Compression
Compression is the reduction in size of data in order to save space or transmission time. Data compression is particularly useful in communications because it enables devices to transmit or store the same amount of data in fewer bits.
Advantages of Data Compression

It reduces storage requirements.
The rate of input-output operations in a computing device can be greatly increased because the data has a shorter representation.
It reduces the cost of backup and recovery in computer systems, since backups of large database files can be stored in compressed form.
Types of Data Compression
Data compression
Lossless Compression
Lossy Compression
Lossless Compression
In this type, no information is lost during the compression and decompression process.
It typically achieves only about a 2:1 compression ratio.
It looks for patterns in strings of bits and then expresses them more concisely.
Lossy Compression
In lossy compression, some information is lost during processing.
It provides much higher compression ratios, but the decompressed data differs from the original source file.
The main advantage is that the loss is usually not visible to the eye, i.e. the result is visually lossless.
Techniques of Data Compression

Basic: Run Length Encoding
Statistical: Shannon Fano Encoding, Huffman Encoding, Arithmetic Encoding
Dictionary: Lempel Ziv Encoding
Run-Length Encoding
Represents data using values and run lengths.
A run length is the number of consecutive equal values, e.g. for 1110011111:

Values:       1 0 1
Run lengths:  3 2 5

Useful for compressing data that contains repeated values.
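The value and run-length idea can be sketched in a few lines of Python. This is a minimal illustration, not a production codec; the function names `rle_encode` and `rle_decode` are made up for this example, and the input is assumed to be a string of characters.

```python
from itertools import groupby

def rle_encode(data):
    """Return (value, run_length) pairs for consecutive equal characters."""
    return [(value, len(list(run))) for value, run in groupby(data)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original string."""
    return "".join(value * count for value, count in pairs)

pairs = rle_encode("1110011111")
print(pairs)              # [('1', 3), ('0', 2), ('1', 5)]
print(rle_decode(pairs))  # 1110011111
```

Data with few repeats gains nothing from this scheme, since every value then carries its own run length of 1.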
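Shannon-Fano Coding

Message   Probability   Code
x1        0.25          00
x2        0.25          01
x3        0.125         100
x4        0.125         101
x5        0.0625        1100
x6        0.0625        1101
x7        0.0625        1110
x8        0.0625        1111

The codes come from repeatedly splitting the list into two groups of equal total probability, appending 0 for the first group and 1 for the second:

x1,...,x8 splits into x1,x2 (0) and x3,...,x8 (1)
x1,x2 splits into x1 (00) and x2 (01)
x3,...,x8 splits into x3,x4 (10) and x5,...,x8 (11)
x3,x4 splits into x3 (100) and x4 (101)
x5,...,x8 splits into x5,x6 (110) and x7,x8 (111)
x5,x6 splits into x5 (1100) and x6 (1101)
x7,x8 splits into x7 (1110) and x8 (1111)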
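The splitting procedure behind the tree above can be sketched recursively. This is a minimal sketch that assumes the symbols are already sorted by descending probability and that each split is placed where the two groups are closest in total probability; the function name `shannon_fano` is chosen for this example only.

```python
def shannon_fano(symbols):
    """symbols: list of (name, probability), sorted by descending probability.
    Returns a dict mapping each name to its binary code string."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    # Find the split where the two groups are closest in total probability.
    best_split, best_diff, running = 1, float("inf"), 0.0
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(total - 2 * running)
        if diff < best_diff:
            best_split, best_diff = i, diff
    codes = {}
    # First group gets prefix 0, second group gets prefix 1, then recurse.
    for prefix, group in (("0", symbols[:best_split]), ("1", symbols[best_split:])):
        for name, code in shannon_fano(group).items():
            codes[name] = prefix + code
    return codes

table = [("x1", 0.25), ("x2", 0.25), ("x3", 0.125), ("x4", 0.125),
         ("x5", 0.0625), ("x6", 0.0625), ("x7", 0.0625), ("x8", 0.0625)]
codes = shannon_fano(table)
print(codes["x1"], codes["x3"], codes["x8"])  # 00 100 1111
```

With these probabilities every split is exactly even, so each code length equals -log2 of the symbol's probability.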
Huffman Coding
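In a standard encoding, every character is coded with a fixed length, e.g. one byte per character. For an alphabet of only seven characters, three bits per character suffice.
E.g.

Character   A    E    I    S    T    SP   NL
Code        000  001  010  011  100  101  110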
Huffman Coding
A character's code can be found by starting at the root of the code tree and recording the path, using 0 for a left branch and 1 for a right branch. For instance, S is reached by going left, then right, and finally right, which gives 011.
(Figure: code tree with leaves 000, 001, 010, 011, 100, 101, 110.)
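Huffman coding replaces the fixed-length codes above with variable-length ones, giving shorter codes to more frequent characters. The sketch below builds such a code with a priority queue. The character frequencies are hypothetical (the slides give none), and `huffman_codes` is a name chosen for this example.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code from a dict of symbol -> frequency."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; left gets prefix 0, right gets 1.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical frequencies for the seven characters (not from the slides).
freqs = {"A": 10, "E": 15, "I": 12, "S": 3, "T": 4, "SP": 13, "NL": 1}
codes = huffman_codes(freqs)
huffman_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
fixed_bits = sum(freqs.values()) * 3
print(huffman_bits, fixed_bits)  # 146 174
```

The individual codes depend on how ties are broken, but the total bit count is the same for any valid Huffman tree over these frequencies.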
Arithmetic Coding
We begin with a "current interval" [L; H) initialized to [0; 1). For each symbol, we perform two steps:
(a) We subdivide the current interval into subintervals, one for each possible alphabet symbol. The size of a symbol's subinterval is proportional to the estimated probability that the symbol will be the next symbol in the file, according to the model of the input.
(b) We select the subinterval corresponding to the symbol that actually occurs next in the file, and make it the new current interval.
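The two steps can be traced with exact fractions. This is a sketch of the interval narrowing only, assuming a fixed (non-adaptive) model with hypothetical probabilities for a three-symbol alphabet; it does not emit the final bit string, and `narrow_interval` is a name chosen for this example.

```python
from fractions import Fraction

def narrow_interval(message, probs):
    """Return the final [low, high) interval after processing every symbol.
    probs: dict symbol -> Fraction, listing symbols in a fixed order."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        cumulative = Fraction(0)
        # (a) walk the subintervals in the model's symbol order
        for candidate, p in probs.items():
            if candidate == symbol:
                # (b) keep the subinterval of the symbol that occurred
                high = low + width * (cumulative + p)
                low = low + width * cumulative
                break
            cumulative += p
    return low, high

# Hypothetical model: a is twice as likely as b or c.
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, high = narrow_interval("abc", probs)
print(low, high)  # 11/32 3/8
```

The final width, 1/32, is the product of the three symbol probabilities, so likelier messages end in wider intervals that need fewer bits to identify.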
Lempel Ziv Encoding

It is a dictionary-based encoding.
LZ is a compression method that can realize compression ratios of up to 20 to 1. It relies on the fact that, in any document, character strings are going to be repeated. Let us suppose that, at the sender's end, we wish to send the message:
ABABAAABBCACABABACAC
Lempel Ziv Compression

It has 2 phases:
Building an indexed dictionary
Compressing a string of symbols
Algorithm:
Extract from the remaining uncompressed string the smallest substring that cannot be found in the dictionary.
Store that substring in the dictionary as a new entry and assign it an index value.
Replace the substring with the dictionary index of its prefix (the substring without its last character).
Insert that index and the last character of the substring into the compressed string.
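The algorithm above corresponds to the LZ78 flavour of Lempel Ziv, where the output is a list of (index, character) pairs. A minimal sketch follows, assuming index 0 stands for the empty prefix and new entries are numbered from 1; the function names are chosen for this example.

```python
def lz78_compress(text):
    """Return a list of (index, char) pairs; index 0 means the empty prefix."""
    dictionary = {}   # substring -> index, starting at 1
    output = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch           # keep extending until the substring is new
        else:
            output.append((dictionary.get(current, 0), ch))
            dictionary[current + ch] = len(dictionary) + 1
            current = ""
    if current:                     # input ended inside a known substring
        output.append((dictionary[current], ""))
    return output

def lz78_decompress(pairs):
    """Rebuild the text by replaying the same dictionary construction."""
    entries = [""]                  # index 0 is the empty prefix
    pieces = []
    for index, ch in pairs:
        entry = entries[index] + ch
        entries.append(entry)
        pieces.append(entry)
    return "".join(pieces)

message = "ABABAAABBCACABABACAC"
pairs = lz78_compress(message)
print(pairs[:5])  # [(0, 'A'), (0, 'B'), (1, 'B'), (1, 'A'), (3, 'B')]
```

The receiver never needs the dictionary to be transmitted: it rebuilds the same entries, in the same order, from the pairs alone.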
Lempel Ziv Compression

The sender would have the following symbol table, assuming that all possible messages consist only of patterns of the characters A, B and C.

Beginning symbol table:
0   A
1   B
2   C

At this point the sender and receiver symbol tables would contain:

Index   Sender   Receiver
0       A        A
1       B        B
2       C        C
3       AB       AB
4       BA       BA
5       ABA      ABA
6       ABC      ABC
7       CB       CB
8       BAB      BAB
9       BABA     not yet
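A table that starts out preloaded with every single character is how the LZW variant of Lempel Ziv works. The sketch below is an assumption-laden illustration: the slides do not state which input produced the table, so the string `ABABABCBABABA` is a hypothetical input picked because it yields the same entries 3 to 9, and `lzw_compress` is a name chosen for this example. Every input character is assumed to be in the starting alphabet.

```python
def lzw_compress(text, alphabet):
    """Return (codes, dictionary) for an LZW pass over text."""
    dictionary = {symbol: i for i, symbol in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch                                # extend the match
        else:
            codes.append(dictionary[current])            # emit longest match
            dictionary[current + ch] = len(dictionary)   # next free index
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes, dictionary

# Hypothetical input, chosen to reproduce the table entries.
codes, table = lzw_compress("ABABABCBABABA", "ABC")
print(codes)  # [0, 1, 3, 3, 2, 4, 8, 0]
print(sorted(table, key=table.get)[3:])
# ['AB', 'BA', 'ABA', 'ABC', 'CB', 'BAB', 'BABA']
```

The receiver rebuilds each entry one step after the sender creates it, which is why its last entry is still missing in the table.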
JPEG Encoding
JPEG stands for Joint Photographic Experts Group.
JPEG compression is used with .jpg files and can be embedded in .tiff and .eps files.
Used on 24-bit color files.
Works well on photographic images.
Although it is a lossy compression technique, it yields an excellent quality image with high compression rates.
Steps in JPEG Compression

(Optionally) If the color is represented in RGB mode, translate it to YUV.
Divide the file into 8 x 8 blocks.
Transform the pixel information from the spatial domain to the frequency domain with the Discrete Cosine Transform.
Quantize the resulting values by dividing each coefficient by an integer value and rounding off to the nearest integer.
Look at the resulting coefficients in a zigzag order.
Do a run-length encoding of the coefficients ordered in this manner.
Follow with entropy coding (Huffman coding in baseline JPEG).
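The quantize, zigzag and run-length steps can be sketched on a single 8 x 8 block. Several things here are assumptions made for illustration: the DCT coefficients are hypothetical values rather than the output of a real transform, a single divisor of 16 stands in for JPEG's per-position quantization table, and the run-length step is plain (value, count) pairs rather than JPEG's actual zero-run symbols.

```python
from itertools import groupby

def zigzag_indices(n=8):
    """(row, col) pairs of an n x n block in zigzag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )

def quantize(block, step):
    """Divide every coefficient by step and round to the nearest integer."""
    return [[round(value / step) for value in row] for row in block]

# Hypothetical DCT coefficients: large low-frequency terms, small or zero elsewhere.
coeffs = [[0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[1][0] = 620, -45, 30
coeffs[1][1], coeffs[2][0], coeffs[0][2] = 12, -20, 5

quantized = quantize(coeffs, step=16)
scan = [quantized[r][c] for r, c in zigzag_indices()]
runs = [(value, len(list(group))) for value, group in groupby(scan)]
print(runs)  # [(39, 1), (-3, 1), (2, 1), (-1, 1), (1, 1), (0, 59)]
```

The zigzag order visits low frequencies first, so the zeros produced by quantization collect into one long run at the end of the scan.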
MPEG Encoding
Used to compress video.
Basic idea:
Each video is a rapid sequence of a set of frames.
Each frame is a spatial combination of pixels, or a picture.
Compressing video = spatially compressing each frame + temporally compressing a set of frames.
MPEG Encoding
Spatial Compression
Each frame is spatially compressed by JPEG.
Temporal Compression
Redundant frames are removed.
For example, in a static scene in which someone is talking, most frames are the same except for the segment around the speaker's lips, which changes from one frame to the next.
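The talking-head example can be illustrated with simple frame differencing: keep one frame in full and, for the next, store only the pixels that changed. This is a deliberately simplified sketch of temporal redundancy, not the MPEG scheme itself; frames are assumed to be flat lists of pixel values, and the function names are chosen for this example.

```python
def frame_delta(previous, current):
    """List (index, new_value) for every pixel that changed between frames."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current))
            if old != new]

def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus its delta."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame

# Two hypothetical 8-pixel frames; only the middle "lips" region changes.
frame1 = [10, 10, 10, 50, 52, 10, 10, 10]
frame2 = [10, 10, 10, 55, 49, 10, 10, 10]
delta = frame_delta(frame1, frame2)
print(delta)  # [(3, 55), (4, 49)]
```

Two changed pixels are stored instead of eight, and the saving grows with the share of the scene that stays static.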
Audio Compression
Used for speech or music.
Speech: compress a 64 kbps digitized signal.
Music: compress a 1.411 Mbps signal.
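Both figures follow from sample rate times bits per sample times channels. The sampling parameters below are not given in the slides; they are the usual ones for telephone-quality speech (8,000 samples per second, 8 bits, mono) and CD audio (44,100 samples per second, 16 bits, stereo), and `pcm_bit_rate` is a name chosen for this example.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

speech = pcm_bit_rate(8_000, 8)        # telephone-quality speech, mono
music = pcm_bit_rate(44_100, 16, 2)    # CD audio, stereo
print(speech, music)  # 64000 1411200
```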
Two categories of techniques:

Predictive encoding
Perceptual encoding
Audio Encoding
Predictive Encoding
Only the differences between samples are encoded, not the whole sample values.
Several standards: GSM (13 kbps), G.729 (8 kbps), and G.723.1 (6.3 or 5.3 kbps)
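The simplest form of this idea is delta encoding, where each sample is predicted to equal the previous one and only the error is stored. The sketch below assumes a non-empty list of integer samples with hypothetical values; the standards named above use far more elaborate predictors.

```python
from itertools import accumulate

def delta_encode(samples):
    """Keep the first sample, then store each sample minus its predecessor."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas):
    """A running sum of the differences restores the original samples."""
    return list(accumulate(deltas))

samples = [100, 102, 105, 104, 104, 101]   # hypothetical sample values
deltas = delta_encode(samples)
print(deltas)  # [100, 2, 3, -1, 0, -3]
```

Neighbouring audio samples tend to be close, so the differences are small numbers that need fewer bits than the raw values.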
Perceptual Encoding: MP3

CD-quality audio needs at least 1.411 Mbps and cannot be sent over the Internet without compression.
MP3 (MPEG audio layer 3) uses a perceptual encoding technique to compress audio.
