Data Compression
Compression is the reduction in size of data in order to save space or transmission time. Data compression is particularly useful in communications because it enables devices to transmit or store the same amount of data in fewer bits.
It is also used in backup and recovery of data in computer systems, where the backups of large database files are stored in compressed form.
Data compression is of two types:
Lossless Compression
Lossy Compression
Lossless Compression
In this type, no information is lost during the
ratio.
It looks for patterns in strings of bits and then
Lossy compression
In lossy compression some information is lost
there will be some loss of information compared to the original source file.
The main advantage is that the loss cannot be
Compression techniques:
Basic
Statistical: Huffman Encoding, Arithmetic Encoding
Dictionary
Run-length encoding
Represents data using value and run length.
[Figure: the example "130215", labeled with its run lengths and repeated values]
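The value/run-length idea can be sketched in a few lines. This is a minimal illustration, not a production codec; the function names `rle_encode` and `rle_decode` and the sample string are assumptions made for the example.

```python
from itertools import groupby

def rle_encode(data):
    """Collapse each run of equal items into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(data)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original string."""
    return "".join(value * length for value, length in pairs)

encoded = rle_encode("AAAABBBCCD")
print(encoded)              # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded))  # AAAABBBCCD
```

Run-length encoding only pays off when the input contains long runs; on data with few repeated values the list of pairs can be larger than the input.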
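Symbol   Probability   Code
x1       0.25          00
x2       0.25          01
x3       0.125         100
x4       0.125         101
x5       0.0625        1100
x6       0.0625        1101
x7       0.0625        1110
x8       0.0625        1111

The codes come from splitting the symbol set repeatedly into two groups:
x1,x2,x3,x4,x5,x6,x7,x8
0: x1,x2          1: x3,x4,x5,x6,x7,x8
00: x1            01: x2
10: x3,x4         11: x5,x6,x7,x8
100: x3           101: x4
110: x5,x6        111: x7,x8
1100: x5          1101: x6
1110: x7          1111: x8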
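As a quick check of this example, the sketch below assumes the eight symbols carry the codes implied by the splits (00, 01, 100, 101, 1100, 1101, 1110, 1111). It compares the average code length with the entropy of the distribution. The dictionary names and the helper `is_prefix_free` are illustrative assumptions.

```python
from math import log2

probs = {"x1": 0.25, "x2": 0.25, "x3": 0.125, "x4": 0.125,
         "x5": 0.0625, "x6": 0.0625, "x7": 0.0625, "x8": 0.0625}
codes = {"x1": "00", "x2": "01", "x3": "100", "x4": "101",
         "x5": "1100", "x6": "1101", "x7": "1110", "x8": "1111"}

# Expected number of bits per symbol under this code.
avg_len = sum(probs[s] * len(codes[s]) for s in probs)

# Shannon entropy: the lower bound on bits per symbol for this source.
entropy = -sum(p * log2(p) for p in probs.values())

def is_prefix_free(words):
    """After sorting, any prefix relation shows up between neighbours."""
    ordered = sorted(words)
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))

print(avg_len, entropy)                 # 2.75 2.75
print(is_prefix_free(codes.values()))   # True
```

Because every probability here is a power of 1/2, the code length of each symbol equals -log2 of its probability, so the average length meets the entropy exactly. A fixed-length code for eight symbols would need 3 bits per symbol.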
Huffman Coding
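In a standard encoding, every character is coded with a fixed length, typically one byte.
E.g., a file that uses only seven characters needs just 3 bits per character:

Character   A     E     I     S     T     SP    nl
Code        000   001   010   011   100   101   110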
A character's code can be found by starting at the root and recording the path, using a 0 to indicate the left branch and a 1 to indicate the right branch. For instance, S is reached by going left, then right, and finally right, so its code is 011.
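The root-to-leaf rule can be illustrated with a small sketch. The tree below is an assumed layout: nested `(left, right)` tuples arranged so that the seven characters receive the 3-bit codes listed earlier. The function name `codes_from_tree` is hypothetical.

```python
def codes_from_tree(node, path=""):
    """Record the path to every leaf: '0' for a left branch, '1' for a right one."""
    if node is None:                 # unused slot in the tree
        return {}
    if isinstance(node, str):        # a leaf holds one character
        return {node: path}
    left, right = node
    table = codes_from_tree(left, path + "0")
    table.update(codes_from_tree(right, path + "1"))
    return table

# Assumed tree shape: every character sits at depth 3.
tree = ((("A", "E"), ("I", "S")), (("T", "SP"), ("nl", None)))

codes = codes_from_tree(tree)
print(codes["S"])   # 011  (left, right, right)
print(codes)
```

The same traversal works unchanged on an unbalanced tree, which is what a Huffman tree is: frequent characters sit nearer the root and therefore get shorter paths.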
Arithmetic Coding
We begin with a "current interval" [L, H) initialized to [0, 1). For each symbol, we perform two steps:
(a) We subdivide the current interval into subintervals, one for each possible alphabet symbol. The size of a symbol's subinterval is proportional to the estimated probability that the symbol will be the next symbol in the file, according to the model of the input.
(b) We select the subinterval corresponding to the symbol that actually occurs next in the file, and make it the new current interval.
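The two steps can be sketched as follows. This is a minimal illustration of the interval narrowing only; it does not emit the final bit string. It assumes a fixed model supplied as a dictionary of probabilities. The three-symbol model and the name `narrow_interval` are assumptions made for the example, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

def narrow_interval(message, probs):
    """Return the final [low, high) interval after processing every symbol."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        cumulative = Fraction(0)
        # Step (a): walk the subintervals in a fixed symbol order.
        for candidate, p in probs.items():
            if candidate == symbol:
                # Step (b): the matching subinterval becomes the current one.
                high = low + width * (cumulative + p)
                low = low + width * cumulative
                break
            cumulative += p
        else:
            raise KeyError(f"symbol {symbol!r} is not in the model")
    return low, high

model = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}
low, high = narrow_interval("AB", model)
print(low, high)   # 1/4 3/8
```

The width of the final interval equals the product of the symbol probabilities (here 1/2 x 1/4 = 1/8), so likely messages end in wide intervals that can be identified with few bits.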
ratios of up to 20 to 1. It relies on the fact that, in any document, character strings are going to be repeated. Let us suppose, at the sender's end, we wish to send the message:
ABABAAABBCACABABACAC
Algorithm:
Extract the smallest substring that cannot be found in the dictionary from the remaining uncompressed string.
Store that substring in the dictionary as a new entry and assign it an index value.
The substring is replaced with the index found in the dictionary.
Insert the index and the last character of the substring into the compressed string.
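One reading of these steps is the LZ78-style scheme sketched below. It is an interpretation under stated assumptions rather than a definitive implementation: the dictionary starts empty, index 0 stands for the empty prefix, and the names `lz78_encode` and `lz78_decode` are hypothetical.

```python
def lz78_encode(text):
    """Emit (index_of_longest_known_prefix, next_character) pairs."""
    dictionary = {}   # substring -> index; indices start at 1
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch              # keep extending the known substring
        else:
            # Smallest substring not yet in the dictionary: store it, emit the pair.
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:                        # leftover that is already in the dictionary
        output.append((dictionary[phrase], ""))
    return output

def lz78_decode(pairs):
    """Rebuild the text; the decoder grows the same dictionary as the encoder."""
    phrases = [""]                    # index 0 is the empty prefix
    pieces = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        pieces.append(phrase)
    return "".join(pieces)

message = "ABABAAABBCACABABACAC"
pairs = lz78_encode(message)
print(pairs[:4])                      # [(0, 'A'), (0, 'B'), (1, 'B'), (1, 'A')]
print(lz78_decode(pairs) == message)  # True
```

The decoder never receives the dictionary; it reconstructs it from the pairs, which is why only indices and single characters need to be transmitted.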
that all possible messages consist only of patterns of the characters A, B and C.

Beginning Symbol Table
0   A
1   B
2   C

At this point the sender and receiver symbol tables would contain:

Index   Sender   Receiver
0       A        A
1       B        B
2       C        C
3       AB       AB
4       BA       BA
5       ABA      ABA
6       ABC      ABC
7       CB       CB
8       BAB      BAB
9       BABA     not yet
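A table of this shape arises in LZW-style coding, where the sender and receiver both start from the single-character table and add one entry per emitted code. The sketch below is an assumption-laden illustration: the input `ABABABCBABABA` is a hypothetical message chosen because it reproduces sender entries 3 to 9, and `lzw_encode` is an illustrative name.

```python
def lzw_encode(message, alphabet):
    """Return the emitted codes and the sender's final symbol table."""
    table = {symbol: index for index, symbol in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in message:
        if current + ch in table:
            current += ch                       # extend the current match
        else:
            codes.append(table[current])        # send the longest known string
            table[current + ch] = len(table)    # learn the new, longer string
            current = ch
    if current:
        codes.append(table[current])
    return codes, table

# Hypothetical message, not taken from the text above.
codes, table = lzw_encode("ABABABCBABABA", "ABC")
print(codes)   # [0, 1, 3, 3, 2, 4, 8, 0]
for entry, index in sorted(table.items(), key=lambda item: item[1]):
    print(index, entry)
```

The receiver can only add an entry once it has seen the first character of the following code, so it always lags one entry behind the sender. That matches the "not yet" in the last row.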
JPEG Encoding
JPEG stands for Joint Photographic Experts Group.
JPEG compression is used with .jpg files and can be embedded in .tiff and .eps files.
It is used on 24-bit color files.
Works well on photographic images. Although it is a lossy compression technique, it
mode, translate it to YUV.
Divide the file into 8 x 8 blocks.
Transform the pixel information from the spatial domain to the frequency domain with the Discrete Cosine Transform.
Quantize the resulting values by dividing each coefficient by an integer value and rounding off to the nearest integer.
Look at the resulting coefficients in a zigzag order.
Do a run-length encoding of the coefficients ordered in this manner.
Follow by
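The transform, quantize, zigzag and run-length steps can be sketched for a single 8 x 8 block as below. This is a simplified illustration under stated assumptions: one uniform quantization step is used instead of a per-coefficient table, the DCT is the naive textbook formula, and all function names are hypothetical.

```python
import math
from itertools import groupby

N = 8  # block size

def dct_2d(block):
    """Naive orthonormal 2-D DCT-II of an N x N block."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step):
    """Divide every coefficient by the same integer step and round (assumed uniform)."""
    return [[round(value / step) for value in row] for row in coeffs]

def zigzag(block):
    """Read the block along anti-diagonals, alternating direction."""
    order = sorted(((r, c) for r in range(N) for c in range(N)),
                   key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[r][c] for r, c in order]

def run_lengths(values):
    """(value, run_length) pairs for the zigzag sequence."""
    return [(value, len(list(run))) for value, run in groupby(values)]

flat = [[100] * N for _ in range(N)]          # a block of uniform brightness
quantized = quantize(dct_2d(flat), step=16)
print(run_lengths(zigzag(quantized)))          # [(50, 1), (0, 63)]
```

A uniform block has only one nonzero coefficient, so all 64 values collapse to two pairs. Photographic blocks behave similarly after quantization: the zigzag order groups the high-frequency zeros into one long run.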
MPEG Encoding
Used to compress video.
Basic idea:
Each video is a rapid sequence of a set of frames. Each frame is a spatial combination of pixels, or a picture.
Compressing video = spatially compressing each frame + temporally compressing a set of frames.
Spatial Compression
Each frame is spatially compressed by JPEG.
Temporal Compression
Redundant frames are removed. For example, in a static scene in which someone is talking, most frames are the same except for the segment around the speaker's lips, which changes from one frame to the next.
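The talking-head example can be illustrated with a deliberately simplified sketch that stores only the pixels that changed since the previous frame. Real MPEG temporal compression is more elaborate, using predicted frames and motion compensation. The frames here are assumed to be flat lists of pixel values, and the function names are hypothetical.

```python
def frame_delta(previous, current):
    """List (pixel_index, new_value) for every pixel that changed."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current))
            if old != new]

def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame

frame1 = [10, 10, 10, 10, 10, 10]
frame2 = [10, 10, 99, 98, 10, 10]   # only the "lips" region changed

delta = frame_delta(frame1, frame2)
print(delta)                                  # [(2, 99), (3, 98)]
print(apply_delta(frame1, delta) == frame2)   # True
```

When consecutive frames are nearly identical, the list of changes is much smaller than a full frame, which is the saving that temporal compression exploits.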
Audio Compression
Used for speech or music
Speech: compress a 64 kbps digitized signal
Music: compress a 1.411 Mbps signal
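The two figures correspond to uncompressed PCM bit rates of 64 kbps and about 1.411 Mbps. The sketch below reproduces them under the usual assumptions, which are not stated in the text above: telephone speech sampled 8,000 times per second at 8 bits, and CD music sampled 44,100 times per second at 16 bits on two channels.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

speech = pcm_bit_rate(8_000, 8)                # assumed telephone-quality speech
music = pcm_bit_rate(44_100, 16, channels=2)   # assumed CD-quality stereo

print(speech)   # 64000    -> 64 kbps
print(music)    # 1411200  -> about 1.411 Mbps
```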
Audio Encoding
Predictive Encoding
Only the differences between samples are encoded, not the whole sample values.
Several standards: GSM (13 kbps), G.729 (8 kbps), and G.723.1 (6.3 or 5.3 kbps)
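The difference idea can be sketched with plain delta coding. This is only the simplest form of prediction, in which each sample is predicted to equal the previous one. The standards named above use far more elaborate predictors, and the function names and sample values are assumptions made for the example.

```python
def delta_encode(samples):
    """Store the first sample as-is, then only sample-to-sample differences."""
    previous = 0
    diffs = []
    for sample in samples:
        diffs.append(sample - previous)
        previous = sample
    return diffs

def delta_decode(diffs):
    """Accumulate the differences to recover the original samples."""
    total = 0
    samples = []
    for diff in diffs:
        total += diff
        samples.append(total)
    return samples

samples = [100, 102, 105, 105, 103, 100]
diffs = delta_encode(samples)
print(diffs)                           # [100, 2, 3, 0, -2, -3]
print(delta_decode(diffs) == samples)  # True
```

Because neighbouring audio samples are usually close in value, the differences are small numbers that can be represented with fewer bits than the samples themselves.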
sent over the Internet without compression. MP3 (MPEG audio layer 3) uses a perceptual encoding technique to compress audio.