Professional Documents
Culture Documents
SUBMITTED BY
MUKESH KUMAR TANWAR 11104EN067
VIKASH KUMAR 11104EN068
PARAS CHANDOLIA 11104EN070
ABSTRACT
Text Compression is the science and art of representing information in
a compact form. There are a lot of text compression algorithms which
are available to compress files of different formats. In this paper, we
have used Huffman algorithm for the text compression.
INTRODUCTION
Text compression refers to reducing the amount of space needed to
store the text. There are two types of compression LOSSLESS- Data compression can be lossless only if it is possible to
exactly reconstruct the original data from the compressed version. Such a
lossless technique is used when the original data of a source are so
important that we cannot afford to lose any details. Examples of such
source data are medical images, text, some computer executable files, etc.
LOSSY- Another method of compression algorithms is called lossy as these
algorithms irreversibly remove some parts of data and only an
approximation of the original data can be reconstructed. Data such as
multimedia images, video and audio are more easily compressed by lossy
compression techniques
Huffman coding
The Huffman coding transforms the original code used for the characters of the text(ASCII
code on 8 bits). Coding the text is just replacing each symbol by its new code-word. The
Huffman algorithm uses the notion of prefix code. A prefix code is a set of words
containing no word that is prefix of another word of the set. The advantage of such coding
is that the decoding is immediate. Huffman Code assigns shorter encodings to elements
with a high frequency. Elements with the highest frequency get assigned the shortest bit
length code. The key to decompressing Huffman code is a Huffman tree.
The steps required while compressing the text are as follows Counts the character frequencies.
Builds the Huffman coding tree.
Builds character code-words from the coding tree.
Stores the coding tree in the compressed file.
Encodes the characters in the compressed file.
Complete function for the Huffman encoding.
Rebuilds the tree from header of the compressed file.
Recovers the original text.
Complete function for the Huffman decoding.
ENCODING
else {
c=GET_LABEL(root);
code[c].codeword=(char *)malloc(length);
code[c].lg=length;
strncpy(code[c].codeword, temp, length);
}
}
character frequencies:
DECODING
Decoding a file containing a text compressed by Huffman algorithm is a mere programming exercise.
First the coding tree is rebuild by the algorithm1 given below. It uses a function to decode a integer
written on 9 bits. Then, the uncompressed text is recovered by parsing the compressed
text with the coding tree. The process begins at the root of the coding tree, and follows a left
branch when a 0 is read or a right branch when a 1 is read. When a leaf is encountered, the
corresponding character (in fact the original codeword of it) is produced and the parsing phase
resumes at the root of the tree. The parsing ends when the codeword of the end marker is read.
The decoding of the text is presented in Figure Algorithm3. Again in this algorithm the bits are read
with the use of a buffer
(buffer)
and a counter (bits in stock). A byte is read in the compressed file only if the counter is equal to
zero Algorithm4.
The complete decoding program is given in Algorithm5. It calls the preceding functions. The
running time of the decoding program is linear in the sum of the sizes of the texts it manipulates.
Algorithm2: Reads the next 9 bits in the compressed file and returns
the corresponding value
Int BITS2ASCII(FILE *fin)
{
int i, value;
value=0;
for (i=8; i >= 0; i--) value=value<<1+READ_BIT(fin);
return(value);
}
Algorithm3: Reads the compressed text and produces the
uncompressed text
voidDECODE_TEXT(FILE *fin, FILE *fout, NODE root)
{
NODE node;
while (1) {
if (GET_LEFT(node) == NULL)
if (GET_LABEL(node) != END)
{
putc(GET_LABEL(node), fout);
node=root;
}
else break;
else if (READ_BIT(fin) == 1) node=GET_RIGHT(node);
else node=GET_LEFT(node);
}
}
Algorithm4: Reads the next bit from the compressed file
Int READ_BIT(FILE *fin)
{
int bit;
if (bits_in_stock == 0) {
buffer=getc(fin);
bits_in_stock=8;
}
bit=buffer&1;
buffer>>=1;
bits_in_stock--;
return(bit);
}
Algorithm5: Complete function for decoding
int buffer;
int bits_in_stock;
void DECODING(char *fichin, char *fichout)
{
FILE *fin, *fout;
NODE root;