
Huffman Algorithm

Written to fulfill the 1st task of

Algorithm and Data Structure

Group 1
Names : Abdullah Azzam Robbani
Ivan Eka Putra
Reyhan Radhitiansyah

Class : 2SC3

CEP CCIT
FAKULTAS TEKNIK UNIVERSITAS INDONESIA
2018
PREFACE
Praise be to Allah SWT, for only with His grace and mercy could we finish this
task well. This ISAS is titled "Huffman Algorithm". It discusses one of the coding
techniques used by data compression algorithms, namely the Huffman algorithm, which is
commonly implemented in software to compress files.

The authors realize that this ISAS is still far from perfect, given the limitations
of their own knowledge. Nevertheless, they have tried to finish it to the best of their
abilities. In preparing this paper, the authors received a great deal of guidance,
suggestions, and help from various parties:

1. The authors' parents, who always support them in spirit and in many other ways.
2. Muhammad Suryanegara S.T., M.Sc., the director of CEP-CCIT Faculty of
Engineering, University of Indonesia.
3. Tirta Akdi Toma Mesoya Hulu, the authors' faculty advisor, who guided them and
gave advice until this ISAS was finished.
4. Other parties who helped the authors find sources of information and references,
such as websites, journals, and books.

The authors hope that readers will offer comments and suggestions on this ISAS, so that the
next one can be better. They also hope this ISAS proves useful, especially for students of
CEP-CCIT Faculty of Engineering, University of Indonesia, and for IT development in
general. With all humility, the authors apologize for any shortcomings in this work.

Depok, March 6th 2018

Authors

TABLE OF CONTENTS

CHAPTER I INTRODUCTION
1.1 Background
1.2 Writing Objective
1.3 Problem Domain
1.4 Writing Methodology
1.5 Writing Framework

CHAPTER II BASIC THEORY

CHAPTER III PROBLEM ANALYSIS

CHAPTER IV CONCLUSION AND SUGGESTION
4.1 Conclusion
4.2 Suggestion

BIBLIOGRAPHY
TABLE OF FIGURES
III.1. David Albert Huffman
III.2. Robert Fano
III.3. Huffman Tree for "Huffman"
CHAPTER I

INTRODUCTION

I.1. Background
An algorithm is a set of instructions used to solve a problem. In general, an algorithm is
much like a procedure performed every day, such as the procedure for using a
phone, a cooking procedure, and so on. Algorithms are also used in the field of computer
science; for example, a programmer needs algorithms to create effective and efficient
programs. Many algorithms are used in the field, for example data compression
algorithms.

In the last decade, there has been a transformation in how we communicate. This
transformation is characterized by the ever-present internet and the increased use of video
communication. Data compression is one of the technologies that enables this. It
would not be practical to put pictures, let alone audio and video, on a website if it were not
for data compression, and digital television would not be possible without
compression. This is what made the authors interested in discussing data compression
algorithms, especially the Huffman algorithm: everything is digital now, and that would not
have happened without compression.

I.2. Writing Objective

Based on the problem domain, the following are the purposes of writing this ISAS,
entitled "Huffman Algorithm":

1. To know the history of the Huffman algorithm
2. To know how the Huffman algorithm works
3. To know the characteristics and various models of the Huffman algorithm
4. To find out about cases using the Huffman algorithm
5. To know the advantages and disadvantages of the Huffman algorithm compared to
other methods that are also used in data compression

I.3. Problem Domain

The things discussed in this ISAS are the definition, history, models, design, and
characteristics of the Huffman algorithm.

I.4. Writing Methodology

The writing methodology used is to collect sources of information and references from the
internet, journals, and books in various media.

I.5. Writing Framework

The following is the structure of this ISAS, entitled "Huffman Algorithm":
1. CHAPTER I INTRODUCTION

a. Background
b. Writing Objective
c. Problem Domain
d. Writing Methodology
e. Writing Framework

2. CHAPTER II BASIC THEORY

a. The Definition of Algorithm
b. Data Compression
c. Type of Data Compression

3. CHAPTER III PROBLEM ANALYSIS

4. CHAPTER IV CONCLUSION AND SUGGESTION

a. Conclusion
b. Suggestion

5. BIBLIOGRAPHY

6. APPENDIX
CHAPTER II

BASIC THEORY

2.1. Algorithm
The word algorithm derives from the name of Abu Ja'far Mohammed Ibn
Musa al-Khowarizmi, a Persian scientist who wrote the book al-jabr w'al-muqabala (Rules of
Restoration and Reduction) around the year 825 AD.

2.1.1. The Definition of Algorithm

An algorithm is a set of instructions for solving a problem. It can be a simple process, like
multiplying two numbers, or a complex operation, like playing a compressed video
file. Search engines use algorithms to display relevant results from their search index.
In non-technical terms, algorithms appear in everyday activities, like a recipe for
making a cake or the steps in a guidebook.

In computer programming, algorithms are often implemented as functions. A
function is presented as a small program that can be referenced by a larger program.
For example, an image viewer application consists of a library of functions, each of
which uses an algorithm to render a different image file format. An image editing program
consists of algorithms designed to process image data; examples of image processing
algorithms are cropping, resizing, sharpening, blurring, red-eye reduction, and color
enhancement.

2.1.2. Algorithm Criteria According to Donald E. Knuth

1. Input : An algorithm can have zero or more inputs from outside.
2. Output : An algorithm must have at least one output.
3. Definiteness : An algorithm has clear and unambiguous instructions.
4. Finiteness : An algorithm must have a stopping rule.
5. Effectiveness : An algorithm should, as much as possible, be executable
and effective.

2.1.3. Types of Algorithm Process

1. Sequence Process : Instructions are executed sequentially.
2. Selection Process : An instruction is executed only if it meets certain criteria.
3. Iteration Process : An instruction is repeated while a certain condition holds.
4. Concurrent Process : Several instructions are executed together.

2.2. What is a Data Structure?

A data structure, in the simplest terms, is a scheme for logically organizing related data in a
computer's memory. Since a data structure is a scheme for data organization, the
functional definition of a data structure must be independent of its implementation. The
functional definition of a data structure is known as an ADT (Abstract Data Type). The
implementation is left to developers, who decide which technology is right for their project's needs.
[4]

A data structure is the foundation of a program; with a rightly chosen data structure, the
program becomes efficient.

2.3. Data Compression

Compression is used almost everywhere. All images obtained from the web are
compressed, usually in JPEG or GIF format; most modems use compression; HDTV is
compressed using MPEG-2; and some file systems automatically compress files when storing them.
Data compression is the process of converting input data into other data that has a smaller
size. Data compression is popular for several reasons: for example, people like to accumulate
data and are reluctant to throw it away, and people do not like to wait a long time for data
transfers.

Data compression plays an important role in data transmission and data storage. Many
data processing applications require the storage of large volumes of data, and the number of such
applications is constantly increasing as computer use extends to new disciplines. At the
same time, the proliferation of computer communication networks is resulting in massive
transfers of data over communication links. Compressing the data to be stored or transmitted
reduces storage and/or communication costs. When the data to be transmitted is reduced, the effect
is that of increasing the capacity of the communication channel. Similarly, compressing a file
to half of its original size is equivalent to doubling the capacity of the storage medium.

There are many methods for performing data compression. They are based on different ideas,
suit different types of data, and produce different results, but they share the same principle:
they compress data by eliminating redundancy in the original data in the source file. Some data
sets have structure, and this structure can be exploited to produce a smaller representation of
the data. The terms redundancy and structure are both used in the professional literature and
refer to the same thing. Redundancy is therefore a key concept in data compression.

2.3.1. Types of Data Compression

Data compression can be classified in two ways, namely by data
reception and by the resulting output. The following are the types of data compression based
on data reception:

1. Dialogue Mode

Dialogue mode is data compression that must stay within the limits of human
sight and hearing; that is, the compression happens during an interaction
through sight and hearing, as in video conferencing, and must therefore be done
in real time.

2. Retrieval Mode

Retrieval mode is the opposite of dialogue mode: the data compression does not
have to be done in real time.
The following are the types of data compression based on the output:

1. Lossy Compression

In lossy compression, the decompression result is not identical to the
data before compression. Examples: MP3, JPEG, MPEG, and WMA. Lossy
compression produces a smaller size than lossless compression while still
remaining usable.

2. Lossless Compression

In lossless compression, the compressed result can be decompressed,
and the result remains exactly the same as the data before the compression process.
Examples: ZIP, RAR, GZIP, 7-ZIP.

2.3.2. Criteria and Classification of Data Compression Techniques

The main criteria for a data compression system are as follows:

1. Quality: the encoded data should make the file size smaller than the original
file and, for lossy compression, the data should not be noticeably damaged.
2. Speed, ratio, and efficiency of the compression and decompression processes.
3. Precision: the decompression process must reproduce the data exactly as it
was before compression.

The classification of data compression techniques is as follows:

1. Entropy Encoding

This compression technique has the following characteristics:

a. It is lossless.
b. The technique is not based on a medium with particular specifications and
characteristics, but on the sequence of data.
c. It is statistical encoding, which does not pay attention to the semantics of the data.
d. Examples: run-length coding, Huffman coding, arithmetic coding.

2. Source Coding

This compression technique has the following characteristics:

a. It is lossy.
b. It is associated with the semantics of the media data.
c. Examples: prediction (DPCM, DM), transformation (FFT,
DCT), layered coding (bit position, subsampling, sub-band coding),
vector quantization.

3. Hybrid Coding
This compression technique has the following characteristics:

a. It combines lossy and lossless techniques.
b. Examples: JPEG, MPEG, H.261, DVI.
CHAPTER III

PROBLEM ANALYSIS

3.1. Data Compression With Huffman Code

The Huffman algorithm is one of the compression algorithms, and the most famous
algorithm for compressing text. There are three phases in using the Huffman algorithm to
compress a text: first, the Huffman tree forming phase; second, the encoding phase;
and third, the decoding phase. The principle used by the Huffman algorithm is that characters
that appear often are encoded with short bit strings, while rarer characters are encoded with
longer bit strings. The Huffman compression technique can save up to 30% of memory usage.
The Huffman algorithm has complexity O(n log n) for a set of n characters.

Figure III.1. David Albert Huffman


(REF: https://www.computer.org/web/awards/mcdowell-david-huffman)

The story of the invention of the Huffman code is a great story demonstrating that students
can do better than their professors. David Huffman (1925-1999) was a student in an electrical
engineering course in 1951. His professor, Robert Fano, offered the students a choice between
taking the final exam and writing a term paper. Huffman did not want to take the final exam, so
he started working on his paper. The topic of the paper was to find the most efficient code.
What Professor Fano did not tell his students was that it was an open problem and that he was
working on it himself. Huffman spent a lot of time on the problem and was about to give up
when the solution suddenly came to him. The code he found was optimal, meaning that it has
the lowest possible average message length. The method Fano had developed for this problem
does not always produce an optimal code. In this respect, therefore, Huffman did better than
his professor. Huffman later said that he might never have attempted the problem had he known
that his professor was struggling with it himself.
Figure III.2. Robert Fano
(REF: http://news.mit.edu/2016/robert-fano-obituary-0715)

In data compression, a Huffman code is a set of binary codes that encode the particular symbols
in a piece of data. The codes are formed by observing the frequency of occurrence of each symbol
in the data. A Huffman code is not unique: the code for each symbol differs for each different
piece of data being compressed.

In its construction, the Huffman code applies the prefix code concept: a set of binary
codes such that no member of the set is a prefix of another member, so that in the decoding
process there is no ambiguity between one symbol and another. Prefix codes representing
more frequent symbols use shorter bit strings than the codes representing symbols that appear
less frequently. Thus, the number of bits needed to store the information in the data can
be smaller. [5]

The algorithm for forming Huffman codes is as follows (a short sketch follows the list):

1. First, the frequency of occurrence of each symbol in the data is counted.
2. Huffman coding then constructs a binary tree with minimum weighted path length,
called a Huffman tree:

a. First, the two symbols with the smallest chance of occurrence are selected.
b. The two symbols are combined to form a parent node of both symbols, whose chance
of occurrence equals the sum of the chances of occurrence of the two symbols.
c. This new symbol is treated as a new node and taken into account when searching for
the next symbol with the smallest chance of occurrence.
d. Then the next two symbols with the smallest chance of occurrence are selected.
e. The same procedure is performed on the following pairs of symbols with the smallest
chance of occurrence.
f. The second step is repeated until all symbols form a single binary tree.
3. The leaves of the Huffman tree represent the symbols contained in the compressed data.
4. Each symbol is encoded by labeling every left branch of the binary tree with 0 and
every right branch with 1.
5. A path is traced from the root to each leaf, reading off the 0 and 1 labels on the
branches along the way.
6. The Huffman code for the symbol at a leaf is the bit string read from the root to that leaf.
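To make steps 4-6 concrete, here is a minimal sketch in Python (not part of the original
paper; the nested-pair tree representation and the helper name assign_codes are illustrative
assumptions). It labels left branches 0 and right branches 1 and reads each leaf's code from
the root down:

    # Sketch: derive codes from a Huffman tree by labeling branches.
    # An internal node is assumed to be a pair (left, right); a leaf is a symbol.
    def assign_codes(node, prefix="", table=None):
        """Walk the tree; the accumulated 0/1 path becomes the symbol's code."""
        if table is None:
            table = {}
        if isinstance(node, tuple):                      # internal node
            assign_codes(node[0], prefix + "0", table)   # left branch -> 0
            assign_codes(node[1], prefix + "1", table)   # right branch -> 1
        else:                                            # leaf: the symbol itself
            table[node] = prefix
        return table

    # A hand-built tree that reproduces the codes of the worked example in Section 3.2:
    tree = (("u", "f"), (("H", "a"), ("m", "n")))
    print(assign_codes(tree))
    # {'u': '00', 'f': '01', 'H': '100', 'a': '101', 'm': '110', 'n': '111'}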

3.2. Determining Huffman Codes: A Simple Example

For example, there is data whose content is the string "Huffman". The ASCII (American
Standard Code for Information Interchange) codes of the string's characters are given in the
table below:

Table III.1. ASCII Codes of the "Huffman" Characters

CHARACTER    ASCII CODE (BINARY)
H            01001000
u            01110101
f            01100110
m            01101101
a            01100001
n            01101110

Using ASCII codes, the representation of "Huffman" as a series of bits is
"01001000011101010110011001100110011011010110000101101110".

Using the ASCII method, it takes 56 bits (7 bytes) to store the string. The text will now be
compressed with a Huffman code. First, the frequency of occurrence of each symbol in the
string is counted. The frequency and probability of occurrence of each symbol are given in the
table below:

Table III.2. Frequency and Probability of Occurrence of the "Huffman" Characters

CHARACTER    FREQUENCY    PROBABILITY
H            1            1/7
u            1            1/7
f            2            2/7
m            1            1/7
a            1            1/7
n            1            1/7

Using the Huffman tree construction algorithm, a Huffman tree can be built as shown in the
figure below:

Figure III.3. Huffman Tree for "Huffman"

So, the Huffman codes for each character are:

Table III.3. Huffman Codes for "Huffman"

CHARACTER    HUFFMAN CODE
"H"          100
"u"          00
"f"          01
"m"          110
"a"          101
"n"          111

After compression with the Huffman code, the string can be represented as
"100000101110101111" (18 bits), with an average of about 2.6 bits used to encode each
character. Thus, the compression has saved 38 bits, or 67.9% of the data size. This is a very
simple example of data compression; if the compressed data is much larger, the size that can
be saved is also much greater.
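As a small illustrative check of these numbers (not from the original paper), one can encode
the string with the code table above and recompute the savings:

    # Verify the worked example: encode "Huffman" and compare with ASCII.
    codes = {"H": "100", "u": "00", "f": "01", "m": "110", "a": "101", "n": "111"}

    encoded = "".join(codes[c] for c in "Huffman")
    print(encoded)                   # 100000101110101111
    print(len(encoded))              # 18 bits, versus 56 bits in ASCII
    print(len(encoded) / 7)          # about 2.57 bits per character
    print((56 - len(encoded)) / 56)  # about 0.679, i.e. 67.9% saved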

3.3. Data Compression Applications with the Huffman Algorithm

Huffman coding is not the compression method with the best performance; however, it is
still used because of its simplicity, its high speed, and the few patent rights associated with it.
[5]
Currently, Huffman coding is often used as the last step of several other compression
methods. Here are some examples of Huffman code implementations:

1. The DEFLATE method (a combination of LZ77 and Huffman coding), used
in the .ZIP, .GZ (GZIP), and .PNG file formats.
2. The pack utility on Linux systems, with the .Z file format.
3. The combination of the Burrows-Wheeler transform and Huffman coding in the .BZ2
(BZIP2) file format.
4. Image compression with the .JPEG (Joint Photographic Experts Group) format. This
image compression uses the discrete cosine transform and quantization, then ends
with Huffman coding as the last step.
5. Audio compression with the .MP3 format. This compression is part of the MPEG-1
standard for audio and music compression and uses subbands, the MDCT (Modified
Discrete Cosine Transform), perceptual modelling, and quantization, ending with
Huffman coding as the last step.
6. Audio compression with the .AAC (Advanced Audio Coding) format. This
compression is part of the MPEG-2 and MPEG-4 audio encoding specifications and uses
the MDCT, perceptual modelling, and quantization, also ending with Huffman coding as
the last step.
7. In addition to data compression, modified Huffman coding is also used in fax
machines to encode black on white.
8. HDTV (High-Definition Television) and modems (Modulator-Demodulator) also
use the Huffman coding principle.

3.4. Characteristics of Huffman Codes

1. Approach
o Variable-length encoding of symbols
o Exploits the statistical frequency of symbols
o Efficient when symbol probabilities vary widely
2. Principle
o Use fewer bits to represent frequent symbols
o Use more bits to represent infrequent symbols

Features associated with Huffman codes:

Huffman codes are prefix-free binary code trees, so all the essential properties of such
trees apply. Codes generated by the Huffman algorithm achieve the ideal code length up to
the bit boundary: the maximum deviation from the theoretical optimum is less than 1 bit
per symbol.
An example of a Huffman code (here P(x) is the probability of a symbol,
I(x) = -log2 P(x) is its information content, and H(x) = P(x) * I(x) is its contribution to
the entropy):

Table III.4. Entropy Versus Huffman Code Length for a Five-Symbol Alphabet

SYMBOL    P(x)      I(x)      CODE LENGTH    H(x)
A         0.387     1.369     1              0.530
B         0.194     2.369     3              0.459
C         0.161     2.632     3              0.425
D         0.129     2.954     3              0.381
E         0.129     2.954     3              0.381

Theoretical minimum: 2.176 bits
Huffman code length: 2.226 bits

The computation of the entropy yields a minimum average code length of 2.176 bits per
symbol under the assumed distribution. In contrast, the Huffman code attains an average of
2.226 bits per symbol, so Huffman coding reaches 97.74% of the theoretical optimum. An
even better result is possible only with arithmetic coding, but its use has been restricted
by patents.
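The figures above can be reproduced with a short computation (a sketch assuming the
probabilities and code lengths of Table III.4):

    # Compare the entropy with the average Huffman code length.
    from math import log2

    p      = {"A": 0.387, "B": 0.194, "C": 0.161, "D": 0.129, "E": 0.129}
    length = {"A": 1, "B": 3, "C": 3, "D": 3, "E": 3}

    entropy = sum(px * -log2(px) for px in p.values())
    avg_len = sum(p[s] * length[s] for s in p)

    print(round(entropy, 3))            # about 2.176 bits per symbol
    print(round(avg_len, 3))            # 2.226 bits per symbol
    print(round(entropy / avg_len, 4))  # about 0.9775, i.e. roughly 97.7% efficiency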

3.5. Basic Technique of the Huffman Algorithm

The technique works by creating a binary tree of nodes. These can be stored in a regular
array, the size of which depends on the number of symbols, n. A node can be either a leaf node
or an internal node. Initially, all nodes are leaf nodes, which contain the symbol itself,
the weight (frequency of appearance) of the symbol, and optionally a link to a parent node,
which makes it easy to read the code (in reverse) starting from a leaf node. Internal nodes
contain a weight, links to two child nodes, and the optional link to a parent node. As a
common convention, bit '0' represents following the left child and bit '1' represents following
the right child. A finished tree has n leaf nodes and n-1 internal nodes.
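One possible way to express this node layout in code is sketched below (the field names are
illustrative assumptions, not prescribed by the algorithm):

    # Sketch of the node structure described above.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        weight: int                      # frequency of the symbol or subtree
        symbol: Optional[str] = None     # set for leaf nodes only
        left: Optional["Node"] = None    # child followed on bit '0'
        right: Optional["Node"] = None   # child followed on bit '1'
        parent: Optional["Node"] = None  # optional, for reading codes in reverse

        def is_leaf(self) -> bool:
            return self.symbol is not None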

The process essentially begins with the leaf nodes containing the probabilities of the
symbols they represent. Then a new node whose children are the two nodes with the smallest
probability is created, such that the new node's probability is equal to the sum of the
children's probabilities. With the previous two nodes merged into one node (and thus no longer
considered), and with the new node now being considered, the procedure is repeated until only
one node remains: the Huffman tree.

The simplest construction algorithm uses a priority queue where the node with the lowest
probability is given the highest priority (a sketch in Python follows below):

1. Create a leaf node for each symbol and add it to the priority queue.
2. While there is more than one node in the queue:
a. Remove the two nodes of highest priority (lowest probability) from the queue.
b. Create a new internal node with these two nodes as children and with probability
equal to the sum of the two nodes' probabilities.
c. Add the new node to the queue.
3. The remaining node is the root node and the tree is complete.

Since efficient priority queue data structures require O(log n) time per insertion, and a tree
with n leaves has 2n−1 nodes, this algorithm operates in O(n log n) time.
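A minimal sketch of this construction, assuming Python's heapq module as the priority queue
(the tuple-based node layout and the tie-breaking counter are implementation choices, not part
of the algorithm itself):

    # Priority-queue construction of a Huffman tree.
    import heapq
    from collections import Counter
    from itertools import count

    def huffman_tree(text):
        tick = count()   # tie-breaker so entries with equal weights compare cleanly
        # Step 1: create a leaf node (symbol, None, None) for each symbol.
        heap = [(w, next(tick), (sym, None, None))
                for sym, w in Counter(text).items()]
        heapq.heapify(heap)
        # Step 2: merge the two lowest-probability nodes until one remains.
        while len(heap) > 1:
            w1, _, n1 = heapq.heappop(heap)
            w2, _, n2 = heapq.heappop(heap)
            heapq.heappush(heap, (w1 + w2, next(tick), (None, n1, n2)))
        # Step 3: the remaining node is the root of the Huffman tree.
        return heap[0][2]

    root = huffman_tree("Huffman")   # exact shape depends on tie-breaking order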
If the symbols are sorted by probability, there is a linear-time (O(n)) method to create a
Huffman tree using two queues, the first one containing the initial weights (along with pointers
to the associated leaves), and combined weights (along with pointers to the trees) being put in
the back of the second queue. This assures that the lowest weight is always kept at the front of
one of the two queues (a sketch follows the list):

1. Start with as many leaves as there are symbols.
2. Enqueue all leaf nodes into the first queue (by probability in increasing order, so that
the least likely item is at the head of the queue).
3. While there is more than one node in the queues:
a. Dequeue the two nodes with the lowest weight by examining the fronts of both
queues.
b. Create a new internal node, with the two just-removed nodes as children (either
node can be either child) and the sum of their weights as the new weight.
c. Enqueue the new node into the rear of the second queue.
4. The remaining node is the root node; the tree has now been generated.
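A sketch of the two-queue method (assuming the leaves arrive presorted by weight; deque here
simply stands in for a plain FIFO queue). Note that pop_lightest prefers the first queue on
ties, which is exactly the variance-minimizing rule described in the paragraph that follows:

    # Linear-time Huffman tree construction with two FIFO queues.
    from collections import deque

    def huffman_two_queues(sorted_leaves):
        """sorted_leaves: list of (weight, symbol) in increasing weight order."""
        q1 = deque((w, (s, None, None)) for w, s in sorted_leaves)
        q2 = deque()                              # holds merged subtrees

        def pop_lightest():
            # The lowest weight is always at the front of one of the queues;
            # on ties, take from the first queue (this minimizes variance).
            if not q2 or (q1 and q1[0][0] <= q2[0][0]):
                return q1.popleft()
            return q2.popleft()

        while len(q1) + len(q2) > 1:
            w1, n1 = pop_lightest()
            w2, n2 = pop_lightest()
            q2.append((w1 + w2, (None, n1, n2)))  # merged node goes to queue 2
        return (q1 or q2)[0][1]                   # the root node

    root = huffman_two_queues(
        [(1, "H"), (1, "u"), (1, "m"), (1, "a"), (1, "n"), (2, "f")])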

It is generally beneficial to minimize the variance of codeword length. For example, a
communication buffer receiving Huffman-encoded data may need to be larger to deal with
especially long symbols if the tree is especially unbalanced. To minimize variance, simply
break ties between the queues by choosing the item in the first queue. This modification retains
the mathematical optimality of Huffman coding while both minimizing variance and
minimizing the length of the longest character code.

3.6. Main Properties of the Huffman Algorithm

The probabilities used can be generic ones for the application domain, based on
average experience, or they can be the actual frequencies found in the text being compressed.
(The latter variation requires that a frequency table or some other hint about the encoding be
stored with the compressed text; implementations employ various tricks to store such tables
efficiently.)

Huffman coding is optimal when the probability of each input symbol is a negative
power of two. Prefix codes tend to have slight inefficiency on small alphabets, where
probabilities often fall between these optimal points. "Blocking", or expanding the alphabet
size by coalescing multiple symbols into "words" of fixed or variable length before Huffman
coding, usually helps, especially when adjacent symbols are correlated (as in the case of natural
language text). The worst case for Huffman coding can happen when the probability of a
symbol exceeds 2^-1 = 0.5, making the upper limit of inefficiency unbounded. Such situations
often respond well to a form of blocking called run-length encoding.
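A small illustrative computation (the probability 0.99 is an assumed example value) showing
why a dominant symbol makes plain Huffman coding inefficient:

    # Worst case: a symbol with probability far above 0.5.
    from math import log2

    p = 0.99
    print(-log2(p))       # about 0.0145 bits of information content
    # Yet the shortest possible Huffman codeword is still 1 whole bit,
    # so the code spends almost 70 times more bits than the entropy requires.
    print(1 / -log2(p))   # about 69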

Arithmetic coding produces slight gains over Huffman coding, but in practice these
gains have seldom been large enough to offset arithmetic coding's higher computational
complexity and patent royalties.

3.7. Variations of the Huffman Algorithm

Many variations of Huffman coding exist, some of which use a Huffman-like algorithm,
and others of which find optimal prefix codes (while, for example, putting different restrictions
on the output). Note that, in the latter case, the method need not be Huffman-like and, indeed,
need not even run in polynomial time.

1. n-ary Huffman coding

The n-ary Huffman algorithm uses the {0, 1, ..., n − 1} alphabet to encode messages
and builds an n-ary tree. This approach was considered by Huffman in his original paper. The
same algorithm applies as for binary (n = 2) codes, except that the n least probable
symbols are taken together, instead of just the two least probable. Note that for n greater than 2,
not all sets of source words can properly form an n-ary tree for Huffman coding. In such cases,
additional 0-probability placeholders must be added. This is because the tree must form an n-to-1
contractor; for binary coding this is a 2-to-1 contractor, and any sized set can form such a
contractor. A small helper for counting the required placeholders is sketched below.
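Since each merge replaces n nodes with one, the number of symbols k must satisfy
(k − 1) mod (n − 1) = 0; the helper below (an illustrative sketch, not from the paper) counts
the 0-probability placeholders needed:

    # Placeholders needed so that k symbols fit an n-ary Huffman tree.
    def placeholders_needed(k, n):
        r = (k - 1) % (n - 1)
        return 0 if r == 0 else (n - 1) - r

    print(placeholders_needed(6, 3))   # 1: six symbols need one filler, ternary
    print(placeholders_needed(7, 3))   # 0: (7 - 1) is divisible by (3 - 1)
    print(placeholders_needed(5, 2))   # 0: binary never needs placeholders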

2. Adaptive Huffman coding


A variation called adaptive Huffman coding calculates the probabilities dynamically
based on recent actual frequencies in the source string. This is somewhat related to the LZ
family of algorithms.

3. Huffman template algorithm

Most often, the weights used in implementations of Huffman coding represent numeric
probabilities, but the algorithm given above does not require this; it requires only a way to
order weights and to add them. The Huffman template algorithm enables one to use any kind
of weights (costs, frequencies, pairs of weights, non-numerical weights) and one of many
combining methods (not just addition). Such algorithms can solve other minimization
problems, such as minimizing max_i [w_i + length(c_i)], a problem first applied to circuit design.

4. Length-limited Huffman coding

Length-limited Huffman coding is a variant in which the goal is still to achieve a
minimum weighted path length, but with the additional restriction that the length of each
codeword must be less than a given constant. The package-merge algorithm solves this problem
with a simple greedy approach very similar to that used by Huffman's algorithm. Its time
complexity is O(nL), where L is the maximum length of a codeword. No algorithm is known
that solves this problem in linear or linearithmic time, unlike the presorted and unsorted
conventional Huffman problems, respectively.

5. Huffman coding with unequal letter costs

In the standard Huffman coding problem, it is assumed that each symbol in the set from
which the code words are constructed has an equal cost to transmit: a code word whose length
is N digits will always have a cost of N, no matter how many of those digits are 0s and how many
are 1s. Under this assumption, minimizing the total cost of the message and
minimizing the total number of digits are the same thing.

Huffman coding with unequal letter costs is the generalization in which this assumption
no longer holds: the letters of the encoding alphabet may have non-uniform lengths,
due to characteristics of the transmission medium. An example is the encoding alphabet of
Morse code, where a 'dash' takes longer to send than a 'dot', and therefore the cost of a dash in
transmission time is higher. The goal is still to minimize the weighted average codeword
length, but it is no longer sufficient just to minimize the number of symbols used by the
message. No algorithm is known to solve this problem in the same manner or with the same
efficiency as conventional Huffman coding.

6. Optimal alphabetic binary trees (Hu-Tucker coding)

In the standard Huffman coding problem, it is assumed that any codeword can
correspond to any input symbol. In the alphabetic version, the alphabetic order of inputs and
outputs must be identical. Thus, for example, A = {a, b, c} could not be assigned the code
H(A, C) = {00, 1, 01}, but instead should be assigned either H(A, C) = {00, 01, 1} or
H(A, C) = {0, 10, 11}. This is also known as the Hu-Tucker problem, after the authors of the
paper presenting the first linearithmic solution to this optimal binary alphabetic problem,
which has some similarities to the Huffman algorithm but is not a variation of it. These
optimal alphabetic binary trees are often used as binary search trees.

7. The canonical Huffman code

If the weights corresponding to the alphabetically ordered inputs are in numerical order,
the Huffman code has the same lengths as the optimal alphabetic code, which can be found
by calculating these lengths, rendering Hu-Tucker coding unnecessary. The code resulting
from numerically (re-)ordered input is sometimes called the canonical Huffman code and is
often the code used in practice, due to its ease of encoding and decoding. The technique for
finding this code is sometimes called Huffman-Shannon-Fano coding, since it is optimal like
Huffman coding, but alphabetic in weight probability, like Shannon-Fano coding. The
Huffman-Shannon-Fano code corresponding to the example is {000, 001, 01, 10, 11}, which,
having the same codeword lengths as the original solution, is also optimal.
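A sketch of canonical code assignment (the symbol names and the code lengths below are
illustrative assumptions, chosen to reproduce the five-codeword set quoted above): codes are
handed out from the longest length upward, starting at all zeros, incrementing, and shifting
right whenever the length drops:

    # Canonical Huffman codes from a set of code lengths.
    def canonical_codes(lengths):
        """lengths: dict symbol -> code length (assumed to satisfy Kraft)."""
        code = 0
        prev_len = None
        out = {}
        # Longest codes first; ties broken alphabetically.
        for sym, ln in sorted(lengths.items(), key=lambda kv: (-kv[1], kv[0])):
            if prev_len is not None:
                code = (code + 1) >> (prev_len - ln)   # shrink on a length drop
            out[sym] = format(code, "0{}b".format(ln))
            prev_len = ln
        return out

    print(canonical_codes({"A": 3, "B": 3, "C": 2, "D": 2, "E": 2}))
    # {'A': '000', 'B': '001', 'C': '01', 'D': '10', 'E': '11'}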

CHAPTER IV
CONCLUSION AND SUGGESTION

IV.1. Conclusion
Compression is an important technique in the field of computer science because it
reduces the size of data, so the data can be transmitted and stored on the internet or on storage
media more quickly and cheaply than uncompressed data. This paper focused on the Huffman
algorithm so that we can understand it. The authors have explained this algorithm from its
definition, history, and working mechanism to examples of its use and its applications. From
the example, we know that data compressed with a Huffman code saved up to 67.9% of the
bits. This means the algorithm proposed by David A. Huffman has indeed proved successful
at compression. Although Huffman coding is not the best compression method, its simplicity,
its high speed, and the few patent rights related to it keep it in use.

IV.2. Suggestion
Because this paper only discusses the theory of the Huffman algorithm in brief, readers
are advised to consult other references as well for more complete material.

BIBLIOGRAPHY
[1] Lelewer, Debra A., and Daniel S. Hirschberg. "Data Compression." ACM Computing Surveys,
vol. 19, no. 3, 1987, pp. 261-296. doi:10.1145/45072.45074.

[2] Salomon, David. Data Compression: The Complete Reference. Springer London, 2007.

[3] Christensson, Per. "Algorithm Definition." TechTerms. Sharpened Productions, 2 August
2013. Web. 9 March 2018. <https://techterms.com/definition/algorithm>.

[4] Kumar, Krishan. "What Is Data Structure? Definition Data Structure." Cs-Fundamentals.com,
cs-fundamentals.com/tech-interview/dsa/what-is-data-structure.php.

[5] Prabawa, I.Y.B. Aditya Eka W. "Kompresi Data Dengan Kode Huffman Dan Variasinya."
informatika.stei.itb.ac.id/~rinaldi.munir/Matdis/2008-2009/Makalah2008/Makalah0809-080.pdf.
