
Chapter 2

Source Coding
Dr. Mohamed A. Abdelhamed
Department of Communication and Computer Eng.
Higher Institute of Engineering, El-Shorouk Academy.
Academic year (2018-2019)

Contacts:
WhatsApp: +201002323525
E-mail: m.abdelhamed@sha.edu.eg
mohabdelhamed@yahoo.com
2.1 SOURCE ENTROPY
The entropy of a source is defined as the average amount of information per symbol (bits/symbol) of the messages generated by the source X, that is

H(X) = \sum_{i=1}^{L} P_i \log_2 \frac{1}{P_i}   bits/symbol
Thus, the entropy of a source depends upon the symbol probabilities, and it is maximized when the symbols of the source are equiprobable, as will be shown below.
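As a quick check of the definition above, here is a minimal Python sketch that evaluates H(X) for an assumed four-symbol distribution (the numbers are illustrative, not taken from the slides):

```python
import math

def entropy(probs):
    """Source entropy H(X) = sum_i P_i * log2(1/P_i), in bits/symbol."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Assumed example distribution (not from the slides): four symbols.
P = [0.5, 0.25, 0.125, 0.125]
print(f"H(X)    = {entropy(P):.4f} bits/symbol")   # 1.75 bits/symbol
print(f"log2(L) = {math.log2(len(P)):.4f} bits")   # upper bound: 2 bits
```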

2.1 SOURCE ENTROPY

If the L symbols of the source are equiprobable, then P_i = 1/L for i = 1, 2, ..., L (L: number of symbols in the source), and the entropy of this source becomes

H(X) = \sum_{i=1}^{L} \frac{1}{L} \log_2 L = \log_2 L   bits/symbol

2.1 SOURCE ENTROPY
Generally, H(X) ≤ log2 L for any given set of source symbol probabilities, and the equality is achieved when the symbols are equiprobable.
 Example:
Consider a discrete memoryless source that emits two
symbols (or letters) x1 and x2 with probabilities q and 1-q,
respectively. Find and sketch the entropy of this source as
a function of q. Hint: This source can be a binary source
that emits the symbols 0 and 1 with probabilities q and 1-
q, respectively.

2.1 SOURCE ENTROPY
Solution:
The entropy of this source is

H(X) = q \log_2 \frac{1}{q} + (1-q) \log_2 \frac{1}{1-q}

which is the binary entropy function. It is zero at q = 0 and q = 1 (no uncertainty about the emitted symbol) and reaches its maximum value of 1 bit/symbol at q = 1/2, where the two symbols are equiprobable.
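A minimal Python sketch of the binary entropy function can stand in for the requested plot; the list of q values below is just an assumed sampling grid:

```python
import math

def binary_entropy(q):
    """H(q) = q*log2(1/q) + (1-q)*log2(1/(1-q)); zero at q = 0 and q = 1."""
    if q in (0.0, 1.0):
        return 0.0
    return q * math.log2(1 / q) + (1 - q) * math.log2(1 / (1 - q))

# Assumed sampling grid for the sketch.
for q in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"q = {q:4.2f}  ->  H = {binary_entropy(q):.4f} bits/symbol")
# The values rise from 0 to a maximum of 1 bit at q = 0.5 and fall back to 0.
```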
2.1 SOURCE ENTROPY
(Extended source)

Consider blocks of k successive source symbols rather than individual symbols. Each block can be viewed as a symbol of an extended source X^k that has L^k distinct symbols.
2.1 SOURCE ENTROPY
(Extended source)

 
The entropy of the k-th order extended source is

H(X^k) = k H(X)

2.1 SOURCE ENTROPY
(Extended source)

Example: Consider a DMS with three symbols x1, x2 and x3 with probabilities 1/4, 1/4 and 1/2, respectively. Find the entropy of the second-order extended source X^2 and compare it with the entropy of the original source.
2.1 SOURCE ENTROPY
(Extended source)

The symbols of the second-order extended source are

X^2 = {x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3}

The respective probabilities of these blocks are

{1/16, 1/16, 1/8, 1/16, 1/16, 1/8, 1/8, 1/8, 1/4}

2.1 SOURCE ENTROPY
(Extended source)

Thus, the entropy of the extended source is

H(X^2) = \sum_i P_i \log_2 \frac{1}{P_i} = 4\left(\tfrac{1}{16}\right)\log_2 16 + 4\left(\tfrac{1}{8}\right)\log_2 8 + \tfrac{1}{4}\log_2 4 = 1 + 1.5 + 0.5 = 3 bits

which is exactly 2 H(X), since the entropy of the original source is H(X) = (1/4) log2 4 + (1/4) log2 4 + (1/2) log2 2 = 1.5 bits.
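The result H(X^2) = 2 H(X) can also be verified numerically; the sketch below builds the second-order block probabilities from the single-symbol probabilities 1/4, 1/4, 1/2 used above:

```python
import math
from itertools import product

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Single-symbol probabilities implied by the block probabilities above.
P = {"x1": 0.25, "x2": 0.25, "x3": 0.5}

# Second-order extension: blocks of two independent source symbols.
P2 = {a + b: P[a] * P[b] for a, b in product(P, repeat=2)}

print(f"H(X)   = {entropy(P.values()):.4f} bits")   # 1.5
print(f"H(X^2) = {entropy(P2.values()):.4f} bits")  # 3.0 = 2 * H(X)
```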
2.2 Coding for Discrete Memoryless Sources (DMS)

The objective of source coding is the efficient representation of the data generated by a discrete source.
 The device that performs the representation is called a
source encoder, shown in Figure 2.2.

stream of bits
DMS Source
encoder Stream of bits
Correspond to
Figure 2.2 Source Encoding stream of symbols
or message

2.2 Coding for Discrete Memoryless Sources (DMS)

The average code word length of the source encoder is

R = \sum_{i=1}^{L} P_i n_i   bits/symbol

where P_i is the probability of occurrence of symbol x_i and n_i is the code word length (in bits) assigned to x_i. (As noted before, H(X) ≤ log2 L, and the equality is achieved when the symbols are equiprobable.)
2.2 Coding for Discrete Memoryless Sources (DMS)

The coding efficiency of the source encoder is

\eta = \frac{R_{min}}{R} = \frac{H(X)}{R} \le 1

(In practice we cannot actually reach H(X), so the efficiency is less than one.)

2.2 Coding for Discrete Memoryless Sources (DMS)

The redundancy of the source encoder is

\gamma = 1 - \eta

where \eta is the code efficiency.

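To make the definitions of R, η and γ concrete, here is a short sketch; the four-symbol source and the prefix-code lengths 1, 2, 3, 3 are assumed for illustration only:

```python
import math

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Assumed example (not from the slides): four symbols encoded with the
# prefix code 0, 10, 110, 111, i.e. code word lengths 1, 2, 3, 3.
P = [0.4, 0.3, 0.2, 0.1]
n = [1, 2, 3, 3]

R = sum(p * k for p, k in zip(P, n))   # average code word length (bits/symbol)
H = entropy(P)                         # source entropy
eta = H / R                            # code efficiency
gamma = 1 - eta                        # code redundancy
print(f"R = {R:.3f} bits/symbol, H(X) = {H:.4f} bits/symbol")
print(f"efficiency eta = {eta:.4f}, redundancy gamma = {gamma:.4f}")
```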
2.2.1 Fixed-Length Code Words

All symbols are assigned code words with the same number of bits, R. Assuming L possible symbols, then

R = \lfloor \log_2 L \rfloor + 1

where \lfloor x \rfloor denotes the largest integer less than x, R is the number of bits per symbol, and L is the number of symbols.

2.2.1 Fixed-Length Code Words

The code efficiency of the fixed-length encoder is

\eta = \frac{H(X)}{R} \le \frac{\log_2 L}{R} \le 1

since H(X) ≤ log2 L and R ≥ log2 L.
2.2.1 Fixed-Length Code Words

 Problem:
Find the code efficiency for the fixed-length word
encoder assuming the following DMSs:
(a) 8 equiprobable symbols
(b) 10 equiprobable symbols
(c) 100 equiprobable symbols
(d) 4 symbols with probabilities 0.5, 0.25, 0.125, 0.125
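A possible computational check of the four cases, using the smallest integer R with 2^R ≥ L (equivalent to ⌊log2 L⌋ + 1 with the strict-floor convention above), is sketched below:

```python
import math

def fixed_length_bits(L):
    """Smallest integer R with 2**R >= L; this equals floor(log2 L) + 1
    under the strict-floor convention used in the text."""
    return max(1, (L - 1).bit_length())

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

cases = {
    "(a) 8 equiprobable symbols":   [1 / 8] * 8,
    "(b) 10 equiprobable symbols":  [1 / 10] * 10,
    "(c) 100 equiprobable symbols": [1 / 100] * 100,
    "(d) probabilities 0.5, 0.25, 0.125, 0.125": [0.5, 0.25, 0.125, 0.125],
}
for name, P in cases.items():
    R = fixed_length_bits(len(P))
    print(f"{name}: R = {R} bits, efficiency = {entropy(P) / R:.4f}")
```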

2.2.1 Fixed-Length Code Words

Here H(X) is the ideal value of the number of bits per symbol and R is the actual value. Therefore, R is larger than H(X) by at most 1 bit per symbol.

2.2.1 Fixed-Length Code Words

For the k-th order extended source, a block of k source symbols is encoded into a code word of N bits. This requires

2^N \ge L^k   or   N \ge k \log_2 L

Hence, the minimum value of N is given by

N = \lfloor k \log_2 L \rfloor + 1
2.2.1 Fixed-Length Code Words

Thus, the average number of bits per source symbol for the extended source is

R = \frac{N}{k} = \frac{\lfloor k \log_2 L \rfloor + 1}{k}

By using the extended source, R can approach the entropy H(X) as k increases, but at the expense of a more complex encoder design.
The efficiency of the encoder is given by

\eta = \frac{H(X)}{R} = \frac{k H(X)}{N}
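The statement that R approaches H(X) as k increases can be illustrated numerically; the sketch below assumes a source of L = 10 equiprobable symbols (an assumption for illustration):

```python
import math

L = 10                    # assumed: a source of 10 equiprobable symbols
H = math.log2(L)          # entropy of the equiprobable source

for k in (1, 2, 3, 5, 10, 20):
    # Minimum N with 2**N >= L**k (floor(k*log2 L) + 1 in the text).
    N = max(1, (L**k - 1).bit_length())
    R = N / k             # average number of bits per source symbol
    print(f"k = {k:2d}: N = {N:3d} bits/block, R = {R:.3f}, "
          f"efficiency = {H / R:.4f}")
# The efficiency tends towards 1 as k grows (not necessarily monotonically).
```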
2.2.2 Variable Length Code Words

When the source symbols are not equiprobable, a more efficient coding method is to use variable-length code words.
 An example of such coding is the Morse code in which
the letters which occur frequently are assigned short
code words and those that occur infrequently are
assigned long code words. This type of coding is called
entropy coding.
Advantage: provides the optimum (lowest) data rate.
Disadvantage: more complex encoder/decoder design.

2.2.2 Variable Length Code Words

The quantity to be minimized is the average number of bits per symbol,

R = \sum_{i=1}^{L} n_i P_i   bits/symbol

where P_i is the probability of occurrence of symbol x_i and n_i is the code word length of x_i.

For correct detection, the code should be uniquely decodable (what has been sent can be recovered without error) and instantaneously decodable (when the last bit of a code word arrives, we know that the code word has finished).
2.2.2 Variable Length Code Words

 Example:
Consider a DMS with output symbols x1, x2, x3 and x4 with probabilities 1/2, 1/4, 1/8, and 1/8, respectively. Table 2.1 shows three different codes for this source. Consider the sequence to be decoded: 0 0 1 0 0 1.
Table 2.1 Three different codes for the same source.
Symbol   Pi     Code I   Code II   Code III
x1       1/2    1        0         0
x2       1/4    00       01        10
x3       1/8    01       011       110
x4       1/8    10       111       111
2.2.2 Variable Length Code Words

 In code I,
the first symbol corresponding to 0 0 is x2. However, the
next four bits are not uniquely decodable since they may
be decoded as x4x3 or x1x2x1.
Perhaps, the ambiguity can be resolved by waiting for
additional bits, but such a decoding delay is highly
undesirable.
The tree structures of code II and code III are shown in Figure 2.3. Code II is uniquely decodable but not instantaneously decodable (there is a delay in the decoding process, which is undesirable).
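To illustrate instantaneous decoding, here is a small sketch that encodes and decodes with code III of Table 2.1; the symbol sequence used is an assumed demo sequence, not the 0 0 1 0 0 1 sequence above:

```python
# Code III from Table 2.1 (a prefix code).
CODE_III = {"x1": "0", "x2": "10", "x3": "110", "x4": "111"}

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode_prefix(bits, code):
    """Instantaneous decoding: a symbol is emitted as soon as the bits
    collected so far match a code word. This works because no code word
    is a prefix of another code word."""
    inverse = {w: s for s, w in code.items()}
    decoded, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit stream ended in the middle of a code word")
    return decoded

message = ["x2", "x1", "x3", "x4", "x1"]     # assumed demo sequence
bits = encode(message, CODE_III)
print(bits)                                  # "1001101110"
print(decode_prefix(bits, CODE_III))         # recovers the original symbols
```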
2.2.2 Variable Length Code Words

Figure 2.3 Code structure for code II and code III.


2.2.2 Variable Length Code Words

A prefix code (also called an instantaneous code) is a code in which no code word is a prefix of any other code word. In Table 2.1, code III is a prefix code, whereas code I and code II are not.
Kraft Inequality

For a given code, we can check whether it can be a prefix code using the following condition on the code word lengths (the Kraft inequality):

\sum_{k=1}^{L} 2^{-n_k} \le 1

where n_k is the length of the k-th code word and L is the number of code words.

This is a condition on the code word lengths of the code, not on the code words themselves.
The condition may be satisfied even though the code itself is not a prefix code.
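A short sketch that evaluates the Kraft sum for the three codes of Table 2.1 is given below:

```python
def kraft_sum(code_words):
    """Sum of 2**(-n_k) over the code word lengths n_k."""
    return sum(2 ** -len(w) for w in code_words)

codes = {
    "Code I":   ["1", "00", "01", "10"],
    "Code II":  ["0", "01", "011", "111"],
    "Code III": ["0", "10", "110", "111"],
}
for name, words in codes.items():
    s = kraft_sum(words)
    verdict = "satisfies" if s <= 1 else "violates"
    print(f"{name}: sum 2^-n_k = {s:.3f} ({verdict} the Kraft inequality)")
# Code I gives 1.25 (violated); codes II and III both give 1.0, even
# though only code III is actually a prefix code.
```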
Kraft Inequality

Code I violates the Kraft inequality and, therefore, it cannot be a prefix code. However, the Kraft inequality is satisfied by both code II and code III, though only code III is a prefix code.
Prefix codes are distinguished from other uniquely decodable codes by the fact that the end of a code word is always recognizable, which makes the code instantaneously decodable.

Huffman Coding Algorithm

The Huffman code is constructed as follows:
1. Arrange the source symbols in order of decreasing probability.
2. Combine the two symbols with the lowest probabilities into a new (combined) symbol whose probability is the sum of the two, assigning a bit 0 to one of them and a bit 1 to the other.
3. Reorder the list and repeat step 2 until only one combined symbol (of probability 1) remains.
4. The code word of each source symbol is obtained by reading the assigned bits backwards, from the last stage down to that symbol.
Huffman Coding Algorithm

 Example:
Consider a DMS with five possible symbols having the
probabilities 0.4, 0.2, 0.2, 0.1 and 0.1. Use the Huffman encoding algorithm to find the code word for each symbol and the code efficiency.

Solution
The following table shows the complete steps of Huffman
encoding for the given source.

Huffman Coding Algorithm

The table shows that the source symbols with probabilities 0.4, 0.2, 0.2, 0.1, 0.1 are assigned the code words 00, 10, 11, 010, 011, respectively.
Therefore, the average number of bits per symbol is

R = \sum_{i=1}^{L} n_i P_i = 2(0.4) + 2(0.2) + 2(0.2) + 3(0.1) + 3(0.1) = 2.2 bits/symbol

The source entropy is

H(X) = \sum_{i=1}^{L} P_i \log_2 \frac{1}{P_i} = 2.1219 bits/symbol
Huffman Coding Algorithm

Therefore, the code efficiency can be determined as

\eta = \frac{H(X)}{R} = \frac{2.1219}{2.2} = 0.9645

The code redundancy is

\gamma = 1 - \eta = 0.0355

Huffman Coding Algorithm

It should be noted that the Huffman encoding process is not unique. There is arbitrariness in the way a bit 0 and a bit 1 are assigned to the two symbols in the last stage. Also, when the probability of a combined symbol is found to equal another probability in the list, the combined symbol may be placed as high as possible or as low as possible. In these cases, the code words can have different lengths, but the average code word length is the same.

Huffman Coding Algorithm

The variance of the code word lengths about the average length R is

\sigma^2 = \sum_{i=1}^{L} (n_i - R)^2 P_i

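A quick numerical check of the variance formula, using the code word lengths 2, 2, 2, 3, 3 obtained in the example above:

```python
P = [0.4, 0.2, 0.2, 0.1, 0.1]   # symbol probabilities from the example
n = [2, 2, 2, 3, 3]             # code word lengths found above

R = sum(p * k for p, k in zip(P, n))              # average length = 2.2
var = sum(p * (k - R) ** 2 for p, k in zip(P, n)) # variance of the lengths
print(f"R = {R:.2f} bits, variance = {var:.3f}")  # variance = 0.160
# The alternative valid length set 1, 2, 3, 4, 4 gives the same average
# length (2.2 bits) but a larger variance (1.36).
```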
Huffman Coding Algorithm

 Report:
Consider a DMS that produces three symbols x1, x2 and x3
with probabilities 0.45, 0.35, and 0.2, respectively. Find
the entropy, the code words, and the encoding efficiency
in both cases of single symbol encoding and two symbol
encoding (second-order extension code).

Fano Coding Algorithm

The Fano code is constructed as follows:
1. Arrange the information source symbols in order of decreasing probability.
2. Divide the symbols into two groups that are as nearly equiprobable as possible.
3. Each group receives one of the binary symbols (i.e., 0 or 1) as the first code bit.
4. Repeat steps 2 and 3 within each group as many times as possible.
5. Stop when there are no more groups to divide (a code sketch of this procedure is given below).
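A minimal recursive Python sketch of this procedure follows; the five-symbol source used at the end is an assumed example, not the one from the slides:

```python
def fano(symbols):
    """Recursive Fano coding. `symbols` is a list of (name, probability)
    pairs, already sorted in order of decreasing probability."""
    if len(symbols) <= 1:
        return {name: "" for name, _ in symbols}
    total = sum(p for _, p in symbols)
    # Find the split that makes the two groups as nearly equiprobable
    # as possible.
    best_split, best_diff, running = 1, float("inf"), 0.0
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(running - (total - running))
        if diff < best_diff:
            best_diff, best_split = diff, i
    upper, lower = symbols[:best_split], symbols[best_split:]
    code = {}
    for group, bit in ((upper, "0"), (lower, "1")):
        for name, word in fano(group).items():
            code[name] = bit + word
    return code

# Assumed example source (not the one used in the slides).
source = [("a", 0.35), ("b", 0.25), ("c", 0.2), ("d", 0.1), ("e", 0.1)]
print(fano(source))   # {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '111'}
```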
Fano Coding Algorithm

 Example:

Fano Coding Algorithm

Note that if it is not possible to divide the probabilities into exactly equiprobable groups, we should make the division as nearly equal as possible, as we can see from the following example.

Huffman vs. Fano coding

Huffman coding always produces a prefix code with the minimum possible average code word length, whereas Fano coding is simpler to apply but does not always achieve this minimum.
MUTUAL INFORMATION IN DMCs

A discrete memoryless channel (DMC), in which the present output depends only on the present input, is a mathematical (statistical) model for a channel with discrete input X and discrete output Y.
The channel is completely specified by a set of transition probabilities.
The DMC is represented graphically as shown in the figure.
In the channel diagram, each branch is labeled with a transition (conditional) probability of the channel, and the sum of the probabilities leaving the same input symbol must equal one.
MUTUAL INFORMATION IN DMCs

If the channel is noiseless, y1 is received whenever x1 is transmitted. If the channel is noisy, there is some amount of uncertainty, so the characteristic of the channel is described by the channel matrix P,
P = \begin{bmatrix}
P(y_1|x_1) & P(y_2|x_1) & \cdots & P(y_Q|x_1) \\
P(y_1|x_2) & P(y_2|x_2) & \cdots & P(y_Q|x_2) \\
\vdots & \vdots & & \vdots \\
P(y_1|x_q) & P(y_2|x_q) & \cdots & P(y_Q|x_q)
\end{bmatrix}

Each row of P corresponds to one input symbol, so the sum of the probabilities in each row equals one.

MUTUAL INFORMATION IN DMCs

where P(y_j|x_i) is the probability that y_j is received when x_i is transmitted.
It should be noted that each row of the channel matrix P corresponds to a fixed channel input, whereas each column corresponds to a fixed channel output.
The sum of the elements along a row is always equal to one, that is

\sum_{j=1}^{Q} P(y_j|x_i) = 1   for all i

MUTUAL INFORMATION IN DMCs

The joint probability distribution of the random variables X and Y is given by

P(x_i, y_j) = P(y_j, x_i) = P(X = x_i, Y = y_j)
            = P(Y = y_j | X = x_i) P(X = x_i)
            = P(X = x_i | Y = y_j) P(Y = y_j)

that is, P(x_i, y_j) = P(y_j|x_i) P(x_i) = P(x_i|y_j) P(y_j)

MUTUAL INFORMATION IN DMCs

The marginal probability distribution of the output random variable Y can be determined by summing P(x_i, y_j) over x_i, that is

P(y_j) = \sum_{i=1}^{q} P(x_i, y_j) = \sum_{i=1}^{q} P(y_j|x_i) P(x_i)

Thus, the probabilities of the different output symbols can be determined knowing the probabilities of the input symbols P(x_i) for i = 1, 2, ..., q and the matrix of transition probabilities P(y_j|x_i).

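The row-sum property and the computation of the output probabilities can be illustrated with a small sketch; the binary channel matrix and input probabilities below are assumed numbers, not taken from the slides:

```python
# Assumed example: a binary channel (numbers are illustrative only).
P_y_given_x = [
    [0.9, 0.1],   # row for x1: P(y1|x1), P(y2|x1)
    [0.2, 0.8],   # row for x2: P(y1|x2), P(y2|x2)
]
P_x = [0.6, 0.4]  # assumed input probabilities

# Each row of the channel matrix must sum to one.
for i, row in enumerate(P_y_given_x):
    assert abs(sum(row) - 1.0) < 1e-9, f"row {i} does not sum to 1"

# Output distribution: P(y_j) = sum_i P(y_j|x_i) * P(x_i)
Q = len(P_y_given_x[0])
P_y = [sum(P_y_given_x[i][j] * P_x[i] for i in range(len(P_x)))
       for j in range(Q)]
print("P(y) =", [round(p, 4) for p in P_y])   # [0.62, 0.38]
```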
MUTUAL INFORMATION IN DMCs

The reverse conditional probability P(x_i|y_j) is the probability that x_i is transmitted when y_j is received, and it can be determined using Bayes' rule as

P(x_i|y_j) = \frac{P(x_i, y_j)}{P(y_j)} = \frac{P(y_j|x_i) P(x_i)}{P(y_j)}
           = \frac{P(y_j|x_i) P(x_i)}{\sum_{i=1}^{q} P(x_i, y_j)} = \frac{P(y_j|x_i) P(x_i)}{\sum_{i=1}^{q} P(y_j|x_i) P(x_i)}
MUTUAL INFORMATION IN DMCs

The conditional entropy of X given that Y = y_j is given by

H(X|Y = y_j) = \sum_{i=1}^{q} P(x_i|y_j) \log_2 \frac{1}{P(x_i|y_j)}   bits/symbol

Thus, the mean entropy, i.e. the average uncertainty about a transmitted symbol when a symbol is received, is
MUTUAL INFORMATION IN DMCs

   
H(X|Y) = \sum_{j=1}^{Q} H(X|Y = y_j) P(y_j)
       = \sum_{j=1}^{Q} \sum_{i=1}^{q} P(x_i|y_j) P(y_j) \log_2 \frac{1}{P(x_i|y_j)}
       = \sum_{j=1}^{Q} \sum_{i=1}^{q} P(x_i, y_j) \log_2 \frac{1}{P(x_i|y_j)}
MUTUAL INFORMATION IN DMCs

The conditional entropy H(X|Y) is also called the equivocation, and it represents the amount of uncertainty remaining about the channel input X after observing the channel output Y.
Since H(X) represents the amount of uncertainty about the channel input X before observing the channel output Y, the difference H(X) - H(X|Y) represents the amount of information provided by observing the channel output Y, and it is called the mutual information of the channel, that is

MUTUAL INFORMATION IN DMCs

I  X ;Y   H  X   H  X Y 
 H Y   H Y X 

H  X   H  X Y   H Y   H Y X 

Relationships between entropies

The various entropies of the channel input and output are related by

H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

and the mutual information can therefore also be written as

I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X, Y)