Information Theory
Consider a discrete memoryless source (DMS) that emits symbols s_k with probabilities

P(S = s_k) = p_k,   k = 0, 1, ..., K-1

and

\sum_{k=0}^{K-1} p_k = 1

The information gained from a symbol is inversely related to its probability of occurrence (Info. \propto 1/Prob.), so the amount of information conveyed by the symbol s_k is

I(s_k) = \log_2\!\left(\frac{1}{p_k}\right)
Properties of Information
1) I(s_k) = 0 for p_k = 1
2) I(s_k) \ge 0 for 0 \le p_k \le 1
3) I(s_k) > I(s_i) for p_k < p_i
4) I(s_k s_i) = I(s_k) + I(s_i), if s_k and s_i are statistically independent
Case 1: Obviously, if we are absolutely certain of the outcome of
an event, even before it occurs, there is no information gained.
Case 2: That is to say, the occurrence of an event S = sk either
provides some or no information, but never brings about a loss of
information.
Case 3: That is, the less probable an event is, the more information
we gain when it occurs.
Entropy of DMS
H(S) = E[I(s_k)] = \sum_{k=0}^{K-1} p_k I(s_k) = \sum_{k=0}^{K-1} p_k \log_2\!\left(\frac{1}{p_k}\right)

The property of H:

0 \le H(S) \le \log_2 K

H(S) = 0 if and only if p_k = 1 for some k (no uncertainty), and H(S) = \log_2 K if and only if all K symbols are equiprobable.

[Figure: binary source with symbol probabilities 0.5 and 0.5.]
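As a quick illustration, here is a minimal Python sketch (standard library only; the function name is ours) that evaluates H(S) for an arbitrary probability vector:

```python
import math

def entropy(probs):
    """H(S) = sum_k p_k * log2(1/p_k) for a discrete memoryless source."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: equiprobable binary source
print(entropy([1.0]))        # 0.0 bits: a certain symbol carries no information
```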
Extension of DMS (Entropy)
The n-th extension of a DMS treats blocks of n successive source symbols as single symbols; its entropy is H(S^n) = n H(S).
Example: for a source with

p(s_0) = 1/4,  p(s_1) = 1/4,  p(s_2) = 1/2

the entropy is H(S) = (1/4)\log_2 4 + (1/4)\log_2 4 + (1/2)\log_2 2 = 1.5 bits, so the second extension has H(S^2) = 2 H(S) = 3 bits.
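A small self-contained sketch (illustrative only) that checks H(S^2) = 2 H(S) for this source by enumerating all ordered pairs of symbols:

```python
import math
from itertools import product

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

p = [0.25, 0.25, 0.5]                          # p(s0), p(s1), p(s2)
p2 = [a * b for a, b in product(p, repeat=2)]  # second extension: ordered pairs
print(entropy(p))    # 1.5 bits
print(entropy(p2))   # 3.0 bits = 2 * H(S)
```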
Statement: the average codeword length satisfies \bar{L} \ge H(S).

Coding efficiency: with the average codeword length

\bar{L} = \sum_{k=0}^{K-1} p_k l_k

the coding efficiency is

\eta = \frac{H(S)}{\bar{L}}

and for an optimal code

H(S) \le \bar{L} < H(S) + 1
Prefix Coding
Decoding: because no codeword is the prefix of any other, each codeword can be decoded as soon as its last bit is received. The codeword lengths of a prefix code satisfy the Kraft-McMillan inequality

\sum_{k=0}^{K-1} 2^{-l_k} \le 1
Instantaneous Codes:
Example:
Given S = {s1, s2, s3, s4, s5, s6, s7, s8, s9} and X = {0, 1}. Further, if l1 = l2 = 2 and l3 = l4 = l5 = l6 = l7 = l8 = l9 = k, find the smallest k for which an instantaneous (prefix) code exists.
By the Kraft-McMillan inequality, 2 \cdot 2^{-2} + 7 \cdot 2^{-k} \le 1, i.e. 7 \cdot 2^{-k} \le 1/2, so 2^k \ge 14 and the smallest integer value is k = 4.
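A brief sketch (standard library only) that evaluates the Kraft-McMillan sum and confirms that k = 4 is the smallest feasible length for the seven remaining codewords:

```python
def kraft_sum(lengths):
    """Kraft-McMillan sum for a binary code alphabet: sum_k 2**(-l_k)."""
    return sum(2.0 ** (-l) for l in lengths)

# l1 = l2 = 2 and l3 = ... = l9 = k; find the smallest k with Kraft sum <= 1
for k in range(1, 10):
    if kraft_sum([2, 2] + [k] * 7) <= 1.0:
        print("smallest feasible k =", k)   # prints 4
        break
```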
Algorithm:
Step 1: Arrange all messages in descending order of probability.
Example (coding procedure):

Message m_k    p_k     Code     No. of bits (l_k)
m1             1/2     0        1
m2             1/8     100      3
m3             1/8     101      3
m4             1/16    1100     4
m5             1/16    1101     4
m6             1/16    1110     4
m7             1/32    11110    5
m8             1/32    11111    5
If the source emits R_s symbols per second, the information rate is

R_i = R_s \cdot H(S)  bps

and the encoder output bit rate is

R_0 = R_s \cdot \bar{L}  bps

so the efficiency is \eta = \frac{R_i}{R_0} = \frac{H(S)}{\bar{L}}.
Variance of the codeword lengths:

\sigma^2 = \sum_{k=0}^{K-1} p_k (l_k - \bar{L})^2
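A short sketch (standard library only) that evaluates L̄, H(S), η and σ² for the code table above:

```python
import math

# (probability, codeword length) pairs taken from the table above
table = [(1/2, 1), (1/8, 3), (1/8, 3),
         (1/16, 4), (1/16, 4), (1/16, 4),
         (1/32, 5), (1/32, 5)]

L_bar = sum(p * l for p, l in table)                 # average codeword length -> 2.3125
H     = sum(p * math.log2(1/p) for p, _ in table)    # source entropy H(S)     -> 2.3125
eta   = H / L_bar                                    # coding efficiency       -> 1.0
var   = sum(p * (l - L_bar)**2 for p, l in table)    # variance of lengths     -> 1.96...

print(L_bar, H, eta, var)
```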
Method 1: Placing the probability of the combined symbol as high as possible in the list.
Figure 4.1.a
Figure 4.1.b
Memory storage requirement for Huffman code:
In order to store 100 characters in a computer, the number of bits required = (40*2) + (20*2) + (20*2) + (10*3) + (10*3) = 220 bits.
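For illustration, a minimal Huffman sketch (standard library heapq; the symbol names are made up) applied to the character counts above. Depending on how ties are broken it may build a different optimal tree than the one in the figures, but the total storage still comes out to 220 bits:

```python
import heapq

def huffman_lengths(freqs):
    """Codeword length per symbol for an optimal binary Huffman code."""
    # Heap entries: (weight, tie-breaker, symbols in this subtree)
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        for s in a + b:               # merged symbols move one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tie, a + b))
        tie += 1
    return lengths

counts = {"A": 40, "B": 20, "C": 20, "D": 10, "E": 10}   # hypothetical characters
lengths = huffman_lengths(counts)
print(sum(counts[s] * lengths[s] for s in counts))        # 220 bits
```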
For the code of Figure 4.1.a:

\sigma^2 = \sum_{k=0}^{K-1} p_k (l_k - \bar{L})^2 = 0.16
For the code of Figure 4.1.b:

\sigma^2 = \sum_{k=0}^{K-1} p_k (l_k - \bar{L})^2 = 1.36
Example:
If we have to encode the string of symbols s3 s2 s4 s3 s0 s1 s2, we start from the left, taking one symbol at a time.
The code corresponding to the first symbol s3 is 010; the second symbol s2 has the code 11; the third symbol s4 has the code 011; the fourth symbol s3 again has the code 010; the fifth symbol s0 has the code 00; the sixth symbol s1 has the code 10; and the seventh symbol s2 has the code 11.
The encoded sequence is therefore 010 11 011 010 00 10 11 = 01011011010001011.
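A small sketch of prefix-code encoding and decoding, using the codewords of this example (the dictionary below simply restates them):

```python
code = {"s0": "00", "s1": "10", "s2": "11", "s3": "010", "s4": "011"}

def encode(symbols):
    return "".join(code[s] for s in symbols)

def decode(bits):
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:      # prefix property: a match can be emitted at once
            out.append(inverse[current])
            current = ""
    return out

msg = ["s3", "s2", "s4", "s3", "s0", "s1", "s2"]
bits = encode(msg)
print(bits)                  # 01011011010001011
print(decode(bits) == msg)   # True
```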
Applications:
Used in many, if not most, compression algorithms such as gzip, bzip, JPEG (as an option), and fax compression.
Cheap to generate codes. Cheap to encode and decode.
Conditional probabilities
Suppose we have a single event A with possible
outcomes {ai}.
Everything we know is specified by the
probabilities for the possible outcomes: P(ai).
For the coin toss the possible outcomes are
heads and tails:
P(heads) = 1/2 & P(tails) = 1/2.
More generally:

0 \le P(a_i) \le 1   and   \sum_i P(a_i) = 1
Two events:
Add a second event B with outcomes {bj} and probabilities
P(bj).
Complete description provided by the joint probabilities:
P(ai,bj)
The marginal probabilities are obtained by summing the joint distribution:

P(a_i) = \sum_j P(a_i, b_j),   P(b_j) = \sum_i P(a_i, b_j)

If A and B are independent, the joint probability factorizes: P(a_i, b_j) = P(a_i) P(b_j).
Two events:
What does learning the value of A tell us about the
probabilities for the value of B?
If we learn that A = a0, then the quantities of interest are the
conditional probabilities:
P(b_j \mid a_0),  the probability of b_j given that A = a_0.

This conditional probability is proportional to the joint probability:

P(b_j \mid a_0) \propto P(a_0, b_j)

Finding the constant of proportionality leads to Bayes' rule:

P(a_i, b_j) = P(b_j \mid a_i) P(a_i)   or   P(a_i, b_j) = P(a_i \mid b_j) P(b_j)
The entropies of A, B, and the pair (A, B) can all be written in terms of the joint probabilities:

H(A) = -\sum_{ij} P(a_i, b_j) \log_2 P(a_i),   H(B) = -\sum_{ij} P(a_i, b_j) \log_2 P(b_j),   H(A, B) = -\sum_{ij} P(a_i, b_j) \log_2 P(a_i, b_j)

so that

H(A) + H(B) - H(A, B) = \sum_{ij} P(a_i, b_j) \log_2 \frac{P(a_i, b_j)}{P(a_i) P(b_j)} \ge 0
Let p(x_j, y_k) = P[X = x_j, Y = y_k]. As the entropy depends only on the probability distribution, it is natural to define the joint entropy of the pair, H(X, Y), as

H(X, Y) = \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{1}{p(x_j, y_k)}
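A quick sketch (the joint distribution below is invented purely for illustration) that computes H(X), H(Y) and H(X, Y) and checks that H(X) + H(Y) - H(X, Y) ≥ 0:

```python
import math

# Hypothetical joint pmf p[x][y] over X in {0, 1} and Y in {0, 1}
p = [[0.30, 0.10],
     [0.15, 0.45]]

px = [sum(row) for row in p]            # marginal distribution of X
py = [sum(col) for col in zip(*p)]      # marginal distribution of Y

def H(probs):
    return sum(q * math.log2(1 / q) for q in probs if q > 0)

Hxy = H([q for row in p for q in row])  # joint entropy H(X, Y)
print(H(px), H(py), Hxy)
print(H(px) + H(py) - Hxy >= 0)         # True: this difference is I(X; Y)
```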
Consider a channel with:
Input alphabet:  X = {x_0, x_1, ..., x_{J-1}}
Output alphabet: Y = {y_0, y_1, ..., y_{K-1}}
The average probability of symbol error is

P_e = \sum_{j=0}^{J-1} \sum_{k=0,\, k \ne j}^{K-1} P(Y = y_k \mid X = x_j)\, p(x_j) = \sum_{j=0}^{J-1} \sum_{k=0,\, k \ne j}^{K-1} p(y_k \mid x_j)\, p(x_j)
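As an illustration, a sketch computing P_e for a hypothetical binary symmetric channel with crossover probability 0.1 and a non-uniform input distribution (all numbers invented for the example):

```python
# Transition matrix p(y_k | x_j): rows are inputs x_j, columns are outputs y_k
p_y_given_x = [[0.9, 0.1],
               [0.1, 0.9]]
p_x = [0.7, 0.3]          # hypothetical input distribution

Pe = sum(p_y_given_x[j][k] * p_x[j]
         for j in range(2) for k in range(2) if k != j)
print(Pe)   # 0.1: for a binary symmetric channel, P_e equals the crossover probability
```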
Mutual Information
Channel input X (selected from input alphabet X) and channel output Y (selected from output alphabet Y).
How can we measure the uncertainty about X after observing Y?
Defining the conditional entropy of X, given that Y = y_k, we can write

H(X \mid Y = y_k) = \sum_{j=0}^{J-1} p(x_j \mid y_k) \log_2 \frac{1}{p(x_j \mid y_k)}

The mean of the entropy H(X \mid Y = y_k) over the output alphabet Y is therefore given by

H(X \mid Y) = \sum_{k=0}^{K-1} H(X \mid Y = y_k)\, p(y_k) = \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{1}{p(x_j \mid y_k)}

This conditional entropy H(X \mid Y) is the uncertainty remaining about the channel input after the channel output has been observed.
The difference H(X) - H(X \mid Y) is called the mutual information; it is a measure of the uncertainty about the channel input that is resolved by observing the channel output:

I(X; Y) = \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{p(x_j, y_k)}{p(x_j)\, p(y_k)}
I(X; Y) = I(Y; X)

where the mutual information I(X; Y) is a measure of the uncertainty about the channel input that is resolved by observing the channel output, and the mutual information I(Y; X) is a measure of the uncertainty about the channel output that is resolved by sending the channel input.
I(X; Y) = I(Y; X)

I(X; Y) = H(X) - H(X \mid Y)

= \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{1}{p(x_j)} - \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{1}{p(x_j \mid y_k)}

= \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \left[ \log_2 \frac{1}{p(x_j)} + \log_2 p(x_j \mid y_k) \right]

= \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{p(x_j \mid y_k)}{p(x_j)}
Similarly,

I(Y; X) = \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{p(y_k \mid x_j)}{p(y_k)}

and hence I(X; Y) = I(Y; X).
I(X; Y) \ge 0

with equality if and only if X and Y are statistically independent, that is, p(x_j, y_k) = p(x_j) p(y_k) for all j and k.
The joint entropy H(X, Y) can be related to the mutual information as follows:

H(X, Y) = \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{1}{p(x_j, y_k)}

= \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{p(x_j)\, p(y_k)}{p(x_j, y_k)} + \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{1}{p(x_j)\, p(y_k)}      (1)

The second double summation in (1) can be split as

\sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{1}{p(x_j)\, p(y_k)} = \sum_{j=0}^{J-1} \log_2 \frac{1}{p(x_j)} \sum_{k=0}^{K-1} p(x_j, y_k) + \sum_{k=0}^{K-1} \log_2 \frac{1}{p(y_k)} \sum_{j=0}^{J-1} p(x_j, y_k)

where the inner sums are the marginal probabilities:

\sum_{k=0}^{K-1} p(x_j, y_k) = \sum_{k=0}^{K-1} p(x_j)\, p(y_k \mid x_j) = p(x_j) \sum_{k=0}^{K-1} p(y_k \mid x_j) = p(x_j)

\sum_{j=0}^{J-1} p(x_j, y_k) = \sum_{j=0}^{J-1} p(y_k)\, p(x_j \mid y_k) = p(y_k) \sum_{j=0}^{J-1} p(x_j \mid y_k) = p(y_k)

Hence

\sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{1}{p(x_j)\, p(y_k)} = \sum_{j=0}^{J-1} p(x_j) \log_2 \frac{1}{p(x_j)} + \sum_{k=0}^{K-1} p(y_k) \log_2 \frac{1}{p(y_k)} = H(X) + H(Y)

Moreover, the mutual information can be written as

I(X; Y) = \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{p(x_j \mid y_k)}{p(x_j)} = \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{p(x_j, y_k)}{p(x_j)\, p(y_k)}

so the first double summation in (1) equals -I(X; Y). Substituting these results into (1),

H(X, Y) = -I(X; Y) + H(X) + H(Y)

I(X; Y) = H(X) + H(Y) - H(X, Y)
The code rate is r = k/n.
Accurate reconstruction of the original source sequence at the destination requires that the average probability of symbol error be low.
The mutual information of the channel is

I(X; Y) = \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2 \frac{p(x_j, y_k)}{p(x_j)\, p(y_k)}

I(X; Y) depends not only on the channel but also on the input probability distribution p(x_j). The channel capacity is the maximum of I(X; Y) over all input distributions:

C = \max_{p(x_j)} I(X; Y)

C is measured in bits per channel use.
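A brute-force sketch (illustrative only) that evaluates I(X; Y) for a binary symmetric channel with crossover probability 0.1 over a grid of input distributions; the maximum is attained near p(x_0) = 0.5 and equals 1 - H_b(0.1) ≈ 0.531 bits per channel use:

```python
import math

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) = sum_jk p(x_j, y_k) log2[ p(x_j, y_k) / (p(x_j) p(y_k)) ]."""
    J, K = len(p_x), len(p_y_given_x[0])
    p_xy = [[p_x[j] * p_y_given_x[j][k] for k in range(K)] for j in range(J)]
    p_y = [sum(p_xy[j][k] for j in range(J)) for k in range(K)]
    return sum(p_xy[j][k] * math.log2(p_xy[j][k] / (p_x[j] * p_y[k]))
               for j in range(J) for k in range(K) if p_xy[j][k] > 0)

channel = [[0.9, 0.1],
           [0.1, 0.9]]        # binary symmetric channel, crossover probability 0.1

best = max((mutual_information([q, 1 - q], channel), q)
           for q in [i / 1000 for i in range(1, 1000)])
print(best)                   # approximately (0.531, 0.5)
```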
Consider a continuous random variable X with probability density function f_X(x). The differential entropy of X is defined as

h(X) = \int_{-\infty}^{\infty} f_X(x) \log_2 \frac{1}{f_X(x)}\, dx
To relate h(X) to the ordinary (absolute) entropy, quantize X into levels of width \Delta x:

H(X) = \lim_{\Delta x \to 0} \sum_{k=-\infty}^{\infty} f_X(x_k)\, \Delta x \, \log_2 \frac{1}{f_X(x_k)\, \Delta x}

= \lim_{\Delta x \to 0} \sum_{k=-\infty}^{\infty} f_X(x_k)\, \Delta x \left[ \log_2 \frac{1}{f_X(x_k)} + \log_2 \frac{1}{\Delta x} \right]

= \lim_{\Delta x \to 0} \left[ \sum_{k=-\infty}^{\infty} f_X(x_k) \log_2 \frac{1}{f_X(x_k)}\, \Delta x - \log_2(\Delta x) \sum_{k=-\infty}^{\infty} f_X(x_k)\, \Delta x \right]

Since \int_{-\infty}^{\infty} f_X(x)\, dx = 1, this gives

H(X) = \int_{-\infty}^{\infty} f_X(x) \log_2 \frac{1}{f_X(x)}\, dx - \lim_{\Delta x \to 0} \log_2 \Delta x = h(X) - \lim_{\Delta x \to 0} \log_2 \Delta x

The term -\log_2 \Delta x grows without bound as \Delta x \to 0, so the absolute entropy of a continuous random variable is infinite; h(X) serves only as a relative (differential) measure of information.
Recall the definition of differential entropy:

h(X) = \int_{-\infty}^{\infty} f_X(x) \log_2 \frac{1}{f_X(x)}\, dx
Example: uniform distribution. Let

f_X(x) = \frac{1}{b - a}  for x \in (a, b),  and 0 otherwise

which satisfies \int_{-\infty}^{\infty} f_X(x)\, dx = 1. Then

h(X) = \int_a^b \frac{1}{b - a} \log_2 (b - a)\, dx = \log_2 (b - a)

Note that h(X) < 0 when b - a < 1: unlike the entropy of a discrete source, differential entropy can be negative.
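A quick numerical check (plain Riemann sum; the interval endpoints are chosen arbitrarily) that the integral equals log2(b - a), and can indeed be negative:

```python
import math

def h_uniform(a, b, n=100_000):
    """Riemann-sum estimate of h(X) = ∫ f(x) log2(1/f(x)) dx for X ~ Uniform(a, b)."""
    f = 1.0 / (b - a)
    dx = (b - a) / n
    return sum(f * math.log2(1.0 / f) * dx for _ in range(n))

print(h_uniform(0, 8))     # ≈ 3.0  (= log2 8)
print(h_uniform(0, 0.5))   # ≈ -1.0 (negative differential entropy)
```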
Example: Gaussian distribution. Let

f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-x^2 / 2\sigma^2}

Then

h(X) = -\int_{-\infty}^{\infty} f_X(x) \log_2 \big(f_X(x)\big)\, dx

h(X) = -\int_{-\infty}^{\infty} f_X(x) \log_2 \left[ \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-x^2 / 2\sigma^2} \right] dx

h(X) = \int_{-\infty}^{\infty} f_X(x) \log_2 \sqrt{2\pi\sigma^2}\, dx + \frac{\log_2 e}{2\sigma^2} \int_{-\infty}^{\infty} x^2 f_X(x)\, dx

h(X) = \log_2 \sqrt{2\pi\sigma^2} \cdot 1 + \frac{\log_2 e}{2\sigma^2} \cdot \sigma^2

h(X) = \frac{1}{2} \log_2 (2\pi\sigma^2) + \frac{1}{2} \log_2 e = \frac{1}{2} \log_2 (2\pi e \sigma^2)
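A Monte Carlo sketch (sample count and σ chosen arbitrarily) estimating h(X) = E[log2(1/f_X(X))] for a Gaussian and comparing it with the closed form (1/2) log2(2πeσ²):

```python
import math
import random

sigma = 2.0
n = 200_000
random.seed(0)

def f(x):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

# h(X) = E[ log2( 1 / f_X(X) ) ], estimated by sampling X ~ N(0, sigma^2)
estimate = sum(math.log2(1.0 / f(random.gauss(0.0, sigma))) for _ in range(n)) / n
closed_form = 0.5 * math.log2(2 * math.pi * math.e * sigma**2)
print(estimate, closed_form)   # both approximately 3.05
```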
The mutual information between a pair of continuous random variables X and Y is defined as

I(X; Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y) \log_2 \frac{f_X(x \mid y)}{f_X(x)}\, dx\, dy

where f_{X,Y}(x, y) is the joint probability density function of X and Y, and f_X(x \mid y) is the conditional probability density function of X, given that Y = y.
1. I(X; Y) = I(Y; X)
2. I(X; Y) \ge 0
3. I(X; Y) = h(X) - h(X \mid Y)
4. I(X; Y) = h(Y) - h(Y \mid X)
where h(X \mid Y) and h(Y \mid X) are the conditional differential entropies of X given Y, and of Y given X, respectively:

h(X \mid Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y) \log_2 \frac{1}{f_X(x \mid y)}\, dx\, dy

h(Y \mid X) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y) \log_2 \frac{1}{f_Y(y \mid x)}\, dx\, dy
For a channel band-limited to B hertz and used for T seconds, the number of samples (transmissions) is

K = 2BT      -------------(x)
Let N_k denote samples of an additive white Gaussian noise (AWGN) process of zero mean and power spectral density N_0/2; the noise is band-limited to B hertz. Let Y_k, k = 1, 2, ..., K, denote the continuous random variables obtained by uniform sampling of the received process Y(t) at the Nyquist rate of 2B samples per second.
Y_k = X_k + N_k,   k = 1, 2, ..., K

where the noise sample N_k is Gaussian with zero mean and variance

\sigma^2 = N_0 B
We assume that the samples Yk, k = 1, 2, . . . , K are
statistically independent.
Typically, the transmitter is power limited; it is therefore
reasonable to define the cost as
E[X_k^2] = P,   k = 1, 2, 3, ..., K
C = \max_{f_{X_k}(x)} \left\{ I(X_k; Y_k) : E[X_k^2] = P \right\}
I(X_k; Y_k) = h(Y_k) - h(Y_k \mid X_k)

Now we can show that the conditional differential entropy of Y_k, given X_k, is equal to the differential entropy of N_k:

h(Y_k \mid X_k) = h(N_k)

Hence, we may rewrite the mutual information I(X_k; Y_k) as

I(X_k; Y_k) = h(Y_k) - h(N_k)

In order to maximize the mutual information I(X_k; Y_k), we have to choose the samples of the transmitted signal from a noiselike (Gaussian) process of average power P. Then

C = \max \{ I(X_k; Y_k) \},  with X_k Gaussian and E[X_k^2] = P
Since X_k and N_k are independent and Gaussian, Y_k is Gaussian with variance P + \sigma^2. Hence

h(Y_k) = \frac{1}{2} \log_2 \left[ 2\pi e (P + \sigma^2) \right]

h(N_k) = \frac{1}{2} \log_2 \left( 2\pi e \sigma^2 \right)
I(X_k; Y_k) = h(Y_k) - h(N_k)

I(X_k; Y_k) = \frac{1}{2} \log_2 \left[ 2\pi e (P + \sigma^2) \right] - \frac{1}{2} \log_2 \left( 2\pi e \sigma^2 \right)

I(X_k; Y_k) = \frac{1}{2} \log_2 \left( \frac{P + \sigma^2}{\sigma^2} \right) = \frac{1}{2} \log_2 \left( 1 + \frac{P}{\sigma^2} \right)

C = \frac{1}{2} \log_2 \left( 1 + \frac{P}{\sigma^2} \right)  bits per transmission      -------(p)
Since the channel is used K = 2BT times in T seconds (i.e., 2B transmissions per second), the channel capacity per unit time is

C = B \log_2 \left( 1 + \frac{P}{\sigma^2} \right) = B \log_2 \left( 1 + \frac{P}{N_0 B} \right)  bits per second
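A last sketch evaluating the capacity formula for an arbitrary example (the bandwidth, signal power, and noise density below are made-up numbers):

```python
import math

def capacity(B, P, N0):
    """C = B log2(1 + P / (N0 * B)) bits per second for the band-limited AWGN channel."""
    return B * math.log2(1.0 + P / (N0 * B))

# Hypothetical channel: B = 3 kHz, SNR = P / (N0 * B) = 1000 (30 dB)
B = 3000.0
N0 = 1e-8
P = 1000 * N0 * B
print(capacity(B, P, N0))   # ≈ 29,900 bits per second
```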