
ECNG 6703 - Principles of Communications

Introduction to Information Theory - Source Coding

Sean Rocke

September 16th, 2013


Outline

Digital Communication Preliminaries
Models for Information Sources
Measures of Information
Source Coding
Conclusion


Digital Communication Preliminaries

Communications Signal Examples


Let's get some intuition before starting source coding. Consider what happens just before transmitting or just after receiving any of the following:
- Taking & posting a narcissistic picture of you ziplining in Chaguaramas on your Facebook profile
- A GSM phone conversation
- Sending instrumentation data to a control system for a manufacturing plant
- Downloading a legal copy of an ebook from amazon.com
- Live transmission of a Machel Montano concert over the Internet

How do we communicate these signals using a digital communications system? What characteristics are of importance?

Digital Communication Preliminaries

Elements of a Digital Communications System


[Block diagram] Transmitter chain: Information source and input transducer → Source encoder → Channel encoder → Digital modulator → Channel. Receiver chain: Channel → Digital demodulator → Channel decoder → Source decoder → Output transducer.

Elements not specifically included in the illustration:
- Carrier and symbol synchronization
- A/D interface
- Channel interfaces (e.g., RF front end (RFFE), fiber optic front end (FOFE), BAN front end (BANFE), . . . )


Digital Communication Preliminaries

Signal Classification Matrix

[Signal classification matrix: rows for Value (Discrete / Continuous), columns for Time (Discrete / Continuous)]

Models for Information Sources

Source Coding Defined

- Source coding: the process of efficiently converting the output of either an analog or digital source into a bit sequence.
- Source coding challenge: how can the source output (digital or analog) be represented in as few bits as possible?
- Key performance metrics: coding efficiency, redundancy, rate distortion, implementation complexity.
- To answer the above, it is essential to model the signal sources. . .


Models for Information Sources

Analog Source Models

- An information source produces a random output.
- The analog output at any time $t$ is a random variable $X(t)$ with CDF $F_X(x_1; t_1) = P[X(t_1) \le x_1]$.
- The joint CDF is defined as $F_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = P[X(t_1) \le x_1, \ldots, X(t_n) \le x_n]$.
- We consider statistically stationary outputs, for which $F_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = F_X(x_1, \ldots, x_n; t_1 + \tau, \ldots, t_n + \tau)$ for any shift $\tau$.


Models for Information Sources

Analog-to-Digital Conversion
[Figure: original, sampled, and quantized versions of a signal, together with the quantization error; Amplitude vs. Time, t]

- For band-limited $X(t)$ with bandwidth $W$, we can take samples $\{X\left(\frac{n}{2W}\right)\}$ to obtain a discrete-time output.
- Precision is lost when the discrete-time analog samples are quantized (unavoidable!) — see the sketch below.
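A minimal MATLAB sketch of this sampling-and-quantization idea, along the lines of the figure above; the sinusoid, sampling rate, and number of quantizer levels are illustrative assumptions, not values from the slides:

```matlab
% Minimal sketch (illustrative parameters): sample a 1 Hz sinusoid and
% quantize the samples with a uniform quantizer, then look at the error.
fs   = 8;                        % sampling rate in samples/s (assumed > 2W)
t    = 0:0.001:6;                % dense time axis standing in for continuous time
ts   = 0:1/fs:6;                 % sampling instants
x    = sin(2*pi*1*t);            % "analog" signal
xs   = sin(2*pi*1*ts);           % discrete-time samples
L    = 8;                        % number of quantizer levels (3 bits/sample)
step = 2/L;                      % step size for amplitudes in [-1, 1]
xq   = step*round(xs/step);      % uniform (mid-tread) quantization
eq   = xs - xq;                  % quantization error: the unavoidable precision loss
fprintf('Max quantization error = %.3f\n', max(abs(eq)));

plot(t, x, ts, xs, 'o', ts, xq, 's'); grid on
xlabel('Time, t'); ylabel('Amplitude');
legend('Original signal', 'Sampled signal', 'Quantized signal');
```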

Models for Information Sources

Discrete Source Models


- An information source produces a random output, either directly (i.e., if discrete-time & discrete-valued) or through A/D conversion (i.e., if continuous-time/continuous-valued).
- An information source with an alphabet $\mathcal{L} := \{x_1, \ldots, x_L\}$ emits a random sequence of letters.
- Assume each letter occurs with probability $p_k = P_X(x_k)$, $1 \le k \le L$, where $\sum_{k=1}^{L} p_k = 1$.
- If letter occurrences are independent, the source is a discrete memoryless source (DMS).
- For the dependent case, we consider statistically stationary outputs, where $p_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = p_X(x_1, \ldots, x_n; t_{1+m}, \ldots, t_{n+m})$.


Models for Information Sources

Information Source Models


[Diagram to annotate: "Information Sources", with branches labelled Continuous, Discrete, Memory, and Memoryless]

Questions:
1. Annotate this diagram based upon the previous information.



Measures of Information

Measures of Information
So given a particular source model, how do we quantify the information content?
- Consider two discrete RVs, $X$ and $Y$, and assume that an outcome $Y = y$ is observed.
- Can we determine quantitatively the amount of information the occurrence of $Y = y$ provides about the event $X = x$?
- Mutual information between outcomes $x$ and $y$ — a measure of the information provided by the occurrence of $Y = y$ about $X = x$:
  $$I(x; y) = \log \frac{P_{X|Y}(x|y)}{P_X(x)}$$
- Mutual information between RVs $X$ and $Y$ — the average of $I(x; y)$:
  $$I(X; Y) = E[I(x; y)] = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_{XY}(x, y)\, I(x; y)$$
- Units: bits (if $\log_2$ is used) or nats (if $\ln$ is used)
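A minimal MATLAB sketch of this averaging, using an assumed 2×2 joint PMF (illustrative values, not from the slides):

```matlab
% Minimal sketch: average mutual information I(X;Y) for an assumed joint PMF.
Pxy = [0.30 0.20;                % rows: x1, x2; columns: y1, y2 (assumed values)
       0.10 0.40];
Px  = sum(Pxy, 2);               % marginal PMF of X
Py  = sum(Pxy, 1);               % marginal PMF of Y

I = 0;
for i = 1:size(Pxy, 1)
    for j = 1:size(Pxy, 2)
        if Pxy(i, j) > 0
            % I(x;y) = log2( P(x|y)/P(x) ) = log2( P(x,y)/(P(x)P(y)) )
            I = I + Pxy(i, j) * log2(Pxy(i, j) / (Px(i)*Py(j)));
        end
    end
end
fprintf('I(X;Y) = %.4f bits\n', I);
```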



Measures of Information

Measures of Information
Properties of mutual information:
- $I(X; Y) = I(Y; X)$
- $I(X; Y) \ge 0$
- $I(X; Y) = 0$ if and only if $X$ and $Y$ are independent
- $I(X; Y) \le \log \min\{|\mathcal{X}|, |\mathcal{Y}|\}$

Questions: consider $I(x; y)$ for the following cases:
1. $X$ and $Y$ are statistically independent
2. $X$ and $Y$ are fully dependent
3. $X$ and $Y$ are partially dependent


Measures of Information

Measures of Information: Entropy


How can we measure uncertainty in the source?
- Entropy of $X$:
  $$H(X) = -E[\log P_X(x)] = -\sum_{x \in \mathcal{X}} P_X(x) \log P_X(x)$$
- Note: we define $0 \log 0 = 0$
- Units: bits (if $\log_2$ is used) or nats (if $\ln$ is used)

Questions:
1. What is the entropy of a deterministic information source?
2. When is the entropy of a DMS with alphabet size $|\mathcal{X}|$ maximized? What is $H(X)$ in this case?
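As a small numerical companion to the second question, the MATLAB sketch below computes $H(X)$ for an assumed 4-letter DMS and compares it with the uniform-source bound $\log_2 L$ (the PMF is an illustrative example):

```matlab
% Entropy of an assumed 4-letter DMS, compared with the uniform bound log2(L).
p = [0.5 0.25 0.125 0.125];              % assumed letter probabilities
H = -sum(p(p > 0) .* log2(p(p > 0)));    % convention: 0*log(0) = 0
fprintf('H(X) = %.3f bits/letter\n', H);                              % 1.750
fprintf('log2(L) = %.3f bits/letter (uniform case)\n', log2(numel(p)));  % 2.000
```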


Measures of Information

Measures of Information: Entropy


Properties:
- $I(X; X) = H(X)$
- $0 \le H(X) \le \log |\mathcal{X}|$
- $I(X; Y) \le \min\{H(X), H(Y)\}$
- If $Y = g(X)$, then $H(Y) \le H(X)$

Questions:
1. Calculate the entropy of a binary source with probability $p$ that a 1 occurs. Plot the entropy function (i.e., $H(X)$ vs. $p$) using MATLAB — see the sketch below.
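A minimal MATLAB sketch for the plotting part of the question (the grid resolution is an arbitrary choice):

```matlab
% Binary entropy function H(p) = -p*log2(p) - (1-p)*log2(1-p).
p  = linspace(0, 1, 1001);
Hb = -p.*log2(p) - (1-p).*log2(1-p);
Hb(p == 0 | p == 1) = 0;             % apply the convention 0*log(0) = 0 at the endpoints
plot(p, Hb); grid on
xlabel('p (probability of a 1)'); ylabel('H(X) [bits]');
title('Binary entropy function');
```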


Measures of Information

Measures of Information: Joint & Conditional Entropy


Multivariable extension of entropy. Consider RVs $X$ and $Y$:
- Joint entropy:
  $$H(X, Y) = -E[\log P_{XY}(x, y)] = -\sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} P_{XY}(x, y) \log P_{XY}(x, y)$$
- If $X = x$ is known, the entropy of $Y$ given $X = x$ is:
  $$H(Y|X = x) = -\sum_{y \in \mathcal{Y}} P_{Y|X}(y|x) \log P_{Y|X}(y|x)$$
- Averaging this quantity over all values of $X$ gives the conditional entropy:
  $$H(Y|X) = -E[\log P_{Y|X}(y|x)] = -\sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} P_{XY}(x, y) \log P_{Y|X}(y|x)$$

Question: show that $H(X, Y) = H(X) + H(Y|X)$ (a numerical check is sketched below).
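A minimal MATLAB check of this chain-rule identity, using an assumed 2×2 joint PMF (illustrative values, chosen with no zero entries):

```matlab
% Numerical check of H(X,Y) = H(X) + H(Y|X) for an assumed joint PMF.
Pxy = [0.30 0.20;
       0.10 0.40];                              % illustrative joint PMF
Px  = sum(Pxy, 2);                              % marginal PMF of X
Hxy = -sum(Pxy(:) .* log2(Pxy(:)));             % joint entropy H(X,Y)
Hx  = -sum(Px .* log2(Px));                     % H(X)

HYgX = 0;                                       % conditional entropy H(Y|X)
for i = 1:size(Pxy, 1)
    PygX = Pxy(i, :) / Px(i);                   % P(Y | X = x_i)
    HYgX = HYgX - Px(i) * sum(PygX .* log2(PygX));
end
fprintf('H(X,Y) = %.4f, H(X) + H(Y|X) = %.4f\n', Hxy, Hx + HYgX);
```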

Measures of Information

Measures of Information: Joint & Conditional Entropy

Properties of joint and conditional entropy:
- $0 \le H(X|Y) \le H(X)$
- $H(X|Y) = H(X)$ if and only if $X$ and $Y$ are independent
- $H(X, Y) = H(X) + H(Y)$ if and only if $X$ and $Y$ are independent
- $H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y) \le H(X) + H(Y)$
- $I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X, Y)$


Measures of Information

Measures of Information: Continuous RVs


Consider continuous RVs $X$ and $Y$ with joint and marginal PDFs $f_{XY}(x, y)$, $f_X(x)$ and $f_Y(y)$, respectively:
- Average mutual information:
  $$I(X; Y) = E\left[\log \frac{f_{Y|X}(y|x) f_X(x)}{f_X(x) f_Y(y)}\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y) \log \frac{f_{Y|X}(y|x) f_X(x)}{f_X(x) f_Y(y)}\, dx\, dy$$
- Average conditional entropy:
  $$H(X|Y) = -E[\log f_{X|Y}(x|y)] = -\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y) \log f_{X|Y}(x|y)\, dx\, dy$$
- Differential entropy (not the same as entropy!):
  $$H(X) = -E[\log f_X(x)] = -\int_{-\infty}^{\infty} f_X(x) \log f_X(x)\, dx$$
- Note: $I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$.
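A small MATLAB sketch of differential entropy for an assumed zero-mean Gaussian source: the numerically integrated value is compared against the known closed form $\tfrac{1}{2}\log_2(2\pi e \sigma^2)$ bits (the finite integration limits are a truncation chosen wide enough for the tails to be negligible):

```matlab
% Differential entropy of an assumed Gaussian source, X ~ N(0, sigma^2).
sigma = 2;                                                   % assumed standard deviation
fx    = @(x) exp(-x.^2/(2*sigma^2)) / (sigma*sqrt(2*pi));    % Gaussian PDF
h_num = -integral(@(x) fx(x).*log2(fx(x)), -10*sigma, 10*sigma);  % numerical value
h_cf  = 0.5*log2(2*pi*exp(1)*sigma^2);                       % closed form for a Gaussian
fprintf('Numerical: %.4f bits, closed form: %.4f bits\n', h_num, h_cf);
```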

Source Coding

Lossless Source Coding


- How can the source output (digital) be represented in as few bits as possible, such that perfect reconstruction of the source is possible from the compressed data?
- What is the theoretical lowest number of bits required to represent the source output?
- Shannon's First Theorem (Lossless Source Coding Theorem): Let $X$ denote a DMS with entropy $H(X)$. A lossless source code exists at any rate $R$ if $R > H(X)$; no such code exists for $R < H(X)$.
- Is it possible to achieve this bound? How?
  - Variable-length coding algorithms (e.g., the Huffman algorithm)
  - Fixed-length coding algorithms (e.g., the Lempel-Ziv algorithm)

You are required to know how to use both the Huffman and Lempel-Ziv algorithms to determine source codes! (A toolbox-based Huffman sketch follows.)
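A minimal MATLAB sketch of Huffman coding, assuming the Communications Toolbox functions huffmandict, huffmanenco and huffmandeco are available (otherwise the code tree can be built by hand); the numeric symbols 1–3 stand in for letters x1–x3, and the probabilities are those of the three-letter example that follows:

```matlab
% Huffman coding sketch (assumes the Communications Toolbox is installed).
symbols = [1 2 3];                       % stand-ins for letters x1, x2, x3
prob    = [0.45 0.35 0.20];              % letter probabilities
[dict, avglen] = huffmandict(symbols, prob);   % code dictionary & average length
fprintf('Average codeword length: %.3f bits/letter\n', avglen);

seq = [1 3 2 1 1 2];                     % a short test letter sequence
enc = huffmanenco(seq, dict);            % encode into a bit stream
dec = huffmandeco(enc, dict);            % decode
fprintf('Lossless reconstruction? %d\n', isequal(dec(:), seq(:)));
```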

Source Coding

Evaluating Code Performance: Efficiency


Letter   Probability   Self-information   Code
x1       0.45          1.156              1
x2       0.35          1.520              00
x3       0.20          2.330              01

H(X) = 1.513 bits/letter
R1 = 1.55 bits/letter
Efficiency = H(X)/R1 = 97.6%
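A short MATLAB check of the figures in this table (the probabilities and codeword lengths are copied from it):

```matlab
% Entropy, average code rate, and efficiency for the single-letter code above.
p   = [0.45 0.35 0.20];                  % letter probabilities
len = [1 2 2];                           % lengths of codewords 1, 00, 01
H   = -sum(p .* log2(p));                % ~1.513 bits/letter
R1  = sum(p .* len);                     % 1.55 bits/letter
fprintf('H(X) = %.3f, R1 = %.2f, efficiency = %.1f%%\n', H, R1, 100*H/R1);
```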

Questions:
1. Can we improve on this code?



Source Coding

Evaluating Code Performance: Efficiency


Letter pair   Probability   Self-information   Code
x1x1          0.2025        2.312              10
x1x2          0.1575        2.676              001
x2x1          0.1575        2.676              010
x2x2          0.1225        3.039              011
x1x3          0.09          3.486              111
x3x1          0.09          3.486              0000
x2x3          0.07          3.850              0001
x3x2          0.07          3.850              1100
x3x3          0.04          4.660              1101

2H(X) = 3.026 bits/letter pair
R2 = 3.0675 bits/letter pair
Efficiency = 2H(X)/R2 = 98.6%

Questions:
1. What is the trade-off compared to the first code for single letters?

Source Coding

Evaluating Code Performance


Letter   Probability   Code 1   Code 2   Code 3   Code 4
x1       0.5           1        0        0        00
x2       0.25          00       10       01       01
x3       0.125         01       110      011      10
x4       0.125         10       111      111      11

For each code: variable length? fixed length? uniquely decodable? instantaneous? efficiency?

Questions:
1. Classify the codes above (a Kraft-inequality check is sketched below).
2. Which is the most efficient?
3. Which would you choose?
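As a starting point for the classification, the MATLAB sketch below checks the Kraft inequality, $\sum_i 2^{-l_i} \le 1$ (a necessary condition for unique decodability), for the codeword lengths of each code in the table:

```matlab
% Kraft inequality check for the four candidate codes above.
codes = { {'1','00','01','10'}, ...      % Code 1
          {'0','10','110','111'}, ...    % Code 2
          {'0','01','011','111'}, ...    % Code 3
          {'00','01','10','11'} };       % Code 4
for c = 1:numel(codes)
    len   = cellfun(@length, codes{c});  % codeword lengths
    kraft = sum(2.^(-len));              % a sum > 1 rules out unique decodability
    fprintf('Code %d: Kraft sum = %.3f\n', c, kraft);
end
```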

Source Coding

Evaluating Code Performance: Rate Distortion


- Sampling and quantization of an analog source generally result in:
  - waveform distortion
  - loss of signal fidelity
- Distortion between the actual source samples $\{x_k\}$ and the quantized values $\{\tilde{x}_k\}$:
  - Single-sample measure (squared error): $d(x_k, \tilde{x}_k) = (x_k - \tilde{x}_k)^2$
  - $n$-sample sequence measure: $d(\mathbf{x}_n, \tilde{\mathbf{x}}_n) = \frac{1}{n} \sum_{k=1}^{n} d(x_k, \tilde{x}_k)$
- Distortion, $D$:
  $$D = E\left[d(\mathbf{X}_n, \tilde{\mathbf{X}}_n)\right] = \frac{1}{n} \sum_{k=1}^{n} E\left[d(X_k, \tilde{X}_k)\right]$$

Source Coding

Evaluating Code Performance: Rate Distortion

Rate-distortion function: the minimum number of bits per source output symbol required to represent the source output $X$ with distortion less than or equal to $D$:
$$R(D) = \min_{f_{\tilde{X}|X}(\tilde{x}|x):\; E[d(X, \tilde{X})] \le D} I(X; \tilde{X})$$
Note: evaluating code performance using rate distortion applies to lossy coding, where the data is compressed subject to a maximum tolerable distortion (i.e., some of the information is lost during coding and hence cannot be regained via reconstruction).
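As one concrete illustration (an assumed example, not taken from the slide), the MATLAB sketch below plots the well-known rate-distortion function of a memoryless Gaussian source with variance $\sigma^2$ under squared-error distortion, $R(D) = \max\{0, \tfrac{1}{2}\log_2(\sigma^2/D)\}$:

```matlab
% Rate-distortion function of an assumed Gaussian source (squared-error distortion).
sigma2 = 1;                              % assumed source variance
D      = linspace(0.01, 1.5, 300);       % distortion axis
R      = max(0, 0.5*log2(sigma2 ./ D));  % bits per source symbol
plot(D, R); grid on
xlabel('Distortion, D'); ylabel('R(D) [bits/symbol]');
title('R(D) for a Gaussian source, squared-error distortion');
```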


Conclusion

Conclusion
We covered:
- Elements of a digital communications system
- Mathematical models for information sources
- Measures of information
- Lossless & lossy source coding
- Source code evaluation
- More MATLAB

Your goals for next class:
- Continue ramping up your MATLAB skills
- Make sure you can apply the Huffman and Lempel-Ziv algorithms!
- Complete HW 2
- Review the notes on Channel Coding in preparation for next class

Q&A

Thank You

Questions????

