
Lecture Course: Information Theory II

Marius Pesavento

Communication Systems Group


Institute of Telecommunications
Technische Universitat Darmstadt


COURSE ORGANIZATION

Instructor: Dr.-Ing. Marius Pesavento, S3/06/204, e-mail: pesavento@nt.tu-darmstadt.de, FG Nachrichtentechnische Systeme (NTS)

Teaching assistant: Yong Cheng, S3/06/205, e-mail: yong.cheng@nt.tu-darmstadt.de

Website: http://www.nts.tu-darmstadt.de/

Lecture notes and slides will be posted in TUCAN

Office hours: on request (please send an e-mail to the TA or instructor)

Written final exam (closed-book)

Examination date (presumably) Tuesday July 31, 2012: 12.00 - 14.00

RECOMMENDED TEXTBOOKS

1. D. Tse and P. Viswanath, Fundamentals of Wireless Communication, Cambridge University Press, 2005. (main reference)
2. A. El Gamal and Y.-H. Kim, Network Information Theory, Cambridge University Press, 2012.
3. A. Goldsmith, Wireless Communications, Cambridge University Press, 2005.
4. T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.

COURSE OUTLINE

Overview of the basics of information theory
- Entropy, mutual information, capacity
- Source coding and channel coding theorem
- Memoryless Gaussian channel

Multi-antenna channel capacity, water-filling

Basic theory of network information theory
- Multiple-access channels
- Broadcast channels
- Relay channels

Cyclic codes, convolutional codes, turbo codes

Topics of the earlier basic IT course

Information, entropy, mutual information, and their derivatives

Basic theory of source coding, Shannon's source coding theorem, Huffman coding, Lempel-Ziv coding

Channel capacity, Shannon's channel coding theorem, Gaussian channel, bandlimited channel, Shannon's limit, multiple Gaussian channels, multiple colored noise channels, water-filling, ergodic and outage capacities, basics of MIMO channels

Basic theory of channel coding, linear block coding, Reed-Muller codes, Golay code

REVIEW OF PROBABILITY THEORY: CDF AND PDF

Let X be a continuous random variable with the cumulative distribution function (cdf)

F_X(x) = Probability\{X \le x\} = P(X \le x)

Probability density function (pdf):

f_X(x) = \frac{\partial F_X(x)}{\partial x}

where

F_X(x_0) = \int_{-\infty}^{x_0} f_X(x)\, dx

NORMALIZATION PROPERTY OF CDFs

Since F_X(\infty) = 1, we obtain the so-called normalization property

\int_{-\infty}^{\infty} f_X(x)\, dx = 1

Simple interpretation:

f_X(x) = \lim_{\Delta \to 0} \frac{P\{x - \Delta/2 \le X \le x + \Delta/2\}}{\Delta}

[Figure: pdf f_X(x); the shaded area between x_1 and x_2 equals Probability\{x_1 < X < x_2\}.]

EXAMPLE 1

Let the real-valued random variable X be uniformly distributed in the interval [0, T].

[Figure: f_X(x) = 1/T for 0 \le x \le T and zero otherwise; the cdf F_X(x) rises linearly from 0 to 1 over [0, T].]

EXAMPLE 2

Let the real-valued random variable X have the so-called Gaussian (normal) distribution

f_X(x) = \frac{1}{\sqrt{2\pi\sigma_X^2}}\, e^{-(x-\mu_X)^2 / 2\sigma_X^2}

where \sigma_X^2 = var\{X\} is the variance and \mu_X is the mean. The corresponding distribution function is given by

F_X(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma_X^2}}\, e^{-(\nu-\mu_X)^2 / 2\sigma_X^2}\, d\nu
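As a quick numerical illustration (not part of the original slides), the Gaussian pdf and cdf can be evaluated in a few lines of Python; the function names and default parameters mu and sigma below are illustrative assumptions.

import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # f_X(x) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    # F_X(x) expressed through the error function erf
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(gaussian_pdf(0.0))   # ~0.3989, the peak value for sigma = 1
print(gaussian_cdf(0.0))   # 0.5 at the mean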

CDF AND PDF OF A GAUSSIAN RANDOM VARIABLE

[Figure: the Gaussian cdf F_X(x) increases from 0 to 1 and equals 1/2 at the mean; the pdf f_X(x) is a bell curve with peak value (2\pi\sigma_X^2)^{-1/2} and width on the order of 2\sigma_X.]

PROBABILITY MASS FUNCTION

Let X now be a discrete random variable which takes the values x_i (i = 1, ..., I) with the probabilities P(x_i) (i = 1, ..., I), respectively.

For discrete variables, we define the probability mass function

P(x_i) = Probability(X = x_i)

The normalization condition:

\sum_{i=1}^{I} P(x_i) = 1

EXTENSION TO DISCRETE VARIABLES

How to extend the concepts of pdf and cdf to discrete variables?

Define the unit step function as

u(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}

Define the Dirac delta function as

\delta(x) = \begin{cases} \infty, & x = 0 \\ 0, & x \ne 0 \end{cases}, \qquad \int_{-\infty}^{\infty} \delta(x)\, dx = 1

Relationships between the delta function and the unit step function:

\int_{-\infty}^{x} \delta(\nu)\, d\nu = u(x), \qquad \delta(x) = \frac{\partial u(x)}{\partial x}

Sifting property of the delta function:

\int_{-\infty}^{\infty} g(x)\, \delta(x - y)\, dx = g(y)

Using the definition of the unit step function, we can express the cdf as

F_X(x) = \sum_{i=1}^{I} P(x_i)\, u(x - x_i)

EXTENSION TO DISCRETE VARIABLES

Then, the pdf can be expressed as

f_X(x) = \sum_{i=1}^{I} P(x_i)\, \delta(x - x_i)

Using the delta-function sifting property, we have

\int_{-\infty}^{\infty} f_X(x)\, dx = \int_{-\infty}^{\infty} \sum_{i=1}^{I} P(x_i)\, \delta(x - x_i)\, dx = \sum_{i=1}^{I} \int_{-\infty}^{\infty} P(x_i)\, \delta(x - x_i)\, dx = \sum_{i=1}^{I} P(x_i) = 1

EXAMPLE 1

Let the random variable X be the outcome of a coin-tossing experiment.

[Figure: the cdf F_X(x) is a staircase that jumps by 0.5 at x = 0 and x = 1; the pdf f_X(x) consists of two delta impulses of weight 0.5.]

EXAMPLE 2

Let the random variable X be the outcome of a die-throwing experiment.

[Figure: the cdf F_X(x) is a staircase with jumps of 1/6 at x = 1, ..., 6; the pdf f_X(x) consists of six delta impulses of weight 1/6.]

STATISTICAL EXPECTATION

Expected value (mean) of a continuous random variable:

\mu_X = E\{X\} = \int_{-\infty}^{\infty} x\, f_X(x)\, dx

For a discrete random variable:

\mu_X = E\{X\} = \int_{-\infty}^{\infty} x\, f_X(x)\, dx = \int_{-\infty}^{\infty} x \sum_{i=1}^{I} P(x_i)\, \delta(x - x_i)\, dx = \sum_{i=1}^{I} \int_{-\infty}^{\infty} x\, P(x_i)\, \delta(x - x_i)\, dx = \sum_{i=1}^{I} x_i\, P(x_i)

We can also compute the expected value of a function of a continuous random variable:

E\{g(X)\} = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx

For a discrete random variable:

E\{g(X)\} = \sum_{i=1}^{I} g(x_i)\, P(x_i)

VARIANCE OF A RANDOM VARIABLE

var\{X\} = E\{(X - E\{X\})^2\} = E\{X^2\} - E\{X\}^2 = \sigma_X^2

where \sigma_X is commonly called the standard deviation.

The variance and the standard deviation can be interpreted as measures of the statistical dispersion of a random variable w.r.t. its expected value.

EXAMPLE

Compute the mean and variance of a random variable uniformly distributed in the interval [0, 1] (f_X(x) = 1 for 0 \le x \le 1):

\mu_X = \int_0^1 x\, dx = \left.\frac{x^2}{2}\right|_0^1 = \frac{1}{2}

\sigma_X^2 = \int_0^1 x^2\, dx - \mu_X^2 = \left.\frac{x^3}{3}\right|_0^1 - \frac{1}{4} = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}
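A quick numerical sanity check (illustrative, not from the slides): drawing samples from a uniform [0, 1] distribution and comparing the empirical mean and variance against the closed-form values 1/2 and 1/12.

import random

N = 100_000
samples = [random.random() for _ in range(N)]   # uniform on [0, 1]
mean = sum(samples) / N
var = sum((s - mean)**2 for s in samples) / N

print(mean)   # close to 1/2
print(var)    # close to 1/12 = 0.0833...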

JOINT DISTRIBUTION

Let us now consider two random variables X and Y jointly.

Joint distribution function:

F_{X,Y}(x, y) = P(X \le x, Y \le y)

Joint pdf:

f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y}

The inverse relationship:

F_{X,Y}(x_0, y_0) = \int_{-\infty}^{x_0} \int_{-\infty}^{y_0} f_{X,Y}(x, y)\, dx\, dy

Any pdf satisfies the following normalization property:

\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy = 1

Also, the marginal pdfs are obtained as

\int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx = f_Y(y), \qquad \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy = f_X(x)

CONDITIONAL DISTRIBUTION

In practical problems, we are often interested in the pdf of one random variable X conditioned on the fact that a second random variable Y has some specific value y. It is obvious that

P(X \le x; Y \le y) = P(X \le x | Y \le y)\, P(Y \le y)

Then, the conditional cdf is defined as

F_X(x|y) = P(X \le x | Y \le y) = \frac{F_{X,Y}(x, y)}{F_Y(y)}

From symmetry, it also follows that

F_Y(y|x) = \frac{F_{X,Y}(x, y)}{F_X(x)}

The conditional pdfs are

f_X(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}, \qquad f_Y(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}

From the last two equations, we obtain the Bayes rule

f_X(x|y)\, f_Y(y) = f_Y(y|x)\, f_X(x)

NORMALIZATION CONDITION

\int_{-\infty}^{\infty} f_X(x|y)\, dx = \int_{-\infty}^{\infty} \frac{f_{X,Y}(x, y)}{f_Y(y)}\, dx = \frac{1}{f_Y(y)} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx = 1

Conditional expectation:

E\{g(X)|y\} = \int_{-\infty}^{\infty} g(x)\, f_X(x|y)\, dx

STATISTICAL INDEPENDENCE

Two random variables X and Y are statistically independent if

f_{X,Y}(x, y) = f_X(x)\, f_Y(y)

Substituting this equation into the conditional pdf, we obtain that statistical independence implies

f_X(x|y) = f_X(x)

That is, the variable Y does not have any influence on the variable X.

EXAMPLE

Let

f_{X,Y}(x, y) = \begin{cases} 4xy, & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}

Are the variables X and Y statistically dependent?

f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy = 4x \int_0^1 y\, dy = 4x \left.\frac{y^2}{2}\right|_0^1 = \begin{cases} 2x, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}

and, by symmetry,

f_Y(y) = \begin{cases} 2y, & 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}

Hence f_{X,Y}(x, y) = f_X(x)\, f_Y(y) and the variables are independent!

CORRELATION AND COVARIANCE

Two fundamental characteristics of linear statistical dependence are the correlation

r_{XY} = E\{XY\}

and the covariance

cov\{X, Y\} = E\{(X - E\{X\})(Y - E\{Y\})\} = E\{XY\} - E\{X\}E\{Y\} = E\{XY\} - \mu_X \mu_Y

For X = Y, the covariance boils down to the variance:

cov\{X, X\} = E\{X^2\} - \mu_X^2 = var\{X\}

SOME USEFUL PROPERTIES

var\{X + Y\} = var\{X\} + var\{Y\} + 2\, cov\{X, Y\}.

If the variables X and Y are statistically independent, then for any functions h and g, E\{h(X)g(Y)\} = E\{h(X)\}\, E\{g(Y)\}.

If the variables X and Y are statistically independent, then cov\{X, Y\} = 0. Therefore, covariance is sometimes used as a measure of statistical dependence. However, the reverse statement is not necessarily true!

If the variables X and Y are statistically independent, then var\{X + Y\} = var\{X\} + var\{Y\}.

EXTENSION TO MULTIVARIATE DISTRIBUTIONS

We may also consider multiple (more than two) random variables X_1, ..., X_n.

Joint distribution function:

F_{X_1,...,X_n}(x_1, ..., x_n) = P(X_1 \le x_1, X_2 \le x_2, ..., X_n \le x_n)

Joint pdf:

f_{X_1,...,X_n}(x_1, ..., x_n) = \frac{\partial^n F_{X_1,...,X_n}(x_1, ..., x_n)}{\partial x_1\, \partial x_2 \cdots \partial x_n}

MULTIVARIATE DISTRIBUTIONS

Introducing the vectors

X = [X_1, X_2, ..., X_n]^T, \qquad x = [x_1, x_2, ..., x_n]^T

we rewrite the previous equations in symbolic (vector) notation as

F_X(x) = P(X \le x), \qquad f_X(x) = \frac{\partial^n F_X(x)}{\partial x_1\, \partial x_2 \cdots \partial x_n}

Normalization condition:

\int f_X(x)\, dx = 1

Statistical expectation can be defined as

E\{g(X)\} = \int \cdots \int g(x)\, f_X(x)\, dx

where g(X) is some function of the random vector X.

In particular, in the bivariate case,

E\{g(X, Y)\} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\, dx\, dy

MULTIVARIATE GAUSSIAN DISTRIBUTION

Jointly Gaussian random variables have the following joint multivariate pdf:

f_X(x) = \frac{1}{(\sqrt{2\pi})^n \det\{R\}^{1/2}}\, e^{-\frac{1}{2}(x - \mu_X)^T R^{-1} (x - \mu_X)}

where the mean is

\mu_X = E\{X\}

and the covariance matrix is

R = E\{(X - E\{X\})(X - E\{X\})^T\} = E\{XX^T\} - \mu_X \mu_X^T

In symbolic notation,

X \sim N(\mu_X, R)

MULTIVARIATE GAUSSIAN DISTRIBUTION

In the case of a single (n = 1) random variable X = X_1, the n-variate Gaussian pdf reduces to

p_X(x) = \frac{1}{\sqrt{2\pi\sigma_X^2}}\, e^{-(x-\mu_X)^2 / 2\sigma_X^2}

which is the well-known Gaussian pdf.

In the case of two (n = 2) random variables X = X_1 and Y = X_2, we have

R = \begin{bmatrix} \sigma_X^2 & \rho\,\sigma_X\sigma_Y \\ \rho\,\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix}, \qquad \rho = \frac{E\{(X - \mu_X)(Y - \mu_Y)\}}{\sigma_X \sigma_Y}

Note that \rho = \rho_{XY} is nothing else than the correlation coefficient.

The determinant of R is given by

\det\{R\} = \sigma_X^2 \sigma_Y^2 (1 - \rho^2)

and, therefore, the n-variate pdf reduces to the so-called bivariate pdf

f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right\}

The maximum of this function is located at the point \{x = \mu_X;\ y = \mu_Y\} and the maximal value is

\max\{f_{X,Y}(x, y)\} = f_{X,Y}(\mu_X, \mu_Y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}

In the case of uncorrelated X and Y, \rho = 0 and we have

f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y} \exp\left\{-\frac{1}{2}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right\} = \frac{1}{\sqrt{2\pi}\sigma_X}\, e^{-(x-\mu_X)^2/2\sigma_X^2} \cdot \frac{1}{\sqrt{2\pi}\sigma_Y}\, e^{-(y-\mu_Y)^2/2\sigma_Y^2} = f_X(x)\, f_Y(y)

i.e., the variables X and Y become statistically independent. This is a very important result showing that uncorrelated Gaussian random variables are also statistically independent! Note that in the case of non-Gaussian random variables, this is not true in general.

MULTIVARIATE GAUSSIAN DISTRIBUTION: EXAMPLE

[Figures: contour plots of the bivariate Gaussian pdf with parameters \mu_X = \mu_Y = 0 and \sigma_X = \sigma_Y = 1 for correlation coefficients 0, 0.25, 0.5, 0.75, and 0.95 (shown twice in the original slides). As the magnitude of the correlation coefficient increases, the circular contours become increasingly elongated ellipses.]

BASICS OF INFORMATION THEORY

Shannon: Information is the resolution of uncertainty about some statistical event:

- Before the event occurs, there is an amount of uncertainty.
- After the occurrence of the event, there is no uncertainty anymore, but there is a gain in the amount of information.

Highly expected messages deliver a small amount of information, while highly unexpected ones deliver a large amount of information. Hence, the amount of information should be inversely proportional to the probability of the message.

Information and entropy

The amount of information of the symbol x with the probability P(x):

I(x) = \log\left(\frac{1}{P(x)}\right) = -\log(P(x)), \qquad [I(x)] = bit

Considering a source with the alphabet X = \{x_1, ..., x_N\}, the entropy is defined as the statistically averaged amount of information (mean of I(X)):

H(X) = E\{I(X)\} = E\{-\log(P(X))\} = -\sum_{i=1}^{N} P(x_i) \log(P(x_i)) = \sum_{i=1}^{N} P(x_i) \log\left(\frac{1}{P(x_i)}\right), \qquad [H(X)] = bit/symbol
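A small Python sketch (illustrative, not from the slides) of the entropy definition above; the function name and the example distributions are assumptions chosen for demonstration.

import math

def entropy(probs):
    # H(X) = -sum_i P(x_i) * log2(P(x_i)); terms with P = 0 contribute 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit/symbol (fair binary source)
print(entropy([0.9, 0.1]))    # ~0.469 bit/symbol
print(entropy([0.25] * 4))    # 2.0 bit/symbol (uniform quaternary source)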

Example

Entropy of a non-symmetric binary source with the probabilities P(0) = p and P(1) = 1 - p:

H_B(X) = -p \log(p) - (1 - p) \log(1 - p)

[Figure: H_B(X) in bit/symbol versus p; the curve is zero at p = 0 and p = 1 and attains its maximum of 1 bit (maximum uncertainty) at p = 0.5.]

- The entropy characterizes the source uncertainty.
- The entropy is a concave function of the probability.

SOME DERIVATIVES OF ENTROPY
Joint Entropy

The definition of entropy can be extended to a pair of random variables X and Y (two discrete sources X = \{x_1, ..., x_N\} and Y = \{y_1, ..., y_M\}).

The joint entropy H(X, Y) is defined as:

H(X, Y) = E\{-\log(P(X, Y))\} = -\sum_{i=1}^{N} \sum_{l=1}^{M} P(x_i, y_l) \log(P(x_i, y_l))

Conditional Entropy

The conditional entropy H(Y|X) is the amount of uncertainty remaining about the random variable Y after the random variable X has been observed:

H(Y|X) = E_{X,Y}\{-\log(P(Y|X))\} = -\sum_{i=1}^{N} \sum_{l=1}^{M} P(x_i, y_l) \log(P(y_l|x_i)) = -\sum_{i=1}^{N} P(x_i) \sum_{l=1}^{M} P(y_l|x_i) \log(P(y_l|x_i))

where we use the Bayes rule

P(x_i, y_l) = P(x_i|y_l) P(y_l) = P(y_l|x_i) P(x_i)

Useful properties

Important conditional entropy property (chain rule):

H(X, Y) = H(X) + H(Y|X)

Hence the entropy, conditional entropy, and joint entropy are related quantities.

Another important property: conditioning reduces entropy:

H(X|Y) \le H(X)

with equality if and only if X and Y are statistically independent.

Mutual information

Let us consider two random variables (sources). The amount of information exchanged between two symbols x_i and y_l can be defined as:

I(x_i; y_l) = \log\left(\frac{P(x_i|y_l)}{P(x_i)}\right) = \log\left(\frac{P(x_i, y_l)}{P(x_i) P(y_l)}\right), \qquad [I(x_i; y_l)] = bit

where we again use the Bayes rule P(x_i, y_l) = P(x_i|y_l) P(y_l).

The amount of mutual information exchanged between two sources X and Y can be obtained by averaging I(x_i; y_l):

I(X; Y) = \sum_{i=1}^{N} \sum_{l=1}^{M} P(x_i, y_l) \log\left(\frac{P(x_i|y_l)}{P(x_i)}\right) = \sum_{i=1}^{N} \sum_{l=1}^{M} P(x_i, y_l) \log\left(\frac{P(x_i, y_l)}{P(x_i) P(y_l)}\right), \qquad [I(X; Y)] = bit/symbol

Mutual information

Mutual information is the reduction in the uncertainty of X due to the knowledge of Y:

I(X; Y) = H(X) - H(X|Y)

Relation of mutual information to the entropies and the joint entropy:

I(X; Y) = H(X) + H(Y) - H(X, Y)
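A short sketch (illustrative; the assumption is that the joint pmf is given as a nested list) that evaluates I(X; Y) directly from the double-sum definition above.

import math

def mutual_information(p_xy):
    # p_xy[i][l] = P(x_i, y_l); marginals are obtained by summing rows/columns
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(col) for col in zip(*p_xy)]
    mi = 0.0
    for i, row in enumerate(p_xy):
        for l, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (p_x[i] * p_y[l]))
    return mi

print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))   # independent sources -> 0
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))        # fully dependent -> 1 bit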

Channel capacity

The input probabilities P(x_i) are independent of the channel. We can therefore maximize the mutual information I(X; Y) w.r.t. P(x_i). The channel capacity can then be defined as the maximum mutual information in any single use of the channel, where the maximization is over P(x_i) (i = 1, ..., N):

C = \max_{\{P(x_i)\}} I(X; Y), \qquad [C] = bit/symbol

or bits per channel use (bpcu).

Example

Channel capacity of a binary symmetric channel:

[Figure: inputs x_1, x_2 are mapped to outputs y_1, y_2; each input is received correctly with probability 1 - p and flipped with probability p.]

C_B = 1 + p \log p + (1 - p) \log(1 - p) = 1 - H_B(X)
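A minimal numerical check (illustrative, not from the slides) of the binary symmetric channel capacity formula C_B = 1 - H_B(p); the function names are assumptions.

import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    # capacity of a binary symmetric channel with crossover probability p
    return 1.0 - binary_entropy(p)

print(bsc_capacity(0.0))    # 1.0  (noiseless channel)
print(bsc_capacity(0.11))   # ~0.5
print(bsc_capacity(0.5))    # 0.0  (useless channel)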

Entropy/capacity of a binary symmetric channel

[Figure: the binary entropy H_B(X) and the capacity C_B, both in bit/symbol, plotted versus p; H_B peaks at 1 for p = 0.5, where C_B reaches its minimum of 0.]

Channel coding/decoding

The inevitable presence of noise in a channel causes errors between the output and input data sequences of a digital communication system. To reduce these errors, we resort to channel coding.

The channel encoder maps the incoming source data into a channel input sequence. It adds redundancy to the data to protect them from errors.

The channel decoder inversely maps the channel output sequence into an output data sequence in such a way that the overall effect of the channel noise on the system is minimized.

Shannon's Channel-Coding Theorem

Let information be transmitted through a discrete memoryless channel of capacity C. If the transmission rate satisfies

R < C

then there exists a channel coding scheme for which the source output can be transmitted over the channel with an arbitrarily small probability of error.

Conversely, if

R \ge C

then it is impossible to transmit information over the channel with an arbitrarily small probability of error.

Joint source-channel coding theorem

If

H(X) > C

then it is impossible to transmit the source outputs over the channel with an arbitrarily small probability of error.

The latter theorem follows from the direct combination of the source-coding and channel-coding theorems.

Continuous sources

The mutual information between two continuous random sources X and Y with the joint symbol pdf f_{X,Y}(x, y) is given by

I(X; Y) = \int\!\!\int f_{X,Y}(x, y) \log\left(\frac{f_X(x|y)}{f_X(x)}\right) dx\, dy = \int\!\!\int f_{X,Y}(x, y) \log\left(\frac{f_{X,Y}(x, y)}{f_X(x) f_Y(y)}\right) dx\, dy

What is the relationship between the discrete and continuous mutual information?

It can be shown that the definitions of mutual information in the continuous and discrete cases are essentially the same.

This property enables us to use the continuous mutual information to define the capacity in the case of continuously distributed (infinite-alphabet) sources.

Continuous-time bandlimited channel

Consider a continuous-time bandlimited channel with additive white Gaussian noise (AWGN). The output of such an AWGN channel can be described as

Y(t) = (X(t) + Z(t)) * h(t)

where X(t) and Z(t) are the signal and noise waveforms, respectively, * denotes convolution, and h(t) is the impulse response of an ideal lowpass filter with cutoff frequency B.

Bandlimited AWGN channel

[Figure: the input x(t) plus the AWGN n(t) passes through an ideal lowpass filter H(f) of bandwidth B to give y(t); the noise power spectral density is S_N(f) = N_0/2, flat from -B to B.]

Capacity

Capacity of the bandlimited channel:

C = B \log\left(1 + \frac{P}{N_0 B}\right) \text{ bits per second}

where it is taken into account that P_N = N_0 B.

Shannon's bound:

C_\infty = \lim_{B \to \infty} B \log\left(1 + \frac{P}{N_0 B}\right) = \frac{P}{N_0} \log e \approx 1.44\, \frac{P}{N_0}

Parallel AWGN channels

Consider multiple parallel AWGN channels

Y_i = X_i + Z_i, \qquad i = 1, ..., K

with a common power constraint

E\left\{\sum_{i=1}^{K} X_i^2\right\} = \sum_{i=1}^{K} E\{X_i^2\} = \sum_{i=1}^{K} P_i \le P

where Z_i \sim N(0, P_{N,i}), the noise is statistically independent from channel to channel, and P_i = E\{X_i^2\}.

How to distribute the power P among the channels to maximize the total capacity?

Water-filling

Result (water-filling): The total capacity is maximized when

P_i = (\nu - P_{N,i})^+

where the value of \nu is chosen such that

\sum_{i=1}^{K} P_i = \sum_{i=1}^{K} (\nu - P_{N,i})^+ = P

and (\cdot)^+ denotes the positive part, i.e., for any x,

(x)^+ \triangleq \begin{cases} x, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases}

[Figure: water-filling power allocation; the noise levels P_{N,i} form an uneven vessel floor, and the total power P is poured in up to a common water level \nu, so that channels with high noise may receive no power.]
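A small Python sketch of the water-filling allocation described above (illustrative; the bisection tolerance, variable names, and example noise levels are assumptions, not part of the slides).

def water_filling(noise_levels, total_power, tol=1e-9):
    # Find the water level nu such that sum_i (nu - P_N,i)^+ = P by bisection.
    lo, hi = 0.0, max(noise_levels) + total_power
    while hi - lo > tol:
        nu = (lo + hi) / 2
        used = sum(max(nu - n, 0.0) for n in noise_levels)
        if used > total_power:
            hi = nu
        else:
            lo = nu
    return [max(nu - n, 0.0) for n in noise_levels]

# Three parallel AWGN channels with unequal noise powers and total power P = 10
powers = water_filling([1.0, 4.0, 9.0], 10.0)
print(powers)        # the quietest channel gets the most power; the noisiest gets none
print(sum(powers))   # ~10.0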

SKETCH OF THE PROOF

The mutual information of a system with multiple Gaussian channels can be shown to be upper-bounded by the value

\frac{1}{2} \sum_{i=1}^{K} \log\left(1 + \frac{P_i}{P_{N,i}}\right)

Equality is achieved when X = [X_1, X_2, ..., X_K]^T is a Gaussian vector:

X \sim N(0, P)

The covariance matrix is

P = diag\{P_1, ..., P_K\}

Hence, the capacity of multiple Gaussian channels is given by

C = \frac{1}{2} \sum_{i=1}^{K} \log\left(1 + \frac{P_i}{P_{N,i}}\right)

Let us now maximize C over \{P_i\}_{i=1}^{K} subject to the constraints \sum_{i=1}^{K} P_i = P and P_i \ge 0 for i = 1, ..., K.

SKETCH OF THE PROOF

We use the Lagrange multiplier method. The Lagrangian function can be written as

L(P_1, ..., P_K) = \frac{1}{2} \sum_{i=1}^{K} \log\left(1 + \frac{P_i}{P_{N,i}}\right) + \lambda_0 \left(P - \sum_{i=1}^{K} P_i\right) + \sum_{i=1}^{K} \lambda_i P_i

where \lambda_0, ..., \lambda_K are the Lagrange multipliers. Differentiating L(P_1, ..., P_K) w.r.t. P_i, we have

\frac{\partial L}{\partial P_i} = \frac{\partial}{\partial P_i}\left[\frac{\log e}{2} \sum_{i=1}^{K} \ln\left(1 + \frac{P_i}{P_{N,i}}\right) + \lambda_0 \left(P - \sum_{i=1}^{K} P_i\right) + \sum_{i=1}^{K} \lambda_i P_i\right]
= \frac{\log e}{2}\, \frac{1/P_{N,i}}{1 + P_i/P_{N,i}} - \lambda_0 + \lambda_i
= \frac{\log e}{2}\, \frac{1}{P_i + P_{N,i}} - \lambda_0 + \lambda_i

SKETCH OF THE PROOF

From the so-called Karush-Kuhn-Tucker (KKT) conditions for constrained convex optimization problems:

\frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}} - \lambda_0^\star + \lambda_i^\star = 0 \qquad \text{(zero gradient)}

\lambda_i^\star P_i^\star = 0 \qquad \text{(complementary slackness)}

\sum_{i=1}^{K} P_i^\star = P, \qquad P_i^\star \ge 0 \qquad \text{(constraint satisfaction)}

\lambda_i^\star \ge 0, \quad i = 1, ..., K \qquad \text{(for inequality constraints)}

Thus P_i^\star \ge 0 and \sum_{i=1}^{K} P_i^\star = P, as well as

P_i^\star \left(\lambda_0^\star - \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}\right) = 0 \quad \text{and} \quad \lambda_0^\star \ge \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}

SKETCH OF THE PROOF

From the KKT conditions: P_i^\star \ge 0 and \sum_{i=1}^{K} P_i^\star = P, as well as

P_i^\star \left(\lambda_0^\star - \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}\right) = 0 \quad \text{and} \quad \lambda_0^\star \ge \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}

Thus, if

\lambda_0^\star < \frac{\log e}{2}\, \frac{1}{P_{N,i}}

then from the last inequality we have P_i^\star > 0, which by the slackness condition implies that

\lambda_0^\star = \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}

and thus, for \nu^\star = \log e / (2\lambda_0^\star),

P_i^\star = \nu^\star - P_{N,i}.

SKETCH OF THE PROOF

From the KKT conditions: P_i^\star \ge 0 and \sum_{i=1}^{K} P_i^\star = P, as well as

P_i^\star \left(\lambda_0^\star - \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}\right) = 0 \quad \text{and} \quad \lambda_0^\star \ge \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}

Conversely, if

\lambda_0^\star \ge \frac{\log e}{2}\, \frac{1}{P_{N,i}}

then P_i^\star > 0 is impossible, as it would imply that

\lambda_0^\star \ge \frac{\log e}{2}\, \frac{1}{P_{N,i}} > \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}

which violates the complementary slackness condition. We conclude that P_i^\star = 0 for P_{N,i} \ge \nu^\star and P_i^\star = \nu^\star - P_{N,i} otherwise.

EXTENDED DEFINITIONS OF CAPACITY: ERGODIC CAPACITY

Ergodic capacity: In the case of a random Gaussian channel, it is sometimes more useful to separate the effects of the transmitted signal and the channel as

Y(i) = X(i) H(i) + Z(i)

where H(i) is the channel gain in the ith channel use. In contrast to the noise and signal waveforms, the channel gain has so far been treated as a non-random (deterministic) value.

For this model,

P = E\{X^2\}

can be interpreted as the transmitted signal power, whereas

E\{(XH)^2\} = E\{X^2\}\, H^2 = P H^2

can be interpreted as the received signal power.

Ergodic capacity

In this case, the capacity formula reads

C = \frac{1}{2} \log\left(1 + \frac{P H^2}{P_N}\right)

Note that the conventional capacity is instantaneous, that is, it characterizes the maximal achievable rate for a particular given realization of the channel gain H.

How can we characterize the maximal achievable rate on average rather than for some particular channel gain?

In practice, wireless channels are random and, therefore, should be treated as such.

Based on this fact, the ergodic capacity is defined as the instantaneous capacity C averaged over the channel realizations:

C_E = E_H\{C\}

where E_H\{\cdot\} denotes statistical expectation over the random channel gain.

Ergodic capacity

Assume that we know the channel gain pdf f_H(h). In this case, we can compute the ergodic capacity as

C_E = \int f_H(h)\, C(h)\, dh

Ergodic capacity provides another look at the achievable transmission rate as compared to the conventional instantaneous capacity, because it gives the average rather than the instantaneous picture.

Outage

Outage capacity: the transmission rate C_{p_{out}} which exceeds the instantaneous capacity C in only p_{out} \cdot 100 percent of channel realizations. The quantity p_{out} is called the outage probability.

Outage is the event that, for some particular channel realization, the chosen transmission rate is higher than the instantaneous capacity (that is, no error-free transmission is possible).

In the case of small p_{out} (roughly speaking, p_{out} \le 0.1), outage-induced errors can be cured by means of channel coding.

Outage

The outage capacity can be characterized as follows. Let the pdf of the instantaneous capacity C = C(H) be f_C(c), where f_C(c) = 0 for c < 0. Then, the outage capacity is defined by the equation

p_{out} = P(C < C_{p_{out}}) = \int_0^{C_{p_{out}}} f_C(c)\, dc
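A Monte Carlo sketch (illustrative; the Rayleigh fading model, the SNR value, and the sample size are assumptions) of how the outage capacity can be estimated as the p_out-quantile of the instantaneous capacity.

import math, random

def outage_capacity(snr, p_out, trials=200_000):
    # instantaneous capacity C = log2(1 + |H|^2 * SNR) with H ~ CN(0, 1) (Rayleigh fading)
    caps = []
    for _ in range(trials):
        h2 = random.gauss(0, math.sqrt(0.5))**2 + random.gauss(0, math.sqrt(0.5))**2
        caps.append(math.log2(1 + h2 * snr))
    caps.sort()
    return caps[int(p_out * trials)]   # p_out-quantile, so that P(C < C_pout) ~ p_out

print(outage_capacity(snr=10.0, p_out=0.1))   # bits per channel use at 10% outage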

Channel coding

Channel encoding and decoding are used to correct errors that may occur during signal transmission over the channel.

Linear block codes

Linear binary block codes: the coding/decoding operations can be described using linear algebra. Binary codes use modulo-2 arithmetic.

A code is said to be linear if the modulo-2 sum of any two codewords in the code gives another codeword of this code.

A code is denoted as an (n, k) linear block code if n is the total number of bits of the code and k is the number of bits containing the message.

Linear block codes

Row-vector notation:

m = [m_1, ..., m_k], \qquad b = [b_1, ..., b_{n-k}], \qquad c = [b_1, ..., b_{n-k}, m_1, ..., m_k] = [b, m]

Block codes use the message bits to generate parity-check bits according to the equation

b = m P

where P is the k \times (n-k) coefficient matrix. Noting that c = [b, m], we get

c = [b, m] = [m P, m] = m [P, I_k] = m G

where G is the k \times n generator matrix.

Hamming codes

Hamming codes, a family of codes with

n = 2^m - 1, \qquad k = 2^m - m - 1, \qquad n - k = m

(7,4) Hamming code (n = 7, m = 3, k = 4) generator matrix:

G = [P, I_4] = [ 1 1 0 | 1 0 0 0 ]
               [ 0 1 1 | 0 1 0 0 ]
               [ 1 1 1 | 0 0 1 0 ]
               [ 1 0 1 | 0 0 0 1 ]

Message   Codeword    Hamming weight
0000      0000000     0
0001      1010001     3
0010      1110010     4
0011      0100011     3
0100      0110100     3
0101      1100101     4
...       and so on   ...

For the given Hamming code, d_min = 3. Therefore, it is a single-error-correcting code.
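A small Python sketch (illustrative) that reproduces the codeword table above by computing c = m G with modulo-2 arithmetic.

# (7,4) Hamming generator matrix G = [P, I4] from the slide
G = [
    [1, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0, 1],
]

def encode(message_bits):
    # c = m G (mod 2): each codeword bit is a parity of selected message bits
    return [sum(m * g for m, g in zip(message_bits, col)) % 2
            for col in zip(*G)]

for m in ([0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 1, 1]):
    c = encode(m)
    print(m, ''.join(map(str, c)), sum(c))   # message, codeword, Hamming weight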

MULTI-ANTENNA CHANNELS

Consider a multiple-input multiple-output (MIMO) channel:

[Figure: a transmitter (Tx) with N antennas and a receiver (Rx) with M antennas connected through the channel.]

In the frequency-flat fading case, the signal at the mth receive antenna is

Y_m(t) = \sum_{n=1}^{N} H_{mn}(t) X_n(t) + Z_m(t), \qquad m = 1, ..., M

where H_{mn} is the channel coefficient between the mth receive and nth transmit antennas, X_n is the signal sent from the nth transmit antenna, and Z_m is the noise at the mth receive antenna.

MIMO channel

Defining the M \times N channel matrix

H = [ H_11  H_12  ...  H_1N ]
    [ H_21  H_22  ...  H_2N ]
    [  ...   ...  ...   ... ]
    [ H_M1  H_M2  ...  H_MN ]

and the transmit-signal, receive-signal, and noise column vectors

x = [X_1, ..., X_N]^T, \qquad y = [Y_1, ..., Y_M]^T, \qquad z = [Z_1, ..., Z_M]^T

we can write the system input-output relationship in matrix form as:

y = H x + z

SIMO channel

One particular case of the MIMO channel is the single-input multiple-output (SIMO) channel:

[Figure: a transmitter with 1 antenna and a receiver with N antennas.]

In the frequency-flat fading case, the signal at the nth receive antenna is

Y_n(t) = H_n(t) X(t) + Z_n(t), \qquad n = 1, ..., N

where H_n is the channel coefficient between the nth receive antenna and the transmit antenna, and X(t) is the signal sent from the transmit antenna.

Defining the N \times 1 channel vector

h = [H_1, ..., H_N]^T

we can write the system input-output relationship in vector form as:

y = h X + z

MISO channel

Another particular case of the MIMO channel is the multiple-input single-output (MISO) channel:

[Figure: a transmitter with N antennas and a receiver with 1 antenna.]

In the frequency-flat fading case, the signal at the receive antenna is

Y(t) = \sum_{n=1}^{N} H_n(t) X_n(t) + Z(t)

where H_n is the channel coefficient between the receive antenna and the nth transmit antenna, and X_n(t) is the signal sent from the nth transmit antenna.

Defining the 1 \times N channel row vector

h = [H_1, ..., H_N]

we can write the system input-output relationship in vector form as:

Y = h x + Z

Capacity in the case of an informed transmitter

Let us consider the MIMO case assuming that z \sim N_C(0, \sigma^2 I). Then, the equation

y = H x + z

describes a vector Gaussian channel. If the channel is known at the transmitter, the capacity can be computed by decomposing this channel into a set of parallel independent scalar Gaussian sub-channels.

Singular value decomposition (SVD) of H:

H = U \Sigma V^H

where the M \times M matrix U and the N \times N matrix V are unitary, that is, U^H U = U U^H = I and V^H V = V V^H = I.

SVD for any n \times m matrix A

A = U \Sigma V^H = \sum_i \sigma_i u_i v_i^H

[Figure: block sketches of the SVD for the cases n < m and n > m; \Sigma is an n \times m matrix whose only nonzero entries are the singular values \sigma_i on its main diagonal.]

MIMO capacity (informed transmitter)

Using the SVD of H, the MIMO model equation becomes

y = U \Sigma V^H x + z

Multiplying this equation by U^H from the left, and using the unitary property of U, we have

U^H y = \Sigma V^H x + U^H z

Introducing the notation \tilde{y} \triangleq U^H y, \tilde{x} \triangleq V^H x, \tilde{z} \triangleq U^H z, we obtain a system of parallel Gaussian channels

\tilde{y} = \Sigma \tilde{x} + \tilde{z}

where E\{\tilde{z}\tilde{z}^H\} = U^H E\{z z^H\} U = \sigma^2 I and, therefore,

\tilde{z} \sim N_C(0, \sigma^2 I)

Moreover,

\|\tilde{x}\|^2 = x^H V V^H x = \|x\|^2

Thus, the power is preserved!

MIMO capacity (informed transmitter)

The system of parallel channels can also be written componentwise:

\tilde{Y}_i = \sigma_i \tilde{X}_i + \tilde{Z}_i, \qquad i = 1, ..., n_o

where n_o = \min\{N, M\}. The transition to this equivalent system corresponds to the pre-processing

x = V \tilde{x}

at the transmitter and the post-processing

\tilde{y} = U^H y

at the receiver. Hence, the pre- and post-processing operators are V and U^H, respectively.

To implement the pre-/post-processing operations, the original vector to be transmitted is \tilde{x}. It is pre-processed at the transmitter to obtain

x = V \tilde{x}

The vector x is then sent over the channel. At the receiver (ignoring the noise for the moment), we have

y = H x = U \Sigma V^H V \tilde{x} = U \Sigma \tilde{x}

and after post-processing we obtain \tilde{y} = U^H y = U^H U \Sigma \tilde{x} = \Sigma \tilde{x}

MIMO capacity (informed transmitter)

The capacity of the resulting system of parallel independent channels:

C = B \sum_{i=1}^{n_o} \log\left(1 + \frac{P_i \sigma_i^2}{\sigma^2}\right) \text{ bits/s}

where the P_i are the water-filling power allocations:

P_i = \left(\nu - \frac{\sigma^2}{\sigma_i^2}\right)^+

and the water level \nu is obtained from the total power constraint \sum_{i=1}^{n_o} P_i = P.

Each \sigma_i corresponds to an eigenmode of the channel, also called an eigenchannel.
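A compact numerical sketch (illustrative; the example channel matrix, noise level, and bisection settings are assumptions) combining the SVD decomposition into eigenchannels and the water-filling power allocation described above, with B = 1.

import numpy as np

def mimo_capacity_csit(H, P_total, noise_var=1.0):
    # decompose the channel into eigenchannels via the SVD
    sigma = np.linalg.svd(H, compute_uv=False)
    gains = (sigma**2 / noise_var)
    gains = gains[gains > 1e-12]                 # drop numerically zero eigenmodes
    # water-filling over the eigenchannels by bisection on the water level nu
    lo, hi = 0.0, P_total + 1.0 / gains.min()
    for _ in range(100):
        nu = 0.5 * (lo + hi)
        p = np.maximum(nu - 1.0 / gains, 0.0)
        lo, hi = (nu, hi) if p.sum() < P_total else (lo, nu)
    p = np.maximum(nu - 1.0 / gains, 0.0)
    return float(np.sum(np.log2(1.0 + p * gains)))   # bits per channel use

H = np.array([[1.0, 0.5], [0.2, 1.5], [0.3, 0.1]])   # example 3 Rx x 2 Tx channel
print(mimo_capacity_csit(H, P_total=10.0))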

Wireless MIMO channel

System model: N transmit antennas and M receive antennas connected by the wireless MIMO channel; perfect channel state information is assumed at the transmitter. The MIMO channel equation in matrix notation is y = H x + z.

What is the optimum transmission and power allocation scheme if the channel matrix H is known at the transmitter?

Capacity of a MIMO channel: maximize the mutual information subject to the sum power constraint, which is due to hardware limitations and/or regulations.

Singular value decomposition of the MIMO channel: H = U \Sigma V^H, where u_i and v_i are the left and right singular vectors, \sigma_i is the corresponding singular value (\ge 0), and U and V are unitary.

Decoupling the channels using the linear transformations x = V \tilde{x} and \tilde{y} = U^H y gives \tilde{Y}_i = \sigma_i \tilde{X}_i + \tilde{Z}_i for i = 1, ..., r: an independent parallel channel representation with r parallel channels.

Optimization problem: maximize the capacity over the powers p_i assigned to the i-th input signal, subject to the sum power constraint. The solution is the water-filling principle.

[Figures: several slides illustrating the water-filling principle.]

High SNR regime

What are the key parameters that determine the performance?

At high SNR, the water level is high and the policy of allocating equal amounts of power to each channel is asymptotically optimal. In this case,

C \approx B \sum_{i=1}^{r} \log\left(1 + \frac{P \sigma_i^2}{r \sigma^2}\right) \approx B \sum_{i=1}^{r} \log\left(\frac{P \sigma_i^2}{r \sigma^2}\right) \approx r B \log \text{SNR} + B \sum_{i=1}^{r} \log\frac{\sigma_i^2}{r} \quad \text{bits/s}

where r \triangleq rank\{H\} and \text{SNR} = P/\sigma^2.

High SNR regime

What are the key parameters that determine the performance?

It can be shown that among the channels with the same power gain, the channels whose singular values are all equal result in the highest capacity.

This means that well-conditioned channel matrices are preferable in the high SNR regime.

Low SNR regime

What are the key parameters that determine the performance?

In this regime, the optimal policy is to allocate all power to the channel with the strongest eigenmode:

C \approx B \log\left(1 + \frac{P \sigma_{max}^2}{\sigma^2}\right)

and ill-conditioned (rank-one) channel matrices are preferable.

Using the property \log(1 + x) \approx x \log e, valid for x \ll 1, we have

C \approx \frac{B P \sigma_{max}^2}{\sigma^2} \log e

MIMO capacity (uninformed transmitter)

Let us now obtain the MIMO channel capacity based on general considerations, assuming that H is fixed while the other quantities (x, y and z) are random. In this case, no assumption on channel knowledge at the transmitter is used, but the receiver is assumed to know H.

Capacity via mutual information:

C = \max_{p(x)} I(x; y) = \max_{p(x)} [H(y) - H(y|x)]

The output covariance matrix is given by

R = E\{y y^H\} = H P H^H + \sigma^2 I

where

P \triangleq E\{x x^H\}

MIMO capacity (uninformed transmitter)

Result (Telatar, 1995; Foschini and Gans, 1998): Consider the model

y = H x + z

where x \sim N_C(0, P), z \sim N_C(0, \sigma^2 I), and H is fixed. Let B be the channel bandwidth in Hz. Then, the MIMO channel capacity is equal to

C = B \log \det\left(I + \frac{1}{\sigma^2} H P H^H\right) \text{ bits/s}
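A one-function numerical sketch (illustrative; the example matrices are assumptions) of the log-det capacity formula above, normalized to bits per channel use (B = 1).

import numpy as np

def mimo_capacity(H, P, noise_var=1.0):
    # C = log2 det(I + H P H^H / sigma^2) in bits per channel use
    M = H.shape[0]
    A = np.eye(M) + (H @ P @ H.conj().T) / noise_var
    sign, logdet = np.linalg.slogdet(A)
    return logdet / np.log(2.0)

H = np.array([[1.0, 0.3], [0.2, 0.8]])
P = np.eye(2) * 5.0            # equal power on both transmit antennas
print(mimo_capacity(H, P))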

Result 1

Let X_1, ..., X_n have a multivariate complex circular Gaussian distribution with mean \mu_X and covariance matrix P:

f_X(x) = \frac{1}{\pi^n \det\{P\}}\, e^{-(x - \mu_X)^H P^{-1} (x - \mu_X)}

Then

H(X) = H(X_1, ..., X_n) = \log\left((\pi e)^n \det\{P\}\right)

Proof:

H(X) = \int f_X(x)\, (x - \mu_X)^H P^{-1} (x - \mu_X)\, dx + \ln\left(\pi^n \det\{P\}\right)
     = E\{(x - \mu_X)^H P^{-1} (x - \mu_X)\} + \ln(\pi^n \det\{P\})
     = E\{tr(P^{-1} (x - \mu_X)(x - \mu_X)^H)\} + \ln(\pi^n \det\{P\})
     = tr(P^{-1} E\{(x - \mu_X)(x - \mu_X)^H\}) + \ln(\pi^n \det\{P\})
     = tr(P^{-1} P) + \ln(\pi^n \det\{P\})
     = n + \ln(\pi^n \det\{P\})
     = \ln((\pi e)^n \det\{P\}) \text{ nats}
     = \log((\pi e)^n \det\{P\}) \text{ bits}

Result 2

Let the random vector x \in C^n have zero mean and covariance E\{x x^H\} = P. Then H(X) = H(X_1, ..., X_n) \le \log\{(\pi e)^n \det\{P\}\}, with equality if and only if X \sim N_C(0, P).

Proof: Let g(x) be a pdf with covariance [P]_{ij} = \int g(x)\, x_i x_j^*\, dx and let \phi_P(x) be the complex circular Gaussian pdf N_C(0, P).

Note that the logarithm of the complex circular Gaussian pdf, \log \phi_P(x) \sim -(x - \mu_X)^H P^{-1} (x - \mu_X), is a quadratic form in x.

Then the Kullback-Leibler distance D(g(x) \| \phi_P(x)) between the two pdfs is given as

0 \le D(g(x) \| \phi_P(x)) = \int g(x) \log\left(\frac{g(x)}{\phi_P(x)}\right) dx
= -H_g(X) - \int g(x) \log(\phi_P(x))\, dx
= -H_g(X) - \int \phi_P(x) \log(\phi_P(x))\, dx
= -H_g(X) + H_\phi(X) \quad\Longrightarrow\quad H_\phi(X) \ge H_g(X)

Here the second integral could be rewritten because \log(\phi_P(x)) is a quadratic form in x, so its expectation depends only on the second moments of X, which are the same under g and \phi_P.

The Gaussian distribution maximizes the entropy over all distributions with the same covariance.

Proof of MIMO capacity result

It has been shown that, among all random vectors with covariance matrix R, the entropy of y is maximized when y is zero-mean circularly symmetric complex Gaussian. This holds exactly when the input vector x is zero-mean circularly symmetric complex Gaussian, and, therefore, this is the optimal distribution for X.

Using these facts, the capacity formula can be proved by obtaining explicit expressions for H(Y) and H(Y|X).

Recall the signal model:

Y = H X + Z

Then the mutual information between X and Y is given as

I(X; Y) = H(Y) - H(Y|X) = H(Y) - H(HX + Z|X) = H(Y) - \underbrace{H(HX|X)}_{0} - H(Z|X) = H(Y) - H(Z)

Proof of MIMO capacity result

From Result 2 we know that the entropy H(Y) is maximized for a complex circular Gaussian input distribution, thus

\max_{\text{all pdfs with } R} I(X; Y) = \max_{\text{all pdfs with } R} H(Y) - H(Z)
= \log\{(\pi e)^n \det(H P H^H + \sigma^2 I)\} - \log\{(\pi e)^n \det(\sigma^2 I)\}
= \log \det\left(I + \frac{1}{\sigma^2} H P H^H\right) \text{ bits per channel use}

Transition to the classic Shannon capacity result

Assuming a single-input single-output (SISO) system with N = M = 1 and a constant channel gain H, which transmits with power P, we have

H = H, \qquad P = P, \qquad I = 1

and, therefore,

C = B \log \det\left(I + \frac{1}{\sigma^2} H P H^H\right) = B \log\left(1 + \frac{|H|^2 P}{\sigma^2}\right)

This is the classical Shannon capacity formula for a bandlimited channel!

Channel known at the transmitter

If the channel matrix H is known at the transmitter, then in general unequal powers should be chosen, and P is not a scaled identity matrix.

Eigenchannels and power allocation using water-filling should be used, as discussed above.

Channel unknown at the transmitter

If the channel matrix H is unknown at the transmitter, then it follows from symmetry reasons that P should be a scaled identity matrix. Using the power constraint

tr\{P\} = P

we obtain that P has to be chosen as

P = (P/N)\, I

Indeed, the power constraint is satisfied because

tr\{P\} = tr\{(P/N)\, I\} = (P/N)\, tr\{I\} = P

Channel unknown at the transmitter

Choosing P = (P/N)\, I, we obtain that the MIMO capacity in the uninformed-transmitter case is given by

C = B \log \det\left(I + \frac{P}{N \sigma^2} H H^H\right)

Assuming that, although fixed, the entries of H are statistically independent random values with unit variance, and using the law of large numbers, we obtain that for a large number of transmit antennas and a fixed number of receive antennas

\frac{H H^H}{N} \to I

Using the latter property, we obtain that for large N,

C = B \log \det\left(\left(1 + \frac{P}{\sigma^2}\right) I\right) = B \log\left\{\left(1 + \frac{P}{\sigma^2}\right)^M\right\} = M B \log\left(1 + \frac{P}{\sigma^2}\right)

which is M times the SISO Shannon capacity!

Parallel SISO channel interpretation

Consider the general MIMO channel capacity formula. Let the eigendecomposition of the positive semi-definite Hermitian matrix H P H^H be

H P H^H = \sum_{i=1}^{r} \lambda_i u_i u_i^H = U \Lambda U^H

where U^H U = I and r \triangleq rank\{H P H^H\}. The matrices U and \Lambda should not be confused with those of the SVD of the matrix H used earlier!

We will use the property

\det\{I + A B\} = \det\{I + B A\}

valid for any matrices A and B of conformable dimensions.

Parallel SISO channel interpretation

Setting A = U \Lambda and B = U^H, we obtain

C = B \log \det\left(I + \frac{1}{\sigma^2} H P H^H\right) = B \log \det\left(I + \frac{1}{\sigma^2} U \Lambda U^H\right)
= B \log \det\left(I + \frac{1}{\sigma^2} U^H U \Lambda\right) = B \log \det\left(I + \frac{1}{\sigma^2} \Lambda\right)
= B \log\left\{\prod_{i=1}^{r} \left(1 + \lambda_i/\sigma^2\right)\right\} = B \sum_{i=1}^{r} \log\left(1 + \lambda_i/\sigma^2\right)

Parallel SISO channel interpretation

The latter formula interprets the capacity of the MIMO channel as the sum of the capacities of r parallel SISO channels.

In the case of an uninformed transmitter (P = (P/N)\, I), r can be interpreted as the rank of H: full-rank channels are preferable!

If H is drawn randomly, then almost surely

rank\{H\} = \min\{M, N\}

This leads us to the conclusion that the capacity grows nearly proportionally to \min\{M, N\}.

Assume M = N and let the Frobenius norm of H be given. What type of channel will maximize the MIMO capacity?

Result: The capacity is maximized in the case when H is orthogonal:

H^H H = H H^H = \alpha I

where \alpha is a constant. In this case,

C = B \log \det\left(\left(1 + \frac{\alpha P}{N \sigma^2}\right) I\right) = B \log\left\{\left(1 + \frac{\alpha P}{N \sigma^2}\right)^N\right\} = N B \log\left(1 + \frac{\alpha P}{N \sigma^2}\right)

SIMO channel capacity

Consider a SIMO column-vector channel h with one transmit and N receive antennas. The capacity formula becomes

C = B \log \det\left(I + \frac{1}{\sigma^2} P h h^H\right) = B \log\left(1 + \frac{1}{\sigma^2} P h^H h\right) = B \log\left(1 + \frac{P}{\sigma^2} \|h\|^2\right)

Hence, the SIMO channel comprises only one spatial data pipe. The addition of receive antennas yields only a logarithmic (rather than linear) increase in capacity.

MISO channel capacity

Consider a MISO row-vector channel h with one receive and N transmit antennas. The capacity formula becomes

C = B \log\left(1 + \frac{1}{\sigma^2} h P h^H\right) = B \log\left(1 + \frac{1}{\sigma^2} \|h P^{1/2}\|^2\right)

The situation is similar to that in the SIMO case: the increase in capacity is only logarithmic (rather than linear).

Ergodic MIMO channel capacity

The channel matrix H is no longer fixed, but is treated as random. The capacity formula can be averaged over H:

E_H\{C\} = B\, E_H\left\{\log \det\left(I + \frac{1}{\sigma^2} H P H^H\right)\right\}

Result (Telatar, 1999): Let H be a Gaussian random matrix with i.i.d. elements. Then, the average capacity is maximized subject to the power constraint tr\{P\} \le P when

P = \frac{P}{N} I

That is, to maximize the average capacity, the antennas should transmit uncorrelated streams with the same power, an intuitively appealing fact.

Ergodic MIMO channel capacity: Proof (sketch)

Let H be a Gaussian random matrix with i.i.d. elements.

C = \max_{P:\, tr P \le P} E_H \log \det\left\{I + \frac{1}{\sigma^2} H P H^H\right\}

Introduce the decomposition P = P_\Delta + P_{off}, where P_\Delta contains the diagonal entries of P and P_{off} the off-diagonal entries. Then

C = \max_{P:\, tr P \le P} E_H \log \det\left\{I + \frac{1}{\sigma^2} H P_\Delta H^H + \frac{1}{\sigma^2} H P_{off} H^H\right\}
\le \max_{P_\Delta:\, tr P_\Delta \le P} \log \det\left\{E_H\left[I + \frac{1}{\sigma^2} H P_\Delta H^H + \frac{1}{\sigma^2} H P_{off} H^H\right]\right\}

where the last inequality follows from Jensen's inequality.

Ergodic MIMO channel capacity: Proof (sketch)

\max_{P_\Delta:\, tr P_\Delta \le P} \log \det\left\{E_H\left[I + \frac{1}{\sigma^2} H P_\Delta H^H + \frac{1}{\sigma^2} H P_{off} H^H\right]\right\}
= \max_{P_\Delta:\, tr P_\Delta \le P} \log \det\left\{E_H\left[I + \frac{1}{\sigma^2} H P_\Delta H^H\right] + \underbrace{E_H\left[\frac{1}{\sigma^2} H P_{off} H^H\right]}_{=0}\right\}

where the last term in the second equation is identically zero due to the statistical independence of the entries of H.

We conclude that restricting the transmit covariance to the diagonal structure P = P_\Delta does not reduce the achievable capacity.

Ergodic MIMO channel capacity: Proof (sketch)

Thus

C = \max_{P_\Delta:\, tr P_\Delta \le P} E_H \log \det\left\{I + \frac{1}{\sigma^2} H P_\Delta H^H\right\}

We can show that, due to the i.i.d. property of H, the objective function is symmetric w.r.t. the input variables, i.e. exchanging the order of the entries P_1, ..., P_N does not change the function value. Furthermore, the function is concave.

We conclude that the optimal power allocation strategy in this case is to distribute the power equally among the transmitted symbols, i.e. to choose P_1 = P_2 = ... = P_N.

Ergodic MIMO channel capacity

Note that the latter choice of P coincides with our earlier choice of this matrix in the case of a fixed channel and an uninformed transmitter.

Choosing P = (P/N)\, I, the maximal average capacity (which is commonly referred to as the ergodic capacity) becomes

C_E = B\, E_H\left\{\log \det\left(I + \frac{P}{N \sigma^2} H H^H\right)\right\}

Ergodic capacity has an important advantage w.r.t. the fixed-channel capacity as it gives an average rather than an instantaneous picture.
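A Monte Carlo sketch (illustrative; the i.i.d. Rayleigh channel model, antenna numbers, and sample size are assumptions) of the ergodic capacity formula above with B = 1.

import numpy as np

def ergodic_capacity(N_tx, M_rx, snr, trials=2000, rng=np.random.default_rng(0)):
    # average of log2 det(I + (P / (N sigma^2)) H H^H) over i.i.d. CN(0, 1) channels
    caps = []
    for _ in range(trials):
        H = (rng.standard_normal((M_rx, N_tx)) +
             1j * rng.standard_normal((M_rx, N_tx))) / np.sqrt(2.0)
        A = np.eye(M_rx) + (snr / N_tx) * (H @ H.conj().T)
        caps.append(np.linalg.slogdet(A)[1] / np.log(2.0))
    return float(np.mean(caps))

print(ergodic_capacity(2, 2, snr=10.0))   # roughly twice the SISO value
print(ergodic_capacity(4, 4, snr=10.0))   # grows about linearly with min(M, N)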

Ergodic MIMO channel capacity

Using the parallel SISO channel interpretation and denoting the singular values of H as \sigma_i, we obtain

C_E = B\, E_H\left[\sum_{i=1}^{r} \log\left(1 + \frac{P \sigma_i^2}{N \sigma^2}\right)\right] = B \sum_{i=1}^{r} E_H\left\{\log\left(1 + \frac{P \sigma_i^2}{N \sigma^2}\right)\right\}

Please note the difference with the water-filling capacity: in contrast to it, in the latter expression equal powers are used for each eigenchannel.

Large antenna regime

Let us denote SNR = P/\sigma^2. Then, the capacity formula becomes

C_E = B \sum_{i=1}^{r} E_H\left\{\log\left(1 + \frac{\text{SNR}\, \sigma_i^2}{N}\right)\right\}

Assume M = N and i.i.d. Rayleigh fading. Then, using random matrix theory, it can be shown that for any SNR

\lim_{N \to \infty} \frac{C_E}{N} = \text{const}

Therefore, the capacity grows linearly in N at any SNR in such an asymptotic regime!

Outage capacity

A value C_{out} which is larger than the instantaneous capacity C in p_{out} \cdot 100 percent of channel realizations. In other words,

Pr(C_{out} > C) = p_{out}

If one wants to transmit with C_{out} bits per second, then the channel capacity is less than C_{out} with probability p_{out}. Hence, the transmission is impossible (the system is in outage) in p_{out} \cdot 100 percent of the time.

Alternatively, we can write

Pr(C_{out} \le C) = 1 - p_{out}

and, hence, in (1 - p_{out}) \cdot 100 percent of the time the transmission is possible, as the system is not in outage.

Outage capacity

1 - p_{out} is called the non-outage probability.

Using the instantaneous MIMO capacity formula, we can define the MIMO outage capacity by means of the following expression:

\min_{tr\{P\} \le P} Pr\left(C_{out} > B \log \det\left(I + \frac{1}{\sigma^2} H P H^H\right)\right) = p_{out}

where we additionally use the opportunity to minimize the outage probability by means of a proper choice of P. This particular choice, of course, depends on the statistics of the random channel matrix H.

Example: Rayleigh fading channel

In Rayleigh fading, the channel coefficients are circularly symmetric complex Gaussian with zero mean and unit variance: a) channel known at the transmitter; b) channel unknown at the transmitter.

[Figures: outage capacity (bits/s/Hz) versus SNR (dB, from 10 to 40) for outage probabilities p_out = 0.01, 0.1, 0.5, for cases a) and b).]

MULTIUSER CHANNELS

Why multiuser channels:

- Up to now, we have considered point-to-point communication links.
- Most communication systems serve multiple users. Therefore, multiuser channels are of great interest.
- In multiuser channels, one user can interfere with another user. This type of interference is called multiuser interference (MUI).

Common multiuser channel types:

- Multiple-access channels
- Broadcast channels
- Relay channels

[Figures: a multiple-access channel (several transmitters, one receiver), a broadcast channel (one transmitter, several receivers), and a relay channel (a source communicating with a destination assisted by a relay node).]

Multiple-access channels

Two-user multiple-access Gaussian channel:

Y(i) = X_1(i) + X_2(i) + Z(i), \qquad Z(i) \sim N_C(0, \sigma^2)

In the point-to-point (single-user) case, the rate limit is the channel capacity. The achievable rate region is, therefore, given by:

R < B \log\left(1 + \frac{P}{\sigma^2}\right)

In the two-user case, we should extend this concept to a capacity region C, which is the set of all pairs (R_1, R_2) such that users 1 and 2 can simultaneously and reliably communicate at rates R_1 and R_2, respectively.

Since the two users share the same bandwidth, there is a tradeoff between the rates R_1 and R_2: if one user wants to communicate at a higher rate, then the other user may need to lower its rate.

Example of the tradeoff: In orthogonal multiple-access schemes such as OFDM, the tradeoff can be realized by varying the number of subcarriers allocated to each user.

Rate region

Different scalar performance measures can be obtained from the capacity region:

- The symmetric capacity

C_{sym} = \max_{(R,R) \in C} R

is the maximum common rate at which both users can simultaneously and reliably communicate.

- The sum capacity

C_{sum} = \max_{(R_1,R_2) \in C} (R_1 + R_2)

is the maximum total throughput that can be achieved.

Rate region

If we have two users with powers P_1 and P_2, then the capacity region for the two-user channel is defined by the following inequalities:

R_1 < B \log\left(1 + \frac{P_1}{\sigma^2}\right)

R_2 < B \log\left(1 + \frac{P_2}{\sigma^2}\right)

R_1 + R_2 < B \log\left(1 + \frac{P_1 + P_2}{\sigma^2}\right)

The first two constraints say that the rate of each individual user cannot exceed the capacity of the point-to-point link with the other user absent.

The last constraint says that the total throughput cannot exceed the capacity of a point-to-point link with a single user whose power is the sum of the two users' powers.
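A small numerical sketch (illustrative; the power values are assumptions) of the two-user capacity region bounds and of the corner points A and B discussed on the following slides, with B = 1.

import math

def mac_region(P1, P2, noise=1.0):
    c = lambda x: math.log2(1 + x / noise)
    R1_max, R2_max, Rsum = c(P1), c(P2), c(P1 + P2)
    # corner A: user 1 at its single-user rate, user 2 decoded first (treats user 1 as noise)
    corner_A = (R1_max, math.log2(1 + P2 / (P1 + noise)))
    # corner B: the roles of the users reversed
    corner_B = (math.log2(1 + P1 / (P2 + noise)), R2_max)
    return R1_max, R2_max, Rsum, corner_A, corner_B

R1, R2, Rs, A, B_pt = mac_region(P1=10.0, P2=5.0)
print(R1, R2, Rs)        # individual and sum-rate bounds
print(A, sum(A))         # corner A rates; their sum equals the sum-rate bound
print(B_pt, sum(B_pt))   # corner B rates; same sum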

Rate region

That is, not only are the rates R_1 and R_2 limited individually, but their sum is limited as well. This means that the signal of each user may be viewed as interference for the other user.

Result: The two-user capacity region is a pentagon.

Rate region: multiple-access channel

[Figure: the pentagon-shaped two-user capacity region in the (R_1, R_2) plane, with corner points A and B on its dominant face.]

Rate region: multiple-access channel

Remark: Surprisingly, user 1 can achieve its single-user rate bound R_1 = B \log\left(1 + \frac{P_1}{\sigma^2}\right) while, at the same time, user 2 can get a non-zero rate as high as R_2 = B \log\left(1 + \frac{P_2}{P_1 + \sigma^2}\right). This corresponds to point A of the capacity region plot. Indeed,

R_1 + R_2 = B \log\left[\left(1 + \frac{P_1}{\sigma^2}\right)\left(1 + \frac{P_2}{P_1 + \sigma^2}\right)\right]
= B \log\left(1 + \frac{P_1}{\sigma^2} + \frac{P_2}{P_1 + \sigma^2} + \frac{P_1 P_2}{\sigma^2 (P_1 + \sigma^2)}\right)
= B \log\left(1 + \frac{P_1^2 + P_1 \sigma^2 + P_2 \sigma^2 + P_1 P_2}{\sigma^2 (P_1 + \sigma^2)}\right)
= B \log\left(1 + \frac{P_1 + P_2}{\sigma^2}\right)

Successive interference cancellation

How can this be achieved?

Each user should encode its data using a capacity-achieving channel code. The receiver should decode the information of both users in two stages:

- In the first stage, the data of user 2 are decoded treating user 1 as AWGN. Then, the maximum rate user 2 can achieve is R_2 = B \log\left(1 + \frac{P_2}{P_1 + \sigma^2}\right).

- In the second stage, the reconstructed (decoded) signal of user 2 is subtracted from the aggregate received signal, and then the data of user 1 are decoded. Since user 2 has already been subtracted and only the background AWGN is left in the system, the achieved rate for user 1 is R_1 = B \log\left(1 + \frac{P_1}{\sigma^2}\right).

This two-stage decoding is called successive interference cancellation.

Successive interference cancellation

If one reverses the order of cancellation, then one can achieve point B rather than point A.

All other rate points on the segment AB can be obtained by time-sharing between the multiple-access strategies of points A and B.

The segment AB contains all the optimal operating points of the channel, in the sense that any point in the capacity region is dominated by some point on AB. That is, for any point within the capacity region that corresponds to the rates R_1 and R_2, we can always find a point on the segment AB whose rates R_1' and R_2' satisfy:

R_1' \ge R_1, \qquad R_2' \ge R_2

Pareto-optimal

The points on the segment AB are called Pareto-optimal.

One can always increase the user rates to move to a point on the segment AB,
and there is no reason not to do this.


The concrete choice of the point on AB depends on our particular objectives:

- To maximize the sum capacity C_{sum}, any point on AB is equally good. Note that we have already computed the sum of R_1 and R_2 at point A. Hence,

C_{sum} = B \log\left(1 + \frac{P_1 + P_2}{\sigma^2}\right)

- To maximize the symmetric capacity C_{sym}, we should take the point on AB that gives equal rates R_1 and R_2.

- Some operating points on AB may not be fair, especially if the received power of one user is much higher than that of the other user. In this case, we should consider operating at the corner point at which the stronger user is decoded first.

How does the system with successive cancellation compare to a standard CDMA system in terms of achievable rate?

The principal difference between CDMA detection and successive cancellation detection is that:

- In the CDMA system, each user is decoded treating the other users as interference. This corresponds to the single-user receiver principle, and we immediately conclude that the performance of the CDMA system is suboptimal; i.e., it achieves a point strictly in the interior of the capacity region.

- In contrast to CDMA, the successive cancellation receiver is a multiuser receiver: only one of the users (say, user 1) is decoded treating user 2 as interference, but user 2 is decoded with the benefit of the signal of user 1 having already been removed.

In the successive cancellation receiver case,

R1 = B log(1 + P1/σ²),            R2 = B log(1 + P2/(P1 + σ²))

or

R1 = B log(1 + P1/(P2 + σ²)),     R2 = B log(1 + P2/σ²)

In the CDMA receiver case,

R1 = B log(1 + P1/(P2 + σ²)),     R2 = B log(1 + P2/(P1 + σ²))

That is, one of the rates in the CDMA case is always lower than in the case of
successive cancellation!
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 164

NTS

Correspondingly, in the successive cancellation receiver case,

Csum = B log(1 + (P1 + P2)/σ²)

In the CDMA receiver case, the sum rate is

B log(1 + P1/(P2 + σ²)) + B log(1 + P2/(P1 + σ²))
  = B log[(1 + P1/(P2 + σ²))(1 + P2/(P1 + σ²))]
  = B log(1 + (P1 + P2)/σ² − P1 P2 (P1 + P2 + σ²)/(σ²(P1 + σ²)(P2 + σ²)))
  < Csum = B log(1 + (P1 + P2)/σ²)
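A quick numerical check of this inequality (a minimal Python sketch; the power values are arbitrary example numbers):

```python
import math

P1, P2, sigma2 = 4.0, 1.0, 1.0

# Sum rate with successive interference cancellation (per unit bandwidth).
sic_sum = math.log2(1 + (P1 + P2) / sigma2)

# Sum rate when each user is decoded treating the other as noise (CDMA).
cdma_sum = (math.log2(1 + P1 / (P2 + sigma2))
            + math.log2(1 + P2 / (P1 + sigma2)))

print(sic_sum, cdma_sum)          # cdma_sum is strictly smaller
assert cdma_sum < sic_sum
```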
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 165

NTS

K -user multiple-access Gaussian channel

Y(i) = Σ_{k=1}^{K} Xk(i) + Z(i),     Z(i) ~ N_C(0, σ²)

Similar to the two-user case, in the case of K users, all of them share the same
bandwidth, and there is a tradeoff between the rates Rk (k = 1, 2, ... , K ). If one
(or more) users want to communicate at higher rate(s), then the other user(s)
may need to lower their rate(s).

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 166

NTS

In the K-user case, we can define the capacity region C as the set of all
(R1, R2, ..., RK) such that users 1, 2, ..., K can simultaneously reliably
communicate at rates R1, R2, ..., RK, respectively.
This capacity region is described by the 2^K − 1 constraints:

Rk < B log(1 + Pk/σ²),                          k = 1, ..., K
Rk + Ri < B log(1 + (Pk + Pi)/σ²),              k ≠ i
Rk + Ri + Rl < B log(1 + (Pk + Pi + Pl)/σ²),    k, i, l distinct
...
Σ_{k=1}^{K} Rk < B log(1 + Σ_{k=1}^{K} Pk / σ²)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 167

NTS

K -user multiple-access Gaussian channel

The K-user capacity region can be written in a short form as

Σ_{k∈S} Rk < B log(1 + Σ_{k∈S} Pk / σ²)     for all S ⊆ {1, ..., K}

The right-hand side,

B log(1 + Σ_{k∈S} Pk / σ²)

is the maximum sum rate that can be achieved by a single transmitter with the
total power of the users in S and with no other users in the system.
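The region is easy to test numerically. Below is a minimal sketch (Python; the helper name `in_mac_region` and the use of itertools are my own illustrative choices, not from the slides) that checks all 2^K − 1 subset constraints for a candidate rate vector:

```python
import math
from itertools import combinations

def in_mac_region(R, P, sigma2, B=1.0):
    """Check whether the rate vector R lies in the K-user Gaussian MAC
    capacity region for powers P and noise power sigma2 (base-2 rates)."""
    K = len(R)
    for size in range(1, K + 1):
        for S in combinations(range(K), size):
            rate_sum = sum(R[k] for k in S)
            bound = B * math.log2(1 + sum(P[k] for k in S) / sigma2)
            if rate_sum >= bound:        # strict inequality required
                return False
    return True

# Example: two users with P = [4, 1] and sigma2 = 1.
print(in_mac_region([2.0, 0.2], [4.0, 1.0], 1.0))   # True (inside the pentagon)
print(in_mac_region([2.3, 0.4], [4.0, 1.0], 1.0))   # False (sum-rate bound violated)
```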

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 168

NTS

The sum capacity can be defined as

Csum = max_{(R1,...,RK) ∈ C}  Σ_{k=1}^{K} Rk

It can be shown that

Csum = B log(1 + Σ_{k=1}^{K} Pk / σ²)

and that there are exactly K! corner points in the capacity region, each one
corresponding to a different successive cancellation order among the users.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 169

NTS

In the equal power case (P1 = P2 = ... = PK = P),

Csum = B log(1 + KP/σ²)

Observe that the sum capacity is unbounded as the number of users grows. In
contrast, with the conventional CDMA receiver (decoding each user treating all the
other users as noise), the sum rate will be only

B K log(1 + P/((K − 1)P + σ²))

which approaches

(B K P / ((K − 1)P + σ²)) log e  ≃  B log e

as K → ∞. The growing interference is the limiting factor here!
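A short sketch illustrating this saturation effect (Python; equal powers, arbitrary example SNR):

```python
import math

P, sigma2, B = 1.0, 1.0, 1.0

for K in (1, 2, 4, 8, 16, 64, 256):
    c_sum = B * math.log2(1 + K * P / sigma2)                    # SIC / optimal
    cdma = B * K * math.log2(1 + P / ((K - 1) * P + sigma2))     # single-user receivers
    print(K, round(c_sum, 2), round(cdma, 2))
# c_sum grows like log2(K); cdma saturates near B*log2(e) ~ 1.44*B
```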
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 170

NTS

The symmetric capacity can be defined as

Csym = max { R : (R, R, ..., R) ∈ C }

It can be shown that in the equal power case (P1 = P2 = ... = PK = P),

Csym = (B/K) log(1 + KP/σ²)

This rate for each user can be obtained by orthogonal multiplexing, where each
user is allocated a fraction 1/K of the total degrees of freedom (for example, of
the total bandwidth B).
Note that Csym = Csum/K.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 171

NTS

Broadcast channels

Two-user broadcast AWGN channel:

Yk(i) = hk X(i) + Zk(i),     k = 1, 2;     Zk(i) ~ N_C(0, σ²)

where hk is the fixed complex channel gain corresponding to the kth user.
The broadcast case is often referred to as the downlink.
Transmit power constraint: the average power of the transmit signal is P.
As in the multiple-access (uplink) channel case, we can define the capacity region C
as the region of rates (R1, R2) at which both users can simultaneously reliably
communicate.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 172

NTS

Broadcast channels

We have just two single-user bounds:

Rk < B log(1 + P|hk|²/σ²),     k = 1, 2

For any k, this upper bound on Rk can be attained by using all the transmit
power to communicate to user k (with the rate of the remaining user being zero).
Thus, we have two extreme points:

R1 = B log(1 + P|h1|²/σ²),  R2 = 0
R2 = B log(1 + P|h2|²/σ²),  R1 = 0
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 173

NTS

Rate region in the symmetric case |h1 | = |h2 |

Further, we can share the degrees of freedom (time and bandwidth) between the
users in an orthogonal manner to obtain any rate pair on the line joining these two
extreme points.
Hence, for the symmetric case of |h1| = |h2|, the capacity region is a triangle.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 174

NTS

Rate region in the symmetric case |h1 | = |h2 |

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 175

NTS

In the symmetric case |h1| = |h2| = |h|, the sum rate can be shown to be bounded
by the single-user capacity:

R1 + R2 < B log(1 + P|h|²/σ²)

The latter conclusion follows from the triangular form of the capacity region.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 176

NTS

As has already been mentioned, the rate pairs in the capacity region can be
achieved by sharing the degrees of freedom (bandwidth and time) between the
two users. What are the alternative ways to achieve the boundary of the capacity
region?
The structure of the channel suggests an alternative natural approach:

- Let the channel of user 2 be stronger than that of user 1 (|h1| < |h2|).
  Thus, if user 1 can successfully decode its data from Y1, then user 2 (which
  has a higher SNR) should also be able to decode the data of user 1 from Y2.
  Then, user 2 can subtract the data of user 1 from its received signal Y2 to
  better decode its own data; i.e., it can perform successive interference
  cancellation.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 177

NTS

Consider the following transmission strategy that superposes the signals of two
users, much like in a spread-spectrum CDMA system. The transmitted signal is
the sum of two signals:
X (i) = X1 (i) + X2 (i)
where Xk (i) is the signal intended for user k.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 178

NTS

Superposition coding

The weaker user 1 decodes its own signal by treating the signal of user 2 as noise.
The stronger user 2 performs successive interference cancellation: it first decodes
the data of user 1 by treating X2 as noise, subtracts the so-obtained signal of
user 1 from Y2, and then extracts its own data. As a result, for any power split
P = P1 + P2, the following rate pair can be achieved:

R1 = B log(1 + P1|h1|²/(P2|h1|² + σ²))
R2 = B log(1 + P2|h2|²/σ²)

This strategy is commonly referred to as superposition coding.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 179

NTS

Orthogonal scheme

On the other hand, in orthogonal schemes, for any power split P = P1 + P2 and
degree-of-freedom split α ∈ [0, 1], the following rates are jointly achieved:

R1 = α B log(1 + P1|h1|²/(α σ²))
R2 = (1 − α) B log(1 + P2|h2|²/((1 − α) σ²))

Here, α can be interpreted, for example, as the fraction of the bandwidth assigned
to user 1 (both the bandwidth B and the in-band noise power are then scaled by α).
Alternatively, α can be interpreted as the fraction of time assigned to user 1
(user 1 transmits only during a fraction α of the time, during which its
instantaneous power can be raised to P1/α while keeping the average power P1).
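For intuition, the following sketch (Python; the channel gains, power, and the simple coupling of the power split to the degree-of-freedom split in the orthogonal case are my own illustrative assumptions) traces boundary rate pairs of both schemes by sweeping the splits:

```python
import math

def superposition_boundary(P, h1, h2, sigma2, B=1.0, steps=101):
    """Rate pairs achieved by superposition coding for power splits P1+P2=P,
    assuming |h1| <= |h2| (user 1 is the weaker user)."""
    pairs = []
    for i in range(steps):
        P1 = P * i / (steps - 1)
        P2 = P - P1
        R1 = B * math.log2(1 + P1 * abs(h1)**2 / (P2 * abs(h1)**2 + sigma2))
        R2 = B * math.log2(1 + P2 * abs(h2)**2 / sigma2)
        pairs.append((R1, R2))
    return pairs

def orthogonal_boundary(P, h1, h2, sigma2, B=1.0, steps=101):
    """Rate pairs for orthogonal sharing; here the power split equals the
    degree-of-freedom fraction alpha (one simple, not necessarily optimal, choice)."""
    pairs = []
    for i in range(1, steps - 1):
        a = i / (steps - 1)
        P1, P2 = a * P, (1 - a) * P
        R1 = a * B * math.log2(1 + P1 * abs(h1)**2 / (a * sigma2))
        R2 = (1 - a) * B * math.log2(1 + P2 * abs(h2)**2 / ((1 - a) * sigma2))
        pairs.append((R1, R2))
    return pairs

# Example: user 2 has a 10 dB stronger channel.
sp = superposition_boundary(10.0, 1.0, math.sqrt(10.0), 1.0)
orth = orthogonal_boundary(10.0, 1.0, math.sqrt(10.0), 1.0)
```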
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 180

NTS

Rate region in the symmetric case


|h1| = |h2| = |h|

Assume that superposition coding is used and that the power is split such that
P = P1 + P2. In this case, if user 1 can decode its data treating the data of user 2
as noise, then user 2 can also decode the data of user 1, subtract it from its
received signal, and then decode its own data. Hence, the following rate pairs are
supported:

R1 ≤ B log(1 + P1|h1|²/(P2|h1|² + σ²))
   = B log(1 + (P1 + P2)|h1|²/σ²) − B log(1 + P2|h1|²/σ²)

R2 ≤ B log(1 + P2|h2|²/σ²)

Thus, for |h1| = |h2| = |h| and the power split P = P1 + P2, the sum rate satisfies

R1 + R2 ≤ B log(1 + P|h|²/σ²)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 181

NTS

Rate region in the general case


|h1| ≤ |h2|

Solid line: optimal power split using superposition coding.


Dashed line: optimal degrees of freedom split using orthogonal coding.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 182

NTS

In the K-user broadcast case, the boundary of the capacity region can be proved
to be given by

Rk = B log(1 + Pk|hk|² / (σ² + (Σ_{l=k+1}^{K} Pl)|hk|²)),     k = 1, ..., K

for all possible power splits P = Σ_{k=1}^{K} Pk of the total power at the base
station (here the users are assumed to be ordered such that |h1| ≤ |h2| ≤ ... ≤ |hK|).

The optimal points are achieved by superposition coding and successive
interference cancellation at the receivers. The cancellation order at every receiver
should always be to decode the weaker users before decoding its own data.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 183

NTS

Fading channels

Until now, all multi-user channels have been considered without random channel
fading.
Let us now include fading in the signal model. The availability of channel state
information becomes a critical issue in such cases.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 184

NTS

Multiple-access fading channels

K-user multiple-access fading channel:

Y(i) = Σ_{k=1}^{K} hk(i) Xk(i) + Z(i)

where {hk(i)} is the random fading process of user k.

We assume that

E{|hk(i)|²} = 1,     k = 1, ..., K

and that the fading processes of different users are i.i.d.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 185

NTS

Slow fading

The time-scale of communication is short relative to the channel coherence time
of all users. Hence, hk(i) = hk for all k.
Suppose all users transmit at the rate R. Conditioned on each realization of
h1, ..., hK, we have the standard multiple-access AWGN channel with received
SNR of user k equal to |hk|²P/σ². If the symmetric capacity is less than R, then
this results in an outage. Using the expressions for the K-user capacity region, the
outage probability can be written as

p_out = Pr{ B log(1 + SNR Σ_{k∈S} |hk|²) < |S| R   for some S ⊆ {1, ..., K} }

where |S| denotes the cardinality of S and SNR = P/σ².
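This outage probability is straightforward to estimate by Monte Carlo simulation. A minimal sketch (Python; the Rayleigh-fading assumption hk ~ CN(0,1) and the example parameters are my own choices):

```python
import math
import random
from itertools import combinations

def outage_prob(K, R, snr, B=1.0, trials=20000):
    """Monte Carlo estimate of the slow-fading uplink outage probability,
    assuming i.i.d. Rayleigh fading with E|hk|^2 = 1."""
    outages = 0
    for _ in range(trials):
        g = [random.expovariate(1.0) for _ in range(K)]   # |hk|^2 ~ Exp(1)
        bad = False
        for size in range(1, K + 1):
            for S in combinations(range(K), size):
                if B * math.log2(1 + snr * sum(g[k] for k in S)) < size * R:
                    bad = True
                    break
            if bad:
                break
        outages += bad
    return outages / trials

print(outage_prob(K=2, R=1.0, snr=10.0))
```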

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 186

NTS

Fast fading

Each hk(i) is modelled as a time-varying ergodic process.

The sum capacity in the fast fading case:

Csum = E{ B log(1 + Σ_{k=1}^{K} |hk|² P / σ²) }

How does this compare to the sum capacity of the uplink channel without fading?
Let us use Jensen's inequality, which basically says that

E{f(X)} ≤ f(E{X})

for any concave function f(·) and random variable X.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 187

NTS

Using this inequality, we obtain that

Csum = E{ B log(1 + Σ_{k=1}^{K} |hk|² P / σ²) }
     ≤ B log(1 + E{ Σ_{k=1}^{K} |hk|² } P / σ²)
     = B log(1 + KP/σ²)

where the property E{|hk(i)|²} = 1 (k = 1, ..., K) has been used in the last line.
The last expression can be identified as the sum capacity of the AWGN
multiple-access channel. Hence, without channel state information at the
transmitter, fading can only hurt.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 188

NTS

However, if the number of users K becomes large, then

Σ_{k=1}^{K} |hk|² ≈ K

and the penalty due to fading vanishes. Basically, the effect of fading is averaged
over a large number of users.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 189

NTS

Let us now assume that we have full (possibly also non-causal) channel state
information at both the transmitter and receiver sides.
Block-fading model:

Y(i) = Σ_{k=1}^{K} hk(i) Xk(i) + Z(i)

where hk(i) = hk,l remains constant over the lth coherence period of Tc
(Tc >> 1) symbols and is i.i.d. across different coherence periods.

The channel over L such coherence periods can be viewed as a set of L parallel
sub-channels which fade independently. Therefore, we can again apply the
water-filling philosophy.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 190

NTS

For a given realization of the channel gains hk,l (k = 1, ..., K; l = 1, ..., L), the
sum capacity is given by

max_{Pk,l}  (B/L) Σ_{l=1}^{L} log(1 + Σ_{k=1}^{K} Pk,l |hk,l|² / σ²)

subject to Pk,l ≥ 0 (k = 1, ..., K; l = 1, ..., L) and the average power constraint

(1/L) Σ_{l=1}^{L} Pk,l = P,     k = 1, ..., K

The solution of this optimization problem as L → ∞ yields the appropriate power
allocation policy.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 191

NTS

This leads to a variable-rate scheme: in each lth sub-channel, the rates that are
dictated by the above optimization problem are used.
Optimal strategy: The sum rate in the lth sub-channel,

B log(1 + Σ_{k=1}^{K} Pk,l |hk,l|² / σ²)

for a given total power Σ_{k=1}^{K} Pk,l allocated to this sub-channel, is maximized
by giving all this power to the user with the strongest channel gain. That is, at
each time only the single user with the best channel is allowed to transmit. Under
this strategy, the multiuser channel at each time l reduces to a point-to-point
channel with the channel gain

max_{k=1,...,K} |hk,l|²
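A compact way to see this strategy at work is the sketch below (Python; Rayleigh fading and the helper names are my own illustrative choices): in every coherence period only the strongest user is selected, and the total power budget is then water-filled over the resulting point-to-point gains.

```python
import random

def opportunistic_allocation(gains, P, sigma2):
    """gains[l][k] = |h_{k,l}|^2. Pick the best user per period, then water-fill
    the average power budget P over the selected gains (bisection on the level)."""
    best = [max(g) for g in gains]                 # effective gain per period
    L = len(best)
    lo, hi = 0.0, P + sigma2 / min(best) + 1.0     # bracket for the water level
    for _ in range(60):                            # bisection
        mu = (lo + hi) / 2
        power = sum(max(mu - sigma2 / b, 0.0) for b in best) / L
        lo, hi = (mu, hi) if power < P else (lo, mu)
    return [max(mu - sigma2 / b, 0.0) for b in best]

gains = [[random.expovariate(1.0) for _ in range(4)] for _ in range(10)]
alloc = opportunistic_allocation(gains, P=1.0, sigma2=1.0)
print([round(p, 2) for p in alloc])   # more power on periods with stronger users
```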

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 192

NTS

Broadcast fading channels

K-user downlink fading channel:

Yk(i) = hk(i) X(i) + Zk(i),     k = 1, ..., K

where {hk(i)} is the random fading process of user k.

Similar to the uplink case, we assume that

E{|hk(i)|²} = 1,     k = 1, ..., K

and that the fading processes of different users are i.i.d.

The transmit power is constrained to be equal to P.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 193

NTS

Let us first consider the case when the channel state information is available only
at the receiver.
We have the following single-user bounds:

Rk < B E{ log(1 + P|h|²/σ²) },     k = 1, ..., K

where h is a random channel gain.
For any k, this upper bound on Rk can be attained by using all the transmit power
to communicate to user k (with the rates of the remaining users being zero). Thus,
as in the non-fading case, we have K extreme points of the capacity region.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 194

NTS

Similar to the non-fading case, it can be shown that the sum rate is also bounded
by the same quantity:

Σ_{k=1}^{K} Rk < B E{ log(1 + P|h|²/σ²) }

This bound can be achieved by transmitting only to one user or by time-sharing
between any number of users.
It can be shown that the rate pairs in the capacity region can be achieved by both
orthogonal schemes and superposition coding.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 195

NTS

Let us now consider the case when the channel state information is available both
at the transmitter and receiver.
Let us focus on the sum capacity. As in the uplink case, it can be shown that the
sum capacity is achieved by transmitting only to the best user at each time. Under
this strategy, the downlink channel reduces to a point-to-point channel with the
channel gain

max_{k=1,...,K} |hk|²

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 196

NTS

Multiuser diversity

We have seen that, in the full channel state information case and from the sum
capacity perspective, the optimal strategy both in the uplink and the downlink
reduces the multiuser case to a single-user (point-to-point) case with the fading
magnitude max_k |hk(i)|. Compared to a system with a single user, the multiuser
diversity gain comes from:

- the increase of the total transmit power in the uplink case;

- the improvement of the effective channel gain at time i from |hk(i)|² to
  max_{k=1,...,K} |hk(i)|².

The second effect appears entirely due to the ability to dynamically schedule
resources among the users as a function of the channel state.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 197

NTS

Remarks

The multiuser diversity gain comes from the following effect: when many
users fade independently, at any time there is a high probability that one of
them has a strong channel. By allowing only that user to transmit or, vice
versa, transmitting only to that user, the shared channel resource is used in
the most efficient manner, and the total throughput is maximized.

The larger the number of users, the higher is the multiuser diversity gain.

The amount of multiuser diversity gain depends critically on the tail of the
distribution of |hk |2 : the heavier the tail, the more likely there is a user with
the strong channel, and the larger the multiuser diversity gain.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 198

NTS

System requirements to extract the multiuser


diversity benefits

The base station has to have access to the channel quality of each user:

- in the downlink, each user has to track its own channel SNR and feed the
  channel quality back to the base station;
- in the uplink, the base station has to track the user channel qualities (user SNRs).

The base station has to schedule transmissions among the users as well as to
adapt the data rate as a function of the instantaneous channel quality.

Such a scheduling procedure is often called opportunistic scheduling.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 199

NTS

Fairness and delay

In reality, the fading statistics of different users may be non-symmetric: there
are some users who are closer to the base station and thus have a higher
average SNR, and there are users that are stationary (non-moving) or have no
scatterers around them.

The multiuser diversity strategy is only concerned with maximizing long-term
average throughputs. In practice, there are latency requirements, that is, the
average throughput over a finite delay window is the performance metric of
interest.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 200

NTS

Channel measurement and feedback

All scheduling decisions are done as a function of user channel states. Hence,
the quality of channel estimation is a primary issue, and feedback from the
users to the base station is needed in the downlink case.

Both the error in channel measurement and the delay/error in feeding the
channel state back are significant bottlenecks of practical applications of the
multiuser diversity strategy.

Slow or limited fading:


- We have observed that the use of the multiuser diversity strategy requires the
  fading to be rich and fast. It is therefore not useful in line-of-sight scenarios
  or in cases with little scattering or slowly changing environments.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 201

NTS

Proportional fair downlink scheduling

The scheduler keeps track of the average throughput Tk(i) (k = 1, ..., K) of each
user over some (e.g., exponentially weighted) time-window of length tW.

In the ith time-slot, the base station receives the requested/supportable rates
Rk(i) (k = 1, ..., K) from all users, and transmits to the user k* with the
largest ratio

Rk(i)/Tk(i)

The average throughputs are updated as:

Tk(i + 1) = (1 − 1/tW) Tk(i) + Rk(i)/tW,    k = k*
Tk(i + 1) = (1 − 1/tW) Tk(i),               k ≠ k*

This algorithm is used in the downlink mode of the 3G system IS-856.
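A minimal simulation sketch of this update rule (Python; the Rayleigh-fading rate model and all parameter values are illustrative assumptions, not part of the slides):

```python
import math
import random

def proportional_fair(K=4, slots=10000, t_w=100.0, snr=10.0):
    """Simulate proportional fair scheduling with i.i.d. Rayleigh fading."""
    T = [1e-3] * K                     # average throughputs (avoid divide-by-zero)
    served = [0] * K
    for _ in range(slots):
        # Supportable rate of each user in this slot.
        R = [math.log2(1 + snr * random.expovariate(1.0)) for _ in range(K)]
        k_star = max(range(K), key=lambda k: R[k] / T[k])
        for k in range(K):
            T[k] = (1 - 1 / t_w) * T[k] + (R[k] / t_w if k == k_star else 0.0)
        served[k_star] += 1
    return T, served

T, served = proportional_fair()
print([round(t, 2) for t in T], served)   # comparable long-term throughputs
```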


19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 202

NTS

Combination of multiuser diversity and superposition coding

Divide the users into several classes (say, into two classes depending on whether
they are near the base station or near the cell edge). Then, the users in each
class have statistically comparable channel strengths.

The users whose current channels are instantaneously strongest within their own
class are scheduled for simultaneous transmission using superposition coding.
Users of the stronger classes (e.g., nearby users) receive less power, but still enjoy
very good rates while minimally affecting the performance of the weaker classes of
users.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 203

NTS

ADVANCES IN CHANNEL CODING

We have already discussed linear block channel codes in Information Theory I.
Now, we will discuss cyclic codes as well as convolutional codes.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 204

NTS

Cyclic codes

An important subclass of linear block codes.


Consider an n-tuple
c = [c0, c1, ..., c_{n−1}]
Cyclically shifting the components of c once, we have
c^(1) = [c_{n−1}, c0, ..., c_{n−2}]
Applying i subsequent cyclic shifts, we have
c^(i) = [c_{n−i}, c_{n−i+1}, ..., c_{n−1}, c0, c1, ..., c_{n−i−1}]

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 205

NTS

Definition: cyclic codes

An (n, k) linear block code C is called a cyclic code if every cyclic shift of any
codeword in C is also a codeword in C .
Properties:
I Linearity: the sum of any two codewords is also a codeword;
I Cyclic property: Any cyclic shift of any codeword is also a codeword.
To develop the theory of cyclic codes, let us treat the components of the
codeword c as the coefficients of the following polynomial:

c(X) = c0 + c1 X + ... + c_{n−1} X^{n−1}

where X is an indeterminate.
The fact that all ci are binary is taken into account by using binary arithmetic
for all polynomial coefficients when operating with polynomials.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 206

NTS

Cyclic codes

There is a one-to-one correspondence between the vector c and the polynomial
c(X). We will call c(X) the code polynomial of c.
Each power of X in the polynomial c(X) represents a one-bit shift in time. Hence,
multiplication of c(X) by X may be viewed as a shift to the right.
Key question: how to make such a shift cyclic?
Let c(X) be multiplied by X^i, yielding

X^i c(X) = X^i (c0 + c1 X + ... + c_{n−i−1} X^{n−i−1} + c_{n−i} X^{n−i} + ... + c_{n−1} X^{n−1})
         = c0 X^i + c1 X^{i+1} + ... + c_{n−i−1} X^{n−1} + c_{n−i} X^n + ... + c_{n−1} X^{n+i−1}
         = c_{n−i} X^n + ... + c_{n−1} X^{n+i−1} + c0 X^i + c1 X^{i+1} + ... + c_{n−i−1} X^{n−1}

where, in the last line, we have just rearranged the terms.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 207

NTS

Cyclic codes

Recognizing, for example, that c_{n−i} + c_{n−i} = 0 in modulo-2 arithmetic, we
can manipulate the first i terms as follows:

X^i c(X) = c_{n−i} + ... + c_{n−1} X^{i−1} + c0 X^i + c1 X^{i+1} + ... + c_{n−i−1} X^{n−1}
           + c_{n−i} (X^n + 1) + ... + c_{n−1} X^{i−1} (X^n + 1)

Defining

c^(i)(X) := c_{n−i} + ... + c_{n−1} X^{i−1} + c0 X^i + c1 X^{i+1} + ... + c_{n−i−1} X^{n−1}
q(X)     := c_{n−i} + c_{n−i+1} X + ... + c_{n−1} X^{i−1}

we can rewrite the first equation on this page in the following compact form:

X^i c(X) = q(X)(X^n + 1) + c^(i)(X)
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 208

NTS

Cyclic codes

The polynomial c^(i)(X) can be recognized as the code polynomial of the codeword
c^(i) obtained by applying i cyclic shifts to the codeword c.
Moreover, from the latter equation, we readily see that c^(i)(X) is the remainder
that results from dividing X^i c(X) by (X^n + 1).
Hence, we may formally state the cyclic property in polynomial notation as
follows: if c(X) is a code polynomial, then the polynomial

c^(i)(X) = X^i c(X) mod (X^n + 1)

is also a code polynomial for any cyclic shift i, where mod (X^n + 1) stands for
taking the remainder after division by (X^n + 1).
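This correspondence between cyclic shifts and polynomial reduction is easy to verify numerically. A small sketch (Python; polynomials are represented as lists of GF(2) coefficients, lowest degree first, which is my own representation choice):

```python
def poly_mod(a, m):
    """Remainder of the GF(2) polynomial a modulo m (coefficient lists, low degree first)."""
    a = a[:]
    while len(a) >= len(m) and any(a):
        if a[-1]:                                    # leading coefficient is 1
            shift = len(a) - len(m)
            for i, mi in enumerate(m):
                a[shift + i] ^= mi
        a.pop()                                      # drop the (now zero) leading term
    return a

def cyclic_shift_poly(c, i):
    """Compute X^i * c(X) mod (X^n + 1) and return the n coefficients."""
    n = len(c)
    shifted = [0] * i + c                            # multiply by X^i
    modulus = [1] + [0] * (n - 1) + [1]              # X^n + 1
    r = poly_mod(shifted, modulus)
    return (r + [0] * n)[:n]

c = [0, 1, 1, 1, 0, 0, 1]                            # codeword of the (7,4) example below
print(cyclic_shift_poly(c, 1))                       # [1, 0, 1, 1, 1, 0, 0], the cyclic shift
```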

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 209

NTS

Cyclic codes

Note that n cyclic shifts of any codeword do not change it, which means that
X^n = 1, and hence X^n + 1 = 0, in modulo-(X^n + 1) arithmetic!
Generator polynomial: a polynomial g(X) of minimal degree that completely
specifies the code and is a factor of X^n + 1. The degree of g(X) is equal to the
number of parity-check bits of the code, n − k.
It can be shown that any cyclic code is uniquely determined by its generator
polynomial, in the sense that each code polynomial in the code can be expressed
as a polynomial product

c(X) = a(X) g(X)

where a(X) is a polynomial of degree at most k − 1.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 210

NTS

Cyclic codes

Given the generator polynomial g(X), we want to encode the message
[m0, ..., m_{k−1}] in (n, k) systematic form. The codeword structure is

[b0, b1, ..., b_{n−k−1}, m0, m1, ..., m_{k−1}]

Define the message-bit and parity-bit polynomials as

m(X) := m0 + m1 X + ... + m_{k−1} X^{k−1}
b(X) := b0 + b1 X + ... + b_{n−k−1} X^{n−k−1}

We want the code polynomial to be of the form c(X) = b(X) + X^{n−k} m(X).
This means that b0, ..., b_{n−k−1} occupy the first n − k positions of each codeword,
whereas the message bits start from the (n − k + 1)st position.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 211

NTS

Cyclic codes

Using the equation c(X) = a(X)g(X) yields

a(X) g(X) = b(X) + X^{n−k} m(X)

Equivalently,

X^{n−k} m(X) / g(X) = a(X) + b(X)/g(X)

which means that b(X) is the remainder left over after dividing X^{n−k} m(X) by
g(X).

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 212

NTS

Example: A (7,4) cyclic code

We start with the polynomial X^7 + 1 and factorize it into three irreducible
polynomials as

X^7 + 1 = (1 + X)(1 + X^2 + X^3)(1 + X + X^3)

where by an irreducible polynomial we mean a polynomial that cannot be factored
using only polynomials with binary coefficients.
Let us take

g(X) = 1 + X + X^3

as the generator polynomial, whose degree is equal to the number of parity bits.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 213

NTS

Example: A (7,4) cyclic code

We can also define a parity-check polynomial

h(X) = 1 + Σ_{i=1}^{k−1} hi X^i + X^k

such that g(X) h(X) = X^n + 1

or, equivalently, g(X) h(X) mod (X^n + 1) = 0.
For our example, the parity-check polynomial is

h(X) = 1 + X + X^2 + X^4

so that h(X) g(X) = (1 + X + X^2 + X^4)(1 + X + X^3) = X^7 + 1.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 214

NTS

Example: A (7,4) cyclic code

How to encode, for example, the message sequence 1001?

The corresponding message polynomial is

m(X) = 1 + X^3

Multiplying m(X) by X^{n−k} = X^3, we have

X^{n−k} m(X) = X^3 + X^6

Dividing X^{n−k} m(X) by g(X), we have

(X^3 + X^6)/(1 + X + X^3) = X + X^3 + (X + X^2)/(1 + X + X^3)
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 215

NTS

Example: A (7,4) cyclic code

That is,

a(X) = X + X^3,     b(X) = X + X^2

and the encoded message is

c(X) = b(X) + X^{n−k} m(X)
     = X + X^2 + X^3 (1 + X^3)
     = X + X^2 + X^3 + X^6

or, alternatively,

c = [0111001]
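The same computation can be carried out in a few lines of code. A minimal sketch (Python; it reuses the GF(2) coefficient-list representation and repeats the `poly_mod` helper from the earlier sketch so that it is self-contained):

```python
def poly_mod(a, m):
    """Remainder of the GF(2) polynomial a modulo m (coefficient lists, low degree first)."""
    a = a[:]
    while len(a) >= len(m) and any(a):
        if a[-1]:
            shift = len(a) - len(m)
            for i, mi in enumerate(m):
                a[shift + i] ^= mi
        a.pop()
    return a

def cyclic_encode(msg, g, n):
    """Systematic (n, k) cyclic encoding: codeword = [parity bits, message bits]."""
    k = len(msg)
    shifted = [0] * (n - k) + msg                 # X^{n-k} m(X)
    b = poly_mod(shifted, g)                      # parity polynomial b(X)
    b = (b + [0] * (n - k))[:n - k]               # pad to n-k coefficients
    return b + msg

g = [1, 1, 0, 1]                                  # g(X) = 1 + X + X^3
print(cyclic_encode([1, 0, 0, 1], g, n=7))        # -> [0, 1, 1, 1, 0, 0, 1]
```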

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 216

NTS

Relationship to conventional linear block codes

For the considered (7, 4) code, we can construct the generator matrix from the
generator polynomial by using

g(X)     = 1 + X + X^3
X g(X)   = X + X^2 + X^4
X^2 g(X) = X^2 + X^3 + X^5
X^3 g(X) = X^3 + X^4 + X^6

as the rows of the 4 x 7 generator matrix

    [ 1 1 0 1 0 0 0 ]
G = [ 0 1 1 0 1 0 0 ]
    [ 0 0 1 1 0 1 0 ]
    [ 0 0 0 1 1 0 1 ]

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 217

NTS

Relationship to conventional linear block codes

Clearly, the latter generator matrix is in non-systematic form. We can put it into
systematic form by manipulating its rows, that is, by adding the first row to the
third row and adding the sum of the first two rows to the fourth row. Then, we get

    [ 1 1 0 1 0 0 0 ]
G = [ 0 1 1 0 1 0 0 ]
    [ 1 1 1 0 0 1 0 ]
    [ 1 0 1 0 0 0 1 ]

Decoding of cyclic codes can be done in the same way as for any other linear block
code, e.g., using the syndrome.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 218

NTS

Popular cyclic codes are the so-called cyclic redundancy check (CRC) codes,
Bose-Chaudhuri-Hocquenghem (BCH) codes, and non-binary Reed-Solomon (RS)
codes. They are part of various international communication standards, e.g., the
digital subscriber line (DSL) standards.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 219

NTS

Convolutional codes

One of the most powerful classes of linear codes.

Similar to linear block codes, the encoder of a convolutional code accepts k-bit
message blocks and produces an encoded sequence of n-bit blocks. However, each
encoded block depends not only on the corresponding k-bit message block, but
also on the M previous message blocks.
Such an encoder is said to have a memory order of M.
The ratio

R = k/n

is called the code rate.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 220

NTS

Convolutional codes

The message sequence m = [m0, m1, m2, ...] enters the encoder one bit at a time.
The encoder output sequences are obtained as the convolution of the input
sequence with the encoder generator sequences. For an encoder with memory
order M, the length of these generator sequences is M + 1. For example, in the
case of two generator sequences

g^(0) = [g0^(0), ..., gM^(0)],     g^(1) = [g0^(1), ..., gM^(1)]

we can write the encoding equations

c^(0) = m * g^(0),     c^(1) = m * g^(1)

where * denotes discrete convolution and all operations are modulo-2.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 221

NTS

Convolutional codes

The convolution operation implies that

c_l^(j) = Σ_{i=0}^{M} m_{l−i} g_i^(j),     j = 0, 1

where m_{l−i} = 0 for l < i.

After encoding, the output sequences are multiplexed into a single sequence called
the codeword

c = [c0^(0), c0^(1), c1^(0), c1^(1), ...]

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 222

NTS

Convolutional codes

Defining the matrix

    [ g0^(0) g0^(1)   g1^(0) g1^(1)   ...   gM^(0) gM^(1)                              ]
G = [                 g0^(0) g0^(1)   g1^(0) g1^(1)   ...   gM^(0) gM^(1)              ]
    [                                 ...                                              ]

where all blank areas are zeros, we can rewrite the encoding equations in matrix
form as

c = m G

This form of the equation is equivalent to that of linear block codes! Therefore,
we call G the generator matrix of the code.
In the case of a semi-infinite message sequence, the matrix G is semi-infinite as
well. However, if m has finite length, then G is finite as well.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 223

NTS

Example: R = 1/2 code

With the generator sequences

g^(0) = [1011],     g^(1) = [1111]

let the message sequence be

m = [10111]

The encoding equations yield

c^(0) = [10111] * [1011] = [10000001]
c^(1) = [10111] * [1111] = [11011101]

and, hence, the 2(k + M)-bit codeword

c = [11 01 00 01 01 01 00 11]
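This encoding is a direct modulo-2 convolution and is easy to reproduce. A minimal sketch (Python):

```python
def conv_encode(m, generators):
    """Encode the bit list m with a rate-1/n convolutional code given by the
    generator sequences (modulo-2 convolution, outputs multiplexed)."""
    M = len(generators[0]) - 1                    # memory order
    n_out = len(m) + M                            # length of each output sequence
    streams = []
    for g in generators:
        c = [0] * n_out
        for l in range(n_out):
            for i, gi in enumerate(g):
                if gi and 0 <= l - i < len(m):
                    c[l] ^= m[l - i]
        streams.append(c)
    # Multiplex: c0^(0), c0^(1), c1^(0), c1^(1), ...
    return [streams[j][l] for l in range(n_out) for j in range(len(generators))]

m = [1, 0, 1, 1, 1]
code = conv_encode(m, [[1, 0, 1, 1], [1, 1, 1, 1]])
print(code)    # [1,1, 0,1, 0,0, 0,1, 0,1, 0,1, 0,0, 1,1]
```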
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 224

NTS

Example: R = 1/2 code

Alternatively, we can write the k x 2(k + M) generator matrix as

    [ 11 01 11 11                ]
    [    11 01 11 11             ]
G = [       11 01 11 11          ]
    [          11 01 11 11       ]
    [             11 01 11 11    ]

and obtain the same codeword as

c = [10111] G = [11 01 00 01 01 01 00 11]

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 225


NTS

Code tree and trellis

Let us discuss the concepts of code tree and trellis using a particular example of
the R = 1/2 convolutional code with M = 2 and the impulse responses

g^(0) = [111],     g^(1) = [101]

Consider the input sequence m = [10011]. Similar to the example above, it can be
shown that the codeword becomes

c = [11 10 11 11 01 01 11]

To enforce the R = 1/2 property, let us truncate the codeword by dropping the
last 2M = 4 bits (the effect of truncation becomes negligible if longer messages
and codewords are used). Then, the codeword becomes [11 10 11 11 01].
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 226

NTS

Convolutional Encoder

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 227

NTS

The code tree is defined as follows: each branch of the tree represents an input
symbol (0 or 1). The corresponding output (coded) symbols are indicated on each
branch. A specific path can be traced for each message sequence. The
corresponding coded symbols on the branches following this path form the output
sequence.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 228

NTS

Code tree

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 229

NTS

State diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 230

NTS

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 231

NTS

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 232

NTS

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 233

NTS

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 234

NTS

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 235

NTS

Complexity of Viterbi Decoder

Over L binary intervals, the total number of comparisons made by the Viterbi
algorithm is 2^{K−1} · L (with K denoting the constraint length of the code),
rather than the 2^L comparisons required by the standard maximum-likelihood
procedure (full tree search).
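The slides do not spell out the algorithm itself, so the following is only a generic hard-decision Viterbi decoder sketch (Python) for the R = 1/2, M = 2 example code with g^(0) = [111], g^(1) = [101]; the state is the pair of previous input bits, ties are broken arbitrarily, and the example decodes the truncated codeword from the earlier example:

```python
def viterbi_decode(received, g=((1, 1, 1), (1, 0, 1))):
    """Hard-decision Viterbi decoding of a rate-1/2 convolutional code with
    memory M. `received` is a flat bit list (2 bits per input bit)."""
    M = len(g[0]) - 1
    n_states = 1 << M
    steps = len(received) // 2

    def outputs(state, bit):
        # state encodes the M previous input bits, most recent in the LSB
        reg = [bit] + [(state >> i) & 1 for i in range(M)]
        return tuple(sum(gi * r for gi, r in zip(gen, reg)) % 2 for gen in g)

    INF = float("inf")
    metric = [0] + [INF] * (n_states - 1)       # start in the all-zero state
    paths = [[] for _ in range(n_states)]
    for t in range(steps):
        r = tuple(received[2 * t: 2 * t + 2])
        new_metric = [INF] * n_states
        new_paths = [None] * n_states
        for state in range(n_states):
            if metric[state] == INF:
                continue
            for bit in (0, 1):
                out = outputs(state, bit)
                branch = sum(a != b for a, b in zip(out, r))     # Hamming metric
                nxt = ((state << 1) | bit) & (n_states - 1)
                cand = metric[state] + branch
                if cand < new_metric[nxt]:
                    new_metric[nxt] = cand
                    new_paths[nxt] = paths[state] + [bit]
        metric, paths = new_metric, new_paths
    best = min(range(n_states), key=lambda s: metric[s])
    return paths[best]

print(viterbi_decode([1, 1, 1, 0, 1, 1, 1, 1, 0, 1]))    # -> [1, 0, 0, 1, 1]
```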

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 236

NTS

Probability of deviating from correct path

Let a(d) denote the number of paths with Hamming distance d that deviate from,
and then return to, the all-zero test path. The error probability Pe of deviating
from the correct path is then upper bounded by

Pe < Σ_{d=dF}^{∞} a(d) Pd

where Pd denotes the probability that d bits are received in error and dF denotes
the minimum free distance.
The inequality sign appears because the paths are not mutually exclusive (union
bound).
Pe depends critically on the minimum free distance dF!

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 237

NTS

CONCLUSION

We have studied advanced information theory, including the capacity
characterization of multi-antenna and multi-user channels (and the resulting
concept of multiuser diversity), as well as advanced channel coding approaches
such as cyclic and convolutional codes.

To apply these concepts and approaches in practice, or to do research in these
fields, a deeper study is required.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 238

NTS
