
Lecture Course: Information Theory II

Marius Pesavento

Communication Systems Group


Institute of Telecommunications
Technische Universitat Darmstadt


COURSE ORGANIZATION

Instructor: Dr.-Ing. Marius Pesavento, S3/06/204, e-mail: pesavento@nt.tu-darmstadt.de, FG Nachrichtentechnische Systeme (NTS)

Teaching assistant: Yong Cheng, S3/06/205, e-mail: yong.cheng@nt.tu-darmstadt.de

Website: http://www.nts.tu-darmstadt.de/

Lecture notes and slides will be posted in TUCAN

Office hours: on request (please send an e-mail to the TA or instructor)

Written final exam (closed-book)

Examination date (presumably) Tuesday July 31, 2012: 12.00 - 14.00

RECOMMENDED TEXTBOOKS

1. D. Tse and P. Viswanath, Fundamentals of Wireless Communication, Cambridge University Press, 2005. (main reference)
2. A. El Gamal and Y.-H. Kim, Network Information Theory, Cambridge University Press, 2012.
3. A. Goldsmith, Wireless Communications, Cambridge University Press, 2005.
4. T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.

COURSE OUTLINE

Overview of the basics of information theory
- Entropy, mutual information, capacity
- Source coding and channel coding theorem
- Memoryless Gaussian channel

Multi-antenna channel capacity, water-filling

Basic theory of network information theory
- Multiple-access channels
- Broadcast channels
- Relay channels

Cyclic codes, convolutional codes, turbo codes

Topics of the earlier basic IT course

Information, entropy, mutual information, and their derivatives

Basic theory of source coding, Shannon's source coding theorem, Huffman coding, Lempel-Ziv coding

Channel capacity, Shannon's channel coding theorem, Gaussian channel, bandlimited channel, Shannon's limit, multiple Gaussian channels, multiple colored noise channels, water-filling, ergodic and outage capacities, basics of MIMO channels

Basic theory of channel coding, linear block coding, Reed-Muller codes, Golay code

REVIEW OF PROBABILITY THEORY: CDF AND PDF

Let X be a continuous random variable with the cumulative distribution function (cdf)

F_X(x) = Probability\{X \le x\} = P(X \le x)

Probability density function (pdf):

f_X(x) = \frac{\partial F_X(x)}{\partial x}

where

F_X(x_0) = \int_{-\infty}^{x_0} f_X(x)\, dx

NORMALIZATION PROPERTY OF CDFs

Since F_X(\infty) = 1, we obtain the so-called normalization property

\int_{-\infty}^{\infty} f_X(x)\, dx = 1

Simple interpretation:

f_X(x) = \lim_{\Delta \to 0} \frac{P\{x - \Delta/2 \le X \le x + \Delta/2\}}{\Delta}

[Figure: pdf f_X(x); the shaded area between x_1 and x_2 equals Probability\{x_1 < X < x_2\}.]

EXAMPLE 1

Let the real-valued random variable X be uniformly distributed in the interval [0, T].

[Figure: f_X(x) = 1/T for 0 \le x \le T and zero otherwise; the cdf F_X(x) rises linearly from 0 to 1 over [0, T].]

EXAMPLE 2

Let the real-valued random variable X have the so-called Gaussian (normal) distribution

f_X(x) = \frac{1}{\sqrt{2\pi\sigma_X^2}}\, e^{-(x-\mu_X)^2 / 2\sigma_X^2}

where \sigma_X^2 = var\{X\} is the variance and \mu_X is the mean. The corresponding distribution function is given by

F_X(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma_X^2}}\, e^{-(\nu-\mu_X)^2 / 2\sigma_X^2}\, d\nu
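As a quick numerical illustration (not part of the original slides), the Gaussian pdf and cdf can be evaluated in a few lines of Python; the function names and default parameters mu and sigma below are illustrative assumptions.

import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # f_X(x) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    # F_X(x) expressed through the error function erf
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(gaussian_pdf(0.0))   # ~0.3989, the peak value for sigma = 1
print(gaussian_cdf(0.0))   # 0.5 at the mean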

CDF AND PDF OF A GAUSSIAN RANDOM VARIABLE

[Figure: the Gaussian cdf F_X(x) increases from 0 to 1 and equals 1/2 at the mean; the pdf f_X(x) is a bell curve with peak value (2\pi\sigma_X^2)^{-1/2} and width on the order of 2\sigma_X.]

PROBABILITY MASS FUNCTION

Let X now be a discrete random variable which takes the values x_i (i = 1, ..., I) with the probabilities P(x_i) (i = 1, ..., I), respectively.

For discrete variables, we define the probability mass function

P(x_i) = Probability(X = x_i)

The normalization condition:

\sum_{i=1}^{I} P(x_i) = 1

EXTENSION TO DISCRETE VARIABLES

How to extend the concepts of pdf and cdf to discrete variables?

Define the unit step function as

u(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}

Define the Dirac delta function as

\delta(x) = \begin{cases} \infty, & x = 0 \\ 0, & x \ne 0 \end{cases}, \qquad \int_{-\infty}^{\infty} \delta(x)\, dx = 1

Relationships between the delta function and the unit step function:

\int_{-\infty}^{x} \delta(\nu)\, d\nu = u(x), \qquad \delta(x) = \frac{\partial u(x)}{\partial x}

Sifting property of the delta function:

\int_{-\infty}^{\infty} g(x)\, \delta(x - y)\, dx = g(y)

Using the definition of the unit step function, we can express the cdf as

F_X(x) = \sum_{i=1}^{I} P(x_i)\, u(x - x_i)

EXTENSION TO DISCRETE VARIABLES

Then, the pdf can be expressed as

f_X(x) = \sum_{i=1}^{I} P(x_i)\, \delta(x - x_i)

Using the delta-function sifting property, we have

\int_{-\infty}^{\infty} f_X(x)\, dx = \int_{-\infty}^{\infty} \sum_{i=1}^{I} P(x_i)\, \delta(x - x_i)\, dx = \sum_{i=1}^{I} \int_{-\infty}^{\infty} P(x_i)\, \delta(x - x_i)\, dx = \sum_{i=1}^{I} P(x_i) = 1

EXAMPLE 1

Let the random variable X be the outcome of a coin-tossing experiment.

[Figure: the cdf F_X(x) is a staircase that jumps by 0.5 at x = 0 and x = 1; the pdf f_X(x) consists of two delta impulses of weight 0.5.]

EXAMPLE 2

Let the random variable X be the outcome of a die-throwing experiment.

[Figure: the cdf F_X(x) is a staircase with jumps of 1/6 at x = 1, ..., 6; the pdf f_X(x) consists of six delta impulses of weight 1/6.]

STATISTICAL EXPECTATION

Expected value (mean) of a continuous random variable:

\mu_X = E\{X\} = \int_{-\infty}^{\infty} x\, f_X(x)\, dx

For a discrete random variable:

\mu_X = E\{X\} = \int_{-\infty}^{\infty} x\, f_X(x)\, dx = \int_{-\infty}^{\infty} x \sum_{i=1}^{I} P(x_i)\, \delta(x - x_i)\, dx = \sum_{i=1}^{I} \int_{-\infty}^{\infty} x\, P(x_i)\, \delta(x - x_i)\, dx = \sum_{i=1}^{I} x_i\, P(x_i)

We can also compute the expected value of a function of a continuous random variable:

E\{g(X)\} = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx

For a discrete random variable:

E\{g(X)\} = \sum_{i=1}^{I} g(x_i)\, P(x_i)

VARIANCE OF A RANDOM VARIABLE

var\{X\} = E\{(X - E\{X\})^2\} = E\{X^2\} - E\{X\}^2 = \sigma_X^2

where \sigma_X is commonly called the standard deviation.

The variance and the standard deviation can be interpreted as measures of the statistical dispersion of a random variable w.r.t. its expected value.

EXAMPLE

Compute the mean and variance of a random variable uniformly distributed in the interval [0, 1] (f_X(x) = 1 for 0 \le x \le 1):

\mu_X = \int_0^1 x\, dx = \left.\frac{x^2}{2}\right|_0^1 = \frac{1}{2}

\sigma_X^2 = \int_0^1 x^2\, dx - \mu_X^2 = \left.\frac{x^3}{3}\right|_0^1 - \frac{1}{4} = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}
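A quick numerical sanity check (illustrative, not from the slides): drawing samples from a uniform [0, 1] distribution and comparing the empirical mean and variance against the closed-form values 1/2 and 1/12.

import random

N = 100_000
samples = [random.random() for _ in range(N)]   # uniform on [0, 1]
mean = sum(samples) / N
var = sum((s - mean)**2 for s in samples) / N

print(mean)   # close to 1/2
print(var)    # close to 1/12 = 0.0833...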

JOINT DISTRIBUTION

Let us now consider two random variables X and Y jointly.

Joint distribution function:

F_{X,Y}(x, y) = P(X \le x, Y \le y)

Joint pdf:

f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y}

The inverse relationship:

F_{X,Y}(x_0, y_0) = \int_{-\infty}^{x_0} \int_{-\infty}^{y_0} f_{X,Y}(x, y)\, dx\, dy

Any pdf satisfies the following normalization property:

\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy = 1

Also, the marginal pdfs are obtained as

\int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx = f_Y(y), \qquad \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy = f_X(x)

CONDITIONAL DISTRIBUTION

In practical problems, we are often interested in the pdf of one random variable X conditioned on the fact that a second random variable Y has some specific value y. It is obvious that

P(X \le x; Y \le y) = P(X \le x | Y \le y)\, P(Y \le y)

Then, the conditional cdf is defined as

F_X(x|y) = P(X \le x | Y \le y) = \frac{F_{X,Y}(x, y)}{F_Y(y)}

From symmetry, it also follows that

F_Y(y|x) = \frac{F_{X,Y}(x, y)}{F_X(x)}

The conditional pdfs are

f_X(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}, \qquad f_Y(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}

From the last two equations, we obtain the Bayes rule

f_X(x|y)\, f_Y(y) = f_Y(y|x)\, f_X(x)

NORMALIZATION CONDITION

\int_{-\infty}^{\infty} f_X(x|y)\, dx = \int_{-\infty}^{\infty} \frac{f_{X,Y}(x, y)}{f_Y(y)}\, dx = \frac{1}{f_Y(y)} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx = 1

Conditional expectation:

E\{g(X)|y\} = \int_{-\infty}^{\infty} g(x)\, f_X(x|y)\, dx

STATISTICAL INDEPENDENCE

Two random variables X and Y are statistically independent if

f_{X,Y}(x, y) = f_X(x)\, f_Y(y)

Substituting this equation into the conditional pdf, we obtain that statistical independence implies

f_X(x|y) = f_X(x)

That is, the variable Y does not have any influence on the variable X.

EXAMPLE

Let

f_{X,Y}(x, y) = \begin{cases} 4xy, & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}

Are the variables X and Y statistically dependent?

f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy = 4x \int_0^1 y\, dy = 4x \left.\frac{y^2}{2}\right|_0^1 = \begin{cases} 2x, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}

and, by symmetry,

f_Y(y) = \begin{cases} 2y, & 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}

Hence f_{X,Y}(x, y) = f_X(x)\, f_Y(y) and the variables are independent!

CORRELATION AND COVARIANCE

Two fundamental characteristics of linear statistical dependence are the correlation

r_{XY} = E\{XY\}

and the covariance

cov\{X, Y\} = E\{(X - E\{X\})(Y - E\{Y\})\} = E\{XY\} - E\{X\}E\{Y\} = E\{XY\} - \mu_X \mu_Y

For X = Y, the covariance boils down to the variance:

cov\{X, X\} = E\{X^2\} - \mu_X^2 = var\{X\}

SOME USEFUL PROPERTIES

var\{X + Y\} = var\{X\} + var\{Y\} + 2\, cov\{X, Y\}.

If the variables X and Y are statistically independent, then for any functions h and g, E\{h(X)g(Y)\} = E\{h(X)\}\, E\{g(Y)\}.

If the variables X and Y are statistically independent, then cov\{X, Y\} = 0. Therefore, covariance is sometimes used as a measure of statistical dependence. However, the reverse statement is not necessarily true!

If the variables X and Y are statistically independent, then var\{X + Y\} = var\{X\} + var\{Y\}.

EXTENSION TO MULTIVARIATE DISTRIBUTIONS

We may also consider multiple (more than two) random variables X_1, ..., X_n.

Joint distribution function:

F_{X_1,...,X_n}(x_1, ..., x_n) = P(X_1 \le x_1, X_2 \le x_2, ..., X_n \le x_n)

Joint pdf:

f_{X_1,...,X_n}(x_1, ..., x_n) = \frac{\partial^n F_{X_1,...,X_n}(x_1, ..., x_n)}{\partial x_1\, \partial x_2 \cdots \partial x_n}

MULTIVARIATE DISTRIBUTIONS

Introducing the vectors

X = [X_1, X_2, ..., X_n]^T, \qquad x = [x_1, x_2, ..., x_n]^T

we rewrite the previous equations in symbolic (vector) notation as

F_X(x) = P(X \le x), \qquad f_X(x) = \frac{\partial^n F_X(x)}{\partial x_1\, \partial x_2 \cdots \partial x_n}

Normalization condition:

\int f_X(x)\, dx = 1

Statistical expectation can be defined as

E\{g(X)\} = \int \cdots \int g(x)\, f_X(x)\, dx

where g(X) is some function of the random vector X.

In particular, in the bivariate case,

E\{g(X, Y)\} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\, dx\, dy

MULTIVARIATE GAUSSIAN DISTRIBUTION

Jointly Gaussian random variables have the following joint multivariate pdf:

f_X(x) = \frac{1}{(\sqrt{2\pi})^n \det\{R\}^{1/2}}\, e^{-\frac{1}{2}(x - \mu_X)^T R^{-1} (x - \mu_X)}

where the mean is

\mu_X = E\{X\}

and the covariance matrix is

R = E\{(X - E\{X\})(X - E\{X\})^T\} = E\{XX^T\} - \mu_X \mu_X^T

In symbolic notation,

X \sim N(\mu_X, R)

MULTIVARIATE GAUSSIAN DISTRIBUTION

In the case of a single (n = 1) random variable X = X_1, the n-variate Gaussian pdf reduces to

p_X(x) = \frac{1}{\sqrt{2\pi\sigma_X^2}}\, e^{-(x-\mu_X)^2 / 2\sigma_X^2}

which is the well-known Gaussian pdf.

In the case of two (n = 2) random variables X = X_1 and Y = X_2, we have

R = \begin{bmatrix} \sigma_X^2 & \rho\,\sigma_X\sigma_Y \\ \rho\,\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix}, \qquad \rho = \frac{E\{(X - \mu_X)(Y - \mu_Y)\}}{\sigma_X \sigma_Y}

Note that \rho = \rho_{XY} is nothing else than the correlation coefficient.

The determinant of R is given by

\det\{R\} = \sigma_X^2 \sigma_Y^2 (1 - \rho^2)

and, therefore, the n-variate pdf reduces to the so-called bivariate pdf

f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right\}

The maximum of this function is located at the point \{x = \mu_X;\ y = \mu_Y\} and the maximal value is

\max\{f_{X,Y}(x, y)\} = f_{X,Y}(\mu_X, \mu_Y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}

In the case of uncorrelated X and Y, \rho = 0 and we have

f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y} \exp\left\{-\frac{1}{2}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right\} = \frac{1}{\sqrt{2\pi}\sigma_X}\, e^{-(x-\mu_X)^2/2\sigma_X^2} \cdot \frac{1}{\sqrt{2\pi}\sigma_Y}\, e^{-(y-\mu_Y)^2/2\sigma_Y^2} = f_X(x)\, f_Y(y)

i.e., the variables X and Y become statistically independent. This is a very important result showing that uncorrelated Gaussian random variables are also statistically independent! Note that in the case of non-Gaussian random variables, this is not true in general.

MULTIVARIATE GAUSSIAN DISTRIBUTION: EXAMPLE

[Figures: contour plots of the bivariate Gaussian pdf with parameters \mu_X = \mu_Y = 0 and \sigma_X = \sigma_Y = 1 for correlation coefficients 0, 0.25, 0.5, 0.75, and 0.95 (shown twice in the original slides). As the magnitude of the correlation coefficient increases, the circular contours become increasingly elongated ellipses.]

BASICS OF INFORMATION THEORY

Shannon: Information is the resolution of uncertainty about some statistical event:

- Before the event occurs, there is an amount of uncertainty.
- After the occurrence of the event, there is no uncertainty anymore, but there is a gain in the amount of information.

Highly expected messages deliver a small amount of information, while highly unexpected ones deliver a large amount of information. Hence, the amount of information should be inversely proportional to the probability of the message.

Information and entropy

The amount of information of the symbol x with the probability P(x):

I(x) = \log\left(\frac{1}{P(x)}\right) = -\log(P(x)), \qquad [I(x)] = bit

Considering a source with the alphabet X = \{x_1, ..., x_N\}, the entropy is defined as the statistically averaged amount of information (mean of I(X)):

H(X) = E\{I(X)\} = E\{-\log(P(X))\} = -\sum_{i=1}^{N} P(x_i) \log(P(x_i)) = \sum_{i=1}^{N} P(x_i) \log\left(\frac{1}{P(x_i)}\right), \qquad [H(X)] = bit/symbol
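A small Python sketch (illustrative, not from the slides) of the entropy definition above; the function name and the example distributions are assumptions chosen for demonstration.

import math

def entropy(probs):
    # H(X) = -sum_i P(x_i) * log2(P(x_i)); terms with P = 0 contribute 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit/symbol (fair binary source)
print(entropy([0.9, 0.1]))    # ~0.469 bit/symbol
print(entropy([0.25] * 4))    # 2.0 bit/symbol (uniform quaternary source)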

Example

Entropy of a non-symmetric binary source with the probabilities P(0) = p and P(1) = 1 - p:

H_B(X) = -p \log(p) - (1 - p) \log(1 - p)

[Figure: H_B(X) in bit/symbol versus p; the curve is zero at p = 0 and p = 1 and attains its maximum of 1 bit (maximum uncertainty) at p = 0.5.]

- The entropy characterizes the source uncertainty.
- The entropy is a concave function of the probability.

SOME DERIVATIVES OF ENTROPY
Joint Entropy

The definition of entropy can be extended to a pair of random variables X and Y (two discrete sources X = \{x_1, ..., x_N\} and Y = \{y_1, ..., y_M\}).

The joint entropy H(X, Y) is defined as:

H(X, Y) = E\{-\log(P(X, Y))\} = -\sum_{i=1}^{N} \sum_{l=1}^{M} P(x_i, y_l) \log(P(x_i, y_l))

Conditional Entropy

The conditional entropy H(Y|X) is the amount of uncertainty remaining about the random variable Y after the random variable X has been observed:

H(Y|X) = E_{X,Y}\{-\log(P(Y|X))\} = -\sum_{i=1}^{N} \sum_{l=1}^{M} P(x_i, y_l) \log(P(y_l|x_i)) = -\sum_{i=1}^{N} P(x_i) \sum_{l=1}^{M} P(y_l|x_i) \log(P(y_l|x_i))

where we use the Bayes rule

P(x_i, y_l) = P(x_i|y_l) P(y_l) = P(y_l|x_i) P(x_i)

Useful properties

Important conditional entropy property (chain rule):

H(X, Y) = H(X) + H(Y|X)

Hence the entropy, conditional entropy, and joint entropy are related quantities.

Another important property: conditioning reduces entropy:

H(X|Y) \le H(X)

with equality if and only if X and Y are statistically independent.

Mutual information

Let us consider two random variables (sources). The amount of information exchanged between two symbols x_i and y_l can be defined as:

I(x_i; y_l) = \log\left(\frac{P(x_i|y_l)}{P(x_i)}\right) = \log\left(\frac{P(x_i, y_l)}{P(x_i) P(y_l)}\right), \qquad [I(x_i; y_l)] = bit

where we again use the Bayes rule P(x_i, y_l) = P(x_i|y_l) P(y_l).

The amount of mutual information exchanged between two sources X and Y can be obtained by averaging I(x_i; y_l):

I(X; Y) = \sum_{i=1}^{N} \sum_{l=1}^{M} P(x_i, y_l) \log\left(\frac{P(x_i|y_l)}{P(x_i)}\right) = \sum_{i=1}^{N} \sum_{l=1}^{M} P(x_i, y_l) \log\left(\frac{P(x_i, y_l)}{P(x_i) P(y_l)}\right), \qquad [I(X; Y)] = bit/symbol

Mutual information

Mutual information is the reduction in the uncertainty of X due to the knowledge of Y:

I(X; Y) = H(X) - H(X|Y)

Relation of mutual information to the entropies and the joint entropy:

I(X; Y) = H(X) + H(Y) - H(X, Y)
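A short sketch (illustrative; the assumption is that the joint pmf is given as a nested list) that evaluates I(X; Y) directly from the double-sum definition above.

import math

def mutual_information(p_xy):
    # p_xy[i][l] = P(x_i, y_l); marginals are obtained by summing rows/columns
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(col) for col in zip(*p_xy)]
    mi = 0.0
    for i, row in enumerate(p_xy):
        for l, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (p_x[i] * p_y[l]))
    return mi

print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))   # independent sources -> 0
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))        # fully dependent -> 1 bit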

Channel capacity

The input probabilities P(x_i) are independent of the channel. We can therefore maximize the mutual information I(X; Y) w.r.t. P(x_i). The channel capacity can then be defined as the maximum mutual information in any single use of the channel, where the maximization is over P(x_i) (i = 1, ..., N):

C = \max_{\{P(x_i)\}} I(X; Y), \qquad [C] = bit/symbol

or bits per channel use (bpcu).

Example

Channel capacity of a binary symmetric channel:

[Figure: inputs x_1, x_2 are mapped to outputs y_1, y_2; each input is received correctly with probability 1 - p and flipped with probability p.]

C_B = 1 + p \log p + (1 - p) \log(1 - p) = 1 - H_B(X)
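A minimal numerical check (illustrative, not from the slides) of the binary symmetric channel capacity formula C_B = 1 - H_B(p); the function names are assumptions.

import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    # capacity of a binary symmetric channel with crossover probability p
    return 1.0 - binary_entropy(p)

print(bsc_capacity(0.0))    # 1.0  (noiseless channel)
print(bsc_capacity(0.11))   # ~0.5
print(bsc_capacity(0.5))    # 0.0  (useless channel)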

Entropy/capacity of a binary symmetric channel

[Figure: the binary entropy H_B(X) and the capacity C_B, both in bit/symbol, plotted versus p; H_B peaks at 1 for p = 0.5, where C_B reaches its minimum of 0.]

Channel coding/decoding

The inevitable presence of noise in a channel causes errors between the output and input data sequences of a digital communication system. To reduce these errors, we resort to channel coding.

The channel encoder maps the incoming source data into a channel input sequence. It adds redundancy to the data to protect them from errors.

The channel decoder inversely maps the channel output sequence into an output data sequence in such a way that the overall effect of the channel noise on the system is minimized.

Shannon's Channel-Coding Theorem

Let information be transmitted through a discrete memoryless channel of capacity C. If the transmission rate satisfies

R < C

then there exists a channel coding scheme for which the source output can be transmitted over the channel with an arbitrarily small probability of error.

Conversely, if

R \ge C

then it is impossible to transmit information over the channel with an arbitrarily small probability of error.

Joint source-channel coding theorem

If

H(X) > C

then it is impossible to transmit the source outputs over the channel with an arbitrarily small probability of error.

The latter theorem follows from the direct combination of the source-coding and channel-coding theorems.

Continuous sources

The mutual information between two continuous random sources X and Y with the joint symbol pdf f_{X,Y}(x, y) is given by

I(X; Y) = \int\!\!\int f_{X,Y}(x, y) \log\left(\frac{f_X(x|y)}{f_X(x)}\right) dx\, dy = \int\!\!\int f_{X,Y}(x, y) \log\left(\frac{f_{X,Y}(x, y)}{f_X(x) f_Y(y)}\right) dx\, dy

What is the relationship between the discrete and continuous mutual information?

It can be shown that the definitions of mutual information in the continuous and discrete cases are essentially the same.

This property enables us to use the continuous mutual information to define the capacity in the case of continuously distributed (infinite-alphabet) sources.

Continuous-time bandlimited channel

Consider a continuous-time bandlimited channel with additive white Gaussian noise (AWGN). The output of such an AWGN channel can be described as

Y(t) = (X(t) + Z(t)) * h(t)

where X(t) and Z(t) are the signal and noise waveforms, respectively, * denotes convolution, and h(t) is the impulse response of an ideal lowpass filter with cutoff frequency B.

Bandlimited AWGN channel

[Figure: the input x(t) plus the AWGN n(t) passes through an ideal lowpass filter H(f) of bandwidth B to give y(t); the noise power spectral density is S_N(f) = N_0/2, flat from -B to B.]

Capacity

Capacity of the bandlimited channel:

C = B \log\left(1 + \frac{P}{N_0 B}\right) \text{ bits per second}

where it is taken into account that P_N = N_0 B.

Shannon's bound:

C_\infty = \lim_{B \to \infty} B \log\left(1 + \frac{P}{N_0 B}\right) = \frac{P}{N_0} \log e \approx 1.44\, \frac{P}{N_0}

Parallel AWGN channels

Consider multiple parallel AWGN channels

Y_i = X_i + Z_i, \qquad i = 1, ..., K

with a common power constraint

E\left\{\sum_{i=1}^{K} X_i^2\right\} = \sum_{i=1}^{K} E\{X_i^2\} = \sum_{i=1}^{K} P_i \le P

where Z_i \sim N(0, P_{N,i}), the noise is statistically independent from channel to channel, and P_i = E\{X_i^2\}.

How to distribute the power P among the channels to maximize the total capacity?

Water-filling

Result (water-filling): The total capacity is maximized when

P_i = (\nu - P_{N,i})^+

where the value of \nu is chosen such that

\sum_{i=1}^{K} P_i = \sum_{i=1}^{K} (\nu - P_{N,i})^+ = P

and (\cdot)^+ denotes the positive part, i.e., for any x,

(x)^+ \triangleq \begin{cases} x, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases}

[Figure: water-filling power allocation; the noise levels P_{N,i} form an uneven vessel floor, and the total power P is poured in up to a common water level \nu, so that channels with high noise may receive no power.]
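A small Python sketch of the water-filling allocation described above (illustrative; the bisection tolerance, variable names, and example noise levels are assumptions, not part of the slides).

def water_filling(noise_levels, total_power, tol=1e-9):
    # Find the water level nu such that sum_i (nu - P_N,i)^+ = P by bisection.
    lo, hi = 0.0, max(noise_levels) + total_power
    while hi - lo > tol:
        nu = (lo + hi) / 2
        used = sum(max(nu - n, 0.0) for n in noise_levels)
        if used > total_power:
            hi = nu
        else:
            lo = nu
    return [max(nu - n, 0.0) for n in noise_levels]

# Three parallel AWGN channels with unequal noise powers and total power P = 10
powers = water_filling([1.0, 4.0, 9.0], 10.0)
print(powers)        # the quietest channel gets the most power; the noisiest gets none
print(sum(powers))   # ~10.0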

SKETCH OF THE PROOF

The mutual information of a system with multiple Gaussian channels can be shown to be upper-bounded by the value

\frac{1}{2} \sum_{i=1}^{K} \log\left(1 + \frac{P_i}{P_{N,i}}\right)

Equality is achieved when X = [X_1, X_2, ..., X_K]^T is a Gaussian vector:

X \sim N(0, P)

The covariance matrix is

P = diag\{P_1, ..., P_K\}

Hence, the capacity of multiple Gaussian channels is given by

C = \frac{1}{2} \sum_{i=1}^{K} \log\left(1 + \frac{P_i}{P_{N,i}}\right)

Let us now maximize C over \{P_i\}_{i=1}^{K} subject to the constraints \sum_{i=1}^{K} P_i = P and P_i \ge 0 for i = 1, ..., K.

SKETCH OF THE PROOF

We use the Lagrange multiplier method. The Lagrangian function can be written as

L(P_1, ..., P_K) = \frac{1}{2} \sum_{i=1}^{K} \log\left(1 + \frac{P_i}{P_{N,i}}\right) + \lambda_0 \left(P - \sum_{i=1}^{K} P_i\right) + \sum_{i=1}^{K} \lambda_i P_i

where \lambda_0, ..., \lambda_K are the Lagrange multipliers. Differentiating L(P_1, ..., P_K) w.r.t. P_i, we have

\frac{\partial L}{\partial P_i} = \frac{\partial}{\partial P_i}\left[\frac{\log e}{2} \sum_{i=1}^{K} \ln\left(1 + \frac{P_i}{P_{N,i}}\right) + \lambda_0 \left(P - \sum_{i=1}^{K} P_i\right) + \sum_{i=1}^{K} \lambda_i P_i\right]
= \frac{\log e}{2}\, \frac{1/P_{N,i}}{1 + P_i/P_{N,i}} - \lambda_0 + \lambda_i
= \frac{\log e}{2}\, \frac{1}{P_i + P_{N,i}} - \lambda_0 + \lambda_i

SKETCH OF THE PROOF

From the so-called Karush-Kuhn-Tucker (KKT) conditions for constrained convex optimization problems:

\frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}} - \lambda_0^\star + \lambda_i^\star = 0 \qquad \text{(zero gradient)}

\lambda_i^\star P_i^\star = 0 \qquad \text{(complementary slackness)}

\sum_{i=1}^{K} P_i^\star = P, \qquad P_i^\star \ge 0 \qquad \text{(constraint satisfaction)}

\lambda_i^\star \ge 0, \quad i = 1, ..., K \qquad \text{(for inequality constraints)}

Thus P_i^\star \ge 0 and \sum_{i=1}^{K} P_i^\star = P, as well as

P_i^\star \left(\lambda_0^\star - \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}\right) = 0 \quad \text{and} \quad \lambda_0^\star \ge \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}

SKETCH OF THE PROOF

From the KKT conditions: P_i^\star \ge 0 and \sum_{i=1}^{K} P_i^\star = P, as well as

P_i^\star \left(\lambda_0^\star - \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}\right) = 0 \quad \text{and} \quad \lambda_0^\star \ge \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}

Thus, if

\lambda_0^\star < \frac{\log e}{2}\, \frac{1}{P_{N,i}}

then from the last inequality we have P_i^\star > 0, which by the slackness condition implies that

\lambda_0^\star = \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}

and thus, for \nu^\star = \log e / (2\lambda_0^\star),

P_i^\star = \nu^\star - P_{N,i}.

SKETCH OF THE PROOF

From the KKT conditions: P_i^\star \ge 0 and \sum_{i=1}^{K} P_i^\star = P, as well as

P_i^\star \left(\lambda_0^\star - \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}\right) = 0 \quad \text{and} \quad \lambda_0^\star \ge \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}

Conversely, if

\lambda_0^\star \ge \frac{\log e}{2}\, \frac{1}{P_{N,i}}

then P_i^\star > 0 is impossible, as it would imply that

\lambda_0^\star \ge \frac{\log e}{2}\, \frac{1}{P_{N,i}} > \frac{\log e}{2}\, \frac{1}{P_i^\star + P_{N,i}}

which violates the complementary slackness condition. We conclude that P_i^\star = 0 for P_{N,i} \ge \nu^\star and P_i^\star = \nu^\star - P_{N,i} otherwise.

EXTENDED DEFINITIONS OF CAPACITY: ERGODIC CAPACITY

Ergodic capacity: In the case of a random Gaussian channel, it is sometimes more useful to separate the effects of the transmitted signal and the channel as

Y(i) = X(i) H(i) + Z(i)

where H(i) is the channel gain in the ith channel use. In contrast to the noise and signal waveforms, the channel gain has so far been treated as a non-random (deterministic) value.

For this model,

P = E\{X^2\}

can be interpreted as the transmitted signal power, whereas

E\{(XH)^2\} = E\{X^2\}\, H^2 = P H^2

can be interpreted as the received signal power.

Ergodic capacity

In this case, the capacity formula reads

C = \frac{1}{2} \log\left(1 + \frac{P H^2}{P_N}\right)

Note that the conventional capacity is instantaneous, that is, it characterizes the maximal achievable rate for a particular given realization of the channel gain H.

How can we characterize the maximal achievable rate on average rather than for some particular channel gain?

In practice, wireless channels are random and, therefore, should be treated as such.

Based on this fact, the ergodic capacity is defined as the instantaneous capacity C averaged over the channel realizations:

C_E = E_H\{C\}

where E_H\{\cdot\} denotes statistical expectation over the random channel gain.

Ergodic capacity

Assume that we know the channel gain pdf f_H(h). In this case, we can compute the ergodic capacity as

C_E = \int f_H(h)\, C(h)\, dh

Ergodic capacity provides another look at the achievable transmission rate as compared to the conventional instantaneous capacity, because it gives the average rather than the instantaneous picture.

Outage

Outage capacity: the transmission rate C_{p_{out}} which exceeds the instantaneous capacity C in only p_{out} \cdot 100 percent of channel realizations. The quantity p_{out} is called the outage probability.

Outage is the event that, for some particular channel realization, the chosen transmission rate is higher than the instantaneous capacity (that is, no error-free transmission is possible).

In the case of small p_{out} (roughly speaking, p_{out} \le 0.1), outage-induced errors can be cured by means of channel coding.

Outage

The outage capacity can be characterized as follows. Let the pdf of the instantaneous capacity C = C(H) be f_C(c), where f_C(c) = 0 for c < 0. Then, the outage capacity is defined by the equation

p_{out} = P(C < C_{p_{out}}) = \int_0^{C_{p_{out}}} f_C(c)\, dc
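A Monte Carlo sketch (illustrative; the Rayleigh fading model, the SNR value, and the sample size are assumptions) of how the outage capacity can be estimated as the p_out-quantile of the instantaneous capacity.

import math, random

def outage_capacity(snr, p_out, trials=200_000):
    # instantaneous capacity C = log2(1 + |H|^2 * SNR) with H ~ CN(0, 1) (Rayleigh fading)
    caps = []
    for _ in range(trials):
        h2 = random.gauss(0, math.sqrt(0.5))**2 + random.gauss(0, math.sqrt(0.5))**2
        caps.append(math.log2(1 + h2 * snr))
    caps.sort()
    return caps[int(p_out * trials)]   # p_out-quantile, so that P(C < C_pout) ~ p_out

print(outage_capacity(snr=10.0, p_out=0.1))   # bits per channel use at 10% outage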

Channel coding

Channel encoding and decoding are used to correct errors that may occur during signal transmission over the channel.

Linear block codes

Linear binary block codes: the coding/decoding operations can be described using linear algebra. Binary codes use modulo-2 arithmetic.

A code is said to be linear if the modulo-2 sum of any two codewords in the code gives another codeword of this code.

A code is denoted as an (n, k) linear block code if n is the total number of bits of the code and k is the number of bits containing the message.

Linear block codes

Row-vector notation:

m = [m_1, ..., m_k], \qquad b = [b_1, ..., b_{n-k}], \qquad c = [b_1, ..., b_{n-k}, m_1, ..., m_k] = [b, m]

Block codes use the message bits to generate parity-check bits according to the equation

b = m P

where P is the k \times (n-k) coefficient matrix. Noting that c = [b, m], we get

c = [b, m] = [m P, m] = m [P, I_k] = m G

where G is the k \times n generator matrix.

Hamming codes

Hamming codes, a family of codes with

n = 2^m - 1, \qquad k = 2^m - m - 1, \qquad n - k = m

(7,4) Hamming code (n = 7, m = 3, k = 4) generator matrix:

G = [P, I_4] = [ 1 1 0 | 1 0 0 0 ]
               [ 0 1 1 | 0 1 0 0 ]
               [ 1 1 1 | 0 0 1 0 ]
               [ 1 0 1 | 0 0 0 1 ]

Message   Codeword    Hamming weight
0000      0000000     0
0001      1010001     3
0010      1110010     4
0011      0100011     3
0100      0110100     3
0101      1100101     4
...       and so on   ...

For the given Hamming code, d_min = 3. Therefore, it is a single-error-correcting code.
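A small Python sketch (illustrative) that reproduces the codeword table above by computing c = m G with modulo-2 arithmetic.

# (7,4) Hamming generator matrix G = [P, I4] from the slide
G = [
    [1, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0, 1],
]

def encode(message_bits):
    # c = m G (mod 2): each codeword bit is a parity of selected message bits
    return [sum(m * g for m, g in zip(message_bits, col)) % 2
            for col in zip(*G)]

for m in ([0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 1, 1]):
    c = encode(m)
    print(m, ''.join(map(str, c)), sum(c))   # message, codeword, Hamming weight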

MULTI-ANTENNA CHANNELS

Consider a multiple-input multiple-output (MIMO) channel:

[Figure: a transmitter (Tx) with N antennas and a receiver (Rx) with M antennas connected through the channel.]

In the frequency-flat fading case, the signal at the mth receive antenna is

Y_m(t) = \sum_{n=1}^{N} H_{mn}(t) X_n(t) + Z_m(t), \qquad m = 1, ..., M

where H_{mn} is the channel coefficient between the mth receive and nth transmit antennas, X_n is the signal sent from the nth transmit antenna, and Z_m is the noise at the mth receive antenna.

MIMO channel

Defining the M \times N channel matrix

H = [ H_11  H_12  ...  H_1N ]
    [ H_21  H_22  ...  H_2N ]
    [  ...   ...  ...   ... ]
    [ H_M1  H_M2  ...  H_MN ]

and the transmit-signal, receive-signal, and noise column vectors

x = [X_1, ..., X_N]^T, \qquad y = [Y_1, ..., Y_M]^T, \qquad z = [Z_1, ..., Z_M]^T

we can write the system input-output relationship in matrix form as:

y = H x + z

SIMO channel

One particular case of the MIMO channel is the single-input multiple-output (SIMO) channel:

[Figure: a transmitter with 1 antenna and a receiver with N antennas.]

In the frequency-flat fading case, the signal at the nth receive antenna is

Y_n(t) = H_n(t) X(t) + Z_n(t), \qquad n = 1, ..., N

where H_n is the channel coefficient between the nth receive antenna and the transmit antenna, and X(t) is the signal sent from the transmit antenna.

Defining the N \times 1 channel vector

h = [H_1, ..., H_N]^T

we can write the system input-output relationship in vector form as:

y = h X + z

MISO channel

Another particular case of the MIMO channel is the multiple-input single-output (MISO) channel:

[Figure: a transmitter with N antennas and a receiver with 1 antenna.]

In the frequency-flat fading case, the signal at the receive antenna is

Y(t) = \sum_{n=1}^{N} H_n(t) X_n(t) + Z(t)

where H_n is the channel coefficient between the receive antenna and the nth transmit antenna, and X_n(t) is the signal sent from the nth transmit antenna.

Defining the 1 \times N channel row vector

h = [H_1, ..., H_N]

we can write the system input-output relationship in vector form as:

Y = h x + Z

Capacity in the case of an informed transmitter

Let us consider the MIMO case assuming that z \sim N_C(0, \sigma^2 I). Then, the equation

y = H x + z

describes a vector Gaussian channel. If the channel is known at the transmitter, the capacity can be computed by decomposing this channel into a set of parallel independent scalar Gaussian sub-channels.

Singular value decomposition (SVD) of H:

H = U \Sigma V^H

where the M \times M matrix U and the N \times N matrix V are unitary, that is, U^H U = U U^H = I and V^H V = V V^H = I.

SVD for any n \times m matrix A

A = U \Sigma V^H = \sum_i \sigma_i u_i v_i^H

[Figure: block sketches of the SVD for the cases n < m and n > m; \Sigma is an n \times m matrix whose only nonzero entries are the singular values \sigma_i on its main diagonal.]

MIMO capacity (informed transmitter)

Using the SVD of H, the MIMO model equation becomes

y = U \Sigma V^H x + z

Multiplying this equation by U^H from the left, and using the unitary property of U, we have

U^H y = \Sigma V^H x + U^H z

Introducing the notation \tilde{y} \triangleq U^H y, \tilde{x} \triangleq V^H x, \tilde{z} \triangleq U^H z, we obtain a system of parallel Gaussian channels

\tilde{y} = \Sigma \tilde{x} + \tilde{z}

where E\{\tilde{z}\tilde{z}^H\} = U^H E\{z z^H\} U = \sigma^2 I and, therefore,

\tilde{z} \sim N_C(0, \sigma^2 I)

Moreover,

\|\tilde{x}\|^2 = x^H V V^H x = \|x\|^2

Thus, the power is preserved!

MIMO capacity (informed transmitter)

The system of parallel channels can also be written componentwise:

\tilde{Y}_i = \sigma_i \tilde{X}_i + \tilde{Z}_i, \qquad i = 1, ..., n_o

where n_o = \min\{N, M\}. The transition to this equivalent system corresponds to the pre-processing

x = V \tilde{x}

at the transmitter and the post-processing

\tilde{y} = U^H y

at the receiver. Hence, the pre- and post-processing operators are V and U^H, respectively.

To implement the pre-/post-processing operations, the original vector to be transmitted is \tilde{x}. It is pre-processed at the transmitter to obtain

x = V \tilde{x}

The vector x is then sent over the channel. At the receiver (ignoring the noise for the moment), we have

y = H x = U \Sigma V^H V \tilde{x} = U \Sigma \tilde{x}

and after post-processing we obtain \tilde{y} = U^H y = U^H U \Sigma \tilde{x} = \Sigma \tilde{x}

MIMO capacity (informed transmitter)

The capacity of the resulting system of parallel independent channels:

C = B \sum_{i=1}^{n_o} \log\left(1 + \frac{P_i \sigma_i^2}{\sigma^2}\right) \text{ bits/s}

where the P_i are the water-filling power allocations:

P_i = \left(\nu - \frac{\sigma^2}{\sigma_i^2}\right)^+

and the water level \nu is obtained from the total power constraint \sum_{i=1}^{n_o} P_i = P.

Each \sigma_i corresponds to an eigenmode of the channel, also called an eigenchannel.
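A compact numerical sketch (illustrative; the example channel matrix, noise level, and bisection settings are assumptions) combining the SVD decomposition into eigenchannels and the water-filling power allocation described above, with B = 1.

import numpy as np

def mimo_capacity_csit(H, P_total, noise_var=1.0):
    # decompose the channel into eigenchannels via the SVD
    sigma = np.linalg.svd(H, compute_uv=False)
    gains = (sigma**2 / noise_var)
    gains = gains[gains > 1e-12]                 # drop numerically zero eigenmodes
    # water-filling over the eigenchannels by bisection on the water level nu
    lo, hi = 0.0, P_total + 1.0 / gains.min()
    for _ in range(100):
        nu = 0.5 * (lo + hi)
        p = np.maximum(nu - 1.0 / gains, 0.0)
        lo, hi = (nu, hi) if p.sum() < P_total else (lo, nu)
    p = np.maximum(nu - 1.0 / gains, 0.0)
    return float(np.sum(np.log2(1.0 + p * gains)))   # bits per channel use

H = np.array([[1.0, 0.5], [0.2, 1.5], [0.3, 0.1]])   # example 3 Rx x 2 Tx channel
print(mimo_capacity_csit(H, P_total=10.0))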

Wireless MIMO channel

System model: N transmit antennas and M receive antennas connected by the wireless MIMO channel; perfect channel state information is assumed at the transmitter. The MIMO channel equation in matrix notation is y = H x + z.

What is the optimum transmission and power allocation scheme if the channel matrix H is known at the transmitter?

Capacity of a MIMO channel: maximize the mutual information subject to the sum power constraint, which is due to hardware limitations and/or regulations.

Singular value decomposition of the MIMO channel: H = U \Sigma V^H, where u_i and v_i are the left and right singular vectors, \sigma_i is the corresponding singular value (\ge 0), and U and V are unitary.

Decoupling the channels using the linear transformations x = V \tilde{x} and \tilde{y} = U^H y gives \tilde{Y}_i = \sigma_i \tilde{X}_i + \tilde{Z}_i for i = 1, ..., r: an independent parallel channel representation with r parallel channels.

Optimization problem: maximize the capacity over the powers p_i assigned to the i-th input signal, subject to the sum power constraint. The solution is the water-filling principle.

[Figures: several slides illustrating the water-filling principle.]

High SNR regime

What are the key parameters that determine the performance?

At high SNR, the water level is high and the policy of allocating equal amounts of power to each channel is asymptotically optimal. In this case,

C \approx B \sum_{i=1}^{r} \log\left(1 + \frac{P \sigma_i^2}{r \sigma^2}\right) \approx B \sum_{i=1}^{r} \log\left(\frac{P \sigma_i^2}{r \sigma^2}\right) \approx r B \log \text{SNR} + B \sum_{i=1}^{r} \log\frac{\sigma_i^2}{r} \quad \text{bits/s}

where r \triangleq rank\{H\} and \text{SNR} = P/\sigma^2.

High SNR regime

What are the key parameters that determine the performance?

It can be shown that among the channels with the same power gain, the channels whose singular values are all equal result in the highest capacity.

This means that well-conditioned channel matrices are preferable in the high SNR regime.

Low SNR regime

What are the key parameters that determine the performance?

In this regime, the optimal policy is to allocate all power to the channel with the strongest eigenmode:

C \approx B \log\left(1 + \frac{P \sigma_{max}^2}{\sigma^2}\right)

and ill-conditioned (rank-one) channel matrices are preferable.

Using the property \log(1 + x) \approx x \log e, valid for x \ll 1, we have

C \approx \frac{B P \sigma_{max}^2}{\sigma^2} \log e

MIMO capacity (uninformed transmitter)

Let us now obtain the MIMO channel capacity based on general considerations, assuming that H is fixed while the other quantities (x, y and z) are random. In this case, no assumption on channel knowledge at the transmitter is used, but the receiver is assumed to know H.

Capacity via mutual information:

C = \max_{p(x)} I(x; y) = \max_{p(x)} [H(y) - H(y|x)]

The output covariance matrix is given by

R = E\{y y^H\} = H P H^H + \sigma^2 I

where

P \triangleq E\{x x^H\}

MIMO capacity (uninformed transmitter)

Result (Telatar, 1995; Foschini and Gans, 1998): Consider the model

y = H x + z

where x \sim N_C(0, P), z \sim N_C(0, \sigma^2 I), and H is fixed. Let B be the channel bandwidth in Hz. Then, the MIMO channel capacity is equal to

C = B \log \det\left(I + \frac{1}{\sigma^2} H P H^H\right) \text{ bits/s}
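A one-function numerical sketch (illustrative; the example matrices are assumptions) of the log-det capacity formula above, normalized to bits per channel use (B = 1).

import numpy as np

def mimo_capacity(H, P, noise_var=1.0):
    # C = log2 det(I + H P H^H / sigma^2) in bits per channel use
    M = H.shape[0]
    A = np.eye(M) + (H @ P @ H.conj().T) / noise_var
    sign, logdet = np.linalg.slogdet(A)
    return logdet / np.log(2.0)

H = np.array([[1.0, 0.3], [0.2, 0.8]])
P = np.eye(2) * 5.0            # equal power on both transmit antennas
print(mimo_capacity(H, P))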

Result 1

Let X_1, ..., X_n have a multivariate complex circular Gaussian distribution with mean \mu_X and covariance matrix P:

f_X(x) = \frac{1}{\pi^n \det\{P\}}\, e^{-(x - \mu_X)^H P^{-1} (x - \mu_X)}

Then

H(X) = H(X_1, ..., X_n) = \log\left((\pi e)^n \det\{P\}\right)

Proof:

H(X) = \int f_X(x)\, (x - \mu_X)^H P^{-1} (x - \mu_X)\, dx + \ln\left(\pi^n \det\{P\}\right)
     = E\{(x - \mu_X)^H P^{-1} (x - \mu_X)\} + \ln(\pi^n \det\{P\})
     = E\{tr(P^{-1} (x - \mu_X)(x - \mu_X)^H)\} + \ln(\pi^n \det\{P\})
     = tr(P^{-1} E\{(x - \mu_X)(x - \mu_X)^H\}) + \ln(\pi^n \det\{P\})
     = tr(P^{-1} P) + \ln(\pi^n \det\{P\})
     = n + \ln(\pi^n \det\{P\})
     = \ln((\pi e)^n \det\{P\}) \text{ nats}
     = \log((\pi e)^n \det\{P\}) \text{ bits}

Result 2

Let the random vector x \in C^n have zero mean and covariance E\{x x^H\} = P. Then H(X) = H(X_1, ..., X_n) \le \log\{(\pi e)^n \det\{P\}\}, with equality if and only if X \sim N_C(0, P).

Proof: Let g(x) be a pdf with covariance [P]_{ij} = \int g(x)\, x_i x_j^*\, dx and let \phi_P(x) be the complex circular Gaussian pdf N_C(0, P).

Note that the logarithm of the complex circular Gaussian pdf, \log \phi_P(x) \sim -(x - \mu_X)^H P^{-1} (x - \mu_X), is a quadratic form in x.

Then the Kullback-Leibler distance D(g(x) \| \phi_P(x)) between the two pdfs is given as

0 \le D(g(x) \| \phi_P(x)) = \int g(x) \log\left(\frac{g(x)}{\phi_P(x)}\right) dx
= -H_g(X) - \int g(x) \log(\phi_P(x))\, dx
= -H_g(X) - \int \phi_P(x) \log(\phi_P(x))\, dx
= -H_g(X) + H_\phi(X) \quad\Longrightarrow\quad H_\phi(X) \ge H_g(X)

Here the second integral could be rewritten because \log(\phi_P(x)) is a quadratic form in x, so its expectation depends only on the second moments of X, which are the same under g and \phi_P.

The Gaussian distribution maximizes the entropy over all distributions with the same covariance.

Proof of MIMO capacity result

It has been shown that, among all random vectors with covariance matrix R, the entropy of y is maximized when y is zero-mean circularly symmetric complex Gaussian. This holds exactly when the input vector x is zero-mean circularly symmetric complex Gaussian, and, therefore, this is the optimal distribution for X.

Using these facts, the capacity formula can be proved by obtaining explicit expressions for H(Y) and H(Y|X).

Recall the signal model:

Y = H X + Z

Then the mutual information between X and Y is given as

I(X; Y) = H(Y) - H(Y|X) = H(Y) - H(HX + Z|X) = H(Y) - \underbrace{H(HX|X)}_{0} - H(Z|X) = H(Y) - H(Z)

Proof of MIMO capacity result

From Result 2 we know that the entropy H(Y) is maximized for a complex circular Gaussian input distribution, thus

\max_{\text{all pdfs with } R} I(X; Y) = \max_{\text{all pdfs with } R} H(Y) - H(Z)
= \log\{(\pi e)^n \det(H P H^H + \sigma^2 I)\} - \log\{(\pi e)^n \det(\sigma^2 I)\}
= \log \det\left(I + \frac{1}{\sigma^2} H P H^H\right) \text{ bits per channel use}

Transition to the classic Shannon capacity result

Assuming a single-input single-output (SISO) system with N = M = 1 and a constant channel gain H, which transmits with power P, we have

H = H, \qquad P = P, \qquad I = 1

and, therefore,

C = B \log \det\left(I + \frac{1}{\sigma^2} H P H^H\right) = B \log\left(1 + \frac{|H|^2 P}{\sigma^2}\right)

This is the classical Shannon capacity formula for a bandlimited channel!

Channel known at the transmitter

If the channel matrix H is known at the transmitter, then in general unequal powers should be chosen, and P is not a scaled identity matrix.

Eigenchannels and power allocation using water-filling should be used, as discussed above.

Channel unknown at the transmitter

If the channel matrix H is unknown at the transmitter, then it follows from symmetry reasons that P should be a scaled identity matrix. Using the power constraint

tr\{P\} = P

we obtain that P has to be chosen as

P = (P/N)\, I

Indeed, the power constraint is satisfied because

tr\{P\} = tr\{(P/N)\, I\} = (P/N)\, tr\{I\} = P

Channel unknown at the transmitter

Choosing P = (P/N)\, I, we obtain that the MIMO capacity in the uninformed-transmitter case is given by

C = B \log \det\left(I + \frac{P}{N \sigma^2} H H^H\right)

Assuming that, although fixed, the entries of H are statistically independent random values with unit variance, and using the law of large numbers, we obtain that for a large number of transmit antennas and a fixed number of receive antennas

\frac{H H^H}{N} \to I

Using the latter property, we obtain that for large N,

C = B \log \det\left(\left(1 + \frac{P}{\sigma^2}\right) I\right) = B \log\left\{\left(1 + \frac{P}{\sigma^2}\right)^M\right\} = M B \log\left(1 + \frac{P}{\sigma^2}\right)

which is M times the SISO Shannon capacity!

Parallel SISO channel interpretation

Consider the general MIMO channel capacity formula. Let the eigendecomposition of the positive semi-definite Hermitian matrix H P H^H be

H P H^H = \sum_{i=1}^{r} \lambda_i u_i u_i^H = U \Lambda U^H

where U^H U = I and r \triangleq rank\{H P H^H\}. The matrices U and \Lambda should not be confused with those of the SVD of the matrix H used earlier!

We will use the property

\det\{I + A B\} = \det\{I + B A\}

valid for any matrices A and B of conformable dimensions.

Parallel SISO channel interpretation

Setting A = U \Lambda and B = U^H, we obtain

C = B \log \det\left(I + \frac{1}{\sigma^2} H P H^H\right) = B \log \det\left(I + \frac{1}{\sigma^2} U \Lambda U^H\right)
= B \log \det\left(I + \frac{1}{\sigma^2} U^H U \Lambda\right) = B \log \det\left(I + \frac{1}{\sigma^2} \Lambda\right)
= B \log\left\{\prod_{i=1}^{r} \left(1 + \lambda_i/\sigma^2\right)\right\} = B \sum_{i=1}^{r} \log\left(1 + \lambda_i/\sigma^2\right)

Parallel SISO channel interpretation

The latter formula interprets the capacity of the MIMO channel as the sum of the capacities of r parallel SISO channels.

In the case of an uninformed transmitter (P = (P/N)\, I), r can be interpreted as the rank of H: full-rank channels are preferable!

If H is drawn randomly, then almost surely

rank\{H\} = \min\{M, N\}

This leads us to the conclusion that the capacity grows nearly proportionally to \min\{M, N\}.

Assume M = N and let the Frobenius norm of H be given. What type of channel will maximize the MIMO capacity?

Result: The capacity is maximized in the case when H is orthogonal:

H^H H = H H^H = \alpha I

where \alpha is a constant. In this case,

C = B \log \det\left(\left(1 + \frac{\alpha P}{N \sigma^2}\right) I\right) = B \log\left\{\left(1 + \frac{\alpha P}{N \sigma^2}\right)^N\right\} = N B \log\left(1 + \frac{\alpha P}{N \sigma^2}\right)

SIMO channel capacity

Consider a SIMO column-vector channel h with one transmit and N receive antennas. The capacity formula becomes

C = B \log \det\left(I + \frac{1}{\sigma^2} P h h^H\right) = B \log\left(1 + \frac{1}{\sigma^2} P h^H h\right) = B \log\left(1 + \frac{P}{\sigma^2} \|h\|^2\right)

Hence, the SIMO channel comprises only one spatial data pipe. The addition of receive antennas yields only a logarithmic (rather than linear) increase in capacity.

MISO channel capacity

Consider a MISO row-vector channel h with one receive and N transmit antennas. The capacity formula becomes

C = B \log\left(1 + \frac{1}{\sigma^2} h P h^H\right) = B \log\left(1 + \frac{1}{\sigma^2} \|h P^{1/2}\|^2\right)

The situation is similar to that in the SIMO case: the increase in capacity is only logarithmic (rather than linear).

Ergodic MIMO channel capacity

The channel matrix H is no longer fixed, but is treated as random. The capacity formula can be averaged over H:

E_H\{C\} = B\, E_H\left\{\log \det\left(I + \frac{1}{\sigma^2} H P H^H\right)\right\}

Result (Telatar, 1999): Let H be a Gaussian random matrix with i.i.d. elements. Then, the average capacity is maximized subject to the power constraint tr\{P\} \le P when

P = \frac{P}{N} I

That is, to maximize the average capacity, the antennas should transmit uncorrelated streams with the same power, an intuitively appealing fact.

Ergodic MIMO channel capacity: Proof (sketch)

Let H be a Gaussian random matrix with i.i.d. elements.

C = \max_{P:\, tr P \le P} E_H \log \det\left\{I + \frac{1}{\sigma^2} H P H^H\right\}

Introduce the decomposition P = P_\Delta + P_{off}, where P_\Delta contains the diagonal entries of P and P_{off} the off-diagonal entries. Then

C = \max_{P:\, tr P \le P} E_H \log \det\left\{I + \frac{1}{\sigma^2} H P_\Delta H^H + \frac{1}{\sigma^2} H P_{off} H^H\right\}
\le \max_{P_\Delta:\, tr P_\Delta \le P} \log \det\left\{E_H\left[I + \frac{1}{\sigma^2} H P_\Delta H^H + \frac{1}{\sigma^2} H P_{off} H^H\right]\right\}

where the last inequality follows from Jensen's inequality.

Ergodic MIMO channel capacity: Proof (sketch)

\max_{P_\Delta:\, tr P_\Delta \le P} \log \det\left\{E_H\left[I + \frac{1}{\sigma^2} H P_\Delta H^H + \frac{1}{\sigma^2} H P_{off} H^H\right]\right\}
= \max_{P_\Delta:\, tr P_\Delta \le P} \log \det\left\{E_H\left[I + \frac{1}{\sigma^2} H P_\Delta H^H\right] + \underbrace{E_H\left[\frac{1}{\sigma^2} H P_{off} H^H\right]}_{=0}\right\}

where the last term in the second equation is identically zero due to the statistical independence of the entries of H.

We conclude that restricting the transmit covariance to the diagonal structure P = P_\Delta does not reduce the achievable capacity.

Ergodic MIMO channel capacity: Proof (sketch)

Thus

C = \max_{P_\Delta:\, tr P_\Delta \le P} E_H \log \det\left\{I + \frac{1}{\sigma^2} H P_\Delta H^H\right\}

We can show that, due to the i.i.d. property of H, the objective function is symmetric w.r.t. the input variables, i.e. exchanging the order of the entries P_1, ..., P_N does not change the function value. Furthermore, the function is concave.

We conclude that the optimal power allocation strategy in this case is to distribute the power equally among the transmitted symbols, i.e. to choose P_1 = P_2 = ... = P_N.

Ergodic MIMO channel capacity

Note that the latter choice of P coincides with our earlier choice of this matrix in the case of a fixed channel and an uninformed transmitter.

Choosing P = (P/N)\, I, the maximal average capacity (which is commonly referred to as the ergodic capacity) becomes

C_E = B\, E_H\left\{\log \det\left(I + \frac{P}{N \sigma^2} H H^H\right)\right\}

Ergodic capacity has an important advantage w.r.t. the fixed-channel capacity as it gives an average rather than an instantaneous picture.
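A Monte Carlo sketch (illustrative; the i.i.d. Rayleigh channel model, antenna numbers, and sample size are assumptions) of the ergodic capacity formula above with B = 1.

import numpy as np

def ergodic_capacity(N_tx, M_rx, snr, trials=2000, rng=np.random.default_rng(0)):
    # average of log2 det(I + (P / (N sigma^2)) H H^H) over i.i.d. CN(0, 1) channels
    caps = []
    for _ in range(trials):
        H = (rng.standard_normal((M_rx, N_tx)) +
             1j * rng.standard_normal((M_rx, N_tx))) / np.sqrt(2.0)
        A = np.eye(M_rx) + (snr / N_tx) * (H @ H.conj().T)
        caps.append(np.linalg.slogdet(A)[1] / np.log(2.0))
    return float(np.mean(caps))

print(ergodic_capacity(2, 2, snr=10.0))   # roughly twice the SISO value
print(ergodic_capacity(4, 4, snr=10.0))   # grows about linearly with min(M, N)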

Ergodic MIMO channel capacity

Using the parallel SISO channel interpretation and denoting the singular values of H as \sigma_i, we obtain

C_E = B\, E_H\left[\sum_{i=1}^{r} \log\left(1 + \frac{P \sigma_i^2}{N \sigma^2}\right)\right] = B \sum_{i=1}^{r} E_H\left\{\log\left(1 + \frac{P \sigma_i^2}{N \sigma^2}\right)\right\}

Please note the difference with the water-filling capacity: in contrast to it, in the latter expression equal powers are used for each eigenchannel.

Large antenna regime

Let us denote SNR = P/\sigma^2. Then, the capacity formula becomes

C_E = B \sum_{i=1}^{r} E_H\left\{\log\left(1 + \frac{\text{SNR}\, \sigma_i^2}{N}\right)\right\}

Assume M = N and i.i.d. Rayleigh fading. Then, using random matrix theory, it can be shown that for any SNR

\lim_{N \to \infty} \frac{C_E}{N} = \text{const}

Therefore, the capacity grows linearly in N at any SNR in such an asymptotic regime!

Outage capacity

A value C_{out} which is larger than the instantaneous capacity C in p_{out} \cdot 100 percent of channel realizations. In other words,

Pr(C_{out} > C) = p_{out}

If one wants to transmit with C_{out} bits per second, then the channel capacity is less than C_{out} with probability p_{out}. Hence, the transmission is impossible (the system is in outage) in p_{out} \cdot 100 percent of the time.

Alternatively, we can write

Pr(C_{out} \le C) = 1 - p_{out}

and, hence, in (1 - p_{out}) \cdot 100 percent of the time the transmission is possible, as the system is not in outage.

Outage capacity

1 - p_{out} is called the non-outage probability.

Using the instantaneous MIMO capacity formula, we can define the MIMO outage capacity by means of the following expression:

\min_{tr\{P\} \le P} Pr\left(C_{out} > B \log \det\left(I + \frac{1}{\sigma^2} H P H^H\right)\right) = p_{out}

where we additionally use the opportunity to minimize the outage probability by means of a proper choice of P. This particular choice, of course, depends on the statistics of the random channel matrix H.

Example: Rayleigh fading channel

In Rayleigh fading, the channel coefficients are circularly symmetric complex Gaussian with zero mean and unit variance: a) channel known at the transmitter; b) channel unknown at the transmitter.

[Figures: outage capacity (bits/s/Hz) versus SNR (dB, from 10 to 40) for outage probabilities p_out = 0.01, 0.1, 0.5, for cases a) and b).]

MULTIUSER CHANNELS

Why multiuser channels:

- Up to now, we have considered point-to-point communication links.
- Most communication systems serve multiple users. Therefore, multiuser channels are of great interest.
- In multiuser channels, one user can interfere with another user. This type of interference is called multiuser interference (MUI).

Common multiuser channel types:

- Multiple-access channels
- Broadcast channels
- Relay channels

[Figures: a multiple-access channel (several transmitters, one receiver), a broadcast channel (one transmitter, several receivers), and a relay channel (a source communicating with a destination assisted by a relay node).]

Multiple-access channels

Two-user multiple-access Gaussian channel:

Y(i) = X_1(i) + X_2(i) + Z(i), \qquad Z(i) \sim N_C(0, \sigma^2)

In the point-to-point (single-user) case, the rate limit is the channel capacity. The achievable rate region is, therefore, given by:

R < B \log\left(1 + \frac{P}{\sigma^2}\right)

In the two-user case, we should extend this concept to a capacity region C, which is the set of all pairs (R_1, R_2) such that users 1 and 2 can simultaneously and reliably communicate at rates R_1 and R_2, respectively.

Since the two users share the same bandwidth, there is a tradeoff between the rates R_1 and R_2: if one user wants to communicate at a higher rate, then the other user may need to lower its rate.

Example of the tradeoff: In orthogonal multiple-access schemes such as OFDM, the tradeoff can be realized by varying the number of subcarriers allocated to each user.

Rate region

Different scalar performance measures can be obtained from the capacity region:

- The symmetric capacity

C_{sym} = \max_{(R,R) \in C} R

is the maximum common rate at which both users can simultaneously and reliably communicate.

- The sum capacity

C_{sum} = \max_{(R_1,R_2) \in C} (R_1 + R_2)

is the maximum total throughput that can be achieved.

Rate region

If we have two users with powers P_1 and P_2, then the capacity region for the two-user channel is defined by the following inequalities:

R_1 < B \log\left(1 + \frac{P_1}{\sigma^2}\right)

R_2 < B \log\left(1 + \frac{P_2}{\sigma^2}\right)

R_1 + R_2 < B \log\left(1 + \frac{P_1 + P_2}{\sigma^2}\right)

The first two constraints say that the rate of each individual user cannot exceed the capacity of the point-to-point link with the other user absent.

The last constraint says that the total throughput cannot exceed the capacity of a point-to-point link with a single user whose power is the sum of the two users' powers.
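A small numerical sketch (illustrative; the power values are assumptions) of the two-user capacity region bounds and of the corner points A and B discussed on the following slides, with B = 1.

import math

def mac_region(P1, P2, noise=1.0):
    c = lambda x: math.log2(1 + x / noise)
    R1_max, R2_max, Rsum = c(P1), c(P2), c(P1 + P2)
    # corner A: user 1 at its single-user rate, user 2 decoded first (treats user 1 as noise)
    corner_A = (R1_max, math.log2(1 + P2 / (P1 + noise)))
    # corner B: the roles of the users reversed
    corner_B = (math.log2(1 + P1 / (P2 + noise)), R2_max)
    return R1_max, R2_max, Rsum, corner_A, corner_B

R1, R2, Rs, A, B_pt = mac_region(P1=10.0, P2=5.0)
print(R1, R2, Rs)        # individual and sum-rate bounds
print(A, sum(A))         # corner A rates; their sum equals the sum-rate bound
print(B_pt, sum(B_pt))   # corner B rates; same sum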

Rate region

That is, not only are the rates R_1 and R_2 limited individually, but their sum is limited as well. This means that the signal of each user may be viewed as interference for the other user.

Result: The two-user capacity region is a pentagon.

Rate region: multiple-access channel

[Figure: the pentagon-shaped two-user capacity region in the (R_1, R_2) plane, with corner points A and B on its dominant face.]

Rate region: multiple-access channel

Remark: Surprisingly, user 1 can achieve its single-user rate bound R_1 = B \log\left(1 + \frac{P_1}{\sigma^2}\right) while, at the same time, user 2 can get a non-zero rate as high as R_2 = B \log\left(1 + \frac{P_2}{P_1 + \sigma^2}\right). This corresponds to point A of the capacity region plot. Indeed,

R_1 + R_2 = B \log\left[\left(1 + \frac{P_1}{\sigma^2}\right)\left(1 + \frac{P_2}{P_1 + \sigma^2}\right)\right]
= B \log\left(1 + \frac{P_1}{\sigma^2} + \frac{P_2}{P_1 + \sigma^2} + \frac{P_1 P_2}{\sigma^2 (P_1 + \sigma^2)}\right)
= B \log\left(1 + \frac{P_1^2 + P_1 \sigma^2 + P_2 \sigma^2 + P_1 P_2}{\sigma^2 (P_1 + \sigma^2)}\right)
= B \log\left(1 + \frac{P_1 + P_2}{\sigma^2}\right)

Successive interference cancellation

How can this be achieved?

Each user should encode its data using a capacity-achieving channel code. The receiver should decode the information of both users in two stages:

- In the first stage, the data of user 2 are decoded treating user 1 as AWGN. Then, the maximum rate user 2 can achieve is R_2 = B \log\left(1 + \frac{P_2}{P_1 + \sigma^2}\right).

- In the second stage, the reconstructed (decoded) signal of user 2 is subtracted from the aggregate received signal, and then the data of user 1 are decoded. Since user 2 has already been subtracted and only the background AWGN is left in the system, the achieved rate for user 1 is R_1 = B \log\left(1 + \frac{P_1}{\sigma^2}\right).

This two-stage decoding is called successive interference cancellation.

Successive interference cancellation

If one reverses the order of cancellation, then one can achieve point B rather than point A.

All other rate points on the segment AB can be obtained by time-sharing between the multiple-access strategies of points A and B.

The segment AB contains all the optimal operating points of the channel, in the sense that any point in the capacity region is dominated by some point on AB. That is, for any point within the capacity region that corresponds to the rates R_1 and R_2, we can always find a point on the segment AB whose rates R_1' and R_2' satisfy:

R_1' \ge R_1, \qquad R_2' \ge R_2

Pareto-optimal

The points on the segment AB are called Pareto-optimal.

One can always increase the user rates to move to a point on the segment AB,
and there is no reason not to do this.


The concrete choice of the point on AB depends on our particular objectives:

- To maximize the sum capacity C_{sum}, any point on AB is equally good. Note that we have already computed the sum of R_1 and R_2 at point A. Hence,

C_{sum} = B \log\left(1 + \frac{P_1 + P_2}{\sigma^2}\right)

- To maximize the symmetric capacity C_{sym}, we should take the point on AB that gives equal rates R_1 and R_2.

- Some operating points on AB may not be fair, especially if the received power of one user is much higher than that of the other user. In this case, we should consider operating at the corner point at which the stronger user is decoded first.

How does the system with successive cancellation compare to a standard CDMA system in terms of achievable rate?

The principal difference between CDMA detection and successive cancellation detection is that:

- In the CDMA system, each user is decoded treating the other users as interference. This corresponds to the single-user receiver principle, and we immediately conclude that the performance of the CDMA system is suboptimal; i.e., it achieves a point strictly in the interior of the capacity region.

- In contrast to CDMA, the successive cancellation receiver is a multiuser receiver: only one of the users (say, user 1) is decoded treating user 2 as interference, but user 2 is decoded with the benefit of the signal of user 1 having already been removed.

In the successive cancellation receiver case,

R1 = B log(1 + P1/σ²),            R2 = B log(1 + P2/(P1 + σ²))

or

R1 = B log(1 + P1/(P2 + σ²)),     R2 = B log(1 + P2/σ²)

In the CDMA receiver case,

R1 = B log(1 + P1/(P2 + σ²)),     R2 = B log(1 + P2/(P1 + σ²))

That is, one of the rates in the CDMA case is always lower than in the case of
successive cancellation!
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 164

NTS

Correspondingly, in the successive cancellation receiver case,

Csum = B log(1 + (P1 + P2)/σ²)

In the CDMA receiver case, the sum rate is

B log(1 + P1/(P2 + σ²)) + B log(1 + P2/(P1 + σ²))
  = B log[(1 + P1/(P2 + σ²))(1 + P2/(P1 + σ²))]
  = B log(1 + (P1 + P2)/σ² − P1 P2 (P1 + P2 + σ²)/(σ²(P1 + σ²)(P2 + σ²)))
  < Csum = B log(1 + (P1 + P2)/σ²)
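A quick numerical check of this inequality (a minimal Python sketch; the power values are arbitrary example numbers):

```python
import math

P1, P2, sigma2 = 4.0, 1.0, 1.0

# Sum rate with successive interference cancellation (per unit bandwidth).
sic_sum = math.log2(1 + (P1 + P2) / sigma2)

# Sum rate when each user is decoded treating the other as noise (CDMA).
cdma_sum = (math.log2(1 + P1 / (P2 + sigma2))
            + math.log2(1 + P2 / (P1 + sigma2)))

print(sic_sum, cdma_sum)          # cdma_sum is strictly smaller
assert cdma_sum < sic_sum
```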
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 165

NTS

K -user multiple-access Gaussian channel

Y(i) = Σ_{k=1}^{K} Xk(i) + Z(i),     Z(i) ~ N_C(0, σ²)

Similar to the two-user case, in the case of K users, all of them share the same
bandwidth, and there is a tradeoff between the rates Rk (k = 1, 2, ... , K ). If one
(or more) users want to communicate at higher rate(s), then the other user(s)
may need to lower their rate(s).

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 166

NTS

In the K-user case, we can define the capacity region C as the set of all
(R1, R2, ..., RK) such that users 1, 2, ..., K can simultaneously reliably
communicate at rates R1, R2, ..., RK, respectively.
This capacity region is described by the 2^K − 1 constraints:

Rk < B log(1 + Pk/σ²),                          k = 1, ..., K
Rk + Ri < B log(1 + (Pk + Pi)/σ²),              k ≠ i
Rk + Ri + Rl < B log(1 + (Pk + Pi + Pl)/σ²),    k, i, l distinct
...
Σ_{k=1}^{K} Rk < B log(1 + Σ_{k=1}^{K} Pk / σ²)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 167

NTS

K -user multiple-access Gaussian channel

The K-user capacity region can be written in a short form as

Σ_{k∈S} Rk < B log(1 + Σ_{k∈S} Pk / σ²)     for all S ⊆ {1, ..., K}

The right-hand side,

B log(1 + Σ_{k∈S} Pk / σ²)

is the maximum sum rate that can be achieved by a single transmitter with the
total power of the users in S and with no other users in the system.
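The region is easy to test numerically. Below is a minimal sketch (Python; the helper name `in_mac_region` and the use of itertools are my own illustrative choices, not from the slides) that checks all 2^K − 1 subset constraints for a candidate rate vector:

```python
import math
from itertools import combinations

def in_mac_region(R, P, sigma2, B=1.0):
    """Check whether the rate vector R lies in the K-user Gaussian MAC
    capacity region for powers P and noise power sigma2 (base-2 rates)."""
    K = len(R)
    for size in range(1, K + 1):
        for S in combinations(range(K), size):
            rate_sum = sum(R[k] for k in S)
            bound = B * math.log2(1 + sum(P[k] for k in S) / sigma2)
            if rate_sum >= bound:        # strict inequality required
                return False
    return True

# Example: two users with P = [4, 1] and sigma2 = 1.
print(in_mac_region([2.0, 0.2], [4.0, 1.0], 1.0))   # True (inside the pentagon)
print(in_mac_region([2.3, 0.4], [4.0, 1.0], 1.0))   # False (sum-rate bound violated)
```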

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 168

NTS

The sum capacity can be defined as

Csum = max_{(R1,...,RK) ∈ C}  Σ_{k=1}^{K} Rk

It can be shown that

Csum = B log(1 + Σ_{k=1}^{K} Pk / σ²)

and that there are exactly K! corner points in the capacity region, each one
corresponding to a different successive cancellation order among the users.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 169

NTS

In the equal power case (P1 = P2 = ... = PK = P),

Csum = B log(1 + KP/σ²)

Observe that the sum capacity is unbounded as the number of users grows. In
contrast, with the conventional CDMA receiver (decoding each user treating all the
other users as noise), the sum rate will be only

B K log(1 + P/((K − 1)P + σ²))

which approaches

(B K P / ((K − 1)P + σ²)) log e  ≃  B log e

as K → ∞. The growing interference is the limiting factor here!
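A short sketch illustrating this saturation effect (Python; equal powers, arbitrary example SNR):

```python
import math

P, sigma2, B = 1.0, 1.0, 1.0

for K in (1, 2, 4, 8, 16, 64, 256):
    c_sum = B * math.log2(1 + K * P / sigma2)                    # SIC / optimal
    cdma = B * K * math.log2(1 + P / ((K - 1) * P + sigma2))     # single-user receivers
    print(K, round(c_sum, 2), round(cdma, 2))
# c_sum grows like log2(K); cdma saturates near B*log2(e) ~ 1.44*B
```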
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 170

NTS

The symmetric capacity can be defined as

Csym = max { R : (R, R, ..., R) ∈ C }

It can be shown that in the equal power case (P1 = P2 = ... = PK = P),

Csym = (B/K) log(1 + KP/σ²)

This rate for each user can be obtained by orthogonal multiplexing, where each
user is allocated a fraction 1/K of the total degrees of freedom (for example, of
the total bandwidth B).
Note that Csym = Csum/K.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 171

NTS

Broadcast channels

Two-user broadcast AWGN channel:

Yk(i) = hk X(i) + Zk(i),     k = 1, 2;     Zk(i) ~ N_C(0, σ²)

where hk is the fixed complex channel gain corresponding to the kth user.
The broadcast case is often referred to as the downlink.
Transmit power constraint: the average power of the transmit signal is P.
As in the multiple-access (uplink) channel case, we can define the capacity region C
as the region of rates (R1, R2) at which both users can simultaneously reliably
communicate.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 172

NTS

Broadcast channels

We have just two single-user bounds:

Rk < B log(1 + P|hk|²/σ²),     k = 1, 2

For any k, this upper bound on Rk can be attained by using all the transmit
power to communicate to user k (with the rate of the remaining user being zero).
Thus, we have two extreme points:

R1 = B log(1 + P|h1|²/σ²),  R2 = 0
R2 = B log(1 + P|h2|²/σ²),  R1 = 0
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 173

NTS

Rate region in the symmetric case |h1 | = |h2 |

Further, we can share the degrees of freedom (time and bandwidth) between the
users in an orthogonal manner to obtain any rate pair on the line joining these two
extreme points.
Hence, for the symmetric case of |h1| = |h2|, the capacity region is a triangle.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 174

NTS

Rate region in the symmetric case |h1 | = |h2 |

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 175

NTS

In the symmetric case |h1| = |h2| = |h|, the sum rate can be shown to be bounded
by the single-user capacity:

R1 + R2 < B log(1 + P|h|²/σ²)

The latter conclusion follows from the triangular form of the capacity region.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 176

NTS

As has already been mentioned, the rate pairs in the capacity region can be
achieved by sharing the degrees of freedom (bandwidth and time) between the
two users. What are the alternative ways to achieve the boundary of the capacity
region?
The structure of the channel suggests an alternative natural approach:

- Let the channel of user 2 be stronger than that of user 1 (|h1| < |h2|).
  Thus, if user 1 can successfully decode its data from Y1, then user 2 (which
  has a higher SNR) should also be able to decode the data of user 1 from Y2.
  Then, user 2 can subtract the data of user 1 from its received signal Y2 to
  better decode its own data; i.e., it can perform successive interference
  cancellation.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 177

NTS

Consider the following transmission strategy that superposes the signals of two
users, much like in a spread-spectrum CDMA system. The transmitted signal is
the sum of two signals:
X (i) = X1 (i) + X2 (i)
where Xk (i) is the signal intended for user k.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 178

NTS

Superposition coding

The weaker user 1 decodes its own signal by treating the signal of user 2 as noise.
The stronger user 2 performs successive interference cancellation: it first decodes
the data of user 1 by treating X2 as noise, subtracts the so-obtained signal of
user 1 from Y2, and then extracts its own data. As a result, for any power split
P = P1 + P2, the following rate pair can be achieved:

R1 = B log(1 + P1|h1|²/(P2|h1|² + σ²))
R2 = B log(1 + P2|h2|²/σ²)

This strategy is commonly referred to as superposition coding.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 179

NTS

Orthogonal scheme

On the other hand, in orthogonal schemes, for any power split P = P1 + P2 and
degree-of-freedom split α ∈ [0, 1], the following rates are jointly achieved:

R1 = α B log(1 + P1|h1|²/(α σ²))
R2 = (1 − α) B log(1 + P2|h2|²/((1 − α) σ²))

Here, α can be interpreted, for example, as the fraction of the bandwidth assigned
to user 1 (both the bandwidth B and the in-band noise power are then scaled by α).
Alternatively, α can be interpreted as the fraction of time assigned to user 1
(user 1 transmits only during a fraction α of the time, during which its
instantaneous power can be raised to P1/α while keeping the average power P1).
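For intuition, the following sketch (Python; the channel gains, power, and the simple coupling of the power split to the degree-of-freedom split in the orthogonal case are my own illustrative assumptions) traces boundary rate pairs of both schemes by sweeping the splits:

```python
import math

def superposition_boundary(P, h1, h2, sigma2, B=1.0, steps=101):
    """Rate pairs achieved by superposition coding for power splits P1+P2=P,
    assuming |h1| <= |h2| (user 1 is the weaker user)."""
    pairs = []
    for i in range(steps):
        P1 = P * i / (steps - 1)
        P2 = P - P1
        R1 = B * math.log2(1 + P1 * abs(h1)**2 / (P2 * abs(h1)**2 + sigma2))
        R2 = B * math.log2(1 + P2 * abs(h2)**2 / sigma2)
        pairs.append((R1, R2))
    return pairs

def orthogonal_boundary(P, h1, h2, sigma2, B=1.0, steps=101):
    """Rate pairs for orthogonal sharing; here the power split equals the
    degree-of-freedom fraction alpha (one simple, not necessarily optimal, choice)."""
    pairs = []
    for i in range(1, steps - 1):
        a = i / (steps - 1)
        P1, P2 = a * P, (1 - a) * P
        R1 = a * B * math.log2(1 + P1 * abs(h1)**2 / (a * sigma2))
        R2 = (1 - a) * B * math.log2(1 + P2 * abs(h2)**2 / ((1 - a) * sigma2))
        pairs.append((R1, R2))
    return pairs

# Example: user 2 has a 10 dB stronger channel.
sp = superposition_boundary(10.0, 1.0, math.sqrt(10.0), 1.0)
orth = orthogonal_boundary(10.0, 1.0, math.sqrt(10.0), 1.0)
```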
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 180

NTS

Rate region in the symmetric case


|h1| = |h2| = |h|

Assume that superposition coding is used and that the power is split such that
P = P1 + P2. In this case, if user 1 can decode its data treating the data of user 2
as noise, then user 2 can also decode the data of user 1, subtract it from its
received signal, and then decode its own data. Hence, the following rate pairs are
supported:

R1 ≤ B log(1 + P1|h1|²/(P2|h1|² + σ²))
   = B log(1 + (P1 + P2)|h1|²/σ²) − B log(1 + P2|h1|²/σ²)

R2 ≤ B log(1 + P2|h2|²/σ²)

Thus, for |h1| = |h2| = |h| and the power split P = P1 + P2, the sum rate satisfies

R1 + R2 ≤ B log(1 + P|h|²/σ²)

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 181

NTS

Rate region in the general case


|h1| ≤ |h2|

Solid line: optimal power split using superposition coding.


Dashed line: optimal degrees of freedom split using orthogonal coding.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 182

NTS

In the K-user broadcast case, the boundary of the capacity region can be proved
to be given by

Rk = B log(1 + Pk|hk|² / (σ² + (Σ_{l=k+1}^{K} Pl)|hk|²)),     k = 1, ..., K

for all possible power splits P = Σ_{k=1}^{K} Pk of the total power at the base
station (here the users are assumed to be ordered such that |h1| ≤ |h2| ≤ ... ≤ |hK|).

The optimal points are achieved by superposition coding and successive
interference cancellation at the receivers. The cancellation order at every receiver
should always be to decode the weaker users before decoding its own data.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 183

NTS

Fading channels

Until now, all multi-user channels have been considered without random channel
fading.
Let us now include fading in the signal model. The availability of channel state
information becomes a critical issue in such cases.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 184

NTS

Multiple-access fading channels

K-user multiple-access fading channel:

Y(i) = Σ_{k=1}^{K} hk(i) Xk(i) + Z(i)

where {hk(i)} is the random fading process of user k.

We assume that

E{|hk(i)|²} = 1,     k = 1, ..., K

and that the fading processes of different users are i.i.d.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 185

NTS

Slow fading

The time-scale of communication is short relative to the channel coherence time
of all users. Hence, hk(i) = hk for all k.
Suppose all users transmit at the rate R. Conditioned on each realization of
h1, ..., hK, we have the standard multiple-access AWGN channel with received
SNR of user k equal to |hk|²P/σ². If the symmetric capacity is less than R, then
this results in an outage. Using the expressions for the K-user capacity region, the
outage probability can be written as

p_out = Pr{ B log(1 + SNR Σ_{k∈S} |hk|²) < |S| R   for some S ⊆ {1, ..., K} }

where |S| denotes the cardinality of S and SNR = P/σ².
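This outage probability is straightforward to estimate by Monte Carlo simulation. A minimal sketch (Python; the Rayleigh-fading assumption hk ~ CN(0,1) and the example parameters are my own choices):

```python
import math
import random
from itertools import combinations

def outage_prob(K, R, snr, B=1.0, trials=20000):
    """Monte Carlo estimate of the slow-fading uplink outage probability,
    assuming i.i.d. Rayleigh fading with E|hk|^2 = 1."""
    outages = 0
    for _ in range(trials):
        g = [random.expovariate(1.0) for _ in range(K)]   # |hk|^2 ~ Exp(1)
        bad = False
        for size in range(1, K + 1):
            for S in combinations(range(K), size):
                if B * math.log2(1 + snr * sum(g[k] for k in S)) < size * R:
                    bad = True
                    break
            if bad:
                break
        outages += bad
    return outages / trials

print(outage_prob(K=2, R=1.0, snr=10.0))
```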

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 186

NTS

Fast fading

Each hk(i) is modelled as a time-varying ergodic process.

The sum capacity in the fast fading case:

Csum = E{ B log(1 + Σ_{k=1}^{K} |hk|² P / σ²) }

How does this compare to the sum capacity of the uplink channel without fading?
Let us use Jensen's inequality, which basically says that

E{f(X)} ≤ f(E{X})

for any concave function f(·) and random variable X.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 187

NTS

Using this inequality, we obtain that

Csum = E{ B log(1 + Σ_{k=1}^{K} |hk|² P / σ²) }
     ≤ B log(1 + E{ Σ_{k=1}^{K} |hk|² } P / σ²)
     = B log(1 + KP/σ²)

where the property E{|hk(i)|²} = 1 (k = 1, ..., K) has been used in the last line.
The last expression can be identified as the sum capacity of the AWGN
multiple-access channel. Hence, without channel state information at the
transmitter, fading can only hurt.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 188

NTS

However, if the number of users K becomes large, then

Σ_{k=1}^{K} |hk|² ≈ K

and the penalty due to fading vanishes. Basically, the effect of fading is averaged
over a large number of users.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 189

NTS

Let us now assume that we have full (possibly also non-causal) channel state
information at both the transmitter and receiver sides.
Block-fading model:

Y(i) = Σ_{k=1}^{K} hk(i) Xk(i) + Z(i)

where hk(i) = hk,l remains constant over the lth coherence period of Tc
(Tc >> 1) symbols and is i.i.d. across different coherence periods.

The channel over L such coherence periods can be viewed as a set of L parallel
sub-channels which fade independently. Therefore, we can again apply the
water-filling philosophy.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 190

NTS

For a given realization of the channel gains hk,l (k = 1, ..., K; l = 1, ..., L), the
sum capacity is given by

max_{Pk,l}  (B/L) Σ_{l=1}^{L} log(1 + Σ_{k=1}^{K} Pk,l |hk,l|² / σ²)

subject to Pk,l ≥ 0 (k = 1, ..., K; l = 1, ..., L) and the average power constraint

(1/L) Σ_{l=1}^{L} Pk,l = P,     k = 1, ..., K

The solution of this optimization problem as L → ∞ yields the appropriate power
allocation policy.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 191

NTS

This leads to a variable-rate scheme: in each lth sub-channel, the rates that are
dictated by the above optimization problem are used.
Optimal strategy: The sum rate in the lth sub-channel,

B log(1 + Σ_{k=1}^{K} Pk,l |hk,l|² / σ²)

for a given total power Σ_{k=1}^{K} Pk,l allocated to this sub-channel, is maximized
by giving all this power to the user with the strongest channel gain. That is, at
each time only the single user with the best channel is allowed to transmit. Under
this strategy, the multiuser channel at each time l reduces to a point-to-point
channel with the channel gain

max_{k=1,...,K} |hk,l|²
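A compact way to see this strategy at work is the sketch below (Python; Rayleigh fading and the helper names are my own illustrative choices): in every coherence period only the strongest user is selected, and the total power budget is then water-filled over the resulting point-to-point gains.

```python
import random

def opportunistic_allocation(gains, P, sigma2):
    """gains[l][k] = |h_{k,l}|^2. Pick the best user per period, then water-fill
    the average power budget P over the selected gains (bisection on the level)."""
    best = [max(g) for g in gains]                 # effective gain per period
    L = len(best)
    lo, hi = 0.0, P + sigma2 / min(best) + 1.0     # bracket for the water level
    for _ in range(60):                            # bisection
        mu = (lo + hi) / 2
        power = sum(max(mu - sigma2 / b, 0.0) for b in best) / L
        lo, hi = (mu, hi) if power < P else (lo, mu)
    return [max(mu - sigma2 / b, 0.0) for b in best]

gains = [[random.expovariate(1.0) for _ in range(4)] for _ in range(10)]
alloc = opportunistic_allocation(gains, P=1.0, sigma2=1.0)
print([round(p, 2) for p in alloc])   # more power on periods with stronger users
```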

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 192

NTS

Broadcast fading channels

K-user downlink fading channel:

Yk(i) = hk(i) X(i) + Zk(i),     k = 1, ..., K

where {hk(i)} is the random fading process of user k.

Similar to the uplink case, we assume that

E{|hk(i)|²} = 1,     k = 1, ..., K

and that the fading processes of different users are i.i.d.

The transmit power is constrained to be equal to P.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 193

NTS

Let us first consider the case when the channel state information is available only
at the receiver.
We have the following single-user bounds:

Rk < B E{ log(1 + P|h|²/σ²) },     k = 1, ..., K

where h is a random channel gain.
For any k, this upper bound on Rk can be attained by using all the transmit power
to communicate to user k (with the rates of the remaining users being zero). Thus,
as in the non-fading case, we have K extreme points of the capacity region.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 194

NTS

Similar to the non-fading case, it can be shown that the sum rate is also bounded
by the same quantity:

Σ_{k=1}^{K} Rk < B E{ log(1 + P|h|²/σ²) }

This bound can be achieved by transmitting only to one user or by time-sharing
between any number of users.
It can be shown that the rate pairs in the capacity region can be achieved by both
orthogonal schemes and superposition coding.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 195

NTS

Let us now consider the case when the channel state information is available both
at the transmitter and receiver.
Let us focus on the sum capacity. As in the uplink case, it can be shown that the
sum capacity is achieved by transmitting only to the best user at each time. Under
this strategy, the downlink channel reduces to a point-to-point channel with the
channel gain

max_{k=1,...,K} |hk|²

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 196

NTS

Multiuser diversity

We have seen that, in the full channel state information case and from the sum
capacity perspective, the optimal strategy both in the uplink and the downlink
reduces the multiuser case to a single-user (point-to-point) case with the fading
magnitude max_k |hk(i)|. Compared to a system with a single user, the multiuser
diversity gain comes from:

- the increase of the total transmit power in the uplink case;

- the improvement of the effective channel gain at time i from |hk(i)|² to
  max_{k=1,...,K} |hk(i)|².

The second effect appears entirely due to the ability to dynamically schedule
resources among the users as a function of the channel state.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 197

NTS

Remarks

The multiuser diversity gain comes from the following effect: when many
users fade independently, at any time there is a high probability that one of
them has a strong channel. By allowing only that user to transmit or, vice
versa, transmitting only to that user, the shared channel resource is used in
the most efficient manner, and the total throughput is maximized.

The larger the number of users, the higher is the multiuser diversity gain.

The amount of multiuser diversity gain depends critically on the tail of the
distribution of |hk |2 : the heavier the tail, the more likely there is a user with
the strong channel, and the larger the multiuser diversity gain.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 198

NTS

System requirements to extract the multiuser


diversity benefits

The base station has to have access to the channel quality of each user:

- in the downlink, each user has to track its own channel SNR and feed the
  channel quality back to the base station;
- in the uplink, the base station has to track the user channel qualities (user SNRs).

The base station has to schedule transmissions among the users as well as to
adapt the data rate as a function of the instantaneous channel quality.

Such a scheduling procedure is often called opportunistic scheduling.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 199

NTS

Fairness and delay

In reality, the fading statistics of different users may be non-symmetric: there
are some users who are closer to the base station and thus have a higher
average SNR, and there are users that are stationary (non-moving) or have no
scatterers around them.

The multiuser diversity strategy is only concerned with maximizing long-term
average throughputs. In practice, there are latency requirements, that is, the
average throughput over a finite delay window is the performance metric of
interest.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 200

NTS

Channel measurement and feedback

All scheduling decisions are done as a function of user channel states. Hence,
the quality of channel estimation is a primary issue, and feedback from the
users to the base station is needed in the downlink case.

Both the error in channel measurement and the delay/error in feeding the
channel state back are significant bottlenecks of practical applications of the
multiuser diversity strategy.

Slow or limited fading:


- We have observed that the use of the multiuser diversity strategy requires the
  fading to be rich and fast. It is therefore not useful in line-of-sight scenarios
  or in cases with little scattering or slowly changing environments.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 201

NTS

Proportional fair downlink scheduling

The scheduler keeps track of the average throughput Tk(i) (k = 1, ..., K) of each
user over some (e.g., exponentially weighted) time-window of length tW.

In the ith time-slot, the base station receives the requested/supportable rates
Rk(i) (k = 1, ..., K) from all users, and transmits to the user k* with the
largest ratio

Rk(i)/Tk(i)

The average throughputs are updated as:

Tk(i + 1) = (1 − 1/tW) Tk(i) + Rk(i)/tW,    k = k*
Tk(i + 1) = (1 − 1/tW) Tk(i),               k ≠ k*

This algorithm is used in the downlink mode of the 3G system IS-856.
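A minimal simulation sketch of this update rule (Python; the Rayleigh-fading rate model and all parameter values are illustrative assumptions, not part of the slides):

```python
import math
import random

def proportional_fair(K=4, slots=10000, t_w=100.0, snr=10.0):
    """Simulate proportional fair scheduling with i.i.d. Rayleigh fading."""
    T = [1e-3] * K                     # average throughputs (avoid divide-by-zero)
    served = [0] * K
    for _ in range(slots):
        # Supportable rate of each user in this slot.
        R = [math.log2(1 + snr * random.expovariate(1.0)) for _ in range(K)]
        k_star = max(range(K), key=lambda k: R[k] / T[k])
        for k in range(K):
            T[k] = (1 - 1 / t_w) * T[k] + (R[k] / t_w if k == k_star else 0.0)
        served[k_star] += 1
    return T, served

T, served = proportional_fair()
print([round(t, 2) for t in T], served)   # comparable long-term throughputs
```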


19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 202

NTS

Combination of multiuser diversity and superposition coding

Divide the users into several classes (say, into two classes depending on whether
they are near the base station or near the cell edge). Then, the users in each
class have statistically comparable channel strengths.

The users whose current channels are instantaneously strongest within their own
class are scheduled for simultaneous transmission using superposition coding.
Users of the stronger classes (e.g., nearby users) receive less power, but still enjoy
very good rates while minimally affecting the performance of the weaker classes of
users.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 203

NTS

ADVANCES IN CHANNEL CODING

We have already discussed linear block channel codes in Information Theory I.
Now, we will discuss cyclic codes as well as convolutional codes.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 204

NTS

Cyclic codes

An important subclass of linear block codes.


Consider an n-tuple
c = [c0, c1, ..., c_{n−1}]
Cyclically shifting the components of c once, we have
c^(1) = [c_{n−1}, c0, ..., c_{n−2}]
Applying i subsequent cyclic shifts, we have
c^(i) = [c_{n−i}, c_{n−i+1}, ..., c_{n−1}, c0, c1, ..., c_{n−i−1}]

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 205

NTS

Definition: cyclic codes

An (n, k) linear block code C is called a cyclic code if every cyclic shift of any
codeword in C is also a codeword in C .
Properties:
I Linearity: the sum of any two codewords is also a codeword;
I Cyclic property: Any cyclic shift of any codeword is also a codeword.
To develop the theory of cyclic codes, let us treat the components of the
codeword c as the coefficients of the following polynomial:

c(X) = c0 + c1 X + ... + c_{n−1} X^{n−1}

where X is an indeterminate.
The fact that all ci are binary is taken into account by using binary arithmetic
for all polynomial coefficients when operating with polynomials.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 206

NTS

Cyclic codes

There is a one-to-one correspondence between the vector c and the polynomial
c(X). We will call c(X) the code polynomial of c.
Each power of X in the polynomial c(X) represents a one-bit shift in time. Hence,
multiplication of c(X) by X may be viewed as a shift to the right.
Key question: how to make such a shift cyclic?
Let c(X) be multiplied by X^i, yielding

X^i c(X) = X^i (c0 + c1 X + ... + c_{n−i−1} X^{n−i−1} + c_{n−i} X^{n−i} + ... + c_{n−1} X^{n−1})
         = c0 X^i + c1 X^{i+1} + ... + c_{n−i−1} X^{n−1} + c_{n−i} X^n + ... + c_{n−1} X^{n+i−1}
         = c_{n−i} X^n + ... + c_{n−1} X^{n+i−1} + c0 X^i + c1 X^{i+1} + ... + c_{n−i−1} X^{n−1}

where, in the last line, we have just rearranged the terms.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 207

NTS

Cyclic codes

Recognizing, for example, that c_{n−i} + c_{n−i} = 0 in modulo-2 arithmetic, we
can manipulate the first i terms as follows:

X^i c(X) = c_{n−i} + ... + c_{n−1} X^{i−1} + c0 X^i + c1 X^{i+1} + ... + c_{n−i−1} X^{n−1}
           + c_{n−i} (X^n + 1) + ... + c_{n−1} X^{i−1} (X^n + 1)

Defining

c^(i)(X) := c_{n−i} + ... + c_{n−1} X^{i−1} + c0 X^i + c1 X^{i+1} + ... + c_{n−i−1} X^{n−1}
q(X)     := c_{n−i} + c_{n−i+1} X + ... + c_{n−1} X^{i−1}

we can rewrite the first equation on this page in the following compact form:

X^i c(X) = q(X)(X^n + 1) + c^(i)(X)
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 208

NTS

Cyclic codes

The polynomial c^(i)(X) can be recognized as the code polynomial of the codeword
c^(i) obtained by applying i cyclic shifts to the codeword c.
Moreover, from the latter equation, we readily see that c^(i)(X) is the remainder
that results from dividing X^i c(X) by (X^n + 1).
Hence, we may formally state the cyclic property in polynomial notation as
follows: if c(X) is a code polynomial, then the polynomial

c^(i)(X) = X^i c(X) mod (X^n + 1)

is also a code polynomial for any cyclic shift i, where mod (X^n + 1) stands for
taking the remainder after division by (X^n + 1).
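This correspondence between cyclic shifts and polynomial reduction is easy to verify numerically. A small sketch (Python; polynomials are represented as lists of GF(2) coefficients, lowest degree first, which is my own representation choice):

```python
def poly_mod(a, m):
    """Remainder of the GF(2) polynomial a modulo m (coefficient lists, low degree first)."""
    a = a[:]
    while len(a) >= len(m) and any(a):
        if a[-1]:                                    # leading coefficient is 1
            shift = len(a) - len(m)
            for i, mi in enumerate(m):
                a[shift + i] ^= mi
        a.pop()                                      # drop the (now zero) leading term
    return a

def cyclic_shift_poly(c, i):
    """Compute X^i * c(X) mod (X^n + 1) and return the n coefficients."""
    n = len(c)
    shifted = [0] * i + c                            # multiply by X^i
    modulus = [1] + [0] * (n - 1) + [1]              # X^n + 1
    r = poly_mod(shifted, modulus)
    return (r + [0] * n)[:n]

c = [0, 1, 1, 1, 0, 0, 1]                            # codeword of the (7,4) example below
print(cyclic_shift_poly(c, 1))                       # [1, 0, 1, 1, 1, 0, 0], the cyclic shift
```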

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 209

NTS

Cyclic codes

Note that n cyclic shifts of any codeword do not change it, which means that
X^n = 1, and hence X^n + 1 = 0, in modulo-(X^n + 1) arithmetic!
Generator polynomial: a polynomial g(X) of minimal degree that completely
specifies the code and is a factor of X^n + 1. The degree of g(X) is equal to the
number of parity-check bits of the code, n − k.
It can be shown that any cyclic code is uniquely determined by its generator
polynomial, in the sense that each code polynomial in the code can be expressed
as a polynomial product

c(X) = a(X) g(X)

where a(X) is a polynomial of degree at most k − 1.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 210

NTS

Cyclic codes

Given the generator polynomial g(X), we want to encode the message
[m0, ..., m_{k−1}] in (n, k) systematic form. The codeword structure is

[b0, b1, ..., b_{n−k−1}, m0, m1, ..., m_{k−1}]

Define the message-bit and parity-bit polynomials as

m(X) := m0 + m1 X + ... + m_{k−1} X^{k−1}
b(X) := b0 + b1 X + ... + b_{n−k−1} X^{n−k−1}

We want the code polynomial to be of the form c(X) = b(X) + X^{n−k} m(X).
This means that b0, ..., b_{n−k−1} occupy the first n − k positions of each codeword,
whereas the message bits start from the (n − k + 1)st position.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 211

NTS

Cyclic codes

Using the equation c(X) = a(X)g(X) yields

a(X) g(X) = b(X) + X^{n−k} m(X)

Equivalently,

X^{n−k} m(X) / g(X) = a(X) + b(X)/g(X)

which means that b(X) is the remainder left over after dividing X^{n−k} m(X) by
g(X).

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 212

NTS

Example: A (7,4) cyclic code

We start with the polynomial X^7 + 1 and factorize it into three irreducible
polynomials as

X^7 + 1 = (1 + X)(1 + X^2 + X^3)(1 + X + X^3)

where by an irreducible polynomial we mean a polynomial that cannot be factored
using only polynomials with binary coefficients.
Let us take

g(X) = 1 + X + X^3

as the generator polynomial, whose degree is equal to the number of parity bits.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 213

NTS

Example: A (7,4) cyclic code

We can also define a parity-check polynomial

h(X) = 1 + Σ_{i=1}^{k−1} hi X^i + X^k

such that g(X) h(X) = X^n + 1

or, equivalently, g(X) h(X) mod (X^n + 1) = 0.
For our example, the parity-check polynomial is

h(X) = 1 + X + X^2 + X^4

so that h(X) g(X) = (1 + X + X^2 + X^4)(1 + X + X^3) = X^7 + 1.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 214

NTS

Example: A (7,4) cyclic code

How to encode, for example, the message sequence 1001?

The corresponding message polynomial is

m(X) = 1 + X^3

Multiplying m(X) by X^{n−k} = X^3, we have

X^{n−k} m(X) = X^3 + X^6

Dividing X^{n−k} m(X) by g(X), we have

(X^3 + X^6)/(1 + X + X^3) = X + X^3 + (X + X^2)/(1 + X + X^3)
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 215

NTS

Example: A (7,4) cyclic code

That is,

a(X) = X + X^3,     b(X) = X + X^2

and the encoded message is

c(X) = b(X) + X^{n−k} m(X)
     = X + X^2 + X^3 (1 + X^3)
     = X + X^2 + X^3 + X^6

or, alternatively,

c = [0111001]
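The same computation can be carried out in a few lines of code. A minimal sketch (Python; it reuses the GF(2) coefficient-list representation and repeats the `poly_mod` helper from the earlier sketch so that it is self-contained):

```python
def poly_mod(a, m):
    """Remainder of the GF(2) polynomial a modulo m (coefficient lists, low degree first)."""
    a = a[:]
    while len(a) >= len(m) and any(a):
        if a[-1]:
            shift = len(a) - len(m)
            for i, mi in enumerate(m):
                a[shift + i] ^= mi
        a.pop()
    return a

def cyclic_encode(msg, g, n):
    """Systematic (n, k) cyclic encoding: codeword = [parity bits, message bits]."""
    k = len(msg)
    shifted = [0] * (n - k) + msg                 # X^{n-k} m(X)
    b = poly_mod(shifted, g)                      # parity polynomial b(X)
    b = (b + [0] * (n - k))[:n - k]               # pad to n-k coefficients
    return b + msg

g = [1, 1, 0, 1]                                  # g(X) = 1 + X + X^3
print(cyclic_encode([1, 0, 0, 1], g, n=7))        # -> [0, 1, 1, 1, 0, 0, 1]
```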

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 216

NTS

Relationship to conventional linear block codes

For the considered (7, 4) code, we can construct the generator matrix from the
generator polynomial by using

g(X)     = 1 + X + X^3
X g(X)   = X + X^2 + X^4
X^2 g(X) = X^2 + X^3 + X^5
X^3 g(X) = X^3 + X^4 + X^6

as the rows of the 4 x 7 generator matrix

    [ 1 1 0 1 0 0 0 ]
G = [ 0 1 1 0 1 0 0 ]
    [ 0 0 1 1 0 1 0 ]
    [ 0 0 0 1 1 0 1 ]

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 217

NTS

Relationship to conventional linear block codes

Clearly, the latter generator matrix is in non-systematic form. We can put it into
systematic form by manipulating its rows, that is, by adding the first row to the
third row and adding the sum of the first two rows to the fourth row. Then, we get

    [ 1 1 0 1 0 0 0 ]
G = [ 0 1 1 0 1 0 0 ]
    [ 1 1 1 0 0 1 0 ]
    [ 1 0 1 0 0 0 1 ]

Decoding of cyclic codes can be done in the same way as for any other linear block
code, e.g., using the syndrome.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 218

NTS

Popular cyclic codes are the so-called cyclic redundancy check (CRC) codes,
Bose-Chaudhuri-Hocquenghem (BCH) codes, and non-binary Reed-Solomon (RS)
codes. They are part of various international communication standards, e.g., the
digital subscriber line (DSL) standards.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 219

NTS

Convolutional codes

One of the most powerful classes of linear codes.

Similar to linear block codes, the encoder of a convolutional code accepts k-bit
message blocks and produces an encoded sequence of n-bit blocks. However, each
encoded block depends not only on the corresponding k-bit message block, but
also on the M previous message blocks.
Such an encoder is said to have a memory order of M.
The ratio

R = k/n

is called the code rate.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 220

NTS

Convolutional codes

The message sequence m = [m0, m1, m2, ...] enters the encoder one bit at a time.
The encoder output sequences are obtained as the convolution of the input
sequence with the encoder generator sequences. For an encoder with memory
order M, the length of these generator sequences is M + 1. For example, in the
case of two generator sequences

g^(0) = [g0^(0), ..., gM^(0)],     g^(1) = [g0^(1), ..., gM^(1)]

we can write the encoding equations

c^(0) = m * g^(0),     c^(1) = m * g^(1)

where * denotes discrete convolution and all operations are modulo-2.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 221

NTS

Convolutional codes

The convolution operation implies that

c_l^(j) = Σ_{i=0}^{M} m_{l−i} g_i^(j),     j = 0, 1

where m_{l−i} = 0 for l < i.

After encoding, the output sequences are multiplexed into a single sequence called
the codeword

c = [c0^(0), c0^(1), c1^(0), c1^(1), ...]

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 222

NTS

Convolutional codes

Defining the matrix

    [ g0^(0) g0^(1)   g1^(0) g1^(1)   ...   gM^(0) gM^(1)                              ]
G = [                 g0^(0) g0^(1)   g1^(0) g1^(1)   ...   gM^(0) gM^(1)              ]
    [                                 ...                                              ]

where all blank areas are zeros, we can rewrite the encoding equations in matrix
form as

c = m G

This form of the equation is equivalent to that of linear block codes! Therefore,
we call G the generator matrix of the code.
In the case of a semi-infinite message sequence, the matrix G is semi-infinite as
well. However, if m has finite length, then G is finite as well.
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 223

NTS

Example: R = 1/2 code

With the generator sequences

g^(0) = [1011],     g^(1) = [1111]

let the message sequence be

m = [10111]

The encoding equations yield

c^(0) = [10111] * [1011] = [10000001]
c^(1) = [10111] * [1111] = [11011101]

and, hence, the 2(k + M)-bit codeword

c = [11 01 00 01 01 01 00 11]
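This encoding is a direct modulo-2 convolution and is easy to reproduce. A minimal sketch (Python):

```python
def conv_encode(m, generators):
    """Encode the bit list m with a rate-1/n convolutional code given by the
    generator sequences (modulo-2 convolution, outputs multiplexed)."""
    M = len(generators[0]) - 1                    # memory order
    n_out = len(m) + M                            # length of each output sequence
    streams = []
    for g in generators:
        c = [0] * n_out
        for l in range(n_out):
            for i, gi in enumerate(g):
                if gi and 0 <= l - i < len(m):
                    c[l] ^= m[l - i]
        streams.append(c)
    # Multiplex: c0^(0), c0^(1), c1^(0), c1^(1), ...
    return [streams[j][l] for l in range(n_out) for j in range(len(generators))]

m = [1, 0, 1, 1, 1]
code = conv_encode(m, [[1, 0, 1, 1], [1, 1, 1, 1]])
print(code)    # [1,1, 0,1, 0,0, 0,1, 0,1, 0,1, 0,0, 1,1]
```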
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 224

NTS

Example: R = 1/2 code

Alternatively, we can write the k x 2(k + M) generator matrix as

    [ 11 01 11 11                ]
    [    11 01 11 11             ]
G = [       11 01 11 11          ]
    [          11 01 11 11       ]
    [             11 01 11 11    ]

and obtain the same codeword as

c = [10111] G = [11 01 00 01 01 01 00 11]

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 225


NTS

Code tree and trellis

Let us discuss the concepts of code tree and trellis using a particular example of
the R = 1/2 convolutional code with M = 2 and the impulse responses

g^(0) = [111],     g^(1) = [101]

Consider the input sequence m = [10011]. Similar to the example above, it can be
shown that the codeword becomes

c = [11 10 11 11 01 01 11]

To enforce the R = 1/2 property, let us truncate the codeword by dropping the
last 2M = 4 bits (the effect of truncation becomes negligible if longer messages
and codewords are used). Then, the codeword becomes [11 10 11 11 01].
19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 226

NTS

Convolutional Encoder

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 227

NTS

The code tree is defined as follows: each branch of the tree represents an input
symbol (0 or 1). The corresponding output (coded) symbols are indicated on each
branch. A specific path can be traced for each message sequence. The
corresponding coded symbols on the branches following this path form the output
sequence.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 228

NTS

Code tree

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 229

NTS

State diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 230

NTS

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 231

NTS

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 232

NTS

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 233

NTS

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 234

NTS

Trellis diagram

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 235

NTS

Complexity of Viterbi Decoder

Over L binary intervals, the total number of comparisons made by the Viterbi
algorithm is 2^{K−1} · L (with K denoting the constraint length of the code),
rather than the 2^L comparisons required by the standard maximum-likelihood
procedure (full tree search).
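The slides do not spell out the algorithm itself, so the following is only a generic hard-decision Viterbi decoder sketch (Python) for the R = 1/2, M = 2 example code with g^(0) = [111], g^(1) = [101]; the state is the pair of previous input bits, ties are broken arbitrarily, and the example decodes the truncated codeword from the earlier example:

```python
def viterbi_decode(received, g=((1, 1, 1), (1, 0, 1))):
    """Hard-decision Viterbi decoding of a rate-1/2 convolutional code with
    memory M. `received` is a flat bit list (2 bits per input bit)."""
    M = len(g[0]) - 1
    n_states = 1 << M
    steps = len(received) // 2

    def outputs(state, bit):
        # state encodes the M previous input bits, most recent in the LSB
        reg = [bit] + [(state >> i) & 1 for i in range(M)]
        return tuple(sum(gi * r for gi, r in zip(gen, reg)) % 2 for gen in g)

    INF = float("inf")
    metric = [0] + [INF] * (n_states - 1)       # start in the all-zero state
    paths = [[] for _ in range(n_states)]
    for t in range(steps):
        r = tuple(received[2 * t: 2 * t + 2])
        new_metric = [INF] * n_states
        new_paths = [None] * n_states
        for state in range(n_states):
            if metric[state] == INF:
                continue
            for bit in (0, 1):
                out = outputs(state, bit)
                branch = sum(a != b for a, b in zip(out, r))     # Hamming metric
                nxt = ((state << 1) | bit) & (n_states - 1)
                cand = metric[state] + branch
                if cand < new_metric[nxt]:
                    new_metric[nxt] = cand
                    new_paths[nxt] = paths[state] + [bit]
        metric, paths = new_metric, new_paths
    best = min(range(n_states), key=lambda s: metric[s])
    return paths[best]

print(viterbi_decode([1, 1, 1, 0, 1, 1, 1, 1, 0, 1]))    # -> [1, 0, 0, 1, 1]
```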

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 236

NTS

Probability of deviating from correct path

Let a(d) denote the number of paths with Hamming distance d that deviate from,
and then return to, the all-zero test path. The error probability Pe of deviating
from the correct path is then upper bounded by

Pe < Σ_{d=dF}^{∞} a(d) Pd

where Pd denotes the probability that d bits are received in error and dF denotes
the minimum free distance.
The inequality sign appears because the paths are not mutually exclusive (union
bound).
Pe depends critically on the minimum free distance dF!

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 237

NTS

CONCLUSION

We have studied advanced information theory, including the capacity
characterization of multi-antenna and multi-user channels (and the resulting
concept of multiuser diversity), as well as advanced channel coding approaches
such as cyclic and convolutional codes.

To apply these concepts and approaches in practice, or to do research in these
fields, a deeper study is required.

19. April 2012 | NTS TU Darmstadt | Marius Pesavento | 238

NTS
