
Lecture 5: Asymptotic Equipartition Property

Law of large numbers for products of random variables

AEP and its consequences

Dr. Yao Xie, ECE587, Information Theory, Duke University

Stock market

Initial investment $Y_0$, daily return ratio $r_i$; on day $t$, your money is
$$Y_t = Y_0 r_1 \cdots r_t.$$
Now suppose the return ratios $r_i$ are i.i.d., with
$$r_i = \begin{cases} 4, & \text{w.p. } 1/2 \\ 0, & \text{w.p. } 1/2. \end{cases}$$
So you think the expected return ratio is $E r_i = 2$, and then
$$E Y_t = E(Y_0 r_1 \cdots r_t) = Y_0 (E r_i)^t = Y_0 2^t?$$


Is optimizing the expected return really optimal?

With $Y_0 = 1$, an actual return path $Y_t$ goes like
$$1, \; 4, \; 16, \; 0, \; 0, \; \ldots$$
As soon as a single day's return is 0, all the money is gone; in fact $Y_t \to 0$ with probability 1.

So optimizing the expected return is not optimal?

Fundamental reason: products do not behave the same way as sums.
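The gap between $E Y_t$ and a typical path is easy to see numerically. Below is a minimal simulation sketch (my addition, not from the slides), assuming numpy is available; the horizon and path count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
t, n_paths = 20, 10_000

# Daily returns: 4 w.p. 1/2, 0 w.p. 1/2, i.i.d. across days and paths.
r = rng.choice([4.0, 0.0], size=(n_paths, t))
Y = r.cumprod(axis=1)                          # Y_t = Y_0 r_1 ... r_t with Y_0 = 1

print("predicted E[Y_t]:", 2.0 ** t)           # Y_0 (E r_i)^t = 2^20, about 1e6
print("sample mean of Y_t:", Y[:, -1].mean())
print("fraction of ruined paths:", (Y[:, -1] == 0).mean())

# P(a path survives 20 days) = 2^-20, so with 10,000 paths the sample mean is
# almost surely 0: the huge expectation is carried by astronomically rare paths.
```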


(Weak) Law of large numbers

Theorem. For independent, identically distributed (i.i.d.) random variables $X_i$,
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \to EX,$$
in probability.

Convergence in probability: $X_n \to X$ in probability if, for every $\epsilon > 0$,
$$P\{|X_n - X| > \epsilon\} \to 0.$$
Proof by Markov's inequality.

So this means
$$P\{|\bar{X}_n - EX| \le \epsilon\} \to 1, \quad n \to \infty.$$
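A quick numerical illustration of this convergence (my addition, assuming numpy; the tolerance and trial counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.05
for n in [10, 100, 1_000, 10_000]:
    X = rng.uniform(0, 1, size=(5_000, n))       # 5000 trials of n i.i.d. Uniform(0,1)
    Xbar = X.mean(axis=1)                        # sample means; EX = 0.5
    print(n, np.mean(np.abs(Xbar - 0.5) > eps))  # P{|Xbar_n - EX| > eps} -> 0
```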

Other types of convergence

In mean square: if, as $n \to \infty$,
$$E(X_n - X)^2 \to 0.$$
With probability 1 (almost surely): if
$$P\left\{\lim_{n \to \infty} X_n = X\right\} = 1.$$
In distribution: if
$$\lim_{n \to \infty} F_n = F,$$
where $F_n$ and $F$ are the cumulative distribution functions of $X_n$ and $X$.



Product of random variables

How does this behave?
$$\sqrt[n]{\prod_{i=1}^{n} X_i}$$

Geometric mean $\sqrt[n]{\prod_{i=1}^{n} X_i}$ vs. arithmetic mean $\frac{1}{n} \sum_{i=1}^{n} X_i$.

Examples:
Volume $V$ of a random box with dimensions $X_i$: $V = X_1 \cdots X_n$
Stock return: $Y_t = Y_0 r_1 \cdots r_t$
Joint distribution of i.i.d. RVs: $p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i)$


Law of large numbers for products of random variables

We can write
$$X_i = e^{\log X_i}.$$
Hence
$$\sqrt[n]{\prod_{i=1}^{n} X_i} = e^{\frac{1}{n} \sum_{i=1}^{n} \log X_i}.$$
So from the LLN,
$$\sqrt[n]{\prod_{i=1}^{n} X_i} \to e^{E(\log X)} \le e^{\log EX} = EX,$$
where the inequality is Jensen's inequality ($E \log X \le \log EX$).
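A small sketch of this limit (my addition, assuming numpy). For Uniform(0,1) samples, $E \log X = -1$, so the geometric mean tends to $e^{-1} \approx 0.368$ while $EX = 0.5$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
X = rng.uniform(0, 1, size=n)

# Compute (prod X_i)^(1/n) in the log domain to avoid numerical underflow.
geo_mean = np.exp(np.mean(np.log(X)))
print(geo_mean, np.exp(-1.0))   # geometric mean -> e^{E log X} = e^{-1}
print(X.mean())                 # arithmetic mean -> EX = 0.5 > e^{E log X}
```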


Stock example:
$$E \log r_i = \frac{1}{2} \log 4 + \frac{1}{2} \log 0 = -\infty,$$
so
$$Y_t \approx Y_0 e^{t E \log r_i} = 0, \quad t \to \infty.$$

Example
$$X = \begin{cases} a, & \text{w.p. } 1/2 \\ b, & \text{w.p. } 1/2. \end{cases}$$
$$\sqrt[n]{\prod_{i=1}^{n} X_i} \to e^{E \log X} = \sqrt{ab} \le \frac{a+b}{2} = EX.$$
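For concreteness, a quick check with hypothetical values $a = 1$, $b = 4$ (my choice, not from the slides), where $\sqrt{ab} = 2$ and $(a+b)/2 = 2.5$:

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, n = 1.0, 4.0, 100_000      # a and b are illustrative values

X = rng.choice([a, b], size=n)   # i.i.d., each value w.p. 1/2
geo = np.exp(np.mean(np.log(X)))
print(geo, np.sqrt(a * b))       # both close to sqrt(ab) = 2.0
print((a + b) / 2)               # arithmetic mean 2.5 >= sqrt(ab) (AM-GM)
```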


Asymptotic equipartition property (AEP)

LLN states that
$$\frac{1}{n} \sum_{i=1}^{n} X_i \to EX.$$
AEP states that
$$\frac{1}{n} \log \frac{1}{p(X_1, X_2, \ldots, X_n)} \to H(X),$$
so most sequences satisfy
$$p(X_1, X_2, \ldots, X_n) \approx 2^{-nH(X)}.$$
Analyze using the LLN for products of random variables.


AEP lies at the heart of information theory.

Proof of lossless source coding
Proof of channel capacity
and more...


AEP

Theorem. If $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$, then
$$-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \to H(X),$$
in probability.

Proof:
$$-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) = -\frac{1}{n} \sum_{i=1}^{n} \log p(X_i) \to -E \log p(X) = H(X).$$

There are several consequences.
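An empirical check of the theorem (my addition, assuming numpy), using a Bernoulli(0.6) source to match the table a few slides ahead:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 0.6
H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # H(X), about 0.971 bits

for n in [10, 100, 1_000, 10_000]:
    x = rng.random(n) < p                        # i.i.d. Bernoulli(p) draws
    # -(1/n) log2 p(X_1, ..., X_n) = -(1/n) sum_i log2 p(X_i)
    sample_entropy = -np.where(x, np.log2(p), np.log2(1 - p)).mean()
    print(n, sample_entropy, H)                  # sample entropy -> H(X)
```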

Typical set

The typical set $A_\epsilon^{(n)}$ contains all sequences $(x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ with the property
$$2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)}.$$


Not all sequences are created equal

Coin tossing example: $X \in \{0, 1\}$, $p(1) = 0.8$
$$p(1, 0, 1, 1, 0, 1) = p^{\sum x_i} (1 - p)^{6 - \sum x_i} = p^4 (1 - p)^2 = 0.0164$$
$$p(0, 0, 0, 0, 0, 0) = (1 - p)^6 = 0.000064$$

In this example, if $(x_1, \ldots, x_n) \in A_\epsilon^{(n)}$, then
$$H(X) - \epsilon \le -\frac{1}{n} \log p(x_1, \ldots, x_n) \le H(X) + \epsilon.$$
This means a binary sequence is in the typical set if its frequency of heads $k/n$ is approximately $p$.
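A membership test for the two sequences above (my addition; the helper name in_typical_set and the choice $\epsilon = 0.3$ are mine):

```python
import numpy as np

def in_typical_set(x, p, eps):
    """Check whether binary sequence x lies in A_eps^(n) for a Bernoulli(p) source."""
    x = np.asarray(x)
    H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
    # Membership is equivalent to |sample entropy - H(X)| <= eps, since
    # 2^{-n(H+eps)} <= p(x) <= 2^{-n(H-eps)}.
    sample_entropy = -np.where(x == 1, np.log2(p), np.log2(1 - p)).mean()
    return abs(sample_entropy - H) <= eps

print(in_typical_set([1, 0, 1, 1, 0, 1], p=0.8, eps=0.3))  # True: 4 heads out of 6
print(in_typical_set([0, 0, 0, 0, 0, 0], p=0.8, eps=0.3))  # False: far too few heads
```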

$p = 0.6$, $n = 25$, $k$ = number of 1s

 k    C(n,k)    C(n,k) p^k (1-p)^(n-k)    -(1/n) log2 p(x^n)
 0         1                  0.000000              1.321928
 1        25                  0.000000              1.298530
 2       300                  0.000000              1.275131
 3      2300                  0.000001              1.251733
 4     12650                  0.000007              1.228334
 5     53130                  0.000045              1.204936
 6    177100                  0.000227              1.181537
 7    480700                  0.000925              1.158139
 8   1081575                  0.003121              1.134740
 9   2042975                  0.008843              1.111342
10   3268760                  0.021222              1.087943
11   4457400                  0.043410              1.064545
12   5200300                  0.075967              1.041146
13   5200300                  0.113950              1.017748
14   4457400                  0.146507              0.994349
15   3268760                  0.161158              0.970951
16   2042975                  0.151086              0.947552
17   1081575                  0.119980              0.924154
18    480700                  0.079986              0.900755
19    177100                  0.044203              0.877357
20     53130                  0.019891              0.853958
21     12650                  0.007104              0.830560
22      2300                  0.001937              0.807161
23       300                  0.000379              0.783763
24        25                  0.000047              0.760364
25         1                  0.000003              0.736966
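The table can be regenerated in a few lines (my addition; standard library only):

```python
import math

n, p = 25, 0.6
print(f"{'k':>2} {'C(n,k)':>9} {'C(n,k)p^k q^(n-k)':>19} {'-(1/n)log2 p(x^n)':>19}")
for k in range(n + 1):
    c = math.comb(n, k)                    # number of sequences with k ones
    prob = c * p**k * (1 - p)**(n - k)     # total probability of those sequences
    sample_entropy = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
    print(f"{k:>2} {c:>9} {prob:>19.6f} {sample_entropy:>19.6f}")
```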

Consequences of AEP

Theorem.
1. If $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then for $n$ sufficiently large:
$$H(X) - \epsilon \le -\frac{1}{n} \log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon$$
2. $P\{A_\epsilon^{(n)}\} > 1 - \epsilon$.
3. $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$.
4. $|A_\epsilon^{(n)}| \ge (1 - \epsilon) 2^{n(H(X)-\epsilon)}$.


Property 1

If $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then
$$H(X) - \epsilon \le -\frac{1}{n} \log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon.$$
Proof from the definition: $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$ if
$$2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)};$$
take $\log_2$ and divide by $-n$.

The number of bits used to describe sequences in the typical set is approximately $nH(X)$.

Property 2

$P\{A_\epsilon^{(n)}\} > 1 - \epsilon$ for $n$ sufficiently large.

Proof: From the AEP, because
$$-\frac{1}{n} \log p(X_1, \ldots, X_n) \to H(X)$$
in probability, for a given $\epsilon > 0$, when $n$ is sufficiently large,
$$P\left\{\left| -\frac{1}{n} \log p(X_1, \ldots, X_n) - H(X) \right| \le \epsilon \right\} > 1 - \epsilon,$$
and the event in braces is exactly $(X_1, \ldots, X_n) \in A_\epsilon^{(n)}$.

High probability: sequences in the typical set are the most typical.
These sequences all have nearly the same probability: equipartition.
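An empirical check of Property 2 (my addition, assuming numpy; $\epsilon$ and the trial count are arbitrary): the probability of landing in $A_\epsilon^{(n)}$ climbs toward 1 as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(5)
p, eps, trials = 0.6, 0.05, 20_000
H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)

for n in [25, 100, 400, 1_600]:
    x = rng.random((trials, n)) < p              # i.i.d. Bernoulli(p) sequences
    sample_entropy = -np.where(x, np.log2(p), np.log2(1 - p)).mean(axis=1)
    print(n, np.mean(np.abs(sample_entropy - H) <= eps))   # P{A_eps^(n)} -> 1
```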

Property 3 and 4: size of typical set

$$(1 - \epsilon) 2^{n(H(X)-\epsilon)} \le |A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$$

Proof:
$$1 = \sum_{(x_1, \ldots, x_n)} p(x_1, \ldots, x_n) \ge \sum_{(x_1, \ldots, x_n) \in A_\epsilon^{(n)}} p(x_1, \ldots, x_n) \ge \sum_{(x_1, \ldots, x_n) \in A_\epsilon^{(n)}} 2^{-n(H(X)+\epsilon)} = |A_\epsilon^{(n)}| \, 2^{-n(H(X)+\epsilon)}.$$

On the other hand, $P\{A_\epsilon^{(n)}\} > 1 - \epsilon$ for $n$ sufficiently large, so
$$1 - \epsilon < \sum_{(x_1, \ldots, x_n) \in A_\epsilon^{(n)}} p(x_1, \ldots, x_n) \le |A_\epsilon^{(n)}| \, 2^{-n(H(X)-\epsilon)}.$$

Size of the typical set depends on $H(X)$.

When $p = 1/2$ in the coin tossing example, $H(X) = 1$ and $2^{nH(X)} = 2^n$: all
sequences are typical sequences.
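For a Bernoulli(p) source, $|A_\epsilon^{(n)}|$ can be computed exactly, since all $\binom{n}{k}$ sequences with $k$ ones have the same probability. A sketch checking both bounds (my addition; the helper name typical_set_size is mine):

```python
import math

def typical_set_size(n, p, eps):
    """Exact |A_eps^(n)| and P{A_eps^(n)} for a Bernoulli(p) source."""
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    size, prob = 0, 0.0
    for k in range(n + 1):
        # Every sequence with k ones has the same sample entropy.
        sample_entropy = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(sample_entropy - H) <= eps:
            size += math.comb(n, k)
            prob += math.comb(n, k) * p**k * (1 - p)**(n - k)
    return size, prob

n, p, eps = 25, 0.6, 0.1
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
size, prob = typical_set_size(n, p, eps)
print(size, prob)                        # ~2.6e7 sequences, P{A} ~ 0.94 > 1 - eps
print(2 ** (n * (H + eps)))              # upper bound, ~1.1e8
print((1 - eps) * 2 ** (n * (H - eps)))  # lower bound, ~3.2e6 (valid once P{A} > 1 - eps)
```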


Typical set diagram

This enables us to divide all sequences into two sets:
Typical set: high probability of occurring, sample entropy close to the true entropy, so we will focus on analyzing sequences in the typical set
Non-typical set: small probability, can be ignored in general

[Diagram: the set $\mathcal{X}^n$ of all $|\mathcal{X}|^n$ sequences, partitioned into the non-typical set and the typical set $A_\epsilon^{(n)}$ with at most $2^{n(H+\epsilon)}$ elements.]

Data compression scheme from AEP

Let $X_1, X_2, \ldots, X_n$ be i.i.d. RVs drawn from $p(x)$.
We wish to find short descriptions for such sequences of RVs.


Divide all sequences in $\mathcal{X}^n$ into two sets:

Non-typical set: description takes at most $n \log |\mathcal{X}| + 2$ bits
Typical set: description takes at most $n(H + \epsilon) + 2$ bits


Use one bit to indicate which set:

Typical set $A_\epsilon^{(n)}$: use prefix 1.
Since there are no more than $2^{n(H(X)+\epsilon)}$ sequences, indexing requires no more than $n(H(X) + \epsilon) + 1$ bits (the extra bit is needed because $n(H(X)+\epsilon)$ may not be an integer).
Non-typical set: use prefix 0.
Since there are at most $|\mathcal{X}|^n$ sequences, indexing requires no more than $n \log |\mathcal{X}| + 1$ bits.
Notation: $x^n = (x_1, \ldots, x_n)$, $l(x^n)$ = length of the codeword for $x^n$.
We can prove
$$E\left[\frac{1}{n} l(X^n)\right] \le H(X) + \epsilon.$$
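A sketch of the length accounting behind this bound (my addition; it counts bits rather than producing actual codewords, and the helper name codeword_length is mine):

```python
import math
import random

def codeword_length(x, p, eps):
    """Bits used by the two-part AEP code for a binary sequence x.

    Typical sequences: 1 flag bit + ceil(n(H + eps)) index bits.
    Everything else: 1 flag bit + n raw bits (log2 |X| = 1 for a binary alphabet).
    """
    n = len(x)
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    k = sum(x)
    sample_entropy = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
    if abs(sample_entropy - H) <= eps:   # x is in A_eps^(n)
        return 1 + math.ceil(n * (H + eps))
    return 1 + n

random.seed(0)
n, p, eps, trials = 1_000, 0.6, 0.05, 200
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
avg_bits = sum(
    codeword_length([1 if random.random() < p else 0 for _ in range(n)], p, eps)
    for _ in range(trials)
) / (trials * n)
print(avg_bits, H + eps)   # average bits per symbol stays close to H(X) + eps
```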


Summary of AEP

Almost everything is almost equally probable.

Reasons that $H(X)$ appears in the AEP:
$-\frac{1}{n} \log p(x^n) \to H(X)$, in probability
$n(H(X) + \epsilon)$ bits suffice to describe the random sequence on average
$2^{H(X)}$ is the effective alphabet size
The typical set is the smallest set with probability near 1
Size of the typical set: about $2^{nH(X)}$
The distribution of elements within the typical set is nearly uniform


Next Time

AEP is about the properties of independent sequences.
What about dependent processes?
The answer is the entropy rate: next time.
