
EE 801: Analysis of Stochastic Systems

Inequalities and Limit Theorems

Dr. Muhammad Usman Ilyas


School of Electrical Engineering & Computer Science
National University of Sciences & Technology (NUST)
Pakistan

Copyright © Syed Ali Khayam 2009


What will we cover in this lecture?
 In this lecture, we will cover important inequalities and
theorems that will lead to a fundamental result in probability
theory known as the Central Limit Theorem

 A list of topics that will be covered is as follows:
 Markov Inequality
 Chebyshev Inequality
 Bernoulli’s Theorem
 The Weak Law of Large Numbers
 The Central Limit Theorem

 For this lecture, I am borrowing derivations/discussions from the
following book:
 Kishore S. Trivedi, Probability and Statistics with Reliability, Queuing, and
Computer Science Applications, 2005.
Markov Inequality
 Consider that you are given the mean E{X} = μ of a non-negative
random variable X

 Define a function of X as:

    Y = 0, if X < t
    Y = t, if X ≥ t

 Recall that a function of a random variable is also a random
variable
 So what is the pmf of Y?


Markov Inequality

    Y = 0, if X < t
    Y = t, if X ≥ t

 Y has a discrete pmf:

    pY(0) = Pr{X < t}
    pY(t) = Pr{X ≥ t}

 The expected value of Y is:

    E{Y} = 0 × Pr{X < t} + t × Pr{X ≥ t} = t Pr{X ≥ t}


Relationship Between RVs X & Y

[Figure: the pmf pX(x) of X alongside the two-point pmf pY(y) of Y; Y collapses all of X’s probability mass below t onto the point 0, and all mass at or above t onto the point t]


Markov Inequality

    Y = 0, if X < t
    Y = t, if X ≥ t

    E{Y} = t Pr{X ≥ t}

 Since X ≥ Y, we have E{X} ≥ E{Y}:

    E{X} ≥ E{Y} = t Pr{X ≥ t}

 Dividing both sides by t:

    Pr{X ≥ t} ≤ E{X} / t

    Pr{X ≥ t} ≤ μ / t

 This is called the Markov Inequality
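The bound is easy to check by simulation. A minimal sketch (my illustration, not from the slides), assuming an exponential distribution with mean 2 as the non-negative random variable:

```python
import random

random.seed(0)

# Assumed example: a non-negative RV ~ Exponential with mean 2 (E{X} = 2).
mean = 2.0
samples = [random.expovariate(1.0 / mean) for _ in range(100_000)]

for t in (2.0, 4.0, 8.0):
    tail = sum(x >= t for x in samples) / len(samples)  # empirical Pr{X >= t}
    bound = mean / t                                    # Markov bound E{X}/t
    assert tail <= bound
    print(f"t={t}: Pr{{X >= t}} ≈ {tail:.3f} <= bound {bound:.3f}")
```

The bound is loose (at t = 2 it only says the tail is at most 1), but it uses nothing except the mean and non-negativity.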


Markov Inequality

    Pr{X ≥ t} ≤ μ / t

 The probability that a non-negative random variable takes values
much larger than its mean is small


Chebyshev Inequality

    Pr{X ≥ t} ≤ E{X} / t    (Markov Inequality)

 Define a new random variable W = (X − μ)² as a function of a
random variable X with mean μ and variance σ²
 Clearly, W is a non-negative random variable (even if X is not)
 Also, let s = t²
 Then by the Markov inequality:

    Pr{W ≥ s} ≤ E{W} / s

    Pr{(X − μ)² ≥ t²} ≤ E{(X − μ)²} / t²

    Pr{(X − μ)² ≥ t²} ≤ σ² / t²
Chebyshev Inequality

    Pr{(X − μ)² ≥ t²} ≤ σ² / t²

 Now note that:

    Pr{(X − μ)² ≥ t²} = Pr{|X − μ| ≥ t}

 Thus:

    Pr{|X − μ| ≥ t} ≤ σ² / t²

 This is called the Chebyshev Inequality


Chebyshev Inequality

    Pr{|X − μ| ≥ t} ≤ σ² / t²

 The Chebyshev inequality gives an intuitive sense for variance:

 σ² is small => values close to the mean carry most of the probability

 σ² is large => values far from the mean can carry substantial probability


Chebyshev Inequality

    Pr{|X − μ| ≥ t} ≤ σ² / t²

 Two alternative representations of the Chebyshev Inequality are
obtained by setting t = kσ:

    Pr{|X − μ| ≥ kσ} ≤ σ² / (k²σ²) = 1 / k²

    Pr{|X − μ| < kσ} ≥ 1 − 1/k²
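The 1/k² form is easy to verify numerically. A minimal sketch (my illustration, not from the slides), assuming Uniform(0, 1) samples:

```python
import random
import statistics

random.seed(1)

# Assumed example: Uniform(0, 1), with mu = 0.5 and sigma = sqrt(1/12) ≈ 0.289.
samples = [random.random() for _ in range(100_000)]
mu = statistics.mean(samples)
sigma = statistics.stdev(samples)

for k in (1.5, 2.0, 3.0):
    # Empirical Pr{|X - mu| >= k*sigma} vs. the Chebyshev bound 1/k^2
    p = sum(abs(x - mu) >= k * sigma for x in samples) / len(samples)
    assert p <= 1 / k**2
    print(f"k={k}: Pr{{|X - mu| >= k*sigma}} ≈ {p:.4f} <= {1 / k**2:.4f}")
```

Chebyshev holds for any distribution with finite variance, which is why the empirical probabilities sit well below the bound for this bounded example.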


Reading Assignment
 Example 3.40 in the textbook
 Example 3.41 in the textbook
 Example 3.42 in the textbook



Bernoulli’s Theorem
 Bernoulli’s Theorem is an application of the Chebyshev
Inequality to the binomial distribution

 Consider a binomial random variable X with parameters (n, p)

 The mean and standard deviation of X are given by:

    μ = np    σ = √(np(1 − p))


Bernoulli’s Theorem
 Consider a binomial random variable X with parameters (n, p):

    μ = np    σ = √(np(1 − p))

 Applying the Chebyshev Inequality in the form Pr{|X − μ| < kσ} ≥ 1 − 1/k² gives:

    Pr{|X − np| < k√(np(1 − p))} ≥ 1 − 1/k²

 Dividing inside the absolute value by n:

    Pr{|X/n − p| < k√(p(1 − p)/n)} ≥ 1 − 1/k²


Bernoulli’s Theorem

    Pr{|X/n − p| < k√(p(1 − p)/n)} ≥ 1 − 1/k²

 Set ε = k√(p(1 − p)/n), so that 1/k² = p(1 − p)/(nε²):

    Pr{|X/n − p| < ε} ≥ 1 − p(1 − p)/(nε²)

 Letting n → ∞:

    lim(n→∞) Pr{|X/n − p| < ε} = 1

 This is called Bernoulli’s Theorem
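The theorem can be watched in action by simulating runs of Bernoulli trials. A minimal sketch (my illustration, not from the slides), assuming p = 0.3 and ε = 0.02:

```python
import random

random.seed(2)

p, eps = 0.3, 0.02  # assumed success probability and tolerance

def prob_within_eps(n: int, trials: int = 800) -> float:
    """Estimate Pr{|X/n - p| < eps} by repeating n Bernoulli trials many times."""
    hits = 0
    for _ in range(trials):
        successes = sum(random.random() < p for _ in range(n))
        if abs(successes / n - p) < eps:
            hits += 1
    return hits / trials

small = prob_within_eps(100)    # short runs: proportion often misses the band
large = prob_within_eps(5000)   # long runs: proportion almost always inside
assert large > small and large > 0.99
print(f"n=100: {small:.3f}   n=5000: {large:.3f}")
```

As n grows, the probability that the proportion of successes lands within ε of p approaches 1, exactly as the limit statement says.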


Bernoulli’s Theorem

    lim(n→∞) Pr{|X/n − p| < ε} = 1

 Bernoulli’s Theorem states that if we run a large number n of
Bernoulli trials, then the probability that the proportion of
successes differs from p by more than any fixed ε > 0 becomes
arbitrarily small

 Bernoulli’s Theorem is a special case of the Weak Law of Large
Numbers, discussed next


Weak Law of Large Numbers
 Let X1, X2, …, Xn be mutually independent, identically distributed
random variables

 Let μ represent the mean of the common distribution of these
random variables

 Then for large n, we would expect the sample mean to be close to μ:

    x̄ = (x1 + x2 + ⋯ + xn) / n ≈ μ


Weak Law of Large Numbers

    x̄ = (x1 + x2 + ⋯ + xn) / n ≈ μ    (sample mean)

 Let Sn = Σ(i=1..n) Xi and X̄ = Sn / n

 Then the variance of the sample mean is:

    var{X̄} = var{Sn / n} = var{(1/n) Σ(i=1..n) Xi}
           = (1/n²) Σ(i=1..n) var{Xi}    (by independence)
           = nσ² / n²
           = σ² / n
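The σ²/n prediction can be checked empirically. A minimal sketch (my illustration, not from the slides), assuming rolls of a fair six-sided die, for which μ = 3.5 and σ² = 35/12:

```python
import random
import statistics

random.seed(3)

def sample_mean(n: int) -> float:
    """Mean of n rolls of a fair six-sided die (true mean mu = 3.5)."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# Empirical variance of the sample mean across many repetitions, for growing n.
# Theory predicts var{X-bar} = sigma^2 / n, with sigma^2 = 35/12 per roll.
results = {}
for n in (10, 100, 1000):
    means = [sample_mean(n) for _ in range(2000)]
    results[n] = statistics.variance(means)
    print(f"n={n}: empirical var ≈ {results[n]:.4f}, theory ≈ {35 / 12 / n:.4f}")
```

Each tenfold increase in n shrinks the variance of the sample mean by roughly a factor of ten, matching σ²/n.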


Weak Law of Large Numbers

    var{X̄} = σ² / n

 Thus the variance of the sample mean approaches 0 as n approaches
infinity

 That is, the distribution of the sample mean becomes more and more
concentrated about its mean as the number of trials n approaches
infinity


Weak Law of Large Numbers

    var{X̄} = σ² / n

 For a large number of trials, the distribution of the sample mean
gets more and more concentrated about its mean
 Applying Chebyshev’s Inequality to X̄, we get:

    Pr{|X̄ − μ| ≥ δ} ≤ var{X̄} / δ² = σ² / (nδ²)


Weak Law of Large Numbers

    Pr{|X̄ − μ| ≥ δ} ≤ σ² / (nδ²)

 Taking the limit of the above inequality yields:

    lim(n→∞) Pr{|X̄ − μ| ≥ δ} = 0

 This is called the Weak Law of Large Numbers




Weak Law of Large Numbers

    lim(n→∞) Pr{|X̄ − μ| ≥ δ} = 0

 The Weak Law of Large Numbers states that if you run a large
number of trials n, then the probability that the sample mean
differs from the true mean by more than any fixed δ > 0 becomes
arbitrarily small; that is, the sample mean converges to μ in
probability
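The limit statement itself can be illustrated numerically. A minimal sketch (my illustration, not from the slides), assuming Exponential(1) samples, for which μ = 1, with deviation threshold δ = 0.1:

```python
import random

random.seed(5)

mu, delta = 1.0, 0.1  # Exponential(1): true mean 1.0; deviation threshold

def prob_far(n: int, reps: int = 2000) -> float:
    """Estimate Pr{|X-bar - mu| >= delta} for the mean of n Exponential(1) samples."""
    far = 0
    for _ in range(reps):
        xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
        if abs(xbar - mu) >= delta:
            far += 1
    return far / reps

p100, p1000 = prob_far(100), prob_far(1000)
assert p1000 < p100  # probability of a large deviation shrinks as n grows
print(f"n=100: {p100:.3f}   n=1000: {p1000:.3f}")
```

For any fixed δ, the estimated probability of a deviation of at least δ keeps falling as n grows, consistent with convergence in probability.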


The Central Limit Theorem
Let X1, X2, …, Xn be mutually independent random variables with
finite means E{Xi} = μi and finite variances var{Xi} = σi². We form
a normalized random variable:

    Zn = ( Σ(i=1..n) Xi − Σ(i=1..n) μi ) / √( Σ(i=1..n) σi² )

so that E{Zn} = 0 and var{Zn} = 1. Then, under certain regularity
conditions, the limiting distribution of Zn is standard normal:

    lim(n→∞) FZn(t) = lim(n→∞) Pr{Zn ≤ t} = ∫(−∞ to t) (1/√(2π)) e^(−y²/2) dy

    Zn → N(0, 1)


Example: The Central Limit Theorem
Assume that X1, X2, …, Xn are mutually independent and identically
distributed with common mean E{Xi} = μ and common variance
var{Xi} = σ². Then, applying the central limit theorem, the
normalized random variable becomes:

    Zn = ( Σ(i=1..n) Xi − nμ ) / √(nσ²)
       = (nX̄ − nμ) / √(nσ²)
       = √n (X̄ − μ) / σ

Thus the distribution of the sample mean converges to normality
as n → ∞
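The convergence of Zn = √n (X̄ − μ)/σ to N(0, 1) can be sketched with a quick simulation (my illustration, not from the slides), assuming Uniform(0, 1) summands:

```python
import math
import random
import statistics

random.seed(4)

# Assumed example: Uniform(0, 1) summands with mu = 0.5, sigma = sqrt(1/12).
mu, sigma = 0.5, math.sqrt(1 / 12)
n, reps = 50, 20_000

# Draw many realizations of Z_n = sqrt(n) * (X-bar - mu) / sigma.
z = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    z.append(math.sqrt(n) * (xbar - mu) / sigma)

# If the CLT holds, Z_n should look standard normal:
print(f"mean ≈ {statistics.mean(z):.3f} (expect 0)")
print(f"stdev ≈ {statistics.stdev(z):.3f} (expect 1)")
print(f"Pr{{Z <= 1.96}} ≈ {sum(v <= 1.96 for v in z) / reps:.3f} (expect 0.975)")
```

Even though the summands are uniform (nothing like a bell curve), the standardized sample mean already matches the standard normal quantiles closely at n = 50.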


Example: The Central Limit Theorem
This is the histogram of the weights of people in Rhode Island.

Reference: http://www.intuitor.com/statistics/CentralLim.html


Example: The Central Limit Theorem
Now, instead of plotting the histogram of each sample separately,
we plot the histogram of averages of 2 samples.

Reference: http://www.intuitor.com/statistics/CentralLim.html


Example: The Central Limit Theorem
Then we plot the histogram of the average of 50 samples.

Reference: http://www.intuitor.com/statistics/CentralLim.html


Example: The Central Limit Theorem
Finally, we plot the histogram of the average of 100 samples.

The distribution approaches normality as we keep averaging over
more and more samples.

Reference: http://www.intuitor.com/statistics/CentralLim.html


Reading Assignment
 Example 5.11 in the textbook
 Example 5.12 in the textbook
 Example 5.13 in the textbook
 Example 5.14 in the textbook

