
LARGE DEVIATION THEORY

Suppose $X_1, X_2, \ldots, X_n$ is a sequence of independent and identically distributed random variables, each with common mean $\mu$, and let $S_n = X_1 + X_2 + \cdots + X_n$. The weak law of large numbers (WLLN) and the central limit theorem (CLT) were concerned with the behavior of $\frac{S_n}{n}$ for large $n$. Large deviation theory deals with tail probabilities of the form $P\left(\frac{S_n}{n} \ge a\right)$.

The WLLN states

$$\frac{S_n}{n} \xrightarrow{P} \mu$$

i.e.

$$\lim_{n \to \infty} P\left(\left|\frac{S_n}{n} - \mu\right| > \varepsilon\right) = 0$$

Thus the WLLN asserts that $\frac{S_n}{n}$ converges to $\mu$ and does not deal with any deviation.

The central limit theorem says

$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0,1)$$

where $\sigma^2$ is the common variance, i.e. we can find probabilities of the form

$$P\left(\mu - \frac{a\sigma}{\sqrt{n}} < \frac{S_n}{n} < \mu + \frac{a\sigma}{\sqrt{n}}\right)$$

In this case, we can find the probability of a deviation of the order $1/\sqrt{n}$, which is not large. Thus the CLT is not applicable for finding the probability of a large deviation of the mean.
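To see the two regimes side by side, here is a minimal Monte Carlo sketch; the Bernoulli(0.5) summands and the threshold $a = 0.7$ are illustrative assumptions, not part of the notes. The estimated tail shrinks roughly geometrically in $n$, far faster than the $1/\sqrt{n}$ deviations the CLT describes.

```python
# A minimal Monte Carlo sketch of the large-deviation regime; the
# Bernoulli(0.5) summands and threshold a = 0.7 are illustrative
# assumptions.  P(S_n/n >= a) shrinks roughly exponentially in n.
import numpy as np

rng = np.random.default_rng(0)
p, a, trials = 0.5, 0.7, 200_000        # mean p, threshold a > p

for n in (10, 25, 50, 100):
    means = rng.binomial(n, p, size=trials) / n   # samples of S_n / n
    tail = np.mean(means >= a)                    # estimate of P(S_n/n >= a)
    print(f"n = {n:3d}   P(S_n/n >= {a}) ~ {tail:.2e}")
```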
MGF and CGF
Recall that the moment generating function (MGF) M X ( s ) of a random variable X is defined by

$$M_X(s) = Ee^{sX} = \int_{-\infty}^{\infty} e^{sx}\,dF_X(x)$$

where $s$ is a real variable. Unlike the characteristic function, the MGF may not exist for all random variables and all values of $s$. If $M_X(s)$ exists in a neighbourhood of $s = 0$, it may be conveniently used to generate the moments as:

$$EX^k = M_X^{(k)}(s)\Big|_{s=0}, \qquad k = 1, 2, \ldots$$
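As a quick sanity check of this moment-generating property, the following sketch (assuming sympy is available; the Exponential($\lambda$) distribution is an illustrative choice, with known moments $EX^k = k!/\lambda^k$) differentiates an MGF symbolically at $s = 0$.

```python
# A small symbolic check (assuming sympy is available) that derivatives
# of the MGF at s = 0 generate the moments, E X^k = M_X^(k)(0).  The
# Exponential(lam) distribution, with MGF lam/(lam - s) for s < lam,
# is an illustrative choice with known moments k!/lam^k.
import sympy as sp

s, lam = sp.symbols('s lam', positive=True)
M = lam / (lam - s)                     # MGF of Exponential(lam)

for k in (1, 2, 3):
    moment = sp.diff(M, s, k).subs(s, 0)
    print(f"E X^{k} =", sp.simplify(moment))   # prints k!/lam**k
```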

The cumulant generating function (CGF) of a random variable is defined as

$$K_X(s) = \log_e(M_X(s)) = \log_e(Ee^{sX})$$
where M X ( s ) is the moment generating function.
If M X ( s ) exists and is non-zero, then K X ( s ) also exists.
As M X (0) = 1 , we get K X (0) = 0. The Taylor series expansion of K X ( s ) about the origin gives

$$K_X(s) = \sum_{n=1}^{\infty} k_X(n)\,\frac{s^n}{n!}$$

where the $n$th coefficient $k_X(n)$ is called the $n$th cumulant of the random variable $X$.
From the above expression, we get

$$k_X(n) = \frac{d^n}{ds^n}K_X(s)\Big|_{s=0} = \frac{d^n}{ds^n}\log_e(M_X(s))\Big|_{s=0}$$

We can derive the first four cumulants as

$$k_X(1) = \frac{dK_X(s)}{ds}\Big|_{s=0} = \mu_X$$

$$k_X(2) = \frac{d^2K_X(s)}{ds^2}\Big|_{s=0} = \sigma_X^2$$

$$k_X(3) = \frac{d^3K_X(s)}{ds^3}\Big|_{s=0} = EX^3 - 3\mu EX^2 + 2\mu^3 = E(X - \mu_X)^3$$

$$k_X(4) = \frac{d^4K_X(s)}{ds^4}\Big|_{s=0} = E(X - \mu_X)^4 - 3\sigma_X^4$$
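These formulas are easy to verify on a distribution whose cumulants are known. The sketch below (assuming sympy; the Poisson($\lambda$) choice is illustrative, since every Poisson cumulant equals $\lambda$) differentiates the CGF directly.

```python
# A sketch (assuming sympy) verifying the cumulant formulas for a
# concrete case: X ~ Poisson(lam) has K_X(s) = lam*(exp(s) - 1), and
# every cumulant of the Poisson distribution equals lam -- consistent
# with k_X(1) = mu, k_X(2) = sigma^2, k_X(3) = E(X - mu)^3, and
# k_X(4) = E(X - mu)^4 - 3 sigma^4.
import sympy as sp

s, lam = sp.symbols('s lam', positive=True)
K = lam * (sp.exp(s) - 1)               # CGF of Poisson(lam)

for n in (1, 2, 3, 4):
    k_n = sp.diff(K, s, n).subs(s, 0)
    print(f"k_X({n}) =", sp.simplify(k_n))     # each one is lam
```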

Cramér's Theorem
Let $X_1, X_2, \ldots, X_n$ be iid random variables with mean $\mu$ and MGF $M_X(s)$ which is finite in a neighbourhood of $s = 0$. Then for any $a > \mu$,

$$\lim_{n \to \infty} \frac{1}{n}\log P\left(\frac{S_n}{n} \ge a\right) = -l^*(a)$$

where

$$l^*(a) = \max_{s>0}\left(sa - \log M_X(s)\right) = s^*a - \log M_X(s^*)$$

and $s^*$ is the value of $s$ corresponding to $l^*(a)$.

We can easily find an upper bound for $P\left(\frac{S_n}{n} > a\right)$ by using the Chernoff bound. According to the Chernoff bound,

$$P(X \ge a) \le \min_{s>0} e^{-as}M_X(s).$$

Therefore,

$$P\left(\frac{S_n}{n} > a\right) = P(S_n > na) \le \min_{s>0} e^{-ans}M_{S_n}(s)$$

Now, using the independence of the $X_i$'s,

$$M_{S_n}(s) = Ee^{sS_n} = Ee^{s\sum_{i=1}^{n}X_i} = E\prod_{i=1}^{n}e^{sX_i} = \prod_{i=1}^{n}Ee^{sX_i} = (M_X(s))^n$$
so that

$$P\left(\frac{S_n}{n} > a\right) \le \min_{s>0} e^{-ans}(M_X(s))^n = \min_{s>0} e^{-ans + n\log_e M_X(s)}$$

where $\log_e M_X(s)$ is the cumulant generating function. Hence

$$P\left(\frac{S_n}{n} > a\right) \le \min_{s>0} e^{-n\left(as - \log_e M_X(s)\right)}$$
Minimization of $e^{-n(as - \log_e M_X(s))}$ is equivalent to maximization of $l(s) = as - \log_e M_X(s)$. Let

$$l^*(a) = \max_{s>0}\left(sa - \log_e M_X(s)\right) = s^*a - \log_e M_X(s^*).$$

$l^*(a)$ is known as the Fenchel-Legendre transform of $\log_e M_X(s)$.


Thus,

$$P\left(\frac{S_n}{n} > a\right) \le e^{-nl^*(a)}$$

so that

$$\frac{1}{n}\log_e P\left(\frac{S_n}{n} > a\right) \le -l^*(a).$$
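The maximization defining $l^*(a)$ is easy to carry out numerically. The sketch below (assuming scipy; the Gaussian example is an illustrative choice, with the closed form $l^*(a) = (a-\mu)^2/2\sigma^2$ to check the optimizer against) is a minimal demonstration.

```python
# A numerical sketch (assuming scipy) of the maximization that defines
# l*(a).  For X ~ N(mu, sigma^2), log M_X(s) = mu*s + sigma^2*s^2/2 and
# l*(a) = (a - mu)^2 / (2*sigma^2), a closed form the optimizer should
# reproduce; the parameter values are illustrative.
from scipy.optimize import minimize_scalar

mu, sigma, a = 0.0, 1.0, 1.5            # threshold a > mu

def neg_l(s):
    # -(s*a - log M_X(s)) for the Gaussian MGF
    return -(s * a - (mu * s + 0.5 * sigma**2 * s**2))

res = minimize_scalar(neg_l, bounds=(1e-9, 50.0), method='bounded')
print("numeric  l*(a) =", -res.fun)                      # ~ 1.125
print("analytic l*(a) =", (a - mu)**2 / (2 * sigma**2))  # 1.125
```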

We have also to show that

$$\lim_{n \to \infty} \frac{1}{n}\log_e P\left(\frac{S_n}{n} > a\right) \ge -l^*(a).$$

For this, consider a set of new random variables $Y_1, Y_2, \ldots, Y_n$ obtained by mapping $X_i$ to $Y_i$ such that the common distribution function of the $Y_i$'s is given by

$$F_Y(y) = \frac{1}{M_X(s^*)}\int_{-\infty}^{y} e^{s^*u}\,dF_X(u)$$

As the $X_i$'s are iid random variables, so also are the $Y_i$'s. We can verify that $F_Y(y)$ is a valid distribution function. We also observe that

$$dF_Y(u) = \frac{e^{s^*u}\,dF_X(u)}{M_X(s^*)}$$

The MGF of $Y$ is given by

$$M_Y(s) = \int_{-\infty}^{\infty} e^{su}\,dF_Y(u) = \frac{1}{M_X(s^*)}\int_{-\infty}^{\infty} e^{(s+s^*)u}\,dF_X(u) = \frac{M_X(s+s^*)}{M_X(s^*)}$$

Therefore

$$EY_i = M_Y'(0) = \frac{M_X'(s^*)}{M_X(s^*)} = a$$

where the last equality follows from the stationarity condition $\frac{d}{ds}\left(sa - \log_e M_X(s)\right)\big|_{s=s^*} = 0$ that defines $s^*$. Similarly,

$$\operatorname{Var} Y_i = M_Y''(0) - \left(M_Y'(0)\right)^2$$
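This exponential tilting can be checked numerically. In the quick sketch below, the Exponential($\lambda$) choice is an illustrative assumption: tilting $\mathrm{Exp}(\lambda)$ by $e^{su}$ yields $\mathrm{Exp}(\lambda - s)$, so the stationarity condition $a = 1/(\lambda - s^*)$ gives $s^* = \lambda - 1/a$, and the tilted mean should sit at $a$.

```python
# A quick numeric illustration of the tilted mean E Y_i = a.  The
# Exponential(lam) choice is an assumption for illustration: tilting
# Exp(lam) by e^{s u} gives Exp(lam - s), and the stationarity
# condition a = M_X'(s*)/M_X(s*) = 1/(lam - s*) gives s* = lam - 1/a.
import numpy as np

lam, a = 1.0, 2.5                       # E X = 1/lam = 1, threshold a > 1
s_star = lam - 1.0 / a                  # tilt that centres Y at a

rng = np.random.default_rng(1)
y = rng.exponential(scale=1.0 / (lam - s_star), size=1_000_000)
print("mean of tilted samples ~", y.mean())   # close to a = 2.5
```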

Define $\hat{S}_n = \sum_{i=1}^{n} Y_i$. Now

$$M_{\hat{S}_n}(s) = Ee^{s\sum_{i=1}^{n}Y_i} = \prod_{i=1}^{n}Ee^{sY_i} = \left(Ee^{sY}\right)^n$$

which we get using the iid property. Hence

$$M_{\hat{S}_n}(s) = (M_Y(s))^n = \frac{(M_X(s+s^*))^n}{(M_X(s^*))^n} = \frac{M_{S_n}(s+s^*)}{(M_X(s^*))^n}$$

Noting the definition of $M_{\hat{S}_n}(s)$, we get

$$\int_{-\infty}^{\infty} e^{su}\,dF_{\hat{S}_n}(u) = \frac{\int_{-\infty}^{\infty} e^{(s+s^*)u}\,dF_{S_n}(u)}{(M_X(s^*))^n}$$

so that

$$dF_{\hat{S}_n}(u) = \frac{e^{s^*u}\,dF_{S_n}(u)}{(M_X(s^*))^n}$$

Using the above relationship, the probability involving $S_n$ can be studied in terms of probabilities involving $\hat{S}_n$.
Suppose $b > a$. We have

$$\begin{aligned}
P\left(\frac{S_n}{n} > a\right) &= P(S_n > na) = \int_{na}^{\infty} dF_{S_n}(u)\\
&= (M_X(s^*))^n \int_{na}^{\infty} e^{-s^*u}\,dF_{\hat{S}_n}(u)\\
&\ge (M_X(s^*))^n \int_{na}^{nb} e^{-s^*u}\,dF_{\hat{S}_n}(u)\\
&\ge (M_X(s^*))^n\, e^{-s^*nb} \int_{na}^{nb} dF_{\hat{S}_n}(u)\\
&= e^{-\left(s^*nb - n\log_e M_X(s^*)\right)}\,P(na < \hat{S}_n < nb)
\end{aligned}$$


Taking the logarithm of both sides and dividing by $n$, we get

$$\frac{1}{n}\log_e P\left(\frac{S_n}{n} > a\right) \ge -\left(s^*b - \log_e M_X(s^*)\right) + \frac{1}{n}\log_e P(na < \hat{S}_n < nb)$$

$$= -\left(s^*b - \log_e M_X(s^*)\right) + \frac{1}{n}\log_e\left(P(\hat{S}_n < nb) - P(\hat{S}_n \le na)\right)$$
n
Now according to the CLT, $P(\hat{S}_n \le na) \to \tfrac{1}{2}$ (since $E\hat{S}_n = na$), and according to the SLLN, $P(\hat{S}_n < nb) \to 1$ (since $b > a$), so that $\frac{1}{n}\log_e P(na < \hat{S}_n < nb) \to 0$.

Therefore, as $n$ becomes large, and noting that we can take $b$ arbitrarily close to $a$,

$$\frac{1}{n}\log_e P\left(\frac{S_n}{n} > a\right) \ge -\left(s^*b - \log_e M_X(s^*)\right) = -\left(s^*a - \log_e M_X(s^*)\right) = -l^*(a)$$
Combining the lower and upper bounds,

$$\lim_{n \to \infty} \frac{1}{n}\log P\left(\frac{S_n}{n} \ge a\right) = -l^*(a)$$

Thus for large $n$,

$$P\left(\frac{S_n}{n} \ge a\right) \approx e^{-nl^*(a)}$$

Example: Let the $X_i$'s be Bernoulli random variables:

$$X_i = \begin{cases} 1 & \text{with probability } p\\ 0 & \text{with probability } 1-p \end{cases}$$

The moment generating function is given by

$$M_X(s) = Ee^{sX} = pe^s + (1-p)$$

so that

$$\log M_X(s) = \log\left(1 - p + pe^s\right)$$

$$l(s) = as - \log\left(1 - p + pe^s\right)$$

$l(s)$ is maximum at $s^* = \log\dfrac{a(1-p)}{p(1-a)}$. Then

$$l^*(a) = a\log\frac{a}{p} + (1-a)\log\frac{1-a}{1-p}$$

In fact, $S_n = \sum_{i=1}^{n}X_i \sim \text{Binomial}(n, p)$, so we can find $P\left(\frac{S_n}{n} \ge a\right)$ exactly, from which we can check the tightness of the bound.
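A minimal sketch of that check (assuming scipy; the values $p = 0.5$ and $a = 0.7$ are illustrative) compares the exact binomial tail with the large-deviation estimate $e^{-nl^*(a)}$:

```python
# A tightness check (assuming scipy) of P(S_n/n >= a) ~ exp(-n l*(a))
# for Bernoulli(p) summands, where S_n ~ Binomial(n, p) gives the exact
# tail; p = 0.5 and a = 0.7 are illustrative values.
import math
from scipy.stats import binom

p, a = 0.5, 0.7
l_star = a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

for n in (10, 50, 100, 500):
    k = math.ceil(n * a - 1e-12)        # smallest integer >= n*a
    exact = binom.sf(k - 1, n, p)       # P(S_n >= k) = P(S_n/n >= a)
    print(f"n = {n:3d}   exact = {exact:.3e}   "
          f"exp(-n l*) = {math.exp(-n * l_star):.3e}")
```

The estimate is an upper bound on the exact tail, and the two exponents agree as $n$ grows.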


Here $a$ is a number between 0 and 1, so $a$ has a probability interpretation.
Since $l^*(a) = a\log\dfrac{a}{p} + (1-a)\log\dfrac{1-a}{1-p}$, we can call $l^*(a)$ the relative entropy, or Kullback-Leibler distance, between $(a, 1-a)$ and $(p, 1-p)$.
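This identification can be confirmed directly; the sketch below (assuming scipy, whose `rel_entr(x, y)` computes $x\log(x/y)$ termwise) reproduces the $l^*(a)$ value from the KL sum.

```python
# A one-line check (assuming scipy) that l*(a) equals the Kullback-
# Leibler distance between (a, 1-a) and (p, 1-p); rel_entr(x, y)
# computes x*log(x/y) for each term.
from scipy.special import rel_entr

p, a = 0.5, 0.7
kl = rel_entr(a, p) + rel_entr(1 - a, 1 - p)
print("KL((a, 1-a) || (p, 1-p)) =", kl)   # same value as l*(a)
```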
