
LARGE DEVIATION THEORY

Suppose $X_1, X_2, \ldots, X_n$ is a sequence of independent and identically distributed random variables, each with common mean $\mu$, and let $S_n = X_1 + X_2 + \cdots + X_n$. The weak law of large numbers (WLLN) and the central limit theorem (CLT) were concerned with the behavior of $\frac{S_n}{n}$ for large $n$. Large deviation theory deals with tail probabilities of the form $P\left(\frac{S_n}{n} \ge a\right)$.

The WLLN states

$$\frac{S_n}{n} \xrightarrow{P} \mu$$

i.e.

$$\lim_{n \to \infty} P\left(\left|\frac{S_n}{n} - \mu\right| > \varepsilon\right) = 0$$

Thus the WLLN asserts that $\frac{S_n}{n}$ converges to $\mu$ and does not deal with any deviation.

The central limit theorem says

$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0,1)$$

where $\sigma^2$ is the common variance, i.e. we can find probabilities of the form

$$P\left(\mu - \frac{a\sigma}{\sqrt{n}} < \frac{S_n}{n} < \mu + \frac{a\sigma}{\sqrt{n}}\right)$$

In this case, we can find the probability of a deviation of the order $1/\sqrt{n}$, which is not large. Thus the CLT is not applicable for finding the probability of a large deviation of the mean.
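To see the two regimes side by side, here is a minimal Monte Carlo sketch; the Bernoulli(0.5) summands and the threshold $a = 0.7$ are illustrative assumptions, not part of the notes. The estimated tail shrinks roughly geometrically in $n$, far faster than the $1/\sqrt{n}$ deviations the CLT describes.

```python
# A minimal Monte Carlo sketch of the large-deviation regime; the
# Bernoulli(0.5) summands and threshold a = 0.7 are illustrative
# assumptions.  P(S_n/n >= a) shrinks roughly exponentially in n.
import numpy as np

rng = np.random.default_rng(0)
p, a, trials = 0.5, 0.7, 200_000        # mean p, threshold a > p

for n in (10, 25, 50, 100):
    means = rng.binomial(n, p, size=trials) / n   # samples of S_n / n
    tail = np.mean(means >= a)                    # estimate of P(S_n/n >= a)
    print(f"n = {n:3d}   P(S_n/n >= {a}) ~ {tail:.2e}")
```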
MGF and CGF
Recall that the moment generating function (MGF) M X ( s ) of a random variable X is defined by

$$M_X(s) = Ee^{sX} = \int_{-\infty}^{\infty} e^{sx}\,dF_X(x)$$

where $s$ is a real variable. Unlike the characteristic function, the MGF may not exist for all random variables and all values of $s$. If $M_X(s)$ exists in a neighbourhood of $s = 0$, it may be conveniently used to generate the moments as:

$$EX^k = M_X^{(k)}(s)\Big|_{s=0}, \qquad k = 1, 2, \ldots$$
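As a quick sanity check of this moment-generating property, the following sketch (assuming sympy is available; the Exponential($\lambda$) distribution is an illustrative choice, with known moments $EX^k = k!/\lambda^k$) differentiates an MGF symbolically at $s = 0$.

```python
# A small symbolic check (assuming sympy is available) that derivatives
# of the MGF at s = 0 generate the moments, E X^k = M_X^(k)(0).  The
# Exponential(lam) distribution, with MGF lam/(lam - s) for s < lam,
# is an illustrative choice with known moments k!/lam^k.
import sympy as sp

s, lam = sp.symbols('s lam', positive=True)
M = lam / (lam - s)                     # MGF of Exponential(lam)

for k in (1, 2, 3):
    moment = sp.diff(M, s, k).subs(s, 0)
    print(f"E X^{k} =", sp.simplify(moment))   # prints k!/lam**k
```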

The cumulant generating function (CGF) of a random variable is defined as

$$K_X(s) = \log_e(M_X(s)) = \log_e(Ee^{sX})$$
where M X ( s ) is the moment generating function.
If M X ( s ) exists and is non-zero, then K X ( s ) also exists.
As M X (0) = 1 , we get K X (0) = 0. The Taylor series expansion of K X ( s ) about the origin gives

$$K_X(s) = \sum_{n=1}^{\infty} k_X(n)\,\frac{s^n}{n!}$$

where the $n$th coefficient $k_X(n)$ is called the $n$th cumulant of the random variable $X$.
From the above expression, we get

$$k_X(n) = \frac{d^n}{ds^n}K_X(s)\Big|_{s=0} = \frac{d^n}{ds^n}\log_e(M_X(s))\Big|_{s=0}$$

We can derive the first four cumulants as

$$k_X(1) = \frac{dK_X(s)}{ds}\Big|_{s=0} = \mu_X$$

$$k_X(2) = \frac{d^2K_X(s)}{ds^2}\Big|_{s=0} = \sigma_X^2$$

$$k_X(3) = \frac{d^3K_X(s)}{ds^3}\Big|_{s=0} = EX^3 - 3\mu EX^2 + 2\mu^3 = E(X - \mu_X)^3$$

$$k_X(4) = \frac{d^4K_X(s)}{ds^4}\Big|_{s=0} = E(X - \mu_X)^4 - 3\sigma_X^4$$
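These formulas are easy to verify on a distribution whose cumulants are known. The sketch below (assuming sympy; the Poisson($\lambda$) choice is illustrative, since every Poisson cumulant equals $\lambda$) differentiates the CGF directly.

```python
# A sketch (assuming sympy) verifying the cumulant formulas for a
# concrete case: X ~ Poisson(lam) has K_X(s) = lam*(exp(s) - 1), and
# every cumulant of the Poisson distribution equals lam -- consistent
# with k_X(1) = mu, k_X(2) = sigma^2, k_X(3) = E(X - mu)^3, and
# k_X(4) = E(X - mu)^4 - 3 sigma^4.
import sympy as sp

s, lam = sp.symbols('s lam', positive=True)
K = lam * (sp.exp(s) - 1)               # CGF of Poisson(lam)

for n in (1, 2, 3, 4):
    k_n = sp.diff(K, s, n).subs(s, 0)
    print(f"k_X({n}) =", sp.simplify(k_n))     # each one is lam
```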

Cramér's Theorem
Let $X_1, X_2, \ldots, X_n$ be iid random variables with mean $\mu$ and MGF $M_X(s)$ which is finite in a neighbourhood of $s = 0$. Then for any $a > \mu$,

$$\lim_{n \to \infty} \frac{1}{n}\log P\left(\frac{S_n}{n} \ge a\right) = -l^*(a)$$

where

$$l^*(a) = \max_{s>0}\left(sa - \log M_X(s)\right) = s^*a - \log M_X(s^*)$$

and $s^*$ is the value of $s$ corresponding to $l^*(a)$.

We can easily find an upper bound for $P\left(\frac{S_n}{n} > a\right)$ by using the Chernoff bound. According to the Chernoff bound,

$$P(X \ge a) \le \min_{s>0} e^{-as}M_X(s).$$

Therefore,

$$P\left(\frac{S_n}{n} > a\right) = P(S_n > na) \le \min_{s>0} e^{-ans}M_{S_n}(s)$$

Now, using the independence of the $X_i$'s,

$$M_{S_n}(s) = Ee^{sS_n} = Ee^{s\sum_{i=1}^{n}X_i} = E\prod_{i=1}^{n}e^{sX_i} = \prod_{i=1}^{n}Ee^{sX_i} = (M_X(s))^n$$
so that

$$P\left(\frac{S_n}{n} > a\right) \le \min_{s>0} e^{-ans}(M_X(s))^n = \min_{s>0} e^{-ans + n\log_e M_X(s)}$$

where $\log_e M_X(s)$ is the cumulant generating function. Hence

$$P\left(\frac{S_n}{n} > a\right) \le \min_{s>0} e^{-n\left(as - \log_e M_X(s)\right)}$$
Minimization of $e^{-n(as - \log_e M_X(s))}$ is equivalent to maximization of $l(s) = as - \log_e M_X(s)$. Let

$$l^*(a) = \max_{s>0}\left(sa - \log_e M_X(s)\right) = s^*a - \log_e M_X(s^*).$$

$l^*(a)$ is known as the Fenchel-Legendre transform of $\log_e M_X(s)$.


Thus,

$$P\left(\frac{S_n}{n} > a\right) \le e^{-nl^*(a)}$$

so that

$$\frac{1}{n}\log_e P\left(\frac{S_n}{n} > a\right) \le -l^*(a).$$
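The maximization defining $l^*(a)$ is easy to carry out numerically. The sketch below (assuming scipy; the Gaussian example is an illustrative choice, with the closed form $l^*(a) = (a-\mu)^2/2\sigma^2$ to check the optimizer against) is a minimal demonstration.

```python
# A numerical sketch (assuming scipy) of the maximization that defines
# l*(a).  For X ~ N(mu, sigma^2), log M_X(s) = mu*s + sigma^2*s^2/2 and
# l*(a) = (a - mu)^2 / (2*sigma^2), a closed form the optimizer should
# reproduce; the parameter values are illustrative.
from scipy.optimize import minimize_scalar

mu, sigma, a = 0.0, 1.0, 1.5            # threshold a > mu

def neg_l(s):
    # -(s*a - log M_X(s)) for the Gaussian MGF
    return -(s * a - (mu * s + 0.5 * sigma**2 * s**2))

res = minimize_scalar(neg_l, bounds=(1e-9, 50.0), method='bounded')
print("numeric  l*(a) =", -res.fun)                      # ~ 1.125
print("analytic l*(a) =", (a - mu)**2 / (2 * sigma**2))  # 1.125
```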

We have also to show that

$$\lim_{n \to \infty} \frac{1}{n}\log_e P\left(\frac{S_n}{n} > a\right) \ge -l^*(a).$$

For this, consider a set of new random variables $Y_1, Y_2, \ldots, Y_n$ obtained by mapping $X_i$ to $Y_i$ such that the common distribution function of the $Y_i$'s is given by

$$F_Y(y) = \frac{1}{M_X(s^*)}\int_{-\infty}^{y} e^{s^*u}\,dF_X(u)$$

As the $X_i$'s are iid random variables, so also are the $Y_i$'s. We can verify that $F_Y(y)$ is a valid distribution function. We also observe that

$$dF_Y(u) = \frac{e^{s^*u}\,dF_X(u)}{M_X(s^*)}$$

The MGF of $Y$ is given by

$$M_Y(s) = \int_{-\infty}^{\infty} e^{su}\,dF_Y(u) = \frac{1}{M_X(s^*)}\int_{-\infty}^{\infty} e^{(s+s^*)u}\,dF_X(u) = \frac{M_X(s+s^*)}{M_X(s^*)}$$

Therefore

$$EY_i = M_Y'(0) = \frac{M_X'(s^*)}{M_X(s^*)} = a$$

where the last equality follows from the stationarity condition $\frac{d}{ds}\left(sa - \log_e M_X(s)\right)\big|_{s=s^*} = 0$ that defines $s^*$. Similarly,

$$\operatorname{Var} Y_i = M_Y''(0) - \left(M_Y'(0)\right)^2$$
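This exponential tilting can be checked numerically. In the quick sketch below, the Exponential($\lambda$) choice is an illustrative assumption: tilting $\mathrm{Exp}(\lambda)$ by $e^{su}$ yields $\mathrm{Exp}(\lambda - s)$, so the stationarity condition $a = 1/(\lambda - s^*)$ gives $s^* = \lambda - 1/a$, and the tilted mean should sit at $a$.

```python
# A quick numeric illustration of the tilted mean E Y_i = a.  The
# Exponential(lam) choice is an assumption for illustration: tilting
# Exp(lam) by e^{s u} gives Exp(lam - s), and the stationarity
# condition a = M_X'(s*)/M_X(s*) = 1/(lam - s*) gives s* = lam - 1/a.
import numpy as np

lam, a = 1.0, 2.5                       # E X = 1/lam = 1, threshold a > 1
s_star = lam - 1.0 / a                  # tilt that centres Y at a

rng = np.random.default_rng(1)
y = rng.exponential(scale=1.0 / (lam - s_star), size=1_000_000)
print("mean of tilted samples ~", y.mean())   # close to a = 2.5
```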

Define $\hat{S}_n = \sum_{i=1}^{n} Y_i$. Now

$$M_{\hat{S}_n}(s) = Ee^{s\sum_{i=1}^{n}Y_i} = \prod_{i=1}^{n}Ee^{sY_i} = \left(Ee^{sY}\right)^n$$

which we get using the iid property. Hence

$$M_{\hat{S}_n}(s) = (M_Y(s))^n = \frac{(M_X(s+s^*))^n}{(M_X(s^*))^n} = \frac{M_{S_n}(s+s^*)}{(M_X(s^*))^n}$$

Noting the definition of $M_{\hat{S}_n}(s)$, we get

$$\int_{-\infty}^{\infty} e^{su}\,dF_{\hat{S}_n}(u) = \frac{\int_{-\infty}^{\infty} e^{(s+s^*)u}\,dF_{S_n}(u)}{(M_X(s^*))^n}$$

so that

$$dF_{\hat{S}_n}(u) = \frac{e^{s^*u}\,dF_{S_n}(u)}{(M_X(s^*))^n}$$

Using the above relationship, the probability involving $S_n$ can be studied in terms of probabilities involving $\hat{S}_n$.
Suppose $b > a$. We have

$$\begin{aligned}
P\left(\frac{S_n}{n} > a\right) &= P(S_n > na) = \int_{na}^{\infty} dF_{S_n}(u)\\
&= (M_X(s^*))^n \int_{na}^{\infty} e^{-s^*u}\,dF_{\hat{S}_n}(u)\\
&\ge (M_X(s^*))^n \int_{na}^{nb} e^{-s^*u}\,dF_{\hat{S}_n}(u)\\
&\ge (M_X(s^*))^n\, e^{-s^*nb} \int_{na}^{nb} dF_{\hat{S}_n}(u)\\
&= e^{-\left(s^*nb - n\log_e M_X(s^*)\right)}\,P(na < \hat{S}_n < nb)
\end{aligned}$$


Taking the logarithm of both sides and dividing by $n$, we get

$$\frac{1}{n}\log_e P\left(\frac{S_n}{n} > a\right) \ge -\left(s^*b - \log_e M_X(s^*)\right) + \frac{1}{n}\log_e P(na < \hat{S}_n < nb)$$

$$= -\left(s^*b - \log_e M_X(s^*)\right) + \frac{1}{n}\log_e\left(P(\hat{S}_n < nb) - P(\hat{S}_n \le na)\right)$$
n
Now according to the CLT, $P(\hat{S}_n \le na) \to \tfrac{1}{2}$ (since $E\hat{S}_n = na$), and according to the SLLN, $P(\hat{S}_n < nb) \to 1$ (since $b > a$), so that $\frac{1}{n}\log_e P(na < \hat{S}_n < nb) \to 0$.

Therefore, as $n$ becomes large, and noting that we can take $b$ arbitrarily close to $a$,

$$\frac{1}{n}\log_e P\left(\frac{S_n}{n} > a\right) \ge -\left(s^*b - \log_e M_X(s^*)\right) = -\left(s^*a - \log_e M_X(s^*)\right) = -l^*(a)$$
Combining the lower and upper bounds,

$$\lim_{n \to \infty} \frac{1}{n}\log P\left(\frac{S_n}{n} \ge a\right) = -l^*(a)$$

Thus for large $n$,

$$P\left(\frac{S_n}{n} \ge a\right) \approx e^{-nl^*(a)}$$

Example: Let the $X_i$'s be Bernoulli random variables:

$$X_i = \begin{cases} 1 & \text{with probability } p\\ 0 & \text{with probability } 1-p \end{cases}$$

The moment generating function is given by

$$M_X(s) = Ee^{sX} = pe^s + (1-p)$$

so that

$$\log M_X(s) = \log\left(1 - p + pe^s\right)$$

$$l(s) = as - \log\left(1 - p + pe^s\right)$$

$l(s)$ is maximum at $s^* = \log\dfrac{a(1-p)}{p(1-a)}$. Then

$$l^*(a) = a\log\frac{a}{p} + (1-a)\log\frac{1-a}{1-p}$$

In fact, $S_n = \sum_{i=1}^{n}X_i \sim \text{Binomial}(n, p)$, so we can find $P\left(\frac{S_n}{n} \ge a\right)$ exactly, from which we can check the tightness of the bound.
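A minimal sketch of that check (assuming scipy; the values $p = 0.5$ and $a = 0.7$ are illustrative) compares the exact binomial tail with the large-deviation estimate $e^{-nl^*(a)}$:

```python
# A tightness check (assuming scipy) of P(S_n/n >= a) ~ exp(-n l*(a))
# for Bernoulli(p) summands, where S_n ~ Binomial(n, p) gives the exact
# tail; p = 0.5 and a = 0.7 are illustrative values.
import math
from scipy.stats import binom

p, a = 0.5, 0.7
l_star = a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

for n in (10, 50, 100, 500):
    k = math.ceil(n * a - 1e-12)        # smallest integer >= n*a
    exact = binom.sf(k - 1, n, p)       # P(S_n >= k) = P(S_n/n >= a)
    print(f"n = {n:3d}   exact = {exact:.3e}   "
          f"exp(-n l*) = {math.exp(-n * l_star):.3e}")
```

The estimate is an upper bound on the exact tail, and the two exponents agree as $n$ grows.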


Here $a$ is a number between 0 and 1, so $a$ has a probability interpretation.
Since $l^*(a) = a\log\dfrac{a}{p} + (1-a)\log\dfrac{1-a}{1-p}$, we can call $l^*(a)$ the relative entropy, or Kullback-Leibler distance, between $(a, 1-a)$ and $(p, 1-p)$.
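This identification can be confirmed directly; the sketch below (assuming scipy, whose `rel_entr(x, y)` computes $x\log(x/y)$ termwise) reproduces the $l^*(a)$ value from the KL sum.

```python
# A one-line check (assuming scipy) that l*(a) equals the Kullback-
# Leibler distance between (a, 1-a) and (p, 1-p); rel_entr(x, y)
# computes x*log(x/y) for each term.
from scipy.special import rel_entr

p, a = 0.5, 0.7
kl = rel_entr(a, p) + rel_entr(1 - a, 1 - p)
print("KL((a, 1-a) || (p, 1-p)) =", kl)   # same value as l*(a)
```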
