
6 Strong law of large numbers

There is a zoo of strong laws of large numbers, each of which varies in the exact
assumptions it makes on the underlying sequence of random variables.

Theorem 6.1 (Strong law of large numbers). Let $X_1, X_2, \ldots$ be i.i.d. integrable real random variables. Then
$$\frac{X_1 + \cdots + X_n}{n}$$
converges almost surely to $E[X_1]$.
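
Before turning to the proof, a quick numerical illustration may help. The following is a minimal simulation sketch (Python with NumPy; the snippet and all names in it are our own illustration, not part of the notes): it draws i.i.d. Exp(1) variables, whose mean is $1$, and prints the running averages, which settle near $E[X_1] = 1$ as the theorem predicts.

    import numpy as np

    rng = np.random.default_rng(0)

    # i.i.d. integrable random variables; Exp(1) has E[X_1] = 1.
    X = rng.exponential(scale=1.0, size=100_000)

    # Running averages (X_1 + ... + X_n) / n for n = 1, ..., N.
    running_avg = np.cumsum(X) / np.arange(1, X.size + 1)

    for n in (10, 100, 1_000, 10_000, 100_000):
        print(n, running_avg[n - 1])   # approaches E[X_1] = 1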

Proof. First, let us note that $X_1 - E[X_1], X_2 - E[X_2], \ldots$ are i.i.d. and centered. Then, without loss of generality, we may assume that $E[X_1] = 0$. Second, let $Y_n := X_n 1_{(|X_n| \leq n)}$ for every $n \in \mathbb{N}$ and let $h \colon \mathbb{R} \to \mathbb{R}$, $h(x) = |x|$. Since
\begin{align*}
\sum_{n=1}^{\infty} P(X_n \neq Y_n)
&= \sum_{n=1}^{\infty} P(|X_n| > n) \\
&= \sum_{n=1}^{\infty} P_{X_1}(h > n) \\
&= \sum_{n=1}^{\infty} \sum_{m=n}^{\infty} P_{X_1}(m + 1 \geq h > m) \\
&= \sum_{m=1}^{\infty} \sum_{n=1}^{m} P_{X_1}(m + 1 \geq h > m) \\
&= \sum_{m=1}^{\infty} m \, P_{X_1}(m + 1 \geq h > m) \\
&\leq \int_{\mathbb{R}} h \, dP_{X_1} < \infty,
\end{align*}

by the Borel-Cantelli lemma,
$$X_n = Y_n \quad \text{for all } n \text{ sufficiently large}$$
happens almost surely. It therefore suffices to show that
$$\frac{Y_1 + \cdots + Y_n}{n} \xrightarrow{\text{a.s.}} 0. \tag{6.1}$$
Finally, let us note that $Y_1, Y_2, \ldots$ are

(i) independent,

(ii) square integrable,

(iii) such that $\sum_{n=1}^{\infty} n^{-2} \operatorname{Var}[Y_n] < \infty$, and

(iv) such that $E[Y_n] \to 0$.

Then, by Lemma 6.2, (6.1) follows: note that (iv) implies $(E[Y_1] + \cdots + E[Y_n])/n \to 0$ by Cesàro convergence, so Lemma 6.2 applies with $a = 0$. Indeed, we shall prove every assertion above:

(i) Each $Y_n$ is $\sigma(X_n)$-measurable, so the independence of $(Y_n)$ follows from that of $(X_n)$.

(ii) Since
\begin{align*}
\sum_{n=1}^{\infty} n^{-2} E[Y_n^2]
&= \sum_{n=1}^{\infty} n^{-2} \int_{\mathbb{R}} x^2 1_{(h \leq n)}(x) \, P_{X_1}(dx) \\
&= \sum_{n=1}^{\infty} \sum_{m=1}^{n} n^{-2} \int_{(m-1 < h \leq m)} x^2 \, P_{X_1}(dx) \\
&\leq \sum_{n=1}^{\infty} \sum_{m=1}^{n} m n^{-2} \int_{(m-1 < h \leq m)} |x| \, P_{X_1}(dx) \\
&= \sum_{m=1}^{\infty} \sum_{n=m}^{\infty} m n^{-2} \int_{(m-1 < h \leq m)} |x| \, P_{X_1}(dx) \\
&= \sum_{m=1}^{\infty} m \left( \sum_{n=m}^{\infty} n^{-2} \right) \int_{(m-1 < h \leq m)} |x| \, P_{X_1}(dx) \\
&\leq \sum_{m=1}^{\infty} 2 \int_{(m-1 < h \leq m)} |x| \, P_{X_1}(dx) \\
&\leq 2 \int_{\mathbb{R}} |x| \, P_{X_1}(dx) < \infty,
\end{align*}
we obtain
$$\sum_{n=1}^{\infty} n^{-2} E[Y_n^2] < \infty; \tag{6.2}$$
in particular, every $Y_n$ is square integrable.

(iii) This follows from (6.2), since $\operatorname{Var}[Y_n] \leq E[Y_n^2]$.

(iv) Let $f_n \colon \mathbb{R} \to \mathbb{R}$, $f_n(x) = x 1_{(h \leq n)}(x)$, and $f \colon \mathbb{R} \to \mathbb{R}$, $f(x) = x$. Since $f_n \to f$ pointwise and $|f_n| \leq |f| \in L^1(P_{X_1})$, by the dominated convergence theorem, $E[Y_n] = \int_{\mathbb{R}} f_n \, dP_{X_1} \to \int_{\mathbb{R}} f \, dP_{X_1} = E[X_1] = 0$.

Lemma 6.2. Let $X_1, X_2, \ldots$ be square integrable and independent real random variables with $\sum_{n=1}^{\infty} n^{-2} \operatorname{Var}[X_n] < \infty$ and
$$\frac{E[X_1] + \cdots + E[X_n]}{n} \to a \in \mathbb{R}.$$
Then
$$\frac{X_1 + \cdots + X_n}{n} \xrightarrow{\text{a.s.}} a.$$
Proof. Let $S_n := \sum_{k=1}^{n} (X_k - E[X_k])$. Fix $\varepsilon > 0$. For every $k \in \mathbb{N}$, let $A_k$ be the event where
$$n^{-1} |S_n| \geq \varepsilon \quad \text{for some } n \text{ with } 2^{k-1} \leq n < 2^k.$$
Then on $A_k$ we have
$$|S_n| \geq \varepsilon 2^{k-1} \quad \text{for some } n < 2^k,$$
so by Kolmogorov's inequality (Lemma 6.4),
$$P(A_k) \leq (\varepsilon 2^{k-1})^{-2} \sum_{n=1}^{2^k} \operatorname{Var}[X_n].$$

Therefore,
\begin{align*}
\sum_{k=1}^{\infty} P(A_k)
&\leq \frac{4}{\varepsilon^2} \sum_{k=1}^{\infty} \sum_{n=1}^{2^k} 2^{-2k} \operatorname{Var}[X_n] \\
&= \frac{4}{\varepsilon^2} \sum_{n=1}^{\infty} \sum_{k \,:\, 2^k \geq n} 2^{-2k} \operatorname{Var}[X_n] \\
&\leq \frac{8}{\varepsilon^2} \sum_{n=1}^{\infty} n^{-2} \operatorname{Var}[X_n] < \infty,
\end{align*}

so
$$P\left( \limsup_{k \to \infty} A_k \right) = 0$$
by the Borel-Cantelli lemma. But $\limsup_{k} A_k$ is precisely the set where
$$n^{-1} |S_n| \geq \varepsilon \quad \text{for infinitely many } n,$$
so
$$P\left( \limsup_{n \to \infty} n^{-1} |S_n| \leq \varepsilon \right) = 1.$$
Letting $\varepsilon \downarrow 0$ through a countable sequence of values, we obtain
$$\frac{X_1 + \cdots + X_n}{n} - \frac{E[X_1] + \cdots + E[X_n]}{n} \xrightarrow{\text{a.s.}} 0,$$
and the conclusion follows.

Remark 6.3. The core of the weak law of large numbers is Chebyshev's inequality. Here we present a stronger inequality that gives the same bound, but now for the maximum over all partial sums up to a fixed time.

Lemma 6.4 (Kolmogorov's inequality). Let $X_1, \ldots, X_n$ be square integrable, independent and centered real random variables; let $S_k = X_1 + \cdots + X_k$ for $k = 1, \ldots, n$. Then, for any $\varepsilon > 0$,
$$P\left( \max_{1 \leq k \leq n} |S_k| \geq \varepsilon \right) \leq \varepsilon^{-2} \operatorname{Var}[S_n]. \tag{6.3}$$
Proof. First, fix $\varepsilon > 0$. We decompose the probability space according to the first time at which the partial sums exceed the value $\varepsilon$. Hence, let
$$\tau := \min\{ k = 1, \ldots, n \; ; \; |S_k| \geq \varepsilon \}$$
and $A_k := (\tau = k)$ for $k = 1, \ldots, n$. Further, let
$$A := \left( \max_{1 \leq k \leq n} |S_k| \geq \varepsilon \right) = \biguplus_{k=1}^{n} A_k.$$

The random variables $S_n - S_k$ and $S_k 1_{A_k}$ are $\sigma(X_{k+1}, \ldots, X_n)$- and $\sigma(X_1, \ldots, X_k)$-measurable, respectively, hence independent, and thus
$$E[(S_n - S_k) S_k 1_{A_k}] = E[S_n - S_k] \, E[S_k 1_{A_k}] = 0.$$

Then
\begin{align*}
\operatorname{Var}[S_n] = E[S_n^2]
&\geq E\left[ \sum_{k=1}^{n} S_n^2 1_{A_k} \right] \\
&= \sum_{k=1}^{n} E[S_n^2 1_{A_k}] \\
&= \sum_{k=1}^{n} E[((S_n - S_k)^2 + 2 (S_n - S_k) S_k + S_k^2) 1_{A_k}] \\
&= \sum_{k=1}^{n} E[(S_n - S_k)^2 1_{A_k}] + \sum_{k=1}^{n} E[S_k^2 1_{A_k}] \\
&\geq \sum_{k=1}^{n} E[S_k^2 1_{A_k}] \\
&\geq \sum_{k=1}^{n} E[\varepsilon^2 1_{A_k}] \\
&= \varepsilon^2 P(A),
\end{align*}
and inequality (6.3) follows.
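
As a quick numerical sanity check of (6.3) (our own sketch, not part of the notes; Python with NumPy, all parameter choices arbitrary), one can estimate the left-hand side by simulation and compare it with the bound:

    import numpy as np

    rng = np.random.default_rng(1)
    n, trials, eps = 50, 100_000, 6.0

    # Independent centered variables: uniform on [-1, 1], each with variance 1/3.
    X = rng.uniform(-1.0, 1.0, size=(trials, n))
    S = np.cumsum(X, axis=1)                         # partial sums S_1, ..., S_n

    lhs = np.mean(np.max(np.abs(S), axis=1) >= eps)  # estimate of P(max_k |S_k| >= eps)
    rhs = (n / 3) / eps**2                           # eps^{-2} Var[S_n] = eps^{-2} * n/3
    print(lhs, "<=", rhs)                            # inequality (6.3)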


Example 6.5 (Monte Carlo Integration). ...
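
The example is only sketched in these notes, but its core rests directly on Theorem 6.1: to approximate $\int_0^1 f(x) \, dx$, draw $U_1, U_2, \ldots$ i.i.d. uniform on $[0, 1]$ and average $f(U_1), \ldots, f(U_n)$, since these are i.i.d. with mean $E[f(U_1)] = \int_0^1 f(x) \, dx$. A minimal illustration of this idea (our own; Python with NumPy, the integrand chosen arbitrarily):

    import numpy as np

    rng = np.random.default_rng(2)

    def f(x):
        # Integrand with known answer: the integral of x^2 over [0, 1] is 1/3.
        return x**2

    U = rng.uniform(0.0, 1.0, size=1_000_000)
    estimate = np.mean(f(U))   # (f(U_1) + ... + f(U_n)) / n
    print(estimate)            # close to 1/3 by the strong law of large numbers
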
Definition 6.6 (Empirical distribution function). Let $X_1, X_2, \ldots$ be real random variables. The map $F_n \colon \mathbb{R} \to [0, 1]$, $x \mapsto n^{-1} \sum_{i=1}^{n} 1_{(-\infty, x]}(X_i)$, is called the empirical distribution function of $X_1, X_2, \ldots$.
Theorem 6.7 (Glivenko–Cantelli). Let $X_1, X_2, \ldots$ be i.i.d. real random variables with distribution function $F$, and let $F_n$, $n \in \mathbb{N}$, be the empirical distribution functions. Then
$$\lim_{n \to +\infty} \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| = 0 \quad \text{a.s.}$$

Proof. Exercise.
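
To see the uniform convergence numerically, the following sketch (our own addition, not part of the notes; Python with NumPy) computes $\sup_{x} |F_n(x) - F(x)|$ for Uniform(0,1) samples, where $F(x) = x$ on $[0, 1]$ and the supremum is attained at the order statistics:

    import numpy as np

    rng = np.random.default_rng(3)

    def sup_distance_uniform(sample):
        # sup_x |F_n(x) - F(x)| for F the Uniform(0,1) distribution function;
        # it suffices to check just before and at each order statistic.
        x = np.sort(sample)
        i = np.arange(1, x.size + 1)
        return max(np.max(i / x.size - x), np.max(x - (i - 1) / x.size))

    for n in (10, 100, 1_000, 10_000):
        sample = rng.uniform(0.0, 1.0, size=n)
        print(n, sup_distance_uniform(sample))   # decreases toward 0, as Theorem 6.7 predicts
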
Example 6.8 (Shannon's theorem). ...
Exercise 6.1. Let $X_1, X_2, \ldots$ be i.i.d. real random variables with
$$\frac{X_1 + \cdots + X_n}{n} \xrightarrow{\text{a.s.}} Y$$
for some random variable $Y$. Show that $X_1 \in L^1(P)$ and $Y = E[X_1]$ a.s. (Hint: first show that $P(|X_n| > n \text{ for infinitely many } n) = 0$ implies $X_1 \in L^1(P)$.)

Exercise 6.2. Let $E$ be a finite set and let $p$ be a probability vector on $E$. Show that the entropy $H(p)$ is minimal (in fact, zero) if $p = \delta_e$ for some $e \in E$. It is maximal (in fact, $\log(\#E)$) if $p$ is the uniform distribution on $E$.
Exercise 6.3. Let $X_1, X_2, \ldots$ be independent and centered real random variables with $\sum_{n=1}^{\infty} \operatorname{Var}[X_n] < \infty$. Prove that $(X_1 + \cdots + X_n)$ converges almost surely. (Hint: apply Kolmogorov's inequality to show that the partial sums are Cauchy almost surely.)
Exercise 6.4. If the plus and minus signs in $\sum_{n=1}^{\infty} \pm \frac{1}{n}$ are determined by successive tosses of a fair coin, prove that the resulting series converges almost surely.
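
A simulation cannot replace the proof, but it can make the claim plausible. The sketch below (our own; Python with NumPy) draws fair-coin signs and prints late partial sums, which change very little once $n$ is large, consistent with almost sure convergence (note that $\operatorname{Var}[\pm 1/n] = n^{-2}$, so Exercise 6.3 applies):

    import numpy as np

    rng = np.random.default_rng(4)
    N = 1_000_000

    for trial in range(3):
        signs = rng.choice([-1.0, 1.0], size=N)            # fair-coin signs
        partial = np.cumsum(signs / np.arange(1, N + 1))   # partial sums of sum of +-1/n
        print(trial, partial[N // 2], partial[-1])         # late partial sums nearly agree
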
Exercise 6.5. Let $X_1, X_2, \ldots$ be i.i.d. real random variables that are not integrable. Prove that
$$\limsup_{n \to \infty} n^{-1} |X_1 + \cdots + X_n| = \infty \quad \text{a.s.}$$
(Hint: show that $\sum_{n=1}^{\infty} P(|X_n| > n) = \infty$ and apply the Borel-Cantelli lemma.)
Exercise 6.6. A collection or population of $N$ objects (such as mice, grains of sand, etc.) may be considered as a sample space in which each object has probability $N^{-1}$. Let $X$ be a random variable on this space (a numerical characteristic of the objects such as mass, diameter, etc.) with mean $m$ and variance $v$. In statistics one is interested in determining $m$ and $v$ by taking a sequence of random samples from the population and measuring $X$ for each sample, thus obtaining a sequence $(X_n)$ of numbers that are values of independent random variables with the same distribution as $X$. The $n$th sample mean is $M_n = n^{-1} \sum_{i=1}^{n} X_i$ and the $n$th sample variance is $V_n = (n-1)^{-1} \sum_{i=1}^{n} (X_i - M_n)^2$.

(i) Show that $E[M_n] = m$, $E[V_n] = v$, and $M_n \xrightarrow{\text{a.s.}} m$ and $V_n \xrightarrow{\text{a.s.}} v$.

(ii) Can you see why one uses $(n-1)^{-1}$ instead of $n^{-1}$ in the definition of $V_n$?
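
A quick numerical check of (i) (our own sketch, not part of the notes; Python with NumPy, the population distribution chosen arbitrarily):

    import numpy as np

    rng = np.random.default_rng(5)

    # Stand-in population characteristic: Gamma(2, 1), so m = 2 and v = 2.
    n = 1_000_000
    X = rng.gamma(shape=2.0, scale=1.0, size=n)

    M_n = X.mean()                           # sample mean
    V_n = ((X - M_n) ** 2).sum() / (n - 1)   # sample variance with (n-1)^{-1}
    print(M_n, V_n)                          # both close to 2, as (i) predicts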
