Independent Sums
Two events $A$ and $B$ are independent if
$$P[A \cap B] = P[A]\,P[B].$$
Two random variables $X$ and $Y$ are independent if, for any two Borel sets $A$ and $B$,
$$P[X \in A,\, Y \in B] = P[X \in A]\,P[Y \in B].$$
The important thing to note is that if $X$ and $Y$ are independent and one knows their distributions $\alpha$ and $\beta$, then their joint distribution is automatically determined as the product measure $\alpha \times \beta$.

If $X$ and $Y$ are independent random variables having $\alpha$ and $\beta$ for their distributions, the distribution of the sum $Z = X + Y$ is determined as follows. First we construct the product measure $\alpha \times \beta$ on $R \times R$ and then consider the induced distribution of the function $f(x, y) = x + y$. This distribution, called the convolution of $\alpha$ and $\beta$, is denoted by $\alpha * \beta$. An elementary calculation using Fubini's theorem provides the following identities.
$$(\alpha * \beta)(A) = \int \beta(A - x)\, d\alpha = \int \alpha(A - x)\, d\beta \qquad (3.1)$$
$$\int \exp[\,i t x\,]\, d(\alpha * \beta) = \int\!\!\int \exp[\,i t (x + y)\,]\, d\alpha\, d\beta = \int \exp[\,i t x\,]\, d\alpha \int \exp[\,i t x\,]\, d\beta$$
or equivalently
$$\widehat{\alpha * \beta}(t) = \hat{\alpha}(t)\,\hat{\beta}(t), \qquad (3.2)$$
which provides a direct way of calculating the distributions of sums of independent random variables by the use of characteristic functions.
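Identity (3.2) is easy to check numerically. Below is a minimal sketch (hypothetical pmfs, NumPy assumed): it convolves two discrete distributions as in (3.1) and verifies that the characteristic function of the convolution is the product of the two characteristic functions.

```python
import numpy as np

# Hypothetical pmfs for X and Y on {0, 1, 2, ...}; any two would do.
alpha = np.array([0.2, 0.5, 0.3])          # distribution of X
beta = np.array([0.6, 0.4])                # distribution of Y

# Distribution of Z = X + Y is the convolution alpha * beta, as in (3.1).
conv = np.convolve(alpha, beta)

def char_fn(pmf, t):
    """Characteristic function of a pmf supported on 0, 1, 2, ..."""
    return np.sum(pmf * np.exp(1j * t * np.arange(len(pmf))))

# Identity (3.2): the characteristic function of the convolution equals
# the product of the individual characteristic functions.
for t in [0.3, 1.7, -2.5]:
    assert np.isclose(char_fn(conv, t), char_fn(alpha, t) * char_fn(beta, t))
```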
Exercise 3.2. If $X$ and $Y$ are independent, show that for any two measurable functions $f$ and $g$, $f(X)$ and $g(Y)$ are independent.
Exercise 3.3. Use Fubini's theorem to show that if $X$ and $Y$ are independent and if $f$ and $g$ are measurable functions with both $E[|f(X)|]$ and $E[|g(Y)|]$ finite, then
$$E[f(X)\,g(Y)] = E[f(X)]\,E[g(Y)].$$
Exercise 3.4. Show that if $X$ and $Y$ are any two random variables then $E(X + Y) = E(X) + E(Y)$. If $X$ and $Y$ are two independent random variables, then show that
$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$$
where
$$\mathrm{Var}(X) = E\big[(X - E[X])^2\big] = E[X^2] - [E[X]]^2.$$
If $X_1, X_2, \dots, X_n$ are $n$ independent random variables, then the distribution of their sum $S_n = X_1 + X_2 + \cdots + X_n$ can be computed in terms of the distributions of the summands. If $\mu_j$ is the distribution of $X_j$, then the distribution $\mu^{(n)}$ of $S_n$ is given by the convolution $\mu^{(n)} = \mu_1 * \mu_2 * \cdots * \mu_n$, which can be calculated inductively by $\mu^{(j+1)} = \mu^{(j)} * \mu_{j+1}$. In terms of their characteristic functions, $\hat\mu^{(n)}(t) = \hat\mu_1(t)\,\hat\mu_2(t) \cdots \hat\mu_n(t)$. The first two moments of $S_n$ are computed easily:
$$E(S_n) = E(X_1) + E(X_2) + \cdots + E(X_n)$$
and
$$\mathrm{Var}(S_n) = E[S_n - E(S_n)]^2 = \sum_j E[X_j - E(X_j)]^2 + 2 \sum_{1 \le i < j \le n} E\big[(X_i - E(X_i))\,(X_j - E(X_j))\big].$$
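The inductive computation $\mu^{(j+1)} = \mu^{(j)} * \mu_{j+1}$, together with the additivity of means and (for independent summands) variances, can be illustrated with a short sketch (hypothetical pmfs, NumPy assumed):

```python
import numpy as np

# Hypothetical independent summands with small finite supports on 0, 1, 2, ...
pmfs = [np.array([0.1, 0.9]),
        np.array([0.3, 0.4, 0.3]),
        np.array([0.25, 0.25, 0.25, 0.25])]

def moments(pmf):
    x = np.arange(len(pmf))
    m = np.sum(x * pmf)
    return m, np.sum((x - m) ** 2 * pmf)

# Inductive convolution: mu^(j+1) = mu^(j) * mu_{j+1}.
dist = np.array([1.0])                      # distribution of the empty sum
for pmf in pmfs:
    dist = np.convolve(dist, pmf)

m, v = moments(dist)
assert np.isclose(m, sum(moments(p)[0] for p in pmfs))  # E(S_n) adds up
assert np.isclose(v, sum(moments(p)[1] for p in pmfs))  # Var(S_n) adds up
```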
with $g(x) = (x - np)^2$, and in (3.6) we have used the fact that $S_n = X_1 + X_2 + \cdots + X_n$, where the $X_i$ are independent and have the simple distribution $P[X_i = 1] = p$, $P[X_i = 0] = 1 - p$.
For
$$S_n = X_1 + X_2 + \cdots + X_n$$
we have, for every $\epsilon > 0$,
$$\lim_{n \to \infty} P\Big[\, \Big| \frac{S_n}{n} - m \Big| \ge \epsilon \,\Big] = 0.$$
Actually it is enough to assume that $E|X_i| < \infty$; the existence of the second moment is not needed. We will provide two proofs of the statement.

Theorem 3.3. If $X_1, X_2, \dots, X_n, \dots$ are independent and identically distributed with a finite first moment and $E(X_i) = m$, then $\frac{X_1 + X_2 + \cdots + X_n}{n}$ converges to $m$ in probability as $n \to \infty$.
Proof. 1. Let $C$ be a large constant and let us define $X_i^C$ as the truncated random variable $X_i^C = X_i$ if $|X_i| \le C$ and $X_i^C = 0$ otherwise. Let $Y_i^C = X_i - X_i^C$, so that $X_i = X_i^C + Y_i^C$. Then
$$\frac{1}{n} \sum_{1 \le i \le n} X_i = \frac{1}{n} \sum_{1 \le i \le n} X_i^C + \frac{1}{n} \sum_{1 \le i \le n} Y_i^C = \xi_n^C + \eta_n^C.$$
If we denote by $a_C = E(X_i^C)$ and $b_C = E(Y_i^C)$, we always have $m = a_C + b_C$. Consider the quantity
$$\Delta_n = E\Big[ \Big| \frac{1}{n} \sum_{1 \le i \le n} X_i - m \Big| \Big] = E[\,|\xi_n^C + \eta_n^C - m|\,] \le E[\,|\xi_n^C - a_C|\,] + E[\,|\eta_n^C - b_C|\,] \le \big[ E[\,|\xi_n^C - a_C|^2\,] \big]^{\frac12} + 2 E[\,|Y_i^C|\,]. \qquad (3.8)$$
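To see the theorem in action, here is a minimal Monte Carlo sketch (hypothetical example, NumPy assumed). A Pareto law with tail index $1.5$ has a finite first moment but infinite variance, so the simple second-moment argument is unavailable, yet the running averages still settle near $m$:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical heavy-tailed example: a Pareto law with tail index 1.5 has
# m = E[X] = 3 but infinite variance, so only the first-moment (truncation)
# argument applies; the averages still converge to m in probability.
tail, m = 1.5, 3.0
for n in [10**3, 10**5, 10**7]:
    x = rng.pareto(tail, size=n) + 1.0     # standard Pareto on [1, inf)
    print(n, abs(x.mean() - m))            # typically shrinks with n
```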
Exercise 3.7. In the case of the binomial distribution with $p = \frac12$, use Stirling's formula
$$n! \simeq \sqrt{2\pi}\, e^{-n}\, n^{n + \frac12}$$
to estimate the probability
$$\sum_{r \ge n x} \binom{n}{r} \frac{1}{2^n}$$
and show that it decays geometrically in $n$. Can you calculate the geometric ratio
$$\rho(x) = \lim_{n \to \infty} \Big[ \sum_{r \ge n x} \binom{n}{r} \frac{1}{2^n} \Big]^{\frac1n}$$
explicitly as a function of $x$ for $x > \frac12$?
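Numerically, the $n$-th root of the tail probability stabilizes quickly. The sketch below (Python, exact binomial sums) compares it with $\frac{1}{2\, x^x (1-x)^{1-x}}$, the value Stirling's formula suggests for $\rho(x)$; this closed form is our reading of the exercise, not stated in the text.

```python
import numpy as np
from math import comb

def tail_root(n, x):
    """n-th root of P[S_n >= n*x] for the Binomial(n, 1/2) tail."""
    tail = sum(comb(n, r) for r in range(int(np.ceil(n * x)), n + 1)) / 2**n
    return tail ** (1.0 / n)

x = 0.75
predicted = 1.0 / (2 * x**x * (1 - x) ** (1 - x))   # Stirling-based guess
for n in [100, 400, 1600]:
    print(n, tail_root(n, x), predicted)
```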
This is called the Strong Law of Large Numbers. Strong laws are statements that hold for almost all $\omega$.

Let us look at functions of the form $f_n = \mathbf{1}_{A_n}$. It is easy to verify that $f_n \to 0$ in probability if and only if $P(A_n) \to 0$. On the other hand,
$$P\big[\, \omega : \lim_{n \to \infty} \mathbf{1}_{A_n}(\omega) = 0 \,\big] = 1$$
is the same as $P(\limsup_n A_n) = 0$, where $\limsup_n A_n = \cap_{n=1}^{\infty} \cup_{j=n}^{\infty} A_j$ is the event that infinitely many of the events $\{A_j\}$ occur.
Exercise 3.8. Prove the following variant of the monotone convergence theorem. If $f_n(\omega) \ge 0$ are measurable functions, the set $E = \{\omega : S(\omega) = \sum_n f_n(\omega) < \infty\}$ is measurable and $S(\omega)$ is a measurable function on $E$. If each $f_n$ is integrable and $\sum_n E[f_n] < \infty$, then $P[E] = 1$, $S(\omega)$ is integrable, and $E[S(\omega)] = \sum_n E[f_n(\omega)]$.
Proof. By the previous exercise, if $\sum_n P(A_n) < \infty$, then $\sum_n \mathbf{1}_{A_n}(\omega) = S(\omega)$ is finite almost everywhere and
$$E(S(\omega)) = \sum_n P(A_n) < \infty.$$
If an infinite series has a finite sum, then the $n$-th term must go to $0$, thereby proving the direct part. To prove the converse we need to show that if $\sum_n P(A_n) = \infty$, then $\lim_{m \to \infty} P(\cup_{n=m}^{\infty} A_n) > 0$. We can use independence and the continuity of probability under monotone limits to calculate, for every $m$,
$$P(\cup_{n=m}^{\infty} A_n) = 1 - P(\cap_{n=m}^{\infty} A_n^c) = 1 - \prod_{n=m}^{\infty} (1 - P(A_n)) \quad \text{(by independence)}$$
$$\ge 1 - e^{-\sum_{n=m}^{\infty} P(A_n)} = 1,$$
and we are done. We have used the inequality $1 - x \le e^{-x}$, familiar in the study of infinite products.
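The dichotomy is easy to visualize by simulation. A minimal sketch (hypothetical probabilities, NumPy assumed) with independent events $A_n$: for $P(A_n) = 1/n^2$ the occurrences stop early, while for $P(A_n) = 1/n$ they keep appearing arbitrarily far out.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

# Independent events A_n with P(A_n) = 1/n^2 (summable) versus
# P(A_n) = 1/n (not summable); hypothetical illustration only.
n = np.arange(1, N + 1)
hits_summable = rng.random(N) < 1.0 / n**2
hits_divergent = rng.random(N) < 1.0 / n

# Summable series: only finitely many events occur (a.s.).
print("last occurrence (sum < inf):",
      n[hits_summable][-1] if hits_summable.any() else None)
# Divergent series with independence: infinitely many occur (a.s.).
print("occurrences past N/2 (sum = inf):", hits_divergent[N // 2:].sum())
```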
Another digression that we want to make into measure theory at this point is to discuss Kolmogorov's consistency theorem. How do we know that there are probability spaces that admit a sequence of independent identically distributed random variables with specified distributions? By the construction of product measures that we outlined earlier, we can construct a measure on $R^n$ for every $n$ which is the joint distribution of the first $n$ random variables. Let us denote by $P_n$ this probability measure on $R^n$. They are consistent in the sense that if we project in the natural way from $R^{n+1} \to R^n$, then $P_{n+1}$ projects to $P_n$. Such a family is called a consistent family of finite dimensional distributions. We look at the space $\Omega = R^{\infty}$ consisting of all real sequences $\omega = \{x_n : n \ge 1\}$ with the natural $\sigma$-field generated by the field $\mathcal{F}$ of finite dimensional cylinder sets of the form $B = \{\omega : (x_1, \dots, x_n) \in A\}$, where $A$ varies over Borel sets in $R^n$ and $n$ varies over positive integers.
converges with probability 1. The basic steps are the following inequalities due to Kolmogorov and Lévy that control the behaviour of sums of independent random variables. They both deal with the problem of estimating
$$T_n(\omega) = \sup_{1 \le k \le n} |S_k(\omega)| = \sup_{1 \le k \le n} \Big| \sum_{j=1}^{k} X_j(\omega) \Big|.$$
Proof. Clearly (iii) $\Rightarrow$ (ii) $\Rightarrow$ (i) are trivial. We will establish (i) $\Rightarrow$ (ii) $\Rightarrow$ (iii).

(i) $\Rightarrow$ (ii). The characteristic functions $\phi_j(t)$ of $X_j$ are such that the infinite product
$$\phi(t) = \prod_{j=1}^{\infty} \phi_j(t)$$
exists, and therefore
$$\lim_{m, n \to \infty} \prod_{j=m+1}^{n} \phi_j(t) = 1,$$
from which
$$\lim_{m, n \to \infty} P\{ |S_n - S_m| \ge \delta \} = 0,$$
establishing (ii).

(ii) $\Rightarrow$ (iii). To establish (iii), because of Exercise 3.11 below, we need only show that for every $\delta > 0$
$$\lim_{m, n \to \infty} P\Big[ \sup_{m < k \le n} |S_k - S_m| \ge \delta \Big] = 0.$$
Exercise 3.10. Prove the inequality $1 - \cos 2t \le 4(1 - \cos t)$ for all real $t$. Deduce the inequality $1 - \mathrm{Re}\,\phi(2t) \le 4[1 - \mathrm{Re}\,\phi(t)]$, valid for any characteristic function. Conclude that if a sequence of characteristic functions converges to 1 in an interval around 0, then it converges to 1 for all real $t$.
If
$$\lim_{m, n \to \infty} P\{ |S_n - S_m| \ge \delta \} = 0$$
for every $\delta > 0$, then there is a limiting random variable $S(\omega)$ such that
$$P\big[ \lim_{n \to \infty} S_n(\omega) = S(\omega) \big] = 1,$$
i.e. the series
$$S(\omega) = \sum_{i=1}^{\infty} X_i(\omega)$$
converges almost surely.
By Kolmogorov's inequality,
$$\lim_{m, n \to \infty} P\Big[ \sup_{m < k \le n} |S_k - S_m| \ge \delta \Big] \le \lim_{m, n \to \infty} \frac{1}{\delta^2} \sum_{j=m+1}^{n} E(X_j^2) = \lim_{m, n \to \infty} \frac{1}{\delta^2} \sum_{j=m+1}^{n} \mathrm{Var}(X_j) = 0.$$
Therefore
$$\lim_{m, n \to \infty} P\Big\{ \sup_{m < k \le n} |S_k - S_m| \ge \delta \Big\} = 0,$$
and in particular
$$\lim_{m, n \to \infty} P\{ |S_n - S_m| \ge \delta \} = 0.$$
This leads to a bound of the form
$$\sum_{j=1}^{\infty} \sigma_j^2 \le \frac{1}{\delta^2} \big[\, \ell^2 + (\ell + C)^2 \,\big].$$
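As an illustration of such convergence results for random series, here is a minimal sketch (hypothetical example, NumPy assumed): the random-signs series $\sum_n \epsilon_n / n$ has $\sum_n \mathrm{Var}(\epsilon_n / n) = \sum_n 1/n^2 < \infty$, so its partial sums converge almost surely, and along any simulated path the tail of the partial sums barely moves.

```python
import numpy as np

rng = np.random.default_rng(7)

n = np.arange(1, 10**6 + 1)
for trial in range(3):
    eps = rng.choice([-1.0, 1.0], size=n.size)   # independent random signs
    partial = np.cumsum(eps / n)                 # partial sums of sum eps_n/n
    # The limit differs from path to path, but each path settles down:
    print(trial, partial[-1], np.ptp(partial[-1000:]))
```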
Proof. We define
$$Y_n = \begin{cases} X_n & \text{if } |X_n| \le n \\ 0 & \text{if } |X_n| > n \end{cases}$$
and set $b_n = E[Y_n]$. Assuming, as we may, that $m = E(X_i) = 0$, dominated convergence gives
$$\lim_{n \to \infty} b_n = 0$$
and, with $c_n = \mathrm{Var}(Y_n)$,
$$\sum_n \frac{c_n}{n^2} \le \sum_n \frac{E[Y_n^2]}{n^2} = \sum_n \frac{1}{n^2} \int_{|x| \le n} x^2 \, d\alpha = \int x^2 \Big[ \sum_{n \ge |x|} \frac{1}{n^2} \Big] d\alpha \le C \int |x| \, d\alpha < \infty,$$
so that the series $\sum_n \frac{Y_n - b_n}{n}$ converges almost surely. It is elementary to verify that for any series $\sum_n \frac{x_n}{n}$ that converges, $\frac{x_1 + \cdots + x_n}{n} \to 0$ as $n \to \infty$. We therefore conclude that
$$P\Big[ \lim_{n \to \infty} \Big( \frac{X_1 + \cdots + X_n}{n} - \frac{b_1 + \cdots + b_n}{n} \Big) = 0 \Big] = 1.$$
Since $b_n \to 0$ as $n \to \infty$, the theorem is proved.
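A single-path simulation makes the "strong" aspect visible: along one fixed sample path the running average itself converges. A minimal sketch (hypothetical Exp(1) example, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)

# One sample path of running averages for i.i.d. Exp(1) variables (m = 1).
n = 10**6
x = rng.exponential(1.0, size=n)
running_mean = np.cumsum(x) / np.arange(1, n + 1)

# Along a single path the average settles down to m = 1 (a.s. convergence),
# not merely on average over many paths.
for k in [10**2, 10**4, 10**6]:
    print(k, running_mean[k - 1])
```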
Exercise 3.14. Let $X$ be a nonnegative random variable. Then
$$E[X] - 1 \le \sum_{n=1}^{\infty} P[X \ge n] \le E[X].$$
In particular $E[X] < \infty$ if and only if $\sum_n P[X \ge n] < \infty$.
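The sandwich is easy to check by Monte Carlo; a minimal sketch with a hypothetical Gamma example (NumPy assumed), for which $E[X] = 3$:

```python
import numpy as np

rng = np.random.default_rng(5)

# Monte Carlo check of E[X] - 1 <= sum_n P[X >= n] <= E[X]
# for a hypothetical nonnegative example, X ~ Gamma(shape=2, scale=1.5).
x = rng.gamma(2.0, 1.5, size=10**6)
mean = x.mean()                                   # approximates E[X] = 3
tail_sum = sum((x >= n).mean() for n in range(1, 60))
print(mean - 1 <= tail_sum <= mean, mean, tail_sum)
```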
Exercise 3.15. If for a sequence of i.i.d. random variables $X_1, \dots, X_n, \dots$ the strong law of large numbers holds with some limit, i.e.
$$P\Big[ \lim_{n \to \infty} \frac{S_n}{n} = \xi \Big] = 1$$
for some random variable $\xi$, which may or may not be a constant with probability 1, then show that necessarily $E|X_i| < \infty$. Consequently $\xi = E(X_i)$ with probability 1.
One may ask why the limit cannot be a proper random variable. There is a general theorem that forbids it, called Kolmogorov's zero-one law. Let us look at the space $\Omega$ of real sequences $\{x_n : n \ge 1\}$. We have the $\sigma$-field $\mathcal{B}$, the product $\sigma$-field on $\Omega$. In addition we have the sub-$\sigma$-fields $\mathcal{B}_n$ generated by $\{x_j : j \ge n\}$. The $\mathcal{B}_n$ decrease with $n$, and $\mathcal{B}_{\infty} = \cap_n \mathcal{B}_n$, which is also a $\sigma$-field, is called the tail $\sigma$-field. The typical set in $\mathcal{B}_{\infty}$ is a set depending only on the tail behavior of the sequence. For example the sets $\{\omega : x_n \text{ is bounded}\}$ and $\{\omega : \limsup_n x_n \le 1\}$ are in $\mathcal{B}_{\infty}$, whereas $\{\omega : \sup_n |x_n| \le 1\}$ is not.
$$p(x) = \frac{1}{\sqrt{2\pi}} \exp\Big[ -\frac{x^2}{2} \Big]. \qquad (3.11)$$
$$\phi_n(t) = \Big[ \phi\Big( \frac{t}{\sqrt n} \Big) \Big]^n$$
and we use the expansion
$$\phi(t) = 1 - \frac{\sigma^2 t^2}{2} + o(t^2)$$
to conclude that
$$\phi\Big( \frac{t}{\sqrt n} \Big) = 1 - \frac{\sigma^2 t^2}{2 n} + o\Big( \frac{1}{n} \Big),$$
and it then follows that
$$\lim_{n \to \infty} \phi_n(t) = \psi(t) = \exp\Big[ -\frac{\sigma^2 t^2}{2} \Big].$$
Since $\psi(t)$ is the characteristic function of the normal distribution with density $p(x)$ given by equation (3.11), we are done.
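A quick Monte Carlo check of the theorem (a minimal sketch with a hypothetical uniform example scaled to mean 0 and variance 1; NumPy assumed):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(9)

# Hypothetical example: uniform summands scaled to mean 0 and variance 1.
n, trials = 500, 20_000
x = rng.uniform(-sqrt(3.0), sqrt(3.0), size=(trials, n))
z = x.sum(axis=1) / sqrt(n)                      # samples of S_n / sqrt(n)

for a in [0.5, 1.0, 2.0]:
    normal_tail = 0.5 * (1 - erf(a / sqrt(2)))   # P[N(0,1) >= a]
    print(a, (z >= a).mean(), normal_tail)       # should nearly agree
```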
Exercise 3.17. A more direct proof is possible in some special cases. For instance if each $X_i = \pm 1$ with probability $\frac12$, $S_n$ can take the values $n - 2k$ with $0 \le k \le n$,
$$P[S_n = 2k - n] = \frac{1}{2^n} \binom{n}{k}$$
and
$$P\Big[ a \le \frac{S_n}{\sqrt n} \le b \Big] = \sum_{k :\, a\sqrt n \,\le\, 2k - n \,\le\, b\sqrt n} \frac{1}{2^n} \binom{n}{k}.$$
Actually, for the proof of the central limit theorem we do not need the random variables $\{X_j\}$ to have identical distributions. Let us suppose that they all have zero means and that the variance of $X_j$ is $\sigma_j^2$. Define $s_n^2 = \sum_{j=1}^{n} \sigma_j^2$, and let $\mu_j$ denote the distribution of $X_j$. Lindeberg's condition, that
$$\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{j=1}^{n} \int_{|x| \ge \epsilon s_n} x^2 \, d\mu_j = 0$$
for each $\epsilon > 0$, is sufficient for the central limit theorem to hold.
Proof. The first step in proving this limit theorem, as well as other limit theorems that we will prove, is to rewrite the characteristic function of $\frac{S_n}{s_n}$ as the product
$$\phi_n(t) = \prod_{j=1}^{n} \phi_{n,j}(t),$$
where $\phi_{n,j}(t)$ is the characteristic function of $\frac{X_j}{s_n}$.
and
$$\sup_n \sup_{|t| \le T} \sum_{j=1}^{n} |\phi_{n,j}(t) - 1| < \infty.$$
$$\lim_{n \to \infty} \sup_{|t| \le T} \Big| \sum_{j=1}^{n} \big\{ \log \phi_{n,j}(t) - [\phi_{n,j}(t) - 1] \big\} \Big| \le \lim_{n \to \infty} \sup_{|t| \le T} C \sum_{j=1}^{n} |\phi_{n,j}(t) - 1|^2$$
$$\le C \lim_{n \to \infty} \sup_{|t| \le T} \Big[ \sup_{1 \le j \le n} |\phi_{n,j}(t) - 1| \Big] \sum_{j=1}^{n} |\phi_{n,j}(t) - 1| = 0$$
by the expansion of $\log z$ around $z = 1$.
We see that
$$\sup_{|t| \le T} |\phi_{n,j}(t) - 1| = \sup_{|t| \le T} \Big| \int \big( \exp[\,i t x\,] - 1 \big) \, d\mu_{n,j} \Big| = \sup_{|t| \le T} \Big| \int \Big( \exp\Big[\,i t \frac{x}{s_n}\Big] - 1 \Big) \, d\mu_j \Big|$$
$$= \sup_{|t| \le T} \Big| \int \Big( \exp\Big[\,i t \frac{x}{s_n}\Big] - 1 - i t \frac{x}{s_n} \Big) \, d\mu_j \Big| \qquad (3.12)$$
$$\le C_T \int \frac{x^2}{s_n^2} \, d\mu_j \qquad (3.13)$$
$$= C_T \int_{|x| < \epsilon s_n} \frac{x^2}{s_n^2} \, d\mu_j + C_T \int_{|x| \ge \epsilon s_n} \frac{x^2}{s_n^2} \, d\mu_j \le C_T\, \epsilon^2 + C_T\, \frac{1}{s_n^2} \int_{|x| \ge \epsilon s_n} x^2 \, d\mu_j. \qquad (3.14)$$
We have used the mean zero condition in deriving equation (3.12) and the estimate $|e^{ix} - 1 - ix| \le c\,x^2$ to get to equation (3.13). If we let $n \to \infty$, by Lindeberg's condition the second term of equation (3.14) goes to $0$. Therefore
$$\limsup_{n \to \infty} \sup_{1 \le j \le n} \sup_{|t| \le T} |\phi_{n,j}(t) - 1| \le \epsilon^2\, C_T.$$
Moreover,
$$\sup_{|t| \le T} \sum_{j=1}^{n} |\phi_{n,j}(t) - 1| \le C_T \sum_{j=1}^{n} \int \frac{x^2}{s_n^2} \, d\mu_j = C_T\, \frac{1}{s_n^2} \sum_{j=1}^{n} \sigma_j^2 = C_T.$$
It remains to show that
$$\lim_{n \to \infty} \sup_{|t| \le T} \Big| \sum_{j=1}^{n} (\phi_{n,j}(t) - 1) + \frac{t^2}{2} \Big| = 0.$$
Since $\sum_j \frac{\sigma_j^2}{s_n^2} = 1$, this equals
$$\limsup_{n \to \infty} \sup_{|t| \le T} \Big| \sum_{j=1}^{n} \Big[ \phi_{n,j}(t) - 1 + \frac{\sigma_j^2\, t^2}{2 s_n^2} \Big] \Big| = \limsup_{n \to \infty} \sup_{|t| \le T} \Big| \sum_{j=1}^{n} \int \Big( \exp\Big[ i t \frac{x}{s_n} \Big] - 1 - i t \frac{x}{s_n} + \frac{t^2 x^2}{2 s_n^2} \Big) \, d\mu_j \Big|$$
$$\le \limsup_{n \to \infty} \sup_{|t| \le T} \sum_{j=1}^{n} \Big| \int_{|x| < \epsilon s_n} \Big( \exp\Big[ i t \frac{x}{s_n} \Big] - 1 - i t \frac{x}{s_n} + \frac{t^2 x^2}{2 s_n^2} \Big) \, d\mu_j \Big| + \limsup_{n \to \infty} \sup_{|t| \le T} \sum_{j=1}^{n} \Big| \int_{|x| \ge \epsilon s_n} \Big( \exp\Big[ i t \frac{x}{s_n} \Big] - 1 - i t \frac{x}{s_n} + \frac{t^2 x^2}{2 s_n^2} \Big) \, d\mu_j \Big|$$
$$\le \lim_{n \to \infty} C_T \sum_{j=1}^{n} \int_{|x| < \epsilon s_n} \frac{|x|^3}{s_n^3} \, d\mu_j + \lim_{n \to \infty} C_T \sum_{j=1}^{n} \int_{|x| \ge \epsilon s_n} \frac{x^2}{s_n^2} \, d\mu_j$$
$$\le C_T\, \epsilon\, \limsup_{n \to \infty} \sum_{j=1}^{n} \int \frac{x^2}{s_n^2} \, d\mu_j + \lim_{n \to \infty} C_T \sum_{j=1}^{n} \int_{|x| \ge \epsilon s_n} \frac{x^2}{s_n^2} \, d\mu_j = C_T\, \epsilon,$$
and since $\epsilon > 0$ is arbitrary, we are done.
Remark 3.3. The key step in the proof of the central limit theorem under Lindeberg's condition, as well as in other limit theorems for sums of independent random variables, is the analysis of products
$$\phi_n(t) = \prod_{j=1}^{k_n} \phi_{n,j}(t).$$
The idea is to replace each $\phi_{n,j}(t)$ by $\exp[\phi_{n,j}(t) - 1]$, changing the product to the exponential of a sum. Although each $\phi_{n,j}(t)$ is close to 1, making the idea reasonable, in order for the idea to work one has to show that the sum $\sum_{j=1}^{k_n} |\phi_{n,j}(t) - 1|^2$ is negligible. This requires the boundedness of $\sum_{j=1}^{k_n} |\phi_{n,j}(t) - 1|$. One has to use the mean 0 condition or some suitable centering condition to cancel the first term in the expansion of $\phi_{n,j}(t) - 1$ and control the rest from sums of the variances.
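The replacement of a product by the exponential of a sum is easy to probe numerically. A minimal sketch (hypothetical complex numbers near 1, NumPy assumed) comparing $\prod_j z_j$ with $\exp[\sum_j (z_j - 1)]$, whose discrepancy is controlled by $\sum_j |z_j - 1|^2$:

```python
import numpy as np

rng = np.random.default_rng(11)

# k complex numbers z_j within O(1/k) of 1, mimicking phi_{n,j}(t):
# sum |z_j - 1| stays bounded while sum |z_j - 1|^2 is O(1/k).
k = 10_000
z = 1.0 + (rng.random(k) - 0.5) / k + 1j * (rng.random(k) - 0.5) / k

prod = np.prod(z)
approx = np.exp(np.sum(z - 1.0))       # exponential-of-sum replacement
print(abs(prod - approx), np.sum(np.abs(z - 1.0) ** 2))  # both tiny
```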
Exercise 3.18. Lyapunov's condition is the following: for some $\delta > 0$,
$$\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{j=1}^{n} \int |x|^{2+\delta} \, d\mu_j = 0.$$
Show that it implies Lindeberg's condition.
As we stated in the previous section, we want to study the behavior of the sum of a large number of independent random variables. We have $k_n$ independent random variables $\{X_{n,j} : 1 \le j \le k_n\}$ with respective distributions $\{\mu_{n,j}\}$. We are interested in the distribution $\mu_n$ of $Z_n = \sum_{j=1}^{k_n} X_{n,j}$. One important assumption that we will make on the random variables $\{X_{n,j}\}$ is that no single one is significant: more precisely, for every $\epsilon > 0$,
$$\lim_{n \to \infty} \sup_{1 \le j \le k_n} \mu_{n,j}[\, |x| \ge \epsilon \,] = 0.$$
Therefore
$$\lim_{n \to \infty} \sup_{1 \le j \le k_n} |a_{n,j}| = 0.$$
This means that the $\mu'_{n,j}$ are uniformly infinitesimal just as the $\mu_{n,j}$ were. Let us suppose that $n$ is so large that $\sup_{1 \le j \le k_n} |a_{n,j}| \le \frac14$. The advantage in going from $\mu_{n,j}$ to $\mu'_{n,j}$ is that the latter are better centered, and we can calculate
$$a'_{n,j} = \int_{|x| \le 1} x \, d\mu'_{n,j} = \int_{|x - a_{n,j}| \le 1} (x - a_{n,j}) \, d\mu_{n,j} = \int_{|x - a_{n,j}| \le 1} x \, d\mu_{n,j} - a_{n,j}\, \mu_{n,j}[\, |x - a_{n,j}| \le 1 \,]$$
$$= \int_{|x - a_{n,j}| \le 1} x \, d\mu_{n,j} - a_{n,j} + a_{n,j}\, \mu_{n,j}[\, |x - a_{n,j}| > 1 \,],$$
so that
$$|a'_{n,j}| \le C\, \mu_{n,j}\Big[\, |x| \ge \frac34 \,\Big] \le C\, \mu'_{n,j}\Big[\, |x| \ge \frac12 \,\Big].$$
In other words we may assume without loss of generality that the $\mu_{n,j}$ satisfy the bound
$$|a_{n,j}| \le C\, \mu_{n,j}\Big[\, |x| \ge \frac12 \,\Big] \qquad (3.16)$$
and forget all about the change from $\mu_{n,j}$ to $\mu'_{n,j}$. We will drop the primes and stay with just $\mu_{n,j}$. Then, just as in the proof of the Lindeberg theorem, we proceed to estimate
$$\lim_{n \to \infty} \sup_{|t| \le T} \big| \log \phi_n(t) - \log \psi_n(t) \big| \le \lim_{n \to \infty} \sup_{|t| \le T} \Big| \sum_{j=1}^{k_n} \big\{ \log \phi_{n,j}(t) - (\phi_{n,j}(t) - 1) \big\} \Big|$$
$$\le \lim_{n \to \infty} \sup_{|t| \le T} C \sum_{j=1}^{k_n} |\phi_{n,j}(t) - 1|^2 = 0.$$
Suppose that the accompanying laws, with characteristic functions
$$\exp\Big[ \sum_{j=1}^{k_n} (\phi_{n,j}(t) - 1) + i t A_n \Big] = \exp[f_n(t)],$$
have a limit, which is again a characteristic function. Since the limiting characteristic function is continuous and equals 1 at $t = 0$, and the convergence is uniform near 0, on some small interval $|t| \le T_0$ we have the bound
$$\sup_n \sup_{|t| \le T_0} \big[ -\mathrm{Re}\, f_n(t) \big] \le C$$
or equivalently
$$\sup_n \sup_{|t| \le T_0} \sum_{j=1}^{k_n} \int (1 - \cos t x) \, d\mu_{n,j} \le C.$$
By Exercise 3.10 the bound extends, with a new constant, from $|t| \le T_0$ to any bounded interval:
$$\sup_n \sup_{|t| \le T} \sum_{j=1}^{k_n} \int (1 - \cos t x) \, d\mu_{n,j} \le C_T.$$
If we integrate the inequality with respect to $t$ over the interval $[-T, T]$ and divide by $2T$, we get
$$\sup_n \sum_{j=1}^{k_n} \int \Big( 1 - \frac{\sin T x}{T x} \Big) \, d\mu_{n,j} \le C_T.$$
Since $1 - \frac{\sin T x}{T x}$ is bounded below by a positive constant for $|x| \ge \ell$ and behaves like a multiple of $T^2 x^2$ near $0$, this yields
$$\sup_n \sum_{j=1}^{k_n} \mu_{n,j}[\, |x| \ge \ell \,] \le C_\ell < \infty$$
and
$$\sup_n \sum_{j=1}^{k_n} \int_{|x| \le 1} x^2 \, d\mu_{n,j} \le C < \infty.$$
$$\lim_{n \to \infty} |\phi_n(t)|^2 = \lim_{n \to \infty} \prod_{j=1}^{k_n} |\phi_{n,j}(t)|^2$$
exists and is continuous at $t = 0$, so that, exactly as before, on some interval $|t| \le T_0$
$$\sup_n \sup_{|t| \le T_0} \sum_{j=1}^{k_n} \big[ 1 - |\phi_{n,j}(t)|^2 \big] \le C_0 < \infty$$
and then, for any finite $T$,
$$\sup_n \sup_{|t| \le T} \sum_{j=1}^{k_n} \big[ 1 - |\phi_{n,j}(t)|^2 \big] \le C_T < \infty.$$
Since $|\phi_{n,j}(t)|^2$ is the characteristic function of the symmetrization $\tilde\mu_{n,j}$ of $\mu_{n,j}$, the same reasoning gives
$$\sup_n \sum_{j=1}^{k_n} \tilde\mu_{n,j}[\, |x| \ge \epsilon \,] \le C_\epsilon < \infty \qquad (3.18)$$
or, written out in terms of $\mu_{n,j}$,
$$\sup_n \sum_{j=1}^{k_n} \int\!\!\int_{|x - y| \le 2} (x - y)^2 \, d\mu_{n,j}(x) \, d\mu_{n,j}(y) \le C < \infty. \qquad (3.19)$$
Together with the uniform infinitesimality of $\{\mu_{n,j}\}$ these imply
$$\sup_n \sum_{j=1}^{k_n} \mu_{n,j}[\, x : |x| \ge \epsilon \,] \le C_\epsilon < \infty. \qquad (3.20)$$
One can now derive (3.17) from (3.20) and (3.21) as in the earlier part.
Exercise 3.20. Let $k_n = n^2$ and $\mu_{n,j} = \delta_{\frac1n}$ for $1 \le j \le n^2$, so that $Z_n = n$. Show that without centering the accompanying laws $\nu_n$ converge to a different limit.
Exercise 3.22. Show that for any $\lambda \ge 0$, the Poisson distribution with parameter $\lambda$,
$$p_{\lambda}(n) = \frac{e^{-\lambda} \lambda^n}{n!} \quad \text{for } n \ge 0,$$
is infinitely divisible.
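A small numerical illustration of infinite divisibility in this case (a minimal sketch, NumPy assumed): for every $n$, the $n$-fold convolution of Poisson$(\lambda/n)$ reproduces Poisson$(\lambda)$.

```python
import numpy as np
from math import exp, factorial

def poisson_pmf(lam, kmax):
    return np.array([exp(-lam) * lam**k / factorial(k) for k in range(kmax + 1)])

# Infinite divisibility of Poisson(lam): for every n, Poisson(lam) is the
# n-fold convolution of Poisson(lam/n). A small numerical check:
lam, n, kmax = 2.5, 7, 40
piece = poisson_pmf(lam / n, kmax)
conv = np.array([1.0])
for _ in range(n):
    conv = np.convolve(conv, piece)
assert np.allclose(conv[: kmax + 1], poisson_pmf(lam, kmax), atol=1e-12)
```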
Exercise 3.23. Show that any probability distribution $\mu$ supported on a finite set $\{x_1, \dots, x_k\}$ with
$$\mu[\{x_j\}] = p_j$$
and $p_j \ge 0$, $\sum_{j=1}^{k} p_j = 1$, is infinitely divisible if and only if it is degenerate, i.e. $\mu[\{x_j\}] = 1$ for some $j$.
Exercise 3.24. Show that for any nonnegative finite measure $F$ with total mass $a$, the distribution
$$e(F) = e^{-a} \sum_{j=0}^{\infty} \frac{F^{*j}}{j!}$$
with characteristic function
$$\widehat{e(F)}(t) = \exp\Big[ \int (e^{itx} - 1) \, dF \Big]$$
is infinitely divisible.
Here $\tau(x)$ is a fixed bounded truncation function that equals $x$ near $0$ and satisfies $|\tau(x) - x| \le C |x|^3$. The function
$$\psi(t) = \exp\Big[ \int \big( e^{i t x} - 1 - i t\, \tau(x) \big) \, dM + i t a \Big]$$
is a characteristic function for any measure $M$ with finite total mass. In fact it is the characteristic function of an infinitely divisible probability distribution. It is not necessary that $M$ be a finite measure for $\psi$ to make sense. $M$ could be infinite, but in such a way that it is finite on $\{x : |x| \ge \epsilon\}$ for every $\epsilon > 0$, and near $0$ it integrates $x^2$, i.e.
$$\int \frac{x^2}{1 + x^2} \, dM < \infty. \qquad (3.25)$$
Indeed, since
$$|e^{i t x} - 1 - i t\, \tau(x)| \le C_T\, x^2$$
for $|t| \le T$, the integral still converges, and if finite measures $M_n$ approximate $M$ suitably then $\psi_n(t) \to \psi(t)$ uniformly on bounded intervals, where $\psi(t)$ is given by the integral above.
Theorem 3.20. For every admissible Lévy measure $M$, $\sigma^2 \ge 0$ and real $a$, let
$$\psi(t) = \exp\Big[ \int \big( e^{i t x} - 1 - i t\, \tau(x) \big) \, dM + i t a - \frac{\sigma^2 t^2}{2} \Big].$$
Then $\psi_n \to \psi$ if and only if, in addition to (3.26), the following hold. For some (and therefore for every) $\ell > 0$ such that $\pm\ell$ are continuity points for $M$, i.e. $M\{\pm\ell\} = 0$,
$$\lim_{n \to \infty} \Big[ \sigma_n^2 + \int_{-\ell}^{\ell} x^2 \, dM_n \Big] = \sigma^2 + \int_{-\ell}^{\ell} x^2 \, dM, \qquad (3.27)$$
and
$$a_n \to a \quad \text{as } n \to \infty. \qquad (3.28)$$
Proof. Let us prove the sufficiency first. Condition (3.26) implies that for every $\ell$ such that $\pm\ell$ are continuity points of $M$,
$$\lim_{n \to \infty} \int_{|x| \ge \ell} \big[ e^{i t x} - 1 - i t\, \tau(x) \big] \, dM_n = \int_{|x| \ge \ell} \big[ e^{i t x} - 1 - i t\, \tau(x) \big] \, dM.$$
On the other hand, since the integrand differs from $-\frac{t^2 x^2}{2}$ by $o(x^2)$ near $0$,
$$\lim_{\ell \to 0} \limsup_{n \to \infty} \Big| \int_{-\ell}^{\ell} \Big[ e^{i t x} - 1 - i t\, \tau(x) + \frac{t^2 x^2}{2} \Big] \, dM_n - \int_{-\ell}^{\ell} \Big[ e^{i t x} - 1 - i t\, \tau(x) + \frac{t^2 x^2}{2} \Big] \, dM \Big| = 0.$$
Combining this with (3.27) we obtain
$$\lim_{n \to \infty} \Big[ -\frac{\sigma_n^2 t^2}{2} + \int \big[ e^{i t x} - 1 - i t\, \tau(x) \big] \, dM_n \Big] = -\frac{\sigma^2 t^2}{2} + \int \big[ e^{i t x} - 1 - i t\, \tau(x) \big] \, dM,$$
and with (3.28) this proves that $\psi_n(t) \to \psi(t)$.
and
$$\int_{-\ell}^{\ell} |x|^3 \, dM_n \le \ell \int_{-\ell}^{\ell} |x|^2 \, dM_n,$$
while
$$\sup_n \Big[ \sigma_n^2 + \int_{-\ell}^{\ell} |x|^2 \, dM_n \Big] < \infty. \qquad (3.31)$$
Consequently
$$\lim_{t \to \infty} -\frac{2 \log \psi(t)}{t^2} = \sigma_1^2 = \sigma_2^2,$$
leaving us with
$$\int \big[ e^{i t x} - 1 - i t\, \tau(x) \big] \, dM_1 + i t a_1 = \int \big[ e^{i t x} - 1 - i t\, \tau(x) \big] \, dM_2 + i t a_2.$$
Computing the combination
$$H(s, t) = \frac{\log\psi(t + s) + \log\psi(t - s)}{2} - \log\psi(t),$$
we get
$$\int e^{i t x} (1 - \cos s x) \, dM_1 = \int e^{i t x} (1 - \cos s x) \, dM_2$$
for all $t$ and $s$. Since we can and do assume that $M\{0\} = 0$ for any admissible Lévy measure $M$, we have $M_1 = M_2$. If we know that $\sigma_1^2 = \sigma_2^2$ and $M_1 = M_2$, it is easy to see that $a_1$ must equal $a_2$.
Applications.
and
$$a = \lim_{n \to \infty} \sum_j a_{n,j}$$
where
$$a_{n,j} = \int_{|x| \le 1} x \, d\mu_{n,j}.$$
Since the means are zero,
$$|a_{n,j}|^2 = \Big| \int_{|x| \le 1} x \, d\mu_{n,j} \Big|^2 = \Big| \int_{|x| > 1} x \, d\mu_{n,j} \Big|^2 \le \mu_{n,j}[\, |x| > 1 \,] \int_{|x| > 1} |x|^2 \, d\mu_{n,j}$$
and
$$\sum_{j=1}^{k_n} |a_{n,j}|^2 \le \Big[ \sum_j \int |x|^2 \, d\mu_{n,j} \Big] \sup_{1 \le j \le k_n} \mu_{n,j}[\, |x| > 1 \,] \le \sigma_n^2\, \sup_{1 \le j \le k_n} \mu_{n,j}[\, |x| > 1 \,] \to 0.$$
Because $\sum_{j=1}^{k_n} |a_{n,j}|^2 \to 0$ as $n \to \infty$, we must have
$$\sigma^2 = \lim_{n \to \infty} \sum_j \int_{|x| \le \ell} |x|^2 \, d\mu_{n,j}$$
Exercise 3.26. What happens in the Poisson limit theorem (Application 1) if $\lambda_n = \sum_j p_{n,j} \to \infty$ as $n \to \infty$? Can you show that the distribution of $\frac{S_n - \lambda_n}{\sqrt{\lambda_n}}$ converges to the standard normal distribution?
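A quick simulation supporting this (a minimal sketch with a single Poisson variable standing in for $S_n$; NumPy assumed):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(13)

# For large lambda, (S - lambda)/sqrt(lambda) with S ~ Poisson(lambda)
# is approximately standard normal (Exercise 3.26).
lam = 400.0
s = rng.poisson(lam, size=200_000)
z = (s - lam) / np.sqrt(lam)
for a in [0.0, 1.0, 2.0]:
    normal_tail = 0.5 * (1 - erf(a / sqrt(2)))
    print(a, (z >= a).mean(), normal_tail)
```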
for any $a$. But $Z$ is a random variable that does not depend on the particular values of $X_1, \dots, X_n$, and $\{Z \le a\}$ is therefore a set in the tail $\sigma$-field. By Kolmogorov's zero-one law $P[Z \le a]$ must be either 0 or 1. Since it cannot be 0, it must be 1.
We will not prove this theorem in the most general case, which assumes only the existence of two moments. We will assume instead that $E[|X|^{2+\delta}] < \infty$ for some $\delta > 0$. We shall first reduce the proof to an estimate on the tail behavior of the distributions of $\frac{S_n}{\sqrt n}$ by a careful application of the Borel-Cantelli lemma. This estimate is obvious if $X_1, \dots, X_n, \dots$ are themselves normally distributed, and we will show how to extend it to a large class of distributions that satisfy the additional moment condition. It is clear that we are interested in showing that for $\lambda > \sqrt 2$,
$$P\big[\, S_n \ge \lambda \sqrt{n \log \log n} \ \text{infinitely often} \,\big] = 0.$$
It would be sufficient, because of the Borel-Cantelli lemma, to show that for any $\lambda > \sqrt 2$,
$$\sum_n P\big[\, S_n \ge \lambda \sqrt{n \log \log n} \,\big] < \infty.$$
n
E[Sn2 ]
sup P |Sj | (kn1)
1jkn [(kn1)]2
kn
=
[(kn1)]2
kn
= 2
kn1 log log kn1
= o(1) as n . (3.34)
By choosing $\epsilon$ small enough so that $\lambda' = \lambda - \epsilon > \sqrt 2$, it is sufficient to show that for any $\lambda' > \sqrt 2$,
$$\sum_n P\big[\, S_{k_n} \ge \lambda'\, \ell(k_{n-1}) \,\big] < \infty.$$
By picking $\rho$ sufficiently close to 1 (so that $\lambda = \frac{\lambda'}{\sqrt \rho} > \sqrt 2$), because $\frac{\ell(k_{n-1})}{\ell(k_n)} \simeq \frac{1}{\sqrt \rho}$, we can reduce this to the convergence of
$$\sum_n P\big[\, S_{k_n} \ge \lambda\, \ell(k_n) \,\big] < \infty \qquad (3.35)$$
for all $\lambda > \sqrt 2$.
If we use the estimate $P[X \ge a] \le \exp[-\frac{a^2}{2}]$ that is valid for the standard normal distribution, we can verify (3.35):
$$\sum_n \exp\Big[ -\frac{\lambda^2\, (\ell(k_n))^2}{2\, k_n} \Big] = \sum_n \exp\Big[ -\frac{\lambda^2}{2} \log \log k_n \Big] < \infty$$
for any $\lambda > \sqrt 2$.
To prove the lower bound we select again a subsequence, $k_n = [\rho^n]$ with some $\rho > 1$, and look at $Y_n = S_{k_{n+1}} - S_{k_n}$, which are now independent random variables. The tail probability of the normal distribution has the lower bound
$$P[X \ge a] = \frac{1}{\sqrt{2\pi}} \int_a^{\infty} \exp\Big[ -\frac{x^2}{2} \Big] dx \ge \frac{1}{\sqrt{2\pi}} \int_a^{\infty} \exp\Big[ -\frac{x^2}{2} - x \Big] (x + 1) \, dx \ge \frac{1}{\sqrt{2\pi}} \exp\Big[ -\frac{(a + 1)^2}{2} \Big].$$
If we assume normal-like tail probabilities we can conclude that
$$\sum_n P\big[\, Y_n \ge \lambda\, \ell(k_{n+1}) \,\big] \ge \sum_n \frac{1}{\sqrt{2\pi}} \exp\Big[ -\frac{\lambda^2\, (\ell(k_{n+1}))^2}{2 (k_{n+1} - k_n)} \big[ 1 + o(1) \big] \Big] = +\infty$$
provided $\frac{\lambda^2 \rho}{2(\rho - 1)} < 1$, and conclude by the Borel-Cantelli lemma that $Y_n = S_{k_{n+1}} - S_{k_n}$ exceeds $\lambda\, \ell(k_{n+1})$ infinitely often for such $\lambda$. On the other hand
and therefore,
$$P\Big[ \limsup_{n \to \infty} \frac{S_n}{\ell(n)} \ge \sqrt{\frac{2(\rho - 1)}{\rho}} - \frac{2}{\sqrt \rho} \Big] = 1.$$
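A single long random walk shows the iterated-logarithm scaling in action; a minimal sketch (NumPy assumed), where the supremum of $S_k / \sqrt{2 k \log\log k}$ over large $k$ typically sits a little below 1 at accessible values of $n$:

```python
import numpy as np

rng = np.random.default_rng(17)

# One long +/-1 random walk; by the LIL, sup over large k of
# S_k / sqrt(2 k loglog k) should be close to 1 along a typical path.
n = 10**6
s = np.cumsum(rng.choice([-1.0, 1.0], size=n))
k = np.arange(3, n + 1)                  # start at 3 so loglog k is defined
ratio = s[2:] / np.sqrt(2 * k * np.log(np.log(k)))
print(ratio[1000:].max())                # typically a bit below 1
```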
$$\sup_a \Big| P\Big\{ \frac{S_n}{\sqrt n} \ge a \Big\} - \frac{1}{\sqrt{2\pi}} \int_a^{\infty} \exp\Big[ -\frac{x^2}{2} \Big] dx \Big| \le C\, n^{-\alpha} \qquad (3.36)$$
for some $\alpha > 0$ in the central limit theorem. Such an error estimate is provided in the following theorem.

Theorem 3.26 (Berry-Esseen theorem). Assume that the i.i.d. sequence $\{X_j\}$ with mean zero and variance one satisfies the additional moment condition $E|X|^{2+\delta} < \infty$ for some $\delta > 0$. Then for some $\alpha > 0$ the estimate (3.36) holds.
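The $n^{-\alpha}$ decay can be observed exactly for the symmetric $\pm 1$ walk, where the binomial tail is computable in closed form; in this lattice case the gap is of order $n^{-\frac12}$. A minimal sketch (Python standard library plus NumPy; the evaluation at the atoms of $S_n$ is a simplification):

```python
import numpy as np
from math import comb, erf, sqrt

def sup_gap(n):
    """Sup over the atoms of S_n of |P[S_n/sqrt(n) >= a] - normal tail|
    for the symmetric +/-1 walk."""
    pmf = np.array([comb(n, k) for k in range(n + 1)], dtype=float) / 2.0**n
    support = (2 * np.arange(n + 1) - n) / sqrt(n)
    tail = np.cumsum(pmf[::-1])[::-1]            # P[S_n/sqrt(n) >= support]
    normal_tail = np.array([0.5 * (1 - erf(a / sqrt(2))) for a in support])
    return float(np.abs(tail - normal_tail).max())

for n in [8, 32, 128, 512]:
    print(n, sup_gap(n), sqrt(n) * sup_gap(n))   # ~const => gap ~ n^(-1/2)
```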
Lemma 3.28. If $\mu$, $\nu$ are two probability measures with zero mean having $\phi(\cdot)$, $\psi(\cdot)$ for respective characteristic functions, then
$$\int f_{a,h}(x) \, d(\mu - \nu)(x) = \frac{1}{2\pi} \int \frac{e^{i a y}}{i y}\, \frac{\sin h y}{h y}\, [\phi(y) - \psi(y)] \, dy$$
where $f_{a,h}(x) = f_{a,\infty,h}(x)$ is given by
$$f_{a,h}(x) = \begin{cases} 0 & \text{for } -\infty < x \le a - h \\ \dfrac{x - a + h}{2h} & \text{for } a - h \le x \le a + h \\ 1 & \text{for } a + h \le x < \infty. \end{cases}$$
Proof. We just let $b \to \infty$ in the previous lemma. Since $|\phi(y) - \psi(y)| = o(|y|)$, there is no problem in applying the Riemann-Lebesgue lemma. We now proceed with the proof of the theorem.
Since
$$\mu[[a, \infty)] \le \int f_{a-h,h}(x) \, d\mu(x) \le \mu[[a - 2h, \infty)]$$
and
$$\nu[[a, \infty)] \le \int f_{a-h,h}(x) \, d\nu(x) \le \nu[[a - 2h, \infty)],$$
we obtain (the $2hC$ term coming from the bounded density of the normal distribution $\nu$)
$$\sup_a \big| \mu[[a, \infty)] - \nu[[a, \infty)] \big| \le \sup_a \Big| \int f_{a-h,h}(x) \, d(\mu - \nu)(x) \Big| + 2 h C \le \frac{1}{2\pi} \int \frac{|\sin h y|}{h\, y^2}\, |\phi(y) - \psi(y)| \, dy + 2 h C. \qquad (3.37)$$
For the normalized sums we have the estimate
$$\Big| \phi_n(y) - \exp\Big[ -\frac{y^2}{2} \Big] \Big| \le C\, \frac{|y|^{2+\delta}}{n^{\frac{\delta}{2}}} \qquad \text{if } |y| \le \epsilon\, n^{\alpha}.$$
Therefore, with $\alpha = \frac{\delta}{2(2 + \delta)}$,
$$\int \Big| \phi_n(y) - \exp\Big[ -\frac{y^2}{2} \Big] \Big|\, \frac{|\sin h y|}{h\, y^2} \, dy = \int_{|y| \le \epsilon n^{\alpha}} \Big| \phi_n(y) - \exp\Big[ -\frac{y^2}{2} \Big] \Big|\, \frac{|\sin h y|}{h\, y^2} \, dy + \int_{|y| \ge \epsilon n^{\alpha}} \Big| \phi_n(y) - \exp\Big[ -\frac{y^2}{2} \Big] \Big|\, \frac{|\sin h y|}{h\, y^2} \, dy$$
$$\le \frac{C}{h} \Big[ \int_{|y| \le \epsilon n^{\alpha}} \frac{|y|^{\delta}}{n^{\frac{\delta}{2}}} \, dy + \int_{|y| \ge \epsilon n^{\alpha}} \frac{dy}{y^2} \Big] \le C\, \frac{n^{\alpha(\delta + 1) - \frac{\delta}{2}} + n^{-\alpha}}{h} = \frac{C}{h\, n^{\alpha}}.$$
Substituting this bound in (3.37) we get
$$\sup_a \big| \mu_n[[a, \infty)] - \nu[[a, \infty)] \big| \le C_1 h + \frac{C}{h\, n^{\alpha}}.$$