
A First Look at Rigorous Probability Theory

Second Edition: Solutions

June 26, 2016

Contents
1.3 Exercises
2.2 Exercises
2.3 Exercises
2.4 Exercises
2.5 Exercises
2.6 Exercises
2.7 Exercises
3.1 Exercises
3.2 Exercises
3.6 Exercises
4.1 Exercises
4.3 Exercises
4.5 Exercises
5.5 Exercises
6.2 Exercises
6.3 Exercises
7.2 Exercises

1.3 Exercises
Exercise 1.3.1. Suppose that Ω = {1, 2}, with P(∅) = 0 and P{1, 2} = 1. Suppose P{1} = 1/4. Prove that P is countably additive if and only if P{2} = 3/4.

Proof. (⇒) Assume P is countably additive. Since {1} ∩ {2} = ∅, countable additivity implies P{1} + P{2} = P{1, 2}. Subtracting P{1} from both sides of the equation, we have P{2} = P{1, 2} − P{1}. Finally, substituting in P{1} = 1/4 and P{1, 2} = 1, we obtain P{2} = 1 − 1/4 = 3/4.

(⇐) Assume P{2} = 3/4. To show that P is countably additive, we give an exhaustive proof, considering all disjoint subsets of Ω. Every subset A is disjoint from ∅, and since P(∅) = 0, we have P(A) + P(∅) = P(A) + 0 = P(A). The only remaining disjoint sets are {1} and {2}. By assumption, P{1} + P{2} = 1/4 + 3/4 = 1 = P{1, 2}. Therefore P is countably additive.
Exercise 1.3.2. Suppose Ω = {1, 2, 3} and F is the collection of all subsets of Ω. Find (with proof) necessary and sufficient conditions on the real numbers x, y, and z, such that there exists a countably additive probability measure P on F, with x = P{1, 2}, y = P{2, 3}, and z = P{1, 3}.

Proof. For P to be a probability measure we must have P(Ω) = 1 and thus P(∅) = 0. All sets A ∈ F are disjoint from ∅, and since P(∅) = 0, we have P(A) + P(∅) = P(A) + 0 = P(A). P must also satisfy the following conditions on disjoint sets:

P{1} + P{2} + P{3} = P{1, 2, 3} = 1
P{1} + P{2} = P{1, 2} = x
P{2} + P{3} = P{2, 3} = y
P{1} + P{3} = P{1, 3} = z

Substituting each of the last three equations into the first covers all remaining combinations of disjoint sets in F. Adding the last three equations and substituting in the first gives x + y + z = 2P{1} + 2P{2} + 2P{3} = 2(P{1} + P{2} + P{3}) = 2. Since P(A) ∈ [0, 1] for every A ∈ F, we have derived the necessary and sufficient conditions on x, y, and z, namely {x, y, z ∈ [0, 1]; x + y + z = 2}.
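As a quick numerical sanity check, the three pair equations can be inverted to recover the point masses: P{1} = 1 − y, P{2} = 1 − z, P{3} = 1 − x. The following Python sketch (the helper names are ours) tests the derived conditions this way.

```python
def point_masses(x, y, z):
    """Recover P{1}, P{2}, P{3} from x = P{1,2}, y = P{2,3}, z = P{1,3}:
    subtracting each pair equation from the total mass 1 isolates one point."""
    return 1 - y, 1 - z, 1 - x

def measure_exists(x, y, z, tol=1e-12):
    """Check the necessary and sufficient conditions x, y, z in [0,1] and
    x + y + z = 2; equivalently, the recovered masses form a distribution."""
    masses = point_masses(x, y, z)
    return (abs(x + y + z - 2) < tol
            and all(0 <= v <= 1 for v in (x, y, z))
            and all(m >= -tol for m in masses))

assert measure_exists(0.5, 0.75, 0.75)    # P{1} = P{2} = 1/4, P{3} = 1/2
assert not measure_exists(0.2, 0.3, 0.4)  # x + y + z != 2: no such measure
```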
Exercise 1.3.3. Suppose that Ω = N is the set of positive integers, and P is defined for all A ⊆ Ω by P(A) = 0 if A is finite, and P(A) = 1 if A is infinite. Is P finitely additive?

Proof. We show that P is NOT finitely additive by counterexample. Consider the infinite set of odd numbers A = {1, 3, 5, …} and the infinite set of even numbers B = {2, 4, 6, …}. We have A ∩ B = ∅ and P(A) + P(B) = 1 + 1 = 2. But A ∪ B = N implies that P(A ∪ B) = P(N) = 1. Since 2 ≠ 1, we have P(A) + P(B) ≠ P(A ∪ B). Therefore P is NOT finitely additive.
Exercise 1.3.4. Suppose that Ω = N and P is defined for all A ⊆ Ω by P(A) = |A| if A is finite (where |A| is the number of elements in the subset A), and P(A) = ∞ if A is infinite. This P is of course not a probability measure (in fact it is counting measure), but we can still ask the following. (By convention, ∞ + ∞ = ∞.)
(a) Is P finitely additive?
(b) Is P countably additive?

Proof.
(a) Yes. Suppose A, B ⊆ Ω with A ∩ B = ∅. We consider two cases:
1. Suppose |A|, |B| < ∞. Then |A ∪ B| < ∞ and |A| + |B| = |A ∪ B|. Therefore P(A) + P(B) = |A| + |B| = |A ∪ B| = P(A ∪ B).
2. Suppose at least one of A and B contains infinitely many elements. Then |A ∪ B| = ∞ and P(A) + P(B) = P(A ∪ B) = ∞.
(b) Yes. Let A_1, A_2, … be disjoint. If only finitely many A_n are non-empty and all are finite, induct on case 1 of (a). Otherwise either some A_n is infinite, or infinitely many A_n are non-empty; in both situations ⋃_n A_n is infinite, and both P(⋃_n A_n) and ∑_n P(A_n) equal ∞.

Exercise 1.3.5.
(a) In what step of the proof of Proposition 1.2.6 was (1.2.1) used?
(b) Give an example of a countably additive set function P, defined on all subsets of [0,1], which satisfies (1.2.3) and (1.2.5), but not (1.2.1).

Proof.
(a) On page 4, (1.2.1) is used to show 1 = P((0, 1]).
(b) Let F be the collection of all subsets of [0, 1]. Consider P : F → [0, 1] defined by P(A) = 0 for all A. This set function clearly satisfies (1.2.3) and (1.2.5), but fails to satisfy (1.2.1): indeed, 0 = P([0, 1]) ≠ 1.

2.2 Exercises
Exercise 2.2.3. Prove that the above collection J is a semialgebra of subsets of Ω, meaning that it contains ∅ and Ω, it is closed under finite intersection, and the complement of any element of J is equal to a finite disjoint union of elements of J.

Proof. We have ∅ = [0, 0) (or any empty interval), and Ω = [0, 1]. Therefore, by definition, ∅, Ω ∈ J. Now let A, B ∈ J be intervals with endpoints a_0, a_1 and b_0, b_1 respectively. Then A and B are intervals contained in [0, 1], and A ∩ B is either empty (and hence an empty interval) or an interval with endpoints max{a_0, b_0} and min{a_1, b_1}. Thus A ∩ B ∈ J. The proof is similar for any finite number of elements A_1, A_2, A_3, … ∈ J (simply induct on the previous argument), and therefore J is closed under finite intersection.

Note ∅^C = Ω = [0, 1] ∈ J and Ω^C = ∅ = [0, 0) ∈ J. Now let A ∈ J be an interval of the form [a, b). If a = 0, then A^C = [b, 1] ∈ J, and if a ≠ 0, then A^C = [0, a) ∪ [b, 1], where [0, a), [b, 1] ∈ J, and thus A^C is a finite disjoint union of elements of J. The proofs for intervals of the form (a, b), [a, b], and (a, b] are similar. Therefore the collection J is a semialgebra.
Exercise 2.2.5.
(a) Prove that B_0 is an algebra (or field) of subsets of Ω, meaning that it contains ∅ and Ω, and is closed under the formation of complements and of finite unions and intersections.
(b) Prove that B_0 is not a σ-algebra.

Proof.
(a) Since ∅, Ω ∈ J, we have ∅, Ω ∈ B_0. Since a finite union of finite unions of intervals is itself a finite union of intervals, B_0 is closed under finite unions.

Similarly, a finite intersection of finite unions of intervals is also a finite union of intervals. To see this, let A, B ∈ B_0 be finite unions of intervals A = ⋃_{i=1}^{N_A} A_i and B = ⋃_{i=1}^{N_B} B_i. Then

A ∩ B = A ∩ (⋃_{i=1}^{N_B} B_i) = ⋃_{i=1}^{N_B} (A ∩ B_i) = ⋃_{i=1}^{N_B} ⋃_{j=1}^{N_A} (A_j ∩ B_i),

and since J is a semialgebra, we have A_j ∩ B_i ∈ J, and thus A ∩ B ∈ B_0. The proof is similar for any finite intersection of elements A_1, A_2, A_3, … ∈ B_0 (simply induct on the previous argument), and therefore B_0 is closed under finite intersection.

Since J is a semialgebra, if A ∈ J then A^C is a finite disjoint union of intervals, and thus A^C ∈ B_0. Thus if B = ⋃_{i=1}^{N} A_i where A_i ∈ J for i = 1, 2, …, N, then by De Morgan's laws B^C = ⋂_{i=1}^{N} A_i^C is a finite intersection of elements of B_0, and thus B^C ∈ B_0.

(b) Note that {1/n} = [1/n, 1/n] ∈ B_0 for all n = 1, 2, 3, …. We will show that the set A = {1, 1/2, 1/3, …} ∉ B_0 by contradiction, and thus prove that B_0 is not a σ-algebra. Assume A ∈ B_0 is a finite union of (WLOG) non-empty disjoint intervals A = ⋃_{i=1}^{N_A} A_i: if A_i ∩ A_j ≠ ∅ for some i, j, where A_i and A_j are intervals with endpoints α_i, β_i and α_j, β_j, then A_i ∪ A_j is an interval with endpoints min{α_i, α_j} and max{β_i, β_j}, so we may replace A_i and A_j with C_k = A_i ∪ A_j, and continue in this manner until all intervals are disjoint; since 0 ∉ A, we may simply remove any empty sets from the union. Since all A_i are non-empty disjoint intervals contained in A, and A contains no non-degenerate interval, each A_i must be a singleton A_i = [1/n_i, 1/n_i] for some n_i. Let n = max_i {n_i}. Then 1/(n+1) ∉ A_i for any i = 1, 2, …, N_A, and thus 1/(n+1) ∉ A, a contradiction.

2.3 Exercises
Exercise 2.3.16. Prove that the extension (Ω, M, P*) of Theorem 2.3.1 is complete, i.e. that if A ∈ M with P*(A) = 0, and B ⊆ A, then B ∈ M.

Proof. Let A ∈ M with P*(A) = 0, and let B ⊆ A. Let E ⊆ Ω. By monotonicity and subadditivity of the outer measure, 0 ≤ P*(B ∩ E) ≤ P*(A ∩ E) ≤ P*(A) = 0, so that P*(B ∩ E) = 0. Therefore P*(B ∩ E) + P*(B^C ∩ E) = P*(B^C ∩ E) ≤ P*(E). Thus (2.3.8) is satisfied and B ∈ M.

2.4 Exercises
Exercise 2.4.3.
(a) Prove that if I_1, I_2, …, I_n is a finite collection of intervals, and if ⋃_{j=1}^n I_j ⊇ I for some interval I, then ∑_{j=1}^n P(I_j) ≥ P(I). [Hint: Imitate the proof of Proposition 2.4.2.]
(b) Prove that if I_1, I_2, … is a countable collection of open intervals, and if ⋃_{j=1}^∞ I_j ⊇ I for some closed interval I, then ∑_{j=1}^∞ P(I_j) ≥ P(I). [Hint: You may use the Heine–Borel Theorem, which says that if a collection of open intervals contains a closed interval, then some finite sub-collection of the open intervals also contains the closed interval.]
(c) Verify (2.3.3), i.e. prove that if I_1, I_2, … is any countable collection of intervals, and if ⋃_{j=1}^∞ I_j ⊇ I for any interval I, then ∑_{j=1}^∞ P(I_j) ≥ P(I). [Hint: Extend the interval I_j by ε2^{−j} at each end, and decrease I by ε at each end, while making I_j open and I closed. Then use part (b).]

Proof.
(a) Let I_1, …, I_n be intervals contained in [0, 1]. Let I be an interval such that I ⊆ ⋃_{j=1}^n I_j. If I = ∅ there is nothing to prove. Suppose I ≠ ∅. For 1 ≤ j ≤ n, write a_j for the left endpoint of I_j and b_j for the right endpoint of I_j, ordered such that a_1 ≤ a_2 ≤ … ≤ a_n, and write a for the left endpoint of I and b for the right endpoint of I. Then for each j = 1, …, n − 1, we have (b_j − a_j) + (b_{j+1} − a_{j+1}) ≥ max{b_j, b_{j+1}} − a_j. Thus ∑_{j=1}^n P(I_j) = ∑_{j=1}^n (b_j − a_j) ≥ max{b_1, …, b_n} − a_1 ≥ b − a = P(I).

(b) Let I_1, I_2, … be a countable collection of open intervals. Let I be a closed interval such that I ⊆ ⋃_{j=1}^∞ I_j. If I = ∅ there is nothing to prove. Suppose I ≠ ∅. By the Heine–Borel Theorem there exists a finite subcollection I_{i_1}, …, I_{i_n} such that I ⊆ ⋃_{j=1}^n I_{i_j}. By part (a) we have ∑_{j=1}^n P(I_{i_j}) ≥ P(I). Let s = max{i_1, …, i_n}; then ∑_{j=1}^s P(I_j) ≥ ∑_{j=1}^n P(I_{i_j}) ≥ P(I). Then for any k ≥ s we have ∑_{j=1}^k P(I_j) ≥ P(I), and thus ∑_{j=1}^∞ P(I_j) ≥ P(I).

(c) Let I_1, I_2, … be a countable collection of intervals. Let I be an interval such that I ⊆ ⋃_{j=1}^∞ I_j. If I = ∅ there is nothing to prove. Suppose I ≠ ∅. For each j, write a_j for the left endpoint of I_j and b_j for the right endpoint of I_j, and write a for the left endpoint of I and b for the right endpoint of I. Extend each I_j to an open interval I_j^ε = (a_j − ε2^{−(j+1)}, b_j + ε2^{−(j+1)}), and shrink I to the closed interval I^ε = [a + ε/2, b − ε/2], for some ε > 0 small enough that I^ε is non-empty. By part (b) we have

∑_{j=1}^∞ P(I_j^ε) = ∑_{j=1}^∞ ((b_j − a_j) + ε2^{−j}) = ∑_{j=1}^∞ P(I_j) + ε ≥ P(I^ε) = b − a − ε = P(I) − ε.

Since ε > 0 was arbitrary, we have ∑_{j=1}^∞ P(I_j) ≥ P(I).

Exercise 2.4.5. Let A = {(−∞, x]; x ∈ R}. Prove that σ(A) = B, i.e. that the smallest σ-algebra of subsets of R which contains A is equal to the Borel σ-algebra of subsets of R. [Hint: Does σ(A) include all intervals?]

Proof. Let J = {all intervals contained in R}, so that the Borel σ-algebra on R is B = σ(J). Let A = {(−∞, x]; x ∈ R}. Clearly A ⊆ J and thus σ(A) ⊆ B. We will show σ(A) ⊇ J, and thus σ(A) ⊇ B, which will complete the proof. We do this case by case. Let a, b ∈ R with a ≤ b:

Case (−∞, b]: (−∞, b] ∈ σ(A) by definition.
Case (−∞, b): (−∞, b) = ⋃_{j=1}^∞ (−∞, b − 1/j] ∈ σ(A).
Case (a, ∞): (a, ∞) = (−∞, a]^C ∈ σ(A).
Case [a, ∞): [a, ∞) = ⋂_{j=1}^∞ (a − 1/j, ∞) ∈ σ(A).
Case (a, b]: (a, b] = (a, ∞) ∩ (−∞, b] ∈ σ(A).
Case (a, b): (a, b) = (a, ∞) ∩ (−∞, b) ∈ σ(A).
Case [a, b): [a, b) = [a, ∞) ∩ (−∞, b) ∈ σ(A).
Case [a, b]: [a, b] = [a, ∞) ∩ (−∞, b] ∈ σ(A).

Exercise 2.4.7.
(a) Prove that K, K^C ∈ B, where B are the Borel subsets of [0, 1].
(b) Prove that K, K^C ∈ M, where M is the σ-algebra of Theorem 2.4.4.
(c) Prove that K^C ∈ B_1, where B_1 is defined by (2.2.6).
(d) Prove that K ∉ B_1.
(e) Prove that B_1 is not a σ-algebra.

Proof.
(a) At each stage n of the construction of the Cantor set, removing 2^{n−1} open intervals corresponds to intersecting the (n−1)th-stage set with the complement of those 2^{n−1} open intervals. Thus K is a countable intersection of sets in B, and therefore K ∈ B. Since B is a σ-algebra, we also have K^C ∈ B.
(b) Since M is a σ-algebra containing all intervals, we have B ⊆ M. Therefore K, K^C ∈ M.
(c) In part (a) we showed that K is a countable intersection of finite unions of intervals. Since the complement of an interval is a finite union of intervals, De Morgan's laws give that K^C is a countable union of intervals. Thus K^C ∈ B_1.
(d) We prove this by contradiction. Assume K = ⋃_{j=1}^∞ I_j for (possibly empty) intervals I_j. Since λ(K) = 0, we have λ(I_j) = 0 for all j = 1, 2, …. Therefore each I_j is a singleton or empty, so K is a countable union of singletons and empty sets and is therefore countable. This is a contradiction, since K is uncountable. Therefore K ∉ B_1.
(e) We have K^C ∈ B_1 by part (c), and K = (K^C)^C ∉ B_1 by part (d). Therefore B_1 is not closed under complements and is not a σ-algebra.

2.5 Exercises
Exercise 2.5.6. Suppose P satisfies (2.5.5) for finite disjoint collections {D_n}. Suppose further that, whenever A_1, A_2, … are finite disjoint unions of elements of J such that A_{n+1} ⊆ A_n and ⋂_{n=1}^∞ A_n = ∅, we have lim_{n→∞} P(A_n) = 0 (where P is extended to the A_n by finite additivity). Prove that P also satisfies (2.5.5) for countable collections {D_n}. [Hint: Set A_n = (⋃_{i=1}^∞ D_i) \ (⋃_{j=1}^n D_j) = ⋃_{i=n+1}^∞ D_i.]

Proof. Let D_1, D_2, … ∈ J be disjoint with ⋃_n D_n ∈ J. Set A_n = (⋃_{i=1}^∞ D_i) \ (⋃_{j=1}^n D_j) = ⋃_{i=n+1}^∞ D_i. Since A_n = (⋃_j D_j) ∩ D_1^C ∩ … ∩ D_n^C and J is a semialgebra, A_n can be written as a finite disjoint union of elements of J. Clearly A_{n+1} ⊆ A_n and ⋂_{n=1}^∞ A_n = ∅, which implies that lim_{n→∞} P(A_n) = 0. Now

P(⋃_j D_j) = P(A_n ∪ ⋃_{j=1}^n D_j) = P(A_n) + ∑_{j=1}^n P(D_j)

for any given n. Thus

P(⋃_j D_j) = lim_{n→∞} (P(A_n) + ∑_{j=1}^n P(D_j)) = 0 + ∑_{j=1}^∞ P(D_j) = ∑_{j=1}^∞ P(D_j).

2.6 Exercises
Exercise 2.6.1.
(a) Verify that the above J is a semialgebra.
(b) Verify that the above J and P satisfy (2.5.5) for finite collections {D_n}. [Hint: For a finite collection {D_n} ⊆ J, there is k ∈ N such that the results of only coins 1 through k are specified by any D_n. Partition Ω into the corresponding 2^k subsets.]

Proof.
(a) Let J = {A_{a_1 a_2 … a_n}; n ∈ N, a_1, a_2, …, a_n ∈ {0, 1}} ∪ {∅, Ω}. Note ∅, Ω ∈ J by definition. Now let A, B ∈ J. If A ∩ B = ∅, then clearly A ∩ B ∈ J. Suppose A ∩ B ≠ ∅; then A = A_{a_1 a_2 … a_{n_A}} and B = B_{b_1 b_2 … b_{n_B}} for some n_A, n_B ∈ N. Suppose (WLOG) n_A ≤ n_B; then A ∩ B ≠ ∅ implies that a_i = b_i for i = 1, …, n_A, and therefore A ∩ B = B ∈ J. To show this is true for any finite collection of elements, simply induct on the previous argument. Now let S = {c_1 … c_{n_A}; c_1, …, c_{n_A} ∈ {0, 1}}, and partition Ω into the 2^{n_A} subsets Ω = ⋃_{s∈S} C_s. Therefore A_{a_1 a_2 … a_{n_A}}^C = Ω \ A_{a_1 a_2 … a_{n_A}} = ⋃_{s ∈ S\{a_1 … a_{n_A}}} C_s. Clearly C_s ∈ J for each s ∈ S. Thus J is a semialgebra.

(b) Let (WLOG) D_1, …, D_n ∈ J \ {∅} be disjoint with ⋃_j D_j ∈ J. For each j, one can write D_j = D_{d_{1j} … d_{m_j j}} for some m_j. Let k = max_j {m_j}, and let S = {a_1 … a_k; a_1, …, a_k ∈ {0, 1}}. Partition Ω into the 2^k subsets Ω = ⋃_{s∈S} A_s. Then for each j, let S_j = {a_1 … a_k ∈ S; a_i = d_{ij} for 1 ≤ i ≤ m_j}, so that D_j = ⋃_{s∈S_j} A_s. Let T = ⋃_j S_j, so that ⋃_j D_j = ⋃_{t∈T} A_t. Thus

P(⋃_j D_j) = ∑_{t∈T} P(A_t) = ∑_j ∑_{s∈S_j} P(A_s) = ∑_j P(D_j).

Exercise 2.6.4. Verify that the above J is a semialgebra, and that ∅, Ω ∈ J with P(∅) = 0 and P(Ω) = 1.

Proof. Let (Ω_1, F_1, P_1) and (Ω_2, F_2, P_2) be probability triples and Ω = Ω_1 × Ω_2. Let J = {A × B; A ∈ F_1, B ∈ F_2}. Define P(A × B) = P_1(A) P_2(B) for A × B ∈ J.

We first show that ∅, Ω ∈ J with P(∅) = 0 and P(Ω) = 1. Clearly ∅ = ∅ × ∅ ∈ J. Also, for any pair A, B where at least one of A and B is the empty set, we have A × B = ∅ and at least one of P_1(A) and P_2(B) equal to 0; therefore P(∅) = P(A × B) = P_1(A) P_2(B) = 0. Also Ω = Ω_1 × Ω_2 ∈ J, and since both P_1(Ω_1) = 1 and P_2(Ω_2) = 1, we have P(Ω) = P(Ω_1 × Ω_2) = P_1(Ω_1) P_2(Ω_2) = 1.

Now we verify that J is a semialgebra. Suppose A_1 × B_1, A_2 × B_2 ∈ J. Then (A_1 × B_1) ∩ (A_2 × B_2) = (A_1 ∩ A_2) × (B_1 ∩ B_2). Since A_1 ∩ A_2 ∈ F_1 and B_1 ∩ B_2 ∈ F_2, we have (A_1 ∩ A_2) × (B_1 ∩ B_2) ∈ J. This holds for any finite intersection of elements, by inducting on the previous argument. Now (A_1 × B_1)^C is the disjoint union (A_1 × B_1)^C = (A_1^C × B_1) ∪ (A_1 × B_1^C) ∪ (A_1^C × B_1^C). Since A_1 ∈ F_1 and B_1 ∈ F_2, we have A_1^C ∈ F_1 and B_1^C ∈ F_2, and thus A_1^C × B_1, A_1 × B_1^C, A_1^C × B_1^C ∈ J. Therefore J is a semialgebra.

2.7 Exercises
Exercise 2.7.1. Let Ω = {1, 2, 3, 4}. Determine whether or not each of the following is a σ-algebra.
(a) F_1 = {∅, {1, 2}, {3, 4}, {1, 2, 3, 4}}
(b) F_2 = {∅, {3}, {4}, {1, 2}, {3, 4}, {1, 2, 3}, {1, 2, 4}, {1, 2, 3, 4}}
(c) F_3 = {∅, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}, {1, 2, 3, 4}}

Proof.
(a) F_1 is a σ-algebra. Since ∅ ∈ F_1, it is sufficient to show that F_1 is closed under complements and countable unions. We have ∅^C = {1, 2, 3, 4} ∈ F_1 and {1, 2}^C = {3, 4} ∈ F_1, so F_1 is closed under complements. For each A ∈ F_1, we have A ∪ ∅ = A ∈ F_1 and A ∪ {1, 2, 3, 4} = {1, 2, 3, 4} ∈ F_1. Also {1, 2} ∪ {3, 4} = {1, 2, 3, 4} ∈ F_1, and any countable union of elements equals one of the previous unions. Thus F_1 is closed under countable unions.
(b) F_2 is a σ-algebra. Since ∅ ∈ F_2, it is sufficient to show that F_2 is closed under complements and countable unions. We have ∅^C = {1, 2, 3, 4} ∈ F_2, {3}^C = {1, 2, 4} ∈ F_2, {4}^C = {1, 2, 3} ∈ F_2, and {1, 2}^C = {3, 4} ∈ F_2, so F_2 is closed under complements. For each A ∈ F_2, we have A ∪ ∅ = A ∈ F_2 and A ∪ {1, 2, 3, 4} = {1, 2, 3, 4} ∈ F_2. The remaining unions are routine to check, and all lie in F_2.
(c) F_3 is NOT a σ-algebra: {1, 2} ∪ {1, 3} = {1, 2, 3} ∉ F_3, so F_3 is not closed under countable unions.
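For a finite Ω, checks like these are mechanical, so they can also be automated. Here is a small Python sketch (our own illustration; the function name is an assumption) that verifies the axioms by brute force; for finite collections, closure under countable unions reduces to closure under pairwise unions.

```python
from itertools import combinations

def is_sigma_algebra(omega, F):
    """Brute-force check of the sigma-algebra axioms on a finite sample space:
    contains the empty set, closed under complement and (pairwise) union."""
    F = set(map(frozenset, F))
    if frozenset() not in F:
        return False
    if any(omega - A not in F for A in F):                  # complements
        return False
    return all(A | B in F for A, B in combinations(F, 2))   # unions

omega = frozenset({1, 2, 3, 4})
F1 = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
F3 = [set(), {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}, {1, 2, 3, 4}]
print(is_sigma_algebra(omega, F1))  # True
print(is_sigma_algebra(omega, F3))  # False: {1,2} | {1,3} = {1,2,3} is missing
```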

Exercise 2.7.2. Let Ω = {1, 2, 3, 4}, and let J = {{1}, {2}}. Describe explicitly the σ-algebra σ(J) generated by J.

Proof. Any σ-algebra containing J must contain ∅ and be closed under countable unions, intersections, and complements of elements of J. Therefore ∅ ∈ σ(J), ∅^C = Ω ∈ σ(J), {1}^C = {2, 3, 4} ∈ σ(J), {2}^C = {1, 3, 4} ∈ σ(J), {1} ∪ {2} = {1, 2} ∈ σ(J), and {1, 2}^C = {3, 4} ∈ σ(J). It can then easily be checked that

σ(J) = {∅, {1}, {2}, {1, 2}, {3, 4}, {2, 3, 4}, {1, 3, 4}, {1, 2, 3, 4}}

is a σ-algebra.

Exercise 2.7.3. Suppose F is a collection of subsets of Ω, such that Ω ∈ F.
(a) Suppose F is an algebra. Prove that F is a semialgebra.
(b) Suppose that whenever A, B ∈ F, then also A\B ≡ A ∩ B^C ∈ F. Prove that F is an algebra.
(c) Suppose that F is closed under complement, and also under finite disjoint unions (i.e., whenever A, B ∈ F are disjoint, then A ∪ B ∈ F). Give a counter-example to show that F might not be an algebra.

Proof.
(a) Since F is an algebra, it is closed under complement, finite intersection, and finite union. Therefore Ω^C = ∅ ∈ F. For A ∈ F, we have A^C ∈ F, which is itself a one-element disjoint union of elements of F. Therefore F is a semialgebra.
(b) Suppose that whenever A, B ∈ F, then also A\B ≡ A ∩ B^C ∈ F. We will show that F is closed under complement and finite intersection. Suppose A ∈ F; then A^C = Ω\A ∈ F (since Ω ∈ F). Let A, B ∈ F; then B^C ∈ F and A ∩ B = A\B^C ∈ F. Thus F is closed under finite intersection, by simply inducting on the previous argument. Therefore F is an algebra.
(c) Let Ω = {1, 2, 3, 4} and consider F = {∅, {1, 2}, {2, 3}, {3, 4}, {1, 4}, {1, 2, 3, 4}}. Clearly Ω ∈ F, and F is closed under complement and finite disjoint unions. However, {1, 2} ∪ {2, 3} = {1, 2, 3} ∉ F, and thus F is not closed under finite unions and is not an algebra.

Exercise 2.7.4. Let F_1, F_2, … be a sequence of collections of subsets of Ω, such that F_n ⊆ F_{n+1} for each n.
(a) Suppose that each F_i is an algebra. Prove that ⋃_{i=1}^∞ F_i is also an algebra.
(b) Suppose that each F_i is a σ-algebra. Show (by counter-example) that ⋃_{i=1}^∞ F_i might not be a σ-algebra.

Proof.
(a) Suppose that each F_i is an algebra and let F = ⋃_{i=1}^∞ F_i. Since ∅ ∈ F_1, we have ∅ ∈ F. Let A ∈ F; then A ∈ F_i for some i, and thus A^C ∈ F_i ⊆ F. It remains to show that F is closed under finite intersection. Let A_1, …, A_n ∈ F; then for each 1 ≤ i ≤ n, we have A_i ∈ F_{k_i} for some k_i. Let k = max_i {k_i}. Since F_j ⊆ F_{j+1} for each j, we have F_{k_i} ⊆ F_k, and thus A_1, …, A_n ∈ F_k. Therefore A_1 ∩ … ∩ A_n ∈ F_k ⊆ F.
(b) Let Ω = N = {1, 2, 3, …}. Let F_1 = σ({1}), and for each i = 2, 3, …, let F_i = σ({{1}, {2}, …, {i}}). Clearly F_i ⊆ F_{i+1} for each i. Let F = ⋃_{i=1}^∞ F_i. We will prove F is not a σ-algebra by contradiction. Suppose F is a σ-algebra. Then for each i we have {i} ∈ F, and thus F ⊇ σ({1}, {2}, …). Since each F_i is finite, F is a countable union of finite sets and is thus countable. However, |F| ≥ |σ({1}, {2}, …)| = |2^N| = 2^{ℵ₀}, which is a contradiction (here 2^N is the power set of the natural numbers, and 2^{ℵ₀} is the cardinality of the real numbers). Therefore F is not a σ-algebra.

Exercise 2.7.5. Suppose that Ω = N is the set of positive integers, and F is the set of all subsets A such that either A or A^C is finite, and P is defined by P(A) = 0 if A is finite, and P(A) = 1 if A^C is finite.
(a) Is F an algebra?
(b) Is F a σ-algebra?
(c) Is P finitely additive?
(d) Is P countably additive on F, meaning that if A_1, A_2, … ∈ F are disjoint, and if it happens that ⋃_n A_n ∈ F, then P(⋃_n A_n) = ∑_n P(A_n)?

Proof.
(a) Yes, F is an algebra. First, Ω ∈ F since |Ω^C| = |∅| = 0. Let A ∈ F; then by definition either A or A^C is finite, and thus A^C ∈ F. It remains to show that F is closed under finite intersection. Let A_1, …, A_n ∈ F. If A_i is finite for some i, then A_1 ∩ … ∩ A_n ⊆ A_i is finite, and thus A_1 ∩ … ∩ A_n ∈ F. If instead each A_j is infinite, then each A_j^C is finite; thus (A_1 ∩ … ∩ A_n)^C = A_1^C ∪ … ∪ A_n^C is a finite union of finite sets and is thus finite. Therefore A_1 ∩ … ∩ A_n ∈ F, and F is an algebra.
(b) No, F is not a σ-algebra. For each j, A_j := {1, 3, 5, …, 2j − 1} ∈ F since A_j is finite. However, {1, 3, 5, …} = ⋃_j A_j ∉ F, since both {1, 3, 5, …} and {1, 3, 5, …}^C = {2, 4, 6, …} are infinite.
(c) Yes, P is finitely additive on F. Let (WLOG) A_1, …, A_n ∈ F \ {∅, Ω} be disjoint, and let A = ⋃_j A_j. We first show that at most one of the A_j^C (1 ≤ j ≤ n) is finite. Suppose on the contrary that (WLOG) A_1^C and A_2^C are both finite. Let a = max{A_1^C ∪ A_2^C}. Then A_1^C ∪ A_2^C ⊆ {1, 2, …, a}, and thus A_1 ∩ A_2 = (A_1^C ∪ A_2^C)^C ⊇ {1, 2, …, a}^C = {a + 1, a + 2, …}. This contradicts the fact that A_1 ∩ A_2 = ∅, and therefore at most one of the A_j^C is finite.
Now if none of the A_j^C is finite, then each A_j is finite, and therefore A is finite. Then P(A) = 0 = ∑_j 0 = ∑_j P(A_j). If instead there exists k such that A_k^C is finite, then A_k is infinite, so A is infinite, and A^C ⊆ A_k^C is finite. Thus P(A) = 1 = ∑_{j≠k} 0 + 1 = ∑_{j≠k} P(A_j) + P(A_k) = ∑_j P(A_j). Therefore P is finitely additive on F.
(d) No, P is not countably additive on F. For each j ∈ N, let A_j = {j}, so that the A_j are disjoint with A := ⋃_j A_j = N ∈ F. Each A_j ∈ F since A_j is finite, and since each A_j is finite we have ∑_j P(A_j) = ∑_j 0 = 0. However, A = N has finite complement A^C = ∅, and thus P(A) = 1. Therefore P(A) = 1 ≠ 0 = ∑_j P(A_j), and P is not countably additive on F.

Exercise 2.7.6. Suppose that Ω = [0, 1] is the unit interval, and F is the set of all subsets A such that either A or A^C is finite, and P is defined by P(A) = 0 if A is finite, and P(A) = 1 if A^C is finite.
(a) Is F an algebra?
(b) Is F a σ-algebra?
(c) Is P finitely additive?
(d) Is P countably additive on F (as in the previous exercise)?

Proof.
(a) Yes, F is an algebra. First, Ω ∈ F since |Ω^C| = |∅| = 0. Let A ∈ F; then by definition either A or A^C is finite, and thus A^C ∈ F. It remains to show that F is closed under finite intersection. Let A_1, …, A_n ∈ F. If A_i is finite for some i, then A_1 ∩ … ∩ A_n ⊆ A_i is finite, and thus A_1 ∩ … ∩ A_n ∈ F. If instead each A_j is infinite, then each A_j^C is finite; thus (A_1 ∩ … ∩ A_n)^C = A_1^C ∪ … ∪ A_n^C is a finite union of finite sets and is thus finite. Therefore A_1 ∩ … ∩ A_n ∈ F, and F is an algebra.
(b) No, F is not a σ-algebra. For each j, A_j := {1, 1/2, 1/3, …, 1/j} ∈ F since A_j is finite. However, {1, 1/2, 1/3, …} = ⋃_j A_j ∉ F, since both {1, 1/2, 1/3, …} and its complement {1, 1/2, 1/3, …}^C ⊇ ⋃_j (1/(j+1), 1/j) are infinite.
(c) Yes, P is finitely additive on F. Let (WLOG) A_1, …, A_n ∈ F \ {∅, Ω} be disjoint with A := ⋃_j A_j. We first show that at most one of the A_j^C (1 ≤ j ≤ n) is finite. Suppose on the contrary that (WLOG) A_1^C and A_2^C are both finite. Since both A_1^C and A_2^C are finite, (A_1^C ∪ A_2^C)^C = [0, 1] \ (A_1^C ∪ A_2^C) = A_1 ∩ A_2 is infinite and non-empty. This contradicts the fact that A_1 ∩ A_2 = ∅, and therefore at most one of the A_j^C is finite.
Now if none of the A_j^C is finite, then each A_j is finite, and therefore A is finite. Then P(A) = 0 = ∑_j 0 = ∑_j P(A_j). If instead there exists k such that A_k^C is finite, then A_k is infinite, so A is infinite, and A^C ⊆ A_k^C is finite. Thus P(A) = 1 = ∑_{j≠k} 0 + 1 = ∑_{j≠k} P(A_j) + P(A_k) = ∑_j P(A_j). Therefore P is finitely additive on F.
(d) Yes, P is countably additive on F. Let (WLOG) A_1, A_2, … ∈ F \ {∅, Ω} be disjoint with A := ⋃_j A_j ∈ F. We first show that at most one of the A_j^C is finite. Suppose on the contrary that A_{i_1}^C, A_{i_2}^C, … are finite for two or more distinct indices i_1, i_2, …. Then ⋃_j A_{i_j}^C is countable, and therefore ⋂_j A_{i_j} = [0, 1] \ ⋃_j A_{i_j}^C is infinite and non-empty. This contradicts the fact that ⋂_j A_{i_j} = ∅ (the A_j being disjoint), and therefore at most one of the A_j^C is finite.
Now if none of the A_j^C is finite, then each A_j is finite. If A were infinite, it would be countable, and thus A^C would be infinite (in fact uncountable), which contradicts A ∈ F; therefore A is finite. Then P(A) = 0 = ∑_j 0 = ∑_j P(A_j). If instead there exists k such that A_k^C is finite, then A_k is infinite, so A is infinite, and A^C ⊆ A_k^C is finite. Thus P(A) = 1 = ∑_{j≠k} 0 + 1 = ∑_{j≠k} P(A_j) + P(A_k) = ∑_j P(A_j). Therefore P is countably additive on F.

Exercise 2.7.7. Suppose that Ω = [0, 1] is the unit interval, and F is the set of all subsets A such that either A or A^C is countable (i.e., finite or countably infinite), and P is defined by P(A) = 0 if A is countable, and P(A) = 1 if A^C is countable.
(a) Is F an algebra?
(b) Is F a σ-algebra?
(c) Is P finitely additive?
(d) Is P countably additive on F?

Proof.
(a) Yes, F is an algebra. First, Ω ∈ F since Ω^C = ∅ is countable. Let A ∈ F; then by definition either A or A^C is countable, and thus A^C ∈ F. It remains to show that F is closed under finite intersection. Let A_1, …, A_n ∈ F. If A_i is countable for some i, then A_1 ∩ … ∩ A_n ⊆ A_i is countable, and thus A_1 ∩ … ∩ A_n ∈ F. If instead each A_j is uncountable, then each A_j^C is countable; thus (A_1 ∩ … ∩ A_n)^C = A_1^C ∪ … ∪ A_n^C is a finite union of countable sets and is thus countable. Therefore A_1 ∩ … ∩ A_n ∈ F, and F is an algebra.
(b) Yes, F is a σ-algebra. The argument is the same as in part (a), replacing finite intersections with countable intersections (a countable union of countable sets being countable).
(c) Yes, P is finitely additive on F. Let (WLOG) A_1, …, A_n ∈ F \ {∅, Ω} be disjoint with A := ⋃_j A_j. We first show that at most one of the A_j^C (1 ≤ j ≤ n) is countable. Suppose on the contrary that (WLOG) A_1^C and A_2^C are both countable. Since both A_1^C and A_2^C are countable, (A_1^C ∪ A_2^C)^C = [0, 1] \ (A_1^C ∪ A_2^C) = A_1 ∩ A_2 is uncountable and non-empty. This contradicts the fact that A_1 ∩ A_2 = ∅, and therefore at most one of the A_j^C is countable.
Now if none of the A_j^C is countable, then each A_j is countable, and therefore A is countable. Then P(A) = 0 = ∑_j 0 = ∑_j P(A_j). If instead there exists k such that A_k^C is countable, then A_k is uncountable, so A is uncountable, and A^C ⊆ A_k^C is countable. Thus P(A) = 1 = ∑_{j≠k} 0 + 1 = ∑_{j≠k} P(A_j) + P(A_k) = ∑_j P(A_j). Therefore P is finitely additive on F.
(d) Yes, P is countably additive on F. Let (WLOG) A_1, A_2, … ∈ F \ {∅, Ω} be disjoint with A := ⋃_j A_j ∈ F. We first show that at most one of the A_j^C is countable. Suppose on the contrary that A_{i_1}^C, A_{i_2}^C, … are countable for two or more distinct indices i_1, i_2, …. Then ⋃_j A_{i_j}^C is countable, and therefore ⋂_j A_{i_j} = [0, 1] \ ⋃_j A_{i_j}^C is uncountable and non-empty. This contradicts the fact that ⋂_j A_{i_j} = ∅ (the A_j being disjoint), and therefore at most one of the A_j^C is countable.
Now if none of the A_j^C is countable, then each A_j is countable, and thus A is countable. Then P(A) = 0 = ∑_j 0 = ∑_j P(A_j). If instead there exists k such that A_k^C is countable, then A_k is uncountable, so A is uncountable, and A^C ⊆ A_k^C is countable. Thus P(A) = 1 = ∑_{j≠k} 0 + 1 = ∑_{j≠k} P(A_j) + P(A_k) = ∑_j P(A_j). Therefore P is countably additive on F.

Exercise 2.7.8. For the example of Exercise 2.7.7, is P uncountably additive (cf. page 2)?

Proof. No, P is not uncountably additive. For each x ∈ Ω, consider A_x = {x} ∈ F, where ⋃_x A_x = Ω ∈ F. Then each A_x is countable, and thus ∑_x P(A_x) = 0, but P(⋃_x A_x) = P(Ω) = 1.
Exercise 2.7.9. Let F be a σ-algebra, and write |F| for the total number of subsets in F. Prove that if |F| < ∞ (i.e., if F consists of just a finite number of subsets), then |F| = 2^m for some m ∈ N. [Hint: Consider those non-empty subsets in F which do not contain any other non-empty subset in F. How can all subsets in F be built up from these particular subsets?]

Proof. Let F be a σ-algebra with |F| < ∞. Let D = {D_1, …, D_m} consist of those non-empty subsets in F which do not contain any other non-empty subset in F. Then for each A ∈ F, we have A = ⋃_{j=1}^n D_{k_j} for some D_{k_1}, …, D_{k_n} ∈ D. Let G = {(r_1, r_2, …, r_m); r_i = 0 or 1}. Consider the function φ : F → G defined by φ(A) = (r_1, r_2, …, r_m), where for 1 ≤ j ≤ m we have r_j = 1 if A ⊇ D_j and r_j = 0 if A ⊉ D_j. We will prove that φ is a bijection, and thus |F| = |G| = 2^m.

To show that φ is onto, let (k_1, …, k_m) ∈ G and let K = {j; k_j = 1}. Then ⋃_{j∈K} D_j ∈ F since F is a σ-algebra, and φ(⋃_{j∈K} D_j) = (k_1, …, k_m). To show φ is one-to-one, assume φ(A) = φ(B) = (s_1, …, s_m) for A, B ∈ F, and let S = {j; s_j = 1}. Then by the definition of φ, we have A = B = ⋃_{j∈S} D_j. Therefore φ is a bijection.
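The minimal sets D_1, …, D_m are exactly the atoms of F, and for a finite Ω they are easy to find by machine. Here is a small Python sketch (our own illustration) that extracts the atoms and confirms |F| = 2^m on the σ-algebra from Exercise 2.7.2.

```python
def atoms(F):
    """Return the minimal non-empty sets of a finite sigma-algebra F,
    i.e. those non-empty members containing no other non-empty member."""
    sets = [frozenset(A) for A in F if A]
    return [A for A in sets if not any(B < A for B in sets)]

# sigma(J) from Exercise 2.7.2: three atoms {1}, {2}, {3,4}, so |F| = 2^3 = 8.
F = [set(), {1}, {2}, {1, 2}, {3, 4}, {2, 3, 4}, {1, 3, 4}, {1, 2, 3, 4}]
D = atoms(F)
assert sorted(map(sorted, D)) == [[1], [2], [3, 4]]
assert len(F) == 2 ** len(D)
```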
Exercise 2.7.10. Prove that the collection J of (2.5.10) is a semialgebra.

Proof. Let J = {(−∞, x] : x ∈ R} ∪ {(y, ∞) : y ∈ R} ∪ {(x, y] : x, y ∈ R} ∪ {∅, R}. Clearly ∅, R ∈ J. Next, we show J is closed under finite intersection. Let (WLOG) A, B ∈ J \ {∅, R}. Consider the following cases (WLOG):

Case A = (−∞, x], B = (−∞, y]: Let s = min{x, y}; then A ∩ B = (−∞, s] ∈ J.
Case A = (−∞, x], B = (y, ∞): If x ≤ y, then A ∩ B = ∅ ∈ J. If x > y, then A ∩ B = (y, x] ∈ J.
Case A = (−∞, x], B = (y, z]: Let s = min{x, z}. If x ≤ y, then A ∩ B = ∅ ∈ J. If x > y, then A ∩ B = (y, s] ∈ J.
Case A = (x, ∞), B = (y, ∞): Let s = max{x, y}; then A ∩ B = (s, ∞) ∈ J.
Case A = (x, ∞), B = (y, z]: Let s = max{x, y}. If x ≥ z, then A ∩ B = ∅ ∈ J. If x < z, then A ∩ B = (s, z] ∈ J.
Case A = (w, x], B = (y, z]: Let s = max{w, y} and t = min{x, z}. If w ≥ z or x ≤ y, then A ∩ B = ∅ ∈ J. Otherwise A ∩ B = (s, t] ∈ J.

For general A_1, …, A_n ∈ J, simply induct on the above argument. Therefore J is closed under finite intersection. Now we show the complement of any A ∈ J is a finite disjoint union of elements of J. Note ∅^C = R ∈ J and R^C = ∅ ∈ J. Let (WLOG) A ∈ J \ {∅, R}. Consider the following cases:

Case A = (−∞, x]: A^C = (x, ∞) ∈ J.
Case A = (y, ∞): A^C = (−∞, y] ∈ J.
Case A = (x, y]: A^C = (−∞, x] ∪ (y, ∞), with (−∞, x], (y, ∞) ∈ J.

Therefore the complement of each element of J is a finite disjoint union of elements of J, and J is a semialgebra.
Exercise 2.7.11. Let Ω = [0, 1]. Let J′ be the set of all half-open intervals of the form (a, b], for 0 ≤ a < b ≤ 1, together with the sets ∅, Ω, and {0}.
(a) Prove that J′ is a semialgebra.
(b) Prove that σ(J′) = B, i.e. that the σ-algebra generated by this J′ is equal to the σ-algebra generated by the J of (2.4.1).
(c) Let B_0′ be the collection of all finite disjoint unions of elements of J′. Prove that B_0′ is an algebra. Is B_0′ the same as the algebra B_0 defined in (2.2.4)?
[Remark: Some treatments of Lebesgue measure use J′ instead of J.]

Proof.
(a) Clearly ∅, Ω ∈ J′. Next, we show J′ is closed under finite intersection. Let (WLOG) A, B ∈ J′ \ {∅, Ω}. If A = B = {0}, then A ∩ B = {0} ∈ J′. If (WLOG) A = {0} and B ≠ {0}, then A ∩ B = ∅ ∈ J′. Suppose A ≠ {0} and B ≠ {0}. Then A = (a_1, b_1] and B = (a_2, b_2] for some 0 ≤ a_1 < b_1 ≤ 1 and 0 ≤ a_2 < b_2 ≤ 1. Let a = max{a_1, a_2} and b = min{b_1, b_2}. If a_1 ≥ b_2 or b_1 ≤ a_2, then A ∩ B = ∅ ∈ J′; otherwise A ∩ B = (a, b] ∈ J′. For A_1, A_2, …, A_n ∈ J′, simply induct on the previous argument. Now we consider complements. Let (WLOG) A ∈ J′ \ {∅, Ω}. If A = {0}, then A^C = (0, 1] ∈ J′. Suppose A = (a, b]. If b = 1, then A^C = {0} ∪ (0, a], where {0}, (0, a] ∈ J′. If b < 1, then A^C = {0} ∪ (0, a] ∪ (b, 1], where {0}, (0, a], (b, 1] ∈ J′. Thus the complement of an element is a finite disjoint union of elements of J′, and therefore J′ is a semialgebra.

(b) Let J = {all intervals contained in [0, 1]}, and let B = σ(J). Clearly σ(J′) ⊆ B, since J′ ⊆ J. It remains to show that B ⊆ σ(J′). We will do this by showing that σ(J′) ⊇ J, since, by definition, every σ-algebra containing J contains B = σ(J). Let (WLOG) A ∈ J \ {∅, [0, 1]}. Then A is an interval (possibly a single point or empty). Consider the following cases (WLOG):

Case (a, b]: (a, b] ∈ σ(J′).
Case [a, b]: If a = 0, then [0, b] = {0} ∪ (0, b] ∈ σ(J′). If a > 0, let k be large enough that a − 1/k > 0. Then [a, b] = ⋂_{j=k}^∞ (a − 1/j, b] ∈ σ(J′).
Case {x}: {x} = [0, x] ∩ [x, 1] ∈ σ(J′), using the previous case.
Case [a, b): Let k be large enough that a < b − 1/k. Then [a, b) = ⋃_{j=k}^∞ [a, b − 1/j] ∈ σ(J′).
Case (a, b): Let k be large enough that a < b − 1/k. Then (a, b) = ⋃_{j=k}^∞ (a, b − 1/j] ∈ σ(J′).

(c) Let B_0′ be the collection of all finite disjoint unions of elements of J′. Let A, B ∈ B_0′. Then we can write A = ⋃_{j=1}^{k_A} A_j and B = ⋃_{i=1}^{k_B} B_i for some k_A, k_B, with A_j, B_i ∈ J′ for each j, i. Then

A ∩ B = ⋃_{j=1}^{k_A} (A_j ∩ B) = ⋃_{j=1}^{k_A} (A_j ∩ ⋃_{i=1}^{k_B} B_i) = ⋃_{j=1}^{k_A} ⋃_{i=1}^{k_B} (A_j ∩ B_i).

Since each A_j, B_i ∈ J′, so is A_j ∩ B_i, and therefore A ∩ B ∈ B_0′. By inducting on this argument, B_0′ is closed under finite intersection. Now A^C = (⋃_{j=1}^{k_A} A_j)^C = ⋂_{j=1}^{k_A} A_j^C. Since J′ is a semialgebra, each A_j^C is a finite disjoint union of elements of J′, and since B_0′ is closed under finite intersection, we have A^C ∈ B_0′. Therefore B_0′ is an algebra.

Now let B_0 = {all finite unions of elements of J}. We show that B_0′ ≠ B_0 by proving that [1/2, 1] ∉ B_0′. Assume on the contrary that [1/2, 1] ∈ B_0′. Then [1/2, 1] = ⋃_{j=1}^n A_j for some n and some A_j ∈ J′. Since 1/2 ∈ [1/2, 1], there exists some k such that 1/2 ∈ A_k. Since (WLOG) A_k ∈ J′ \ {∅, {0}, [0, 1]}, we have A_k = (a, b] for some 0 ≤ a < b ≤ 1. Then a < 1/2, so there exists x ∈ A_k with x < 1/2. But x ∈ A_k ⊆ [1/2, 1] gives x ≥ 1/2, which is a contradiction. Therefore [1/2, 1] ∉ B_0′.

Exercise 2.7.12. Let K be the Cantor set as defined in Subsection 2.4. Let D_n = K ⊕ (1/n), where ⊕ is defined as in (1.2.4). Let B = ⋃_{n=1}^∞ D_n.
(a) Draw a rough sketch of D_3.
(b) What is λ(D_3)?
(c) Draw a rough sketch of B.
(d) What is λ(B)?

Proof. Let K be the Cantor set as defined in Subsection 2.4, and let D_n = K ⊕ (1/n), where K ⊕ (1/n) = {k + 1/n; k ∈ K, k + 1/n ≤ 1} ∪ {k + 1/n − 1; k ∈ K, k + 1/n > 1}.
(a) [Sketch omitted: D_3 is the Cantor set shifted by 1/3, with wrap-around.]
(b) Since λ is shift-invariant as in (1.2.5), we must have λ(D_n) = λ(K) = 0 for all n. In particular, λ(D_3) = 0.
(c) [Sketch omitted: B is the union of the shifted Cantor sets D_1, D_2, D_3, ….]
(d) By part (b), λ(D_n) = 0 for all n, and therefore by subadditivity λ(B) ≤ ∑_{n=1}^∞ λ(D_n) = ∑_{n=1}^∞ 0 = 0, so λ(B) = 0.

Exercise 2.7.13. Give an example of a sample space Ω, a semialgebra J, and a non-negative function P : J → R with P(∅) = 0 and P(Ω) = 1, such that (2.5.5) is not satisfied.

Proof. See the example from Exercise 2.7.5: there Ω = N, the algebra F (which is in particular a semialgebra) plays the role of J, and P is finitely but not countably additive, so (2.5.5) fails for countable collections.
Exercise 2.7.14. Let Ω = {1, 2, 3, 4}, with F the collection of all subsets of Ω. Let P and Q be two probability measures on F, such that P{1} = P{2} = P{3} = P{4} = 1/4, and Q{2} = Q{4} = 1/2, extended to F by linearity. Finally, let J = {∅, Ω, {1, 2}, {2, 3}, {3, 4}, {1, 4}}.
(a) Prove that P(A) = Q(A) for all A ∈ J.
(b) Prove that there is A ∈ σ(J) with P(A) ≠ Q(A).
(c) Why does this not contradict Proposition 2.5.8?

Proof.
(a) By additivity we have P{1, 2} = P{1} + P{2} = 1/4 + 1/4 = 1/2, and similarly P(A) = 1/2 for each A ∈ J other than ∅ and Ω. Since Q{2} + Q{4} = 1/2 + 1/2 = 1, finite additivity gives Q{1} + Q{3} = Q(Ω) − (Q{2} + Q{4}) = 1 − 1 = 0, and therefore, by non-negativity, Q{1} = Q{3} = 0. Again by finite additivity, Q(A) = 1/2 for each A ∈ J other than ∅ and Ω. Since also P(∅) = 0 = Q(∅) and P(Ω) = 1 = Q(Ω), we have P(A) = Q(A) for all A ∈ J.
(b) We have {1, 2} ∩ {1, 4} = {1} ∈ σ(J), but P{1} = 1/4 ≠ 0 = Q{1}.
(c) We have {1, 2} ∩ {1, 4} = {1} ∉ J, and thus J is not closed under intersection. Therefore J is not a semialgebra, and the hypothesis of Proposition 2.5.8 is not satisfied.
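Since Ω is finite, parts (a) and (b) can be verified directly by enumeration. The following Python sketch (our own check; the helper name is an assumption) does so using exact rational arithmetic.

```python
from fractions import Fraction

P = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(1, 4)}
Q = {1: Fraction(0), 2: Fraction(1, 2), 3: Fraction(0), 4: Fraction(1, 2)}

def measure(p, A):
    """Extend point masses p to an event A by additivity."""
    return sum(p[w] for w in A)

J = [set(), {1, 2, 3, 4}, {1, 2}, {2, 3}, {3, 4}, {1, 4}]

# (a) P and Q agree on all of J ...
assert all(measure(P, A) == measure(Q, A) for A in J)

# (b) ... but disagree on sigma(J): {1} = {1,2} ∩ {1,4}.
assert measure(P, {1}) != measure(Q, {1})
```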

Exercise 2.7.15. Let (Ω, M, λ) be Lebesgue measure on the interval [0, 1]. Let

Ω′ = {(x, y) ∈ R²; 0 < x ≤ 1, 0 < y ≤ 1}.

Let F be the collection of all subsets of Ω′ of the form

{(x, y) ∈ R²; x ∈ A, 0 < y ≤ 1}

for some A ∈ M. Finally, define a probability P on F by

P({(x, y) ∈ R²; x ∈ A, 0 < y ≤ 1}) = λ(A).

(a) Prove that (Ω′, F, P) is a probability triple.
(b) Let P* be the outer measure corresponding to P and F. Define the subset S ⊆ Ω′ by

S = {(x, y) ∈ R²; 0 < x ≤ 1, y = 1/2}.

(Note that S ∉ F.) Prove that P*(S) = 1 and P*(S^C) = 1.

Proof. Write Ω′ = (0, 1]² and F = {A × (0, 1]; A ∈ M}.
(a) Since M is a σ-algebra, clearly F is a σ-algebra. Since λ is a probability measure, and φ : F → M given by φ(A × (0, 1]) = A is clearly a bijection with P(A × (0, 1]) = λ(φ(A × (0, 1])) for all A ∈ M, we see that P is a probability measure. Therefore (Ω′, F, P) is a probability triple.
(b) Let P* be the outer measure corresponding to P and F, and let S = (0, 1] × {1/2}. Let A_1 × (0, 1], A_2 × (0, 1], … ∈ F be such that S ⊆ ⋃_{j=1}^∞ (A_j × (0, 1]). Since every 0 < x ≤ 1 gives a point (x, 1/2) ∈ S, every such x lies in some A_j, and hence ⋃_{j=1}^∞ (A_j × (0, 1]) ⊇ (0, 1]² = Ω′. Therefore, by subadditivity, P(Ω′) ≤ ∑_{j=1}^∞ P(A_j × (0, 1]). Thus, by the definition of outer measure, P*(S) = P(Ω′) = 1. Now, S^C = (0, 1] × ((0, 1] \ {1/2}) also meets every vertical line {x} × (0, 1], so the exact same argument shows that P*(S^C) = P(Ω′) = 1.

Exercise 2.7.16.
(a) Where in the proof of Theorem 2.3.1 was assumption (2.3.3) used?
(b) How would the conclusion of Theorem 2.3.1 be modified if assumption (2.3.3) were dropped (but all other assumptions remained the same)?

Proof.
(a) In Lemma 2.3.5, it is used to show that P*(A) = P(A) for A ∈ J.
(b) P* would no longer necessarily be an extension of P (without (2.3.3) we can only conclude P*(A) ≤ P(A) for A ∈ J). P* would still be countably additive on the σ-algebra M ⊇ σ(J), but its total mass P*(Ω) could be less than 1, so P* need not be a probability measure.

Exercise 2.7.17. Let Ω = {1, 2}, and let J be the collection of all subsets of Ω, with P(∅) = 0, P(Ω) = 1, and P{1} = P{2} = 1/3.
(a) Verify that all assumptions of Theorem 2.3.1 other than (2.3.3) are satisfied.
(b) Verify that assumption (2.3.3) is not satisfied.
(c) Describe precisely the M and P* that would result in this example from the modified version of Theorem 2.3.1 in Exercise 2.7.16(b).

Proof.
(a) J is a semialgebra, since it contains ALL subsets of Ω. P(∅) = 0 and P(Ω) = 1 by assumption. We have A = A ∪ ∅, so that P(A ∪ ∅) = P(A) = P(A) + 0 = P(A) + P(∅). Also, P({1} ∪ {2}) = P{1, 2} = 1 ≥ 2/3 = 1/3 + 1/3 = P{1} + P{2}. Therefore (2.3.2) is satisfied.
(b) We have {1, 2} ⊆ {1} ∪ {2}, but P{1, 2} = 1 > 2/3 = 1/3 + 1/3 = P{1} + P{2}. Therefore (2.3.3) is not satisfied.
(c) Here the outer measure is P*(∅) = 0, P*{1} = P*{2} = 1/3, and P*(Ω) = 2/3 (using the cover Ω = {1} ∪ {2}, which is cheaper than the cover {Ω}). One then checks that every A ⊆ Ω satisfies (2.3.8), so M = {A ⊆ Ω; P*(A ∩ E) + P*(A^C ∩ E) ≤ P*(E) for all E ⊆ Ω} = σ(J) = {∅, {1}, {2}, Ω}, and P* is the countably additive set function above, of total mass 2/3.

Exercise 2.7.18. Let Ω = {1, 2}, J = {∅, Ω, {1}}, P(∅) = 0, P(Ω) = 1, and P({1}) = 1/3.
(a) Can Theorem 2.3.1, Corollary 2.5.1, or Corollary 2.5.4 be applied in this case? Why or why not?
(b) Can this P be extended to a valid probability measure? Explain.

Proof.
(a) They cannot be applied: J is not a semialgebra, since {1}^C = {2} is not a finite disjoint union of elements of J. Therefore the hypotheses are not satisfied.
(b) Yes, we can extend P to P* on σ(J) = 2^Ω by defining P*(A) = P(A) for A ∈ J, and P*{2} = 1 − P*{1} = 1 − 1/3 = 2/3.

Exercise 2.7.19. Let Ω be a finite non-empty set, and let J consist of all singletons in Ω, together with ∅ and Ω. Let p : Ω → [0, 1] with ∑_{ω∈Ω} p(ω) = 1, and define P(∅) = 0, P(Ω) = 1, and P{ω} = p(ω) for all ω ∈ Ω.
(a) Prove that J is a semialgebra.
(b) Prove that (2.3.2) and (2.3.3) are satisfied.
(c) Describe precisely the M and P* that result in applying Theorem 2.3.1.
(d) Are the M and P* the same as those described in Theorem 2.2.1?

Proof. Write J = {A_1, …, A_m} ∪ {∅, Ω}, where A_1, …, A_m are the singletons of Ω.
(a) Since the A_i are singletons, we have A_j ∩ A_i = A_i if i = j and A_j ∩ A_i = ∅ if i ≠ j; intersections involving ∅ or Ω clearly stay in J. Therefore J is closed under finite intersection. Also A_j^C = ⋃_{i≠j} A_i is a finite disjoint union of elements of J (and ∅^C = Ω, Ω^C = ∅). Therefore J is a semialgebra.
(b) For (2.3.2), let (WLOG) A′_1, …, A′_k ∈ J \ {∅, Ω} be disjoint with ⋃_{i=1}^k A′_i ∈ J. If ⋃_{i=1}^k A′_i is a singleton, then k = 1 and (2.3.2) holds trivially. If ⋃_{i=1}^k A′_i = Ω, then the A′_i are distinct singletons covering Ω, so P(Ω) = 1 = ∑_{ω∈Ω} p(ω) = ∑_{i=1}^k P(A′_i). Hence (2.3.2) is satisfied. For (2.3.3), let A, A′_1, A′_2, … ∈ J with A ⊆ ⋃_n A′_n. If some A′_n = Ω, then ∑_n P(A′_n) ≥ 1 ≥ P(A) trivially. Otherwise A is a disjoint union A = ⋃_{j∈S_1} A_j of singletons, and ⋃_n A′_n ⊇ A is a union ⋃_{j∈S_2} A_j of singletons with S_2 ⊇ S_1, so that ∑_n P(A′_n) ≥ ∑_{j∈S_2} P(A_j) = P(A) + ∑_{j∈S_2\S_1} P(A_j) ≥ P(A).
(c), (d) By the argument of Exercise 2.7.9 (the singletons being the minimal non-empty sets), M contains every union of singletons, i.e. every subset of Ω, so M = 2^Ω, with P*(A) = ∑_{ω∈A} p(ω). Thus M is the F from Theorem 2.2.1, and P* is the same probability measure described there (as they are defined in the same way).

Exercise 2.7.20. Let P and Q be two probability measures defined on the same sample space Ω and σ-algebra F.
(a) Suppose that P(A) = Q(A) for all A ∈ F with P(A) ≤ 1/2. Prove that P = Q, i.e. that P(A) = Q(A) for all A ∈ F.
(b) Give an example where P(A) = Q(A) for all A ∈ F with P(A) < 1/2, but such that P ≠ Q, i.e. that P(A) ≠ Q(A) for some A ∈ F.

Proof.
(a) Suppose that P(A) = Q(A) for all A ∈ F with P(A) ≤ 1/2. Let A ∈ F with P(A) > 1/2. Then P(A^C) = 1 − P(A) ≤ 1/2, and thus P(A^C) = Q(A^C). Thus Q(A) = 1 − Q(A^C) = 1 − P(A^C) = P(A). Therefore P(A) = Q(A) for all A ∈ F.
(b) Let Ω = {1, 2} and F = {∅, {1}, {2}, {1, 2}}. Define P by P(∅) = 0, P(Ω) = 1, and P{1} = P{2} = 1/2. Define Q by Q(∅) = 0, Q(Ω) = 1, Q{1} = 1/3, and Q{2} = 2/3. Then the only A ∈ F with P(A) < 1/2 is A = ∅, and P(∅) = Q(∅) = 0; so P(A) = Q(A) for all A ∈ F with P(A) < 1/2, but clearly P ≠ Q.

Exercise 2.7.21. Let λ be Lebesgue measure in dimension two, i.e. Lebesgue measure on [0, 1] × [0, 1]. Let A be the triangle {(x, y) ∈ [0, 1] × [0, 1]; y < x}. Prove that A is measurable with respect to λ, and compute λ(A).

Proof. Let M be the collection of Lebesgue measurable subsets of [0, 1] × [0, 1]. Define the half-open rectangle R(a, b) = [a, b) × [0, a) (where [0, 0) = ∅). For each natural number n, let

A_n = ⋃_{j=1}^{2^n} R((j − 1)/2^n, j/2^n).

We will show that A = ⋃_{j=1}^∞ A_j. Clearly ⋃_{j=1}^∞ A_j ⊆ A, since A_j ⊆ A for each j. Let (x, y) ∈ A. Let m be large enough that there exists a k with y < (k − 1)/2^m ≤ x < k/2^m. Then (x, y) ∈ R((k − 1)/2^m, k/2^m) ⊆ A_m ⊆ ⋃_{j=1}^∞ A_j. Therefore A = ⋃_{j=1}^∞ A_j ∈ M, and A is thus measurable with respect to λ.

Now, let B_1 = A_1, and for each j let B_{j+1} = A_{j+1} \ A_j. Since A_1 ⊆ A_2 ⊆ …, we can rewrite A as the disjoint union ⋃_{j=1}^∞ B_j. Therefore

λ(A) = ∑_{j=1}^∞ λ(B_j) = λ(A_1) + ∑_{j=1}^∞ (λ(A_{j+1}) − λ(A_j)) = lim_{n→∞} λ(A_n).

Since

λ(A_n) = ∑_{j=1}^{2^n} λ(R((j − 1)/2^n, j/2^n)) = ∑_{j=1}^{2^n} (1/2^n)·((j − 1)/2^n) = (1/4^n)·(2^n(2^n − 1)/2) = 1/2 − 1/2^{n+1},

we obtain λ(A) = lim_{n→∞} (1/2 − 1/2^{n+1}) = 1/2.
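As a sanity check on the value λ(A) = 1/2, here is a small Monte Carlo sketch (our own illustration, not part of the solution): the fraction of uniform points in the unit square landing in the triangle estimates its Lebesgue measure.

```python
import random

def triangle_area_mc(trials=200_000, seed=0):
    """Estimate lambda(A) for A = {(x, y) in [0,1]^2 : y < x} by sampling
    uniform points and counting how many land below the diagonal."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        hits += y < x
    return hits / trials

print(triangle_area_mc())  # approximately 0.5
```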

Exercise 2.7.22. Let (Ω_1, F_1, P_1) be Lebesgue measure on [0, 1]. Consider a second probability triple, (Ω_2, F_2, P_2), defined as follows: Ω_2 = {1, 2}, F_2 consists of all subsets of Ω_2, and P_2 is defined by P_2{1} = 1/3, P_2{2} = 2/3, and additivity. Let (Ω, F, P) be the product measure of (Ω_1, F_1, P_1) and (Ω_2, F_2, P_2).
(a) Express each of Ω, F, and P as explicitly as possible.
(b) Find a set A ∈ F such that P(A) = 3/4.

Proof.
(a) We have Ω = [0, 1] × {1, 2}. Each F ∈ F can be written as F = (A_1 × {1}) ∪ (A_2 × {2}) with A_1, A_2 ∈ F_1, and then P(F) = (1/3)P_1(A_1) + (2/3)P_1(A_2). In particular, for a rectangle F = A × B with A ∈ F_1 and B ∈ F_2 = {∅, {1}, {2}, {1, 2}}: if B = ∅, then P(F) = 0; if B = {1}, then P(F) = (1/3)P_1(A); if B = {2}, then P(F) = (2/3)P_1(A); and if B = {1, 2}, then P(F) = P_1(A).
(b) Let A = [0, 3/4] × Ω_2. Then P(A) = P_1([0, 3/4]) P_2(Ω_2) = (3/4)(1) = 3/4.

3.1 Exercises
Exercise 3.1.4. For Example 3.1.3, compute P(Z ≥ a) and P(X < a and Y < b) as functions of a, b ∈ R.

Proof. Let (Ω, F, P) be Lebesgue measure on [0, 1], with X, Y, and Z defined by X(ω) = ω, Y(ω) = ω², and Z(ω) = 3 + 4ω. Then P(Z ≥ a) = P(ω; 3 + 4ω ≥ a) = P(ω; ω ≥ (a − 3)/4) = P([(a − 3)/4, 1]) = 1 − (a − 3)/4 = (7 − a)/4, for 3 ≤ a ≤ 7 (the probability is 1 for a < 3 and 0 for a > 7). Next, let c = min{a, √b} (assuming b > 0; for b ≤ 0 the event {Y < b} is empty). Then P(X < a and Y < b) = P(ω; ω < a and ω² < b) = P(ω; ω < a and ω < √b) = P(ω; ω < c) = P([0, c)) = c, for 0 ≤ c ≤ 1 (the probability is 0 for c ≤ 0 and 1 for c > 1).
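These formulas are easy to spot-check by simulation; the following Python sketch (our own, with illustrative parameter values) compares Monte Carlo estimates against them.

```python
import random

def check_3_1_4(a, b, trials=200_000, seed=1):
    """Compare Monte Carlo estimates with the formulas above, where
    X(w) = w, Y(w) = w**2, Z(w) = 3 + 4*w and w is uniform on [0, 1]."""
    rng = random.Random(seed)
    ws = [rng.random() for _ in range(trials)]
    est_z = sum(3 + 4 * w >= a for w in ws) / trials
    est_xy = sum(w < a and w * w < b for w in ws) / trials
    formula_z = min(max((7 - a) / 4, 0.0), 1.0)
    c = min(a, b ** 0.5)
    formula_xy = min(max(c, 0.0), 1.0)
    return (est_z, formula_z), (est_xy, formula_xy)

print(check_3_1_4(a=5.0, b=0.25))  # each pair of numbers should nearly agree
```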
Exercise 3.1.7. Prove (3.1.6). [Hint: remember the definition of X(ω) = lim_{n→∞} X_n(ω), cf. Subsection A.3.]

Proof. Let Z_1, Z_2, … be random variables such that lim_{n→∞} Z_n(ω) exists for each ω, and let Z(ω) = lim_{n→∞} Z_n(ω). Let

A = ⋂_{m=1}^∞ ⋃_{n=1}^∞ ⋂_{k=n}^∞ {Z_k ≤ x + 1/m}.

Let ω ∈ {Z ≤ x}. Since Z(ω) ≤ x, for each natural number m there exists n(m) such that for all k ≥ n(m) we have Z_k(ω) ≤ x + 1/m. That is, for each m, ω ∈ ⋂_{k=n(m)}^∞ {Z_k ≤ x + 1/m}, and therefore ω ∈ A, so that {Z ≤ x} ⊆ A. Now suppose ω ∈ A. Then for each natural number m there exists n(m) such that for all k ≥ n(m) we have Z_k(ω) ≤ x + 1/m. Thus for each m, Z(ω) ≤ x + 1/m. Since this is true for all m, we must have Z(ω) ≤ x. Therefore ω ∈ {Z ≤ x}, so that A ⊆ {Z ≤ x}. This implies that {Z ≤ x} = A.

3.2 Exercises
Exercise 3.2.2. Suppose (3.2.1) is satisfied.
(a) Show that (3.2.1) is still satisfied if A_{α_1} is replaced by A_{α_1}^C.
(b) Show that (3.2.1) is still satisfied if each A_{α_i} is replaced by the corresponding A_{α_i}^C.
(c) Prove that if {A_α}_{α∈I} is independent, then so is {A_α^C}_{α∈I}.

Proof. Let P be a probability measure, and let {A_α}_{α∈I} be a possibly-infinite independent collection of events. That is, for each j ∈ N and each distinct finite choice α_1, α_2, …, α_j ∈ I, we have

P(A_{α_1} ∩ A_{α_2} ∩ … ∩ A_{α_j}) = P(A_{α_1}) P(A_{α_2}) … P(A_{α_j}).

(a) Let j ∈ N and let α_1, α_2, …, α_j ∈ I be distinct. Since

P(A_{α_1} ∩ A_{α_2} ∩ … ∩ A_{α_j}) + P(A_{α_1}^C ∩ A_{α_2} ∩ … ∩ A_{α_j}) = P(A_{α_2} ∩ … ∩ A_{α_j}),

we have

P(A_{α_1}^C ∩ A_{α_2} ∩ … ∩ A_{α_j}) = P(A_{α_2} ∩ … ∩ A_{α_j}) − P(A_{α_1} ∩ A_{α_2} ∩ … ∩ A_{α_j})
= P(A_{α_2}) … P(A_{α_j}) − P(A_{α_1}) P(A_{α_2}) … P(A_{α_j})
= (1 − P(A_{α_1})) P(A_{α_2}) … P(A_{α_j})
= P(A_{α_1}^C) P(A_{α_2}) … P(A_{α_j}).

(b) Let j ∈ N and let α_1, α_2, …, α_j ∈ I be distinct. Suppose, as the inductive hypothesis, that for every such choice of indices

P(A_{α_1}^C ∩ … ∩ A_{α_k}^C ∩ A_{α_{k+1}} ∩ … ∩ A_{α_j}) = P(A_{α_1}^C) … P(A_{α_k}^C) P(A_{α_{k+1}}) … P(A_{α_j})

(part (a) is the case k = 1). Then, exactly as in part (a), subtracting this from the corresponding identity without A_{α_{k+1}} gives

P(A_{α_1}^C ∩ … ∩ A_{α_{k+1}}^C ∩ A_{α_{k+2}} ∩ … ∩ A_{α_j}) = (1 − P(A_{α_{k+1}})) P(A_{α_1}^C) … P(A_{α_k}^C) P(A_{α_{k+2}}) … P(A_{α_j}) = P(A_{α_1}^C) … P(A_{α_{k+1}}^C) P(A_{α_{k+2}}) … P(A_{α_j}).

Therefore, by induction, the claim holds for all k ≤ j.

(c) By part (b), (3.2.1) holds for {A_α^C}_{α∈I}, and therefore it is an independent collection.

3.6 Exercises
Exercise 3.6.1. Let X be a real-valued random variable defined on a probability triple (Ω, F, P). Fill in the following blanks:
(a) F is a collection of subsets of ______.
(b) P(A) is a well-defined element of ______ provided that A is an element of ______.
(c) {X ≤ 5} is shorthand notation for the particular set of ______ which is defined by: ______.
(d) If S is a subset of ______, then {X ∈ S} is a subset of ______.
(e) If S is a ______ subset of ______, then {X ∈ S} must be an element of ______.

Proof.
(a) F is a collection of subsets of Ω.
(b) P(A) is a well-defined element of [0, 1] provided that A is an element of F.
(c) {X ≤ 5} is shorthand notation for the particular set of outcomes which is defined by: {ω ∈ Ω; X(ω) ≤ 5}.
(d) If S is a subset of T, then {X ∈ S} is a subset of {X ∈ T}.
(e) If S is a closed subset of R (or, in general, a Borel set), then {X ∈ S} must be an element of F.

Exercise 3.6.2. Let (Ω, F, P) be Lebesgue measure on [0, 1]. Let A = (1/2, 3/4) and B = (0, 2/3). Are A and B independent events?

Proof. Yes. We have A ∩ B = (1/2, 2/3), so P(A ∩ B) = 1/6 = (1/4)(2/3) = P(A)P(B).
Exercise 3.6.3. Give an example of events A, B, and C, each of probability strictly between 0 and 1, such that
(a) P(A ∩ B) = P(A)P(B), P(A ∩ C) = P(A)P(C), and P(B ∩ C) = P(B)P(C); but it is not the case that P(A ∩ B ∩ C) = P(A)P(B)P(C). [Hint: You can let Ω be a set of four equally likely points.]
(b) P(A ∩ B) = P(A)P(B), P(A ∩ C) = P(A)P(C), and P(A ∩ B ∩ C) = P(A)P(B)P(C); but it is not the case that P(B ∩ C) = P(B)P(C). [Hint: You can let Ω be a set of eight equally likely points.]

Proof.
(a) Let Ω = {1, 2, 3, 4}, F = 2^Ω, and let P be defined by P{n} = 1/4 for n ∈ Ω, extended to F by additivity. Let A = {1, 2}, B = {2, 3}, and C = {1, 3}. Then A ∩ B = {2}, B ∩ C = {3}, and A ∩ C = {1}; thus P(A ∩ B) = 1/4 = (1/2)(1/2) = P(A)P(B), P(B ∩ C) = 1/4 = (1/2)(1/2) = P(B)P(C), and P(A ∩ C) = 1/4 = (1/2)(1/2) = P(A)P(C). However, A ∩ B ∩ C = ∅, and therefore P(A ∩ B ∩ C) = 0 ≠ 1/8 = (1/2)(1/2)(1/2) = P(A)P(B)P(C).
(b) Let Ω = {1, 2, 3, 4, 5, 6, 7, 8}, F = 2^Ω, and let P be defined by P{n} = 1/8 for n ∈ Ω, extended to F by additivity. Let A = {1, 2, 3, 4}, B = {3, 4, 5, 6}, and C = {1, 3, 7, 8}. Then A ∩ B = {3, 4}, A ∩ C = {1, 3}, and A ∩ B ∩ C = {3}; thus P(A ∩ B) = 1/4 = (1/2)(1/2) = P(A)P(B), P(A ∩ C) = 1/4 = (1/2)(1/2) = P(A)P(C), and P(A ∩ B ∩ C) = 1/8 = (1/2)(1/2)(1/2) = P(A)P(B)P(C). However, B ∩ C = {3}, and therefore P(B ∩ C) = 1/8 ≠ 1/4 = P(B)P(C).
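Both examples can be verified mechanically; here is a Python sketch (our own check) for part (a), using exact rationals. Part (b) checks the same way with the eight-point space.

```python
from fractions import Fraction

def prob(A, omega):
    """Uniform probability of an event A within a finite sample space omega."""
    return Fraction(len(A & omega), len(omega))

omega = frozenset(range(1, 5))
A, B, C = {1, 2}, {2, 3}, {1, 3}

# Pairwise independence holds ...
assert prob(A & B, omega) == prob(A, omega) * prob(B, omega)
assert prob(A & C, omega) == prob(A, omega) * prob(C, omega)
assert prob(B & C, omega) == prob(B, omega) * prob(C, omega)
# ... but joint independence fails, since A ∩ B ∩ C is empty.
assert prob(A & B & C, omega) != prob(A, omega) * prob(B, omega) * prob(C, omega)
```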

Exercise 3.6.4. Suppose {A_n} ↗ A. Let f : Ω → R be any function. Prove that lim_{n→∞} inf_{ω∈A_n} f(ω) = inf_{ω∈A} f(ω).

Proof. Let I_n = inf_{ω∈A_n} f(ω) and I = inf_{ω∈A} f(ω). Since {A_n} ↗ A, we have I_1 ≥ I_2 ≥ … and I_n ≥ I for all n. Let ε > 0. There exists ω_0 ∈ A such that f(ω_0) − I < ε/2. Since {A_n} ↗ A, there exists N = N(ω_0) > 0 such that for all n ≥ N we have ω_0 ∈ A_n. Then, for all n ≥ N, we have f(ω_0) ≥ I_n ≥ I, and thus f(ω_0) − I_n < ε/2. Therefore, for all n ≥ N, we have |I_n − I| = |I_n − f(ω_0) + f(ω_0) − I| ≤ |f(ω_0) − I_n| + |f(ω_0) − I| < ε/2 + ε/2 = ε. That is, lim_{n→∞} inf_{ω∈A_n} f(ω) = inf_{ω∈A} f(ω).
Exercise 3.6.5. Let (Ω, F, P) be a probability triple such that Ω is countable, and F = 2^Ω. Prove that it is impossible for there to exist a sequence A_1, A_2, … ∈ F which is independent, such that P(A_i) = 1/2 for each i. [Hint: First prove that for each ω ∈ Ω, and each n ∈ N, we have P({ω}) ≤ 1/2^n. Then derive a contradiction.]

Proof. Suppose, for contradiction, that there exists an independent sequence A_1, A_2, … ∈ F such that P(A_i) = 1/2 for each i. Let ω ∈ Ω. Let B_n = A_n if ω ∈ A_n, and B_n = A_n^C if ω ∉ A_n. Since A_1, A_2, … are independent, then by Exercise 3.2.2, B_1, B_2, … are independent. Since P(A_n) = P(A_n^C) = 1/2, we have P(B_n) = 1/2 for each n. Then for each n, we have ω ∈ B_1 ∩ … ∩ B_n, and thus P({ω}) ≤ P(B_1 ∩ … ∩ B_n) = P(B_1) … P(B_n) = 1/2^n. Therefore, for each ω ∈ Ω, we have P({ω}) = 0, which implies (Ω being countable) that P(A_1) = ∑_{ω∈A_1} P({ω}) = 0. This contradicts the fact that P(A_1) = 1/2, and thus there does not exist an independent sequence A_1, A_2, … ∈ F with P(A_i) = 1/2 for each i.
Exercise 3.6.6. Let X, Y, and Z be three independent random variables, and set W = X + Y. Let B_{k,n} = {(n − 1)2^{−k} ≤ X < n2^{−k}} and let C_{k,m} = {(m − 1)2^{−k} ≤ Y < m2^{−k}}. Let

A_k = ⋃ {B_{k,n} ∩ C_{k,m}; n, m ∈ Z, (n + m)2^{−k} < x}.

Fix x, z ∈ R, and let A = {X + Y < x} = {W < x} and D = {Z < z}.
(a) Prove that {A_k} ↗ A.
(b) Prove that A_k and D are independent.
(c) By continuity of probabilities, prove that A and D are independent.
(d) Use this to prove that W and Z are independent.

Proof.
(a) First, {A_k} is increasing: each block B_{k,n} ∩ C_{k,m} with (n + m)2^{−k} < x is the disjoint union of the four blocks B_{k+1,n′} ∩ C_{k+1,m′} with n′ ∈ {2n − 1, 2n} and m′ ∈ {2m − 1, 2m}, and each of these satisfies (n′ + m′)2^{−(k+1)} ≤ (2n + 2m)2^{−(k+1)} = (n + m)2^{−k} < x. Hence A_k ⊆ A_{k+1}.

Next, let ω ∈ A_k. Then ω ∈ B_{k,n} ∩ C_{k,m} for some n, m ∈ Z with (n + m)2^{−k} < x. This implies that X(ω) < n2^{−k} and Y(ω) < m2^{−k}, so that X(ω) + Y(ω) < n2^{−k} + m2^{−k} = (n + m)2^{−k} < x. Thus ω ∈ A. Therefore, for each k, A_k ⊆ A.

It remains to show that for each ω ∈ A there exists k_0 such that ω ∈ A_{k_0}. Let ω ∈ A. For each k, there exist unique n_k, m_k such that (n_k − 1)2^{−k} ≤ X(ω) < n_k 2^{−k} and (m_k − 1)2^{−k} ≤ Y(ω) < m_k 2^{−k}. Then for each k we have X(ω) + Y(ω) < (n_k + m_k)2^{−k}, and in the limit lim_{k→∞} (n_k + m_k)2^{−k} = X(ω) + Y(ω) < x. Hence we can find k_0 such that X(ω) + Y(ω) < (n_{k_0} + m_{k_0})2^{−k_0} < x, so that ω ∈ A_{k_0}. Therefore {A_k} ↗ A.

(b) Since X, Y, and Z are independent, for each k, n, m we have

P(B_{k,n} ∩ C_{k,m} ∩ D) = P(X ∈ [(n−1)2^{−k}, n2^{−k}), Y ∈ [(m−1)2^{−k}, m2^{−k}), Z ∈ (−∞, z))
= P(X ∈ [(n−1)2^{−k}, n2^{−k}), Y ∈ [(m−1)2^{−k}, m2^{−k})) P(Z ∈ (−∞, z)) = P(B_{k,n} ∩ C_{k,m}) P(D).

Thus, since A_k is a disjoint union over n, m, we have

P(A_k ∩ D) = ∑_{(n+m)2^{−k} < x} P(B_{k,n} ∩ C_{k,m} ∩ D) = (∑_{(n+m)2^{−k} < x} P(B_{k,n} ∩ C_{k,m})) P(D) = P(A_k) P(D).

Therefore A_k and D are independent.

(c) Since {A_k} ↗ A, we have {A_k ∩ D} ↗ A ∩ D. By continuity of probabilities (Proposition 3.3.1), we have P(A ∩ D) = lim_k P(A_k ∩ D) = lim_k (P(A_k) P(D)) = (lim_k P(A_k)) P(D) = P(A) P(D). Thus A and D are independent.

(d) For each x and z, we have P(W < x, Z < z) = P(A ∩ D) = P(A)P(D) = P(W < x)P(Z < z). Therefore, by Proposition 3.2.4, W and Z are independent.
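The conclusion can also be spot-checked numerically: the joint probability P(W < x, Z < z) should match the product of the marginals. Here is a short Python sketch (our own, using uniform variables as an illustrative assumption).

```python
import random

def joint_vs_product(x, z, trials=200_000, seed=4):
    """Estimate P(W < x, Z < z) and P(W < x) * P(Z < z) for W = X + Y,
    with X, Y, Z independent Uniform[0,1]; the two should nearly agree."""
    rng = random.Random(seed)
    both = w_hits = z_hits = 0
    for _ in range(trials):
        w = rng.random() + rng.random()   # W = X + Y
        zz = rng.random()                 # Z
        w_hits += w < x
        z_hits += zz < z
        both += (w < x) and (zz < z)
    return both / trials, (w_hits / trials) * (z_hits / trials)

print(joint_vs_product(x=1.0, z=0.5))  # both values close to 0.25
```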

Exercise 3.6.7. Let (Ω, F, P) be the uniform distribution on Ω = {1, 2, 3}, as in Example 2.2.2. Give an example of a sequence A_1, A_2, … ∈ F such that

P(lim inf_n A_n) < lim inf_n P(A_n) < lim sup_n P(A_n) < P(lim sup_n A_n),

i.e. such that all three inequalities are strict.

Proof. Let A_{4n+1} = {1}, A_{4n+2} = {2}, A_{4n+3} = {1, 2}, and A_{4n+4} = {2, 3} for n = 0, 1, 2, …. Then we have P(A_{4n+1}) = 1/3, P(A_{4n+2}) = 1/3, P(A_{4n+3}) = 2/3, and P(A_{4n+4}) = 2/3. Also, lim sup_n A_n = {1, 2, 3} and lim inf_n A_n = ∅. Therefore, we have P(lim inf_n A_n) = P(∅) = 0 < lim inf_n P(A_n) = 1/3 < lim sup_n P(A_n) = 2/3 < P(lim sup_n A_n) = P(Ω) = 1.
Exercise 3.6.8. Let λ be Lebesgue measure on [0, 1], and let 0 ≤ a ≤ b ≤ c ≤ d ≤ 1 be arbitrary real numbers. Give an example of a sequence A_1, A_2, … of subsets of [0, 1] such that λ(lim inf_n A_n) = a, lim inf_n λ(A_n) = b, lim sup_n λ(A_n) = c, and λ(lim sup_n A_n) = d. [Hint: begin with the case d = b + c − a, which is easiest, and then carefully branch out from there.]

Proof. We give an example for each of the three cases: d = b + c − a, d > b + c − a, and d < b + c − a.

First, suppose d = b + c − a. Let A_{2n+1} = [0, c] and A_{2n+2} = [c − a, b + c − a] for n = 0, 1, 2, …. Then λ(A_{2n+1}) = c and λ(A_{2n+2}) = b. Also, A_{2n+1} ∪ A_{2n+2} = [0, b + c − a] = [0, d] and A_{2n+1} ∩ A_{2n+2} = [c − a, c], so that lim sup_n A_n = [0, d] and lim inf_n A_n = [c − a, c]. Therefore λ(lim inf_n A_n) = a, lim inf_n λ(A_n) = b, lim sup_n λ(A_n) = c, and λ(lim sup_n A_n) = d.

Next, suppose d > b + c − a. Let β = b + c − a, and let g_{k,n} = k(d − β)/n for each n = 1, 2, … and 0 ≤ k ≤ n. Define B_{k,n} = [c − a, β] ∪ [β + g_{k,n}, β + g_{k+1,n}] for each n = 1, 2, … and 0 ≤ k ≤ n − 1, and let B_{−2,n} = [0, c] and B_{−1,n} = [c − a, β]. Now let the sequence A_1, A_2, … run through the blocks B_{−2,n}, B_{−1,n}, B_{0,n}, B_{1,n}, …, B_{n−1,n} for n = 1, then for n = 2, and so on:

A_1 = B_{−2,1}, A_2 = B_{−1,1}, A_3 = B_{0,1}, A_4 = B_{−2,2}, A_5 = B_{−1,2}, A_6 = B_{0,2}, A_7 = B_{1,2}, A_8 = B_{−2,3}, ….

Clearly lim inf_n A_n = [c − a, c], so that λ(lim inf_n A_n) = a. For each fixed k ≥ 0 we have λ(B_{k,n}) = b + (d − β)/n → b as n → ∞, while λ(B_{−1,n}) = b and λ(B_{−2,n}) = c; this implies lim inf_n λ(A_n) = b and lim sup_n λ(A_n) = c. Finally, for each n, ⋃_{k=0}^{n−1} [β + g_{k,n}, β + g_{k+1,n}] = [β, d], so every point of [0, d] lies in infinitely many A_n. Thus lim sup_n A_n = [0, d], so that λ(lim sup_n A_n) = d.

Now suppose d < b + c − a. Let ε = b + c − a − d > 0. Let A_{3n+1} = [0, c], A_{3n+2} = [0, ε] ∪ [d − b + ε, d], and A_{3n+3} = [d − b, d] for n = 0, 1, 2, …. Since A_{3n+1} ∩ A_{3n+2} ∩ A_{3n+3} = [0, c] ∩ [d − b + ε, d] = [c − a, c], we have lim inf_n A_n = [c − a, c], so that λ(lim inf_n A_n) = a. Since λ([0, ε] ∪ [d − b + ε, d]) = λ([0, ε]) + λ([d − b + ε, d]) = ε + b − ε = b and λ([0, c]) = c, we have lim inf_n λ(A_n) = b and lim sup_n λ(A_n) = c. Finally, since A_{3n+1} ∪ A_{3n+2} ∪ A_{3n+3} = [0, d], we have lim sup_n A_n = [0, d], so that λ(lim sup_n A_n) = d.
Exercise 3.6.9. Let A_1, A_2, …, B_1, B_2, … be events.
(a) Prove that

(lim sup_n A_n) ∩ (lim sup_n B_n) ⊇ lim sup_n (A_n ∩ B_n).

(b) Give an example where the above inclusion is strict, and another example where it holds with equality.

Proof.
(a) Let ω ∈ lim sup_n (A_n ∩ B_n). Then for every k_0, ω ∈ ⋃_{k=k_0}^∞ (A_k ∩ B_k), and hence ω ∈ ⋃_{k=k_0}^∞ A_k and ω ∈ ⋃_{k=k_0}^∞ B_k. Thus ω ∈ (lim sup_n A_n) ∩ (lim sup_n B_n). Therefore (lim sup_n A_n) ∩ (lim sup_n B_n) ⊇ lim sup_n (A_n ∩ B_n).
(b) Consider the sequences A_1 = {1}, A_2 = ∅, A_3 = {1}, A_4 = ∅, … and B_1 = ∅, B_2 = {1}, B_3 = ∅, B_4 = {1}, …. Then A_n ∩ B_n = ∅ for all n, so that lim sup_n (A_n ∩ B_n) = ∅, but (lim sup_n A_n) ∩ (lim sup_n B_n) = {1}. Therefore (lim sup_n A_n) ∩ (lim sup_n B_n) ⊋ lim sup_n (A_n ∩ B_n), and the inclusion is strict. Now consider the sequences A_n = B_n = {1} for all n. Then clearly {1} = (lim sup_n A_n) ∩ (lim sup_n B_n) = lim sup_n (A_n ∩ B_n), and the inclusion holds with equality.

Exercise 3.6.10. Let A_1, A_2, … be a sequence of events, and let N ∈ N. Suppose there are events B and C such that B ⊆ A_n ⊆ C for all n ≥ N, and such that P(B) = P(C). Prove that P(lim inf_n A_n) = P(lim sup_n A_n) = P(B) = P(C).

Proof. Since B ⊆ A_n ⊆ C for all n ≥ N, we have B ⊆ lim inf_n A_n ⊆ lim sup_n A_n ⊆ C. By monotonicity, P(B) ≤ P(lim inf_n A_n) ≤ P(lim sup_n A_n) ≤ P(C). Therefore, since P(B) = P(C), we have P(lim inf_n A_n) = P(lim sup_n A_n) = P(B) = P(C).
Exercise 3.6.11. Let {X_n}_{n=1}^∞ be independent random variables, with P(X_n = i) = 1/n for i = 1, 2, . . . , n (cf. Example 2.2.2). Compute P(X_n = 5 i.o.), the probability that an infinite number of the X_n are equal to 5.

Proof. Let {X_n}_{n=1}^∞ be independent random variables, with P(X_n = i) = 1/n for i = 1, 2, . . . , n. Let A_n be the event that X_n = 5. Then Σ_{n=5}^∞ P(A_n) = Σ_{n=5}^∞ 1/n = ∞, which implies that Σ_{n=1}^∞ P(A_n) = ∞. Since {X_n}_{n=1}^∞ are independent, we have that {A_n}_{n=1}^∞ are independent, and thus by the Borel-Cantelli lemma, P(X_n = 5 i.o.) = 1.
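
A simulation cannot verify an "infinitely often" event directly, but it can illustrate it: the number of indices n ≤ N with X_n = 5 keeps growing (its expectation is roughly log N). A minimal Python sketch, with sample sizes chosen arbitrarily:

import random

# Simulate X_n uniform on {1, ..., n} and count how often X_n = 5.
random.seed(0)
for N in (10**2, 10**4, 10**6):
    hits = sum(1 for n in range(1, N + 1) if random.randint(1, n) == 5)
    print(N, hits)   # the count grows without bound as N grows
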
Exercise 3.6.12. Let X be a random variable with P(X > 0) > 0. Prove that there is δ > 0 such that P(X ≥ δ) > 0. [Hint: Don't forget continuity of probabilities.]

Proof. Let X be a random variable with P(X > 0) > 0. Consider the increasing sequence of events A_n = {X ≥ 1/n}, with A := ∪_n A_n = {X > 0}. Suppose for contradiction that for all n we have that P(A_n) = 0. By continuity of probabilities we have that lim_n P(A_n) = P(A). This implies that for all ε > 0, there exists an N such that for all n ≥ N, we have that |P(A) − P(A_n)| = |P(A)| < ε. Since this is true for all ε > 0, then P(A) = 0, which is a contradiction. Thus, there exists some n_0 such that P(A_{n_0}) > 0. Therefore, given δ = 1/n_0, we have P({X ≥ δ}) > 0.
Exercise 3.6.13. Let X_1, X_2, . . . be defined jointly on some probability space (Ω, F, P), with Σ_{i=1}^∞ i² P(i ≤ X_n < i + 1) ≤ C < ∞ for all n. Prove that P[X_n ≥ n i.o.] = 0.

Proof. Let X_1, X_2, . . . be defined jointly on some probability space (Ω, F, P), with Σ_{i=1}^∞ i² P(i ≤ X_n < i + 1) ≤ C < ∞ for all n. Then P[X_n ≥ n] = Σ_{i=n}^∞ P(i ≤ X_n < i + 1), so that n² P[X_n ≥ n] = Σ_{i=n}^∞ n² P(i ≤ X_n < i + 1) ≤ Σ_{i=1}^∞ i² P(i ≤ X_n < i + 1) ≤ C. This implies that P[X_n ≥ n] ≤ C/n². Then Σ_n P[X_n ≥ n] ≤ Σ_n C/n² < ∞. Then by the Borel-Cantelli lemma, we have that P[X_n ≥ n i.o.] = 0.
Exercise 3.6.14. Let δ > 0, 1 ≥ ε > 0, and let X_1, X_2, . . . be a sequence of independent non-negative random variables such that P(X_i ≥ δ) ≥ ε for all i. Prove that with probability one, Σ_{i=1}^∞ X_i = ∞.

Proof. Let δ > 0, 1 ≥ ε > 0, and let X_1, X_2, . . . be a sequence of independent non-negative random variables such that P(X_i ≥ δ) ≥ ε for all i. Since Σ_{i=1}^∞ P(X_i ≥ δ) ≥ Σ_{i=1}^∞ ε = ∞, we have that Σ_{i=1}^∞ P(X_i ≥ δ) = ∞, and thus by the Borel-Cantelli lemma, P[X_i ≥ δ i.o.] = 1. Since {X_i ≥ δ i.o.} ⊆ {Σ_i X_i = ∞}, we have that P[X_i ≥ δ i.o.] = 1 ≤ P[Σ_i X_i = ∞]. Therefore, we must have P[Σ_i X_i = ∞] = 1.
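
For illustration only (not part of the proof), here is a Python sketch with one concrete distribution satisfying the hypothesis: X_i equals δ with probability ε and 0 otherwise, so P(X_i ≥ δ) = ε. The partial sums along a sample path grow without bound, roughly like iεδ.

import random

delta, eps = 0.5, 0.1
random.seed(1)
partial_sum = 0.0
for i in range(1, 10**6 + 1):
    partial_sum += delta if random.random() < eps else 0.0
    if i in (10**2, 10**4, 10**6):
        print(i, partial_sum)   # ~ i * eps * delta, diverging
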
Exercise 3.6.15. Let A_1, A_2, . . . be a sequence of events, such that (i) A_{i_1}, A_{i_2}, . . . , A_{i_k} are independent whenever i_{j+1} ≥ i_j + 2 for 1 ≤ j ≤ k − 1, and (ii) Σ_n P(A_n) = ∞. Then the Borel-Cantelli Lemma does not directly apply. Still, prove that P(lim sup_n A_n) = 1.

Proof. Let A_1, A_2, . . . be a sequence of events, such that (i) A_{i_1}, A_{i_2}, . . . , A_{i_k} are independent whenever i_{j+1} ≥ i_j + 2 for 1 ≤ j ≤ k − 1, and (ii) Σ_n P(A_n) = ∞. Then either Σ_n P(A_{2n−1}) = ∞ or Σ_n P(A_{2n}) = ∞ (if both were finite, then their sum would be finite, which contradicts (ii)). By (i) and simple induction, we have that A_1, A_3, . . . are independent and A_2, A_4, . . . are independent. Thus by the Borel-Cantelli Lemma, either P(lim sup_n A_{2n−1}) = 1 or P(lim sup_n A_{2n}) = 1, and since both lim sup_n A_{2n−1} and lim sup_n A_{2n} are contained in lim sup_n A_n, monotonicity gives P(lim sup_n A_n) = 1.
Exercise 3.6.16. Consider infinite, independent, fair coin tossing as in Subsection
2.6, and let Hn be the event that the nth coin is heads. Determine the following
probabilities.
(a) P(H_{n+1} ∩ H_{n+2} ∩ · · · ∩ H_{n+9} i.o.).
(b) P(H_{n+1} ∩ H_{n+2} ∩ · · · ∩ H_{2n} i.o.).
(c) P(H_{n+1} ∩ H_{n+2} ∩ · · · ∩ H_{n+[2 log_2 n]} i.o.).
(d) Prove that P(H_{n+1} ∩ H_{n+2} ∩ · · · ∩ H_{n+[log_2 n]} i.o.) must equal either 0 or 1.
(e) Determine P(H_{n+1} ∩ H_{n+2} ∩ · · · ∩ H_{n+[log_2 n]} i.o.). [Hint: Find the right subsequence of indices.]
Proof. Consider infinite, independent, fair coin tossing as in Subsection 2.6, and
let Hn be the event that the nth coin is heads.
(a) Let B_n = H_{n+1} ∩ H_{n+2} ∩ · · · ∩ H_{n+9}, and consider the subsequence of independent events B_{9n}. Then Σ_n P(B_{9n}) = Σ_n 1/2⁹ = ∞, and thus by the Borel-Cantelli lemma, P(B_{9n} i.o.) = 1. Therefore, by monotonicity, P(H_{n+1} ∩ H_{n+2} ∩ · · · ∩ H_{n+9} i.o.) = 1.

(b) Let B_n = H_{n+1} ∩ H_{n+2} ∩ · · · ∩ H_{2n}. Then Σ_n P(B_n) = Σ_n 1/2^n < ∞, and therefore by the Borel-Cantelli lemma, P(B_n i.o.) = 0.

(c) Let B_n = H_{n+1} ∩ H_{n+2} ∩ · · · ∩ H_{n+[2 log_2 n]}. Then Σ_{n=2}^∞ P(B_n) = Σ_{n=2}^∞ 1/2^{[2 log_2 n]} ≤ Σ_{n=1}^∞ 2/2^{2 log_2 n} = Σ_{n=1}^∞ 2/n² < ∞. Therefore Σ_{n=1}^∞ P(B_n) < ∞, and by the Borel-Cantelli lemma, P(B_n i.o.) = 0.

(d),(e) Let B_n = H_{n+1} ∩ H_{n+2} ∩ · · · ∩ H_{n+[log_2 n]}. Since (n+1)² ≥ n log_2 n² for all n, we have that [log_2 (n+1)²] ≥ [log_2 (n log_2 n²)] for all n. Then (n+1)[log_2 (n+1)²] + 1 ≥ n[log_2 n²] + [log_2 (n[log_2 n²])], so the blocks of coins defining the events B_{n[log_2 n²]} do not overlap, and {B_{n[log_2 n²]}}_{n=2}^∞ is an independent sequence of events with Σ_{n=2}^∞ P(B_{n[log_2 n²]}) ≥ Σ_{n=2}^∞ 1/(n[log_2 n²]) = ∞. Therefore, by the Borel-Cantelli lemma, P(B_n i.o.) = 1.
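
The contrast between (a), (c), and (e) shows up clearly in simulation. A minimal Python sketch (not part of the solution; constants and names are ours) that counts, over one long fair-coin sequence, how many starting points n are followed by a run of heads of the given length:

import math
import random

random.seed(2)
N = 10**6
coins = [random.randint(0, 1) for _ in range(N + 50)]

def count_runs(length_of):
    # number of n < N such that coins n+1, ..., n+length_of(n) are all heads
    count = 0
    for n in range(1, N):
        L = length_of(n)
        if all(coins[n + j] == 1 for j in range(L)):
            count += 1
    return count

print(count_runs(lambda n: 9))                          # (a): many occurrences
print(count_runs(lambda n: int(2 * math.log2(n + 1))))  # (c): only finitely many
print(count_runs(lambda n: int(math.log2(n + 1))))      # (e): keeps occurring
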

Exercise 3.6.17. Show that Lemma 3.5.2 is false if we require only that P(B ∩ B_n) = P(B)P(B_n) for each n ∈ N, but do not require that the {B_n} be independent of each other. [Hint: Don't forget Exercise 3.6.3(a).]

Proof. From Exercise 3.6.3(a), let Ω = {1, 2, 3, 4}, F = 2^Ω, and let P be defined by P({n}) = 1/4 for n ∈ Ω. Let B = {1, 2}, B_1 = {2, 3} and B_2 = {1, 3}. Then B ∩ B_1 = {2}, B_1 ∩ B_2 = {3} and B ∩ B_2 = {1}, thus P(B ∩ B_1) = 1/4 = (1/2)(1/2) = P(B)P(B_1), P(B_1 ∩ B_2) = 1/4 = (1/2)(1/2) = P(B_1)P(B_2), and P(B ∩ B_2) = 1/4 = (1/2)(1/2) = P(B)P(B_2). However, B ∩ (B_1 ∩ B_2) = ∅, and therefore P(B ∩ (B_1 ∩ B_2)) = 0 ≠ 1/8 = (1/2)(1/4) = P(B)P(B_1 ∩ B_2).
Therefore P(B ∩ B_n) = P(B)P(B_n) for each n, but for B_1 ∩ B_2 ∈ σ(B_1, B_2), we have that P(B ∩ (B_1 ∩ B_2)) ≠ P(B)P(B_1 ∩ B_2).
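
This finite example is easy to check exhaustively; the following Python sketch merely re-verifies the probabilities above with exact arithmetic.

from fractions import Fraction

def P(event):
    # uniform measure on {1, 2, 3, 4}
    return Fraction(len(event), 4)

B, B1, B2 = {1, 2}, {2, 3}, {1, 3}
print(P(B & B1) == P(B) * P(B1))              # True
print(P(B & B2) == P(B) * P(B2))              # True
print(P(B1 & B2) == P(B1) * P(B2))            # True
print(P(B & B1 & B2) == P(B) * P(B1 & B2))    # False: independence of the sigma-field fails
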
Exercise 3.6.18. Let A_1, A_2, . . . be any independent sequence of events, and let S_x = {lim_n (1/n) Σ_{i=1}^n 1_{A_i} ≤ x}. Prove that for each x ∈ R we have P(S_x) = 0 or 1.

Proof. Let A_1, A_2, . . . be any independent sequence of events, and let

τ = ∩_{n=1}^∞ σ(A_n, A_{n+1}, . . . ).

For each x ∈ R we have S_x := {lim_n (1/n) Σ_{i=1}^n 1_{A_i} ≤ x} = {lim_n (1/n) Σ_{i=k}^n 1_{A_i} ≤ x} ∈ σ(A_k, A_{k+1}, . . . ) for each k ∈ N. Thus S_x ∈ τ, and therefore, by the Kolmogorov Zero-One Law, for each x ∈ R we have P(S_x) = 0 or 1.
Exercise 3.6.19. Let A_1, A_2, . . . be independent events. Let Y be a random variable which is measurable with respect to σ(A_n, A_{n+1}, . . . ) for each n ∈ N. Prove that there is a real number a such that P(Y = a) = 1. [Hint: Consider P(Y ≤ x) for x ∈ R; what values can it take?]

Proof. Let A_1, A_2, . . . be independent events. Let Y be a random variable which is measurable with respect to σ(A_n, A_{n+1}, . . . ) for each n ∈ N. Then Y is measurable with respect to

τ = ∩_{n=1}^∞ σ(A_n, A_{n+1}, . . . ).

Then by the Kolmogorov Zero-One Law (and the definition of a random variable), for each x ∈ R, we have P(Y ≤ x) = 0 or P(Y ≤ x) = 1. Since B_n = {Y ≤ n} is an increasing sequence such that ∪_n B_n = {Y ∈ R}, by continuity there exists some N such that P(B_n) = 1 for all n ≥ N (since P(Y ∈ R) = 1). Similarly, since C_n = {Y ≤ −n} is a decreasing sequence such that ∩_n C_n = ∅, there exists a K such that P(C_n) = 0 for all n ≥ K. Thus the set S := {x ∈ R; P(Y ≤ x) = 1} is nonempty and bounded below.

Let a = inf S. Let D_n = {Y ≤ a − 1/n}, an increasing sequence of events. Then P(D_n) = 0 for each n, with ∪_n D_n = {Y < a}, and thus by continuity of probabilities P(Y < a) = 0. Let E_n = {Y ≤ a + 1/n}, a decreasing sequence of events. Since a + 1/n > a, we have P(E_n) = 1 for each n, with ∩_n E_n = {Y ≤ a}, and thus by continuity of probabilities P(Y ≤ a) = 1. Therefore, P(Y = a) = P(Y ≤ a) − P(Y < a) = 1 − 0 = 1.

4.1 Exercises
Exercise 4.1.3. Prove that (4.1.2) is well-defined, in the sense that if {A_i} and {B_j} are two different finite partitions of Ω, such that Σ_{i=1}^n x_i 1_{A_i} = Σ_{j=1}^m y_j 1_{B_j}, then Σ_{i=1}^n x_i P(A_i) = Σ_{j=1}^m y_j P(B_j). [Hint: collect together those A_i and B_j corresponding to the same values of x_i and y_j.]

Proof. Suppose {A_i} and {B_j} are two finite partitions of Ω, such that Σ_{i=1}^n x_i 1_{A_i} = Σ_{j=1}^m y_j 1_{B_j}. Let I_{x_i} = {j; y_j = x_i}, so that ∪_{i=1}^n I_{x_i} = {1, 2, . . . , m}. Then we have that x_i 1_{A_i} = Σ_{j ∈ I_{x_i}} y_j 1_{B_j} for each i, so Σ_{i=1}^n x_i 1_{A_i} = Σ_{i=1}^n Σ_{j ∈ I_{x_i}} y_j 1_{B_j} = Σ_{j=1}^m y_j 1_{B_j}. Therefore, since x_i P(A_i) = E(x_i 1_{A_i}) = E(Σ_{j ∈ I_{x_i}} y_j 1_{B_j}) = Σ_{j ∈ I_{x_i}} y_j P(B_j), we have that E(Σ_{i=1}^n x_i 1_{A_i}) = Σ_{i=1}^n x_i P(A_i) = Σ_{i=1}^n Σ_{j ∈ I_{x_i}} y_j P(B_j) = Σ_{j=1}^m y_j P(B_j).
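
Concretely, two different partition representations of the same simple random variable give the same expectation; a small Python sketch with exact arithmetic (the example space and values are our own):

from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}   # uniform on {1, ..., 6}

# The same simple random variable, written over two different partitions.
rep1 = [(1, {1, 2, 3}), (2, {4, 5, 6})]
rep2 = [(1, {1, 2}), (1, {3}), (2, {4}), (2, {5, 6})]

def expectation(rep):
    return sum(x * sum(P[w] for w in A) for x, A in rep)

print(expectation(rep1), expectation(rep2))    # both 3/2
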

4.3 Exercises
Exercise 4.3.2. Let X and Y be two general random variables (not necessarily non-negative) with well-defined means, such that X ≤ Y.
(a) Prove that X⁺ ≤ Y⁺ and X⁻ ≥ Y⁻.
(b) Prove that expectation is still order-preserving, i.e. that E(X) ≤ E(Y) under these assumptions.

Proof. Let X and Y be two general random variables (not necessarily non-negative) with well-defined means, such that X ≤ Y.

(a) Since X ≤ Y, we have X⁺(ω) = max(X(ω), 0) ≤ max(Y(ω), 0) = Y⁺(ω). Since −X ≥ −Y, we have X⁻(ω) = max(−X(ω), 0) ≥ max(−Y(ω), 0) = Y⁻(ω).

(b) By part (a), we have E(X⁺) ≤ E(Y⁺) and E(X⁻) ≥ E(Y⁻). Then E(Y⁺) − E(X⁺) ≥ 0 and E(Y⁻) − E(X⁻) ≤ 0. Thus E(Y) − E(X) = E(Y⁺) − E(X⁺) − (E(Y⁻) − E(X⁻)) ≥ 0. Therefore E(Y) ≥ E(X).

Exercise 4.3.3. Let X and Y be two general random variables with finite means, and let Z = X + Y.
(a) Express Z⁺ − Z⁻ in terms of X⁺, X⁻, Y⁺, and Y⁻.
(b) Prove that E(Z) = E(X) + E(Y), i.e. that E(Z⁺) − E(Z⁻) = E(X⁺) − E(X⁻) + E(Y⁺) − E(Y⁻). [Hint: Re-arrange the relations of part (a) so that you can make use of (4.2.6).]
(c) Prove that expectation is still (finitely) linear, for general random variables with finite means.

Proof. Let X and Y be two general random variables with finite means, and let Z = X + Y.

(a) Z⁺ − Z⁻ = Z = X + Y = X⁺ − X⁻ + Y⁺ − Y⁻.

(b) From (a) we have Z⁺ + X⁻ + Y⁻ = Z⁻ + X⁺ + Y⁺. Then by (4.2.6), we have E(Z⁺) + E(X⁻) + E(Y⁻) = E(Z⁻) + E(X⁺) + E(Y⁺). Therefore E(Z) = E(Z⁺) − E(Z⁻) = E(X⁺) − E(X⁻) + E(Y⁺) − E(Y⁻) = E(X) + E(Y).

(c) Let a, b ∈ R. Since, by definition, (−X)⁺ = X⁻ and (−X)⁻ = X⁺, we have E(−X) = E(X⁻) − E(X⁺) = −E(X). So WLOG assume a, b ≥ 0. Let U = aX, V = bY and W = U + V. Then U and V are general random variables with finite means. Then by (b) and (4.2.6), we have E(W) = E(U⁺) − E(U⁻) + E(V⁺) − E(V⁻) = E(aX⁺) − E(aX⁻) + E(bY⁺) − E(bY⁻) = aE(X⁺) − aE(X⁻) + bE(Y⁺) − bE(Y⁻) = aE(X) + bE(Y).

Exercise 4.3.4. Let X and Y be two independent general random variables with finite means, and let Z = XY.
(a) Prove that X⁺ and Y⁺ are independent, and similarly for each of X⁺ and Y⁻, and X⁻ and Y⁺, and X⁻ and Y⁻.
(b) Express Z⁺ and Z⁻ in terms of X⁺, X⁻, Y⁺, and Y⁻.
(c) Prove that E(XY) = E(X)E(Y).

Proof. Let X and Y be two independent general random variables with finite means, and let Z = XY.

(a) Since X⁺ ≥ 0, if x ≥ 0 we have that P(X⁺ ≤ x) = P(X⁺ = 0) + P(0 < X⁺ ≤ x) = P(X ≤ 0) + P(0 < X ≤ x) = P(X ≤ x). Also, P(X⁻ ≤ x) = P(X⁻ = 0) + P(0 < X⁻ ≤ x) = P(−X ≤ 0) + P(0 < −X ≤ x) = P(−X ≤ x). Similarly, if y ≥ 0, we have that P(Y⁺ ≤ y) = P(Y ≤ y) and P(Y⁻ ≤ y) = P(−Y ≤ y).
Then P(X⁺ ≤ x, Y⁺ ≤ y) = P(X⁺ = 0, Y⁺ ≤ y) + P(0 < X⁺ ≤ x, Y⁺ ≤ y) = P(X⁺ = 0, Y⁺ = 0) + P(X⁺ = 0, 0 < Y⁺ ≤ y) + P(0 < X⁺ ≤ x, Y⁺ = 0) + P(0 < X⁺ ≤ x, 0 < Y⁺ ≤ y) = P(X ≤ 0, Y ≤ 0) + P(X ≤ 0, 0 < Y ≤ y) + P(0 < X ≤ x, Y ≤ 0) + P(0 < X ≤ x, 0 < Y ≤ y) = P(X ≤ 0)P(Y ≤ 0) + P(X ≤ 0)P(0 < Y ≤ y) + P(0 < X ≤ x)P(Y ≤ 0) + P(0 < X ≤ x)P(0 < Y ≤ y) = P(X ≤ 0)[P(Y ≤ 0) + P(0 < Y ≤ y)] + P(0 < X ≤ x)[P(Y ≤ 0) + P(0 < Y ≤ y)] = P(X ≤ 0)P(Y ≤ y) + P(0 < X ≤ x)P(Y ≤ y) = P(X ≤ x)P(Y ≤ y) = P(X⁺ ≤ x)P(Y⁺ ≤ y). Since X⁺ ≥ 0, if x < 0 then P(X⁺ ≤ x) = 0, so if x < 0 or y < 0 there is nothing to prove (anything times 0 is 0). Therefore, by Proposition 3.2.4, X⁺ and Y⁺ are independent. The proof is similar for X⁺ and Y⁻, and X⁻ and Y⁺, and X⁻ and Y⁻, using the fact that if X and Y are independent then −X and Y, X and −Y, and −X and −Y are independent.

(b) Z = XY = (X⁺ − X⁻)(Y⁺ − Y⁻). Therefore Z⁺(ω) = max((X⁺(ω) − X⁻(ω))(Y⁺(ω) − Y⁻(ω)), 0) and Z⁻(ω) = max(−(X⁺(ω) − X⁻(ω))(Y⁺(ω) − Y⁻(ω)), 0).

(c) By part (a) (independence), Exercise 4.3.3 (linearity), and (4.2.7), we have that E(XY) = E((X⁺ − X⁻)(Y⁺ − Y⁻)) = E(X⁺Y⁺ − X⁻Y⁺ − X⁺Y⁻ + X⁻Y⁻) = E(X⁺Y⁺) − E(X⁻Y⁺) − E(X⁺Y⁻) + E(X⁻Y⁻) = E(X⁺)E(Y⁺) − E(X⁻)E(Y⁺) − E(X⁺)E(Y⁻) + E(X⁻)E(Y⁻) = E(X⁺)(E(Y⁺) − E(Y⁻)) − E(X⁻)(E(Y⁺) − E(Y⁻)) = (E(X⁺) − E(X⁻))(E(Y⁺) − E(Y⁻)) = E(X)E(Y).
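
A quick Monte Carlo sanity check of E(XY) = E(X)E(Y) for independent signed random variables (the distributions below are arbitrary choices of ours):

import random

random.seed(3)
n = 10**5
xs = [random.choice([-2.0, 1.0, 3.0]) for _ in range(n)]   # E(X) = 2/3
ys = [random.gauss(0.5, 1.0) for _ in range(n)]            # E(Y) = 1/2

def mean(vals):
    vals = list(vals)
    return sum(vals) / len(vals)

print(mean(x * y for x, y in zip(xs, ys)))   # ~ E(X)E(Y) = 1/3
print(mean(xs) * mean(ys))                   # ~ 1/3
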

4.5. Exercises
Exercise 4.5.1. Let (Ω, F, P) be Lebesgue measure on [0, 1], and set

X(ω) = 1 for 0 ≤ ω < 1/4;  X(ω) = 2ω² for 1/4 ≤ ω < 3/4;  X(ω) = ω² for 3/4 ≤ ω ≤ 1.

Compute P(X ∈ A) where

(a) A = [0, 1].

(b) A = [1/2, 1].

Proof. Let (Ω, F, P) and X be as above.

(a) Let A = [0, 1]. We have that 0 ≤ 2ω² ≤ 1 if and only if 0 ≤ ω ≤ √2/2, where √2/2 < 3/4. Also 0 ≤ ω² ≤ 1 if and only if 0 ≤ ω ≤ 1. Therefore P(X ∈ A) = P([0, 1/4) ∪ [1/4, √2/2) ∪ [3/4, 1]) = 1/4 + (√2/2 − 1/4) + 1/4 = (1 + 2√2)/4.

(b) Let A = [1/2, 1]. Then 1/2 ≤ 2ω² ≤ 1 if and only if 1/2 ≤ ω ≤ √2/2, where √2/2 < 3/4. Also 1/2 ≤ ω² ≤ 1 if and only if √2/2 ≤ ω ≤ 1. Therefore P(X ∈ A) = P([0, 1/4) ∪ [1/2, √2/2] ∪ [3/4, 1]) = 1/4 + (√2/2 − 1/2) + 1/4 = √2/2.
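
A minimal Monte Carlo check of both answers (sampling ω uniformly from [0, 1]):

import random

def X(w):
    if w < 0.25:
        return 1.0
    if w < 0.75:
        return 2 * w * w
    return w * w

random.seed(4)
samples = [X(random.random()) for _ in range(10**6)]
print(sum(0.0 <= x <= 1.0 for x in samples) / len(samples))  # ~ (1 + 2*sqrt(2))/4 ≈ 0.957
print(sum(0.5 <= x <= 1.0 for x in samples) / len(samples))  # ~ sqrt(2)/2 ≈ 0.707
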

Exercise 4.5.2. Let X be a random variable with finite mean, and let a ∈ R be any real number. Prove that E(max(X, a)) ≥ max(E(X), a). [Hint: Consider separately the cases E(X) ≥ a and E(X) < a.] (See also Exercise 5.5.7.)

Proof. Let X be a random variable with finite mean, and let a ∈ R be any real number. Let A = {X ≥ a}; then max(X, a) = X1_A + a1_{A^c}, and by order-preservation of expectation we have E(X1_A) ≥ E(a1_A) and E(a1_{A^c}) ≥ E(X1_{A^c}). Thus E(max(X, a)) = E(X1_A) + E(a1_{A^c}) ≥ E(X1_A) + E(X1_{A^c}) = E(X), and similarly E(max(X, a)) ≥ E(a1_A) + E(a1_{A^c}) = a. Therefore E(max(X, a)) ≥ max(E(X), a).
Exercise 4.5.3. Give an example of random variables X and Y defined on Lebesgue measure on [0,1], such that P(X > Y) ≥ 1/2, but E(X) < E(Y).

Proof. Let X = 1_{[0,1]} and let Y = 100·1_{[3/4,1]}. Then P(X > Y) = P([0, 3/4)) = 3/4, but E(X) = P([0, 1]) = 1 < E(Y) = 100·P([3/4, 1]) = 25.
Exercise 4.5.4. Let (Ω, F, P) be the uniform distribution on Ω = {1, 2, 3}, as in Example 2.2.2. Find random variables X, Y, and Z on (Ω, F, P) such that P(X > Y)P(Y > Z)P(Z > X) > 0, and E(X) = E(Y) = E(Z).

Proof. Let (Ω, F, P) be the uniform distribution on Ω = {1, 2, 3}. Let X = 1_{{1,2}}, Y = 1_{{2,3}} and Z = 1_{{1,3}}. Then we have P(X > Y) = P({1}) = 1/3, P(Y > Z) = P({2}) = 1/3, and P(Z > X) = P({3}) = 1/3, so that P(X > Y)P(Y > Z)P(Z > X) = 1/27 > 0. Also, we have that E(X) = P({1, 2}) = 2/3, E(Y) = P({2, 3}) = 2/3, and E(Z) = P({1, 3}) = 2/3, so that E(X) = E(Y) = E(Z).


Exercise 4.5.5. Let X be a random variable on (Ω, F, P), and suppose that Ω is a finite set. Prove that X is a simple random variable.

Proof. Let X be a random variable on (Ω, F, P), and suppose Ω = {a_1, . . . , a_n} for some n ∈ N. Let x_i = X(a_i); then X = Σ_{i=1}^n x_i 1_{{a_i}}, so X is a simple random variable.
Exercise 4.5.6. Let X be a random variable defined on Lebesgue measure on [0,1], and suppose that X is a one-to-one function, i.e. that if ω_1 ≠ ω_2 then X(ω_1) ≠ X(ω_2). Prove that X is not a simple random variable.

Proof. Let X be a random variable defined on Lebesgue measure on [0,1], and suppose that X is a one-to-one function. Suppose for contradiction that X = Σ_{i=1}^n x_i 1_{A_i} for some x_1, x_2, . . . , x_n and some finite partition {A_1, A_2, . . . , A_n} of Ω. Then ∪_i A_i = Ω implies that λ(A_k) > 0 for some k. This implies that A_k is uncountable; in particular, there exist ω_1, ω_2 ∈ A_k with ω_1 ≠ ω_2. However, X(ω_1) = x_k = X(ω_2), which contradicts the fact that X is one-to-one. Therefore, X is not a simple random variable.
Exercise 4.5.7. (Principle of inclusion-exclusion, general case) Let A_1, A_2, . . . , A_n ∈ F. Generalise the principle of inclusion-exclusion to:

P(A_1 ∪ · · · ∪ A_n) = Σ_{i=1}^n P(A_i) − Σ_{1≤i<j≤n} P(A_i ∩ A_j) + Σ_{1≤i<j<k≤n} P(A_i ∩ A_j ∩ A_k) − · · · ± P(A_1 ∩ · · · ∩ A_n).

[Hint: Expand 1 − Π_{i=1}^n (1 − 1_{A_i}), and take expectations of both sides.]

Proof. Let A_1, A_2, . . . , A_n ∈ F. Then we have that

1 − Π_{i=1}^n (1 − 1_{A_i}) = 1 − (1 − Σ_{i=1}^n 1_{A_i} + Σ_{1≤i<j≤n} 1_{A_i}1_{A_j} − Σ_{1≤i<j<k≤n} 1_{A_i}1_{A_j}1_{A_k} + · · · ± 1_{A_1} · · · 1_{A_n})
= Σ_{i=1}^n 1_{A_i} − Σ_{1≤i<j≤n} 1_{A_i ∩ A_j} + Σ_{1≤i<j<k≤n} 1_{A_i ∩ A_j ∩ A_k} − · · · ± 1_{A_1 ∩ · · · ∩ A_n}.

Since 1 − 1_{A_i} = 1_{A_i^c} for each i, and Π_{i=1}^n 1_{A_i^c} = 1_{A_1^c ∩ · · · ∩ A_n^c}, taking the expectation of both sides yields:

P(A_1 ∪ · · · ∪ A_n) = 1 − P(A_1^c ∩ · · · ∩ A_n^c) = 1 − E(1_{A_1^c ∩ · · · ∩ A_n^c}) = E(1 − 1_{A_1^c ∩ · · · ∩ A_n^c})
= E(1 − Π_{i=1}^n (1 − 1_{A_i}))
= Σ_{i=1}^n E(1_{A_i}) − Σ_{1≤i<j≤n} E(1_{A_i ∩ A_j}) + Σ_{1≤i<j<k≤n} E(1_{A_i ∩ A_j ∩ A_k}) − · · · ± E(1_{A_1 ∩ · · · ∩ A_n})
= Σ_{i=1}^n P(A_i) − Σ_{1≤i<j≤n} P(A_i ∩ A_j) + Σ_{1≤i<j<k≤n} P(A_i ∩ A_j ∩ A_k) − · · · ± P(A_1 ∩ · · · ∩ A_n).
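
The general formula is easy to spot-check on a small discrete space; a Python sketch with exact arithmetic and randomly chosen events (uniform measure on 12 points; all choices ours):

import itertools
import random
from fractions import Fraction

random.seed(5)
Omega = list(range(12))
events = [set(random.sample(Omega, random.randint(3, 8))) for _ in range(4)]

def P(A):
    return Fraction(len(A), len(Omega))

union = set().union(*events)
incl_excl = sum(
    (-1) ** (r + 1) * sum(P(set.intersection(*combo))
                          for combo in itertools.combinations(events, r))
    for r in range(1, len(events) + 1)
)
print(P(union) == incl_excl)   # True
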

Exercise 4.5.8. Let f(x) = ax² + bx + c be a second-degree polynomial function (where a, b, c ∈ R are constants).
(a) Find necessary and sufficient conditions on a, b, and c such that the equation E(f(αX)) = α²E(f(X)) holds for all α ∈ R and all random variables X.
(b) Find necessary and sufficient conditions on a, b, and c such that the equation E(f(X − β)) = E(f(X)) holds for all β ∈ R and all random variables X.
(c) Do parts (a) and (b) account for the properties of the variance function? Why or why not?

Proof. Let f(x) = ax² + bx + c be a second-degree polynomial function (where a, b, c ∈ R are constants).

(a) We have that E(f(αX)) = aE((αX)²) + bE(αX) + c = α²aE(X²) + αbE(X) + c, and α²E(f(X)) = α²aE(X²) + α²bE(X) + α²c. Therefore E(f(αX)) = α²E(f(X)) holds for all α ∈ R and all random variables X if and only if α²a = α²a, αb = α²b and c = α²c, if and only if b = c = 0.

(b) We have that E(f(X − β)) = aE((X − β)²) + bE(X − β) + c = aE(X²) + (b − 2aβ)E(X) + aβ² − βb + c. Also we have that E(f(X)) = aE(X²) + bE(X) + c. Therefore E(f(X − β)) = E(f(X)) holds for all β ∈ R and all random variables X if and only if a = a, b − 2aβ = b and aβ² − βb + c = c, if and only if a = 0 and b = 0.

(c) No. If f(x) = ax² + bx + c satisfies both (a) and (b), we have that f(x) ≡ 0. Suppose X = 1_{[0,1/2]} is defined on Lebesgue measure on [0,1]. Then Var(X) = 1/4 ≠ 0. The issue here is that Var(X) = E(g_X(X)), where g_X(x) = (x − E(X))² depends on X.

Exercise 4.5.9. In proving property (4.1.6) of variance, why did we not simply
proceed by induction on n? That is, suppose we know that Var(X+Y ) = Var(X)+
Var(Y ) whenever X and Y are independent. Does it follow easily that Var(X +
Y + Z) = Var(X) + Var(Y ) + Var(Z) whenever X, Y, and Z are independent?
Why or why not? How does Exercise 3.6.6 fit in?
Proof. In (4.1.6) it is only required that X_1, . . . , X_n be pairwise independent (since we only need Cov(X_i, X_j) = 0), which is more general; the proof there also introduces the concept of covariance. The induction itself is straightforward: suppose we know that Var(X + Y) = Var(X) + Var(Y) whenever X and Y are independent, and suppose X, Y, and Z are independent. Then, letting W = X + Y, by Exercise 3.6.6, W and Z are independent. Therefore, by the hypothesis, we have Var(W + Z) = Var(W) + Var(Z) = Var(X) + Var(Y) + Var(Z).
Exercise 4.5.10. Let X_1, X_2, . . . be i.i.d. with mean μ and variance σ², and let N be a non-negative integer-valued random variable with mean m and variance v, with N independent of all the X_i. Let S = X_1 + · · · + X_N = Σ_{i=1}^∞ X_i 1_{N≥i}. Compute Var(S) in terms of μ, σ², m, and v.

Proof. Since σ² = E(X_i²) − μ², we have E(X_i²) = σ² + μ². First, by linearity and independence, we have for each i (using 1²_{N≥i} = 1_{N≥i} and E(1_{N≥i}) = P(N ≥ i)) that

Var(X_i 1_{N≥i}) = E(X_i² 1_{N≥i}) − E(X_i 1_{N≥i})²
= E(X_i²)E(1_{N≥i}) − E(X_i)²E(1_{N≥i})²
= (σ² + μ²)P(N ≥ i) − μ²P(N ≥ i)².

Also, for i < j, we have 1_{N≥i}1_{N≥j} = 1_{N≥j}, so

Cov(X_i 1_{N≥i}, X_j 1_{N≥j}) = E(X_i X_j 1_{N≥i}1_{N≥j}) − E(X_i 1_{N≥i})E(X_j 1_{N≥j})
= μ²P(N ≥ j) − μ²P(N ≥ i)P(N ≥ j).

Therefore, since Σ_{k=1}^∞ P(N ≥ k) = E(N) = m, E(N²) = m² + v, and 2Σ_{i<j} P(N ≥ j) = Σ_{j=1}^∞ 2(j − 1)P(N ≥ j) = E(N(N − 1)) = Σ_{l=1}^∞ l(l − 1)P(N = l), we have by Proposition 4.2.9 that

Var(S) = Σ_{k=1}^∞ Var(X_k 1_{N≥k}) + 2 Σ_{i<j} Cov(X_i 1_{N≥i}, X_j 1_{N≥j})
= (σ² + μ²)m − μ² Σ_{k=1}^∞ P(N ≥ k)² + 2μ² Σ_{i<j} P(N ≥ j) − 2μ² Σ_{i<j} P(N ≥ i)P(N ≥ j)
= (σ² + μ²)m − μ² [Σ_{k=1}^∞ P(N ≥ k)² + 2 Σ_{i<j} P(N ≥ i)P(N ≥ j)] + μ² Σ_{l=1}^∞ l(l − 1)P(N = l)
= (σ² + μ²)m − μ² (Σ_{k=1}^∞ P(N ≥ k))² + μ²(E(N²) − E(N))
= (σ² + μ²)m − μ²m² + μ²(m² + v − m)
= σ²m + μ²v.
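
The formula Var(S) = σ²m + μ²v is easy to check by simulation; a hedged Python sketch with one arbitrary concrete choice (X_i exponential with mean 1, so μ = σ² = 1, and N Poisson with mean 4, so m = v = 4, giving Var(S) = 8):

import math
import random

def poisson(lam):
    # simple inversion sampler; adequate for small lam
    u = random.random()
    k, term = 0, math.exp(-lam)
    cum = term
    while u > cum:
        k += 1
        term *= lam / k
        cum += term
    return k

random.seed(6)
sums = []
for _ in range(200000):
    N = poisson(4.0)
    sums.append(sum(random.expovariate(1.0) for _ in range(N)))

m1 = sum(sums) / len(sums)
var = sum((s - m1) ** 2 for s in sums) / len(sums)
print(m1, var)   # ~ 4 and ~ 8
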

Exercise 4.5.11. Let X and Z be independent, each with the standard normal distribution, let a, b ∈ R (not both 0), and let Y = aX + bZ.
(a) Compute Corr(X, Y).
(b) Show that |Corr(X, Y)| ≤ 1 in this case. (Compare Exercise 5.5.6.)
(c) Give necessary and sufficient conditions on the values of a and b such that Corr(X, Y) = 1.
(d) Give necessary and sufficient conditions on the values of a and b such that Corr(X, Y) = −1.

Proof. Let X and Z be independent, each with the standard normal distribution (that is, with mean 0 and variance 1), let a, b ∈ R (not both 0), and let Y = aX + bZ.

(a) We have that E(Y) = aE(X) + bE(Z) = 0. Also, E(X²) − E(X)² = Var(X) = 1, so that E(X²) = 1. Then by independence, Cov(X, Y) = E(XY) = E(aX² + bXZ) = aE(X²) + bE(X)E(Z) = aE(X²) = a. Also by independence, we have that Var(Y) = Var(aX) + Var(bZ) = a²Var(X) + b²Var(Z) = a² + b². Therefore Corr(X, Y) = a/√(a² + b²).

(b) We have that √(a² + b²) ≥ √(a²) = |a|. Therefore |Corr(X, Y)| = |a/√(a² + b²)| ≤ |a|/|a| = 1.

(c) Since √(a² + b²) ≥ |a|, we have that Corr(X, Y) = 1 if and only if a > 0 and √(a² + b²) = a, if and only if a > 0 and b = 0.

(d) Similarly to (c), we have that Corr(X, Y) = −1 if and only if a < 0 and b = 0.
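
A quick simulation of part (a), with a and b chosen arbitrarily:

import math
import random

random.seed(7)
a, b, n = 2.0, -3.0, 10**5
xs = [random.gauss(0, 1) for _ in range(n)]
zs = [random.gauss(0, 1) for _ in range(n)]
ys = [a * x + b * z for x, z in zip(xs, zs)]

def corr(us, vs):
    mu, mv = sum(us) / n, sum(vs) / n
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n
    su = math.sqrt(sum((u - mu) ** 2 for u in us) / n)
    sv = math.sqrt(sum((v - mv) ** 2 for v in vs) / n)
    return cov / (su * sv)

print(corr(xs, ys))                      # ~ a / sqrt(a^2 + b^2) ≈ 0.5547
print(a / math.sqrt(a * a + b * b))
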

Exercise 4.5.12. Let X and Y be independent general non-negative random variables, and let X_n = Ψ_n(X), where Ψ_n(x) = min(n, 2^{−n}⌊2^n x⌋) as in Proposition 4.2.5.
(a) Give an example of a sequence of functions ψ_n : [0, ∞) → [0, ∞), other than ψ_n(x) = Ψ_n(x), such that for all x, 0 ≤ ψ_n(x) ≤ x and {ψ_n(x)} ↗ x as n → ∞.
(b) Suppose Y_n = ψ_n(Y) with ψ_n as in part (a). Must X_n and Y_n be independent?
(c) Suppose {Y_n} is an arbitrary collection of non-negative simple random variables such that {Y_n} ↗ Y. Must X_n and Y_n be independent?
(d) Under the assumption of part (c), determine (with proof) which quantities in equation (4.2.7) are necessarily equal.

Proof. Let X and Y be independent general non-negative random variables, and let X_n = Ψ_n(X), where Ψ_n(x) = min(n, 2^{−n}⌊2^n x⌋).

(a) Let ψ_n(x) = min(n, 3^{−n}⌊3^n x⌋). Then clearly 0 ≤ ψ_n(x) ≤ x and {ψ_n(x)} ↗ x as n → ∞.

(b) Yes. Each ψ_n is piecewise continuous, therefore by Proposition 3.1.8, ψ_n is Borel measurable. Therefore, by Proposition 3.2.3, X_n and Y_n are independent.

(c) No. Let Ω = {0, 1}², F = 2^Ω and P(A) = |A|/4 for each A ∈ F, representing flipping two fair coins. Let X = 1_{{1}×{0,1}} and Y = 1_{{0,1}×{1}} (i.e., the indicator functions of the first and second coins being heads). Then X and Y are independent, since P(X = x, Y = y) = P({x} × {y}) = P(X = x)P(Y = y) for each x, y ∈ {0, 1}. Clearly X_n = X for all n ∈ N. Now let Y_n = Y − (1/n)Y(1 − X). Each Y_n is a non-negative simple random variable with {Y_n} ↗ Y, but for n ≥ 2 we have {Y_n = 1} = {X = 1, Y = 1}, so that P(X_n = 1, Y_n = 1) = P(X = 1, Y = 1) = 1/4 ≠ 1/8 = (1/2)(1/4) = P(X_n = 1)P(Y_n = 1). Thus X_n and Y_n need not be independent.

(d) Assume {Y_n} ↗ Y where each Y_n is non-negative, but X_n and Y_n are not necessarily independent. Since X_n ↗ X, Y_n ↗ Y, and everything is non-negative, we have X_nY_n ↗ XY. By the Monotone Convergence Theorem, we have that lim_n E(X_n) = E(X), lim_n E(Y_n) = E(Y), lim_n E(X_nY_n) = E(XY), and hence lim_n E(X_n)E(Y_n) = E(X)E(Y). Thus, by independence of X and Y, we have lim_n E(X_nY_n) = E(XY) = E(X)E(Y) = lim_n E(X_n)E(Y_n).

Exercise 4.5.13. Give examples of a random variable X defined on Lebesgue measure on [0,1], such that
(a) E(X⁺) = ∞ and 0 < E(X⁻) < ∞.
(b) E(X⁻) = ∞ and 0 < E(X⁺) < ∞.
(c) E(X⁺) = E(X⁻) = ∞.
(d) 0 < E(X) < ∞ but E(X²) = ∞.

Proof. Let (Ω, F, P) be Lebesgue measure on [0,1].

(a) Define X by X(ω) = 2^n for 2^{−n} ≤ ω < 2^{−(n−1)}, where n = 2, 3, . . . , and X(ω) = −1 for 1/2 ≤ ω ≤ 1. Then E(X⁺) ≥ Σ_{k=2}^N 2^k·2^{−k} = N − 1 for any N ∈ N, so E(X⁺) = ∞. Also E(X⁻) = 1·(1 − 1/2) = 1/2.

(b) Define X by X(ω) = −2^n for 2^{−n} ≤ ω < 2^{−(n−1)}, where n = 2, 3, . . . , and X(ω) = 1 for 1/2 ≤ ω ≤ 1. Then E(X⁺) = 1·(1 − 1/2) = 1/2, while E(X⁻) ≥ Σ_{k=2}^N 2^k·2^{−k} = N − 1 for any N ∈ N, so E(X⁻) = ∞.

(c) Define X by X(ω) = 2^n for 2^{−n} ≤ ω < 2^{−(n−1)}, where n = 2, 3, . . . , and X(ω) = −2^k for 1 − 2^{−(k−1)} ≤ ω < 1 − 2^{−k}, where k = 2, 3, . . . . Then, similarly to (a) and (b), we have E(X⁺) = E(X⁻) = ∞.

(d) Let a_1 = a_2 = 2, a_3 = a_4 = 2², a_5 = a_6 = 2³, . . . . Define X by X(ω) = 0 for 1/2 ≤ ω ≤ 1 and X(ω) = a_n for 2^{−n} ≤ ω < 2^{−(n−1)}, where n = 2, 3, . . . . Then 0 < E(X) = Σ_{n=2}^∞ a_n2^{−n} ≤ Σ_{j=1}^∞ 2^j·3·2^{−2j} = 3Σ_{j=1}^∞ 2^{−j} < ∞. However, E(X²) ≥ 2²·2^{−2} + 2⁴·2^{−4} + · · · + 2^{2N}·2^{−2N} = N for any N ∈ N. Therefore E(X²) = ∞.

Exercise 4.5.14. Let Z_1, Z_2, . . . be general random variables with E|Z_i| < ∞, and let Z = Z_1 + Z_2 + · · · .
(a) Suppose Σ_i E(Z_i⁺) < ∞ and Σ_i E(Z_i⁻) < ∞. Prove that E(Z) = Σ_i E(Z_i).
(b) Show that we still have E(Z) = Σ_i E(Z_i) if we have at least one of Σ_i E(Z_i⁺) < ∞ or Σ_i E(Z_i⁻) < ∞.
(c) Let {Z_i} be independent, with P(Z_i = +1) = P(Z_i = −1) = 1/2 for each i. Does E(Z) = Σ_i E(Z_i) in this case? How does that relate to (4.2.8)?

Proof. Let Z_1, Z_2, . . . be general random variables with E|Z_i| < ∞, and let Z = Z_1 + Z_2 + · · · .

(a) Suppose Σ_i E(Z_i⁺) < ∞ and Σ_i E(Z_i⁻) < ∞. By the Monotone Convergence Theorem, U := Σ_i Z_i⁺ and V := Σ_i Z_i⁻ satisfy E(U) = Σ_i E(Z_i⁺) < ∞ and E(V) = Σ_i E(Z_i⁻) < ∞. Since Z = U − V, we have E(Z) = E(U) − E(V) = Σ_i E(Z_i⁺) − Σ_i E(Z_i⁻) = Σ_i E(Z_i).

(b) Suppose that at least one of Σ_i E(Z_i⁺) < ∞ or Σ_i E(Z_i⁻) < ∞ holds. If both are finite, then by (a) we have that E(Z) = Σ_i E(Z_i). We will prove the case when Σ_i E(Z_i⁺) < ∞ and Σ_i E(Z_i⁻) = ∞; the other case is similar. With U and V as in (a), we have E(U) < ∞ and E(V) = ∞, so E(Z) = E(U) − E(V) = −∞ = Σ_i E(Z_i).

(c) We have that E(Z_i⁺) = 1·(1/2) and E(Z_i⁻) = 1·(1/2), so that E(Z_i) = 0, and therefore Σ_i E(Z_i) = 0. However, E(Z⁺) = E(Z⁻) = ∞, so that E(Z) is not defined. (4.2.8) does not apply since the Z_i are not non-negative.

Exercise 4.5.15. Let (Ω_1, F_1, P_1) and (Ω_2, F_2, P_2) be two probability triples. Let A_1, A_2, . . . ∈ F_1, and B_1, B_2, . . . ∈ F_2. Suppose that it happens that the sets {A_n × B_n} are all disjoint, and furthermore that ∪_{n=1}^∞ (A_n × B_n) = A × B for some A ∈ F_1 and B ∈ F_2.
(a) Prove that for each ω ∈ Ω_1, we have

1_A(ω)P_2(B) = Σ_{n=1}^∞ 1_{A_n}(ω)P_2(B_n).

[Hint: This is essentially countable additivity of P_2, but you do need to be careful about disjointness.]
(b) By taking expectations of both sides with respect to P_1 and using countable additivity of P_1, prove that

P_1(A)P_2(B) = Σ_{n=1}^∞ P_1(A_n)P_2(B_n).

(c) Use this result to prove that the J and P for product measure, presented in Subsection 2.6, do indeed satisfy (2.5.5).

Proof. Let (Ω_1, F_1, P_1) and (Ω_2, F_2, P_2) be two probability triples. Let A_1, A_2, . . . ∈ F_1, and B_1, B_2, . . . ∈ F_2. Suppose that it happens that the sets {A_n × B_n} are all disjoint, and furthermore that ∪_{n=1}^∞ (A_n × B_n) = A × B for some A ∈ F_1 and B ∈ F_2.

(a) Since ∪_{n=1}^∞ (A_n × B_n) = A × B, we have that

1_{A×B}(ω_1, ω_2) = 1_A(ω_1)1_B(ω_2) = Σ_{n=1}^∞ 1_{A_n}(ω_1)1_{B_n}(ω_2)

for ω_1 ∈ Ω_1 and ω_2 ∈ Ω_2. Let ω ∈ Ω_1. Consider the sequence of random variables defined by S_n(ω_2) = Σ_{j=1}^n 1_{A_j × B_j}(ω, ω_2) = Σ_{j=1}^n 1_{A_j}(ω)1_{B_j}(ω_2) for ω_2 ∈ Ω_2. Since S_n ↗ 1_A(ω)1_B, we have by the Monotone Convergence Theorem that

lim_n E_{P_2}(S_n) = E_{P_2}(1_A(ω)1_B) = 1_A(ω)E_{P_2}(1_B).

In other words, Σ_{n=1}^∞ 1_{A_n}(ω)P_2(B_n) = 1_A(ω)P_2(B).

(b) Similarly to part (a), by the Monotone Convergence Theorem (applied with respect to P_1) we have

P_1(A)P_2(B) = Σ_{n=1}^∞ P_1(A_n)P_2(B_n).

(c) Let (Ω_1, F_1, P_1) and (Ω_2, F_2, P_2) be probability triples and Ω = Ω_1 × Ω_2. Let J = {A × B; A ∈ F_1, B ∈ F_2}. Define P(A × B) = P_1(A)P_2(B) for A × B ∈ J. Suppose that D_1, D_2, . . . ∈ J are disjoint with ∪_n D_n ∈ J. Since D_n ∈ J for each n, we have that D_n = A_n × B_n for some A_n ∈ F_1 and B_n ∈ F_2. Since ∪_n D_n ∈ J, we must have that ∪_n D_n = A × B for some A ∈ F_1 and B ∈ F_2. Therefore, by part (a) and part (b), we must have that

P(∪_n D_n) = Σ_n P(D_n).
Exercise 4.5.16. Let X_1, X_2, . . . be defined jointly on some probability space (Ω, F, P), with E[X_i] = 0 and E[(X_i)²] = 1 for all i. Prove that P[X_n ≥ n i.o.] = 0.

Proof. First note that Var(X_i) = E(X_i²) − E(X_i)² = 1. By Chebychev's inequality (instead of reproving it, I refer you to Proposition 5.1.2), we have P(|X_n − E(X_n)| ≥ n) ≤ Var(X_n)/n². That is, we have P(|X_n| ≥ n) ≤ 1/n². Since {X_n ≥ n} ⊆ {|X_n| ≥ n}, we have P(X_n ≥ n) ≤ P(|X_n| ≥ n) ≤ 1/n². This implies that Σ_{n=1}^∞ P(X_n ≥ n) ≤ Σ_{n=1}^∞ 1/n² < ∞. Therefore, by the Borel-Cantelli Lemma, we have that P[X_n ≥ n i.o.] = 0.

5.5 Exercises
Exercise 5.5.1. Suppose E(2^X) = 4. Prove that P(X ≥ 3) ≤ 1/2.

Proof. Suppose E(2^X) = 4. First note that X ≥ 3 if and only if 2^X ≥ 2³ = 8. Then by Markov's inequality, P(X ≥ 3) = P(2^X ≥ 8) ≤ E(2^X)/8 = 1/2.
Exercise 5.5.2. Give an example of a random variable X and α > 0 such that P(X ≥ α) > E(X)/α. [Hint: Obviously X cannot be non-negative.] Where does the proof of Markov's inequality break down in this case?

Proof. Let (Ω, F, P) be Lebesgue measure on the interval [0, 1]. Let X be a random variable defined by X(ω) = −1 for ω ∈ [0, 3/4) and X(ω) = 1 for ω ∈ [3/4, 1]. Then E(X) = −1(3/4) + 1(1/4) = −1/2. But since probability is non-negative, for any α > 0, we have P(X ≥ α) ≥ 0 > −1/(2α) = E(X)/α. The proof of Markov's inequality relies on bounding X below by a non-negative simple random variable, which cannot be done if the range of X has negative values. In particular, the simple random variable takes on the value 0 if X(ω) < α, which is necessary to obtain the inequality.
Exercise 5.5.3. Give examples of random variables Y with mean 0 and variance
1 such that
(a) P(|Y | > 2) = 1/4.
(b) P(|Y | > 2) < 1/4.
Proof. (a) Let Y be defined on Ω = {0, 1}³ (with the uniform distribution) by Y(ω) = 2 if ω = (1, 0, 0), Y(ω) = −2 if ω = (0, 1, 0), and Y(ω) = 0 otherwise. Then E(Y) = 2(1/8) − 2(1/8) = 0 and Var(Y) = E(Y²) = 4(1/8) + 4(1/8) = 1. Also P(|Y| ≥ 2) = P(Y = 2 or Y = −2) = P(Y = 2) + P(Y = −2) = 1/8 + 1/8 = 1/4.

(b) Let Y be defined on {0, 1} (uniform) by Y(ω) = −1 if ω = 0 and Y(ω) = 1 if ω = 1. Then E(Y) = −1/2 + 1/2 = 0 and Var(Y) = E(Y²) = 1/2 + 1/2 = 1. And clearly P(|Y| ≥ 2) = 0 < 1/4.

Exercise 5.5.4. Suppose X is a non-negative random variable with E(X) = ∞. What does Markov's inequality say in this case?

Proof. Suppose X is a non-negative random variable with E(X) = ∞. Then Markov's inequality simply says that for all α > 0, we have P(X ≥ α) ≤ ∞. That is, it tells us nothing beyond 0 ≤ P(X ≥ α) ≤ 1.

Exercise 5.5.5. Suppose Y is a random variable with finite mean μ_Y and with Var(Y) = ∞. What does Chebychev's inequality say in this case?

Proof. Suppose Y is a random variable with finite mean μ_Y and with Var(Y) = ∞. Then Chebychev's inequality simply says that for all α > 0, we have P(|Y − μ_Y| ≥ α) ≤ ∞. That is, it tells us nothing beyond 0 ≤ P(|Y − μ_Y| ≥ α) ≤ 1.
Exercise 5.5.6. For general jointly defined random variables X and Y, prove that |Corr(X, Y)| ≤ 1. [Hint: Don't forget the Cauchy-Schwarz inequality.] (Compare Exercise 4.5.11.)

Proof. Let X and Y be jointly defined random variables with means E(X) = μ_X and E(Y) = μ_Y, and with variances Var(X) = σ_X² and Var(Y) = σ_Y².
First, by the Cauchy-Schwarz inequality we see that

|Cov(X, Y)| = |E[(X − μ_X)(Y − μ_Y)]| ≤ √(E[(X − μ_X)²]E[(Y − μ_Y)²]) = √(σ_X²σ_Y²) = σ_Xσ_Y.

Therefore we have

|Corr(X, Y)| = |Cov(X, Y)|/(σ_Xσ_Y) ≤ σ_Xσ_Y/(σ_Xσ_Y) = 1.

In Exercise 4.5.11 we proved this result for a specific example involving independent standard normals (and linear combinations of them); here we generalize the result.

Exercise 5.5.7. Let a ∈ R, and let φ(x) = max(x, a) as in Exercise 4.5.2. Prove that φ is a convex function. Relate this to Jensen's inequality and to Exercise 4.5.2.

Proof. Let a ∈ R, and let φ(x) = max(x, a). Now let x, y, λ ∈ R with 0 ≤ λ ≤ 1. Then

φ(λx + (1 − λ)y) = max(λx + (1 − λ)y, λa + (1 − λ)a)
≤ max(λx, λa) + max((1 − λ)y, (1 − λ)a)
= λφ(x) + (1 − λ)φ(y).

Therefore φ is a convex function. Jensen's inequality then says that E(φ(X)) ≥ φ(E(X)), which proves Exercise 4.5.2.
Exercise 5.5.8. Let φ(x) = x².
(a) Prove that φ is a convex function.
(b) What does Jensen's inequality say for this choice of φ?
(c) Where in the text have we already seen the result of part (b)?

Proof. Let φ(x) = x².

(a) Let x, y, λ ∈ R with 0 ≤ λ ≤ 1. First we note that

x² = y² + 2y(x − y) + (x − y)² ≥ y² + 2y(x − y) = y(2x − y).

This implies that

φ(λx + (1 − λ)y) = (λx + (1 − λ)y)²
= λ²x² + 2λ(1 − λ)xy + (1 − λ)²y²
= λ²x² + 2λ(1 − λ)xy + (1 − λ)y² − λ(1 − λ)y²
= λ²x² + (1 − λ)y² + λ(1 − λ)y(2x − y)
≤ λ²x² + (1 − λ)y² + λ(1 − λ)x²
= λx² + (1 − λ)y²
= λφ(x) + (1 − λ)φ(y).

Therefore φ is a convex function.

(b) Jensen's inequality then says that E(X²) ≥ E(X)².

(c) This result is seen on page 44, where we defined variance and showed its non-negativity, i.e. Var(X) = E(X²) − E(X)² ≥ 0.

Exercise 5.5.9. Prove Cantelli's inequality, which states that if X is a random variable with finite mean m and finite variance v, then for α ≥ 0,

P(X − m ≥ α) ≤ v/(v + α²).

[Hint: First show P(X − m ≥ α) ≤ P((X − m + y)² ≥ (α + y)²) for all y ≥ 0. Then use Markov's inequality, and minimise the resulting bound over choice of y ≥ 0.]

Proof. Let X be a random variable with finite mean m and finite variance v, and let α ≥ 0. Suppose y ≥ 0. Then we see that (X − m + y)² ≥ (α + y)² if and only if |X − m + y| ≥ α + y. Then {X − m ≥ α} = {X − m + y ≥ α + y} ⊆ {|X − m + y| ≥ α + y} = {(X − m + y)² ≥ (α + y)²}. This implies that P(X − m ≥ α) ≤ P((X − m + y)² ≥ (α + y)²) for all y ≥ 0.
By Markov's inequality we then have P(X − m ≥ α) ≤ P((X − m + y)² ≥ (α + y)²) ≤ E((X − m + y)²)/(α + y)² for all y ≥ 0.
Now

E((X − m + y)²)/(α + y)² = (E((X − m)²) + 2yE(X − m) + y²)/(α + y)² = (v + y²)/(α + y)² =: f(y).

Taking the derivative of f we get

f′(y) = (2y(α + y) − 2(v + y²))/(α + y)³ = 2(αy − v)/(α + y)³,

and solving f′(y) = 0 for y gives us y* := v/α. If y < v/α we see that f′(y) < 0, and if y > v/α we see that f′(y) > 0. Therefore y* is a global minimum. Thus we have

P(X − m ≥ α) ≤ (v + (y*)²)/(α + y*)² = (v + (v/α)²)/(α + v/α)² = v(α² + v)/(α² + v)² = v/(v + α²).
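
To see how tight the bound is, here is a short Python comparison of the Cantelli bound v/(v + α²) against the exact upper tail of one concrete distribution (standard normal, so m = 0 and v = 1):

import math

for alpha in (0.5, 1.0, 2.0, 3.0):
    exact = 0.5 * (1.0 - math.erf(alpha / math.sqrt(2.0)))   # P(X >= alpha), X ~ N(0,1)
    cantelli = 1.0 / (1.0 + alpha ** 2)
    print(alpha, exact, cantelli)   # exact <= cantelli in every row
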

Exercise 5.5.10. Let X_1, X_2, . . . be a sequence of random variables, with E[X_n] = 8 and Var[X_n] = 1/√n for each n. Prove or disprove that {X_n} must converge to 8 in probability.

Proof. Let X_1, X_2, . . . be a sequence of random variables, with E[X_n] = 8 and Var[X_n] = 1/√n for each n. Let ε > 0. For each n, we have by Chebychev's inequality that P(|X_n − 8| ≥ ε) ≤ 1/(ε²√n). Thus, given N ≥ 1/ε⁶, we have for all n ≥ N that P(|X_n − 8| ≥ ε) ≤ 1/(ε²√N) ≤ ε. Therefore X_n converges to 8 in probability.
Exercise 5.5.11. Give (with proof) an example of a sequence {Y_n} of jointly-defined random variables, such that as n → ∞: (i) Y_n/n converges to 0 in probability; and (ii) Y_n/n² converges to 0 with probability 1; but (iii) Y_n/n does not converge to 0 with probability 1.

Proof. Let (Ω, F, P) be Lebesgue measure on [0,1]. Let I_1 = [0, 1/2), I_2 = [1/2, 1], I_3 = [0, 1/4), I_4 = [1/4, 1/2), I_5 = [1/2, 3/4), I_6 = [3/4, 1], I_7 = [0, 1/8), I_8 = [1/8, 1/4), etc., as on page 59, and let Y_n = n·1_{I_n}. Consider the sequence a_k = Σ_{j=1}^k 2^j. For each n > 0, let m_n be the smallest number such that n ≤ a_{m_n}; then λ(I_n) = 1/2^{m_n}, and m_n → ∞. Note also that n ≤ a_{m_n} = 2^{m_n+1} − 2 < 2^{m_n+1}, so that 1/2^{m_n} < 2/n.

(i) Given ε > 0, we have P(|Y_n/n| ≥ ε) = P(1_{I_n} ≥ ε) = 1/2^{m_n} → 0 as n → ∞. Therefore Y_n/n → 0 in probability.

(ii) Given ε > 0, we have by Markov's inequality that Σ_n P(|Y_n/n²| ≥ ε) ≤ Σ_n E(Y_n)/(n²ε) = Σ_n 1/(2^{m_n}nε) < Σ_n 2/(n²ε) < ∞. Thus by Corollary 5.2.2, Y_n/n² converges to 0 w.p. 1.

(iii) For each m, the intervals I_n for a_{m−1} < n ≤ a_m cover [0, 1]. Hence, for every ω ∈ [0, 1] and every such block, there is some n in the block with Y_n(ω)/n = 1_{I_n}(ω) = 1. This implies that P(∃ε > 0, |Y_n/n| ≥ ε i.o.) = 1, so that Y_n/n does not converge to 0 w.p. 1.
Exercise 5.5.12. Give (with proof) an example of two discrete random variables having the same mean and the same variance, but which are not identically
distributed.
Proof. Let X and Y be defined on Ω = {0, 1}³ (uniform) by X(ω) = 2 if ω = (1, 0, 0), X(ω) = −2 if ω = (0, 1, 0), and X(ω) = 0 otherwise; and Y(ω) = 1 if ω ∈ {(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)} and Y(ω) = −1 otherwise.
Then E(X) = 2(1/8) − 2(1/8) = 0 and Var(X) = E(X²) = 4(1/8) + 4(1/8) = 1. Also E(Y) = 4(1/8) − 4(1/8) = 0 and Var(Y) = E(Y²) = 4(1/8) + 4(1/8) = 1.
Let f : R → R be defined by f(x) = x⁴. Then f is Borel-measurable and E(f(X)) = E(X⁴) = 16(1/8) + 16(1/8) = 4, but E(f(Y)) = 4(1/8) + 4(1/8) = 1. Therefore X and Y are not identically distributed.


Exercise 5.5.13. Let r ∈ N. Let X_1, X_2, . . . be identically distributed random variables having finite mean m, which are r-dependent, i.e. such that X_{k_1}, X_{k_2}, . . . , X_{k_j} are independent whenever k_{i+1} − k_i > r for each i. (Thus, independent random variables are 0-dependent.) Prove that with probability one, (1/n) Σ_{i=1}^n X_i → m as n → ∞. [Hint: Break up the sum Σ_{i=1}^n X_i into r + 1 different sums.]

Proof. Let r ∈ N. Let X_1, X_2, . . . be identically distributed random variables having finite mean m, which are r-dependent.
For each i = 1, . . . , r + 1, define the sequence {a_{i,j}}_{j=1}^∞ by a_{i,j} = i + (j − 1)(r + 1); within each such sequence, consecutive indices differ by r + 1 > r, so the corresponding random variables are independent. Suppose n > r, and for each i = 1, . . . , r + 1, let u_{i,n} be the largest number such that a_{i,u_{i,n}} ≤ n (i.e. u_{i,n} = ⌊(n − i)/(r + 1)⌋ + 1). Let S_{i,n} = Σ_{j=1}^{u_{i,n}} X_{a_{i,j}}. Then (1/n) Σ_{i=1}^n X_i = (1/n) Σ_{i=1}^{r+1} S_{i,n} = Σ_{i=1}^{r+1} (u_{i,n}/n)(1/u_{i,n})S_{i,n}. For each i, we have lim_n (u_{i,n}/n) = 1/(r + 1). But we also have, by the Strong Law of Large Numbers applied to each independent subsequence, that (1/u_{i,n})S_{i,n} → m w.p. 1, so that (u_{i,n}/n)(1/u_{i,n})S_{i,n} → m/(r + 1) w.p. 1. Therefore (1/n) Σ_{i=1}^n X_i → Σ_{i=1}^{r+1} m/(r + 1) = m w.p. 1.
Exercise 5.5.14. Prove the converse of Lemma 5.2.1. That is, prove that if {X_n} converges to X almost surely, then for each ε > 0 we have P(|X_n − X| ≥ ε i.o.) = 0.

Proof. Suppose {X_n} converges to X almost surely. Let ε > 0. Since P(X_n → X) = 1 − P(∃δ > 0, |X_n − X| ≥ δ i.o.) = 1, we have P(∃δ > 0, |X_n − X| ≥ δ i.o.) = 0. Therefore P(|X_n − X| ≥ ε i.o.) ≤ P(∃δ > 0, |X_n − X| ≥ δ i.o.) = 0.
Exercise 5.5.15. Let X_1, X_2, . . . be a sequence of independent random variables with P(X_n = 3^n) = P(X_n = −3^n) = 1/2. Let S_n = X_1 + · · · + X_n.
(a) Compute E(X_n) for each n.
(b) For n ∈ N, compute R_n ≡ sup{r ∈ R; P(|S_n| ≥ r) = 1}, i.e. the largest number such that |S_n| is always at least R_n.
(c) Compute lim_n (1/n)R_n.
(d) For which ε > 0 (if any) is it the case that P((1/n)|S_n| ≥ ε) → 0?
(e) Why does this result not contradict the various laws of large numbers?

Proof. Let X_1, X_2, . . . be a sequence of independent random variables with P(X_n = 3^n) = P(X_n = −3^n) = 1/2. Let S_n = X_1 + · · · + X_n.

(a) E(X_n) = (1/2)3^n − (1/2)3^n = 0.

(b) Let n ∈ N. Since 3^n > Σ_{k=1}^{n−1} 3^k = (3/2)(3^{n−1} − 1), we have |S_n| ≥ |3^n − Σ_{k=1}^{n−1} 3^k| = 3^n − (3/2)(3^{n−1} − 1) = (3/2)(3^{n−1} + 1), with P(|S_n| = (3/2)(3^{n−1} + 1)) = 1/2^{n−1}. Thus R_n ≡ sup{r ∈ R; P(|S_n| ≥ r) = 1} = (3/2)(3^{n−1} + 1).

(c) We have lim_n (1/n)R_n = lim_n (3/2)(3^{n−1} + 1)/n = ∞.

(d) Since (1/n)R_n → ∞, for every ε > 0 we eventually have (1/n)R_n ≥ ε, so that P((1/n)|S_n| ≥ ε) → 1. Thus for no ε > 0 is it the case that P((1/n)|S_n| ≥ ε) → 0.

(e) For the first version of the weak law of large numbers we must have that {X_n} has uniformly bounded variance. However, Var(X_n) = E(X_n²) = 3^{2n}, which is unbounded. For the first version of the strong law of large numbers we must have that {X_n} has uniformly bounded fourth moments. However, E(X_n⁴) = 3^{4n}, which is unbounded. For the second version of both the strong and weak laws of large numbers we must have that {X_n} are identically distributed. However, since E(X_n²) depends on n, they are not identically distributed. Therefore the hypotheses aren't satisfied.

6.2 Exercises
Exercise 6.2.4. Why does Proposition 6.2.1 not imply that Var(X) equals the
corresponding linear combination of variances?
Proof. The function f which defines the variance depends on the random variable used, i.e. f(x) = (x − μ_X)². Var(X) does indeed equal the corresponding linear combination of integrals of this one function f, but those terms are not the variances of the individual distributions. That is, we want the variance of X with respect to the various distributions, not merely the variances of the distributions. Indeed the following is true: let f(t) = (t − μ_X)² and L(X) = (1/4)δ_1 + (1/4)δ_2 + (1/2)μ_N := μ. Then μ_X = E(X) = (1/4)(1) + (1/4)(2) + (1/2)(0) = 3/4, and

Var(X) = ∫_R f(t) μ(dt)
= ∫_R (t − μ_X)² μ(dt)
= (1/4)∫_R (t − μ_X)² δ_1(dt) + (1/4)∫_R (t − μ_X)² δ_2(dt) + (1/2)∫_R (t − μ_X)² μ_N(dt)
= (1/4)(1/16) + (1/4)(25/16) + (1/2)(25/16)
= 19/16.

6.3 Exercises
Exercise 6.3.1. Let μ have density 4x³1_{0<x<1}, and let ν have density (1/2)x1_{0<x<2}.
(a) Compute E(X) where L(X) = (1/3)μ + (2/3)ν.
(b) Compute E(Y²) where L(Y) = (1/6)μ + (1/3)δ_2 + (1/2)δ_5.
(c) Compute E(Z³) where L(Z) = (1/8)μ + (1/8)ν + (1/4)δ_3 + (1/2)δ_4.

Proof. Let μ have density 4x³1_{0<x<1}, and let ν have density (1/2)x1_{0<x<2}.

(a) Let L(X) = (1/3)μ + (2/3)ν. Then

E(X) = (1/3)E_μ(X) + (2/3)E_ν(X)
= (1/3)∫_R t·4t³1_{0<t<1}(dt) + (2/3)∫_R t·(1/2)t·1_{0<t<2}(dt)
= (4/3)∫_0^1 t⁴ dt + (1/3)∫_0^2 t² dt
= 4/15 + 8/9
= 52/45.

(b) Let L(Y) = (1/6)μ + (1/3)δ_2 + (1/2)δ_5. Then

E(Y²) = (1/6)E_μ(Y²) + (1/3)E_{δ_2}(Y²) + (1/2)E_{δ_5}(Y²)
= (1/6)∫_R t²·4t³1_{0<t<1}(dt) + (1/3)·2² + (1/2)·5²
= (2/3)∫_0^1 t⁵ dt + 4/3 + 25/2
= 1/9 + 4/3 + 25/2
= 251/18.

(c) Let L(Z) = (1/8)μ + (1/8)ν + (1/4)δ_3 + (1/2)δ_4. Then

E(Z³) = (1/8)E_μ(Z³) + (1/8)E_ν(Z³) + (1/4)E_{δ_3}(Z³) + (1/2)E_{δ_4}(Z³)
= (1/8)∫_R t³·4t³1_{0<t<1}(dt) + (1/8)∫_R t³·(1/2)t·1_{0<t<2}(dt) + (1/4)·3³ + (1/2)·4³
= (1/2)∫_0^1 t⁶ dt + (1/16)∫_0^2 t⁴ dt + 27/4 + 32
= 1/14 + 2/5 + 27/4 + 32
= 5491/140.

Exercise 6.3.2. Suppose P(Z = 0) = P(Z = 1) = 1/2, that Y ~ N(0, 1), and that Y and Z are independent. Set X = YZ. What is the law of X?

Proof. Suppose P(Z = 0) = P(Z = 1) = 1/2, that Y ~ N(0, 1), and that Y and Z are independent. Set X = YZ and let μ_N = L(Y). Let B ∈ B. Then μ_X(B) = P(X ∈ B) = P(YZ ∈ B).
Suppose 0 ∉ B; then δ_0(B) = 0 and μ_X(B) = P(YZ ∈ B) = P((Z = 1) ∩ (Y ∈ B)) = P(Z = 1)P(Y ∈ B) = (1/2)μ_N(B) = (1/2)δ_0(B) + (1/2)μ_N(B). Suppose 0 ∈ B; then μ_X(B) = P[(Z = 0) ∪ ((Z = 1) ∩ (Y ∈ B))] = P(Z = 0) + P(Z = 1)P(Y ∈ B) = (1/2)δ_0(B) + (1/2)μ_N(B).
That is, L(X) = (1/2)δ_0 + (1/2)μ_N.
Exercise 6.3.3. Let X ~ Poisson(5).
(a) Compute E(X) and Var(X).
(b) Compute E(3^X).

Proof. Let X ~ Poisson(5). Then μ_X = Σ_{j=0}^∞ (e^{−5}5^j/j!)δ_j.

(a) First we calculate the expected value:

E(X) = ∫_R t μ_X(dt)
= Σ_{j=1}^∞ (e^{−5}5^j/j!) ∫_R t δ_j(dt)
= e^{−5} Σ_{j=1}^∞ j5^j/j!
= 5e^{−5} Σ_{j=1}^∞ 5^{j−1}/(j − 1)!
= 5e^{−5} Σ_{k=0}^∞ 5^k/k!
= 5e^{−5}e^5
= 5.

Next we calculate the variance using the formula Var(X) = E(X²) − E(X)². First we have that:

E(X²) = ∫_R t² μ_X(dt)
= Σ_{j=1}^∞ (e^{−5}5^j/j!) ∫_R t² δ_j(dt)
= e^{−5} Σ_{j=1}^∞ j²5^j/j!
= 5e^{−5} Σ_{j=1}^∞ j5^{j−1}/(j − 1)!
= 5e^{−5} (Σ_{j=1}^∞ 5^{j−1}/(j − 1)! + Σ_{j=1}^∞ (j − 1)5^{j−1}/(j − 1)!)
= 5e^{−5} (Σ_{j=1}^∞ 5^{j−1}/(j − 1)! + 5 Σ_{j=2}^∞ 5^{j−2}/(j − 2)!)
= 5e^{−5} (Σ_{k=0}^∞ 5^k/k! + 5 Σ_{k=0}^∞ 5^k/k!)
= 5e^{−5}(e^5 + 5e^5)
= 5(1 + 5)
= 30.

Then finally

Var(X) = E(X²) − E(X)² = 30 − 5² = 5.

(b)

E(3^X) = ∫_R 3^t μ_X(dt)
= Σ_{j=0}^∞ (e^{−5}5^j/j!) ∫_R 3^t δ_j(dt)
= e^{−5} Σ_{j=0}^∞ 3^j5^j/j!
= e^{−5} Σ_{j=0}^∞ 15^j/j!
= e^{−5}e^{15}
= e^{10}.
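
Both answers can be checked against a truncated Poisson(5) mass function (truncation point ours):

import math

lam = 5.0
pmf = [math.exp(-lam) * lam**j / math.factorial(j) for j in range(200)]

EX = sum(j * p for j, p in enumerate(pmf))
EX2 = sum(j * j * p for j, p in enumerate(pmf))
E3X = sum(3**j * p for j, p in enumerate(pmf))

print(EX, EX2 - EX**2)       # ~ 5 and ~ 5
print(E3X, math.exp(10.0))   # both ~ 22026.47
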

Exercise 6.3.4. Compute E(X), E(X²), and Var(X), where the law of X is given by

(a) L(X) = (1/2)δ_1 + (1/2)λ, where λ is Lebesgue measure on [0,1].

(b) L(X) = (1/3)δ_{−2} + (2/3)μ_N, where μ_N is the standard normal distribution N(0, 1).

Proof. (a) Let L(X) = (1/2)δ_1 + (1/2)λ, where λ is Lebesgue measure on [0,1]. First, we compute E(X):

E(X) = (1/2)∫_R t δ_1(dt) + (1/2)∫_{[0,1]} t λ(dt) = 1/2 + (1/2)∫_0^1 t dt = 1/2 + 1/4 = 3/4.

Next, we compute E(X²):

E(X²) = (1/2)∫_R t² δ_1(dt) + (1/2)∫_{[0,1]} t² λ(dt) = 1/2 + (1/2)∫_0^1 t² dt = 1/2 + 1/6 = 2/3.

Finally, we compute Var(X):

Var(X) = E(X²) − E(X)² = 2/3 − (3/4)² = 2/3 − 9/16 = 5/48.

(b) Let L(X) = (1/3)δ_{−2} + (2/3)μ_N, where μ_N is the standard normal distribution N(0, 1), with density φ(t) = (1/√(2π))e^{−t²/2}. First, we compute E(X):

E(X) = (1/3)∫_R t δ_{−2}(dt) + (2/3)∫_R t μ_N(dt) = (1/3)(−2) + (2/3)∫_R tφ(t) dt = −2/3 + (2/3)(0) = −2/3.

Next, we compute E(X²):

E(X²) = (1/3)∫_R t² δ_{−2}(dt) + (2/3)∫_R t² μ_N(dt) = (1/3)(−2)² + (2/3)∫_R t²φ(t) dt = 4/3 + (2/3)(1) = 2.

Finally, we compute Var(X):

Var(X) = E(X²) − E(X)² = 2 − (−2/3)² = 2 − 4/9 = 14/9.

Exercise 6.3.5. Let X and Z be independent, with X ~ N(0, 1), and with P(Z = 1) = P(Z = −1) = 1/2. Let Y = XZ (i.e., Y is the product of X and Z).
(a) Prove that Y ~ N(0, 1).
(b) Prove that P(|X| = |Y|) = 1.
(c) Prove that X and Y are not independent.
(d) Prove that Cov(X, Y) = 0.
(e) It is sometimes claimed that if X and Y are normally distributed random variables with Cov(X, Y) = 0, then X and Y must be independent. Is that claim correct?

Proof. Let X and Z be independent, with X ~ N(0, 1), and with P(Z = 1) = P(Z = −1) = 1/2. Let Y = XZ. Let φ(t) = (1/√(2π))e^{−t²/2}.

(a) Let L(X) := μ_N and L(Y) := μ_Y. Let B ∈ B. Since φ(t) is an even function, we have for all x ∈ R that P(X ≤ x) = P(X ≥ −x) = P(−X ≤ x). Thus, by Proposition 6.0.2, −X ~ N(0, 1), so that μ_N(B) = P(−X ∈ B). We have that

μ_Y(B) = P(XZ ∈ B)
= P(((Z = 1) ∩ (X ∈ B)) ∪ ((Z = −1) ∩ (−X ∈ B)))
= P((Z = 1) ∩ (X ∈ B)) + P((Z = −1) ∩ (−X ∈ B))
= P(Z = 1)P(X ∈ B) + P(Z = −1)P(−X ∈ B)
= (1/2)μ_N(B) + (1/2)μ_N(B)
= μ_N(B).

Therefore Y ~ N(0, 1).

(b) Clearly

P(|Z| = 1) = P((Z = 1) ∪ (Z = −1)) = P(Z = 1) + P(Z = −1) = 1.

Thus

P(|X| = |Y|) = P(|X| = |X||Z|) ≥ P(|Z| = 1) = 1,

so that P(|X| = |Y|) = 1.

(c) Suppose on the contrary that X and Y are independent. Then by Proposition 3.2.3, X² and Y² are independent, so that E(X²Y²) = E(X²)E(Y²). However, since X and Z are independent, we have E(X²Y²) = E(X⁴Z²) = E(X⁴)E(Z²) = E(X⁴) = 3, while E(X²)E(Y²) = E(Y²) = E(X²Z²) = E(X²)E(Z²) = 1. Thus E(X²Y²) ≠ E(X²)E(Y²), which is a contradiction. Therefore X and Y are not independent.

(d) Since X and Z are independent, by Proposition 3.2.3, X² and Z are also independent. Since X ~ N(0, 1) and Y ~ N(0, 1), we have E(X) = E(Y) = 0. Also E(Z) = 1/2 − 1/2 = 0. Thus E(XY) = E(X²Z) = E(X²)E(Z) = 1·0 = 0. We also have E(X)E(Y) = 0. Therefore Cov(X, Y) = E(XY) − E(X)E(Y) = 0 − 0 = 0.

(e) The claim is not correct. We have shown in parts (a)-(d) that X and Y are normally distributed random variables with Cov(X, Y) = 0, but they are not independent.
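
A simulation sketch of parts (a), (c), and (d): Y behaves like a standard normal and Cov(X, Y) ≈ 0, yet E(X²Y²) ≈ 3 ≠ 1 ≈ E(X²)E(Y²), confirming dependence. Sample size is arbitrary.

import random

random.seed(8)
n = 200000
xs = [random.gauss(0, 1) for _ in range(n)]
zs = [random.choice([-1, 1]) for _ in range(n)]
ys = [x * z for x, z in zip(xs, zs)]

def mean(vals):
    return sum(vals) / len(vals)

print(mean(ys), mean([y * y for y in ys]))            # ~ 0 and ~ 1: Y looks N(0, 1)
print(mean([x * y for x, y in zip(xs, ys)]))          # ~ 0: Cov(X, Y) = 0
print(mean([(x * y) ** 2 for x, y in zip(xs, ys)]))   # ~ 3 = E(X^4), not 1
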

Exercise 6.3.6. Let X and Y be random variables on some probability triple (Ω, F, P). Suppose E(X⁴) < ∞, and that P[m ≤ X ≤ z] = P[m ≤ Y ≤ z] for all integers m and all z ∈ R. Prove or disprove that we necessarily have E(Y⁴) = E(X⁴).

Proof. Let X and Y be random variables on some probability triple (Ω, F, P). Suppose E(X⁴) < ∞, and that P[m ≤ X ≤ z] = P[m ≤ Y ≤ z] for all integers m and all z ∈ R. We shall prove that we necessarily have E(Y⁴) = E(X⁴). The hypothesis implies that P[X = m] = P[m ≤ X ≤ m] = P[m ≤ Y ≤ m] = P[Y = m] for all integers m. Therefore, by additivity, we have P[m < X ≤ z] = P[m < Y ≤ z] for all integers m and all z ∈ R. Let x ∈ R. If x > ⌊x⌋, let A = P[⌊x⌋ < X ≤ x] = P[⌊x⌋ < Y ≤ x], and if x = ⌊x⌋, let A = 0. Then by countable additivity we have that P(X ≤ x) = A + Σ_{n<⌊x⌋} P[n < X ≤ n + 1] = P(Y ≤ x). Thus, by Proposition 6.0.2, we have that L(X) = L(Y). Therefore, by Corollary 6.1.3, letting f(z) = z⁴, we have E(X⁴) = E(Y⁴).
Exercise 6.3.7. Let X be a random variable, and let F_X(x) be its cumulative distribution function. For fixed x ∈ R, we know by right-continuity that lim_{y↘x} F_X(y) = F_X(x).
(a) Give a necessary and sufficient condition that lim_{y↗x} F_X(y) = F_X(x).
(b) More generally, give a formula for F_X(x) − lim_{y↗x} F_X(y), in terms of a simple property of X.

Proof. Let X be a random variable, and let F_X(x) be its cumulative distribution function. Let x ∈ R.

(a) We have that lim_{y↗x} F_X(y) = lim_{y↘x} F_X(y) = F_X(x) if and only if F_X is continuous at x.

(b) Since F_X is monotonic and bounded, we have that lim_{y↗x} F_X(y) exists; let L = lim_{y↗x} F_X(y). Let n ∈ N; then by additivity we have

F_X(x) − F_X(x − 1/n) = P(x − 1/n < X ≤ x).

Then

lim_n (F_X(x) − F_X(x − 1/n)) = F_X(x) − L = lim_n P(x − 1/n < X ≤ x) = P(X = x).

That is, F_X(x) − lim_{y↗x} F_X(y) = P(X = x).

Exercise 6.3.8. Consider the statement: f(x) = (f(x))² for all x ∈ R.
(a) Prove that the statement is true for all indicator functions f = 1_B.
(b) Prove that the statement is not true for the identity function f(x) = x.
(c) Why does this fact not contradict the method of proof of Theorem 6.1.1?

Proof. (a) Let x ∈ R and let B ∈ B. Then 1_B(x)² = 1² = 1 = 1_B(x) if x ∈ B, and 1_B(x)² = 0² = 0 = 1_B(x) if x ∉ B. Therefore 1_B(x)² = 1_B(x) for all x ∈ R. Since the choice of B was also arbitrary, we have 1_B² = 1_B for all B ∈ B.

(b) Let f(x) = x. Clearly (−1)² ≠ −1.

(c) By parts (a) and (b), we have shown that this property, while true for indicator functions, is not necessarily preserved in the limit, so the method of proof of Theorem 6.1.1 (which extends a property from indicators to general functions via limits) does not apply to it.

7.2 Exercises
Exercise 7.2.4. Verify that (7.2.2) and (7.2.3) satisfy (7.2.1), and also satisfy s(0) = 0 and s(c) = 1.

Proof. First we verify that (7.2.2) satisfies (7.2.1), and also satisfies s(0) = 0 and s(c) = 1. Since q + p = 1, we have

qs(a − 1) + ps(a + 1) = [q(1 − (q/p)^{a−1}) + p(1 − (q/p)^{a+1})] / (1 − (q/p)^c)
= [p + q − q^a/p^{a−1} − q^{a+1}/p^a] / (1 − (q/p)^c)
= [1 − (pq^a + q^{a+1})/p^a] / (1 − (q/p)^c)
= [1 − q^a(p + q)/p^a] / (1 − (q/p)^c)
= (1 − (q/p)^a) / (1 − (q/p)^c)
= s(a).

Also

s(0) = (1 − (q/p)⁰)/(1 − (q/p)^c) = (1 − 1)/(1 − (q/p)^c) = 0,

and

s(c) = (1 − (q/p)^c)/(1 − (q/p)^c) = 1.

Next we verify that (7.2.3) satisfies (7.2.1). Since p = q = 1/2, we have

qs(a − 1) + ps(a + 1) = (1/2)(s(a − 1) + s(a + 1)) = ((a − 1) + (a + 1))/(2c) = a/c = s(a).

Also s(0) = 0/c = 0, and s(c) = c/c = 1.
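
The closed forms can also be checked against a direct simulation of the gambler's ruin chain; a hedged Python sketch with arbitrary parameters:

import random

def ruin_prob(a, c, p, trials=200000):
    # empirical probability of reaching c before 0, starting from a
    wins = 0
    for _ in range(trials):
        x = a
        while 0 < x < c:
            x += 1 if random.random() < p else -1
        wins += (x == c)
    return wins / trials

random.seed(9)
a, c, p = 3, 10, 0.6
q = 1 - p
exact = (1 - (q / p) ** a) / (1 - (q / p) ** c)   # (7.2.2)
print(ruin_prob(a, c, p), exact)                  # both ~ 0.716
print(ruin_prob(4, 8, 0.5), 4 / 8)                # (7.2.3): s(a) = a/c
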

Exercise 7.2.5. Solve equation (7.2.1) by direct algebra, as follows.
(a) Show that (7.2.1) implies that for 1 ≤ a ≤ c − 1, s(a + 1) − s(a) = (q/p)(s(a) − s(a − 1)).
(b) Show that this implies that for 0 ≤ a ≤ c − 1, s(a + 1) − s(a) = (q/p)^a s(1).
(c) Show that this implies that for 0 ≤ a ≤ c, s(a) = Σ_{i=0}^{a−1} (q/p)^i s(1).
(d) Use the fact that s(c) = 1 to solve for s(1), and verify (7.2.2) and (7.2.3).

Proof.
(a) Suppose that s(a) = qs(a − 1) + ps(a + 1) whenever 1 ≤ a ≤ c − 1. This implies that s(a + 1) = (1/p)(s(a) − qs(a − 1)). Subtracting s(a) from both sides we obtain

s(a + 1) − s(a) = (1/p)(s(a) − qs(a − 1)) − s(a)
= (1/p)(s(a) − qs(a − 1) − ps(a))
= (1/p)((1 − p)s(a) − qs(a − 1))
= (1/p)(qs(a) − qs(a − 1))
= (q/p)(s(a) − s(a − 1)).

(b) Suppose part (a) is true. If a = 0, then since s(0) = 0 we have s(0 + 1) − s(0) = s(1) = (q/p)⁰s(1). If a = 1, then

s(1 + 1) − s(1) = (q/p)(s(1) − s(0)) = (q/p)¹s(1).

If 1 < a ≤ c − 1, let k = a − 1. Then a − k = 1, so that 1 ≤ a − k ≤ c − 1, and

s(a + 1) − s(a) = (q/p)(s(a) − s(a − 1)) = · · · = (q/p)^k (s(a − k + 1) − s(a − k)) = (q/p)^k (s(2) − s(1)).

But we have already shown that s(2) − s(1) = (q/p)s(1). Therefore

s(a + 1) − s(a) = (q/p)^{k+1}s(1) = (q/p)^a s(1).

Therefore, for 0 ≤ a ≤ c − 1 we have s(a + 1) − s(a) = (q/p)^a s(1).

(c) Suppose 0 ≤ a ≤ c. Then by parts (a) and (b) we have, for each 1 ≤ k ≤ a, that s(k) − s(k − 1) = (q/p)^{k−1}s(1). Therefore

s(a) = s(a) − s(0) = Σ_{k=1}^a (s(k) − s(k − 1)) = Σ_{k=1}^a (q/p)^{k−1}s(1) = Σ_{k=0}^{a−1} (q/p)^k s(1).

(d) Since s(c) = 1, we have by part (c) that

s(c) = Σ_{k=0}^{c−1} (q/p)^k s(1) = 1,

so s(1) = 1/Σ_{k=0}^{c−1} (q/p)^k. Thus, if p ≠ 1/2, then

s(a) = Σ_{k=0}^{a−1} (q/p)^k / Σ_{k=0}^{c−1} (q/p)^k = [(1 − (q/p)^a)/(1 − q/p)] / [(1 − (q/p)^c)/(1 − q/p)] = (1 − (q/p)^a)/(1 − (q/p)^c),

verifying (7.2.2), and if p = q = 1/2, then

s(a) = Σ_{k=0}^{a−1} 1^k / Σ_{k=0}^{c−1} 1^k = a/c,

verifying (7.2.3).

Exercise 7.2.6. Solve equation (7.2.1) using the theory of difference equations, as follows.
(a) Show that the corresponding characteristic equation t = q + pt² has two distinct roots t_1 and t_2 when p ≠ 1/2, and one double root t_3 when p = 1/2. Solve for t_1, t_2 and t_3.
(b) When p ≠ 1/2, the theory of difference equations says that we must have s_{c,p}(a) = C_1(t_1)^a + C_2(t_2)^a for some constants C_1 and C_2. Assuming this, use the boundary conditions s_{c,p}(0) = 0 and s_{c,p}(c) = 1 to solve for C_1 and C_2. Verify (7.2.2).
(c) When p = 1/2, the theory of difference equations says that we must have s_{c,p}(a) = C_3(t_3)^a + C_4a(t_3)^a for some constants C_3 and C_4. Assuming this, use the boundary conditions to solve for C_3 and C_4. Verify (7.2.3).

Proof.
(a) Suppose t = q + pt². Then pt² − t + q = 0. If p = q = 1/2, then multiplying through by 2, we have t² − 2t + 1 = 0, so that (t − 1)² = 0. Therefore we have a double root t_3 = 1. If p ≠ 1/2, then note that

1 − 4pq = 1 − 4p(1 − p) = 1 − 4p + 4p² = 4(p² − p + 1/4) = 4(p − 1/2)² > 0,

so that pq < 1/4, and we have by the quadratic formula the distinct roots

t_{1,2} = (1 ± √(1 − 4pq))/(2p) = (1 ± √(4(p − 1/2)²))/(2p) = (1 ± |2p − 1|)/(2p).

Assume WLOG p > 1/2; then |2p − 1| = 2p − 1 and

t_{1,2} = (1 ± (2p − 1))/(2p),

so that t_1 = 1 and t_2 = (1 − p)/p = q/p.

(b) Suppose p ≠ 1/2, s_{c,p}(0) = 0, s_{c,p}(c) = 1, and s_{c,p}(a) = C_1(t_1)^a + C_2(t_2)^a for some constants C_1 and C_2. Then s_{c,p}(0) = C_1(t_1)⁰ + C_2(t_2)⁰ = 0 implies that C_1 + C_2 = 0, so that C_2 = −C_1. Thus, since s_{c,p}(c) = C_1(t_1)^c + C_2(t_2)^c = 1, we have C_1(t_1)^c − C_1(t_2)^c = 1, so that C_1 = 1/(t_1^c − t_2^c) along with C_2 = −1/(t_1^c − t_2^c). Therefore we have

s_{c,p}(a) = t_1^a/(t_1^c − t_2^c) − t_2^a/(t_1^c − t_2^c) = (t_1^a − t_2^a)/(t_1^c − t_2^c) = (1 − (q/p)^a)/(1 − (q/p)^c),

verifying (7.2.2).

(c) Suppose p = 1/2, s_{c,p}(0) = 0, s_{c,p}(c) = 1, and s_{c,p}(a) = C_3(t_3)^a + C_4a(t_3)^a for some constants C_3 and C_4. Then since t_3 = 1, we have s_{c,p}(0) = C_3 + 0·C_4 = 0, so that C_3 = 0. Also s_{c,p}(c) = cC_4 = 1, so that C_4 = 1/c. Therefore s_{c,p}(a) = a/c, verifying (7.2.3).

Exercise 7.4.1. For the stochastic process {X_n} given by (7.0.1), compute (for n, k ≥ 0)
(a) P(X_n = k).
(b) P(τ_k = n).
[Hint: These two questions do not have the same answer.]

Proof. Consider the stochastic process {X_n} given by (7.0.1).

(a) If n < k, then X_n < k, so that P(X_n = k) = 0. Suppose n ≥ k. Then P(X_n = k) = P(k heads in the first n flips) = (n choose k)/2^n.

(b) If n < k, then X_n < k, so that P(τ_k = n) = 0. Suppose n ≥ k. Then P(τ_k = n) = P(inf{m ≥ 0; X_m = k} = n) = P(X_n = k and X_{n−1} = k − 1) = P(r_n = 1 and X_{n−1} = k − 1). By part (a), we have P(X_{n−1} = k − 1) = (n−1 choose k−1)/2^{n−1}. Since the probability that the nth coin is heads is 1/2, and the nth coin flip is independent of the previous coin flips, we have P(τ_k = n) = (1/2)P(X_{n−1} = k − 1) = (n−1 choose k−1)/2^n.