
Probability with Martingales

David Williams
Statistical Laboratory, DPMMS
Cambridge University

Cambridge University Press

Published in the United States of America by Cambridge University Press, New York

© Cambridge University Press 1991

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 1991
Twelfth printing 2010

Printed in the United Kingdom at the University Press, Cambridge

A catalogue record for this publication is available from the British Library

ISBN 978-0-521-40605-5 paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Contents

Preface - please read!

A Question of Terminology

A Guide to Notation

Chapter 0: A Branching-Process Example

0.0. Introductory remarks. 0.1. Typical number of children, $X$. 0.2. Size of $n$th generation, $Z_n$. 0.3. Use of conditional expectations. 0.4. Extinction probability, $\pi$. 0.5. Pause for thought: measure. 0.6. Our first martingale. 0.7. Convergence (or not) of expectations. 0.8. Finding the distribution of $M_\infty$. 0.9. Concrete example.

PART A: FOUNDATIONS

Chapter 1: Measure Spaces

1.0. Introductory remarks. 1.1. Definitions of algebra, $\sigma$-algebra. 1.2. Examples. Borel $\sigma$-algebras, $\mathcal{B}(S)$, $\mathcal{B} = \mathcal{B}(\mathbb{R})$. 1.3. Definitions concerning set functions. 1.4. Definition of measure space. 1.5. Definitions concerning measures. 1.6. Lemma. Uniqueness of extension, $\pi$-systems. 1.7. Theorem. Carathéodory's extension theorem. 1.8. Lebesgue measure Leb on $((0,1], \mathcal{B}(0,1])$. 1.9. Lemma. Elementary inequalities. 1.10. Lemma. Monotone-convergence properties of measures. 1.11. Example/Warning.

Chapter 2: Events

2.1. Model for experiment: $(\Omega, \mathcal{F}, P)$. 2.2. The intuitive meaning. 2.3. Examples of $(\Omega, \mathcal{F})$ pairs. 2.4. Almost surely (a.s.) 2.5. Reminder: lim sup, lim inf, $\downarrow$ lim, etc. 2.6. Definitions. $\limsup E_n$, $(E_n, \text{i.o.})$. 2.7. First Borel-Cantelli Lemma (BC1). 2.8. Definitions. $\liminf E_n$, $(E_n, \text{ev})$. 2.9. Exercise.

Chapter 3: Random Variables

3.1. Definitions. $\Sigma$-measurable function, $m\Sigma$, $(m\Sigma)^+$, $b\Sigma$. 3.2. Elementary Propositions on measurability. 3.3. Lemma. Sums and products of measurable functions are measurable. 3.4. Composition Lemma. 3.5. Lemma on measurability of infs, lim infs of functions. 3.6. Definition. Random variable. 3.7. Example. Coin tossing. 3.8. Definition. $\sigma$-algebra generated by a collection of functions on $\Omega$. 3.9. Definitions. Law, Distribution Function. 3.10. Properties of distribution functions. 3.11. Existence of random variable with given distribution function. 3.12. Skorokhod representation of a random variable with prescribed distribution function. 3.13. Generated $\sigma$-algebras - a discussion. 3.14. The Monotone-Class Theorem.

Chapter 4: Independence

4.1. Definitions of independence. 4.2. The $\pi$-system Lemma; and the more familiar definitions. 4.3. Second Borel-Cantelli Lemma (BC2). 4.4. Example. 4.5. A fundamental question for modelling. 4.6. A coin-tossing model with applications. 4.7. Notation: IID RVs. 4.8. Stochastic processes; Markov chains. 4.9. Monkey typing Shakespeare. 4.10. Definition. Tail $\sigma$-algebras. 4.11. Theorem. Kolmogorov's 0-1 law. 4.12. Exercise/Warning.

Chapter 5: Integration

5.0. Notation, etc. $\mu(f) :=: \int f \, d\mu$, $\mu(f; A)$. 5.1. Integrals of non-negative simple functions, $SF^+$. 5.2. Definition of $\mu(f)$, $f \in (m\Sigma)^+$. 5.3. Monotone-Convergence Theorem (MON). 5.4. The Fatou Lemmas for functions (FATOU). 5.5. 'Linearity'. 5.6. Positive and negative parts of $f$. 5.7. Integrable function, $\mathcal{L}^1(S, \Sigma, \mu)$. 5.8. Linearity. 5.9. Dominated Convergence Theorem (DOM). 5.10. Scheffé's Lemma (SCHEFFÉ). 5.11. Remark on uniform integrability. 5.12. The standard machine. 5.13. Integrals over subsets. 5.14. The measure $f\mu$, $f \in (m\Sigma)^+$.

Chapter 6: Expectation

Introductory remarks. 6.1. Definition of expectation. 6.2. Convergence theorems. 6.3. The notation $E(X; F)$. 6.4. Markov's inequality. 6.5. Sums of non-negative RVs. 6.6. Jensen's inequality for convex functions. 6.7. Monotonicity of $\mathcal{L}^p$ norms. 6.8. The Schwarz inequality. 6.9. $\mathcal{L}^2$: Pythagoras, covariance, etc. 6.10. Completeness of $\mathcal{L}^p$ ($1 \le p < \infty$). 6.11. Orthogonal projection. 6.12. The 'elementary formula' for expectation. 6.13. Hölder from Jensen.

Chapter 7: An Easy Strong Law

7.1. 'Independence means multiply' - again! 7.2. Strong Law - first version. 7.3. Chebyshev's inequality. 7.4. Weierstrass approximation theorem.

Chapter 8: Product Measure

8.0. Introduction and advice. 8.1. Product measurable structure, $\Sigma_1 \times \Sigma_2$. 8.2. Product measure, Fubini's Theorem. 8.3. Joint laws, joint pdfs. 8.4. Independence and product measure. 8.5. $\mathcal{B}(\mathbb{R})^n = \mathcal{B}(\mathbb{R}^n)$. 8.6. The $n$-fold extension. 8.7. Infinite products of probability triples. 8.8. Technical note on the existence of joint laws.

PART B: MARTINGALE THEORY

Chapter 9: Conditional Expectation

9.1. A motivating example. 9.2. Fundamental Theorem and Definition (Kolmogorov, 1933). 9.3. The intuitive meaning. 9.4. Conditional expectation as least-squares-best predictor. 9.5. Proof of Theorem 9.2. 9.6. Agreement with traditional expression. 9.7. Properties of conditional expectation: a list. 9.8. Proofs of the properties in Section 9.7. 9.9. Regular conditional probabilities and pdfs. 9.10. Conditioning under independence assumptions. 9.11. Use of symmetry: an example.

Chapter 10: Martingales

10.1. Filtered spaces. 10.2. Adapted processes. 10.3. Martingale, supermartingale, submartingale. 10.4. Some examples of martingales. 10.5. Fair and unfair games. 10.6. Previsible process, gambling strategy. 10.7. A fundamental principle: you can't beat the system! 10.8. Stopping time. 10.9. Stopped supermartingales are supermartingales. 10.10. Doob's Optional-Stopping Theorem. 10.11. Awaiting the almost inevitable. 10.12. Hitting times for simple random walk. 10.13. Non-negative superharmonic functions for Markov chains.

Chapter 11: The Convergence Theorem

11.1. The picture that says it all. 11.2. Upcrossings. 11.3. Doob's Upcrossing Lemma. 11.4. Corollary. 11.5. Doob's 'Forward' Convergence Theorem. 11.6. Warning. 11.7. Corollary.

Chapter 12: Martingales bounded in $\mathcal{L}^2$

12.0. Introduction. 12.1. Martingales in $\mathcal{L}^2$: orthogonality of increments. 12.2. Sums of zero-mean independent random variables in $\mathcal{L}^2$. 12.3. Random signs. 12.4. A symmetrization technique: expanding the sample space. 12.5. Kolmogorov's Three-Series Theorem. 12.6. Cesàro's Lemma. 12.7. Kronecker's Lemma. 12.8. A Strong Law under variance constraints. 12.9. Kolmogorov's Truncation Lemma. 12.10. Kolmogorov's Strong Law of Large Numbers (SLLN). 12.11. Doob decomposition. 12.12. The angle-brackets process $\langle M \rangle$. 12.13. Relating convergence of $M$ to finiteness of $\langle M \rangle_\infty$. 12.14. A trivial 'Strong Law' for martingales in $\mathcal{L}^2$. 12.15. Lévy's extension of the Borel-Cantelli Lemmas. 12.16. Comments.

Chapter 13: Uniform Integrability

13.1. An 'absolute continuity' property. 13.2. Definition. UI family. 13.3. Two simple sufficient conditions for the UI property. 13.4. UI property of conditional expectations. 13.5. Convergence in probability. 13.6. Elementary proof of (BDD). 13.7. A necessary and sufficient condition for $\mathcal{L}^1$ convergence.

Chapter 14: UI Martingales

14.0. Introduction. 14.1. UI martingales. 14.2. Lévy's 'Upward' Theorem. 14.3. Martingale proof of Kolmogorov's 0-1 law. 14.4. Lévy's 'Downward' Theorem. 14.5. Martingale proof of the Strong Law. 14.6. Doob's Submartingale Inequality. 14.7. Law of the Iterated Logarithm: special case. 14.8. A standard estimate on the normal distribution. 14.9. Remarks on exponential bounds; large deviation theory. 14.10. A consequence of Hölder's inequality. 14.11. Doob's $\mathcal{L}^p$ inequality. 14.12. Kakutani's Theorem on 'product' martingales. 14.13. The Radon-Nikodým theorem. 14.14. The Radon-Nikodým theorem and conditional expectation. 14.15. Likelihood ratio; equivalent measures. 14.16. Likelihood ratio and conditional expectation. 14.17. Kakutani's Theorem revisited; consistency of LR test. 14.18. Note on Hardy spaces, etc.

Chapter 15: Applications

15.0. Introduction - please read! 15.1. A trivial martingale-representation result. 15.2. Option pricing; discrete Black-Scholes formula. 15.3. The Mabinogion sheep problem. 15.4. Proof of Lemma 15.3(c). 15.5. Proof of result 15.3(d). 15.6. Recursive nature of conditional probabilities. 15.7. Bayes' formula for bivariate normal distributions. 15.8. Noisy observation of a single random variable. 15.9. The Kalman-Bucy filter. 15.10. Harnesses entangled. 15.11. Harnesses unravelled, 1. 15.12. Harnesses unravelled, 2.

PART C: CHARACTERISTIC FUNCTIONS

Chapter 16: Basic Properties of CFs

16.1. Definition. 16.2. Elementary properties. 16.3. Some uses of characteristic functions. 16.4. Three key results. 16.5. Atoms. 16.6. Lévy's Inversion Formula. 16.7. A table.

Chapter 17: Weak Convergence

17.1. The 'elegant' definition. 17.2. A 'practical' formulation. 17.3. Skorokhod representation. 17.4. Sequential compactness for $\text{Prob}(\mathbb{R})$. 17.5. Tightness.

Chapter 18: The Central Limit Theorem

18.1. Lévy's Convergence Theorem. 18.2. $o$ and $O$ notation. 18.3. Some important estimates. 18.4. The Central Limit Theorem. 18.5. Example. 18.6. CF proof of Lemma 12.4.

APPENDICES

Chapter A1: Appendix to Chapter 1

A1.1. A non-measurable subset $A$ of $S^1$. A1.2. $d$-systems. A1.3. Dynkin's Lemma. A1.4. Proof of Uniqueness Lemma 1.6. A1.5. $\lambda$-sets: 'algebra' case. A1.6. Outer measures. A1.7. Carathéodory's Lemma. A1.8. Proof of Carathéodory's Theorem. A1.9. Proof of the existence of Lebesgue measure on $((0,1], \mathcal{B}(0,1])$. A1.10. Example of non-uniqueness of extension. A1.11. Completion of a measure space. A1.12. The Baire category theorem.

Chapter A3: Appendix to Chapter 3

A3.1. Proof of the Monotone-Class Theorem 3.14. A3.2. Discussion of generated $\sigma$-algebras.

Chapter A4: Appendix to Chapter 4

A4.1. Kolmogorov's Law of the Iterated Logarithm. A4.2. Strassen's Law of the Iterated Logarithm. A4.3. A model for a Markov chain.

Chapter A5: Appendix to Chapter 5

A5.1. Doubly monotone arrays. A5.2. The key use of Lemma 1.10(a). A5.3. 'Uniqueness of integral'. A5.4. Proof of the Monotone-Convergence Theorem.

Chapter A9: Appendix to Chapter 9

A9.1. Infinite products: setting things up. A9.2. Proof of A9.1(e).

Chapter A13: Appendix to Chapter 13

A13.1. Modes of convergence: definitions. A13.2. Modes of convergence: relationships.

Chapter A14: Appendix to Chapter 14

A14.1. The $\sigma$-algebra $\mathcal{F}_T$, $T$ a stopping time. A14.2. A special case of OST. A14.3. Doob's Optional-Sampling Theorem for UI martingales. A14.4. The result for UI submartingales.

Chapter A16: Appendix to Chapter 16

A16.1. Differentiation under the integral sign.

Chapter E: Exercises

References

Index

Preface - please read!

The most important chapter in this book is Chapter E: Exercises. I have left the interesting things for you to do. You can start now on the 'EG' exercises, but see 'More about exercises' later in this Preface.

The book, which is essentially the set of lecture notes for a third-year undergraduate course at Cambridge, is as lively an introduction as I can manage to the rigorous theory of probability. Since much of the book is devoted to martingales, it is bound to become very lively: look at those Exercises on Chapter 10! But, of course, there is that initial plod through the measure-theoretic foundations. It must be said however that measure theory, that most arid of subjects when done for its own sake, becomes amazingly more alive when used in probability, not only because it is then applied, but also because it is immensely enriched.

You cannot avoid measure theory: an event in probability is a measurable set, a random variable is a measurable function on the sample space, the expectation of a random variable is its integral with respect to the probability measure; and so on. To be sure, one can take some central results from measure theory as axiomatic in the main text, giving careful proofs in appendices; and indeed that is exactly what I have done.

Measure theory for its own sake is based on the fundamental addition rule for measures. Probability theory supplements that with the multiplication rule which describes independence; and things are already looking up. But what really enriches and enlivens things is that we deal with lots of $\sigma$-algebras, not just the one $\sigma$-algebra which is the concern of measure theory.

In planning this book, I decided for every topic what things I considered just a bit too advanced, and, often with sadness, I have ruthlessly omitted them.

For a more thorough training in many of the topics covered here, see Billingsley (1979), Chow and Teicher (1978), Chung (1968), Kingman and Taylor (1966), Laha and Rohatgi (1979), and Neveu (1965). As regards measure theory, I learnt it from Dunford and Schwartz (1958) and Halmos (1959). After reading this book, you must read the still-magnificent Breiman (1968), and, for an excellent indication of what can be done with discrete martingales, Hall and Heyde (1980).

Of course, intuition is much more important than knowledge of measure theory, and you should take every opportunity to sharpen your intuition. There is no better whetstone for this than Aldous (1989), though it is a very demanding book. For appreciating the scope of probability and for learning how to think about it, Karlin and Taylor (1981), Grimmett and Stirzaker (1982), Hall (1988), and Grimmett's recent superb book, Grimmett (1989), on percolation are strongly recommended.

More about exercises. In compiling Chapter E, which consists exactly of the homework sheets I give to Cambridge students, I have taken into account the fact that this book, like any other mathematics book, implicitly contains a vast number of other exercises, many of which are easier than those in Chapter E. I refer of course to the exercises you create by reading the statement of a result, and then trying to prove it for yourself, before you read the given proof. One other point about exercises: you will, for example, surely forgive my using expectation E in Exercises on Chapter 4 before E is treated with full rigour in Chapter 6.

Acknowledgements. My first thanks must go to the students who have endured the course on which the book is based and whose quality has made me try hard to make it worthy of them; and to those, especially David Kendall, who had developed the course before it became my privilege to teach it. My thanks to David Tranah and other staff of CUP for their help in converting the course into this book. Next, I must thank Ben Garling, James Norris and Chris Rogers without whom the book would have contained more errors and obscurities. (The many faults which surely remain in it are my responsibility.) Helen Rutherford and I typed part of the book, but the vast majority of it was typed by Sarah Shea-Simonds in a virtuoso performance worthy of Horowitz. My thanks to Helen and, most especially, to Sarah. Special thanks to my wife, Sheila, too, for all her help.

But my best thanks - and yours if you derive any benefit from the book - must go to three people whose names appear in capitals in the Index: J.L. Doob, A.N. Kolmogorov and P. Lévy: without them, there wouldn't have been much to write about, as Doob (1953) splendidly confirms.

Statistical Laboratory, Cambridge
David Williams
October 1990

A Question of Terminology

Random variables: functions or equivalence classes?

At the level of this book, the theory would be more 'elegant' if we regarded a random variable as an equivalence class of measurable functions on the sample space, two functions belonging to the same equivalence class if and only if they are equal almost everywhere. Then the conditional-expectation map

$$X \mapsto E(X \mid \mathcal{G})$$

would be a truly well-defined contraction map from $L^p(\Omega, \mathcal{F}, P)$ to $L^p(\Omega, \mathcal{G}, P)$ for $p \ge 1$; and we would not have to keep mentioning versions (representatives of equivalence classes) and would be able to avoid the endless 'almost surely' qualifications.

I have however chosen the 'inelegant' route: firstly, I prefer to work with functions, and confess to preferring

$$4 + 5 = 2 \bmod 7 \quad \text{to} \quad [4]_7 + [5]_7 = [2]_7.$$

But there is a substantive reason. I hope that this book will tempt you to progress to the much more interesting, and more important, theory where the parameter set of our process is uncountable (e.g. it may be the time-parameter set $[0, \infty)$). There, the equivalence-class formulation just will not work: the 'cleverness' of introducing quotient spaces loses the subtlety which is essential even for formulating the fundamental results on existence of continuous modifications, etc., unless one performs contortions which are hardly elegant. Even if these contortions allow one to formulate results, one would still have to use genuine functions to prove them; so where does the reality lie?!

A Guide to Notation

▶ signifies something important, ▶▶ something very important, and ▶▶▶ the Martingale Convergence Theorem.

I use ':=' to signify 'is defined to equal'. This Pascal notation is particularly convenient because it can also be used in the reversed sense.

I use analysts' (as opposed to category theorists') conventions:

▶ $\mathbb{N} := \{1, 2, 3, \ldots\} \subseteq \{0, 1, 2, \ldots\} =: \mathbb{Z}^+$.

Everyone is agreed that $\mathbb{R}^+ := [0, \infty)$.

For a set $B$ contained in some universal set $S$, $I_B$ denotes the indicator function of $B$: that is $I_B : S \to \{0, 1\}$ and

$$I_B(s) := \begin{cases} 1 & \text{if } s \in B, \\ 0 & \text{otherwise.} \end{cases}$$

For $a, b \in \mathbb{R}$, $a \wedge b := \min(a, b)$, $a \vee b := \max(a, b)$.

CF: characteristic function; DF: distribution function; pdf: probability density function.

$\sigma$-algebra, $\sigma(\mathcal{C})$ (1.1); $\sigma(Y_\gamma : \gamma \in C)$ (3.8, 3.13).

$\pi$-system (1.6); $d$-system (A1.2).

a.e.: almost everywhere (1.5)
a.s.: almost surely (2.4)
$b\Sigma$: the space of bounded $\Sigma$-measurable functions (3.1)
$\mathcal{B}(S)$: the Borel $\sigma$-algebra on $S$, $\mathcal{B} := \mathcal{B}(\mathbb{R})$ (1.2)
$C \bullet X$: discrete stochastic integral (10.6)
$d\lambda/d\mu$: Radon-Nikodým derivative (5.14)
$dQ/dP$: Likelihood Ratio (14.13)
$E(X)$: expectation $E(X) := \int_\Omega X(\omega) P(d\omega)$ of $X$ (6.3)
$E(X; F)$: $\int_F X \, dP$ (6.3)
$E(X \mid \mathcal{G})$: conditional expectation (9.3)
$(E_n, \text{ev})$: $\liminf E_n$ (2.8)
$(E_n, \text{i.o.})$: $\limsup E_n$ (2.6)
$f_X$: probability density function (pdf) of $X$ (6.12)
$f_{X,Y}$: joint pdf (8.3)
$f_{X \mid Y}$: conditional pdf (9.6)
$F_X$: distribution function of $X$ (3.9)
lim inf: for sets, (2.8)
lim sup: for sets, (2.6)
$x = \uparrow \lim x_n$: $x_n \uparrow x$ in that $x_n \le x_{n+1}$ ($\forall n$) and $x_n \to x$.
log: natural (base e) logarithm
$\mathcal{L}_X$, $\Lambda_X$: law of $X$ (3.9)
$L^p$, $\mathcal{L}^p$: Lebesgue spaces (6.7, 6.13)
Leb: Lebesgue measure (1.8)
$m\Sigma$: space of $\Sigma$-measurable functions (3.1)
$M^T$: process $M$ stopped at time $T$ (10.9)
$\langle M \rangle$: angle-brackets process (12.12)
$\mu(f)$: integral of $f$ with respect to $\mu$ (5.0, 5.2)
$\mu(f; A)$: $\int_A f \, d\mu$ (5.0, 5.2)
$\varphi_X$: CF of $X$ (Chapter 16)
$\varphi$: pdf of standard normal N(0,1) distribution
$\Phi$: DF of N(0,1) distribution
$X^T$: $X$ stopped at time $T$ (10.9)

Chapter 0

A Branching-Process Example

(This Chapter is not essential for the remainder of the book. You can start with Chapter 1 if you wish.)

0.0. Introductory remarks

The purpose of this chapter is threefold: to take something which is probably well known to you from books such as the immortal Feller (1957) or Ross (1976), so that you start on familiar ground; to make you start to think about some of the problems involved in making the elementary treatment into rigorous mathematics; and to indicate what new results appear if one applies the somewhat more advanced theory developed in this book. We stick to one example: a branching process. This is rich enough to show that the theory has some substance.

0.1. Typical number of children, X

In our model, the number of children of a typical animal (see Notes below for some interpretations of 'child' and 'animal') is a random variable $X$ with values in $\mathbb{Z}^+$. We assume that

$$P(X = 0) > 0.$$

We define the generating function $f$ of $X$ as the map $f : [0,1] \to [0,1]$, where

$$f(\theta) := E(\theta^X) = \sum_{k \in \mathbb{Z}^+} \theta^k P(X = k).$$

Standard theorems on power series imply that, for $\theta \in [0,1]$,

$$f'(\theta) = E(X \theta^{X-1}) = \sum_k k \theta^{k-1} P(X = k)$$

and

$$\mu := E(X) = f'(1) = \sum_k k P(X = k) \le \infty.$$

Of course, $f'(1)$ is here interpreted as

$$\lim_{\theta \uparrow 1} \frac{f(\theta) - f(1)}{\theta - 1} = \lim_{\theta \uparrow 1} \frac{1 - f(\theta)}{1 - \theta},$$

since $f(1) = 1$. We assume that $\mu < \infty$.

Notes. The first application of branching-process theory was to the question of survival of family names; and in that context, animal = man, and child = son.

In another context, 'animal' can be 'neutron', and 'child' of that neutron will signify a neutron released if and when the parent neutron crashes into a nucleus. Whether or not the associated branching process is supercritical can be a matter of real importance.

We can often find branching processes embedded in richer structures and can then use the results of this chapter to start the study of more interesting things.

For superb accounts of branching processes, see Athreya and Ney (1972), Harris (1963), Kendall (1966, 1975).
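As a small numerical companion to the definitions in Section 0.1, the sketch below evaluates the generating function and its term-by-term derivative for one offspring distribution. The distribution `pmf` and the function names `f` and `f_prime` are illustrative assumptions, not taken from the text; exact rational arithmetic is used so that the identities hold without rounding.

```python
from fractions import Fraction

# Hypothetical offspring distribution (an assumption for illustration):
# P(X = 0) = 1/4, P(X = 1) = 1/4, P(X = 2) = 1/2, so that P(X = 0) > 0.
pmf = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]


def f(theta, pmf=pmf):
    """Generating function f(theta) = E(theta^X) = sum_k theta^k P(X = k)."""
    return sum(p * theta**k for k, p in enumerate(pmf))


def f_prime(theta, pmf=pmf):
    """Term-by-term derivative f'(theta) = sum_k k theta^(k-1) P(X = k)."""
    return sum(k * p * theta ** (k - 1) for k, p in enumerate(pmf) if k >= 1)


# Mean number of children, mu = E(X) = f'(1).
mu = f_prime(Fraction(1))
```

With this choice, `f(1)` equals 1, `f(0)` equals `P(X = 0)`, and `mu` is the mean of the distribution, matching the statements about $f(1)$, $f(0)$ and $f'(1)$ in the text.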


0.2. Size of nth generation, $Z_n$

To be a bit formal: suppose that we are given a doubly infinite sequence

(a)   $\{X_r^{(m)} : m, r \in \mathbb{N}\}$

of independent identically distributed random variables (IID RVs), each with the same distribution as $X$:

$$P(X_r^{(m)} = k) = P(X = k).$$

The idea is that for $n \in \mathbb{Z}^+$ and $r \in \mathbb{N}$, the variable $X_r^{(n+1)}$ represents the number of children (who will be in the $(n+1)$th generation) of the $r$th animal (if there is one) in the $n$th generation. The fundamental rule therefore is that if $Z_n$ signifies the size of the $n$th generation, then

(b)   $Z_{n+1} = X_1^{(n+1)} + \cdots + X_{Z_n}^{(n+1)}$.

We assume that $Z_0 = 1$, so that (b) gives a full recursive definition of the sequence $(Z_m : m \in \mathbb{Z}^+)$ from the sequence (a). Our first task is to calculate the distribution function of $Z_n$, or equivalently to find the generating function

(c)   $f_n(\theta) := E(\theta^{Z_n}) = \sum_k \theta^k P(Z_n = k)$.
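The recursive rule (b) of Section 0.2 translates directly into a simulation. The following is a minimal sketch under stated assumptions: the offspring distribution passed as `pmf`, the seed, and the function names `next_generation` and `simulate` are all hypothetical choices made for illustration.

```python
import random


def next_generation(z, pmf, rng):
    """Rule (b): Z_{n+1} is the sum of Z_n independent copies of X.

    pmf[k] is taken to be P(X = k); an empty sum (z == 0) gives 0.
    """
    if z == 0:
        return 0
    values = range(len(pmf))
    return sum(rng.choices(values, weights=pmf, k=z))


def simulate(pmf, generations, seed=0):
    """Return one sample path [Z_0, Z_1, ..., Z_generations] with Z_0 = 1."""
    rng = random.Random(seed)
    z = 1
    path = [z]
    for _ in range(generations):
        z = next_generation(z, pmf, rng)
        path.append(z)
    return path


# Example usage with a hypothetical offspring distribution.
path = simulate([0.25, 0.25, 0.5], generations=10)
```

Once a path reaches 0 it stays there, which is the simulation counterpart of extinction.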

0.3. Use of conditional expectations

The first main result is that for $n \in \mathbb{Z}^+$ (and $\theta \in [0,1]$)

(a)   $f_{n+1}(\theta) = f_n(f(\theta))$,

so that for each $n \in \mathbb{Z}^+$, $f_n$ is the $n$-fold composition

(b)   $f_n = f \circ f \circ \cdots \circ f$.

Note that the 0-fold composition is by convention the identity map $f_0(\theta) = \theta$, in agreement with - indeed, forced by - the fact that $Z_0 = 1$.

To prove (a), we use - at the moment in intuitive fashion - the following very special case of the very useful Tower Property of Conditional Expectation:

(c)   $E(U) = E\,E(U \mid V)$;

to find the expectation of a random variable $U$, first find the conditional expectation $E(U \mid V)$ of $U$ given $V$, and then find the expectation of that. We prove the ultimate form of (c) at a later stage.

We apply (c) with $U = \theta^{Z_{n+1}}$ and $V = Z_n$:

$$E(\theta^{Z_{n+1}}) = E\,E(\theta^{Z_{n+1}} \mid Z_n).$$

Now, for $k \in \mathbb{Z}^+$, the conditional expectation of $\theta^{Z_{n+1}}$ given that $Z_n = k$ satisfies

(d)   $E(\theta^{Z_{n+1}} \mid Z_n = k) = E(\theta^{X_1^{(n+1)} + \cdots + X_k^{(n+1)}} \mid Z_n = k)$.

But $Z_n$ is constructed from variables $X_s^{(r)}$ with $r \le n$, and so $Z_n$ is independent of $X_1^{(n+1)}, \ldots, X_k^{(n+1)}$. The conditional expectation given $Z_n = k$ in the right-hand term in (d) must therefore agree with the absolute expectation

(e)   $E(\theta^{X_1^{(n+1)}} \cdots \theta^{X_k^{(n+1)}})$.

But the expression at (e) is the expectation of the product of independent random variables and as part of the family of 'Independence means multiply' results, we know that this expectation of a product may be rewritten as the product of expectations. Since (for every $n$ and $r$)

$$E(\theta^{X_r^{(n+1)}}) = f(\theta),$$

we have proved that

$$E(\theta^{Z_{n+1}} \mid Z_n = k) = f(\theta)^k,$$

and this is what it means to say that

$$E(\theta^{Z_{n+1}} \mid Z_n) = f(\theta)^{Z_n}.$$

[If $V$ takes only integer values, then when $V = k$, the conditional expectation $E(U \mid V)$ of $U$ given $V$ is equal to the conditional expectation $E(U \mid V = k)$ of $U$ given that $V = k$. (Sounds reasonable!)] Property (c) now yields

$$E\,\theta^{Z_{n+1}} = E\,f(\theta)^{Z_n},$$

and, since

$$E(\alpha^{Z_n}) = f_n(\alpha),$$

result (a) is proved.

Independence and conditional expectations are two of the main topics in this course.
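Result (a) of Section 0.3 can be checked exactly on a small example. The sketch below builds the distribution of $Z_{n+1}$ from that of $Z_n$ by conditioning on $Z_n = k$, as in step (d), and then compares generating functions. The offspring distribution and the helper names `convolve`, `next_pmf` and `gf` are assumptions introduced for illustration only.

```python
from fractions import Fraction

# Hypothetical offspring distribution: pmf[k] = P(X = k).
pmf = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]


def convolve(a, b):
    """Distribution of the sum of two independent Z+-valued variables."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def next_pmf(z_pmf, pmf):
    """Distribution of Z_{n+1} given that of Z_n.

    Conditional on Z_n = k, Z_{n+1} is a sum of k independent copies
    of X, so we mix the k-fold convolutions of pmf with weights P(Z_n = k).
    """
    size = (len(z_pmf) - 1) * (len(pmf) - 1) + 1
    out = [Fraction(0)] * size
    power = [Fraction(1)]  # distribution of the empty sum (k = 0)
    for weight in z_pmf:
        for j, x in enumerate(power):
            out[j] += weight * x
        power = convolve(power, pmf)  # move from k copies to k + 1 copies
    return out


def gf(dist, theta):
    """Generating function of a distribution on Z+ evaluated at theta."""
    return sum(p * theta**k for k, p in enumerate(dist))


z0 = [Fraction(0), Fraction(1)]  # Z_0 = 1
z1 = next_pmf(z0, pmf)
z2 = next_pmf(z1, pmf)
z3 = next_pmf(z2, pmf)
```

For any rational `theta` in $[0,1]$, `gf(z2, theta)` agrees exactly with `gf(pmf, gf(pmf, theta))`, and `gf(z3, theta)` with `gf(z2, gf(pmf, theta))`, which is the statement $f_{n+1} = f_n \circ f$ for these small $n$.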

0.4. Extinction probability, $\pi$

Let $\pi_n := P(Z_n = 0)$. Then $\pi_n = f_n(0)$, so that, by (0.3,b),

(a)   $\pi_{n+1} = f(\pi_n)$.

Measure theory confirms our intuition about the extinction probability:

(b)   $\pi := P(Z_m = 0 \text{ for some } m) = \uparrow \lim \pi_n$.

Because $f$ is continuous, it follows from (a) that

(c)   $\pi = f(\pi)$.

The function $f$ is analytic on $(0,1)$, and is non-decreasing and convex (of non-decreasing slope). Also, $f(1) = 1$ and $f(0) = P(X = 0) > 0$. The slope $f'(1)$ of $f$ at 1 is $\mu = E(X)$. The celebrated pictures opposite now make the following Theorem obvious.

THEOREM

If $E(X) > 1$, then the extinction probability $\pi$ is the unique root of the equation $\pi = f(\pi)$ which lies strictly between 0 and 1. If $E(X) \le 1$, then $\pi = 1$.

[Figure: graph of $y = f(x)$. Case 1: subcritical, $\mu = f'(1) < 1$. Clearly, $\pi = 1$. The critical case $\mu = 1$ has a similar picture.]

[Figure: graph of $y = f(x)$. Case 2: supercritical, $\mu = f'(1) > 1$. Now, $\pi < 1$.]
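The recursion $\pi_{n+1} = f(\pi_n)$ with $\pi_0 = 0$ gives a simple way to approximate the extinction probability numerically. The sketch below is one possible illustration; the two offspring distributions and the names `extinction_path` and `extinction_probability` are hypothetical, and floating-point arithmetic is used, so the results are approximations.

```python
def extinction_path(pmf, iterations):
    """Return [pi_0, pi_1, ..., pi_iterations], where pi_n = P(Z_n = 0).

    Uses pi_0 = 0 (because Z_0 = 1) and pi_{n+1} = f(pi_n), with
    f(x) = sum_k x^k P(X = k) and pmf[k] = P(X = k).
    """
    pi = 0.0
    path = [pi]
    for _ in range(iterations):
        pi = sum(p * pi**k for k, p in enumerate(pmf))
        path.append(pi)
    return path


def extinction_probability(pmf, iterations=200):
    """Approximate pi as the increasing limit of the pi_n."""
    return extinction_path(pmf, iterations)[-1]


# Hypothetical supercritical example, mu = 5/4: f(x) = 1/4 + x/4 + x^2/2,
# whose fixed points are 1/2 and 1.
supercritical = extinction_probability([0.25, 0.25, 0.5])

# Hypothetical subcritical example, mu = 3/4: f(x) = 1/2 + x/4 + x^2/4,
# whose only fixed point in [0, 1] is 1.
subcritical = extinction_probability([0.5, 0.25, 0.25])
```

In the supercritical example the iterates settle at the smaller root of $\pi = f(\pi)$, and in the subcritical example they approach 1, in line with the Theorem.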

0.5. Pause for thought: measure

Now that we have finished revising what introductory courses on probability theory say about branching-process theory, let us think about why we must find a more precise language. To be sure, the claim at (0.4,b) that

(a)   $\pi = \uparrow \lim \pi_n$

is intuitively plausible, but how could one prove it? We certainly cannot prove it at present because we have no means of stating with pure-mathematical precision what it is supposed to mean. Let us discuss this further.

Back in Section 0.2, we said 'Suppose that we are given a doubly infinite sequence $\{X_r^{(m)} : m, r \in \mathbb{N}\}$ of independent identically distributed random variables each with the same distribution as $X$'. What does this mean? A random variable is a (certain kind of) function on a sample space $\Omega$. We could follow elementary theory in taking $\Omega$ to be the set of all outcomes, in other words, taking $\Omega$ to be the Cartesian product

$$\Omega = \prod_{r, s \in \mathbb{N}} \mathbb{Z}^+,$$

the typical element $\omega$ of $\Omega$ being

$$\omega = (\omega_s^{(r)} : r \in \mathbb{N}, s \in \mathbb{N}),$$

and then setting $X_s^{(r)}(\omega) = \omega_s^{(r)}$. Now $\Omega$ is an uncountable set, so that we are outside the 'combinatorial' context which makes sense of $\pi_n$ in the elementary theory. Moreover, if one assumes the Axiom of Choice, one can prove that it is impossible to assign to all subsets of $\Omega$ a probability satisfying the 'intuitively obvious' axioms and making the $X$'s IID RVs with the correct common distribution. So, we have to know that the set of $\omega$ corresponding to the event 'extinction occurs' is one to which one can uniquely assign a probability (which will then provide a definition of $\pi$). Even then, we have to prove (a).

Example. Consider for a moment what is in some ways a bad attempt to construct a 'probability theory'. Let $\mathcal{C}$ be the class of subsets $C$ of $\mathbb{N}$ for which the 'density'

$$\rho(C) := \lim_{n \uparrow \infty} n^{-1} \, \#\{k : 1 \le k \le n; \ k \in C\}$$

exists. Let $C_n := \{1, 2, \ldots, n\}$. Then $C_n \in \mathcal{C}$ and $C_n \uparrow \mathbb{N}$ in the sense that $C_n \subseteq C_{n+1}$, $\forall n$, and also $\bigcup C_n = \mathbb{N}$. However, $\rho(C_n) = 0$, $\forall n$, but $\rho(\mathbb{N}) = 1$.

Hence the logic which will allow us correctly to deduce (a) from the fact that

$$\{Z_n = 0\} \uparrow \{\text{extinction occurs}\}$$

fails for the $(\mathbb{N}, \mathcal{C}, \rho)$ set-up: $(\mathbb{N}, \mathcal{C}, \rho)$ is not 'a probability triple'.

There are problems. Measure theory resolves them, but provides a huge bonus in the form of much deeper results such as the Martingale Convergence Theorem which we now take a first look at - at an intuitive level, I hasten to add.
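The 'density' example in Section 0.5 can be made concrete with finite approximations. The sketch below computes the proportion of $\{1, \ldots, n\}$ lying in a set described by a membership test; the function name `density_estimate` and the particular sets are hypothetical, and a finite `n` can only suggest the limit, not establish it.

```python
def density_estimate(contains, n):
    """Proportion of {1, ..., n} belonging to the set tested by contains."""
    return sum(1 for k in range(1, n + 1) if contains(k)) / n


n = 100_000

# C_5 = {1, ..., 5}: the proportion shrinks towards 0 as n grows.
small_set = density_estimate(lambda k: k <= 5, n)

# The whole of N: the proportion is 1 for every n.
whole_set = density_estimate(lambda k: True, n)

# The even numbers, for comparison: the proportion is 1/2 for even n.
evens = density_estimate(lambda k: k % 2 == 0, n)
```

Each fixed $C_m$ has estimated density close to 0 for large `n`, while their union $\mathbb{N}$ has density 1, which is the failure of monotone convergence that the Example describes.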

0.6. Our first martingale

Recall from (0.2,b) that

$$Z_{n+1} = X_1^{(n+1)} + \cdots + X_{Z_n}^{(n+1)},$$

where the $X^{(n+1)}$ variables are independent of the values $Z_1, Z_2, \ldots, Z_n$. It is clear from this that

$$P(Z_{n+1} = j \mid Z_0 = i_0, Z_1 = i_1, \ldots, Z_n = i_n) = P(Z_{n+1} = j \mid Z_n = i_n),$$

a result which you will probably recognize as stating that the process $Z = (Z_n : n \ge 0)$ is a Markov chain. We therefore have

$$E(Z_{n+1} \mid Z_0 = i_0, Z_1 = i_1, \ldots, Z_n = i_n) = \sum_j j P(Z_{n+1} = j \mid Z_n = i_n) = E(Z_{n+1} \mid Z_n = i_n),$$

or, in a condensed and better notation,

(a)   $E(Z_{n+1} \mid Z_0, Z_1, \ldots, Z_n) = E(Z_{n+1} \mid Z_n)$.

Of course, it is intuitively obvious that

(b)   $E(Z_{n+1} \mid Z_n) = \mu Z_n$,

because each of the $Z_n$ animals in the $n$th generation has on average $\mu$ children. We can confirm result (b) by differentiating the result

$$E(\theta^{Z_{n+1}} \mid Z_n) = f(\theta)^{Z_n}$$

with respect to $\theta$ and setting $\theta = 1$.

Now define

(c)   $M_n := Z_n / \mu^n, \quad n \ge 0$.

Then

$$E(M_{n+1} \mid Z_0, Z_1, \ldots, Z_n) = M_n,$$

which exactly says that

(d)   $M$ is a martingale relative to the $Z$ process.

Given the history of $Z$ up to stage $n$, the next value $M_{n+1}$ of $M$ is on average what it is now: $M$ is 'constant on average' in this very sophisticated sense of conditional expectation given 'past' and 'present'. The true statement

(e)   $E(M_n) = 1, \quad \forall n$

is of course infinitely cruder.

A statement $S$ is said to be true almost surely (a.s.) or with probability 1 if (surprise, surprise!)

$$P(S \text{ is true}) = 1.$$

Because our martingale $M$ is non-negative ($M_n \ge 0$, $\forall n$), the Martingale Convergence Theorem implies that it is almost surely true that

(f)   $M_\infty := \lim M_n$ exists.

Note that if $M_\infty > 0$ for some outcome (which can happen with positive probability only when $\mu > 1$), then the statement

$$Z_n / \mu^n \to M_\infty \quad \text{(a.s.)}$$

is a precise formulation of 'exponential growth'. A particularly fascinating question is: suppose that $\mu > 1$; what is the behaviour of $Z$ conditional on the value of $M_\infty$?

0.7. Convergence (or not) of expectations

We know that $M_\infty := \lim M_n$ exists with probability 1, and that $E(M_n) = 1$, $\forall n$. We might be tempted to believe that $E(M_\infty) = 1$. However, we already know that if $\mu \le 1$, then, almost surely, the process dies out and $M_n$ is eventually 0. Hence

(a)   if $\mu \le 1$, then $M_\infty = 0$ (a.s.) and $0 = E(M_\infty) \ne \lim E(M_n) = 1$.

This is an excellent example to keep in mind when we come to study Fatou's Lemma, valid for any sequence $(Y_n)$ of non-negative random variables:

$$E(\liminf Y_n) \le \liminf E(Y_n).$$

What is 'going wrong' at (a) is that (when $\mu \le 1$) for large $n$, the chances are that $M_n$ will be large if $M_n$ is not 0 and, very roughly speaking, this large value times its small probability will keep $E(M_n)$ at 1. See the concrete examples in Section 0.9.

Of course, it is very important to know when

(b)   $\lim E(\cdot) = E(\lim \cdot)$,

and we do spend quite a considerable time studying this. The best general theorems are rarely good enough to get the best results for concrete problems, as is evidenced by the fact that

(c)   $E(M_\infty) = 1$ if and only if both $\mu > 1$ and $E(X \log X) < \infty$,

where $X$ is the typical number of children. Of course $0 \log 0 = 0$. If $\mu > 1$ and $E(X \log X) = \infty$, then, even though the process may not die out, $M_\infty = 0$, a.s.

0.8. Finding the distribution of $M_\infty$

Since $M_n \to M_\infty$ (a.s.), it is obvious that for $\lambda > 0$,

\[ \exp(-\lambda M_n) \to \exp(-\lambda M_\infty) \quad \text{(a.s.)} \]

Now since each $M_n \ge 0$, the whole sequence $(\exp(-\lambda M_n))$ is bounded in absolute value by the constant 1, independently of the outcome of our experiment. The Bounded Convergence Theorem says that we can now assert what we would wish:

(a) $\mathbf{E} \exp(-\lambda M_\infty) = \lim \mathbf{E} \exp(-\lambda M_n)$.

Since $M_n = Z_n/\mu^n$ and $\mathbf{E}(\theta^{Z_n}) = f_n(\theta)$, we have

(b) $\mathbf{E} \exp(-\lambda M_n) = f_n(\exp(-\lambda/\mu^n))$,

so that, in principle (if very rarely in practice), we can calculate the left-hand side of (a). However, for a non-negative random variable $Y$, the distribution function $y \mapsto \mathbf{P}(Y \le y)$ is completely determined by the map

\[ \lambda \mapsto \mathbf{E} \exp(-\lambda Y) \quad \text{on } (0, \infty). \]

Hence, in principle, we can find the distribution of $M_\infty$.

We have seen that the real problem is to calculate the function

\[ L(\lambda) := \mathbf{E} \exp(-\lambda M_\infty). \]

Using (b), the fact that $f_{n+1} = f \circ f_n$, and the continuity of $L$ (another consequence of the Bounded Convergence Theorem), you can immediately establish the functional equation:

(c) $L(\lambda\mu) = f(L(\lambda))$.
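The functional equation (c) can be sanity-checked in the one case where everything is explicit. The sketch below is only an illustration, not part of the text's argument: it assumes the geometric offspring law of Section 0.9 with $\mu > 1$, and it takes the closed form of $L$ quoted there as given. The function names and the parameter value $p = 1/3$ are choices made for this example.

```python
from fractions import Fraction

def f(theta, p):
    """Generating function of the geometric law P(X = k) = p * q**k, q = 1 - p."""
    q = 1 - p
    return p / (1 - q * theta)

def L(lam, p):
    """Closed form of E exp(-lam * M_inf) quoted in Section 0.9 (case mu > 1)."""
    q = 1 - p
    return (p * lam + q - p) / (q * lam + q - p)

def functional_equation_holds(lam, p):
    """Check L(lam * mu) == f(L(lam)) exactly, using rational arithmetic."""
    mu = (1 - p) / p
    return L(lam * mu, p) == f(L(lam, p), p)

p = Fraction(1, 3)  # illustrative choice: q = 2/3, so mu = q/p = 2 > 1
print(all(functional_equation_holds(Fraction(k, 4), p) for k in range(1, 20)))
```

Exact fractions are used so that the check is an identity between rational numbers rather than a floating-point approximation.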

0.9. Concrete example


This concrete example is just about the only one in which one can calculate everything explicitly, but, in the way of mathematics, it is useful in many contexts.

We take the 'typical number of children' $X$ to have a geometric distribution:

(a) $\mathbf{P}(X = k) = pq^k \quad (k \in \mathbf{Z}^+)$,

where $0 < p < 1$, $q := 1 - p$.

Then, as you can easily check,

(b) $f(\theta) = \dfrac{p}{1 - q\theta}, \qquad \mu = \dfrac{q}{p}, \qquad \pi = \begin{cases} p/q & \text{if } q > p, \\ 1 & \text{if } q \le p. \end{cases}$

To calculate $f \circ f \circ \ldots \circ f$, we use a device familiar from the geometry of the upper half-plane. If

\[ G = \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix} \]

is a non-singular $2 \times 2$ matrix, define the fractional linear transformation:

(c) $G(\theta) = \dfrac{g_{11}\theta + g_{12}}{g_{21}\theta + g_{22}}$.

Then you can check that if $H$ is another such matrix, then

\[ G(H(\theta)) = (GH)(\theta), \]

so that composition of fractional linear transformations corresponds to matrix multiplication.

Suppose that $p \ne q$. Then, by the $S^{-1}AS = \Lambda$ method, for example, we find that the $n$th power of the matrix corresponding to $f$ is

\[ \begin{pmatrix} 0 & p \\ -q & 1 \end{pmatrix}^n = (q - p)^{-1} \begin{pmatrix} 1 & p \\ 1 & q \end{pmatrix} \begin{pmatrix} p^n & 0 \\ 0 & q^n \end{pmatrix} \begin{pmatrix} q & -p \\ -1 & 1 \end{pmatrix}, \]

so that

(d) $f_n(\theta) = \dfrac{p\mu^n(1 - \theta) + q\theta - p}{q\mu^n(1 - \theta) + q\theta - p}$.

If $\mu = q/p \le 1$, then $\lim_n f_n(\theta) = 1$, corresponding to the fact that the process dies out.

Suppose now that $\mu > 1$. Then you can easily check that, for $\lambda > 0$,

\[ L(\lambda) := \mathbf{E} \exp(-\lambda M_\infty) = \lim f_n(\exp(-\lambda/\mu^n)) = \frac{p\lambda + q - p}{q\lambda + q - p} = \pi e^{-\lambda \cdot 0} + \int_0^\infty (1 - \pi)^2 e^{-\lambda x} e^{-(1 - \pi)x} \, dx, \]

from which we deduce that

\[ \mathbf{P}(M_\infty = 0) = \pi, \]

and

\[ \mathbf{P}(x < M_\infty < x + dx) = (1 - \pi)^2 e^{-(1 - \pi)x} \, dx \quad (x > 0), \]

or, better,

\[ \mathbf{P}(M_\infty > x) = (1 - \pi) e^{-(1 - \pi)x} \quad (x > 0). \]

Suppose that $\mu < 1$. In this case, it is interesting to ask: what is the distribution of $Z_n$ conditioned by $Z_n \ne 0$? We find that

\[ \mathbf{E}(\theta^{Z_n} \mid Z_n \ne 0) = \frac{f_n(\theta) - f_n(0)}{1 - f_n(0)} = \frac{\alpha_n \theta}{1 - \beta_n \theta}, \]

where

\[ \alpha_n = \frac{p - q}{p - q\mu^n}, \qquad \beta_n = \frac{q - q\mu^n}{p - q\mu^n}, \]

so $0 < \alpha_n < 1$ and $\alpha_n + \beta_n = 1$. As $n \to \infty$, we see that

\[ \alpha_n \to 1 - \mu, \qquad \beta_n \to \mu, \]

so (this is justified)

(e) $\lim_{n \to \infty} \mathbf{P}(Z_n = k \mid Z_n \ne 0) = (1 - \mu)\mu^{k-1} \quad (k \in \mathbf{N})$.

Suppose that $\mu = 1$. You can show by induction that

\[ f_n(\theta) = \frac{n - (n - 1)\theta}{(n + 1) - n\theta}, \]

and that

\[ \mathbf{E}(e^{-\lambda Z_n/n} \mid Z_n \ne 0) \to 1/(1 + \lambda), \]

corresponding to

(f) $\mathbf{P}(Z_n/n > x \mid Z_n \ne 0) \to e^{-x}, \quad x > 0$.
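Formula (d) and the critical-case formula for $f_n$ can be checked against direct iteration of $f$. The following is a minimal sketch under stated assumptions: the helper names (`f_iter`, `f_closed`, `f_critical`) and the parameter values are hypothetical choices for this illustration, and exact rational arithmetic is used so that equality can be tested literally.

```python
from fractions import Fraction

def f(theta, p):
    """Generating function f(theta) = p / (1 - q*theta) of the geometric law."""
    return p / (1 - (1 - p) * theta)

def f_iter(theta, p, n):
    """n-fold composition f o f o ... o f evaluated at theta."""
    for _ in range(n):
        theta = f(theta, p)
    return theta

def f_closed(theta, p, n):
    """Closed form (d) for f_n, valid when p != q."""
    q = 1 - p
    mu_n = (q / p) ** n
    num = p * mu_n * (1 - theta) + q * theta - p
    den = q * mu_n * (1 - theta) + q * theta - p
    return num / den

def f_critical(theta, n):
    """Closed form for f_n in the critical case p = q = 1/2 (mu = 1)."""
    return (n - (n - 1) * theta) / ((n + 1) - n * theta)

theta = Fraction(1, 2)
print(all(f_iter(theta, Fraction(1, 3), n) == f_closed(theta, Fraction(1, 3), n)
          for n in range(1, 8)))
print(all(f_iter(theta, Fraction(1, 2), n) == f_critical(theta, n)
          for n in range(1, 8)))
```

The check is run at a single value of $\theta$ for a few values of $n$; it illustrates the formulas and is not a proof of them.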

'The Fatou factor'


We know that when $\mu \le 1$, we have $\mathbf{E}(M_n) = 1$, $\forall n$, but $\mathbf{E}(M_\infty) = 0$. Can we get some insight into this?

First consider the case when $\mu < 1$. Result (e) makes it plausible that for large $n$,

\[ \mathbf{E}(Z_n \mid Z_n \ne 0) \text{ is roughly } (1 - \mu) \sum k\mu^{k-1} = 1/(1 - \mu). \]

We know that

\[ \mathbf{P}(Z_n \ne 0) = 1 - f_n(0) \text{ is roughly } (1 - \mu)\mu^n, \]

so we should have (roughly)

\[ \mathbf{E}(M_n) = \mathbf{E}\left( \frac{Z_n}{\mu^n} \,\Big|\, Z_n \ne 0 \right) \mathbf{P}(Z_n \ne 0) = \frac{1}{(1 - \mu)\mu^n} (1 - \mu)\mu^n = 1, \]

which might help explain how the 'balance' $\mathbf{E}(M_n) = 1$ is achieved by big values times small probabilities.

Now consider the case when $\mu = 1$. Then

\[ \mathbf{P}(Z_n \ne 0) = 1/(n + 1), \]

and, from (f), $Z_n/n$ conditioned by $Z_n \ne 0$ is roughly exponential with mean 1, so that $M_n = Z_n$ conditioned by $Z_n \ne 0$ is on average of size about $n$, the correct order of magnitude for balance.

Warning. We have just been using for 'correct intuitive explanations' exactly the type of argument which might have misled us into thinking that $\mathbf{E}(M_\infty) = 1$ in the first place. But, of course, the result

\[ \mathbf{E}(M_n) = \mathbf{E}(M_n \mid Z_n \ne 0) \mathbf{P}(Z_n \ne 0) = 1 \]

is a matter of obvious fact.
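The 'big values times small probabilities' balance can be made exact in the geometric example with $\mu < 1$. The sketch below assumes the formulas of Section 0.9: the survival probability is $1 - f_n(0)$, computed here by iterating $f$, and the conditional law has generating function $\alpha_n\theta/(1 - \beta_n\theta)$, whose mean is $\alpha_n/(1 - \beta_n)^2 = 1/\alpha_n$ because $\alpha_n + \beta_n = 1$. The function name and the value $p = 3/4$ are illustrative assumptions.

```python
from fractions import Fraction

def survival_and_conditional_mean(p, n):
    """Return (P(Z_n != 0), E(Z_n | Z_n != 0), mu) for the geometric law, p > q."""
    q = 1 - p
    mu = q / p
    # P(Z_n != 0) = 1 - f_n(0), with f_n(0) obtained by iterating f from 0.
    theta = Fraction(0)
    for _ in range(n):
        theta = p / (1 - q * theta)
    survive = 1 - theta
    # Mean of the conditional law with generating function alpha*t / (1 - beta*t).
    alpha = (p - q) / (p - q * mu**n)
    cond_mean = 1 / alpha
    return survive, cond_mean, mu

for n in (1, 5, 10):
    survive, cond_mean, mu = survival_and_conditional_mean(Fraction(3, 4), n)
    # E(M_n) = E(Z_n | Z_n != 0) * P(Z_n != 0) / mu**n should equal 1 exactly.
    print(n, float(survive), float(cond_mean), cond_mean * survive / mu**n)
```

As $n$ grows the survival probability shrinks like $\mu^n$ while the conditional mean settles near $1/(1 - \mu)$, and their product divided by $\mu^n$ stays equal to 1.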

PART A: FOUNDATIONS

Chapter 1: Measure Spaces

1.0. Introductory remarks

Topology is about open sets. The characterizing property of a continuous function $f$ is that the inverse image $f^{-1}(G)$ of an open set $G$ is open.

Measure theory is about measurable sets. The characterizing property of a measurable function $f$ is that the inverse image $f^{-1}(A)$ of any measurable set is measurable.

In topology, one axiomatizes the notion of 'open set', insisting in particular that the union of any collection of open sets is open, and that the intersection of a finite collection of open sets is open.

In measure theory, one axiomatizes the notion of 'measurable set', insisting that the union of a countable collection of measurable sets is measurable, and that the intersection of a countable collection of measurable sets is also measurable. Also, the complement of a measurable set must be measurable, and the whole space must be measurable. Thus the measurable sets form a $\sigma$-algebra, a structure stable (or 'closed') under countably many set operations. Without the insistence that 'only countably many operations are allowed', measure theory would be self-contradictory - a point lost on certain philosophers of probability.

The probability that a point chosen at random on the surface of the unit sphere $S^2$ in $\mathbf{R}^3$ falls into the subset $F$ of $S^2$ is just the area of $F$ divided by the total area $4\pi$. What could be easier?

However, Banach and Tarski showed (see Wagon (1985)) that if the Axiom of Choice is assumed, as it is throughout conventional mathematics, then there exists a subset $F$ of the unit sphere $S^2$ in $\mathbf{R}^3$ such that for $3 \le k < \infty$ (and even for $k = \infty$), $S^2$ is the disjoint union of $k$ exact copies of $F$:

\[ S^2 = \bigcup_{i=1}^{k} \tau_i^{(k)} F, \]

where each $\tau_i^{(k)}$ is a rotation. If $F$ has an 'area', then that area must simultaneously be $4\pi/3, 4\pi/4, \ldots, 0$. The only conclusion is that the set $F$ is non-measurable (not Lebesgue measurable): it is so complicated that one cannot assign an area to it. Banach and Tarski have not broken the Law of Conservation of Area: they have simply operated outside its jurisdiction.

Remarks. (i) Because every rotation $\tau$ has a fixed point $x$ on $S^2$ such that $\tau(x) = x$, it is not possible to find a subset $A$ of $S^2$ and a rotation $\tau$ such that $A \cup \tau(A) = S^2$ and $A \cap \tau(A) = \emptyset$. So, we could not have taken $k = 2$.

(ii) Banach and Tarski even proved that given any two bounded subsets $A$ and $B$ of $\mathbf{R}^3$ each with non-empty interior, it is possible to decompose $A$ into a certain finite number $n$ of disjoint pieces $A = \bigcup_{i=1}^{n} A_i$ and $B$ into the same number $n$ of disjoint pieces $B = \bigcup_{i=1}^{n} B_i$, in such a way that, for each $i$, $A_i$ is Euclid-congruent to $B_i$!!! So, we can disassemble $A$ and rebuild it as $B$.

(iii) Section A1.1 (optional!) in the appendix to this chapter gives an Axiom-of-Choice construction of a non-measurable subset of $S^1$.

This chapter introduces $\sigma$-algebras, $\pi$-systems, and measures and emphasizes monotone-convergence properties of measures. We shall see in later chapters that, although not all sets are measurable, it is always the case for probability theory that enough sets are measurable.

1.1. Definitions of algebra, $\sigma$-algebra


Let $S$ be a set.

Algebra on $S$

A collection $\Sigma_0$ of subsets of $S$ is called an algebra on $S$ (or algebra of subsets of $S$) if

(i) $S \in \Sigma_0$,
(ii) $F \in \Sigma_0 \Rightarrow F^c := S \setminus F \in \Sigma_0$,
(iii) $F, G \in \Sigma_0 \Rightarrow F \cup G \in \Sigma_0$.

[Note that $\emptyset = S^c \in \Sigma_0$ and

\[ F, G \in \Sigma_0 \Rightarrow F \cap G = (F^c \cup G^c)^c \in \Sigma_0.] \]

Thus, an algebra on $S$ is a family of subsets of $S$ stable under finitely many set operations.

Exercise (optional). Let $\mathcal{C}$ be the class of subsets $C$ of $\mathbf{N}$ for which the 'density'

\[ \lim_{m \uparrow \infty} m^{-1} \, \sharp\{k : 1 \le k \le m; \; k \in C\} \]

exists. We might like to think of this density (if it exists) as 'the probability that a number chosen at random belongs to $C$'. But there are many reasons why this does not conform to a proper probability theory. (We saw one in Section 0.5.) For example, you should find elements $F$ and $G$ in $\mathcal{C}$ for which $F \cap G \notin \mathcal{C}$.

Note on terminology ('algebra versus field'). An algebra in our sense is a true algebra in the algebraists' sense with $\cap$ as product, and symmetric difference

\[ A \triangle B := (A \cup B) \setminus (A \cap B) \]

as 'sum', the underlying field of the algebra being the field with 2 elements. (This is why we prefer 'algebra of subsets' to 'field of subsets': there is no way that an algebra of subsets is a field in the algebraists' sense - unless $\Sigma_0$ is trivial, that is, $\Sigma_0 = \{S, \emptyset\}$.)

$\sigma$-algebra on $S$

A collection $\Sigma$ of subsets of $S$ is called a $\sigma$-algebra on $S$ (or $\sigma$-algebra of subsets of $S$) if $\Sigma$ is an algebra on $S$ such that whenever $F_n \in \Sigma$ $(n \in \mathbf{N})$, then

\[ \bigcup_n F_n \in \Sigma. \]

[Note that if $\Sigma$ is a $\sigma$-algebra on $S$ and $F_n \in \Sigma$ for $n \in \mathbf{N}$, then

\[ \bigcap_n F_n = \Big( \bigcup_n F_n^c \Big)^c \in \Sigma.] \]

Thus, a $\sigma$-algebra on $S$ is a family of subsets of $S$ 'stable under any countable collection of set operations'.

Note. Whereas it is usually possible to write in 'closed form' the typical element of many of the algebras of sets which we shall meet (see Section 1.8 below for a first example), it is usually impossible to write down the typical element of a $\sigma$-algebra. This is the reason for our concentrating where possible on the much simpler '$\pi$-systems'.

Measurable space

A pair $(S, \Sigma)$, where $S$ is a set and $\Sigma$ is a $\sigma$-algebra on $S$, is called a measurable space. An element of $\Sigma$ is called a $\Sigma$-measurable subset of $S$.

$\sigma(\mathcal{C})$, $\sigma$-algebra generated by a class $\mathcal{C}$ of subsets

Let $\mathcal{C}$ be a class of subsets of $S$. Then $\sigma(\mathcal{C})$, the $\sigma$-algebra generated by $\mathcal{C}$, is the smallest $\sigma$-algebra $\Sigma$ on $S$ such that $\mathcal{C} \subseteq \Sigma$. It is the intersection of all $\sigma$-algebras on $S$ which have $\mathcal{C}$ as a subclass. (Obviously, the class of all subsets of $S$ is a $\sigma$-algebra which extends $\mathcal{C}$.)
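On a finite set $S$ every algebra is automatically a $\sigma$-algebra, so $\sigma(\mathcal{C})$ can be computed by brute force: close $\mathcal{C}$ under complements and pairwise unions until nothing new appears. The sketch below is an illustration for finite $S$ only, and the function name `generated_sigma_algebra` is a hypothetical choice for this example. The Note above explains why no such explicit description is available in general.

```python
from itertools import combinations

def generated_sigma_algebra(S, C):
    """Smallest family containing C that is closed under complement and union.

    Only meaningful for a finite set S, where algebras and sigma-algebras coincide.
    """
    S = frozenset(S)
    family = {S, frozenset()} | {frozenset(c) for c in C}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for F in current:                      # close under complements
            if S - F not in family:
                family.add(S - F)
                changed = True
        for F, G in combinations(current, 2):  # close under pairwise unions
            if F | G not in family:
                family.add(F | G)
                changed = True
    return family

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
print(sorted(sorted(F) for F in sigma))
```

In this example the generated family consists of all unions of the three 'atoms' $\{1\}$, $\{2\}$, $\{3, 4\}$.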

1.2. Examples. Borel $\sigma$-algebras, $\mathcal{B}(S)$, $\mathcal{B} = \mathcal{B}(\mathbf{R})$

Let $S$ be a topological space.

$\mathcal{B}(S)$

$\mathcal{B}(S)$, the Borel $\sigma$-algebra on $S$, is the $\sigma$-algebra generated by the family of open subsets of $S$. With slight abuse of notation,

\[ \mathcal{B}(S) := \sigma(\text{open sets}). \]

$\mathcal{B} := \mathcal{B}(\mathbf{R})$

It is standard shorthand that $\mathcal{B} := \mathcal{B}(\mathbf{R})$. The $\sigma$-algebra $\mathcal{B}$ is the most important of all $\sigma$-algebras. Every subset of $\mathbf{R}$ which you meet in everyday use is an element of $\mathcal{B}$; and indeed it is difficult (but possible!) to find a subset of $\mathbf{R}$ constructed explicitly (without the Axiom of Choice) which is not in $\mathcal{B}$.

Elements of $\mathcal{B}$ can be quite complicated. However, the collection

\[ \pi(\mathbf{R}) := \{(-\infty, x] : x \in \mathbf{R}\} \]

(not a standard notation) is very easy to understand, and it is often the case that all we need to know about $\mathcal{B}$ is that

(a) $\mathcal{B} = \sigma(\pi(\mathbf{R}))$.

Proof of (a). For each $x$ in $\mathbf{R}$, $(-\infty, x] = \bigcap_{n \in \mathbf{N}} (-\infty, x + n^{-1})$, so that as a countable intersection of open sets, the set $(-\infty, x]$ is in $\mathcal{B}$.

All that remains to be proved is that every open subset $G$ of $\mathbf{R}$ is in $\sigma(\pi(\mathbf{R}))$. But every such $G$ is a countable union of open intervals, so we need only show that, for $a, b \in \mathbf{R}$ with $a < b$,

\[ (a, b) \in \sigma(\pi(\mathbf{R})). \]

But, for any $u$ with $u > a$,

\[ (a, u] = (-\infty, u] \cap (-\infty, a]^c \in \sigma(\pi(\mathbf{R})), \]

and since, for $\varepsilon = \tfrac{1}{2}(b - a)$,

\[ (a, b) = \bigcup_n (a, b - \varepsilon n^{-1}], \]

we see that $(a, b) \in \sigma(\pi(\mathbf{R}))$, and the proof is complete.

1.3. Definitions concerning set functions

Let $S$ be a set, let $\Sigma_0$ be an algebra on $S$, and let $\mu_0$ be a non-negative set function

\[ \mu_0 : \Sigma_0 \to [0, \infty]. \]

Additive

Then $\mu_0$ is called additive if $\mu_0(\emptyset) = 0$ and, for $F, G \in \Sigma_0$,

\[ F \cap G = \emptyset \Rightarrow \mu_0(F \cup G) = \mu_0(F) + \mu_0(G). \]

Countably additive

The map $\mu_0$ is called countably additive (or $\sigma$-additive) if $\mu_0(\emptyset) = 0$ and whenever $(F_n : n \in \mathbf{N})$ is a sequence of disjoint sets in $\Sigma_0$ with union $F = \bigcup F_n$ in $\Sigma_0$ (note that this is an assumption since $\Sigma_0$ need not be a $\sigma$-algebra), then

\[ \mu_0(F) = \sum_n \mu_0(F_n). \]

Of course (why?), a countably additive set function is additive.

1.4. Definition of measure space

Let $(S, \Sigma)$ be a measurable space, so that $\Sigma$ is a $\sigma$-algebra on $S$. A map

\[ \mu : \Sigma \to [0, \infty] \]

is called a measure on $(S, \Sigma)$ if $\mu$ is countably additive. The triple $(S, \Sigma, \mu)$ is then called a measure space.

1.5. Definitions concerning measures

Let $(S, \Sigma, \mu)$ be a measure space. Then $\mu$ (or indeed the measure space $(S, \Sigma, \mu)$) is called

finite if $\mu(S) < \infty$,

$\sigma$-finite if there is a sequence $(S_n : n \in \mathbf{N})$ of elements of $\Sigma$ such that

\[ \mu(S_n) < \infty \; (\forall n \in \mathbf{N}) \quad \text{and} \quad \bigcup S_n = S. \]

Warning. Intuition is usually OK for finite measures, and adapts well for $\sigma$-finite measures. However, measures which are not $\sigma$-finite can be crazy; fortunately, there are no such measures in this book.

Probability measure, probability triple

Our measure $\mu$ is called a probability measure if

\[ \mu(S) = 1, \]

and $(S, \Sigma, \mu)$ is then called a probability triple.

$\mu$-null element of $\Sigma$, almost everywhere (a.e.)

An element $F$ of $\Sigma$ is called $\mu$-null if $\mu(F) = 0$. A statement $\mathcal{S}$ about points $s$ of $S$ is said to hold almost everywhere (a.e.) if

\[ F := \{s : \mathcal{S}(s) \text{ is false}\} \in \Sigma \quad \text{and} \quad \mu(F) = 0. \]

1.6. LEMMA. Uniqueness of extension, $\pi$-systems

Moral: $\sigma$-algebras are 'difficult', but $\pi$-systems are 'easy'; so we aim to work with the latter.

► (a) Let $S$ be a set. Let $\mathcal{I}$ be a $\pi$-system on $S$, that is, a family of subsets of $S$ stable under finite intersection:

\[ I_1, I_2 \in \mathcal{I} \Rightarrow I_1 \cap I_2 \in \mathcal{I}. \]

Let $\Sigma := \sigma(\mathcal{I})$. Suppose that $\mu_1$ and $\mu_2$ are measures on $(S, \Sigma)$ such that $\mu_1(S) = \mu_2(S) < \infty$ and $\mu_1 = \mu_2$ on $\mathcal{I}$. Then

\[ \mu_1 = \mu_2 \text{ on } \Sigma. \]

► (b) Corollary. If two probability measures agree on a $\pi$-system, then they agree on the $\sigma$-algebra generated by that $\pi$-system.

The example $\mathcal{B} = \sigma(\pi(\mathbf{R}))$ is of course the most important example of the $\Sigma = \sigma(\mathcal{I})$ in the theorem.

This result will play an important role. Indeed, it will be applied more frequently than will the celebrated existence result in Section 1.7. Because of this, the proof of Lemma 1.6 given in Sections A1.2-1.4 of the appendix to this chapter should perhaps be consulted - but read the remainder of this chapter first.

1.7. THEOREM. Carathéodory's Extension Theorem

► Let $S$ be a set, let $\Sigma_0$ be an algebra on $S$, and let

\[ \Sigma := \sigma(\Sigma_0). \]

If $\mu_0$ is a countably additive map $\mu_0 : \Sigma_0 \to [0, \infty]$, then there exists a measure $\mu$ on $(S, \Sigma)$ such that

\[ \mu = \mu_0 \text{ on } \Sigma_0. \]

If $\mu_0(S) < \infty$, then, by Lemma 1.6, this extension is unique - an algebra is a $\pi$-system!

In a sense, this result should have more ► signs than any other, for without it we could not construct any interesting models. However, once we have our model, we make no further use of the theorem.

The proof of this result given in Sections A1.5-1.8 of the appendix is there for completeness. It will do no harm to assume the result for this course. Let us now see how the theorem is used.

1.8. Lebesgue measure Leb on $((0,1], \mathcal{B}(0,1])$

Let $S = (0,1]$. For $F \subseteq S$, say that $F \in \Sigma_0$ if $F$ may be written as a finite union

(*) $F = (a_1, b_1] \cup \ldots \cup (a_r, b_r]$

where $r \in \mathbf{N}$, $0 \le a_1 \le b_1 \le \ldots \le a_r \le b_r \le 1$. Then $\Sigma_0$ is an algebra on $(0,1]$ and

\[ \Sigma := \sigma(\Sigma_0) = \mathcal{B}(0,1]. \]

(We write $\mathcal{B}(0,1]$ instead of $\mathcal{B}((0,1])$.) For $F$ as at (*), let

\[ \mu_0(F) = \sum_{k \le r} (b_k - a_k). \]

Then $\mu_0$ is well-defined and additive on $\Sigma_0$ (this is easy). Moreover, $\mu_0$ is countably additive on $\Sigma_0$. (This is not trivial. See Section A1.9.) Hence, by Theorem 1.7, there exists a unique measure $\mu$ on $((0,1], \mathcal{B}(0,1])$ extending $\mu_0$ on $\Sigma_0$. This measure $\mu$ is called Lebesgue measure on $((0,1], \mathcal{B}(0,1])$ or (loosely) Lebesgue measure on $(0,1]$. We shall often denote $\mu$ by Leb. Lebesgue measure (still denoted by Leb) on $([0,1], \mathcal{B}[0,1])$ is of course obtained by a trivial modification, the set $\{0\}$ having Lebesgue measure 0. Of course, Leb makes precise the concept of length.

In a similar way, we can construct ($\sigma$-finite) Lebesgue measure (which we also denote by Leb) on $\mathbf{R}$ (more strictly, on $(\mathbf{R}, \mathcal{B}(\mathbf{R}))$).
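The claim that $\mu_0$ is well-defined means that two different ways of writing the same $F$ as a finite union of intervals $(a, b]$ give the same total length. The sketch below illustrates this numerically; it is not the proof. It assumes all intervals lie in $(0,1]$, it accepts overlapping or unsorted intervals (a convenience beyond the representation (*)), and the function name `mu0` is chosen for this example.

```python
from fractions import Fraction

def mu0(intervals):
    """Length of a finite union of half-open intervals (a, b] inside (0, 1].

    Overlaps are counted once, so different representations of the same set
    give the same value.
    """
    total = Fraction(0)
    covered_up_to = Fraction(0)          # everything counted so far lies at or below this
    for a, b in sorted(intervals):       # process intervals by left endpoint
        start = max(a, covered_up_to)    # skip the part already counted
        if b > start:
            total += b - start
            covered_up_to = b
    return total

half = Fraction(1, 2)
print(mu0([(Fraction(0), half), (half, Fraction(1))]))                     # two pieces
print(mu0([(Fraction(0), Fraction(1))]))                                   # one piece
print(mu0([(Fraction(0), half), (Fraction(1, 4), Fraction(3, 4))]))        # overlapping pieces
```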

1.9. LEMMA. Elementary inequalities

Let $(S, \Sigma, \mu)$ be a measure space. Then

(a) $\mu(A \cup B) \le \mu(A) + \mu(B) \quad (A, B \in \Sigma)$,

► (b) $\mu\big(\bigcup_{i \le n} F_i\big) \le \sum_{i \le n} \mu(F_i) \quad (F_1, F_2, \ldots, F_n \in \Sigma)$.

Furthermore, if $\mu(S) < \infty$, then

(c) $\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B) \quad (A, B \in \Sigma)$,

(d) (inclusion-exclusion formula): for $F_1, F_2, \ldots, F_n \in \Sigma$,

\[ \mu\Big(\bigcup_{i \le n} F_i\Big) = \sum_{i \le n} \mu(F_i) - \mathop{\sum\sum}_{i < j \le n} \mu(F_i \cap F_j) + \ldots + (-1)^{n-1} \mu(F_1 \cap F_2 \cap \ldots \cap F_n), \]

successive partial sums alternating between over- and under-estimates.

You will surely have seen some version of these results previously. Result (c) is obvious because $A \cup B$ is the disjoint union $A \cup (B \setminus (A \cap B))$. But (c) $\Rightarrow$ (a) $\Rightarrow$ (b) - check that 'infinities do not matter'. You can deduce (d) from (c) by induction, but, as we shall see later, the neat way to prove (d) is by integration.
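The alternation of over- and under-estimates in (d) is easy to see with counting measure on a small finite set. The following sketch assumes $\mu$ is counting measure, so that $\mu(F)$ is the number of elements of $F$; the function name and the three example sets are illustrative choices.

```python
from itertools import combinations

def inclusion_exclusion_partial_sums(sets):
    """Partial sums of the inclusion-exclusion formula for counting measure."""
    sets = [frozenset(F) for F in sets]
    partial, total = [], 0
    for r in range(1, len(sets) + 1):
        # Sum of the measures of all r-fold intersections.
        term = sum(len(frozenset.intersection(*combo))
                   for combo in combinations(sets, r))
        total += (-1) ** (r - 1) * term
        partial.append(total)
    return partial

sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
union_size = len(set().union(*sets))
print(union_size, inclusion_exclusion_partial_sums(sets))
```

Here the union has 5 elements, and the partial sums 9, 4, 5 lie alternately above and below it before landing on the exact value.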

1.10. LEMMA. Monotone-convergence properties of measures

These results are often needed for making things rigorous. (Peep ahead to the 'Monkey typing Shakespeare' Section 4.9.) Again, let $(S, \Sigma, \mu)$ be a measure space.

► (a) If $F_n \in \Sigma$ $(n \in \mathbf{N})$ and $F_n \uparrow F$, then $\mu(F_n) \uparrow \mu(F)$.

Notes. $F_n \uparrow F$ means: $F_n \subseteq F_{n+1}$ $(\forall n \in \mathbf{N})$, $\bigcup F_n = F$. Result (a) is the fundamental property of measure.

Proof of (a). Write $G_1 := F_1$, $G_n := F_n \setminus F_{n-1}$ $(n \ge 2)$. Then the sets $G_n$ $(n \in \mathbf{N})$ are disjoint, and

\[ \mu(F_n) = \mu(G_1 \cup G_2 \cup \ldots \cup G_n) = \sum_{k \le n} \mu(G_k) \uparrow \sum_{k < \infty} \mu(G_k) = \mu(F). \quad \Box \]

Application. In a proper formulation of the branching-process example of Chapter 0, $\{Z_n = 0\} \uparrow \{\text{extinction occurs}\}$, so that $\pi_n \uparrow \pi$. (A proper formulation of the branching-process example will be given later.)

► (b) If $G_n \in \Sigma$, $G_n \downarrow G$ and $\mu(G_k) < \infty$ for some $k$, then $\mu(G_n) \downarrow \mu(G)$.

Proof of (b). For $n \in \mathbf{N}$, let $F_n := G_k \setminus G_{k+n}$, and now apply part (a).

Example - to indicate what can 'go wrong'. For $n \in \mathbf{N}$, let $H_n := (n, \infty)$. Then

\[ \mathrm{Leb}(H_n) = \infty, \; \forall n, \quad \text{but} \quad H_n \downarrow \emptyset. \]

► (c) The union of a countable number of $\mu$-null sets is $\mu$-null.

This is a trivial corollary of results (1.9,b) and (1.10,a).

1.11. Example/Warning

Let $(S, \Sigma, \mu)$ be $([0,1], \mathcal{B}[0,1], \mathrm{Leb})$. Let $\varepsilon(k)$ be a sequence of strictly positive numbers such that $\varepsilon(k) \downarrow 0$. For a single point $x$ of $S$, we have

(a) $\{x\} \subseteq (x - \varepsilon(k), x + \varepsilon(k)) \cap S$,

so that for every $k$, $\mu(\{x\}) \le 2\varepsilon(k)$, and so $\mu(\{x\}) = 0$. That $\{x\}$ is $\mathcal{B}(S)$-measurable follows because $\{x\}$ is the intersection of the countable number of open subsets of $S$ on the right-hand side of (a).

Let $V = \mathbf{Q} \cap [0,1]$, the set of rationals in $[0,1]$. Since $V$ is a countable union of singletons: $V = \{v_n : n \in \mathbf{N}\}$, it is clear that $V$ is $\mathcal{B}[0,1]$-measurable and that $\mathrm{Leb}(V) = 0$. We can include $V$ in an open subset of $S$ of measure at most $4\varepsilon(k)$ as follows:

\[ V \subseteq G_k = \bigcup_{n \in \mathbf{N}} \big[ (v_n - \varepsilon(k)2^{-n}, \; v_n + \varepsilon(k)2^{-n}) \cap S \big] =: \bigcup_n I_{n,k}. \]

Clearly, $H := \bigcap_k G_k$ satisfies $\mathrm{Leb}(H) = 0$ and $V \subseteq H$. Now, it is a consequence of the Baire category theorem (see the appendix to this chapter) that $H$ is uncountable, so

(b) the set $H$ is an uncountable set of measure 0; moreover,

\[ H = \bigcap_k \bigcup_n I_{n,k} \ne \bigcup_n \bigcap_k I_{n,k} = V. \]

Throughout the subject, we have to be careful about interchanging orders of operations.

Chapter 2: Events

2.1. Model for experiment: $(\Omega, \mathcal{F}, \mathbf{P})$

A model for an experiment involving randomness takes the form of a probability triple $(\Omega, \mathcal{F}, \mathbf{P})$ in the sense of Section 1.5.

Sample space

$\Omega$ is a set called the sample space.

Sample point

A point $\omega$ of $\Omega$ is called a sample point.

Event

The $\sigma$-algebra $\mathcal{F}$ on $\Omega$ is called the family of events, so that an event is an element of $\mathcal{F}$, that is, an $\mathcal{F}$-measurable subset of $\Omega$.

By definition of probability triple, $\mathbf{P}$ is a probability measure on $(\Omega, \mathcal{F})$.

2.2. The intuitive meaning

Tyche, Goddess of Chance, chooses a point $\omega$ of $\Omega$ 'at random' according to the law $\mathbf{P}$ in that, for $F$ in $\mathcal{F}$, $\mathbf{P}(F)$ represents the 'probability' (in the sense understood by our intuition) that the point $\omega$ chosen by Tyche belongs to $F$.

The chosen point $\omega$ determines the outcome of the experiment. Thus there is a map

\[ \Omega \to \text{set of outcomes}, \qquad \omega \mapsto \text{outcome}. \]

There is no reason why this 'map' (the co-domain lies in our intuition!) should be one-one. Often it is the case that although there is some obvious 'minimal' or 'canonical' model for an experiment, it is better to use some richer model. (For example, we can read off many properties of coin tossing by imbedding the associated random walk in a Brownian motion.)

2.3. Examples of $(\Omega, \mathcal{F})$ pairs

We leave the question of assigning probabilities until later.

(a) Experiment: Toss coin twice. We can take

\[ \Omega = \{HH, HT, TH, TT\}, \qquad \mathcal{F} = \mathcal{P}(\Omega) := \text{set of all subsets of } \Omega. \]

In this model, the intuitive event 'At least one head is obtained' is described by the mathematical event (element of $\mathcal{F}$) $\{HH, HT, TH\}$.

(b) Experiment: Toss coin infinitely often. We can take

\[ \Omega = \{H, T\}^{\mathbf{N}}, \]

so that a typical point $\omega$ of $\Omega$ is a sequence

\[ \omega = (\omega_1, \omega_2, \ldots), \qquad \omega_n \in \{H, T\}. \]

We certainly wish to speak of the intuitive event '$\omega_n = W$', where $W \in \{H, T\}$, and it is natural to choose

\[ \mathcal{F} = \sigma(\{\omega \in \Omega : \omega_n = W\} : n \in \mathbf{N}, W \in \{H, T\}). \]

Although $\mathcal{F} \ne \mathcal{P}(\Omega)$ (accept this!), it turns out that $\mathcal{F}$ is big enough; for example, we shall see in Section 3.7 that the truth set

\[ F = \left\{ \omega : \frac{\sharp(k \le n : \omega_k = H)}{n} \to \frac{1}{2} \right\} \]

of the statement

\[ \frac{\text{number of heads in } n \text{ tosses}}{n} \to \frac{1}{2} \]

is an element of $\mathcal{F}$.

Note that we can use the current model as a more informative model for the experiment in (a), using the map $\omega \mapsto (\omega_1, \omega_2)$ of sample points to outcomes.

(c) Experiment: Choose a point between 0 and 1 uniformly at random. Take

\[ \Omega = [0,1], \qquad \mathcal{F} = \mathcal{B}[0,1], \]

$\omega$ signifying the point chosen. In this case, we obviously take $\mathbf{P} = \mathrm{Leb}$. The sense in which this model contains model (b) for the case of a fair coin will be explained later.
2.4. Almost surely (a.s.)

► A statement $S$ about outcomes is said to be true almost surely (a.s.), or with probability 1 (w.p.1), if

\[ F := \{\omega : S(\omega) \text{ is true}\} \in \mathcal{F} \quad \text{and} \quad \mathbf{P}(F) = 1. \]

(a) Proposition. If $F_n \in \mathcal{F}$ $(n \in \mathbf{N})$ and $\mathbf{P}(F_n) = 1$, $\forall n$, then

\[ \mathbf{P}\Big(\bigcap_n F_n\Big) = 1. \]

Proof. $\mathbf{P}(F_n^c) = 0$, $\forall n$, so, by Lemma 1.10(c), $\mathbf{P}(\bigcup_n F_n^c) = 0$. But $\bigcap F_n = (\bigcup F_n^c)^c$. $\Box$

(b) Something to think about. Some distinguished philosophers have tried to develop probability without measure theory. One of the reasons for difficulty is the following.

When the discussion (2.3,b) is extended to define the appropriate probability measure for fair coin tossing, the Strong Law of Large Numbers (SLLN) states that $F \in \mathcal{F}$ and $\mathbf{P}(F) = 1$, where $F$, the truth set of the statement 'proportion of heads in $n$ tosses $\to \frac{1}{2}$', is defined formally in (2.3,b).

Let $A$ be the set of all maps $\alpha : \mathbf{N} \to \mathbf{N}$ such that $\alpha(1) < \alpha(2) < \ldots$ . For $\alpha \in A$, let

\[ F_\alpha = \left\{ \omega : \frac{\sharp(k \le n : \omega_{\alpha(k)} = H)}{n} \to \frac{1}{2} \right\}, \]

the 'truth set of the Strong Law for the subsequence $\alpha$'. Then, of course, we have $\mathbf{P}(F_\alpha) = 1$, $\forall \alpha \in A$.

Exercise. Prove that

\[ \bigcap_{\alpha \in A} F_\alpha = \emptyset. \]

(Hint. For any given $\omega$, find an $\alpha$ ... .)

The moral is that the concept of 'almost surely' gives us (i) absolute precision, but also (ii) enough flexibility to avoid the self-contradictions into which those innocent of measure theory too easily fall. (Of course, since philosophers are pompous where we are precise, they are thought to think deeply ... .)

2.5. Reminder: lim sup, lim inf, $\downarrow$ lim, etc.

(a) Let $(x_n : n \in \mathbf{N})$ be a sequence of real numbers. We define

\[ \limsup x_n := \inf_m \Big\{ \sup_{n \ge m} x_n \Big\} = \downarrow\!\lim_m \Big\{ \sup_{n \ge m} x_n \Big\} \in [-\infty, \infty]. \]

Obviously, $y_m := \sup_{n \ge m} x_n$ is monotone non-increasing in $m$, so that the limit of the sequence $y_m$ exists in $[-\infty, \infty]$. The use of $\uparrow\!\lim$ or $\downarrow\!\lim$ to signify monotone limits will be handy, as will $y_n \downarrow y_\infty$ to signify $y_\infty = \downarrow\!\lim y_n$.

(b) Analogously,

\[ \liminf x_n := \sup_m \Big\{ \inf_{n \ge m} x_n \Big\} = \uparrow\!\lim_m \Big\{ \inf_{n \ge m} x_n \Big\} \in [-\infty, \infty]. \]

(c) We have

\[ x_n \text{ converges in } [-\infty, \infty] \iff \limsup x_n = \liminf x_n, \]

and then $\lim x_n = \limsup x_n = \liminf x_n$.

► (d) Note that

(i) if $z > \limsup x_n$, then $x_n < z$ eventually (that is, for all sufficiently large $n$);

(ii) if $z < \limsup x_n$, then $x_n > z$ infinitely often (that is, for infinitely many $n$).

2.6. Definitions. $\limsup E_n$, $(E_n, \text{i.o.})$

The event (in the rigorous formulation: the truth set of the statement) 'number of heads / number of tosses $\to \frac{1}{2}$' is built out of simple events such as 'the $n$th toss results in heads' in a rather complicated way. We need a systematic method of being able to handle complicated combinations of events. The idea of taking lim infs and lim sups of sets provides what is required.

It might be helpful to note the tautology that, if $E$ is an event, then

\[ E = \{\omega : \omega \in E\}. \]

Suppose now that $(E_n : n \in \mathbf{N})$ is a sequence of events.

► (a) We define

\[ (E_n, \text{i.o.}) := (E_n \text{ infinitely often}) := \limsup E_n := \bigcap_m \bigcup_{n \ge m} E_n \]
\[ = \{\omega : \text{for every } m, \; \exists n(\omega) \ge m \text{ such that } \omega \in E_{n(\omega)}\} \]
\[ = \{\omega : \omega \in E_n \text{ for infinitely many } n\}. \]

► (b) (Reverse Fatou Lemma - needs FINITENESS of $\mathbf{P}$)

\[ \mathbf{P}(\limsup E_n) \ge \limsup \mathbf{P}(E_n). \]

Proof. Let $G_m := \bigcup_{n \ge m} E_n$. Then (look at the definition in (a)) $G_m \downarrow G$, where $G := \limsup E_n$. By result (1.10,b), $\mathbf{P}(G_m) \downarrow \mathbf{P}(G)$. But, clearly,

\[ \mathbf{P}(G_m) \ge \sup_{n \ge m} \mathbf{P}(E_n). \]

Hence,

\[ \mathbf{P}(G) \ge \downarrow\!\lim_m \Big\{ \sup_{n \ge m} \mathbf{P}(E_n) \Big\} =: \limsup \mathbf{P}(E_n). \quad \Box \]

2.7. First Borel-Cantelli Lemma (BC1)

►► Let $(E_n : n \in \mathbf{N})$ be a sequence of events such that $\sum_n \mathbf{P}(E_n) < \infty$. Then

\[ \mathbf{P}(\limsup E_n) = \mathbf{P}(E_n, \text{i.o.}) = 0. \]

Proof. With the notation of (2.6,b), we have, for each $m$,

\[ \mathbf{P}(G) \le \mathbf{P}(G_m) \le \sum_{n \ge m} \mathbf{P}(E_n), \]

using (1.9,b) and (1.10,a). Now let $m \uparrow \infty$. $\Box$

Notes. (i) An instructive proof by integration will be given later.

(ii) Many applications of the First Borel-Cantelli Lemma will be given within this course. Interesting applications require concepts of independence, random variables, etc.

2.8. Definitions. $\liminf E_n$, $(E_n, \text{ev})$

Again suppose that $(E_n : n \in \mathbf{N})$ is a sequence of events.

► (a) We define

\[ (E_n, \text{ev}) := (E_n \text{ eventually}) := \liminf E_n := \bigcup_m \bigcap_{n \ge m} E_n \]
\[ = \{\omega : \text{for some } m(\omega), \; \omega \in E_n, \; \forall n \ge m(\omega)\} \]
\[ = \{\omega : \omega \in E_n \text{ for all large } n\}. \]

(b) Note that $(E_n, \text{ev})^c = (E_n^c, \text{i.o.})$.

►► (c) (Fatou's Lemma for sets - true for ALL measure spaces)

\[ \mathbf{P}(\liminf E_n) \le \liminf \mathbf{P}(E_n). \]

Exercise. Prove this in analogy with the proof of result (2.6,b), using (1.10,a) rather than (1.10,b).
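For a sequence of events on a finite $\Omega$ that becomes periodic after some index, the definitions $\bigcap_m \bigcup_{n \ge m} E_n$ and $\bigcup_m \bigcap_{n \ge m} E_n$ can be evaluated exactly, since every tail from the periodic part onwards is determined by one full period. The sketch below relies on that assumption; the parameters `start` and `period` and the example sequence `E` are hypothetical. It also checks the identity $(E_n, \text{ev})^c = (E_n^c, \text{i.o.})$ from (2.8,b) on the example.

```python
def tail_union(E, m, start, period):
    """Union of E(n) over n >= m, for E periodic with the given period from `start`."""
    stop = max(m, start) + period        # one full period beyond the transient part
    return frozenset().union(*(E(n) for n in range(m, stop)))

def tail_intersection(E, m, start, period):
    """Intersection of E(n) over n >= m, under the same periodicity assumption."""
    stop = max(m, start) + period
    return frozenset.intersection(*(frozenset(E(n)) for n in range(m, stop)))

def lim_sup(E, start, period):
    """(E_n, i.o.): intersection over m of the tail unions (tails agree for m >= start)."""
    return frozenset.intersection(
        *(tail_union(E, m, start, period) for m in range(start + 1)))

def lim_inf(E, start, period):
    """(E_n, ev): union over m of the tail intersections."""
    return frozenset().union(
        *(tail_intersection(E, m, start, period) for m in range(start + 1)))

OMEGA = frozenset(range(6))

def E(n):
    """Example: two transient terms, then alternation between {2, 3} and {3, 4}."""
    if n < 2:
        return frozenset({0, 1})
    return frozenset({2, 3}) if n % 2 == 0 else frozenset({3, 4})

print(sorted(lim_sup(E, 2, 2)), sorted(lim_inf(E, 2, 2)))
print(OMEGA - lim_inf(E, 2, 2) == lim_sup(lambda n: OMEGA - E(n), 2, 2))
```

The transient terms $\{0, 1\}$ do not affect either limit, which reflects the fact that lim sup and lim inf depend only on the tail of the sequence.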

2.9. Exercise

For an event $E$, define the indicator function $I_E$ on $\Omega$ via

\[ I_E(\omega) := \begin{cases} 1 & \text{if } \omega \in E, \\ 0 & \text{if } \omega \notin E. \end{cases} \]

Let $(E_n : n \in \mathbf{N})$ be a sequence of events. Prove that, for each $\omega$,

\[ I_{\limsup E_n}(\omega) = \limsup I_{E_n}(\omega), \]

and establish the corresponding result for lim infs.

Chapter 3: Random Variables

Let $(S, \Sigma)$ be a measurable space, so that $\Sigma$ is a $\sigma$-algebra on $S$.

3.1. Definitions. $\Sigma$-measurable function, $\mathrm{m}\Sigma$, $(\mathrm{m}\Sigma)^+$, $\mathrm{b}\Sigma$

Suppose that $h : S \to \mathbf{R}$. For $A \subseteq \mathbf{R}$, define

\[ h^{-1}(A) := \{s \in S : h(s) \in A\}. \]

Then $h$ is called $\Sigma$-measurable if $h^{-1} : \mathcal{B} \to \Sigma$, that is, $h^{-1}(A) \in \Sigma$, $\forall A \in \mathcal{B}$.

So, here is a picture of a $\Sigma$-measurable function $h$:

[Diagram: $S \xrightarrow{\,h\,} \mathbf{R}$, with $\Sigma \xleftarrow{\,h^{-1}\,} \mathcal{B}$.]

We write $\mathrm{m}\Sigma$ for the class of $\Sigma$-measurable functions on $S$, and $(\mathrm{m}\Sigma)^+$ for the class of non-negative elements in $\mathrm{m}\Sigma$. We denote by $\mathrm{b}\Sigma$ the class of bounded $\Sigma$-measurable functions on $S$.

Note. Because lim sups of sequences even of finite-valued functions may be infinite, and for other reasons, it is convenient to extend these definitions to functions $h$ taking values in $[-\infty, \infty]$ in the obvious way: $h$ is called $\Sigma$-measurable if $h^{-1} : \mathcal{B}[-\infty, \infty] \to \Sigma$. Which of the various results stated for real-valued functions extend to functions with values in $[-\infty, \infty]$, and what these extensions are, should be obvious.

Borel function

A function $h$ from a topological space $S$ to $\mathbf{R}$ is called Borel if $h$ is $\mathcal{B}(S)$-measurable. The most important case is when $S$ itself is $\mathbf{R}$.

3.2. Elementary Propositions on measurability

(a) The map $h^{-1}$ preserves all set operations:

\[ h^{-1}\Big(\bigcup_\alpha A_\alpha\Big) = \bigcup_\alpha h^{-1}(A_\alpha), \qquad h^{-1}(A^c) = (h^{-1}(A))^c, \quad \text{etc.} \]

Proof. This is just definition chasing. $\Box$

► (b) If $\mathcal{C} \subseteq \mathcal{B}$ and $\sigma(\mathcal{C}) = \mathcal{B}$, then $h^{-1} : \mathcal{C} \to \Sigma \Rightarrow h \in \mathrm{m}\Sigma$.

Proof. Let $\mathcal{E}$ be the class of elements $B$ in $\mathcal{B}$ such that $h^{-1}(B) \in \Sigma$. By result (a), $\mathcal{E}$ is a $\sigma$-algebra, and, by hypothesis, $\mathcal{E} \supseteq \mathcal{C}$. $\Box$

(c) If $S$ is topological and $h : S \to \mathbf{R}$ is continuous, then $h$ is Borel.

Proof. Take $\mathcal{C}$ to be the class of open subsets of $\mathbf{R}$, and apply result (b). $\Box$

► (d) For any measurable space $(S, \Sigma)$, a function $h : S \to \mathbf{R}$ is $\Sigma$-measurable if

\[ \{h \le c\} := \{s \in S : h(s) \le c\} \in \Sigma \quad (\forall c \in \mathbf{R}). \]

Proof. Take $\mathcal{C}$ to be the class $\pi(\mathbf{R})$ of intervals of the form $(-\infty, c]$, $c \in \mathbf{R}$, and apply result (b). $\Box$

Note. Obviously, similar results apply in which $\{h \le c\}$ is replaced by $\{h > c\}$, $\{h \ge c\}$, etc.
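Criterion (d) is easy to try out on a finite measurable space, where only finitely many sets $\{h \le c\}$ are distinct. The sketch below assumes $S$ is finite and that `sigma` really is a $\sigma$-algebra on $S$ (this is not verified by the code); the function name `is_measurable` and the examples are illustrative.

```python
def is_measurable(h, S, sigma):
    """Check criterion (d): {h <= c} belongs to sigma for every real c.

    For finite S it suffices to take c among the values of h, since for any other c
    the set {h <= c} is either empty or equal to one of these sets.
    """
    sigma = {frozenset(F) for F in sigma}
    values = sorted({h(s) for s in S})
    return all(frozenset(s for s in S if h(s) <= c) in sigma for c in values)

S = {1, 2, 3, 4}
sigma = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]   # sigma-algebra with atoms {1,2} and {3,4}

print(is_measurable(lambda s: 0 if s <= 2 else 1, S, sigma))  # constant on each atom
print(is_measurable(lambda s: s, S, sigma))                   # separates points within atoms
```

A function is measurable here exactly when it is constant on each atom of the $\sigma$-algebra, which is the finite-space shadow of the idea that $h$ may only use the information carried by $\Sigma$.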

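3.3. LEMMA. Sums and products of measurable functions are measurable

► $\mathrm{m}\Sigma$ is an algebra over $\mathbf{R}$, that is, if $\lambda \in \mathbf{R}$ and $h, h_1, h_2 \in \mathrm{m}\Sigma$, then

\[ h_1 + h_2 \in \mathrm{m}\Sigma, \qquad h_1 h_2 \in \mathrm{m}\Sigma, \qquad \lambda h \in \mathrm{m}\Sigma. \]

Example of proof. Let $c \in \mathbf{R}$. Then for $s \in S$, it is clear that $h_1(s) + h_2(s) > c$ if and only if for some rational $q$, we have

\[ h_1(s) > q > c - h_2(s). \]

In other words,

\[ \{h_1 + h_2 > c\} = \bigcup_{q \in \mathbf{Q}} \big( \{h_1 > q\} \cap \{h_2 > c - q\} \big), \]

a countable union of elements of $\Sigma$. $\Box$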

3.4. Composition Lemma.

If $h \in \mathrm{m}\Sigma$ and $f \in \mathrm{m}\mathcal{B}$, then $f \circ h \in \mathrm{m}\Sigma$.

Proof. Draw the picture:

[Diagram: $S \xrightarrow{\,h\,} \mathbf{R} \xrightarrow{\,f\,} \mathbf{R}$, with $\Sigma \xleftarrow{\,h^{-1}\,} \mathcal{B} \xleftarrow{\,f^{-1}\,} \mathcal{B}$.] $\Box$

Note. There are obvious generalizations based on the definition (important in more advanced theory): if $(S_1, \Sigma_1)$ and $(S_2, \Sigma_2)$ are measurable spaces and $h : S_1 \to S_2$, then $h$ is called $\Sigma_1/\Sigma_2$-measurable if $h^{-1} : \Sigma_2 \to \Sigma_1$. From this point of view, what we have called $\Sigma$-measurable should read $\Sigma/\mathcal{B}$-measurable (or perhaps $\Sigma/\mathcal{B}[-\infty, \infty]$-measurable).

3.5. LEMMA on measurability of infs, lim infs of functions

►► Let $(h_n : n \in \mathbf{N})$ be a sequence of elements of $\mathrm{m}\Sigma$. Then

(i) $\inf h_n$, (ii) $\liminf h_n$, (iii) $\limsup h_n$

are $\Sigma$-measurable (into $([-\infty, \infty], \mathcal{B}[-\infty, \infty])$, but we shall still write $\inf h_n \in \mathrm{m}\Sigma$ (for example)). Further,

(iv) $\{s : \lim h_n(s) \text{ exists in } \mathbf{R}\} \in \Sigma$.

Proof. (i) $\{\inf h_n \ge c\} = \bigcap_n \{h_n \ge c\}$.

(ii) Let $L_n(s) := \inf\{h_r(s) : r \ge n\}$. Then $L_n \in \mathrm{m}\Sigma$, by part (i). But

\[ L(s) := \liminf h_n(s) = \uparrow\!\lim L_n(s) = \sup L_n(s), \]

and $\{L \le c\} = \bigcap_n \{L_n \le c\} \in \Sigma$.

(iii) This part is now obvious.

(iv) This is also clear because the set on which $\lim h_n$ exists in $\mathbf{R}$ is

\[ \{\limsup h_n < \infty\} \cap \{\liminf h_n > -\infty\} \cap g^{-1}(\{0\}), \]

where $g := \limsup h_n - \liminf h_n$.

3.6. Definition. Random variable

► Let $(\Omega, \mathcal{F})$ be our (sample space, family of events). A random variable is an element of $m\mathcal{F}$. Thus,
\[ X : \Omega \to \mathbf{R}, \qquad X^{-1} : \mathcal{B} \to \mathcal{F}. \]

3.7. Example. Coin tossing

Let $\Omega = \{H, T\}^{\mathbf{N}}$, $\omega = (\omega_1, \omega_2, \ldots)$, $\omega_n \in \{H, T\}$. As in (2.3,b), we define
\[ \mathcal{F} = \sigma(\{\omega : \omega_n = W\} : n \in \mathbf{N},\ W \in \{H, T\}). \]
Let
\[ X_n(\omega) := \begin{cases} 1 & \text{if } \omega_n = H, \\ 0 & \text{if } \omega_n = T. \end{cases} \]
The definition of $\mathcal{F}$ guarantees that each $X_n$ is a random variable. By Lemma 3.3,
\[ S_n := X_1 + X_2 + \cdots + X_n = \text{number of heads in } n \text{ tosses} \]
is a random variable. Next, for $p \in [0, 1]$, we have
\[ \Lambda := \Big\{\omega : \frac{\text{number of heads}}{\text{number of tosses}} \to p\Big\} = \{\omega : L^+(\omega) = p\} \cap \{\omega : L^-(\omega) = p\}, \]
where $L^+ := \limsup n^{-1} S_n$ and $L^-$ is the corresponding lim inf. By Lemma 3.5, $\Lambda \in \mathcal{F}$.

►► Thus, we have taken an important step towards the Strong Law: the result is meaningful! It only remains to prove that it is true!

3.8. Definition. $\sigma$-algebra generated by a collection of functions on $\Omega$

This is an important idea, discussed further in Section 3.14. (Compare the weakest topology which makes every function in a given family continuous, etc.)

In Example 3.7, we have a given set $\Omega$, and a family $(X_n : n \in \mathbf{N})$ of maps $X_n : \Omega \to \mathbf{R}$. The best way to think of the $\sigma$-algebra $\mathcal{F}$ in that example is as
\[ \mathcal{F} = \sigma(X_n : n \in \mathbf{N}) \]
in the sense now to be described.

►► Generally, if we have a collection $(Y_\gamma : \gamma \in C)$ of maps $Y_\gamma : \Omega \to \mathbf{R}$, then
\[ \mathcal{Y} := \sigma(Y_\gamma : \gamma \in C) \]
is defined to be the smallest $\sigma$-algebra $\mathcal{Y}$ on $\Omega$ such that each map $Y_\gamma$ $(\gamma \in C)$ is $\mathcal{Y}$-measurable. Clearly,
\[ \sigma(Y_\gamma : \gamma \in C) = \sigma(\{\omega \in \Omega : Y_\gamma(\omega) \in B\} : \gamma \in C,\ B \in \mathcal{B}). \]
If $X$ is a random variable for some $(\Omega, \mathcal{F})$, then, of course, $\sigma(X) \subseteq \mathcal{F}$.

Remarks. (i) The idea introduced in this section is something which you will pick up gradually as you work through the course. Don't worry about it now; think about it, yes!

(ii) Normally, $\pi$-systems come to our aid. For example, if $(X_n : n \in \mathbf{N})$ is a collection of functions on $\Omega$, and $\mathcal{X}_n$ denotes $\sigma(X_k : k \le n)$, then the union $\bigcup \mathcal{X}_n$ is a $\pi$-system (indeed, an algebra) which generates $\sigma(X_n : n \in \mathbf{N})$.

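3.9. Definitions. Law, distribution function

Suppose that $X$ is a random variable carried by some probability triple $(\Omega, \mathcal{F}, \mathbf{P})$. We have
\[ \Omega \xrightarrow{\;X\;} \mathbf{R}, \qquad [0, 1] \xleftarrow{\;\mathbf{P}\;} \mathcal{F} \xleftarrow{\;X^{-1}\;} \mathcal{B}, \]
or indeed $[0, 1] \xleftarrow{\;\mathbf{P}\;} \sigma(X) \xleftarrow{\;X^{-1}\;} \mathcal{B}$. Define the law $\mathcal{L}_X$ of $X$ by
\[ \mathcal{L}_X := \mathbf{P} \circ X^{-1}, \qquad \mathcal{L}_X : \mathcal{B} \to [0, 1]. \]
Then (Exercise!) $\mathcal{L}_X$ is a probability measure on $(\mathbf{R}, \mathcal{B})$. Since $\pi(\mathbf{R}) = \{(-\infty, c] : c \in \mathbf{R}\}$ is a $\pi$-system which generates $\mathcal{B}$, Uniqueness Lemma 1.6 shows that $\mathcal{L}_X$ is determined by the function $F_X : \mathbf{R} \to [0, 1]$ defined as follows:
\[ F_X(c) := \mathcal{L}_X(-\infty, c] = \mathbf{P}(X \le c) = \mathbf{P}\{\omega : X(\omega) \le c\}. \]
The function $F_X$ is called the distribution function of $X$.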

3.10. Properties of distribution functions

Suppose that $F$ is the distribution function $F = F_X$ of some random variable $X$. Then

(a) $F : \mathbf{R} \to [0, 1]$, $F \uparrow$ (that is, $x \le y \implies F(x) \le F(y)$),

(b) $\lim_{x \to \infty} F(x) = 1$, $\lim_{x \to -\infty} F(x) = 0$,

(c) $F$ is right-continuous.

Proof of (c). By using Lemma (1.10,b), we see that
\[ \mathbf{P}(X \le x + n^{-1}) \downarrow \mathbf{P}(X \le x), \]
and this fact together with the monotonicity of $F_X$ shows that $F_X$ is right-continuous.

Exercise! Clear up any loose ends.

3.11. Existence of random variable with given distribution function

► If $F$ has the properties (a,b,c) in Section 3.10, then, by analogy with Section 1.8 on the existence of Lebesgue measure, we can construct a unique probability measure $\mathcal{L}$ on $(\mathbf{R}, \mathcal{B})$ such that
\[ \mathcal{L}(-\infty, x] = F(x), \quad \forall x. \]
Take $(\Omega, \mathcal{F}, \mathbf{P}) = (\mathbf{R}, \mathcal{B}, \mathcal{L})$, $X(\omega) = \omega$. Then it is tautological that
\[ F_X(x) = F(x), \quad \forall x. \]

Note. The measure $\mathcal{L}$ just described is called the Lebesgue-Stieltjes measure associated with $F$. Its existence is proved in the next section.
3.12. Skorokhod representation of a random variable with prescribed distribution function

Again let $F : \mathbf{R} \to [0, 1]$ have properties (3.10,a,b,c). We can construct a random variable with distribution function $F$ carried by
\[ (\Omega, \mathcal{F}, \mathbf{P}) = ([0, 1], \mathcal{B}[0, 1], \mathrm{Leb}) \]
as follows. Define (the right-hand equalities, which you can prove, are there for clarification only)

(a1) $X^+(\omega) := \inf\{z : F(z) > \omega\} = \sup\{y : F(y) \le \omega\}$,

(a2) $X^-(\omega) := \inf\{z : F(z) \ge \omega\} = \sup\{y : F(y) < \omega\}$.

The following picture shows cases to watch out for.

[Figure: graph of $F(x)$, with the points $X^{\pm}(\omega)$, $X^-(F(x))$ and $X^+(F(x))$ marked.]

By definition of $X^-$,
\[ (\omega \le F(c)) \implies (X^-(\omega) \le c). \]
Now,
\[ (z > X^-(\omega)) \implies (F(z) \ge \omega), \]
so, by the right-continuity of $F$, $F(X^-(\omega)) \ge \omega$, and
\[ (X^-(\omega) \le c) \implies (\omega \le F(X^-(\omega)) \le F(c)). \]
Thus, $(\omega \le F(c)) \iff (X^-(\omega) \le c)$, so that
\[ \mathbf{P}(X^- \le c) = F(c). \]

(b) The variable $X^-$ therefore has distribution function $F$, and the measure $\mathcal{L}$ in Section 3.11 is just the law of $X^-$.

It will be important later to know that

(c) $X^+$ also has distribution function $F$, and that, indeed, $\mathbf{P}(X^+ = X^-) = 1$.

Proof of (c). By definition of $X^+$,
\[ (\omega < F(c)) \implies (X^+(\omega) \le c), \]
so that $F(c) \le \mathbf{P}(X^+ \le c)$. Since $X^- \le X^+$, it is clear that
\[ \{X^- \ne X^+\} = \bigcup_{c \in \mathbf{Q}} \{X^- \le c < X^+\}. \]
But, for every $c \in \mathbf{R}$,
\[ \mathbf{P}(X^- \le c < X^+) = \mathbf{P}(\{X^- \le c\} \setminus \{X^+ \le c\}) \le F(c) - F(c) = 0. \]
Since $\mathbf{Q}$ is countable, the result follows. □

Remark. It is in fact true that every experiment you will meet in this (or any other) course can be modelled via the triple $([0, 1], \mathcal{B}[0, 1], \mathrm{Leb})$. (You will start to be convinced of this by the end of the next chapter.) However, this observation normally has only curiosity value.
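For a distribution with finitely many atoms, the variable $X^-$ of Section 3.12 can be computed directly. The following is a minimal sketch under that assumption; the function name `make_x_minus` and the example data are hypothetical, and exact fractions are used so that the identity $\mathrm{Leb}\{\omega : X^-(\omega) \le c\} = F(c)$ can be checked on a grid without rounding.

```python
from bisect import bisect_left
from fractions import Fraction


def make_x_minus(atoms, probs):
    """Return w -> X^-(w) = inf{z : F(z) >= w} for a finitely supported law.

    atoms: strictly increasing list of values (hypothetical example data).
    probs: matching probabilities, summing to 1.
    The returned function is intended for 0 < w <= 1.
    """
    cdf = []
    total = Fraction(0)
    for p in probs:
        total += Fraction(p)
        cdf.append(total)              # cdf[i] = F(atoms[i])

    def x_minus(w):
        # smallest index i with F(atoms[i]) >= w
        return atoms[bisect_left(cdf, Fraction(w))]

    return x_minus


# Example: P(X=0)=1/4, P(X=1)=1/2, P(X=2)=1/4.
x_minus = make_x_minus([0, 1, 2], [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)])
grid = [Fraction(k, 1000) for k in range(1, 1001)]
share_le_1 = Fraction(sum(1 for w in grid if x_minus(w) <= 1), 1000)
```

Here `share_le_1` is the proportion of grid points $\omega$ with $X^-(\omega) \le 1$, which should agree with $F(1) = 3/4$.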

3.13. Generated $\sigma$-algebras - a discussion

Suppose that $(\Omega, \mathcal{F}, \mathbf{P})$ is a model for some experiment, and that the experiment has been performed, so that (see Section 2.2) Tyche has made her choice of $\omega$. Let $(Y_\gamma : \gamma \in C)$ be a collection of random variables associated with our experiment, and suppose that someone reports to you the following information about the chosen point $\omega$:

(*) the values $Y_\gamma(\omega)$, that is, the observed values of the random variables $Y_\gamma$ $(\gamma \in C)$.

Then the intuitive significance of the $\sigma$-algebra $\mathcal{Y} := \sigma(Y_\gamma : \gamma \in C)$ is that it consists precisely of those events $F$ for which, for each and every $\omega$, you can decide whether or not $F$ has occurred (that is, whether or not $\omega \in F$) on the basis of the information (*); the information (*) is precisely equivalent to the following information:

(**) the values $I_F(\omega)$ $(F \in \mathcal{Y})$.

(a) Exercise. Prove that the $\sigma$-algebra $\sigma(Y)$ generated by a single random variable $Y$ is given by
\[ \sigma(Y) = Y^{-1}(\mathcal{B}) := (\{\omega : Y(\omega) \in B\} : B \in \mathcal{B}), \]
and that $\sigma(Y)$ is generated by the $\pi$-system
\[ \pi(Y) := (\{\omega : Y(\omega) \le x\} : x \in \mathbf{R}) = Y^{-1}(\pi(\mathbf{R})). \]
□

The following results might help clarify things. Good advice: stop reading this section after (c)! Results (b) and (c) are proved in the appendix to this chapter.

(b) If $Y : \Omega \to \mathbf{R}$, then $Z : \Omega \to \mathbf{R}$ is a $\sigma(Y)$-measurable function if and only if there exists a Borel function $f : \mathbf{R} \to \mathbf{R}$ such that $Z = f(Y)$.

(c) If $Y_1, Y_2, \ldots, Y_n$ are functions from $\Omega$ to $\mathbf{R}$, then a function $Z : \Omega \to \mathbf{R}$ is $\sigma(Y_1, Y_2, \ldots, Y_n)$-measurable if and only if there exists a Borel function $f$ on $\mathbf{R}^n$ such that $Z = f(Y_1, Y_2, \ldots, Y_n)$. We shall see in the appendix that the more correct measurability condition on $f$ is that $f$ be '$\mathcal{B}^n$-measurable'.

(d) If $(Y_\gamma : \gamma \in C)$ is a collection (parametrized by the infinite set $C$) of functions from $\Omega$ to $\mathbf{R}$, then $Z : \Omega \to \mathbf{R}$ is $\sigma(Y_\gamma : \gamma \in C)$-measurable if and only if there exists a countable sequence $(\gamma_i : i \in \mathbf{N})$ of elements of $C$ and a Borel function $f$ on $\mathbf{R}^{\mathbf{N}}$ such that $Z = f(Y_{\gamma_1}, Y_{\gamma_2}, \ldots)$.

Warning - for the over-enthusiastic only. For uncountable $C$, $\mathcal{B}(\mathbf{R}^C)$ is much larger than the $C$-fold product measure space $\prod_{\gamma \in C} \mathcal{B}(\mathbf{R})$. It is the latter rather than the former which gives the appropriate type of $f$ in (d).

3.14. The Monotone-Class Theorem

In the same way that Uniqueness Lemma 1.6 allows us to deduce results about $\sigma$-algebras from results about $\pi$-systems, the following 'elementary' version of the Monotone-Class Theorem allows us to deduce results about general measurable functions from results about indicators of elements of $\pi$-systems. Generally, we shall not use the theorem in the main text, preferring 'just to use bare hands'. However, for product measure in Chapter 8, it becomes indispensable.

THEOREM.

►► Let $\mathcal{H}$ be a class of bounded functions from a set $S$ into $\mathbf{R}$ satisfying the following conditions:

(i) $\mathcal{H}$ is a vector space over $\mathbf{R}$;

(ii) the constant function 1 is an element of $\mathcal{H}$;

(iii) if $(f_n)$ is a sequence of non-negative functions in $\mathcal{H}$ such that $f_n \uparrow f$ where $f$ is a bounded function on $S$, then $f \in \mathcal{H}$.

Then if $\mathcal{H}$ contains the indicator function of every set in some $\pi$-system $\mathcal{I}$, then $\mathcal{H}$ contains every bounded $\sigma(\mathcal{I})$-measurable function on $S$.

For proof, see the appendix to this chapter.

Chapter 4

Independence

Let $(\Omega, \mathcal{F}, \mathbf{P})$ be a probability triple.

4.1. Definitions of independence

Note. We focus attention on the $\sigma$-algebra formulation (and describe the more familiar forms of independence in terms of it) to acclimatize ourselves to thinking of $\sigma$-algebras as the natural means of summarizing information. Section 4.2 shows that the fancy $\sigma$-algebra definitions agree with the ones from elementary courses.

Independent $\sigma$-algebras

► Sub-$\sigma$-algebras $\mathcal{G}_1, \mathcal{G}_2, \ldots$ of $\mathcal{F}$ are called independent if, whenever $G_i \in \mathcal{G}_i$ $(i \in \mathbf{N})$ and $i_1, \ldots, i_n$ are distinct, then
\[ \mathbf{P}(G_{i_1} \cap \cdots \cap G_{i_n}) = \prod_{k=1}^{n} \mathbf{P}(G_{i_k}). \]

Independent random variables

► Random variables $X_1, X_2, \ldots$ are called independent if the $\sigma$-algebras $\sigma(X_1), \sigma(X_2), \ldots$ are independent.

Independent events

► Events $E_1, E_2, \ldots$ are called independent if the $\sigma$-algebras $\mathcal{E}_1, \mathcal{E}_2, \ldots$ are independent, where $\mathcal{E}_n$ is the $\sigma$-algebra $\{\emptyset, E_n, \Omega \setminus E_n, \Omega\}$.

Since $\mathcal{E}_n = \sigma(I_{E_n})$, it follows that events $E_1, E_2, \ldots$ are independent if and only if the random variables $I_{E_1}, I_{E_2}, \ldots$ are independent.

4.2. The $\pi$-system Lemma; and the more familiar definitions

We know from elementary theory that events $E_1, E_2, \ldots$ are independent if and only if whenever $n \in \mathbf{N}$ and $i_1, \ldots, i_n$ are distinct, then
\[ \mathbf{P}(E_{i_1} \cap \cdots \cap E_{i_n}) = \prod_{k=1}^{n} \mathbf{P}(E_{i_k}), \]
corresponding results involving complements of the $E_j$, etc., being consequences of this.

We now use the Uniqueness Lemma 1.6 to obtain a significant generalization of this idea, allowing us to study independence via (manageable) $\pi$-systems rather than (awkward) $\sigma$-algebras. Let us concentrate on the case of two $\sigma$-algebras.

►►(a) LEMMA. Suppose that $\mathcal{G}$ and $\mathcal{H}$ are sub-$\sigma$-algebras of $\mathcal{F}$, and that $\mathcal{I}$ and $\mathcal{J}$ are $\pi$-systems with
\[ \sigma(\mathcal{I}) = \mathcal{G}, \qquad \sigma(\mathcal{J}) = \mathcal{H}. \]
Then $\mathcal{G}$ and $\mathcal{H}$ are independent if and only if $\mathcal{I}$ and $\mathcal{J}$ are independent in that
\[ \mathbf{P}(I \cap J) = \mathbf{P}(I)\mathbf{P}(J), \qquad I \in \mathcal{I},\ J \in \mathcal{J}. \]

Proof. Suppose that $\mathcal{I}$ and $\mathcal{J}$ are independent. For fixed $I$ in $\mathcal{I}$, the measures (check that they are measures!)
\[ H \mapsto \mathbf{P}(I \cap H) \quad \text{and} \quad H \mapsto \mathbf{P}(I)\mathbf{P}(H) \]
on $(\Omega, \mathcal{H})$ have the same total mass $\mathbf{P}(I)$, and agree on $\mathcal{J}$. By Lemma 1.6, they therefore agree on $\sigma(\mathcal{J}) = \mathcal{H}$. Hence,
\[ \mathbf{P}(I \cap H) = \mathbf{P}(I)\mathbf{P}(H), \qquad I \in \mathcal{I},\ H \in \mathcal{H}. \]
Thus, for fixed $H$ in $\mathcal{H}$, the measures
\[ G \mapsto \mathbf{P}(G \cap H) \quad \text{and} \quad G \mapsto \mathbf{P}(G)\mathbf{P}(H) \]
on $(\Omega, \mathcal{G})$ have the same total mass $\mathbf{P}(H)$ and agree on $\mathcal{I}$. They therefore agree on $\sigma(\mathcal{I}) = \mathcal{G}$; and this is what we set out to prove. □

Suppose now that $X$ and $Y$ are two random variables on $(\Omega, \mathcal{F}, \mathbf{P})$ such that, whenever $x, y \in \mathbf{R}$,

(b) $\mathbf{P}(X \le x;\ Y \le y) = \mathbf{P}(X \le x)\mathbf{P}(Y \le y)$.

Now, (b) says that the $\pi$-systems $\pi(X)$ and $\pi(Y)$ (see Section 3.13) are independent. Hence $\sigma(X)$ and $\sigma(Y)$ are independent: that is, $X$ and $Y$ are independent in the sense of Definition 4.1.

In the same way, we can prove that random variables $X_1, X_2, \ldots, X_n$ are independent if and only if
\[ \mathbf{P}(X_k \le x_k : 1 \le k \le n) = \prod_{k=1}^{n} \mathbf{P}(X_k \le x_k), \]
and all the familiar things from elementary theory.

Command: Do Exercise E4.1 now.

4.3. Second Borel-Cantelli Lemma (BC2)

►► If $(E_n : n \in \mathbf{N})$ is a sequence of independent events, then
\[ \sum \mathbf{P}(E_n) = \infty \implies \mathbf{P}(E_n, \text{ i.o.}) = \mathbf{P}(\limsup E_n) = 1. \]

Proof. First, we have
\[ (\limsup E_n)^c = \liminf E_n^c = \bigcup_m \bigcap_{n \ge m} E_n^c. \]
With $p_n$ denoting $\mathbf{P}(E_n)$, we have
\[ \mathbf{P}\Big(\bigcap_{n \ge m} E_n^c\Big) = \prod_{n \ge m} (1 - p_n), \]
this equation being true if the condition $\{n \ge m\}$ is replaced by condition $\{r \ge n \ge m\}$, because of independence, and the limit as $r \uparrow \infty$ being justified by the monotonicity of the two sides.

For $x \ge 0$, $1 - x \le \exp(-x)$, so that, since $\sum p_n = \infty$,
\[ \prod_{n \ge m} (1 - p_n) \le \exp\Big(-\sum_{n \ge m} p_n\Big) = 0. \]
So, $\mathbf{P}[(\limsup E_n)^c] = 0$. □

Exercise. Prove that if $0 \le p_n < 1$ and $S := \sum p_n < \infty$, then $\prod (1 - p_n) > 0$. Hint. First show that if $S < 1$, then $\prod (1 - p_n) \ge 1 - S$.

4.4. Example

Let $(X_n : n \in \mathbf{N})$ be a sequence of independent random variables, each exponentially distributed with rate 1:
\[ \mathbf{P}(X_n > x) = e^{-x}, \qquad x \ge 0. \]
Then, for $\alpha > 0$,
\[ \mathbf{P}(X_n > \alpha \log n) = n^{-\alpha}, \]
so that, using (BC1) and (BC2),

(a0) $\mathbf{P}(X_n > \alpha \log n \text{ for infinitely many } n) = \begin{cases} 0 & \text{if } \alpha > 1, \\ 1 & \text{if } \alpha \le 1. \end{cases}$

Now let $L := \limsup (X_n / \log n)$. Then
\[ \mathbf{P}(L \ge 1) \ge \mathbf{P}(X_n > \log n, \text{ i.o.}) = 1, \]
and, for $k \in \mathbf{N}$,
\[ \mathbf{P}(L > 1 + 2k^{-1}) \le \mathbf{P}(X_n > (1 + k^{-1}) \log n, \text{ i.o.}) = 0. \]
Thus, $\{L > 1\} = \bigcup_k \{L > 1 + 2k^{-1}\}$ is $\mathbf{P}$-null, and hence

$L = 1$ almost surely.

Something to think about

In the same way, we can prove the finer result

(a1) $\mathbf{P}(X_n > \log n + \alpha \log\log n, \text{ i.o.}) = \begin{cases} 0 & \text{if } \alpha > 1, \\ 1 & \text{if } \alpha \le 1, \end{cases}$

or, even finer,

(a2) $\mathbf{P}(X_n > \log n + \log\log n + \alpha \log\log\log n, \text{ i.o.}) = \begin{cases} 0 & \text{if } \alpha > 1, \\ 1 & \text{if } \alpha \le 1; \end{cases}$

or etc. By combining in an appropriate way (think about this!) the sequence of statements (a0), (a1), (a2), ... with the statement that the union of a countable number of null sets is null while the intersection of a sequence of probability-1 sets has probability 1, we can obviously make remarkably precise statements about the size of the big elements in the sequence $(X_n)$.

I have included in the appendix to this chapter the statement of a truly fantastic theorem about precise description of long-term behaviour: Strassen's Law.

A number of exercises in Chapter E are now accessible to you.
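The dichotomy in statement (a0) of Example 4.4 comes down to whether $\sum_n n^{-\alpha}$ converges. The short sketch below only computes partial sums of the tail probabilities; it is a numerical illustration under the rate-1 exponential assumption of the example, not a proof, and the function names are hypothetical.

```python
import math


def tail_prob(n, alpha):
    """P(X_n > alpha * log n) for a rate-1 exponential variable, i.e. n ** (-alpha)."""
    return math.exp(-alpha * math.log(n))


def partial_sum(alpha, n_max):
    """Sum of P(X_n > alpha * log n) over 1 <= n <= n_max."""
    return sum(tail_prob(n, alpha) for n in range(1, n_max + 1))


# alpha = 2: the partial sums stay below pi^2 / 6, so (BC1) applies.
bounded = partial_sum(2.0, 10_000)
# alpha = 1: the partial sums are harmonic numbers, which grow without bound,
# so (BC2) applies because the X_n are independent.
growing = partial_sum(1.0, 10_000)
```

For $\alpha = 2$ the sums remain bounded however large `n_max` is taken, while for $\alpha = 1$ they grow like $\log n$.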

4.5. A fundamental question for modelling

Can we construct a sequence $(X_n : n \in \mathbf{N})$ of independent random variables, $X_n$ having prescribed distribution function $F_n$? We have to be able to answer Yes to this question - for example, to be able to construct a rigorous model for the branching-process model of Chapter 0, or indeed for Example 4.4 to make sense. Equation (0.2,b) makes it clear that a Yes answer to our question is all that is needed for a rigorous branching-process model.

The trick answer based on the existence of Lebesgue measure given in the next section does settle the question. A more satisfying answer is provided by the theory of product measure, a topic deferred to Chapter 8.

4.6. A coin-tossing model with applications

Let $(\Omega, \mathcal{F}, \mathbf{P})$ be $([0, 1], \mathcal{B}[0, 1], \mathrm{Leb})$. For $\omega \in \Omega$, expand $\omega$ in binary:
\[ \omega = 0.\omega_1\omega_2\ldots \]
(The existence of two different expansions of a dyadic rational is not going to cause any problems because the set $D$ (say) of dyadic rationals in $[0, 1]$ has Lebesgue measure 0 - it is a countable set!) As an Exercise, you can prove that the sequence $(\xi_n : n \in \mathbf{N})$, where
\[ \xi_n(\omega) := \omega_n, \]
is a sequence of independent variables each taking the values 0 or 1 with probability $\frac{1}{2}$ for either possibility. Clearly, $(\xi_n : n \in \mathbf{N})$ provides a model for coin tossing.

Now define
\[ Y_1(\omega) := 0.\omega_1\omega_3\omega_6\ldots, \]
\[ Y_2(\omega) := 0.\omega_2\omega_5\omega_9\ldots, \]
\[ Y_3(\omega) := 0.\omega_4\omega_8\omega_{13}\ldots, \]
and so on. We now need a bit of common sense. Since the sequence
\[ \omega_1, \omega_3, \omega_6, \ldots \]
has the same 'coin-tossing' properties as the full sequence $(\omega_n : n \in \mathbf{N})$, it is clear that $Y_1$ has the uniform distribution on $[0, 1]$; and similarly for the other $Y$'s.

Since the sequences $(1, 3, 6, \ldots)$, $(2, 5, 9, \ldots)$, ... which give rise to $Y_1, Y_2, \ldots$ are disjoint, and therefore correspond to different sets of tosses of our 'coin', it is intuitively obvious that

► $Y_1, Y_2, \ldots$ are independent random variables, each uniformly distributed on $[0, 1]$.

Now suppose that a sequence $(F_n : n \in \mathbf{N})$ of distribution functions is given. By the Skorokhod representation of Section 3.12, we can find functions $g_n$ on $[0, 1]$ such that
\[ X_n := g_n(Y_n) \text{ has distribution function } F_n. \]
But because the $Y$-variables are independent, the same is obviously true of the $X$-variables.

► We have therefore succeeded in constructing a family $(X_n : n \in \mathbf{N})$ of independent random variables with prescribed distribution functions.

Exercise. Satisfy yourself that you could if forced carry through these intuitive arguments rigorously. Obviously, this is again largely a case of utilizing the Uniqueness Lemma 1.6 in much the same way as we did in Section 4.2.
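The digit positions used in Section 4.6 to build $Y_1, Y_2, Y_3$ are consistent with reading the natural numbers along the antidiagonals of a triangular array. The sketch below assumes that pattern continues (the text shows only the first three positions of each row); the names `tri_index` and `split_digits` are hypothetical.

```python
def tri_index(i, j):
    """Position in the binary expansion of the j-th digit of Y_i (both 1-based).

    Row i, column j lies on antidiagonal d = i + j - 1, whose largest entry is the
    triangular number d(d+1)/2; row 1 takes that entry, row 2 the one before, etc.
    """
    d = i + j - 1
    return d * (d + 1) // 2 - (i - 1)


def split_digits(bits, i, count):
    """First `count` binary digits of Y_i, given the digits w_1, w_2, ... as a list."""
    return [bits[tri_index(i, j) - 1] for j in range(1, count + 1)]


# Rows 1..3 reproduce the index sequences quoted in the text.
rows = [[tri_index(i, j) for j in range(1, 4)] for i in range(1, 4)]

# Every position 1..55 is used exactly once by the first ten antidiagonals,
# so distinct Y_i read disjoint sets of tosses.
used = sorted(tri_index(i, d - i + 1) for d in range(1, 11) for i in range(1, d + 1))
```

Because each digit position is assigned to exactly one row, the digit sequences feeding different $Y_i$ never share a toss, which is the disjointness the independence argument relies on.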

4.7. Notation: IID RVs

Many of the most important problems in probability concern sequences of random variables (RVs) which are independent and identically distributed (IID). Thus, if $(X_n)$ is a sequence of IID variables, then the $X_n$ are independent and all have the same distribution function $F$ (say):
\[ \mathbf{P}(X_n \le x) = F(x), \qquad \forall n,\ \forall x. \]
Of course, we now know that for any given distribution function $F$, we can construct a triple $(\Omega, \mathcal{F}, \mathbf{P})$ carrying a sequence of IID RVs with common distribution function $F$. In particular, we can construct a rigorous model for our branching process.

4.8. Stochastic processes; Markov chains

►► A stochastic process $Y$ parametrized by a set $C$ is a collection
\[ Y = (Y_\gamma : \gamma \in C) \]
of random variables on some triple $(\Omega, \mathcal{F}, \mathbf{P})$. The fundamental question about existence of a stochastic process with prescribed joint distributions is (to all intents and purposes) settled by the famous Daniell-Kolmogorov theorem, which is just beyond the scope of this course.

Our concern will be mainly with processes $X = (X_n : n \in \mathbf{Z}^+)$ indexed (or parametrized) by $\mathbf{Z}^+$. We think of $X_n$ as the value of the process $X$ at time $n$. For $\omega \in \Omega$, the map $n \mapsto X_n(\omega)$ is called the sample path of $X$ corresponding to the sample point $\omega$.

A very important example of a stochastic process is provided by a Markov chain.

►► Let $E$ be a finite or countable set. Let $P = (p_{ij} : i, j \in E)$ be a stochastic $E \times E$ matrix, so that for $i, j \in E$, we have
\[ p_{ij} \ge 0, \qquad \sum_k p_{ik} = 1. \]
Let $\mu$ be a probability measure on $E$, so that $\mu$ is specified by the values $\mu_i := \mu(\{i\})$, $(i \in E)$. By a time-homogeneous Markov chain $Z = (Z_n : n \in \mathbf{Z}^+)$ on $E$ with initial distribution $\mu$ and 1-step transition matrix $P$ is meant a stochastic process $Z$ such that, whenever $n \in \mathbf{Z}^+$ and $i_0, i_1, \ldots, i_n \in E$,

(a) $\mathbf{P}(Z_0 = i_0;\ Z_1 = i_1;\ \ldots;\ Z_n = i_n) = \mu_{i_0} p_{i_0 i_1} \cdots p_{i_{n-1} i_n}$.

Exercise. Give a construction of such a chain $Z$ expressing $Z_n(\omega)$ explicitly in terms of the values at $\omega$ of a suitable family of independent random variables. See the appendix to this chapter.
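One possible route to the exercise, sketched here for a finite state space, drives the chain with independent uniform variables $U_0, U_1, \ldots$ and an inverse-distribution-function draw at each step. This is an illustrative assumption, not necessarily the construction given in the appendix; the helper names `draw` and `sample_path` and the data layout are hypothetical.

```python
import random


def draw(dist, u):
    """Inverse-CDF draw: first state whose cumulative probability exceeds u in [0, 1)."""
    acc = 0.0
    for state, p in dist:
        acc += p
        if u < acc:
            return state
    return dist[-1][0]  # guard against floating-point round-off


def sample_path(mu, P, n, rng):
    """Z_0, ..., Z_n built from independent uniforms U_0, ..., U_n.

    mu: list of (state, probability) pairs for the initial distribution.
    P:  dict mapping each state i to a list of (j, p_ij) pairs (row i of the matrix).
    """
    us = [rng.random() for _ in range(n + 1)]
    z = [draw(mu, us[0])]                    # Z_0 depends on U_0 only
    for k in range(1, n + 1):
        z.append(draw(P[z[-1]], us[k]))      # Z_k is a function of Z_{k-1} and U_k
    return z


# Example with a deterministic two-state flip, so the path is known in advance.
mu = [("a", 1.0)]
P = {"a": [("b", 1.0)], "b": [("a", 1.0)]}
path = sample_path(mu, P, 4, random.Random(0))
```

Since $Z_k$ is a fixed function of $Z_{k-1}$ and the fresh variable $U_k$, the product formula (a) follows by conditioning one step at a time.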
4.9. Monkey typing Shakespeare

Many interesting events must have probability 0 or 1, and we often show that an event $F$ has probability 0 or 1 by using some argument based on independence to show that $\mathbf{P}(F)^2 = \mathbf{P}(F)$.

Here is a silly example, to which we apply a silly method, but one which both illustrates very clearly the use of the monotonicity properties of measures in Lemma 1.10 and has a lot of the flavour of the Kolmogorov 0-1 law. See the 'Easy exercise' towards the end of this section for an instantaneous solution to the problem.

Let us agree that correctly typing WS, the Collected Works of Shakespeare, amounts to typing a particular sequence of $N$ symbols on a typewriter. A monkey types symbols at random, one per unit time, producing an infinite sequence $(X_n)$ of IID RVs with values in the set of all possible symbols. We agree that
\[ \varepsilon := \inf\{\mathbf{P}(X_1 = x) : x \text{ is a symbol}\} > 0. \]
Let $H$ be the event that the monkey produces infinitely many copies of WS. Let $H_k$ be the event that the monkey will produce at least $k$ copies of WS in all, and let $H_{m,k}$ be the event that it will produce at least $k$ copies by time $m$. Finally, let $H^{(m)}$ be the event that the monkey produces infinitely many copies of WS over the time period $[m + 1, \infty)$.

Because the monkey's behaviour over $[1, m]$ is independent of its behaviour over $[m + 1, \infty)$, we have
\[ \mathbf{P}(H_{m,k} \cap H^{(m)}) = \mathbf{P}(H_{m,k})\mathbf{P}(H^{(m)}). \]
But logic tells us that, for every $m$, $H^{(m)} = H$! Hence,
\[ \mathbf{P}(H_{m,k} \cap H) = \mathbf{P}(H_{m,k})\mathbf{P}(H). \]
But, as $m \uparrow \infty$, $H_{m,k} \uparrow H_k$, and $(H_{m,k} \cap H) \uparrow (H_k \cap H) = H$, it being obvious that $H_k \supseteq H$. Hence, by Lemma 1.10(a),
\[ \mathbf{P}(H) = \mathbf{P}(H_k)\mathbf{P}(H). \]
However, as $k \uparrow \infty$, $H_k \downarrow H$, and so, by Lemma 1.10(b),
\[ \mathbf{P}(H) = \mathbf{P}(H)\mathbf{P}(H), \]
whence $\mathbf{P}(H) = 0$ or 1.

The Kolmogorov 0-1 law produces a huge class of important events $E$ for which we must have $\mathbf{P}(E) = 0$ or $\mathbf{P}(E) = 1$. Fortunately, it does not tell us which - and it therefore generates a lot of interesting problems!

Easy exercise. Use the Second Borel-Cantelli Lemma to prove that $\mathbf{P}(H) = 1$. Hint. Let $E_1$ be the event that the monkey produces WS right away, that is, during time period $[1, N]$. Then $\mathbf{P}(E_1) \ge \varepsilon^N$.

Tricky exercise (to which we shall return). If the monkey types only capital letters, and is on every occasion equally likely to type any of the 26, how long on average will it take him to produce the sequence

'ABRACADABRA'?

The next three sections involve quite subtle topics which take time to assimilate. They are not strictly necessary for subsequent chapters. The Kolmogorov 0-1 law is used in one of our two proofs of the Strong Law for IID RVs, but by that stage a quick martingale proof (of the 0-1 law) will have been provided.

Note. Perhaps the otherwise-wonderful TeX makes its $\mathcal{T}$ too like $\mathcal{I}$. Below, I use $\mathcal{K}$ instead of $\mathcal{I}$ to avoid the confusion. Script X, $\mathcal{X}$, is too like Greek chi, $\chi$; but we have to live with that.

4.10. Definition. Tail $\sigma$-algebras

►► Let $X_1, X_2, \ldots$ be random variables. Define

(a) $\mathcal{T}_n := \sigma(X_{n+1}, X_{n+2}, \ldots)$, $\quad \mathcal{T} := \bigcap_n \mathcal{T}_n$.

The $\sigma$-algebra $\mathcal{T}$ is called the tail $\sigma$-algebra of the sequence $(X_n : n \in \mathbf{N})$.

Now, $\mathcal{T}$ contains many important events: for example,

(b1) $F_1 := (\lim X_k \text{ exists}) := \{\omega : \lim_k X_k(\omega) \text{ exists}\}$,

(b2) $F_2 := (\sum X_k \text{ converges})$,

(b3) $F_3 := \Big(\lim \dfrac{X_1 + X_2 + \cdots + X_k}{k} \text{ exists}\Big)$.

Also, there are many important variables which are in $m\mathcal{T}$: for example,

(c) $\xi_1 := \limsup \dfrac{X_1 + X_2 + \cdots + X_k}{k}$,

which may be $\pm\infty$, of course.

Exercise. Prove that $F_1$, $F_2$ and $F_3$ are $\mathcal{T}$-measurable, that the event $H$ in the monkey problem is a tail event, and that the various events of probability 0 and 1 in Section 4.4 are tail events.

Hint - to be read only after you have already tried hard. Look at $F_3$ for example. For each $n$, logic tells us that $F_3$ is equal to the set
\[ F_3^{(n)} := \Big\{\omega : \lim_k \frac{X_{n+1}(\omega) + \cdots + X_{n+k}(\omega)}{k} \text{ exists}\Big\}. \]
Now, $X_{n+1}, X_{n+2}, \ldots$ are all random variables on the triple $(\Omega, \mathcal{T}_n, \mathbf{P})$. That $F_3^{(n)} \in \mathcal{T}_n$ now follows from Lemmas 3.3 and 3.5.

4.11. THEOREM. Kolmogorov's 0-1 Law

►► Let $(X_n : n \in \mathbf{N})$ be a sequence of independent random variables, and let $\mathcal{T}$ be the tail $\sigma$-algebra of $(X_n : n \in \mathbf{N})$. Then $\mathcal{T}$ is $\mathbf{P}$-trivial: that is,

(i) $F \in \mathcal{T} \implies \mathbf{P}(F) = 0$ or $\mathbf{P}(F) = 1$,

(ii) if $\xi$ is a $\mathcal{T}$-measurable random variable, then $\xi$ is almost deterministic in that for some constant $c$ in $[-\infty, \infty]$,
\[ \mathbf{P}(\xi = c) = 1. \]
We allow $\xi = \pm\infty$ at (ii) for obvious reasons.

Proof of (i). Let
\[ \mathcal{X}_n := \sigma(X_1, \ldots, X_n), \qquad \mathcal{T}_n := \sigma(X_{n+1}, X_{n+2}, \ldots). \]

Step 1: We claim that $\mathcal{X}_n$ and $\mathcal{T}_n$ are independent.

Proof of claim. The class $\mathcal{K}$ of events of the form
\[ \{\omega : X_i(\omega) \le x_i : 1 \le i \le n\}, \qquad x_i \in \mathbf{R} \cup \{\infty\}, \]
is a $\pi$-system which generates $\mathcal{X}_n$. The class $\mathcal{J}$ of sets of the form
\[ \{\omega : X_j(\omega) \le x_j : n + 1 \le j \le n + r\}, \qquad r \in \mathbf{N},\ x_j \in \mathbf{R} \cup \{\infty\}, \]
is a $\pi$-system which generates $\mathcal{T}_n$. But the assumption that the sequence $(X_k)$ is independent implies that $\mathcal{K}$ and $\mathcal{J}$ are independent. Lemma 4.2(a) now clinches our claim.

Step 2: $\mathcal{X}_n$ and $\mathcal{T}$ are independent. This is obvious because $\mathcal{T} \subseteq \mathcal{T}_n$.

Step 3: We claim that $\mathcal{X}_\infty := \sigma(X_n : n \in \mathbf{N})$ and $\mathcal{T}$ are independent.

Proof of claim. Because $\mathcal{X}_n \subseteq \mathcal{X}_{n+1}$, $\forall n$, the class $\mathcal{K}_\infty := \bigcup \mathcal{X}_n$ is a $\pi$-system (it is generally NOT a $\sigma$-algebra!) which generates $\mathcal{X}_\infty$. Moreover, $\mathcal{K}_\infty$ and $\mathcal{T}$ are independent, by Step 2. Lemma 4.2(a) again clinches things.

Step 4. Since $\mathcal{T} \subseteq \mathcal{X}_\infty$, $\mathcal{T}$ is independent of $\mathcal{T}$! Thus,
\[ F \in \mathcal{T} \implies \mathbf{P}(F) = \mathbf{P}(F \cap F) = \mathbf{P}(F)\mathbf{P}(F), \]
and $\mathbf{P}(F) = 0$ or 1. □

Proof of (ii). By part (i), for every $x$ in $\mathbf{R}$, $\mathbf{P}(\xi \le x) = 0$ or 1. Let $c := \sup\{x : \mathbf{P}(\xi \le x) = 0\}$. Then, if $c = -\infty$, it is clear that $\mathbf{P}(\xi = -\infty) = 1$; and if $c = \infty$, it is clear that $\mathbf{P}(\xi = \infty) = 1$.

So, suppose that $c$ is finite. Then $\mathbf{P}(\xi \le c - 1/n) = 0$, $\forall n$, so that
\[ \mathbf{P}\Big(\bigcup_n \{\xi \le c - 1/n\}\Big) = \mathbf{P}(\xi < c) = 0, \]
while, since $\mathbf{P}(\xi \le c + 1/n) = 1$, $\forall n$, we have
\[ \mathbf{P}\Big(\bigcap_n \{\xi \le c + 1/n\}\Big) = \mathbf{P}(\xi \le c) = 1. \]
Hence, $\mathbf{P}(\xi = c) = 1$. □

Remarks. The examples in Section 4.10 show how striking this result is. For example, if $X_1, X_2, \ldots$ is a sequence of independent random variables, then

either $\mathbf{P}(\sum X_n \text{ converges}) = 0$ or $\mathbf{P}(\sum X_n \text{ converges}) = 1$.

The Three Series Theorem (Theorem 12.5) completely settles the question of which possibility occurs.

So, you can see that the 0-1 law poses numerous interesting questions.

Example. In the branching-process example of Chapter 0, the variable
\[ M_\infty := \lim Z_n / \mu^n \]
is measurable on the tail $\sigma$-algebra of the sequence $(Z_n : n \in \mathbf{N})$ but need not be almost deterministic. But then the variables $(Z_n : n \in \mathbf{N})$ are not independent.

4.12. Exercise/Warning

Let $Y_0, Y_1, Y_2, \ldots$ be independent random variables with
\[ \mathbf{P}(Y_n = +1) = \mathbf{P}(Y_n = -1) = \tfrac{1}{2}, \qquad \forall n. \]
For $n \in \mathbf{N}$, define
\[ X_n := Y_0 Y_1 \cdots Y_n. \]
Prove that the variables $X_1, X_2, \ldots$ are independent. Define
\[ \mathcal{Y} := \sigma(Y_1, Y_2, \ldots), \qquad \mathcal{T}_n := \sigma(X_r : r > n). \]
Prove that
\[ \mathcal{L} := \bigcap_n \sigma(\mathcal{Y}, \mathcal{T}_n) \ne \sigma\Big(\mathcal{Y}, \bigcap_n \mathcal{T}_n\Big) =: \mathcal{R}. \]
Hint. Prove that $Y_0 \in m\mathcal{L}$ and that $Y_0$ is independent of $\mathcal{R}$.

Notes. The phenomenon illustrated by this example tripped up even Kolmogorov and Wiener. The very simple illustration given here was shown to me by Martin Barlow and Ed Perkins. Deciding when we can assert that (for $\mathcal{Y}$ a $\sigma$-algebra and $(\mathcal{T}_n)$ a decreasing sequence of $\sigma$-algebras)
\[ \bigcap_n \sigma(\mathcal{Y}, \mathcal{T}_n) = \sigma\Big(\mathcal{Y}, \bigcap_n \mathcal{T}_n\Big) \]
is a tantalizing problem in many probabilistic contexts.
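The first claim of Exercise 4.12, that $X_1, X_2, \ldots$ are independent, can be sanity-checked for finitely many variables by brute-force enumeration. The sketch below is only a finite check and not a proof of the exercise; the function name `x_distribution` is hypothetical.

```python
from collections import Counter
from itertools import product


def x_distribution(n):
    """Count how often each value of (X_1, ..., X_n) occurs over all sign patterns.

    Y_0, ..., Y_n range over {-1, +1}, each pattern being equally likely, and
    X_k = Y_0 * Y_1 * ... * Y_k.
    """
    counts = Counter()
    for ys in product((-1, 1), repeat=n + 1):
        xs, running = [], ys[0]
        for y in ys[1:]:
            running *= y               # running = Y_0 Y_1 ... Y_k
            xs.append(running)
        counts[tuple(xs)] += 1
    return counts


counts = x_distribution(3)
```

For $n = 3$ every one of the eight sign patterns of $(X_1, X_2, X_3)$ occurs equally often, so the joint law is the product of fair $\pm 1$ marginals.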

Chapter 5

Integration

5.0. Notation, etc. $\mu(f) :=: \int f\,d\mu$, $\mu(f; A)$

Let $(S, \Sigma, \mu)$ be a measure space. We are interested in defining for suitable elements $f$ of $m\Sigma$ the (Lebesgue) integral of $f$ with respect to $\mu$, for which we shall use the alternative notations:

►► $\mu(f) :=: \int_S f(s)\,\mu(ds) :=: \int_S f\,d\mu$.

It is worth mentioning now that we shall also use the equivalent notations for $A \in \Sigma$:
\[ \mu(f; A) :=: \int_A f(s)\,\mu(ds) :=: \int_A f\,d\mu := \mu(f I_A) \]
(with a true definition on the extreme right!) It should be clear that, for example,
\[ \mu(f; f \ge x) := \mu(f; A), \quad \text{where } A = \{s \in S : f(s) \ge x\}. \]

Something else worth emphasizing now is that, of course, summation is a special type of integration. If $(a_n : n \in \mathbf{N})$ is a sequence of real numbers, then with $S = \mathbf{N}$, $\Sigma = \mathcal{P}(\mathbf{N})$, and $\mu$ the measure on $(S, \Sigma)$ with $\mu(\{k\}) = 1$ for every $k$ in $\mathbf{N}$, then $s \mapsto a_s$ is $\mu$-integrable if and only if $\sum |a_n| < \infty$, and then
\[ \sum a_n = \int_S a_s\,\mu(ds) = \int_S a\,d\mu. \]

We begin by considering the integral of a function $f$ in $(m\Sigma)^+$, allowing such an $f$ to take values in the extended half-line $[0, \infty]$.

5.1. Integrals of non-negative simple functions, $SF^+$

If $A$ is an element of $\Sigma$, we define
\[ \mu_0(I_A) := \mu(A) \le \infty. \]
The use of $\mu_0$ rather than $\mu$ signifies that we currently have only a naive integral defined for simple functions.

An element $f$ of $(m\Sigma)^+$ is called simple, and we shall then write $f \in SF^+$, if $f$ may be written as a finite sum

(a) $f = \sum_{k=1}^{m} a_k I_{A_k}$

where $a_k \in [0, \infty]$ and $A_k \in \Sigma$. We then define

(b) $\mu_0(f) = \sum a_k \mu(A_k) \le \infty$ (with $0.\infty := 0 =: \infty.0$).

The first point to be checked is that $\mu_0(f)$ is well-defined; for $f$ will have many different representations of the form (a), and we must ensure that they yield the same value of $\mu_0(f)$ in (b). Various desirable properties also need to be checked, namely (c), (d) and (e) now to be stated:

(c) if $f, g \in SF^+$ and $\mu(f \ne g) = 0$ then $\mu_0(f) = \mu_0(g)$;

(d) ('Linearity') if $f, g \in SF^+$ and $c \ge 0$ then $f + g$ and $cf$ are in $SF^+$, and
\[ \mu_0(f + g) = \mu_0(f) + \mu_0(g), \qquad \mu_0(cf) = c\mu_0(f); \]

(e) (Monotonicity) if $f, g \in SF^+$ and $f \le g$, then $\mu_0(f) \le \mu_0(g)$;

(f) if $f, g \in SF^+$ then $f \wedge g$ and $f \vee g$ are in $SF^+$.

Checking all the properties just mentioned is a little messy, but it involves no point of substance, and in particular no analysis. We skip this, and turn our attention to what matters: the Monotone-Convergence Theorem.
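The well-definedness point for $\mu_0$ in Section 5.1 can be seen concretely on a finite measure space, where two different representations of the same simple function must give the same value in (b). The sketch below assumes finitely many points with finite masses and finite coefficients (so the convention $0.\infty := 0$ is never needed); the name `mu0` and the example data are hypothetical.

```python
from fractions import Fraction


def mu0(terms, mu):
    """mu_0(f) = sum_k a_k * mu(A_k) for f = sum_k a_k * 1_{A_k}.

    terms: list of (a_k, A_k) with a_k >= 0 finite and A_k a frozenset of points.
    mu: dict giving the mass of each point of a finite space (an assumption here).
    """
    return sum(a * sum(mu[s] for s in A) for a, A in terms)


mu = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

# Two representations of the same function f with f(1) = 2, f(2) = 5, f(3) = 3.
overlapping = [(2, frozenset({1, 2})), (3, frozenset({2, 3}))]
disjoint = [(2, frozenset({1})), (5, frozenset({2})), (3, frozenset({3}))]

value_a = mu0(overlapping, mu)
value_b = mu0(disjoint, mu)
```

Both representations evaluate to the same number, as the well-definedness claim requires.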

5.2. Definition of $\mu(f)$, $f \in (m\Sigma)^+$

► For $f \in (m\Sigma)^+$ we define

(a) $\mu(f) := \sup\{\mu_0(h) : h \in SF^+,\ h \le f\} \le \infty$.

Clearly, for $f \in SF^+$, we have $\mu(f) = \mu_0(f)$.

The following result is important.

LEMMA

►(b) If $f \in (m\Sigma)^+$ and $\mu(f) = 0$, then
\[ \mu(\{f > 0\}) = 0. \]

Proof. Obviously, $\{f > 0\} = \uparrow\lim \{f > n^{-1}\}$. Hence, using (1.10,a), we see that if $\mu(\{f > 0\}) > 0$, then, for some $n$, $\mu(\{f > n^{-1}\}) > 0$, and then
\[ \mu(f) \ge \mu_0\big(n^{-1} I_{\{f > n^{-1}\}}\big) > 0. \]
□

5.3. Monotone-Convergence Theorem (MON)

►►► (a) If $(f_n)$ is a sequence of elements of $(m\Sigma)^+$ such that $f_n \uparrow f$, then

    $\mu(f_n) \uparrow \mu(f) \le \infty,$

or, in other notation,

    $\int_S f_n(s)\,\mu(ds) \uparrow \int_S f(s)\,\mu(ds).$

This theorem is really all there is to integration theory. We shall see that other key results such as the Fatou Lemma and the Dominated-Convergence Theorem follow trivially from it.

The (MON) theorem is proved in the Appendix. Obviously, the theorem relates very closely to Lemma 1.10(a), the monotonicity result for measures. The proof of (MON) is not at all difficult, and may be read once you have looked at the following definition of $\alpha^{(r)}$.

It is convenient to have an explicit way, given $f \in (m\Sigma)^+$, of obtaining a sequence $f^{(r)}$ of simple functions such that $f^{(r)} \uparrow f$. For $r \in \mathbb{N}$, define the $r$-th staircase function $\alpha^{(r)} : [0,\infty] \to [0,\infty]$ as follows:

(b)    $\alpha^{(r)}(x) := 0$ if $x = 0$;
       $\alpha^{(r)}(x) := (i-1)2^{-r}$ if $(i-1)2^{-r} < x \le i2^{-r} \le r$ $(i \in \mathbb{N})$;
       $\alpha^{(r)}(x) := r$ if $x > r$.

Then $f^{(r)} = \alpha^{(r)} \circ f$ satisfies $f^{(r)} \in SF^+$, and $f^{(r)} \uparrow f$ so that, by (MON),

    $\mu(f) = {\uparrow}\lim \mu(f^{(r)}) = {\uparrow}\lim \mu_0(f^{(r)}).$

We have made $\alpha^{(r)}$ left-continuous so that if $f_n \uparrow f$ then $\alpha^{(r)}(f_n) \uparrow \alpha^{(r)}(f)$.
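The staircase construction can be tried out numerically. The following is only a sketch under stated assumptions: it takes $f(x) = x$ on $((0,1], \mathrm{Leb})$, and the function names `staircase` and `mu0_of_staircase_identity` are hypothetical, chosen for this illustration. Exact rational arithmetic is used so that the monotone increase of $\mu_0(f^{(r)})$ towards $\mu(f) = 1/2$ is visible without rounding.

```python
from fractions import Fraction
import math


def staircase(r, x):
    """alpha^(r)(x): 0 at x = 0, (i-1)2^-r on ((i-1)2^-r, i2^-r], and r for x > r."""
    if x == 0:
        return Fraction(0)
    if x > r:
        return Fraction(r)
    i = math.ceil(x * 2**r)  # the unique i with (i-1)2^-r < x <= i2^-r
    return Fraction(i - 1, 2**r)


def mu0_of_staircase_identity(r):
    """mu_0(f^(r)) for f(x) = x on ((0,1], Leb), assuming r >= 1.

    f^(r) is constant on each dyadic interval ((i-1)2^-r, i2^-r], so the
    naive integral is a finite sum of value times interval length.
    """
    step = Fraction(1, 2**r)
    return sum(staircase(r, i * step) * step for i in range(1, 2**r + 1))


# Increasing approximations to mu(f) = 1/2.
approximations = [mu0_of_staircase_identity(r) for r in range(1, 8)]
```

In this example each approximation equals $(1 - 2^{-r})/2$, which increases to $1/2$ as (MON) predicts. Left-continuity shows up in the fact that `staircase(2, Fraction(1, 2))` takes the lower value $1/4$.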

Often, we need to apply convergence theorems such as (MON) where the hypothesis ($f_n \uparrow f$ in the case of (MON)) holds almost everywhere rather than everywhere. Let us see how such adjustments may be made.

(c) If $f, g \in (m\Sigma)^+$ and $f = g$ (a.e.), then $\mu(f) = \mu(g)$.

Proof. Let $f^{(r)} = \alpha^{(r)} \circ f$, $g^{(r)} = \alpha^{(r)} \circ g$. Then $f^{(r)} = g^{(r)}$ (a.e.) and so, by (5.1,c), $\mu(f^{(r)}) = \mu(g^{(r)})$. Now let $r \uparrow \infty$, and use (MON). □

► (d) If $f \in (m\Sigma)^+$ and $(f_n)$ is a sequence in $(m\Sigma)^+$ such that, except on a $\mu$-null set $N$, $f_n \uparrow f$, then

    $\mu(f_n) \uparrow \mu(f).$

Proof. We have $\mu(f) = \mu(f I_{S \setminus N})$ and $\mu(f_n) = \mu(f_n I_{S \setminus N})$. But $f_n I_{S \setminus N} \uparrow f I_{S \setminus N}$ everywhere. The result now follows from (MON). □

From now on, (MON) is understood to include this extension. We do not bother to spell out such extensions for the other convergence theorems, often stating results with 'almost everywhere' but proving them under the assumption that the exceptional null set is empty.

Note on the Riemann integral

If, for example, $f$ is a non-negative Riemann integrable function on $([0,1], \mathcal{B}[0,1], \mathrm{Leb})$ with Riemann integral $I$, then there exists an increasing sequence $(L_n)$ of elements of $SF^+$ and a decreasing sequence $(U_n)$ of elements of $SF^+$ such that

    $L_n \uparrow L \le f, \quad U_n \downarrow U \ge f$

and

    $\mu(L_n) \uparrow I, \quad \mu(U_n) \downarrow I.$

If we define

    $\tilde f := L$ if $L = U$, and $\tilde f := 0$ otherwise,

then it is clear that $\tilde f$ is Borel measurable, while $\{f \ne \tilde f\}$ is a subset of the Borel set $\{L \ne U\}$ which Lemma 5.2(b) shows to be of measure 0 (since $\mu(L) = \mu(U) = I$). So $f$ is Lebesgue measurable (see Section A1.11) and the Riemann integral of $f$ equals the integral of $f$ associated with $([0,1], \mathrm{Leb}[0,1], \mathrm{Leb})$, $\mathrm{Leb}[0,1]$ denoting the $\sigma$-algebra of Lebesgue measurable subsets of $[0,1]$.

5.4. The Fatou Lemmas for functions

►► (a) (FATOU) For a sequence $(f_n)$ in $(m\Sigma)^+$,

    $\mu(\liminf f_n) \le \liminf \mu(f_n).$

Proof. We have

(*)    $\liminf_n f_n = {\uparrow}\lim_k g_k$, where $g_k := \inf_{n \ge k} f_n$.

For $n \ge k$, we have $f_n \ge g_k$, so that $\mu(f_n) \ge \mu(g_k)$, whence

    $\mu(g_k) \le \inf_{n \ge k} \mu(f_n)$;

and on combining this with an application of (MON) to (*), we obtain

    $\mu(\liminf_n f_n) = {\uparrow}\lim_k \mu(g_k) \le {\uparrow}\lim_k \inf_{n \ge k} \mu(f_n) =: \liminf_n \mu(f_n).$  □

Reverse Fatou Lemma

► (b) If $(f_n)$ is a sequence in $(m\Sigma)^+$ such that for some $g$ in $(m\Sigma)^+$, we have $f_n \le g$, $\forall n$, and $\mu(g) < \infty$, then

    $\mu(\limsup f_n) \ge \limsup \mu(f_n).$

Proof. Apply (FATOU) to the sequence $(g - f_n)$. □

5.5. 'Linearity'

For $\alpha, \beta \in \mathbb{R}^+$ and $f, g \in (m\Sigma)^+$,

    $\mu(\alpha f + \beta g) = \alpha\mu(f) + \beta\mu(g) \quad (\le \infty).$

Proof. Approximate $f$ and $g$ from below by simple functions, apply (5.1,d) to the simple functions, and then use (MON). □

5.6. Positive and negative parts of $f$

For $f \in m\Sigma$, we write $f = f^+ - f^-$, where

    $f^+(s) := \max(f(s), 0), \quad f^-(s) := \max(-f(s), 0).$

Then $f^+, f^- \in (m\Sigma)^+$, and $|f| = f^+ + f^-$.

5.7. Integrable function, $\mathcal{L}^1(S, \Sigma, \mu)$

► For $f \in m\Sigma$, we say that $f$ is $\mu$-integrable, and write

    $f \in \mathcal{L}^1(S, \Sigma, \mu)$

if

    $\mu(|f|) = \mu(f^+) + \mu(f^-) < \infty,$

and then we define

    $\int f\,d\mu := \mu(f) := \mu(f^+) - \mu(f^-).$

Note that, for $f \in \mathcal{L}^1(S, \Sigma, \mu)$,

►    $|\mu(f)| \le \mu(|f|),$

the familiar rule that the modulus of the integral is less than or equal to the integral of the modulus.

We write $\mathcal{L}^1(S, \Sigma, \mu)^+$ for the class of non-negative elements in $\mathcal{L}^1(S, \Sigma, \mu)$.

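5.8. Linearity

For $\alpha, \beta \in \mathbb{R}$ and $f, g \in \mathcal{L}^1(S, \Sigma, \mu)$,

    $\alpha f + \beta g \in \mathcal{L}^1(S, \Sigma, \mu)$

and

    $\mu(\alpha f + \beta g) = \alpha\mu(f) + \beta\mu(g).$

Proof. This is a totally routine consequence of the result in Section 5.5. □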

5.9. Dominated-Convergence Theorem (DOM)

► Suppose that $f_n, f \in m\Sigma$, that $f_n(s) \to f(s)$ for every $s$ in $S$ and that the sequence $(f_n)$ is dominated by an element $g$ of $\mathcal{L}^1(S, \Sigma, \mu)^+$:

    $|f_n(s)| \le g(s), \quad \forall s \in S,\ \forall n \in \mathbb{N},$

where $\mu(g) < \infty$. Then

    $f_n \to f$ in $\mathcal{L}^1(S, \Sigma, \mu)$: that is, $\mu(|f_n - f|) \to 0$,

whence

    $\mu(f_n) \to \mu(f).$

Command: Do Exercise E5.1 now.

Proof. We have $|f_n - f| \le 2g$, where $\mu(2g) < \infty$, so by the reverse Fatou Lemma 5.4(b),

    $\limsup \mu(|f_n - f|) \le \mu(\limsup |f_n - f|) = \mu(0) = 0.$

Since

    $|\mu(f_n) - \mu(f)| = |\mu(f_n - f)| \le \mu(|f_n - f|),$

the theorem is proved. □
5.10. Scheffé's Lemma (SCHEFFÉ)

► (i) Suppose that $f_n, f \in \mathcal{L}^1(S, \Sigma, \mu)^+$; in particular, $f_n$ and $f$ are non-negative. Suppose that $f_n \to f$ (a.e.). Then

    $\mu(|f_n - f|) \to 0$ if and only if $\mu(f_n) \to \mu(f)$.

Proof. The 'only if' part is trivial.

Suppose now that

(a)    $\mu(f_n) \to \mu(f).$

Since $(f_n - f)^- \le f$, (DOM) shows that

(b)    $\mu((f_n - f)^-) \to 0.$

Next,

    $\mu((f_n - f)^+) = \mu(f_n - f; f_n \ge f) = \mu(f_n) - \mu(f) - \mu(f_n - f; f_n < f).$

But

    $|\mu(f_n - f; f_n < f)| \le |\mu((f_n - f)^-)| \to 0$

so that (a) and (b) together imply that

(c)    $\mu((f_n - f)^+) \to 0.$

Of course, (b) and (c) now yield the desired result. □

Here is the second part of Scheffé's Lemma.

► (ii) Suppose that $f_n, f \in \mathcal{L}^1(S, \Sigma, \mu)$ and that $f_n \to f$ (a.e.). Then

    $\mu(|f_n - f|) \to 0$ if and only if $\mu(|f_n|) \to \mu(|f|)$.

Exercise. Prove the 'if' part of (ii) by using Fatou's Lemma to show that $\mu(f_n^{\pm}) \to \mu(f^{\pm})$, and then applying (i). Of course, the 'only if' part is trivial.

5.11. Remark on uniform integrability

The theory of uniform integrability, which we shall establish later for probability triples, gives better insight into the matter of convergence of integrals.

5.12. The standard machine

What I call the standard machine is a much cruder alternative to the Monotone-Class Theorem. The idea is that to prove that a 'linear' result is true for all functions $h$ in a space such as $\mathcal{L}^1(S, \Sigma, \mu)$,

• first, we show the result is true for the case when $h$ is an indicator function - which it normally is by definition;

• then, we use linearity to obtain the result for $h$ in $SF^+$;

• next, we use (MON) to obtain the result for $h \in (m\Sigma)^+$, integrability conditions on $h$ usually being superfluous at this stage;

• finally, we show, by writing $h = h^+ - h^-$ and using linearity, that the claimed result is true.

It seems to me that, when it works, it is easier to 'watch the standard machine work' than to appeal to the monotone-class result, though there are times when the greater subtlety of the Monotone-Class Theorem is essential.

5.13. Integrals over subsets

Recall that for $f \in (m\Sigma)^+$, we set, for $A \in \Sigma$,

    $\int_A f\,d\mu := \mu(f; A) := \mu(f I_A).$

If we really want to integrate $f$ over $A$, we should integrate the restriction $f|_A$ with respect to the measure $\mu_A$ (say) which is $\mu$ restricted to the measure space $(A, \Sigma_A)$, $\Sigma_A$ denoting the $\sigma$-algebra of subsets of $A$ which belong to $\Sigma$. So we ought to prove that

(a)    $\mu_A(f|_A) = \mu(f; A).$

The standard machine does this. If $f$ is the indicator of a set $B$ in $\Sigma$, then both sides of (a) are just $\mu(A \cap B)$; etc. We discover that for $f \in m\Sigma$, we have $f|_A \in m\Sigma_A$; and then

    $f|_A \in \mathcal{L}^1(A, \Sigma_A, \mu_A)$ if and only if $f I_A \in \mathcal{L}^1(S, \Sigma, \mu)$,

in which case (a) holds.

5.14. The measure $f\mu$, $f \in (m\Sigma)^+$

Let $f \in (m\Sigma)^+$. For $A \in \Sigma$, define

(a)    $(f\mu)(A) := \mu(f; A) := \mu(f I_A).$

A trivial Exercise on the results of Section 5.5 and (MON) shows that

(b)    $(f\mu)$ is a measure on $(S, \Sigma)$.

For $h \in (m\Sigma)^+$, and $A \in \Sigma$, we can conjecture that

(c)    $(h(f\mu))(A) := (f\mu)(h I_A) = \mu(f h I_A).$

If $h$ is the indicator of a set in $\Sigma$, then (c) is immediate by definition. Our standard machine produces (c), so that we have

(d)    $h(f\mu) = (hf)\mu.$

Result (d) is often used in the following form:

► (e) if $f \in (m\Sigma)^+$ and $h \in (m\Sigma)$, then $h \in \mathcal{L}^1(S, \Sigma, f\mu)$ if and only if $fh \in \mathcal{L}^1(S, \Sigma, \mu)$ and then

    $(f\mu)(h) = \mu(fh).$

Proof. We need only prove this for $h \ge 0$ in which case it merely says that the measures at (d) agree on $S$. □

Terminology, and the Radon-Nikodým theorem

If $\lambda$ denotes the measure $f\mu$ on $(S, \Sigma)$, we say that $\lambda$ has density $f$ relative to $\mu$, and express this in symbols via

    $d\lambda/d\mu = f.$

We note that in this case, we have for $F \in \Sigma$:

(f)    $\mu(F) = 0$ implies that $\lambda(F) = 0$;

so that only certain measures have density relative to $\mu$. The Radon-Nikodým theorem (proved in Chapter 14) tells us that

(g) if $\mu$ and $\lambda$ are $\sigma$-finite measures on $(S, \Sigma)$ such that (f) holds, then $\lambda = f\mu$ for some $f \in (m\Sigma)^+$.

Chapter 6

Expectation

6.0. Introductory remarks

We work with a probability triple $(\Omega, \mathcal{F}, P)$, and write $\mathcal{L}^r$ for $\mathcal{L}^r(\Omega, \mathcal{F}, P)$. Recall that a random variable (RV) is an element of $m\mathcal{F}$, that is an $\mathcal{F}$-measurable function from $\Omega$ to $\mathbb{R}$.

Expectation is just the integral relative to $P$.

Jensen's inequality, which makes critical use of the fact that $P(\Omega) = 1$, is very useful and powerful: it implies the Schwarz, Hölder, ... inequalities for general $(S, \Sigma, \mu)$. (See Section 6.13.)

We study the geometry of the space $\mathcal{L}^2(\Omega, \mathcal{F}, P)$ in some detail, with a view to several later applications.

6.1. Definition of expectation

For a random variable $X \in \mathcal{L}^1 = \mathcal{L}^1(\Omega, \mathcal{F}, P)$, we define the expectation $E(X)$ of $X$ by

    $E(X) := \int_\Omega X\,dP = \int_\Omega X(\omega)\,P(d\omega).$

We also define $E(X)$ $(\le \infty)$ for $X \in (m\mathcal{F})^+$. In short, $E(X) = P(X)$.

That our present definitions agree with those in terms of probability density function (if it exists) etc. will be confirmed in Section 6.12.

6.2. Convergence theorems

Suppose that $(X_n)$ is a sequence of RVs, that $X$ is a RV, and that $X_n \to X$ almost surely:

    $P(X_n \to X) = 1.$

We rephrase the convergence theorems of Chapter 5 in our new notation:

►► (MON) if $0 \le X_n \uparrow X$, then $E(X_n) \uparrow E(X) \le \infty$;

►► (FATOU) if $X_n \ge 0$, then $E(X) \le \liminf E(X_n)$;

► (DOM) if $|X_n(\omega)| \le Y(\omega)$ $\forall (n, \omega)$, where $E(Y) < \infty$, then

    $E(|X_n - X|) \to 0,$

so that

    $E(X_n) \to E(X)$;

► (SCHEFFÉ) if $E(|X_n|) \to E(|X|)$, then

    $E(|X_n - X|) \to 0$;

►► (BDD) if for some finite constant $K$, $|X_n(\omega)| \le K$, $\forall (n, \omega)$, then

    $E(|X_n - X|) \to 0.$

The newly-added Bounded Convergence Theorem (BDD) is an immediate consequence of (DOM), obtained by taking $Y(\omega) = K$, $\forall \omega$; because of the fact that $P(\Omega) = 1$, we have $E(Y) < \infty$. It has a direct elementary proof which we shall examine in Section 13.7; but you might well be able to provide it now.

As has been mentioned previously, uniform integrability is the key concept which gives a proper understanding of convergence theorems. We shall study this, via the elementary (BDD) result, in Chapter 13.

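6.3. The notation $E(X; F)$

For $X \in \mathcal{L}^1$ (or $(m\mathcal{F})^+$) and $F \in \mathcal{F}$, we define

►    $E(X; F) := \int_F X(\omega)\,P(d\omega) := E(X I_F),$

where, as ever,

    $I_F(\omega) := 1$ if $\omega \in F$, and $I_F(\omega) := 0$ if $\omega \notin F$.

Of course, this tallies with the $\mu(f; A)$ notation of Chapter 5.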

6.4. Markov's inequality

Suppose that $Z \in m\mathcal{F}$ and that $g : \mathbb{R} \to [0, \infty]$ is $\mathcal{B}$-measurable and non-decreasing. (We know that $g(Z) = g \circ Z \in (m\mathcal{F})^+$.) Then

►    $E g(Z) \ge E(g(Z); Z \ge c) \ge g(c) P(Z \ge c).$

Examples:

    for $Z \in (m\mathcal{F})^+$, $cP(Z \ge c) \le E(Z)$, $(c > 0)$,

    for $X \in \mathcal{L}^1$, $cP(|X| \ge c) \le E(|X|)$ $(c > 0)$.

►► Considerable strength can often be obtained by choosing the optimum $\theta$ for $c$ in

►    $P(Y > c) \le e^{-\theta c} E(e^{\theta Y}), \quad (\theta > 0,\ c \in \mathbb{R}).$
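To see how much is gained by optimising over $\theta$, here is a small sketch under our own assumptions: $Y$ is taken to be Binomial$(n, p)$ (a choice made only for this illustration, as are the function names), the tail is computed exactly, and $\theta$ is optimised over a crude grid rather than in closed form.

```python
import math


def binomial_upper_tail(n, p, c):
    """Exact P(Y >= c) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k >= c)


def exponential_markov_bound(n, p, c, theta):
    """e^{-theta c} E(e^{theta Y}) for Y ~ Binomial(n, p)."""
    return math.exp(-theta * c) * (1 - p + p * math.exp(theta))**n


n, p, c = 100, 0.5, 70
tail = binomial_upper_tail(n, p, c)

# First example of Markov's inequality: c P(Y >= c) <= E(Y) = np.
plain_markov = n * p / c

# Grid search for the optimum theta in the exponential bound.
thetas = [k / 100 for k in range(1, 301)]
best_theta = min(thetas, key=lambda t: exponential_markov_bound(n, p, c, t))
best_bound = exponential_markov_bound(n, p, c, best_theta)
```

For these parameters the exponential bound at the best grid point is several orders of magnitude smaller than the plain bound `plain_markov`, while still dominating the exact tail.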

6.5. Sums of non-negative RVs

We collect together some useful results.

(a) If $X \in (m\mathcal{F})^+$ and $E(X) < \infty$, then $P(X < \infty) = 1$. This is obvious.

► (b) If $(Z_k)$ is a sequence in $(m\mathcal{F})^+$, then

    $E\left(\sum Z_k\right) = \sum E(Z_k) \le \infty.$

This is an obvious consequence of linearity and (MON).

► (c) If $(Z_k)$ is a sequence in $(m\mathcal{F})^+$ such that $\sum E(Z_k) < \infty$, then

    $\sum Z_k < \infty$ (a.s.) and so $Z_k \to 0$ (a.s.)

This is an immediate consequence of (a) and (b).

(d) The First Borel-Cantelli Lemma is a consequence of (c). For suppose that $(F_k)$ is a sequence of events such that $\sum P(F_k) < \infty$. Take $Z_k = I_{F_k}$. Then $E(Z_k) = P(F_k)$ and, by (c),

    $\sum I_{F_k}$ = number of events $F_k$ which occur

is a.s. finite.

6.6. Jensen's inequality for convex functions

►► A function $c : G \to \mathbb{R}$, where $G$ is an open subinterval of $\mathbb{R}$, is called convex on $G$ if its graph lies below any of its chords: for $x, y \in G$ and $0 \le p = 1 - q \le 1$,

    $c(px + qy) \le pc(x) + qc(y).$

It will be explained below that $c$ is automatically continuous on $G$. If $c$ is twice-differentiable on $G$, then $c$ is convex if and only if $c'' \ge 0$.

► Important examples of convex functions: $|x|$, $x^2$, $e^{\theta x}$ $(\theta \in \mathbb{R})$.

THEOREM. Jensen's inequality

►► Suppose that $c : G \to \mathbb{R}$ is a convex function on an open subinterval $G$ of $\mathbb{R}$ and that $X$ is a random variable such that

    $E(|X|) < \infty, \quad P(X \in G) = 1, \quad E|c(X)| < \infty.$

Then

    $E c(X) \ge c(E(X)).$

Proof. The fact that $c$ is convex may be rewritten as follows: for $u, v, w \in G$ with $u < v < w$, we have

    $\Delta_{u,v} \le \Delta_{v,w}$, where $\Delta_{u,v} := \dfrac{c(v) - c(u)}{v - u}.$

It is now clear (why?!) that $c$ is continuous on $G$, and that for each $v$ in $G$ the monotone limits

    $(D_-c)(v) := {\uparrow}\lim_{u \uparrow v} \Delta_{u,v}, \quad (D_+c)(v) := {\downarrow}\lim_{w \downarrow v} \Delta_{v,w}$

exist and satisfy $(D_-c)(v) \le (D_+c)(v)$. The functions $D_-c$ and $D_+c$ are non-decreasing, and for every $v$ in $G$, for any $m$ in $[(D_-c)(v), (D_+c)(v)]$ we have

    $c(x) \ge m(x - v) + c(v), \quad x \in G.$

In particular, we have, almost surely, for $\mu := E(X)$,

    $c(X) \ge m(X - \mu) + c(\mu), \quad m \in [(D_-c)(\mu), (D_+c)(\mu)],$

and Jensen's inequality follows on taking expectations. □

Remark. For later use, we shall need the obvious fact that

(a)    $c(x) = \sup_{q \in G}[(D_-c)(q)(x - q) + c(q)] = \sup_n (a_n x + b_n) \quad (x \in G)$

for some sequences $(a_n)$ and $(b_n)$ in $\mathbb{R}$. (Recall that $c$ is continuous.)

6.7. Monotonicity of $\mathcal{L}^p$ norms

►► For $1 \le p < \infty$, we say that $X \in \mathcal{L}^p = \mathcal{L}^p(\Omega, \mathcal{F}, P)$ if

    $E(|X|^p) < \infty,$

and then we define

►►    $\|X\|_p := \{E(|X|^p)\}^{1/p}.$

The monotonicity property referred to in the section title is the following:

► (a) if $1 \le p \le r < \infty$ and $Y \in \mathcal{L}^r$, then $Y \in \mathcal{L}^p$ and

    $\|Y\|_p \le \|Y\|_r.$

► Proof. For $n \in \mathbb{N}$, define

    $X_n(\omega) := \{|Y(\omega)| \wedge n\}^p.$

Then $X_n$ is bounded so that $X_n$ and $X_n^{r/p}$ are both in $\mathcal{L}^1$. Taking $c(x) = x^{r/p}$ on $(0, \infty)$, we conclude from Jensen's inequality that

    $(E X_n)^{r/p} \le E(X_n^{r/p}) = E[(|Y| \wedge n)^r] \le E(|Y|^r).$

Now let $n \uparrow \infty$ and use (MON) to obtain the desired result. □

Note. The proof is marked with ► because it illustrates a simple but effective use of truncation.

Vector-space property of $\mathcal{L}^p$

(b) Since, for $a, b \in \mathbb{R}^+$, we have

    $(a + b)^p \le [2\max(a, b)]^p \le 2^p(a^p + b^p),$

$\mathcal{L}^p$ is obviously a vector space.
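A quick numerical check of property (a) may be helpful. This is only a sketch: the four-point distribution below and the helper name `lp_norm` are invented for the illustration. The second computation replaces $P$ by counting measure, of total mass 4, to show that the monotonicity really does use $P(\Omega) = 1$.

```python
def lp_norm(values, weights, p):
    """{ integral of |Y|^p }^(1/p) for a function on a finite set.

    `values` are the values of Y and `weights` the masses of the points.
    """
    return sum(w * abs(v)**p for v, w in zip(values, weights))**(1.0 / p)


values = [-3.0, 0.5, 2.0, 10.0]
probs = [0.1, 0.4, 0.4, 0.1]          # a probability measure: total mass 1
exponents = [1, 1.5, 2, 3, 4, 8]

# Norms under the probability measure: non-decreasing in p.
norms = [lp_norm(values, probs, p) for p in exponents]

# Same function, counting measure: the norms now decrease in p.
counting_norms = [lp_norm(values, [1, 1, 1, 1], p) for p in exponents]
```

Under the probability weights the list `norms` is non-decreasing and bounded by $\max|Y| = 10$; under counting measure the ordering is reversed.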

6.8. The Schwarz inequality

►► (a) If $X$ and $Y$ are in $\mathcal{L}^2$, then $XY \in \mathcal{L}^1$ and

    $|E(XY)| \le E(|XY|) \le \|X\|_2 \|Y\|_2.$

Remark. You will have seen many versions of this result and of its proof before. We use truncation to make the argument rigorous.

Proof. By considering $|X|$ and $|Y|$ instead of $X$ and $Y$, we can and do restrict attention to the case when $X \ge 0$, $Y \ge 0$.

Write $X_n := X \wedge n$, $Y_n := Y \wedge n$, so that $X_n$ and $Y_n$ are bounded. For any $a, b \in \mathbb{R}$,

    $0 \le E[(aX_n + bY_n)^2] = a^2 E(X_n^2) + 2ab E(X_n Y_n) + b^2 E(Y_n^2),$

and since the quadratic in $a/b$ (or $b/a$, or ...) does not have two distinct real roots,

    $\{2E(X_n Y_n)\}^2 \le 4E(X_n^2)E(Y_n^2) \le 4E(X^2)E(Y^2).$

Now let $n \uparrow \infty$ using (MON). □

The following is an immediate consequence of (a):

(b) if $X$ and $Y$ are in $\mathcal{L}^2$, then so is $X + Y$, and we have the triangle law:

    $\|X + Y\|_2 \le \|X\|_2 + \|Y\|_2.$

Remark. The Schwarz inequality is true for any measure space - see Section 6.13, which gives the extensions of (a) and (b) to $\mathcal{L}^p$.

6.9. $\mathcal{L}^2$: Pythagoras, covariance, etc.

In this section, we take a brief look at the geometry of $\mathcal{L}^2$ and at its connections with probabilistic concepts such as covariance, correlation, etc.

Covariance and variance

If $X, Y \in \mathcal{L}^2$, then by the monotonicity of norms, $X, Y \in \mathcal{L}^1$, so that we may define

    $\mu_X := E(X), \quad \mu_Y := E(Y).$

Since the constant functions with values $\mu_X, \mu_Y$ are in $\mathcal{L}^2$, we see that

(a)    $\tilde X := X - \mu_X, \quad \tilde Y := Y - \mu_Y$

are in $\mathcal{L}^2$. By the Schwarz inequality, $\tilde X \tilde Y \in \mathcal{L}^1$, and so we may define

(b)    $\mathrm{Cov}(X, Y) := E(\tilde X \tilde Y) = E[(X - \mu_X)(Y - \mu_Y)].$

The Schwarz inequality further justifies expanding out the product in the final [ ] bracket to yield the alternative formula:

(c)    $\mathrm{Cov}(X, Y) = E(XY) - \mu_X \mu_Y.$

As you know, the variance of $X$ is defined by

(d)    $\mathrm{Var}(X) := E[(X - \mu_X)^2] = E(X^2) - \mu_X^2 = \mathrm{Cov}(X, X).$

Inner product, angle

For $U, V \in \mathcal{L}^2$, we define the inner (or scalar) product

(e)    $\langle U, V \rangle := E(UV),$

and if $\|U\|_2$ and $\|V\|_2 \ne 0$, we define the cosine of the angle $\theta$ between $U$ and $V$ by

(f)    $\cos\theta = \dfrac{\langle U, V \rangle}{\|U\|_2 \|V\|_2}.$

This has modulus at most 1 by the Schwarz inequality. This ties in with the probabilistic idea of correlation:

(g) the correlation $\rho$ of $X$ and $Y$ is $\cos\alpha$ where $\alpha$ is the angle between $\tilde X$ and $\tilde Y$.

Orthogonality, Pythagoras theorem

$\mathcal{L}^2$ has the same geometry as any inner-product space (but see 'Quotienting' below). Thus the 'cosine rule' of elementary geometry holds, and the Pythagoras theorem takes the form

(h)    $\|U + V\|_2^2 = \|U\|_2^2 + \|V\|_2^2$ if $\langle U, V \rangle = 0$.

If $\langle U, V \rangle = 0$, we say that $U$ and $V$ are orthogonal or perpendicular, and write $U \perp V$. In probabilistic language, (h) takes the form (with $U, V$ replaced by $\tilde X, \tilde Y$)

(i)    $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ if $\mathrm{Cov}(X, Y) = 0$.

Generally, for $X_1, X_2, \ldots, X_n \in \mathcal{L}^2$,

(j)    $\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \sum_k \mathrm{Var}(X_k) + 2\sum\sum_{i<j} \mathrm{Cov}(X_i, X_j).$

I have not marked results such as (i) and (j) with ► because I am sure that they are well known to you.

Parallelogram law

Note that by the bilinearity of $\langle \cdot, \cdot \rangle$,

(k)    $\|U + V\|_2^2 + \|U - V\|_2^2 = \langle U + V, U + V \rangle + \langle U - V, U - V \rangle = 2\|U\|_2^2 + 2\|V\|_2^2.$
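The variance identities (i) and (j) can be checked exactly on a small probability space. The sketch below is our own illustration, not part of the text's development: it assumes three fair coin tosses as the underlying triple, and the helper names `E`, `cov`, `var` and the variables `X1`, `X2`, `X3` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite probability space: three fair coin tosses.
omega = list(product([0, 1], repeat=3))
prob = {w: Fraction(1, 8) for w in omega}


def E(X):
    """Expectation of a random variable given as a function on omega."""
    return sum(X(w) * prob[w] for w in omega)


def cov(X, Y):
    """Cov(X, Y) = E(XY) - E(X)E(Y), formula (c)."""
    return E(lambda w: X(w) * Y(w)) - E(X) * E(Y)


def var(X):
    return cov(X, X)


def X1(w):
    return w[0]


def X2(w):
    return w[0] + w[1]      # correlated with X1


def X3(w):
    return w[1] * w[2]      # independent of X1, hence uncorrelated with it


Xs = [X1, X2, X3]

# Both sides of (j).
lhs = var(lambda w: sum(X(w) for X in Xs))
rhs = (sum(var(X) for X in Xs)
       + 2 * sum(cov(Xs[i], Xs[j]) for i in range(3) for j in range(i + 1, 3)))
```

Because `X1` and `X3` have zero covariance, they also illustrate the Pythagoras form (i): `var(lambda w: X1(w) + X3(w))` equals `var(X1) + var(X3)`.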

Quotienting (or lack of it!): $L^2$

Our space $\mathcal{L}^2$ does not quite satisfy the requirements for an inner product space because the best we can say is that (see (5.2,b))

    $\|U\|_2 = 0$ if and only if $U = 0$ almost surely.

In functional analysis, we find an elegant solution by defining an equivalence relation

    $U \sim V$ if and only if $U = V$ almost surely

and define $L^2$ as '$\mathcal{L}^2$ quotiented out by this equivalence relation'. Of course, one needs to check that if for $i = 1, 2$, we have $c_i \in \mathbb{R}$ and $U_i, V_i \in \mathcal{L}^2$ with $U_i \sim V_i$, then

    $c_1 U_1 + c_2 U_2 \sim c_1 V_1 + c_2 V_2; \quad \langle U_1, U_2 \rangle = \langle V_1, V_2 \rangle$;

that if $U_n \to U$ in $\mathcal{L}^2$ and $V_n \sim U_n$ and $V \sim U$, then $V_n \to V$ in $\mathcal{L}^2$; etc.

As mentioned in 'A Question of Terminology', we normally do not do this quotienting in probability theory. Although one might safely do so at the moderately elementary level of this book, one could not do so at a more advanced level. For a Brownian motion $\{B_t : t \in \mathbb{R}^+\}$, the crucial property that $t \mapsto B_t(\omega)$ is continuous would be meaningless if one replaced the true function $B_t$ on $\Omega$ by an equivalence class.

6.10. Completeness of $\mathcal{L}^p$ $(1 \le p < \infty)$

Let $p \in [1, \infty)$.

The following result (a) is important in functional analysis, and will be crucial for us in the case when $p = 2$. It is instructive to prove it as an exercise in our probabilistic way of thinking, and we now do so.

(a) If $(X_n)$ is a Cauchy sequence in $\mathcal{L}^p$ in that

    $\sup_{r,s \ge k} \|X_r - X_s\|_p \to 0 \quad (k \to \infty)$

then there exists $X$ in $\mathcal{L}^p$ such that $X_r \to X$ in $\mathcal{L}^p$:

    $\|X_r - X\|_p \to 0 \quad (r \to \infty).$

Note. We already know that $\mathcal{L}^p$ is a vector space. Property (a) is important in showing that $\mathcal{L}^p$ can be made into a Banach space $L^p$ by a quotienting technique of the type mentioned at the end of the preceding section.

Proof of (a). We show that $X$ may be chosen to be an almost sure limit of a subsequence $(X_{k_n})$.

Choose a sequence $(k_n : n \in \mathbb{N})$ with $k_n \uparrow \infty$ such that

    $(r, s \ge k_n) \quad \|X_r - X_s\|_p \le 2^{-n}.$

Then

    $E(|X_{k_{n+1}} - X_{k_n}|) = \|X_{k_{n+1}} - X_{k_n}\|_1 \le \|X_{k_{n+1}} - X_{k_n}\|_p \le 2^{-n},$

so that

    $E \sum |X_{k_{n+1}} - X_{k_n}| < \infty.$

Hence it is almost surely true that the series $\sum (X_{k_{n+1}} - X_{k_n})$ converges (even absolutely!), so that

    $\lim X_{k_n}(\omega)$ exists for almost all $\omega$.

Define

    $X(\omega) := \limsup X_{k_n}(\omega), \quad \forall \omega.$

Then $X$ is $\mathcal{F}$-measurable, and $X_{k_n} \to X$, a.s.

Suppose that $n \in \mathbb{N}$ and $r \ge k_n$. Then, for $\mathbb{N} \ni t \ge n$,

    $E(|X_r - X_{k_t}|^p) = \|X_r - X_{k_t}\|_p^p \le 2^{-np},$

so that on letting $t \uparrow \infty$ and using Fatou's Lemma, we obtain

    $E(|X_r - X|^p) \le 2^{-np}.$

Firstly, $X_r - X \in \mathcal{L}^p$, so that $X \in \mathcal{L}^p$. Secondly, we see that, indeed, $X_r \to X$ in $\mathcal{L}^p$. □

Note. For an easy exercise on $\mathcal{L}^p$ convergence, see EA13.2.

6.11. Orthogonal projection

The result on completeness of $\mathcal{L}^p$ obtained in the previous section has a number of important consequences for probability theory, and it is perhaps as well to develop one of these while Section 6.10 is fresh in your mind.

I hope that you will allow me to present the following result on orthogonal projection as a piece of geometry for now, deferring discussion of its central role in the theory of conditional expectation until Chapter 9.

We write $\|\cdot\|$ for $\|\cdot\|_2$ throughout this section.

THEOREM

► Let $\mathcal{K}$ be a vector subspace of $\mathcal{L}^2$ which is complete in that whenever $(V_n)$ is a sequence in $\mathcal{K}$ which has the Cauchy property that

    $\sup_{r,s \ge k} \|V_r - V_s\| \to 0 \quad (k \to \infty),$

then there exists a $V$ in $\mathcal{K}$ such that

    $\|V_n - V\| \to 0 \quad (n \to \infty).$

Then given $X$ in $\mathcal{L}^2$, there exists $Y$ in $\mathcal{K}$ such that

(i)    $\|X - Y\| = \Delta := \inf\{\|X - W\| : W \in \mathcal{K}\},$

(ii)   $X - Y \perp Z, \quad \forall Z \in \mathcal{K}.$

Properties (i) and (ii) of $Y$ in $\mathcal{K}$ are equivalent and if $\tilde Y$ shares either property (i) or (ii) with $Y$, then

    $\|\tilde Y - Y\| = 0$ (equivalently, $\tilde Y = Y$, a.s.).

Definition. The random variable $Y$ in the theorem is called a version of the orthogonal projection of $X$ onto $\mathcal{K}$. If $\tilde Y$ is another version, then $\tilde Y = Y$, a.s.

Proof. Choose a sequence $(Y_n)$ in $\mathcal{K}$ such that

    $\|X - Y_n\| \to \Delta.$

By the parallelogram law (6.9,k),

    $\|X - Y_r\|^2 + \|X - Y_s\|^2 = 2\|X - \tfrac12(Y_r + Y_s)\|^2 + 2\|\tfrac12(Y_r - Y_s)\|^2.$

But $\tfrac12(Y_r + Y_s) \in \mathcal{K}$, so that $\|X - \tfrac12(Y_r + Y_s)\|^2 \ge \Delta^2$. It is now obvious that the sequence $(Y_n)$ has the Cauchy property so that there exists a $Y$ in $\mathcal{K}$ such that

    $\|Y_n - Y\| \to 0.$

Since (6.8,b) implies that $\|X - Y\| \le \|X - Y_n\| + \|Y_n - Y\|$, it is clear that

    $\|X - Y\| = \Delta.$

For any $Z$ in $\mathcal{K}$, we have $Y + tZ \in \mathcal{K}$ for $t \in \mathbb{R}$, and so

    $\|X - Y - tZ\|^2 \ge \|X - Y\|^2,$

whence

    $-2t\langle Z, X - Y \rangle + t^2\|Z\|^2 \ge 0.$

This can only be the case for all $t$ of small modulus if

    $\langle Z, X - Y \rangle = 0.$  □

Remark. The case to which we shall apply this theorem is when $\mathcal{K}$ has the form $\mathcal{L}^2(\Omega, \mathcal{G}, P)$ for some sub-$\sigma$-algebra $\mathcal{G}$ of $\mathcal{F}$.
has the

6.12. The 'elementary formula' for expectation

Back to earth! Let $X$ be a random variable. To avoid confusion between different $\mathcal{L}$'s, let us here write $\Lambda_X$ on $(\mathbb{R}, \mathcal{B})$ for the law of $X$:

    $\Lambda_X(B) := P(X \in B).$

LEMMA

► Suppose that $h$ is a Borel measurable function from $\mathbb{R}$ to $\mathbb{R}$. Then

    $h(X) \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$ if and only if $h \in \mathcal{L}^1(\mathbb{R}, \mathcal{B}, \Lambda_X)$

and then

(a)    $E h(X) = \Lambda_X(h) = \int_{\mathbb{R}} h(x)\,\Lambda_X(dx).$

Proof. We simply feed everything into the standard machine. Result (a) is the definition of $\Lambda_X$ if $h = I_B$ $(B \in \mathcal{B})$. Linearity then shows that (a) is true if $h$ is a simple function on $(\mathbb{R}, \mathcal{B})$. (MON) then implies (a) for $h$ a non-negative function, and linearity allows us to complete the argument. □

Probability density function (pdf)

We say that $X$ has a probability density function (pdf) $f_X$ if there exists a Borel function $f_X : \mathbb{R} \to [0, \infty]$ such that

(b)    $P(X \in B) = \int_B f_X(x)\,dx, \quad B \in \mathcal{B}.$

Here we have written $dx$ for what should be $\mathrm{Leb}(dx)$. In the language of Section 5.14, result (b) says that $\Lambda_X$ has density $f_X$ relative to Leb:

    $\dfrac{d\Lambda_X}{d\mathrm{Leb}} = f_X.$

The function $f_X$ is only defined almost everywhere: any function a.e. equal to $f_X$ will also satisfy (b) 'and conversely'.

The above lemma extends to

    $E(|h(X)|) < \infty$ if and only if $\int |h(x)| f_X(x)\,dx < \infty$

and then

    $E h(X) = \int_{\mathbb{R}} h(x) f_X(x)\,dx.$
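Formula (a) of the Lemma says that $E h(X)$ can be computed either on $\Omega$ or on $\mathbb{R}$ under the law of $X$. The following sketch checks this on a five-point sample space; the space, the random variable and the function `h` are all made up for the illustration.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical finite probability triple and random variable.
omega = ['a', 'b', 'c', 'd', 'e']
prob = dict(a=Fraction(1, 10), b=Fraction(2, 10), c=Fraction(3, 10),
            d=Fraction(1, 10), e=Fraction(3, 10))
X = dict(a=-1, b=2, c=-1, d=0, e=2)

# Law of X: Lambda_X({x}) = P(X = x), obtained by pushing P forward through X.
law = defaultdict(Fraction)
for w in omega:
    law[X[w]] += prob[w]


def h(x):
    return x * x + 1


# E h(X) computed on Omega, and Lambda_X(h) computed on the range of X.
expectation_on_omega = sum(h(X[w]) * prob[w] for w in omega)
integral_against_law = sum(h(x) * mass for x, mass in law.items())
```

The two sums agree because the second merely groups the terms of the first according to the value of $X$, which is the indicator-function step of the standard machine followed by linearity.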

6.13. Hölder from Jensen

The truncation technique used to prove the Schwarz inequality in Section 6.8 relied on the fact that $P(\Omega) < \infty$. However, the Schwarz inequality is true for any measure space, as is the more general Hölder inequality.

We conclude this chapter with a device (often useful) which yields the Hölder inequality for any $(S, \Sigma, \mu)$ from Jensen's inequality for probability triples.

Let $(S, \Sigma, \mu)$ be a measure space. Suppose that

►    $p > 1$ and $p^{-1} + q^{-1} = 1$.

Write $f \in \mathcal{L}^p(S, \Sigma, \mu)$ if $f \in m\Sigma$ and $\mu(|f|^p) < \infty$, and in that case define

    $\|f\|_p := \{\mu(|f|^p)\}^{1/p}.$

THEOREM

Suppose that $f, g \in \mathcal{L}^p(S, \Sigma, \mu)$, $h \in \mathcal{L}^q(S, \Sigma, \mu)$. Then

► (a) (Hölder's inequality) $fh \in \mathcal{L}^1(S, \Sigma, \mu)$ and

    $|\mu(fh)| \le \mu(|fh|) \le \|f\|_p \|h\|_q$;

► (b) (Minkowski's inequality)

    $\|f + g\|_p \le \|f\|_p + \|g\|_p.$

Proof of (a). We can obviously restrict attention to the case when

    $f, h \ge 0$ and $\mu(f^p) > 0$.

With the notation of Section 5.14, define

    $P := \dfrac{f^p \mu}{\mu(f^p)},$

so that $P$ is a probability measure on $(S, \Sigma)$. Define

    $u(s) := h(s)/f(s)^{p-1}$ if $f(s) > 0$, and $u(s) := 0$ if $f(s) = 0$.

The fact that $P(u)^q \le P(u^q)$ now yields

    $\mu(fh) \le \|f\|_p \|h I_{\{f > 0\}}\|_q \le \|f\|_p \|h\|_q.$  □

Proof of (b). Using Hölder's inequality, we have

    $\mu(|f + g|^p) \le \mu(|f||f + g|^{p-1}) + \mu(|g||f + g|^{p-1}) \le \|f\|_p A + \|g\|_p A,$

where

    $A = \||f + g|^{p-1}\|_q = \mu(|f + g|^p)^{1/q},$

and (b) follows on rearranging. (The result is non-trivial only if $f, g \in \mathcal{L}^p$, and in that case, the finiteness of $A$ follows from the vector-space property of $\mathcal{L}^p$.) □
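Both inequalities are easy to test numerically on a finite measure space which is not a probability space. The sketch below uses invented data (the weights `mu` and the functions `f`, `g`, `h` are assumptions of this illustration) and also forms the probability weights $P = f^p\mu/\mu(f^p)$ used as the device in the proof of (a).

```python
def integral(f, mu):
    """mu(f) on a finite set: sum of value times mass."""
    return sum(x * m for x, m in zip(f, mu))


def norm(f, mu, p):
    """||f||_p = { mu(|f|^p) }^(1/p)."""
    return integral([abs(x)**p for x in f], mu)**(1.0 / p)


mu = [0.5, 2.0, 3.0, 0.25]            # total mass 5.75: not a probability measure
f = [1.0, -2.0, 0.5, 4.0]
g = [-3.0, 1.0, 2.0, 0.0]
h = [2.0, 0.0, -1.0, 3.0]
p = 3.0
q = p / (p - 1)                        # conjugate exponent: 1/p + 1/q = 1

# Hoelder: mu(|fh|) <= ||f||_p ||h||_q
holder_lhs = integral([abs(a * b) for a, b in zip(f, h)], mu)
holder_rhs = norm(f, mu, p) * norm(h, mu, q)

# Minkowski: ||f + g||_p <= ||f||_p + ||g||_p
minkowski_lhs = norm([a + b for a, b in zip(f, g)], mu, p)
minkowski_rhs = norm(f, mu, p) + norm(g, mu, p)

# The device from the proof of (a): P := |f|^p mu / mu(|f|^p) has total mass 1.
total = integral([abs(x)**p for x in f], mu)
P = [abs(x)**p * m / total for x, m in zip(f, mu)]
```

The list `P` sums to 1, which is exactly what allows Jensen's inequality for probability triples to be applied to a general measure space.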

Chapter 7

An Easy Strong Law

7.1. 'Independence means multiply' - again!

THEOREM

Suppose that $X$ and $Y$ are independent RVs, and that $X$ and $Y$ are both in $\mathcal{L}^1$. Then $XY \in \mathcal{L}^1$ and

    $E(XY) = E(X)E(Y).$

In particular, if $X$ and $Y$ are independent elements of $\mathcal{L}^2$, then

    $\mathrm{Cov}(X, Y) = 0$ and $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y).$

Proof. Writing $X = X^+ - X^-$, etc., allows us to reduce the problem to the case when $X \ge 0$ and $Y \ge 0$. This we do.

But then, if $\alpha^{(r)}$ is our familiar staircase function, then

    $\alpha^{(r)}(X) = \sum a_i I_{A_i}, \quad \alpha^{(r)}(Y) = \sum b_j I_{B_j}$

where the sums are over finite parameter sets, and where for each $i$ and $j$, $A_i$ (in $\sigma(X)$) is independent of $B_j$ (in $\sigma(Y)$). Hence

    $E[\alpha^{(r)}(X)\alpha^{(r)}(Y)] = \sum\sum a_i b_j P(A_i \cap B_j) = \sum\sum a_i b_j P(A_i)P(B_j) = E[\alpha^{(r)}(X)]E[\alpha^{(r)}(Y)].$

Now let $r \uparrow \infty$ and use (MON). □

Remark. Note especially that if $X$ and $Y$ are independent then $X \in \mathcal{L}^1$ and $Y \in \mathcal{L}^1$ imply that $XY \in \mathcal{L}^1$. This is not necessarily true when $X$ and $Y$ are not independent, and we need the inequalities of Schwarz, Hölder, etc. It is important that independence obviates the need for such inequalities.
7.2. Strong Law - first version

The following result covers many cases of importance. You should note that though it imposes a 'finite 4th moment' condition, it makes no assumption about identical distributions for the $(X_n)$ sequence. It is remarkable that so fine a result has so simple a proof.

THEOREM

► Suppose that $X_1, X_2, \ldots$ are independent random variables, and that for some constant $K$ in $[0, \infty)$,

    $E(X_k) = 0, \quad E(X_k^4) \le K, \quad \forall k.$

Let $S_n = X_1 + X_2 + \cdots + X_n$. Then

    $P(n^{-1}S_n \to 0) = 1,$

or again, $S_n/n \to 0$ (a.s.).

Proof. We have

    $E(S_n^4) = E[(X_1 + X_2 + \cdots + X_n)^4] = E\left(\sum_k X_k^4 + 6\sum\sum_{i<j} X_i^2 X_j^2\right),$

because, for distinct $i, j, k$ and $l$,

    $E(X_i X_j^3) = E(X_i X_j^2 X_k) = E(X_i X_j X_k X_l) = 0,$

using independence plus the fact that $E(X_i) = 0$. [Note that, for example, the fact that $E(X_j^4) < \infty$ implies that $E(|X_j|^3) < \infty$, by the 'monotonicity of $\mathcal{L}^p$ norms' result in Section 6.7. Thus $X_i$ and $X_j^3$ are in $\mathcal{L}^1$.]

We know from Section 6.7 that

    $[E(X_i^2)]^2 \le E(X_i^4) \le K, \quad \forall i.$

Hence, using independence again, for $i \ne j$,

    $E(X_i^2 X_j^2) = E(X_i^2)E(X_j^2) \le K.$

Thus

    $E(S_n^4) \le nK + 3n(n-1)K \le 3Kn^2,$

and (see Section 6.5)

    $E \sum (S_n/n)^4 \le 3K \sum n^{-2} < \infty,$

so that $\sum (S_n/n)^4 < \infty$, a.s., and

    $S_n/n \to 0$, a.s.  □
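The fourth-moment estimate at the heart of the proof can be verified exactly in a simple case. The sketch below assumes (for illustration only) that each $X_k$ is $\pm 1$ with probability $1/2$, so that $K = 1$ and $E(X_i^2 X_j^2) = 1$; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def fourth_moment_of_sum(n):
    """Exact E(S_n^4) when each X_k is +1 or -1 with probability 1/2.

    S_n = 2H - n, where H ~ Binomial(n, 1/2) counts the +1 steps.
    """
    return sum(Fraction(comb(n, heads), 2**n) * (2 * heads - n)**4
               for heads in range(n + 1))


# Compare with n*K + 3n(n-1)*K and with the cruder bound 3*K*n^2, for K = 1.
checks = [(n, fourth_moment_of_sum(n), n + 3 * n * (n - 1), 3 * n**2)
          for n in range(1, 31)]
```

In this symmetric case the expansion in the proof is an equality, $E(S_n^4) = n + 3n(n-1)$, so the only slack in the bound $3Kn^2$ is the term $2n$.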

Corollary. If the condition $E(X_k) = 0$ in the theorem is replaced by $E(X_k) = \mu$ for some constant $\mu$, then the theorem holds with $n^{-1}S_n \to \mu$ (a.s.) as its conclusion.

Proof. It is obviously a case of applying the theorem to the sequence $(Y_k)$, where $Y_k := X_k - \mu$. But we need to know that

(a)    $\sup_k E(Y_k^4) < \infty.$

This is obvious from Minkowski's inequality

    $\|X_k - \mu\|_4 \le \|X_k\|_4 + |\mu|$

(the constant function $\mu 1$ on $\Omega$ having $\mathcal{L}^4$ norm $|\mu|$). But we can also prove (a) immediately by the elementary inequality (6.7,b). □

The next topics indicate a different use of variance.

7.3. Chebyshev's inequality

As you know this says that for $c \ge 0$, and $X \in \mathcal{L}^2$,

    $c^2 P(|X - \mu| > c) \le \mathrm{Var}(X), \quad \mu := E(X)$;

and it is obvious.

Example. Consider a sequence $(X_n)$ of IID RVs with values in $\{0, 1\}$ with

    $p = P(X_n = 1) = 1 - P(X_n = 0).$

Then $E(X_n) = p$ and $\mathrm{Var}(X_n) = p(1 - p) \le \tfrac14$. Thus (using Theorem 7.1)

    $S_n := X_1 + X_2 + \cdots + X_n$

has expectation $np$ and variance $np(1 - p) \le n/4$, and we have

    $E(n^{-1}S_n) = p, \quad \mathrm{Var}(n^{-1}S_n) = n^{-2}\mathrm{Var}(S_n) \le 1/(4n).$

Chebyshev's inequality yields

    $P(|n^{-1}S_n - p| > \delta) \le 1/(4n\delta^2).$
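Since $S_n$ is binomial, the left-hand side of the last inequality can be computed exactly and compared with the Chebyshev bound. This is a sketch with hypothetical function names; rational arithmetic is used so that the boundary case $|k/n - p| = \delta$ is handled exactly.

```python
from fractions import Fraction
from math import comb


def deviation_probability(n, p, delta):
    """Exact P(|S_n/n - p| > delta) for S_n ~ Binomial(n, p), p and delta rational."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1)
               if abs(Fraction(k, n) - p) > delta)


def chebyshev_bound(n, delta):
    """The bound 1/(4 n delta^2) from the Example."""
    return 1 / (4 * n * delta**2)


exact = deviation_probability(100, Fraction(3, 10), Fraction(1, 10))
bound = chebyshev_bound(100, Fraction(1, 10))
```

The exact probability is typically far below the bound; Chebyshev's inequality is crude, but it is uniform in $p$, and that uniformity is what the next section exploits.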

7.4. Weierstrass approximation theorem

If $f$ is a continuous function on $[0,1]$ and $\varepsilon > 0$, then there exists a polynomial $B$ such that

$$\sup_{x \in [0,1]} |B(x) - f(x)| \le \varepsilon.$$

Proof. Let $(X_k)$, $S_n$, etc. be as in the Example in Section 7.3. You are well aware that

$$P[S_n = k] = \binom{n}{k} p^k (1-p)^{n-k}, \qquad 0 \le k \le n.$$

Hence

$$B_n(p) := E f(n^{-1}S_n) = \sum_{k=0}^{n} f(n^{-1}k) \binom{n}{k} p^k (1-p)^{n-k},$$

the '$B$' being in deference to Bernstein.

Now $f$ is bounded on $[0,1]$, $|f(y)| \le K$, $\forall y \in [0,1]$. Also, $f$ is uniformly continuous on $[0,1]$: for our given $\varepsilon > 0$, there exists $\delta > 0$ such that

(a) $|x - y| \le \delta$ implies that $|f(x) - f(y)| < \tfrac12\varepsilon$.

Now, for $p \in [0,1]$,

$$|B_n(p) - f(p)| = |E\{f(n^{-1}S_n) - f(p)\}|.$$

Let us write $Y_n := |f(n^{-1}S_n) - f(p)|$ and $Z_n := |n^{-1}S_n - p|$. Then $Z_n \le \delta$ implies that $Y_n < \tfrac12\varepsilon$, and we have

$$|B_n(p) - f(p)| \le E(Y_n) = E(Y_n; Z_n \le \delta) + E(Y_n; Z_n > \delta)$$
$$\le \tfrac12\varepsilon\, P(Z_n \le \delta) + 2K\, P(Z_n > \delta) \le \tfrac12\varepsilon + 2K/(4n\delta^2).$$

Earlier, we chose a fixed $\delta$ at (a). We now choose $n$ so that

$$2K/(4n\delta^2) < \tfrac12\varepsilon.$$

Then $|B_n(p) - f(p)| < \varepsilon$, for all $p$ in $[0,1]$. □

Now do Exercise E7.1 on inverting Laplace transforms.
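The Bernstein polynomial $B_n(p) = E f(n^{-1}S_n)$ used in the proof of the Weierstrass theorem is easy to evaluate directly. The sketch below is an illustration only: the names `bernstein` and `sup_error` are assumptions, and the supremum is approximated on a finite grid rather than computed exactly.

```python
from math import comb


def bernstein(f, n: int, p: float) -> float:
    """B_n(p) = E f(S_n / n) = sum_k f(k/n) C(n,k) p^k (1-p)^(n-k)."""
    return sum(
        f(k / n) * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)
    )


def sup_error(f, n: int, grid: int = 200) -> float:
    """Maximum of |B_n(p) - f(p)| over an evenly spaced grid in [0, 1]."""
    return max(
        abs(bernstein(f, n, j / grid) - f(j / grid)) for j in range(grid + 1)
    )


if __name__ == "__main__":
    kink = lambda x: abs(x - 0.5)  # continuous but not differentiable at 1/2
    for n in (25, 100, 400):
        print(n, sup_error(kink, n))
```

For a function with $|f(x) - f(y)| \le |x - y|$, such as the test function above, the error is at most $E|n^{-1}S_n - p| \le 1/(2\sqrt{n})$, so the printed errors shrink as $n$ grows, though slowly.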

Chapter 8

Product Measure

8.0. Introduction and advice

One of this chapter's main lessons of practical importance is that an 'interchange of order of integration' result

$$\int_{S_1} \left( \int_{S_2} f(s_1,s_2)\,\mu_2(ds_2) \right) \mu_1(ds_1) = \int_{S_2} \left( \int_{S_1} f(s_1,s_2)\,\mu_1(ds_1) \right) \mu_2(ds_2)$$

is always valid (both sides possibly being infinite) if $f \ge 0$; and is valid for 'signed' $f$ (with both repeated integrals finite) provided that one (then the other) of the integrals of absolute values:

$$\int_{S_1} \left( \int_{S_2} |f(s_1,s_2)|\,\mu_2(ds_2) \right) \mu_1(ds_1)$$

is finite.

It is a good idea to read through the chapter to get the ideas, but you are strongly recommended to postpone serious study of the contents until a later stage. Except for the matter of infinite products, it is all a case of relentless use of either the standard machine or the Monotone-Class Theorem to prove intuitively obvious things made to look complicated by the notation. When you do begin a serious study, it is important to appreciate when the more subtle Monotone-Class Theorem has to be used instead of the standard machine.
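To see why the condition on absolute values matters for 'signed' $f$, here is a small illustration that is not taken from the text: a standard array on $\mathbb{N} \times \mathbb{N}$ (with counting measure on each factor) whose two iterated sums differ. The array `a` and the helper names are assumptions chosen for the sketch; the sum of $|a(i,j)|$ over all $(i,j)$ is infinite, so the hypothesis of the interchange result fails.

```python
def a(i: int, j: int) -> int:
    """Signed array on N x N: +1 on the diagonal, -1 immediately to its right."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0


def row_sum(i: int) -> int:
    # Row i vanishes for j > i + 1, so this finite sum is the full sum over j.
    return sum(a(i, j) for j in range(i + 2))


def col_sum(j: int) -> int:
    # Column j vanishes for i > j, so this finite sum is the full sum over i.
    return sum(a(i, j) for i in range(j + 1))


def iterated_sums(n_terms: int) -> tuple:
    """Partial sums of the two iterated series over the first n_terms rows/columns."""
    rows_first = sum(row_sum(i) for i in range(n_terms))
    cols_first = sum(col_sum(j) for j in range(n_terms))
    return rows_first, cols_first


if __name__ == "__main__":
    print(iterated_sums(50))
```

Every row sums to 0, while column 0 sums to 1 and every other column sums to 0, so summing rows first gives 0 and summing columns first gives 1, however many terms are taken.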

8.1. Product measurable structure, $\Sigma_1 \times \Sigma_2$

Let $(S_1, \Sigma_1)$ and $(S_2, \Sigma_2)$ be measurable spaces. Let $S$ denote the Cartesian product $S := S_1 \times S_2$. For $i = 1,2$, let $\rho_i$ denote the $i$th coordinate map, so that

$$\rho_1(s_1,s_2) := s_1, \qquad \rho_2(s_1,s_2) := s_2.$$

The fundamental definition of the $\sigma$-algebra $\Sigma = \Sigma_1 \times \Sigma_2$ is as

► (a) $\Sigma = \sigma(\rho_1, \rho_2)$.

Thus $\Sigma$ is generated by the sets of the form

$$\rho_1^{-1}(B_1) = B_1 \times S_2 \qquad (B_1 \in \Sigma_1)$$

together with sets of the form

$$\rho_2^{-1}(B_2) = S_1 \times B_2 \qquad (B_2 \in \Sigma_2).$$

Generally, a product $\sigma$-algebra is generated by Cartesian products in which one factor is allowed to vary over the $\sigma$-algebra corresponding to that factor, and all other factors are whole spaces. In the case of our product of two factors, we have

(b) $(B_1 \times S_2) \cap (S_1 \times B_2) = B_1 \times B_2$

and you can easily check that

(c) $\mathcal{I} = \{B_1 \times B_2 : B_i \in \Sigma_i\}$

is a $\pi$-system generating $\Sigma = \Sigma_1 \times \Sigma_2$. A similar remark would apply for a countable product $\prod \Sigma_n$, but note that, since we may only take countable intersections in analogues of (b), products of uncountable families of $\sigma$-algebras cause problems. The fundamental definition analogous to (a) still works.

LEMMA

(d) Let $\mathcal{H}$ denote the class of functions $f : S \to \mathbb{R}$ which are in $b\Sigma$ and which are such that

for each $s_1$ in $S_1$, the map $s_2 \mapsto f(s_1,s_2)$ is $\Sigma_2$-measurable on $S_2$,
for each $s_2$ in $S_2$, the map $s_1 \mapsto f(s_1,s_2)$ is $\Sigma_1$-measurable on $S_1$.

Then $\mathcal{H} = b\Sigma$.

Proof. It is clear that if $A \in \mathcal{I}$, then $I_A \in \mathcal{H}$. Verification that $\mathcal{H}$ satisfies the conditions (i)-(iii) of Monotone-Class Theorem 3.14 is straightforward. Since $\Sigma = \sigma(\mathcal{I})$, the result follows. □

8.2. Product measure, Fubini's Theorem

We continue with the notation of the preceding Section. We suppose that for $i = 1,2$, $\mu_i$ is a finite measure on $(S_i, \Sigma_i)$. We know from the preceding Section that for $f \in b\Sigma$, we may define the integrals

$$I_1^f(s_1) := \int_{S_2} f(s_1,s_2)\,\mu_2(ds_2), \qquad I_2^f(s_2) := \int_{S_1} f(s_1,s_2)\,\mu_1(ds_1).$$

LEMMA

Let $\mathcal{H}$ be the class of elements in $b\Sigma$ such that the following property holds:

$I_1^f(\cdot) \in b\Sigma_1$ and $I_2^f(\cdot) \in b\Sigma_2$ and

$$\int_{S_1} I_1^f(s_1)\,\mu_1(ds_1) = \int_{S_2} I_2^f(s_2)\,\mu_2(ds_2).$$

Then $\mathcal{H} = b\Sigma$.

Proof. If $A \in \mathcal{I}$, then, trivially, $I_A \in \mathcal{H}$. Verification of the conditions of Monotone-Class Theorem 3.14 is straightforward. □

For $F \in \Sigma$ with indicator function $f := I_F$, we now define

$$\mu(F) := \int_{S_1} I_1^f(s_1)\,\mu_1(ds_1) = \int_{S_2} I_2^f(s_2)\,\mu_2(ds_2).$$

►► Fubini's Theorem

The set function $\mu$ is a measure on $(S, \Sigma)$ called the product measure of $\mu_1$ and $\mu_2$, and we write $\mu = \mu_1 \times \mu_2$ and

$$(S, \Sigma, \mu) = (S_1, \Sigma_1, \mu_1) \times (S_2, \Sigma_2, \mu_2).$$

Moreover, $\mu$ is the unique measure on $(S, \Sigma)$ for which

(a) $\mu(A_1 \times A_2) = \mu_1(A_1)\mu_2(A_2), \qquad A_i \in \Sigma_i.$

If $f \in (m\Sigma)^+$, then with the obvious definitions of $I_1^f$, $I_2^f$, we have

(b) $\mu(f) = \int_{S_1} I_1^f(s_1)\,\mu_1(ds_1) = \int_{S_2} I_2^f(s_2)\,\mu_2(ds_2)$, in $[0,\infty]$.

If $f \in m\Sigma$ and $\mu(|f|) < \infty$, then equation (b) is valid (with all terms in $\mathbb{R}$).

Proof. The fact that $\mu$ is a measure is a consequence of linearity and (MON). The fact that $\mu$ is then uniquely specified by (a) is obvious from Uniqueness Lemma 1.6 and the fact that $\sigma(\mathcal{I}) = \Sigma$.

Result (b) is automatic for $f = I_A$, where $A \in \mathcal{I}$. The Monotone-Class Theorem shows that it is therefore valid for $f \in b\Sigma$, and in particular for $f$ in the $SF^+$ space for $(S, \Sigma, \mu)$. (MON) then shows that it is valid for $f \in (m\Sigma)^+$; and linearity shows that (b) is valid if $\mu(|f|) < \infty$.

► Extension

All of Fubini's Theorem will work if the $(S_i, \Sigma_i, \mu_i)$ are $\sigma$-finite measure spaces: We have a unique measure $\mu$ on $(S, \Sigma)$ satisfying (a), etc., etc. We can prove this by breaking up $\sigma$-finite spaces into countable unions of disjoint finite blocks.

Warning

The $\sigma$-finiteness condition cannot be dropped. The standard example is the following. For $i = 1,2$, take $S_i = [0,1]$ and $\Sigma_i = \mathcal{B}[0,1]$. Let $\mu_1$ be Lebesgue measure and let $\mu_2$ just count the number of elements in a set. Let $F$ be the diagonal $\{(x,y) \in S_1 \times S_2 : x = y\}$ and let $f := I_F$. Then (check!) $F \in \Sigma$, but

$$I_1^f(s_1) = 1, \qquad I_2^f(s_2) = 0$$

and result (b) fails, stating that $1 = 0$.

Something to think about

So, our insistence on beginning with bounded functions on products of finite measures was necessary. Perhaps it is worth emphasizing that in our standard machine, things work because we can use indicator functions of any set in our $\sigma$-algebra, whereas when we can only use indicator functions of sets in a $\pi$-system, we have to use the Monotone-Class Theorem. We cannot approximate the set $F$ in the Warning example as

$$F = \uparrow \lim F_n,$$

where each $F_n$ is a finite union of 'rectangles' $A_1 \times A_2$, each $A_i$ being in $\mathcal{B}[0,1]$.

A simple application

Suppose that $X$ is a non-negative random variable on $(\Omega, \mathcal{F}, P)$. Consider the measure $\mu := P \times \mathrm{Leb}$ on $(\Omega, \mathcal{F}) \times ([0,\infty), \mathcal{B}[0,\infty))$. Let

$$A := \{(\omega, x) : 0 \le x \le X(\omega)\}, \qquad h := I_A.$$

Note that $A$ is the 'region under the graph of $X$'. Then

$$I_1^h(\omega) = X(\omega), \qquad I_2^h(x) = P(X \ge x).$$

Thus

(c) $\mu(A) = E(X) = \int_{[0,\infty)} P(X \ge x)\,dx,$

$dx$ denoting $\mathrm{Leb}(dx)$ as usual. Thus we have obtained one of the well-known formulae for $E(X)$ and also interpreted the integral $E(X)$ as 'area under the graph of $X$'.

Note. It is perhaps worth remarking that the Monotone-Class Theorem, the Fatou Lemma and the reverse Fatou Lemma for functions amount to the corresponding results for sets applied to regions under graphs.

8.3. Joint laws, joint pdfs

Let $X$ and $Y$ be two random variables. The (joint) law $\mathcal{L}_{X,Y}$ of the pair $(X,Y)$ is the map

$$\mathcal{L}_{X,Y} : \mathcal{B}(\mathbb{R}) \times \mathcal{B}(\mathbb{R}) \to [0,1]$$

defined by

$$\mathcal{L}_{X,Y}(\Gamma) := P[(X,Y) \in \Gamma].$$

The system $\{(-\infty,x] \times (-\infty,y] : x,y \in \mathbb{R}\}$ is a $\pi$-system which generates $\mathcal{B}(\mathbb{R}) \times \mathcal{B}(\mathbb{R})$. Hence $\mathcal{L}_{X,Y}$ is completely determined by the joint distribution function $F_{X,Y}$ of $X$ and $Y$ which is defined via

$$F_{X,Y}(x,y) := P(X \le x; Y \le y).$$

We now know how to construct Lebesgue measure $\mu = \mathrm{Leb} \times \mathrm{Leb}$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))^2$. We say that $X$ and $Y$ have joint probability density function (joint pdf) $f_{X,Y}$ on $\mathbb{R}^2$ if for $\Gamma \in \mathcal{B}(\mathbb{R}) \times \mathcal{B}(\mathbb{R})$,

$$P[(X,Y) \in \Gamma] = \int_\Gamma f_{X,Y}(z)\,\mu(dz) = \int_{\mathbb{R}} \int_{\mathbb{R}} I_\Gamma(x,y) f_{X,Y}(x,y)\,dx\,dy$$

etc., etc., (Fubini's Theorem being used in the last step(s)). Fubini's Theorem further shows that

$$f_X(x) := \int_{\mathbb{R}} f_{X,Y}(x,y)\,dy$$

acts as a pdf for $X$ (Section 6.12), etc., etc. You don't need me to tell you any more of this sort of thing.

8.4. Independence and product measure

Let $X$ and $Y$ be two random variables with laws $\mathcal{L}_X$, $\mathcal{L}_Y$ respectively and distribution functions $F_X$, $F_Y$ respectively. Then the following three statements are equivalent:

(i) $X$ and $Y$ are independent;
(ii) $\mathcal{L}_{X,Y} = \mathcal{L}_X \times \mathcal{L}_Y$;
(iii) $F_{X,Y}(x,y) = F_X(x)F_Y(y)$;

moreover, if $(X,Y)$ has 'joint' pdf $f_{X,Y}$ then each of (i)-(iii) is equivalent to

(iv) $f_{X,Y}(x,y) = f_X(x)f_Y(y)$ for $\mathrm{Leb} \times \mathrm{Leb}$ almost every $(x,y)$.

You do not wish to know more about this either.


8.5. $\mathcal{B}(\mathbb{R})^n = \mathcal{B}(\mathbb{R}^n)$

Here again, things are nice and tidy provided we work with finite or countable products, but require different concepts (such as Baire $\sigma$-algebras) if we work with uncountable products.

$\mathcal{B}(\mathbb{R}^n)$ is constructed from the topological space $\mathbb{R}^n$. Now, if $\rho_i : \mathbb{R}^n \to \mathbb{R}$ is the $i$th coordinate map:

$$\rho_i(x_1, x_2, \ldots, x_n) := x_i,$$

then $\rho_i$ is continuous, and hence $\mathcal{B}(\mathbb{R}^n)$-measurable. Hence

$$\mathcal{B}^n := \mathcal{B}(\mathbb{R})^n = \sigma(\rho_i : 1 \le i \le n) \subseteq \mathcal{B}(\mathbb{R}^n).$$

On the other hand, $\mathcal{B}(\mathbb{R}^n)$ is generated by the open subsets of $\mathbb{R}^n$, and every such open subset is a countable union of open 'hypercubes' of the form

$$\prod_{1 \le k \le n} (a_k, b_k)$$

and such products are in $\mathcal{B}(\mathbb{R})^n$. Hence, $\mathcal{B}(\mathbb{R}^n) = \mathcal{B}(\mathbb{R})^n$.

In probability theory it is almost always product structures $\mathcal{B}^n$ which feature rather than $\mathcal{B}(\mathbb{R}^n)$. See Section 8.8.

8.6. The n-fold extension

So far in this chapter, we have studied the product measure space of two measure spaces and how this relates to the study of two random variables. You are more than able to 'generalize from two to n' from your experience of similar things in other branches of mathematics. You should give some thought to the associativity of the 'product' in product measure space, something again familiar in analogous contexts.

8.7. Infinite products of probability triples

This topic is not a trivial extension of previous results. We concentrate on a restricted context (though an important one) because it allows us to get the main idea in a clear fashion; and extension to infinite products of arbitrary probability triples is then a purely routine exercise.

Canonical model for a sequence of independent RVs

Let $(\Lambda_n : n \in \mathbb{N})$ be a sequence of probability measures on $(\mathbb{R}, \mathcal{B})$. We already know from the coin-tossing trickery of Section 4.6 that we can construct a sequence $(X_n)$ of independent RVs, $X_n$ having law $\Lambda_n$. Here is a more elegant and systematic way of doing this.

THEOREM

Let $(\Lambda_n : n \in \mathbb{N})$ be a sequence of probability measures on $(\mathbb{R}, \mathcal{B})$. Define

$$\Omega = \prod_{n \in \mathbb{N}} \mathbb{R}$$

so that a typical element $\omega$ of $\Omega$ is a sequence $(\omega_n)$ in $\mathbb{R}$. Define

$$X_n : \Omega \to \mathbb{R}, \qquad X_n(\omega) := \omega_n,$$

and let $\mathcal{F} := \sigma(X_n : n \in \mathbb{N})$. Then there exists a unique probability measure $P$ on $(\Omega, \mathcal{F})$ such that for $r \in \mathbb{N}$ and $B_1, B_2, \ldots, B_r \in \mathcal{B}$,

(a) $P\left( \left( \prod_{1 \le k \le r} B_k \right) \times \prod_{k > r} \mathbb{R} \right) = \prod_{1 \le k \le r} \Lambda_k(B_k).$

We write

$$(\Omega, \mathcal{F}, P) = \prod_{n \in \mathbb{N}} (\mathbb{R}, \mathcal{B}, \Lambda_n).$$

Then the sequence $(X_n : n \in \mathbb{N})$ is a sequence of independent RVs on $(\Omega, \mathcal{F}, P)$, $X_n$ having law $\Lambda_n$.

Remarks. (i) The uniqueness of $P$ follows in the usual way from Lemma 1.6, because product sets of the form which appear on the left-hand side of (a) form a $\pi$-system generating $\mathcal{F}$.

(ii) We could rewrite (a) more neatly as

$$P\left( \prod_{k \in \mathbb{N}} B_k \right) = \prod_{k \in \mathbb{N}} \Lambda_k(B_k).$$

To see this, use the monotone-convergence property (1.10,b) of measures.

Proof of the theorem is deferred to the Appendix to Chapter 9.

8.8. Technical note on the existence of joint laws

Let $(\Omega, \mathcal{F})$, $(S_1, \Sigma_1)$ and $(S_2, \Sigma_2)$ be measurable spaces. For $i = 1,2$, let $X_i : \Omega \to S_i$ be such that $X_i^{-1} : \Sigma_i \to \mathcal{F}$. Define

$$S := S_1 \times S_2, \qquad \Sigma := \Sigma_1 \times \Sigma_2, \qquad X(\omega) := (X_1(\omega), X_2(\omega)) \in S.$$

Then (Exercise) $X^{-1} : \Sigma \to \mathcal{F}$, so that $X$ is an $(S, \Sigma)$-valued random variable, and if $P$ is a probability measure on $\Omega$, we can talk about the law $\mu$ of $X$ (equals the joint law of $X_1$ and $X_2$) on $(S, \Sigma)$: $\mu = P \circ X^{-1}$ on $\Sigma$.

Suppose now that $S_1$ and $S_2$ are metrizable spaces and that $\Sigma_i = \mathcal{B}(S_i)$ $(i = 1,2)$. Then $S$ is a metrizable space under the product topology. If $S_1$ and $S_2$ are separable, then $\Sigma = \mathcal{B}(S)$, and there is no 'conflict'. However, if $S_1$ and $S_2$ are not separable, then $\mathcal{B}(S)$ may be strictly larger than $\Sigma$, $X$ need not be an $(S, \mathcal{B}(S))$-valued random variable, and the joint law of $X_1$ and $X_2$ need not exist on $(S, \mathcal{B}(S))$.

It is perhaps as well to be warned of such things. Note that the separability of $\mathbb{R}$ was used in proving that $\mathcal{B}(\mathbb{R}^n) \subseteq \mathcal{B}^n$ in Section 8.5.

PART

B:

MARTINGALE

THEORY

Chapter 9

Conditional

Expectation

9.1. A motivating example

Suppose that $(\Omega, \mathcal{F}, P)$ is a probability triple and that $X$ and $Z$ are random variables,

$X$ taking the distinct values $x_1, x_2, \ldots, x_m$,
$Z$ taking the distinct values $z_1, z_2, \ldots, z_n$.

Elementary conditional probability:

$$P(X = x_i | Z = z_j) := P(X = x_i; Z = z_j)/P(Z = z_j)$$

and elementary conditional expectation:

$$E(X | Z = z_j) = \sum_i x_i P(X = x_i | Z = z_j)$$

are familiar to you. The random variable $Y = E(X|Z)$, the conditional expectation of $X$ given $Z$, is defined as follows:

(a) if $Z(\omega) = z_j$, then $Y(\omega) := E(X | Z = z_j) =: y_j$ (say).

It proves to be very advantageous to look at this idea in a new way. 'Reporting to us the value of $Z(\omega)$' amounts to partitioning $\Omega$ into '$Z$-atoms' on which $Z$ is constant:

[Figure: $\Omega$ partitioned into the $Z$-atoms $\{Z = z_1\}$, $\{Z = z_2\}$, ..., $\{Z = z_n\}$.]

The $\sigma$-algebra $\mathcal{G} = \sigma(Z)$ generated by $Z$ consists of sets $\{Z \in B\}$, $B \in \mathcal{B}$, and therefore consists precisely of the $2^n$ possible unions of the $n$ $Z$-atoms. It is clear from (a) that $Y$ is constant on $Z$-atoms, or, to put it better,

(b) $Y$ is $\mathcal{G}$-measurable.

Next, since $Y$ takes the constant value $y_j$ on the $Z$-atom $\{Z = z_j\}$, we have:

$$\int_{\{Z = z_j\}} Y\,dP = y_j P(Z = z_j) = \sum_i x_i P(X = x_i | Z = z_j) P(Z = z_j)$$
$$= \sum_i x_i P(X = x_i; Z = z_j) = \int_{\{Z = z_j\}} X\,dP.$$

If we write $G_j = \{Z = z_j\}$, this says $E(Y I_{G_j}) = E(X I_{G_j})$. Since for every $G$ in $\mathcal{G}$, $I_G$ is a sum of $I_{G_j}$'s, we have $E(Y I_G) = E(X I_G)$, or

(c) $\int_G Y\,dP = \int_G X\,dP, \qquad \forall G \in \mathcal{G}.$

Results (b) and (c) suggest the central definition of modern probability.
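The construction of the motivating example can be carried out by brute force on a finite sample space. The sketch below is illustrative only: the sample space (two fair dice), the choice of $X$ and $Z$, and all function names are assumptions. It builds $Y = E(X|Z)$ via rule (a) and provides the ingredients for checking property (c) on every one of the $2^n$ unions of $Z$-atoms.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite model: two fair dice; X is the total, Z is the first die.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}


def X(w):
    return w[0] + w[1]


def Z(w):
    return w[0]


def cond_exp(X, Z):
    """Return Y = E(X|Z) as a function on omega, using rule (a)."""
    y = {}
    for z in {Z(w) for w in omega}:
        atom = [w for w in omega if Z(w) == z]  # the Z-atom {Z = z}
        pz = sum(P[w] for w in atom)
        y[z] = sum(X(w) * P[w] for w in atom) / pz
    return lambda w: y[Z(w)]


def integral(f, G):
    """Integral of f over the event G with respect to P."""
    return sum(f(w) * P[w] for w in G)


def sigma_algebra(Z):
    """All 2^n unions of Z-atoms, i.e. the sigma-algebra generated by Z."""
    atoms = [[w for w in omega if Z(w) == z] for z in sorted({Z(w) for w in omega})]
    return [
        [w for atom in choice for w in atom]
        for r in range(len(atoms) + 1)
        for choice in combinations(atoms, r)
    ]


if __name__ == "__main__":
    Y = cond_exp(X, Z)
    print(all(integral(Y, G) == integral(X, G) for G in sigma_algebra(Z)))
```

In this model $Y(\omega) = Z(\omega) + 7/2$: the first die is known, and the second contributes its mean.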

►►► 9.2. Fundamental Theorem and Definition (Kolmogorov, 1933)

Let $(\Omega, \mathcal{F}, P)$ be a triple, and $X$ a random variable with $E(|X|) < \infty$. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. Then there exists a random variable $Y$ such that

(a) $Y$ is $\mathcal{G}$ measurable,
(b) $E(|Y|) < \infty$,
(c) for every set $G$ in $\mathcal{G}$ (equivalently, for every set $G$ in some $\pi$-system which contains $\Omega$ and generates $\mathcal{G}$), we have

$$\int_G Y\,dP = \int_G X\,dP, \qquad \forall G \in \mathcal{G}.$$

Moreover, if $\tilde{Y}$ is another RV with these properties then $\tilde{Y} = Y$, a.s., that is, $P[\tilde{Y} = Y] = 1$. A random variable $Y$ with properties (a)-(c) is called a version of the conditional expectation $E(X|\mathcal{G})$ of $X$ given $\mathcal{G}$, and we write $Y = E(X|\mathcal{G})$, a.s.

Two versions agree a.s., and when one has become familiar with the concept, one identifies different versions and speaks of the conditional expectation $E(X|\mathcal{G})$. But you should think about the 'a.s.' throughout this course.

The theorem is proved in Section 9.5, except for the $\pi$-system assertion which you will find at Exercise E9.1.

► Notation. We often write $E(X|Z)$ for $E(X|\sigma(Z))$, $E(X|Z_1, Z_2, \ldots)$ for $E(X|\sigma(Z_1, Z_2, \ldots))$, etc. That this is consistent with the elementary usage is apparent from Section 9.6 below.

9.3. The intuitive meaning

An experiment has been performed. The only information available to you regarding which sample point $\omega$ has been chosen is the set of values $Z(\omega)$ for every $\mathcal{G}$-measurable random variable $Z$. Then $Y(\omega) = E(X|\mathcal{G})(\omega)$ is the expected value of $X(\omega)$ given this information. The 'a.s.' ambiguity in the definition is something one has to live with in general, but it is sometimes possible to choose a canonical version of $E(X|\mathcal{G})$.

Note that if $\mathcal{G}$ is the trivial $\sigma$-algebra $\{\emptyset, \Omega\}$ (which contains no information), then $E(X|\mathcal{G})(\omega) = E(X)$ for all $\omega$.


9.4. Conditional expectation as least-squares-best predictor

►► If $E(X^2) < \infty$, then the conditional expectation $Y = E(X|\mathcal{G})$ is a version of the orthogonal projection (see Section 6.11) of $X$ onto $\mathcal{L}^2(\Omega, \mathcal{G}, P)$. Hence, $Y$ is the least-squares-best $\mathcal{G}$-measurable predictor of $X$: amongst all $\mathcal{G}$-measurable functions (i.e. amongst all predictors which can be computed from the available information), $Y$ minimizes

$$E[(Y - X)^2].$$

No surprise then that conditional expectation (and the martingale theory which develops it) is crucial in filtering and control, of space-ships, of industrial processes, or whatever.
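The least-squares property of Section 9.4 can be seen concretely in a finite model, where the orthogonal-projection picture becomes the Pythagoras identity $E[(g(Z) - X)^2] = E[(Y - X)^2] + E[(g(Z) - Y)^2]$ with $Y = E(X|Z)$. The sketch below is illustrative only: the joint pmf and the function names `cond_mean`, `mse` and `gap` are assumptions.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Z): Z is a fair die and X = Z + an independent fair die.
pmf = {(z + d, z): Fraction(1, 36) for z in range(1, 7) for d in range(1, 7)}


def cond_mean(z):
    """Elementary conditional expectation E(X | Z = z)."""
    pz = sum(p for (x, zz), p in pmf.items() if zz == z)
    return sum(x * p for (x, zz), p in pmf.items() if zz == z) / pz


def mse(g):
    """E[(g(Z) - X)^2] for a predictor g that uses the value of Z only."""
    return sum((g(z) - x) ** 2 * p for (x, z), p in pmf.items())


def gap(g):
    """E[(g(Z) - E(X|Z))^2]: squared distance from g(Z) to the projection."""
    return sum((g(z) - cond_mean(z)) ** 2 * p for (x, z), p in pmf.items())


if __name__ == "__main__":
    for name, g in [("E(X|Z)", cond_mean), ("g(z) = z", lambda z: z), ("g(z) = 7", lambda z: 7)]:
        print(name, mse(g), mse(cond_mean) + gap(g))
```

Since `gap(g)` is non-negative and vanishes only when $g(Z) = E(X|Z)$ almost surely, the identity shows directly that no other predictor computed from $Z$ does better.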

9.5. Proof of Theorem 9.2

The standard way to prove Theorem 9.2 (see Section 14.14) is via the Radon-Nikodým theorem, described in Section 5.14. However, Section 9.4 suggests a much simpler approach, and this is what we now develop. We can then prove the general Radon-Nikodým theorem by martingale theory. See Section 14.13.

First we prove the almost sure uniqueness of a version of $E(X|\mathcal{G})$. Then we prove the existence of $E(X|\mathcal{G})$ when $X \in \mathcal{L}^2$; and finally, we prove the existence in general.

Almost sure uniqueness of $E(X|\mathcal{G})$

Suppose that $X \in \mathcal{L}^1$ and that $Y$ and $\tilde{Y}$ are versions of $E(X|\mathcal{G})$. Then $Y, \tilde{Y} \in \mathcal{L}^1(\Omega, \mathcal{G}, P)$, and

$$E(Y - \tilde{Y}; G) = 0, \qquad \forall G \in \mathcal{G}.$$

Suppose that $Y$ and $\tilde{Y}$ are not almost surely equal. We may assume that the labelling is such that $P(Y > \tilde{Y}) > 0$. Since

$$\{Y > \tilde{Y} + n^{-1}\} \uparrow \{Y > \tilde{Y}\},$$

we see that $P(Y - \tilde{Y} > n^{-1}) > 0$ for some $n$. But the set $\{Y - \tilde{Y} > n^{-1}\}$ is in $\mathcal{G}$, because $Y$ and $\tilde{Y}$ are $\mathcal{G}$-measurable; and

$$E(Y - \tilde{Y}; Y - \tilde{Y} > n^{-1}) \ge n^{-1} P(Y - \tilde{Y} > n^{-1}) > 0,$$

a contradiction. Hence $Y = \tilde{Y}$, a.s. □

Existence of $E(X|\mathcal{G})$ for $X \in \mathcal{L}^2$

Suppose that $X \in \mathcal{L}^2 := \mathcal{L}^2(\Omega, \mathcal{F}, P)$. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$, and let $\mathcal{K} := \mathcal{L}^2(\mathcal{G}) := \mathcal{L}^2(\Omega, \mathcal{G}, P)$. By Section 6.10 applied to $\mathcal{G}$ rather than $\mathcal{F}$, we know that $\mathcal{K}$ is complete for the $\mathcal{L}^2$ norm. By Theorem 6.11 on orthogonal projection we know that there exists $Y$ in $\mathcal{K} = \mathcal{L}^2(\mathcal{G})$ such that

(a) $E[(X - Y)^2] = \inf\{E[(X - W)^2] : W \in \mathcal{L}^2(\mathcal{G})\},$
(b) $\langle X - Y, Z \rangle = 0, \qquad \forall Z$ in $\mathcal{L}^2(\mathcal{G}).$

Now, if $G \in \mathcal{G}$, then $Z := I_G \in \mathcal{L}^2(\mathcal{G})$ and (b) states that

$$E(Y; G) = E(X; G).$$

Hence $Y$ is a version of $E(X|\mathcal{G})$, as required. □

Existence of $E(X|\mathcal{G})$ for $X \in \mathcal{L}^1$

By splitting $X$ as $X = X^+ - X^-$, we see that it is enough to deal with the case when $X \in (\mathcal{L}^1)^+$. So assume that $X \in (\mathcal{L}^1)^+$. We can now choose bounded variables $X_n$ with $0 \le X_n \uparrow X$. Since each $X_n$ is in $\mathcal{L}^2$, we can choose a version $Y_n$ of $E(X_n|\mathcal{G})$. We now need to establish that

(c) it is almost surely true that $0 \le Y_n \uparrow$.

We prove this in a moment. Given that (c) is true, we set

$$Y(\omega) := \limsup Y_n(\omega).$$

Then $Y \in m\mathcal{G}$, and $Y_n \uparrow Y$, a.s. But now (MON) allows us to deduce that

$$E(Y; G) = E(X; G) \qquad (G \in \mathcal{G})$$

from the corresponding result for $Y_n$ and $X_n$. □

A positivity result

Property (c) follows once we prove that

(d) if $U$ is a non-negative bounded RV, then $E(U|\mathcal{G}) \ge 0$, a.s.

Proof of (d). Let $W$ be a version of $E(U|\mathcal{G})$. If $P(W < 0) > 0$, then for some $n$, the set $G := \{W < -n^{-1}\}$ in $\mathcal{G}$ has positive probability, so that

$$0 \le E(U; G) = E(W; G) < -n^{-1} P(G) < 0.$$

This contradiction finishes the proof. □

9.6. Agreement with traditional usage

The case of two RVs will suffice to illustrate things. So suppose that $X$ and $Z$ are RVs which have a joint probability density function (pdf) $f_{X,Z}(x,z)$. Then $f_Z(z) = \int_{\mathbb{R}} f_{X,Z}(x,z)\,dx$ acts as a probability density function for $Z$. Define the elementary conditional pdf $f_{X|Z}$ of $X$ given $Z$ via

$$f_{X|Z}(x|z) := \begin{cases} f_{X,Z}(x,z)/f_Z(z) & \text{if } f_Z(z) \ne 0; \\ 0 & \text{otherwise.} \end{cases}$$

Let $h$ be a Borel function on $\mathbb{R}$ such that

$$E|h(X)| = \int_{\mathbb{R}} |h(x)| f_X(x)\,dx < \infty,$$

where of course $f_X(x) = \int_{\mathbb{R}} f_{X,Z}(x,z)\,dz$ gives a pdf for $X$. Set

$$g(z) := \int_{\mathbb{R}} h(x) f_{X|Z}(x|z)\,dx.$$

Then $Y := g(Z)$ is a version of the conditional expectation of $h(X)$ given $\sigma(Z)$.

Proof. The typical element of $\sigma(Z)$ has the form $\{\omega : Z(\omega) \in B\}$, where $B \in \mathcal{B}$. Hence, we must show that

(a) $L := E[h(X) I_B(Z)] = E[g(Z) I_B(Z)] =: R.$

But

$$L = \int\!\!\int h(x) I_B(z) f_{X,Z}(x,z)\,dx\,dz, \qquad R = \int g(z) I_B(z) f_Z(z)\,dz,$$

and result (a) follows from Fubini's Theorem. □

Some of the practice is given in Sections 15.6-15.9, which you can look at now.

►►► 9.7. Properties of conditional expectation: a list

These properties are proved in Section 9.8. All $X$'s satisfy $E(|X|) < \infty$ in this list of properties. Of course, $\mathcal{G}$ and $\mathcal{H}$ denote sub-$\sigma$-algebras of $\mathcal{F}$. (The use of 'c' to denote 'conditional' in (cMON), etc., is obvious.)

(a) If $Y$ is any version of $E(X|\mathcal{G})$ then $E(Y) = E(X)$. (Very useful, this.)

(b) If $X$ is $\mathcal{G}$ measurable, then $E(X|\mathcal{G}) = X$, a.s.

(c) (Linearity) $E(a_1X_1 + a_2X_2|\mathcal{G}) = a_1E(X_1|\mathcal{G}) + a_2E(X_2|\mathcal{G})$, a.s.
Clarification: if $Y_1$ is a version of $E(X_1|\mathcal{G})$ and $Y_2$ is a version of $E(X_2|\mathcal{G})$, then $a_1Y_1 + a_2Y_2$ is a version of $E(a_1X_1 + a_2X_2|\mathcal{G})$.

(d) (Positivity) If $X \ge 0$, then $E(X|\mathcal{G}) \ge 0$, a.s.

(e) (cMON) If $0 \le X_n \uparrow X$, then $E(X_n|\mathcal{G}) \uparrow E(X|\mathcal{G})$, a.s.

(f) (cFATOU) If $X_n \ge 0$, then $E[\liminf X_n|\mathcal{G}] \le \liminf E[X_n|\mathcal{G}]$, a.s.

(g) (cDOM) If $|X_n(\omega)| \le V(\omega)$, $\forall n$, $EV < \infty$, and $X_n \to X$, a.s., then $E(X_n|\mathcal{G}) \to E(X|\mathcal{G})$, a.s.

(h) (cJENSEN) If $c : \mathbb{R} \to \mathbb{R}$ is convex, and $E|c(X)| < \infty$, then $E[c(X)|\mathcal{G}] \ge c(E[X|\mathcal{G}])$, a.s.
Important corollary: $\|E(X|\mathcal{G})\|_p \le \|X\|_p$ for $p \ge 1$.

(i) (Tower Property) If $\mathcal{H}$ is a sub-$\sigma$-algebra of $\mathcal{G}$, then $E[E(X|\mathcal{G})|\mathcal{H}] = E[X|\mathcal{H}]$, a.s.
Note. We shorthand LHS to $E[X|\mathcal{G}|\mathcal{H}]$ for tidiness.

(j) ('Taking out what is known') If $Z$ is $\mathcal{G}$-measurable and bounded, then

(*) $E[ZX|\mathcal{G}] = Z\,E[X|\mathcal{G}]$, a.s.

If $p > 1$, $p^{-1} + q^{-1} = 1$, $X \in \mathcal{L}^p(\Omega, \mathcal{F}, P)$ and $Z \in \mathcal{L}^q(\Omega, \mathcal{G}, P)$, then (*) again holds. If $X \in (m\mathcal{F})^+$, $Z \in (m\mathcal{G})^+$, $E(X) < \infty$ and $E(ZX) < \infty$, then (*) holds.

(k) (Role of independence) If $\mathcal{H}$ is independent of $\sigma(\sigma(X), \mathcal{G})$, then $E[X|\sigma(\mathcal{G}, \mathcal{H})] = E(X|\mathcal{G})$, a.s.
In particular, if $X$ is independent of $\mathcal{H}$, then $E(X|\mathcal{H}) = E(X)$, a.s.
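Several of the properties of conditional expectation listed in Section 9.7, in particular (a), the Tower Property (i) and 'taking out what is known' (j), can be verified exactly on a finite sample space. The sketch below is illustrative only: the three-toss biased-coin model, the random variable `X` and the helper `cond_exp` are assumptions, with $\mathcal{G}_k$ standing for the $\sigma$-algebra generated by the first $k$ tosses.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: three independent tosses with P(1) = 1/3, P(0) = 2/3.
omega = list(product((0, 1), repeat=3))
P = {}
for w in omega:
    prob = Fraction(1)
    for b in w:
        prob *= Fraction(1, 3) if b else Fraction(2, 3)
    P[w] = prob


def cond_exp(X: dict, k: int) -> dict:
    """E(X | G_k) as a dict on omega, where G_k is generated by the first k tosses."""
    out = {}
    for w in omega:
        atom = [v for v in omega if v[:k] == w[:k]]  # the G_k-atom containing w
        out[w] = sum(X[v] * P[v] for v in atom) / sum(P[v] for v in atom)
    return out


def expectation(X: dict) -> Fraction:
    return sum(X[w] * P[w] for w in omega)


if __name__ == "__main__":
    X = {w: (w[0] + 2 * w[1]) * (1 + 3 * w[2]) for w in omega}
    # Tower Property: conditioning on G_2 and then on G_1 equals conditioning on G_1.
    print(cond_exp(cond_exp(X, 2), 1) == cond_exp(X, 1))
```

Here every non-empty event has positive probability, so 'a.s.' equalities are plain equalities and versions are unique.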

9.8. Proofs of the properties in Section 9.7

Property (a) follows since $E(Y; \Omega) = E(X; \Omega)$, $\Omega$ being an element of $\mathcal{G}$.

Property (b) is immediate from the definition, as is Property (c) now that its Clarification has been given.

Property (d) is not obvious, but the proof of (9.5,d) transfers immediately to our current situation.

Proof of (e). If $0 \le X_n \uparrow X$, then, by (d), if, for each $n$, $Y_n$ is a version of $E(X_n|\mathcal{G})$, then (a.s.) $0 \le Y_n \uparrow$. Define $Y := \limsup Y_n$. Then $Y \in m\mathcal{G}$, and $Y_n \uparrow Y$, a.s. Now use (MON) to deduce from

$$E(Y_n; G) = E(X_n; G), \qquad \forall G \in \mathcal{G},$$

that $E(Y; G) = E(X; G)$, $\forall G \in \mathcal{G}$. (Of course we used a very similar argument in Section 9.5.) □

Proof of (f) and (g). You should check that the argument used to obtain (FATOU) from (MON) in Section 5.4 and the argument used to obtain (DOM) from (FATOU) in Section 5.9 both transfer without difficulty to yield the conditional versions. Doing the careful derivation of (cFATOU) from (cMON) and of (cDOM) from (cFATOU) is an essential exercise for you.

Proof of (h). From (6.6,a), there exists a countable sequence $((a_n, b_n))$ of points in $\mathbb{R}^2$ such that

$$c(x) = \sup_n (a_n x + b_n), \qquad x \in \mathbb{R}.$$

For each fixed $n$ we deduce via (d) from $c(X) \ge a_n X + b_n$ that, almost surely,

(**) $E[c(X)|\mathcal{G}] \ge a_n E[X|\mathcal{G}] + b_n.$

By the usual appeal to countability, we can say that almost surely (**) holds simultaneously for all $n$, whence, almost surely,

$$E[c(X)|\mathcal{G}] \ge \sup_n (a_n E[X|\mathcal{G}] + b_n) = c(E[X|\mathcal{G}]). \qquad □$$

Proof of corollary to (h). Let $p \ge 1$. Taking $c(x) = |x|^p$, we see that

$$E(|X|^p \,|\, \mathcal{G}) \ge |E(X|\mathcal{G})|^p, \quad \text{a.s.}$$

Now take expectations, using property (a). □

Property (i) is virtually immediate from the definition of conditional expectation.

Proof of (j). Linearity shows that we can assume that $X \ge 0$. Fix a version $Y$ of $E(X|\mathcal{G})$, and fix $G$ in $\mathcal{G}$. We must prove that if $Z$ is $\mathcal{G}$-measurable and appropriate integrability conditions hold, then

(***) $E(ZX; G) = E(ZY; G).$

We use the standard machine. If $Z$ is the indicator of a set in $\mathcal{G}$, then (***) is true by definition of the conditional expectation $Y$. Linearity then shows that (***) holds for $Z \in SF^+(\Omega, \mathcal{G}, P)$. Next, (MON) shows that (***) is true for $Z \in (m\mathcal{G})^+$ with the understanding that both sides might be infinite.

All that is necessary to establish that property (j) in the table is correct is to show that under each of the conditions given, $E(|ZX|) < \infty$. This is obvious if $Z$ is bounded and $X$ is in $\mathcal{L}^1$, and follows from the Hölder inequality if $X \in \mathcal{L}^p$ and $Z \in \mathcal{L}^q$ where $p > 1$ and $p^{-1} + q^{-1} = 1$. □

Proof of (k). We can assume that $X \ge 0$ (and $E(X) < \infty$). For $G \in \mathcal{G}$ and $H \in \mathcal{H}$, $X I_G$ and $H$ are independent, so that by Theorem 7.1,

$$E(X; G \cap H) = E[(X I_G) I_H] = E(X I_G) P(H).$$

Now if $Y = E(X|\mathcal{G})$ (a version of), then since $Y$ is $\mathcal{G}$-measurable, $Y I_G$ is independent of $\mathcal{H}$ so that

$$E[(Y I_G) I_H] = E(Y I_G) P(H)$$

and we have

$$E[X; G \cap H] = E[Y; G \cap H].$$

Thus the measures

$$F \mapsto E(X; F), \qquad F \mapsto E(Y; F)$$

on $\sigma(\mathcal{G}, \mathcal{H})$ of the same finite total mass agree on the $\pi$-system of sets of the form $G \cap H$ $(G \in \mathcal{G}, H \in \mathcal{H})$, and hence agree everywhere on $\sigma(\mathcal{G}, \mathcal{H})$. This is exactly what we had to prove. □

9.9. Regular conditional probabilities and pdfs

For $F \in \mathcal{F}$, we have $P(F) = E(I_F)$. For $F \in \mathcal{F}$ and $\mathcal{G}$ a sub-$\sigma$-algebra of $\mathcal{F}$, we define $P(F|\mathcal{G})$ to be a version of $E(I_F|\mathcal{G})$.

By linearity and (cMON), we can show that for a fixed sequence $(F_n)$ of disjoint elements of $\mathcal{F}$, we have

(a) $P\left(\bigcup F_n \,\Big|\, \mathcal{G}\right) = \sum P(F_n|\mathcal{G})$, (a.s.)

Except in trivial cases, there are uncountably many sequences of disjoint sets, so we cannot conclude from (a) that there exists a map

$$P(\cdot,\cdot) : \Omega \times \mathcal{F} \to [0,1]$$

such that

(b1) for $F \in \mathcal{F}$, the function $\omega \mapsto P(\omega, F)$ is a version of $P(F|\mathcal{G})$;
(b2) for almost every $\omega$, the map $F \mapsto P(\omega, F)$ is a probability measure on $\mathcal{F}$.

If such a map exists, it is called a regular conditional probability given $\mathcal{G}$. It is known that regular conditional probabilities exist under most conditions encountered in practice, but they do not always exist. The matter is too technical for a book at this level. See, for example, Parthasarathy (1967).

Important note. The elementary conditional pdf $f_{X|Z}(x|z)$ of Section 9.6 is a proper (technically, regular) conditional pdf for $X$ given $Z$ in that for every $A$ in $\mathcal{B}$,

$$\omega \mapsto \int_A f_{X|Z}(x|Z(\omega))\,dx \quad \text{is a version of } P(X \in A|Z).$$

Proof. Take $h = I_A$ in Section 9.6. □

9.10. Conditioning under independence assumptions

Suppose that $r \in \mathbb{N}$ and that $X_1, X_2, \ldots, X_r$ are independent RVs, $X_k$ having law $\Lambda_k$. If $h \in b\mathcal{B}^r$ and we define (for $x_1 \in \mathbb{R}$)

(a) $\gamma^h(x_1) = E[h(x_1, X_2, X_3, \ldots, X_r)],$

then

(b) $\gamma^h(X_1)$ is a version of the conditional expectation $E[h(X_1, X_2, \ldots, X_r)|X_1]$.

Two proofs of (b). We need only show that for $B \in \mathcal{B}$,

(c) $E[h(X_1, X_2, \ldots, X_r) I_B(X_1)] = E[\gamma^h(X_1) I_B(X_1)].$

We can do this via the Monotone-Class Theorem, the class $\mathcal{H}$ of $h$ satisfying (c) contains the indicator functions of elements in the $\pi$-system of sets of the form

$$B_1 \times B_2 \times \cdots \times B_r \qquad (B_k \in \mathcal{B}),$$

etc., etc. Alternatively, we can appeal to the $r$-fold Fubini Theorem; for (c) says that

$$\int_{x \in \mathbb{R}^r} h(x) I_B(x_1)\,(\Lambda_1 \times \Lambda_2 \times \cdots \times \Lambda_r)(dx) = \int_{x_1 \in \mathbb{R}} \gamma^h(x_1) I_B(x_1)\,\Lambda_1(dx_1),$$

where

$$\gamma^h(x_1) = \int_{y \in \mathbb{R}^{r-1}} h(x_1, y)\,(\Lambda_2 \times \cdots \times \Lambda_r)(dy).$$

9.11. Use of symmetry: an example

Suppose that $X_1, X_2, \ldots$ are IID RVs with the same distribution as $X$, where $E(|X|) < \infty$. Let $S_n := X_1 + X_2 + \cdots + X_n$, and define

$$\mathcal{G}_n := \sigma(S_n, S_{n+1}, \ldots) = \sigma(S_n, X_{n+1}, X_{n+2}, \ldots).$$

We wish to calculate

$$E(X_1|\mathcal{G}_n),$$

for very good reasons, as we shall see in Chapter 14. Now $\sigma(X_{n+1}, X_{n+2}, \ldots)$ is independent of $\sigma(X_1, S_n)$ (which is a sub-$\sigma$-algebra of $\sigma(X_1, \ldots, X_n)$). Hence, by (9.7,k),

$$E(X_1|\mathcal{G}_n) = E(X_1|S_n).$$

But if $\Lambda$ denotes the law of $X$, then, with $s_n$ denoting $x_1 + x_2 + \cdots + x_n$, we have

$$E(X_1; S_n \in B) = \int \cdots \int_{s_n \in B} x_1\,\Lambda(dx_1)\Lambda(dx_2) \cdots \Lambda(dx_n) = E(X_2; S_n \in B) = \cdots = E(X_n; S_n \in B).$$

Hence, almost surely,

$$E(X_1|S_n) = \cdots = E(X_n|S_n) = n^{-1} E(X_1 + \cdots + X_n|S_n) = n^{-1} S_n.$$
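The symmetry conclusion $E(X_1|S_n) = n^{-1}S_n$ of Section 9.11 can be confirmed by direct enumeration when the common law has finite support. The sketch below is illustrative only: the law `law` and the function name are assumptions, and the law is deliberately asymmetric to emphasise that only exchangeability of the coordinates is used.

```python
from fractions import Fraction
from itertools import product

# Hypothetical common law of the IID variables: values 0, 1, 4.
law = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}


def cond_exp_x1_given_sn(n: int) -> dict:
    """Return {s: E(X_1 | S_n = s)} by enumerating all samples (x_1, ..., x_n)."""
    num, den = {}, {}
    for xs in product(law, repeat=n):
        p = Fraction(1)
        for x in xs:
            p *= law[x]  # independence: probabilities multiply
        s = sum(xs)
        num[s] = num.get(s, 0) + xs[0] * p  # E(X_1; S_n = s)
        den[s] = den.get(s, 0) + p          # P(S_n = s)
    return {s: num[s] / den[s] for s in den}


if __name__ == "__main__":
    for s, value in sorted(cond_exp_x1_given_sn(3).items()):
        print(s, value, Fraction(s, 3))
```

For each attainable value $s$ of $S_n$ the two printed numbers agree, as the symmetry argument predicts.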

Chapter

10

Martingales

10.1. Filtered spaces

►► As basic datum, we now take a filtered space $(\Omega, \mathcal{F}, \{\mathcal{F}_n\}, P)$. Here,

$(\Omega, \mathcal{F}, P)$ is a probability triple as usual,
$\{\mathcal{F}_n : n \ge 0\}$ is a filtration, that is, an increasing family of sub-$\sigma$-algebras of $\mathcal{F}$:

$$\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}.$$

We define

$$\mathcal{F}_\infty := \sigma\left(\bigcup_n \mathcal{F}_n\right) \subseteq \mathcal{F}.$$

Intuitive idea. The information about $\omega$ in $\Omega$ available to us at (or, if you prefer, 'just after') time $n$ consists precisely of the values of $Z(\omega)$ for all $\mathcal{F}_n$ measurable functions $Z$. Usually, $\{\mathcal{F}_n\}$ is the natural filtration

$$\mathcal{F}_n = \sigma(W_0, W_1, \ldots, W_n)$$

of some (stochastic) process $W = (W_n : n \in \mathbb{Z}^+)$, and then the information about $\omega$ which we have at time $n$ consists of the values

$$W_0(\omega), W_1(\omega), \ldots, W_n(\omega).$$

10.2. Adapted process

► A process $X = (X_n : n \ge 0)$ is called adapted (to the filtration $\{\mathcal{F}_n\}$) if for each $n$, $X_n$ is $\mathcal{F}_n$-measurable.

Intuitive idea. If $X$ is adapted, the value $X_n(\omega)$ is known to us at time $n$. Usually, $\mathcal{F}_n = \sigma(W_0, W_1, \ldots, W_n)$ and $X_n = f_n(W_0, W_1, \ldots, W_n)$ for some $\mathcal{B}^{n+1}$-measurable function $f_n$ on $\mathbb{R}^{n+1}$.

10.3. Martingale, supermartingale, submartingale

►►► A process $X$ is called a martingale (relative to $(\{\mathcal{F}_n\}, P)$) if

(i) $X$ is adapted,
(ii) $E(|X_n|) < \infty$, $\forall n$,
(iii) $E[X_n|\mathcal{F}_{n-1}] = X_{n-1}$, a.s. $(n \ge 1)$.

A supermartingale (relative to $(\{\mathcal{F}_n\}, P)$) is defined similarly, except that (iii) is replaced by

$$E[X_n|\mathcal{F}_{n-1}] \le X_{n-1}, \quad \text{a.s.} \quad (n \ge 1),$$

and a submartingale is defined with (iii) replaced by

$$E[X_n|\mathcal{F}_{n-1}] \ge X_{n-1}, \quad \text{a.s.} \quad (n \ge 1).$$

A supermartingale 'decreases on average'; a submartingale 'increases on average'! [Supermartingale corresponds to superharmonic: a function $f$ on $\mathbb{R}^n$ is superharmonic if and only if for a Brownian motion $B$ on $\mathbb{R}^n$, $f(B)$ is a local supermartingale relative to the natural filtration of $B$. Compare Section 10.13.]

Note that $X$ is a supermartingale if and only if $-X$ is a submartingale, and that $X$ is a martingale if and only if it is both a supermartingale and a submartingale.

It is important to note that a process $X$ for which $X_0 \in \mathcal{L}^1(\Omega, \mathcal{F}_0, P)$ is a martingale [respectively, supermartingale, submartingale] if and only if the process $X - X_0 = (X_n - X_0 : n \in \mathbb{Z}^+)$ has the same property. So we can focus attention on processes which are null at 0.

► If $X$ is for example a supermartingale, then the Tower Property of CEs, (9.7)(i), shows that for $m < n$,

$$E[X_n|\mathcal{F}_m] = E[X_n|\mathcal{F}_{n-1}|\mathcal{F}_m] \le E[X_{n-1}|\mathcal{F}_m] \le \cdots \le X_m, \quad \text{a.s.}$$

10.4. Some examples of martingales

As we shall see, it is very helpful to view all martingales, supermartingales and submartingales in terms of gambling. But, of course, the enormous importance of martingale theory derives from the fact that martingales crop up in very many contexts. For example, diffusion theory, which used to be studied via methods from Markov-process theory, from the theory of partial differential equations, etc., has been revolutionized by the martingale approach.

Let us now look at some simple first examples, and mention an interesting question (solved later) pertaining to each.
(a) Sums of independent zero-mean RVs. Let $X_1, X_2, \ldots$ be a sequence of independent RVs with $\mathsf{E}(|X_k|) < \infty$, $\forall k$, and

$$\mathsf{E}(X_k) = 0, \quad \forall k.$$

Define ($S_0 := 0$ and)

$$S_n := X_1 + X_2 + \cdots + X_n, \qquad \mathcal{F}_n := \sigma(X_1, X_2, \ldots, X_n), \qquad \mathcal{F}_0 := \{\emptyset, \Omega\}.$$

Then for $n \ge 1$, we have (a.s.)

$$\mathsf{E}(S_n \mid \mathcal{F}_{n-1}) = \mathsf{E}(S_{n-1} \mid \mathcal{F}_{n-1}) + \mathsf{E}(X_n \mid \mathcal{F}_{n-1}) = S_{n-1} + \mathsf{E}(X_n) = S_{n-1}.$$

The first (a.s.) equality is obvious from the linearity property (9.7,c). Since $S_{n-1}$ is $\mathcal{F}_{n-1}$-measurable, we have $\mathsf{E}(S_{n-1} \mid \mathcal{F}_{n-1}) = S_{n-1}$ (a.s.) by (9.7,b); and since $X_n$ is independent of $\mathcal{F}_{n-1}$, we have $\mathsf{E}(X_n \mid \mathcal{F}_{n-1}) = \mathsf{E}(X_n)$ (a.s.) by (9.7,k). That must explain our notation!

Interesting question: when does $\lim S_n$ exist (a.s.)? See Section 12.5.

(b) Products of non-negative independent RVs of mean 1. Let $X_1, X_2, \ldots$ be a sequence of independent non-negative random variables with

$$\mathsf{E}(X_k) = 1, \quad \forall k.$$

Define ($M_0 := 1$, $\mathcal{F}_0 := \{\emptyset, \Omega\}$ and)

$$M_n := X_1 X_2 \cdots X_n, \qquad \mathcal{F}_n := \sigma(X_1, X_2, \ldots, X_n).$$

Then, for $n \ge 1$, we have (a.s.)

$$\mathsf{E}(M_n \mid \mathcal{F}_{n-1}) = \mathsf{E}(M_{n-1} X_n \mid \mathcal{F}_{n-1}) = M_{n-1}\mathsf{E}(X_n \mid \mathcal{F}_{n-1}) = M_{n-1}\mathsf{E}(X_n) = M_{n-1},$$

so that $M$ is a martingale.

It should be remarked that such martingales are not at all artificial.

Interesting question. Because $M$ is a non-negative martingale, $M_\infty = \lim M_n$ exists (a.s.); this is part of the Martingale Convergence Theorem of the next chapter. When can we say that $\mathsf{E}(M_\infty) = 1$? See Sections 14.12 and 14.17.

(c) Accumulating data about a random variable. Let $\{\mathcal{F}_n\}$ be our filtration, and let $\xi \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathsf{P})$. Define $M_n := \mathsf{E}(\xi \mid \mathcal{F}_n)$ ('some version of'). By the Tower Property (9.7,i), we have (a.s.)

$$\mathsf{E}(M_n \mid \mathcal{F}_{n-1}) = \mathsf{E}(\xi \mid \mathcal{F}_n \mid \mathcal{F}_{n-1}) = \mathsf{E}(\xi \mid \mathcal{F}_{n-1}) = M_{n-1}.$$

Hence $M$ is a martingale.

Interesting question. In this case, we shall be able to say that

$$M_n \to M_\infty := \mathsf{E}(\xi \mid \mathcal{F}_\infty), \quad \text{a.s.},$$

because of Lévy's Upward Theorem (Chapter 14). Now $M_n$ is the best predictor of $\xi$ given the information available to us at time $n$, and $M_\infty$ is the best prediction of $\xi$ we can ever make. When can we say that $\xi = \mathsf{E}(\xi \mid \mathcal{F}_\infty)$, a.s.? The answer is not always obvious. See Section 15.8.
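The one-step computations in examples (a) and (b) can be checked exactly on a small finite model. The sketch below is only an illustration: the two-point laws `STEP` and `FACTOR` are assumptions chosen for convenience (they are not from the text), and the helper name `conditional_mean_given_past` is hypothetical. It enumerates every history of length $n-1$ and compares the value at time $n-1$ with the conditional mean of the value at time $n$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-point laws chosen for illustration (not from the text):
# zero-mean steps for example (a), mean-one non-negative factors for (b).
STEP = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
FACTOR = {Fraction(1, 2): Fraction(1, 2), Fraction(3, 2): Fraction(1, 2)}

def conditional_mean_given_past(law, combine, start, n):
    """For each history (x_1..x_{n-1}) return (value at n-1, E[value at n | history])."""
    out = []
    for past in product(law, repeat=n - 1):
        prev = start
        for x in past:
            prev = combine(prev, x)
        # X_n is independent of the past, so average over its law directly.
        cond = sum(p * combine(prev, x) for x, p in law.items())
        out.append((prev, cond))
    return out

sums = conditional_mean_given_past(STEP, lambda s, x: s + x, Fraction(0), 4)
prods = conditional_mean_given_past(FACTOR, lambda m, x: m * x, Fraction(1), 4)
print(all(prev == cond for prev, cond in sums))   # martingale property for S
print(all(prev == cond for prev, cond in prods))  # martingale property for M
```

Exact rational arithmetic is used so that the equalities are tested without rounding error.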

10.5. Fair and unfair games

Think now of $X_n - X_{n-1}$ as your net winnings per unit stake in game $n$ $(n \ge 1)$ in a series of games, played at times $n = 1, 2, \ldots$. There is no game at time 0.

In the martingale case,

(a) $\mathsf{E}[X_n - X_{n-1} \mid \mathcal{F}_{n-1}] = 0$, (game series is fair),

and in the supermartingale case,

(b) $\mathsf{E}[X_n - X_{n-1} \mid \mathcal{F}_{n-1}] \le 0$, (game series is unfavourable to you).

Note that (a) [respectively (b)] gives a useful way of formulating the martingale [supermartingale] property of $X$.

10.6. Previsible process, gambling strategy

►►We call a process $C = (C_n : n \in \mathbb{N})$ previsible if

$$C_n \text{ is } \mathcal{F}_{n-1} \text{ measurable } (n \ge 1).$$

Note that $C$ has parameter set $\mathbb{N}$ rather than $\mathbb{Z}^+$: $C_0$ does not exist.

Think of $C_n$ as your stake on game $n$. You have to decide on the value of $C_n$ based on the history up to (and including) time $n - 1$. This is the intuitive significance of the 'previsible' character of $C$. Your winnings on game $n$ are $C_n(X_n - X_{n-1})$ and your total winnings up to time $n$ are

$$Y_n = \sum_{1 \le k \le n} C_k (X_k - X_{k-1}) =: (C \bullet X)_n.$$

Note that $(C \bullet X)_0 = 0$, and that

$$Y_n - Y_{n-1} = C_n (X_n - X_{n-1}).$$

The expression $C \bullet X$, the martingale transform of $X$ by $C$, is the discrete analogue of the stochastic integral $\int C \, dX$. Stochastic-integral theory is one of the greatest achievements of the modern theory of probability.
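The definition of $(C \bullet X)_n$ translates directly into a running sum. The following is a minimal sketch: the function name `martingale_transform` and the sample path and stakes are made up for illustration, and the code does not check previsibility (that is a property of how the stakes are chosen, not of the arithmetic).

```python
def martingale_transform(stakes, path):
    """Return [(C . X)_0, ..., (C . X)_n] for path X_0..X_n and stakes C_1..C_n."""
    if len(stakes) != len(path) - 1:
        raise ValueError("need one stake per game: len(stakes) == len(path) - 1")
    totals = [0]
    for k in range(1, len(path)):
        # Winnings on game k are C_k * (X_k - X_{k-1}); stakes[k - 1] holds C_k.
        totals.append(totals[-1] + stakes[k - 1] * (path[k] - path[k - 1]))
    return totals

path = [0, 1, 0, -1, 0, 1]      # X_0, ..., X_5 (made-up sample path)
stakes = [1, 2, 1, 2, 4]        # C_1, ..., C_5 (made-up stakes)
print(martingale_transform(stakes, path))  # [0, 1, -1, -2, 0, 4]
```

With unit stakes the transform reduces to $X_n - X_0$, which is a quick sanity check on any implementation.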

10.7. A fundamental principle: you can't beat the system!

►►(i) Let $C$ be a bounded non-negative previsible process so that, for some $K$ in $[0, \infty)$, $|C_n(\omega)| \le K$ for every $n$ and every $\omega$. Let $X$ be a supermartingale [respectively martingale]. Then $C \bullet X$ is a supermartingale [martingale] null at 0.

(ii) If $C$ is a bounded previsible process and $X$ is a martingale, then $(C \bullet X)$ is a martingale null at 0.

(iii) In (i) and (ii), the boundedness condition on $C$ may be replaced by the condition $C_n \in \mathcal{L}^2$, $\forall n$, provided we also insist that $X_n \in \mathcal{L}^2$, $\forall n$.

Proof of (i). Write $Y$ for $C \bullet X$. Since $C_n$ is bounded non-negative and $\mathcal{F}_{n-1}$ measurable,

$$\mathsf{E}[Y_n - Y_{n-1} \mid \mathcal{F}_{n-1}] = C_n \mathsf{E}[X_n - X_{n-1} \mid \mathcal{F}_{n-1}] \le 0, \quad [\text{resp. } = 0].$$

Proofs of (ii) and (iii) are now obvious. (Look again at (9.7,j).)
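Result (ii) can be illustrated by brute force for a fair $\pm 1$ walk over a fixed finite horizon. In the sketch below the names `expected_winnings` and `double_after_loss` are hypothetical, and the doubling strategy is just one example of a stake that depends only on earlier games; over a fixed horizon it is bounded, so the hypotheses of (ii) hold.

```python
from fractions import Fraction
from itertools import product

def expected_winnings(strategy, n):
    """Exact E[(C . X)_n] for a fair +/-1 walk X, by enumerating all 2**n paths."""
    total = Fraction(0)
    for steps in product((-1, 1), repeat=n):
        winnings = 0
        for k in range(n):
            stake = strategy(steps[:k])      # C_{k+1} sees only the first k steps
            winnings += stake * steps[k]
        total += Fraction(winnings, 2 ** n)
    return total

def double_after_loss(history):
    """Hypothetical strategy: stake 2**(length of the current losing run)."""
    stake = 1
    for step in reversed(history):
        if step == 1:
            break
        stake *= 2
    return stake

print(expected_winnings(double_after_loss, 8))   # exact expectation: 0
```

Any other function of the history alone can be passed as `strategy`; the exact expectation stays at zero because each stake is multiplied by an independent fair step.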

10.8. Stopping time

A map $T : \Omega \to \{0, 1, 2, \ldots; \infty\}$ is called a stopping time if,

►►(a) $\{T \le n\} = \{\omega : T(\omega) \le n\} \in \mathcal{F}_n$, $\forall n \le \infty$,

equivalently,

(b) $\{T = n\} = \{\omega : T(\omega) = n\} \in \mathcal{F}_n$, $\forall n \le \infty$.

Note that $T$ can be $\infty$.

Proof of the equivalence of (a) and (b). If $T$ has property (a), then

$$\{T = n\} = \{T \le n\} \setminus \{T \le n - 1\} \in \mathcal{F}_n.$$

If $T$ has property (b), then for $k \le n$, $\{T = k\} \in \mathcal{F}_k \subseteq \mathcal{F}_n$ and

$$\{T \le n\} = \bigcup_{0 \le k \le n} \{T = k\} \in \mathcal{F}_n. \qquad \square$$

Intuitive idea. $T$ is a time when you can decide to stop playing our game. Whether or not you stop immediately after the $n$th game depends only on the history up to (and including) time $n$: $\{T = n\} \in \mathcal{F}_n$.

Example. Suppose that $(A_n)$ is an adapted process, and that $B \in \mathcal{B}$. Let

$$T = \inf\{n \ge 0 : A_n \in B\} = \text{time of first entry of } A \text{ into set } B.$$

By convention, $\inf(\emptyset) = \infty$, so that $T = \infty$ if $A$ never enters set $B$. Obviously,

$$\{T \le n\} = \bigcup_{k \le n} \{A_k \in B\} \in \mathcal{F}_n,$$

so that $T$ is a stopping time.

Example. Let $L = \sup\{n : n \le 10;\ A_n \in B\}$, $\sup(\emptyset) = 0$. Convince yourself that $L$ is NOT a stopping time (unless $A$ is freaky).

10.9. Stopped supermartingales are supermartingales

Let $X$ be a supermartingale, and let $T$ be a stopping time. Suppose that you always bet 1 unit and quit playing at (immediately after) time $T$. Then your 'stake process' is $C^{(T)}$, where, for $n \in \mathbb{N}$,

$$C^{(T)}_n = I_{\{n \le T\}}, \quad \text{so that} \quad C^{(T)}_n(\omega) = 1 \text{ if } n \le T(\omega), \quad 0 \text{ otherwise}.$$

Your 'winnings process' is the process with value at time $n$ equal to

$$(C^{(T)} \bullet X)_n = X_{T \wedge n} - X_0.$$

If $X^T$ denotes the process $X$ stopped at $T$:

$$X^T_n(\omega) := X_{T(\omega) \wedge n}(\omega),$$

then

$$C^{(T)} \bullet X = X^T - X_0.$$

Now $C^{(T)}$ is clearly bounded (by 1) and non-negative. Moreover, $C^{(T)}$ is previsible because $C^{(T)}_n$ can only be 0 or 1 and, for $n \in \mathbb{N}$,

$$\{C^{(T)}_n = 0\} = \{T \le n - 1\} \in \mathcal{F}_{n-1}.$$

Result 10.7 now yields the following result.

THEOREM.

►►(i) If $X$ is a supermartingale and $T$ is a stopping time, then the stopped process $X^T = (X_{T \wedge n} : n \in \mathbb{Z}^+)$ is a supermartingale, so that in particular,

$$\mathsf{E}(X_{T \wedge n}) \le \mathsf{E}(X_0), \quad \forall n.$$

►►(ii) If $X$ is a martingale and $T$ is a stopping time, then $X^T$ is a martingale, so that in particular,

$$\mathsf{E}(X_{T \wedge n}) = \mathsf{E}(X_0), \quad \forall n.$$

It is important to notice that this theorem imposes no extra integrability conditions whatsoever (except of course for those implicit in the definition of supermartingale and martingale).

But be very careful! Let $X$ be a simple random walk on $\mathbb{Z}$, starting at 0. Then $X$ is a martingale. Let $T$ be the stopping time:

$$T := \inf\{n : X_n = 1\}.$$

It is well known that $\mathsf{P}(T < \infty) = 1$. (See Section 10.12 for a martingale proof of this fact, and for a martingale calculation of the distribution of $T$.) However, even though

$$\mathsf{E}(X_{T \wedge n}) = \mathsf{E}(X_0) \text{ for every } n,$$

we have

$$1 = \mathsf{E}(X_T) \ne \mathsf{E}(X_0) = 0.$$
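The contrast between $\mathsf{E}(X_{T \wedge n}) = 0$ for every $n$ and $\mathsf{E}(X_T) = 1$ can be seen numerically. The sketch below (the function name `stopped_walk_law` is hypothetical) computes the exact law of $X_{T \wedge n}$ for the simple random walk started at 0 with $T$ the first hitting time of 1, by freezing all probability mass that has reached 1.

```python
from fractions import Fraction

def stopped_walk_law(n):
    """Exact law of X_{T ^ n} for simple random walk from 0, T = first hit of 1."""
    law = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for x, p in law.items():
            if x == 1:                       # already stopped: stay at 1
                nxt[x] = nxt.get(x, 0) + p
            else:
                for step in (-1, 1):
                    nxt[x + step] = nxt.get(x + step, 0) + p / 2
        law = nxt
    return law

for n in (1, 5, 25):
    law = stopped_walk_law(n)
    mean = sum(x * p for x, p in law.items())
    print(n, mean, float(law[1]))            # mean stays 0 while P(T <= n) grows
```

The mean stays at zero because the growing mass sitting at 1 is balanced by a small amount of mass far out on the negative axis; that mass escapes in the limit, which is why the limit and the expectation cannot be interchanged here.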

We very much want to know when we can say that

$$\mathsf{E}(X_T) = \mathsf{E}(X_0)$$

for a martingale $X$. The following theorem gives some sufficient conditions.

10.10. Doob's Optional-Stopping Theorem

►(a) Let $T$ be a stopping time. Let $X$ be a supermartingale. Then $X_T$ is integrable and

$$\mathsf{E}(X_T) \le \mathsf{E}(X_0)$$

in each of the following situations:

(i) $T$ is bounded (for some $N$ in $\mathbb{N}$, $T(\omega) \le N$, $\forall \omega$);
(ii) $X$ is bounded (for some $K$ in $\mathbb{R}^+$, $|X_n(\omega)| \le K$ for every $n$ and every $\omega$) and $T$ is a.s. finite;
(iii) $\mathsf{E}(T) < \infty$, and, for some $K$ in $\mathbb{R}^+$,

$$|X_n(\omega) - X_{n-1}(\omega)| \le K \quad \forall (n, \omega).$$

(b) If any of the conditions (i)-(iii) holds and $X$ is a martingale, then

$$\mathsf{E}(X_T) = \mathsf{E}(X_0).$$

Proof of (a). We know that $X_{T \wedge n}$ is integrable, and

(*) $\mathsf{E}(X_{T \wedge n} - X_0) \le 0$.

For (i), we can take $n = N$. For (ii), we can let $n \to \infty$ in (*) using (BDD). For (iii), we have

$$|X_{T \wedge n} - X_0| = \Big| \sum_{k=1}^{T \wedge n} (X_k - X_{k-1}) \Big| \le KT$$

and $\mathsf{E}(KT) < \infty$, so that (DOM) justifies letting $n \to \infty$ in (*) to obtain the answer we want. □

Proof of (b). Apply (a) to $X$ and to $(-X)$. □

Corollary

►(c) Suppose that $M$ is a martingale, the increments $M_n - M_{n-1}$ of which are bounded by some constant $K_1$. Suppose that $C$ is a previsible process bounded by some constant $K_2$, and that $T$ is a stopping time such that $\mathsf{E}(T) < \infty$. Then

$$\mathsf{E}(C \bullet M)_T = 0.$$

Proof of the following final part of the Optional-Stopping Theorem is left as an Exercise. (It's clear whose lemma is needed!)

(d) If $X$ is a non-negative supermartingale, and $T$ is a stopping time which is a.s. finite, then

$$\mathsf{E}(X_T) \le \mathsf{E}(X_0).$$

10.11. Awaiting the almost inevitable

In order to be able to apply some of the results of the preceding Section, we need ways of proving that (when true!) $\mathsf{E}(T) < \infty$. The following announcement of the principle that 'whatever always stands a reasonable chance of happening will almost surely happen, sooner rather than later' is often useful.

LEMMA

►Suppose that $T$ is a stopping time such that for some $N$ in $\mathbb{N}$ and some $\varepsilon > 0$, we have, for every $n$ in $\mathbb{N}$:

$$\mathsf{P}(T \le n + N \mid \mathcal{F}_n) > \varepsilon, \quad \text{a.s.}$$

Then $\mathsf{E}(T) < \infty$.

You will find the proof of this set as an exercise in Chapter E.

Note that if $T$ is the first occasion by which the monkey in the 'Tricky exercise' at the end of Section 4.9 first completes ABRACADABRA, then $\mathsf{E}(T) < \infty$. You will find another exercise in Chapter E inviting you to apply result (c) of the preceding Section to show that

$$\mathsf{E}(T) = 26^{11} + 26^4 + 26.$$

A large number of other Exercises are now accessible to you.
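The three powers of 26 in the value of $\mathsf{E}(T)$ correspond to the three lengths $k$ (1, 4 and 11) at which the first $k$ letters of ABRACADABRA coincide with its last $k$ letters. That overlap rule is quoted here as a well-known consequence of the martingale argument; the text leaves the derivation to the exercise, so the sketch below only reproduces the arithmetic. The function name `expected_wait` is hypothetical, and letters are assumed to be typed independently and uniformly.

```python
def expected_wait(pattern, alphabet_size):
    """Sum alphabet_size**k over every k for which the first k letters of
    pattern equal its last k letters (k = len(pattern) always qualifies)."""
    return sum(
        alphabet_size ** k
        for k in range(1, len(pattern) + 1)
        if pattern[:k] == pattern[-k:]
    )

print(expected_wait("ABRACADABRA", 26) == 26**11 + 26**4 + 26)  # True
print(expected_wait("HH", 2), expected_wait("HT", 2))           # 6 4
```

The coin-tossing cases give the familiar expected waiting times of 6 tosses for HH and 4 tosses for HT.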

10.12. Hitting times for simple random walk

Suppose that $(X_n : n \in \mathbb{N})$ is a sequence of IID RVs, each $X_n$ having the same distribution as $X$ where

$$\mathsf{P}(X = 1) = \mathsf{P}(X = -1) = \tfrac{1}{2}.$$

Define $S_0 := 0$, $S_n := X_1 + \cdots + X_n$, and set

$$T := \inf\{n : S_n = 1\}.$$

Let

$$\mathcal{F}_n = \sigma(X_1, \ldots, X_n) = \sigma(S_0, S_1, \ldots, S_n).$$

Then the process $S$ is adapted (to $\{\mathcal{F}_n\}$), so that $T$ is a stopping time. We wish to calculate the distribution of $T$.

For $\theta \in \mathbb{R}$, $\mathsf{E}\, e^{\theta X} = \tfrac{1}{2}(e^{\theta} + e^{-\theta}) = \cosh\theta$, so that

$$\mathsf{E}[(\operatorname{sech}\theta)\, e^{\theta X_n}] = 1, \quad \forall n.$$

Example (10.4,b) shows that $M^\theta$ is a martingale, where

$$M^\theta_n = (\operatorname{sech}\theta)^n e^{\theta S_n}.$$

Since $T$ is a stopping time, and $M^\theta$ is a martingale, we have

(a) $\mathsf{E}\, M^\theta_{T \wedge n} = \mathsf{E}[(\operatorname{sech}\theta)^{T \wedge n} \exp(\theta S_{T \wedge n})] = 1$, $\forall n$.

►Now insist that $\theta > 0$. Then, firstly, $\exp(\theta S_{T \wedge n})$ is bounded by $e^\theta$, so $M^\theta_{T \wedge n}$ is bounded by $e^\theta$. Secondly, as $n \uparrow \infty$, $M^\theta_{T \wedge n} \to M^\theta_T$, where the latter is defined to be 0 if $T = \infty$. The Bounded Convergence Theorem allows us to let $n \to \infty$ in (a) to obtain

$$\mathsf{E}\, M^\theta_T = 1 = \mathsf{E}[(\operatorname{sech}\theta)^T e^\theta],$$

the term inside $[\cdot]$ on the right-hand side correctly being 0 if $T = \infty$. Hence

(b) $\mathsf{E}[(\operatorname{sech}\theta)^T] = e^{-\theta}$ for $\theta > 0$.

We now let $\theta \downarrow 0$. Then $(\operatorname{sech}\theta)^T \uparrow 1$ if $T < \infty$, and $(\operatorname{sech}\theta)^T \uparrow 0$ if $T = \infty$. Either (MON) or (BDD) yields

$$\mathsf{E}\, I_{\{T < \infty\}} = 1 = \mathsf{P}(T < \infty).$$

►The above argument has been given carefully to show how to deal with possibly infinite stopping times.

Put $\alpha = \operatorname{sech}\theta$ in (b) to obtain

(c) $\mathsf{E}(\alpha^T) = \sum_n \alpha^n \mathsf{P}(T = n) = e^{-\theta} = \alpha^{-1}\big[1 - \sqrt{1 - \alpha^2}\,\big]$,

so that

$$\mathsf{P}(T = 2m - 1) = (-1)^{m+1} \binom{1/2}{m}.$$

Intuitive proof of (c)

We have

(d) $f(\alpha) := \mathsf{E}(\alpha^T) = \tfrac{1}{2}\mathsf{E}(\alpha^T \mid X_1 = 1) + \tfrac{1}{2}\mathsf{E}(\alpha^T \mid X_1 = -1) = \tfrac{1}{2}\alpha + \tfrac{1}{2}\alpha f(\alpha)^2$.

The intuitive reason for the very last term is that time 1 has already elapsed giving the $\alpha$, and the time taken to go from $-1$ to 1 has the form $T_1 + T_2$, where $T_1$ (the time to go from $-1$ to 0) and $T_2$ (the time to go from 0 to 1) are independent, each with the same distribution as $T$. It is not obvious that '$T_1$ and $T_2$ are independent', but it is not difficult to devise a proof: the so-called Strong Markov Theorem would allow us to justify (d).
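The closed form for $\mathsf{P}(T = 2m - 1)$ can be compared with a direct computation. The sketch below is an independent check rather than part of the argument above: `hitting_time_law` (a hypothetical name) propagates the law of $S_n$ on the event $\{T > n\}$ and records the mass that reaches 1 for the first time, while `formula` evaluates $(-1)^{m+1}\binom{1/2}{m}$ in exact rational arithmetic.

```python
from fractions import Fraction

def hitting_time_law(max_n):
    """Exact P(T = n), n = 1..max_n, for T = inf{n : S_n = 1}, by killing paths at 1."""
    alive = {0: Fraction(1)}                 # law of S_n on the event {T > n}
    law = {}
    for n in range(1, max_n + 1):
        nxt = {}
        for s, p in alive.items():
            for step in (-1, 1):
                nxt[s + step] = nxt.get(s + step, 0) + p / 2
        law[n] = nxt.pop(1, Fraction(0))     # mass arriving at 1 for the first time
        alive = nxt
    return law

def formula(m):
    """(-1)**(m+1) * binom(1/2, m), the closed form for P(T = 2m - 1)."""
    coeff = Fraction(1)
    for j in range(m):
        coeff *= (Fraction(1, 2) - j) / (j + 1)
    return (-1) ** (m + 1) * coeff

law = hitting_time_law(15)
print(all(law[2 * m - 1] == formula(m) for m in range(1, 9)))   # odd times match
print(all(law[n] == 0 for n in range(2, 16, 2)))                # even times impossible
```

The first few values are $\tfrac12, \tfrac18, \tfrac1{16}$ for $T = 1, 3, 5$, which can also be confirmed by counting paths by hand.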
10.13. Non-negative superharmonic functions for Markov chains

Let $E$ be a finite or countable set. Let $P = (p_{ij})$ be a stochastic $E \times E$ matrix, so that, for $i, j \in E$, we have

$$p_{ij} \ge 0, \qquad \sum_{k \in E} p_{ik} = 1.$$

Let $\mu$ be a probability measure on $E$. We know from Section 4.8 that there exists a triple $(\Omega, \mathcal{F}, \mathsf{P}^\mu)$ (we now signify the dependence of $\mathsf{P}$ on $\mu$) carrying a Markov chain $Z = (Z_n : n \in \mathbb{Z}^+)$ such that (4.8,a) holds. We write 'a.s., $\mathsf{P}^\mu$' to signify 'almost surely relative to the $\mathsf{P}^\mu$-measure'.

Let $\mathcal{F}_n := \sigma(Z_0, Z_1, \ldots, Z_n)$. It is easy to deduce from (4.8,a) that if we write $p(i, j)$ instead of $p_{ij}$ when typographically convenient, then (a.s., $\mathsf{P}^\mu$)

$$\mathsf{P}^\mu(Z_{n+1} = j \mid \mathcal{F}_n) = p(Z_n, j).$$

Let $h$ be a non-negative function on $E$ and define the function $Ph$ on $E$ via

$$(Ph)(i) = \sum_j p(i, j) h(j).$$

Assume that our non-negative function $h$ is finite and $P$-superharmonic in that $Ph \le h$ on $E$. Then, (cMON) shows that, a.s., $\mathsf{P}^\mu$,

$$\mathsf{E}^\mu[h(Z_{n+1}) \mid \mathcal{F}_n] = \sum_j p(Z_n, j) h(j) = (Ph)(Z_n) \le h(Z_n),$$

so that $h(Z_n)$ is a non-negative supermartingale (whatever be the initial distribution $\mu$).

Suppose that the chain $Z$ is irreducible recurrent in that

$$f_{ij} := \mathsf{P}^i(T_j < \infty) = 1, \quad \forall i, j \in E,$$

where $\mathsf{P}^i$ denotes $\mathsf{P}^\mu$ when $\mu$ is the unit mass ($\mu_j = \delta_{ij}$) at $i$ (see 'Note' below) and

$$T_j := \inf\{n : n \ge 1;\ Z_n = j\}.$$

Note that the infimum is over $\{n \ge 1\}$, so that $f_{ii}$ is the probability of a return to $i$ if $Z$ starts at $i$. Then, by Theorem 10.10(d), we see that if $h$ is non-negative and $P$-superharmonic, then, for any $i$ and $j$ in $E$,

$$h(j) = \mathsf{E}^i h(Z_{T_j}) \le \mathsf{E}^i h(Z_0) = h(i),$$

so that $h$ is constant on $E$.

Exercise. Explain (at first intuitively, and later with consideration of rigour) why

$$f_{ij} = \sum_{k \ne j} p_{ik} f_{kj} + p_{ij} \ge \sum_k p_{ik} f_{kj}$$

and deduce that if every non-negative $P$-superharmonic function is constant, then $Z$ is irreducible recurrent.

So we have proved that our chain $Z$ is irreducible and recurrent if and only if every non-negative $P$-superharmonic function is constant. This is a trivial first step in the links between probability and potential theory.

Note. The perspicacious reader will have been upset by a lack of precision in this section. I wished to convey what is interesting first.

Only the very enthusiastic should read the remainder of this section.

The natural thing to do, given the one-step transition matrix $P$, is to take the canonical model for the Markov chain $Z$ obtained as follows. Let $\mathcal{E}$ denote the $\sigma$-algebra of all subsets of $E$ and define

$$(\Omega, \mathcal{F}) := \prod_{n \in \mathbb{Z}^+} (E, \mathcal{E}).$$

In particular, a point $\omega$ of $\Omega$ is a sequence

$$\omega = (\omega_0, \omega_1, \ldots)$$

of elements of $E$. For $\omega$ in $\Omega$ and $n$ in $\mathbb{Z}^+$, define

$$Z_n(\omega) := \omega_n \in E.$$

Then, for each probability measure $\mu$ on $(E, \mathcal{E})$, there is a unique probability measure $\mathsf{P}^\mu$ on $(\Omega, \mathcal{F})$ such that for $n \in \mathbb{N}$ and $i_0, i_1, \ldots, i_n \in E$, we have

(*) $\mathsf{P}^\mu[\omega : Z_0(\omega) = i_0, Z_1(\omega) = i_1, \ldots, Z_n(\omega) = i_n] = \mu_{i_0} p_{i_0 i_1} \cdots p_{i_{n-1} i_n}$.

The uniqueness is trivial because $\omega$-sets of the form contained in $[\cdot]$ on the left-hand side of (*), together with $\emptyset$, form a $\pi$-system generating $\mathcal{F}$. Existence follows because we can take $\mathsf{P}^\mu$ to be the $\tilde{\mathsf{P}}^\mu$-law of the non-canonical process $\tilde{Z}$ constructed in Section A4.3:

$$\mathsf{P}^\mu = \tilde{\mathsf{P}}^\mu \circ \tilde{Z}^{-1}.$$

Here, we regard $\tilde{Z}$ as the map

$$\tilde{Z} : \tilde{\Omega} \to \Omega, \qquad \tilde{\omega} \mapsto (\tilde{Z}_0(\tilde{\omega}), \tilde{Z}_1(\tilde{\omega}), \ldots),$$

this map $\tilde{Z}$ being $\tilde{\mathcal{F}}/\mathcal{F}$ measurable in that

$$\tilde{Z}^{-1} : \mathcal{F} \to \tilde{\mathcal{F}}.$$

The canonical model thus obtained is very satisfying because the measurable space $(\Omega, \mathcal{F})$ carries all measures $\mathsf{P}^\mu$ simultaneously.

Chapter 11

The Convergence Theorem

11.1. The picture that says it all

The top part of Figure 11.1 shows a sample path $n \mapsto X_n(\omega)$ for a process $X$ where $X_n - X_{n-1}$ represents your winnings per unit stake on game $n$. The lower part of the picture illustrates your total-winnings process $Y := C \bullet X$ under the previsible strategy $C$ described as follows:

Pick two numbers $a$ and $b$ with $a < b$.
REPEAT
    Wait until $X$ gets below $a$
    Play unit stakes until $X$ gets above $b$ and stop playing
UNTIL FALSE (that is, forever!).

Black blobs signify where $C = 1$; and open circles signify where $C = 0$. Recall that $C$ is not defined at time 0.

To be more formal (and to prove inductively that $C$ is previsible), define

$$C_1 := I_{\{X_0 < a\}},$$

and, for $n \ge 2$,

$$C_n := I_{\{C_{n-1} = 1\}} I_{\{X_{n-1} \le b\}} + I_{\{C_{n-1} = 0\}} I_{\{X_{n-1} < a\}}.$$

11.2. Upcrossings

The number $U_N[a, b](\omega)$ of upcrossings of $[a, b]$ made by $n \mapsto X_n(\omega)$ by time $N$ is defined to be the largest $k$ in $\mathbb{Z}^+$ such that we can find

$$0 \le s_1 < t_1 < s_2 < t_2 < \cdots < s_k < t_k \le N$$

with

$$X_{s_i}(\omega) < a, \qquad X_{t_i}(\omega) > b \qquad (1 \le i \le k).$$

[Figure 11.1: a sample path of $X$ (top) and the total-winnings process $Y = C \bullet X$ (bottom).]

The fundamental inequality (recall that $Y_0(\omega) := 0$)

►(D) $Y_N(\omega) \ge (b - a)\, U_N[a, b](\omega) - [X_N(\omega) - a]^-$

is obvious from the picture: every upcrossing of $[a, b]$ increases the $Y$-value by at least $(b - a)$, while the $[X_N(\omega) - a]^-$ overemphasizes the loss during the last 'interval of play'.
11.3. Doob's Upcrossing Lemma

►Let $X$ be a supermartingale. Let $U_N[a, b]$ be the number of upcrossings of $[a, b]$ by time $N$. Then

$$(b - a)\, \mathsf{E}\, U_N[a, b] \le \mathsf{E}[(X_N - a)^-].$$

Proof. The process $C$ is previsible, bounded and $\ge 0$, and $Y = C \bullet X$. Hence $Y$ is a supermartingale, and $\mathsf{E}(Y_N) \le 0$. The result now follows from (11.2,D). □
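The strategy of Section 11.1, the upcrossing count of Section 11.2 and the pathwise inequality (11.2,D) are all easy to compute for a given path. The sketch below uses hypothetical function names and tests the inequality on every $\pm 1$ path of length 12 started at 0, with the illustrative choice $a = 0$, $b = 2$; the inequality is deterministic, so no probabilities are involved.

```python
from itertools import accumulate, product

def stakes(path, a, b):
    """C_1..C_N of Section 11.1: start betting once X < a, stop once X > b."""
    c, out = 0, []
    for x_prev in path[:-1]:                 # C_n depends on X_0..X_{n-1} only
        c = 1 if (c == 1 and x_prev <= b) or (c == 0 and x_prev < a) else 0
        out.append(c)
    return out

def upcrossings(path, a, b):
    """U_N[a, b]: completed passages from strictly below a to strictly above b."""
    count, below = 0, False
    for x in path:
        if not below and x < a:
            below = True
        elif below and x > b:
            below, count = False, count + 1
    return count

def inequality_holds(path, a, b):
    """Check Y_N >= (b - a) U_N[a, b] - [X_N - a]^- for one path."""
    c = stakes(path, a, b)
    y_n = sum(ck * (x1 - x0) for ck, x0, x1 in zip(c, path, path[1:]))
    loss = max(a - path[-1], 0)              # [X_N - a]^-
    return y_n >= (b - a) * upcrossings(path, a, b) - loss

paths = ([0, *accumulate(s)] for s in product((-1, 1), repeat=12))
print(all(inequality_holds(p, a=0, b=2) for p in paths))   # True
```

The greedy scan in `upcrossings` finds the largest admissible $k$ because taking each $s_i$ and $t_i$ as early as possible never reduces the number of later crossings.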

11.4. COROLLARY

►Let $X$ be a supermartingale bounded in $\mathcal{L}^1$ in that

$$\sup_n \mathsf{E}(|X_n|) < \infty.$$

Let $a, b \in \mathbb{R}$ with $a < b$. Then, with $U_\infty[a, b] := \uparrow\lim_N U_N[a, b]$,

$$(b - a)\, \mathsf{E}\, U_\infty[a, b] \le |a| + \sup_n \mathsf{E}(|X_n|) < \infty,$$

so that

$$\mathsf{P}(U_\infty[a, b] = \infty) = 0.$$

Proof. By (11.3), we have, for $N \in \mathbb{N}$,

$$(b - a)\, \mathsf{E}\, U_N[a, b] \le |a| + \mathsf{E}(|X_N|) \le |a| + \sup_n \mathsf{E}(|X_n|).$$

Now let $N \uparrow \infty$, using (MON). □

11.5. Doob's 'Forward' Convergence Theorem

►►►Let $X$ be a supermartingale bounded in $\mathcal{L}^1$: $\sup_n \mathsf{E}(|X_n|) < \infty$. Then, almost surely, $X_\infty := \lim X_n$ exists and is finite. For definiteness, we define $X_\infty(\omega) := \limsup X_n(\omega)$, $\forall \omega$, so that $X_\infty$ is $\mathcal{F}_\infty$ measurable and $X_\infty = \lim X_n$, a.s.

Proof (Doob). Write (noting the use of $[-\infty, \infty]$):

$$\Lambda := \{\omega : X_n(\omega) \text{ does not converge to a limit in } [-\infty, \infty]\}$$
$$= \{\omega : \liminf X_n(\omega) < \limsup X_n(\omega)\}$$
$$= \bigcup_{\{a, b \in \mathbb{Q} : a < b\}} \{\omega : \liminf X_n(\omega) < a < b < \limsup X_n(\omega)\} =: \bigcup \Lambda_{a,b} \text{ (say)}.$$

But

$$\Lambda_{a,b} \subseteq \{\omega : U_\infty[a, b](\omega) = \infty\},$$

so that, by (11.4), $\mathsf{P}(\Lambda_{a,b}) = 0$. Since $\Lambda$ is a countable union of sets $\Lambda_{a,b}$, we see that $\mathsf{P}(\Lambda) = 0$, whence

$$X_\infty := \lim X_n \text{ exists a.s. in } [-\infty, \infty].$$

But Fatou's Lemma shows that

$$\mathsf{E}(|X_\infty|) = \mathsf{E}(\liminf |X_n|) \le \liminf \mathsf{E}(|X_n|) \le \sup \mathsf{E}(|X_n|) < \infty,$$

so that

$$\mathsf{P}(X_\infty \text{ is finite}) = 1. \qquad \square$$

Note. There are other proofs for the discrete-parameter case. None of these is as probabilistic, and none shares the central importance of this one for the continuous-parameter case.

11.6. Warning

As we saw for the branching-process example, it need not be true that $X_n \to X_\infty$ in $\mathcal{L}^1$.

11.7. Corollary

►►If $X$ is a non-negative supermartingale, then $X_\infty := \lim X_n$ exists almost surely.

Proof. $X$ is obviously bounded in $\mathcal{L}^1$, since $\mathsf{E}(|X_n|) = \mathsf{E}(X_n) \le \mathsf{E}(X_0)$. □

Chapter 12

Martingales bounded in $\mathcal{L}^2$

12.0. Introduction

When it works, one of the easiest ways of proving that a martingale $M$ is bounded in $\mathcal{L}^1$ is to prove that it is bounded in $\mathcal{L}^2$ in the sense that

(a) $\sup_n \|M_n\|_2 < \infty$, equivalently, $\sup_n \mathsf{E}(M_n^2) < \infty$.

Boundedness in $\mathcal{L}^2$ is often easy to check because of a Pythagorean formula (proved in Section 12.1)

$$\mathsf{E}(M_n^2) = \mathsf{E}(M_0^2) + \sum_{k=1}^n \mathsf{E}[(M_k - M_{k-1})^2].$$

The study of sums of independent random variables, a central topic in the classical theory, will be seen to hinge on Theorem 12.2 below, both parts of which have neat martingale proofs. We shall prove the Three-Series Theorem, which says exactly when a sum of independent random variables converges. We shall also prove the general Strong Law of Large Numbers for IID RVs and Lévy's extension of the Borel-Cantelli Lemmas.

12.1. Martingales in $\mathcal{L}^2$; orthogonality of increments

Let $M = (M_n : n \ge 0)$ be a martingale in $\mathcal{L}^2$ in that each $M_n$ is in $\mathcal{L}^2$ so that $\mathsf{E}(M_n^2) < \infty$, $\forall n$. Then for $s, t, u, v \in \mathbb{Z}^+$, with $s \le t \le u \le v$, we know that

$$\mathsf{E}(M_v \mid \mathcal{F}_u) = M_u \quad \text{(a.s.)},$$

so that $M_v - M_u$ is orthogonal to $\mathcal{L}^2(\mathcal{F}_u)$ (see Section 9.5) and in particular,

(a) $\langle M_t - M_s, M_v - M_u \rangle = 0$.

Hence the formula

$$M_n = M_0 + \sum_{k=1}^n (M_k - M_{k-1})$$

expresses $M_n$ as the sum of orthogonal terms, and Pythagoras's theorem yields

(b) $\mathsf{E}(M_n^2) = \mathsf{E}(M_0^2) + \sum_{k=1}^n \mathsf{E}[(M_k - M_{k-1})^2]$.

THEOREM

►Let $M$ be a martingale for which $M_n \in \mathcal{L}^2$, $\forall n$. Then $M$ is bounded in $\mathcal{L}^2$ if and only if

(c) $\sum \mathsf{E}[(M_k - M_{k-1})^2] < \infty$;

and when this obtains,

$$M_n \to M_\infty \text{ almost surely and in } \mathcal{L}^2.$$

Proof. It is obvious from (b) that condition (c) is equivalent to the statement

$$M \text{ is bounded in } \mathcal{L}^2.$$

Suppose now that (c) holds. Then $M$ is bounded in $\mathcal{L}^2$, and hence, by the property of monotonicity of norms (Section 6.7), $M$ is bounded in $\mathcal{L}^1$. Doob's Convergence Theorem 11.5 shows that $M_\infty := \lim M_n$ exists almost surely. The Pythagorean theorem implies that

(d) $\mathsf{E}[(M_{n+r} - M_n)^2] = \sum_{k=n+1}^{n+r} \mathsf{E}[(M_k - M_{k-1})^2]$.

Letting $r \to \infty$ and applying Fatou's Lemma, we obtain

(e) $\mathsf{E}[(M_\infty - M_n)^2] \le \sum_{k \ge n+1} \mathsf{E}[(M_k - M_{k-1})^2]$.

Hence

(f) $\lim_n \mathsf{E}[(M_\infty - M_n)^2] = 0$,

so that $M_n \to M_\infty$ in $\mathcal{L}^2$. Of course, (f) allows us to deduce from (d) that (e) holds with equality. □
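The Pythagorean formula (b) holds even when the increments are dependent, as long as $M$ is a martingale. The sketch below checks this exactly for a martingale transform $M = C \bullet X$ of a fair $\pm 1$ walk with $M_0 = 0$. The function name `second_moments` and the particular stake rule are illustrative assumptions; any stake that looks only at earlier steps would do.

```python
from fractions import Fraction
from itertools import product

def second_moments(stake, n):
    """Exact E(M_k^2) and E[(M_k - M_{k-1})^2], k = 1..n, for M = C . X where X is
    a fair +/-1 walk and C_k = stake(first k-1 steps); M_0 = 0."""
    e_m2 = [Fraction(0)] * (n + 1)
    e_inc2 = [Fraction(0)] * (n + 1)
    weight = Fraction(1, 2 ** n)             # each of the 2**n paths is equally likely
    for steps in product((-1, 1), repeat=n):
        m = 0
        for k in range(1, n + 1):
            inc = stake(steps[:k - 1]) * steps[k - 1]
            m += inc
            e_m2[k] += weight * m * m
            e_inc2[k] += weight * inc * inc
    return e_m2, e_inc2

# Hypothetical previsible stake: 1 plus the number of up-steps seen so far.
e_m2, e_inc2 = second_moments(lambda past: 1 + past.count(1), 8)
print(all(e_m2[n] == sum(e_inc2[1:n + 1]) for n in range(1, 9)))   # True
```

Here the increments $C_k(X_k - X_{k-1})$ are not independent, since the stake depends on the past, yet the cross terms still vanish in expectation.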

12.2. Sums of zero-mean independent random variables in $\mathcal{L}^2$

THEOREM

►►Suppose that $(X_k : k \in \mathbb{N})$ is a sequence of independent random variables such that, for every $k$,

$$\mathsf{E}(X_k) = 0, \qquad \sigma_k^2 := \operatorname{Var}(X_k) < \infty.$$

(a) Then

$$\Big(\sum \sigma_k^2 < \infty\Big) \text{ implies that } \Big(\sum X_k \text{ converges, a.s.}\Big).$$

(b) If the variables $(X_k)$ are bounded by some constant $K$ in $[0, \infty)$ in that $|X_k(\omega)| \le K$, $\forall k$, $\forall \omega$, then

$$\Big(\sum X_k \text{ converges, a.s.}\Big) \text{ implies that } \Big(\sum \sigma_k^2 < \infty\Big).$$

Note. Of course, the Kolmogorov 0-1 law implies that

$$\mathsf{P}\Big(\sum X_k \text{ converges}\Big) = 0 \text{ or } 1.$$

Notation. We define

$$\mathcal{F}_n := \sigma(X_1, X_2, \ldots, X_n), \qquad M_n := X_1 + X_2 + \cdots + X_n$$

(with $\mathcal{F}_0 := \{\emptyset, \Omega\}$, $M_0 := 0$, by the usual conventions). We also define

$$A_n := \sum_{k=1}^n \sigma_k^2, \qquad N_n := M_n^2 - A_n,$$

so that $A_0 := 0$ and $N_0 := 0$.

Proof of (a). We know from (10.4,a) that $M$ is a martingale. Moreover

(*) $\mathsf{E}[(M_k - M_{k-1})^2] = \mathsf{E}(X_k^2) = \sigma_k^2$,

so that, from (12.1,b),

$$\mathsf{E}(M_n^2) = \sum_{k=1}^n \sigma_k^2 = A_n.$$

If $\sum \sigma_k^2 < \infty$, then $M$ is bounded in $\mathcal{L}^2$, so that $\lim M_n$ exists a.s. □

Proof of (b). We can strengthen (*) as follows: since $X_k$ is independent of $\mathcal{F}_{k-1}$, we have, almost surely,

$$\mathsf{E}[(M_k - M_{k-1})^2 \mid \mathcal{F}_{k-1}] = \mathsf{E}[X_k^2 \mid \mathcal{F}_{k-1}] = \mathsf{E}(X_k^2) = \sigma_k^2.$$

A familiar argument now applies: since $M_{k-1}$ is $\mathcal{F}_{k-1}$ measurable,

$$\sigma_k^2 = \mathsf{E}(M_k^2 \mid \mathcal{F}_{k-1}) - 2 M_{k-1} \mathsf{E}(M_k \mid \mathcal{F}_{k-1}) + M_{k-1}^2 = \mathsf{E}(M_k^2 \mid \mathcal{F}_{k-1}) - M_{k-1}^2 \quad \text{(a.s.)}$$

But this result states that

$$N \text{ is a martingale.}$$

Now let $c \in (0, \infty)$ and define

$$T := \inf\{r : |M_r| > c\}.$$

We know that $N^T$ is a martingale so that, for every $n$,

$$\mathsf{E}\, N^T_n = \mathsf{E}[(M^T_n)^2] - \mathsf{E}\, A_{T \wedge n} = 0.$$

But since $|M_T - M_{T-1}| = |X_T| \le K$ if $T$ is finite, we see that $|M^T_n| \le K + c$ for every $n$, whence

(**) $\mathsf{E}\, A_{T \wedge n} \le (K + c)^2$, $\forall n$.

However, since $\sum X_n$ converges, a.s., the partial sums of $\sum X_k$ are a.s. bounded, and it must be the case that for some $c$, $\mathsf{P}(T = \infty) > 0$. It is now clear from (**) that $A_\infty := \sum \sigma_k^2 < \infty$. □

Remark. The proof of (b) showed that if $(X_n)$ is a sequence of independent zero-mean RVs uniformly bounded by some constant $K$, then

$$\Big(\mathsf{P}\{\text{partial sums of } \textstyle\sum X_k \text{ are bounded}\} > 0\Big) \Rightarrow \Big(\sum X_k \text{ converges a.s.}\Big)$$

Generalization. Sections 12.11-12.16 present the natural martingale form of Theorem 12.2 with applications.

12.3. Random signs

Suppose that $(a_n)$ is a sequence of real numbers and that $(\varepsilon_n)$ is a sequence of IID RVs with

$$\mathsf{P}(\varepsilon_n = \pm 1) = \tfrac{1}{2}.$$

The results of Section 12.2 show that

$$\sum \varepsilon_n a_n \text{ converges (a.s.) if and only if } \sum a_n^2 < \infty,$$

and that

$$\sum \varepsilon_n a_n \text{ (a.s.) oscillates infinitely if } \sum a_n^2 = \infty.$$

You should think about how to clinch the latter statement.

12.4.
We

symmetrization a stronger

expanding

space

need

result than that

provided by (12.2,b).

LEMMA
Suppose

by a

that i^ ^ sequence of (Xn) constant K in [0, oo):

independentrandom
A',

variables

bounded

lA'nHI <
Then

Vn,Vu;.

(^X\342\200\236converges,

a.s.)

=>

(^E(-X'\342\200\236)

converges

and

^ Var(-X'n)
would

< oo).
to

Proof. If each Xn version' Z*


Define
(12.2,b).

hzis

mean

zero,
way

then of
replaces
\302\243is to

There
of

is a
n

nice trick which


0 in
G

course, this
each

amount

mean

such a
N))

preserve
of

by a 'symmetrized enough of the structure.


Xn
(fi,^,P,(Xn : n G

Let {^,T,P,{Xn :

be an

exact copy

N)).

(n*,:F*,p*):=(fi,:F,p)x(n,:F,p)

and, for
X:(u;*)
We
clear

u;*

(^,i^)

define \342\202\254 Jl*,

:=

X\342\200\236H,

A-K)

:=

X\342\200\236(u-),

Z^K)

:= A^Cu;*)

X\342\200\236(u;*).

think
(and

of X*
may

be proved

as Xn lifted to the larger 'sample It is space' (J1*,^*,P*). by applying Uniqueness Lemma 1.6in a familiar

way)

that

the combined family


{XI : n

G N)

(X^

: n

G N)

is a
and

family

of

independent

random

variables

X* having

the same P*-distributionzis


P* o (X*)-i

on (Jl*,^*,
the

P*), with

both

X*

P-distribution

of Xn'.

= P o X-i

on

(R,

B),

etc.

..(12.5) Now we

Chapter 12:

Martingalesbounded

in

C?

115

have
n

(a)

(Z* :
variables

N*)

is a

zero-mean

on (fi*,

J^*,P*) such that

sequence of
\\Zn{u;*)\\

independent

random

< 2K

(Vn,Vu;*) and

where a'^ := Var(X\342\200\236).


Let

G := with
P*(G

{u; E

0>

: X^-X'\342\200\236(u;) converges}, that

G defined
X

similarly. Then we are given


X^

P(G)

G) = 1. But

Z;i{u;*)

converges

on G

x G,

so that

= P(G)

= 1, so that

(b)

P*(X;

^n
we

converges)
conclude

= 1.
that

From (a) and (b) and

(12.2,b),

and now it (c)


the variables

follows

from

(12.2,a)

that
E(^n)]

Yll^n in this sum

converges,

a.s.,

with being zero-meanindependent,

E[{Xn-EiX\342\200\236)r]

al
X1E(-X'\342\200\236)

Since (c)
converges,

holds and

Yl-^n

converges

(a.s.)

by hypothesis,

Note.

Another

proof of the lemma

may

be

found

in Section

18.6.

12.5. Kolmogorov's Three-Series Theorem

Let $(X_n)$ be a sequence of independent random variables. Then $\sum X_n$ converges almost surely if and only if for some (then for every) $K > 0$, the following three properties hold:

(i) $\sum_n \mathsf{P}(|X_n| > K) < \infty$,
(ii) $\sum_n \mathsf{E}(X_n^K)$ converges,
(iii) $\sum_n \operatorname{Var}(X_n^K) < \infty$,

where

$$X_n^K(\omega) := X_n(\omega) \text{ if } |X_n(\omega)| \le K, \qquad X_n^K(\omega) := 0 \text{ if } |X_n(\omega)| > K.$$

Proof of 'if' part. Suppose that for some $K > 0$ properties (i)-(iii) hold. Then

$$\sum \mathsf{P}(X_n \ne X_n^K) = \sum \mathsf{P}(|X_n| > K) < \infty,$$

so that by (BC1)

$$\mathsf{P}(X_n = X_n^K \text{ for all but finitely many } n) = 1.$$

It is therefore clear that we need only show that $\sum X_n^K$ converges almost surely; and because of (ii), we need only prove that

$$\sum Y_n^K \text{ converges, a.s., where } Y_n^K := X_n^K - \mathsf{E}(X_n^K).$$

However, the sequence $(Y_n^K : n \in \mathbb{N})$ is a zero-mean sequence of independent random variables with

$$\mathsf{E}[(Y_n^K)^2] = \operatorname{Var}(X_n^K).$$

Because of (iii), the desired result now follows from (12.2,a).

Proof of 'only if' part. Suppose that $\sum X_n$ converges, a.s., and that $K$ is any constant in $(0, \infty)$. Since it is almost surely true that $X_n \to 0$ whence $|X_n| > K$ for only finitely many $n$, (BC2) shows that (i) holds. Since (a.s.) $X_n = X_n^K$ for all but finitely many $n$, we know that

$$\sum X_n^K \text{ converges, a.s.}$$

Lemma 12.4 completes the proof. □

Results such as the Three-Series Theorem become powerful when used in conjunction with Kronecker's Lemma (Section 12.7).

12.6. Cesàro's Lemma

Suppose that $(b_n)$ is a sequence of strictly positive real numbers with $b_n \uparrow \infty$, and that $(v_n)$ is a convergent sequence of real numbers: $v_n \to v_\infty \in \mathbb{R}$. Then

$$\frac{1}{b_n} \sum_{k=1}^n (b_k - b_{k-1}) v_k \to v_\infty \qquad (n \to \infty).$$

Here, $b_0 := 0$.

Proof. Let $\varepsilon > 0$. Choose $N$ such that

$$v_k > v_\infty - \varepsilon \text{ whenever } k \ge N.$$

Then,

$$\liminf_{n \to \infty} \frac{1}{b_n} \sum_{k=1}^n (b_k - b_{k-1}) v_k \ge \liminf_{n \to \infty} \Big\{ \frac{1}{b_n} \sum_{k=1}^N (b_k - b_{k-1}) v_k + \frac{b_n - b_N}{b_n}(v_\infty - \varepsilon) \Big\} \ge 0 + v_\infty - \varepsilon.$$

Since this is true for every $\varepsilon > 0$, we have $\liminf \ge v_\infty$; and since, by a similar argument, $\limsup \le v_\infty$, the result follows. □

12.7. Kronecker's Lemma

►Again, let $(b_n)$ denote a sequence of strictly positive real numbers with $b_n \uparrow \infty$. Let $(x_n)$ be a sequence of real numbers, and define

$$s_n := x_1 + x_2 + \cdots + x_n.$$

Then

$$\Big(\sum \frac{x_n}{b_n} \text{ converges}\Big) \Rightarrow \Big(\frac{s_n}{b_n} \to 0\Big).$$

Proof. Let $u_n := \sum_{k \le n} (x_k / b_k)$, so that $u_\infty := \lim u_n$ exists. Then

$$u_n - u_{n-1} = x_n / b_n.$$

Thus

$$s_n = \sum_{k=1}^n b_k (u_k - u_{k-1}) = b_n u_n - \sum_{k=1}^n (b_k - b_{k-1}) u_{k-1}.$$

Cesàro's Lemma now shows that

$$s_n / b_n \to u_\infty - u_\infty = 0. \qquad \square$$
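The summation-by-parts step in the proof of Kronecker's Lemma, $s_n = b_n u_n - \sum_{k=1}^n (b_k - b_{k-1}) u_{k-1}$ with $b_0 = u_0 = 0$, is purely algebraic and can be checked in exact arithmetic. In the sketch below the function name and the two sequences are made up for illustration; with $x_k = (-1)^k k$ and $b_k = k^2$ the series $\sum x_k / b_k$ is the alternating harmonic series, so the lemma applies.

```python
from fractions import Fraction

def kronecker_identity_gap(xs, bs):
    """Return s_n - [b_n u_n - sum_k (b_k - b_{k-1}) u_{k-1}], with b_0 = u_0 = 0."""
    u = [Fraction(0)]
    for x, b in zip(xs, bs):
        u.append(u[-1] + Fraction(x) / b)    # u_k = sum_{j<=k} x_j / b_j
    b = [Fraction(0), *bs]
    n = len(xs)
    rhs = b[n] * u[n] - sum((b[k] - b[k - 1]) * u[k - 1] for k in range(1, n + 1))
    return sum(xs) - rhs

xs = [(-1) ** k * k for k in range(1, 41)]   # made-up real sequence x_1..x_40
bs = [k * k for k in range(1, 41)]           # b_k = k^2 increases to infinity
print(kronecker_identity_gap(xs, bs))        # exact arithmetic: 0
print(abs(sum(xs)) / bs[-1])                 # |s_n| / b_n for this example
```

The second printed value is small because the partial sums $s_n$ grow only linearly while $b_n$ grows quadratically, in line with the conclusion $s_n / b_n \to 0$.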

12.8. Strong Law under variance constraints

LEMMA

Let $(W_n)$ be a sequence of independent random variables such that

$$\mathsf{E}(W_n) = 0, \qquad \sum \frac{\operatorname{Var}(W_n)}{n^2} < \infty.$$

Then $n^{-1} \sum_{k \le n} W_k \to 0$, a.s.

Proof. By Kronecker's Lemma, it is enough to prove that $\sum (W_n / n)$ converges, a.s. But this is immediate from Theorem 12.2(a). □

Note. We are now going to see that a truncation technique enables us to obtain the general Strong Law for IID RVs from the above lemma.

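12.9. Kolmogorov's Truncation Lemma

Suppose that $X_1, X_2, \ldots$ are IID RVs each with the same distribution as $X$, where $\mathsf{E}(|X|) < \infty$. Set $\mu := \mathsf{E}(X)$. Define

$$Y_n := X_n \text{ if } |X_n| \le n, \qquad Y_n := 0 \text{ if } |X_n| > n.$$

Then

(i) $\mathsf{E}(Y_n) \to \mu$;
(ii) $\mathsf{P}[Y_n = X_n \text{ eventually}] = 1$;
(iii) $\sum n^{-2} \operatorname{Var}(Y_n) < \infty$.

Proof of (i). Let

$$Z_n := X \text{ if } |X| \le n, \qquad Z_n := 0 \text{ if } |X| > n.$$

Then $Z_n$ has the same distribution as $Y_n$, so that in particular, $\mathsf{E}(Z_n) = \mathsf{E}(Y_n)$. But, as $n \to \infty$, we have

$$Z_n \to X, \qquad |Z_n| \le |X|,$$

so, by (DOM), $\mathsf{E}(Z_n) \to \mathsf{E}(X) = \mu$. □

Proof of (ii). We have

$$\sum_{n=1}^\infty \mathsf{P}(Y_n \ne X_n) = \sum \mathsf{P}(|X_n| > n) = \sum \mathsf{P}(|X| > n) = \mathsf{E} \sum_{n=1}^\infty I_{\{|X| > n\}} = \mathsf{E} \sum_{1 \le n < |X|} 1 \le \mathsf{E}(|X|) < \infty,$$

so that by (BC1), result (ii) holds. □

Proof of (iii). We have

$$\sum \frac{\operatorname{Var}(Y_n)}{n^2} \le \sum \frac{\mathsf{E}(Y_n^2)}{n^2} = \sum_n \frac{\mathsf{E}(|X|^2;\ |X| \le n)}{n^2} = \mathsf{E}[|X|^2 f(|X|)],$$

where, for $0 < z < \infty$,

$$f(z) = \sum_{n \ge \max(1, z)} n^{-2} \le 2 / \max(1, z).$$

We have used the fact that, for $n \ge 1$,

$$\frac{1}{n^2} \le \frac{2}{n(n+1)} = 2\Big(\frac{1}{n} - \frac{1}{n+1}\Big).$$

Hence

$$\sum n^{-2} \operatorname{Var}(Y_n) \le 2\mathsf{E}(|X|) < \infty. \qquad \square$$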


12.10. Kolmogorov's Strong Law of Large Numbers (SLLN)

►► Let $X_1, X_2, \ldots$ be IID RVs with $E(|X_k|) < \infty$, $\forall k$. Define

$$S_n := X_1 + X_2 + \cdots + X_n.$$

Then, with $\mu := E(X_k)$, $\forall k$,

$$n^{-1} S_n \to \mu, \quad \text{almost surely}.$$

Proof. Define $Y_n$ as in Lemma 12.9. By property (ii) of that lemma, we need only show that

$$n^{-1} \sum_{k \le n} Y_k \to \mu, \quad \text{a.s.}$$

But

(a) $\displaystyle n^{-1} \sum_{k \le n} Y_k = n^{-1} \sum_{k \le n} E(Y_k) + n^{-1} \sum_{k \le n} W_k,$

where $W_k := Y_k - E(Y_k)$. But, the first term on the right-hand side of (a) tends to $\mu$ by (12.9,i) and Cesàro's Lemma; and the second converges almost surely to 0 by Lemma 12.8. □
Notes. The Strong Law is philosophically satisfying in that it gives a precise formulation of $E(X)$ as 'the mean of a large number of independent realizations of $X$'. We know from Exercise E4.6 that if $E(|X|) = \infty$, then

$$\limsup |S_n| / n = \infty, \quad \text{a.s.}$$

So, we have arrived at the best possible result for the IID case.

Discussion of methods. Even though we have achieved a good result, it has to be admitted that the truncation technique seems 'ad hoc': it does not have the pure-mathematical elegance - the sense of rightness - which the martingale proof and the proof by ergodic theory (the latter is not in this book) both possess. However, each of the methods can be adapted to cover situations which the others cannot tackle; and, in particular, classical truncation arguments retain great importance.

Properly formulated, the argument which gave the result, Theorem 12.2, on which all of this chapter has so far relied, can yield much more.

12.11. Doob decomposition

In the following theorem, the statement that 'A is a previsible process null at 0' means of course that $A_0 = 0$ and $A_n \in m\mathcal{F}_{n-1}$ $(n \in \mathbb{N})$.

THEOREM

►► (a) Let $(X_n : n \in \mathbb{Z}^+)$ be an adapted process with $X_n \in \mathcal{L}^1$, $\forall n$. Then $X$ has a Doob decomposition

(D) $X = X_0 + M + A$,

where $M$ is a martingale null at 0, and $A$ is a previsible process null at 0. Moreover, this decomposition is unique modulo indistinguishability in the sense that if $X = X_0 + \tilde{M} + \tilde{A}$ is another such decomposition, then

$$P(M_n = \tilde{M}_n,\ A_n = \tilde{A}_n,\ \forall n) = 1.$$

► (b) $X$ is a submartingale if and only if the process $A$ is an increasing process in the sense that

$$P(A_n \le A_{n+1},\ \forall n) = 1.$$

Proof. If $X$ has a Doob decomposition as at (D), then, since $M$ is a martingale and $A$ is previsible, we have, almost surely,

$$E(X_n - X_{n-1} \mid \mathcal{F}_{n-1}) = E(M_n - M_{n-1} \mid \mathcal{F}_{n-1}) + E(A_n - A_{n-1} \mid \mathcal{F}_{n-1}) = 0 + (A_n - A_{n-1}).$$

Hence

(c) $\displaystyle A_n = \sum_{k=1}^{n} E(X_k - X_{k-1} \mid \mathcal{F}_{k-1}), \quad \text{a.s.},$

and if we use (c) to define $A$, we obtain the required decomposition of $X$.

The 'submartingale' result (b) is now obvious. □

Remark. The Doob-Meyer decomposition, which expresses a submartingale in continuous time as the sum of a local martingale and a previsible increasing process, is a deep result which is the foundation stone for stochastic-integral theory.
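Formula (c) for the previsible part can be evaluated exactly in a small discrete setting. The sketch below is an illustration under assumptions that are not in the text: the filtration is generated by fair $\pm 1$ coin tosses, the adapted process is $X_n = S_n^2$ with $S_n$ the sum of the first $n$ tosses, and the names `compensator` and `square_of_walk` are hypothetical. In this case one expects $A_n = n$.

```python
from fractions import Fraction
from itertools import product


def compensator(f, prefix):
    """Evaluate A_n = sum_{k=1..n} E(X_k - X_{k-1} | F_{k-1}) for n = len(prefix) + 1.

    `f` maps a tuple of +/-1 steps to the value of X on that history, and
    `prefix` is the history of the first n-1 steps (A_n is F_{n-1}-measurable).
    """
    a = Fraction(0)
    for k in range(1, len(prefix) + 2):
        past = prefix[:k - 1]
        # Conditional expectation of X_k given the past: average over the two
        # equally likely next steps (assumption: fair coin).
        cond_mean = Fraction(f(past + (1,)) + f(past + (-1,)), 2)
        a += cond_mean - f(past)
    return a


def square_of_walk(path):
    """X_n = S_n^2 for the walk with the given steps."""
    return sum(path) ** 2


# A_4 evaluated on every possible history of the first 3 steps.
values = {compensator(square_of_walk, p) for p in product((1, -1), repeat=3)}
print(values)
```

Since $E((s \pm 1)^2) - s^2 = 1$ for every $s$, each conditional increment equals 1, which is why the compensator does not depend on the history here; for other choices of `f` it generally will.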

12.12. The angle-brackets process $\langle M \rangle$

Let $M$ be a martingale in $\mathcal{L}^2$ and null at 0. Then the conditional form of Jensen's inequality shows that

(a) $M^2$ is a submartingale.

Thus $M^2$ has a Doob decomposition (essentially unique):

(b) $M^2 = N + A$,

where $N$ is a martingale and $A$ is a previsible increasing process, both $N$ and $A$ being null at 0. Define

$$A_\infty := \uparrow \lim A_n, \quad \text{a.s.}$$

Notation. The process $A$ is often written $\langle M \rangle$.

Since $E(M_n^2) = E(A_n)$, we see that

(c) $M$ is bounded in $\mathcal{L}^2$ if and only if $E(A_\infty) < \infty$.

It is important to note that

► (d) $A_n - A_{n-1} = E(M_n^2 - M_{n-1}^2 \mid \mathcal{F}_{n-1}) = E[(M_n - M_{n-1})^2 \mid \mathcal{F}_{n-1}]$.

12.13. Relating convergence of $M$ to finiteness of $\langle M \rangle_\infty$

Again let $M$ be a martingale in $\mathcal{L}^2$ and null at 0. Define $A := \langle M \rangle$. (More strictly, let $A$ be 'a version of' $\langle M \rangle$.)

THEOREM

► (a) $\lim_n M_n(\omega)$ exists for almost every $\omega$ for which $A_\infty(\omega) < \infty$.

► (b) Suppose that $M$ has uniformly bounded increments in that for some $K$ in $\mathbb{R}$,

$$|M_n(\omega) - M_{n-1}(\omega)| \le K, \quad \forall n, \forall \omega.$$

Then $A_\infty(\omega) < \infty$ for almost every $\omega$ for which $\lim M_n(\omega)$ exists.

Remark. This is obviously an extension of Theorem 12.2, and a very substantial one.

Proof of (a). Because $A$ is previsible, it is immediate that for every $k \in \mathbb{N}$,

$$S(k) := \inf\{n \in \mathbb{Z}^+ : A_{n+1} > k\}$$

defines a stopping time $S(k)$. Moreover, the stopped process $A^{S(k)}$ is previsible because for $B \in \mathcal{B}$, and $n \in \mathbb{N}$,

$$\{A_{n \wedge S(k)} \in B\} = F_1 \cup F_2,$$

where

$$F_1 := \bigcup_{r=0}^{n-1} \{S(k) = r;\ A_r \in B\} \in \mathcal{F}_{n-1}, \qquad F_2 := \{A_n \in B\} \cap \{S(k) \le n-1\}^c \in \mathcal{F}_{n-1}.$$

Since

$$(M^{S(k)})^2 - A^{S(k)} = (M^2 - A)^{S(k)}$$

is a martingale, we now see that $\langle M^{S(k)} \rangle = A^{S(k)}$. However, the process $A^{S(k)}$ is bounded by $k$, so that by (12.12,c), $M^{S(k)}$ is bounded in $\mathcal{L}^2$ and

(c) $\lim_n M_{n \wedge S(k)}$ exists almost surely.

However,

(d) $\displaystyle \{A_\infty < \infty\} = \bigcup_k \{S(k) = \infty\}.$

Result (a) now follows on combining (c) and (d). □

Proof of (b). Suppose that

$$P(A_\infty = \infty,\ \sup_n |M_n| < \infty) > 0.$$

Then for some $c > 0$,

(e) $P(T(c) = \infty,\ A_\infty = \infty) > 0$,

where $T(c)$ is the stopping time:

$$T(c) := \inf\{r : |M_r| > c\}.$$

Now,

$$E(M^2_{T(c) \wedge n} - A_{T(c) \wedge n}) = 0,$$

and $M^{T(c)}$ is bounded by $c + K$. Thus

(f) $E A_{T(c) \wedge n} \le (c + K)^2, \quad \forall n.$

But (MON) shows that (e) and (f) are incompatible. Result (b) follows. □

Remark. In the proof of (a), we were able to use previsibility to make the jump $A_{S(k)} - A_{S(k)-1}$ irrelevant. We could not do this for the jump $M_{T(c)} - M_{T(c)-1}$, which is why we needed the assumption about bounded increments.

12.14. A trivial 'Strong Law' for martingales in $\mathcal{L}^2$

Let $M$ be a martingale in $\mathcal{L}^2$ and null at 0, and let $A = \langle M \rangle$. Since $(1 + A)^{-1}$ is a bounded previsible process,

$$W_n := \sum_{1 \le k \le n} \frac{M_k - M_{k-1}}{1 + A_k}$$

defines a martingale $W$. Moreover, since $(1 + A_n)$ is $\mathcal{F}_{n-1}$ measurable,

$$E[(W_n - W_{n-1})^2 \mid \mathcal{F}_{n-1}] = (1 + A_n)^{-2} (A_n - A_{n-1}) \le (1 + A_{n-1})^{-1} - (1 + A_n)^{-1}, \quad \text{a.s.}$$

We see that $\langle W \rangle_\infty \le 1$, a.s., so that $\lim W_n$ exists, a.s. Kronecker's Lemma now shows that

►► (a) $M_n / A_n \to 0$ a.s. on $\{A_\infty = \infty\}$.

12.15. Lévy's extension of the Borel-Cantelli Lemmas

THEOREM

Suppose that for $n \in \mathbb{N}$, $E_n \in \mathcal{F}_n$. Define

$$Z_n := \sum_{1 \le k \le n} I_{E_k} = \text{number of } E_k\ (k \le n) \text{ which occur}.$$

Define $\xi_k := P(E_k \mid \mathcal{F}_{k-1})$, and

$$Y_n := \sum_{1 \le k \le n} \xi_k.$$

Then, almost surely,

(a) $(Y_\infty < \infty) \Rightarrow (Z_\infty < \infty)$,

(b) $(Y_\infty = \infty) \Rightarrow (Z_n / Y_n \to 1)$.

Remarks. (i) Since $E \xi_k = P(E_k)$, it follows that if $\sum P(E_k) < \infty$, then $Y_\infty < \infty$ a.s. and (BC1) therefore follows.

(ii) Let $(E_n : n \in \mathbb{N})$ be a sequence of independent events associated with some triple $(\Omega, \mathcal{F}, P)$, and define $\mathcal{F}_n = \sigma(E_1, E_2, \ldots, E_n)$. Then $\xi_k = P(E_k)$, a.s., and (BC2) follows from (b).

Proof. Let $M$ be the martingale $Z - Y$, so that $Z = M + Y$ is the Doob decomposition of the submartingale $Z$. Then (you check!)

$$A_n := \langle M \rangle_n = \sum_{k \le n} \xi_k (1 - \xi_k) \le Y_n, \quad \text{a.s.}$$

If $Y_\infty < \infty$, then $A_\infty < \infty$ and $\lim M_n$ exists, so that $Z_\infty$ is finite. (We are skipping 'except for a null $\omega$-set' statements now.)

If $Y_\infty = \infty$ and $A_\infty < \infty$ then $\lim M_n$ exists and it is trivial that $Z_n / Y_n \to 1$.

If $Y_\infty = \infty$ and $A_\infty = \infty$, then $M_n / A_n \to 0$, so that, a fortiori, $M_n / Y_n \to 0$ and $Z_n / Y_n \to 1$. □

12.16. Comments

The last few sections have indicated just how powerful the use of $\langle M \rangle$ to study $M$ is likely to be. In the same way as one can obtain the conditional version Theorem 12.15 of the Borel-Cantelli Lemmas, one can obtain conditional versions of the Three-Series Theorem etc. But a whole new world is opened up: see Neveu (1975), for example. In the continuous-time case, things are much more striking still. See, for example, Rogers and Williams (1987).

Chapter 13

Uniform Integrability

We have already seen a number of nice applications of martingale theory. To derive the full benefit, we need something better than the Dominated-Convergence Theorem. In particular, Theorem 13.7 gives a necessary and sufficient condition for a sequence of RVs to converge in $\mathcal{L}^1$. The new concept required is that of a uniformly integrable (UI) family of random variables. This concept links perfectly with conditional expectations and hence with martingales.

The appendix to this chapter contains a discussion of that topic loved by examiners and others: modes of convergence. Our use of the Upcrossing Lemma has meant that this topic does not feature large in the main text of this book.

13.1. An 'absolute continuity' property

LEMMA

► (a) Suppose that $X \in \mathcal{L}^1 = \mathcal{L}^1(\Omega, \mathcal{F}, P)$. Then, given $\varepsilon > 0$, there exists a $\delta > 0$ such that for $F \in \mathcal{F}$, $P(F) < \delta$ implies that $E(|X|; F) < \varepsilon$.

Proof. If the conclusion is false, then, for some $\varepsilon_0 > 0$, we can find a sequence $(F_n)$ of elements of $\mathcal{F}$ such that

$$P(F_n) < 2^{-n} \quad \text{and} \quad E(|X|; F_n) \ge \varepsilon_0.$$

Let $H := \limsup F_n$. Then (BC1) shows that $P(H) = 0$, but the 'Reverse' Fatou Lemma (5.4,b) shows that

$$E(|X|; H) \ge \varepsilon_0;$$

and we have arrived at the required contradiction. □

Corollary

(b) Suppose that $X \in \mathcal{L}^1$ and that $\varepsilon > 0$. Then there exists $K$ in $[0, \infty)$ such that

$$E(|X|; |X| > K) < \varepsilon.$$

Proof. Let $\delta$ be as in Lemma (a). Since

$$K P(|X| > K) \le E(|X|),$$

we can choose $K$ such that $P(|X| > K) < \delta$. □

13.2. Definition. UI family

►► A class $\mathcal{C}$ of random variables is called uniformly integrable (UI) if given $\varepsilon > 0$, there exists $K$ in $[0, \infty)$ such that

$$E(|X|; |X| > K) < \varepsilon, \quad \forall X \in \mathcal{C}.$$

We note that for such a class $\mathcal{C}$, we have (with $K_1$ relating to $\varepsilon = 1$) for every $X \in \mathcal{C}$,

$$E(|X|) = E(|X|; |X| > K_1) + E(|X|; |X| \le K_1) \le 1 + K_1.$$

Thus, a UI family is bounded in $\mathcal{L}^1$.

It is not true that a family bounded in $\mathcal{L}^1$ is UI.

Example. Take $(\Omega, \mathcal{F}, P) = ([0,1], \mathcal{B}[0,1], \mathrm{Leb})$. Let

$$E_n = (0, n^{-1}), \qquad X_n = n I_{E_n}.$$

Then $E(|X_n|) = 1$, $\forall n$, so that $(X_n)$ is bounded in $\mathcal{L}^1$. However, for any $K > 0$, we have for $n > K$,

$$E(|X_n|; |X_n| > K) = n P(E_n) = 1,$$

so that $(X_n)$ is not UI. Here, $X_n \to 0$, but $E(X_n) \not\to 0$.
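The example above can be tabulated exactly. In this small sketch the function names `tail_expectation` and `sup_tail` are hypothetical; the only fact used is that $X_n$ takes the value $n$ on a set of Lebesgue measure $1/n$ and the value 0 elsewhere.

```python
from fractions import Fraction


def tail_expectation(n, K):
    """E(|X_n|; |X_n| > K) for X_n = n * indicator of (0, 1/n) under Leb on [0, 1]."""
    # X_n exceeds K only where it equals n, and only if n > K.
    return n * Fraction(1, n) if n > K else Fraction(0)


def sup_tail(K, n_max):
    """Supremum over 1 <= n <= n_max of E(|X_n|; |X_n| > K)."""
    return max(tail_expectation(n, K) for n in range(1, n_max + 1))


for K in (1, 10, 100):
    # Including indices n > K keeps the supremum at 1, however large K is.
    print(K, sup_tail(K, n_max=10 * K))
```

For a UI family the supremum of these tail expectations over the whole family would have to fall below any given $\varepsilon$ once $K$ is large; here it stays at 1 as soon as the family contains some index $n > K$.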
13.3. Two simple sufficient conditions for the UI property

► (a) Suppose that $\mathcal{C}$ is a class of random variables which is bounded in $\mathcal{L}^p$ for some $p > 1$; thus, for some $A \in [0, \infty)$,

$$E(|X|^p) < A, \quad \forall X \in \mathcal{C}.$$

Then $\mathcal{C}$ is UI.

Proof. If $v \ge K > 0$, then $v \le K^{1-p} v^p$ (obviously!). Hence, for $K > 0$ and $X \in \mathcal{C}$, we have

$$E(|X|; |X| > K) \le K^{1-p} E(|X|^p; |X| > K) \le K^{1-p} A.$$

The result follows. □

(b) Suppose that $\mathcal{C}$ is a class of random variables which is dominated by an integrable non-negative variable $Y$:

$$|X(\omega)| \le Y(\omega), \quad \forall X \in \mathcal{C}, \quad \text{and} \quad E(Y) < \infty.$$

Then $\mathcal{C}$ is UI.

Note. It is precisely this which makes (DOM) work for our $(\Omega, \mathcal{F}, P)$.

Proof. It is obvious that, for $K > 0$ and $X \in \mathcal{C}$,

$$E(|X|; |X| > K) \le E(Y; Y > K),$$

and now it is only necessary to apply (13.1,b) to $Y$. □

13.4. UI property of conditional expectations

The main reason that the UI property fits in so well with martingale theory is the following. See Exercise E13.3 for an important extension.

THEOREM

►► Let $X \in \mathcal{L}^1$. Then the class

$$\{E(X \mid \mathcal{G}) : \mathcal{G} \text{ a sub-}\sigma\text{-algebra of } \mathcal{F}\}$$

is uniformly integrable.

Note. Because of the business of versions, a formal description of the class $\mathcal{C}$ in question would be as follows: $Y \in \mathcal{C}$ if and only if for some sub-$\sigma$-algebra $\mathcal{G}$ of $\mathcal{F}$, $Y$ is a version of $E(X \mid \mathcal{G})$.

Proof. Let $\varepsilon > 0$ be given. Choose $\delta > 0$ such that, for $F \in \mathcal{F}$,

$$P(F) < \delta \text{ implies that } E(|X|; F) < \varepsilon.$$

Choose $K$ so that $K^{-1} E(|X|) < \delta$.

Now let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$ and let $Y$ be any version of $E(X \mid \mathcal{G})$. By Jensen's inequality,

(a) $|Y| \le E(|X| \mid \mathcal{G})$, a.s.

Hence $E(|Y|) \le E(|X|)$ and

$$K P(|Y| > K) \le E(|Y|) \le E(|X|),$$

so that $P(|Y| > K) < \delta$. But $\{|Y| > K\} \in \mathcal{G}$, so that, from (a) and the definition of conditional expectation,

$$E(|Y|; |Y| > K) \le E(|X|; |Y| > K) < \varepsilon. \qquad \Box$$

Note. Now you can see why we needed the more subtle result (13.1,a), not just the result (13.1,b) which has a simpler proof.

13.5. Convergence in probability

Let $(X_n)$ be a sequence of random variables, and let $X$ be a random variable. We say that

►► $X_n \to X$ in probability

if for every $\varepsilon > 0$,

$$P(|X_n - X| > \varepsilon) \to 0 \text{ as } n \to \infty.$$

LEMMA

► If $X_n \to X$ almost surely, then $X_n \to X$ in probability.

Proof. Suppose that $X_n \to X$ almost surely and that $\varepsilon > 0$. Then by the Reverse Fatou Lemma 2.6(b) for sets,

$$0 = P(|X_n - X| > \varepsilon, \text{ i.o.}) = P(\limsup \{|X_n - X| > \varepsilon\}) \ge \limsup P(|X_n - X| > \varepsilon),$$

and the result is proved. □

Note. As already mentioned, a discussion of the relationships between various modes of convergence may be found in the appendix to this chapter.

13.6. Elementary proof of (BDD)

We restate the Bounded Convergence Theorem, but under the weaker hypothesis of 'convergence in probability' rather than 'almost sure convergence'.

THEOREM (BDD)

Let $(X_n)$ be a sequence of RVs, and let $X$ be a RV. Suppose that $X_n \to X$ in probability and that for some $K$ in $[0, \infty)$, we have for every $n$ and $\omega$,

$$|X_n(\omega)| \le K.$$

Then

$$E(|X_n - X|) \to 0.$$

Proof. Let us check that $P(|X| \le K) = 1$. Indeed, for $k \in \mathbb{N}$,

$$P(|X| > K + k^{-1}) \le P(|X - X_n| > k^{-1}), \quad \forall n,$$

so that $P(|X| > K + k^{-1}) = 0$. Thus

$$P(|X| > K) = P\left( \bigcup_k \{|X| > K + k^{-1}\} \right) = 0.$$

Let $\varepsilon > 0$ be given. Choose $n_0$ such that

$$P(|X_n - X| > \tfrac{1}{3}\varepsilon) < \frac{\varepsilon}{3K} \quad \text{when } n \ge n_0.$$

Then, for $n \ge n_0$,

$$E(|X_n - X|) = E(|X_n - X|; |X_n - X| > \tfrac{1}{3}\varepsilon) + E(|X_n - X|; |X_n - X| \le \tfrac{1}{3}\varepsilon) \le 2K P(|X_n - X| > \tfrac{1}{3}\varepsilon) + \tfrac{1}{3}\varepsilon \le \varepsilon.$$

The proof is finished. □

This proof shows (much as does that of the Weierstrass approximation theorem) that convergence in probability is a natural concept.

13.7. A necessary and sufficient condition for $\mathcal{L}^1$ convergence

THEOREM

►► Let $(X_n)$ be a sequence in $\mathcal{L}^1$, and let $X \in \mathcal{L}^1$. Then $X_n \to X$ in $\mathcal{L}^1$, equivalently $E(|X_n - X|) \to 0$, if and only if the following two conditions are satisfied:

(i) $X_n \to X$ in probability,

(ii) the sequence $(X_n)$ is UI.

Remarks. It is of course the 'if' part of the theorem which is useful. Since the result is 'best possible', it must improve on (DOM) for our $(\Omega, \mathcal{F}, P)$ triple; and, of course, result 13.3(b) makes this explicit.

Proof of 'if' part. Suppose that conditions (i) and (ii) are satisfied. For $K \in [0, \infty)$, define a function $\varphi_K : \mathbb{R} \to [-K, K]$ as follows:

$$\varphi_K(x) := \begin{cases} K & \text{if } x > K, \\ x & \text{if } |x| \le K, \\ -K & \text{if } x < -K. \end{cases}$$

Let $\varepsilon > 0$ be given. By the UI property of the $(X_n)$ sequence and (13.1,b), we can choose $K$ so that

$$E\{|\varphi_K(X_n) - X_n|\} < \frac{\varepsilon}{3}, \quad \forall n; \qquad E\{|\varphi_K(X) - X|\} < \frac{\varepsilon}{3}.$$

But, since $|\varphi_K(x) - \varphi_K(y)| \le |x - y|$, we see that $\varphi_K(X_n) \to \varphi_K(X)$ in probability; and by (BDD) in the form of the preceding section, we can choose $n_0$ such that, for $n \ge n_0$,

$$E\{|\varphi_K(X_n) - \varphi_K(X)|\} < \frac{\varepsilon}{3}.$$

The triangle inequality therefore implies that, for $n \ge n_0$,

$$E(|X_n - X|) < \varepsilon,$$

and the proof is complete. □

Proof of 'only if' part. Suppose that $X_n \to X$ in $\mathcal{L}^1$. Let $\varepsilon > 0$ be given. Choose $N$ such that

$$n \ge N \Rightarrow E(|X_n - X|) < \varepsilon / 2.$$

By (13.1,a), we can choose $\delta > 0$ such that whenever $P(F) < \delta$, we have

$$E(|X_n|; F) < \varepsilon \quad (1 \le n \le N), \qquad E(|X|; F) < \varepsilon / 2.$$

Since $(X_n)$ is bounded in $\mathcal{L}^1$, we can choose $K$ such that

$$K^{-1} \sup_r E(|X_r|) < \delta.$$

Then for $n \ge N$, we have $P(|X_n| > K) < \delta$ and

$$E(|X_n|; |X_n| > K) \le E(|X|; |X_n| > K) + E(|X - X_n|) < \varepsilon.$$

For $n \le N$, we have $P(|X_n| > K) < \delta$ and

$$E(|X_n|; |X_n| > K) < \varepsilon.$$

Hence $(X_n)$ is a UI family. Since

$$\varepsilon P(|X_n - X| > \varepsilon) \le E(|X_n - X|) = \|X_n - X\|_1,$$

it is clear that $X_n \to X$ in probability. □

Chapter 14

UI Martingales

14.0. Introduction

The first part of this chapter examines what happens when uniform integrability is combined with the martingale property. In addition to new results such as Lévy's 'Upward' and 'Downward' Theorems, we also obtain new proofs of the Kolmogorov 0-1 Law and of the Strong Law of Large Numbers.

The second part of the chapter (beginning at Section 14.6) is concerned with Doob's Submartingale Inequality. This result implies in particular that for $p > 1$ (but not for $p = 1$) a martingale bounded in $\mathcal{L}^p$ is dominated by an element of $\mathcal{L}^p$ and hence converges both almost surely and in $\mathcal{L}^p$. The Submartingale Inequality is also used to prove Kakutani's Theorem on product-form martingales and, in an illustration of exponential bounds, to prove a very special case of the Law of the Iterated Logarithm.

The Radon-Nikodým theorem is then proved, and its relevance to likelihood ratio explained.

The topic of optional sampling, important for continuous-parameter theory and in other contexts, is covered in the appendix to this chapter.

14.1. UI martingales

Let $M$ be a UI martingale, so that $M$ is a martingale relative to our set-up $(\Omega, \mathcal{F}, \{\mathcal{F}_n\}, P)$ and $(M_n : n \in \mathbb{Z}^+)$ is a UI family.

Since $M$ is UI, $M$ is bounded in $\mathcal{L}^1$ (by (13.2)), and so $M_\infty := \lim M_n$ exists almost surely. By Theorem 13.7, it is also true that $M_n \to M_\infty$ in $\mathcal{L}^1$:

$$E(|M_n - M_\infty|) \to 0.$$

We now prove that $M_n = E(M_\infty \mid \mathcal{F}_n)$, a.s. For $F \in \mathcal{F}_n$, and $r \ge n$, the martingale property yields

(*) $E(M_r; F) = E(M_n; F)$.

But

$$|E(M_r; F) - E(M_\infty; F)| \le E(|M_r - M_\infty|; F) \le E(|M_r - M_\infty|).$$

Hence, on letting $r \to \infty$ in (*), we obtain

$$E(M_\infty; F) = E(M_n; F).$$

We have proved the following result.

THEOREM

►► Let $M$ be a UI martingale. Then

$$M_\infty := \lim M_n \text{ exists a.s. and in } \mathcal{L}^1.$$

Moreover, for every $n$,

$$M_n = E(M_\infty \mid \mathcal{F}_n), \quad \text{a.s.}$$

The obvious extension to UI supermartingales may be proved similarly.

14.2. Lévy's 'Upward' Theorem

►► Let $\xi \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$, and define $M_n := E(\xi \mid \mathcal{F}_n)$, a.s. Then $M$ is a UI martingale and

$$M_n \to \eta := E(\xi \mid \mathcal{F}_\infty)$$

almost surely and in $\mathcal{L}^1$.

Proof. We know that $M$ is a martingale because of the Tower Property. We know from Theorem 13.4 that $M$ is UI. Hence $M_\infty := \lim M_n$ exists a.s. and in $\mathcal{L}^1$, and it remains only to prove that $M_\infty = \eta$, a.s., where $\eta := E(\xi \mid \mathcal{F}_\infty)$.

Without loss of generality, we may (and do) assume that $\xi \ge 0$. Now consider the measures $Q_1$ and $Q_2$ on $(\Omega, \mathcal{F}_\infty)$, where

$$Q_1(F) := E(\eta; F), \qquad Q_2(F) := E(M_\infty; F), \qquad F \in \mathcal{F}_\infty.$$

If $F \in \mathcal{F}_n$, then since $E(\eta \mid \mathcal{F}_n) = E(\xi \mid \mathcal{F}_n)$ by the Tower Property,

$$E(\eta; F) = E(M_n; F) = E(M_\infty; F),$$

the second equality having been proved in Section 14.1. Thus $Q_1$ and $Q_2$ agree on the $\pi$-system (algebra!) $\bigcup \mathcal{F}_n$, and hence they agree on $\mathcal{F}_\infty$.

Both $\eta$ and $M_\infty$ are $\mathcal{F}_\infty$ measurable; more strictly, $M_\infty$ may be taken to be $\mathcal{F}_\infty$ measurable by defining $M_\infty := \limsup M_n$ for every $\omega$. Thus,

$$F := \{\omega : \eta > M_\infty\} \in \mathcal{F}_\infty,$$

and since $Q_1(F) = Q_2(F)$,

$$E(\eta - M_\infty; \eta > M_\infty) = 0.$$

Hence $P(\eta > M_\infty) = 0$, and similarly $P(M_\infty > \eta) = 0$. □

14.3. Martingale proof of Kolmogorov's 0-1 law

Recall the result.

THEOREM

Let $X_1, X_2, \ldots$ be a sequence of independent RVs. Define

$$\mathcal{T}_n := \sigma(X_{n+1}, X_{n+2}, \ldots), \qquad \mathcal{T} := \bigcap_n \mathcal{T}_n.$$

Then if $F \in \mathcal{T}$, $P(F) = 0$ or 1.

Proof. Define $\mathcal{F}_n := \sigma(X_1, X_2, \ldots, X_n)$. Let $F \in \mathcal{T}$, and let $\eta := I_F$. Since $\eta \in b\mathcal{F}_\infty$, Lévy's Upward Theorem shows that

$$\eta = E(\eta \mid \mathcal{F}_\infty) = \lim E(\eta \mid \mathcal{F}_n), \quad \text{a.s.}$$

However, for each $n$, $\eta$ is $\mathcal{T}_n$ measurable, and hence (see Remark below) is independent of $\mathcal{F}_n$. Hence by (9.7,k),

$$E(\eta \mid \mathcal{F}_n) = E(\eta) = P(F), \quad \text{a.s.}$$

Hence $\eta = P(F)$, a.s.; and since $\eta$ only takes the values 0 and 1, the result follows. □

Remark. Of course, we have cheated to some extent in building parts of the earlier proof into the martingale statements used in the proof just given.

14.4. Lévy's 'Downward' Theorem

►► Suppose that $(\Omega, \mathcal{F}, P)$ is a probability triple, and that $\{\mathcal{G}_{-n} : n \in \mathbb{N}\}$ is a collection of sub-$\sigma$-algebras of $\mathcal{F}$ such that

$$\mathcal{G}_{-\infty} := \bigcap_k \mathcal{G}_{-k} \subseteq \cdots \subseteq \mathcal{G}_{-(n+1)} \subseteq \mathcal{G}_{-n} \subseteq \cdots \subseteq \mathcal{G}_{-1}.$$

Let $\gamma \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$ and define

$$M_{-n} := E(\gamma \mid \mathcal{G}_{-n}).$$

Then

$$M_{-\infty} := \lim M_{-n} \text{ exists a.s. and in } \mathcal{L}^1$$

and

(*) $M_{-\infty} = E(\gamma \mid \mathcal{G}_{-\infty})$, a.s.

Proof. The Upcrossing Lemma applied to the martingale

$$(M_k, \mathcal{G}_k : -N \le k \le -1)$$

can be used exactly as in the proof of Doob's Forward Convergence Theorem to show that $\lim M_{-n}$ exists a.s. The uniform-integrability result, Theorem 13.4, shows that $\lim M_{-n}$ exists in $\mathcal{L}^1$.

That (*) holds (if you like with $M_{-\infty} := \limsup M_{-n} \in m\mathcal{G}_{-\infty}$) follows by now-familiar reasoning: for $G \in \mathcal{G}_{-\infty} \subseteq \mathcal{G}_{-r}$,

$$E(\gamma; G) = E(M_{-r}; G),$$

and now let $r \uparrow \infty$. □
14.5. Martingale proof of the Strong Law

Recall the result (but add $\mathcal{L}^1$ convergence as bonus).

THEOREM

Let $X_1, X_2, \ldots$ be IID RVs, with $E(|X_k|) < \infty$, $\forall k$. Let $\mu$ be the common value of $E(X_n)$. Write

$$S_n := X_1 + X_2 + \cdots + X_n.$$

Then $n^{-1} S_n \to \mu$, a.s. and in $\mathcal{L}^1$.

Proof. Define

$$\mathcal{G}_{-n} := \sigma(S_n, S_{n+1}, S_{n+2}, \ldots), \qquad \mathcal{G}_{-\infty} := \bigcap_n \mathcal{G}_{-n}.$$

We know from Section 9.11 that

$$E(X_1 \mid \mathcal{G}_{-n}) = n^{-1} S_n, \quad \text{a.s.}$$

Hence $L := \lim n^{-1} S_n$ exists a.s. and in $\mathcal{L}^1$. For definiteness, define $L := \limsup n^{-1} S_n$ for every $\omega$. Then for each $k$,

$$L = \limsup_n \frac{X_{k+1} + \cdots + X_{k+n}}{n},$$

so that $L \in m\mathcal{T}_k$ where $\mathcal{T}_k = \sigma(X_{k+1}, X_{k+2}, \ldots)$. By Kolmogorov's 0-1 law, $P(L = c) = 1$ for some $c$ in $\mathbb{R}$. But

$$c = E(L) = \lim E(n^{-1} S_n) = \mu. \qquad \Box$$

Exercise. Explain how we could have deduced $\mathcal{L}^1$ convergence at (12.10). Hint. Recall Scheffé's Lemma 5.10, and think about how to use it.

Remarks. See Meyer (1966) for important extensions and applications of the results given so far in this chapter. These extensions include: the Hewitt-Savage 0-1 law, de Finetti's theorem on exchangeable random variables, the Choquet-Deny theorem on bounded harmonic functions for random walks on groups.

14.6. Doob's Submartingale Inequality

THEOREM

►► (a) Let $Z$ be a non-negative submartingale. Then, for $c > 0$,

$$c P\left( \sup_{k \le n} Z_k \ge c \right) \le E\left( Z_n; \sup_{k \le n} Z_k \ge c \right) \le E(Z_n).$$

Proof. Let $F := \{\sup_{k \le n} Z_k \ge c\}$. Then $F$ is a disjoint union

$$F = F_0 \cup F_1 \cup \ldots \cup F_n,$$

where

$$F_0 := \{Z_0 \ge c\}, \qquad F_k := \{Z_0 < c\} \cap \{Z_1 < c\} \cap \ldots \cap \{Z_{k-1} < c\} \cap \{Z_k \ge c\}.$$

Now, $F_k \in \mathcal{F}_k$, and $Z_k \ge c$ on $F_k$. Hence,

$$E(Z_n; F_k) \ge E(Z_k; F_k) \ge c P(F_k).$$

Summing over $k$ now yields the result. □

The main reason for the usefulness of the above theorem is the following.

LEMMA

► (b) If $M$ is a martingale, $c$ is a convex function, and $E|c(M_n)| < \infty$, $\forall n$, then $c(M)$ is a submartingale.

Proof. Apply the conditional form of Jensen's inequality in Table 9.7. □

Kolmogorov's inequality

► Let $(X_n : n \in \mathbb{N})$ be a sequence of independent zero-mean RVs in $\mathcal{L}^2$. Define $\sigma_k^2 := \mathrm{Var}(X_k)$. Write

$$S_n := X_1 + \cdots + X_n, \qquad V_n := \mathrm{Var}(S_n) = \sum_{k=1}^{n} \sigma_k^2.$$

Then, for $c > 0$,

$$c^2 P\left( \sup_{k \le n} |S_k| \ge c \right) \le V_n.$$

Proof. We know that if we set $\mathcal{F}_n = \sigma(X_1, X_2, \ldots, X_n)$, then $S = (S_n)$ is a martingale. Now apply the Submartingale Inequality to $S^2$. □

Note. Kolmogorov's inequality was the key step in the original proofs of Kolmogorov's Three-Series Theorem and Strong Law.
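The Submartingale Inequality can be verified exactly on a small example by enumerating paths. The sketch below assumes (this choice is not in the text) that $Z_k = |S_k|$, where $S$ is the simple symmetric random walk started at 0, which is a non-negative submartingale by Lemma (b); the function name `doob_terms` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def doob_terms(n, c):
    """Return (c * P(max_{k<=n} Z_k >= c), E(Z_n; max_{k<=n} Z_k >= c), E(Z_n))
    for Z_k = |S_k|, S a fair +/-1 random walk with S_0 = 0 (assumed example)."""
    weight = Fraction(1, 2 ** n)  # each path of n steps is equally likely
    prob = Fraction(0)
    restricted = Fraction(0)
    full = Fraction(0)
    for steps in product((1, -1), repeat=n):
        s = 0
        running_max = 0  # Z_0 = |S_0| = 0
        for step in steps:
            s += step
            running_max = max(running_max, abs(s))
        z_n = abs(s)
        full += weight * z_n
        if running_max >= c:
            prob += weight
            restricted += weight * z_n
    return c * prob, restricted, full


lhs, middle, rhs = doob_terms(8, 2)
print(lhs, middle, rhs)  # the theorem predicts lhs <= middle <= rhs
```

Exact rational arithmetic is used so that the two inequalities are checked without rounding error; the enumeration costs $2^n$ paths, so it is only practical for small $n$.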

14.7. Law of the Iterated Logarithm: special case

Let us see how the Submartingale Inequality may be used via so-called exponential bounds to prove a very special case of Kolmogorov's Law of the Iterated Logarithm which is described in Section A4.1. (You would do well to take a quick look at this proof even though it is not needed later.)

THEOREM

Let $(X_n : n \in \mathbb{N})$ be IID RVs each with the standard normal $N(0,1)$ distribution of mean 0 and variance 1. Define

$$S_n := X_1 + X_2 + \cdots + X_n.$$

Then, almost surely,

$$\limsup \frac{S_n}{(2n \log\log n)^{1/2}} = 1.$$

Proof. Throughout the proof, we shall write

$$h(n) := (2n \log\log n)^{1/2}, \qquad n \ge 3.$$

(It will be understood that integers occurring in the proof are greater than $e$, when this is necessary.)

Step 1: An exponential bound. Define $\mathcal{F}_n := \sigma(X_1, X_2, \ldots, X_n)$. Then $S$ is a martingale relative to $\{\mathcal{F}_n\}$. It is well known that for $\theta \in \mathbb{R}$, $n \in \mathbb{N}$,

$$E e^{\theta S_n} = e^{\frac{1}{2}\theta^2 n}.$$

The function $x \mapsto e^{\theta x}$ is convex on $\mathbb{R}$, so that

$$e^{\theta S_n} \text{ is a submartingale}$$

and, by the Submartingale Inequality, we have, for $\theta > 0$,

►► $\displaystyle P\left( \sup_{k \le n} S_k \ge c \right) = P\left( \sup_{k \le n} e^{\theta S_k} \ge e^{\theta c} \right) \le e^{-\theta c} E(e^{\theta S_n}).$

This is a type of exponential bound much used in modern probability theory. In our special case, we have

$$P\left( \sup_{k \le n} S_k \ge c \right) \le e^{-\theta c} e^{\frac{1}{2}\theta^2 n},$$

and for $c > 0$, choosing the best $\theta$, namely $c/n$, we obtain

(a) $\displaystyle P\left( \sup_{k \le n} S_k \ge c \right) \le e^{-\frac{1}{2} c^2 / n}.$

Step 2: Obtaining an upper bound. Let $K$ be a real number with $K > 1$. (We are interested in cases when $K$ is close to 1.) Choose $c_n := K h(K^{n-1})$. Then

$$P\left( \sup_{k \le K^n} S_k \ge c_n \right) \le \exp(-c_n^2 / 2K^n) = (n-1)^{-K} (\log K)^{-K}.$$

The First Borel-Cantelli Lemma therefore shows that, almost surely, for all large $n$ (all $n \ge n_0(\omega)$) we have for $K^{n-1} \le k \le K^n$,

$$S_k \le \sup_{k \le K^n} S_k \le c_n = K h(K^{n-1}) \le K h(k).$$

Hence, for $K > 1$,

$$\limsup_k h(k)^{-1} S_k \le K, \quad \text{a.s.}$$

By taking a sequence of $K$-values converging down to 1, we obtain

$$\limsup_k h(k)^{-1} S_k \le 1, \quad \text{a.s.}$$

Step 3: Obtaining a lower bound. Let $N$ be an integer with $N > 1$. (We are interested in cases when $N$ is very large.) Let $\varepsilon$ be a number in $(0,1)$. (Of course, $\varepsilon$ will be small in the cases which interest us.) Write $S(r)$ for $S_r$, etc., when this is typographically more convenient. For $n \in \mathbb{N}$, define the event

$$F_n := \{S(N^{n+1}) - S(N^n) > (1 - \varepsilon) h(N^{n+1} - N^n)\}.$$

Then (see Proposition 14.8(b) below),

$$P(F_n) = 1 - \Phi(y) \ge (2\pi)^{-1/2} (y + y^{-1})^{-1} \exp(-y^2/2),$$

where

$$y = (1 - \varepsilon) \{2 \log\log(N^{n+1} - N^n)\}^{1/2}.$$

Thus, ignoring 'logarithmic terms', $P(F_n)$ is roughly $(n \log N)^{-(1-\varepsilon)^2}$, so that $\sum P(F_n) = \infty$. However, the events $F_n$ $(n \in \mathbb{N})$ are clearly independent, so that (BC2) shows that, almost surely, infinitely many $F_n$ occur. Thus, for infinitely many $n$,

$$S(N^{n+1}) > (1 - \varepsilon) h(N^{n+1} - N^n) + S(N^n).$$

But, by Step 2, $S(N^n) > -2h(N^n)$ for all large $n$, so that for infinitely many $n$, we have

$$S(N^{n+1}) > (1 - \varepsilon) h(N^{n+1} - N^n) - 2h(N^n).$$

It now follows that

$$\limsup_k h(k)^{-1} S_k \ge \limsup_n h(N^{n+1})^{-1} S(N^{n+1}) \ge (1 - \varepsilon)(1 - N^{-1})^{1/2} - 2N^{-1/2}.$$

(You should check that 'the logarithmic terms do disappear'.) The rest is obvious. □

14.8. A standard estimate on the normal distribution

We used part of the following result in the previous section.

Proposition

Suppose that $X$ has the standard normal distribution, so that, for $x \in \mathbb{R}$,

$$P(X > x) = 1 - \Phi(x) = \int_x^\infty \varphi(y)\,dy,$$

where

$$\varphi(y) := (2\pi)^{-1/2} \exp(-\tfrac{1}{2} y^2).$$

Then, for $x > 0$,

(a) $P(X > x) \le x^{-1} \varphi(x)$,

(b) $P(X > x) \ge (x + x^{-1})^{-1} \varphi(x)$.

Proof. Let $x > 0$. Since $\varphi'(y) = -y \varphi(y)$,

$$\varphi(x) = \int_x^\infty y \varphi(y)\,dy \ge x \int_x^\infty \varphi(y)\,dy,$$

yielding (a).

Since $(y^{-1} \varphi(y))' = -(1 + y^{-2}) \varphi(y)$,

$$x^{-1} \varphi(x) = \int_x^\infty (1 + y^{-2}) \varphi(y)\,dy \le (1 + x^{-2}) \int_x^\infty \varphi(y)\,dy,$$

yielding (b). □
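The two bounds of the Proposition are easy to compare numerically with the exact tail. The sketch below computes $P(X > x)$ through the complementary error function, using the standard identity $1 - \Phi(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$; the function names `phi`, `upper_tail` and `tail_bounds` are hypothetical and chosen for this example only.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_tail(x):
    """P(X > x) for X standard normal, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_bounds(x):
    """Return (lower bound from (b), exact tail, upper bound from (a)) for x > 0."""
    return phi(x) / (x + 1.0 / x), upper_tail(x), phi(x) / x


for x in (0.5, 1.0, 2.0, 4.0):
    lower, exact, upper = tail_bounds(x)
    print(x, lower, exact, upper)
```

The ratio of the two bounds is $1 + x^{-2}$, so they pin down the tail more and more tightly as $x$ grows, which is the regime used in Step 3 of the previous section.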

14.9. Remarks on exponential bounds; large-deviation theory

Obtaining exponential bounds is related to the very powerful theory of large deviations - see Varadhan (1984), Deuschel and Stroock (1989) - which has an ever-growing number of fields of application. See Ellis (1985).

You can study exponential bounds in the very specific context of martingales in Neveu (1975), Chow and Teicher (1978), Garsia (1973), etc.

Much of the literature is concerned with obtaining exponential bounds which are in a sense best possible. However, 'elementary' results such as the Azuma-Hoeffding inequality in Exercise E14.1 are very useful in numerous applications. See for example the applications to combinatorics in Bollobás (1987).

14.10. A consequence of Hölder's inequality

Look at the statement of Doob's $\mathcal{L}^p$ inequality in the next section in order to see where we are going.

LEMMA

Suppose that $X$ and $Y$ are non-negative RVs such that

$$c P(X \ge c) \le E(Y; X \ge c) \quad \text{for every } c > 0.$$

Then, for $p > 1$ and $p^{-1} + q^{-1} = 1$, we have

$$\|X\|_p \le q \|Y\|_p.$$

Proof. We obviously have

(*) $\displaystyle L := \int_{c=0}^{\infty} p c^{p-1} P(X \ge c)\,dc \le \int_{c=0}^{\infty} p c^{p-2} E(Y; X \ge c)\,dc =: R.$

Using Fubini's Theorem with non-negative integrands, we obtain

$$L = \int_{c=0}^{\infty} \left( \int_\Omega I_{\{X \ge c\}}(\omega) P(d\omega) \right) p c^{p-1}\,dc = \int_\Omega \left( \int_{c=0}^{X(\omega)} p c^{p-1}\,dc \right) P(d\omega) = E(X^p).$$

Exactly similarly, we find that

$$R = E(q X^{p-1} Y).$$

We apply Hölder's inequality to conclude that

(**) $E(X^p) \le E(q X^{p-1} Y) \le q \|Y\|_p \|X^{p-1}\|_q.$

Suppose that $\|Y\|_p < \infty$, and suppose for now that $\|X\|_p < \infty$ also. Then since $(p-1)q = p$, we have

$$\|X^{p-1}\|_q = E(X^p)^{1/q},$$

so (**) implies that $\|X\|_p \le q \|Y\|_p$. For general $X$, note that the hypothesis remains true for $X \wedge n$. Hence $\|X \wedge n\|_p \le q \|Y\|_p$ for all $n$, and the result follows using (MON). □

14.11. Doob's $\mathcal{L}^p$ inequality

THEOREM

►►(a) Let $p > 1$ and define $q$ so that $p^{-1} + q^{-1} = 1$. Let $Z$ be a non-negative submartingale bounded in $\mathcal{L}^p$, and define (this is standard notation)

$$Z^* := \sup_{k \in \mathbf{Z}^+} Z_k.$$

Then $Z^* \in \mathcal{L}^p$, and indeed

(*) $$\|Z^*\|_p \le q \sup_r \|Z_r\|_p.$$

The submartingale $Z$ is therefore dominated by the element $Z^*$ of $\mathcal{L}^p$. As $n \to \infty$, $Z_\infty := \lim Z_n$ exists a.s. and in $\mathcal{L}^p$ and

$$\|Z_\infty\|_p = \sup_r \|Z_r\|_p = \uparrow\lim_r \|Z_r\|_p.$$

►►(b) If $Z$ is of the form $|M|$, where $M$ is a martingale bounded in $\mathcal{L}^p$, then $M_\infty := \lim M_n$ exists a.s. and in $\mathcal{L}^p$, and of course $Z_\infty = |M_\infty|$ a.s.

Proof. For $n \in \mathbf{Z}^+$, define $Z_n^* := \sup_{k \le n} Z_k$. From Doob's Submartingale Inequality 14.6(a) and Lemma 14.10 we see that

$$\|Z_n^*\|_p \le q\,\|Z_n\|_p \le q \sup_r \|Z_r\|_p.$$

Property (*) now follows from the Monotone-Convergence Theorem. Since $(-Z)$ is a supermartingale bounded in $\mathcal{L}^p$, and therefore in $\mathcal{L}^1$, we know that $Z_\infty := \lim Z_n$ exists a.s. However,

$$|Z_n - Z_\infty|^p \le (2Z^*)^p \in \mathcal{L}^1,$$

so that (DOM) shows that $Z_n \to Z_\infty$ in $\mathcal{L}^p$. Jensen's inequality shows that $\|Z_r\|_p$ is non-decreasing in $r$, and all the rest is straightforward. □
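As a small sanity check of inequality (*) of Theorem 14.11 in the case p = q = 2, one can enumerate every path of a simple symmetric random walk S. The walk, the horizon n and the function name below are illustrative assumptions, not taken from the text. Since Z = |S| is a non-negative submartingale with E(Z_n^2) = n, the inequality predicts that the expected squared running maximum is at most 4n.

```python
from itertools import product

def doob_l2_sides(n):
    """Return (E[(max_{k<=n} |S_k|)^2], 4 * E[S_n^2]) for a simple symmetric random walk."""
    total_max_sq = 0
    total_end_sq = 0
    for steps in product((1, -1), repeat=n):   # all 2^n equally likely paths
        s = 0
        running_max = 0
        for x in steps:
            s += x
            running_max = max(running_max, abs(s))
        total_max_sq += running_max ** 2
        total_end_sq += s ** 2
    paths = 2 ** n
    return total_max_sq / paths, 4 * total_end_sq / paths

lhs, rhs = doob_l2_sides(8)
print(lhs, "<=", rhs)
```

Exhaustive enumeration is only feasible for small n, but it keeps the check exact rather than statistical.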

14.12. Kakutani's theorem on 'product' martingales

Let $X_1, X_2, \ldots$ be independent non-negative RVs, each of mean 1. Define $M_0 := 1$, and, for $n \in \mathbf{N}$, let

$$M_n := X_1 X_2 \ldots X_n.$$

Then $M$ is a non-negative martingale, so that

$$M_\infty := \lim M_n \quad \text{exists a.s.}$$

The following five statements are equivalent:

(i) $E(M_\infty) = 1$,
(ii) $M_n \to M_\infty$ in $\mathcal{L}^1$;
(iii) $M$ is UI;
(iv) $\prod a_n > 0$, where $0 < a_n := E(X_n^{1/2}) \le 1$,
(v) $\sum (1 - a_n) < \infty$.

If one (then every one) of the above five statements fails to hold, then

$$P(M_\infty = 0) = 1.$$

Remark. Something of the significance of this theorem is explained in Section 14.17.

Proof. That $a_n \le 1$ follows from Jensen's inequality. That $a_n > 0$ is obvious.

First, suppose that statement (iv) holds. Then define

(*) $$N_n := \frac{X_1^{1/2}}{a_1} \frac{X_2^{1/2}}{a_2} \cdots \frac{X_n^{1/2}}{a_n}.$$

Then $N$ is a martingale for the same reason that $M$ is. See (10.4,b). We have

$$E N_n^2 = 1/(a_1 a_2 \ldots a_n)^2 \le 1\Big/\Bigl(\prod a_k\Bigr)^2 < \infty,$$

so that $N$ is bounded in $\mathcal{L}^2$. By Doob's $\mathcal{L}^2$ inequality,

$$E\Bigl(\sup_n |M_n|\Bigr) \le E\Bigl(\sup_n |N_n|^2\Bigr) \le 4 \sup_n E|N_n^2| < \infty,$$

so that $M$ is dominated by $M^* := \sup_n |M_n| \in \mathcal{L}^1$. Hence $M$ is UI and properties (i)-(iii) hold.

Now consider the case when $\prod a_n = 0$. Define $N$ as at (*). Since $N$ is a non-negative martingale, $\lim N_n$ exists a.s. But since $\prod a_n = 0$, we are forced to conclude that $M_\infty = 0$, a.s.

The equivalence of (iv) and (v) is known to us from (4.3). The theorem is proved. □
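A worked instance may help fix ideas. Assume, purely for illustration (this example is not in the text), that $X_n = \exp(\sigma_n G_n - \tfrac12\sigma_n^2)$, where the $G_n$ are independent standard normal variables and the $\sigma_n \ge 0$ are constants. Each $X_n$ then has mean 1, and the quantity $a_n$ of statement (iv) can be computed in closed form, so that statements (iv) and (v) hold exactly when $\sum \sigma_n^2 < \infty$:

```latex
a_n = E\bigl(X_n^{1/2}\bigr)
    = E\exp\bigl(\tfrac12\sigma_n G_n - \tfrac14\sigma_n^2\bigr)
    = \exp\bigl(\tfrac18\sigma_n^2 - \tfrac14\sigma_n^2\bigr)
    = \exp\bigl(-\tfrac18\sigma_n^2\bigr),
\qquad
\prod_n a_n = \exp\Bigl(-\tfrac18\sum_n \sigma_n^2\Bigr) > 0
\iff \sum_n \sigma_n^2 < \infty .
```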

14.13. The Radon-Nikodym theorem

Martingale theory yields an intuitive - and 'constructive' - proof of the Radon-Nikodym theorem. We are guided by Meyer (1966).

We begin with a special case.

THEOREM

►►(I) Suppose that $(\Omega, \mathcal{F}, P)$ is a probability triple in which $\mathcal{F}$ is separable in that

$$\mathcal{F} = \sigma(F_n : n \in \mathbf{N})$$

for some sequence $(F_n)$ of subsets of $\Omega$. Suppose that $Q$ is a finite measure on $(\Omega, \mathcal{F})$ which is absolutely continuous relative to $P$ in that

(a) for $F \in \mathcal{F}$, $P(F) = 0 \Rightarrow Q(F) = 0$.

Then there exists $X$ in $\mathcal{L}^1(\Omega, \mathcal{F}, P)$ such that $Q = XP$ (see Section 5.14) in that

$$Q(F) = \int_F X\,dP = E(X; F), \quad \forall F \in \mathcal{F}.$$

The variable $X$ is called a version of the Radon-Nikodym derivative of $Q$ relative to $P$ on $(\Omega, \mathcal{F})$. Two such versions agree a.s. We write

$$\frac{dQ}{dP} = X \quad \text{on } \mathcal{F}, \text{ a.s.}$$

Remark. Most of the σ-algebras we have encountered are separable. (The σ-algebra of Lebesgue-measurable subsets of [0,1] is not.)

Proof. With the method of Section 13.1(a) in mind, you can prove that property (a) implies that

(b) given $\varepsilon > 0$, there exists $\delta > 0$ such that, for $F \in \mathcal{F}$, $P(F) < \delta \Rightarrow Q(F) < \varepsilon$.

Next, define

$$\mathcal{F}_n := \sigma(F_1, F_2, \ldots, F_n).$$

Then for each $n$, $\mathcal{F}_n$ consists of the $2^{r(n)}$ possible unions of 'atoms' $A_{n,1}, \ldots, A_{n,r(n)}$ of $\mathcal{F}_n$, an atom $A$ of $\mathcal{F}_n$ being an element of $\mathcal{F}_n$ such that $\emptyset$ is the only proper subset of $A$ which is again an element of $\mathcal{F}_n$. (Each atom $A$ will have the form

$$H_1 \cap H_2 \cap \ldots \cap H_n,$$

where each $H_i$ is either $F_i$ or $F_i^c$.) Define a function $X_n : \Omega \to [0, \infty)$ as follows: if $\omega \in A_{n,k}$, then

$$X_n(\omega) := \begin{cases} 0 & \text{if } P(A_{n,k}) = 0, \\ Q(A_{n,k})/P(A_{n,k}) & \text{if } P(A_{n,k}) > 0. \end{cases}$$

Then $X_n \in \mathcal{L}^1(\Omega, \mathcal{F}_n, P)$ and

(c) $E(X_n; F) = Q(F), \quad \forall F \in \mathcal{F}_n.$

The variable $X_n$ is the obvious version of $dQ/dP$ on $(\Omega, \mathcal{F}_n)$. It is obvious from (c) that $X = (X_n : n \in \mathbf{Z}^+)$ is a martingale relative to the filtration $(\mathcal{F}_n : n \in \mathbf{Z}^+)$, and since this martingale is non-negative,

$$X_\infty := \lim X_n \quad \text{exists, a.s.}$$

Let $\varepsilon > 0$, choose $\delta$ as at (b), and let $K \in (0, \infty)$ be such that

$$K^{-1} Q(\Omega) < \delta.$$

Then

$$P(X_n > K) \le K^{-1} E(X_n) = K^{-1} Q(\Omega) < \delta,$$

so that

$$E(X_n; X_n > K) = Q(X_n > K) < \varepsilon.$$

The martingale $X$ is therefore UI, so that

$$X_n \to X_\infty \quad \text{in } \mathcal{L}^1.$$

It now follows from (c) that the measures

$$F \mapsto E(X_\infty; F) \quad \text{and} \quad F \mapsto Q(F)$$

agree on the π-system $\bigcup \mathcal{F}_n$, so that they agree on $\mathcal{F}$. All that remains is the proof of uniqueness, which is now standard for us. □

Remark. The familiarity of all of the arguments in the above proof emphasizes the close link between the Radon-Nikodym theorem and conditional expectation which is made explicit in Section 14.14.

Now for the next part of the theorem ...

(II) The assumption that $\mathcal{F}$ is separable can be dropped from Part I.

Once one has Part II, one can easily extend the result to the case when $P$ and $Q$ are σ-finite measures by partitioning $\Omega$ into sets on which both are finite.

Proving Part II of the theorem is a piece of 'abstract nonsense' based on the fact that $\mathcal{L}^1$ (or, more strictly, $L^1$) is a metric space, and in particular on the role of sequential convergence in metric spaces. You might well want to take Part II for granted and skip the remainder of this section.

Let Sep be the class of all separable sub-σ-algebras of $\mathcal{F}$. Part I shows that for $\mathcal{G} \in$ Sep, there exists $X_{\mathcal{G}}$ in $\mathcal{L}^1(\Omega, \mathcal{G}, P)$ such that

$$dQ/dP = X_{\mathcal{G}} \text{ on } \mathcal{G}; \quad \text{equivalently,} \quad E(X_{\mathcal{G}}; G) = Q(G), \quad G \in \mathcal{G}.$$

We are going to prove that there exists $X$ in $\mathcal{L}^1(\Omega, \mathcal{F}, P)$ such that

(d) $X_{\mathcal{G}} \to X$ in $\mathcal{L}^1$

in the sense that given $\varepsilon > 0$, there exists $\mathcal{K}$ in Sep such that if $\mathcal{K} \subseteq \mathcal{G} \in$ Sep, then $\|X_{\mathcal{G}} - X\|_1 < \varepsilon$.

First, we note that it is enough to prove that

(e) $(X_{\mathcal{G}} : \mathcal{G} \in \text{Sep})$ is Cauchy in $\mathcal{L}^1$

in the sense that given $\varepsilon > 0$, there exists $\mathcal{K}$ in Sep such that if $\mathcal{K} \subseteq \mathcal{G}_i \in$ Sep for $i = 1, 2$, then

$$\|X_{\mathcal{G}_1} - X_{\mathcal{G}_2}\|_1 < \varepsilon.$$

Proof that (e) implies (d). Suppose that (e) holds. Choose $\mathcal{K}_n \in$ Sep such that if $\mathcal{K}_n \subseteq \mathcal{G}_i \in$ Sep for $i = 1, 2$, then

$$\|X_{\mathcal{G}_1} - X_{\mathcal{G}_2}\|_1 < 2^{-(n+1)}.$$

Let $\mathcal{H}(n) = \sigma(\mathcal{K}_1, \mathcal{K}_2, \ldots, \mathcal{K}_n)$. Then (see the proof of (6.10,a)) the limit $X := \lim X_{\mathcal{H}(n)}$ exists a.s. and in $\mathcal{L}^1$, and, indeed,

$$\|X - X_{\mathcal{H}(n)}\|_1 \le 2^{-n}.$$

Set $X := \limsup X_{\mathcal{H}(n)}$ for definiteness. For any $\mathcal{G} \in$ Sep with $\mathcal{G} \supseteq \mathcal{H}(n)$ we have

$$\|X_{\mathcal{G}} - X_{\mathcal{H}(n)}\|_1 < 2^{-n}.$$

Result (d) follows.

Proof of (e). If (e) is false, then (why?!) we can find $\varepsilon_0 > 0$ and a sequence $\mathcal{K}(0) \subseteq \mathcal{K}(1) \subseteq \ldots$ of elements of Sep such that

$$\|X_{\mathcal{K}(n)} - X_{\mathcal{K}(n+1)}\|_1 > \varepsilon_0, \quad \forall n.$$

However, it is easily seen that $(X_{\mathcal{K}(n)})$ is a UI martingale relative to the filtration $(\mathcal{K}(n))$, so that $X_{\mathcal{K}(n)}$ converges in $\mathcal{L}^1$. The contradiction establishes that (e) is true. □

Proof of Part II of the theorem. We need only show that for $X$ as at (d) and for $F \in \mathcal{F}$, we have

$$E(X; F) = Q(F).$$

Let $\varepsilon > 0$. Choose $\mathcal{K}$ such that for $\mathcal{K} \subseteq \mathcal{G} \in$ Sep, $\|X_{\mathcal{G}} - X\|_1 < \varepsilon$. Then $\sigma(\mathcal{K}, F) \in$ Sep, where $\sigma(\mathcal{K}, F)$ is the smallest σ-algebra extending $\mathcal{K}$ and including $F$; and, by a familiar argument,

$$|E(X; F) - Q(F)| = |E(X - X_{\sigma(\mathcal{K},F)}; F)| \le \|X - X_{\sigma(\mathcal{K},F)}\|_1 < \varepsilon.$$

The result follows. □

14.14. The Radon-Nikodym theorem and conditional expectation

Suppose that $(\Omega, \mathcal{F}, P)$ is a probability triple, and that $\mathcal{G}$ is a sub-σ-algebra of $\mathcal{F}$. Let $X$ be a non-negative element of $\mathcal{L}^1(\Omega, \mathcal{F}, P)$. Then

$$Q(G) := E(X; G), \quad G \in \mathcal{G},$$

defines a finite measure $Q$ on $(\Omega, \mathcal{G})$. Moreover, $Q$ is clearly absolutely continuous relative to $P$ on $\mathcal{G}$, so that, by the Radon-Nikodym theorem, (a version ...)

$$Y := dQ/dP \quad \text{exists on } (\Omega, \mathcal{G}).$$

Now $Y$ is $\mathcal{G}$-measurable, and

$$E(Y; G) = Q(G) = E(X; G), \quad G \in \mathcal{G}.$$

Hence $Y$ is a version of the conditional expectation of $X$ given $\mathcal{G}$:

$$Y = E(X \mid \mathcal{G}), \quad \text{a.s.}$$

Remark. The right context for appreciating the close inter-relations between martingale convergence, conditional expectation, the Radon-Nikodym theorem, etc., is the geometry of Banach spaces.

14.15. Likelihood ratio, equivalent measures

Let $P$ and $Q$ be probability measures on $(\Omega, \mathcal{F})$ such that $Q$ is absolutely continuous relative to $P$, so that a version $X$ of $dQ/dP$ on $\mathcal{F}$ exists. We say that $X$ is (a version of) the likelihood ratio of $Q$ given $P$. Then $P$ is absolutely continuous relative to $Q$ if and only if $P(X > 0) = 1$, and then $X^{-1}$ is a version of $dP/dQ$. When each of $P$ and $Q$ is absolutely continuous relative to the other, then $P$ and $Q$ are said to be equivalent. Note that it then makes sense to define

$$\int_F \sqrt{dP\,dQ} := \int_F X^{1/2}\,dP = \int_F (X^{-1})^{1/2}\,dQ, \quad F \in \mathcal{F};$$

and we can hope for a fuller understanding of what Kakutani achieved.

14.16. Likelihood ratio and conditional expectation

Let $(\Omega, \mathcal{F}, P)$ be a probability triple, and let $Q$ be a probability measure on $(\Omega, \mathcal{F})$ which is absolutely continuous relative to $P$ with density $X$. Let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$. What $\mathcal{G}$-measurable function yields $dQ/dP$ on $\mathcal{G}$? Yes, of course, it is $Y = E(X \mid \mathcal{G})$, $E$ denoting $P$-expectation, for, yet again, (modulo versions)

$$E(Y; G) = E(X; G) = Q(G) \quad \text{for } G \in \mathcal{G}.$$

Hence, if $\{\mathcal{F}_n\}$ is a filtration of $(\Omega, \mathcal{F})$, then the likelihood ratios

(*) $$(dQ/dP \text{ on } \mathcal{F}_n) = E(X \mid \mathcal{F}_n)$$

form a UI martingale. (This is of course why the martingale proof of the Radon-Nikodym theorem was bound to succeed!) Here and in the next two sections, we are dropping the 'a.s.' qualifications on such statements as (*): we have outgrown them.

14.17. Kakutani's Theorem revisited; consistency of LR test

Let $\Omega = \mathbf{R}^{\mathbf{N}}$, $X_n(\omega) = \omega_n$, and define the σ-algebras

$$\mathcal{F} = \sigma(X_k : k \in \mathbf{N}), \quad \mathcal{F}_n = \sigma(X_k : 1 \le k \le n).$$

Suppose that for each $n$, $f_n$ and $g_n$ are everywhere positive probability density functions on $\mathbf{R}$ and let $r_n(x) := g_n(x)/f_n(x)$. Let $P$ [respectively, $Q$] be the unique measure on $(\Omega, \mathcal{F})$ which makes the variables $X_n$ independent, $X_n$ having probability density function $f_n$ [respectively, $g_n$]. Clearly, but you should prove this,

$$M_n := dQ/dP = Y_1 Y_2 \ldots Y_n \quad \text{on } \mathcal{F}_n,$$

where $Y_n = r_n(X_n)$. Note that the variables $(Y_n : n \in \mathbf{N})$ are independent under $P$ and that each has $P$-mean 1. For any of a multitude of familiar reasons, $M$ is a martingale.

Now if $Q$ is absolutely continuous relative to $P$ on $\mathcal{F}$ with $dQ/dP = \xi$ on $\mathcal{F}$, then $M_n = E(\xi \mid \mathcal{F}_n)$, and $M$ is UI. Conversely, if $M$ is UI, then $M_\infty$ exists (a.s., $P$) and

$$E(M_\infty \mid \mathcal{F}_n) = M_n, \quad \forall n.$$

But then the probability measures

$$F \mapsto Q(F) \quad \text{and} \quad F \mapsto E(M_\infty; F)$$

agree on the π-system $\bigcup \mathcal{F}_n$ and so agree on $\mathcal{F}$. Thus $Q$ is absolutely continuous relative to $P$ on $\mathcal{F}$, with $dQ/dP = M_\infty$ on $\mathcal{F}$, if and only if $M$ is UI.

Kakutani's Theorem therefore implies that $Q$ is equivalent to $P$ on $\mathcal{F}$ if and only if

$$\prod_n E(Y_n^{1/2}) = \prod_n \int_{\mathbf{R}} \sqrt{f_n(x) g_n(x)}\,dx > 0,$$

or equivalently if

(*) $$\sum_n \int_{\mathbf{R}} \Bigl\{ \sqrt{f_n(x)} - \sqrt{g_n(x)} \Bigr\}^2 dx < \infty;$$

and that then $P$ is also absolutely continuous relative to $Q$.

Suppose now that the $X_n$ are identically distributed independent variables under each of $P$ and $Q$. Thus, for some probability density functions $f$ and $g$ on $\mathbf{R}$, we have $f_n = f$ and $g_n = g$ for all $n$. It is clear from (*) that $Q$ is equivalent to $P$ if and only if $f = g$ almost everywhere with respect to Lebesgue measure, in which case $Q = P$. Moreover, Kakutani's Theorem also tells us that if $Q \ne P$, then $M_n \to 0$ (a.s., $P$) and this is exactly the consistency of the Likelihood-Ratio Test in Statistics.

14.18. Note on Hardy spaces, etc. (prestissimo!)

We have seen in this chapter that for many purposes, the class of UI martingales is a natural one. The appendix to this chapter, on the Optional-Sampling Theorem, provides further evidence of this.

However, what we might wish to be true is not always true for UI martingales. For example, if $M$ is a UI martingale and $C$ is a (uniformly) bounded previsible process, then the martingale $C \bullet M$ need not be bounded in $\mathcal{L}^1$. (Even so, $C \bullet M$ does converge a.s.!)

For many parts of the more advanced theory, one uses the 'Hardy' space $\mathcal{H}_0^1$ of martingales $M$ null at 0 for which one (then each) of the following equivalent conditions holds:

(a) $M^* := \sup |M_n| \in \mathcal{L}^1$,

(b) $[M]_\infty^{1/2} \in \mathcal{L}^1$, where $[M]_n := \sum_{k=1}^{n} (M_k - M_{k-1})^2$ and $[M]_\infty = \uparrow\lim [M]_n$.

By a special case of a celebrated Burkholder-Davis-Gundy theorem, there exist absolute constants $c_p$, $C_p$ $(1 \le p < \infty)$ such that

(c) $$c_p \bigl\| [M]_\infty^{1/2} \bigr\|_p \le \|M^*\|_p \le C_p \bigl\| [M]_\infty^{1/2} \bigr\|_p \quad (1 \le p < \infty).$$

The space $\mathcal{H}_0^1$ is obviously sandwiched between the union of the spaces of martingales bounded in $\mathcal{L}^p$ $(p > 1)$ and the space of UI martingales. Its identification as the right intermediate space has proved very important. Its name derives from its important links with complex analysis.

Proof of the B-D-G inequality or of the equivalence of (a) and (b) is too difficult to give here. But we can take a very quick look at the relevance of (b) to the $C \bullet M$ problem. First, (b) makes it clear that

(d) if $M \in \mathcal{H}_0^1$ and $C$ is a bounded previsible process, then $C \bullet M \in \mathcal{H}_0^1$,

and we shall now see that, in a sense, this is 'best possible'.

Suppose that we have a martingale $M$ null at 0 and a (bounded) previsible process $\varepsilon = (\varepsilon_k : k \in \mathbf{N})$ where the $\varepsilon_k$ are IID RVs with $P(\varepsilon_k = \pm 1) = \tfrac12$, and where $\varepsilon$ and $M$ are independent. We want to show that

(e) $M \in \mathcal{H}_0^1$ if (as well as only if) $\varepsilon \bullet M$ is bounded in $\mathcal{L}^1$.

We run into no difficulties of 'regularity' if we condition on $M$:

$$E|(\varepsilon \bullet M)_n| = E\,E\{ |(\varepsilon \bullet M)_n| \mid \sigma(M) \} \ge 3^{-1/2} E\bigl([M]_n^{1/2}\bigr).$$

And where did the last inequality appear from? Let $(a_k : k \in \mathbf{N})$ be a sequence of real numbers. Think of $a_k$ as $M_k - M_{k-1}$ when $M$ is known. Define

$$X_k := a_k \varepsilon_k, \quad W_n := X_1 + \cdots + X_n, \quad v_n := E(W_n^2) = \sum_{k=1}^{n} a_k^2.$$

Then (see Section 7.2)

$$E(W_n^4) = \sum_{k} a_k^4 + 6 \sum\sum_{i<j} a_i^2 a_j^2,$$

so that, certainly, $E(W_n^4) \le 3 v_n^2$. On combining this fact with Hölder's inequality in the form

$$v_n = E(W_n^2) \le \bigl\| W_n^{2/3} \bigr\|_{3/2} \bigl\| W_n^{4/3} \bigr\|_3 = (E|W_n|)^{2/3} \bigl(E(W_n^4)\bigr)^{1/3},$$

we obtain the special case of Khinchine's inequality we need:

$$E(|W_n|) \ge 3^{-1/2} v_n^{1/2}.$$

For more on the topics in this section, see Chow and Teicher (1978), Dellacherie and Meyer (1980), Doob (1981), Durrett (1984). The first of these is accessible to the reader of this book; the others are more advanced.
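The special case of Khinchine's inequality used in Section 14.18, namely that the mean of |W_n| is at least 3^{-1/2} times the square root of v_n, can be checked exactly for short coefficient sequences by summing over all sign patterns. The function name and the sample coefficients below are illustrative assumptions, not taken from the text.

```python
from itertools import product
from math import sqrt

def khinchine_sides(a):
    """Return (E|W_n|, 3**-0.5 * sqrt(v_n)) for W_n = sum_k a_k * eps_k with IID fair signs eps_k."""
    n = len(a)
    total = 0.0
    for signs in product((1, -1), repeat=n):   # each sign pattern has probability 2^-n
        total += abs(sum(s * x for s, x in zip(signs, a)))
    mean_abs = total / 2 ** n
    v_n = sum(x * x for x in a)                # v_n = E(W_n^2) = sum of a_k^2
    return mean_abs, sqrt(v_n / 3)

for coeffs in [(1, 1), (1, 2, 3), (0.5, -2, 1, 4)]:
    lhs, rhs = khinchine_sides(coeffs)
    print(coeffs, lhs, ">=", rhs)
```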

Chapter 15

Applications

15.0. Introduction - please read!

The purpose of this chapter is to give some indication of some of the ways in which the theory which we have developed can be applied to real-world problems. We consider only very simple examples, but at a lively pace!

In Sections 15.1-15.2, we discuss a trivial case of a celebrated result from mathematical economics, the Black-Scholes option-pricing formula. The formula was developed for a continuous-parameter (diffusion) model for stock prices; see, for example, Karatzas and Schreve (1988). We present an obvious discretization which also has many treatments in the literature. What needs to be emphasized is that in the discrete case, the result has nothing to do with probability, which is why the answer is completely independent of the underlying probability measure. The use of the 'martingale measure' P in Section 15.2 is nothing other than a device for expressing some simple algebra/combinatorics. But in the diffusion setting, where the algebra and combinatorics are no longer meaningful, the martingale-representation theorem and Cameron-Martin-Girsanov change-of-measure theorem provide the essential language. I think that this justifies my giving a 'martingale' treatment of something which needs only junior-school algebra.

Sections 15.3-15.5 indicate the further development of the martingale formulation of optimality in stochastic control, at which Exercise E10.2 gave a first look. We consider just one 'fun' example, the 'Mabinogion sheep problem'; but it is an example which illustrates rather well several techniques which may be effectively utilized in other contexts.

In Sections 15.6-15.9, we look at some simple problems of filtering: estimating in real time processes of which only noisy observations can be made. This topic has important applications in engineering (look at the IEEE journals!), in medicine, and in economics. I hope that you will look further into this topic and into the important subject which develops when filtering is combined with stochastic-control theory. See, for example, Davis and Vintner (1985) and Whittle (1990).

Sections 15.10-15.12 consist of first reflections on the problems we encounter when we try to extend the martingale concept.

15.1. A trivial martingale-representation result

Let $S$ denote the two-point set $\{-1, 1\}$, let $\Sigma$ denote the set of all subsets of $S$, let $p \in (0,1)$, and let $\mu$ be the probability measure on $(S, \Sigma)$ with

$$\mu(\{1\}) = p = 1 - \mu(\{-1\}).$$

Let $N \in \mathbf{N}$. Define

$$(\Omega, \mathcal{F}, P) = (S, \Sigma, \mu)^N$$

so that a typical element of $\Omega$ is

$$\omega = (\omega_1, \omega_2, \ldots, \omega_N), \quad \omega_k \in \{-1, 1\}.$$

Define $\varepsilon_k : \Omega \to \mathbf{R}$ by $\varepsilon_k(\omega) := \omega_k$, so that $(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N)$ are IID RVs each with law $\mu$. For $0 \le n \le N$, define

$$Z_n := \sum_{k=1}^{n} (\varepsilon_k - 2p + 1), \quad \mathcal{F}_n := \sigma(Z_0, Z_1, \ldots, Z_n) = \sigma(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n).$$

Note that $E(\varepsilon_k) = 1 \cdot p + (-1)(1-p) = 2p - 1$. We see that

(a) $Z = (Z_n : 0 \le n \le N)$ is a martingale (relative to $(\{\mathcal{F}_n : 0 \le n \le N\}, P)$).

LEMMA

If $M = (M_n : 0 \le n \le N)$ is a martingale (relative to $(\{\mathcal{F}_n : 0 \le n \le N\}, P)$), then there exists a unique previsible process $H$ such that

$$M = M_0 + H \bullet Z, \quad \text{that is,} \quad M_n = M_0 + \sum_{k=1}^{n} H_k (Z_k - Z_{k-1}).$$

Remark. Since $\mathcal{F}_0 = \{\emptyset, \Omega\}$, $M_0$ is constant on $\Omega$, and has to be the common value of the $E(M_n)$.

Proof. We simply construct $H$ explicitly. Because $M_n$ is $\mathcal{F}_n$-measurable,

$$M_n(\omega) = f_n(\varepsilon_1(\omega), \ldots, \varepsilon_n(\omega)) = f_n(\omega_1, \ldots, \omega_n)$$

for some function $f_n : \{-1, 1\}^n \to \mathbf{R}$. Since $M$ is a martingale, we have

$$0 = E(M_n - M_{n-1} \mid \mathcal{F}_{n-1})(\omega) = p f_n(\omega_1, \ldots, \omega_{n-1}, 1) + (1-p) f_n(\omega_1, \ldots, \omega_{n-1}, -1) - f_{n-1}(\omega_1, \ldots, \omega_{n-1}).$$

Hence the expressions

(b1) $$\frac{f_n(\omega_1, \ldots, \omega_{n-1}, 1) - f_{n-1}(\omega_1, \ldots, \omega_{n-1})}{2(1-p)}$$

and

(b2) $$\frac{f_{n-1}(\omega_1, \ldots, \omega_{n-1}) - f_n(\omega_1, \ldots, \omega_{n-1}, -1)}{2p}$$

are equal; and if we define $H_n(\omega)$ to be their common value, then $H$ is clearly previsible, and simple algebra verifies that $M = M_0 + H \bullet Z$, as required. You check that $H$ is unique. □
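The construction in the proof of the Lemma of Section 15.1 can be carried out mechanically. The sketch below is an illustration under stated assumptions: the martingale is taken to be M_n = E(g | F_n) for a terminal function g on sign sequences, and the names `cond_exp`, `representation` and the sample g are hypothetical. It computes H from expression (b1), checks that (b2) gives the same value, and exact rational arithmetic lets the identity M_N = M_0 + (H • Z)_N be verified path by path.

```python
from fractions import Fraction
from itertools import product

def cond_exp(g, prefix, N, p):
    """f_n(prefix): expectation of g given that the first len(prefix) signs equal prefix."""
    total = Fraction(0)
    for tail in product((1, -1), repeat=N - len(prefix)):
        weight = Fraction(1)
        for s in tail:
            weight *= p if s == 1 else 1 - p
        total += weight * g(prefix + tail)
    return total

def representation(g, N, p):
    """Return (M_0, H), where H(prefix) is H_n for a prefix of length n - 1 (so H is previsible)."""
    def H(prefix):
        f_prev = cond_exp(g, prefix, N, p)
        up = cond_exp(g, prefix + (1,), N, p)
        down = cond_exp(g, prefix + (-1,), N, p)
        b1 = (up - f_prev) / (2 * (1 - p))
        b2 = (f_prev - down) / (2 * p)
        assert b1 == b2          # the two expressions in the proof agree
        return b1
    return cond_exp(g, (), N, p), H

# Usage: reconstruct M_N = g from M_0 and H along every path.
N, p = 4, Fraction(1, 3)
g = lambda w: max(sum(w), 0) ** 2
M0, H = representation(g, N, p)
for w in product((1, -1), repeat=N):
    value = M0 + sum(H(w[:k]) * (w[k] - 2 * p + 1) for k in range(N))
    assert value == g(w)
```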

15.2. Option pricing; discrete Black-Scholes formula

Consider an economy in which there are two 'securities': bonds of fixed interest rate $r$, and stock, the value of which fluctuates randomly.

Let $N$ be a fixed element of $\mathbf{N}$. We suppose that values of units of stock and of bond units change abruptly at times $1, 2, \ldots, N$. For $n = 0, 1, \ldots, N$, we write

$B_n = (1+r)^n B_0$ for the value of 1 bond unit throughout the open time interval $(n, n+1)$,

$S_n$ for the value of 1 unit of stock throughout the open time interval $(n, n+1)$.

You start just after time 0 with a fortune of value $x$ made up of $A_0$ units of stock and $V_0$ of bond, so that

$$A_0 S_0 + V_0 B_0 = x.$$

Between times 0 and 1 you invest this in stocks and bonds, so that just before time 1, you have $A_1$ units of stock and $V_1$ of bond so that

$$A_1 S_0 + V_1 B_0 = x.$$

So, $(A_1, V_1)$ represents the portfolio you have as your 'stake on the first game'.

Just after time $n - 1$ (where $n \ge 1$) you have $A_{n-1}$ units of stock and $V_{n-1}$ units of bond with value

$$X_{n-1} = A_{n-1} S_{n-1} + V_{n-1} B_{n-1}.$$

By trading stock for bonds or conversely, you rearrange your portfolio between times $n - 1$ and $n$ so that just before time $n$, your fortune (still of value $X_{n-1}$ because we assume transaction costs to be zero) is described by

$$X_{n-1} = A_n S_{n-1} + V_n B_{n-1} \quad (n \ge 1).$$

Your fortune just after time $n$ is given by

(a) $$X_n = A_n S_n + V_n B_n \quad (n \ge 0)$$

and your change in fortune satisfies

(b) $$X_n - X_{n-1} = A_n (S_n - S_{n-1}) + V_n (B_n - B_{n-1}).$$

Now,

$$B_n - B_{n-1} = r B_{n-1}, \quad \text{and} \quad S_n - S_{n-1} = R_n S_{n-1},$$

where $R_n$ is the random 'rate of interest of stock at time $n$'. We may now rewrite (b) as

$$X_n - X_{n-1} = r X_{n-1} + A_n S_{n-1} (R_n - r),$$

so that if we set

(c) $$Y_n := (1+r)^{-n} X_n,$$

then

(d) $$Y_n - Y_{n-1} = (1+r)^{-n} A_n S_{n-1} (R_n - r).$$

Note that (c) shows $Y_n$ to be the discounted value of your fortune at time $n$, so that the evolution (d) is of primary interest.

Let $\Omega$, $\mathcal{F}$, $\varepsilon_n$ $(1 \le n \le N)$, $Z_n$ $(0 \le n \le N)$ and $\mathcal{F}_n$ $(0 \le n \le N)$ be as in Section 15.1. Note that no probability measure has been introduced.

We build a model in which each $R_n$ takes values only in $\{a, b\}$, where $a, b \in (-1, \infty)$ and $a < r < b$, by setting

(e) $$R_n = \frac{a+b}{2} + \frac{b-a}{2}\,\varepsilon_n.$$

But then

(f) $$R_n - r = \tfrac12 (b-a)(\varepsilon_n - 2p + 1) = \tfrac12 (b-a)(Z_n - Z_{n-1}),$$

where we now choose

(g) $$p := \frac{r-a}{b-a}.$$

Note that (d) and (f) together display $Y$ as a 'stochastic integral' relative to $Z$.

A European option is a contract made just after time 0 which will allow you to buy 1 unit of stock just after time $N$ at a price $K$; $K$ is the so-called striking price. If you have made such a contract, then just after time $N$, you will exercise the option if $S_N > K$ and will not if $S_N < K$. Thus, the value at time $N$ of such a contract is $(S_N - K)^+$. What should you pay for the option at time 0?

Black and Scholes provide an answer to this question which is based on the concept of a hedging strategy.

A hedging strategy with initial value $x$ for the described option is a portfolio management scheme $\{(A_n, V_n) : 1 \le n \le N\}$ where the processes $A$ and $V$ are previsible relative to $\{\mathcal{F}_n\}$, and where, with $X$ satisfying (a) and (b), we have for every $\omega$,

(h1) $X_0(\omega) = x$,
(h2) $X_n(\omega) \ge 0 \quad (0 \le n \le N)$,
(h3) $X_N(\omega) = (S_N(\omega) - K)^+$.

Anyone employing a hedging strategy will by appropriate portfolio management, and without going bankrupt, exactly duplicate the value of the option at time $N$.

Note. Though Black and Scholes insist that $X_n(\omega) \ge 0$, $\forall n$, $\forall \omega$, they (and we) do not insist that the processes $A$ and $V$ be positive. A negative value of $V$ for some $n$ amounts to borrowing at the fixed interest rate $r$. A negative value of $A$ corresponds to 'short-selling' stock, but after you have read the theorem, this may not worry you!

THEOREM

A hedging strategy with initial value $x$ exists if and only if

$$x = x_0 := E\bigl[(1+r)^{-N} (S_N - K)^+\bigr],$$

where $E$ is the expectation for the measure $P$ of Section 15.1 with $p$ as at (g). There is a unique hedging strategy with initial value $x_0$, and it involves no short-selling: $A$ is never negative.

On the basis of this result, it is claimed that $x_0$ is the unique fair price at time 0 of the option.

Proof. In the definition of hedging strategy, there is nowhere any mention of an underlying probability measure. Because however of the 'for every $\omega$' requirement, we should consider only measures on $\Omega$ for which each point $\omega$ has positive mass. Of course, $P$ is such a measure.

Suppose now that a hedging strategy with initial value $x$ exists, and let $A, V, X, Y$ denote the associated processes. From (d) and (f),

$$Y = Y_0 + F \bullet Z,$$

where $F$ is the previsible process with

$$F_n = \tfrac12 (b-a)(1+r)^{-n} A_n S_{n-1}.$$

Of course, $F$ is bounded because there are only finitely many $(n, \omega)$ combinations. Thus $Y$ is a martingale under the $P$ measure, since $Z$ is; and since $Y_0 = x$ and $Y_N = (1+r)^{-N} (S_N - K)^+$ by (c) and the definition of hedging strategy, we obtain

$$x = x_0.$$

(We did not use the property that $X \ge 0$.)

Now consider things afresh and define

$$Y_n := E\bigl((1+r)^{-N} (S_N - K)^+ \mid \mathcal{F}_n\bigr).$$

Then $Y$ is a martingale, and by combining (f) with the martingale-representation result in Section 15.1, we see that for some unique previsible process $A$, (d) holds. Define

$$X_n := (1+r)^n Y_n, \quad V_n := (X_n - A_n S_n)/B_n.$$

Then (a) and (b) hold. Since $X_0 = x_0$ and $X_N = (S_N - K)^+$, the only thing which remains is to prove that $A$ is never negative. Because of the explicit formula (15.1,b1), this reduces to showing that

$$E\bigl[(S_N - K)^+ \mid S_{n-1}, S_n = (1+b) S_{n-1}\bigr] \ge E\bigl[(S_N - K)^+ \mid S_{n-1}, S_n = (1+a) S_{n-1}\bigr];$$

and this is intuitively obvious and may be proved by a simple computation on binomial coefficients. □
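The Theorem of Section 15.2 lends itself to a direct numerical illustration. The sketch below is an assumption-laden illustration rather than the text's own procedure: the function names and the sample parameters are hypothetical, the price is computed as the discounted binomial expectation with p = (r - a)/(b - a), and the hedge holds A_n = (up value - down value) / (S_{n-1}(b - a)) units of stock, which is what the martingale-representation formula amounts to here. Exact fractions let one confirm on every path that the fortune ends at (S_N - K)^+ and that the stock holding is never negative.

```python
from fractions import Fraction
from itertools import product
from math import comb

def fair_price(S0, K, r, a, b, N):
    """x_0 = E[(1+r)^-N (S_N - K)^+] under the measure with p = (r - a)/(b - a)."""
    p = (r - a) / (b - a)
    total = 0
    for j in range(N + 1):                      # j = number of steps with R_n = b
        S_N = S0 * (1 + b) ** j * (1 + a) ** (N - j)
        total += comb(N, j) * p ** j * (1 - p) ** (N - j) * max(S_N - K, 0)
    return total / (1 + r) ** N

def hedge_along_path(S0, K, r, a, b, N, moves):
    """Follow the replicating portfolio along one path; moves[n-1] is R_n (either a or b).
    Returns (X_N, [A_1, ..., A_N])."""
    X = fair_price(S0, K, r, a, b, N)
    S = S0
    holdings = []
    for n in range(1, N + 1):
        # Fair value of the option one step ahead, after an up move and after a down move.
        up = fair_price(S * (1 + b), K, r, a, b, N - n)
        down = fair_price(S * (1 + a), K, r, a, b, N - n)
        A = (up - down) / (S * (b - a))         # units of stock held over (n-1, n)
        R = moves[n - 1]
        X = (1 + r) * X + A * S * (R - r)       # X_n - X_{n-1} = r X_{n-1} + A_n S_{n-1} (R_n - r)
        S = S * (1 + R)
        holdings.append(A)
    return X, holdings

# Usage with illustrative parameters (exact rational arithmetic).
S0, K = Fraction(100), Fraction(100)
r, a, b, N = Fraction(1, 10), Fraction(-1, 5), Fraction(3, 10), 4
print(float(fair_price(S0, K, r, a, b, N)))
```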

15.3. The Mabinogion sheep problem

In the Tale of Peredur ap Efrawg in the very early Welsh folk tales, The Mabinogion (see Jones and Jones (1949)), there is a magical flock of sheep, some black, some white. We sacrifice poetry for precision in specifying its behaviour. At each of times 1, 2, 3, ... a sheep (chosen randomly from the entire flock, independently of previous events) bleats; if this bleating sheep is white, one black sheep (if any remain) instantly becomes white; if the bleating sheep is black, then one white sheep (if any remain) instantly becomes black. No births or deaths occur.

The controlled system

Suppose now that the system can be controlled in that just after time 0 and just after every magical transition time 1, 2, ..., any number of white sheep may be removed from the system. (White sheep may be removed on numerous occasions.) The object is to maximize the expected final number of black sheep.

Consider the following example of a policy:

Policy A: at each time of decision, do nothing if there are more black sheep than white sheep or if no black sheep remain; otherwise, immediately reduce the white population to one less than the black population.

The value function $V$ for Policy A is the function

$$V : \mathbf{Z}^+ \times \mathbf{Z}^+ \to [0, \infty),$$

where for $w, b \in \mathbf{Z}^+$, $V(w, b)$ denotes the expected final number of black sheep if one adopts Policy A and if there are $w$ white and $b$ black sheep at time 0. Then $V$ is uniquely specified by the fact that, for $w, b \in \mathbf{Z}^+$,

(a1) $V(0, b) = b$;

(a2) $V(w, b) = V(w-1, b)$ whenever $w \ge b$ and $w > 0$;

(a3) $V(w, b) = \dfrac{w}{w+b} V(w+1, b-1) + \dfrac{b}{w+b} V(w-1, b+1)$ whenever $w < b$, $b > 0$ and $w > 0$.

It is almost tautological that if $W_n$ and $B_n$ denote the numbers of white and black sheep at time $n$, then, if we adopt Policy A, then, whatever the initial values of $W_0$ and $B_0$,

(b) $V(W_n, B_n)$ is a martingale relative to the natural filtration of $\{(W_n, B_n) : n \ge 0\}$.

(c) LEMMA

The following statements are true for $w, b \in \mathbf{Z}^+$:

(c1) $V(w, b) \ge V(w-1, b)$ whenever $w > 0$,

(c2) $V(w, b) \ge \dfrac{w}{w+b} V(w+1, b-1) + \dfrac{b}{w+b} V(w-1, b+1)$ whenever $w > 0$ and $b > 0$.

Let us suppose that this Lemma is true. (It is proved in the next Section.) Then, for any policy whatsoever,

(d) $V(W_n, B_n)$ is a supermartingale.

The fact that $V(W_n, B_n)$ converges means that the system must a.s. end up in an absorbing state in which sheep are of one colour. But then $V(W_\infty, B_\infty)$ is just the final number of black sheep (by definition of $V$). Since $V(W_n, B_n)$ is a non-negative supermartingale, we have for deterministic $W_0, B_0$,

$$E\,V(W_\infty, B_\infty) \le V(W_0, B_0).$$

Hence, whatever the initial position, the expected final number of black sheep under any policy is no more than the expected final number of black sheep if Policy A is used. Thus

Policy A is optimal.

In Section 15.5, we prove the following result:

$$V(k, k) - \bigl(2k + \tfrac14\pi - \sqrt{\pi k}\bigr) \to 0 \quad \text{as } k \to \infty.$$

Thus if we start with 10 000 black sheep and 10 000 white sheep, we finish up with (about) 19 824 black sheep on average (over many 'runs').

Of course, the above argument worked because we had correctly guessed the optimal policy to begin with. In this subject area, one often has to make good guesses. Then one usually has to work rather hard to clinch results which correspond in more general situations to Lemma (c) and statement (d). You might find it quite an amusing exercise to prove these results for our special problem now, before reading on.

For a problem in economics which utilizes analogous ideas, see Davis and Norman (1990).

15.4. Proof of Lemma 15.3(c)

It will be convenient to define

(a) $v_k := V(k, k)$.

Everything hinges on the following results: for $1 \le c \le k$,

(b1) $$V(k-c, k+c) = v_k + (2k - v_k)\,2^{-(2k-2)} \sum_{j=k-c}^{k-1} \binom{2k-1}{j},$$

(b2) $$V(k+1-c, k+c) = v_k + (2k+1 - v_k)\,2^{-(2k-1)} (1+p_k)^{-1} \sum_{j=k+1-c}^{k} \binom{2k}{j},$$

which simply reflect (15.3,a3) together with the 'boundary conditions':

(c) $$V(k, k) = v_k, \quad V(0, 2k) = 2k, \quad V(k+1, k) = v_k, \quad V(0, 2k+1) = 2k+1.$$

Now, from (15.3,a2),

$$v_{k+1} = V(k+1, k+1) = V(k, k+1),$$

and hence, from (b2) with $c = 1$, we find that

(d) $$v_{k+1} = \frac{1-p_k}{1+p_k}\,v_k + \frac{2p_k}{1+p_k}\,(2k+1),$$

where $p_k$ is the chance of obtaining $k$ heads and $k$ tails in $2k$ tosses of a fair coin:

$$p_k = 2^{-2k} \binom{2k}{k}.$$

Result (d) is the key to proving things by induction.

Proof of result (15.3,c1). From (15.3,a2), result (15.3,c1) is automatically true when $w \ge b$. Hence, we need only establish the result when $w < b$. Now if $w < b$ and $w + b$ is odd, then

$$(w, b) = (k+1-c, k+c)$$

for some $c$ with $1 \le c \le k$. But formulae (b) show that it is enough to show that for $1 \le a \le k$,

$$(2k+1 - v_k)\,2^{-(2k-1)} (1+p_k)^{-1} \binom{2k}{k+a-1} \ge (2k - v_k)\,2^{-(2k-2)} \binom{2k-1}{k+a-1};$$

and since

$$\binom{2k}{k+a-1} \Big/ \binom{2k}{k} \ge \binom{2k-1}{k+a-1} \Big/ \binom{2k-1}{k},$$

we need only establish the case when $a = 1$:

$$(2k+1 - v_k)\,2^{-(2k-1)} (1+p_k)^{-1} \binom{2k}{k} \ge (2k - v_k)\,2^{-(2k-2)} \binom{2k-1}{k},$$

which reduces to

(f) $$v_k \ge 2k - p_k^{-1}.$$

But property (f) follows by induction from (d) using only the fact that $p_k$ is decreasing in $k$.

Proof for the case when $b + w$ is even may be achieved similarly. □

Proof of result (15.3,c2). Because of (15.3,a3), the result (15.3,c2) is automatically true when $w < b$, so we need only establish it when $w \ge b$. In analogy with the reduction of the 'general $a$' case to the 'boundary case $a = 1$' in the proof of (15.3,c1), it is easily shown that it is sufficient to prove (15.3,c2) for the case when $(w, b) = (k+1, k+1)$ for some $k$. Formulae (b) reduce this problem to showing that

$$\{1 + (2k+1) p_k\}\,v_k \le 2k(2k+1) p_k,$$

and this follows by induction from (d) using only the fact that $(2k+1) p_k$ is increasing in $k$. □

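15.5. Proof of result (15.3,d)

Define

$$\alpha_k := v_k - 2k - \tfrac14\pi + (p_k)^{-1}.$$

Then, from (15.4,d),

$$\alpha_{k+1} = \Bigl(1 - \frac{2p_k}{1+p_k}\Bigr)\alpha_k + \frac{2p_k}{1+p_k}\,c_k,$$

where

$$c_k := \frac{1+p_k}{2(2k+1)\,p_k^2} - \tfrac14\pi.$$

Stirling's formula shows that $c_k \to 0$ as $k \to \infty$, so that given $\varepsilon > 0$, we can find $N$ so that

$$|c_k| < \varepsilon \quad \text{for } k \ge N.$$

Induction shows that for $k \ge N$, since $2p_k/(1+p_k) \ge p_k$,

$$|\alpha_{k+1}| \le (1 - p_k)(1 - p_{k-1}) \cdots (1 - p_N)\,|\alpha_N| + \varepsilon.$$

But, since $\sum p_k = \infty$, we have $\prod (1 - p_k) = 0$, and it is now clear that

$$\limsup |\alpha_{k+1}| \le \varepsilon, \quad \text{so that } \alpha_k \to 0.$$

Because of the accurate version of Stirling's formula:

$$n! = (2\pi n)^{1/2} \Bigl(\frac{n}{e}\Bigr)^n e^{\theta(n)/(12n)}, \quad 0 < \theta(n) < 1,$$

we have

$$(p_k)^{-1} - \sqrt{\pi k} \to 0,$$

so that

$$v_k - \bigl(2k + \tfrac14\pi - \sqrt{\pi k}\bigr) \to 0,$$

as required. □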

We now take a quick look at filtering. The central idea combines Bayes' formula with a recursive property which is now illustrated by two examples.

15.6. Recursive nature of conditional probabilities

Example. Suppose that A, B, C and D are events (elements of $\mathcal{F}$) each with strictly positive probability. Let us write (for example) ABC for $A \cap B \cap C$. Let us also introduce the notation

$$\mathcal{C}_A(B) := P(B|A) = P(AB)/P(A)$$

for conditional probabilities. The 'recursive property' in which we are interested is exemplified by

$$\mathcal{C}_{ABC}(D) = \mathcal{C}_{AB}(D|C) := \frac{\mathcal{C}_{AB}(CD)}{\mathcal{C}_{AB}(C)};$$

'if we want to find the conditional probability of D given that A, B and C have occurred, we can assume that both A and B have occurred and find the $\mathcal{C}_{AB}$ probability of D given C'.

Example. Suppose that X, Y, Z and T are RVs such that (X, Y, Z, T) has a strictly positive joint pdf $f_{X,Y,Z,T}$ on $\mathbb{R}^4$: for $B \in \mathcal{B}^4$,

$$P\{(X,Y,Z,T) \in B\} = \int\!\!\int\!\!\int\!\!\int_B f_{X,Y,Z,T}(x,y,z,t)\,dx\,dy\,dz\,dt.$$

Then, of course, (X, Y, Z) has joint pdf $f_{X,Y,Z}$ on $\mathbb{R}^3$, where

$$f_{X,Y,Z}(x,y,z) = \int_{\mathbb{R}} f_{X,Y,Z,T}(x,y,z,t)\,dt.$$

The formula

$$f_{T|X,Y,Z}(t|x,y,z) := \frac{f_{X,Y,Z,T}(x,y,z,t)}{f_{X,Y,Z}(x,y,z)}$$

defines a ('regular') conditional pdf of T given X, Y, Z: for $B \in \mathcal{B}$, we have, with all dependence on $\omega$ indicated,

$$P(T \in B\,|\,X,Y,Z)(\omega) = E(I_B(T)\,|\,X,Y,Z)(\omega) = \int_B f_{T|X,Y,Z}(t\,|\,X(\omega),Y(\omega),Z(\omega))\,dt.$$

Similarly,

$$f_{T,Z|X,Y}(t,z|x,y) = \frac{f_{X,Y,Z,T}(x,y,z,t)}{f_{X,Y}(x,y)}.$$

The recurrence property is exemplified by

$$f_{T|X,Y,Z} = \frac{f_{T,Z|X,Y}}{f_{Z|X,Y}}.$$
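The recursive property for events can be checked exactly on a small finite probability space. The following is a minimal sketch: the sample space, the weights and the four events are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite sample space: 4-bit outcomes with unequal weights.
OMEGA = list(product([0, 1], repeat=4))
WEIGHT = {w: Fraction(1 + i) for i, w in enumerate(OMEGA)}
TOTAL = sum(WEIGHT.values())


def prob(event):
    """P(event) for a set of outcomes."""
    return sum(WEIGHT[w] for w in event) / TOTAL


def cond(given, event):
    """C_given(event) := P(event and given) / P(given)."""
    return prob(event & given) / prob(given)


# Illustrative events, each of strictly positive probability.
A = {w for w in OMEGA if w[0] == 1 or w[1] == 1}
B = {w for w in OMEGA if w[1] + w[2] >= 1}
C = {w for w in OMEGA if w[2] == 1 or w[3] == 1}
D = {w for w in OMEGA if sum(w) % 2 == 0}


def recursive_check():
    """Return C_ABC(D) and C_AB(CD) / C_AB(C); the two should agree."""
    lhs = cond(A & B & C, D)
    rhs = cond(A & B, C & D) / cond(A & B, C)
    return lhs, rhs
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than with a tolerance.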

15.7. Bayes' formula for bivariate normal distributions

With a now-clear notation, we have for RVs X, Y with strictly positive joint pdf $f_{X,Y}$ on $\mathbb{R}^2$,

(*) $$f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{f_X(x)\,f_{Y|X}(y|x)}{f_Y(y)}.$$

Thus

(**) $$f_{X|Y}(x|y) \propto f_X(x)\,f_{Y|X}(y|x),$$

the 'constant of proportionality' depending on y but being determined by the fact that

$$\int_{\mathbb{R}} f_{X|Y}(x|y)\,dx = 1.$$

The meaning of the following Lemma is clarified within its proof.

LEMMA

(a) Suppose that $\mu, a, b \in \mathbb{R}$, that $U, W \in (0,\infty)$ and that X and Y are RVs such that

$$\mathcal{L}(X) = N(\mu, U), \qquad \mathcal{C}_X(Y) = N(a + bX, W).$$

Then

$$\mathcal{C}_Y(X) = N(\hat X, V),$$

where the number $V \in (0,\infty)$ and the RV $\hat X$ are determined as follows:

$$\frac{1}{V} = \frac{1}{U} + \frac{b^2}{W}, \qquad \frac{\hat X}{V} = \frac{\mu}{U} + \frac{b(Y - a)}{W}.$$

Proof. The absolute distribution of X is $N(\mu, U)$, so that

$$f_X(x) = (2\pi U)^{-1/2}\exp\Big\{-\frac{(x-\mu)^2}{2U}\Big\}.$$

The conditional pdf of Y given X is the density of $N(a + bX, W)$, so that

$$f_{Y|X}(y|x) = (2\pi W)^{-1/2}\exp\Big\{-\frac{(y - a - bx)^2}{2W}\Big\}.$$

Hence, from (**),

$$\log f_{X|Y}(x|y) = c(y) - \frac{(x-\mu)^2}{2U} - \frac{(y-a-bx)^2}{2W} = \hat c(y) - \frac{(x - \hat x)^2}{2V},$$

where $1/V = 1/U + b^2/W$ and $\hat x/V = \mu/U + b(y-a)/W$. The result follows.

COROLLARY

(b) With the notation of the Lemma, we have

$$\|X - \hat X\|_2^2 = E\{(X - \hat X)^2\} = V.$$

Proof. Since $\mathcal{C}_Y(X) = N(\hat X, V)$, we even have

$$E\{(X - \hat X)^2\,|\,Y\} = V, \quad\text{a.s.}$$
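The Lemma can be sanity-checked numerically against the familiar covariance form of the conditional law of a bivariate normal. The sketch below is illustrative only; the function names are hypothetical, and the second function uses the standard facts Cov(X, Y) = bU and Var(Y) = b²U + W, which follow from the Lemma's hypotheses.

```python
def posterior_precision_form(mu, U, a, b, W, y):
    """Lemma 15.7(a): 1/V = 1/U + b^2/W and x_hat/V = mu/U + b(y - a)/W."""
    V = 1.0 / (1.0 / U + b * b / W)
    x_hat = V * (mu / U + b * (y - a) / W)
    return x_hat, V


def posterior_covariance_form(mu, U, a, b, W, y):
    """Same conditional law via Cov(X, Y) = bU and Var(Y) = b^2 U + W."""
    cov_xy = b * U
    var_y = b * b * U + W
    mean_y = a + b * mu
    x_hat = mu + cov_xy / var_y * (y - mean_y)
    V = U - cov_xy ** 2 / var_y
    return x_hat, V
```

The precision form is the one that makes the later recursions additive: each new observation simply adds b²/W to 1/V.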

15.8. Noisy observation of a single random variable

Let $X, \eta_1, \eta_2, \ldots$ be independent RVs, with

$$\mathcal{L}(X) = N(0, \sigma^2), \qquad \mathcal{L}(\eta_k) = N(0, 1).$$

Let $(c_n)$ be a sequence of positive real numbers, and let

$$Y_k = X + c_k\eta_k, \qquad \mathcal{F}_n := \sigma(Y_1, Y_2, \ldots, Y_n).$$

We regard each $Y_k$ as a noisy observation of X. We know that

$$M_n := E(X|\mathcal{F}_n) \to M_\infty := E(X|\mathcal{F}_\infty) \quad\text{a.s. and in } \mathcal{L}^2.$$

One interesting question mentioned at (10.4,c) is: when is it true that $M_\infty = X$ a.s.? Or again, is X a.s. equal to an $\mathcal{F}_\infty$-measurable RV?

Let us write $\mathcal{C}_n$ to signify 'regular conditional law' given $Y_1, Y_2, \ldots, Y_n$. We have

$$\mathcal{C}_0(X) = N(0, \sigma^2).$$

Suppose that it is true that

$$\mathcal{C}_{n-1}(X) = N(\hat X_{n-1}, V_{n-1}),$$

where $\hat X_{n-1}$ is a linear function of $Y_1, Y_2, \ldots, Y_{n-1}$ and $V_{n-1}$ a constant in $(0,\infty)$. Then, since $Y_n = X + c_n\eta_n$, we have

$$\mathcal{C}_{n-1}(Y_n|X) = N(X, c_n^2).$$

From the Lemma 15.7(a) on bivariate normals with

$$\mu = \hat X_{n-1}, \quad U = V_{n-1}, \quad a = 0, \quad b = 1, \quad W = c_n^2,$$

we have

$$\mathcal{C}_{n-1}(X|Y_n) = N(\hat X_n, V_n),$$

where

$$\frac{1}{V_n} = \frac{1}{V_{n-1}} + \frac{1}{c_n^2}, \qquad \frac{\hat X_n}{V_n} = \frac{\hat X_{n-1}}{V_{n-1}} + \frac{Y_n}{c_n^2}.$$

But the recursive property indicated in Section 15.6 shows that

$$\mathcal{C}_n(X) = \mathcal{C}_{n-1}(X|Y_n).$$

We have now proved by induction that

$$\mathcal{C}_n(X) = N(\hat X_n, V_n), \quad \forall n.$$

Now, of course,

$$M_n = \hat X_n \quad\text{and}\quad E\{(X - M_n)^2\} = V_n.$$

However,

$$\frac{1}{V_n} = \frac{1}{\sigma^2} + \sum_{k=1}^{n} c_k^{-2}.$$

Our martingale M is $\mathcal{L}^2$ bounded and so converges in $\mathcal{L}^2$. We now see that

$$M_\infty = X \text{ a.s. if and only if } \sum c_k^{-2} = \infty.$$
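The recursion for the conditional mean and variance is easy to run. The sketch below is a minimal illustration with hypothetical function names; it takes the noise scales c_k and the observed values of Y_k as plain lists and compares the recursively computed variance with the closed form 1/V_n = 1/σ² + Σ c_k⁻².

```python
def filter_single_variable(sigma2, c, y):
    """Run the 15.8 recursion; return the list of (x_hat_n, V_n) for n = 1, 2, ...

    sigma2 : prior variance of X (prior mean is 0)
    c      : noise scales c_1, c_2, ...
    y      : observed values of Y_1, Y_2, ...
    """
    x_hat, V = 0.0, sigma2
    history = []
    for c_n, y_n in zip(c, y):
        V_new = 1.0 / (1.0 / V + 1.0 / c_n ** 2)
        x_hat = V_new * (x_hat / V + y_n / c_n ** 2)
        V = V_new
        history.append((x_hat, V))
    return history


def closed_form_variance(sigma2, c):
    """V_n from 1/V_n = 1/sigma^2 + sum of c_k^(-2)."""
    return 1.0 / (1.0 / sigma2 + sum(1.0 / c_k ** 2 for c_k in c))
```

With c_k = 1 for all k the variance V_n = 1/(1/σ² + n) tends to 0, whereas with c_k = 2^k the sum Σ c_k⁻² is finite and V_n stays bounded away from 0, in line with the criterion above.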

15.9. The Kalman-Bucy filter

The method of calculation used in the preceding three sections allows immediate derivation of the famous Kalman-Bucy filter.

Let A, H, C, K and g be real constants. Suppose that $X_0, \varepsilon_1, \varepsilon_2, \ldots, \eta_1, \eta_2, \ldots$ are independent RVs with

$$\mathcal{L}(\varepsilon_k) = \mathcal{L}(\eta_k) = N(0,1), \qquad \mathcal{L}(X_0) = N(m, \sigma^2), \qquad Y_0 = 0.$$

The true state of a system at time n is supposed given by $X_n$, where

(dynamics) $$X_n - X_{n-1} = AX_{n-1} + H\varepsilon_n + g.$$

However the process X cannot be observed directly: we can only observe the process Y, where

(observation) $$Y_n - Y_{n-1} = CX_n + K\eta_n.$$

Just as in Section 15.8, we make the induction hypothesis that

$$\mathcal{C}_{n-1}(X_{n-1}) = N(\hat X_{n-1}, V_{n-1}),$$

where $\mathcal{C}_{n-1}$ signifies regular conditional law given $Y_1, Y_2, \ldots, Y_{n-1}$. Since

$$X_n = \alpha X_{n-1} + g + H\varepsilon_n, \quad\text{where } \alpha := 1 + A,$$

we have

$$\mathcal{C}_{n-1}(X_n) = N(\alpha\hat X_{n-1} + g,\ \alpha^2 V_{n-1} + H^2).$$

Also, since $Y_n = Y_{n-1} + CX_n + K\eta_n$, we have

$$\mathcal{C}_{n-1}(Y_n|X_n) = N(Y_{n-1} + CX_n,\ K^2).$$

Apply the bivariate-normal Lemma 15.7(a) to find that

$$\mathcal{C}_n(X_n) = \mathcal{C}_{n-1}(X_n|Y_n) = N(\hat X_n, V_n),$$

where

(KB1) $$\frac{1}{V_n} = \frac{1}{\alpha^2 V_{n-1} + H^2} + \frac{C^2}{K^2},$$

(KB2) $$\frac{\hat X_n}{V_n} = \frac{\alpha\hat X_{n-1} + g}{\alpha^2 V_{n-1} + H^2} + \frac{C(Y_n - Y_{n-1})}{K^2}.$$

Equation (KB1) shows that $V_n = f(V_{n-1})$. Examination of the rectangular hyperbola which is the graph of f shows that $V_n \to V_\infty$, the positive root of $V = f(V)$.
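The convergence of the variances can be watched numerically. The following sketch iterates (KB1) and compares the result with the positive root of V = f(V); clearing denominators in V = f(V) gives the quadratic C²α²V² + (K² + C²H² − K²α²)V − K²H² = 0, which is the form solved below. Function names are hypothetical, and the code assumes that C, K, H and α = 1 + A are all non-zero.

```python
import math


def kb_variance_step(V_prev, A, H, C, K):
    """One application of (KB1): V_n = f(V_{n-1})."""
    alpha = 1.0 + A
    U = alpha ** 2 * V_prev + H ** 2  # variance of X_n given Y_1..Y_{n-1}
    return 1.0 / (1.0 / U + C ** 2 / K ** 2)


def kb_mean_step(x_prev, V_prev, y_increment, A, H, C, K, g):
    """One application of (KB1) and (KB2); y_increment is Y_n - Y_{n-1}."""
    alpha = 1.0 + A
    U = alpha ** 2 * V_prev + H ** 2
    V = kb_variance_step(V_prev, A, H, C, K)
    x_hat = V * ((alpha * x_prev + g) / U + C * y_increment / K ** 2)
    return x_hat, V


def limiting_variance(A, H, C, K):
    """Positive root of V = f(V), from the quadratic stated in the lead-in."""
    alpha = 1.0 + A
    qa = C ** 2 * alpha ** 2
    qb = K ** 2 + C ** 2 * H ** 2 - K ** 2 * alpha ** 2
    qc = -(K ** 2) * H ** 2
    return (-qb + math.sqrt(qb * qb - 4.0 * qa * qc)) / (2.0 * qa)
```

Note that the variance recursion does not involve the observations at all, so the sequence V_n can be computed in advance of any data.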
If one wishes to give a rigorous treatment of the K-B filter in continuous time, one is forced to use martingale and stochastic-integral techniques. See, for example, Rogers and Williams (1987) and references to filtering and control mentioned there. For more on the discrete-time situation, which is very important in practice, and for how filtering does link with stochastic control, see Davis and Vinter (1985) or Whittle (1990).

15.10. Harnesses entangled

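The martingale concept is well adapted to processes evolving in time because (discrete) time naturally belongs to the ordered space $\mathbb{Z}^+$. The question arises: does the martingale property transfer in some natural way to processes parametrized by (say) $\mathbb{Z}^2$?

Let me first explain a difficulty described in Williams (1973) that arises with models in $\mathbb{Z}$ (d = 1) and in $\mathbb{Z}^2$, though we do not study the latter here.

Suppose that $(X_n : n \in \mathbb{Z})$ is a process such that each $X_n \in \mathcal{L}^1$ and that ('almost surely' qualifications will be dropped)

(a) $$E(X_n\,|\,X_m : m \ne n) = \tfrac12(X_{n-1} + X_{n+1}), \quad \forall n.$$

For $m \in \mathbb{Z}$, define

$$\mathcal{G}_m = \sigma(X_k : k \le m), \qquad \mathcal{H}_m = \sigma(X_r : r \ge m).$$

The Tower Property shows that for a, b in $\mathbb{Z}$ with $a < b$, we have for $a < r < b$,

$$E(X_r\,|\,\mathcal{G}_a, \mathcal{H}_b) = E\big(E(X_r\,|\,X_s : s \ne r)\,\big|\,\mathcal{G}_a, \mathcal{H}_b\big) = \tfrac12 E(X_{r-1}\,|\,\mathcal{G}_a, \mathcal{H}_b) + \tfrac12 E(X_{r+1}\,|\,\mathcal{G}_a, \mathcal{H}_b),$$

so that $r \mapsto E(X_r\,|\,\mathcal{G}_a, \mathcal{H}_b)$ is the linear interpolation

$$E(X_r\,|\,\mathcal{G}_a, \mathcal{H}_b) = \frac{b - r}{b - a}X_a + \frac{r - a}{b - a}X_b.$$

Hence, for $n \in \mathbb{Z}$ and $u \in \mathbb{N}$, we have, a.s.,

$$E(X_n\,|\,\mathcal{G}_{n-u}, \mathcal{H}_{n+1}) = \frac{X_{n-u}}{u+1} + \frac{uX_{n+1}}{u+1}.$$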

Now, the σ-algebras $\sigma(\mathcal{G}_{n-u}, \mathcal{H}_{n+1})$ decrease as $u \uparrow \infty$. Because of Warning 4.12, we had better not claim that they decrease to $\sigma(\mathcal{G}_{-\infty}, \mathcal{H}_{n+1})$. Anyway, by the Downward Theorem, we see that the random variable

$$L := \lim_{u \downarrow -\infty}(X_u/u) \quad\text{exists (a.s.)}$$

and

$$E\Big(X_n\,\Big|\,\bigcap_u \sigma(\mathcal{G}_{n-u}, \mathcal{H}_{n+1})\Big) = X_{n+1} - L.$$

Hence, by the Tower Property,

$$E(X_n\,|\,L, \mathcal{H}_{n+1}) = X_{n+1} - L,$$

whence we have a reversed-martingale property:

(b) $$E(X_n - nL\,|\,L, \mathcal{H}_{n+1}) = X_{n+1} - (n+1)L.$$

A further application of the Downward Theorem shows that

(c) $$A := \lim_{n \uparrow \infty}(X_n - nL) \quad\text{exists (a.s.)}.$$

Hence $L = \lim_{u \uparrow \infty}(X_u/u)$. By using the arguments which led to (b) in the reversed-time sense, we now obtain

(d) $$E(X_{n+1}\,|\,L, \mathcal{G}_n) = X_n + L.$$

From (b) and (d) and the Tower Property,

$$E(X_n + L\,|\,X_{n+1}) = X_{n+1}, \qquad E(X_{n+1}\,|\,X_n + L) = X_n + L.$$

Exercise E9.2 shows that $X_{n+1} = X_n + L$. Hence (almost surely)

$$X_n = nL + A, \quad \forall n \in \mathbb{Z},$$

so that (almost) all sample paths of X are straight lines!

Hammersley (1966) suggested that any analogue of (a) should be called a harness property and that the type of result just obtained conveys the idea that every low-dimensional harness is a straitjacket!

15.11. Harnesses unravelled, 1

The reason that (15.10,a) rules out interesting models is that it is expressed in terms of the idea that each $X_n$ should be a random variable. What one should say is that

$$X_n : \Omega \to \mathbb{R}$$

but then require only that differences

$$(X_r - X_s : r, s \in \mathbb{Z})$$

be RVs (that is, be $\mathcal{F}$-measurable), and that for $n, k \in \mathbb{Z}$ with $k \ne n$,

$$E(X_n - X_k\,|\,X_m - X_k : m \ne n) = \tfrac12(X_{n-1} - X_k) + \tfrac12(X_{n+1} - X_k).$$

I call this a difference harness in Williams (1973).

Easy exercise. Suppose that $(Y_n : n \in \mathbb{Z})$ are IID RVs in $\mathcal{L}^1$. Let $X_0$ be any function on $\Omega$. Define

$$X_n := \begin{cases} X_0 + \sum_{k=1}^{n} Y_k & \text{if } n \ge 0, \\ X_0 - \sum_{k=n+1}^{0} Y_k & \text{if } n < 0. \end{cases}$$

Thus, $X_n - X_{n-1} = Y_n$, $\forall n$. Prove that X is a difference harness.

15.12. Harnesses unravelled, 2

In dimension $d \ge 3$, we do not need to use the 'difference-process' unravelling described in the preceding section. For $d \ge 3$, there is a non-trivial model, related both to Gibbs states in statistical mechanics and to quantum fields, such that each $X_n$ ($n \in \mathbb{Z}^d$) is a RV and, for $n \in \mathbb{Z}^d$,

$$E(X_n\,|\,X_m : m \in \mathbb{Z}^d \setminus \{n\}) = (2d)^{-1}\sum_{u \in U} X_{n+u},$$

where U is the set of the 2d unit vectors in $\mathbb{Z}^d$. See Williams (1973).

In addition to a fascinating etymological treatise on the terms 'martingale' and 'harness', Hammersley (1966) contains many important ideas on harnesses, anticipating later work on stochastic partial differential equations. Many interesting unsolved problems on various kinds of harness remain.

PART C: CHARACTERISTIC FUNCTIONS

Chapter 16

Basic Properties of CFs

Part C is merely the briefest account of the first stages of characteristic function theory. This theory is something very different in spirit from the work in Part B. Part B was about the sample paths of processes. Characteristic function theory is on the one hand part of Fourier-integral theory, and it is proper that it finds its way into that marvellous recent book, Körner (1988); see also the magical Dym and McKean (1972). On the other hand, characteristic functions do have an essential role in both probability and statistics, and I must include these few pages on them. For full treatment, see Chow and Teicher (1978) or Lukacs (1970).

Exercises indicate the analogous Laplace-transform method, and develop in full the method of moments for distributions on [0,1].

16.1. Definition

The characteristic function (CF) $\varphi = \varphi_X$ of a random variable X is defined to be the map

$$\varphi : \mathbb{R} \to \mathbb{C}$$

(important: the domain is $\mathbb{R}$ not $\mathbb{C}$) defined by

$$\varphi(\theta) := E(e^{i\theta X}) = E\cos(\theta X) + iE\sin(\theta X).$$

Let $F := F_X$ be the distribution function of X, and let $\mu := \mu_X$ denote the law of X. Then

$$\varphi(\theta) = \int_{\mathbb{R}} e^{i\theta x}\,dF(x) := \int_{\mathbb{R}} e^{i\theta x}\,\mu(dx),$$

so that $\varphi$ is the Fourier transform of $\mu$, or the Fourier-Stieltjes transform of F. (We do not use the factor $(2\pi)^{-1/2}$ which is sometimes used in Fourier theory.) We often write $\varphi_F$ or $\varphi_\mu$ for $\varphi$.

16.2. Elementary properties

Let $\varphi = \varphi_X$ for a RV X. Then

(a) $\varphi(0) = 1$ (obviously);

(b) $|\varphi(\theta)| \le 1$, $\forall\theta$;

(c) $\theta \mapsto \varphi(\theta)$ is continuous on $\mathbb{R}$;

(d) $\varphi_{(-X)}(\theta) = \overline{\varphi_X(\theta)}$, $\forall\theta$;

(e) $\varphi_{aX+b}(\theta) = e^{ib\theta}\varphi_X(a\theta)$.

You can easily prove these properties. (Use (BDD) to establish (c).)

Note on differentiability (or lack of it). Standard analysis (see Theorem A16.1) implies that if $n \in \mathbb{N}$ and $E(|X|^n) < \infty$, then we may formally differentiate $\varphi(\theta) = Ee^{i\theta X}$ n times to obtain

$$\varphi^{(n)}(\theta) = E[(iX)^n e^{i\theta X}].$$

In particular, $\varphi^{(n)}(0) = i^n E(X^n)$. However, it is possible for $\varphi'(0)$ to exist when $E(|X|) = \infty$.

We shall see shortly that $\varphi$ can be the 'tent-function'

$$\varphi(\theta) = (1 - |\theta|)\,I_{[-1,1]}(\theta),$$

so that $\varphi$ need not be differentiable everywhere, and $\varphi$ can be 0 outside [-1,1].
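For a RV taking finitely many values, the CF is a finite sum and properties (a), (b), (d) and (e) can be checked directly. The sketch below is illustrative; the function name and the particular distribution used in the usage note are assumptions made for the example.

```python
import cmath


def cf(values, probs, theta):
    """CF of a RV taking values[i] with probability probs[i]: E exp(i*theta*X)."""
    return sum(p * cmath.exp(1j * theta * x) for x, p in zip(values, probs))


def cf_affine(values, probs, a, b, theta):
    """CF of aX + b, computed directly from the transformed values."""
    return cf([a * x + b for x in values], probs, theta)
```

For example, with values (-1, 0, 2) and probabilities (0.2, 0.5, 0.3), `cf_affine(values, probs, 3.0, -1.0, theta)` should agree with `cmath.exp(-1j * theta) * cf(values, probs, 3.0 * theta)`, which is property (e) with a = 3 and b = -1.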

16.3. Some uses of characteristic functions

Amongst uses of CFs are the following:

• to prove the Central Limit Theorem (CLT) and analogues,
• to calculate distributions of limit RVs,
• to prove the 'only if' part of the Three-Series Theorem,
• to obtain estimates on tail probabilities via saddle-point approximation,
• to prove such results as: if X and Y are independent, and X + Y has a normal distribution, then both X and Y have normal distributions.

Only the first three of these uses are discussed in this book.

16.4. Three key results

(a) If X and Y are independent RVs, then

$$\varphi_{X+Y}(\theta) = \varphi_X(\theta)\varphi_Y(\theta), \quad \forall\theta.$$

Proof. This is just 'independence means multiply' again:

$$Ee^{i\theta(X+Y)} = Ee^{i\theta X}e^{i\theta Y} = Ee^{i\theta X}\,Ee^{i\theta Y}.$$

(b) F may be reconstructed from $\varphi$. See Section 16.6 for a precise statement.

(c) 'Weak' convergence of distribution functions corresponds exactly to convergence of the corresponding CFs. See Section 18.1 for a precise statement.

The way in which these results are used in the proof of the Central Limit Theorem is as follows. Suppose that $X_1, X_2, \ldots$ are IID RVs each with mean 0 and variance 1. From (a) and (16.2,e), we see that if $S_n := X_1 + \cdots + X_n$, then

$$E\exp(i\theta S_n/\sqrt{n}) = \varphi_X(\theta/\sqrt{n})^n.$$

We shall obtain rigorous estimates which show that

$$\varphi_X(\theta/\sqrt{n})^n = \Big\{1 - \frac{\theta^2}{2n} + o(1/n)\Big\}^n \to \exp(-\tfrac12\theta^2), \quad \forall\theta.$$

Since $\theta \mapsto \exp(-\tfrac12\theta^2)$ is the CF of the standard normal distribution (as we shall see shortly), it now follows from (b) and (c) that the distribution of $S_n/\sqrt{n}$ converges weakly to the distribution function $\Phi$ of the standard normal distribution N(0,1). In this case, this simply means that

$$P(S_n/\sqrt{n} \le x) \to \Phi(x), \quad x \in \mathbb{R}.$$
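The convergence of the CFs can be seen concretely in the simplest case. As an illustrative assumption (not an example from the text), take X = ±1 with probability ½ each, so that X has mean 0, variance 1 and CF cos θ; then the CF of S_n/√n is cos(θ/√n)^n, which can be compared numerically with exp(−θ²/2).

```python
import math


def cf_normalised_sum(theta, n):
    """CF of S_n / sqrt(n) when the X_k are IID fair +/-1 signs."""
    return math.cos(theta / math.sqrt(n)) ** n


def cf_standard_normal(theta):
    """CF of N(0, 1)."""
    return math.exp(-0.5 * theta * theta)
```

For this particular X the discrepancy at fixed θ is of order 1/n, which is why the gap shrinks as n grows.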

16.5. Atoms

In regard to both (16.4,b) and (16.4,c), tidiness of results can be threatened by the presence of atoms.

If $P(X = c) > 0$, then the law $\mu$ of X is said to have an atom at c, and the distribution function F of X has a discontinuity at c:

$$\mu(\{c\}) = F(c) - F(c-) = P(X = c).$$

Now $\mu$ can have at most n atoms of mass at least 1/n, so that the number of atoms of $\mu$ is at most countable.

It therefore follows that given $x \in \mathbb{R}$, there exists a sequence $(y_n)$ of reals with $y_n \downarrow x$ such that every $y_n$ is a non-atom of $\mu$ (equivalently a point of continuity of F); and then, by right-continuity of F, $F(y_n) \downarrow F(x)$.

16.6. Lévy's Inversion Formula

This theorem puts the fact that F may be reconstructed from $\varphi$ in very explicit form. (Check that the theorem does imply that if F and G are distribution functions such that $\varphi_F = \varphi_G$ on $\mathbb{R}$, then F = G.)

THEOREM

Let $\varphi$ be the CF of a RV X which has law $\mu$ and distribution function F. Then, for $a < b$,

(a) $$\lim_{T \uparrow \infty}\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-i\theta a} - e^{-i\theta b}}{i\theta}\,\varphi(\theta)\,d\theta = \tfrac12[F(b) + F(b-)] - \tfrac12[F(a) + F(a-)].$$

Moreover, if $\int_{\mathbb{R}}|\varphi(\theta)|\,d\theta < \infty$, then X has continuous probability density function f, and

(b) $$f(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-i\theta x}\varphi(\theta)\,d\theta.$$

The 'duality' between (b) and the result

(c) $$\varphi(\theta) = \int_{\mathbb{R}} e^{i\theta x}f(x)\,dx$$

can be exploited as we shall see.

The proof of the theorem may be omitted on a first reading.

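Proof of the theorem. For $u, v \in \mathbb{R}$ with $u \le v$,

(d) $$|e^{iv} - e^{iu}| \le |v - u|,$$

either from a picture or since

$$\Big|\int_u^v ie^{it}\,dt\Big| \le \int_u^v |ie^{it}|\,dt = \int_u^v 1\,dt.$$

Let $a, b \in \mathbb{R}$, with $a < b$. Fubini's Theorem allows us to say that, for $0 < T < \infty$,

(e) $$\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-i\theta a} - e^{-i\theta b}}{i\theta}\,\varphi(\theta)\,d\theta = \frac{1}{2\pi}\int_{\mathbb{R}}\Big\{\int_{-T}^{T}\frac{e^{i\theta(x-a)} - e^{i\theta(x-b)}}{i\theta}\,d\theta\Big\}\mu(dx),$$

provided we show that

$$C_T := \frac{1}{2\pi}\int_{\mathbb{R}}\Big\{\int_{-T}^{T}\Big|\frac{e^{i\theta(x-a)} - e^{i\theta(x-b)}}{i\theta}\Big|\,d\theta\Big\}\mu(dx) < \infty.$$

However, inequality (d) shows that $C_T \le (b - a)T/\pi$, so that (e) is valid.

Next, we can exploit the evenness of the cosine function and the oddness of the sine function to obtain

(f) $$\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{i\theta(x-a)} - e^{i\theta(x-b)}}{i\theta}\,d\theta = \frac{\mathrm{sgn}(x-a)\,S(|x-a|T) - \mathrm{sgn}(x-b)\,S(|x-b|T)}{\pi},$$

where, as usual,

$$\mathrm{sgn}(x) := \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0, \end{cases}$$

and

$$S(U) := \int_0^U \frac{\sin x}{x}\,dx \quad (U > 0).$$

Even though the Lebesgue integral $\int_0^\infty x^{-1}\sin x\,dx$ does not exist, because

$$\int_0^\infty (x^{-1}\sin x)^+\,dx = \int_0^\infty (x^{-1}\sin x)^-\,dx = \infty,$$

we have (see Exercise E16.1)

$$\lim_{U \uparrow \infty} S(U) = \frac{\pi}{2}.$$

The expression (f) is bounded simultaneously in x and T for our fixed a and b; and, as $T \uparrow \infty$, the expression (f) converges to

$$\begin{cases} 0 & \text{if } x < a \text{ or } x > b, \\ \tfrac12 & \text{if } x = a \text{ or } x = b, \\ 1 & \text{if } a < x < b. \end{cases}$$

The Bounded Convergence Theorem now yields result (a).

Suppose now that $\int_{\mathbb{R}}|\varphi(\theta)|\,d\theta < \infty$. We can then let $T \uparrow \infty$ in result (a) and use (DOM) to obtain

(g) $$F(b) - F(a) = \frac{1}{2\pi}\int_{\mathbb{R}}\frac{e^{-i\theta a} - e^{-i\theta b}}{i\theta}\,\varphi(\theta)\,d\theta,$$

provided that F is continuous at a and b. However, (DOM) shows that the right-hand side of (g) is continuous in a and b and (why?!) we can conclude that F has no atoms and that (g) holds for all a and b with $a < b$.

We now have

(h) $$\frac{F(b) - F(a)}{b - a} = \frac{1}{2\pi}\int_{\mathbb{R}}\frac{e^{-i\theta a} - e^{-i\theta b}}{i\theta(b - a)}\,\varphi(\theta)\,d\theta.$$

But, by (d),

$$\Big|\frac{e^{-i\theta a} - e^{-i\theta b}}{i\theta(b - a)}\Big| \le 1.$$

Hence, the assumption that $\int_{\mathbb{R}}|\varphi(\theta)|\,d\theta < \infty$ allows us to use (DOM) to let $b \to a$ in (h) to obtain

$$F'(a) = f(a) := \frac{1}{2\pi}\int_{\mathbb{R}} e^{-i\theta a}\varphi(\theta)\,d\theta,$$

and, finally, f is continuous by (DOM). □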
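Formula (16.6,b) can be tried out numerically. The sketch below approximates the integral by the trapezoidal rule on a truncated range [-T, T] and applies it to the N(0,1) characteristic function exp(−θ²/2), whose density is known. The truncation point T, the number of steps and the function names are assumptions chosen for the illustration.

```python
import cmath
import math


def density_from_cf(cf, x, T=12.0, steps=4000):
    """Approximate f(x) = (1/2pi) * integral of exp(-i*theta*x) * cf(theta) d(theta).

    Uses the trapezoidal rule on [-T, T]; suitable only when cf is integrable
    and negligible outside [-T, T].
    """
    h = 2.0 * T / steps
    total = 0j
    for k in range(steps + 1):
        theta = -T + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * theta * x) * cf(theta)
    return (total * h / (2.0 * math.pi)).real


def normal_cf(theta):
    """CF of the standard normal distribution."""
    return math.exp(-0.5 * theta * theta)


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
```

A CF with slowly decaying tails (for instance that of the double exponential distribution) would need a much larger T, so the default truncation here should not be read as a general-purpose choice.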

16.7. A table

     Distribution          pdf                                                      Support     CF

1.   $N(\mu,\sigma^2)$     $\frac{1}{\sigma\sqrt{2\pi}}\exp\{-\frac{(x-\mu)^2}{2\sigma^2}\}$     $\mathbb{R}$     $\exp(i\mu\theta - \tfrac12\sigma^2\theta^2)$
2.   U[0,1]                1                                                        [0,1]       $\frac{e^{i\theta} - 1}{i\theta}$
3.   U[-1,1]               $\tfrac12$                                               [-1,1]      $\frac{\sin\theta}{\theta}$
4.   Double exponential    $\tfrac12 e^{-|x|}$                                      $\mathbb{R}$     $\frac{1}{1+\theta^2}$
5.   Cauchy                $\frac{1}{\pi(1+x^2)}$                                   $\mathbb{R}$     $e^{-|\theta|}$
6.   Triangular            $1 - |x|$                                                [-1,1]      $2\,\frac{1-\cos\theta}{\theta^2}$
7.   Anon                  $\frac{1-\cos x}{\pi x^2}$                               $\mathbb{R}$     $(1 - |\theta|)\,I_{[-1,1]}(\theta)$

The two lines 4 and 5 illustrate the duality between (16.6,b) and (16.6,c), as do the two lines 6 and 7. Hints on verifying the table are given in the exercises on this chapter.

Chapter 17

Weak Convergence

In this chapter, we consider the appropriate concept of 'convergence' for probability measures on $(\mathbb{R}, \mathcal{B})$. The terminology 'weak convergence' is unfortunate: the concept is closer to 'weak*' than to 'weak' convergence in the senses used by functional analysts. 'Narrow convergence' is the official pure-mathematical term. However, probabilists seem determined to use 'weak convergence' in their sense, and, reluctantly, I follow them here.

We are studying the special case of weak convergence on a Polish (complete, separable, metric) space S when $S = \mathbb{R}$; and we unashamedly use special features of $\mathbb{R}$. For the general theory, see Billingsley (1968) or Parthasarathy (1967) or, for a superb account of its current scope, Ethier and Kurtz (1986).

Notation. We write Prob($\mathbb{R}$) for the space of probability measures on $\mathbb{R}$, and $C_b(\mathbb{R})$ for the space of bounded continuous functions on $\mathbb{R}$.

17.1. The 'elegant' definition

Let $(\mu_n : n \in \mathbb{N})$ be a sequence in Prob($\mathbb{R}$) and let $\mu \in$ Prob($\mathbb{R}$). We say that $\mu_n$ converges weakly to $\mu$ if (and only if)

(a) $$\mu_n(h) \to \mu(h), \quad \forall h \in C_b(\mathbb{R}),$$

and then write

(b) $$\mu_n \xrightarrow{w} \mu.$$

We know that elements of Prob($\mathbb{R}$) correspond to distribution functions via the correspondence

$$\mu \leftrightarrow F, \quad\text{where } F(x) = \mu(-\infty, x].$$

Weak convergence of distribution functions is defined in the obvious way:

(c) $F_n \xrightarrow{w} F$ if and only if $\mu_n \xrightarrow{w} \mu$.

We are generally interested in the case when $F_n = F_{X_n}$, that is when $F_n$ is the distribution function for some random variable $X_n$. Then, by (6.12), we have, for $h \in C_b(\mathbb{R})$,

$$\mu_n(h) = \int_{\mathbb{R}} h(x)\,dF_n(x) = Eh(X_n).$$

Note that the statement $F_n \xrightarrow{w} F$ is meaningful even if the $X_n$'s are defined on different probability spaces.

However, if $X_n$ ($n \in \mathbb{N}$) and X are RVs on the same triple $(\Omega, \mathcal{F}, P)$, then

(d) $$(X_n \to X, \text{ a.s.}) \Rightarrow (F_{X_n} \xrightarrow{w} F_X),$$

and indeed,

(e) $$(X_n \to X \text{ in prob}) \Rightarrow (F_{X_n} \xrightarrow{w} F_X).$$

Proof of (d). Suppose that $X_n \to X$, a.s., and that $\mu_n$ is the law of $X_n$ and $\mu$ is the law of X. Then, for $h \in C_b(\mathbb{R})$, we have $h(X_n) \to h(X)$, a.s., and, by (BDD),

$$\mu_n(h) = Eh(X_n) \to Eh(X) = \mu(h).$$

Exercise. Prove (e).

17.2. A 'practical' formulation

Example. Atoms are a nuisance. Suppose that $X_n = 1/n$, $X = 0$. Let $\mu_n$ be the law of $X_n$, so that $\mu_n$ is the unit mass at 1/n, and let $\mu$ be the law of X. Then, for $h \in C_b(\mathbb{R})$,

$$\mu_n(h) = h(n^{-1}) \to h(0) = \mu(h),$$

so that $\mu_n \xrightarrow{w} \mu$. However,

$$F_n(0) = 0 \not\to F(0) = 1.$$

LEMMA

(a) Let $(F_n)$ be a sequence of DFs on $\mathbb{R}$, and let F be a DF on $\mathbb{R}$. Then $F_n \xrightarrow{w} F$ if and only if

$$\lim_n F_n(x) = F(x)$$

for every non-atom (that is, every point of continuity) x of F.

Proof of 'only if' part. Suppose that $F_n \xrightarrow{w} F$. Let $x \in \mathbb{R}$, and let $\delta > 0$. Define $h \in C_b(\mathbb{R})$ via

$$h(y) := \begin{cases} 1 & \text{if } y \le x, \\ 1 - \delta^{-1}(y - x) & \text{if } x < y < x + \delta, \\ 0 & \text{if } y \ge x + \delta. \end{cases}$$

Then $\mu_n(h) \to \mu(h)$. Now,

$$F_n(x) \le \mu_n(h) \quad\text{and}\quad \mu(h) \le F(x + \delta),$$

so that

$$\limsup_n F_n(x) \le F(x + \delta).$$

However, F is right-continuous, so we may let $\delta \downarrow 0$ to obtain

(b) $$\limsup_n F_n(x) \le F(x), \quad \forall x \in \mathbb{R}.$$

In similar fashion, working with $y \mapsto h(y + \delta)$, we find that for $x \in \mathbb{R}$ and $\delta > 0$,

$$\liminf_n F_n(x-) \ge F(x - \delta),$$

so that

(c) $$\liminf_n F_n(x-) \ge F(x-), \quad \forall x \in \mathbb{R}.$$

Inequalities (b) and (c) refine the desired result. □

In the next section, we obtain the 'if' part as a consequence of a nice representation.

17.3. Skorokhod representation

THEOREM

Suppose that $(F_n : n \in \mathbb{N})$ is a sequence of DFs on $\mathbb{R}$, that F is a DF on $\mathbb{R}$ and that $F_n(x) \to F(x)$ at every point x of continuity of F. Then there exists a probability triple $(\Omega, \mathcal{F}, P)$ carrying a sequence $(X_n)$ of RVs and also a RV X such that

$$F_n = F_{X_n}, \qquad F = F_X,$$

and

$$X_n \to X \quad\text{a.s.}$$

This is a kind of 'converse' to (17.1,d).

Proof. We simply use the construction in Section 3.12. Thus, take

$$(\Omega, \mathcal{F}, P) = ([0,1], \mathcal{B}[0,1], \text{Leb}),$$

define

$$X^+(\omega) := \inf\{z : F(z) > \omega\}, \qquad X^-(\omega) := \inf\{z : F(z) \ge \omega\},$$

and define $X_n^+$, $X_n^-$ similarly. We know from Section 3.12 that $X^+$ and $X^-$ have DF F and that $P(X^+ = X^-) = 1$.

Fix $\omega$. Let z be a non-atom of F with $z > X^+(\omega)$. Then $F(z) > \omega$, and hence, for large n, $F_n(z) > \omega$, so that $X_n^+(\omega) \le z$. So $\limsup_n X_n^+(\omega) \le z$. But (since non-atoms are dense), we can choose $z \downarrow X^+(\omega)$. Hence

$$\limsup X_n^+(\omega) \le X^+(\omega),$$

and, by similar arguments,

$$\liminf X_n^-(\omega) \ge X^-(\omega).$$

Since $X_n^- \le X_n^+$ and $P(X^+ = X^-) = 1$, the result follows.
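The construction in the proof is just the generalized inverse of F, and it is simple to compute by bisection. The sketch below implements X⁻(ω) = inf{z : F(z) ≥ ω} for a DF supplied as a Python function. The search bracket, the iteration count and the exponential family used as an example are assumptions made for illustration.

```python
import math


def x_minus(F, omega, lo=-1e6, hi=1e6, iterations=200):
    """Approximate X^-(omega) = inf{z : F(z) >= omega} by bisection.

    Assumes 0 < omega < 1, F non-decreasing, F(lo) < omega <= F(hi).
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if F(mid) >= omega:
            hi = mid  # mid belongs to the set {z : F(z) >= omega}
        else:
            lo = mid
    return hi


def exponential_df(rate):
    """DF of the exponential distribution with the given rate."""
    def F(z):
        return 1.0 - math.exp(-rate * z) if z > 0 else 0.0
    return F
```

If F_n is the exponential DF of rate 1 + 1/n and F that of rate 1, then F_n → F pointwise, and for each fixed ω the values `x_minus(exponential_df(1 + 1/n), omega)` approach `x_minus(exponential_df(1.0), omega)`: the same ω drives all the random variables, which is what produces almost sure convergence.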

17.4. Sequential compactness for Prob($\bar{\mathbb{R}}$)

There is a problem in working with the non-compact space $\mathbb{R}$. Let $\mu_n$ be the unit mass at n. No subsequence of $(\mu_n)$ converges weakly in Prob($\mathbb{R}$), but $\mu_n \to \mu_\infty$ in Prob($\bar{\mathbb{R}}$), where $\mu_\infty$ is the unit mass at $\infty$. Here $\bar{\mathbb{R}}$ is the compact metrizable space $[-\infty, \infty]$, the definition of Prob($\bar{\mathbb{R}}$) is obvious, and $\mu_n \to \mu_\infty$ in Prob($\bar{\mathbb{R}}$) means that

$$\mu_n(h) \to \mu_\infty(h), \quad \forall h \in C(\bar{\mathbb{R}}).$$

(We do not need the subscript 'b' on $C(\bar{\mathbb{R}})$ because elements of $C(\bar{\mathbb{R}})$ are bounded.) It is important to keep remembering that while functions in $C(\bar{\mathbb{R}})$ tend to limits at $+\infty$ and $-\infty$, functions in $C_b(\mathbb{R})$ need not. The space $C(\bar{\mathbb{R}})$ is separable (it has a countable dense subset) while the space $C_b(\mathbb{R})$ is not.

Let me briefly describe how one should think of the next topic. Here I assume some functional analysis, but from the next paragraph (not the next section) on, I resort to bare-hands elementary treatment. By the Riesz representation theorem, the dual space $C(\bar{\mathbb{R}})^*$ of $C(\bar{\mathbb{R}})$ is the space of bounded signed measures on $(\bar{\mathbb{R}}, \mathcal{B}(\bar{\mathbb{R}}))$. The weak* topology of $C(\bar{\mathbb{R}})^*$ is metrizable (because $C(\bar{\mathbb{R}})$ is separable), and under this topology the unit ball of $C(\bar{\mathbb{R}})^*$ is compact and contains Prob($\bar{\mathbb{R}}$) as a closed subset. The weak* topology of Prob($\bar{\mathbb{R}}$) is exactly the probabilists' weak topology, so

(a) Prob($\bar{\mathbb{R}}$) is a compact metrizable space under our probabilists' weak topology.

The bare-hands substitute for result (a) is the following.

LEMMA (Helly-Bray)

(b) Let $(F_n)$ be any sequence of distribution functions on $\mathbb{R}$. Then there exist a right-continuous non-decreasing function F on $\mathbb{R}$ such that $0 \le F \le 1$ and a subsequence $(n_i)$ such that

(*) $\lim_i F_{n_i}(x) = F(x)$ at every point x of continuity of F.

Proof. We make an obvious use of 'the diagonal principle'. Take a countable dense set C of $\mathbb{R}$ and label it:

$$C = \{c_1, c_2, c_3, \ldots\}.$$

Since $(F_n(c_1) : n \in \mathbb{N})$ is a bounded sequence, it contains a convergent subsequence $(F_{n(1,j)}(c_1))$:

$$F_{n(1,j)}(c_1) \to H(c_1) \text{ (say) as } j \to \infty.$$

In some subsequence of this subsequence, we shall have

$$F_{n(2,j)}(c_2) \to H(c_2) \text{ as } j \to \infty;$$

and so on. If we put $n_i = n(i,i)$, then we shall have:

$$H(c) := \lim_i F_{n_i}(c) \text{ exists for every } c \text{ in } C.$$

Obviously, $0 \le H \le 1$, and H is non-decreasing on C. For $x \in \mathbb{R}$, define

$$F(x) := \lim_{c \downarrow\downarrow x} H(c),$$

the '$\downarrow\downarrow$' signifying that c decreases strictly to x through C. (In particular, F(c) need not equal H(c) for c in C.) Our function F is right-continuous, as you can check. By the 'limsupery' of Sections 17.2 and 17.3, you can also check for yourself that (*) holds: I wouldn't dream of depriving you of that pleasure.

17.5. Tightness

Of course, the function F in the Helly-Bray Lemma 17.4(b) need not be a distribution function. It will be a distribution function if and only if

$$\lim_{x \downarrow -\infty} F(x) = 0, \qquad \lim_{x \uparrow +\infty} F(x) = 1.$$

Definition

A sequence $(F_n)$ of distribution functions is called tight if, given $\varepsilon > 0$, there exists $K > 0$ such that

$$\mu_n[-K, K] = F_n(K) - F_n(-K-) > 1 - \varepsilon, \quad \forall n.$$

You can see the idea: 'tightness stops mass being pushed out to $+\infty$ or $-\infty$'.

LEMMA

Suppose that $(F_n)$ is a sequence of DFs.

(a) If $F_n \xrightarrow{w} F$ for some DF F, then $(F_n)$ is tight.

(b) If $(F_n)$ is tight, then there exists a subsequence $(F_{n_i})$ and a DF F such that $F_{n_i} \xrightarrow{w} F$.

This is a really easy limsupery exercise.

Chapter 18

The Central Limit Theorem

The Central Limit Theorem (CLT) is one of the great results of mathematics. Here we derive it as a corollary of Lévy's Convergence Theorem which says that weak convergence of DFs corresponds exactly to pointwise convergence of CFs.

18.1. Lévy's Convergence Theorem

Let $(F_n)$ be a sequence of DFs, and let $\varphi_n$ denote the CF of $F_n$. Suppose that

$$g(\theta) := \lim \varphi_n(\theta) \text{ exists for all } \theta \in \mathbb{R},$$

and that $g(\cdot)$ is continuous at 0. Then $g = \varphi_F$ for some distribution function F, and

$$F_n \xrightarrow{w} F.$$

Proof. Assume for the moment that

(a) the sequence $(F_n)$ is tight.

Then, by the Helly-Bray result 17.5(b), we can find a subsequence $(F_{n_k})$ and a DF F such that

$$F_{n_k} \xrightarrow{w} F.$$

But then, for $\theta \in \mathbb{R}$, we have

$$\varphi_{n_k}(\theta) \to \varphi_F(\theta) \qquad (\varphi_{n_k} : \text{CF of } F_{n_k})$$

(take $h(x) = e^{i\theta x}$). Thus $g = \varphi_F$.

Now we argue by contradiction. Suppose that $(F_n)$ does not converge weakly to F. Then, for some point x of continuity of F, we can find a subsequence which we shall denote by $(\tilde F_n)$ and an $\eta > 0$ such that

(*) $$|\tilde F_n(x) - F(x)| \ge \eta, \quad \forall n.$$

But $(\tilde F_n)$ is tight, so that we can find a subsequence $\tilde F_{n_j}$ and a DF $\tilde F$ such that

$$\tilde F_{n_j} \xrightarrow{w} \tilde F.$$

But then $\tilde\varphi_{n_j} \to \varphi_{\tilde F}$, so that $\varphi_{\tilde F} = g = \varphi_F$. Since a CF determines the corresponding DF uniquely, we see that $\tilde F = F$, so that, in particular, x is a non-atom of $\tilde F$ and

(**) $$\tilde F_{n_j}(x) \to \tilde F(x) = F(x).$$

The contradiction between (*) and (**) clinches the result.

We must now prove (a).

Proof of tightness of $(F_n)$. Let $\varepsilon > 0$ be given. Since the expression

$$\varphi_n(\theta) + \varphi_n(-\theta) = \int_{\mathbb{R}} 2\cos(\theta x)\,dF_n(x)$$

is real, it follows that $g(\theta) + g(-\theta)$ is real (and obviously bounded above by 2).

Since g is continuous at 0 and equal to 1 at 0, we can choose $\delta > 0$ such that

$$|1 - g(\theta)| < \tfrac14\varepsilon \quad\text{when } |\theta| < \delta.$$

We now have

$$0 \le \delta^{-1}\int_0^\delta \{2 - g(\theta) - g(-\theta)\}\,d\theta \le \tfrac12\varepsilon.$$

Since $g = \lim \varphi_n$, the Bounded Convergence Theorem for the finite interval $[0, \delta]$ shows that there exists $n_0$ in $\mathbb{N}$ such that for $n \ge n_0$,

$$\delta^{-1}\int_0^\delta \{2 - \varphi_n(\theta) - \varphi_n(-\theta)\}\,d\theta \le \varepsilon.$$

However,

$$\delta^{-1}\int_0^\delta \{2 - \varphi_n(\theta) - \varphi_n(-\theta)\}\,d\theta = \delta^{-1}\int_{-\delta}^{\delta}\Big\{\int_{\mathbb{R}}(1 - e^{i\theta x})\,dF_n(x)\Big\}d\theta = \int_{\mathbb{R}}\Big\{\delta^{-1}\int_{-\delta}^{\delta}(1 - e^{i\theta x})\,d\theta\Big\}dF_n(x),$$

the interchange of order of integration being justified by the fact that since $|1 - e^{i\theta x}| \le 2$, 'the integral of the absolute value' is clearly finite. We now have, for $n \ge n_0$,

$$\varepsilon \ge 2\int_{\mathbb{R}}\Big(1 - \frac{\sin\delta x}{\delta x}\Big)dF_n(x) \ge 2\int_{|x| > 2\delta^{-1}}\Big(1 - \frac{1}{|\delta x|}\Big)dF_n(x) \ge \int_{|x| > 2\delta^{-1}} dF_n = \mu_n\{x : |x| > 2\delta^{-1}\},$$

and it is now evident that the sequence $(F_n)$ is tight.

If you now re-read Section 16.4, you will realize that the next task is to obtain 'Taylor' estimates on characteristic functions.

18.2. o and O notation

Recall that

$$f(t) = O(g(t)) \text{ as } t \to L$$

means that

$$\limsup_{t \to L}|f(t)/g(t)| < \infty,$$

and that

$$f(t) = o(g(t)) \text{ as } t \to L$$

means that

$$f(t)/g(t) \to 0 \text{ as } t \to L.$$

18.3. Some important estimates

For $n = 0, 1, 2, \ldots$ and x real, define the 'remainder'

$$R_n(x) := e^{ix} - \sum_{k=0}^{n}\frac{(ix)^k}{k!}.$$

Then

$$R_0(x) = e^{ix} - 1 = \int_0^x ie^{iy}\,dy,$$

and from these two expressions we see that

$$|R_0(x)| \le \min(2, |x|).$$

Since

$$R_n(x) = \int_0^x iR_{n-1}(y)\,dy,$$

we obtain by induction:

$$|R_n(x)| \le \min\Big(\frac{2|x|^n}{n!}, \frac{|x|^{n+1}}{(n+1)!}\Big).$$

Suppose now that X is a zero-mean RV in $\mathcal{L}^2$:

$$E(X) = 0, \qquad \sigma^2 := \mathrm{Var}(X) < \infty.$$

Then, with $\varphi$ denoting $\varphi_X$, we have

(a) $$|\varphi(\theta) - (1 - \tfrac12\sigma^2\theta^2)| = |ER_2(\theta X)| \le E|R_2(\theta X)| \le \theta^2 E\Big(|X|^2 \wedge \frac{|\theta||X|^3}{6}\Big).$$

The final term within $E(\cdot)$ is dominated by the integrable RV $|X|^2$ and tends to 0 as $\theta \to 0$. Hence, by (DOM), we have

(b) $$\varphi(\theta) = 1 - \tfrac12\sigma^2\theta^2 + o(\theta^2) \quad\text{as } \theta \to 0.$$

Next, for $|z| < \tfrac12$, and with principal values for logs,

$$\log(1 + z) - z = \int_0^z \frac{-w}{1 + w}\,dw = -z^2\int_0^1 \frac{t\,dt}{1 + tz},$$

and since $|1 + tz| \ge \tfrac12$ we have

(c) $$|\log(1 + z) - z| \le |z|^2, \quad |z| \le \tfrac12.$$
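The remainder estimate and the logarithm estimate (c) are easy to probe numerically. The following sketch evaluates R_n(x) directly from its definition and compares it with the bound min(2|x|^n/n!, |x|^(n+1)/(n+1)!); the function names and the grid of test points are assumptions of the illustration, and a numerical spot check of this kind is of course no substitute for the induction.

```python
import cmath
import math


def remainder(n, x):
    """R_n(x) = exp(ix) - sum over k = 0..n of (ix)^k / k!."""
    partial = sum((1j * x) ** k / math.factorial(k) for k in range(n + 1))
    return cmath.exp(1j * x) - partial


def remainder_bound(n, x):
    """min(2|x|^n / n!, |x|^(n+1) / (n+1)!)."""
    ax = abs(x)
    return min(2.0 * ax ** n / math.factorial(n),
               ax ** (n + 1) / math.factorial(n + 1))


def log_gap(z):
    """|log(1 + z) - z| with the principal value of the logarithm."""
    return abs(cmath.log(1.0 + z) - z)
```

The first term of the minimum is the better bound for large |x| and the second for small |x|, which is exactly what makes the domination argument leading to (b) work.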

,.(18.5)

Chapter 18:

The

Central

Limit

Theorem

189

18.4. The Central Limit Theorem

Let $(X_n)$ be an IID sequence, each $X_n$ distributed as $X$ where

$$E(X) = 0, \qquad \sigma^2 := \operatorname{Var}(X) < \infty.$$

Define $S_n := X_1 + \cdots + X_n$, and set

$$G_n := \frac{S_n}{\sigma\sqrt{n}}.$$

Then, for $x \in \mathbb{R}$, we have, as $n \to \infty$,

$$P(G_n \le x) \to \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp(-\tfrac{1}{2}y^2)\, dy.$$

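Proof. Fix $\theta$ in $\mathbb{R}$. Then, using (18.3,b), we have

$$\varphi_{G_n}(\theta) = \varphi_X\left( \frac{\theta}{\sigma\sqrt{n}} \right)^{n} = \left\{ 1 - \frac{1}{2}\frac{\theta^2}{n} + o\left(\frac{1}{n}\right) \right\}^{n},$$

the '$o$' now referring to the situation when $n \to \infty$. But now, using (18.3,c), as $n \to \infty$,

$$\log \varphi_{G_n}(\theta) = n \log\left\{ 1 - \frac{1}{2}\frac{\theta^2}{n} + o\left(\frac{1}{n}\right) \right\} = n\left\{ -\frac{1}{2}\frac{\theta^2}{n} + o\left(\frac{1}{n}\right) \right\} \to -\tfrac{1}{2}\theta^2.$$

Hence $\varphi_{G_n}(\theta) \to \exp(-\tfrac{1}{2}\theta^2)$, and since $\theta \mapsto \exp(-\tfrac{1}{2}\theta^2)$ is the CF of the normal distribution, the result follows from Theorem 18.1. □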
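The convergence of characteristic functions used in this proof can be observed numerically in a simple special case. The sketch below assumes (purely for illustration) that $X = \pm 1$ with probability $\tfrac{1}{2}$ each, so that $\varphi_X(t) = \cos t$, $\sigma = 1$ and $\varphi_{G_n}(\theta) = \cos(\theta/\sqrt{n})^n$; the function names are hypothetical.

```python
import math


def cf_normalised_sum(theta, n):
    """CF of G_n = S_n / sqrt(n) for X = +1 or -1 with probability 1/2 each.

    In this special case phi_X(t) = cos(t) and sigma = 1.
    """
    return math.cos(theta / math.sqrt(n)) ** n


def cf_standard_normal(theta):
    """CF of the standard normal distribution."""
    return math.exp(-0.5 * theta * theta)


def gap(theta, n):
    """Distance between the CF of G_n and the limiting normal CF at theta."""
    return abs(cf_normalised_sum(theta, n) - cf_standard_normal(theta))
```

For fixed `theta`, `gap(theta, n)` should shrink as `n` grows, in line with $\varphi_{G_n}(\theta) \to \exp(-\tfrac{1}{2}\theta^2)$.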

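18.5. Example

Let us look at a simple example which shows how the method may be adapted to deal with a sequence of independent but non-IID RVs. With the Record Problem E4.3 in mind, suppose that on some $(\Omega, \mathcal{F}, P)$, $E_1, E_2, \ldots$ are independent events with $P(E_n) = 1/n$. Define

$$N_n := I_{E_1} + I_{E_2} + \cdots + I_{E_n},$$

the 'number of records by time $n$' in the record context. Then

$$E(N_n) = \sum_{k \le n} \frac{1}{k} = \log n + \gamma + o(1) \qquad (\gamma \text{ is Euler's constant}),$$

$$\operatorname{Var}(N_n) = \sum_{k \le n} \frac{1}{k}\left( 1 - \frac{1}{k} \right) = \log n + \gamma - \frac{\pi^2}{6} + o(1).$$

Let

$$G_n := \frac{N_n - \log n}{\sqrt{\log n}},$$

so that $E(G_n) \to 0$, $\operatorname{Var}(G_n) \to 1$. Then, for fixed $\theta$ in $\mathbb{R}$,

$$\varphi_{G_n}(\theta) = \exp(-i\theta\sqrt{\log n})\, \varphi_{N_n}(\theta/\sqrt{\log n}).$$

But, writing $X_k := I_{E_k}$,

$$\varphi_{N_n}(t) = \prod_{k=1}^{n} \varphi_{X_k}(t) = \prod_{k=1}^{n} \left\{ 1 - \frac{1}{k} + \frac{1}{k} e^{it} \right\}.$$

We see that as $n \to \infty$, and with $t := \theta/\sqrt{\log n}$,

$$\begin{aligned}
\log \varphi_{G_n}(\theta) &= -i\theta\sqrt{\log n} + \sum_{k=1}^{n} \log\left\{ 1 + \frac{1}{k}\left( it - \tfrac{1}{2}t^2 + o(t^2) \right) \right\} \\
&= -i\theta\sqrt{\log n} + \sum_{k=1}^{n} \left\{ \frac{1}{k}\left( it - \tfrac{1}{2}t^2 + o(t^2) \right) + O\left( \frac{t^2}{k^2} \right) \right\} \\
&= -i\theta\sqrt{\log n} + \left( it - \tfrac{1}{2}t^2 + o(t^2) \right)[\log n + O(1)] + t^2 O(1) \\
&= -\tfrac{1}{2}\theta^2 + o(1) \to -\tfrac{1}{2}\theta^2.
\end{aligned}$$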
Hence $P(G_n \le x) \to \Phi(x)$, $x \in \mathbb{R}$. □

See Hall and Heyde (1980) for some very general limit theorems.

18.6. CF proof of Lemma 12.4

Lemma 12.4 gave us the 'only if' part of the Three-Series Theorem 12.5. Its statement was as follows.

LEMMA

Suppose that $(X_n)$ is a sequence of independent random variables bounded by a constant $K$ in $[0, \infty)$:

$$|X_n(\omega)| \le K, \qquad \forall n, \forall \omega.$$

Then

$$\left( \sum X_n \text{ converges, a.s.} \right) \Rightarrow \left( \sum E(X_n) \text{ converges and } \sum \operatorname{Var}(X_n) < \infty \right).$$

The proof given in Section 12.4 was rather sophisticated.

Proof using characteristic functions. First, note that, as a consequence of estimate (18.3,a), if $Z$ is a RV such that for some constant $K_1$,

$$|Z| \le K_1, \qquad E(Z) = 0, \qquad \sigma^2 := \operatorname{Var}(Z) < \infty,$$

then for $|\theta| < K_1^{-1}$, we have

$$|\varphi_Z(\theta)| \le 1 - \tfrac{1}{2}\sigma^2\theta^2 + \tfrac{1}{6}|\theta|^3 K_1 \sigma^2 \le 1 - \tfrac{1}{2}\sigma^2\theta^2 + \tfrac{1}{6}\sigma^2\theta^2 = 1 - \tfrac{1}{3}\sigma^2\theta^2 \le \exp(-\tfrac{1}{3}\sigma^2\theta^2).$$

Now take $Z_n := X_n - E(X_n)$. Then

$$E(Z_n) = 0, \qquad \operatorname{Var}(Z_n) = \operatorname{Var}(X_n),$$

$$|\varphi_{Z_n}(\theta)| = |\exp\{-i\theta E(X_n)\}\varphi_{X_n}(\theta)| = |\varphi_{X_n}(\theta)|,$$

and $|Z_n| \le 2K$.

If $\sum \operatorname{Var}(X_n) = \infty$, then, for $0 < |\theta| < (2K)^{-1}$, we shall have

$$\prod_k |\varphi_{X_k}(\theta)| = \prod_k |\varphi_{Z_k}(\theta)| \le \exp\left\{ -\tfrac{1}{3}\theta^2 \sum_k \operatorname{Var}(X_k) \right\} = 0.$$

However, if $\sum X_k$ converges a.s. to $S$, then, by (DOM),

$$\prod_{k \le n} \varphi_{X_k}(\theta) = E \exp(i\theta S_n) \to \varphi_S(\theta),$$

and $\varphi_S(\theta)$ is continuous in $\theta$ with $\varphi_S(0) = 1$. We have a contradiction.

Hence $\sum \operatorname{Var}(X_n) = \sum \operatorname{Var}(Z_n) < \infty$, and, since $E(Z_n) = 0$, Theorem 12.2(a) shows that $\sum Z_n$ converges a.s. Hence

$$\sum E(X_n) = \sum \{X_n - Z_n\}$$

converges a.s., and since it is a deterministic sum, it converges! This last part of the argument was used in Section 12.4.

APPENDICES

Chapter A1

Appendix to Chapter 1

A1.1. A non-measurable subset $A$ of $S^1$

In the spirit of Banach and Tarski, although, of course, this relatively trivial example pre-dates theirs, we use the Axiom of Choice to show that

(a) $$S^1 = \bigcup_{q \in \mathbb{Q}} A_q,$$

where the $A_q$ are disjoint sets, each of which may be obtained from any of the others by rotation. If the set $A = A_0$ has a 'length' then it is intuitively clear that result (a) would force

$$2\pi = \infty \times \text{length}(A),$$

an impossibility.

To construct the family $(A_q : q \in \mathbb{Q})$, proceed as follows. Regard $S^1$ as $\{e^{i\theta} : \theta \in \mathbb{R}\}$ inside $\mathbb{C}$. Define an equivalence relation $\sim$ on $S^1$ by writing $z \sim w$ if there exist $\alpha$ and $\beta$ in $\mathbb{R}$ such that

$$z = e^{i\alpha}, \qquad w = e^{i\beta}, \qquad \alpha - \beta \in \mathbb{Q}.$$

Use the Axiom of Choice to produce a set $A$ which has precisely one representative of each equivalence class. Define

$$A_q = e^{iq}A = \{e^{iq}z : z \in A\}.$$

Then the family $(A_q : q \in \mathbb{Q})$ has the desired properties. (Obviously, $\mathbb{Q}$ could be replaced by $\mathbb{Z}$ throughout the above argument.)

We do not bring this example to its fully rigorous conclusion. The remainder of this appendix is fully rigorous.

We now set out to prove Uniqueness Lemma 1.6.

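A1.2. $d$-systems.

Let $S$ be a set, and let $\mathcal{D}$ be a collection of subsets of $S$. Then $\mathcal{D}$ is called a $d$-system (on $S$) if

(a) $S \in \mathcal{D}$,

(b) if $A, B \in \mathcal{D}$ and $A \subseteq B$ then $B \setminus A \in \mathcal{D}$,

(c) if $A_n \in \mathcal{D}$ and $A_n \uparrow A$, then $A \in \mathcal{D}$.

Recall that $A_n \uparrow A$ means: $A_n \subseteq A_{n+1}$ $(\forall n)$ and $\bigcup A_n = A$.

(d) Proposition. A collection $\Sigma$ of subsets of $S$ is a $\sigma$-algebra if and only if $\Sigma$ is both a $\pi$-system and a $d$-system.

Proof. The 'only if' part is trivial, so we prove only the 'if' part.

Suppose that $\Sigma$ is both a $\pi$-system and a $d$-system, and that $E$, $F$ and $E_n$ $(n \in \mathbb{N})$ are in $\Sigma$. Then $E^c := S \setminus E \in \Sigma$, and

$$E \cup F = S \setminus (E^c \cap F^c) \in \Sigma.$$

Hence $G_n := E_1 \cup \ldots \cup E_n \in \Sigma$ and, since $G_n \uparrow \bigcup_k E_k$, we see that $\bigcup_k E_k \in \Sigma$. Finally,

$$\bigcap_k E_k = \left( \bigcup_k E_k^c \right)^c \in \Sigma.$$ □

Definition of $d(\mathcal{C})$. Suppose that $\mathcal{C}$ is a class of subsets of $S$. We define $d(\mathcal{C})$ to be the intersection of all $d$-systems which contain $\mathcal{C}$. Obviously, $d(\mathcal{C})$ is a $d$-system, the smallest $d$-system which contains $\mathcal{C}$. It is also obvious that

$$d(\mathcal{C}) \subseteq \sigma(\mathcal{C}).$$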

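A1.3. Dynkin's Lemma

If $\mathcal{I}$ is a $\pi$-system, then

$$d(\mathcal{I}) = \sigma(\mathcal{I}).$$

Thus any $d$-system which contains a $\pi$-system contains the $\sigma$-algebra generated by that $\pi$-system.

Proof. Because of Proposition A1.2(d), we need only prove that $d(\mathcal{I})$ is a $\pi$-system.

Step 1: Let $\mathcal{D}_1 := \{B \in d(\mathcal{I}) : B \cap C \in d(\mathcal{I}), \forall C \in \mathcal{I}\}$. Because $\mathcal{I}$ is a $\pi$-system, $\mathcal{D}_1 \supseteq \mathcal{I}$. It is easily checked that $\mathcal{D}_1$ inherits the $d$-system structure from $d(\mathcal{I})$. [For, clearly, $S \in \mathcal{D}_1$. Next, if $B_1, B_2 \in \mathcal{D}_1$ and $B_1 \subseteq B_2$, then, for $C$ in $\mathcal{I}$,

$$(B_2 \setminus B_1) \cap C = (B_2 \cap C) \setminus (B_1 \cap C);$$

and, since $B_2 \cap C \in d(\mathcal{I})$, $B_1 \cap C \in d(\mathcal{I})$ and $d(\mathcal{I})$ is a $d$-system, we see that $(B_2 \setminus B_1) \cap C \in d(\mathcal{I})$, so that $B_2 \setminus B_1 \in \mathcal{D}_1$. Finally, if $B_n \in \mathcal{D}_1$ $(n \in \mathbb{N})$ and $B_n \uparrow B$, then for $C \in \mathcal{I}$,

$$(B_n \cap C) \uparrow (B \cap C),$$

so that $B \cap C \in d(\mathcal{I})$ and $B \in \mathcal{D}_1$.] We have shown that $\mathcal{D}_1$ is a $d$-system which contains $\mathcal{I}$, so that (since $\mathcal{D}_1 \subseteq d(\mathcal{I})$ by its definition) $\mathcal{D}_1 = d(\mathcal{I})$.

Step 2: Let $\mathcal{D}_2 := \{A \in d(\mathcal{I}) : B \cap A \in d(\mathcal{I}), \forall B \in d(\mathcal{I})\}$. Step 1 showed that $\mathcal{D}_2$ contains $\mathcal{I}$. But, just as in Step 1, we can prove that $\mathcal{D}_2$ inherits the $d$-system structure from $d(\mathcal{I})$ and that therefore $\mathcal{D}_2 = d(\mathcal{I})$. But the fact that $\mathcal{D}_2 = d(\mathcal{I})$ says that $d(\mathcal{I})$ is a $\pi$-system. □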
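On a finite set, Dynkin's Lemma can be checked by brute force. The sketch below computes $d(\mathcal{I})$ and $\sigma(\mathcal{I})$ as closures under the defining operations; on a finite set, condition (c) for $d$-systems adds nothing, since every increasing sequence is eventually constant. All names and the example families are illustrative assumptions.

```python
def closure(sets, universe, step):
    """Smallest family containing `sets` and closed under the operation `step`."""
    family = set(sets)
    while True:
        new = step(family, universe) - family
        if not new:
            return family
        family |= new


def d_step(family, universe):
    """One round of d-system rules: (a) S belongs; (b) B \\ A whenever A is a subset of B."""
    out = {universe}
    for a in family:
        for b in family:
            if a <= b:
                out.add(b - a)
    return out


def sigma_step(family, universe):
    """One round of sigma-algebra rules on a finite set: S, complements, pairwise unions."""
    out = {universe}
    for a in family:
        out.add(universe - a)
        for b in family:
            out.add(a | b)
    return out


def is_pi_system(family):
    """True if the family is closed under pairwise intersection."""
    return all((a & b) in family for a in family for b in family)
```

For example, with `U = frozenset(range(4))`, the family `{frozenset({0, 1}), frozenset({1, 2}), frozenset({1})}` is a $\pi$-system and the two closures coincide, whereas dropping `frozenset({1})` destroys the $\pi$-system property and the $d$-system closure becomes strictly smaller than the $\sigma$-algebra.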

A1.4. Proof of Uniqueness Lemma 1.6

Recall what the crucial Lemma 1.6 stated:

Let $S$ be a set. Let $\mathcal{I}$ be a $\pi$-system on $S$, and let $\Sigma := \sigma(\mathcal{I})$. Suppose that $\mu_1$ and $\mu_2$ are measures on $(S, \Sigma)$ such that $\mu_1(S) = \mu_2(S) < \infty$ and $\mu_1 = \mu_2$ on $\mathcal{I}$. Then $\mu_1 = \mu_2$ on $\Sigma$.

Proof. Let

$$\mathcal{D} = \{F \in \Sigma : \mu_1(F) = \mu_2(F)\}.$$

Then $\mathcal{D}$ is a $d$-system on $S$. [Indeed, the fact that $S \in \mathcal{D}$ is given. If $A, B \in \mathcal{D}$ and $A \subseteq B$, then

(*) $$\mu_1(B \setminus A) = \mu_1(B) - \mu_1(A) = \mu_2(B) - \mu_2(A) = \mu_2(B \setminus A),$$

so that $B \setminus A \in \mathcal{D}$. Finally, if $F_n \in \mathcal{D}$ and $F_n \uparrow F$, then by Lemma 1.10(a),

$$\mu_1(F) = \uparrow\lim \mu_1(F_n) = \uparrow\lim \mu_2(F_n) = \mu_2(F),$$

so that $F \in \mathcal{D}$.]

Since $\mathcal{D}$ is a $d$-system and $\mathcal{D} \supseteq \mathcal{I}$ by hypothesis, Dynkin's Lemma shows that $\mathcal{D} \supseteq \sigma(\mathcal{I}) = \Sigma$, and the result follows. □

Notes. You should check that no circular argument is entailed by the use of Lemma 1.10 (this is obvious).

The reason for the insistence on finiteness in the condition $\mu_1(S) = \mu_2(S) < \infty$ is that we do not wish to try to claim at (*) that

$$\infty - \infty = \infty - \infty.$$

Indeed the Lemma 1.6 is false if '$< \infty$' is omitted; see Section A1.10 below.

We now aim to prove Caratheodory's Theorem 1.7.

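A1.5. $\lambda$-sets: 'algebra' case

LEMMA

Let $\mathcal{G}_0$ be an algebra of subsets of $S$ and let

$$\lambda : \mathcal{G}_0 \to [0, \infty]$$

with $\lambda(\emptyset) = 0$. Call an element $L$ of $\mathcal{G}_0$ a $\lambda$-set if $L$ 'splits every element of $\mathcal{G}_0$ properly':

$$\lambda(L \cap G) + \lambda(L^c \cap G) = \lambda(G), \qquad \forall G \in \mathcal{G}_0.$$

Then the class $\mathcal{L}_0$ of $\lambda$-sets is an algebra, and $\lambda$ is finitely additive on $\mathcal{L}_0$. Moreover, for disjoint $L_1, L_2, \ldots, L_n \in \mathcal{L}_0$ and $G$ in $\mathcal{G}_0$,

$$\lambda\left( \bigcup_{k=1}^{n} (L_k \cap G) \right) = \sum_{k=1}^{n} \lambda(L_k \cap G).$$

Proof. Step 1: Let $L_1$ and $L_2$ be $\lambda$-sets, and let $L = L_1 \cap L_2$. We wish to prove that $L$ is a $\lambda$-set.

Now $L^c \cap L_2 = L_2 \cap L_1^c$ and $L^c \cap L_2^c = L_2^c$. Hence, since $L_2$ is a $\lambda$-set, we have, for any $G$ in $\mathcal{G}_0$,

$$\lambda(L^c \cap G) = \lambda(L_2 \cap L_1^c \cap G) + \lambda(L_2^c \cap G)$$

and, of course,

$$\lambda(L_2^c \cap G) + \lambda(L_2 \cap G) = \lambda(G).$$

Since $L_1$ is a $\lambda$-set,

$$\lambda(L_2 \cap L_1^c \cap G) + \lambda(L \cap G) = \lambda(L_2 \cap G).$$

On adding the three equations just obtained, we see that

$$\lambda(L^c \cap G) + \lambda(L \cap G) = \lambda(G), \qquad \forall G \in \mathcal{G}_0,$$

so that $L$ is indeed a $\lambda$-set.

Step 2: Since, trivially, $S$ is a $\lambda$-set, and the complement of a $\lambda$-set is a $\lambda$-set, it now follows that $\mathcal{L}_0$ is an algebra.

Step 3: If $L_1$, $L_2$ are disjoint $\lambda$-sets and $G \in \mathcal{G}_0$, then

$$(L_1 \cup L_2) \cap L_1 = L_1, \qquad (L_1 \cup L_2) \cap L_1^c = L_2,$$

so, since $L_1$ is a $\lambda$-set,

$$\lambda((L_1 \cup L_2) \cap G) = \lambda(L_1 \cap G) + \lambda(L_2 \cap G).$$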
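The proof is now easily completed. □

A1.6. Outer measures

Let $\mathcal{G}$ be a $\sigma$-algebra of subsets of $S$. A map

$$\lambda : \mathcal{G} \to [0, \infty]$$

is called an outer measure on $(S, \mathcal{G})$ if

(a) $\lambda(\emptyset) = 0$;

(b) $\lambda$ is increasing: for $G_1, G_2 \in \mathcal{G}$ with $G_1 \subseteq G_2$,

$$\lambda(G_1) \le \lambda(G_2);$$

(c) $\lambda$ is countably subadditive: if $(G_k)$ is any sequence of elements of $\mathcal{G}$, then

$$\lambda\left( \bigcup_k G_k \right) \le \sum_k \lambda(G_k).$$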

A1.7. Caratheodory's Lemma.

Let $\lambda$ be an outer measure on the measurable space $(S, \mathcal{G})$. Then the $\lambda$-sets in $\mathcal{G}$ form a $\sigma$-algebra $\mathcal{L}$ on which $\lambda$ is countably additive, so that $(S, \mathcal{L}, \lambda)$ is a measure space.

Proof. Because of Lemma A1.5, we need only show that if $(L_k)$ is a disjoint sequence of sets in $\mathcal{L}$, then $L := \bigcup_k L_k \in \mathcal{L}$ and

(a) $$\lambda(L) = \sum_k \lambda(L_k).$$

By the subadditive property of $\lambda$, for $G \in \mathcal{G}$, we have

(b) $$\lambda(G) \le \lambda(L \cap G) + \lambda(L^c \cap G).$$

Now let $M_n := \bigcup_{k \le n} L_k$. Lemma A1.5 shows that $M_n \in \mathcal{L}$, so that

$$\lambda(G) = \lambda(M_n \cap G) + \lambda(M_n^c \cap G).$$

However, $M_n^c \supseteq L^c$, so that

(c) $$\lambda(G) \ge \lambda(M_n \cap G) + \lambda(L^c \cap G).$$

Lemma A1.5 now allows us to rewrite (c) as

$$\lambda(G) \ge \sum_{k \le n} \lambda(L_k \cap G) + \lambda(L^c \cap G),$$

so that

(d) $$\lambda(G) \ge \sum_k \lambda(L_k \cap G) + \lambda(L^c \cap G) \ge \lambda(L \cap G) + \lambda(L^c \cap G),$$

using the countably subadditive property of $\lambda$ in the last step. On comparing (d) with (b), we see that equality must hold throughout (d) and (b), so that $L \in \mathcal{L}$; and then on taking $G = L$ we obtain result (a). □
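The notion of a $\lambda$-set can be explored by brute force on a small finite example. The sketch below is an illustration under stated assumptions: $S = \{0, 1, 2, 3\}$, and $\lambda(G)$ is the total mass of those blocks of a hypothetical two-block partition (each of mass $\tfrac{1}{2}$) that $G$ meets, which is an outer measure on all subsets of $S$. It then lists every set that splits every $G$ properly.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(4))
ATOMS = [frozenset({0, 1}), frozenset({2, 3})]  # hypothetical partition of S
HALF = Fraction(1, 2)                           # mass assigned to each block


def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def outer(g):
    """lambda(G): total mass of the blocks that G meets (cheapest cover by blocks)."""
    return sum((HALF for a in ATOMS if a & g), Fraction(0))


def lambda_sets():
    """All L with lambda(L & G) + lambda(L^c & G) == lambda(G) for every G."""
    all_sets = subsets(S)
    return {l for l in all_sets
            if all(outer(l & g) + outer((S - l) & g) == outer(g)
                   for g in all_sets)}
```

In this toy case the $\lambda$-sets turn out to be exactly the unions of blocks, which form a $\sigma$-algebra on which $\lambda$ is additive, as the lemma predicts.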


A1.8. Proof of Caratheodory's Theorem.

Recall that we need to prove the following.

Let $S$ be a set, let $\Sigma_0$ be an algebra on $S$, and let

$$\Sigma := \sigma(\Sigma_0).$$

If $\mu_0$ is a countably additive map $\mu_0 : \Sigma_0 \to [0, \infty]$, then there exists a measure $\mu$ on $(S, \Sigma)$ such that

$$\mu = \mu_0 \text{ on } \Sigma_0.$$

Proof. Step 1: Let $\mathcal{G}$ be the $\sigma$-algebra of all subsets of $S$. For $G \in \mathcal{G}$, define

$$\lambda(G) := \inf \sum_n \mu_0(F_n),$$

where the infimum is taken over all sequences $(F_n)$ in $\Sigma_0$ with $G \subseteq \bigcup_n F_n$.

We now prove that

(a) $\lambda$ is an outer measure on $(S, \mathcal{G})$.

The facts that $\lambda(\emptyset) = 0$ and $\lambda$ is increasing are obvious. Suppose that $(G_n)$ is a sequence in $\mathcal{G}$, such that each $\lambda(G_n)$ is finite. Let $\varepsilon > 0$ be given. For each $n$, choose a sequence $(F_{n,k} : k \in \mathbb{N})$ of elements of $\Sigma_0$ such that

$$G_n \subseteq \bigcup_k F_{n,k}, \qquad \sum_k \mu_0(F_{n,k}) < \lambda(G_n) + \varepsilon 2^{-n}.$$

Then $G := \bigcup G_n \subseteq \bigcup_n \bigcup_k F_{n,k}$, so that

$$\lambda(G) \le \sum_n \sum_k \mu_0(F_{n,k}) < \sum_n \lambda(G_n) + \varepsilon.$$

Since $\varepsilon$ is arbitrary, we have proved result (a).

Step 2: By Caratheodory's Lemma A1.7, $\lambda$ is a measure on $(S, \mathcal{L})$, where $\mathcal{L}$ is the $\sigma$-algebra of $\lambda$-sets in $\mathcal{G}$. All we need show is that

(b) $\Sigma_0 \subseteq \mathcal{L}$, and $\lambda = \mu_0$ on $\Sigma_0$;

for then $\Sigma := \sigma(\Sigma_0) \subseteq \mathcal{L}$ and we can define $\mu$ to be the restriction of $\lambda$ to $(S, \Sigma)$.

Step 3: Proof that $\lambda = \mu_0$ on $\Sigma_0$.

Let $F \in \Sigma_0$. Then, clearly, $\lambda(F) \le \mu_0(F)$. Now suppose that $F \subseteq \bigcup_n F_n$, where $F_n \in \Sigma_0$. As usual, we can define a sequence $(E_n)$ of disjoint sets:

$$E_1 := F_1, \qquad E_n := F_n \cap \left( \bigcup_{k<n} F_k \right)^c$$

such that $E_n \subseteq F_n$ and $\bigcup E_n = \bigcup F_n \supseteq F$. Then

$$\mu_0(F) = \mu_0\left( \bigcup_n (F \cap E_n) \right) = \sum_n \mu_0(F \cap E_n),$$

using the countable additivity of $\mu_0$ on $\Sigma_0$. Hence

$$\mu_0(F) \le \sum_n \mu_0(E_n) \le \sum_n \mu_0(F_n),$$

so that $\lambda(F) \ge \mu_0(F)$. Step 3 is complete.

Step 4: Proof that $\Sigma_0 \subseteq \mathcal{L}$. Let $E \in \Sigma_0$ and $G \in \mathcal{G}$. Then, for given $\varepsilon > 0$, there exists a sequence $(F_n)$ in $\Sigma_0$ such that $G \subseteq \bigcup_n F_n$, and

$$\sum_n \mu_0(F_n) \le \lambda(G) + \varepsilon.$$

Now, by definition of $\lambda$,

$$\sum_n \mu_0(F_n) = \sum_n \mu_0(E \cap F_n) + \sum_n \mu_0(E^c \cap F_n) \ge \lambda(E \cap G) + \lambda(E^c \cap G),$$

since $E \cap G \subseteq \bigcup (E \cap F_n)$ and $E^c \cap G \subseteq \bigcup (E^c \cap F_n)$. Thus, since $\varepsilon$ is arbitrary,

$$\lambda(G) \ge \lambda(E \cap G) + \lambda(E^c \cap G).$$

However, since $\lambda$ is subadditive,

$$\lambda(G) \le \lambda(E \cap G) + \lambda(E^c \cap G).$$

We see that $E$ is indeed a $\lambda$-set. □

A1.9. Proof of the existence of Lebesgue measure on $((0,1], \mathcal{B}(0,1])$.

Recall the set-up in Section 1.8. Let $S = (0,1]$. For $F \subseteq S$, say that $F \in \Sigma_0$ if $F$ may be written as a finite union

(*) $$F = (a_1, b_1] \cup \ldots \cup (a_r, b_r]$$

where $r \in \mathbb{N}$, $0 \le a_1 \le b_1 \le \ldots \le a_r \le b_r \le 1$. Then (as you should convince yourself) $\Sigma_0$ is an algebra on $(0,1]$ and

$$\Sigma := \sigma(\Sigma_0) = \mathcal{B}(0,1].$$

(We write $\mathcal{B}(0,1]$ rather than $\mathcal{B}((0,1])$.) For $F$ as at (*), let

$$\mu_0(F) = \sum_{k \le r} (b_k - a_k).$$

Of course, a set $F$ may have different expressions as a finite disjoint union of the form (*): for example,

$$(0,1] = (0, \tfrac{1}{2}] \cup (\tfrac{1}{2}, 1].$$

However, it is easily seen that $\mu_0$ is well defined on $\Sigma_0$ and that $\mu_0$ is finitely additive on $\Sigma_0$. While this is obvious from a picture, you might (or might not) wish to consider how to make the intuitive argument into a formal proof.

The key thing is to prove that $\mu_0$ is countably additive on $\Sigma_0$. So suppose that $(F_n)$ is a sequence of disjoint elements of $\Sigma_0$, with union $F$ in $\Sigma_0$. We know that if $G_n = \bigcup_{k=1}^{n} F_k$, then

$$\mu_0(G_n) = \sum_{k=1}^{n} \mu_0(F_k) \quad \text{and} \quad G_n \uparrow F.$$

To prove that $\mu_0$ is countably additive it is enough to show that $\mu_0(G_n) \uparrow \mu_0(F)$, for then

$$\mu_0(F) = \uparrow\lim \mu_0(G_n) = \uparrow\lim \sum_{k \le n} \mu_0(F_k) = \sum_k \mu_0(F_k).$$

Let $H_n = F \setminus G_n$. Then $H_n \in \Sigma_0$ and $H_n \downarrow \emptyset$. We need only prove that

$$\mu_0(H_n) \downarrow 0;$$

for then

$$\mu_0(G_n) = \mu_0(F) - \mu_0(H_n) \uparrow \mu_0(F).$$

It is clear that an alternative (and final!) rewording of what we need to show is the following:

(a) if $(H_n)$ is a decreasing sequence of elements of $\Sigma_0$ such that for some $\varepsilon > 0$,

$$\mu_0(H_n) \ge 2\varepsilon, \qquad \forall n,$$

then $\bigcap_k H_k \ne \emptyset$.

Proof of (a). It is obvious from the definition of $\Sigma_0$ that, for each $k \in \mathbb{N}$, we can choose $J_k \in \Sigma_0$ such that, with $\bar{J}_k$ denoting the closure of $J_k$,

$$\bar{J}_k \subseteq H_k \quad \text{and} \quad \mu_0(H_k \setminus J_k) \le \varepsilon 2^{-k}.$$

But then (recall that $H_n \downarrow$)

$$\mu_0\left( H_n \setminus \bigcap_{k \le n} J_k \right) \le \mu_0\left( \bigcup_{k \le n} (H_k \setminus J_k) \right) \le \sum_{k \le n} \varepsilon 2^{-k} < \varepsilon.$$

Hence, since $\mu_0(H_n) \ge 2\varepsilon$, $\forall n$, we see that for every $n$,

$$\mu_0\left( \bigcap_{k \le n} J_k \right) > \varepsilon,$$

and hence $\bigcap_{k \le n} J_k$ is non-empty. A fortiori then, for every $n$,

$$K_n := \bigcap_{k \le n} \bar{J}_k \text{ is non-empty.}$$

That

(b) $$\bigcap_k \bar{J}_k \ne \emptyset \quad \left( \text{whence } \bigcap_k H_k \ne \emptyset \right)$$

now follows from the Heine-Borel theorem: for if (b) is false, then $((\bar{J}_k)^c : k \in \mathbb{N})$ gives a covering of $[0,1]$ by open sets with no finite subcovering.

Alternatively, we can argue directly as follows. For each $n$, choose a point $x_n$ in the non-empty set $K_n$. Since each $x_n$ belongs to the compact set $\bar{J}_1$, we can find a subsequence $(n_q)$ and a point $x$ of $\bar{J}_1$ such that $x_{n_q} \to x$. However, for each $k$, $x_{n_q} \in \bar{J}_k$ for all but finitely many $q$, and since $\bar{J}_k$ is compact, it follows that $x \in \bar{J}_k$. Hence $x \in \bigcap_k \bar{J}_k$, and property (b) holds. □

Since $\mu_0$ is countably additive on $\Sigma_0$ and $\mu_0(0,1] < \infty$, it follows that $\mu_0$ has a unique extension to a measure $\mu$ on $((0,1], \mathcal{B}(0,1])$. This is Lebesgue measure Leb on $((0,1], \mathcal{B}(0,1])$. The $\lambda$-sets (for the outer measure $\lambda$ built from $\mu_0$ as in Section A1.8) form a $\sigma$-algebra strictly larger than $\mathcal{B}(0,1]$, namely the $\sigma$-algebra of Lebesgue measurable subsets of $(0,1]$. See Section A1.11.
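The claim that $\mu_0$ does not depend on the chosen representation of $F$ as a finite union of intervals $(a, b]$ can be illustrated computationally. The sketch below is only an illustration (the function name `mu0` and the merging strategy are assumptions): it first merges overlapping or abutting intervals, then adds up lengths, using exact rational arithmetic.

```python
from fractions import Fraction


def mu0(intervals):
    """Total length of a finite union of half-open intervals (a, b].

    Overlapping or abutting intervals are merged first, so the value depends
    only on the set represented, not on the particular representation.
    """
    merged = []
    for a, b in sorted((a, b) for a, b in intervals if a < b):
        if merged and a <= merged[-1][1]:
            # (a, b] overlaps or abuts the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], b)
        else:
            merged.append([a, b])
    return sum((b - a for a, b in merged), Fraction(0))
```

For instance, the representations $(0,1]$ and $(0,\tfrac{1}{2}] \cup (\tfrac{1}{2},1]$ yield the same value, and the value is additive over disjoint pieces.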

A1.10. Example of non-uniqueness of extension

With $(S, \Sigma_0)$ as in Section A1.9, suppose that for $F \in \Sigma_0$,

(a) $$\nu_0(F) := \begin{cases} 0 & \text{if } F = \emptyset, \\ \infty & \text{if } F \ne \emptyset. \end{cases}$$

The Caratheodory extension of $\nu_0$ will be obtained as the obvious extension of (a) to $\Sigma$. However, another extension $\nu$ is given by

$$\nu(F) = \text{number of elements in } F.$$

A1.11. Completion of a measure space

In fact (apart from an 'aside' on the Riemann integral), we do not need completions in this book.

Suppose that $(S, \Sigma, \mu)$ is a measure space. Define a class $\mathcal{N}$ of subsets of $S$ as follows:

$N \in \mathcal{N}$ if and only if $\exists Z \in \Sigma$ such that $N \subseteq Z$ and $\mu(Z) = 0$.

It is sometimes philosophically satisfying to be able to make precise the idea that '$N$ in $\mathcal{N}$ is $\mu$-measurable and $\mu(N) = 0$'. This is done as follows. For any subset $F$ of $S$, write $F \in \Sigma^*$ if $\exists E, G \in \Sigma$ such that $E \subseteq F \subseteq G$ and $\mu(G \setminus E) = 0$. It is very easy to show that $\Sigma^*$ is a $\sigma$-algebra on $S$ and indeed that $\Sigma^* = \sigma(\Sigma, \mathcal{N})$. With obvious notation we define for $F \in \Sigma^*$,

$$\mu^*(F) = \mu(E) = \mu(G),$$

it being easy to check that $\mu^*$ is well defined. Moreover, it is no problem to prove that $(S, \Sigma^*, \mu^*)$ is a measure space, the completion of $(S, \Sigma, \mu)$.

For parts of advanced probability, it is essential to complete the basic probability triple $(\Omega, \mathcal{F}, P)$. In other parts of probability, when (for example) $S$ is topological, $\Sigma = \mathcal{B}(S)$, and we wish to consider several different measures on $(S, \Sigma)$, it is meaningless to insist on completion.

If we begin with $([0,1], \mathcal{B}[0,1], \text{Leb})$, then $\mathcal{B}[0,1]^*$ is the $\sigma$-algebra of what are called Lebesgue-measurable sets of $[0,1]$. Then, for example, a function $f : [0,1] \to [0,1]$ is Lebesgue-measurable if the inverse image of every Borel set is Lebesgue-measurable: it need not be true that the inverse image of a Lebesgue-measurable set is Lebesgue-measurable.

A1.12. The Baire category theorem

In Section 1.11, we studied a subset $H$ of $S := [0,1]$ such that

(i) $H = \bigcap_k G_k$ for a sequence $(G_k)$ of open subsets of $S$,

(ii) $H \supseteq V$, where $V = \mathbb{Q} \cap S$.

If $H$ were countable: $H = \{h_r : r \in \mathbb{N}\}$, then we would have

(a) $$S = H \cup H^c = \left( \bigcup_r \{h_r\} \right) \cup \left( \bigcup_k G_k^c \right)$$

expressing $S$ as a countable union

(b) $$S = \bigcup_n F_n$$

of closed sets where no $F_n$ contains an open interval. [Since $G_k^c \subseteq V^c$ for every $k$, so that $G_k^c$ contains only irrational points in $S$.]

However, the Baire category theorem states that if a complete metric space $S$ may be written as a union of a countable sequence of closed sets:

$$S = \bigcup_n F_n,$$

then some $F_n$ contains an open ball.

Thus the set $H$ must be uncountable.

The Baire category theorem has fundamental applications in functional analysis, and some striking applications to probability too!

Proof of the Baire category theorem. Assume for the purposes of contradiction that no $F_n$ contains an open ball. Since $F_1^c$ is a non-empty open subset of $S$, we can find $x_1$ in $S$ and $\varepsilon_1 > 0$ such that

$$B(x_1, \varepsilon_1) \subseteq F_1^c,$$

$B(x_1, \varepsilon_1)$ denoting the open ball of radius $\varepsilon_1$ centred at $x_1$. Now $F_2$ contains no open ball, so that the open set

$$U_2 := B(x_1, 2^{-1}\varepsilon_1) \cap F_2^c$$

is non-empty, and we can find $x_2$ in $U_2$ and $\varepsilon_2 > 0$ such that

$$B(x_2, \varepsilon_2) \subseteq U_2, \qquad \varepsilon_2 < 2^{-1}\varepsilon_1.$$

Inductively, choose a sequence $(x_n)$ in $S$ and $(\varepsilon_n)$ in $(0, \infty)$ so that we have $\varepsilon_{n+1} < 2^{-1}\varepsilon_n$ and

$$B(x_{n+1}, \varepsilon_{n+1}) \subseteq U_{n+1} := B(x_n, 2^{-1}\varepsilon_n) \cap F_{n+1}^c.$$

Since $d(x_n, x_{n+1}) < 2^{-1}\varepsilon_n$, it is obvious from the triangle law that $(x_n)$ is Cauchy, so that $x := \lim x_n$ exists, and that

$$x \in \bigcap_n B(x_n, \varepsilon_n) \subseteq \bigcap_n F_n^c,$$

contradicting the fact that $\bigcup F_n = S$.

Chapter A3

Appendix to Chapter 3

A3.1. Proof of the Monotone-Class Theorem 3.14

Recall the statement of the theorem.

Let $\mathcal{H}$ be a class of bounded functions from a set $S$ into $\mathbb{R}$ satisfying the following conditions:

(i) $\mathcal{H}$ is a vector space over $\mathbb{R}$;

(ii) the constant function $1$ is an element of $\mathcal{H}$;

(iii) if $(f_n)$ is a sequence of non-negative functions in $\mathcal{H}$ such that $f_n \uparrow f$ where $f$ is a bounded function on $S$, then $f \in \mathcal{H}$.

Then if $\mathcal{H}$ contains the indicator function of every set in some $\pi$-system $\mathcal{I}$, then $\mathcal{H}$ contains every bounded $\sigma(\mathcal{I})$-measurable function on $S$.

Proof. Let $\mathcal{D}$ be the class of sets $F$ in $S$ such that $I_F \in \mathcal{H}$. It is immediate from (i) - (iii) that $\mathcal{D}$ is a $d$-system. Since $\mathcal{D}$ contains the $\pi$-system $\mathcal{I}$, $\mathcal{D}$ contains $\sigma(\mathcal{I})$.

Suppose that $f$ is a $\sigma(\mathcal{I})$-measurable function such that for some $K$ in $\mathbb{N}$,

$$0 \le f(s) \le K, \qquad \forall s \in S.$$

For $n \in \mathbb{N}$, define

$$f_n(s) = \sum_{i=0}^{K 2^n} i 2^{-n} I_{A(n,i)}(s),$$

where

$$A(n,i) := \{s : i 2^{-n} \le f(s) < (i+1) 2^{-n}\}.$$

Since $f$ is $\sigma(\mathcal{I})$-measurable, every $A(n,i) \in \sigma(\mathcal{I})$, so that $I_{A(n,i)} \in \mathcal{H}$. Since $\mathcal{H}$ is a vector space, every $f_n \in \mathcal{H}$. But $0 \le f_n \uparrow f$, so that $f \in \mathcal{H}$.

If $f \in \mathrm{b}\sigma(\mathcal{I})$, we may write $f = f^+ - f^-$, where $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$. Then $f^+, f^- \in \mathrm{b}\sigma(\mathcal{I})$ and $f^+, f^- \ge 0$, so that $f^+, f^- \in \mathcal{H}$ by what we established above. □
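The dyadic approximations $f_n$ used in this proof amount to rounding $f$ down to the nearest multiple of $2^{-n}$. The following sketch (the helper name `dyadic_approx` is an assumption made for illustration) makes that explicit, so that the monotone convergence $0 \le f_n \uparrow f$ can be checked on examples.

```python
import math


def dyadic_approx(f, n):
    """Return f_n, where f_n(s) = i * 2^{-n} on {s : i 2^{-n} <= f(s) < (i+1) 2^{-n}}.

    Equivalently, f_n is f rounded down to a multiple of 2^{-n}.
    Intended for non-negative bounded f, as in the proof.
    """
    scale = 2 ** n
    return lambda s: math.floor(f(s) * scale) / scale
```

With, say, `f = lambda s: s * s` on $[0, 2]$, one can verify pointwise that `dyadic_approx(f, n)(s)` is non-decreasing in `n`, never exceeds `f(s)`, and is within $2^{-n}$ of it.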

A3.2. Discussion of generated $\sigma$-algebras

This is one of those situations in which it is actually easier to understand things in a more formal abstract setting. So, suppose that

$\Omega$ and $S$ are sets, and that $Y : \Omega \to S$;

$\Sigma$ is a $\sigma$-algebra on $S$;

$X : \Omega \to \mathbb{R}$.

Because $Y^{-1}$ preserves all set operations,

$$Y^{-1}\Sigma := \{Y^{-1}B : B \in \Sigma\}$$

is a $\sigma$-algebra on $\Omega$, and because it is tautologically the smallest $\sigma$-algebra $\mathcal{Y}$ on $\Omega$ such that $Y$ is $\mathcal{Y}/\Sigma$ measurable (in that $Y^{-1} : \Sigma \to \mathcal{Y}$), we call it $\sigma(Y)$:

$$\sigma(Y) = Y^{-1}\Sigma.$$

LEMMA

(a) $X$ is $\sigma(Y)$-measurable if and only if

$$X = f(Y)$$

where $f$ is a $\Sigma$-measurable function from $S$ to $\mathbb{R}$.

Note. The 'if' part is just the Composition Lemma.

Proof of 'only if' part. It is enough to prove that

(b) $X \in \mathrm{b}\sigma(Y)$ if and only if $\exists f \in \mathrm{b}\Sigma$ such that $X = f(Y)$.

(Otherwise, consider $\arctan X$, for example.) Though we certainly do not need the Monotone-Class Theorem to prove (b), we may as well use it. So define $\mathcal{H}$ to be the class of all bounded functions $X$ on $\Omega$ such that $X = f(Y)$ for some $f \in \mathrm{b}\Sigma$. Taking $\mathcal{I} = \sigma(Y)$, note that if $F \in \mathcal{I}$ then $F = Y^{-1}B$ for some $B$ in $\Sigma$, so that

$$I_F(\omega) = I_B(Y(\omega)),$$

so that $I_F \in \mathcal{H}$. That $\mathcal{H}$ is a vector space containing constants is obvious.

Finally, suppose that $(X_n)$ is a sequence of elements of $\mathcal{H}$ such that, for some positive real constant $K$,

$$0 \le X_n \uparrow X \le K.$$

For each $n$, $X_n = f_n(Y)$ for some $f_n$ in $\mathrm{b}\Sigma$. Define $f := \limsup f_n$, so that $f \in \mathrm{b}\Sigma$. Then $X = f(Y)$. □

One has to be very careful about what Lemma (a) means in practice. To be sure, result (3.13,b) is the special case when $(S, \Sigma) = (\mathbb{R}, \mathcal{B})$.

Discussion of (3.13,c). Suppose that $Y_k : \Omega \to \mathbb{R}$ for $1 \le k \le n$. We may define a map $Y : \Omega \to \mathbb{R}^n$ via

$$Y(\omega) := (Y_1(\omega), \ldots, Y_n(\omega)) \in \mathbb{R}^n.$$

The problem mentioned at (3.13,d) and in the Warning following it shows up here because, before we can apply Lemma (a) to prove (3.13,c), we need to prove that

$$\sigma(Y_1, \ldots, Y_n) := \sigma(Y_k^{-1}\mathcal{B}(\mathbb{R}) : 1 \le k \le n) = Y^{-1}\mathcal{B}(\mathbb{R}^n) =: \sigma(Y).$$

[This amounts to proving that the product $\sigma$-algebra $\prod_{1 \le k \le n} \mathcal{B}(\mathbb{R})$ is the same as $\mathcal{B}(\mathbb{R}^n)$. See Section 8.5.] Now $Y_k = \gamma_k \circ Y$, where $\gamma_k$ is the '$k$th coordinate' map on $\mathbb{R}^n$ (continuous, hence Borel), so that $Y_k$ is $\sigma(Y)$-measurable. On the other hand, every open subset of $\mathbb{R}^n$ is a countable union of open rectangles $G_1 \times \cdots \times G_n$ where each $G_k$ is a subinterval of $\mathbb{R}$, and since

$$\{Y \in G_1 \times \ldots \times G_n\} = \bigcap_k \{Y_k \in G_k\} \in \sigma(Y_1, \ldots, Y_n),$$

things do work out.

You can already see why we are in an appendix, and why we skip discussion of (3.13,d).

Chapter A4

Appendix to Chapter 4

This appendix gives the statement of Strassen's Law of the Iterated Logarithm. Section A4.3 treats the completely different topic of constructing a rigorous model for a Markov chain.

A4.1. Kolmogorov's Law of the Iterated Logarithm

THEOREM

Let $X_1, X_2, \ldots$ be IID RVs each with mean $0$ and variance $1$. Let $S_n := X_1 + X_2 + \cdots + X_n$. Then, almost surely,

$$\limsup \frac{S_n}{\sqrt{2n \log\log n}} = +1, \qquad \liminf \frac{S_n}{\sqrt{2n \log\log n}} = -1.$$

This result already gives very precise information on the behaviour of the big values of partial sums. See Section 14.7 for the proof in the case when the $X$'s are normally distributed.

A4.2. Strassen's Law of the Iterated Logarithm

Strassen's Law is a staggering extension of Kolmogorov's result.

Let $(X_n)$ and $(S_n)$ be as in the previous section. For each $\omega$, let the map $t \mapsto S_t(\omega)$ on $[0, \infty)$ be the linear interpolation of the map $n \mapsto S_n(\omega)$ on $\mathbb{Z}^+$, so that

$$S_t(\omega) := (t - n)S_{n+1}(\omega) + (n + 1 - t)S_n(\omega), \qquad t \in [n, n+1).$$

With Kolmogorov's result in mind, define

$$Z_n(t, \omega) := \frac{S_{nt}(\omega)}{\sqrt{2n \log\log n}}, \qquad t \in [0,1],$$

so that $t \mapsto Z_n(t, \omega)$ on $[0,1]$ is a rescaled version of the random walk $S$ run up to time $n$. Say that a function $t \mapsto f(t, \omega)$ is in the set $K(\omega)$ of limiting shapes of the path associated with $\omega$ if there is a sequence $n_1(\omega), n_2(\omega), \ldots$ in $\mathbb{N}$ such that

$$Z_{n_k}(t, \omega) \to f(t, \omega) \quad \text{uniformly in } t \in [0,1].$$

Now let $K$ consist of those functions $f$ in $C[0,1]$ which can be written in the Lebesgue-integral form

$$f(t) = \int_0^t h(s)\, ds \quad \text{where} \quad \int_0^1 h(s)^2\, ds \le 1.$$

Strassen's Theorem

$$P[K(\omega) = K] = 1.$$

Thus, (almost) all paths have the same limiting shapes. Khinchine's law follows from Strassen's precisely because (Exercise!)

$$\sup\{f(1) : f \in K\} = 1, \qquad \inf\{f(1) : f \in K\} = -1.$$

However, the only element of $K$ for which $f(1) = 1$ is the function $f(t) = t$, so the big values of $S$ occur when the whole path (when rescaled) looks like a line of slope $1$. Almost every path will, in its $Z$ rescaling, look infinitely often like the function $t$ and infinitely often like the function $-t$; etc. etc.

References. For a highly-motivated classical proof of Strassen's Law, see Freedman (1971). For a proof for Brownian motion based on the powerful theory of large deviations, see Stroock (1984).
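For readers who want a hint for the exercise in this section, the extremal values of $f(1)$ over $K$ can be obtained from the Cauchy-Schwarz inequality. The derivation below is a sketch supplied for illustration and is not taken from the text.

```latex
% For f in K, write f(t) = \int_0^t h(s)\,ds with \int_0^1 h(s)^2\,ds \le 1.
\begin{align*}
|f(1)| = \left| \int_0^1 h(s) \cdot 1 \, ds \right|
  &\le \left( \int_0^1 h(s)^2 \, ds \right)^{1/2}
       \left( \int_0^1 1^2 \, ds \right)^{1/2} \le 1 .
\end{align*}
% Equality in Cauchy--Schwarz forces h to be (a.e.) constant, and then
% f(1) = 1 forces h \equiv 1, i.e. f(t) = t; similarly f(1) = -1 forces f(t) = -t.
% Both functions belong to K, so the supremum 1 and the infimum -1 are attained.
```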

A4.3. A model for a Markov chain

Let $E$ be a countable set; let $\mu$ be a probability measure on $(E, \mathcal{E})$, where $\mathcal{E}$ denotes the set of all subsets of $E$; and let $P$ denote a stochastic $E \times E$ matrix as in Section 4.8.

Complicating the notation somewhat for reasons we shall discover later, we wish to construct a probability triple $(\Omega, \mathcal{F}, P^{\mu})$ carrying an $E$-valued stochastic process $(Z_n : n \in \mathbb{Z}^+)$ such that for $n \in \mathbb{Z}^+$ and $i_0, i_1, \ldots, i_n \in E$, we have

$$P^{\mu}(Z_0 = i_0; \ldots; Z_n = i_n) = \mu_{i_0} p_{i_0 i_1} \cdots p_{i_{n-1} i_n}.$$

The trick is to make $(\Omega, \mathcal{F}, P^{\mu})$ carry independent $E$-valued variables

$$(Z_0; Y(i,n) : i \in E; n \in \mathbb{N}),$$

$Z_0$ having law $\mu$ and such that

$$P^{\mu}(Y(i,n) = j) = p(i,j), \qquad (i, j \in E).$$

We can obviously do this via the construction in Section 4.6. For $\omega \in \Omega$ and $n \in \mathbb{N}$, define

$$Z_n(\omega) := Y(Z_{n-1}(\omega), n)(\omega);$$

and that's it!
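The construction translates almost directly into a simulation. The sketch below is illustrative only: it assumes a finite state space $E = \{0, 1, \ldots, m-1\}$, represents $\mu$ and $P$ as Python lists, and draws the independent variables $Y(i,n)$ by inverse-CDF sampling; the names `sample_index` and `simulate_chain` are hypothetical.

```python
import random


def sample_index(weights, u):
    """Return j with probability weights[j], given u uniform on [0, 1)."""
    acc = 0.0
    for j, w in enumerate(weights):
        acc += w
        if u < acc:
            return j
    return len(weights) - 1  # guard against floating-point rounding


def simulate_chain(mu, p, n_steps, rng):
    """Simulate Z_0, ..., Z_{n_steps} via Z_n = Y(Z_{n-1}, n)."""
    states = range(len(mu))
    path = [sample_index(mu, rng.random())]  # Z_0 has law mu
    for _ in range(n_steps):
        # Independent variables Y(i, n), one per state i, with P(Y(i, n) = j) = p[i][j].
        y_n = [sample_index(p[i], rng.random()) for i in states]
        path.append(y_n[path[-1]])           # Z_n = Y(Z_{n-1}, n)
    return path
```

Drawing a whole family $(Y(i,n) : i \in E)$ at each step, rather than only the one variable actually used, is wasteful in practice but mirrors the construction above, in which all the $Y(i,n)$ live on the same probability triple.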

Chapter A5

Appendix to Chapter 5

Our task is to prove the Monotone-Convergence Theorem 5.3. We need an elementary preliminary result.

A5.1. Doubly monotone arrays

Proposition. Let

$$(y_n^{(r)} : r \in \mathbb{N}, n \in \mathbb{N})$$

be an array of numbers in $[0, \infty]$ which is doubly monotone:

for fixed $r$, $y_n^{(r)} \uparrow$ as $n \uparrow$ so that $y^{(r)} := \uparrow\lim_n y_n^{(r)}$ exists;

for fixed $n$, $y_n^{(r)} \uparrow$ as $r \uparrow$ so that $y_n := \uparrow\lim_r y_n^{(r)}$ exists.

Then

$$y^{(\infty)} := \uparrow\lim_r y^{(r)} = \uparrow\lim_n y_n =: y_\infty.$$

Proof. The result is almost trivial. By replacing each $(y_n^{(r)})$ by $\arctan y_n^{(r)}$, we can assume that the $y_n^{(r)}$ are uniformly bounded.

Let $\varepsilon > 0$ be given. Choose $n_0$ such that $y_{n_0} > y_\infty - \tfrac{1}{2}\varepsilon$. Then choose $r_0$ such that $y_{n_0}^{(r_0)} > y_{n_0} - \tfrac{1}{2}\varepsilon$. Then

$$y^{(\infty)} \ge y^{(r_0)} \ge y_{n_0}^{(r_0)} > y_\infty - \varepsilon,$$

so that $y^{(\infty)} \ge y_\infty$. Similarly, $y_\infty \ge y^{(\infty)}$. □

A5.2. The key use of Lemma 1.10(a)

This is where the fundamental monotonicity property of measures is used. Please re-read Section 5.1 at this stage.

LEMMA

(a) Suppose that $A \in \Sigma$ and that

$$h_n \in SF^+ \quad \text{and} \quad h_n \uparrow I_A.$$

Then $\mu_0(h_n) \uparrow \mu(A)$.

Proof. From (5.1,e), $\mu_0(h_n) \le \mu(A)$, so we need only prove that

$$\liminf \mu_0(h_n) \ge \mu(A).$$

Let $\varepsilon > 0$, and define $A_n := \{s \in A : h_n(s) > 1 - \varepsilon\}$. Then $A_n \uparrow A$ so that, by Lemma 1.10(a), $\mu(A_n) \uparrow \mu(A)$. But

$$(1 - \varepsilon) I_{A_n} \le h_n$$

so that, by (5.1,e), $(1 - \varepsilon)\mu(A_n) \le \mu_0(h_n)$. Hence

$$\liminf \mu_0(h_n) \ge (1 - \varepsilon)\mu(A).$$

Since this is true for every $\varepsilon > 0$, the result follows.

LEMMA

(b) Suppose that $f \in SF^+$ and that

$$g_n \in SF^+ \quad \text{and} \quad g_n \uparrow f.$$

Then $\mu_0(g_n) \uparrow \mu_0(f)$.

Proof. We can write $f$ as a finite sum $f = \sum a_k I_{A_k}$ where the sets $A_k$ are disjoint and each $a_k > 0$. Then

$$a_k^{-1} I_{A_k} g_n \uparrow I_{A_k} \qquad (n \uparrow \infty),$$

and the result follows from Lemma (a). □

A5.3. 'Uniqueness of integral'

LEMMA

(a) Suppose that $f \in (\mathrm{m}\Sigma)^+$ and that we have two sequences $(f^{(r)})$ and $(f_n)$ of elements of $SF^+$ such that

$$f^{(r)} \uparrow f, \qquad f_n \uparrow f.$$

Then

$$\uparrow\lim \mu_0(f^{(r)}) = \uparrow\lim \mu_0(f_n).$$

Proof. Let $f_n^{(r)} := f^{(r)} \wedge f_n$. Then as $r \uparrow \infty$, $f_n^{(r)} \uparrow f_n$, and as $n \uparrow \infty$, $f_n^{(r)} \uparrow f^{(r)}$. Hence, by Lemma A5.2(b),

$$\mu_0(f_n^{(r)}) \uparrow \mu_0(f_n) \text{ as } r \uparrow \infty,$$

$$\mu_0(f_n^{(r)}) \uparrow \mu_0(f^{(r)}) \text{ as } n \uparrow \infty.$$

The result now follows from Proposition A5.1. □

Recall from Section 5.2 that for $f \in (\mathrm{m}\Sigma)^+$, we define

$$\mu(f) := \sup\{\mu_0(h) : h \in SF^+;\ h \le f\} \le \infty.$$

By definition of $\mu(f)$, we may choose a sequence $h_n$ in $SF^+$ such that $h_n \le f$ and $\mu_0(h_n) \uparrow \mu(f)$. Let us also choose a sequence $(g_n)$ of elements of $SF^+$ such that $g_n \uparrow f$. (We can do this via the 'staircase function' in Section 5.3.) Now let

$$f_n := \max(g_n, h_1, h_2, \ldots, h_n).$$

Then $f_n \in SF^+$, $f_n \le f$, and since $f_n \ge g_n$, $f_n \uparrow f$. Since $f_n \le f$, $\mu_0(f_n) \le \mu(f)$, and since $f_n \ge h_n$, we see that

$$\mu_0(f_n) \uparrow \mu(f).$$

On combining this fact with Lemma (a), we obtain the next result. (Lemma (a) 'changes our particular sequence to any sequence'.)

LEMMA

(b) Let $f \in (\mathrm{m}\Sigma)^+$ and let $(f_n)$ be any sequence in $SF^+$ such that $f_n \uparrow f$. Then

$$\mu_0(f_n) \uparrow \mu(f).$$

A5.4. Proof of the Monotone-Convergence Theorem

Recall the statement:

Let $(f_n)$ be a sequence of elements of $(\mathrm{m}\Sigma)^+$ such that $f_n \uparrow f$. Then

$$\mu(f_n) \uparrow \mu(f).$$

Proof. Let $\alpha^{(r)}$ denote the $r$th staircase function defined in Section 5.3. Now set $f_n^{(r)} := \alpha^{(r)}(f_n)$, $f^{(r)} := \alpha^{(r)}(f)$. Since $\alpha^{(r)}$ is left-continuous, $f_n^{(r)} \uparrow f^{(r)}$ as $n \uparrow \infty$. Since $\alpha^{(r)}(x) \uparrow x$, $\forall x$, $f_n^{(r)} \uparrow f_n$ as $r \uparrow \infty$. By Lemma A5.2(b), $\mu(f_n^{(r)}) \uparrow \mu(f^{(r)})$ as $n \uparrow \infty$; and by Lemma A5.3(b), $\mu(f_n^{(r)}) \uparrow \mu(f_n)$ as $r \uparrow \infty$. We also know from Lemma A5.3(b) that $\mu(f^{(r)}) \uparrow \mu(f)$. The result now follows from Proposition A5.1.

Chapter A9

Appendix to Chapter 9

This chapter is solely devoted to the proof of the 'infinite-product' Theorem 8.6. It may be read after Section 9.10. It is probably something which a keen student who has read all previous appendices should study with a tutor.

A9.1. Infinite products: setting things up

Let $(\Lambda_n : n \in \mathbf{N})$ be a sequence of probability measures on $(\mathbf{R}, \mathcal{B})$. Let

    $\Omega := \prod_{n \in \mathbf{N}} \mathbf{R},$

so that a typical element $\omega$ of $\Omega$ is a sequence $\omega = (\omega_n : n \in \mathbf{N})$ of elements of $\mathbf{R}$. Define $X_n(\omega) := \omega_n$, and set

    $\mathcal{F}_n := \sigma(X_1, X_2, \ldots, X_n).$

The typical element $F_n$ of $\mathcal{F}_n$ has the form

(a)    $F_n = G_n \times \prod_{k > n} \mathbf{R}, \qquad G_n \in \prod_{1 \le k \le n} \mathcal{B}.$

Fubini's Theorem shows that on the algebra (NOT $\sigma$-algebra)

    $\mathcal{F}^- := \bigcup_n \mathcal{F}_n$

we may unambiguously use (a) to define a map $\mathbf{P}^- : \mathcal{F}^- \to [0,1]$ via

(b)    $\mathbf{P}^-(F_n) := (\Lambda_1 \times \cdots \times \Lambda_n)(G_n),$

and that $\mathbf{P}^-$ is finitely additive on the algebra $\mathcal{F}^-$. However, for each fixed $n$,

(c)    $(\Omega, \mathcal{F}_n, \mathbf{P}^-)$ is a bona fide probability triple

which may be identified with $\prod_{1 \le k \le n} (\mathbf{R}, \mathcal{B}, \Lambda_k)$ via (a) and (b). Moreover, $X_1, X_2, \ldots, X_n$ are independent RVs on $(\Omega, \mathcal{F}_n, \mathbf{P}^-)$.

We want to prove that

(d)    $\mathbf{P}^-$ is countably additive on $\mathcal{F}^-$

(obviously with the intention of using Carathéodory's Theorem 1.7). Now we know from our proof of the existence of Lebesgue measure (see (A1.9,a)) that it is enough to show that

(e)    if $(H_r)$ is a sequence of sets in $\mathcal{F}^-$ such that $H_r \supseteq H_{r+1}$, $\forall r$, and if for some $\varepsilon > 0$, $\mathbf{P}^-(H_r) \ge \varepsilon$ for every $r$, then $\bigcap H_r \ne \emptyset$.

A9.2. Proof of (A9.1,e)

Step 1: For every $r$, there is some $n(r)$ such that $H_r \in \mathcal{F}_{n(r)}$ and so

    $I_{H_r}(\omega) = h_r(\omega_1, \omega_2, \ldots, \omega_{n(r)})$ for some $h_r \in b\mathcal{B}^{n(r)}$.

Recall that $X_k(\omega) = \omega_k$, and look again at Section A3.2.

Step 2: We have

(a0)    $\mathbf{E}^- h_r(X_1, X_2, \ldots, X_{n(r)}) \ge \varepsilon, \quad \forall r,$

because the left-hand side of (a0) is exactly $\mathbf{P}^-(H_r)$. If we work within the probability triple $(\Omega, \mathcal{F}_{n(r)}, \mathbf{P}^-)$, then we know from Section 9.10 that

    $\gamma_r(\omega) := g_r(\omega_1) := \mathbf{E}^- h_r(\omega_1, X_2, X_3, \ldots, X_{n(r)})$

is an explicit version of the conditional expectation of $I_{H_r}$ given $\mathcal{F}_1$, and

    $\varepsilon \le \mathbf{P}^-(H_r) = \mathbf{E}^-(\gamma_r) = \Lambda_1(g_r).$

Now, $0 \le g_r \le 1$, so that

    $\varepsilon \le \Lambda_1(g_r) \le 1 \cdot \Lambda_1\{g_r \ge \varepsilon 2^{-1}\} + \varepsilon 2^{-1} \Lambda_1\{g_r < \varepsilon 2^{-1}\} \le \Lambda_1\{g_r \ge \varepsilon 2^{-1}\} + \varepsilon 2^{-1}.$

Thus

    $\Lambda_1\{g_r \ge \varepsilon 2^{-1}\} \ge 2^{-1}\varepsilon.$

Step 3: However, since $H_r \supseteq H_{r+1}$, we have (working within $(\Omega, \mathcal{F}_m, \mathbf{P}^-)$, where both $H_r$ and $H_{r+1}$ are in $\mathcal{F}_m$)

    $g_r(\omega_1) \ge g_{r+1}(\omega_1)$, for every $\omega_1$ in $\mathbf{R}$.

Working on $(\mathbf{R}, \mathcal{B}, \Lambda_1)$ we have

    $\Lambda_1\{g_r \ge \varepsilon 2^{-1}\} \ge \varepsilon 2^{-1}, \quad \forall r,$

and $g_r \downarrow$, so that $\{g_r \ge \varepsilon 2^{-1}\} \downarrow$; and by Lemma 1.10(b) on the continuity from above of measures, we have

    $\Lambda_1\{\omega_1 : g_r(\omega_1) \ge \varepsilon 2^{-1}, \forall r\} \ge \varepsilon 2^{-1}.$

Hence, there exists $\omega_1^*$ (say) in $\mathbf{R}$ such that

(a1)    $\mathbf{E}^- h_r(\omega_1^*, X_2, \ldots, X_{n(r)}) \ge \varepsilon 2^{-1}, \quad \forall r.$

Step 4: We now repeat Steps 2 and 3 applied to the situation in which $(X_1, X_2, \ldots)$ is replaced by $(X_2, X_3, \ldots)$, and $h_r$ is replaced by $h_r(\omega_1^*)$, where

    $(h_r(\omega_1^*))(\omega_2, \omega_3, \ldots) := h_r(\omega_1^*, \omega_2, \omega_3, \ldots).$

We find that there exists $\omega_2^*$ in $\mathbf{R}$ such that

(a2)    $\mathbf{E}^- h_r(\omega_1^*, \omega_2^*, X_3, \ldots, X_{n(r)}) \ge \varepsilon 2^{-2}, \quad \forall r.$

Proceeding inductively, we obtain a sequence $\omega^* = (\omega_n^* : n \in \mathbf{N})$ with the property that

    $\mathbf{E}^- h_r(\omega_1^*, \omega_2^*, \ldots, \omega_{n(r)}^*) \ge \varepsilon 2^{-n(r)}, \quad \forall r.$

However, $h_r(\omega_1^*, \omega_2^*, \ldots, \omega_{n(r)}^*) = I_{H_r}(\omega^*)$, and can only be 0 or 1. The only conclusion is that $\omega^* \in H_r$, $\forall r$; and it was exactly the existence of such an $\omega^*$ which we had to prove.

Chapter A13

Appendix to Chapter 13

This chapter is devoted to comments on modes of convergence, regarded by many as good for the souls of students, and certainly easy to set examination questions on.

A13.1. Modes of convergence: definitions

Let $(X_n : n \in \mathbf{N})$ be a sequence of RVs and let $X$ be a RV, all carried by our triple $(\Omega, \mathcal{F}, \mathbf{P})$. Let us collect together definitions known to us.

Almost sure convergence

We say that $X_n \to X$ almost surely if

    $\mathbf{P}(X_n \to X) = 1.$

Convergence in probability

We say that $X_n \to X$ in probability if, for every $\varepsilon > 0$,

    $\mathbf{P}(|X_n - X| > \varepsilon) \to 0$ as $n \to \infty$.

$\mathcal{L}^p$ convergence ($p \ge 1$)

We say that $X_n \to X$ in $\mathcal{L}^p$ if each $X_n$ is in $\mathcal{L}^p$ and $X \in \mathcal{L}^p$ and

    $\|X_n - X\|_p \to 0$ as $n \to \infty$,

equivalently,

    $\mathbf{E}(|X_n - X|^p) \to 0$ as $n \to \infty$.

A13.2. Modes of convergence: relationships

Let me state the facts.

Convergence in probability is the weakest of the above forms of convergence. Thus

(a)    ($X_n \to X$, a.s.) $\Rightarrow$ ($X_n \to X$ in prob),

(b)    for $p \ge 1$, ($X_n \to X$ in $\mathcal{L}^p$) $\Rightarrow$ ($X_n \to X$ in prob).

No other implication between any two of our three forms of convergence is valid. But, of course, for $r \ge p \ge 1$,

(c)    ($X_n \to X$ in $\mathcal{L}^r$) $\Rightarrow$ ($X_n \to X$ in $\mathcal{L}^p$).

If we know that 'convergence in probability is happening quickly' in that

(d)    $\sum_n \mathbf{P}(|X_n - X| > \varepsilon) < \infty, \quad \forall \varepsilon > 0,$

then (BC1) allows us to conclude that $X_n \to X$, a.s.

The fact that property (d) implies a.s. convergence is used in proving the following result:

(e)    $X_n \to X$ in probability if and only if every subsequence of $(X_n)$ contains a further subsequence along which we have almost sure convergence to $X$.

The only other useful result is that

(f)    for $p \ge 1$, $X_n \to X$ in $\mathcal{L}^p$ if and only if the following two statements hold:
       (i) $X_n \to X$ in probability,
       (ii) the family $(|X_n|^p : n \ge 1)$ is UI.

There is only one way to gain an understanding of the above facts, and that is to prove them yourself. The exercises under EA13 provide guidance if you need it.
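As a hedged illustration of why almost sure convergence does not follow from convergence in probability, the sketch below builds the standard 'typewriter' sequence on $[0,1)$ with Lebesgue measure. This example is not taken from the text; the function names are ours. Writing $n = 2^m + k$ with $0 \le k < 2^m$, let $X_n$ be the indicator of $[k2^{-m}, (k+1)2^{-m})$. Then $\mathbf{P}(X_n \ne 0) = 2^{-m} \to 0$, yet every sample point is hit exactly once in each dyadic block, so $X_n(u) = 1$ infinitely often.

```python
def block_and_offset(n):
    """Write n >= 1 as n = 2**m + k with 0 <= k < 2**m and return (m, k)."""
    m = n.bit_length() - 1
    return m, n - (1 << m)


def typewriter(n, u):
    """X_n(u): indicator that u in [0,1) lies in [k/2^m, (k+1)/2^m)."""
    m, k = block_and_offset(n)
    return 1 if k <= u * (1 << m) < k + 1 else 0


def prob_nonzero(n):
    """P(X_n != 0) under Lebesgue measure: the interval length 2^-m."""
    m, _ = block_and_offset(n)
    return 1.0 / (1 << m)


# For one fixed sample point, count the hits inside each dyadic block of indices.
u = 0.3
hits_per_block = [
    sum(typewriter(n, u) for n in range(1 << m, 1 << (m + 1)))
    for m in range(10)
]
```

Each entry of `hits_per_block` counts how often the path through `u` equals 1 within one block, while `prob_nonzero(n)` shrinks geometrically along the blocks. The same sequence converges to 0 in every $\mathcal{L}^p$, which also shows that $\mathcal{L}^p$ convergence does not imply almost sure convergence.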

Chapter A14

Appendix to Chapter 14

We work with a filtered space $(\Omega, \mathcal{F}, \{\mathcal{F}_n : n \in \mathbf{Z}^+\}, \mathbf{P})$.

This chapter introduces the $\sigma$-algebra $\mathcal{F}_T$, where $T$ is a stopping time. The idea is that $\mathcal{F}_T$ represents the information available to our observer at (or, if you prefer, immediately after) time $T$. The Optional-Sampling Theorem says that if $X$ is a uniformly integrable supermartingale and $S$ and $T$ are stopping times with $S \le T$, then we have the natural extension of the supermartingale property:

    $\mathbf{E}(X_T \mid \mathcal{F}_S) \le X_S$, a.s.

A14.1. The $\sigma$-algebra $\mathcal{F}_T$, $T$ a stopping time

Recall that a map $T : \Omega \to \mathbf{Z}^+ \cup \{\infty\}$ is called a stopping time if

    $\{T \le n\} \in \mathcal{F}_n, \quad n \in \mathbf{Z}^+ \cup \{\infty\},$

equivalently if

    $\{T = n\} \in \mathcal{F}_n, \quad n \in \mathbf{Z}^+ \cup \{\infty\}.$

In each of the above statements, the '$n = \infty$' case follows automatically from the validity of the result for every $n$ in $\mathbf{Z}^+$.

Let $T$ be a stopping time. Then, for $F \subseteq \Omega$, we say that $F \in \mathcal{F}_T$ if

    $F \cap \{T \le n\} \in \mathcal{F}_n, \quad n \in \mathbf{Z}^+ \cup \{\infty\},$

equivalently if

    $F \cap \{T = n\} \in \mathcal{F}_n, \quad n \in \mathbf{Z}^+ \cup \{\infty\}.$

Then $\mathcal{F}_T = \mathcal{F}_n$ if $T \equiv n$; $\mathcal{F}_T = \mathcal{F}_\infty$ if $T \equiv \infty$; and $\mathcal{F}_T \subseteq \mathcal{F}_\infty$ for every $T$.

You can easily check that $\mathcal{F}_T$ is a $\sigma$-algebra. You can also check that if $S$ is another stopping time, then

    $\mathcal{F}_{S \wedge T} \subseteq \mathcal{F}_T.$

Hint. If $F \in \mathcal{F}_{S \wedge T}$, then

    $F \cap \{T = n\} = \bigcup_{k \le n} F \cap \{S \wedge T = k\} \cap \{T = n\}.$

Another detail that needs to be checked is that if $X$ is an adapted process and $T$ is a stopping time, then $X_T \in m\mathcal{F}_T$. Here, $X_\infty$ is assumed defined in some way such that $X_\infty$ is $\mathcal{F}_\infty$ measurable.

Proof. For $B \in \mathcal{B}$,

    $\{X_T \in B\} \cap \{T = n\} = \{X_n \in B\} \cap \{T = n\} \in \mathcal{F}_n.$    □

A14.2. A special case of OST

LEMMA

Let $X$ be a supermartingale. Let $T$ be a stopping time such that, for some $N$ in $\mathbf{N}$, $T(\omega) \le N$, $\forall \omega$. Then $X_T \in \mathcal{L}^1(\Omega, \mathcal{F}_T, \mathbf{P})$ and

    $\mathbf{E}(X_N \mid \mathcal{F}_T) \le X_T.$

Proof. Let $F \in \mathcal{F}_T$. Then

    $\mathbf{E}(X_N; F) = \sum_{n \le N} \mathbf{E}(X_N; F \cap \{T = n\}) \le \sum_{n \le N} \mathbf{E}(X_n; F \cap \{T = n\}) = \mathbf{E}(X_T; F).$

(Of course, the fact that $|X_T| \le |X_1| + \cdots + |X_N|$ guarantees the result that $\mathbf{E}(|X_T|) < \infty$.)    □
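The inequality $\mathbf{E}(X_N; F) \le \mathbf{E}(X_T; F)$ in the proof of Lemma A14.2 can be checked exactly on a small example. The sketch below is our own illustration, not part of the text. It takes a nearest-neighbour walk started at 0 with up-step probability 1/3 (an assumed value that makes the walk a supermartingale), the bounded stopping time $T$ = (first time the walk is at 1) $\wedge N$ with $N = 6$, and sums over all $2^N$ step sequences using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

P_UP = Fraction(1, 3)  # assumed up-step probability; < 1/2 gives a supermartingale
N = 6                  # assumed deterministic bound on the stopping time


def stopped_index(path):
    """T = first n >= 1 with X_n = 1, capped at N (so T <= N everywhere)."""
    for n in range(1, N + 1):
        if path[n] == 1:
            return n
    return N


def expectations(event=lambda path, t: True):
    """Return (E(X_N; F), E(X_T; F)) by enumerating all 2^N step sequences."""
    e_xn = Fraction(0)
    e_xt = Fraction(0)
    for steps in product((1, -1), repeat=N):
        weight = Fraction(1)
        path = [0]
        for s in steps:
            weight *= P_UP if s == 1 else 1 - P_UP
            path.append(path[-1] + s)
        t = stopped_index(path)
        if event(path, t):
            e_xn += weight * path[N]
            e_xt += weight * path[t]
    return e_xn, e_xt


e_xn, e_xt = expectations()                             # F = whole space
e_xn_F, e_xt_F = expectations(lambda path, t: t <= 2)   # F = {T <= 2}, a set in F_T
```

The event $\{T \le 2\}$ is used because it belongs to $\mathcal{F}_T$; any other event determined by the path up to time $T$ could be substituted in the `event` argument.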
A14.3. Doob's Optional-Sampling Theorem for UI martingales

Let $M$ be a UI martingale. Then, for any stopping time $T$,

    $\mathbf{E}(M_\infty \mid \mathcal{F}_T) = M_T$, a.s.

Corollary 1 (a new Optional-Stopping Theorem!)

If $M$ is a UI martingale, and $T$ is a stopping time, then $\mathbf{E}(|M_T|) < \infty$ and $\mathbf{E}(M_T) = \mathbf{E}(M_0)$.

Corollary 2

If $M$ is a UI martingale and $S$ and $T$ are stopping times with $S \le T$, then

    $\mathbf{E}(M_T \mid \mathcal{F}_S) = M_S$, a.s.

Proof of theorem. By Theorem 14.1 and Lemma A14.2, we have, for $k \in \mathbf{N}$,

    $\mathbf{E}(M_\infty \mid \mathcal{F}_k) = M_k$, a.s.,    $\mathbf{E}(M_k \mid \mathcal{F}_{T \wedge k}) = M_{T \wedge k}$, a.s.

Hence, by the Tower Property,

(*)    $\mathbf{E}(M_\infty \mid \mathcal{F}_{T \wedge k}) = M_{T \wedge k}$, a.s.

If $F \in \mathcal{F}_T$, then (check!) $F \cap \{T \le k\} \in \mathcal{F}_{T \wedge k}$, so that, by (*),

(**)    $\mathbf{E}(M_\infty; F \cap \{T \le k\}) = \mathbf{E}(M_{T \wedge k}; F \cap \{T \le k\}) = \mathbf{E}(M_T; F \cap \{T \le k\}).$

We can (and do) restrict attention to the case when $M_\infty \ge 0$, whence $M_n = \mathbf{E}(M_\infty \mid \mathcal{F}_n) \ge 0$ for all $n$. Then, on letting $k \uparrow \infty$ in (**) and using (MON), we obtain

    $\mathbf{E}(M_\infty; F \cap \{T < \infty\}) = \mathbf{E}(M_T; F \cap \{T < \infty\}).$

However, the fact that

    $\mathbf{E}(M_\infty; F \cap \{T = \infty\}) = \mathbf{E}(M_T; F \cap \{T = \infty\})$

is tautological. Hence $\mathbf{E}(M_\infty; F) = \mathbf{E}(M_T; F)$.    □

Corollary 2 now follows from the Tower Property, and Corollary 1 follows from Corollary 2!

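A14.4. The result for UI submartingales

A UI submartingale $X$ has Doob decomposition

    $X = X_0 + M + A,$

where (Exercise: explain why!) $\mathbf{E}(A_\infty) < \infty$ and $M$ is UI. Hence, if $T$ is a stopping time, then, almost surely,

    $\mathbf{E}(X_\infty \mid \mathcal{F}_T) = X_0 + \mathbf{E}(M_\infty \mid \mathcal{F}_T) + \mathbf{E}(A_\infty \mid \mathcal{F}_T)$
    $\qquad = X_0 + M_T + \mathbf{E}(A_\infty \mid \mathcal{F}_T)$
    $\qquad \ge X_0 + M_T + \mathbf{E}(A_T \mid \mathcal{F}_T) = X_T.$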

Chapter A16

Appendix to Chapter 16

A16.1. Differentiation under the integral sign

Before stating our theorem on this topic, let us examine the type of application we need in Section 16.3. Suppose that $X$ is a RV such that $\mathbf{E}(|X|) < \infty$ and that $h(t,x) = ixe^{itx}$. (We can treat the real and imaginary parts of $h$ separately.) Note that if $[a,b]$ is a subinterval of $\mathbf{R}$, then the variables $\{h(t,X) : t \in [a,b]\}$ are dominated by $|X|$, and so are UI. In the theorem, we shall have

    $\mathbf{E}H(t,X) = \varphi_X(t) - \varphi_X(a), \quad t \in [a,b],$

and we can conclude that $\varphi_X'(t)$ exists and equals $\mathbf{E}h(t,X)$.

THEOREM

Let $X$ be a RV carried by $(\Omega, \mathcal{F}, \mathbf{P})$. Suppose that $a, b \in \mathbf{R}$ with $a < b$, and that

    $h : [a,b] \times \mathbf{R} \to \mathbf{R}$

has the properties:

(i) $t \mapsto h(t,x)$ is continuous in $t$ for every $x$ in $\mathbf{R}$,
(ii) $x \mapsto h(t,x)$ is $\mathcal{B}$-measurable in $x$ for every $t$ in $[a,b]$,
(iii) the variables $\{h(t,X) : t \in [a,b]\}$ are UI.

Then

(a) $t \mapsto \mathbf{E}h(t,X)$ is continuous on $[a,b]$,
(b) $h$ is $\mathcal{B}[a,b] \times \mathcal{B}$-measurable,
(c) if $H(t,x) := \int_a^t h(s,x)\,ds$ for $a \le t \le b$, then for $t \in (a,b)$,

    $\frac{d}{dt}\mathbf{E}H(t,X)$ exists and equals $\mathbf{E}h(t,X)$.

Proof of (a). Since we need only consider the 'sequential case $t_n \to t$', result (a) follows immediately from Theorem 13.7.    □

Proof of (b). Define

    $\delta_n := 2^{-n}(b - a), \qquad D_n := (a + \delta_n \mathbf{Z}^+) \cap [a,b],$
    $\tau_n(t) := \inf\{\tau \in D_n : \tau \ge t\}, \quad t \in [a,b],$
    $h_n(t,x) := h(\tau_n(t), x), \quad t \in [a,b],\ x \in \mathbf{R}.$

Then, for $B \in \mathcal{B}$,

    $h_n^{-1}(B) = \bigcup_{\tau \in D_n} \big( ((\tau - \delta_n, \tau] \cap [a,b]) \times \{x : h(\tau,x) \in B\} \big),$

so that $h_n$ is $\mathcal{B}[a,b] \times \mathcal{B}$-measurable. Since $h_n \to h$ on $[a,b] \times \mathbf{R}$, result (b) follows.

Proof of (c). For $\Gamma \subseteq [a,b] \times \mathbf{R}$, define

    $\alpha(\Gamma) := \{(t,\omega) \in [a,b] \times \Omega : (t, X(\omega)) \in \Gamma\}.$

If $\Gamma = A \times C$ where $A \in \mathcal{B}[a,b]$ and $C \in \mathcal{B}$, then

    $\alpha(\Gamma) = A \times (X^{-1}C) \in \mathcal{B}[a,b] \times \mathcal{F}.$

It is now clear that the class of $\Gamma$ for which $\alpha(\Gamma)$ is an element of $\mathcal{B}[a,b] \times \mathcal{F}$ is a $\sigma$-algebra containing $\mathcal{B}[a,b] \times \mathcal{B}$. The point of all this is that

(*)    $(t,\omega) \mapsto h(t, X(\omega))$ is $\mathcal{B}[a,b] \times \mathcal{F}$ measurable

since for $B \in \mathcal{B}$, $\{(t,\omega) : h(t,X(\omega)) \in B\}$ is $\alpha(h^{-1}B)$. (Yes, I know, we could have obtained (*) more directly using the $h_n$'s, but it is good to have other methods.) Since the family $\{h(t,X) : t \in [a,b]\}$ is UI, it is bounded in $\mathcal{L}^1$, whence

    $\int_a^b \mathbf{E}|h(t,X)|\,dt < \infty.$

Fubini's Theorem now implies that, for $a \le t \le b$,

    $\int_a^t \mathbf{E}h(s,X)\,ds = \mathbf{E}\int_a^t h(s,X)\,ds = \mathbf{E}H(t,X),$

and part (c) now follows.
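A quick numerical sanity check of the motivating application may help: for $h(t,x) = ixe^{itx}$ the theorem gives $\varphi_X'(t) = \mathbf{E}h(t,X)$. The sketch below is an illustration under assumptions of ours. It takes $X$ to be a discrete RV with three assumed values, so that both sides are finite sums, and compares a central finite difference of $\varphi_X$ with $\mathbf{E}(iXe^{itX})$.

```python
import cmath

VALUES = [-1.0, 0.5, 2.0]  # assumed support of X (illustrative only)
PROBS = [0.2, 0.5, 0.3]    # assumed probabilities, summing to 1


def phi(t):
    """Characteristic function E exp(itX) of the discrete RV above."""
    return sum(p * cmath.exp(1j * t * x) for x, p in zip(VALUES, PROBS))


def expected_h(t):
    """E h(t, X) with h(t, x) = i x exp(itx)."""
    return sum(p * 1j * x * cmath.exp(1j * t * x) for x, p in zip(VALUES, PROBS))


def central_difference(t, step=1e-5):
    """Symmetric difference quotient approximating phi'(t)."""
    return (phi(t + step) - phi(t - step)) / (2 * step)


t0 = 0.7
gap = abs(central_difference(t0) - expected_h(t0))
```

At $t = 0$ the quantity `expected_h(0.0)` reduces to $i\,\mathbf{E}(X)$, which is the usual first-moment formula for characteristic functions.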

Chapter E

Exercises

Starred exercises are more tricky. The first number in an exercise gives a rough indication of which chapter it depends on. 'G' stands for 'a bit of gumption is all that's necessary'. A number of exercises may also be found in the main text. Some are repeated here. We begin with an

Antidote to measure-theoretic material - just for fun, though the point that probability is more than mere measure theory needs hammering home.

EG.1. Two points are chosen at random on a line AB, each point being chosen according to the uniform distribution on AB, and the choices being made independently of each other. The line AB may now be regarded as divided into three parts. What is the probability that they may be made into a triangle?
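Before attempting a proof, one can estimate the probability in EG.1 by simulation. The following is a rough Monte Carlo sketch, not a solution; the function names, seed and trial count are our own choices. It uses the fact that three lengths summing to 1 form a triangle exactly when each is less than 1/2.

```python
import random


def can_form_triangle(u, v):
    """Cut [0,1] at u and v; the pieces form a triangle iff each is < 1/2."""
    a, b = min(u, v), max(u, v)
    parts = (a, b - a, 1.0 - b)
    return max(parts) < 0.5


def estimate(trials=200_000, seed=0):
    """Monte Carlo estimate of the triangle probability (seeded for repeatability)."""
    rng = random.Random(seed)
    hits = sum(can_form_triangle(rng.random(), rng.random()) for _ in range(trials))
    return hits / trials


estimate_value = estimate()
```

The estimate should settle near 1/4, which is the value a geometric argument on the unit square gives.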

EG.2. Planet X is a ball with centre O. Three spaceships A, B and C land at random on its surface, their positions being independent and each uniformly distributed on the surface. Spaceships A and B can communicate directly by radio if $\angle AOB < 90^\circ$. Show that the probability that they can keep in touch (with, for example, A communicating with B via C if necessary) is $(\pi + 2)/(4\pi)$.
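The value $(\pi + 2)/(4\pi) \approx 0.409$ in EG.2 can be checked by simulation. In the sketch below (our own illustration, with assumed names and seed) a uniform direction is obtained from three independent standard normal coordinates, two ships are linked when the dot product of their position vectors is positive, and all three can keep in touch when at least two of the three pairs are linked.

```python
import math
import random


def random_direction(rng):
    """Gaussian vector in R^3; its direction is uniform on the sphere.
    Normalisation is skipped because only signs of dot products are used."""
    return [rng.gauss(0.0, 1.0) for _ in range(3)]


def linked(p, q):
    """Angle between p and q is below 90 degrees iff their dot product is > 0."""
    return sum(a * b for a, b in zip(p, q)) > 0


def all_in_touch(a, b, c):
    """Three ships are mutually reachable iff at least two pairs are linked."""
    return linked(a, b) + linked(a, c) + linked(b, c) >= 2


def estimate(trials=200_000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b, c = (random_direction(rng) for _ in range(3))
        hits += all_in_touch(a, b, c)
    return hits / trials


target = (math.pi + 2) / (4 * math.pi)
estimate_value = estimate()
```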
EG.3. Let $G$ be the free group with two generators $a$ and $b$. Start at time 0 with the unit element 1, the empty word. At each second multiply the current word on the right by one of the four elements $a, a^{-1}, b, b^{-1}$, choosing each with probability 1/4 (independently of previous choices). The choices

    $a,\ a,\ b,\ a^{-1},\ a,\ b^{-1},\ a^{-1},\ a,\ b$

at times 1 to 9 will produce the reduced word $aab$ of length 3 at time 9. Prove that the probability that the reduced word 1 ever occurs at a positive time is 1/3, and explain why it is intuitively clear that (almost surely)

    (length of reduced word at time $n$)$/n \to \tfrac{1}{2}$.

EG.4.* (Continuation) Suppose now that the elements $a, a^{-1}, b, b^{-1}$ are chosen instead with respective probabilities $\alpha, \alpha, \beta, \beta$, where $\alpha > 0$, $\beta > 0$, $\alpha + \beta = \tfrac{1}{2}$. Prove that the conditional probability that the reduced word 1 ever occurs at a positive time, given that the element $a$ is chosen at time 1, is the unique root $x = r(\alpha)$ (say) in $(0,1)$ of the equation

    $3x^3 + (3 - 4\alpha^{-1})x^2 + x + 1 = 0.$

As time goes on, (it is almost surely true that) more and more of the reduced word becomes fixed, so that a final word is built up. If in the final word, the symbols $a$ and $a^{-1}$ are both replaced by $A$ and the symbols $b$ and $b^{-1}$ are both replaced by $B$, show that the sequence of $A$'s and $B$'s obtained is a Markov chain on $\{A, B\}$ with (for example)

    $p_{AA} = \dfrac{\alpha(1 - x)}{\alpha(1 - x) + 2\beta(1 - y)},$

where $y = r(\beta)$. What is the (almost sure) limiting proportion of occurrence of the symbol $a$ in the final word?

(Note. This result was used by Professor Lyons of Edinburgh to solve a long-standing problem in potential theory on Riemannian manifolds.)
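The root $r(\alpha)$ of the cubic in EG.4 is easy to approximate numerically. The sketch below is illustrative only, and the function names are ours. It uses bisection on $(0,1)$, relying on the sign change between $x = 0$ (where the cubic equals 1) and $x = 1$ (where it equals $8 - 4\alpha^{-1} < 0$ for $\alpha < \tfrac12$). In the symmetric case $\alpha = \tfrac14$ it recovers the value 1/3 from EG.3.

```python
def cubic(x, alpha):
    """Left-hand side 3x^3 + (3 - 4/alpha) x^2 + x + 1 of the equation in EG.4."""
    return 3 * x**3 + (3 - 4 / alpha) * x**2 + x + 1


def r(alpha, tol=1e-12):
    """Bisection for the root in (0,1); assumes 0 < alpha < 1/2."""
    lo, hi = 0.0, 1.0          # cubic(lo) > 0 and cubic(hi) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cubic(mid, alpha) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def p_AA(alpha):
    """Transition probability p_AA from the exercise, with beta = 1/2 - alpha."""
    beta = 0.5 - alpha
    x, y = r(alpha), r(beta)
    return alpha * (1 - x) / (alpha * (1 - x) + 2 * beta * (1 - y))


symmetric_root = r(0.25)
```

In the symmetric case `p_AA(0.25)` equals 1/3, matching the intuition that after an $A$-type symbol the next symbol of the final word is one of three equally likely non-cancelling letters.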

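Algebras, etc.

E1.1. 'Probability' for subsets of $\mathbf{N}$

Let $V \subseteq \mathbf{N}$. Say that $V$ has (Cesàro) density $\gamma(V)$ and write $V \in \mathrm{CES}$ if

    $\gamma(V) := \lim_{n \to \infty} \dfrac{\#(V \cap \{1, 2, \ldots, n\})}{n}$

exists. Give an example of sets $V_1$ and $V_2$ in CES for which $V_1 \cap V_2 \notin \mathrm{CES}$. Thus, CES is not an algebra.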

Independence

E4.1. Let $(\Omega, \mathcal{F}, \mathbf{P})$ be a probability triple. Let $\mathcal{I}_1$, $\mathcal{I}_2$ and $\mathcal{I}_3$ be three $\pi$-systems on $\Omega$ such that, for $k = 1, 2, 3$,

    $\mathcal{I}_k \subseteq \mathcal{F}$ and $\Omega \in \mathcal{I}_k$.

Prove that if

    $\mathbf{P}(I_1 \cap I_2 \cap I_3) = \mathbf{P}(I_1)\mathbf{P}(I_2)\mathbf{P}(I_3)$

whenever $I_k \in \mathcal{I}_k$ ($k = 1, 2, 3$), then $\sigma(\mathcal{I}_1), \sigma(\mathcal{I}_2), \sigma(\mathcal{I}_3)$ are independent. Why did we require that $\Omega \in \mathcal{I}_k$?

E4.2. Let $s > 1$, and define $\zeta(s) := \sum_{n \in \mathbf{N}} n^{-s}$, as usual. Let $X$ and $Y$ be independent $\mathbf{N}$-valued random variables with

    $\mathbf{P}(X = n) = \mathbf{P}(Y = n) = n^{-s}/\zeta(s).$

Prove that the events ($E_p$ : $p$ prime), where $E_p = \{X$ is divisible by $p\}$, are independent. Explain Euler's formula

    $1/\zeta(s) = \prod_p (1 - 1/p^s)$

probabilistically. Prove that

    $\mathbf{P}$(no square other than 1 divides $X$) $= 1/\zeta(2s)$.

Let $H$ be the highest common factor of $X$ and $Y$. Prove that

    $\mathbf{P}(H = n) = n^{-2s}/\zeta(2s).$
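Euler's formula in E4.2 can be checked numerically for $s = 2$, where $\zeta(2) = \pi^2/6$. The sketch below is an illustration with assumed function names and an assumed truncation limit; it multiplies $(1 - p^{-s})$ over the primes below the limit and compares the result with $6/\pi^2$.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            # Clear every multiple of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def euler_product(s, limit=10_000):
    """Truncated product over primes p <= limit of (1 - p^-s)."""
    prod = 1.0
    for p in primes_up_to(limit):
        prod *= 1.0 - p ** (-s)
    return prod


product_s2 = euler_product(2.0)
target_s2 = 6.0 / math.pi**2   # 1/zeta(2)
```

In the probabilistic reading, each factor $1 - p^{-s}$ is $\mathbf{P}(E_p^c)$, and the product is the probability that no prime divides $X$, that is, $\mathbf{P}(X = 1) = 1/\zeta(s)$.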

E4.3. Let $X_1, X_2, \ldots$ be independent random variables with the same continuous distribution function. Let $E_1 := \Omega$, and, for $n \ge 2$, let

    $E_n := \{X_n > X_m, \forall m < n\} = \{$a 'Record' occurs at time $n\}$.

Convince yourself and your tutor that the events $E_1, E_2, \ldots$ are independent, with $\mathbf{P}(E_n) = 1/n$.

Borel-Cantelli Lemmas

E4.4. Suppose that a coin with probability $p$ of heads is tossed repeatedly. Let $A_k$ be the event that a sequence of $k$ (or more) consecutive heads occurs amongst tosses numbered $2^k, 2^k + 1, 2^k + 2, \ldots, 2^{k+1} - 1$. Prove that

    $\mathbf{P}(A_k, \text{i.o.}) = 1$ if $p \ge \tfrac{1}{2}$, and $= 0$ if $p < \tfrac{1}{2}$.

Hint. Let $E_i$ be the event that there are $k$ consecutive heads beginning at toss numbered $2^k + (i - 1)k$. Now make a simple use of the inclusion-exclusion formulae (Lemma 1.9).

E4.5. Prove that if $G$ is a random variable with the normal N(0,1) distribution, then, for $x > 0$,

    $\mathbf{P}(G > x) = \dfrac{1}{\sqrt{2\pi}} \int_x^\infty e^{-\frac{1}{2}y^2}\,dy \le \dfrac{1}{x\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}.$

Let $X_1, X_2, \ldots$ be a sequence of independent N(0,1) variables. Prove that, with probability 1, $L \le 1$, where

    $L := \limsup (X_n/\sqrt{2\log n}).$

(Harder. Prove that $\mathbf{P}(L = 1) = 1$.) [Hint. See Section 14.8.]

Let $S_n := X_1 + X_2 + \cdots + X_n$. Recall that $S_n/\sqrt{n}$ has the N(0,1) distribution. Prove that

    $\mathbf{P}(|S_n| < 2\sqrt{n \log n}, \text{ev}) = 1.$

Note that this implies the Strong Law: $\mathbf{P}(S_n/n \to 0) = 1$.

Remark. The Law of the Iterated Logarithm states that

    $\mathbf{P}\left(\limsup \dfrac{S_n}{\sqrt{2n \log\log n}} = 1\right) = 1.$

Do not attempt to prove this now! See Section 14.7.

E4.6. Converse to SLLN

Let $Z$ be a non-negative RV. Let $Y$ be the integer part of $Z$. Show that

    $Y = \sum_{n \in \mathbf{N}} I_{\{Z \ge n\}},$

and deduce that

(*)    $\sum_{n \in \mathbf{N}} \mathbf{P}[Z \ge n] \le \mathbf{E}(Z) \le 1 + \sum_{n \in \mathbf{N}} \mathbf{P}[Z \ge n].$

Let $(X_n)$ be a sequence of IID RVs (independent, identically distributed random variables) with $\mathbf{E}(|X_n|) = \infty$, $\forall n$. Prove that

    $\sum_n \mathbf{P}[|X_n| > kn] = \infty \ (k \in \mathbf{N})$ and $\limsup \dfrac{|X_n|}{n} = \infty$, a.s.

Deduce that if $S_n = X_1 + X_2 + \cdots + X_n$, then

    $\limsup \dfrac{|S_n|}{n} = \infty$, a.s.

E4.7. What's fair about a fair game?

Let $X_1, X_2, \ldots$ be independent RVs such that

    $X_n = n^2 - 1$ with probability $n^{-2}$,
    $X_n = -1$ with probability $1 - n^{-2}$.

Prove that $\mathbf{E}(X_n) = 0$, $\forall n$, but that if $S_n = X_1 + X_2 + \cdots + X_n$, then

    $\dfrac{S_n}{n} \to -1$, a.s.

E4.8*. Blackwell's test of imagination

This exercise assumes that you are familiar with continuous-parameter Markov chains with two states.

For each $n \in \mathbf{N}$, let $X^{(n)} = \{X^{(n)}(t) : t \ge 0\}$ be a Markov chain with state-space the two-point set $\{0, 1\}$ with Q-matrix

    $Q^{(n)} = \begin{pmatrix} -a_n & a_n \\ b_n & -b_n \end{pmatrix}, \qquad a_n, b_n > 0,$

and transition function $P^{(n)}(t) = \exp(tQ^{(n)})$. Show that, for every $t$,

    $p_{00}^{(n)}(t) \ge b_n/(a_n + b_n), \qquad p_{01}^{(n)}(t) \le a_n/(a_n + b_n).$

The processes $(X^{(n)} : n \in \mathbf{N})$ are independent and $X^{(n)}(0) = 0$ for every $n$. Each $X^{(n)}$ has right-continuous paths.

Suppose that $\sum a_n = \infty$ and $\sum a_n/b_n < \infty$. Prove that if $t$ is a fixed time then

(*)    $\mathbf{P}\{X^{(n)}(t) = 1$ for infinitely many $n\} = 0.$

Use Weierstrass's M-test to show that $\sum_n \log p_{00}^{(n)}(t)$ is uniformly convergent on $[0,1]$, and deduce that

    $\mathbf{P}\{X^{(n)}(t) = 0$ for ALL $n\} \to 1$ as $t \downarrow 0$.

Prove that

    $\mathbf{P}\{X^{(n)}(s) = 0, \forall s \le t, \forall n\} = 0$ for every $t > 0$

and discuss with your tutor why it is almost surely true that

(**)    within every non-empty time interval, infinitely many of the $X^{(n)}$ chains jump.

Now imagine the whole behaviour.

Notes. Almost surely, the process $X = (X^{(n)})$ spends almost all its time in the countable subset of $\{0,1\}^{\mathbf{N}}$ consisting of sequences with only finitely many 1's. This follows from (*) and Fubini's Theorem 8.2. However, it is a.s. true that $X$ visits uncountable points of $\{0,1\}^{\mathbf{N}}$ during every nonempty time interval. This follows from (**) and the Baire category theorem A1.12. By using much deeper techniques, one can show that for certain choices of $(a_n)$ and $(b_n)$, $X$ will almost certainly visit every point of $\{0,1\}^{\mathbf{N}}$ uncountably often within a finite time.

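Tail $\sigma$-algebras

E4.9. Let $Y_0, Y_1, Y_2, \ldots$ be independent random variables with

    $\mathbf{P}(Y_n = +1) = \mathbf{P}(Y_n = -1) = \tfrac{1}{2}, \quad \forall n.$

For $n \in \mathbf{N}$, define

    $X_n := Y_0 Y_1 \cdots Y_n.$

Prove that the variables $X_1, X_2, \ldots$ are independent. Define

    $\mathcal{Y} := \sigma(Y_1, Y_2, \ldots), \qquad \mathcal{T}_n := \sigma(X_r : r > n).$

Prove that

    $\mathcal{L} := \bigcap_n \sigma(\mathcal{Y}, \mathcal{T}_n) \ne \sigma\big(\mathcal{Y}, \bigcap_n \mathcal{T}_n\big) =: \mathcal{R}.$

Hint. Prove that $Y_0 \in m\mathcal{L}$ and that $Y_0$ is independent of $\mathcal{R}$.

E4.10. Star Trek, 2

See E10.11, which you can do now.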

Dominated-Convergence Theorem

E5.1. Let $S := [0,1]$, $\Sigma := \mathcal{B}(S)$, $\mu :=$ Leb. Define $f_n := n I_{(0,1/n)}$. Prove that $f_n(s) \to 0$ for every $s$ in $S$, but that $\mu(f_n) = 1$ for every $n$. Draw a picture of $g := \sup_n |f_n|$, and show that $g \notin \mathcal{L}^1(S, \Sigma, \mu)$.

Inclusion-Exclusion Formulae

E5.2. Prove the inclusion-exclusion formulae and inequalities of Section 1.9 by considering integrals of indicator functions.

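The Strong Law

E7.1. Inverting Laplace transforms

Let $f$ be a bounded continuous function on $[0, \infty)$. The Laplace transform of $f$ is the function $L$ on $(0, \infty)$ defined by

    $L(\lambda) := \int_0^\infty e^{-\lambda x} f(x)\,dx.$

Let $X_1, X_2, \ldots$ be independent RVs each with the exponential distribution of rate $\lambda$, so $\mathbf{P}[X > x] = e^{-\lambda x}$, $\mathbf{E}(X) = \tfrac{1}{\lambda}$, $\mathrm{Var}(X) = \tfrac{1}{\lambda^2}$. Show that

    $\mathbf{E}f(S_n) = \dfrac{(-1)^{n-1} \lambda^n L^{(n-1)}(\lambda)}{(n-1)!},$

where $S_n = X_1 + X_2 + \cdots + X_n$, and $L^{(n-1)}$ denotes the $(n-1)$th derivative of $L$. Prove that $f$ may be recovered from $L$ as follows: for $y > 0$,

    $f(y) = \lim_{n \uparrow \infty} (-1)^{n-1} \dfrac{(n/y)^n L^{(n-1)}(n/y)}{(n-1)!}.$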

E7.2. The uniform distribution on the sphere $S^{n-1} \subset \mathbf{R}^n$

As usual, write $S^{n-1} = \{x \in \mathbf{R}^n : |x| = 1\}$. You may assume that there is a unique probability measure $\nu^{n-1}$ on $(S^{n-1}, \mathcal{B}(S^{n-1}))$ such that $\nu^{n-1}(A) = \nu^{n-1}(HA)$ for every orthogonal $n \times n$ matrix $H$ and every $A$ in $\mathcal{B}(S^{n-1})$.

Prove that if $\mathbf{X}$ is a vector in $\mathbf{R}^n$, the components of which are independent N(0,1) variables, then for every orthogonal $n \times n$ matrix $H$, the vector $H\mathbf{X}$ has the same property. Deduce that $\mathbf{X}/|\mathbf{X}|$ has law $\nu^{n-1}$.

Let $Z_1, Z_2, \ldots$ be independent N(0,1) variables and define

    $R_n := (Z_1^2 + Z_2^2 + \cdots + Z_n^2)^{1/2}.$

Prove that $R_n/\sqrt{n} \to 1$, a.s.

Combine these ideas to prove a rather striking fact which relates the normal distribution to the 'infinite-dimensional' sphere and which is important both for Brownian motion and for Fock-space constructions in quantum mechanics:

If, for each $n$, $(Y_1^{(n)}, Y_2^{(n)}, \ldots, Y_n^{(n)})$ is a point chosen on $S^{n-1}$ according to the distribution $\nu^{n-1}$, then

    $\lim_{n \to \infty} \mathbf{P}(\sqrt{n}\, Y_1^{(n)} \le x) = \Phi(x) := \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-y^2/2}\,dy,$

    $\lim_{n \to \infty} \mathbf{P}(\sqrt{n}\, Y_1^{(n)} \le x_1;\ \sqrt{n}\, Y_2^{(n)} \le x_2) = \Phi(x_1)\Phi(x_2).$

Hint. $\mathbf{P}(Y_1^{(n)} \le u) = \mathbf{P}(X_1/R_n \le u)$.

Conditional Expectation

E9.1. Prove that if $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{F}$ and if $X \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathbf{P})$ and if $Y \in \mathcal{L}^1(\Omega, \mathcal{G}, \mathbf{P})$ and

(*)    $\mathbf{E}(X; G) = \mathbf{E}(Y; G)$

for every $G$ in a $\pi$-system which contains $\Omega$ and generates $\mathcal{G}$, then (*) holds for every $G$ in $\mathcal{G}$.

E9.2. Suppose that $X, Y \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathbf{P})$ and that

    $\mathbf{E}(X \mid Y) = Y$, a.s.,    $\mathbf{E}(Y \mid X) = X$, a.s.

Prove that $\mathbf{P}(X = Y) = 1$.

Hint. Consider $\mathbf{E}(X - Y; X > c, Y \le c) + \mathbf{E}(X - Y; X \le c, Y \le c)$.

Martingales

E10.1. Pólya's urn

At time 0, an urn contains 1 black ball and 1 white ball. At each time 1, 2, 3, ..., a ball is chosen at random from the urn and is replaced together with a new ball of the same colour. Just after time $n$, there are therefore $n + 2$ balls in the urn, of which $B_n + 1$ are black, where $B_n$ is the number of black balls chosen by time $n$.

Let $M_n = (B_n + 1)/(n + 2)$, the proportion of black balls in the urn just after time $n$. Prove that (relative to a natural filtration which you should specify) $M$ is a martingale.

Prove that $\mathbf{P}(B_n = k) = (n + 1)^{-1}$ for $0 \le k \le n$. What is the distribution of $\Theta$, where $\Theta := \lim M_n$?

Prove that for $0 < \theta < 1$,

    $N_n^\theta := \dfrac{(n+1)!}{B_n!\,(n - B_n)!}\, \theta^{B_n} (1 - \theta)^{n - B_n}$

defines a martingale $N^\theta$. (Continued at E10.8.)
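The uniform law of $B_n$ in E10.1 can be confirmed for small $n$ by propagating the exact distribution one draw at a time. The sketch below is an illustrative computation of ours, not a proof; it uses rational arithmetic, and the fact that before the $(j+1)$th draw the urn holds $j + 2$ balls of which $B_j + 1$ are black.

```python
from fractions import Fraction


def black_count_distribution(n):
    """Return [P(B_n = k) for k = 0..n] for the urn started with 1 black, 1 white."""
    dist = [Fraction(1)]                  # B_0 = 0 with probability 1
    for step in range(n):                 # before this draw: step + 2 balls
        new = [Fraction(0)] * (step + 2)
        for k, p in enumerate(dist):      # k black draws so far, so k + 1 black balls
            p_black = Fraction(k + 1, step + 2)
            new[k + 1] += p * p_black
            new[k] += p * (1 - p_black)
        dist = new
    return dist


def mean_proportion(n):
    """E(M_n) with M_n = (B_n + 1)/(n + 2), computed from the exact law of B_n."""
    dist = black_count_distribution(n)
    return sum(p * Fraction(k + 1, n + 2) for k, p in enumerate(dist))


dist6 = black_count_distribution(6)
```

The constant value of `mean_proportion(n)` is consistent with $M$ being a martingale, though it checks only the expectations and not the conditional property.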

E10.2. Martingale formulation of Bellman's Optimality Principle

Your winnings per unit stake on game $n$ are $\varepsilon_n$, where the $\varepsilon_n$ are IID RVs with

    $\mathbf{P}(\varepsilon_n = +1) = p, \quad \mathbf{P}(\varepsilon_n = -1) = q$, where $\tfrac{1}{2} < p = 1 - q < 1$.

Your stake $C_n$ on game $n$ must lie between 0 and $Z_{n-1}$, where $Z_{n-1}$ is your fortune at time $n - 1$. Your object is to maximize the expected 'interest rate' $\mathbf{E}\log(Z_N/Z_0)$, where $N$ is a given integer representing the length of the game, and $Z_0$, your fortune at time 0, is a given constant. Let $\mathcal{F}_n = \sigma(\varepsilon_1, \ldots, \varepsilon_n)$ be your 'history' up to time $n$. Show that if $C$ is any (previsible) strategy, then $\log Z_n - n\alpha$ is a supermartingale, where $\alpha$ denotes the 'entropy'

    $\alpha = p \log p + q \log q + \log 2,$

so that $\mathbf{E}\log(Z_N/Z_0) \le N\alpha$, but that, for a certain strategy, $\log Z_n - n\alpha$ is a martingale. What is the best strategy?
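A numerical experiment can suggest where the bound $N\alpha$ in E10.2 comes from. The sketch below is an exploration, not a solution. It assumes a strategy that stakes a fixed fraction $f$ of the current fortune, so that one game changes $\log Z$ by $\log(1 + f\varepsilon_n)$; the value $p = 0.6$ and the grid of fractions are likewise our choices. It compares the expected one-step log-growth with the entropy $\alpha$.

```python
import math


def growth_rate(f, p):
    """Expected one-game change in log fortune when staking fraction f (0 <= f < 1)."""
    q = 1.0 - p
    return p * math.log(1.0 + f) + q * math.log(1.0 - f)


def entropy_alpha(p):
    """alpha = p log p + q log q + log 2 from the exercise."""
    q = 1.0 - p
    return p * math.log(p) + q * math.log(q) + math.log(2.0)


p = 0.6                    # assumed win probability, with 1/2 < p < 1
candidate_f = p - (1 - p)  # the fraction p - q
grid_best = max(growth_rate(k / 1000, p) for k in range(1000))
```

On this grid no fraction beats $\alpha$, and the fraction $p - q$ attains it up to rounding, because then $1 + f = 2p$ and $1 - f = 2q$.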

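E10.3. Stopping times

Suppose that $S$ and $T$ are stopping times (relative to $(\Omega, \mathcal{F}, \{\mathcal{F}_n\})$). Prove that $S \wedge T$ ($:= \min(S,T)$), $S \vee T$ ($:= \max(S,T)$) and $S + T$ are stopping times.

E10.4. Let $S$ and $T$ be stopping times with $S \le T$. Define the process $1_{(S,T]}$ with parameter set $\mathbf{N}$ via

    $1_{(S,T]}(n, \omega) := 1$ if $S(\omega) < n \le T(\omega)$, and $:= 0$ otherwise.

Prove that $1_{(S,T]}$ is previsible, and deduce that if $X$ is a supermartingale, then

    $\mathbf{E}(X_{T \wedge n}) \le \mathbf{E}(X_{S \wedge n}), \quad \forall n.$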

E10.5. 'What always stands a reasonable chance of happening will (almost surely) happen - sooner rather than later.'

Suppose that $T$ is a stopping time such that for some $N \in \mathbf{N}$ and some $\varepsilon > 0$, we have, for every $n$:

    $\mathbf{P}(T \le n + N \mid \mathcal{F}_n) > \varepsilon$, a.s.

Prove by induction using $\mathbf{P}(T > kN) = \mathbf{P}(T > kN; T > (k-1)N)$ that for $k = 1, 2, 3, \ldots$

    $\mathbf{P}(T > kN) \le (1 - \varepsilon)^k.$

Show that $\mathbf{E}(T) < \infty$.

E10.6. ABRACADABRA

At each of times 1, 2, 3, ..., a monkey types a capital letter at random, the sequence of letters typed forming an IID sequence of RVs each chosen uniformly from amongst the 26 possible capital letters.

Just before each time $n = 1, 2, \ldots$, a new gambler arrives on the scene. He bets \$1 that

    the $n$th letter will be A.

If he loses, he leaves. If he wins, he receives \$26 all of which he bets on the event that

    the $(n+1)$th letter will be B.

If he loses, he leaves. If he wins, he bets his whole current fortune of \$$26^2$ that

    the $(n+2)$th letter will be R

and so on through the ABRACADABRA sequence. Let $T$ be the first time by which the monkey has produced the consecutive sequence ABRACADABRA. Explain why martingale theory makes it intuitively obvious that

    $\mathbf{E}(T) = 26^{11} + 26^4 + 26$

and use result 10.10(c) to prove this. (See Ross (1983) for other such applications.)
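The three terms in $\mathbf{E}(T)$ correspond to the gamblers still in the game at time $T$: those whose bets so far spell a beginning of ABRACADABRA that is also an ending of it. The sketch below, with a hypothetical helper name of ours, adds up their fortunes by scanning for prefix lengths $k$ at which the first $k$ letters equal the last $k$ letters.

```python
def expected_waiting_time(pattern, alphabet_size):
    """Sum of alphabet_size**k over all k for which the length-k prefix of
    `pattern` equals its length-k suffix (total fortune of surviving gamblers)."""
    total = 0
    for k in range(1, len(pattern) + 1):
        if pattern[:k] == pattern[-k:]:
            total += alphabet_size**k
    return total


overlaps = [k for k in range(1, 12) if "ABRACADABRA"[:k] == "ABRACADABRA"[-k:]]
abracadabra_time = expected_waiting_time("ABRACADABRA", 26)
```

For ABRACADABRA the matching lengths are 1 (A), 4 (ABRA) and 11 (the whole word). The same function applied to coin-tossing patterns gives the familiar values 6 for HH and 4 for HT.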

E10.7. Gambler's Ruin

Suppose that $X_1, X_2, \ldots$ are IID RVs with

    $\mathbf{P}[X = +1] = p, \quad \mathbf{P}[X = -1] = q$, where $0 < p = 1 - q < 1$,

and $p \ne q$. Suppose that $a$ and $b$ are integers with $0 < a < b$. Define

    $S_n := a + X_1 + \cdots + X_n, \qquad T := \inf\{n : S_n = 0$ or $S_n = b\}.$

Let $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$ ($\mathcal{F}_0 = \{\emptyset, \Omega\}$). Explain why $T$ satisfies the conditions in Question E10.5. Prove that

    $M_n := \left(\dfrac{q}{p}\right)^{S_n}$ and $N_n := S_n - n(p - q)$

define martingales $M$ and $N$. Deduce the values of $\mathbf{P}(S_T = 0)$ and $\mathbf{E}(S_T)$.

E10.8. Bayes' urn

A random number $\Theta$ is chosen uniformly between 0 and 1, and a coin with probability $\Theta$ of heads is minted. The coin is tossed repeatedly. Let $B_n$ be the number of heads in $n$ tosses. Prove that $(B_n)$ has exactly the same probabilistic structure as the $(B_n)$ sequence in (E10.1) on Pólya's urn. Prove that $N_n^\theta$ is a regular conditional pdf of $\Theta$ given $B_1, B_2, \ldots, B_n$. (Continued at E18.5.)

E10.9. Show that if $X$ is a non-negative supermartingale and $T$ is a stopping time, then

    $\mathbf{E}(X_T; T < \infty) \le \mathbf{E}(X_0).$

(Hint. Recall Fatou's Lemma.) Deduce that

    $c\,\mathbf{P}(\sup_n X_n \ge c) \le \mathbf{E}(X_0).$

E10.10*. The 'Star-ship Enterprise' Problem

The control system on the star-ship Enterprise has gone wonky. All that one can do is to set a distance to be travelled. The spaceship will then move that distance in a randomly chosen direction, then stop. The object is to get into the Solar System, a ball of radius $r$. Initially, the Enterprise is at a distance $R_0$ ($> r$) from the Sun.

Let $R_n$ be the distance from Sun to Enterprise after $n$ 'space-hops'. Use Gauss's theorems on potentials due to spherically-symmetric charge distributions to show that whatever strategy is adopted, $1/R_n$ is a supermartingale, and that for any strategy which always sets a distance no greater than that from Sun to Enterprise, $1/R_n$ is a martingale. Use (E10.9) to show that

    $\mathbf{P}$[Enterprise gets into Solar System] $\le r/R_0$.

For each $\varepsilon > 0$, you can choose a strategy which makes this probability greater than $(r/R_0) - \varepsilon$. What kind of strategy will this be?

E10.11*. Star Trek, 2

'Captain's Log ...

Mr Spock and Chief Engineer Scott have modified the control system so that the Enterprise is confined to move for ever in a fixed plane passing through the Sun. However, the next 'hop-length' is now automatically set to be the current distance to the Sun ('next' and 'current' being updated in the obvious way). Spock is muttering something about logarithms and random walks, but I wonder whether it is (almost) certain that we will get into the Solar System sometime ...'

Hint. Let $X_n := \log R_n - \log R_{n-1}$. Prove that $X_1, X_2, \ldots$ is an IID sequence of variables each of mean 0 and finite variance $\sigma^2$ (say), where $\sigma > 0$. Let

    $S_n := X_1 + X_2 + \cdots + X_n.$

Prove that if $\alpha$ is a fixed positive number, then

    $\mathbf{P}[\inf_n S_n = -\infty] \ge \mathbf{P}[S_n \le -\alpha\sigma\sqrt{n}, \text{i.o.}] \ge \limsup \mathbf{P}[S_n \le -\alpha\sigma\sqrt{n}] = \Phi(-\alpha) > 0.$

(Use the Central Limit Theorem.) Prove that the event $\{\inf_n S_n = -\infty\}$ is in the tail $\sigma$-algebra of the $(X_n)$ sequence.

E12.1. Branching Process

A branching process $Z = \{Z_n : n \ge 0\}$ is constructed in the usual way. Thus, a family $\{X_k^{(n)} : n, k \ge 1\}$ of IID $\mathbf{Z}^+$-valued random variables is supposed given. We define $Z_0 := 1$ and then define recursively:

    $Z_{n+1} := X_1^{(n+1)} + \cdots + X_{Z_n}^{(n+1)} \quad (n \ge 0).$

Assume that if $X$ denotes any one of the $X_k^{(n)}$, then

    $\mu := \mathbf{E}(X) < \infty$ and $0 < \sigma^2 := \mathrm{Var}(X) < \infty.$

Prove that $M_n := Z_n/\mu^n$ defines a martingale $M$ relative to the filtration $\mathcal{F}_n = \sigma(Z_0, Z_1, \ldots, Z_n)$. Show that

    $\mathbf{E}(Z_{n+1}^2 \mid \mathcal{F}_n) = \mu^2 Z_n^2 + \sigma^2 Z_n$

and deduce that $M$ is bounded in $\mathcal{L}^2$ if and only if $\mu > 1$. Show that when $\mu > 1$,

    $\mathrm{Var}(M_\infty) = \sigma^2\{\mu(\mu - 1)\}^{-1}.$

E12.2. Use of Kronecker's Lemma

Let $E_1, E_2, \ldots$ be independent events with $\mathbf{P}(E_n) = 1/n$. Let $Y_i = I_{E_i}$. Prove that $\sum_k (Y_k - \tfrac{1}{k})/\log k$ converges a.s., and use Kronecker's Lemma to deduce that

    $\dfrac{N_n}{\log n} \to 1$, a.s.,

where $N_n := Y_1 + \cdots + Y_n$. An interesting application is to E4.3, when $N_n$ becomes the number of records by time $n$.

E12.3. Star Trek, 3

Prove that if the strategy in E10.11 is (in the obvious sense) employed - and for ever - in $\mathbf{R}^3$ rather than in $\mathbf{R}^2$, then

    $\sum R_n^{-2} < \infty$, a.s.,

where $R_n$ is the distance from the Enterprise to the Sun at time $n$.

Note. It should be obvious which result plays the key role here, but you should try to make your argument fully rigorous.

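Uniform Integrability

E13.1. Prove that a class $\mathcal{C}$ of RVs is UI if and only if both of the following conditions (i) and (ii) hold:

(i) $\mathcal{C}$ is bounded in $\mathcal{L}^1$, so that $A := \sup\{\mathbf{E}(|X|) : X \in \mathcal{C}\} < \infty$,
(ii) for every $\varepsilon > 0$, $\exists \delta > 0$ such that if $F \in \mathcal{F}$, $\mathbf{P}(F) < \delta$ and $X \in \mathcal{C}$, then $\mathbf{E}(|X|; F) < \varepsilon$.

Hint for 'if'. For $X \in \mathcal{C}$, $\mathbf{P}(|X| > K) \le K^{-1}A$.

Hint for 'only if'. $\mathbf{E}(|X|; F) \le \mathbf{E}(|X|; |X| > K) + K\mathbf{P}(F)$.

E13.2. Prove that if $\mathcal{C}$ and $\mathcal{D}$ are UI classes of RVs, and if we define

    $\mathcal{C} + \mathcal{D} := \{X + Y : X \in \mathcal{C}, Y \in \mathcal{D}\},$

then $\mathcal{C} + \mathcal{D}$ is UI. Hint. One way to prove this is to use E13.1.

E13.3. Let $\mathcal{C}$ be a UI family of RVs. Say that $Y \in \mathcal{D}$ if for some $X \in \mathcal{C}$ and some sub-$\sigma$-algebra $\mathcal{G}$ of $\mathcal{F}$, we have $Y = \mathbf{E}(X \mid \mathcal{G})$, a.s. Prove that $\mathcal{D}$ is UI.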

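E14.1. Hunt's Lemma

Suppose that $(X_n)$ is a sequence of RVs such that $X := \lim X_n$ exists a.s. and that $(X_n)$ is dominated by $Y$ in $(\mathcal{L}^1)^+$:

    $|X_n(\omega)| \le Y(\omega), \ \forall (n, \omega)$, and $\mathbf{E}(Y) < \infty$.

Let $\{\mathcal{F}_n\}$ be any filtration. Prove that

    $\mathbf{E}(X_n \mid \mathcal{F}_n) \to \mathbf{E}(X \mid \mathcal{F}_\infty)$ a.s.

Hint. Let $Z_m := \sup_{r \ge m} |X_r - X|$. Prove that $Z_m \to 0$ a.s. and in $\mathcal{L}^1$. Prove that for $n \ge m$, we have, almost surely,

    $|\mathbf{E}(X_n \mid \mathcal{F}_n) - \mathbf{E}(X \mid \mathcal{F}_\infty)| \le |\mathbf{E}(X \mid \mathcal{F}_n) - \mathbf{E}(X \mid \mathcal{F}_\infty)| + \mathbf{E}(Z_m \mid \mathcal{F}_n).$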

E14.2. Azuma-Hoeffding Inequality

(a) Show that if $Y$ is a RV with values in $[-c, c]$ and with $E(Y) = 0$, then, for $\theta \in \mathbb{R}$,

$$E e^{\theta Y} \le \cosh \theta c \le \exp\left(\tfrac12 \theta^2 c^2\right).$$

(b) Prove that if $M$ is a martingale null at 0 such that for some sequence $(c_n : n \in \mathbb{N})$ of positive constants,

$$|M_n - M_{n-1}| \le c_n, \quad \forall n,$$

then, for $x > 0$,

$$P\left(\sup_{k \le n} M_k \ge x\right) \le \exp\left(-\tfrac12 x^2 \Big/ \sum_{k=1}^{n} c_k^2\right).$$

Hint for (a). Let $f(z) := \exp(\theta z)$, $z \in [-c, c]$. Then, since $f$ is convex,

$$f(y) \le \frac{c - y}{2c} f(-c) + \frac{c + y}{2c} f(c).$$

Hint for (b). See the proof of (14.7,a).
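As a sanity check on the bound in E14.2(b), here is a small simulation sketch. It assumes the simplest case, the symmetric $\pm 1$ random walk, for which one may take $c_k = 1$; the function names `azuma_bound` and `empirical_sup_tail` are illustrative and not taken from the text.

```python
import math
import random


def azuma_bound(x, c):
    """Right-hand side of E14.2(b): exp(-x^2 / (2 * sum of c_k^2))."""
    return math.exp(-0.5 * x * x / sum(ck * ck for ck in c))


def empirical_sup_tail(x, n, trials, seed=0):
    """Estimate P(sup_{k<=n} M_k >= x) for the simple +/-1 random walk M."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        m = 0
        for _ in range(n):
            m += 1 if rng.random() < 0.5 else -1
            if m >= x:  # the running maximum has reached level x
                hits += 1
                break
    return hits / trials


if __name__ == "__main__":
    n, x = 100, 20
    print(empirical_sup_tail(x, n, 2000), azuma_bound(x, [1.0] * n))
```

For this walk the estimated probability should fall well below the bound $e^{-x^2/(2n)}$, which is not tight here; the inequality is valuable because it holds for every martingale with bounded increments.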

Characteristic Functions

E16.1. Prove that

$$\lim_{T \uparrow \infty} \int_0^T x^{-1} \sin x \, dx = \pi/2$$

by integrating $\int z^{-1} e^{iz}\, dz$ around the contour formed by the 'upper' semicircles of radii $\varepsilon$ and $T$ and the intervals $[-T, -\varepsilon]$ and $[\varepsilon, T]$.

E16.2. Prove that if $Z$ has the U$[-1, 1]$ distribution, then

$$\varphi_Z(\theta) = (\sin \theta)/\theta,$$

and prove that there do not exist IID RVs $X$ and $Y$ such that

$$X - Y \sim \mathrm{U}[-1, 1].$$

E16.3. Suppose that $X$ has the Cauchy distribution, and let $\theta > 0$. By integrating $e^{i\theta z}/(1 + z^2)$ around the semicircle formed by $[-R, R]$ together with the 'upper' semicircle centre 0 of radius $R$, prove that $\varphi_X(\theta) = e^{-\theta}$. Show that $\varphi_X(\theta) = e^{-|\theta|}$ for all $\theta$. Prove that if $X_1, X_2, \ldots, X_n$ are IID RVs each with the standard Cauchy distribution, then $(X_1 + \cdots + X_n)/n$ also has the standard Cauchy distribution.

E16.4. Suppose that $X$ has the standard normal N(0,1) distribution. Let $\theta > 0$. Consider $\int (2\pi)^{-1/2} \exp(-\tfrac12 z^2)\, dz$ around the rectangular contour

$$(-R - i\theta) \to (R - i\theta) \to R \to (-R) \to (-R - i\theta),$$

and prove that $\varphi_X(\theta) = \exp(-\tfrac12 \theta^2)$.

E16.5. Prove that if $\varphi$ is the characteristic function of a RV $X$, then $\varphi$ is non-negative definite in that for complex $c_1, c_2, \ldots, c_n$ and real $\theta_1, \theta_2, \ldots, \theta_n$,

$$\sum_j \sum_k c_j \bar{c}_k \varphi(\theta_j - \theta_k) \ge 0.$$

(Hint. Express LHS as the expectation of ... .) Bochner's Theorem says that $\varphi$ is a characteristic function if and only if $\varphi(0) = 1$, $\varphi$ is continuous, and $\varphi$ is non-negative definite! (It is of course understood that here $\varphi : \mathbb{R} \to \mathbb{C}$.) E18.6 gives a simpler result in the same spirit.

E16.6. (a) Let $(\Omega, \mathcal{F}, P) = ([0,1], \mathcal{B}[0,1], \mathrm{Leb})$. What is the distribution of the RV $Z$, where $Z(\omega) := 2\omega - 1$? Let $\omega = \sum 2^{-n} R_n(\omega)$ be the binary expansion of $\omega$. Let

$$U(\omega) = \sum_{\text{odd } n} 2^{-n} Q_n(\omega), \quad \text{where } Q_n(\omega) = 2R_n(\omega) - 1.$$

Find a random variable $V$ independent of $U$ such that $U$ and $V$ are identically distributed and $U + \tfrac12 V$ is uniformly distributed on $[-1, 1]$.

(b) Now suppose that (on some probability triple) $X$ and $Y$ are IID RVs such that

$$X + \tfrac12 Y \text{ is uniformly distributed on } [-1, 1].$$

Let $\varphi$ be the CF of $X$. Calculate $\varphi(\theta)/\varphi(\tfrac14 \theta)$. Show that the distribution of $X$ must be the same as that of $U$ in part (a), and deduce that there exists a set $F \in \mathcal{B}[-1, 1]$ such that $\mathrm{Leb}(F) = 0$ and $P(X \in F) = 1$.
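The construction in E16.6(a) can be checked numerically. The sketch below is an illustration under stated assumptions, not the book's solution: it takes as one candidate $V(\omega) = \sum_{\text{odd } n} 2^{-n} Q_{n+1}(\omega)$, built from the even-indexed digits, and truncates the binary expansion at `n_digits` digits. The function names are hypothetical.

```python
def binary_digits(omega, n_digits):
    """First n_digits binary digits R_1, R_2, ... of omega in [0, 1)."""
    digits = []
    for _ in range(n_digits):
        omega *= 2
        bit = int(omega)
        digits.append(bit)
        omega -= bit
    return digits


def split_uniform(omega, n_digits=50):
    """Return (U, V, Z) built from the truncated binary expansion of omega.

    U uses the odd-indexed signs Q_1, Q_3, ...; the candidate V uses the
    even-indexed signs Q_2, Q_4, ... with the same weights 2^-1, 2^-3, ...
    """
    q = [2 * r - 1 for r in binary_digits(omega, n_digits)]  # q[n-1] is Q_n
    u = sum(q[n - 1] * 2.0 ** (-n) for n in range(1, n_digits + 1, 2))
    v = sum(q[n] * 2.0 ** (-n) for n in range(1, n_digits, 2))
    z = 2 * omega - 1
    return u, v, z


if __name__ == "__main__":
    u, v, z = split_uniform(0.3)
    print(u + 0.5 * v, z)  # the two values agree up to truncation error
```

With this choice, $U + \tfrac12 V$ reproduces $Z(\omega) = 2\omega - 1$ up to an error of order $2^{-50}$; independence and equality in distribution of $U$ and $V$ still have to be argued from the independence of the digits $R_n$.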

E18.1. (a) Suppose that $\lambda > 0$ and that (for $n > \lambda$) $F_n$ is the DF associated with the Binomial distribution $B(n, \lambda/n)$. Prove (using CFs) that $F_n$ converges weakly to $F$ where $F$ is the DF of the Poisson distribution with parameter $\lambda$.

(b) Suppose that $X_1, X_2, \ldots$ are IID RVs each with the density function $(1 - \cos x)/\pi x^2$ on $\mathbb{R}$. Prove that for $x \in \mathbb{R}$,

$$\lim_n P\left(\frac{X_1 + X_2 + \cdots + X_n}{n} \le x\right) = \tfrac12 + \pi^{-1} \arctan x,$$

where $\arctan \in (-\tfrac{\pi}{2}, \tfrac{\pi}{2})$.
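The convergence in E18.1(a) can be seen numerically before it is proved with CFs. The following sketch, with the hypothetical helper names `binomial_pmf`, `poisson_pmf` and `max_cdf_gap`, compares the two distribution functions at the integers $0, 1, \ldots, n$; log-gamma is used only to avoid overflow for large $n$.

```python
import math


def binomial_pmf(n, p, k):
    """P(Bin(n, p) = k) for 0 < p < 1, computed on the log scale."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def poisson_pmf(lam, k):
    """P(Poisson(lam) = k), computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def max_cdf_gap(n, lam):
    """Largest |F_n(k) - F(k)| over k = 0..n for B(n, lam/n) against Poisson(lam)."""
    gap, f_n, f = 0.0, 0.0, 0.0
    for k in range(n + 1):
        f_n += binomial_pmf(n, lam / n, k)
        f += poisson_pmf(lam, k)
        gap = max(gap, abs(f_n - f))
    return gap


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, max_cdf_gap(n, 2.0))
```

The printed gaps shrink as $n$ grows, which is consistent with weak convergence of $F_n$ to $F$ (the limit law lives on the integers, so comparing the DFs at integer points is enough for this illustration).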

E18.2. Prove the Weak Law of Large Numbers in the following form. Suppose that $X_1, X_2, \ldots$ are IID RVs, each with the same distribution as $X$. Suppose that $X \in \mathcal{L}^1$ and that $E(X) = \mu$. Prove by the use of CFs that the distribution of

$$A_n := n^{-1}(X_1 + \cdots + X_n)$$

converges weakly to the unit mass at $\mu$. Deduce that

$$A_n \to \mu \quad \text{in probability}.$$

Of course, SLLN implies this Weak Law.

Weak Convergence for Prob[0,1]

E18.3. Let $X$ and $Y$ be RVs taking values in $[0,1]$. Suppose that

$$E(X^k) = E(Y^k), \quad k = 0, 1, 2, \ldots.$$

Prove that

(i) $E\,p(X) = E\,p(Y)$ for every polynomial $p$,

(ii) $E f(X) = E f(Y)$ for every continuous function $f$ on $[0,1]$,

(iii) $P(X \le x) = P(Y \le x)$ for every $x$ in $[0,1]$.

Hint for (ii). Use the Weierstrass Theorem 7.4.

E18.4. Suppose that $(F_n)$ is a sequence of DFs with

$$F_n(x) = 0 \text{ for } x < 0, \quad F_n(1) = 1, \quad \text{for every } n.$$

Suppose that

$$(*) \qquad m_k := \lim_n \int_{[0,1]} x^k \, dF_n \quad \text{exists for } k = 0, 1, 2, \ldots.$$

Use the Helly-Bray Lemma and E18.3 to show that $F_n \xrightarrow{w} F$, where $F$ is characterized by $\int_{[0,1]} x^k \, dF = m_k$, $\forall k$.

E18.5. Improving on E18.3: A Moment Inversion Formula

Let $F$ be a distribution with $F(0-) = 0$ and $F(1) = 1$. Let $\mu$ be the associated law, and define

$$m_k := \int_{[0,1]} x^k \, dF(x).$$

Define

$$\Omega = [0,1] \times [0,1]^{\mathbb{N}}, \quad \mathcal{F} = \mathcal{B} \times \mathcal{B}^{\mathbb{N}}, \quad P = \mu \times \mathrm{Leb}^{\mathbb{N}},$$

$$\Theta(\omega) := \omega_0, \quad H_k(\omega) := I_{[0, \omega_0]}(\omega_k).$$

This models the situation in which $\Theta$ is chosen with law $\mu$, a coin with probability $\Theta$ of heads is then minted, and tossed at times $1, 2, \ldots$. See E10.8. The RV $H_k$ is 1 if the $k$th toss produces heads, 0 otherwise. Define

$$S_n := H_1 + H_2 + \cdots + H_n.$$

By the Strong Law and Fubini's Theorem,

$$S_n/n \to \Theta, \quad \text{a.s.}$$

Define a map $D$ on the space of real sequences $(a_n : n \in \mathbb{Z}^+)$ by setting

$$Da = (a_n - a_{n+1} : n \in \mathbb{Z}^+).$$

Prove that

$$(*) \qquad F_n(x) := \sum_{i \le nx} \binom{n}{i} (D^{n-i} m)_i \to F(x)$$

at every point $x$ of continuity of $F$.

E18.6* Moment Problem

Prove that if $(m_k : k \in \mathbb{Z}^+)$ is a sequence of numbers in $[0,1]$, then there exists a RV $X$ with values in $[0,1]$ such that $E(X^k) = m_k$ if and only if $m_0 = 1$ and

$$(D^r m)_s \ge 0 \quad (r, s \in \mathbb{Z}^+).$$

Hint. Define $F_n$ via E18.5(*), and then verify that E18.4(*) holds. You can show that the moments $m_{n,k}$ of $F_n$ satisfy

$$m_{n,0} = 1, \quad m_{n,1} = m_1, \quad m_{n,2} = m_2 + n^{-1}(m_1 - m_2), \quad \text{etc.}$$

You discover the algebra!
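The inversion formula E18.5(*) is easy to evaluate exactly for a known moment sequence. The sketch below uses exact rational arithmetic and, as an assumed test case, the uniform distribution on $[0,1]$ with $m_k = 1/(k+1)$; the names `difference` and `f_n` are illustrative only.

```python
from fractions import Fraction
from math import comb, floor


def difference(a):
    """The map D: (Da)_n = a_n - a_{n+1}, applied to a finite list."""
    return [a[i] - a[i + 1] for i in range(len(a) - 1)]


def f_n(x, n, m):
    """E18.5(*): sum over i <= n x of C(n, i) * (D^{n-i} m)_i, using m_0, ..., m_n."""
    # powers[r] holds D^r m, truncated to the entries computable from m_0..m_n.
    powers = [list(m[: n + 1])]
    for _ in range(n):
        powers.append(difference(powers[-1]))
    return sum(comb(n, i) * powers[n - i][i] for i in range(floor(n * x) + 1))


if __name__ == "__main__":
    n = 20
    moments = [Fraction(1, k + 1) for k in range(n + 1)]  # uniform law on [0, 1]
    print(f_n(Fraction(1, 2), n, moments))  # compare with F(1/2) = 1/2
```

For the uniform law each term $\binom{n}{i}(D^{n-i}m)_i$ equals $1/(n+1)$, so $F_n(x) = (\lfloor nx \rfloor + 1)/(n+1)$, which visibly tends to $x$. The same function can be fed the moments of any other law on $[0,1]$.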

Weak Convergence for Prob[0, ∞)

E18.7. Using Laplace transforms instead of CFs

Suppose that $F$ and $G$ are DFs on $\mathbb{R}$ such that $F(0-) = G(0-) = 0$, and

$$\int_{[0,\infty)} e^{-\lambda x}\, dF(x) = \int_{[0,\infty)} e^{-\lambda x}\, dG(x), \quad \forall \lambda \ge 0.$$

Note that the integral on LHS has a contribution $F(0)$ from $\{0\}$. Prove that $F = G$. [Hint. One could derive this from the idea in E7.1. However, it is easier to use E18.3, because we know that if $X$ has DF $F$ and $Y$ has DF $G$, then

$$E[(e^{-X})^n] = E[(e^{-Y})^n], \quad n = 0, 1, 2, \ldots .]$$

Suppose that $(F_n)$ is a sequence of distribution functions on $\mathbb{R}$ each with $F_n(0-) = 0$ and such that

$$L(\lambda) := \lim_n \int e^{-\lambda x}\, dF_n(x)$$

exists for $\lambda \ge 0$ and that $L$ is continuous at 0. Prove that $F_n$ is tight and that

$$F_n \xrightarrow{w} F \quad \text{where } \int e^{-\lambda x}\, dF(x) = L(\lambda), \quad \forall \lambda \ge 0.$$

Modes of convergence

EA13.1. (a) Prove that

$$(X_n \to X, \text{ a.s.}) \Rightarrow (X_n \to X \text{ in prob}).$$

Hint. See Section 13.5.

(b) Prove that

$$(X_n \to X \text{ in prob}) \not\Rightarrow (X_n \to X, \text{ a.s.}).$$

Hint. Let $X_n = I_{E_n}$, where $E_1, E_2, \ldots$ are independent events.

(c) Prove that if $\sum P(|X_n - X| > \varepsilon) < \infty$, $\forall \varepsilon > 0$, then $X_n \to X$, a.s.

Hint. Show that the set $\{\omega : X_n(\omega) \not\to X(\omega)\}$ may be written

$$\bigcup_k \{\omega : |X_n(\omega) - X(\omega)| > k^{-1} \text{ for infinitely many } n\}.$$

(d) Suppose that $X_n \to X$ in probability. Prove that there is a subsequence $(X_{n_k})$ of $(X_n)$ such that $X_{n_k} \to X$, a.s.

Hint. Combine (c) with the 'diagonal principle'.

(e) Deduce from (a) and (d) that $X_n \to X$ in probability if and only if every subsequence of $(X_n)$ contains a further subsequence which converges a.s. to $X$.

EA13.2. Recall that if $\xi$ is a random variable with the standard normal N(0,1) distribution, then

$$E e^{\lambda \xi} = \exp(\tfrac12 \lambda^2).$$

Suppose that $\xi_1, \xi_2, \ldots$ are IID RVs each with the N(0,1) distribution. Let $S_n = \sum_{k=1}^{n} \xi_k$, let $a, b \in \mathbb{R}$, and define

$$X_n = \exp(aS_n - bn).$$

Prove that

$$(X_n \to 0, \text{ a.s.}) \iff (b > 0)$$

but that for $r \ge 1$,

$$(X_n \to 0 \text{ in } \mathcal{L}^r) \iff (r < 2b/a^2).$$
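The threshold $r < 2b/a^2$ in EA13.2 comes straight from the recalled identity: since $S_n$ is N(0, n), $E(X_n^r) = \exp\{n(\tfrac12 r^2 a^2 - rb)\}$. A minimal sketch of that computation follows; the function names are illustrative, and $a \ne 0$ is assumed.

```python
def log_rth_moment(n, a, b, r):
    """log E[X_n^r] for X_n = exp(a S_n - b n), where S_n is N(0, n)."""
    return n * (0.5 * r * r * a * a - r * b)


def converges_in_lr(a, b, r):
    """True when E[X_n^r] -> 0, which for r > 0 and a != 0 means r < 2b / a^2."""
    return 0.5 * r * r * a * a - r * b < 0


if __name__ == "__main__":
    a, b = 1.0, 1.0  # here 2b / a^2 = 2
    for r in (1.0, 1.5, 2.0, 2.5):
        print(r, converges_in_lr(a, b, r), log_rth_moment(10, a, b, r))
```

With $a = b = 1$ the walk $X_n$ tends to 0 almost surely (because $b > 0$), yet $E(X_n^2) = 1$ for every $n$ and $E(X_n^r) \to \infty$ for $r > 2$, which is the contrast between the two modes of convergence that the exercise is after.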


Index

(Recall that there is a Guide to Notation on pages xiv-xv.)

ABRACADABRA (4.9, E10.6).

adapted process (10.2): Doob decomposition (12.11).

σ-algebra (1.1).

algebra of sets (1.1).

almost everywhere = a.e. (1.5); almost surely = a.s. (2.4).

atoms: of σ-algebra (9.1, 14.13); of distribution function (16.5).

Azuma-Hoeffding inequality (E14.2).

Baire category theorem (A1.12).

Banach-Tarski paradox (1.0).

Bayes' formula (15.7-15.9).

Bellman Optimality Principle (E10.2, 15.3).

Black-Scholes option-pricing formula (15.2).

Blackwell's Markov chain (E4.8).

Bochner's Theorem (E16.5).

Borel-Cantelli Lemmas: First = BC1 (2.7); Second = BC2 (4.3); Levy's extension of (12.15).

Bounded Convergence Theorem = BDD (6.2, 13.6).

branching process (Chapter 0, E12.1).

Burkholder-Davis-Gundy inequality (14.18).

Caratheodory's Lemma (A1.7).

Caratheodory's Theorem: statement (1.7); proof (A1.8).

Central Limit Theorem (18.4).

Cesaro's Lemma (12.6).

characteristic functions: definition (16.1); inversion formula (16.6); convergence theorem (18.1).

Chebyshev's inequality (7.3).

coin tossing (3.7).

conditional expectation (Chapter 9): properties (9.7).

conditional probability (9.9).

consistency of Likelihood-Ratio Test (14.17).

contraction property of conditional expectation (9.7,h).

convergence in probability (13.5, A13.2).

convergence theorems for integrals: MON (5.3); Fatou (5.4); DOM (5.9); BDD (6.2, 13.6); for UI RVs (13.7).

convergence theorems for martingales: Main (11.5); for UI case (14.1); Upward (14.2); Downward (14.4).

d-system (A1.2).

differentiation under integral sign (A16.1).

distribution function for RV (3.10-3.11).

Dominated Convergence Theorem = DOM (5.9); conditional (9.7,g).

J.L. DOOB's Convergence Theorem (11.5); Decomposition (12.11); $\mathcal{L}^p$ inequality (14.11); Optional Sampling Theorem (A14.3-14.4); Optional Stopping Theorem (10.10, A14.3); Submartingale Inequality (14.6); Upcrossing Lemma (11.2) - and much else!

Downward Theorem (14.4).

Dynkin's Lemma (A1.3).

events (Chapter 2): independent (4.1).

expectation (6.1): conditional (Chapter 9).

'extension of measures': uniqueness (1.6); existence (1.7).

extinction probability (0.4).

fair game (10.5): unfavourable (E4.7).

Fatou Lemmas: for sets (2.6,b, 2.7,c); for functions (5.4); conditional version (9.7,f).

filtered space, filtration (10.1).

filtering (15.6-15.9).

finite and σ-finite measures (1.5).

Forward Convergence Theorem for supermartingales (11.5).

Fubini's Theorem (8.2).

gambler's ruin (E10.7).

gambling strategy (10.6).

Hardy space $\mathcal{H}_0^1$ (14.18).

harnesses (15.10-15.12).

hedging strategy (15.2).

Helly-Bray Lemma (17.4).

hitting times (10.12).

Hoeffding's inequality (E14.2).

Holder's inequality (6.13).

Hunt's Lemma (E14.1).

independence: definitions (4.1); π-system criterion (4.2); and conditioning (9.7,k, 9.10).

independence and product measure (8.4).

inequalities: Azuma-Hoeffding (E14.2); Burkholder-Davis-Gundy (14.18); Chebyshev (7.3); Doob's $\mathcal{L}^p$ (14.11); Holder (6.13); Jensen (6.6), and in conditional form (9.7,h); Khinchine - see (14.8); Kolmogorov (14.6); Markov (6.4); Minkowski (6.14); Schwarz (6.8).

infinite products of probability measures (8.7, Chapter A9); Kakutani's Theorem on (14.12, 14.17).

integration (Chapter 5).

Jensen's inequality (6.6); conditional form (9.7,h).

Kakutani's Theorem on likelihood ratios (14.12, 14.17).

Kalman-Bucy filter (15.6-15.9).

A.N. KOLMOGOROV's Definition of Conditional Expectation (9.2); Inequality (14.6); Law of the Iterated Logarithm (A4.1, 14.7); Strong Law of Large Numbers (12.10, 14.5); Three-Series Theorem (12.5); Truncation Lemma (12.9); Zero-One (0-1) Law (4.11, 14.3).

Kronecker's Lemma (12.7).

Laplace transforms: inversion (E7.1); and weak convergence (E18.7).

law of random variable (3.9): joint laws (8.3).

least-squares-best predictor (9.4).

Lebesgue integral (Chapter 5).

Lebesgue measure = Leb (1.8, A1.9).

Lebesgue spaces $L^p$, $\mathcal{L}^p$ (6.10).

P. LEVY's Convergence Theorem for CFs (18.1); Downward Theorem for martingales (14.4); Extension of Borel-Cantelli Lemmas (12.15); Inversion formula for CFs (16.6); Upward Theorem for martingales (14.2).

Likelihood-Ratio Test, consistency of (14.17).

Mabinogion sheep (15.3-15.5).

Markov chain (4.8, 10.13).

Markov's inequality (6.4).

martingale (Chapters 10-15!): definition (10.3); Convergence Theorem (11.5); Optional-Stopping Theorem (10.9-10.10, A14.3); Optional-Sampling Theorem (Chapter A14).

martingale transform (10.6).

measurable function (3.1).

measurable space (1.1).

measure space (1.4).

Minkowski's inequality (6.14).

Moment Problem (E18.6).

monkey typing Shakespeare (4.9).

Monotone-Class Theorem (3.14, A1.3).

Monotone-Convergence Theorem: for sets (1.10); for functions (5.3, Chapter A5); conditional version (9.7,e).

narrow convergence - see weak convergence.

option pricing (15.2).

Optional-Sampling Theorem (Chapter A14).

Optional-Stopping Theorems (10.9-10.10, A14.3).

optional time - see stopping time (10.8).

orthogonal projection (6.11): and conditional expectation (9.4-9.5).

outer measures (A1.6).

π-system (1.6): Uniqueness Lemmas (1.6, 4.2).

Polya's urn (E10.1, E10.8).

previsible (= predictable) process (10.6).

probability density function = pdf (6.12); joint (8.3).

probability measure (1.5).

probability triple (2.1).

product measures (Chapter 8).

Pythagoras's Theorem (6.9).

Radon-Nikodym theorem (5.14, 14.13-14.14).

random signs (12.3).

random walk: hitting times (10.12, E10.7); on free group (EG.3-EG.4).

Record Problem (E4.3, E12.2, 18.5).

regular conditional probability (9.9).

Riemann integral (5.3).

sample path (4.8).

sample point (2.1).

sample space (2.1).

Schwarz inequality (6.8).

Star Trek problems (E10.10, E10.11, E12.3).

stopped process (10.9).

stopping times (10.8); associated σ-algebras (A14.1).

Strassen's Law of the Iterated Logarithm (A4.2).

Strong Laws (7.2, 12.10, 12.14, 14.5).

submartingales and supermartingales: definitions (10.3); convergence theorem (11.5); optional stopping (10.9-10.10); optional sampling (A14.4).

superharmonic functions for Markov chains (10.13).

symmetrization technique (12.4).

d-system (A1.2); π-system (1.6).

tail σ-algebra (4.10-4.12, 14.3).

Tchebycheff: Chebyshev (7.3).

Three-Series Theorem (12.5).

tightness (17.5).

Tower Property (9.7,i).

Truncation Lemma (12.9).

uniform integrability (Chapter 13).

Upcrossing Lemma (11.1-11.2).

Uniqueness Lemma (1.6).

weak convergence (Chapter 17): and characteristic functions (18.1); and moments (E18.3-18.4); and Laplace transforms (E18.7).

Weierstrass approximation theorem (7.4).

Zero-one law = 0-1 law (4.11, 14.3).
