
Probability Theory
An Analytic View, Second Edition

This second edition of Daniel W. Stroock's text is suitable for first-year graduate students with a good grasp of introductory undergraduate probability. It provides a reasonably thorough introduction to modern probability theory with an emphasis on the mutually beneficial relationship between probability theory and analysis. It includes more than 750 exercises and offers new material on Lévy processes, large deviations theory, Gaussian measures on a Banach space, and the relationship between Wiener measure and partial differential equations.

The first part of the book deals with independent random variables, Central Limit phenomena, the general theory of weak convergence and several of its applications, as well as elements of both the Gaussian and Markovian theories of measures on function space. The introduction of conditional expectation values is postponed until the second part of the book, where it is applied to the study of martingales. This part also explores the connection between martingales and various aspects of classical analysis and the connections between Wiener measure and classical potential theory.

Dr. Daniel W. Stroock is the Simons Professor of Mathematics Emeritus at the Massachusetts Institute of Technology. He has published many articles and is the author of six books, most recently Partial Differential Equations for Probabilists (2008).

Probability Theory
An Analytic View
Second Edition

Daniel W. Stroock
Massachusetts Institute of Technology

cambridge university press

Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Dubai, Tokyo, Mexico City

Cambridge University Press
32 Avenue of the Americas, New York, NY 10013-2473, USA
www.cambridge.org
Information on this title: www.cambridge.org/9780521132503

© Daniel W. Stroock 1994, 2011

This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First edition published 1994
First paperback edition 2000
Second edition published 2011
Printed in the United States of America
A catalog record for this publication is available from the British Library.
Library of Congress Cataloging in Publication data
Stroock, Daniel W.
Probability theory : an analytic view / Daniel W. Stroock. – 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-521-76158-1 (hardback) ISBN 978-0-521-13250-3 (pbk.)
1. Probabilities. I. Title.
QA 273.S763 2010
519.2–dc22
2010027652
ISBN 978-0-521-76158-1 Hardback
ISBN 978-0-521-13250-3 Paperback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for
external or third-party Internet Web sites referred to in this publication and does not guarantee
that any content on such Web sites is, or will remain, accurate or appropriate.

This book is dedicated to my teachers:


M. Kac, H.P. McKean, Jr., and S.R.S. Varadhan

Contents

Preface . . . xiii
Table of Dependence . . . xxi

Chapter 1  Sums of Independent Random Variables . . . 1
1.1 Independence . . . 1
1.1.1. Independent σ-Algebras . . . 1
1.1.2. Independent Functions . . . 4
1.1.3. The Rademacher Functions . . . 5
Exercises for § 1.1 . . . 7
1.2 The Weak Law of Large Numbers . . . 14
1.2.1. Orthogonal Random Variables . . . 14
1.2.2. Independent Random Variables . . . 15
1.2.3. Approximate Identities . . . 16
Exercises for § 1.2 . . . 20
1.3 Cramér's Theory of Large Deviations . . . 22
Exercises for § 1.3 . . . 31
1.4 The Strong Law of Large Numbers . . . 35
Exercises for § 1.4 . . . 42
1.5 Law of the Iterated Logarithm . . . 49
Exercises for § 1.5 . . . 56

Chapter 2  The Central Limit Theorem . . . 59
2.1 The Basic Central Limit Theorem . . . 60
2.1.1. Lindeberg's Theorem . . . 60
2.1.2. The Central Limit Theorem . . . 62
Exercises for § 2.1 . . . 65
2.2 The Berry–Esseen Theorem via Stein's Method . . . 71
2.2.1. L¹-Berry–Esseen . . . 72
2.2.2. The Classical Berry–Esseen Theorem . . . 75
Exercises for § 2.2 . . . 81
2.3 Some Extensions of The Central Limit Theorem . . . 82
2.3.1. The Fourier Transform . . . 82
2.3.2. Multidimensional Central Limit Theorem . . . 84
2.3.3. Higher Moments . . . 87
Exercises for § 2.3 . . . 90
2.4 An Application to Hermite Multipliers . . . 96
2.4.1. Hermite Multipliers . . . 96
2.4.2. Beckner's Theorem . . . 101
2.4.3. Applications of Beckner's Theorem . . . 105
Exercises for § 2.4 . . . 110

Chapter 3  Infinitely Divisible Laws . . . 115
3.1 Convergence of Measures on ℝ^N . . . 115
3.1.1. Sequential Compactness in M₁(ℝ^N) . . . 116
3.1.2. Lévy's Continuity Theorem . . . 117
Exercises for § 3.1 . . . 119
3.2 The Lévy–Khinchine Formula . . . 122
3.2.1. I(ℝ^N) Is the Closure of P(ℝ^N) . . . 123
3.2.2. The Formula . . . 126
Exercises for § 3.2 . . . 137
3.3 Stable Laws . . . 139
3.3.1. General Results . . . 139
3.3.2. α-Stable Laws . . . 141
Exercises for § 3.3 . . . 147

Chapter 4  Lévy Processes . . . 151
4.1 Stochastic Processes, Some Generalities . . . 152
4.1.1. The Space D(ℝ^N) . . . 153
4.1.2. Jump Functions . . . 156
Exercises for § 4.1 . . . 159
4.2 Discontinuous Lévy Processes . . . 160
4.2.1. The Simple Poisson Process . . . 161
4.2.2. Compound Poisson Processes . . . 163
4.2.3. Poisson Jump Processes . . . 168
4.2.4. Lévy Processes with Bounded Variation . . . 170
4.2.5. General, Non-Gaussian Lévy Processes . . . 171
Exercises for § 4.2 . . . 174
4.3 Brownian Motion, the Gaussian Lévy Process . . . 177
4.3.1. Deconstructing Brownian Motion . . . 178
4.3.2. Lévy's Construction of Brownian Motion . . . 180
4.3.3. Lévy's Construction in Context . . . 182
4.3.4. Brownian Paths Are Non-Differentiable . . . 183
4.3.5. General Lévy Processes . . . 185
Exercises for § 4.3 . . . 187

Chapter 5  Conditioning and Martingales . . . 193
5.1 Conditioning . . . 193
5.1.1. Kolmogorov's Definition . . . 194
5.1.2. Some Extensions . . . 198
Exercises for § 5.1 . . . 202
5.2 Discrete Parameter Martingales . . . 205
5.2.1. Doob's Inequality and Marcinkewitz's Theorem . . . 206
5.2.2. Doob's Stopping Time Theorem . . . 212
5.2.3. Martingale Convergence Theorem . . . 214
5.2.4. Reversed Martingales and De Finetti's Theory . . . 217
5.2.5. An Application to a Tracking Algorithm . . . 221
Exercises for § 5.2 . . . 226

Chapter 6  Some Extensions and Applications of Martingale Theory . . . 233
6.1 Some Extensions . . . 233
6.1.1. Martingale Theory for a σ-Finite Measure Space . . . 233
6.1.2. Banach Space–Valued Martingales . . . 239
Exercises for § 6.1 . . . 240
6.2 Elements of Ergodic Theory . . . 244
6.2.1. The Maximal Ergodic Lemma . . . 245
6.2.2. Birkhoff's Ergodic Theorem . . . 248
6.2.3. Stationary Sequences . . . 251
6.2.4. Continuous Parameter Ergodic Theory . . . 253
Exercises for § 6.2 . . . 256
6.3 Burkholder's Inequality . . . 257
6.3.1. Burkholder's Comparison Theorem . . . 257
6.3.2. Burkholder's Inequality . . . 262
Exercises for § 6.3 . . . 263

Chapter 7  Continuous Parameter Martingales . . . 266
7.1 Continuous Parameter Martingales . . . 266
7.1.1. Progressively Measurable Functions . . . 266
7.1.2. Martingales: Definition and Examples . . . 267
7.1.3. Basic Results . . . 270
7.1.4. Stopping Times and Stopping Theorems . . . 272
7.1.5. An Integration by Parts Formula . . . 276
Exercises for § 7.1 . . . 280
7.2 Brownian Motion and Martingales . . . 282
7.2.1. Lévy's Characterization of Brownian Motion . . . 282
7.2.2. Doob–Meyer Decomposition, an Easy Case . . . 284
7.2.3. Burkholder's Inequality Again . . . 289
Exercises for § 7.2 . . . 290
7.3 The Reflection Principle Revisited . . . 292
7.3.1. Reflecting Symmetric Lévy Processes . . . 292
7.3.2. Reflected Brownian Motion . . . 294
Exercises for § 7.3 . . . 298

Chapter 8  Gaussian Measures on a Banach Space . . . 299
8.1 The Classical Wiener Space . . . 299
8.1.1. Classical Wiener Measure . . . 299
8.1.2. The Classical Cameron–Martin Space . . . 303
Exercises for § 8.1 . . . 306
8.2 A Structure Theorem for Gaussian Measures . . . 306
8.2.1. Fernique's Theorem . . . 306
8.2.2. The Basic Structure Theorem . . . 307
8.2.3. The Cameron–Martin Space . . . 310
Exercises for § 8.2 . . . 313
8.3 From Hilbert to Abstract Wiener Space . . . 317
8.3.1. An Isomorphism Theorem . . . 317
8.3.2. Wiener Series . . . 318
8.3.3. Orthogonal Projections . . . 322
8.3.4. Pinned Brownian Motion . . . 326
8.3.5. Orthogonal Invariance . . . 328
Exercises for § 8.3 . . . 330
8.4 A Large Deviations Result and Strassen's Theorem . . . 337
8.4.1. Large Deviations for Abstract Wiener Space . . . 337
8.4.2. Strassen's Law of the Iterated Logarithm . . . 340
Exercises for § 8.4 . . . 342
8.5 Euclidean Free Fields . . . 343
8.5.1. The Ornstein–Uhlenbeck Process . . . 344
8.5.2. Ornstein–Uhlenbeck as an Abstract Wiener Space . . . 346
8.5.3. Higher Dimensional Free Fields . . . 349
Exercises for § 8.5 . . . 355
8.6 Brownian Motion on a Banach Space . . . 358
8.6.1. Abstract Wiener Formulation . . . 358
8.6.2. Brownian Formulation . . . 361
8.6.3. Strassen's Theorem Revisited . . . 363
Exercises for § 8.6 . . . 365

Chapter 9  Convergence of Measures on a Polish Space . . . 367
9.1 Prohorov–Varadarajan Theory . . . 367
9.1.1. Some Background . . . 367
9.1.2. The Weak Topology . . . 370
9.1.3. The Lévy Metric and Completeness of M₁(E) . . . 377
Exercises for § 9.1 . . . 381
9.2 Regular Conditional Probability Distributions . . . 386
9.2.1. Fibering a Measure . . . 388
9.2.2. Representing Lévy Measures via the Itô Map . . . 390
Exercises for § 9.2 . . . 392
9.3 Donsker's Invariance Principle . . . 392
9.3.1. Donsker's Theorem . . . 393
9.3.2. Rayleigh's Random Flights Model . . . 396
Exercise for § 9.3 . . . 399

Chapter 10  Wiener Measure and Partial Differential Equations . . . 400
10.1 Martingales and Partial Differential Equations . . . 400
10.1.1. Localizing and Extending Martingale Representations . . . 401
10.1.2. Minimum Principles . . . 404
10.1.3. The Hermite Heat Equation . . . 405
10.1.4. The Arcsine Law . . . 407
10.1.5. Recurrence and Transience of Brownian Motion . . . 411
Exercises for § 10.1 . . . 415
10.2 The Markov Property and Potential Theory . . . 416
10.2.1. The Markov Property for Wiener Measure . . . 416
10.2.2. Recurrence in One and Two Dimensions . . . 417
10.2.3. The Dirichlet Problem . . . 418
Exercises for § 10.2 . . . 426
10.3 Other Heat Kernels . . . 429
10.3.1. A General Construction . . . 429
10.3.2. The Dirichlet Heat Kernel . . . 431
10.3.3. Feynman–Kac Heat Kernels . . . 436
10.3.4. Ground States and Associated Measures on Pathspace . . . 439
10.3.5. Producing Ground States . . . 445
Exercises for § 10.3 . . . 449

Chapter 11  Some Classical Potential Theory . . . 456
11.1 Uniqueness Refined . . . 456
11.1.1. The Dirichlet Heat Kernel Again . . . 456
11.1.2. Exiting Through ∂_reg G . . . 459
11.1.3. Applications to Questions of Uniqueness . . . 463
11.1.4. Harmonic Measure . . . 468
Exercises for § 11.1 . . . 472
11.2 The Poisson Problem and Green Functions . . . 475
11.2.1. Green Functions when N ≥ 3 . . . 476
11.2.2. Green Functions when N ∈ {1, 2} . . . 477
Exercises for § 11.2 . . . 486
11.3 Excessive Functions, Potentials, and Riesz Decompositions . . . 487
11.3.1. Excessive Functions . . . 488
11.3.2. Potentials and Riesz Decomposition . . . 489
Exercises for § 11.3 . . . 496
11.4 Capacity . . . 497
11.4.1. The Capacitory Potential . . . 497
11.4.2. The Capacitory Distribution . . . 500
11.4.3. Wiener's Test . . . 504
11.4.4. Some Asymptotic Expressions Involving Capacity . . . 507
Exercises for § 11.4 . . . 514

Notation . . . 517
Index . . . 521

Preface

From the Preface to the First Edition


When writing a graduate level mathematics book during the last decade of the twentieth century, one probably ought not inquire too closely into one's motivation. In fact, if one's own pleasure from the exercise is not sufficient to justify the effort, then one should seriously consider dropping the project. Thus, to those who (either before or shortly after opening it) ask for whom was this book written, my pale answer is "me"; and, for this reason, I thought that I should preface this preface with an explanation of who I am and what were the peculiar educational circumstances that eventually gave rise to this somewhat peculiar book.
My own introduction to probability theory began with a private lecture from H.P. McKean, Jr. At the time, I was a (more accurately, the) graduate student of mathematics at what was then called The Rockefeller Institute for Biological Sciences. My official mentor there was M. Kac, whom I had cajoled into becoming my adviser after a year during which I had failed to insert even one micro-electrode into the optic nerves of innumerable limuli. However, as I soon came to realize, Kac had accepted his role on the condition that it would not become a burden. In particular, he had no intention of wasting much of his own time on a reject from the neurophysiology department. On the other hand, he was most generous with the time of his younger associates, and that is how I wound up in McKean's office. Never one to bore his listeners with a lot of dull preliminaries, McKean launched right into a wonderfully lucid explanation of P. Lévy's interpretation of the infinitely divisible laws. I have to admit that my appreciation of the lucidity of his lecture arrived nearly a decade after its delivery, and I can only hope that my reader will reserve judgment of my own presentation for an equal length of time.
In spite of my perplexed state at the end of McKean's lecture, I was sufficiently intrigued to delve into the readings that he suggested at its conclusion. Knowing that the only formal mathematics courses that I would be taking during my graduate studies would be given at N.Y.U. and guessing that those courses would be oriented toward partial differential equations, McKean directed me to material which would help me understand the connections between partial differential equations and probability theory. In particular, he suggested that I start with the, then recently translated, two articles by E.B. Dynkin which had appeared originally in the famous 1956 volume of Teoriya Veroyatnostei i ee Primeneniya. Dynkin's articles turned out to be a godsend. They were beautifully crafted to
tell the reader enough so that he could understand the ideas and not so much that he would become bored by them. In addition, they gave me an introduction to a host of ideas and techniques (e.g., stopping times and the strong Markov property), all of which Kac himself consigned to the category of "overelaborated measure theory." In fact, it would be reasonable to say that my thesis was simply the application of techniques which I picked up from Dynkin to a problem that I picked up by reading some notes by Kac. Of course, along the way I profited immeasurably from continued contact with McKean, a large number of courses at N.Y.U. (particularly ones taught by M. Donsker, F. John, and L. Nirenberg), and my increasingly animated conversations with S.R.S. Varadhan.
As I trust the preceding description makes clear, my graduate education was anything but deprived; I had ready access to some of the very best analysts of the day. On the other hand, I never had a proper introduction to my field, probability theory. The first time that I ever summed independent random variables was when I was summing them in front of a class at N.Y.U. Thus, although I now admire the magnificent body of mathematics created by A.N. Kolmogorov, P. Lévy, and the other twentieth-century heroes of the field, I am not a dyed-in-the-wool probabilist (i.e., what Donsker would have called a "true coin-tosser"). In particular, I have never been able to develop sufficient sensitivity to the distinction between a proof and a probabilistic proof. To me, a proof is clearly probabilistic only if its punch-line comes down to an argument like P(A) ≤ P(B) because A ⊆ B; and there are breathtaking examples of such arguments. However, to base an entire book on these examples would require a level of genius that I do not possess. In fact, I myself enjoy probability theory best when it is inextricably interwoven with other branches of mathematics and not when it is presented as an entity unto itself. For this reason, the reader should not be surprised if he finds that some of the material presented in this book does not belong here; but I hope that he will make an effort to figure out why I disagree with him.
Preface to the Second Edition
My favorite preface to a second edition is the one that G.N. Watson wrote for
the second edition of his famous treatise on Bessel functions. The first edition
appeared in 1922, the second came out in 1941, and Watson had originally
intended to stay abreast of developments and report on them in the second
edition. However, in his preface to the second edition Watson admits that his
interest in the topic had waned during the intervening years and apologizes
that, as a consequence, the new edition contains less new material than he had
thought it would.
My excuse for not incorporating more new material into this second edition is related to but somewhat different from Watson's. In my case, what has waned is not my interest in probability theory but instead my ability to assimilate the transformations that the subject has undergone. When I was a student,
probabilists were still working out the ramifications of Kolmogorov's profound insights into the connections between probability and analysis, and I have spent my career investigating and exploiting those connections. However, about the time when the first edition of this book was published, probability theory began a return to its origins in combinatorics, a topic in which my abilities are woefully deficient. Thus, although I suspect that, for at least a decade, the most exciting developments in the field will have a strong combinatorial component, I have not attempted to prepare my readers for those developments. I repeat that my decision not to incorporate more combinatorics into this new edition in no way reflects my assessment of the direction in which probability is likely to go but instead reflects my assessment of my own inability to do justice to the beautiful combinatorial ideas that have been introduced in the recent past.
In spite of the preceding admission, I believe that the material in this book remains valuable and that, no matter how probability theory evolves, the ideas and techniques presented here will play an important role. Furthermore, I have made some substantive changes. In particular, I have given more space to infinitely divisible laws and their associated Lévy processes, both of which are now developed in ℝ^N rather than just in ℝ. In addition, I have added an entire chapter devoted to Gaussian measures in infinite dimensions from the perspective of the Segal–Gross school. Not only have recent developments in Malliavin calculus and conformal field theory sparked renewed interest in this topic, but it seems to me that most modern texts pay either no or too little attention to this beautiful material. Missing from the new edition is the treatment of singular integrals. I included it in the first edition in the hope that it would elucidate the similarity between cancellations that underlie martingale theory, especially Burkholder's Inequality, and Calderón–Zygmund theory. I still believe that these similarities are worth thinking about, but I have decided that my explanation of them led me too far astray and was more of a distraction than a pedagogically valuable addition.
Besides those mentioned above, minor changes have been made throughout.
For one thing, I have spent a lot of time correcting old errors and, undoubtedly,
inserting new ones. Secondly, I have made several organizational changes as well
as others that are remedial. A summary of the contents follows.
Summary
1: Chapter 1 contains a sampling of the standard, pointwise convergence theorems dealing with partial sums of independent random variables. These include the Weak and Strong Laws of Large Numbers as well as Hartman–Wintner's Law of the Iterated Logarithm. In preparation for the Law of the Iterated Logarithm, Cramér's theory of large deviations from the Law of Large Numbers is developed in § 1.3. Everything here is very standard, although I feel that my passage from the bounded to the general case of the Law of the Iterated Logarithm has been considerably smoothed by the ideas that I learned during a conversation with M. Ledoux.
2: The whole of Chapter 2 is devoted to the classical Central Limit Theorem. After an initial (and slightly flawed) derivation of the basic result via moment considerations, Lindeberg's general version is derived in § 2.1. Although Lindeberg's result has become a sine qua non in the writing of probability texts, the Berry–Esseen estimate has not. Indeed, until recently, the Berry–Esseen estimate required a good many somewhat tedious calculations with characteristic functions (i.e., Fourier transforms), and most recent authors seem to have decided that the rewards did not justify the effort. I was inclined to agree with them until P. Diaconis brought to my attention E. Bolthausen's adaptation of C. Stein's techniques (the so-called Stein's method) to give a proof that is not only brief but also, to me, aesthetically pleasing. In any case, no use of Fourier methods is made in the derivation given in § 2.2. On the other hand, Fourier techniques are introduced in § 2.3, where it is shown that even elementary Fourier analytic tools lead to important extensions of the basic Central Limit Theorem to more than one dimension. Finally, in § 2.4, the Central Limit Theorem is applied to the study of Hermite multipliers and (following Wm. Beckner) is used to derive both E. Nelson's hypercontraction estimate for the Mehler kernel as well as Beckner's own estimate for the Fourier transform. I am afraid that, with this flagrant example of the sort of thing that does not belong here, I may be trying the patience of my purist colleagues. However, I hope that their indignation will be somewhat assuaged by the fact that the rest of the book is essentially independent of the material in § 2.4.
3: This chapter is devoted to the study of infinitely divisible laws. It begins in § 3.1 with a few refinements (especially the Lévy Continuity Theorem) of the Fourier techniques introduced in § 2.3. These play a role in § 3.2, where the Lévy–Khinchine formula is first derived and then applied to the analysis of stable laws.
4: In Chapter 4 I construct the Lévy processes (a.k.a. independent increment processes) corresponding to infinitely divisible laws. Section 4.1 provides the requisite information about the pathspace D(ℝ^N) of right-continuous paths with left limits, and § 4.2 gives the construction of Lévy processes with discontinuous paths, the ones corresponding to infinitely divisible laws having no Gaussian part. Finally, in § 4.3 I construct Brownian motion, the Lévy process with continuous paths, following the prescription given by Lévy.
5: Because they are not needed earlier, conditional expectations do not appear until Chapter 5. The advantage gained by this postponement is that, by the time I introduce them, I have an ample supply of examples to which conditioning can be applied; the disadvantage is that, with considerable justice, many probabilists feel that one is not doing probability theory until one is conditioning. Be that as it may, Kolmogorov's definition is given in § 5.1 and is shown to extend naturally both to σ-finite measure spaces as well as to random variables with values in a Banach space. Section 5.2 presents Doob's basic theory of real-valued, discrete parameter martingales: Doob's Inequality, his Stopping Time Theorem, and his Martingale Convergence Theorem. In the last part of § 5.2, I introduce reversed martingales and apply them to De Finetti's theory of exchangeable random variables.
6: Chapter 6 opens with extensions of martingale theory in two directions: to σ-finite measures and to random variables with values in a Banach space. The results in § 6.1 are used in § 6.2 to derive Birkhoff's Individual Ergodic Theorem and a couple of its applications. Finally, in § 6.3 I prove Burkholder's Inequality for martingales with values in a Hilbert space. The derivation that I give is essentially the same as Burkholder's second proof, the one that gives optimal constants.
7: Section 7.1 provides a brief introduction to the theory of martingales with a continuous parameter. As anyone at all familiar with the topic knows, anything approaching a full account of this theory requires much more space than a book like this can give it. Thus, I deal with only its most rudimentary aspects, which, fortunately, are sufficient for the applications to Brownian motion that I have in mind. Namely, in § 7.2 I first discuss the intimate relationship between continuous martingales and Brownian motion (Lévy's martingale characterization of Brownian motion), then derive the simplest (and perhaps most widely applied) case of the Doob–Meyer Decomposition Theory, and finally show what Burkholder's Inequality looks like for continuous martingales. In the concluding section, § 7.3, the results in §§ 7.1–7.2 are applied to derive the Reflection Principle for Brownian motion.
8: In § 8.1 I formulate the description of Brownian motion in terms of its Gaussian, as opposed to its independent increment, properties. More precisely, following Segal and Gross, I attempt to convince the reader that Wiener measure (i.e., the distribution of Brownian motion) would like to be the standard Gauss measure on the Hilbert space H¹(ℝ^N) of absolutely continuous paths with a square integrable derivative, but, for technical reasons, cannot live there and has to settle for a Banach space in which H¹(ℝ^N) is densely embedded. Using Wiener measure as the model, in § 8.2 I show that, at an abstract level, any non-degenerate, centered Gaussian measure on an infinite dimensional, separable Banach space shares the same structure as Wiener measure in the sense that there is always a densely embedded Hilbert space, known as the Cameron–Martin space, for which it would like to be the standard Gaussian measure but on which it does not fit. In order to carry out this program, I need and prove Fernique's Theorem for Gaussian measures on a Banach space. In § 8.3 I begin by going in the opposite direction, showing how to pass from a Hilbert space H to a Gaussian measure on a Banach space E for which H is the Cameron–Martin space. The rest of § 8.3 gives two applications: one to pinned Brownian motion and the second to a very general statement of orthogonal invariance for Gaussian measures. The main goal of § 8.4 is to prove a large deviations result, known as Schilder's Theorem, for abstract Wiener spaces; and once I have Schilder's Theorem, I apply it to derive a version of Strassen's Law of the Iterated Logarithm. Starting with the Ornstein–Uhlenbeck process, I construct in § 8.5 a family of Gaussian measures known in the mathematical physics literature as Euclidean free fields. In the final section, § 8.6, I first show how to construct Banach space valued Brownian motion and then derive the original form of Strassen's Law of the Iterated Logarithm in that context.
9: The central topic here is the abstract theory of weak convergence of probability measures on a Polish space. The basic theory is developed in § 9.1. In § 9.2 I apply the theory to prove the existence of regular conditional probability distributions, and in § 9.3 I use it to derive Donsker's Invariance Principle (i.e., the pathspace statement of the Central Limit Theorem).
10: Chapter 10 is an introduction to the connections between probability theory and partial differential equations. At the beginning of § 10.1 I show that martingale theory provides a link between probability theory and partial differential equations. More precisely, I show how to represent in terms of Wiener integrals solutions to parabolic and elliptic partial differential equations in which the Laplacian is the principal part. In the second part of § 10.1, I use this link to calculate various Wiener integrals. In § 10.2 I introduce the Markov property of Wiener measure and show how it not only allows one to evaluate other Wiener integrals in terms of solutions to elliptic partial differential equations but also enables one to prove interesting facts about solutions to such equations as a consequence of their representation in terms of Wiener integrals. Continuing in the same spirit, I show in § 10.2 how to represent solutions to the Dirichlet problem in terms of Wiener integrals, and in § 10.3 I use Wiener measure to construct and discuss heat kernels related to the Laplacian.
11: The final chapter is an extended example of the way in which probability theory meshes with other branches of analysis, and the example that I have chosen is the marriage between Brownian motion and classical potential theory. Like an ideal marriage, this one is simultaneously intimate and mutually beneficial to both partners. Indeed, the more one knows about it, the more convinced one becomes that the properties of Brownian paths are a perfect reflection of properties of harmonic functions, and vice versa. In any case, in § 11.1 I sharpen the results in § 10.2.3 and show that, in complete generality, the solution to the Dirichlet problem is given by the Wiener integral of the boundary data evaluated at the place where Brownian paths exit from the region. Next, in § 11.2, I discuss the Green function for a region and explain how its existence reflects the recurrence and transience properties of Brownian paths. In preparation for § 11.4, § 11.3 is devoted to the Riesz Decomposition Theorem for excessive functions. Finally, in § 11.4, I discuss the capacity of regions, derive Chung's representation of the capacitory measure in terms of the last place where a Brownian path visits a region, apply the probabilistic interpretation of capacity to give a derivation of Wiener's test for regularity, and conclude with two asymptotic calculations in which capacity plays a crucial role.
Suggestions about the Use of This Book
In spite of the realistic assessment contained in the first paragraph of its preface, when I wrote the first edition of this book I harbored the naïve hope that it might become the standard graduate text in probability theory. By the time that I started preparing the second edition, I was significantly older and far less naïve about its prospects. Although the first edition has its admirers, it has done little to dent the sales record of its competitors. In particular, the first edition has seldom been adopted as the text for courses in probability, and I doubt that the second will be either. Nonetheless, I close this preface with a few suggestions for anyone who does choose to base a course on it.
I am well aware that, except for those who find their way into the poorly stocked library of some prison camp, few copies of this book will be read from cover to cover. For this reason, I have attempted to organize it in such a way that, with the help of the table of dependence that follows, a reader can select a path which does not require his reading all the sections preceding the information he is seeking. For example, the contents of §§ 1.1–1.2, § 1.4, § 2.1, § 2.3, and §§ 5.1–5.2 constitute the backbone of a one semester, graduate level introduction to probability theory. What one attaches to this backbone depends on the speed with which these sections are covered and the content of the courses for which the course is the introduction. If the goal is to prepare the students for a career as a quant in what is left of the financial industry, an obvious choice is § 4.3 and as much of Chapter 7 as time permits, thereby giving one's students a reasonably solid introduction to Brownian motion. On the other hand, if one wants the students to appreciate that white noise is not the only noise that they may encounter in life, one might defer the discussion of Brownian motion and replace it with the material in Chapter 3 and §§ 4.1–4.2.
Alternatively, one might use this book in a more advanced course. An introduction to stochastic processes with an emphasis on their relationship to partial differential equations can be constructed out of Chapters 6, 7, 10, and 11, and § 4.3 combined with Chapter 8 could be used to provide background for a course on Gaussian processes.
Whatever route one takes through this book, it will be a great help to your students for you to suggest that they consult other texts. Indeed, it is a familiar fact that the third book one reads on a subject is always the most lucid, and so one should suggest at least two other books. Among the many excellent choices available, I mention Wm. Feller's An Introduction to Probability Theory and Its Applications, Vol. II, and M. Loève's classic Probability Theory. In addition, for

background, precision (including accuracy of attribution), and supplementary material, R. Dudley's Real Analysis and Probability is superb.

Table of Dependence

(The original page displays a dependence chart indicating which blocks of sections rely on which earlier ones; its nodes, reading from the foundations upward, are §§ 1.1–1.4; §§ 9.1–9.2; § 1.5; §§ 2.1–2.2; § 2.3; § 3.1 & § 3.4; § 5.1 & § 5.3; §§ 3.2–3.3; §§ 4.1–4.3; § 6.1 & § 6.3; § 6.2; §§ 7.1–7.3; § 9.3; §§ 7.4–7.5; §§ 8.1–8.5; §§ 10.1–10.3; and §§ 11.1–11.4.)

Chapter 1
Sums of Independent Random Variables

In one way or another, most probabilistic analysis entails the study of large
families of random variables. The key to such analysis is an understanding
of the relations among the family members; and of all the possible ways in
which members of a family can be related, by far the simplest is when there
is no relationship at all! For this reason, I will begin by looking at families of
independent random variables.
§ 1.1 Independence
In this section I will introduce Kolmogorov's way of describing independence and prove a few of its consequences.
1.1.1. Independent σ-Algebras. Let (Ω, F, P) be a probability space (i.e., Ω is a nonempty set, F is a σ-algebra over Ω, and P is a non-negative measure on the measurable space (Ω, F) having total mass 1), and, for each i from the (non-empty) index set I, let F_i be a sub-σ-algebra of F. I will say that the σ-algebras F_i, i ∈ I, are mutually P-independent, or, less precisely, P-independent, if, for every finite subset {i_1, ..., i_n} of distinct elements of I and every choice of A_{i_m} ∈ F_{i_m}, 1 ≤ m ≤ n,

(1.1.1)    P(A_{i_1} ∩ ⋯ ∩ A_{i_n}) = P(A_{i_1}) ⋯ P(A_{i_n}).

In particular, if {A_i : i ∈ I} is a family of sets from F, I will say that the A_i, i ∈ I, are P-independent if the associated σ-algebras F_i = {∅, A_i, Ω∖A_i, Ω}, i ∈ I, are. To gain an appreciation for the intuition on which this definition is based, it is important to notice that independence of the pair A_1 and A_2 in the present sense is equivalent to P(A_1 ∩ A_2) = P(A_1)P(A_2), the classical definition that one encounters in elementary treatments. Thus, the notion of independence just introduced is no more than a simple generalization of the classical notion of independent pairs of sets encountered in non-measure theoretic presentations, and therefore the intuition that underlies the elementary notion applies equally well to the definition given here. (See Exercise 1.1.8 for more information about the connection between the present definition and the classical one.)
As will become increasingly evident as we proceed, infinite families of independent objects possess surprising and beautiful properties. In particular, mutually independent σ-algebras tend to fill up space in a sense made precise by the following beautiful thought experiment designed by A.N. Kolmogorov. Let I be any index set, take F_∅ = {∅, Ω}, and, for each non-empty subset Λ ⊆ I, let

F_Λ = ⋁_{i∈Λ} F_i

be the σ-algebra generated by ⋃_{i∈Λ} F_i (i.e., F_Λ is the smallest σ-algebra containing ⋃_{i∈Λ} F_i). Next, define the tail σ-algebra T to be the intersection over all finite Λ ⊆ I of the σ-algebras F_{Λ∁}. When I itself is finite, T = {∅, Ω} and is therefore P-trivial in the sense that P(A) ∈ {0, 1} for every A ∈ T. The interesting remark made by Kolmogorov is that even when I is infinite, T is P-trivial whenever the original F_i's are P-independent. To see this, for a given non-empty Λ ⊆ I, let C_Λ denote the collection of sets of the form A_{i_1} ∩ ⋯ ∩ A_{i_n}, where {i_1, ..., i_n} are distinct elements of Λ and A_{i_m} ∈ F_{i_m} for each 1 ≤ m ≤ n. Clearly C_Λ is closed under intersection and F_Λ = σ(C_Λ). In addition, by assumption, P(A ∩ B) = P(A)P(B) for all A ∈ C_Λ and B ∈ C_{Λ∁}. Hence, by Exercise 1.1.12, F_Λ is independent of F_{Λ∁}. But this means that T is independent of F_F for every finite F ⊆ I, and therefore, again by Exercise 1.1.12, T is independent of

F_I = σ(⋃{F_F : F a finite subset of I}).

Since T ⊆ F_I, this implies that T is independent of itself; that is, P(A ∩ B) = P(A)P(B) for all A, B ∈ T. Hence, for every A ∈ T, P(A) = P(A)², or, equivalently, P(A) ∈ {0, 1}, and so I have now proved the following famous result.
Theorem 1.1.2 (Kolmogorov's 0–1 Law). Let {F_i : i ∈ I} be a family of P-independent sub-σ-algebras of (Ω, F, P), and define the tail σ-algebra T accordingly, as above. Then, for every A ∈ T, P(A) is either 0 or 1.

To develop a feeling for the kind of conclusions that can be drawn from Kolmogorov's 0–1 Law (cf. Exercises 1.1.18 and 1.1.19 as well), let {A_n : n ≥ 1} be a sequence of subsets of Ω, and recall the notation

lim sup_{n→∞} A_n ≡ ⋂_{m=1}^∞ ⋃_{n≥m} A_n = {ω : ω ∈ A_n for infinitely many n ∈ Z⁺}.

Obviously, lim sup_{n→∞} A_n is measurable with respect to the tail field determined by the sequence of σ-algebras {∅, A_n, Ω∖A_n, Ω}, n ∈ Z⁺; and therefore, if the A_n's are P-independent elements of F, then

P(lim sup_{n→∞} A_n) ∈ {0, 1}.

In words, this conclusion can be summarized as follows: for any sequence of P-independent events A_n, n ∈ Z⁺, either P-almost every ω ∈ Ω is in infinitely many A_n's or P-almost every ω ∈ Ω is in at most finitely many A_n's. A more quantitative statement of this same fact is contained in the second part of the following useful result.

Lemma 1.1.3 (Borel–Cantelli Lemma). Let {A_n : n ∈ Z⁺} ⊆ F be given. Then

(1.1.4)    Σ_{n=1}^∞ P(A_n) < ∞  ⟹  P(lim sup_{n→∞} A_n) = 0.

In fact, if the A_n's are P-independent sets, then

(1.1.5)    Σ_{n=1}^∞ P(A_n) = ∞  ⟹  P(lim sup_{n→∞} A_n) = 1.

(See part (iii) of Exercise 5.2.40 and Lemma 11.4.14 for generalizations.)
Proof: The first assertion, which is due to E. Borel, is an easy application of countable additivity. Namely, by countable additivity,

P(lim sup_{n→∞} A_n) = lim_{m→∞} P(⋃_{n≥m} A_n) ≤ lim_{m→∞} Σ_{n≥m} P(A_n) = 0

if Σ_{n=1}^∞ P(A_n) < ∞.
To complete the proof of (1.1.5) when the A_n's are independent, note that, by countable additivity, P(lim sup_{n→∞} A_n) = 1 if and only if

lim_{m→∞} P(⋂_{n≥m} A_n∁) = P(⋃_{m=1}^∞ ⋂_{n≥m} A_n∁) = P((lim sup_{n→∞} A_n)∁) = 0.

But, by independence and another application of countable additivity, for any given m ≥ 1 we have that

P(⋂_{n=m}^∞ A_n∁) = lim_{N→∞} ∏_{n=m}^N (1 − P(A_n)) ≤ lim_{N→∞} exp(−Σ_{n=m}^N P(A_n)) = 0

if Σ_{n=1}^∞ P(A_n) = ∞. (In the preceding, I have used the trivial inequality 1 − t ≤ e^{−t}, t ∈ [0, ∞).) □
A second, and perhaps more transparent, way of dealing with the contents of the preceding is to introduce the non-negative random variable N(ω) ∈ Z⁺ ∪ {∞} that counts the number of n ∈ Z⁺ such that ω ∈ A_n. Then, by Tonelli's Theorem,¹ E^P[N] = Σ_{n=1}^∞ P(A_n), and so Borel's contribution is equivalent to E^P[N] < ∞ ⟹ P(N < ∞) = 1, which is obvious, whereas Cantelli's contribution is that, for mutually independent A_n's, P(N < ∞) = 1 ⟹ E^P[N] < ∞, which is not obvious.

¹ Throughout this book, I use E^P[X, A] to denote the expected value under P of X over the set A. That is, E^P[X, A] = ∫_A X dP. Finally, when A = Ω, I will write E^P[X]. Tonelli's Theorem is the version of Fubini's Theorem for non-negative functions. Its virtue is that it applies whether or not the integrand is integrable.
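The dichotomy in Lemma 1.1.3 is easy to observe numerically. The sketch below is not part of the text; it is a hypothetical illustration, written in Python with NumPy, of the counting random variable N just introduced. When P(A_n) = n^{−2} the counts stay small however many events one simulates, whereas when P(A_n) = n^{−1} they grow (like log of the number of events) without bound, in accordance with (1.1.4) and (1.1.5).

```python
import numpy as np

rng = np.random.default_rng(0)

def count_occurrences(p, n_events, n_samples):
    """For each sample point omega, count N(omega) = #{n : omega in A_n},
    where the A_n are simulated as independent events with P(A_n) = p(n)."""
    probs = p(np.arange(1, n_events + 1))
    hits = rng.random((n_samples, n_events)) < probs  # independent indicators
    return hits.sum(axis=1)

# Convergent case: sum P(A_n) = sum 1/n^2 < infinity, so N < infinity a.s.
print(count_occurrences(lambda n: 1.0 / n**2, n_events=100_000, n_samples=10))
# Divergent case: sum P(A_n) = sum 1/n = infinity, so N = infinity a.s.;
# in a finite simulation the counts grow like log(n_events).
print(count_occurrences(lambda n: 1.0 / n, n_events=100_000, n_samples=10))
```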
1.1.2. Independent Functions. Having described what it means for σ-algebras to be P-independent, I will now transfer the notion to random variables on (Ω, F, P). Namely, for each i ∈ I, let X_i be a random variable (i.e., a measurable function on (Ω, F)) with values in the measurable space (E_i, B_i). I will say that the random variables X_i, i ∈ I, are (mutually) P-independent if the σ-algebras

σ(X_i) ≡ X_i^{−1}(B_i) ≡ {X_i^{−1}(B_i) : B_i ∈ B_i}, i ∈ I,

are P-independent. If B(E; ℝ) = B((E, B); ℝ) denotes the space of bounded measurable ℝ-valued functions on the measurable space (E, B), then it should be clear that P-independence of {X_i : i ∈ I} is equivalent to the statement that

E^P[f_{i_1} ∘ X_{i_1} ⋯ f_{i_n} ∘ X_{i_n}] = E^P[f_{i_1} ∘ X_{i_1}] ⋯ E^P[f_{i_n} ∘ X_{i_n}]

for all finite subsets {i_1, ..., i_n} of distinct elements of I and all choices of f_{i_1} ∈ B(E_{i_1}; ℝ), ..., f_{i_n} ∈ B(E_{i_n}; ℝ). Finally, if 1_A, given by

1_A(ω) ≡ 1 if ω ∈ A and 0 if ω ∉ A,

denotes the indicator function of the set A ⊆ Ω, notice that the family of sets {A_i : i ∈ I} ⊆ F is P-independent if and only if the random variables 1_{A_i}, i ∈ I, are P-independent.
Thus far I have discussed only the abstract notion of independence and have yet to show that the concept is not vacuous. In the modern literature, the standard way to construct lots of independent quantities is to take products of probability spaces. Namely, if (E_i, B_i, μ_i) is a probability space for each i ∈ I, one sets Ω = ∏_{i∈I} E_i; defines π_i : Ω → E_i to be the natural projection map for each i ∈ I; takes F_i = π_i^{−1}(B_i), i ∈ I, and F = ⋁_{i∈I} F_i; and shows that there is a unique probability measure P on (Ω, F) with the properties that

P(π_i^{−1} Γ_i) = μ_i(Γ_i) for all i ∈ I and Γ_i ∈ B_i

and the σ-algebras F_i, i ∈ I, are P-independent. Although this procedure is extremely powerful, it is rather mechanical. For this reason, I have chosen to defer the details of the product construction to Exercises 1.1.14 and 1.1.16 and to, instead, spend the rest of this section developing a more hands-on approach to constructing independent sequences of real-valued random variables. Indeed, although the product method is more ubiquitous and has become the construction of choice, the one that I am about to present has the advantage that it shows independent random variables can arise naturally and even in familiar places.
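Although the general product construction is deferred to Exercise 1.1.14, the finite case can be exhibited in a few lines. The sketch below is an illustration of my own (not from the text), written in Python, with two arbitrarily chosen finite factors; it builds the product measure by enumeration and verifies that the coordinate σ-algebras are P-independent in the sense of (1.1.1).

```python
from itertools import product

# Two finite probability spaces (E_i, B_i, mu_i); the weights are made up.
E1, mu1 = ['a', 'b'], {'a': 0.3, 'b': 0.7}
E2, mu2 = [0, 1, 2], {0: 0.5, 1: 0.2, 2: 0.3}

# Product measure on Omega = E1 x E2: P({(x, y)}) = mu1({x}) mu2({y}).
P = {(x, y): mu1[x] * mu2[y] for x, y in product(E1, E2)}

def subsets(E):
    """All subsets of a finite set E (here each B_i is the full power set)."""
    out = [set()]
    for x in E:
        out += [s | {x} for s in out]
    return out

# Check P(pi_1^{-1}(G1) and pi_2^{-1}(G2)) = mu1(G1) mu2(G2) for all G1, G2.
for G1 in subsets(E1):
    for G2 in subsets(E2):
        lhs = sum(P[w] for w in P if w[0] in G1 and w[1] in G2)
        rhs = sum(mu1[x] for x in G1) * sum(mu2[y] for y in G2)
        assert abs(lhs - rhs) < 1e-12
print("coordinate sigma-algebras are P-independent")
```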
1.1.3. The Rademacher Functions. Until further notice, take (Ω, F) = ([0, 1), B_{[0,1)}) (when E is a metric space, I use B_E to denote the Borel field over E) and P to be the restriction λ_{[0,1)} of Lebesgue measure λ_ℝ to [0, 1). Next define the Rademacher functions R_n, n ∈ Z⁺, on Ω as follows. Take the integer part ⌊t⌋ of t ∈ ℝ to be the largest integer dominated by t, and consider the function R : ℝ → {−1, 1} given by

R(t) = −1 if t − ⌊t⌋ ∈ [0, ½) and R(t) = 1 if t − ⌊t⌋ ∈ [½, 1).

The function R_n is then defined on [0, 1) by

R_n(ω) = R(2^{n−1} ω), n ∈ Z⁺ and ω ∈ [0, 1).

I will now show that the Rademacher functions are P-independent. To this end, first note that every real-valued function f on {−1, 1} is of the form α + βx, x ∈ {−1, 1}, for some pair of real numbers α and β. Thus, all that I have to show is that

E^P[(α_1 + β_1 R_1) ⋯ (α_n + β_n R_n)] = α_1 ⋯ α_n

for any n ∈ Z⁺ and (α_1, β_1), ..., (α_n, β_n) ∈ ℝ². Since this is obvious when n = 1, I will assume that it holds for n and need only check that it must also hold for n + 1, and clearly this comes down to checking that

E^P[F(R_1, ..., R_n) R_{n+1}] = 0

for any F : {−1, 1}^n → ℝ. But (R_1, ..., R_n) is constant on each interval

I_{m,n} ≡ [m 2^{−n}, (m+1) 2^{−n}), 0 ≤ m < 2^n,

whereas R_{n+1} integrates to 0 on each I_{m,n}. Hence, by writing the integral over Ω as the sum of integrals over the I_{m,n}'s, we get the desired result.
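For readers who like to see such arguments corroborated empirically, here is a sketch (my own illustration, not the text's; Python with NumPy) that implements R_n(ω) = R(2^{n−1}ω) and tabulates the joint distribution of (R_1, R_2, R_3) under Lebesgue measure. Each of the 2³ sign patterns should occur with frequency near 2^{−3}, which is exactly what P-independence of mean-zero, {−1, 1}-valued variables demands.

```python
import numpy as np

def rademacher(n, omega):
    """R_n(omega) = R(2^{n-1} omega): -1 on the left half of each dyadic
    interval of length 2^{-(n-1)}, +1 on the right half."""
    frac = (2.0 ** (n - 1)) * omega % 1.0   # frac = t - floor(t)
    return np.where(frac < 0.5, -1, 1)

rng = np.random.default_rng(1)
omega = rng.random(1_000_000)   # omega sampled from Lebesgue measure on [0, 1)
signs = np.stack([rademacher(n, omega) for n in (1, 2, 3)], axis=1)

patterns, counts = np.unique(signs, axis=0, return_counts=True)
for pattern, count in zip(patterns, counts):
    print(pattern, count / omega.size)   # each frequency is close to 1/8
```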
At this point I have produced a countably infinite sequence of independent Bernoulli random variables (i.e., two-valued random variables whose range is usually either {−1, 1} or {0, 1}) with mean value 0. In order to get more general random variables, I will combine our Bernoulli random variables together in a clever way.
Recall that a random variable U is said to be uniformly distributed on the finite interval [a, b] if

P(U ≤ t) = (t − a)/(b − a) for t ∈ [a, b].

Lemma 1.1.6. Let {X_ℓ : ℓ ∈ Z⁺} be a sequence of P-independent {0, 1}-valued Bernoulli random variables with mean value ½ on some probability space (Ω, F, P), and set

U = Σ_{ℓ=1}^∞ 2^{−ℓ} X_ℓ.

Then U is uniformly distributed on [0, 1].


Proof: Because the assertion only involves properties of distributions, it will be
proved in general as soon as I prove it for a particular realization of independent,
mean value 12 , {0, 1}-valued Bernoulli random variables. In particular, by the
preceding discussion, I need only consider the random variables

n ()

1 + Rn ()
,
2

n Z+ and [0, 1),


on [0, 1), B[0,1) , [0,1) . But, as is easily checked (cf. part (i) of Exercise 1.1.11),
P
for each [0, 1], = n=1 2n n (). Hence, the desired conclusion is trivial
in this case. 
Now let (k, ℓ) ∈ Z⁺ × Z⁺ ↦ n(k, ℓ) ∈ Z⁺ be any one-to-one mapping of Z⁺ × Z⁺ onto Z⁺, and set

Y_{k,ℓ} = (1 + R_{n(k,ℓ)})/2, (k, ℓ) ∈ (Z⁺)².

Clearly, each Y_{k,ℓ} is a {0, 1}-valued Bernoulli random variable with mean value ½, and the family {Y_{k,ℓ} : (k, ℓ) ∈ (Z⁺)²} is P-independent. Hence, by Lemma 1.1.6, each of the random variables

U_k ≡ Σ_{ℓ=1}^∞ 2^{−ℓ} Y_{k,ℓ}, k ∈ Z⁺,

is uniformly distributed on [0, 1). In addition, the U_k's are obviously mutually independent. Hence, I have now produced a sequence of mutually independent random variables, each of which is uniformly distributed on [0, 1).
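This construction is entirely concrete: the binary digits of a single ω distributed according to Lebesgue measure are i.i.d. mean-½, {0, 1}-valued (that is what the Rademacher analysis showed), and dealing them out along the bijection n(k, ℓ) manufactures as many independent uniforms as one pleases. The sketch below is a hypothetical illustration (not from the text; Python with NumPy) that samples the digits directly and uses the diagonal pairing as n(k, ℓ); with only finitely many digits, each U_k is uniform only up to resolution 2^{−16}.

```python
import numpy as np

def n(k, l):
    """The diagonal enumeration: a one-to-one map of Z+ x Z+ onto Z+."""
    s = k + l
    return (s - 1) * (s - 2) // 2 + l

rng = np.random.default_rng(2)
samples, depth = 100_000, 16

# Digits eps_1, eps_2, ... of omega: i.i.d. {0,1}-valued with mean 1/2.
eps = rng.integers(0, 2, size=(n(3, depth), samples), dtype=np.int8)

# U_k = sum_l 2^{-l} Y_{k,l} with Y_{k,l} = eps_{n(k,l)}.
U = [sum(2.0 ** -l * eps[n(k, l) - 1] for l in range(1, depth + 1))
     for k in (1, 2, 3)]

print([round(float(u.mean()), 3) for u in U])  # each close to 1/2
print(np.corrcoef(np.stack(U)).round(3))       # close to the identity matrix
```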
To complete our program, I use the time-honored transformation that takes a uniform random variable into an arbitrary one. Namely, given a distribution function F on ℝ (i.e., F is a right-continuous, non-decreasing function that tends to 0 at −∞ and 1 at +∞), define F^{−1} on [0, 1] to be the left-continuous inverse of F. That is,

F^{−1}(t) = inf{s ∈ ℝ : F(s) ≥ t}, t ∈ [0, 1].

(Throughout, the infimum over the empty set is taken to be +∞.) It is then an easy matter to check that when U is uniformly distributed on [0, 1) the random variable X = F^{−1} ∘ U has distribution function F:

P(X ≤ t) = F(t), t ∈ ℝ.

Hence, after combining this with what we already know, I have now completed the proof of the following theorem.
Theorem 1.1.7. Let Ω = [0, 1), F = B_{[0,1)}, and P = λ_{[0,1)}. Then, for any sequence {F_k : k ∈ Z⁺} of distribution functions on ℝ, there exists a sequence {X_k : k ∈ Z⁺} of P-independent random variables on (Ω, F, P) with the property that P(X_k ≤ t) = F_k(t), t ∈ ℝ, for each k ∈ Z⁺.
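In computational practice the map F^{−1} goes by the name inverse transform sampling. As a quick sketch (an illustration of my own, not part of the text; Python with NumPy), take F to be the standard exponential distribution function F(s) = 1 − e^{−s} for s ≥ 0, for which the left-continuous inverse has a closed form, and check empirically that X = F^{−1} ∘ U has distribution function F.

```python
import numpy as np

def F_inv(t):
    """Left-continuous inverse of F(s) = 1 - exp(-s), s >= 0:
    F^{-1}(t) = inf{s : F(s) >= t} = -log(1 - t)."""
    return -np.log1p(-t)

rng = np.random.default_rng(3)
U = rng.random(1_000_000)   # U uniformly distributed on [0, 1)
X = F_inv(U)                # X = F^{-1} o U should have distribution F

for t in (0.5, 1.0, 2.0):   # empirical vs. exact value of P(X <= t) = F(t)
    print(t, round(float((X <= t).mean()), 4), round(1 - np.exp(-t), 4))
```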
Exercises for § 1.1


Exercise 1.1.8. As I pointed out, P(A_1 ∩ A_2) = P(A_1)P(A_2) if and only if the σ-algebra generated by A_1 is P-independent of the one generated by A_2. Construct an example to show that the analogous statement is false when dealing with three, instead of two, sets. That is, just because P(A_1 ∩ A_2 ∩ A_3) = P(A_1)P(A_2)P(A_3), show that it is not necessarily true that the three σ-algebras generated by A_1, A_2, and A_3 are P-independent.
Exercise 1.1.9. This exercise deals with three elementary, but important, properties of independent random variables. Throughout, (Ω, F, P) is a given probability space.
(i) Let X_1 and X_2 be a pair of P-independent random variables with values in the measurable spaces (E_1, B_1) and (E_2, B_2), respectively. Given a B_1 × B_2-measurable function F : E_1 × E_2 → ℝ that is bounded below, use Tonelli's or Fubini's Theorem to show that

x_2 ∈ E_2 ↦ f(x_2) ≡ E^P[F(X_1, x_2)] ∈ ℝ

is B_2-measurable and that

E^P[F(X_1, X_2)] = E^P[f(X_2)].

(ii) Suppose that X_1, ..., X_n are P-independent, real-valued random variables. If each of the X_m's is P-integrable, show that X_1 ⋯ X_n is also P-integrable and that

E^P[X_1 ⋯ X_n] = E^P[X_1] ⋯ E^P[X_n].

(iii) Let {X_n : n ∈ Z⁺} be a sequence of independent random variables taking values in some separable metric space E. If P(X_n = x) = 0 for all x ∈ E and n ∈ Z⁺, show that P(X_m = X_n for some m ≠ n) = 0.
Exercise 1.1.10. As an application of Lemma 1.1.6 and part (ii) of Exercise 1.1.9, prove the identity

sin z = z ∏_{n=1}^∞ cos(2^{−n} z) for all z ∈ ℂ.
Exercise 1.1.11. Define {ε_n(ω) : n ≥ 1} for ω ∈ [0, 1) as in the proof of Lemma 1.1.6.
(i) Show that {ε_n(ω) : n ≥ 1} is the unique sequence {ε_n : n ≥ 1} ⊆ {0, 1} such that 0 ≤ ω − Σ_{m=1}^n 2^{−m} ε_m < 2^{−n}, and conclude that ε_1(ω) = ⌊2ω⌋ and ε_{n+1}(ω) = ⌊2^{n+1}ω⌋ − 2⌊2^n ω⌋ for n ≥ 1.
(ii) Define F : [0, 1) → [0, 1)² by

F(ω) = (Σ_{n=1}^∞ 2^{−n} ε_{2n−1}(ω), Σ_{n=1}^∞ 2^{−n} ε_{2n}(ω)),

and show that λ²_{[0,1)} = F_* λ_{[0,1)}. That is, λ_{[0,1)}({ω : F(ω) ∈ Γ}) = λ²_{[0,1)}(Γ) for all Γ ∈ B_{[0,1)²}.
(iii) Define G : [0, 1)² → [0, 1) by

G(ω_1, ω_2) = Σ_{n=1}^∞ (2ε_n(ω_1) + ε_n(ω_2))/4^n,

and show that λ_{[0,1)} = G_* λ²_{[0,1)}.
Parts (ii) and (iii) are special cases of a general principle that says, under very general circumstances, measures can be transformed into one another.
Exercise 1.1.12. Given a non-empty set Ω, recall² that a collection C of subsets of Ω is called a π-system if C is closed under finite intersections. At the same time, recall that a collection L is called a λ-system if Ω ∈ L, A ∪ B ∈ L whenever A and B are disjoint members of L, B ∖ A ∈ L whenever A and B are members of L with A ⊆ B, and ⋃_1^∞ A_n ∈ L whenever {A_n : n ≥ 1} is a non-decreasing sequence of members of L. Finally, recall (cf. Lemma 3.1.3 in my Concise Introduction to the Theory of Integration) that if C is a π-system, then the σ-algebra σ(C) generated by C is the smallest λ-system L ⊇ C.
Show that if C is a π-system and F = σ(C), then two probability measures P and Q are equal on F if they are equal on C. Next use this to see that if {C_i : i ∈ I} is a family of π-systems contained in F and if (1.1.1) holds when the A_i's are from the C_i's, then the family of σ-algebras {σ(C_i) : i ∈ I} is independent.

² See, for example, § 3.1 in the author's A Concise Introduction to the Theory of Integration, Third Edition, Birkhäuser (1998).
Exercise 1.1.13. In this exercise I discuss two criteria for determining when random variables on the probability space (Ω, F, P) are independent.
(i) Let X_1, ..., X_n be bounded, real-valued random variables. Using Weierstrass's Approximation Theorem, show that the X_m's are P-independent if and only if

E^P[X_1^{m_1} ⋯ X_n^{m_n}] = E^P[X_1^{m_1}] ⋯ E^P[X_n^{m_n}]

for all m_1, ..., m_n ∈ ℕ.
(ii) Let X : Ω → ℝ^m and Y : Ω → ℝ^n be random variables. Show that X and Y are P-independent if and only if

E^P[exp(√−1 ((α, X)_{ℝ^m} + (β, Y)_{ℝ^n}))] = E^P[exp(√−1 (α, X)_{ℝ^m})] E^P[exp(√−1 (β, Y)_{ℝ^n})]

for all α ∈ ℝ^m and β ∈ ℝ^n.
Hint: The "only if" assertion is obvious. To prove the "if" assertion, first check that X and Y are independent if

E^P[f(X) g(Y)] = E^P[f(X)] E^P[g(Y)]

for all f ∈ C_c^∞(ℝ^m; ℂ) and g ∈ C_c^∞(ℝ^n; ℂ). Second, given such f and g, apply elementary Fourier analysis to write

f(x) = ∫_{ℝ^m} e^{√−1 (ξ, x)_{ℝ^m}} φ(ξ) dξ and g(y) = ∫_{ℝ^n} e^{√−1 (η, y)_{ℝ^n}} ψ(η) dη,

where φ and ψ are smooth functions with rapidly decreasing (i.e., tending to 0 as |x| → ∞ faster than any power of (1 + |x|)^{−1}) derivatives of all orders. Finally, apply Fubini's Theorem.
Exercise 1.1.14. Given a pair of measurable spaces (E_1, B_1) and (E_2, B_2), recall that their product is the measurable space (E_1 × E_2, B_1 × B_2), where B_1 × B_2 is the σ-algebra over the Cartesian product space E_1 × E_2 generated by the sets Γ_1 × Γ_2, Γ_i ∈ B_i. Further, recall that, for any probability measures μ_i on (E_i, B_i), there is a unique probability measure μ_1 × μ_2 on (E_1 × E_2, B_1 × B_2) such that

(μ_1 × μ_2)(Γ_1 × Γ_2) = μ_1(Γ_1) μ_2(Γ_2) for Γ_i ∈ B_i.

More generally, for any n ≥ 2 and measurable spaces {(E_i, B_i) : 1 ≤ i ≤ n}, one takes ∏_1^n B_i to be the σ-algebra over ∏_1^n E_i generated by the sets ∏_1^n Γ_i, Γ_i ∈ B_i. In particular, since ∏_1^{n+1} E_i and ∏_1^{n+1} B_i can be identified with (∏_1^n E_i) × E_{n+1} and (∏_1^n B_i) × B_{n+1}, respectively, one can use induction to show that, for every choice of probability measures μ_i on (E_i, B_i), there is a unique probability measure ∏_1^n μ_i on (∏_1^n E_i, ∏_1^n B_i) such that

(∏_1^n μ_i)(∏_1^n Γ_i) = ∏_1^n μ_i(Γ_i), Γ_i ∈ B_i.

The purpose of this exercise is to generalize the preceding construction to infinite collections. Thus, let I be an infinite index set, and, for each i ∈ I, let (E_i, B_i) be a measurable space. Given ∅ ≠ Λ ⊆ I, use E_Λ to denote the Cartesian product space ∏_{i∈Λ} E_i and π_Λ to denote the natural projection map taking E_I onto E_Λ. Further, let B_I = ∏_{i∈I} B_i stand for the σ-algebra over E_I generated by the collection C of subsets

π_F^{−1}(∏_{i∈F} Γ_i), Γ_i ∈ B_i,

as F varies over non-empty, finite subsets of I (abbreviated by ∅ ≠ F ⊂⊂ I). In the following steps, I outline a proof that, for every choice of probability measures μ_i on the (E_i, B_i)'s, there is a unique probability measure ∏_{i∈I} μ_i on (E_I, B_I) with the property that

(1.1.15)    (∏_{i∈I} μ_i)(π_F^{−1}(∏_{i∈F} Γ_i)) = ∏_{i∈F} μ_i(Γ_i), Γ_i ∈ B_i,

for every ∅ ≠ F ⊂⊂ I. Not surprisingly, the probability space

(∏_{i∈I} E_i, ∏_{i∈I} B_i, ∏_{i∈I} μ_i)

is called the product over I of the spaces (E_i, B_i, μ_i); and when all the factors are the same space (E, B, μ), it is customary to denote it by (E^I, B^I, μ^I), and if, in addition, I = {1, ..., N}, one uses (E^N, B^N, μ^N).
(i) After noting (cf. Exercise 1.1.12) that two probability measures that agree on a π-system agree on the σ-algebra generated by that π-system, show that there is at most one probability measure on (E_I, B_I) that satisfies the condition in (1.1.15). Hence, the problem is purely one of existence.
(ii) Let A be the algebra over E_I generated by C, and show that there is a finitely additive μ : A → [0, 1] with the property that

μ(π_F^{−1}(Γ_F)) = (∏_{i∈F} μ_i)(Γ_F), Γ_F ∈ B_F,

for all ∅ ≠ F ⊂⊂ I. Hence, all that one has to do is check that μ admits a σ-additive extension to B_I, and, by a standard extension theorem, this comes down to checking that μ(A_n) ↘ 0 whenever {A_n : n ≥ 1} ⊆ A and A_n ↘ ∅. Thus, let {A_n : n ≥ 1} be a non-increasing sequence from A, and assume that μ(A_n) ≥ ε for some ε > 0 and all n ∈ Z⁺. One must show that ⋂_1^∞ A_n ≠ ∅.
(iii) Referring to the last part of (ii), show that there is no loss in generality to assume that A_n = π_{F_n}^{−1}(Γ_{F_n}), where, for each n ∈ Z⁺, ∅ ≠ F_n ⊂⊂ I and Γ_{F_n} ∈ B_{F_n}. In addition, show that one may assume that F_1 = {i_1} and that F_n = F_{n−1} ∪ {i_n}, n ≥ 2, where {i_n : n ≥ 1} is a sequence of distinct elements of I. Now, make these assumptions, and show that it suffices to find a_ℓ ∈ E_{i_ℓ}, ℓ ∈ Z⁺, with the property that, for each m ∈ Z⁺, (a_1, ..., a_m) ∈ Γ_{F_m}.
(iv) Continuing (iii), for each m, n ∈ Z⁺, define g_{m,n} : E_{F_m} → [0, 1] so that

g_{m,n}(x_{F_m}) = 1_{Γ_{F_n}}(x_{i_1}, ..., x_{i_n}) if n ≤ m

and

g_{m,n}(x_{F_m}) = ∫_{E_{F_n∖F_m}} 1_{Γ_{F_n}}(x_{F_m}, y_{F_n∖F_m}) (∏_{ℓ=m+1}^n μ_{i_ℓ})(dy_{F_n∖F_m}) if n > m.

After noting that, for each m and n, g_{m,n+1} ≤ g_{m,n} and

g_{m,n}(x_{F_m}) = ∫_{E_{i_{m+1}}} g_{m+1,n}(x_{F_m}, y_{i_{m+1}}) μ_{i_{m+1}}(dy_{i_{m+1}}),

set g_m = lim_{n→∞} g_{m,n} and conclude that

g_m(x_{F_m}) = ∫_{E_{i_{m+1}}} g_{m+1}(x_{F_m}, y_{i_{m+1}}) μ_{i_{m+1}}(dy_{i_{m+1}}).

In addition, note that

∫_{E_{i_1}} g_1(x_{i_1}) μ_{i_1}(dx_{i_1}) = lim_{n→∞} ∫_{E_{i_1}} g_{1,n}(x_{i_1}) μ_{i_1}(dx_{i_1}) = lim_{n→∞} μ(A_n) ≥ ε,

and proceed by induction to produce a_ℓ ∈ E_{i_ℓ}, ℓ ∈ Z⁺, so that

g_m((a_1, ..., a_m)) ≥ ε for all m ∈ Z⁺.

Finally, check that {a_m : m ≥ 1} is a sequence of the sort for which we were
looking at the end of part (iii).

12

1 Sums of Independent Random Variables

Exercise 1.1.16. Recall that if is a measurable map from one measurable


space (E, B) into a second one (E 0 , B 0 ), then the distribution of under a
measure on (E, B) is the pushforward measure (also denoted by 1 )
defined on (E 0 , B 0 ) by

() = 1 () for B 0 .
Given a non-empty index set I and, for each i I, a measurable space (Ei , Bi )
and an Ei -valued
random variable Xi on the probability space (, F, P), define
Q
X :  iI Ei so that X()i = Xi () for each i I and . Show
that XQ
i : i I is a family of P-independent random variables if and only if
X P = iI (Xi ) P. In particular, given probability measures i on (Ei , Bi ),
set
Y
Y
Y
=
Ei , F =
Bi , P =
i ,
iI

iI

iI

let Xi : Ei be the natural projection map from onto Ei , and show that
{Xi : i I} is a family of mutually P-independent random variables such that,
for each i I, Xi has distribution i .
Exercise 1.1.17. Although it does not entail infinite product spaces, an interesting example of the way in which the preceding type of construction can be
effectively applied is provided by the following elementary version of a coupling
argument.
(i) Let (, B, P) be a probability space and X and Y a pair of P-square integrable
R-valued random variables with the property that


X() X( 0 ) Y () Y ( 0 ) 0 for all (, 0 ) 2 .
Show that


EP X Y EP [X] EP [Y ].
Hint: Define Xi and Yi on 2 for i {1, 2} so that Xi () = X(i ) and
Yi () = Y (i ) when = (1 , 2 ), and integrate the inequality




0 X(1 ) X(2 ) Y (1 ) Y (2 ) = X1 () X2 () Y1 () Y2 ()
with respect to P2 .
(ii) Suppose that n Z+ and that f and g are R-valued, Borel measurable
functions on Rn that are non-decreasing with respect to each coordinate (separately). Show that if X = X1 , . . . , Xn is an Rn -valued random variable on a
probability space (, B, P) whose coordinates are mutually P-independent, then



 

EP f (X) g(X) EP f (X) EP g(X)
so long as f (X) and g(X) are both P-square integrable.

Exercises for 1.1

13

Hint: First check that the case when n = 1 reduces to an application of (i).
Next, describe the general case in terms of a multiple integral, apply Fubinis
Theorem, and make repeated use of the case when n = 1.
Exercise 1.1.18. A -algebra is said to be countably generated if it contains
a countable collection of sets that generate it. The purpose of this exercise is to
show that just because a -algebra is itself countably generated does not mean
that all its sub--algebras are.

Let (, F, P) be a probability space and {An : n Z+ F a sequence of
P-independent sub-subsets of F with the property that P(An ) 1 for
some (0, 1). Let Fn be the sub--algebra generated by An . Show that the
tail -algebra T determined by Fn : n Z+ cannot be countably generated.
Hint: Show that C T is an atom in T (i.e., B = C whenever B T \ {} is
contained in C) only if one can write
C = lim Cn
n

\
[

Cn ,

m=1 nm

where, for each n Z+ , Cn equals either An or An {. Conclude that every


atom
in T must have P-measure 0. Now suppose
that T were generated by


B` : ` N . By Kolmogorovs 01 Law, P B` {0, 1} for every ` N. Take
` =
B

B`
B` {


if P B` = 1

if P B` = 0

and set

C=

` .
B

`N

Note that, on the one hand, P(C) = 1, while, on the other hand, C is an atom
in T and therefore has probability 0.
Exercise 1.1.19. Here is an interesting application of Kolmogorovs 01 Law
to a property of the real numbers.
(i) Referring to the discussion preceding Lemma 1.1.6 and part (i) of Exercise
1.1.11, define the transformations Tn : [0, 1) [0, 1) for n Z+ so that
Tn () =

Rn ()
,
2n

[0, 1),

and notice (cf. the proof of Lemma 1.1.6) that Tn () simply flips the nth coefficient in the binary expansion . Next, let B[0,1) , and
 show that
is measurable with respect to the -algebra {Rn : n > m} generated by
{Rn : n > m} if and only if Tn () = for each 1 n m. In particular,
conclude that [0,1) () {0, 1} if Tn = for every n Z+ .

14

1 Sums of Independent Random Variables

(ii) Let F denote the set of all finite subsets of Z+ , and for each F F, define
T F : [0, 1) [0, 1) so that T is the identity mapping and
T F {m} = T F Tm

for each F F and m Z+ \ F.

As an application of (i), show that for every B[0,1) with [0,1) () > 0,
!
[
[0,1)
T F () = 1.
F F

In particular, this means that if has positive measure, then almost every
[0, 1) can be moved to by flipping a finite number of the coefficients in the
binary expansion of .
1.2 The Weak Law of Large Numbers
Starting with this section, and for the rest of this chapter, I will be studying what
happens when one averages independent, real-valued random variables. The
remarkable fact, which will be confirmed repeatedly, is that the limiting behavior
of such averages depends hardly at all on the variables involved. Intuitively,
one can explain this phenomenon by pretending that the random variables are
building blocks that, in the averaging process, first get homothetically shrunk
and then reassembled according to a regular pattern. Hence, by the time that
one passes to the limit, the peculiarities of the original blocks get lost.
Throughout the discussion, (, F, P) will be a probability space on which there
is a sequence {Xn : n 1} of real-valued random variables. Given n Z+ , use
Sn to denote the partial sum X1 + + Xn and S n to denote the average:
n
1X
Sn
X` .
=
n
n
`=1

1.2.1. Orthogonal Random Variables. My first result is a very general


one; in fact, it even applies to random variables that are not necessarily independent and do not necessarily have mean 0.
Lemma 1.2.1. Assume that
 
EP Xn2 < for n Z+



and EP Xk X` = 0 if k 6= `.

Then, for each  > 0,


n


 2
1 X P 2
E X`
for n Z+ .
(1.2.2)
2 P S n  EP S n = 2
n
`=1

In particular, if
 
M sup EP Xn2 < ,
nZ+

then



 2 M
,
n Z+ and  > 0;
2 P S n  EP S n
n
and so S n 0 in L2 (P; R) and therefore also in P-probability.
(1.2.3)

1.2 The Weak Law of Large Numbers

15

Proof: To prove the equality in (1.2.2), note that, by orthogonality,


n
  X
 
EP Sn2 =
EP X`2 .
`=1

The rest is just an application of Chebyshevs inequality, the estimate that


results after integrating the inequality


2 1[,) |Y | Y 2 1[,) |Y | Y 2
for any random variable Y . 
1.2.2. Independent Random Variables. Although Lemma 1.2.1 does
not use independence, independent random variables provide a ready source of
orthogonal functions. To wit, recall that for any P-square integrable random
variable X, its variance Var(X) satisfies
h
2 i
 
2
 
Var(X) EP X EP [X]
= EP X 2 EP [X] EP X 2 .
In particular, if the random variables Xn , n Z+ , are P-square integrable and
P-independent, then the random variables
 
n Xn EP Xn , n Z+ ,
X
are still P-square integrable, have mean value 0, and therefore are orthogonal.
Hence, the following statement is an immediate consequence of Lemma 1.2.1.


Theorem 1.2.4. Let Xn : n Z+ be a sequence of P-independent, P-square
integrable random variables with mean value m and variance dominated by 2 .
Then, for every n Z+ and  > 0,

h


2 i 2
.

(1.2.5)
2 P S n m  EP S n m
n
In particular, S n m in L2 (P; R) and therefore in P-probability.

As yet I have made only minimal use of independence: all that I have done
is subtract off the mean of independent random variables and thereby made
them orthogonal. In order to bring the full force of independence into play, one
has to exploit the fact that one can compose independent random variables with
any (measurable) functions without destroying their independence; in particular,
truncating independent random variables does not destroy independence. To see
how such a property can be brought to bear, I will now consider the problem
of extending the last part of Theorem 1.2.4 to Xn s that are less than P-square
integrable. In order to
understand the statement, recall that a family of random
variables Xi : i I is said to be uniformly P-integrable if
h
i
lim sup EP Xi , Xi R = 0.
R% iI

As the proof of the following theorem illustrates, the importance of this condition
is that it allows one to simultaneously approximate the random variables Xi , i
I, by bounded random variables.

16

1 Sums of Independent Random Variables



Theorem 1.2.6 (The Weak Law of Large Numbers). Let Xn : n Z+
be a uniformly P-integrable sequence of P-independent random variables. Then
n


1X
Xm EP [Xm ] 0 in L1 (P; R)
n 1


and therefore also in P-probability. In particular, if Xn : n Z+ is a sequence
of P-independent, P-integrable random variables that are identically distributed,
then S n EP [X1 ] in L1 (P; R) and P-probability. (Cf. Exercise 1.2.11.)

Proof: Without loss in generality, I will assume that EP [Xn ] = 0 for every
n Z+ .
For each R (0, ), define fR (t) = t 1[R,R] (t), t R,


m(R)
= EP fR Xn ,
n

Xn(R) = fR Xn m(R)
n ,

and set
(R)

Sn

1 X (R)
X`
n

and

(R)

Tn

Since E[Xn ] = 0 =

1 X (R)
Y` .
n
`=1

`=1

(R)
mn

and Yn(R) = Xn Xn(R) ,



= E Xn , |Xn | > R ,

 (R) 

 (R) 

EP |S n | EP |S n | + EP |T n |


 (R)  1
EP |S n |2 2 + 2 max EP |X` |, |X` | R
1`n



R
EP |X` |, |X` | R ;
+ 2 max
n
`Z+
and therefore, for each R > 0,





lim EP |S n | 2 sup EP |X` |, |X` | R .

`Z+

Hence, because the X` s are uniformly P-integrable, we get the desired convergence in L1 (P; R) by letting R % . 
1.2.3. Approximate Identities. The name of Theorem 1.2.6 comes from
a somewhat invidious comparison with the result in Theorem 1.4.9. The reason
why the appellation weak is not entirely fair is that, although The Weak Law
is indeed less refined than the result in Theorem 1.4.9, it is every bit as useful
as the one in Theorem 1.4.9 and maybe even more important when it comes
to applications. What The Weak Law provides is a ubiquitous technique for
constructing an approximate identity (i.e., a sequence of measures that approximate a point mass) and measuring how fast the approximation is taking

1.2 The Weak Law of Large Numbers

17

place. To illustrate how clever selections of the random variables entering The
Weak Law can lead to interesting applications, I will spend the rest of this section
discussing S. Bernsteins approach
to Weierstrasss
Approximation Theorem.


+
For a given p [0, 1], let Xn : n Z
be a sequence of P-independent
{0, 1}-valued Bernoulli random variables with mean value p. Then

P Sn = ` =

 
n `
p (1 p)n`
`

for

0 ` n.


Hence, for any f C [0, 1]; R , the nth Bernstein polynomial
n    
X
n
`
p` (1 p)n`
Bn (p; f )
f
n
`

(1.2.7)

`=0

of f at p is equal to



EP f S n .
In particular,






f (p) Bn (p; f ) = EP f (p) f S n EP f (p) f S n



2kf ku P S n p  + (; f ),

where kf ku is the uniform norm of f (i.e., the supremum of |f | over the domain
of f ) and


(; f ) sup |f (t) f (s)| : 0 s < t 1 with t s 

is the modulus of continuity of f . Noting that Var Xn = p(1 p)
applying (1.2.5), we conclude that, for every  > 0,

1
4

and



f (p) Bn (p; f ) kf ku + (; f ).
u
2n2

In other words, for all n Z+ ,


(1.2.8)



f Bn ( ; f ) u (n; f ) inf


kf ku
+ (; f ) :  > 0 .
2n2

Obviously, (1.2.8) not only shows that, as n , Bn ( ; f ) f uniformly on


[0, 1], it even provides a rate of convergence in terms of the modulus of continuity
of f . Thus, we have done more than simply prove Weierstrasss theorem; we have
produced a rather
explicit and tractable
sequence of approximating polynomials,


the sequence Bn ( ; f ) : n Z+ . Although this sequence is, by no means, the

18

1 Sums of Independent Random Variables

most efficient one,1 as we are about to see, the Bernstein polynomials have a
lot to recommend them. In particular, they have the feature that they provide
non-negative polynomial approximates to non-negative functions. In fact, the
following discussion reveals much deeper non-negativity preservation properties
possessed by the Bernstein approximation scheme.
In order to bring out the virtues of the Bernstein polynomials, it is important to replace (1.2.7) with an expression in which the coefficients of Bn ( ; f )
(as polynomials) are clearly displayed. To this end, introduce the difference
operator h for h > 0 given by


f (t + h) f (t)
.
h f (t) =
h

A straightforward inductive argument (using Pascals Identity for the binomial


coefficients) shows that
 
m
X
 m 
` m
(h) h f (t) =
(1)
f (t + `h)
`
m

for m Z+ ,

`=0

(m)

where h
see that

denotes the mth iterate of the operator h . Taking h =

Bn (p; f ) =

n n`
X
X nn `

`=0 k=0
n
X

1
n,

we now

(1)k f (`h)p`+k


r  
X
n n`
=
p
(1)r` f (`h)
`
r

`
r=0
r

`=0

n
X

 X
r  
n
r
=
(p)
(1)` f (`h)
r
`
r=0
r

`=0

n 
X
r=0



n
(ph)r rh f (0),
r

where 0h f f . Hence, we have proved that


(1.2.9)

Bn (p; f ) =

n
X
`=0

n`

 
n  ` 
1 f (0)p`
n
`

for

p [0, 1].

The marked resemblance between the expression on the right-hand side of


(1.2.9) and a Taylor polynomial is more than coincidental. To demonstrate how
1

See G.G. Lorentzs Bernstein Polynomials, Chelsea Publ. Co. (1986) for a lot more information.

1.2 The Weak Law of Large Numbers

19

one can exploit the relationship between


the Bernstein and Taylor polynomials,

say that a function C (a, b); R is absolutely monotone if its mth derivative Dm is non-negative for every m N. Also, say that C [0, 1]; [0, 1])
is a probability generating function if there exists a un : n N [0, 1]
such that

X
X
un = 1 and (t) =
un tn for t [0, 1].
n=0

n=0

Obviously, every probability generating function is absolutely monotone on (0, 1).


The somewhat surprising (remember that most infinitely differentiable functions
do not admit power series expansions) fact which I am about to prove is that,
apart from a multiplicative constant, the converse is also true. In fact, one does
not need to know, a priori, that the function is smooth so long as it satisfies a
discrete version of absolute monotonicity.

Theorem 1.2.10. Let C [0, 1]; R with (1) = 1 be given. Then the
following are equivalent:
(i) is a probability generating function,
(ii) the
of to (0, 1) is absolutely monotone;

 restriction

(0)

0
for every n N and 0 m n.
(iii) m
1
n

Proof: The implication (i) = (ii) is trivial. To see that (ii) implies (iii), first
observe that if is absolutely monotone on (a, b) and h (0, b a), then h
is absolutely monotone on (a, b h). Indeed, because D h = h D on
(a, b h), we have that


h Dm h (t) =

t+h

Dm+1 (s) ds 0,

t (a, b h),

for any m N. Returning to the function , we now know that m


h is absolutely
monotone on (0, 1 mh) for all m N and h > 0 with mh < 1. In particular,
m
[m
h ](0) = lim [h ](t) 0
t&0



and so m
h (0) 0 when h =

1
n

if

mh < 1,

and 0 m < n. Moreover, since

[n1 ](0) = lim1 [nh ](0),


n

h% n



we also know that nh (0) 0 when h = n1 , and this completes the proof that
(ii) implies (iii).
Finally, assume that (iii) holds and set n = Bn ( ; ). Then, from (1.2.9) and
the equality n (1) = (1) = 1, we see that each n is a probability generating
function. Thus, in order to complete the proof that (iii) implies (i), all that

20

1 Sums of Independent Random Variables

one has to do is check that a uniform limit of probability generating functions


is itself a probability generating function. To this end, write
n (t) =

un,` t` ,

t [0, 1] for each n Z+ .

`=0

Because the un,` s are all elements of [0, 1], one can use a diagonalization procedure to choose {nk : k Z+ } so that
lim unk ,` = u` [0, 1]

exists for each ` N. But, by Lebesgues Dominated Convergence Theorem,


this means that
(t) = lim nk (t) =
k

u` t`

for every t [0, 1).

`=0

Finally, by the Monotone Convergence Theorem, the preceding extends immediately to t = 1, and so is a probability generating function. (Notice that
the argument just given does not even use the assumed uniform convergence
and shows that the pointwise limit of probability generating functions is again
a probability generating function.) 
The preceding is only one of many examples in which The Weak Law leads
to useful ways of forming an approximate identity. A second example is given
in Exercises 1.2.12 and 1.2.13. My treatment of these is based on that of Wm.
Feller.2
Exercises for 1.2
Exercise 1.2.11. Although, for historical reasons, The Weak Law is usually
thought of as a theorem about convergence in P-probability, the forms in which
I have presented it are clearly results about convergence in either P-mean or
even P-square mean. Thus, it is interesting to discover that one can replace the
uniform integrability assumption made in Theorem 1.2.6 with a weak uniform integrability assumption if one is willing to settle for convergence in P-probability.
Namely, let X1 , . . . , Xn , . . . be mutually P-independent random variables, assume that


F (R) sup RP |Xn | R 0 as R % ,
nZ+
2

Wm. Feller, An Introduction to Probability Theory and Its Applications, Vol. II, Wiley, Series
in Probability and Math. Stat. (1968). Feller provides several other similar applications of The
Weak Law, including the ones in the following exercises.

Exercises for 1.2


and set

21

i
1 X Ph
E X` , |X` | n ,
mn =
n

n Z+ .

`=1

Show that, for each  > 0,


n
i






1 X P h 2
X` > n
E
X
,
X

n
+
P
max
P S n mn 
`
`
1`n
(n)2
`=1
Z n
2
F (t) dt + F (n),
2
n 0


and conclude that S n mn 0 in P-probability. (See part (ii) of Exercises
1.4.26 and 1.4.27 for a partial converse to this statement.)

Hint: Use the formula


 
Var(Y ) EP Y 2 = 2


t P |Y | > t dt.

[0,)

Exercise 1.2.12. Show that, for each T [0, ) and t (0, ),



X (nt)k
1
if T > t
=
lim ent
n
k!
0
if T < t.
0knT

Hint: Let X1 , . . . , Xn , . . . be P-independent, N-valued Poisson random variables with mean value t. That is, the Xn s are P-independent and

tk
for k N.
P Xn = k = et
k!
Show that Sn is an N-valued Poisson random variable with mean value nt, and
conclude that, for each T [0, ) and t (0, ),
X (nt)k

= P Sn T .
ent
k!
0knT

Exercise 1.2.13. Given a right-continuous function F : [0, ) R of bounded variation with F (0) = 0, define its Laplace transform (), [0, ), by
the RiemannStieltjes integral:
Z
() =
et dF (t).
[0,)

Using Exercise 1.2.12, show that


X (n)k 

Dk (n) F (T )
k!

as n

knT

for each T [0, ) at which F is continuous. Conclude, in particular, that


F can be recovered from its Laplace transform. Although this is not the most
practical recovery method, it is distinguished by the fact that it does not involve
complex analysis.

22

1 Sums of Independent Random Variables

1.3 Cram
ers Theory of Large Deviations


From Theorem 1.2.4, we know that if Xn : n Z+ is a sequence of Pindependent, P-square integrable random variables with mean value 0, and if
the averages S n , n Z+ , are defined accordingly, then, for every  > 0,
 max

1mn Var(Xm )
,
n Z+ .
P S n 
n2
Thus, so long as
Var(Xn )
0 as n ,
n
the S n s are becoming more and more concentrated near 0, and the rate at
which this concentration is occurring can be estimated in terms of the variances
Var(Xn ). In this section, we will see that, by placing more stringent integrability
requirements on the Xn s, one can gain more information about the rate at which
the S n s are concentrating at 0.
In all of this analysis, the trick is to see how independence can be combined
with 0 mean value to produce unexpected cancellations; and, as a preliminary
warm-up exercise, I begin with the following.

Theorem 1.3.1. Let {Xn : n Z+ } be a sequence of P-independent, Pintegrable random variables with mean value 0, and assume that
 
M4 sup EP Xn4 < .
nZ+

Then, for each  > 0,


 4  3M4
4 P |S n |  EP S n 2 ,
n
In particular, S n 0 P-almost surely.
(1.3.2)

n Z+ .

Proof: Obviously, in order to prove


 (1.3.2), it suffices to check the second
inequality, which is equivalent to EP Sn4 3M4 n2 . But
n
X

 
E Sn4 =
P



EP Xm1 Xm4 ,

m1 ,...,m4 =1

and, by Schwarzs Inequality, each of these terms is dominated by M4 . In addition, of these terms, the only ones that do not vanish have either all their factors
the same or two pairs of equal factors. Thus, the number of non-vanishing terms
is n + 3n(n 1) = 3n2 2n.
Given (1.3.2), the proof of the last part becomes an easy application of the
BorelCantelli Lemma. Indeed, for any  > 0, we know from (1.3.2) that



X
P S n  < ,
n=1



and therefore, by (1.1.4), that P limn S n  = 0. 

1.3 Cramers Theory of Large Deviations

23

Remark 1.3.3. The final assertion in Theorem 1.3.1 is a primitive version of


The Strong Law of Large Numbers. Although The Strong Law will be taken up
again, and considerably refined, in Section 1.4, the principle on which its proof
here was based is an important one: namely, control more moments and you
will get better estimates; get better estimates and you will reach more refined
conclusions.
With the preceding adage in mind, I will devote the rest of this section to
examining what one can say when one has all moments at ones disposal. In fact,
from now on, I will be assuming that X1 , . . . , Xn , . . . are independent random
variables with common distribution having the property that the moment
generating function
Z
(1.3.4)
M ()
e x (dx) < for all R.
R

Obviously, (1.3.4) is more than sufficient to guarantee that the Xn s have moments of all orders. In fact, as an application of Lebesgues Dominated Convergence Theorem, one sees that R 7 M () (0, ) is infinitely differentiable
and that
Z
 
dn M
(0) for all n N.
EP X1n =
xn (dx) =
d n
R

In the discussion that follows, I will use m and 2 to denote, respectively, the
common mean value and variance of the Xn s.
In order to develop some intuition for the considerations that follow, I will
first consider an example, which, for many purposes, is the canonical example in
probability theory. Namely, let g : R (0, ) be the Gauss kernel


|y|2
1
, y R,
(1.3.5)
g(y) exp
2
2

and recall that a random variable X is standard normal if


Z

P X =
g(y) dy, BR .

In spite of their somewhat insultingly bland moniker, standard normal random


variables are the building blocks for the most honored family in all of probability
theory. Indeed, given m R and [0, ), the random variable Y is said to
be normal (or Gaussian) with mean valuem and variance 2 (often this
is abbreviated by saying that X is an N m, 2 -random variable) if and only
if the distribution of Y is m,2 , where m,2 is the distribution of the variable
X + m when X is standard normal. That is, Y is an N (m, 2 ) random variable
if, when = 0, P(Y = m) = 1 and, when > 0,


Z

ym
1
dy for BR .
g
P Y =

24

1 Sums of Independent Random Variables

There are two obvious reasons for the honored position held by Gaussian
random variables. In the first place, they certainly have finite moment generating
functions. In fact, since
 2
Z

y
, R,
e g(y) dy = exp
2
R

it is clear that



2 2
.
Mm,2 () = exp m +
2

(1.3.6)

Secondly, they add nicely. To be precise, it is a familiar fact from elemen is


tary probability theory that if X is an N (m, 2 )-random variable and X
2

an N (m,

)-random
 variable that is independent of X, then X + X is an
N m + m,
2 +
2 -random variable. In particular, if X1 , . . . , Xn are mutually

independent, standard normal random variables, then S n is an N 0, n1 -random
variable. That is,
r


Z

n|y|2
n
dy.
exp
P Sn =
2
2

Thus (cf. Exercise 1.3.16), for any we see that


(1.3.7)

h
i
1
log P S n = ess inf
lim
n n


|y|2
: y ,
2

where the ess in (1.3.7) stands for essential and means that what follows is
taken modulo a set of measure 0. (Hence, apart from a minus sign, the right2
hand side of (1.3.7) is the greatest number dominated by |y|2 for Lebesgue-almost
every y .) In fact, because

g(y) dy x1 g(x) for all x (0, ),

we have the rather precise upper bound


r



n2
2
exp
P |S n | 
2
n2

for  > 0.

At the same time, it is clear that, for 0 <  < |a|,

r

P |S n a| < 



n(|a| + )2
22 n
.
exp
2

1.3 Cramers Theory of Large Deviations

25

More generally, if the Xn s are mutually independent N (m, 2 )-random variables, then one finds that

r

P |S n m| 



n2
2
exp
2
n2

for  > 0;

and, for 0 <  < |a| and sufficiently large ns,

r

P |S n (m + a)| < 



n(|a| + )2
22 n
.
exp
2

Of course, in general one cannot hope to know such explicit expressions for the
distribution of S n . Nonetheless, on the basis of the preceding, one can start to
see what is going on. Namely, when the distribution falls off rapidly outside of
compacts, averaging n independent random variables with distribution has the
effect of building an exponentially deep well in which the mean value m lies at the
bottom. More precisely, if one believes that the Gaussian random variables are
normal in the sense that they are typical, then one should conjecture
that,


even
when the random variables are not normal, the behavior of P S n m  for
large ns should resemble that of Gaussians with the same variance; and it is in
the verification of this conjecture that the moment generating function M plays
a central role. Namely, although an expression in terms of for the distribution
of Sn is seldom readily available, the moment generating function for Sn is easily
expressed in terms of M . To wit, as a trivial application of independence, we
have


EP eSn = M ()n , R.

Hence, by Markovs Inequality applied to eSn , we see that, for any a R,




P S n a ena M ()n = exp n a () ,

[0, ),

where
(1.3.8)

() log M ()

is the logarithmic moment generating function of . The preceding relation is one of those lovely situations in which a single quantity is dominated by a
whole family of quantities, which means that one should optimize by minimizing
over the dominating quantities. Thus, we now have
(1.3.9)

"

P S n a exp n

sup
[0,)

#

a () .

26

1 Sums of Independent Random Variables

Notice that (1.3.9) is really very good. For instance, when the Xn s are N (m, 2 )random variables and > 0, then (cf. (1.3.6)) the preceding leads quickly to the
estimate



n2
P S n m  exp 2 ,
2

which is essentially the upper bound at which we arrived before.


Taking a hint from the preceding, I now introduce the Legendre transform


(1.3.10)
I (x) sup x () : R ,
x R,
of and, before proceeding further, make some elementary observations about
the structure of the functions and I .
Lemma 1.3.11. The function is infinitely differentiable. In addition, for
each R, the probability measure on R given by
Z
1
ex (dx) for BR
() =
M ()

has moments of all orders,


Z
x (dx) =
R

0 (),

Z

x (dx)

and
R

2
x (dx) = 00 ().

Next, the function I is a [0, ]-valued, lower semicontinuous, convex function


that vanishes at m. Moreover,


I (x) = sup x () : 0
for x [m, )
and


I (x) = sup x () : 0

for x (, m].

Finally, if






= inf x R : (, x] > 0 and = sup x R : [x, ) > 0 ,
then I is smooth on (, ) and identically + off of [, ]. In fact, either
({m}) = 1 and = m = or m (, ), in which case 0 is a smooth,
strictly increasing mapping from R onto (, ),

I (x) = (x) x (x) , x (, ),

where

= 0

1

is the inverse of 0 , ({}) = eI () if > , and ({}) = eI () if


< .

1.3 Cramers Theory of Large Deviations

27

Proof: For notational convenience, I will drop the subscript during the
proof. Further, note that the smoothness of follows immediately from the
positivity and smoothness of M , and the identification of 0 () and 00 () with
the mean and variance of is elementary calculus combined with the remark
following (1.3.4). Thus, I will concentrate on the properties of the function I.
As the pointwise supremum of functions that are linear, I is certainly lower
semicontinuous and convex. Also, because (0) = 0, it is obvious that I 0.
Next, by Jensens Inequality,
Z
() x (dx) = m,
R

and, therefore, x () 0 if x m and 0 or if x m and 0. Hence,


because I is non-negative, this proves the one-sided extremal characterizations
of I (x) depending on whether x m or x m.
Turning to the final part, note first that there is nothing more to do in the
case when ({m}) = 1. Thus, assume that ({m}) < 1, in which case it is clear
that m (, ) and that none of the measures is degenerate (i.e., concentrate
at one point). In particular, because 00 () is the variance of the , we know
that 00 > 0 everywhere. Hence, 0 is strictly increasing and therefore admits a
smooth inverse on its image. Furthermore, because 0 () is the mean of , it
is clear that the image of 0 is contained in (, ). At the same time, given an
x (, ), note that
Z
ex ey (dy) as || ,
R

and therefore
x () achieves a maximum at some point x R. In
addition, by the first derivative test, 0 (x ) = x, and so x = 1 (x). Finally,
suppose that < . Then
Z
Z

y
e
e (dy) =
e(y) (dy) & ({}) as ,
R

(,]

and therefore eI() = inf 0 e M () = ({}). Since the same reasoning


applies when > , we are done. 
Theorem 1.3.12 (Cram
ers Theorem). Let {Xn : n 1} be a sequence of
P-independent random variables with common distribution , assume
R that the
associated moment generating function M satisfies (1.3.4), set m = R x (dx),
and define I accordingly, as in (1.3.10). Then,


P S n a enI (a)

P S n a enI (a)

for all a [m, ),

for all a (, m].

28

1 Sums of Independent Random Variables

Moreover, for a (, ) (cf. Lemma 1.3.11),  > 0, and n Z+ ,


!
h 
i



00 (a)
exp
n
I
(a)
+
|
(a)|
,
P S n a <  1

n2
1
where is the function given in (1.3.8) and 0
.

Proof: To prove the first part, suppose that a [m, ), and apply the second
part of Lemma 1.3.11 to see that the exponent in (1.3.9) equals nI (a), and,
after replacing {Xn : n 1} by {Xn : n 1}, one also gets the desired
estimate when a m.
To prove the lower bound, let a [m, ) be given, and set = (a)
[0, ). Next, recall the probability measure described in Lemma 1.3.11, and
remember that has mean value a = 0 () and variance 00 (). Further, if


Yn : n Z+ is a sequence of independent, identically distributed random
variables with common distribution , then it is an easy matter to check that,
for any n Z+ and every BRn -measurable F : Rn [0, ),
h
h
i
i
1
P Sn
E
e
F
X
,
.
.
.
,
X
EP F Y1 , . . . , Yn =
1
n .
M ()n
In particular, if
n
X
Tn
,
Tn =
Y` and T n =
n
`=1

then, because I (a) = a (),


i

h




P S n a <  = M ()n EP eTn , T n a < 



en(a+) M ()n P T n a < 

h 
i 

= exp n I (a) +  P T n a <  .

But, because the mean value and variance of the Yn s are, respectively, a and
00 (), (1.2.5) leads to
 00 ()


.
P T n a 
n2

The case when a (, m] is handled in the same way.

Results like the ones obtained in Theorem 1.3.12 are examples of a class of
results known as large deviations estimates. They are large deviations because the probability of their occurrence is exponentially small. Although large
deviation estimates are available in a variety of circumstances,1 in general one
has to settle for the cruder sort of information contained in the following.
1

In fact, some people have written entire books on the subject. See, for example, J.-D.
Deuschel and D. Stroock, Large Deviations, now available from the A.M.S. in the Chelsea
Series.

1.3 Cramers Theory of Large Deviations

29

Corollary 1.3.13. For any BR ,

h
i
1
log P S n
n n
h
i
1
log P S n inf I (x).
lim
n n
x

inf I (x) lim


x

(I use and to denote the interior and closure of a set . Also, recall that I
take the infemum over the empty set to be +.)

Proof: To prove the upper bound, let be a closed set, and define + =
[m, ) and = (, m]. Clearly,



P S n 2P S n + P S n .

Moreover, if + 6= and a+ = min{x : x + }, then, by Lemma 1.3.11 and


Theorem 1.3.12,



I (a+ ) = inf I (x) : x +
and P S n + enI (a+ ) .

Similarly, if 6= and a = max{x : x }, then





I (a ) = inf I (x) : x
and P S n enI (a ) .

Hence, either = , and there is nothing to do anyhow, or






P S n 2 exp n inf I (x) : x , n Z+ ,

which certainly implies the asserted upper bound.


To prove the lower bound, assume that is a non-empty open set. What I
have to show is that
h
i
1
log P S n I (a)
lim
n n

for every a . If a (, ), choose > 0 so that (a , a + ) and


use the second part of Theorem 1.3.12 to see that

h


i
1
log P S n I (a)  (a)
n n
lim

for every  (0, ). If a


/ [, ], then I (a) = , and so there is nothing to do.
Finally, if a {, }, then ({a}) = eI (a) and therefore


P S n P S n = a enI (a) . 

30

1 Sums of Independent Random Variables

Remark 1.3.14. The upper bound in Theorem 1.3.12 is often called Chernoff s Inequality. The idea underlying its derivation is rather mundane by
comparison to the subtle idea underlying the proof of the lower bound. Indeed,
it may not be immediately obvious what that idea was! Thus, consider once
again the second part of the proof of Theorem 1.3.12. What I had to do is estimate the probability that S n lies in a neighborhood of a. When a is the mean
value m, such an estimate is provided by the Weak Law. On the other hand,
when a 6= m, the Weak Law for the Xn s has very little to contribute. Thus,
what I did is replace the original Xn s by random variables Yn , n Z+ , whose
mean value is a. Furthermore, the transformation from the Xn s to the Yn s was
sufficiently simple that it was easy to estimate Xn -probabilities in terms of Yn probabilities. Finally, the Weak Law
Pn applied to the Yn s gave strong information
about the rate of approach of n1 `=1 Y` to a.

I close this section by verifying the conjecture (cf. the discussion preceding
Lemma 1.3.11) that the Gaussian case is normal. In particular, I want to check
that the well around m in which the distribution of S n becomes concentrated
looks Gaussian, and, in view of Theorem 1.3.12, this comes down to the following.

Theorem 1.3.15. Let everything be as in Lemma 1.3.11, and assume that


the variance 2 > 0. There exists a (0,
1] and a K (0, ) such that
[m , m + ] (, ) (cf. Lemma 1.3.11), 00 (x) K,


(x) K|x m|,

and



2

I (x) (x m) K|x m|3

2 2

for all x [m , m + ]. In particular, if 0 <  < , then




 2


3
K
,
P |S n m|  2 exp n
2 2

and if |a m| < and  > 0, then









|a m|2
K
2
+ K|a m|  + |a m|
.
P |S n a| <  1 2 exp n
2 2
n

Proof: Without loss in generality (cf. Exercise 1.3.17), I will assume that m =
0 and 2 = 1. Since, in this case, (0) = 0 (0) = 0 and 00 (0) = 1, it
follows that (0) = 0 and 0 (0) = 1. Hence, we can find an
M (0, )
and a (0, 1] with < < < for which (x) x M |x|2 and


() 2 M ||3 whenever |x| and || (M + 1), respectively. In
2


particular, this leads immediately to (x) (M + 1)|x| for |x| , and
the estimate for I comes easily from the preceding combined with equation
I (x) = (x)x (x) . 

Exercises for 1.3

31

Exercises for 1.3



Exercise 1.3.16. Let E, F, be a measure space and f a non-negative,
F-measurable function. If either (E) < or f is -integrable, show that
kf kLp (;R) kf kL (;R)

as p .

Hint: Handle the case (E) < first, and treat the case when f L1 (; R)
by considering the measure (dx) = f (x) (dx).
Exercise 1.3.17. Referring to the notation used in this section, assume that
is a non-degenerate (i.e., it is not concentrated at a single point) probability
measure on R for which (1.3.4) holds. Next, let m and 2 be the mean and
variance of , use to denote the distribution of
x R 7

xm
R

under ,

and define , I , and accordingly. Show that


() = m + (),
R,


xm
,
x R,
I (x) = I



Image 0 = m + Image 0 ,



1
xm
, x Image 0 .
(x) =

Exercise 1.3.18. Continue with the same notation as in the preceding.


(i) Show that I I if M M .
(ii) Show that
I (x) =

(x m)2
,
2 2

x R,


when is the N m, 2 distribution with > 0, and show that
I (x) =

bx
bx
xa
xa
,
log
+
log
p(b a)
(1 p)(b a) b a
ba

x (a, b),

when a < b, p (0, 1), and ({a}) = 1 ({b}) = p.



(iii) When is the hcentered
Bernoulli distribution given by {1} = 12 , show
i
2
2
that M () exp 2 , R, and conclude that I (x) x2 , x R. More

generally, given n Z+ , {k : 1 k n} R, and independent random


variables X1 , . . . , Xn with this as their common distribution, let denote the

32

1 Sums of Independent Random Variables

Pn
Pn
x2
2
1 k2 .
distribution of S 1 k Xk and show that I (x) 2
2 , where
In particular, conclude that



a2
P |S| a 2 exp 2 , a [0, ).
2

Exercise 1.3.19. Although it is not exactly the direction in which I have been
going, it seems appropriate to include here a derivation of Stirlings formula.
Namely, recall Eulers Gamma function:
Z
(1.3.20)
(t)
xt1 ex dx,
t (1, ).
[0,)

The goal of this exercise is to prove that


 t

t
(1.3.21)
(t + 1) 2t
e

as

t % ,

where the tilde means that the two sides are asymptotic to one another in
the sense that their ratio tends to 1. (See Exercise 2.1.16 for another approach.)
The first step is to make the problem look like one to which Exercise 1.3.16
is applicable. Thus, make the substitution x = ty, and apply Exercise 1.3.16 to
see that
! 1t
1

Z
(t + 1) t
e1 .
=
y t ety dy
tt+1
[0,)

This is, of course, far less than we want to know. Nonetheless, it does show that
all the action is going to take place near y = 1 and that the principal factor in
t
the asymptotics of (t+1)
tt+1 is e . In order to highlight these observations, make
the substitution y = z + 1 and obtain
Z
(t + 1)
=
(1 + z)t etz dz.
tt+1 et
(1,)
2

Before taking the next step, introduce the function R(z) = log(1 + z) z + z2

for z (1, 1), and check that R(z) 0 if z (1, 0] and that |R(z)|
everywhere in (1, 1). Now let (0, 1) be given, and show that


Z
t tz


t 2
t
1 + z e dz (1 ) (1 )e
exp
2
1

and
Z

h
it1 Z

etz dz 1 + e
(1 + z)ez dz



3
t 2
.
+
2 exp 1
3(1 )
2
1+z

t

|z|3
3(1|z|)

Exercises for 1.3

33

tz 2

Next, write (1 + z)t etz = e 2 etR(z) . Then


Z
Z
t tz
tz 2
1+z e
dz =
e 2 dz + E(t, ),
|z|

|z|

where

Z
E(t, ) =

tz 2
2


etR(z) 1 dz.

|z|

Check that
Z
r
Z

2
t 2
z2
1
2
2

tz2
e 2 dz 1 e 2 .
dz
e
= t 2

1
|z|
t
t2
|z|t 2

At the same time, show that


Z
Z
2
tz2 +|R(z)|
dz t
|E(t, )| t
|R(z)|e

|z|3 e

|z|

|z|

tz 2 35
2 3(1)

dz

12(1 )
(3 5)2 t

p
as long as < 35 . Finally, take = 2t1 log t, and combine these to conclude
that there is a C < such that


C
(t + 1)

1
, t [1, ).



2t t t
t
e

Exercise 1.3.22. Inspired by T.H. Carne,2 here is a rather different sort of


application of large deviation estimates. Namely, the goal is to show that for
each n 2 and 1 m < n there exists an (m 1)st order polynomial pm,n with
the property that


2
n

x pm,n (x) 2 exp m
for x [1, 1].
2n

(i) Given a C-valued f on Z, define Af : Z C by


f (n + 1) + f (n 1)
, n Z,
2


and show that, for any n 1, An f = EP f (Sn ) , where Sn is the sum of n
P-independent, {1, 1}-valued Bernoulli random variables with mean value 0.
Af (n) =

T.H. Carne, A transformation formula for Markov chains, Bull. Sc. Math., 109, pp. 399
405 (1985). As Carne points out, what he is doing is the discrete analog of Hadamards representation, via the Weierstrass transform, of solutions to heat equations in terms of solutions
to the wave equations.

34

1 Sums of Independent Random Variables

(ii) Show that, for each z C, there is a unique sequence {Q(m, z) : m Z} C


satisfying Q(0, z) = 1,


Q(m, z) = Q(m, z), and AQ( , z) (m) = zQ(m, z) for all m Z.
In fact, show that, for each m Z+ : Q(m, ) is a polynomial of degree m and
Q(m, cos ) = cos(m),

C.

In particular, this means that |Q(n, x)| 1 for all x [1, 1]. (It also means
that Q(n, ) is the nth Chebychev polynomial.)
(iii) Using induction on n Z+ , show that
 n

A Q( , z) (m) = z n Q(m, z),

m Z and z C,

and conclude that


h
i
z n = E Q Sn , z ,

n Z+

and z C.

In particular, if
h
i

pm,n (z) E Q Sn , z), Sn < m = 2n

X
|2`n|<m

 
n
Q(2` n, z),
`

conclude that (cf. Exercise 1.3.18)




m2
sup |x pm,n (x)| P |Sn | m 2 exp
2n
x[1,1]
n

for all 1 m n.

(iv) Suppose that A is a self-adjoint contraction on the real or complex Hilbert


space H (i.e., (f, Ag)H = (g, Af )H and kAf kH kf kH for all f, g H). Next,
assume that f, A` g H = 0 for some f, g H and each 0 ` < m. Show that



2


f, An g 2kf kH kgkH exp m
H
2n

for n m.

(See Exercise 2.3.30 for an application.)



Hint: Note that f, pm,n (A)g H = 0, and use the Spectral Theorem to see that,
for any polynomial p,
kp(A)f kH

sup |p(x)| kf kH ,
x[1,1]

f H.

1.4 The Strong Law of Large Numbers

35

1.4 The Strong Law of Large Numbers


In this section I will discuss a few almost sure convergence properties of averages
of independent random variables. Thus, once again, {Xn : n 1} will be a
sequence of independent random variables on a probability space , F, P , and
Sn and S n will be, respectively, the sum and average of X1 , . . . , Xn . Throughout
this section, the reader should notice how much more immediately important a
role independence (as opposed to orthogonality) plays than it did
1.2.
 in Section
To get started, I point out that, for both {Sn : n 1} and S n : n 1 , the
set on which convergence occurs has P-measure either 0 or 1. In fact, we have
the following simple application of Kolmogorovs 01 Law (Theorem 1.1.2).


+
Lemma
1.4.1.
For
any
sequence
a
:
n

Z
R and any sequence
n


bn : n Z+ (0, ) that converges to an element of (0, ], the set on which

lim

Sn an
bn

exists in R

has P-measure either 0 or 1. In fact, if bn as n , then both

lim

Sn an
bn

and

Sn an
bn
n
lim

are P-almost surely constant.


Proof: Simply observe that all of the events and functions involved can be
expressed in terms of {Sm+n Sm : n 1} for each m Z+ and are therefore
tail-measurable. 
The following beautiful statement, which was proved originally by Kolmogorov,
is the driving force behind
 many of the almost sure convergence results about
both {Sn : n 1} and S n : n 1 .

Theorem 1.4.2.
variables, and if

If the Xn s are independent, P-square integrable random

(1.4.3)

Var Xn ) < ,

n=1

then


X

 
Xn EP Xn
converges P-almost surely.

n=1

Note that, since


(1.4.4)

n 
X




1 X


Var X` ,
sup P
X` EP X`  2



nN
`=N

`=N

36

1 Sums of Independent Random Variables

 
P 
(1.4.3) certainly implies that the series n=1 Xn EP Xn converges in Pmeasure. Thus, all that I am attempting to do here is replace a convergence
in measure statement with an almost sure one. Obviously, this replacement
would be trivial if the supnN in (1.4.3) appeared on the other side of P. The
remarkable fact which we are about to prove is that, in the present situation,
the supnN can be brought inside!
Theorem 1.4.5 (Kolmogorovs Inequality).
and P-square integrable, then
(1.4.6)

If the Xn s are independent

n

!

X




1 X


Var Xn
P sup
X` EP X`  2

 n=1
n1
`=1

for each  > 0. (See Exercise 1.4.21 for more information.)


Proof: Without loss in generality, assume that each Xn has mean value 0.
Given 1 n < N , note that
2
SN
Sn2 = SN Sn

2



+ 2 SN Sn Sn 2 SN Sn Sn ;

and therefore, since


 SN Sn has mean value 0 and is independent of the -algebra
{X1 , . . . , Xn } ,
(*)

 2




EP SN
, An EP Sn2 , An for any An {X1 , . . . , Xn } .



In particular, if A1 = |S1 | >  and
n
o


An+1 = Sn+1 >  and max S`  ,
1`n

n Z+ ,

then, the An s are mutually disjoint,



BN


N
[



max Sn >  =
An ,

1nN

n=1

and so (*) implies that


N
N
 2
 X
 2
 X


P
E SN , BN =
E SN , An
EP Sn2 , An
P

n=1

2

N
X
n=1

n=1



P An =  2 P B N .

1.4 The Strong Law of Large Numbers

37

Thus,




 2 X
 


 P sup Sn >  = lim 2 P BN lim EP SN

EP Xn2 ,
2

n1

n=1

and so the result follows after one takes left limits with respect to  > 0. 
Proof of Theorem 1.4.2: Again assume
that the Xn s have mean value 0.

By (1.4.6) applied to XN +n : n Z+ , we see that (1.4.3) implies





 
1 X
EP Xn2 0 as N
P sup Sn SN  2

n>N
n=N +1

for every  > 0, and this is equivalent to the P-almost sure Cauchy convergence
of {Sn : n 1}. 
 In order to convert the conclusion in Theorem 1.4.2 into a statement about
S n : n 1 , I will need the following elementary summability fact about
sequences of real numbers.


Lemma 1.4.7 (Kronecker). Let bn : n Z+ be a non-decreasing sequence
of positive numbers that tend to , and set n = bn bn1 , where b0 0. If
{sn : n 1} R is a sequence that converges to s R, then
n
1 X
` s` s.
bn
`=1

In particular, if {xn : n 1} R, then


n

X
1 X
xn
x` 0 as n .
converges in R =
bn
b
n=1 n
`=1

Proof: To prove the first part, assume that s = 0, and for given  > 0 choose
N Z+ so that |s` | <  for ` N . Then, with M = supn1 |sn |,


n
Mb
1 X


N
+   as n .
` s`


bn
bn
`=1
Pn
Turning to the second part, set y` = xb`` , s0 = 0, and sn = `=1 y` . After
summation by parts,
n
n
1 X
1 X
` s`1 ;
x` = sn
bn
bn
`=1

`=1

and so, since sn s R as n , the first part gives the desired conclusion. 
After combining Theorem 1.4.2 with Lemma 1.4.7, we arrive at the following
interesting statement.

38

1 Sums of Independent Random Variables

Corollary 1.4.8. Assume that {bn : n 1} (0, ) increases to infinity as


n , and suppose that {Xn : n 1} is a sequence of independent, P-square
integrable random variables. If


X
Var Xn
< ,
b2n
n=1

then

n
 
1 X
X` EP X` 0
bn

P-almost surely.

`=1

As an immediate consequence of the preceding, we see that S n m P-almost


surely if the Xn s are identically distributed and P-square integrable. In fact,
without very much additional effort, we can also prove the following much more
significant refinement of the last part of Theorem 1.3.1.


Theorem 1.4.9 (Kolmogorovs Strong Law). Let Xn : n Z+ be
a sequence of P-independent, identically distributed random variables. If X1
is P-integrable and has mean value m, then, as n , S n m P-almost
surely and in L1 (P; R). Conversely, if S n converges (in R) on a set of positive
P-measure, then X1 is P-integrable.
 
P
Proof: Assume
that
X
is
P-integrable
and
that
E
X1 = 0. Next, set Yn =
1

Xn 1[0,n] |Xn | , and note that

P Yn 6= Xn =

n=1


P |Xn | > n

n=1
Z n
X
n=1




P |X1 | > t dt = EP |X1 | < .

n1

Thus, by the first part of the BorelCantelli Lemma,



P n Z+ N n YN = XN = 1.
Pn
In particular, if T n = n1 `=1 Y` for n Z+ , then, for P-almost every ,
T n () 0 if and only if S n () 0. Finally, to see that T n 0 P-almost
surely, first observe that, because EP [X1 ] = 0, by the first part of Lemma 1.4.7,
n



1X P
E [Y` ] = lim EP X1 , |X1 | n = 0,
n
n n
lim

`=1

and therefore, by Corollary 1.4.8, it suffices for us to check that

X
EP [Yn2 ]
< .
n2
n=1

1.4 The Strong Law of Large Numbers


To this end, set
C = sup `
`Z+

and note that

X
EP [Y 2 ]
n

n=1

n2

39

X
1
,
n2
n=`

X

1 X P 2
E
X
,
`

1
<
|X
|

`
=
1
1
n2
n=1
`=1

X
`=1


X
1
EP X12 , ` 1 < |X1 | `
n2
n=`

X
1
`=1





EP X12 , ` 1 < |X1 | ` C EP |X1 | < .

Thus, the P-almost sure convergence is now established, and the L1 (P; R)-convergence result was proved already in Theorem 1.2.6.
Turning to the converse assertion, first note that (by Lemma 1.4.1) if S n
converges in R on a set of positive P-measure, then it converges P-almost surely
to some m R. In particular,



|Xn |
= lim S n S n1 = 0 P-almost surely;
n
n n



and so, if An |Xn | > n , then P limn An = 0. But the An s are mutually
independent, and
by the second part of the BorelCantelli Lemma, we
Ptherefore,

now know that n=1 P An < . Hence,


Z

X




P
E |X1 | =
P |X1 | > t dt 1 +
P |Xn | > n < . 
lim

n=1

Remark 1.4.10. A reason for being interested in the converse part of Theorem
1.4.9 is that it provides a reconciliation between the measure theory vs. frequency
schools of probability theory.
Although Theorem 1.4.9 is the centerpiece of this section, I want to give
another approach to the study of the almost sure convergence properties of
{Sn : n 1}. In fact, following P. Levy, I am going to show that {Sn : n 1}
converges P-almost surely if it converges in P-measure. Hence, for example,
Theorem 1.4.2 can be proved as a direct consequence of (1.4.4), without appeal
to Kolmogorovs Inequality.
The key to Levys analysis lies in a version of the reflection principle, whose
statement requires the introduction of a new concept. Given an R-valued random
variable Y , say that R is a median of Y and write med(Y ), if


(1.4.11)
P Y P Y 12 .

40

1 Sums of Independent Random Variables

Notice that (as distinguished from a mean value) every Y admits a median; for
example, it is easy to check that




inf t R : P Y t 12
is a median of Y . In addition, it is clear that
med(Y ) = med(Y )

and

med ( + Y ) = + med (Y ) for all R.

On the other hand, the notion of median is flawed by the fact that, in general,
a random variable will admit an entire non-degenerate interval of medians. In
addition, it is neither easy to compute the medians of a sum in terms of the
medians of the summands nor to relate the medians of an integrable random
variable to its mean value. Nonetheless, at least if Y Lp (P; R) for some
p [1, ), the following estimate provides some information. Namely, since, for
med(Y ) and R,






| |p
| |p P Y P Y EP |Y |p ,
2
we see that, for any p [1, ) and Y Lp (P; R),


 p1
for all R and med (Y ).
| | 2EP |Y |p

In particular, if Y L2 (P ) and m is the mean value of Y , then


(1.4.12)

| m|

2Var(Y )

for all med(Y ).



Theorem 1.4.13 (L
evys Reflection Principle). Let Xn : n Z+ be
a sequence of P-independent random variables, and, for k `, choose `,k
med S` Sk . Then, for any N Z+ and  > 0,

(1.4.14)

max Sn + N,n 

1nN


2P SN  ,

and therefore

(1.4.15)



max Sn + N,n 

1nN


2P |SN |  .

Proof:
Clearly (1.4.15) follows by applying (1.4.14) to both the sequences

Xn : n 1} and {Xn : n 1} and then adding the two results.

1.4 The Strong Law of Large Numbers

41



To prove (1.4.14), set A1 = S1 + N,1  and



An+1 = max S` + N,` <  and Sn+1 + N,n+1 
1`n

for 1 n < N . Obviously, the An s are mutually disjoint and


N
[


An =

n=1

max Sn + N,n  .

1nN

In addition,



{SN  An SN Sn N,n

for each 1 n N.

Hence,
N

 X


P SN 
P An SN Sn N,n
n=1



N

 1
1X
P An = P max Sn + N,n  ,

1nN
2
2 n=1
where, in the passage to the
last line, I have used the independence of the sets
An and SN Sn N,n . 


+
Corollary 1.4.16. Let X
:
n

Z
be
n

a sequence of independent random
+
variables, and assume that Sn : n Z
converges in P-measure to an Rvalued random variable S. Then Sn S P-almost surely. (Cf. Exercise 1.4.25
as well.)
Proof: What I must show is that, for each  > 0, there is an M Z+ such
that




sup P max Sn+M SM  < .
N 1

1nN

To this end, let 0 <  < 1 be given, and choose M Z+ so that






for all 1 k < n.
<
P Sn+M Sk+M
2
2

Next, for a given N Z+ , choose N,n med SM +N SM +n for 0 n N .
Then |N,n | 2 , and so, by (1.4.15) applied to {XM +n : n 1},













P
max SM +n SM  P
max SM +n SM + N,n
1nN
1nN
2




< . 
2P SM +N SM
2

42

1 Sums of Independent Random Variables

Remark 1.4.17. The most beautiful and startling feature of Levys line of
reasoning is that it requires no integrability assumptions. Of course, in many
applications of Corollary 1.4.16, integrability considerations enter into the proof
that {Sn : n 1} converges in P-measure. Finally, a word of caution may be
in order. Namely, the result in Corollary 1.4.16 applies to the quantities Sn
themselves; it does not apply to associated quantities like S n . Indeed, suppose
that {Xn : n 1} is a sequence of independent, identically distributed random
variables that satisfy


 

 12
P Xn t = P Xn t = 1 + t2 log e4 + t2

for all t 0.

On the one hand, by Exercise 1.2.11, we know that the associated averages S n
tend to 0 in probability. On the other hand, by the second part of Theorem
1.4.9, we know that the sequence S n : n 1 diverges almost surely.

Exercises for 1.4


Exercise 1.4.18. Let X and Y be non-negative random variables, and suppose
that

i
 1 h
P X t EP Y, X t ,
t

(1.4.19)

t (0, ).

Show that


(1.4.20)

p  P  p  p1
,
E Y
p1

  p1

EP X p

p (1, ).

Hint: First, reduce to the


 case when X is bounded. Next, recall that,
 for any
measure space E, F, , any non-negative, measurable f on E, F , and any
(0, ),
Z

f (x) (dx) =
E

f > t dt =

(0,)


t1 f t dt.

(0,)

Use this together with (1.4.19) to justify the relation


 
E Xp p



tp2 EP Y, X t dt

(0,)

" Z
= pE Y
P

p2

dt =



p
EP X p1 Y ,
p1

and arrive at (1.4.20) after an application of H


olders Inequality.

Exercises for 1.4

43

Exercise 1.4.21. Let {Xn : n 1} be a sequence of mutually independent,


integrable random variables with mean value 0, and assume that
 2
P P-square
E
X
<
.
Let S denote the random variable (guaranteed by Theorem
n
1
1.4.2) to which {Sn : n 1} converges P-almost surely, and, using elementary
orthogonality considerations, check that Sn S in L2 (P; R) as well. Next,
after examining the proof of Kolmogorovs Inequality (cf. (1.4.6)), show that




2
2
1 P 2




P sup Sn t E S , sup Sn t , t > 0.
t
nZ+
nZ+

Finally, by applying (1.4.20), show that



 
p h i
2p
2p
p
P


EP S ,
(1.4.22)
E sup Sn
p1

p (1, ),

nZ+

and conclude from this that, for each p (2, ), {Sn : n 1} converges to S
in Lp (P ) if and only if S Lp (P ).
Exercise 1.4.23. If X L2 (P; R), then it is easy to characterize its mean m
as the c R that minimizes EP (X c)2 . Assuming that X L1 (P; R), show
that med(X) if and only if




EP |X | = min EP |X c| .
cR

Hint: Show that, for any a, b R,






E |X b| EP |X a| =

b


P(X t) P(X t) dt.

Exercise 1.4.24. Let {Xn : n 1} be a sequence of P-square integrable


random variables that converges in probability to a random variable X, and
assume that supn1 Var(Xn ) < . Show that X is square integrable and that


EP |Xn  X| 0. In particular, if, in addition, Var(Xn ) Var(X), show
that EP |Xn X|2 0.
Hint: Let n med(Xn ), and show that + = limn n and = limn n
are both
of med(X). Combine this with (1.4.12) to conclude that
elements

supn1 EP [Xn ] < and therefore that supn1 EP [X 2 ] < .

Exercise 1.4.25. The following variant of Theorem 1.4.13 is sometimes useful


and has the advantage that it avoids the introduction of medians. Namely, show
that, for any t (0, ) and n Z+ ,



P |Sn | > t
.
P max |Sn | 2t
1mn
1 max P |Sn Sm | > t
1mn

Note that this can be used in place of (1.4.15) when proving results like the one
in Corollary 1.4.16.

44

1 Sums of Independent Random Variables

Exercise 1.4.26. A random variable X is said to be symmetric if X has


the same distribution as X itself. Obviously, the most natural choice of median
for a symmetric random variable is 0; and thus, because sums of independent,
symmetric random variables are again symmetric, (1.4.14) and (1.4.15) are particularly useful when the Xn s are symmetric, since the `,k s can then be taken
to be 0. In this connection, consider the following interesting variation on the
theme of Theorem 1.4.13.
(i) Let X1 , . . . , Xn , . . . be independent, symmetric random variables, set Mn ()
=
|X` ()|, let n () be the smallest 1 ` n with the property that
max1`n

X` () = Mn (), and define
Yn () = Xn () ()

and Sn = Sn Yn .

Show that

7 Sn (), Yn () R2


and 7 Sn (), Yn () R2

have the same distribution, and conclude first that





P Yn t P Yn t & Sn 0 + P Yn t & Sn 0


= 2P Yn t & Sn 0 2P Sn t ,
for all t R, and then that







P max X` t 2P Sn t ,
1`n

t [0, ).

(ii) Continuing in the same setting, add the assumption that the Xn s are identically distributed, and use part (i) to show that


lim P |S n | C = 1

for some C (0, )



= lim nP |X1 | n = 0.
n

Hint: Note that



P
and that

1(1x)n
x


max |X` | > t = 1 P(|X1 | t)n

1`n

n as x & 0.

In conjunction with Exercise 1.2.11, this proves that if {Xn : n 1} is


a sequence of independent, identically distributed symmetric random
 variables,
then S n 0 in P-probability if and only if limn nP |X1 | n = 0.

Exercises for 1.4

45

Exercise 1.4.27. Let X and X 0 be a pair of independent random variables


that have the same distribution, let be a median of X, and set Y = X X 0 .
(i) Show that Y is symmetric and that


P |X | t 2P |Y | t

for all

t [0, ),

and conclude that, for any p (0, ),




 1

1
1
2 p 1 EP |Y |p p 2EP |Y |p p + || .
In particular, |X|p is integrable if and only if |Y |p is.
(ii) The result in (i) leads to my final refinement of The Weak Law of Large
Numbers. Namely, let {Xn : n 1} be a sequence of independent, identically
distributed random variables. By combining Exercise 1.2.11, part (ii) in Exercise
1.4.26, and part (i) above, show that1



lim P S n C = 1

for some C (0, )



= lim nP |X1 | n = 0
n


= S n EP X1 , |X1 | n 0 in P-probability.
n

Exercise 1.4.28. Let {Xn : n 1} be a sequence of mutually independent,


identically distributed, P-integrable random variables with mean value m. As
we already know, when m > 0, the partial sums Sn tend, P-almost surely, to
+ at an asymptotic linear rate m; and, of course, when m < 0, the situation
is similar at . On the other hand, when m = 0, we know that, if |Sn | tends
to at all, then, P-almost surely, it does so at a strictly sublinear rate. In this
exercise, you are to sharpen this statement by proving that
m = 0 = lim |Sn | < P-almost surely.
n

The beautiful argument given below is due to Y. Guivarch, but its full power
cannot be appreciated in the present context (cf. Exercise 6.2.19). Furthermore,
a classic result (cf. Exercise 5.2.43) due to K.L. Chung and W.H. Fuchs gives a
much better result for the independent random variables. Their result says that
limn |Sn | = 0 P-almost surely.
In order to prove the assertion here, assume that limn |Sn | = with
positive P-probability, use Kolmogorovs 01 Law to see that |Sn | Palmost surely, and proceed as follows.
1

These ideas are taken from the book by Wm. Feller cited at the end of 1.2. They become
even more elegant when combined with a theorem due to E.J.G. Pitman, which is given in
Fellers book.

46

1 Sums of Independent Random Variables

(i) Show that there must exist an  > 0 with the property that



P ` > k S` Sk  
for some k Z+ and therefore that
P(A) ,





where A : ` Z+ S` ()  .

(ii) For each and n Z+ , set


o
n


n () = t R : 1 ` n t S` () < 2

and

o
n


n0 () = t R : 1 ` n t S`0 () < 2 ,

Pn
where Sn0 `=1 X`+1 . Next, let Rn () and Rn0 () denote the Lebesgue measure of n () and n0 (), respectively; and, using the translation invariance of
Lebesgue measure, show that
Rn+1 () Rn0 () 1A0 (),




where A0 : ` 2 S` () S1 ()  .
On the other hand, show that
 
 
EP Rn0 = EP Rn

and P(A0 ) = P(A),

and conclude first that




P(A) EP Rn+1 Rn ,
and then that

n Z+ ,

1 P 
E Rn .
n n

P(A) lim

(iii) In view of parts (i) and (ii), what remains to be done is show that
m = 0 = lim

1 P 
E Rn = 0.
n

But, clearly, 0 Rn () n. Thus, it is enough to show that, when m = 0,


Rn
n 0 P-almost surely; and, to this end, first check that

Rn ()
Sn ()
0,
0 =
n
n

and, finally, apply The Strong Law of Large Numbers.

Exercises for 1.4

47

Exercise 1.4.29. As I have already said, for many applications The Weak
Law of Large Numbers is just as good as and even preferable to the Strong
Law. Nonetheless, here is an application in which the full strength of the Strong
Law plays an essential role. Namely, I want to use the Strong Law to produce
examples of continuous, strictly increasing functions F on [0, 1] with the property
that their derivative
F (y) F (x)
=0
yx
yx

F 0 (x) lim

at Lebesgue-almost every x (0, 1).

By familiar facts about functions of a real variable, one knows that such functions F are in one-to-one correspondence with non-atomic, Borel probability
measures on [0, 1] which charge every non-empty open subset but are singular
to Lebesgues measure.
Namely, F is the distribution function determined by :

F (x) = (, x] .
+

(i) Set = {0, 1}Z , and, for each p (0, 1), take Mp = (p )Z , where p on
{0, 1} is the Bernoulli measure with p ({1}) = p = 1 p ({0}). Next, define
7 Y ()

2n n [0, 1],

n=1

and let p denote the Mp -distribution of Y . Given n Z+ and 0 m < 2n ,


show that


p m2n , (m + 1)2n = p`m,n (1 p)n`m,n ,
Pn
n
n
where
`
=
=
m,n
k=1 k and (1 , . . . , n ) {0, 1} is determined by m2
Pn
k
k . Conclude, in particular, that p is non-atomic and charges every
k=1 2
non-empty open subset of [0, 1].
(iii) Given x [0, 1) and n Z+ , define

n (x) =

if 2n1 x b2n1 xc

if 2n1 x b2n1 xc <

1
2
1
2,

where
denotes the integer part of s. If {n : n 1} {0, 1} satisfies
Pbsc

x = 1 2m m , show that m = m (x) for all m 1 if and only if m =


 0 for
infinitely many m 1. In particular, conclude first that n = n Y () , n
Z+ , for Mp -almost every and, second, by the Strong Law, that
n
1 X
n (x) p
n m=1

Thus, p1 p2 whenever p1 6= p2 .

for p -almost every x [0, 1].

48

1 Sums of Independent Random Variables

(iv) By Lemma 1.1.6, we know that 12 is Lebesgue measure [0,1] on [0, 1].
Hence, we now know that p [0,1] when p 6= 12 . In view of the introductory
remarks, this completes
the proof that, for each p (0, 1) \ { 12 }, the function

Fp (x) = p (, x] is a strictly increasing, continuous function on [0, 1] whose
derivative vanishes at Lebesgue-almost every point. Here, one can do better.
Namely, referring to part (iii), let p denote the set of x [0, 1) such that

1
n (x) = p,
n n
lim

where n (x)

n
X

m (x).

m=1

We know that 12 has Lebesgue measure 1. Show that, for each x 12 and
p (0, 1) \ { 12 }, Fp is differentiable with derivative 0 at x.

Hint: Given x [0, 1), define


Ln (x) =

n
X

2m m (x)

and Rn (x) = Ln (x) + 2n .

m=1

Show that



Fp Rn (x) Fp Ln (x) = Mp

n
X

!
2m m = Ln (x)

= pn (x) (1 p)nn (x) .

m=1

When p
that

(0, 1) \ { 12 }

and x 12 , use this together with 4p(1 p) < 1 to show



!
Fp Rn (x) Fp Ln (x)
< 0.
lim n log
n
Rn (x) Ln (x)

To complete the proof, for given x 12 and n 2 such that n (x) 2, let

mn (x) denote the largest m < n such that m (x) = 1, and show that mnn(x) 1
as n . Hence, since 2n1 < h 2n implies that


Fp (x) Fp (x h)
nmn (x)+1 Fp Rn (x) Fp Ln (x)
,
2
Rn (x) Ln (x)
h

one concludes that Fp is left-differentiable at x and has left derivative equal to


0 there. To get the same conclusion about right derivatives, simply note that
Fp (x) = 1 F1p (1 x).
(v) Again let p (0, 1) \ { 12 } be given, but this time choose x p . Show that

lim

h&0

Fp (x + h) Fp (x)
= +.
h

The argument is similar to the one used to handle part (iv). However, this time
the role played by the inequality 4pq < 1 is played here by (2p)p (2q)q > 1 when
q = 1 p.

1.5 Law of the Iterated Logarithm

49

1.5 Law of the Iterated Logarithm


Let X1 , . . . , Xn , . . . be a sequence of independent, identically distributed random
variables with mean value 0 and variance 1. In this section, I will investigate
exactly how large {Sn : n Z+ } can become as n . To get a feeling
for what one should be expecting, first note that, by Corollary 1.4.8, for any
non-decreasing {bn : n 1} (0, ),
Sn
0
bn

P-almost surely if

X
1
< .
2
b
n=1 n
1

Thus, for example, Sn grows more slowly than n 2 log n. On the other hand, if
Sn
;
the Xn s are N (0, 1)-random variables, then so are the random variables
n
and therefore, for every R (0, ),


P

Sn
lim R
n




[  Sn
S
R lim P N R > 0.
= lim P
N
N
n
N

nN

Hence, at least for normal random variables, one can use Lemma 1.4.1 to see
that
Sn
lim = P-almost surely;
n
n
1

and so Sn grows faster than n 2 .


If, as we did in Section 1.3, we proceed on the assumption that Gaussian
random variables are typical, we should expect the growth rate of the Sn s to be
1
1
something between n 2 and n 2 log n. What, in fact, turns out to be the precise
growth rate is

(1.5.1)

q
2n log(2) (n 3),


where log(2) x log log x (not the logarithm with base 2) for x [e, ). That
is, one has The Law of the Iterated Logarithm:
(1.5.2)

Sn
= 1 P-almost surely.
n n
lim

This remarkable fact was discovered first for Bernoulli random variables by Khinchine, was extended by Kolmogorov to random variables possessing 2 +  moments, and eventually achieved its final form in the work of Hartman and Wintner. The approach that I will adopt here is based on ideas (taught to me by
M. Ledoux) introduced originally to handle generalizations of (1.5.2) to random

50

1 Sums of Independent Random Variables

variables with values in a Banach space.1 This approach consists of two steps.
The first establishes a preliminary version of (1.5.2) that, although it is far cruder
than (1.5.2) itself, will allow me to justify a reduction of the general case to the
case of bounded random variables. In the second step, I deal with bounded random variables and more or less follow Khinchines strategy for deriving (1.5.2)
once one has estimates like the ones provided by Theorem 1.3.12.
In what follows, I will use the notation
= []

S[]
and S =

for [3, ),

where [] is the integer part of .


Lemma 1.5.3. Let {Xn : n 1} be a sequence of independent, identically
distributed random variables with mean value 0 and variance 1. Then, for any
a (0, ) and (1, ),2


lim Sn a

(a.s., P) if



X

1
P S m a 2 < .
m=1

Proof: Let (1, ) be given and, for each m N and 1 n m , let


m,n
be a median (cf. (1.4.11)) of S[ m ] Sn . Noting that, by (1.4.12), m,n 2 m ,
we know that

Sn


1





2
lim Sn = lim m1max m Sn lim m1max m
m
m
n
n m
n




Sn + m,n
1
,
2 lim maxm
m n
m

and therefore




P lim Sn a P
n

lim max

m n m



Sn + m,n

!
a

12

But, by Theorem 1.4.13,

!




Sn + m,n

1
1
a 2 2P S m a 2 ,
P maxm
n
m

and so the desired result follows from the BorelCantelli Lemma.


1

See 8.4.2 and 8.6.3 and, for much more information, M. Ledoux and M. Talagrand, Probability in Banach Spaces, Springer-Verlag, Ergebnisse Series 3.FolgeBand 23 (1991).
2 Here and elsewhere, I use (a.s.,P) to abbreviate P-almost surely.

1.5 Law of the Iterated Logarithm

51

Lemma 1.5.4. For any sequence {Xn : n 1} of independent, identically


distributed random variables with mean value 0 and variance 2 ,


lim Sn 8

(1.5.5)

(a.s., P).

Proof: Without loss in generality, I assume throughout that = 1; and, for


the moment, I will also assume that the Xn s are symmetric (cf. Exercise 1.4.26).
By Lemma 1.5.3, we will know that (1.5.5) holds with 8 replaced by 4 once I
show that



X

3
P S2m 2 2 < .

(*)

m=0

In order to take maximal advantage of symmetry, let (, F, P) be the probability


space on which the Xn s are defined, use {Rn : n 1} to denote the sequence of
Rademacher functions on [0, 1) introduced in Section 1.1, and set Q = [0,1) P
on [0, 1) , B[0,1) F . It is then an easy matter to check that symmetry of
the Xn s is equivalent to the statement that

+
X1 (), . . . , Xn (), . . . RZ
has the same distribution under P as

+
(t, ) [0, 1) 7 R1 (t)X1 (), . . . , Rn (t)Xn (), . . . RZ
has under Q. Next, using the last part of (iii) in Exercise 1.3.18 with k = Xk (),
note that

[0,1)

2m

!
X

t [0, 1) :
Rn (t)Xn () a
n=1
#
"
a2
, a [0, ) and .
2 exp P2m
2 n=1 Xn ()2

Hence, if

)
2m
1 X
2
Xm () 2
: m
2 n=1

(
Am

and

2m

)!
X

3


,
t [0, 1) :
Rn (t)Xn () 2 2 2m

(
Fm () [0,1)

n=1

52

1 Sums of Independent Random Variables

then, by Tonellis Theorem,

o Z
n


3


2
=
Fm ) P(d)
P : S2m () 2 2m

"

82m
exp P2m 2
2 n=1 Xn ()2

Z
2

h
i

P(d) 2 exp 4 log(2) 2m + 2P Am .


P
Thus, (*) comes down to proving that
m=0 P Am < ; and, in order to
check this, I argue in much the same way as I did when I proved the converse
statement in Kolmogorovs Strong Law. Namely, set
m

Tm =

2
X

Xn2 ,


Bm =

n=1


Tm+1 Tm
2 ,
2m

and T m =

Tm
2m



for m N. Clearly, P Am = P Bm . Moreover, the sets Bm , m N, are
mutually independent; and therefore, by the BorelCantelli Lemma, I need only
check that




Tm+1 Tm

2
= 0.
P lim Bm = P lim
m
m
2m

But, by the Strong Law, we know that T m 1 (a.s., P), and therefore it is
clear that
Tm+1 Tm
1 (a.s., P).
2m

I have now proved (1.5.5) with 4 replacing 8 for symmetric random variables.
To eliminate the symmetry assumption, again let (,
 F, P) be the probability
0
0
0
space on which the Xn s are defined, let , F , P be a second copy of the
same space, and consider the random variables

(, 0 ) 0 7 Yn , 0 Xn () Xn ( 0 )
under the measure Q P P0 . Since the Yn s are obviously (cf. part (i) of
Exercise 1.4.21) symmetric, the result which I have already proved says that

lim



Sn () Sn ( 0 )

22 8

for Q-almost every (, 0 ) 0 .

n|
Now suppose that limn |S
n > 8 on a set of positive P-measure. Then, by
Kolmogorovs 01 Law, there would exist an  > 0 such that

|Sn ()|
8 +  for P-almost every ;
n
n
lim

1.5 Law of the Iterated Logarithm

53

and so, by FubinisTheorem,3 we would


that, for Q-almost every (, 0 )
have
0
+
+
, there is a nm () : m Z
Z such that nm () % and






Sn () () Sn () ( 0 )
Sn () ()
Sn () ( 0 )
m
m
m
m
.
lim
lim
lim
m
m
nm ()
nm ()
nm ()
m

But, again by Fubinis Theorem, this would mean


that there exists a {nm : m
Sn (0 )
 for P0 -almost every
Z+ } Z+ such that nm % and limm mn
m
0 0 , and obviously this contradicts
"  #
2
0
1
Sn
P
0. 
=
E
2 log(2) n
n

We have now got the crude statement alluded to above. In order to get the
more precise statement contained in (1.5.2), I will need the following application
of the results in 1.3.
Lemma 1.5.6. Let {Xn : n 1} be a sequence of independent random
variables with mean value 0, variance 1, and common distribution . Further,
assume that (1.3.4) holds. Then, for each R (0, ) there is an N (R) Z+
such that
!
#
"
r


8R log(2) n
R2 log(2) n
(1.5.7)
P Sn R 2 exp 1 K
n

for n N (R). In addition, for each  (0, 1], there is an N () Z+ such that,
for all n N () and |a| 1 ,

(1.5.8)

h 

i

 1

P Sn a <  exp a2 + 4K|a| log(2) n .
2

In both (1.5.7) and (1.5.8), the constant K (0, ) is the one in Theorem
1.3.15.
Proof: Set
n
=
n =
n

2 log(2) (n 3)
n

 12

To prove (1.5.7), simply apply the upper bound in the last part of Theorem
1.3.15 to see that, for sufficiently large n Z+ ,







3
(Rn )2





K Rn
.
P Sn R = P S n Rn 2 exp n
2
3

This is Fubini at his best and subtlest. Namely, I am using Fubini to switch between horizontal and vertical sets of measure 0.

54

1 Sums of Independent Random Variables

To prove (1.5.8), first note that







P Sn a <  = P S n an < n ,
where an = an and n = n . Thus, by the lower bound in the last part of
Theorem 1.3.15,




 2

 


an
K
2



+ K|an | n + an
P Sn a <  1 2 exp n
2
nn
!
h 
i

K
exp a2 + 2K|a|  + a2 n log(2) n
1 2
2 log(2) n

for sufficiently large ns.

Theorem 1.5.9 (Law of Iterated Logarithm). The equation (1.5.2) holds


for any sequence {Xn : n 1} of independent, identically distributed random
variables with mean
value 0oand variance 1. In fact, P-almost surely, the set of
n
Sn
limit points of n : n 1 coincides with the entire interval [1, 1]. Equiva
lently, for any f C R; R ,


lim f

(1.5.10)

Sn
n


=

sup f (t)

(a.s., P).

t[1,1]

(Cf. Exercise 1.5.12 for a converse statement and 8.4.2 and 8.6.3 for related
results.)
Proof: I begin with the observation that, because of (1.5.5), I may restrict
my attention to the case when the Xn s are bounded random variables. Indeed,
for any Xn s and any  > 0, an easy truncation procedure allows us to find an
Cb (R; R) such that Yn Xn again has mean value 0 and variance 1
while Zn Xn Yn has variance less than 2 . Hence, if the result is known
when the random variables are bounded, then, by (1.5.5) applied to the Zn s,


Pn
m=1 Zm ()

1 + 8,



lim Sn () 1 + lim

n
n
n
and, for a [1, 1],


Pn



Zm ()
lim Sn () a lim m=1
8
n
n

for P-almost every .

1.5 Law of the Iterated Logarithm

55

In view of the preceding, from now on I may and will assume that the Xn s
are bounded. To prove that limn Sn 1 (a.s., P), let (1, ) be given,
and use (1.5.7) to see that

h


 i
1
1
P S m 2 2 exp 2 log(2) m
+
for all sufficiently
large m Z . Hence, by Lemma 1.5.3 with a = , we see
that limn Sn (a.s., P) for every (1, ). To complete the proof, I
must still show that, for every a (1, 1) and  > 0,




P lim Sn a <  = 1.

Because I want to get this conclusion as an application of the second part of


the BorelCantelli Lemma, it is important that we be dealing with independent
events, and for this purpose I use the result just proved to see that, for every
integer k 2,




lim Sn a inf lim Skm a
k m

= inf




Skm Skm1

a
lim
k m

P-almost surely.

k m

Thus, because the events






Skm Skm1


a <  ,
Ak,m
km

m Z+ ,

are independent for each k 2, all that I need to do is check that

X

P Ak,m = for sufficiently large k 2.
m=1

But
P Ak,m





km 
km a


,
<
= P Skm km1
km km1 km km1

and, because




km

1 = 0,
lim max+
k mZ
m m1
k k

everything reduces to showing that



X

(*)
P Skm km1 a <  =
m=1

for each k 2, a (1, 1), and  > 0. Finally, referring to (1.5.8), choose 0 > 0
so small that a2 + 4K0 |a| < 1, and conclude that, when 0 <  < 0 ,
h
i

 1

P Sn <  exp log(2) n
2
for all sufficiently large ns, from which (*) is easy. 

56

1 Sums of Independent Random Variables

Remark 1.5.11. The reader should notice that the Law of the Iterated Logarithm provides a naturally occurring sequence of functions that converge in
measure but not almost everywhere. Indeed, it is obvious that
Sn 0 in

2

L (P; R), but the Law of the Iterated Logarithm says that Sn : n 1 is
wildly divergent when looked at in terms of P-almost sure convergence.
Exercises for 1.5
Exercise 1.5.12. Let {Xn : n 1} be a sequence of mutually independent,
identically distributed random variables for which


|Sn |
< > 0.
n n


(1.5.13)

lim

 
In this exercise I4 will outline a proof that X1 is P-square integrable, EP X1 = 0,
and
(1.5.14)

 1
Sn
Sn
= EP X12 2
= lim
n n
n n
lim

(a.s., P).

(i) Using Lemma 1.4.1, show that there is a [0, ) such that
(1.5.15)

lim


Sn

(a.s., P).

Next, assuming that X1 is P-square integrable, use The


Law of Large
 Strong

Numbers together with Theorem 1.5.9 to show that EP X1 = 0 and
 1
Sn
Sn
= lim
= EP X12 2 = lim
n n
n n

(a.s., P).

In other words, everything comes down to proving that (1.5.13) implies that X1
is P-square integrable.
(ii) Assume that the Xn s are symmetric. For t (0, ), set


nt = Xn 1[0,t] |Xn | Xn 1(t,) |Xn | ,
X
and show that
t, . . . , X
t , . . .
X
1
n
4

and

X1 , . . . , X n , . . .

I follow Wm. Feller An extension of the law of the iterated logarithm to variables without
variance, J. Math. Mech., 18 #4, pp. 345355 (1968), although V. Strassen was the first to
prove the result.

Exercises for 1.5

57

have the same distribution. Conclude first that, for all t [0, 1),
Pn



m=1 Xn 1[0,t] |Xn |
(a.s., P),
lim
n
n

where is the number in (1.5.15), and second that


h
i

 
EP X12 = lim EP X12 , X1 t 2 .
t%

Hint: Use the equation


nt
 Xn + X
,
Xn 1[0,t] |Xn | =
2

and apply part (i).


(iii) For general {Xn : n 1}, produce an independent copy {Xn0 : n 1} (as
in the proof of Lemma 1.5.4), and set Yn = Xn Xn0 . After checking that
Pn
| m=1 Ym |
2 (a.s., P),
lim
n
n
 
conclude
first that EP Y12 4 2 and then (cf. part


 (i) of Exercise1.4.27) that
EP X12 < . Finally, apply (i) to arrive at EP X1 = 0 and (1.5.14).

Exercise 1.5.16. Let {


sn : n 1} be a sequence of real numbers which possess
the properties that


lim sn+1 sn = 0.
lim sn = 1,
lim sn = 1, and
n

Show that the set of subsequential limit points of {


sn : n 1} coincides with
[1, 1]. Apply this observation to show that, in order to get the final statement in
Theorem 1.5.9, I need only have proved (1.5.10) for the function f (x) = x, x R.
Hint: In proving the last part, use the square integrability of X1 to see that

 2

X
Xn
1 < ,
P
n
n=1

and apply the BorelCantelli Lemma to conclude that Sn Sn1 0 (a.s., P).
Exercise 1.5.17. Let {Xn : n 1} be a sequence of RN -valued, identically
distributed random variables on (, F, P) with the property that, for each e
SN 1 = {x RN : |x| = 1}, e, X1 RN has mean value 0 and variance 1. Set
Pn
n | = 1 P-almost surely.
n = Sn , and show that limn |S
Sn = m=1 Xm and S
n
Here are some steps that you might want to follow.

58

1 Sums of Independent Random Variables

(i) Let {ek : k 1} be a countable, dense subset of SN 1 for which {e1 , . . . , eN }


N
is orthonormal, and suppose
that

 the sequence {sn : n 1} R has the
property that limn ek , sn RN = 1 for each k 1. Note that |sn |


1
N 2 max1kN (ek , sn RN , and conclude that C supn1 |sn | [1, ).

S`
(ii) Continuing (i), for a given  > 0, choose ` 1 so that SN 1 k=1 B ek , C .
Show that


|sn | max e,sn RN + ,
1k`

and conclude first that limn |sn | 1 +  and then that limn |sn | 1.
At the same time, since |sn | e1 , sn RN , show that limn |sn | 1. Thus
limn |sn | = 1.

(iii) Let {ek : k 1} be as in (i), and apply Theorem 1.5.9 to show that, for
n () : n 1} satisfies the condition in (i).
P-almost all , the sequence {S

Thus, by (ii), limn |Sn ()| = 1 for P-almost every .

Chapter 2
The Central Limit Theorem

In the preceding chapter I dealt with averages of random variables and showed
that, in great generality, those averages converge almost surely or in probability
to a constant. At least when all the random variables have the same distribution
and moments of all orders, one way of rationalizing this phenomenon is to recognize that the mean value is conserved whereas all higher moments are driven
to 0 when one averages. Of course, the reason why it is easy to conserve the
first moment is that the mean of the sum is the sum of the means. Thus, if one
is going to attempt to find a simple normalization procedure that conserves a
quantity involving more than the mean value, one should seek a quantity that
shares this additivity property.
With this in mind, one is led to ask what happens if one normalizes in a way
that conserves the variance. For this purpose, suppose that {Xn : n Z+ } is a
sequence of mutually independent, identicallyP
distributed random variables with
1
n
mean value 0 and variance 1, and set Sn = 1 Xk . Then Sn n 2 Sn again
has mean value 0 and variance 1. On the other hand, because of Theorem 1.5.9,
we know that, with probability 1, limn Sn = = limn Sn . Hence,
from the point of view of either almost sure convergence or even convergence in
probability, there is no hope that the Sn s will converge.
Nonetheless, the random variables {Sn : n 1} possess remarkable stability
when viewed from a distributional perspective. Indeed, if the Xn s are Gaussian,
then so are the Sn s, and therefore Sn N (0, 1) for all n 1. More generally,
even if the Xn s are not Gaussian, fixing their mean value and variance in this
way forces all their moments to stabilize. To be precise, assume that X1 has finite
moments of all orders, that its mean is 0, and that its variance is 1. Trivially,
L1 limn EP [Sn ] = 0 and L2 limn EP [Sn2 ] = 1. Next, assume that
L` limn EP [Sn` ] exists for 1 ` m, where m 2. I will show now that
Lm+1 limn EP [Snm+1 ] exists and is equal to mLm1 . To this end, first note
that, since EP [Xn ] = 0 and the Xn s are independent and identically distributed,
m  
X
 m+1 

m 
m P  j+1  P  mj 
P
E Sn
= nE Xn Xn + Sn1
=n
E Xn E Sn1
j
j=0
P

59

60

2 The Central Limit Theorem


m  
X
 m1 
m P  j+1  P  mj 
= nmE Sn1 + n
E Xn E Sn1 .
j
j=2
P

m+1

Thus, after dividing through by n 2 , one gets the desired conclusion when
n . Starting from L1 = 0 and L2 = 1, one now can use induction to check
Qm
+
that L2m1 = 0 and L2m = `=1 (2` 1) = 2(2m)!
m m! for all m Z . That is,



lim EP Sn2m1 = 0

and

m

 Y
(2m)!
lim EP Sn2m =
(2` 1) = m ,
n
2 m!
`=1

for all m Z+ . In other


 words, at least when the Xn s have moments of all
orders, limn EP Snm exists and is independent of the particular choice of
random variables. In particular, since for the Gaussian case, EP [Snm ] = EP [X1m ],
we conclude that all moments of the Sn s converge to the corresponding moments
of a standard normal random variable.
In this chapter we will see that the preceding stabilization result is just one
manifestation of a general principle known as the Central Limit phenomenon.
2.1 The Basic Central Limit Theorem
In this section I will derive the basic Central Limit Theorem using a beautiful
argument which was introduced by J. Lindeberg. Throughout, h, i denotes
the integral of a function against a measure .
2.1.1. Lindebergs Theorem. Let {Xn : n 1} be a sequence of independent,Psquare integrable random variables with mean value 0, and set
1
n
Sn = n 2 m=1 Xm . At least when the Xn s are identically distributed and
have moments of all orders and variance 1, we just saw that (recall that m,2
is the distribution of an N (m, 2 )-random variable)


(2.1.1)
lim EP (Sn ) = h, 0,1 i
n

for any polynomial : R C. In this subsection, I will prove a result


that shows that, under much more general conditions, (2.1.1) holds for all
C 3 R; C) with bounded second and third order derivatives.
In the following statement,
sX
p
p
Sn
2 ,
.
m
and Sn
(2.1.2) m = Var(Xm ) > 0, n = Var(Sn ) =
n
m=1

Notice that when the Xk s are identically distributed and have variance 1, the
Sn in (2.1.2) is consistent with the notation used above. Finally, set
(2.1.3)

m
1mn n

rn = max

and gn () =

n
i

1 X P h 2

E
X
,
X


m
n
m
2n m=1

2.1 The Basic Central Limit Theorem

61
1

for  > 0. Clearly, in the identically distributed case, rn = n 2 and


i
h
1
gn () = 12 EP X12 , |X1 | n 2 1  0 as n for each  > 0.

Theorem 2.1.4 (Lindeberg). Refer to the preceding, and let be an element


of C 3 (R; R) with bounded second and third order derivatives. Then, for each
 > 0,






rn 
P

000 + gn () 00 .
+
(2.1.5)
E Sn h, 0,1 i
u
u

2
6

In particular, because
rn2 2 + gn (),

(2.1.6)

 > 0,

(2.1.1) holds if gn () 0 as n for each  > 0.


Proof: Choose N (0, 1)-random variables Y1 , . . . , Yn which are both mutually
independent and independent of the Xm s. (After augmenting the probability
space, if necessary, this can be done as an application of either Theorem 1.1.7
or Exercise 1.1.14.) Next, set
k Yk
Yk =
n

and Tn =

n
X

Yk ,

and observe that Tn is again an N (0, 1)-random variable and therefore that








EP (Sn ) h, 0,1 i = EP (Sn ) EP (Tn ) .
k =
Further, set X

Xk
n ,

Um =

and define
X

1km1

Yk +

k
X

for 1 m n,

m+1kn

where a sum over the empty set is taken to be 0. It is then clear that

n
X






m EP Um + Ym .
where m EP Um + X

Moreover, if

Rm () Um + (Um ) 0 (Um )

2 00
2 (Um ),

R,

62

2 The Central Limit Theorem

m and Ym are independent of Um and have the same first


then (because both X
two moments)




 

m ) EP Rm (Ym )] EP Rm (X
m ) + EP Rm (Ym ) .
m = EP Rm (X
In order to complete the derivation of (2.1.5), note that, by Taylors Theorem,


 3 

Rm () 000 || 00 ||2 ;
u
u 6

and therefore, for each  > 0,


n
X


m )|
EP |Rm (X
1

n
n
X

 2

k000 ku X P  3
, |Xm | n
E |Xm | , |Xm | n + k00 ku
EP X
m
6
1
1

n
2
k000 ku
k000 ku X m
00
+ k00 ku gn (),
+
k
k
g
()
=

u
n
2
6

6
n
1

while
n
X
1

3
n
3
X

 k000 ku P 
3 4 rn k000 ku
m
.

E |Y1 |3
EP |Rm (Yn )|
6
3n
6
1

Hence, (2.1.5) is now proved.


Given (2.1.5), all that remains is to prove (2.1.6). However, for any 1 m n
and  > 0,
 2

 2


2
m
= EP Xm
, |Xm | < n + EP Xm
, |Xm | n 2n 2 + gn () . 
The condition that gn () 0 for each  > 0 is often called Lindebergs
condition because it introduced by J. Lindeberg and it was he who proved that
it is a sufficient condition for (2.1.1) to hold for all (cf. Theorem 2.1.8)
Cb (RN ; C). Later, Feller proved that (2.1.1) for all Cb (RN ; R) plus rn 0
imply that Lindebergs condition holds. Together, these two results are known
as the LindebergFeller Theorem. See Exercise 2.3.20 for a proof of Fellers
part.
2.1.2. The Central Limit Theorem. If one is not concerned about rates
of convergence, then the differentiability requirement on can be dropped from
the last part of Theorem 2.1.4. In order to understand the reason for this, it is
helpful to couch the statement of Theorem 2.1.4 entirely in terms of measures.
Thus, let n denote the distribution of Sn . Then, under Lindebergs condition,
Theorem 2.1.4 allows one to say that h, n i h, 0,1 i for all C 3 (RN ; C)
with bounded second and third order derivatives. Because we are dealing with
statements about integration and integration is a very forgiving operation, this
sort of result self-improves. To be precise, I prove the following lemma.

2.1 The Basic Central Limit Theorem

63

Lemma 2.1.7. Suppose that {n : n 1} is a sequence of (non-negative)


locally finite1 Borel measures on RN and that is a locally finite Borel mea N
sure on RN with the property that
 h, n i h, i for all Cc (R ; R).
N
Then, for any
 C R ; [0, ) , h, i limn h, n i. Moreover, if
C RN ; [0, ) is n -integrable for each n Z+ and if h, n i h, i [0, ),
then for any sequence {n : n 1} C(RN ; C) that converges uniformly on
compacts to a C(RN ; C) and satisfies |n | C for some C < and all
n 1, hn , n i h, i.

Proof: Choose Cc B(0, 1); [0, ) with total integral 1, and set  (x) =
N (1 x) for  > 0. Also, choose Cc B(0, 2); [0, 1] so that = 1 on
B(0, 1), and set R (x) = (R1 x) for R > 0.
Begin by noting that h, n i h, i for all Cc (RN ; C). Next, suppose
that Cc (RN ; C), and, for  > 0, set  =  ? , the convolution
Z
 (x y)(y) dy
RN

of  with . Then, for each  > 0,  Cc (RN ; C) and therefore h , n i


h , i. In addition, there is an R > 0 such that supp( ) B(0, R) for all
 (0, 1]. Hence,


lim h, n i h, i 2hR , ik ku .
n

Since lim&0 k ku = 0, we have now shown that h, n i h, i for all


Cc (RN ; C).

Now suppose that C RN ; [0, ) , and set R = R , where R is as
above. Then, for each R > 0, hR , i = limn hR , n i limn h, n i.
Hence, by Fatous Lemma, h, i limR
 hR , i limn h, n i.
Finally, suppose that C RN ; [0, ) is n -integrable for each n Z+ and
that h, n i h, i [0, ). Given {n : n 1} C(RN ; C) satisfying
|n | C and converging uniformly on compacts to , one has




hn , n i h, i hn , n i + h, n i h, i .

Moreover, for each R > 0,




lim hn , n i
n

lim

sup

n xB(0,2R)



|n (x) (x)|hR , n i + lim h(1 R )(n ), n i
n


2C lim h(1 R ), n i = lim 2C h, n i hR , n i = 2Ch(1 R ), i,
n

A Borel measure on a topological space is locally finite if it gives finite measure to compacts.

64

2 The Central Limit Theorem

and similarly


lim h, n i h, i
n


lim hR , n i hR , i + C lim h(1 R ), n i + Ch(1 R ), i
n

= 2Ch(1 R ), i.
Finally, because is -integrable, h(1 R ), i 0 as R by Lebesgues
Dominated Convergence Theorem, and so we are done. 
By combining Theorem 2.1.4 with the preceding, we have the following version
of the famous Central Limit Theorem.
Theorem 2.1.8 (Central Limit Theorem). With the setting the same as it
was in Theorem 2.1.4, assume that gn () 0 as n for each  > 0. Then


lim EP n (Sn ) = h, 0,1 i
n

whenever {n : n 1} C(R; C) satisfies




n (y)
<
sup sup
2
n1 yR 1 + |y|

and tends to uniformly on compacts. Moreover, if a < b , then


 2
Z b


y
1

dy.
exp
(2.1.9)
lim P a Sn b = 0,1 (a, b] =
n
2
2 a

(See Exercise 2.1.10 for more information about the identically distributed case.)
Proof: Take n to be the distribution of Sn . By Theorem 2.1.4, we know
that h, n i h, 0,1 i for all Cc (RN ; R). In addition, we know that
h, n i = 2 = h, 0,1 i when (y) = 1 + y 2 . Hence, the first assertion is an
application of Lemma 2.1.7.
Turning to the second assertion, let a < b be given. To prove (2.1.9), choose
{k : k 1} Cb (R; R) and {k : k 1} Cb (R; R) so that 0 k % 1(a,b)
and 1 k & 1[a,b] as k . Then,
Z


i

P

k (y) 0,1 (dy) 0,1 (a, b)


lim P a < Sn < b lim E k Sn =
n

as k , and, similarly,




lim P a Sn b lim EP k Sn =

Z
R

Finally, note that 0,1 (a, b) = 0,1 [a, b] . 


k (y) 0,1 (dy) 0,1 [a, b] .

Exercises for 2.1

65

Exercises for 2.1


Exercise 2.1.10. Let {Xn : n 1} be a sequence of independent, identically
1 Pn
distributed random variables, set Sn = n 2 m=1 Xm , and assume that



lim EP Sn2 R2 1

for every R [0, ).

In particular, by Lemma 2.1.7, this will certainly be the case whenever (2.1.1)
holds for every Cc (R; R). The purpose of this exercise is to show that the
Xn s are P-square integrable, have mean value 0, and variance no more than 1;
and the method which I will use is based on the same line of reasoning as was
given in Exercise 1.5.12.
 
 
(i) Assuming that X1 L2 (P; R), show that EP X1 = 0 and EP X12 1. In
particular, use this together with the result in part (i) of Exercise 1.4.27 to see
that it suffices to handle the case when the Xn s are symmetric.
(ii) In this and the succeeding parts of this exercise, we will be assuming that
the Xn s are symmetric. Following the same route as was suggested in (ii) of
Exercise 1.5.12, set


t = Xn 1[0,t] |Xn | Xn 1(t,) |Xn | ,
X
n

n Z+ ,



t, . . . , X
nt , . . . and X1 , . . . , Xn , . . . have the same distribuand recall that X
1
tion for each t (0, ). Use this
 together with our basic assumption to see that
limR sup nZ+ P An (t, R) = 0, where
t(0,)

)
( n
n

X X

1
t n2 R .
An (t, R)
Xk
X
k
1

(iii) Continuing in the setting of part (ii), set


n

1 X
Xk 1[0,t] |Xk | .
Snt = 1
n2 1


After noting that the Xn 1[0,t] |Xn | s are symmetric, check (cf. the proof of


Theorem 1.3.1) that EP |Snt |4 3t4 . In particular, conclude that, for each
t (0, ), there is an R(t) (0, ) such that

 1


1
EP |Snt |2 , An t, R(t) 3 2 t2 P An t, R(t) 2 1

for all n Z+ .

66

2 The Central Limit Theorem

(iv) Given t (0, ), choose R(t) (0, ) as in the preceding. Taking into
account the identity
Pn
Pn t
X
+
k
t
1
1 Xk
,
Sn =
1
2n 2
show that





 
EP X12 , |X1 | t = EP |Snt |2 EP |Snt |2 , An t, R(t) { + 1
h
i
EP Sn2 R(t)2 + 1

for all n Z+ and t (0,


use this and our basic hypothesis
 ). In particular,

to conclude first that EP X12 , |X1 | t 2 for all t (0, ) and then that X1
is square P-integrable.
(v) Show that (2.1.1) holds if and only if X1 has mean value 0 and variance 1.
Exercise 2.1.11. An interesting way in which to interpret The Central Limit
Theorem is as the solution to a certain fixed point problem. Namely, let P denote
the set of probability measures on R; BR with the properties that
Z
Z
x2 (dx) = 1 and
x (dx) = 0.
R


Next, define T2 for P to be the probability measure on R, BR given by


ZZ
x+y

(dx)(dy) for BR .
T2 () =
1
2
R2

After checking that T2 maps P into itself, use The Central Limit Theorem to
show that, for every P,
Z
Z
n
lim
d T2 =
d0,1 , Cb (R; C).
n

Conclude, in particular, that 0,1 is the one and only element of P with the
property that T2 = and that this fixed point is attracting. (See Exercise
2.3.21 for more information.)
Exercise 2.1.12. Here is another indication of the remarkable stability of normal random variables. Namely, I will outline here a derivation2 of the L
evy
Cram
er Theorem which says that if X and Y are independent random variables whose sum is normal (with some mean and variance), then both X and Y
are normal.
2

This derivation is based on a note by Z. Sasv


ari, who himself borrowed some of the ideas
from A. R
enyi. I know of no derivation that does not rely on complex analysis and would be
very interested in learning one.

Exercises for 2.1

67

(i) Assume that X + Y N (a, 2 ), and, by subtracting a from X, reduce to


the case in which X + Y N (0, 2 ). Next, show that there is nothing more to
do when = 0 and that one can always reduce to the case = 1 when > 0.
Thus, from now on, assume that X + Y N (0, 1).

(ii) Choose r (0, ) so that P |X| |Y | r 12 , and conclude that



R2
,
P |X| r + R P |Y | r + R 4 exp
2


R (0, ).

In particular,
show that the moment generating
functions z C 7 M (z) =



EP ezX C and z C 7 N (z) = EhP eizY C exist and are entire functions.
2

Further, note that M (z)N (z) = exp z2 , and conclude that M and N never
vanish. Finally, from the fact that X + Y has mean 0, show that one can reduce
to the case in which both X and Y have mean 0. Thus, from now on, we assume
that M 0 (0) = 0 = N 0 (0).

(iii) Because M never vanishes and M (0) = 1, elementary complex analysis (cf.
Lemma 3.2.3) guarantees that there is a unique entire function : C C such
that (0) = 0 and M (z) = e(z) for all z C. Further, from M 0 (0) = 0, note
that 0 (0) = 0. Thus,
(z) =

cn z n

where n!cn =

n=2

Finally, note that N (z) = exp

z2
2


 xX 
dn
P

log
E
e
R.

dxn
x=0

i
(z) .

(iv) As an application of Holders Inequality, observe that x R 7 (x) R


2
and x R 7 x2 (x) R are both convex. Thus, since 0 (0) = 0, both these
functions are non-increasing on (, 0] and non-decreasing on [0, ). Use this
observation to check that

(x) 0

x2
(x)
2

for all x R.

Next, use the preceding in conjunction with the trivial remarks



 

exp Re (z) = EP ezX e(x)
and

h
exp Re

z2
2

(z)

i

i
h 2


= EP ezY exp x2 (x)

to arrive at

y 2 2Re (z) x2

for z = x +

1 y C.

68

2 The Central Limit Theorem

In particular, this means that



 |z|2

,
Re (z)
2

z C.

(v) To complete the program, use Cauchys integral formula to show that, for
each n Z+ and r > 0, on the one hand,
Z 2 

1
re 1 e 1 n d, r > 0,
cn r n =
2 0

while, on the other hand (since (z) = z) and therefore z (z) = 0),

re

0=

1 n

d.

Hence,
1
cn r =


 
e 1 n d,
Re re 1

n Z+ and r > 0.

Finally, in combination with the estimate obtained in (iv) and the fact that
c0 = c1 = 0, this leads to the conclusion that cn = 0 for n 6= 2 and therefore
that (z) = c2 z 2 with 0 c2 12 .

Exercise 2.1.13. An important result that is closely related to The Central


Limit Theorem is the following observation, which occupies a central position in
the development of classical statistical mechanics.3
(i) For each n Z+ , let n denote the normalized surface measure on the
(n 1)-dimensional sphere
 
1
Sn1 n = x Rn : |x| = n 2 ,
(1)

and denote by n the distribution of the coordinate x1 under n . Check that,


(1)
when n 2, n (dt) = fn (t) dt, where
fn (t) =
3

n2
1
2

n n1

t2
1
n

 n3
2

1 
1(1,1) n 2 t ,

Although E. Borel seems to have thought he was the first to discover this result and rhap
sodizes about it a good deal in Sur les principes de la cin
etique des gaz, Ann. lEcole
Norm.

sup., 3e t. 23, it appears already in the 1866 article Uber


die Entwicklungen einer Funktion
von beliebig vielen Variabeln nach Laplaceshen Funktionen h
oherer Ordnung, J. Reine u.
Angewandte Math., by F. Mehler and is only a small part of what Mehler discovered there. Be
that as it may, Borel deserves credit for recognizing the significance of this result for statistical
mechanics.

Exercises for 2.1

69

and k1 denotes the surface area of the (k 1)-dimensional unit sphere in Rk .


Using polar coordinates to compute the right-hand side of
Z
|x|2
k
e 2 dx,
(2) 2 =
Rk

first check that

k1 =

2 2
,
k2

where (t) is Eulers -function (cf. (1.3.20)), and then apply Stirlings formula
(cf. (1.3.21)) to see that
n2
1
2

n n1

1

2

as

n .

Now, using g to denote the density for the standard Gauss distribution (i.e., the
Gauss kernel in (1.3.5)), apply these computations to show that
sup sup
n3 tR

fn (t)
< and that
g(t)

fn (t)
1 uniformly on compacts.
g(t)

In particular, conclude that, for any L1 (0,1 ; R),


Z
Z
(1)
(2.1.14)
dn
d0,1 .
R

(ii) A less computational approach to the same calculation is the following. Let
{Xn : n
variables, and set
p 1} be a sequence of independent N (0, 1) random

2
2
Rn = X1 + + Xn . First note that P Rn = 0 = 0 and then that the
distribution of

1
n 2 X1 , . . . , Xn
n
Rn
R2

is n . Next, use The Strong Law of Large Numbers to see that nn 1 (a.s., P)
and conclude that, for any N Z+ ,





lim EP n(N ) = EP X1 , . . . , XN , Cc RN ; R ,
n

(N )

where, for n N , n RN denotes the projection of n Rn onto its first



(N )
N coordinates. Conclude that if n on RN , BRN denotes the distribution of

x = (x1 , . . . , xn ) Rn 7 x(N ) x1 , . . . , xN RN under n , then
Z
Z

(N )
N
lim
dn =
d0,1
for all Cb RN ; C .
n

RN

RN

70

2 The Central Limit Theorem

(iii) By considering the case when N = 2, show that, for any Cb (R; R),
Z
(2.1.15)

lim

Sn1 ( n)


1X
xk
n

!2

k=1

d0,1

n (dx) = 0.

Notice that the non-computational argument has the advantage that it immedi(N )
ately generalizes the earlier result to cover n for all N Z+ , not just N = 1
(cf. Exercise 2.3.24). On the other hand, the conclusion is weaker in the sense
that convergence of the densities has been replaced by convergence of integrals
with bounded continuous integrands and that no estimate on the rate of convergence is provided. More work is required to restore the stronger statements
when N 2.
When couched in terms of statistical mechanics, this result can be interpreted
as a derivation of the Maxwell distribution of velocities for an ideal gas of free
particles of mass 2 and having average energy 1.
Exercise 2.1.16. The most frequently encountered applications of Stirlings
formula (cf. (1.3.21)) are to cases when t Z+ . That is, one is usually interested
in the formula
 n n

.
(2.1.17)
n! 2n
e

Here is a derivation of (2.1.17) as an application of The Central Limit Theorem.


Namely, take {X
 n : n 1} to be a sequence of independent, random variables
with P Xn > x = exp (x + 1)+ , x R for all n Z+ . For n 1, note that

1
Z
n+n
 1 
1 1+4

xn ex dx
P Sn+1 0, 4 =
n! 1+n
Z 12 1
1
1
n n y
nn+ 2 en n + 4 1+n
12
dy.
y
e
1
+
n
=
1
n!
n 2

By The Central Limit Theorem,

Z 14
 1 
x2
1

e 2 dx.
P Sn 0, 4
2 0


At the same time, an elementary computation shows that


1

n 2 + 14

1
n 2

1+n1

1+n

12

n

ny

1
4

Z
dy
0

x2
2

dx,

2.2 The BerryEsseen Theorem via Steins Method

71

and clearly (2.1.17) follows from these. In fact, if one applies the BerryEsseen
estimate proved in the next section, one finds that

2n
n!


n n
e

1
= 1 + O n 2 .

However, this last observation is not very interesting since we saw in Exercise
1.3.19 that the true correction term is of order n1 .4
2.2 The BerryEsseen Theorem via Steins Method
As we will see in the next section, the principles underlying the passage from
Theorem 2.1.4 to Theorem 2.1.8 are very general. In fact, as we will see in
Chapter 9, some of these principles can be formulated in such a way that they
extend to a very abstract setting. However, rather than delve into such extensions here, I will devote this section to a closer examination of the situation at
hand. Specifically, in this section we are going to see how to make the final part
of Theorem 2.1.8 quantitative.
From (2.1.5), we get a rate of convergence in terms of the second and third
derivatives of . In fact, if we assume that
(2.2.1)


 1
k EP |Xk |3 3 < ,

1 k n,

then (cf. the proof of Theorem 2.1.4), by using the estimates


000
3


Rm () k ku ||
6

and k k ,

one sees that (2.1.5) can be replaced by


(2.2.2)



Pn 3
Z
000
P


E Sn d0,1 2k ku 1 k


3n
3
R

when the Xk s have third moments.


Although both (2.1.5) and (2.2.2) are interesting, neither one of them can
be used to get very much information about the rate at which the distribution
functions
(2.2.3)
4


x R 7 Fn (x) P Sn x [0, 1]

As this exercise demonstrates, Stirlings formula is intimately connected to The Central

Limit Theorem. In fact, apart from the constant 2, what we now call Stirlings formula was
discovered first by DeMoivre while he was proving The Central Limit Theorem for Bernoulli
random variables. For more information, see, for example, Wm. Fellers discussion of Stirlings
formula in his Introduction to Probability Theory and Its Applications, Vol. I, Wiley, Series in
Probability and Math. Stat. (1968).

72

2 The Central Limit Theorem

are tending to the error function


(2.2.4)


1
G(x) 0,1 (, x] =
2

t2

e 2 dt.

To see how (2.1.5) and (2.2.2) must be modified in order to gain such information,
first observe that
Z

0 (x) Fn (x) G(x) dx
R
Z
(2.2.5)



P

= E (Sn )
(y) 0,1 (dy), Cb1 (R; R .
R

(To prove (2.2.5), reduce to the case in which Cc1 (R; R) and (0) = 0;
and for this case apply either Fubinis Theorem or integration by parts over
the intervals (, 0] and [0, ) separately.) Hence, in order to get information
about the distance between Fn and G, we will have to learn how to replace
the right-hand sides of (2.1.5) and (2.2.2) with expressions that depend only on
the first derivative of . For example, if the dependence is on k0 ku , then we
get information about the L1 (R; R) distance between Fn and G, whereas if the
dependence is on k0 kL1 (R;R) , then the information will be about the uniform
distance between Fn and G.
2.2.1. L1 -BerryEsseen. The basic idea that I will use to get estimates in
terms of 0 was introduced by C. Stein and is an example of a procedure known
as Steins method.1 In the case at hand, his method stems from the trivial
observation that if is a Borel probability measure
  on R and g is the Gauss
kernel in (1.3.5), then = 0,1 if and only if g = 0 in the sense of Schwartz

distribution theory. Equivalently, if A+ is the raising operator


(cf.
2.4.1) given
D
 E

by A+ (x) = x(x) (x), then, because hA+ , i = g, g , = 0,1


if and only if hA+ , i = 0 for sufficiently many test functions . In fact, as
will be shown in what follows, will be close to 0,1 if, in an appropriate sense,
hA+ , i is small.
To make mathematics out of the preceding, I will need the following.

Lemma 2.2.6. Let C 1 (R; R), assume that k0 ku < , set = h, 0,1 i,
and define
Z x
2
x2
t2
2
dt R.
(t)e

(2.2.7)
x R 7 f (x) e

Then f Cb2 (R; R),


(2.2.8)
1

kf ku 2k0 ku ,

q
kf 0 ku 3 2 k0 ku ,

kf 00 ku 6k0 ku ,

Stein provided an introduction, by way of examples, to his own method in Approximate


Computation of Expectations, I.M.S., Lec. Notes & Monograph Series # 7 (1986).

2.2 The BerryEsseen Theorem via Steins Method

73

and
f 0 (x) xf (x) = (x),

(2.2.9)

x R.

Proof: The facts that f C 1 (R; R) and that (2.2.9) holds are elementary
applications of The Fundamental Theorem of Calculus. Moreover, knowing that
f C 1 (R; R) and using (2.2.9), we see that f C 2 (R; R) and, in fact, that
f 00 (x) xf 0 (x) = f (x) + 0 (x),

(2.2.10)

x R.

To prove the estimates in (2.2.8), first note that, because and therefore f are
unchanged when is replaced by (0), I may and will assume that (0) = 0
and therefore that |(t)| k0 ku |t|. In particular, this means that
Z

Z
q


d0,1 k0 ku |t| 0,1 (dt) = k0 ku 2 .

t2

2
dt = 0, an alternative expression for f
(t)e

2
x2
t2
dt, x R.
(t)e

f (x) = e 2

Next, observe that, because


is

Thus, by using the original expression for f (x) when x (, 0) and the
alternative one when x [0, ), we see first that
Z

 2
x2
t sgn(x) e t2 dt, x R,
2
|f (x)| e
|x|

and then that


|f (x)| k0 ku e

x2
2


t+

|x|

q 
2

t2

e 2 dt.

But, since


Z
Z
t2
x2
t2
x2
d

t e 2 dt 1 = 0
e 2 dt e 2
e2
dx
x
x

we have that, for x R,


Z
Z
x2
t2
x2
e 2 dt e 2
(2.2.11) |x|e 2
|x|

t2

te 2 dt = 1 and e

for x [0, ),

x2
2

|x|

|x|

t2

e 2 dt

2;

which means that I have now proved the first estimate in (2.2.8). To prove the
other two estimates there, derive from (2.2.10)


x2
d  x2 0 
e 2 f (x) = e 2 f (x) + 0 (x)
dx

74

2 The Central Limit Theorem

and therefore that


Z
Z x
 t2
x2
x2
0
0
f (t)+ (t) e 2 dt = e 2
f (x) = e 2

 t2
f (t)+0 (t) e 2 dt,

x R.

Thus, reasoning as I did above and using the first estimate in (2.2.8) and the
relations in (2.2.9), (2.2.10), and (2.2.11), one arrives at the second and third
estimates in (2.2.8). 
I now have the ingredients needed to apply Steins method to the following
example of a BerryEsseen type of estimate.
Theorem 2.2.12 (L1 -BerryEsseen Estimate). Continuing in the setting
of Theorem 2.1.4, one has that for all  > 0 (cf. (2.1.3), (2.2.3), and (2.2.4))
(2.2.13)



Fn G 1
2 gn (2).

6(r
+
)
+
3
n
L (R;R)

Moreover, if (cf. (2.2.1)) m < for each 1 m n, then


(2.2.14)



Fn G 1

L (R;R)


6rn +

Pn

3
m=1 m
3n


 Pn
3
9 m=1 m
.

3n

2
In particular, if m
= 1 and m < for each 1 m n, then



8 3
6 + 2 3
Fn G 1
.

L (R;R)
n
n

Proof: Let C 1 (R; R) having bounded first derivative be given, and define
f accordingly, as in (2.2.7). Everything turns on the equality in (2.2.9). Indeed,
because of that equality, we know that the right-hand side of (2.2.5) is equal to
n 



 X




2 P
m f (Sn ) ,
EP f 0 (Sn ) EP Sn f (Sn ) =

m
E f 0 (Sn ) EP X
m=1

where I have set


m =

m
n

m =
and X

Xm
n .

Next, define

m
Tn,m (t) = Sn + (t 1)X

for t [0, 1],

m , and conclude that


note that Tn,m (0) is independent of X


m f (Sn ) =
EP X

 2 0

m f Tn,m (t) dt
EP X



2 P
=
m
E f 0 Tn,m (0) +

Z
0

 2 0


m f (Tn,m (t) f 0 Tn,m (0) dt
EP X

2.2 The BerryEsseen Theorem via Steins Method


for each 1 m n. Hence, we now see that
Z
n
n Z
X
X


2
(2.2.15)
EP (Sn ) d0,1 =

m
Am
R

where

m=1

m=1

75

Bm (t) dt,

h
i
Am EP f 0 Sn ) f 0 Tn,m (0)

and

h


i
2
m
Bm (t) EP X
f 0 (Tn,m (t) f 0 Tn,m (0)
.

Obviously, by Taylors Theorem and H


olders Inequality, for each 1 m n,


m
00
kf 00 ku
(*)
|Am |
m kf ku rn
n

while, for each t [0, 1] and  > 0,

i


kf 0 ku h 2
2
Bm (t) 2t
, |Xm | 2n .
m
kf 00 ku + 2 2 EP Xm
n
Thus, after summing over 1 m n, integrating with respect to t [0, 1], and
using (2.2.5), (2.2.15), and (*), we arrive at
Z




0 (x) Fn (x) G(x) dx rn +  kf 00 ku + 2gn (2)kf 0 ku ,


R

which, in conjunction with the estimates in (2.2.8), leads immediately to the


estimate in (2.2.13). In order to get (2.2.14), simply note that
Z 1 h
3



 i
m |3 f 00 Tn,m (st) ds tkf 00 ku m ,
Bm (t) t
EP |X
3n
0

and again use (2.2.15), (2.2.8), and (*). 


2.2.2. The Classical BerryEsseen Theorem. The result in Theorem
2.2.12 is already significant. However, it is not the classical BerryEsseen Theorem, which is the analogous statement about kFn Gku .
In order to prove the classical result via Steins method, we must learn how
to replace the k000 ku in Lindebergs Theorem by k0 kL1 (R;R) . It turns out that
this replacement is far more challenging than replacing k000 ku by k0 ku , which
was the replacement needed to prove Theorem 2.2.12. The argument that I will
use is a clever induction procedure that was introduced into this context by E.
Bolthausen.2 But, before I can apply Bolthausens argument, I will need the
following variation on Lemma 2.2.6.
2

The BerryEsseen Theorem appears as a warm-up exercise in Bolthausens An estimate


of the remainder term in a combinatorial central limit theorem, Z. Wahr. Gebiete 66, pp.
379386 (1984).

76

2 The Central Limit Theorem

Lemma 2.2.16.
C 1 (R; R), and define f accordingly, as in (2.2.7).
p Let
0
Then kf ku 8 k kL1 (R;R) and kf 0 ku k0 kL1 (R;R) .

Proof: I will assume, throughout, that k0 kL1 (R;R) = 1. Observe that, by the
Fundamental Theorem of Calculus, (cf. the notation in Lemma 2.2.6)
Z
(x)

= y (x) 0 (y) dy, where y = 1(,y] ,


R

and so (cf. (2.2.4))


Z
f (x) = y (x) 0 (y) dy,

where y (x) =

2e

x2
2


G(xy)G(x)G(y) 0.

At the same time, these, together with (2.2.9), give


Z 

0
f (x) =
xy (x) + y (x) 0 (y) dy.
R

Hence, the desired estimates come down to checking that



x2
e 2 G(x y) G(x)G(y) 14 ,

and





x2


2xe 2 G(x y) G(x)G(y) + 1(,y] (x) G(y) 1

for all (x, y) R R. But, if x y,


G(x y) G(x)G(y) G(x) G(x)2 =

2 
1
1 4 G(x) 12
4

and
G(x)


1 2
2

|x|

1
=
2

ZZ

2
2

!2
d

2 + 2
2

dd =


x2
1
1 e 2 ,
4

2 + 2 x2

which proves the first inequality. To get the second one, it suffices to consider
each of the four cases 0 x y, x 0 & y < x, y < x < 0, and x < 0 & y x
separately and take into account that, from the first part of (2.2.11),


x2
x2
x 0 = 2xe 2 1G(x) 1 and x < 0 = 2|x|e 2 G(x) 1. 

As distinguished from Lemma 2.2.6, Lemma 2.2.16 contains no estimate on


kf 00 ku . Indeed, there is no such estimate in terms of k0 kL1 (R;R) . As a consequence, the proof of the following is much more involved than that of Theorem
2.2.12

2.2 The BerryEsseen Theorem via Steins Method

77

Theorem 2.2.17 (Classical BerryEsseen Estimate). Let everything be as


in Theorem 2.1.4, and assume that (cf. (2.2.1)) m < for each 1 m n.
Then (cf. (2.2.3) and (2.2.4))
Pn
kFn Gku 10

(2.2.18)

3
m

3n

In particular, if m = 1 for all 1 m n, then (2.2.18) can be replaced by


Pn
kFn Gku 10

(2.2.19)

3
m
3

n2

3
max m

.
10
n
1mn

Proof: For each n Z+ , let n denote the smallest number with the property
that
Pn 3

kFn Gku 1 3 m
n

for all choices of random variables satisfying the hypotheses under which (2.2.18)
is to be proved. My strategy is to give an inductive proof that n 10 for all
n Z+ ; and, because 1 1 and therefore 1 1, I need only be concerned
with n 2.
m,
Given n 2 and X1 , . . . , Xn , define X
m , and Tn,m (t) for 1 m n and
t [0, 1] as in the proof of Theorem 2.2.12. Next, for each 1 m n, set
n,m =

2 ,
2n m

m =

m
,
n

n =

n
X

3
m
,

and n,m =

X  ` 3
.
n,m

1`n
`6=m

Finally, set
Sn,m =

X`

1`n
`6=m

Sn,m
,
and Sn,m =
n,m


and let x R 7 Fn,m (x) P Sn,m x [0, 1] denote the distribution
function for Sn,m . Notice that, by definition, kFn,m Gku n1 n,m for each
1 m n. Furthermore, because (cf. (2.1.3))
2n,m
2
=1
m
1 rn2
2n

we know first that


n,m


and n,m

n
3

(1 rn2 ) 2

n
n,m

1 m n,

3
n ,

78

2 The Central Limit Theorem

and therefore that


max kFn,m Gku

(2.2.20)

1mn

n n1
3

(1 rn2 ) 2

Now let Cb2 (R; R) with k00 kL1 (R) < be given, define f accordingly, as
in (2.2.7), and let
{Am : 1 m n}

and {Bm (t) : 1 m n & t [0, 1]}

be the associated quantities appearing in (2.2.15). By (2.2.9), we have that


h
i h

i

m f (Sn ) + EP Tn,m (0) f (Sn ) f Tn,m (0)
|Am | EP X

h
i

+ EP (Sn ) Tn,m (0)




m | kf ku + EP |X
m Tn,m (0)| kf 0 ku
EP |X
Z 1
P

m 0 Tn,m () d
E X
+
0




n,m 0
m 0 Tn,m ()
kf ku + max EP X
kf ku +
n
[0,1]




m 0 Tn,m () .
kf ku + kf 0 ku + max EP X

[0,1]

m from Tm,n (0), one sees that


Similarly, from (2.2.9)) and the independence of X
|Bm (t)| is dominated by
h

i h 2

i

3 f Tn,m (t) + EP X
Tn,m (0) f Tn,m (t) f Tn,m (0)
t EP X

m

h


i

2 Tn,m (t) Tn,m (0)
+ EP X

m



 

m |3 kf ku + tEP |X
m |3 EP |Tn,m (0)| kf 0 ku
tEP |X
Z 1
P 3 0

m Tn,m (t) d
E X
+t
0

3
t
m

 3 0


m Tn,m () .
kf ku + kf 0 ku + t max EP X
[0,1]

In order to handle the second term in the last line of each of these calculations,
introduce the function


n,m
0

y R.
(, , y) [0, 1] R 7 (, , y) Xm () +
n

2.2 The BerryEsseen Theorem via Steins Method

79

m is independent of Tn,m (0),


Then, because X
h

Z

Z
P k 0

i
k
m Tn,m ()

E X
Xm ()
(, , y) 0,1 (dy) P(d)

R
Z

Z
Z

k

Xm () (, , y) dFn,m (y) (, , y) dG(y) P(d)

Z
=

Z




m () k 0 (t, , y) G(y) Fn,m (y) dy P(d)
X


R

h
k
00
i
1
m k k00 kL1 (R;R) m n1 k kL 3(R;R) n ,
EP X
(1 rn2 ) 2
(1 rn2 )
n1 n

3
2

k {1, 3},

where I have used 0 (t, , y) to denote the first derivative of y R 7 (, , y),


applied (2.2.5) and (2.2.20), and noted that
k 0 (, , )kL1 (R;R) = k00 kL1 (R;R)

for all (, ) [0, 1] .

At the same time, because


k(, , )kL1 (R;R) =

n
k0 kL1 (R;R)
n,m

for all (, ) [0, 1] ,

we have that, for each [0, 1],


Z

Z

k

k0 kL1 (R;R) m
m ()k
X

(,
,
y)

(dy)
P(d)

0,1
1 .

R
2(1 rn2 ) 2

Hence, by combining these estimates, we arrive at

0
1
k kL (R;R)
n1 n
00

|Am | m kf ku + kf 0 ku +
3 k kL1 (R;R)
 12 +
2
2
(1 rn )
2(1 rn2 )

and

0
1
k
k
n1 n
L (R;R)
00
3

|Bm (t)| t
m
kf ku + kf 0 ku +
3 k kL1 (R;R)
 12 +
2
2
2
(1 rn )
2(1 rn )

for all 1 m n and t [0, 1], and, after putting these together with (2.2.5)
and (2.2.15), we conclude that
Z



0 (y) G(y) Fn (y) dy


R

3
kf ku + kf 0 ku

(2.2.21)
2

n1 k00 kL1 (R;R) n
k0 kL1 (R;R)
n .
+
3
1 +
(1 rn2 ) 2
2(1 r2 ) 2
n

80

2 The Central Limit Theorem

I next apply (2.2.21) to a special class of s. Namely, set

1
h(x) = 1 x

if x < 0
if x [0, 1]
if x > 1,

and define
h (x) = 


1 y h(x y) dy

for  > 0 and x R,


R
where Cc R; [0, ) satisfies R (y) dy = 1. Finally, let a R be given,
and set


x R and , L > 0.
,L (x) = h xa
Ln ,

It is then an easy matter to check that k0,L kL1 (R;R) = 1 while k00,L kL1 (R;R)
2
Ln . Hence, by plugging the estimates from Lemma 2.2.16 into (2.2.21) and
then letting  & 0, we find that, for each L > 0,




1 Z a+Ln



G(y) Fn (y) dy
sup

aR Ln a

r
2n1
1

3
n .
+
1+

3
1 +
8
2
(1 rn2 ) 2 L
2(1 r2 ) 2

(2.2.22)

But
1
Ln

1
Fn (y) dy Fn (a)
Ln
aLn

a+Ln

Fn (y) dy,
a

while
0

1
Ln

a+Ln

G(y) dy G(a) =
a

1
Ln

and, similarly,
1
0 G(a)
Ln

Z
a

a+Ln

Ln
(a + Ln y) 0,1 (dy) ,
8

Ln
G(y) dy .
8
aLn

Thus, from (2.2.22), we first obtain, for each L (0, ),

3
kFn Gku +
2

L
3n1
3
9
n ,
+
+
1
3
1 +
32
(1 rn2 ) 2 L (8) 2
8(1 rn2 ) 2

Exercises for 2.2

81

and then, after minimizing with respect to L (0, ),


r
r

 1
9
9
3
1 rn2 2
+
+
kFn Gku
8
32
2
(2.2.23)
r

1
 3
4 18
2 4
2
n .
1 rn

+
n1

In order to complete the proof starting from (2.2.23), we have to consider the
1
1
. Because kFn Gku 1,
or n < 10
two cases determined by whether n 10
it is obvious that we can take n 10 in the first case. On the other hand, if
1
and we assume that n1 10, then, because
n 10
n
n
n
X

1 X P  2  32
1 X P
3
3

m
rn3 ,
E Xm =
E |Xm | 3
n = 3
n 1
n 1
1

(2.2.23) says that kFn Gku 10n . Hence, in either case, n1 10 =


n 10. 
It is clear from the preceding derivation (in particular, the final step) that the
constant 10 appearing in (2.2.18) and (2.2.19) can be replaced by the smallest
> 1 that satisfies the equation
r
r
r
 1
2  3
1
9
9
3
4 18
23 2
2 1 3 4 .
+
1
+
= +

8
32
2

Numerical experimentation indicates that 10 is quite a good approximation to


the actual solution of this minimization problem. However, it should be recognized that, with sufficient diligence and entirely different techniques, one can
show that the 10 in (2.2.18) can be replaced by a number that is less than 1.
Thus, I do not claim that Steins method gives the best result, only that it gives
whatever it gives with relatively little pain.
Exercises for 2.2
Exercise 2.2.24. It is important to know that, at least qualitatively, one cannot do better than BerryEsseen. To see this, consider independent, symmetric,
{1, 1}-valued Bernoulli random variables, and define Fn accordingly. Next,
1
observe that when tn = (2n + 1) 2 ,
Z 0
x2
1

e 2 dx
F2n+1 (tn ) G(tn ) =
2 tn
1

and therefore that limn n 2 kFn Gku 12 . In particular, since m = 1 for


these Bernoulli random variables, we conclude that the constant in the Berry
1
Esseen estimate cannot be smaller than (2) 2 .

82

2 The Central Limit Theorem

Exercise 2.2.25. Because the derivation of Theorem 2.2.12 is so elegant and


simple, one wonders whether (2.2.14) can be used as the starting point for a proof
of (2.2.19). Unfortunately, the following nave idea falls considerably short of
the mark.
Let X1 , . . . , Xn satisfy the hypotheses of Theorem 2.2.17. Starting from
(2.2.14) and proceeding as I did in the passage from (2.2.22) to (2.2.23), show
that for every L > 0
Pn 3
6 1 m
L
+ ,
kFn Gku
3
Ln
8

and conclude that



kFn Gku

72

 14  Pn
1

3
m

3n
Pn
3

Obviously, this is unacceptably poor when n

 12

3
m
is small.

2.3 Some Extensions of The Central Limit Theorem


In most modern treatments of The Central Limit Theorem, Fourier analysis plays
a central role. Indeed, the Fourier transform makes the argument so simple that
it can mask what is really happening. However, now that we know Lindebergs
argument, it is time to introduce Fourier techniques and begin to see how they
facilitate reasoning involving independent random variables.
2.3.1. The Fourier Transform. The Fourier transform of finite, C-valued,
Borel measure on RN is the function
: RN C given by
Z
h
 i
(2.3.1)

() =
exp 1 , x RN (dx) for x RN .
RN

When is a probability measure which is the distribution of an RN -valued random variable X, probabilists usual call its Fourier transform the characteristic
function of X, and when admits a density with respect to Lebesgue measure
RN , one uses
Z
h
 i
(2.3.2)
()

=
exp 1 , x RN (x) dx for RN
RN

in place of
to denote its Fourier transform.
Obviously,
is a continuous function that is bounded by the total variation

kkvar of ; and only slightly less obvious1 is the fact that, for Cc RN ; C ,
C RN ; C and that as well as all its derivatives are rapidly decreasing
1
(i.e., they tend to 0 at infinity faster than 1 + ||2
to any power).

() = ( 1) ()

and concludes that


One uses integration byP
parts to check that d
k
||n |()|

is bounded by
k
.
1
N
L (R )
kk=n
1

2.3 Some Extensions of The Central Limit Theorem

83

Lemma 2.3.3. Let be a finite,


measure on RN . Then, for
 C-valued Borel
N
1
N
1
N
every Cb R ; C L R ; C with L (R ; C),
Z
Z
1
() d.
()

(2.3.4)
h, i =
d =
(2)N RN
RN

Moreover, given a sequence {n : n Z+ } of Borel probability measures and


a Borel probability measure on RN , 
cn
uniformly on compacts if
h, n i h, i for every Cc RN , R . Conversely, if
cn ()
() pointwise, then hn , n i h, i whenever {n : n 1} is a uniformly bounded
sequence in Cb (RN ; C) that tends to uniformly on compacts. (Cf. Theorem
3.1.8 for more refined information on this subject.)

R
Proof: Choose Cc RN ; [0, ) to be an even function that satisfies RN dx =
1, and set  (x) = N (1 x) for  (0, ). Next, define  for  (0, ) to
be the convolution  ? of  with . That is,
Z
 (x) =
 (x y) (dy) for x RN .
RN


It is then easy to check that  Cb RN ; C and k kL1 (RN ;R) kkvar for every

 (0, ). In addition, one sees


Fubinis
().
 (by
 Theorem) that  () = ( )
N
1
N
Thus, for any Cb (R ; C L R ; C , Fubinis Theorem followed by the
classical Parseval Identity (cf. Exercise 2.3.23) yields
Z
Z
1
( ) ()

() d,
h , i =
(x)  (x) dx =
(2)N RN
RN

where   ? is the convolution of  with . Since, as  & 0, 


while ( ) 1 boundedly and pointwise, (2.3.4) now follows from Lebesgues
Dominated Convergence Theorem.
Turning to the second part of the theorem, first suppose that h, n i h, i
for every Cc (RN ; R), and
let n in C. Then,
by the last part of Lemma

cn (n )
().
2.1.7 applied to n (x) = e 1 (n ,x)RN and (x) = e 1 (,x)RN ,
Hence,
cn
uniformly on compacts. Conversely, suppose that
cn

pointwise. Again by Lemma


2.1.7,
we
need
only
check
that
h,

h,
i
n

when Cc RN ; C . But, for such a , is smooth and rapidly decreasing,
and therefore the result follows immediately from the first part of the present
lemma together with Lebesgues Dominated Convergence Theorem. 

Remark 2.3.5. Although it may seem too obvious to mention, an important,


and rather amazing, consequence of Lemma 2.3.3 is that a finite Borel measure
on RN is completely determined by its 1-dimensional marginals. To understand
this remark, recall that for a linear subspace L of RN , the marginal distribution of on L is the measure (L ) , where L denotes orthogonal projection

84

2 The Central Limit Theorem

onto L. In particular, if e SN 1 and e is the marginal distribution of on the


1-dimensional subspace spanned by e, then
(e) =
ce (). Hence, the Fourier
transform of is determined by the Fourier transforms of {e : e SN 1 }, and
therefore, by Lemma 2.3.3, can be recovered from its 1-dimensional marginals.
Of course, one should be careful when applying this observation. For instance,
when applied to an RN -valued random variable X = (X1 , . . . , XN ), it says that
the distribution of X can be recovered from a knowledge of the distributions
of (e, X)RN for all e SN 1 , but it does not say that the distributions of the
coordinates Xi , 1 i N , determine the distribution of X.
2.3.2. Multidimensional Central Limit Theorem. The great virtue of
the Fourier transform is that it behaves so well under operations built out of
translation. In applications to probability theory, this virtue is of particular
importance when adding independent random variables. Specifically, if X and
Y are independent, then the characteristic function of X + Y is the product
of the characteristic functions of X and Y. This observation, combined with
Lemma 2.3.3 leads to the following easy proof of The Central Limit Theorem for
independent, identically distributed, R-valued random variables {Xn : n 1}
with mean value 0 and variance 1. Namely, if n is the distribution of Sn , then

  n 
 n
2
2
e 2 = d
+ o n1
= 1

n () =
n
0,1 ()
2n

for every R.
Actually, as we are about to see, a slight variation on the preceding will allow
us to lift the results that we already know for R-valued random variables to random variables with values in RN . However, before I can state this result, I must
introduce the analogs of the mean value and variance for vector-valued random
variables. Thus, given an RN -valued random variable X on the probability space
(, F, P) with |X| L1 (P; R), the mean value EP [X] of X is the m RN that
is determined by the property that

 
(, m)RN = EP , X RN
for all RN .
Similarly, if |X| is P-square integrable, then the covariance cov(X) of X is the
symmetric linear transformation C on RN determined by



 
, C RN = EP , X EP [X] RN , X EP [X] RN
for , RN .
Notice that cov(X) is not only symmetric
but is also non-negative definite,

since for each RN , , cov(X) RN is nothing but the variance of (, X)RN .
Finally, given m RN and a symmetric, non-negative C RN RN , I will use
m,C to denote the Borel probability measure on RN determined by the property
that
Z
Z
 N
1
(dy), Cb (RN ; R),
(2.3.6)
dm,C =
m + C 2 y 0,1
RN

RN

2.3 Some Extensions of The Central Limit Theorem

85

where C 2 is the non-negative definite, symmetric square root of C


Clearly, an RN -valued random variable Y has distribution m,C if and only
if, for each RN , (, Y)RN is a normal random variable with mean value
(, m)RN and variance (, C )RN . For this reason, m,C is called the normal
or Gaussian distribution with mean value m and covariance C. For the same
reason, a random variable with m,C as its distribution is called a normal or
Gaussian random variable with mean value m and covariance C, or, more
briefly, an N (m, C)-random variable. Finally, one can use this characterization
to see that
h
 i
1
,
C
.
1
,
m)

(2.3.7)
[
()
=
exp
m,C
2
RN

In the following statements, I will be assuming that {Xn : n Z+ } is a


sequence of mutually independent, square P-integrable, RN -valued random variables on the probability space (, F, P ). Further, I will assume that, for each
n Z+ , Xn has mean value 0 and strictly positive definite covariance cov(Xn ).
Finally, for n Z+ , set
Sn =

n
X

Xm ,

Cn cov(Sn ) =

m=1

n
X

cov(Xm ),

m=1

 1
n = Sn .
n = det(Cn ) 2N and S
n

n is consistent
Notice that when N = 1, the above use of the notation n and S
with that in 2.1.1.
With these preparations, I am ready to prove the following multidimensional
generalization of Theorem 2.1.8.
Theorem 2.3.8. Referring to the preceding, assume that the limit
A lim

(2.3.9)

Cn
2n

exists and that


(2.3.10)

n

1 X P
E |Xm |2 , |Xm | n = 0 for each  > 0.
2
n n
m=1

lim

Then, for every sequence {n : n 1} C(RN ; C) that satisfies


(2.3.11)

sup sup
n1

yRN

|n (y)|
<
1 + |y|2

86

2 The Central Limit Theorem

and converges uniformly on compacts to ,




n = h, 0,A i.
lim EP n S

(2.3.12)

In particular, when the Xn s are uniformly square P-integrable random variables


with mean value 0 and common covariance C,


 
Sn
= h, 0,C i
lim EP n
n
n
whenever {n : n 1} C(RN ; C) satisfies (2.3.11) and converges to uniformly on compacts.
Proof: Given e SN 1 , set
n (e) =


e, Cn e RN

and n (e) =

n (e)
.
n

p
Then, (e) inf n1 n (e) (0, 1] and n (e) (e, Ae)RN as n . In
particular, if (e1 , . . . , eN ) is an orthonormal basis in RN , then
N
N

 X

  X
n |2 =
n 2N =
EP |S
EP ei , S
n (ei )2
R
i=1

N
X
i=1

i=1

ei , Aei


RN

|y|2 0,A (dy).

RN

Hence, by Lemmas 2.1.7 and 2.3.3 plus (2.3.7), all that we have to do is check
that
i
h
1

(*)
fn () EP e 1 (,Sn )RN e 2 (,A)RN

for each RN .
When = 0, (*) is trivial. Thus, assume that 6= 0, set e =
(e,Sn )RN
. Because
Sn (e) =

|| ,

and take

n (e)

n
X

2


1
EP e, Xm RN , e, Xm RN n (e)
2
n (e) m=1

n
X

2


1
EP e, Xm RN , e, Xm RN (e)n (e)
2
2
(e) n m=1

2.3 Some Extensions of The Central Limit Theorem

87

tends to 0 for each  > 0, Theorem 2.1.8 combined with Lemma 2.3.3 guarantees
that, for any R,


2
1

EP e 1 n Sn (e) e 2 ||
p
for any {n : n 1} R that tends to . In particular, if = (, A)RN and
n = n (e)||, we find that


1

fn () = EP e 1 n Sn (e) e 2 (,A)RN .

When C is non-degenerate, the final part is a trivial application of the initial


part. When C is degenerate but not zero, one can reduce to the non-degenerate
case by projecting onto the span of its eigenvectors with strictly positive eigenvalues, and when C = 0, there is nothing to do. 
2.3.3. Higher Moments. In this subsection I will show that when the Xn s
possess higher moments, then (2.1.1) remains true for s that can grow faster
than 1 + |y|2 . As an initial step in this direction, I give the following simple
example.
Lemma 2.3.13. Suppose that {Xn : n 1} is a sequence of independent,
identically distributed random variables with mean value 0 and variance 1. If
EP [X12` ] < for some ` Z+ , then (2.1.1) holds for any C(RN ; C) that
satisfies
(2.3.14)

sup
yR

|(y)|
< .
1 + |y|2`

Proof: Refer to the discussion in the introduction to this chapter, and observe
that the argument there shows that
Z
  (2`)!
P 2`
y 2` 0,1 (dy)
lim E Sn = ` =
n
2 `!
R

whenever the 2`th moment of X1 is finite. Hence the desired conclusion is an


application of the last part of Lemma 2.1.7 with (y) = 1 + |y|2` . 
In most situations one cannot carry out the computations needed to give a
direct proof that the last part of Lemma 2.1.7 applies, and for this reason the
following lemma is often useful.
Lemma 2.3.15. Suppose that {n : n 1} is a sequence of finite (nonnegative) Borel measures on RN , and assume is a finite Borel measure with
the property that h, n i h, i for all Cb (RN ; R). If, for some
C RN ; [0, ) and p (1, ),
(2.3.16)

suph p , n i < ,
n1

then hn , n i h, i whenever {n : n 1} C(RN ; C) is a sequence that


satisfies |n | for all n Z+ and converges to uniformly on compacts.

88

2 The Central Limit Theorem

Proof: By Lemma 2.1.7, all that we have to prove is that h, n i h, i.


For this purpose, note that, under our present hypotheses, Lemma 2.1.7 shows
that h, i limn h, n i < and that h R, n i h R, i h, i
for each R > 0. Thus, it suffices to observe that
Z
suph( R), n i = sup
dn R1p suph p , n i 0
n1

n1

n1

{>R}

as R . 
Knowing Lemma 2.3.15, ones problem is to find conditions under which one
n )] < for an interesting class of non-negative
can show that supn1 EP [(S
s. One such class is provided by the notion of a sub-Gaussian random variable. Given [0, ), an RN -valued random variable X is said to be -subGaussian if


2 ||2
EP e(,X)RN e 2 ,

(2.3.17)

RN .

The origin of this terminology should be clear: if X N (0, 2 ), then equality


holds in (2.3.17) with = .
Lemma 2.3.18. Let X be an RN -valued random variable. If X is a -subGaussian, then EP [X] = 0, Cov(X) 2 I,
2

R
P |X| R 2N e 2N 2 ,

R > 0,

and, for each [0, 1 ),

 N
 2 |X|2 
1 ()2 2 .
EP e 2

 2 |X|2 
< for some (0, ) and EP [X] = 0, then X
Conversely, if A EP e 2

2(1+A)
. In particular, if X is a bounded random
is -sub-Gaussian when =

variable with mean value 0, then X is -sub-Gaussian with 2kXkL (P;RN ) .


Finally, if X1 , . . . , Xn are independent random variables, and,
Pnfor each 1
m n, Xm is m -sub-Gaussian,
then
for
any
a
,
.
.
.
,
a

R,
1
n
m=1 am Xm is
pPn
2
-sub-Gaussian when =
m=1 (am m ) .

Proof: Since the moment generating function of the sum of independent random variables is the product of the moment generating functions of the summands, the final assertion is essentially trivial.
To prove the first assertion, use Lebesgues Dominated Convergence Theorem
to justify
2 t2
 




e 2 1
1
P t(e,X)RN
=0
E (e, X)RN = lim t
E e
1 lim
t&0
t&0
t

2.3 Some Extensions of The Central Limit Theorem

89

and




2 t2


EP et(e,X)RN + EP et(e,X)RN 2
e 2 1
2
= 2
2 lim
E (e, X)RN = lim
t&0
t&0
t2
t2
P

for any e SN 1 . Next, from


P (e, X)RN R) e

tR



 t(e,X) N 
2 t2
R
E e
exp tR +
2
P

for any 0 and e SN 1 , one gets P (e, X)RN R) e


over t 0. Since

R2
2 2

by minimizing



1
P |X| R 2N max P (e, X)| RN N 2 R ,
eSN 1

 2 |X|2 
, use
the estimate for P(|X| R) follows. To get the estimate on EP e 2
Tonellis Theorem to see that
Z
Z
 N


 2 |X|2 
2 ||2
=
EP e(,X)RN 0,2 I (d)
e 2 0,2 I (d) = 1()2 2 .
EP e 2
R

 2 |X|2 
< for some (0, ) and that EP [X]
Now assume that A = EP e 2
= 0. Then
1



||2 P  2 |||X| 
E |X| e
(1 t)EP (, X)2RN et(,X)RN dt 1 +
2
0


||2
A||2
||2 ||22
||2 ||22 P  2 2 |X|2 
4
2 ,

e
1+
1+A 2 e
e E |X| e
1+
2



EP e(,X)RN = 1 +

from which it is clear that X is -sub-Gaussian for the prescribed . In par 2 |X|2 
2 K 2
e 2 for all 0,
ticular, if K = kXkL (P;RN ) (0, ), then EP e 2
and
X has mean value 0, then X is -sub-Gaussian
for =
p so, if, in addition,

t1 (1 + etK 2 ) for all t > 0. Taking t = K 2 , we see that = K 1 + e 2K.


When kXkL (P;RN ) = 0, there is nothing to do. 

By combining Lemmas 2.3.15 and 2.3.18 with Theorem 2.3.8, we get the following.
Theorem 2.3.19. Working in the setting and with the notation in Theorem
2.3.8, assume that, for each n Z+ ,


2
EP e(,Xn )RN en || ,

RN ,

90

2 The Central Limit Theorem

where n (0, ). If
pPn

m=1

sup

2
m

n1

< ,

then (2.3.12) holds for any C(RN ; C) satisfying


2 |y|2

|(y)| Ce 2 , y RN ,

for some C < and 0, 1 . In particular, if the Xn s are identically
 2
2
distributed with covariance C and if EP e |X1 | < for some (0, ),
then, for any C(RN ; C),




1
lim |y|2 log 1 + |(y)| = 0 = lim EP n 2 Sn = h, 0,C i.
n

|y|

Exercises for 2.3


Exercise 2.3.20. Here is a proof of Fellers part of the LindebergFeller
Theorem. Referring to Theorem 2.1.4 and the discussion proceeding it, assume
that, as n ,

2

rn 0 and EP e 1Sn e 2 for all R.

(i) Show that



h Xm i 2 2


m
,
1 EP e 1 n
2
and conclude that, for each R > 0, there is an NR such that

h Xm i 1


for n NR and || R.
max 1 EP e 1 n
1mn
2

P (1)k
for C
(ii) Take the branch of the logarithm
k
given by log = k=1
with |1 | < 1, and check that (1 ) + log |1 |2 for |1 | 12 .
Conclude first that

n
n
X h
i X
h Xm i R2 r2

X
m


n
log EP e 1 n
EP 1 e 1 n +



2
m=1

m=1

for n NR and || R, and then that



n
X
Xm
2
P
0

E 1 cos
n ()
n
2
m=1
uniformly for s in compacts.

Exercises for 2.3

91

(iii) Given  > 0, show that




n
n
X

2 X P 2
Xm
P
E Xm , |Xm | < n
, |Xm | < n
E 1 cos
2
2

n
n m=1
m=1

and that

n
X

2
2
gn ()
2
2



Xm
, |Xm | n 2 .
E 1 cos

n
m=1
P

Finally, combine these and apply (ii) to get limn 2 gn () 2 for all R.

Exercise 2.3.21. It is of some interest to know that the second moment


assumption can be removed from the hypotheses in Exercise 2.1.11 and that
the result there extends to Borel probability measures on RN .R To explain what
I have in mind, first use that exercise to see that if 2 = R x2 (dx) < ,
then = T2 = R N (0, 2 ). What I want to do now is remove the a
priori assumption that R x2 (dx) < . That is, I want to show that, for any
probability measure on R, = T2 N (0, 2 ) for some [0, ).
Since the = direction is obvious,
R and, by the discussion above, the =
direction is already covered when R x2 (dx) < , all that remains is to show
that
Z
(2.3.22)
= T2 =
x2 (dx) < .
R

See Exercise 2.4.33 for an interesting application of this result.


(i) Check (2.3.22) first under the condition that is symmetric (i.e., ()
= () for all BR ). Indeed, if is symmetric, show that
Z

() =
cos(x) (dx), R.
R

At the same time, show that


1
1 
() 2 ,
= T2 =
2 2 =

R.

Conclude from these two that


> 0 everywhere and that
Z

n
n
()2 , n N and R.
cos 2 2 x (dx) =
R

Finally, note that 1 x log x for x (0, 1], apply this to the preceding to
get
Z 


n 
n
(1) < , n N,
2
1 cos 2 2 x (dx) log
R

92
and arrive at

2 The Central Limit Theorem


Z


x2 (dx) 2 log
(1)

after an application of Fatous Lemma.


(ii) To complete the program, let be any solution to = T2 , and define by
ZZ
() =
1 (x y) (dx)(dy).
R2

R
Check that is symmetric and that = T2 . Hence, by (i), R x2 (dx) <
(in fact, is centered
normal). Finally, use this and part (i) of Exercise 1.4.27
R
to deduce that R x2 (dx) < .
(iii) Make the obvious extension of T2 to Borel probability measures on RN .
That is,


ZZ
x+y
(dx)(dy) for BRN .
T2 () =
1
1
22
RN RN

Using the result just proved when N = 1, show that = T2 if and only if
= 0,C for some non-negative definite, symmetric C.
Exercise 2.3.23. In connection with the preceding exercise, define T for
(0, ) and Borel probability measures on RN , so that
ZZ

1
T () =
1 2 (x + y) (dx)(dy), BRN .
RN RN

The problem under consideration here is that of determining for which s there
exist nontrivial (i.e., 6= 0 ) solutions to the fixed point equation = T .
Begin by reducing the problem to the case when N = 1. Next, repeat the initial
argument given in part (ii) of Exercise 2.3.21 to see that there is some solution
if and only if there is one that is symmetric. Assuming that is a non-trivial,
symmetric solution, use the reasoning in part (i) there to see that

Z
if (0, 2)
2
x (dx) =
0 if (2, ).
R
In particular, when (2, ), there are no non-trivial solutions to = T .
(See 3.2.3 for more on this topic.)
Exercise 2.3.24. Return to the setting of Exercise 2.1.13. After noting that,
so long as e Sn1 , the distribution of

x Sn1 n 7 (e, x)Rn R

is independent of e, use Lemma 2.3.3 to prove that the assertion in (2.1.15)


follows as a consequence of the one in (2.1.14).

Exercises for 2.3

93

Exercise 2.3.25. Begin by checking the identity (cf. (1.3.20))




Z
2
s1
s+1
t
ts e 22 dt = 2 2 s+1
2
0

for all (0, ) and s (1, ). Use the preceding to see that, for each
p (0, ),
r


 p
p+1
2p
P
p if X N (0, 2 ).

(2.3.26)
E |X| =
2

The goal of the exercise is to show that the moments of sub-Gaussian random
variable display similar behavior.
(i) Suppose that X is -sub-Gaussian, and show that, for each p (0, ),

p
p


p
p
+1 .
= 2 2 +1
EP |X|p Kp p where Kp p2 2
2
2

(ii) Again suppose that X is -sub-Gaussian, and let 2 be its variance. Show
that
 2+|p2|
+



(1 p
2)
p
EP |X|p K4

for each p (0, ).


Hint: When p 2, the inequality is trivial. To prove it when p < 2, show that,
for any q (1, ),

1 
2qp  1
2 EP |X|p q EP |X| q1 q0 ,

where q 0 =

q
q1

is the H
older conjugate of q.

(iii) Suppose that X1 , . . . , Xn are independent and that, for each 1 m n,


2
Xm is m -sub-Gaussian and has variance m
. Given {a1 , . . . , an } R, set
v
v
u n
u n
n
X
uX
uX
(am m )2 ,
(am m )2 , and B = t
S=
am Xm , = t
m=1

m=1

m=1

and show that, for each p (0, ),


+
(1 p
2)
K4

2+|p2|



B p EP |S|p Kp B p .

In particular, if m = and m = for all 1 m n, then


+
(1 p
2)

K4

 2+|p2|



(A)p EP |S|p Kp (A)p ,

v
u n
uX
a2m .
where A = t
m=1

94

2 The Central Limit Theorem

(iv) The most famous case of the situation discussed in (iii) is when the Xm s are
symmetric Bernoulli (i.e., P(Xm = 1) = 12 ). First use (iii) in Exercise 1.3.17
or direct computation to check that Xm is 1-sub-Gaussian, and then conclude
that
+
(1 p
2)

(2.3.27)

K4

n
X

! p2
a2m

p #
" n
X


P
E
am Xm Kp

n
X

m=1

m=1

! p2
a2m

m=1

for all {a1 , . . . , an } R. This fact is known as Khinchines Inequality.


Exercise 2.3.28. Let X1 , . . . ,P
Xn be independent, symmetric (Exercise 1.4.26)
n
random variables, and set S = 1 Xm . Show that, for each p (0, ) (cf. part
(ii) in Exercise 2.3.25),

+
(1 p
2)

K4

EP

n
X

! p2
! p2
n
h
i
X
2
2
.
EP |S|p Kp EP
Xm
Xm
1

Hint: Refer to the beginning of the proof of Lemma 1.1.6, and let R1 , . . . , Rn be

the Rademacher functions on [0, 1), set Q = [0,1) P on [0, 1) , B[0,1) F ,
and observe that
n
X
7 S()
Xm ()
1

has the same distribution under P as


(t, ) [0, 1) 7 T (t, )

n
X

Rm (t)Xm ()

does under Q. Next, apply Khinchines inequality to see that, for each ,
+
(1 p
2)
K4

n
X
1

! p2
Xm ()2

[0,1)



T (t, ) p dt Kp

n
X

! p2
Xm ()2

and complete the proof by taking the P-integral of this with respect to .
At least when p (1, ), I will show later that this sort of inequality holds
in much greater generality. Specifically, see Burkholders Inequality in Theorem
6.3.6.
Exercise 2.3.29. Suppose that X is an RN -valued Gaussian random variable
with mean value 0 and covariance C.
(i) Show that if A : RN RN is a linear transformation, then AX is an
N (0, ACA> ) random variable, where A> is the adjoint transformation.

Exercises for 2.3

95

(ii) Given a linear subspace L of RN , let FL be the -algebra generated by


{(, X)RN : L}, and take LC to be the subspace of such that (, C)RN =
0 for all L. Show that FL is independent of FLC .
Hint: Show that, because of linearity, it suffices to check that


 



EP e 1(,X)RN e 1(,X)RN = EP e 1(,X)RN EP e 1(,X)RN

for all L and LC .


(iii)
that N = N1 + N2 , where Ni Z+ for i {1, 2}, write RN 3 x =
 Suppose

x(1)
RN1 RN2 , and take L = {x : x(1) = 0(1) }. Show that if is a
x(2)

linear transformation taking RN onto L that satisfies , C RN = 0 for all
RN and L, then > X is independent of (I > )X.
(iv) Write

C=

C(11)
C(21)

C(12)
C(22)


,

where the block structure corresponds to RN = RN1 RN2 , and assume that
C(22) is non-degenerate. Show that the one and only transformation of the
sort in part (iii) is given by

=
and therefore that
> =

0(11)
C1
(22) C(21)

0(11)
0(21)

0(12)
I(22)


,

C(12) C1
(22)
I(22)


.

Hint: Note
 that = 0 if (2) = 0(2) , = if (1) = 0(1) , and that
C(I ) (21) = 0(21) .
(v) Continuing with the assumption that C(22) is non-degenerate, show that

X=

C(12) C1
(22) Y
Y


+

Z
0


,

where Y is an RN2 -valued, N (0, C(22) )-random variable, Z is an RN1 -valued


N (0, B) random variable with B = C(11) C(12) C1
(22) C(21) , and Y is indepenN1
N2
dent of Z. Conclude
R that is
 that, for any
 measurable F : R R
P
bounded below, E F (X(1) , X(2) ) equals
Z
RN2

Z
RN1

F x(1) , x(2) C(12) C1

(22)


x(2) ,B (dx(1) ) 0,C(22) (dx(2) ).

96

2 The Central Limit Theorem

Exercise 2.3.30. Given h L2 (RN ; C), recall that the (n + 2)-fold convolution
h?(n+2) is a bounded continuous function for each n N. Next, assume that
h(x) = h(x) for almost every x RN and that h 0 off of BRN (0, 1). As an
application of part (iii) in Exercise 1.3.22, show that

"

?(n+2)
h
(x) 2khk2 2
L

(|x| 2)+
n
khk
exp

N
1
N
(R ;C)
L (R ;C)
2n

2 #

Hint: Note that h L1 (RN ; C), assume that M khkL1 (RN ;C) > 0, and define
Af = M 1 h ? f for f L2 (RN ; C). Show that A is a self-adjoint contraction on
L2 (RN ; C), check that

h?(n+2) (x) = M n Tx h, An h L2 (RN ;C) ,
where Tx h h( + x), and note that

Tx h, A` h L2 (RN ;C) = 0

if ` |x| 2.

2.4 An Application to Hermite Multipliers


This section does not really belong here and should probably be skipped by those
readers who want to restrict their attention to purely probabilistic matters. On
the other hand, for those who want to see how probability theory interacts
with other branches of mathematical analysis, the present section may come as
something of a revelation.
2.4.1. Hermite Multipliers. The topic of this section will be a class of
linear operators called Hermite multipliers, and what will be discussed are certain
boundedness properties of these operators. The setting is as follows. For n N,
define
(2.4.1)

Hn (x) = (1)n e

x2
2

dn  x2 
e 2 ,
dxn

x R.

Clearly, Hn is an nth order, real, monic (i.e., 1 is the coefficient of the highest
order term) polynomial. Moreover, if we define the raising operator A+ on
C 1 (R; C) by




x2
d
d  x2
e 2 (x) = (x) + x(x),
A+ (x) = e 2
dx
dx

then
(2.4.2)

Hn+1 = A+ Hn

for all n N.

x R,

2.4 An Application to Hermite Multipliers

97

At the same time, if and are continuously differentiable functions whose first
derivatives are tempered (i.e., have at most polynomial growth at infinity), then
(2.4.3)

, A+


L2 (

0,1 ;C)

= A ,


L2 (0,1 ;C)

where A is the lowering operator given by A =


(2.4.2) with (2.4.3), we see that, for all 0 m n,

Hm , Hn


L2 (0,1 ;C)

= Hm , An+ H0


L2 (0,1 ;C)

= An Hm , H0

d
dx .

After combining


L2 (0,1 ;C)

= m! m,n ,

where, at the last step, I have used the fact that Hm is a monic mth order
polynomial. Hence, the (normalized) Hermite polynomials

(1)n x2 dn  x2 
Hn (x)
e 2 ,
= e2
H n (x) =
dxn
n!
n!

x R,

form an orthonormal set in L2 (0,1 ; C). (Indeed, they are one choice of the
orthogonal polynomials relative to the Gauss weight.)
Lemma 2.4.4. For each C, set



2
,
H(x; ) = exp x
2

x R.

Then
(2.4.5)

H(x; ) =

X
n
Hn (x),
n!
n=0

x R,

where the convergence is both uniform on compact subsets of R C and, for s


in compact subsets of C, uniform in L2 (0,1 ; C). In particular, H n : n N is
an orthonormal basis in L2 (0,1 ; C).
x2

Proof: By (2.4.1) and Taylors expansion for the function e 2 , it is clear that
(2.4.5) holds for each (x, ) and that the convergence is uniform on compact
subsets of R C. Furthermore, because the Hn s are orthogonal, the asserted
uniform convergence in L2 (0,1 ; C) comes down to checking that

lim

n 2
X

Hn k2 2
L (0,1 ;C) = 0

||R n=m n!

sup

for every R (0, ), and obviously this follows from our earlier calculation that
2
Hn 2
= n!.
L ( ;C)
0,1

98

2 The Central Limit Theorem



To prove the assertion that H n : n N forms an orthonormal basis in
L2 (0,1 ; C), it suffices to check that any L2 (0,1 ; C) that is orthogonal to all
of the Hn s must be 0. But, because of the L2 (0,1 ; C) convergence in (2.4.5),
we would have that
Z
(x) ex 0,1 (dx) = 0, C,
R

for such a . Hence, if


(x) =

x2
2

(x)
,
2

x R,

then kkL1 (R;C) = kkL1 (0,1 ;C) kkL2 (0,1 ;C) < and (cf. (2.3.2)) 0,
which, by the L1 (R; C) Fourier inversion formula
Z

1
d &0
in L1 (R; C),
e|| e 1 x ()
2 R

means that and therefore vanish Lebesgue-almost everywhere.





Now that we know H n : n N is an orthonormal basis, I can uniquely
determine a normal operator H for each C by specifying that

H Hn = n Hn

for each n N.

The operator H is called the Hermite multiplier with parameter , and


clearly
)
(

X




2
Dom H = L2 (0,1 ; C) :
||2n , H n L2 (0,1 ;C) <
n=1

H =

n , H N


L2 (0,1 ;C)

H n,


Dom H .

n=0

In particular, H is a contraction if and only if is an element of the closed unit


disk D in C, and it is unitary precisely when S1 D. Also, the adjoint of
H is H , and so it is self-adjoint if and only if R.
As we are about to see, there are special choices of for which the corresponding Hermite multiplier has interesting alternative interpretations and unexpected
additional properties. For example, consider the Mehler kernel1
"
2
2 #
x 2xy + y
1

exp
M (x, y; ) =
2 1 2
1 2
1

This kernel appears in the 1866 article by Mehler referred to in the footnote following (2.1.14).

.
It arises there as the generating function for spherical harmonics on the sphere S

2.4 An Application to Hermite Multipliers

99

for (0, 1) and x, y R. By a straightforward Gaussian computation (i.e.,


complete the square in the exponential) one can easily check that
Z
H(y; ) M (x, y; ) 0,1 (dy) = H(x; )
R

for all (0, 1) and (x, ) R C. In conjunction with (2.4.5), this means that
Z
(2.4.6)

H =

M ( , y; ) (y) 0,1 (dy),

(0, 1) and L2 (0,1 ; C),

and from here it is not very difficult to prove the following properties of H for
(0, 1).
Lemma 2.4.7. For each L2 (0,1 ; C), (, x) (0, 1) R 7 H (x)
C may be chosen to be a continuous function that is non-negative if 0
Lebesgue-almost everywhere. In addition, for each (0, 1) and every p
[1, ],


H p
L (

(2.4.8)

0,1 ;C)

kkLp (0,1 ;C) .

Proof: The first assertions are immediate consequences of the representation


in (2.4.6). To prove the second assertion, observe that H 1 = 1 and therefore,
as a special case of (2.4.6),
Z
M (x, y; ) 0,1 (dy) = 1

for all (0, 1) and x R.

Hence, by (2.4.6) and Jensens Inequality, for any p [1, ),




H (x) p

At the same time, by symmetry,


R, and therefore

M (x, y; ) |(y)|p 0,1 (dy).



H (x) p 0,1 (dx)

ZZ

M (x, y; ) 0,1 (dx) = 1 for all (, y) (0, 1)

M (x, y; ) |(y)| 0,1 (dx)0,1 (dy) =

||p d0,1 .

R
RR

Hence, (2.4.8) is now proved for p [1, ). The case when p = is even easier
and is left to the reader. 
The conclusions drawn in Lemma 2.4.7 from the Mehler representation in
(2.4.6) are interesting but not very deep (cf. Exercise 2.4.36). A deeper fact is

100

2 The Central Limit Theorem

the relationship between Hermite multipliers and the Fourier transform. For the
purposes of this analysis, it is best to define the Fourier operator F by
Z
 
(2.4.9)
Ff () =
e 1 2x f (x) dx, R,
R

for f L (R; C). The advantage


of this choice is that, without the introduction

of any further factors of 2, the Parseval Identity (cf. Exercise 2.4.37) becomes
the statement that F determines a unitary operator on L2 (R; C). In order to
relate F to Hermite multipliers, observe that, after analytically continuing the
result of another simple Gaussian computation,
Z
2
2
ex ex dx = e 4 for all C,
R

we see from (2.4.5) that


Z

X
p

2
n
2p x ex dx
e 1 2x Hn
n! R
n=0

n
p
p

2 X
(p 1)2
2p0 ,
pn Hn
+ 1 2p = e
=e
exp
n!
2
n=0

1
p
is the H
older conjugate of p and p 1 (p 1) 2 . Thus,
where p0 = p1
we have now proved that, for each p (1, ) and n N,
Z
p
p


2
2
2p0 x e .
2p x ex dx = pn Hn
(2.4.10)
e 1 2x Hn


In particular, when p = 2, (2.4.10) says that


n
1 hn ,
(2.4.11)
Fhn =

n N,

where hn is the nth (un-normalized) Hermite function given by


2
1 
(2.4.12)
hn (x) = Hn (4) 2 x ex , n N and x R.

More generally, (2.4.10) leads to the following relationship between F and


Hermite multipliers. Namely, for each p (1, ), define Up on Lp (0,1 ; C) by


2
1
1 
Up (x) = p 2p (2p) 2 x ex , x R.

It is then an easy matter to check that Up is an isometric surjection from


Lp (0,1 ; C) onto Lp (R; C). In addition, (2.4.10) can now be interpreted as the
statement that, for every p (1, ) and every polynomial ,
! 12
1
pp
.
(2.4.13)
F Up = Ap Up0 Hp where Ap
1
(p0 ) p0

See Exercise 2.4.35, where it is shown that Ap < 1 for p (0, 1).

2.4 An Application to Hermite Multipliers

101

2.4.2. Beckners Theorem. Having completed this brief introduction to


Hermite multipliers, I will now address a problem to which The Central Limit
Theorem has something to contribute. The problem is that of determining the
set of (, p, q) D (1, ) (0, ) with p q for which H determines a
contraction from Lp (0,1 ; C) into Lq (0,1 ; C). In view of the preceding discussion,
when (0, 1), a solution to this problem has implications for the Mehler
transform; and, when q = p0 , the solution tells us about the Fourier operator.
The role that The Central Limit Theorem plays in this analysis is hidden in the
following beautiful criterion, which was first discovered by Wm. Beckner.2
Theorem 2.4.14 (Beckner). Let D and 1 p q < be given. Then
(2.4.15)



H q
L (

0,1 ;C)

kkLp (0,1 ;C)

for all

L2 (0,1 ; C)

if

(2.4.16)

|1 |q + |1 + |q
2

 q1

|1 |p + |1 + |p
2

 p1

for every C.
That (2.4.16) implies (2.4.15) is trivial is quite remarkable. Indeed, it takes
a problem in infinite dimensional analysis and reduces it to a calculus question
about functions on the complex plane. Even though, as we will see later, this
reduction leads to highly non-trivial problems in calculus, Theorem 2.4.14 has
to be considered a major step toward understanding the contraction properties
of Hermite multipliers.3
The first step in the proof of Theorem 2.4.14 is to interpret (2.4.16) in operator theoretic language. For this
the standard Bernoulli
 purpose, let denote

probability measure on R, BR . That is, {1} = 12 . Next, use to denote
the function on R that is constantly equal to 1 and {1} to stand for the identity function on R (i.e., {1} (x) = x, x R). It is then clear that and
{1} constitute an orthonormal basis in L2 (; C); in fact, they are the orthogonal polynomials there. Hence, for each C, we can define the Bernoulli
multiplier K as the unique normal operator on L2 (; C) prescribed by


K F =
2

if F =

{1}

if F = {1}.

See Beckners Inequalities in Fourier analysis, Ann. Math., # 102 #1, pp. 159182 (1975).
Later, in his article Gaussian kernels have only Gaussian maximizers, Invent. Math. 12,
pp. 179208 (1990), E. Lieb essentially killed this line of research. His argument, which is
entirely different from the one discussed here, handles not only the Hermite multipliers but
essentially every operator whose kernel can be represented as the exponential of a second order
polynomial.
3

102

2 The Central Limit Theorem

Furthermore, (2.4.16) is equivalent to the statement that




K q
(2.4.17)
kkLp (,C) for all L2 (; C).
L (;C)

Indeed, it is obvious that (2.4.16) is equivalent to (2.4.17) restricted to s of


the form x R 7 1 + x as runs over C; and from this, together with
the observation that every element of L2 (; C) can be represented in the form
a + b{1} as (a, b) runs over C2 , one quickly concludes that (2.4.16) implies
(2.4.17) for general L2 (; C).
I next want to show that (2.4.17) can be parlayed into a seemingly more
general statement. To this end, define the n-fold tensor product operator Kn
on L2 ( n ; C) as follows. For F {1, . . . , n} set F 1 if F = and define
Y

F (x) =
{1} (xj ) for x = x1 , . . . , xn Rn
jF



if F 6= . Note that F : F {1, . . . , n} is an orthonormal basis for L2 ( n ; C),
and define Kn to be the unique normal operator on L2 ( n ; C) for which
Kn F = |F | F ,

(2.4.18)

F {1, . . . , n},

where |F | is used here to denote the number of elements in the set F . Alternatively, one can describe Kninductively on n Z+ by saying that K1 = K
and that, for C Rn+1 ; C and (x, y) Rn R,
 (n+1) 




K
(x, y) = K (x, ) (y) where (x, y) = Kn ( , y) (x).
It is this alternative description that makes it easiest to see the extension
of (2.4.17) alluded to above. Namely, what I will now show is that, for every
n Z+ ,


(2.4.19)
(2.4.17) = Kn q n kkLp ( n ;C) , L2 ( n ; C).

L ( ;C)

Obviously, there is nothing to do when n = 1. Next, assume (2.4.19) for n,


let C Rn+1 ; C be given, and define as in the second description of
(n+1)
K
. Then, by (2.4.17) applied to (x, ) for each x Rn and by the
induction hypothesis applied to ( , y) for each y R, we have that

Z Z
(n+1) q
 q

K
q n+1 =

K
(x,

)
(y)
(dy)
n (dx)

L (
;C)
Rn

Z

 pq

|(x, y)| (dy)


Rn



|( , y)|p

Rn

Z

Z

(dx) =

n

Rn

Z

L p ( n ;C)

pq

p

( , y) (dy)
q

L p ( n ;C)

Z
 pq
 pq

p


=
( , y) Lq ( n ;C) (dy)
(dy)
R

 pq


( , y) p p n (dy)
= kkqLp ( n+1 ;C) ,
L ( ;C)

2.4 An Application to Hermite Multipliers

103

where, in the passage to the third line, I have used the continuous form of
Minkowskis Inequality (it is at this point that the only essential use of the
hypothesis p q is made).
I am now ready to take the main step in the proof of Theorem 2.4.14.
Lemma 2.4.20. Define An : L2 (; C) L2 n ; C) by


An (x) =


 Pn
`=1 x`

for x Rn .

Then, for every pair of tempered and from C(R; C),




(2.4.21)
kkLp (0,1 ;C) = lim An Lp ( n ;C) for every p [1, )
n

and
(2.4.22)

H ,


L2 (0,1 ;C)



= lim Kn An , An
n

L2 ( n ;C)

for every (0, 1). Moreover, if, in addition, either or is a polynomial, then
(2.4.22) continues to hold for all C.
Proof: Let and be tempered elements of C(R; C), and define
fn () = Kn An , An


L2 ( n ;C)

and f () = H ,


L2 (0,1 ;C)

for n Z+ and C. I begin by showing that


(2.4.23)

(0, 1).

lim fn () = f (),

Notice that (2.4.23) is (2.4.22) for (0, 1) and that In (2.4.21) follows from
(2.4.22) with = 1, = ||p , and any (0, 1).
In order to prove (2.4.23), I will need to introduce other expressions for f ()
and the fn ()s. To this end, set

C =


,

and, using (2.4.6), observe (cf. (2.3.6)) that


Z
f () =
(x) (y) 0,C (dx dy).
R2

Next, let, for each x R\{0},


define k (x, ) to be the probability measure on R

such that k x, {sgnx} = 1
2 , and set k (0, ) = . Then it is easy to check

104

2 The Central Limit Theorem

R
R
that R {0} (y) k (1, dy) =R {0} (1) and R {1} (y) k (1, dy) = {1} (1)
and therefore K (1) = R (y) k (1, dy) for all . Hence, if be the
probability measure on R2 determined by (dx dy) = k (x, dy) (dx) or,
equivalently,


and {(1, 1)} = 1
{(1, 1)} = 1+
4 ,
4

then
K ,


L2 (;C)

(x) (y) (dx dy).

=
R2

Proceeding by induction, it follows that


Z
Z


n
K , 2
=

(x) (y) (dx1 dy1 ) (dxn dyn )


L (;C)

R2

R2

for all , C(Rn ; C). Hence, if (cf. Exercise 1.1.14) = R2


Z+
and P =
, then
fn () = E

Z+

, F = B ,


  Pn
1 Zm

,
F
n

where F (z) (x) (y) for z = (x, y) R2 and Zn () = zn , n Z+ , when


= (z1 , . . . , zn , . . . ) . Further, under P , the Zn s are mutually independent,
identically distributed R2 -valued random variables with mean value 0 and covariance C . In addition, Z1 is bounded, and therefore the last part of Theorem
2.3.19 applies and guarantees that (2.4.23) holds.
To complete the proof, suppose that is a polynomial of degree k. It is then
an easy matter to check that

An , F L2 ( n ;C) = 0 if |F | > k,

and therefore (cf. (2.4.18)) C 7 fn () C is also a polynomial of degree


no more than k. Moreover, because




X |F |



fn () =

An , F L2 ( n ;C) F , An L2 ( n ;C) ,


F

we also know that








fn () || 1 k An 2 n An 2 n ,
L ( ;C)
L ( ;C)

n Z+ and C.

Hence, because of (2.4.21) with p = 2, {fn : n Z+ } is a family of entire


functions on C that are uniformly bounded on compact subsets. At the same

2.4 An Application to Hermite Multipliers

105

time, because (, Hm )L2 (0,1 ;C) = 0 for m > k, f is also a polynomial of degree
at most k, and therefore (2.4.23) already implies that the convergence extends
to the whole of C and is uniform on compacts. Finally, in the case when ,
instead of , is a polynomial, simply note that


Kn An , An

and H ,


L2 (

0,1 ;C)


L2 ( n ;C)

= H,



= Kn
An , An


L2 (0,1 ;C)

L2 ( n ;C)

, and apply the preceding.

Proof of Theorem 2.4.14: Assume that (2.4.16) holds for a given pair 1 <
p q < and D. We then know that (2.4.19) holds for every n Z+ .
Hence, by Lemma 2.4.20, if and are tempered elements of C(R; C) and at
least one of them is a polynomial, then










H , L2 (0,1 ;C) = lim Kn An , An 2 n
n

L ( ;C)





lim An Lp ( n ;C) An Lq0 ( n ;C) = kkLp (0,1 ;C) kkLq0 (0,1 ;C) .
n

In other words, we now know that, for all tempered and from C(R; C),





(2.4.24)
H
,


kkLp (0,1 ;C) kkLq0 (0,1 ;C)
L2 (0,1 ;C)
so long as one or the other is a polynomial.
To complete the proof when p (1, 2], note that, for any fixed polynomial
, (2.4.24) for every tempered C(R; C) guarantees that the inequality in
(2.4.15) holds for that . At the same time, because p (1, 2] and the polynomials are dense in L2 (0,1 ; C), (2.4.15) follows immediately from its own restriction
to polynomials.
Finally, assume that p [2, ) and therefore that q 0 (1, 2]. Then, again
because the polynomials are dense in L2 (0,1 ; C), (2.4.24) for a fixed tempered
C(R; C) and all polynomials implies (2.4.15) first for all tempered continuous s and thence for all L2 (0,1 ; C). 
2.4.3. Applications of Beckners Theorem. I will now apply Theorem
2.4.14 to two important examples. The first example involves the case when
(0, 1) and shows that the contraction property proved in Lemma 2.4.7 can
be improved to say that, for each p (1, ) and (0, 1), there is a q =
q(p, ) (p, ) such that H is a contraction on Lp (0,1 ; C) into Lq (0,1 ; C).
Such an operator is said to be hypercontractive, and the fact that H is
hypercontractive was first proved by E. Nelson in connection with his renowned
construction of a non-trivial, two-dimensional quantum field.4 The proof that
4

Nelsons own proof appeared in his The free Markov field, J. Fnal. Anal. 12, pp. 1221
(1974).

106

2 The Central Limit Theorem

I will give is entirely different from Nelsons and is much closer to the ideas
introduced by L. Gross5 as they were developed by Beckner.
Theorem 2.4.25 (Nelson). Let (0, 1) and p (1, ) be given, and set
q(p, ) = 1 +

p1
.
2

Then


H q
kkLp (0,1 ;C) ,
L (0,1 ;C)

(2.4.26)

L2 (0,1 ; C),

for every 1 q q(p, ). Moreover, if q > q(p, ), then


n

sup H Lq (

(2.4.27)

0,1 ;C)

o
: kkLp (0,1 ;C) = 1 = .

Proof: I will leave the proof of (2.4.27) as an exercise. (Try taking s of


2
the form ex .) Also, because 0,1 is a probability measure and therefore the
left-hand side of (2.4.26) is non-decreasing as a function of q, I will restrict my
attention to the proof of (2.4.26) for q = q(p, ). Hence, by Theorem 2.4.14, what
I have to do is prove (2.4.16) for every 1 < p < q < and (0, 1) that are
related by

(2.4.28)

p1
q1

 12

I begin with the case when 1 < p < q 2, and I will first consider [0, 1).
Introducing the generalized binomial coefficients
 
r
r(r 1) (r ` + 1)

`!
`

one can write

for r R and ` N,



X
q
|1 |q + |1 + |q
=1+
()2k
2
2k
k=1

and



X
p
|1 |p + |1 + |p
=1+
2k .
2
2k
k=1

See Grosss Logarithmic Sobolev inequalities, Amer. J. Math. 97 #4, pp. 10611083
(1975). In this paper, Gross introduced the idea of proving estimates on H from the corresponding estimates for K . In this connection, have a look at Exercises 2.4.39 and 2.4.41.

2.4 An Application to Hermite Multipliers

107


q
Noting that, because q 2, 2k
0 for every k Z+ , and using the fact that,
p
because pq (0, 1), (1 + x) q 1 + pq x for all x 0, we see that

|1 |q + |1 + |q
2

 pq



pX q
()2k .
1+
q
2k
k=1

Hence, I will have completed the case under consideration once I check that





X
p
pX q
2k
()
2k ,
q
2k
2k
k=1

k=1

and clearly this will follow if I show that

 
 
p
p q
2k

q 2k
2k

for each k Z+ .

But the choice of in (2.4.28) makes the preceding an equality when k = 1, and,
when k 2,
 2k
p q
2k1
Y jq
q 2k

1,

p
jp
2k
j=2

since 1 < p < q 2.


At this point, I have proved (2.4.15) for 1 < p < q 2 and given by (2.4.28)
when (0, 1). Continuing with this choice of p, q, and , note that (2.4.15)
extends immediately to [1, 1] by continuity and symmetry. Finally, for
general C, set
a=

|1 | + |1 + |
,
2

b=

|1 | |1 + |
,
2

and c =

b
[1, 1].
a

Then

|1 | = 1+
2 (1 ) +

1
2 (1


) a b,

and, therefore, by the preceding applied to c, we have that

1

1
|1 c|q + |1 + c|q q
|1 |q + |1 + |q q
a
2
2
1
1

1



|1 |p + |1 + |p p
|a b|p + |a + b|p p
|1 c|p + |1 + c|p p
.
=
=
a
2
2
2

Hence, I have now completed the case when 1 < p < q 2 and is given by
(2.4.28).

108

2 The Central Limit Theorem

To handle the other cases, I will use the equivalence of (2.4.16) and (2.4.17).
Thus, what we already know is that (2.4.17) holds for 1 < p < q 2 and the
in (2.4.28). Next, suppose that 2 p < q < . Then, since 1 < q 0 < p0 2 and

q0 1
p1
,
= 0
p 1
q1

an application to q 0 and p0 of the result that we already have yields







K q
= sup K , L2 (;C) : L2 (; C) with kkLq0 () = 1
L (;C)



= sup , K 2
: L2 (; C) with kkLq0 () = 1
L (;C)

kkLp (;C) ,
where the is the one given in (2.4.28). Thus, the only case that remains is the
1
1
one when 1 < p 2 q < . But, in this case, set = (p 1) 2 , = (q 1) 2 ,
and observe that, because the associated in (2.4.28) is the product of with
, K = K K and therefore





K q
K L2 (;C) kkLp (;C) . 
L (;C)
As my second, and final, application of Theorem 2.4.14, I present the theorem
of Beckner for which he concocted Theorem 2.4.14 in the first place. The result
was
originally by H. Weyl, who guessed, on the basis of Fh0 =
conjectured
0
n
( 1) h0 , that the norm kFkpp0 of F as an operator on Lp (R; C) to Lp (R; C)
should be achieved by h0 . Weyls conjecture was partially verified by I. Babenko,
who proved it when p0 is an even integer. In particular, when combined with
the RieszThorin Interpolation Theorem, Babenkos result already shows (cf.
Exercise 2.4.35) that kFkpp0 < 1 for p (0, 1).

Theorem 2.4.29 (Beckner). For each p [1, 2],


(2.4.30)

kFf kLp0 (R;C) Ap kf kLp (R;C) ,

f Lp (R; C) L2 (R; C),

where F is the Fourier operator in (2.4.9), A1 = 1, and Ap is the constant in


2
(2.4.13). Moreover, if f is the Gauss kernel ex , then (2.4.30) is an equality.
Proof: Because of (2.4.11), the second part is a straightforward computation
that I leave to the reader. Also, I will only consider (2.4.30) when p (1, 2), the
other cases being well known (cf. Exercise 2.4.37).
Because of (2.4.13), the proof of (2.4.30) comes down to showing that


Hp p0
(2.4.31)
kkLp (0,1 ;C) , Lp (0,1 ; C),
L ( ;C)
0,1

2.4 An Application to Hermite Multipliers


where p =

109

1 (p 1) 2 . Indeed, by (2.4.13), (2.4.31) implies that

kFUp kLp0 (R;C) Ap kkLp (0,1 ;C)

(2.4.32)

for all polynomials . Next, if L2 (0,1 ; C) and {n : n 1} is a sequence of


polynomials which tend to in L2 (0,1 ; C), then, because p (1, 2), it is easy to
check that n in Lp (0,1 ; C) and Up n Up in L2 (R; C); and therefore,
since F is a bounded on L2 (R; C), Fatous Lemma shows that (2.4.32) continues
to hold for all L2 (0,1 ; C). Now let f Cc (R; C), and set = Up1 f . Then,
(2.4.32) implies that (2.4.30) holds for f . Finally, if f L2 (R; C) Lp (R; C),
choose {fn : n 1} Cc (R; C) so that fn f in both L2 (R; C) and Lp (R; C),
and conclude that (2.4.30) continues to hold.
By Theorem 2.4.14, (2.4.31) will follow as soon as I prove (2.4.16) for p . For
this purpose, write

1
= + 1 (p 1) 2 , where , R.

Then, because p0 1 = (p 1)1 , proving (2.4.16) for p becomes the problem


of checking that
1
h
i p20 p0
i p20 h

2
2
2
+ 1 + + (p 1) 2

1 + (p 1)

(*)
h

2

+ (p 1)

i p2

1+

2

+ (p 1)

i p2 p1

for all , R.
To prove (*), consider,
(0, ), the function g : [0, )2 [0, )
 1for each
1 
defined by g (x, y) = x +y . It is an easy matter to check that g is concave
or convex depending on whether [1, ) or (0, 1). In particular, since
p0
2

(1, ), when we set =

2

p0
2,

p0

+ (p 1)

i p20

1+

2

+ (p 1)

i p20

2





g x , y + g x+ , y
x + x+
,y
g
=
2
2

x = |1 |p , and y = (p 1) 2 ||p , we get

|1 |p + |1 + |p
2

! 20
p

p20
+ (p 1) 2

110

2 The Central Limit Theorem

and similarly, because

p
2

2

(0, 1),
0

+ (p 1)

i p2

1+

2

+ (p 1)

i p2

2
"

|1 |p + |1 + |p
2

# p2

 p2

+ (p0 1) 2

Thus, (*) will be proved once I show that


0

(**)

|1 |p + |1 + |p
2

! 20
p

+(p1)

|1 |p + |1 + |p
2

 p2

+(p0 1) 2 .

But because (cf. Theorems 2.4.14 and 2.4.25) we know that (2.4.16) holds with
1
p replaced by 2, q = p0 , and = p 1 2 , the left side of (**) is dominated by
1

(p 1) +

1 (p0 1) 2

2

+ 1 + (p0 1) 2
2

2

= 1 + (p 1) 2 + (p0 1) 2 .

At the same time, again by (2.4.16), only this time with p, 2, and the same
choice of , we see that the right-hand side of (**) dominates
1

(p 1) +

1 (p 1) 2

2

+ 1 + (p 1) 2
2

2

= 1 + (p 1) 2 + (p0 1) 2 .

Exercises for 2.4


2
, let 1 and
Exercise 2.4.33. Define S : R2 R so that S(x1 , x2 ) = x1+x
2
2 be the natural projection maps given by i (x1 , x2 ) = xi for i {1, 2}, and
let R denote Lebesgue measure on R. The goal of this exercise is to prove that
if f : R R is a Borel measurable function with the property that

(2.4.34)

f S =

f 1 + f 2

2R -almost everywhere,

then there is an R such that f (x) = x for R1 -almost every x R. Here


are steps which one can take to prove this result.
(i) After noticing that (2.4.34) holds when R is replaced by 0,1 , apply Exercise
2.3.21 to see that the 0,1 -distribution of x
f (x) is 0, for some [0, ).
Conclude, in particular, that f L2 (0,1 ; R).

Exercises for 2.4

111


(ii) For each n 0, let Z (n) denote span {Hn 1 Hnm 2 : 0 m n} .
S
2
Show that Z (m) Z (n) in L2 (0,1
; R) when m 6= n and the span of n=0 Z (n)
2
2
is dense in L2 (0,1
; R). Conclude from these that if F L2 (0,1
; R), then F =
P
(n)
and the series
n=0 n F , where n denotes orthogonal projection onto Z
convergences in L2 (0,1 ; R).
(iii) Using the generating (2.4.5), show that
n  
X
n
n
Hm 1 Hnm 2 ,
Hn S = 2 2
m
m=0

and use this to conclude that for any L2 (0,1 ; R),



, Hn L2 (0,1 ;R)
Hn S.
n ( S) =
n!

(iv) Show that if L2 (0,1 ; R), then



n

1 + 2

, Hn


=


L2 (0,1 ;R)
1
2

2 n!


Hn 1 + Hn 2 .

(v) By combining (iii) and (iv), show that




f, Hn L2 ( ;R)
f, Hn L2 ( ;R)

0,1
0,1
Hn 1 + Hn 2 .
(*)
Hn S =
1
n!
2 2 n!

From this, show that f, Hn L2 (0,1 ;R) = 0 unless n = 1. When n = 0, this is

obvious. When n 2, one can argue that, if f, Hn L2 (0,1 ;R) 6= 0, then (*)
implies that
Hn0 1 = Hn0 2 , which is possible only if Hn0 is constant. Finally,
P
1
f, Hn L2 (0,1 ;R) Hn , it follows that
since f = n=0 n!

f (x) = f, H1

Z


L2 (0,1 ;R)

H1 (x) =


f () 0,1 (d) x

for 0,1 -almost every x R.


Exercise 2.4.35. Because the Fourier operator F (cf. (2.4.9)) is a contraction
from L1 (R; C) to L (R; C) as well as from L2 (R; C) into L2 (R; C), the Riesz
Thorin Interpolation Theorem guarantees that it is a contraction from Lp (R; C)
0
into Lp (R; C) for each p (0, 1). However, this is a case in which RieszThorin
gives a less than optimal result. Indeed, show that

t 12 , 1 7 log A 1t R

is a strictly convex function that tends to 0 at both end points and is therefore
strictly negative. Hence, Ap < 1 for p (1, 2).

112

2 The Central Limit Theorem

Exercise 2.4.36. The inequality in (2.4.8) is an example of a general principle.


Namely, if (E, B) is any measurable space, then a map (x, ) E B 7
(x, ) [0, 1] is called a transition probability whenever x E 7 (x, )
is B-measurable for each B and B 7 (x, ) is a probability measure
on (E, B) for each x E. Given a transition probability (x, ), define the linear
operator on B(E; C) (the space of bounded, B-measurable : E C) by
Z
 
(x) =
(y) (x, dy), x E, for B(E; C).
E

Check that takes B(E; C) into itself and that kku kku . Next, given a
-finite measure on (E, B), say that is -invariant if
Z
() =
(x, ) (dx) for all B.
E

Using Jensens Inequality, first show that, for each p [1, ),


  p 

(x) ||p (x), x E,
and then that, for any -invariant ,
kkLp (;C) kkLp (;C) ,

B(E; C).

Finally, show that is -invariant if it is -reversing in the sense that


Z
Z


x, 2 (dx) =
y, 1 (dy) for all 1 , 2 B.
1

Exercise 2.4.37. Recall the Hermite functions hn , n N, in (2.4.12) and define


the normalized Hermite functions hn , n N by
1

hn =

24
1

(n!) 2

hn ,

n N.

By noting that (cf. the discussion following (2.4.12)) hn = U2 H n , show that



hn : n N constitutes an orthonormal basis in L2 (R; C), and from this
together with (2.4.11), arrive at Parsevals Identity:

kFf kL2 (R;C) = kf kL2 (R;C) ,

f L1 (R; C) L2 (R; C),

and conclude that F determines a unique unitary operator F on L2 (R; C) such


that Ff = Ff for f L1 (R; C) L2 (R; C). Finally, use this to verify the L2  
 
1
(x) Ff (x), x R, for
where Ff
= F,
Fourier inversion formula F
f L1 (R; C) L2 (R; C).

Exercises for 2.4

113

Exercise 2.4.38. By the same reasoning as I used to prove Theorem 2.4.29,


show that, for any pair 1 < p 2 q < and any complex number =
+ 1 , (2.4.16) and therefore (2.4.15) hold if both (q 1) 2 + 2 1 and



(q 2)()2 1 2 (q 1) 2 (p 1) (q 1)2 2 .

Exercise 2.4.39. L. Gross had a somewhat different approach to the proof of


(2.4.26). As in the proof that I have given, he reduced everything to checking
(2.4.17). However, he did this in a different way. Namely, given b (0, 1), he
set f (x) = 1 + bx and introduced the functions


t
t
ft (x) Ket f (x) = 1+e2 f (x) + 1e2 f (x), (t, x) [0, ) R,

and q(t) = 1 + (p 1)e2t , t [0, ), and proved that



d
ft q(t)
0.
(*)
L
(;C)
dt
Following the steps below, see if you can reproduce Grosss calculation.

(i) Set
F (t) = kft kLq(t) (;C) ,
and, by somewhat tedious but completely elementary differential calculus, show
that

Z
q(t)

F (t)1q(t)
ft
q(t)
dF
d

q(t)

f
log
F
(t)
dt (t) =
q(t)2
R

Z

q(t)2
q(t)1
ft (x)
ft (x) ft (x) (dx) .
+ 2
R

Next, check that


Z

ft (x)q(t)1 ft (x) ft (x) (dx)
R
Z


1
ft (x)q(t)1 ft (x)q(t)1 ft (x) ft (x) (dx),
= 2
R

and, after verifying that


q

q1

q1

4(q 1) 2 2
( )
q2

2

, (0, ) and q (1, ),

conclude that
dF
dt

(**)


Z
q(t)

F (t)1q(t)
ft
q(t)
d

q(t)

f
log
(t)
F (t)
q(t)2
R

Z
q(t)
2

q(t)
2
+ q(t) 1
ft (x) 2 ft (x) (dx) .
R

(ii) Prove the Logarithmic Sobolev Inequality


Z
Z
2

2

2
d 2
(x) (x) (dx)
(2.4.40)
log kk 2
R

for strictly positive s on R.

L (;C)

114

2 The Central Limit Theorem

Hint: Reduce to the case when (x) = 1 + bx for some b (0, 1), and, in this
case, check that (2.4.40) is the elementary calculus inequality
(1 + b)2 log(1 + b) + (1 b)2 log(1 b) (1 + b2 ) log(1 + b2 ) 2b2 ,

b (0, 1).

(iii) By plugging (2.4.40) into (**), arrive at (*), and conclude that (2.4.17)
holds for (0, 1) and q = 1 + p1
2 .

Exercise 2.4.41. The major difference between Grosss and Beckners approaches to proving Nelsons Theorem 2.4.25 is that Gross based his proof on
the equivalence of contraction results like (2.4.17) and (2.4.15) to Logarithmic
Sobolev Inequalities like (2.4.40). In Exercise 2.4.38, I outlined how one passes
from a Logarithmic Sobolev Inequality to a contraction result. The object of this
exercise is to go in the opposite direction. Specifically, starting from (2.4.26),
show that
Z
(2.4.42)

log
R

2

kkL2 (

0,1 ;C)

Z
d0,1 2

|0 |2 0,1 (dx)

for non-negative, continuously differentiable L2 (0,1 ; C) \ {0} with 0


L2 (0,1 ; C). See Exercise 8.4.8 for another derivation.
Exercise 2.4.43. As an application of Theorem 2.4.25, show that
kHn kLp (0,1 ;C)

n!(p 1) for n N and p [2, ).

To see that this estimate is quite good, show that kH1 kpLp (0,1 ;C) =

22
1
2

and apply Stirlings formula (1.3.21) to conclude that kH1 kLp (0,1 ;C)
as p .

p+1
2

 p1

,
1

p1 2
e

Chapter 3
Infinitely Divisible Laws

The results in this chapter are an attempt to answer the following question.
GivenPan RN -valued random variable Y with the property that, for each n Z+ ,
n
Y = m=1 Xm , where X1 , . . . , Xn are independent and identically distributed,
what can one say about the distribution of Y?
Recall that the convolution 1 ? 2 of two finite Borel measures 1 and 2 on
RN is given by
ZZ
1 ? 2 () =
1 (x + y) 1 (dx)2 (dy), BRN ,
RN RN

and that the distribution of the sum of two independent random variables is the
convolution of their distributions. Thus, the analytic statement of our problem
is that of describing those probability measures that, for each n 1, can be
of some probability measure n1 .
written as the n-fold convolution power ?n
1
n

I will say that such a is infinitely divisible and will use I(RN ) to denote
the class of infinitely divisible measures on RN . Since the Fourier transform
takes convolution into ordinary multiplication, the Fourier formulation of this
problem is that of describing those Borel probability measures on RN whose
Fourier transform
has, for each n Z+ , an nth root which is again the Fourier
transform of a Borel probability measure on RN .
Not surprisingly, the Fourier formulation of the problem is, in many ways, the
most amenable to analysis, and it is the formulation in terms of which I will solve
it in this chapter. On the other hand, this formulation has the disadvantage that,
although it yields a quite satisfactory description of
, it leaves the problem
of extracting information about from properties of
. For this reason, the
following chapter will be devoted to developing a probabilistic understanding of
the analytic answer obtained in this chapter.
3.1 Convergence of Measures on RN
In order to carry out our program, I will need two important facts about the
convergence of probability measures on RN . The first of these is a minor modification of the classical HellyBray Theorem, and the second is an improvement,
due to Levy, of Lemma 2.3.3.
115

116

3 Infinitely Divisible Laws

Say that the sequence {n : n 1} M1 (RN ) converges weakly to


M1 (RN ) and write n = when h, n i h, i for all Cb (RN ; C), and
apply Lemma 2.3.3 to check that n = if and only if
cn ()
() for every
N
R .
3.1.1. Sequential Compactness in M1 (RN ). Given a subset S of M1 (RN ),
I will say that S is sequentially relatively compact if, for every sequence
{n : n 1} S, there a subsequence {nm : m 1} and a M1 (RN ) such
that nm = .
Theorem 3.1.1.
and only if

A subset S of M1 (RN ) is sequentially relatively compact if


lim sup B(0, R){ = 0.

(3.1.2)

R S

Proof: I begin by pointing out that there is a countable set {k : k Z+ }


Cc (RN ; R) of linear independent functions whose span is dense, with respect
 to
uniform convergence, in Cc (RN ; R). To see this, choose Cc RN ; [0, 1] so
that = 1 on B(0, 1) and 0 off B(0, 2), and set R (y) = (R1 y) for R >
0. Next, for each ` Z+ , apply the StoneWeierstrass Theorem to choose a
countable dense subset {j,` : j Z+ } of C B(0, 2`); R , and set j,` = ` j,` .
Clearly {j,` : (j, `) (Z+ )2 } is dense in Cc (RN ; R). Finally, using lexicographic
ordering of (Z+ )2 , extract a linearly independent subset {k : k Z+ } by taking
k = jk ,`k , where (j1 , `1 ) = (1, 1) and (jk+1 , `k+1 ) is the first (j, `) such that
j,` is linearly independent of {1 , . . . , k }.
Given a sequence {n : n 1} S, we can use a diagonalization procedure to
find a subsequence {nm : m 1} such that ak = limm hk , nm i exists for
every k Z+ . Next, define the linear functional on the span of {k : k Z+ }
PK
so that (k ) = ak . Notice that if = k=1 k k , then



K
X





k hk , nm i = lim h, nm i kku ,

m
m



() = lim

k=1

and similarly that () = limm h, nm i 0 if 0. Hence, admits a


unique extension as a non-negativity preserving linear functional on Cc (RN ; R)
that satisfies |()| kku for all Cc (RN ; R).
Now assume that (3.1.2) holds. For each ` Z+ , apply the Riesz Representation Theorem to produce a non-negative Borel measure ` supported on B(0, 2`)
so that h, ` i = (` ) for Cc (RN ; R). Since h, `+1 i = () = h, ` i
whenever vanishes off of B(0, `), it is clear that




`+1 B(0, ` + 1) `+1 B(0, `) = ` B(0, `)

for all BRN .

3.1 Convergence of Measures on RN

117

Hence, if


 X

() lim ` B(0, `) =
` B(0, `) \ B(0, ` 1) ,
`

`=1

then is a non-negative Borel measure on RN whose restriction to B(0, `) is `


for each ` Z+ . In particular, (RN ) 1 and h, i = limm h, nm i for
every Cc (RN ; R). Thus, by Lemma 2.1.7, all that remains is to check that
(RN ) = 1. But
(RN ) h` , i = lim h` , nm i lim nm B(0, `)
m
m

= 1 lim nm B(0, `){ ,

and, by (3.1.2), the final term tends to 0 as ` .


To prove the converse assertion, suppose that S is sequentially relatively compact. If (3.1.2) failed, then we could find an (0, 1) and, for each n Z+ ,
a n S such that n B(0, n) . By sequential relative compactness, this
would mean that there is a subsequence {nm : m 1} S and a M1 (RN )
such that nm = and nm B(0, nm ) . On the other hand, for any
R > 0,


B(0, R) hR , i lim nm B(0, nm ) ,
m


and so we would arrive at the contradiction 1 = limR B(0, R) . 
3.1.2. L
evys Continuity Theorem. My next goal is to find a test in terms
of the Fourier transform to determine when (3.1.2) holds.
Lemma 3.1.3. Define

1

s(r) = inf

sin


for r (0, ).

Then s is a strictly positive, non-decreasing, continuous function that tends to


0 as r & 0. Moreover, if M1 (RN ), then, for all (r, R) (0, )2 ,
(3.1.4)




1
(re) rR + 2 {y : |(e, y)RN | R} for all e SN 1 ,

and



1
B(0, N 2 R){ N sup {y : |(e, y)RN | R}
eSN 1

(3.1.5)




N
max 1
() : || r .
s(rR)

118

3 Infinitely Divisible Laws

In particular, for any S M1 (RN ), (3.1.2) holds if and only if




lim sup 1
() = 0.

(3.1.6)

||&0 S

Proof: Given (3.1.4) and (3.1.5),


the final assertion is obvious.


(3.1.4), simply observe that 1 e 1(re,y)RN 2 r|(e, y)RN | .
Turning to (3.1.5), note that



1
()

RN

To prove


1 cos(, y)RN (dy).

Thus, for each e SN 1 ,


1
r


1
(te) dt

Z
RN \{0}

sin r(e, y)RN


1
r(e, y)RN

!

(dy)


s(rR) {y : |(e, y)RN | R} ,
and therefore
(3.1.7)




() s(rR) {y : |(e, y)RN | R} .
sup 1
B(0,r)

Since the first inequality in (3.1.5) is obvious, there is nothing more to be


done. 
I am now ready to prove Levys crucial improvement to Lemma 2.3.3.
Theorem 3.1.8 (L
evys Continuity Theorem). Let {n : n 1}
M1 (RN ), and assume that f () = limn
n () exists for each RN . Then
N
there is a M 1 (R ) such that
if and only if there is a > 0 for which
f =
limn sup||
n () f () = 0, in which case n = . (See part (iv) of
Exercise 3.1.9 for another version.)
Proof: The only assertion not already covered by Lemmas 2.1.7 and 2.3.3 is
the if part of the equivalence. But, if
n f uniformly in a neighborhood of
0, then it is easy to check that supn1 |1
n ()| must tend to zero as || 0.
Hence, by the last part of Lemma 3.1.3 and Theorem 3.1.1, we know that there
exists a and a subsequence {nm : m 1} such that nm = . Since
must
equal f , Lemma 2.3.3 says that n = . 

Exercises for 3.1

119

Exercises for 3.1


Exercise 3.1.9. One might think that to address the sort of problem posed
at the beginning of this chapter, it would be helpful to know which functions
f : RN C are the Fourier transforms of a probability measure. Such a
characterization is the content of Bochners Theorem, whose proof will be
outlined in this exercise. Unfortunately, his characterization looks more useful
than it is in practice. For instance, I will not use it to solve our problem, and it
is difficult to see how its use would simplify matters.
In order to state Bochners Theorem, say that a function f : RN C is
N
non-negative
 definite if, for each n 1 and 1 , . . . , n R , the matrix
f (i j ) 1i,jn is Hermitian and non-negative definite. Equivalently,1
n
X

f (i j )i j 0

for all 1 , . . . , n C.

i,j=1

Then Bochners Theorem is the statement that f =


for some M1 (RN ) if
and only if f (0) = 1 and f is a continuous, non-negative definite function.
(i) It is ironic that the necessity assertion is the more useful even though it is
nearly trivial. Indeed, if f =
, then it is obvious that f (0) = 1 and that f is
continuous. To see that it is also non-negative definite, write
n
X

1(i j ,x)RN

2

n

X



1(
,x)
i
RN ,
i j =
e
i


i=1

i,j=1

and integrate in x with respect to .


(ii) The first step in proving the sufficiency is to use
the
non-negative definiteness assumption to show that f (x) = f (x) and f (x) f (0) for all x RN .
Obviously, this proves that kf ku 1. Second, using a standard Riemann approximation procedure and the continuity of f , check that, for any rapidly decreasing,
continuous : RN C,
ZZ

()
dx d 0.
f (x )(x)
RN RN


In particular, when f L1 RN ; C , set
N

m(x) = (2)

1 (x,)RN

f () d,

RN
1

Recall that a non-negative definite operator on a complex Hilbert space is always Hermitian.

120

3 Infinitely Divisible Laws

and use Parsevals Identity and Fubinis Theorem, together with elementary
manipulations, to arrive at
Z
ZZ

()
d d 0
(2)N
m(x) (x)2 dx =
f ( )()
RN

RN RN

for all L1 (RN ; R) Cb (RN ; R) with L1 (RN ; R). Conclude that m is non
negative, and use this to complete the proof in the case when f L1 RN ; C .

(iii) It remains only to pass from the case when f L1 RN ; C to the general
|x|2

case. For each t (0, ), set ft (x) = et 2 f (x). Clearly, ft (0) = 1 and
ft Cb (RN ; C) L1 (RN ; C). In addition, show that

Z
n
n
X
X



ft i j i j =
f i j i (x)j (x) 0,tI (dx) 0,
RN

i,j=1

i,j=1

where i (x) i e 1 (i ,x)RN . Hence, ft is also non-negative definite, and so,


by part (ii), we know thatft = bt for some t M1 (RN ). Finally, apply Levys
Continuity Theorem to see that t =, where M1 (RN ) satisfies f =
.

(iv) Let {n : n 1} and f be as in Theorem 3.1.8. Combining Bochners


Theorem with Lemma 2.1.7, show that there exists a M1 (RN ) such that
f =
and n = if and only if f is continuous.
Exercise 3.1.10. Suppose that f is a non-negative definite function with f (0) =
1. As we have just seen, if f is continuous, then f =
for some M1 (RN ).
(i) Assuming that f =
, show that
(*)

kf ku 1



and |f () f ()|2 2 1 Re f ( ) ,

, RN .

Next, show that (*) follows directly from non-negative definiteness, whether
or not f is continuous. Thus, a non-negative definite function is uniformly
continuous everywhere if it is continuous at the origin.
Hint: Both parts of (*) follow from the fact that

f ()
f ()
1
1
f ( )
A = f ()

f ()

f ( )

is non-negative
definite. To get the second part, consider the quadratic form

v, Av C3 with v = (v1 , 1, 1).2
2

This choice of v was suggested to me by Linan Chen.

Exercises for 3.1

121

(ii) To understand how essential a role continuity plays in Bochners criterion,


show that f = 1{0} is non-negative definite. Even though this f cannot be the
Fourier transform of any M1 (RN ), it is nonetheless the Fourier transform
of a non-negativity preserving linear functional, one for which there is no Riesz
representation. To be more precise, consider the linear functional on the space
of functions Cb (RN ; C) for which
Z
1
(x) dx exists,
lim
R |B(0, R)| B(0,R)

and show that f () = (e ), where e (x) = e

1(,x)RN

Exercise 3.1.11. It is important to recognize the extent to which Levys Continuity Theorem and, as a by-product, Bochners Theorem, are strictly finite
dimensional results. For example, let H be an infinite dimensional, separa2
1
ble, real Hilbert space, and define f (h) = e 2 khkH . Obviously, f is a continuous and f (0)= 1. Show that it is also non-negative definite in the sense
that f (hi hj ) 1i,jn is a non-negative definite, Hermitian matrix for each
n Z+ and h1 , . . . , hn H. Now suppose that there were a Borel probability
measure on H such that
Z

(h)
e 1(h,x)H (dx) = f (h), h H.
H

Show that, for any orthonormal basis {ei : i Z+ } in H, the functions Xi (h) =
(ei , h)H , i Z+ , would be, under , a sequence of independent, N (0, 1)-random
variables, and conclude from this that
Z
Y

2
2
ekhkH (dh) =
E eXi = 0.
H

iZ+

Hence, no such can exist. See Chapter 8 for a much more thorough account
of this topic.
Hint: The non-negative definiteness of f can be seen as a consequence of the
analogous result for Rn .
Exercise 3.1.12. The RiemannLebesgue Lemma says that f() 0
as || if f L1 (RN ; C). Thus
() 0 as || if M1 (R)
is absolutely continuous. In this exercise we will examine situations in which
M1 (R) but
()
6 0 as || .
(i) Given a symmetric M1 (R), show that
is real valued, and use Bochners
Theorem to show that
() cannot tend to a strictly negative number as || .
Hint: Let > 0, and suppose that
() 2 as || . Choose R > 0
so that
()

for
||

R
and
n Z+ so that (n 1) > 1. Set A =


(`R kR) 1k,`n , and show that A cannot be non-negative definite.

122

3 Infinitely Divisible Laws

(ii) Show that


()
6 0 if has an atom (i.e., ({x}) > 0 for some x R).
Hint: Reduce to the case in which is symmetric, and therefore that = p0 +
q, where p (0, 1], q = 1 p, and M1 (R) is symmetric. If p = 1,
() = 1
for all . If p (0, 1), then
() 0 as || implies () pq < 0.

(iii) To produce an example that is non-atomic, refer to Exercise 1.4.29, take


p (0, 1) \ { 12 }, and let = p , where p is the measure described in that
exercise. Show that is a non-atomic element of M1 (R) for which

6
0 as
|| .
Hint: Show that
never vanishes and that
(2m ) is independent of m Z+ .

3.2 The L
evyKhinchine Formula
Throughout, I(R ) will be the set of M1 (RN ) that are infinitely divisible.
My strategy for characterizing I(RN ) will be to start from an easily understood
subset of I(RN ) and to get the rest by taking weak limits.
The elements of I(RN ) that first come to mind are the Gaussian measures
(cf. (2.3.6)) m,C . Indeed, if m RN and C is a symmetric, non-negative
definite transformation on RN , then it is clear from (2.3.7) that m,C = ?n
m C.
n ,n
Unfortunately, this is not a good starting place because it is too rigid: limits of
Gaussians are again Gaussian. Indeed, suppose that mn ,Cn = . Then
N

1 (,mn )RN 12 (,Cn )RN


()

for all RN ,

and so = m,C , where m = limn mn and C = limn Cn . In other words,


one cannot use weak convergence to escape the class of Gaussian measures.
A more fruitful choice is to start with the Poisson measures. Recall that if
is a probability measure on RN and [0, ), then the Poisson measure
with jump distribution and jumping rate (see 4.2 for an explanation of this
terminology) is the measure
, = e

X
n ?n
.
n!
n=0

To see that , is infinitely divisible, note that



 Z

1 (,y)RN

1
(dy)
,
d
()
=
exp

e
,

. To see why the Poisson measures provide a


and therefore that , = ?n

n ,
more hopeful choice of starting point, let m RN and a non-negative definite,
symmetric C be given, and choose (e1 , . . . , eN ) to be p
an orthonormal basis of
eigenvectors for C. Next, set mi = (m, ei )RN and i = (ei , Cei )RN , and take
!
N
N
X

1X
1
.
i ei + i ei
mi ei +
n =
n
2 i=1 n
2N i=1 n

3.2 The LevyKhinchine Formula

123

Then the Fourier transform of 2N n,n is


exp

N
X

n e

1mi (,ei ) N
R
n

i=1

!
N


X
i (, ei )RN
1
,
1 +
n cos
1
n2
i=1


which tends to [
m,C () as n , and so 2N n,n = m,C as n . Thus,
one can use weak convergence to break out to the class of Poisson measures.
As I will show in the next subsection, the preceding is a special case of a
result (cf. Theorem 3.2.7) that says that every infinitely divisible measure is the
weak limit of Poisson measures. However, before proving that result, it will be
convenient to alter our description of Poisson measures. For one thing, it should
be clear that, without loss in generality, I may always assume that the jump
distribution assigns no mass to 0. Indeed, if ({0}) = 1, then , = 0 = 0, 0
no matter how and 0 are chosen. If = ({0}) (0, 1), then , = 0 , 0 ,
where 0 = (1 ) and 0 = (1 )1 ( 0 ). In addition, although the
segregation of the rate and jumping distribution provides probabilistic insight,
there is no essential reason for doing so. Thus, nothing is lost if one replaces
, by M , where M is the finite measure , in which case

Z

d
M () = exp

1(,y)RN


1 M (dy) .


With these considerations in mind, let M0 (RN ) be the space of non-negative,


finite Borel measures M on RN with M ({0}) = 0, and set P(RN ) = {M : M
M0 (RN )}, the space of Poisson measures on RN .
3.2.1. I(RN ) Is the Closure of P(RN ). Let P(RN ) be the closure of P(RN )
under weak convergence. That is, P(RN ) if and only if there exists a sequence
{Mn : n 1} M0 (RN ) such that Mn =. My goal here is to prove that

I(RN ) = P(RN ).

(3.2.1)

Before turning to the proof of (3.2.1), I need the following simple lemma about
non-vanishing, C-valued functions. In its statement, and elsewhere,
(3.2.2)

log =

X
(1 )m
m
m=1

for C with |1 | < 1

is the principle branch of logarithm function on the open unit disk around 1 in
the complex plane.

Lemma 3.2.3. Let R (0, ) be given. If f C B(0, R); C \ {0} with

f (0) = 1, then there is a unique `f C B(0; R); C such that `f (0) = 0 and

124

3 Infinitely Divisible Laws



f = e`f . Moreover, if B(0; R), r (0, ), and 1

f ()
f ()

< 1 for all

B(, r) B(0, R), then, for each B(, r) B(0, R),

`f () `f () = log

f ()
,
f ()

and therefore





f
()

if 1
f ()

if f is a second element of C B(0; R); C \ {0} with
Finally,

f()
1 f () 12 for all B(0, R), then




f
()

|`f () `f ()| 2 1
f ()




()


f

` () `f () 2 1

f

f ()

1
.
2

f(0) = 1 and if

for B(0, R).


In particular, if {fn : n 1} C B(0, R); C \ {0} with fn (0) = 1 for all n 1,

and if fn f C B(0; R); C \ {0} uniformly on B(0, R), then f (0) = 1 and
`fn `f uniformly on B(0; R).

Proof: To prove the existence and uniqueness of `f , begin by observing that


there exists an M Z+ and 0 = r0 < r1 < < rM = R such that




1

f
()
1 

for 1 m M and B(0, rm ) \ B(0, rm1 ).

2


f rm1
||

Thus, we can define a function `f on B(0, R) so that `f (0) = 0 and


`f () = `f

rm1
||


+ log

f ()


rm1
||

if 1 m M and B(0, rm ) \ B(0, rm1 ).

Furthermore, working by induction on 1 m M , one sees that this `f is



continuous and satisfies f = e`f . Finally, for any ` C B(0, R); C satisfying

`(0) = 0 and f = e` , ( 12)1 (` `f ) is a continuous, Z-valued function that


vanishes at 0, and therefore ` = `f .
Next suppose that B(0, R) and that




1 f () < 1 for all B(, r) B(0, R).

f ()

3.2 The LevyKhinchine Formula


Set
`() = `f () + log

f ()
f ()

125

for B(, r) B(0, R),

and check that


( 12)1 `() `f () is a continuous, Z-valued function that vanishes at . Hence, ` = `f on B(0, R) B(, r), and therefore on
B(0, R) B(, r). Since | log(1 )| 2|| if || 12 , this completes the proof
of the asserted properties of `f .



1
Turning to the comparison between `f and `f when 1 ff ()
() 2 for all

B(0, R), set `() = `f () + log

f()
f () ,

check that `(0) = 0 and f = e` , and

conclude that `f `f = log ff . From this, the asserted estimate for |`f `f | is
immediate. 

Lemma 3.2.4. Define r


s(r) as in Lemma 3.1.3, and let M1 (RN ) and
0 < r < R be given. If |1
()| 12 for all B(0, r) and there is an
M1 (RN ) such that = ?n for some

(3.2.5)

16

r
4R

,

then |
()| 2n for all B(0, R).

Proof: First apply Lemma 3.2.3 to see that, because


() = ()n , neither

nor vanishes anywhere on B(0, r) and therefore that there are unique `, `


= e` , and = e` on B(0, r).


C B(0, r); C such that `(0) = 0 = `(0),

Further, since
= en` , uniqueness requires that ` = n1 `. Next, observe that,
because ` = log
and |1
| 12 on B(0, r), |`| 2 there. Hence, because

1
`
Re` 0, |1 | = 1 e n n2 on B(0, r). Using this in (3.1.7), we have, for
any > 0 and e SN 1 , that



2
1
,
max 1 ()
(3.2.6)
{y : |(e, y)RN | }
ns(r)
s(r) B(0,r)


4
for B(0, R). Finally take
which, by (3.1.4), leads to 1 () R + ns(r)
1
() = ()n to check that this gives the desired
= 4R , and use (3.2.5) and
conclusion. 
I now have everything that I need to prove the equality (3.2.1).

Theorem 3.2.7. For each I(RN ) there is a unique ` C(RN ; C) satis1


fying ` (0) = 0 and
= e` . Moreover, for each n Z+ , e n ` is the Fourier
In addition, if
transform of the unique n1 M1 (RN ) such that = ?n
1 .
n

Mn M0 (RN ) is defined by
(3.2.8)

Mn () n n1 (RN \ {0})

for BRN ,

126

3 Infinitely Divisible Laws

then Mn =. Finally, I(RN ) is closed in the sense that I(RN ) if there


exists a sequence {k : k 1} I(RN ) such that k = . In particular, n1
is uniquely determined and (3.2.1) holds.

Proof: Let I(RN ) be given. Since there is an r > 0 such that |1


()| 12
+
?n
for all B(0, r) and, for all n Z , = 1 for some n1 M1 (RN ),
n
Lemma 3.2.4 guarantees that
never vanishes. Hence, by Lemma 3.2.3, both
the existence and uniqueness of ` follow. Moreover, if = ?n
1 , then, from
n
n

() = cn1 () , we know first that cn1 never vanishes and then that ` = n`,
where ` is the unique element of C(RN ; C) satisfying `(0) = 0 and cn1 = e` . In
1

particular, this proves that n1 = e n ` for any n1 with = n


1 , and so there is
n
at most one such n1 .
Now define Mn as in the statement, and observe that





1
n ` () 1
1
e` () =
()
()

1
=
exp
n
e
d
()
=
exp
n

Mn
n

as n . Hence, Mn =. In particular, this proves that I(RN ) P(RN ),


and therefore, since we already know that P(RN ) I(RN ), the final statement
will follow once we check that I(RN ) is closed.
To prove that I(RN ) is closed, suppose that {k : k 1} I(RN ) and that
k = . The first step in checking that I(RN ) is to show that
never
vanishes. To this end, use the fact that
k
uniformly on compacts to see
that there must exist an r > 0 such that |1
k ()| 12 for all k Z+ and
B(0, r). Hence, because each of the k s is infinitely divisible, one can use
Lemma 3.2.4 to see that, for each R (0, ),

inf{|
k ()| : k Z+ and B(0, R)} > 0,

and clearly this is more than enough to show that


never vanishes. Thus we can
choose a unique ` C(RN ; C) so that `(0) = 0 and
= e` . Moreover, if `k = `k ,
then, by Lemma 3.2.3, `k ` uniformly on compacts. Now let n Z+ be given,
. Then we know that
and choose {k, n1 : k 1} M1 (RN ) so that k = ?n
k, 1
1

`
1 = e n k , and so, as k ,
k, n1 e n ` uniformly on compacts. Hence,

[
k, n
1

n1 for some n1 M1 (RN ). Since this


by Levys Continuity Theorem, e n ` =
N

means that = ?n
1 , we have shown that I(R ).
n

3.2.2. The Formula. Theorem 3.2.7 provides interesting information, but it


fails to provide a concrete characterization of the infinitely divisible laws. In this
subsection I will give an explicit formula for
when I(RN ), which, in view
of the first part of Theorem 3.2.7, is equivalent to characterizing the functions
in {` : I(RN )}.

3.2 The LevyKhinchine Formula

127

In order to understand what follows, it may be helpful to first guess what


the characterization might be. We already know two families of measures which
are contained in I(RN ): the Gaussian measures m,C for m RN and symmetric, non-negative definite C Hom(RN ; RN ), and the Poisson measures M for
M M0 (RN ). Further, it is obvious that , I(RN ) = ? I(RN ), and
we know that I(RN ) if n = for some {n : n 1} I(RN ). Finally,
Theorem 3.2.7 tells us that every element of I(RN ) is the limit of Poisson measures. Thus, by Levys Continuity Theorem, we should be asking what sort of
functions can arise as the locally uniform limit of functions of the form
Z h
i



1
e 1(,y)RN 1 M (dy),
(*)

` = 1 , m RN 2 , C RN +
RN

and, as I already noted, only the Poisson component M offers much flexibility.
With this in mind, I introduce for each [0, ) the class M (RN ) of Borel
measures M on RN such that
Z
|y|
M (dy) < .
M ({0}) = 0 and

RN 1 + |y|

When M M0 (RN ), the function ` in (*) equals ` for = m,C ? M . More


generally, even if M M (RN ) \ M0 (RN ), for each r > 0, Mr given by M (dy) =
1[r,) (|y|)M (dy) is an element of M0 (RN ). Furthermore, if M M1 (RN ), then
it is clear that, as r & 0,
Z
Z
i
i
 1(,y) N
 1(,y) N
R
R
1 M (dy)
1 Mr (dy)
e
e
RN

RN

uniformly on compacts. Thus, by Levys Continuity Theorem, when M


M1 (RN ), the function ` in (*) is ` for a I(RN ). In order to handle
M M (RN ) for > 1, we must make the integrand M -integrable
at 0 by

1(,y)RN
. Thus,
subtracting off the next term in the Taylor expansion of e
choose a Borel measurable function : RN [0, 1] that equals 1 in a neighborhood of 0, and set `r () equal to

1 , m


RN

1
2

, C


RN

+
RN

 i
e 1(,y)RN 1 1(y) , y RN Mr (dy).

Because
`r () =

1 , mr


RN

1
2

, C


RN

1(,y)RN

RN

Z
where mr = m

(y)y Mr (dy),
RN

i
1 Mr (dy),

128

3 Infinitely Divisible Laws

we know that `r = `r for r = mr ,C ? Mr . In addition, if M M2 (RN ) and


`() equals

1
2 (, C)RN

1(, m)RN

Z
+
RN

 i
e 1(,y)RN 1 1(y) , y RN M (dy),

then `r ` uniformly on compacts. Hence, again by Levys Continuity Theorem, we know that, for each M M2 (RN ), the function
`()

(**)



1 ,m RN 12 , C RN
Z h

 i
+
e 1(,y)RN 1 1(y) , y RN M (dy)
RN

equals ` for some I(RN ).


One might think that repeated application of the same procedure would show
that one need not stop at M2 (RN ) and that more singular M s can occur in
the representation of `. More precisely, one might try accommodating M s from
M3 (RN ) by subtracting off the next term in the Taylor expansion. That is, one
would replace
Z

1(,y)RN

RN

1(y) , y


RN

Mr (dy)

by
Z
RN

2 i

e 1(,y)RN 1 1(y) , y RN + 12 (y) , y RN Mr (dy)

in the expression for `r . However, to re-write this `r in the form given in (*),
one would have to replace C by
Z
C

(y)y y Mr (dy),
RN

which would destroy non-negative definiteness as r & 0.


The preceding discussion is evidence for the conjecture that the functions ` of
the form in (**) coincide with {` : I(RN )}, and the rest of this subsection is
devoted to the verification of this conjecture. Because of their role here, elements
of M2 (RN ) are called L
evy measures.
The strategy that I will adopt derives from the observation that ` () =
limn n
n1 () 1 . Thus, if we can understand the operation


A = lim n h, n1 i (0)
n

3.2 The LevyKhinchine Formula

129

for a sufficiently rich class


of functions , then we can understand
` () by

1(,x)RN
1(,x)RN
is not an
. Even though x
e
applying A to x
e
element, for technical reasons, it turns out that the class of s on which it
is easiest to understand A is the Schwartz test function space S (RN ; C) (the
space of smooth C-valued functions that, together with all of their derivatives,
are rapidly decreasing). The basic reason why S (RN ; C) is well suited to our
analysis is that the Fourier transform maps S (RN ; C) onto itself. Further, once
we understand how A acts on S (RN ; C), it is a relatively simple matter to use
that understanding to compute ` ().

Lemma 3.2.9. Let I(RN ) be given. For each r (0, ) there exists a
C(r) < such that |` ()| C(r)(1 + ||2 ) for all RN whenever I(RN )
satisfies |1
()| 12 for B(0, r). Moreover,



A (c1 + ) lim n hc1 + , n1 i c + (0)
n
Z
1
` ()()
d
=
(2)N RN

(3.2.10)

for each c C and S (RN ; C).


Proof: Suppose that I(RN ) satisfies |1
()| 12 for B(0, r).
Applying (3.1.4) and the second inequality in (3.2.6) with = n1 , we know
that, for any (, R) (0, )2 ,

sup |1 cn1 ()| R +

||R

4
.
ns(r)

1
, we obtain sup||R |1 cn1 ()| 12
Hence, if R r, then, by taking = 4R
and therefore sup||R | n1 ` ()| 2 if n satisfies (3.2.5). Finally, observe that
2
there
 is an  > 0 such that s(t) t for t (0, 1], and therefore that |` ()|

2 1+

64R2
r 2

for || R, which completes the proof of the first assertion.

Clearly it suffices to prove (3.2.10) when c = 0. Thus, let S (RN ; C) be


given. Then, by (2.3.4),

Z


1
d
n e n ` () 1 ()
(2) n h, n1 i (0) =
RN

Z
Z 1 Z
t
` ()()
d,
d dt
=
e n ` () ` ()()
N

RN

RN
1

n1 ()| 1, ` () has a most quadratic


where (keeping in mind that |e n ` | = |
growth, and ()

is rapidly decreasing) the passage to the second line is justified

130

3 Infinitely Divisible Laws

by Fubinis Theorem and the limit is an application of Lebesgues Dominated


Convergence Theorem. 
Lemma 3.2.9, especially (3.2.10), provides us with two critical pieces of information about A . Namely, it tells us that A satisfies the minimum principle
and that it is quasi-local. To be precise, set D = RS (RN ; R). That is, D
if and only if there is a () R such that ()1 S (RN ; R). I will say
that a real-valued linear functional A on D satisfies the minimum principle if
(3.2.11)

A 0 if D and (0) = min (x)


xRN

and that A is quasi-local if


(3.2.12)

lim AR = 0

for all D,


x
for R > 0. Notice that, by applying the minimum principle
where R (x) = R
to both 1 and 1, one knows that A1 = 0.
To see that A satisfies both these conditions, first observe that if (0) =
minxRN (x), then h, n1 i (0) 0 for all n Z+ , and therefore that
A 0. Secondly, to check that A is quasi-local, note that it suffices to treat
N
S (RN ; R) and that for such a , c

Thus,
R () = R (R).
Z

N
` R1 ()
d 0,
(2) A R =
RN

since ` (0) = 0 and


supR1 |` (R1 )()|

is rapidly decreasing.
As I am about to show, these two properties allow us to say a great deal about
A . Before explaining this, first observe that if M M (RN ), then, for every
Borel measurable : RN C,
(3.2.13)

|(y)|
< = L1 (M ; C).

yRN \{0} 1 |y|


sup

Using (3.2.13), one can easily check that if Cb2 (RN ; C) and S (RN ; R)
equals 1 in a neighborhood of 0, then
y


(y) (0) (y) y, (0) RN

is M -integrable for every M M2 (RN ).


Second, in preparation for the proof of the next lemma, I have to introduce

the following partition of unity for RN \ {0}. Choose C RN ; [0, 1] so that

has compact support in B(0, 2) \ B 0, 14 and (y) = 1 when 12 |y| 1,
and set m (y) = (2m y) for m Z. Then, if y RN and 2m1 |y|

3.2 The LevyKhinchine Formula

131

2m , mP
(y) = 1 and n (y) = 0 unless m 2 n m + 1. Hence, if
(y) = mZ m (y) for y RN \ {0}, then is a smooth function with values
in [1, 4]; and therefore, for each m Z, the function m given by m (0) = 0
m (y)
for y RN \ {0} is a smooth, [0, 1]-valued function that
and m (y) = (y)

vanishes off of B(0, 2m+1 ) \ B(0, 2m2 ). In addition, for each y RN \ {0},
P
m2
|y| 2m+1 .
mZ m (y) = 1 and m (y) = 0 unless 2
Finally, given n Z+ and C n (RN ; C), define n (x) to be the multilinear
map on (RN )n into C by
!
n
X

 n

n
x+
tm m
.
(x) (1 , . . . , n ) =
t1 tn
t1 ==tn =0
m=1

Obviously, and 2 can be identified as the gradient of and Hessian of .


Lemma 3.2.14. Let D be the space of functions described above. If A :
D R is a linear functional on D that satisfiesR (3.2.11) and (3.2.12), then
there is a unique M M2 (RN ) such that A = RN (y) M (dy) whenever
is an element of S (RN ; R) for which (0) = 0, (0) = 0, and 2 (0) = 0.
Next, given Cc RN ; [0, 1] satisfying = 1 in a neighborhood of 0, set
(y) = (y)(, y)RN for RN , and define m RN and C Hom(RN ; RN )
by
Z


(3.2.15) , m ) = A and , C 0 RN = A 0
( 0 )(y) M (dy).
RN

Then C is symmetric, non-negative definite, and independent of the choice of .


Finally, for any D,


A = 12 Trace C2 (0) + m , (0) RN
Z 
 
(3.2.16)
+
(y) (0) (y) y, (0) RN M (dy).
RN

Proof: Referring to the partition of unity described


above, define m =

A(m ) for C B(0, 2m+1 ) \ B(0, 2m2 ); R , where


m (y) =

m (y)(y)

if 2m2 |y| 2m+1

otherwise.

Clearly m is linear. In addition, if 0, then m 0 = m (0), and so, by




m+1 ) \ B(0, 2m2 ); R ,


B(0,
2
(3.2.11), m 0. Similarly, for any
C

kku m m 0 = kku m m (0), and therefore |m | Km kku ,
where Km = Am . Hence, m admits a unique extension
as a continuous

linear functional on C B(0, 2m+1 ) \ B(0, 2m2 ); R that is non-negativity

132

3 Infinitely Divisible Laws

preserving and has norm Km ; and so, by the Riesz Representation Theorem,
we now know that there is a unique non-negative Borel measure Mm on RN
such that MRm is supported on B(0, 2m+1 ) \ B(0, 2m2 ), Km = Mm (RN ), and
A(m ) = RN (y) Mm (dy) for all S (RN ; R).
P
Now define the non-negative Borel measure M on RN by M = mZ Mm .
Clearly, M ({0}) = 0. In addition, if Cc RN \ {0}; R , then there is an
n Z+ such that m 0 unless |m| n. Thus,

A =

n
X

A(m ) =

m=n

m=n

=
RN

(y) Mm (dy)

RN

n
X

Z
n
X

m (y)(y)

Z
M (dy) =

(y) M (dy),
RN

m=n

and therefore
Z
(3.2.17)

A =

(y) M (dy)
RN


for Cc RN \ {0}; R .
Before taking the next step, observe that, as an application of (3.2.11), if
1 , 2 D, then
1 2 and 1 (0) = 2 (0) = A1 A2 .

(*)

Indeed, by linearity, this reduces to the observation that, by (3.2.11), if D


is non-negative and (0) = 0, then A 0.
With these preparations, I can show that, for any D,
Z
0 = (0) =

(**)

(y) M (dy) A.
RN

Pn
To check this, apply (*) to n = m=n m and , and use (3.2.17) together
with the Monotone Convergence Theorem to conclude that
Z

Z
n (y) M (dy) = lim An A.

(y) M (dy) = lim

RN

RN

Now let be as in the statement of the lemma, and set R (y) = (R1 y) for
R > 0. By (**) with (y) = |y|2 (y) we know that
Z
RN

|y|2 (y) M (dy) A < .

3.2 The LevyKhinchine Formula

133

At the same time, by (3.2.17) and (*),


Z

1 (y) R (y) M (dy) A(1 )
RN

for all R > 0, and therefore, by Fatous Lemma,


Z

1 (y) M (dy) A(1 ) < .
RN

Hence, I have proved that M M2 (RN ).


I am now in a position to show that (3.2.17) continues to hold for any
S (RN ; R) that vanishes along with its first and second order derivatives at 0.
To this end, first suppose that vanishes in a neighborhood of 0. Then, for
each R > 0, (3.2.17) applies to R , and so
Z

R (y)(y) M (dy) = A(R ) = A + A (1 R ) .
RN

By (*) applied to (1 R ) and (1 R )kku ,




A (1 R ) kku A(1 R ) = kku AR 0

as R ,

where I used (3.2.12) to get the limit assertion. Thus,


Z
Z
A = lim
R (y)(y) M (dy) =
(y) M (dy),
R

RN

RN

because, since M is finite on the support of and therefore is M -integrable,


Lebesgues Dominated Convergence Theorem applies. I still have to replace the
assumption that vanishes in a neighborhood of 0 by the assumption that it
vanishes to second order there. For this purpose, first note that, by (3.2.13),
is certainly M -integrable, and therefore
Z

(y) M (dy) = lim A (1 R ) = A lim A(R ).
RN

R&0

R&0

By our assumptions about at 0, we can find a C < such that |R (y)|


CR|y|2 (y) for all R (0, 1]. Hence, by (*) and the M -integrability of |y|2 (y),
there is a C 0 < such that |A(R )| C 0 R for small R > 0, and therefore
A(R ) 0 as R & 0.
To complete the proof from here, let S (RN ; R) be given, and set


(x)

= (x) (0) (x) x, (0) RN 12 (x)2 x, 2 (0)x RN .

Then, by the preceding, (3.2.17) holds for and, after one re-arranges terms,
says that (3.2.16) holds. Thus, the properties of C are all that remain to be
proved. That C is symmetric requires no comment. In addition, from (*), it
is clearly non-negative definite. Finally, to see that it is independent of the
chosen, let 0 be a second choice, note that 0 = in a neighborhood of 0, and
apply (3.2.17). 

134

3 Infinitely Divisible Laws

Remark 3.2.18. A careful examination of the proof of Lemma 3.2.14 reveals a


lot. Specifically, it shows why the operation performed by the linear functional
A cannot be of order greater than 2. The point is, that, because of the minimum
principle, A acts as a bounded, non-negative linear functional on the difference
between and its second order Taylor polynomial, and, because of quasi-locality,
this action can be represented by integration against a non-negative measure.
The reason why the second order Taylor polynomial suffices is that second order
polynomials are, apart from constants, the lowest order polynomials that can
have a definite sign.
In order to complete the program, I need to introduce the notion of a L
evy
system, which is a triple (m, C, M ) consisting of an m RN , a symmetric, nonnegative definite transformation C on RN , and a Levy measure M M2 (RN ).
Given a Levy system (m, C, M ) and a Borel measurable : RN [0, 1] satisfying
!
(3.2.19)

sup

|y|

1 (y)

sup

(y)|y|

< ,

yB(0,1)
/

yB(0,1)\{0}

we will need to know that


Z

1

1(,y)RN
N M (dy)
1(y)
,
y)

e

R
2
1
+
||
(3.2.20)
RN
is bounded and tends to 0 as || .

To see this, note that, for each r (0, 1],


1(,y)RN
1 1(y) , y)RN M (dy)
e
RN
Z


1(,y)RN
1 1 , y)RN M (dy)

e
B(0,r)
Z
Z


+ ||
1 (y) |y| M (dy) +
2 + ||(y)|y| M (y)

B(0,r)

||
2

B(0,r){

|y|2 M (dy) + ||

B(0,r)

Z
+ ||


1 (y) M (dy)

B(0,r)


(y)|y| M (dy) + 2M B(0, r){ .

B(0,r){

Obviously, this proves the boundedness in (3.2.20). In addition,


R it shows that, for
each r (0, 1], the limit there as || is dominated by 12 B(0,r) |y|2 M (dy),
which tends to 0 as r & 0.

3.2 The LevyKhinchine Formula

135

Knowing (3.2.20), we can define

(3.2.21)



`(m,C,M ) () = 1 m, RN 12 , C RN
Z 

 
+
e 1 (,y)RN 1 1(y) , y RN M (dy)
RN

for any Levy system (m, C, M ) and any Borel measurable : RN [0, 1]
that satisfies (3.2.19). Furthermore, because `(m,C,Mr ) `(m,C,M ) uniformly
on compacts when Mr (dy) = 1[r,) (|y|) M (dy), it is clear that `(m,C,M ) is
continuous.
Theorem 3.2.22 (L
evyKhinchine). For each I(RN ), there is a unique
1
` C(RN ; C) such that ` (0) = 0 and
= e` , and, for each n Z+ , e n ` is
the Fourier transform of the unique n1 M1 (RN ) satisfying = ?n
1 . Next,
n

let : RN [0, 1] be a Borel measurable function that satisfies (3.2.19).


Then, for each I(RN ), there is a unique Levy system (m , C , M ) such
that ` = `(m ,C ,M ) , and, for each Levy system (m, C, M ), there is a unique

I(RN ) such that ` = `(m,C,M ) . In fact, if I(RN ), then


Z
RN

(y) M (dy) = lim nh, n1 i


n

for all S (R ; C) that satisfy lim |y|2 |(y)| = 0,


N

|y|&0

Z
C = lim n
n

RN

0 (y) y y n1 (dy)

and
m0 = lim n
n

0 (y)2 y y M (dy),

RN

Z
RN

0 (y)y n1 (dy)


for any if 0 Cc RN ; [0, 1] satisfying 0 = 1 in a neighborhood of 0. Finally,
for any Borel measurable : RN [0, 1] satisfying (3.2.19),
m

m0

Z
+


(y) 0 (y) M (dy).

RN

Proof: The initial assertion is covered by Theorem 3.2.7.



To prove the second assertion, let Cc RN ; [0, 1] with = 1 in B(0, 1) be
given. For I(RN ), I will show that ` = `(m ,C,M ) , where m , C, and M are
determined from (cf.
(3.2.10)) A as in Lemma 3.2.14. To this end, define e for
RN by e (x) = e 1(,x)RN , and set R (x) = (R1 x) for R > 0. The idea

136

3 Infinitely Divisible Laws

is to show that, as R , A (R e ) tends to both ` () and to `(m ,C,M ) ().


To check the first of these, use (3.2.10) to see that
Z
Z
` (R1 0 x)
( 0 ) d 0 .
` ( 0 )c
R ( 0 + ) d 0 =
(2)N A (R e ) =
RN

RN

Hence, since ` is continuous and, by Lemma 3.2.9, supR1 |` (R1 )


()| is
rapidly decreasing, Lebesgues Dominated Convergence Theorem says that
Z
R ( 0 ) d 0 = ` ().
lim A (R e ) = ` ()(2)N
R

To prove that A (R e ) also tends to


A (R e ) = `(m ,C,M ) ()

RN

`(m ,C,M ) (),

use (3.2.16) to write


1 R (y) e (y) M (dy),
RN

and observe that the last term is dominated by M B(0, R){ 0.
So far we know that, for each I(RN ), there is a Levy system (m , C, M )
such that ` () = `(m ,C,M ) . Moreover, in the preliminary discussion at the
beginning of this subsection, it was shown that, for each Levy system (m, C, M ),
there exists a I(RN ) for which `(m,C,M ) = ` .
Finally, let 0 be as in the statement of this theorem. Given I(RN ), let
0
m RN , C Hom(RN ; RN ), and M M2 (RN ) be associated with A as in
0
(3.2.16) of Lemma 3.2.14 when = 0 . As we have just seen, ` = `(m
.
0
,C ,M )
1

Further, by that lemma and (3.2.10), we know that


Z
(y) M (dy) = A = lim nh, n1 i
n

RN

for any S (RN ; R) that vanishes to second order at 0. In addition, by that


same lemma and (3.2.10), we know that
Z
Z
2
0 (y)2 y y M (dy)
C = lim n
0 (y) y y n1 (dy)
n

RN

RN

and that
m0 = lim n
n

Z
RN

0 (y) n1 (dy).

In particular, m0 , C , and M are all uniquely determined by and 0 . In


addition, if : RN [0, 1] is any other Borel measurable function satisfying
(3.2.19), then the preceding combined with
Z




0
0 (y) (y) , y RN M (dy)
`(m,C,M ) () = `(m,C,M ) () + 1
RN

R

shows that `(m,C,M ) = ` if and only if m = m0 + RN (y) 0 (y) M (dy),


C = C , and M = M . 
The expression in (3.2.21) for ` in terms of a Levy system is known as the
L
evyKhinchine formula.

Exercises for 3.2

137

Exercises for 3.2


Exercise 3.2.23. Referring to (3.2.21), suppose that I(RN ) with ` =
`(m,C,M ) for some Levy system (m, C, M ) whose Levy measure M satisfies
R
e|y| M (dy) < for all (0, ). Show that ` admits a unique ex|y|1
tension as an analytic function on CN and that ` () continues to be given by
0
(3.2.21) when the RN -inner product of (1 , . . . , N ) CN with (10 , . . . , N
)
P
N
N
0
C is i=1 i i . Further, show that
Z

e(,y)RN (dy) = e` (

1)

for all CN .

RN

Hint: The first part is completely elementary complex analysis. To handle the
second part, begin by arguing that it is enough to treat the cases when either
M = 0 or C = 0. The case M = 0 is trivial, and the case when C = 0 can be
further reduced to the one in which = M for an M M0 (RN ) with compact
P
m
support in RN \ {0}. Finally, use the representation M = e m=0 m! ?m to
complete the computation in this case.

Exercise 3.2.24. Given I(RN ) and knowing (3.2.20), show that


, C


RN

2 lim t2 ` (t)

for all I(RN ) and RN .

Similarly, when M M1 (RN ), show that


m m

Z
(y)y M (dy)
RN

is independent of the choice of satisfying (3.2.19) and, for each RN ,

 

2
and
, m = 1 lim t1 ` (t) + t2 , C RN
t
Z 




e 1(,y)RN 1 M (dy).
` () = 12 , C RN + 1 , m RN +
RN

Finally, if I(RN ) is symmetric, show that M is also symmetric and that


1
` () = , C +
2

Z
RN

cos , y


RN


1 M (dy).


Exercise 3.2.25. Given I(R), show
that (, 0) = 0 if and only if

C = 0, M M1 (R), M (, 0) = 0, and (cf. the preceding exercise)
m 0. The following are steps that you might follow.

138

3 Infinitely Divisible Laws

r
(i) To prove the if assertion,
 set M (dy) = 1[r,) (y) M (dy) for r > 0, and
r
show that m ? M r (, 0] = 0 for
 all r > 0 and m ? M = as r & 0.
Conclude from these that (, 0) = 0.

(ii) Now assume that  (, 0) = 0. To see that C = 0, show that if > 0,
then 0,2 ? (, 0) > 0 for any M1 (R).
n

(iii) Continuing (ii), show that (, 0) n1 (, 0) , and conclude first

that n1 (, 0) = 0 for all n Z+ and then that
Z

M (, 0) = 0 and m
(y)y M (dy).

RN

Finally, deduce from these that M M1 (R) and that m 0.


(iv) Suppose that X N (0, 1), and show that the distribution of |X| cannot be
infinitely divisible.
Exercise 3.2.26. The Gamma distributions is an interesting source of infinitely divisible laws. Namely, consider the family {t : t (0, )} M1 (R)
given by
xt1 ex
dx.
t (dx) = 1(0,) (x)
(t)

(i) Show by direct computation that


s ? t (dx) =

B(s, t)
1(0,) (x)xs+t1 ex dx,
(s)(t)

where

Z
B(s, t)

s1 (1 )t1 d

(0,1)

is Eulers Beta function, and conclude that s+t = s ? t . In particular, one


gets, as a dividend, the famous identity B(s, t) = (s)(t)
(s+t) .

(ii) As a consequence of (i), we know that the t s are infinitely divisible. Show
that their LevyKhinchine representation is
#
" Z


1 y
y dy
.
1 e
bt () = exp t
e
y
(0,)

Exercise 3.2.27. Given a M1 (RN ) for which there exists a strictly increasing sequence {nm : m 1} Z+ and a sequence { n1 : m 1} M1 (RN )
m

such that = ?n1 m for all m 1, show that I(RN ).


nm

Hint: First use Lemma 3.2.4 to show that


never vanishes and therefore that
N
there is a unique ` C(R ; C) such that ` (0) = 0 and
= e` . Next,
proceed as in the proof of Theorem 3.2.7 to show that P(RN ), and apply
that theorem to conclude that I(RN ).

3.3 Stable Laws

139

3.3 Stable Laws


Recall from Exercise 2.3.23 the maps T : M1 (RN ) M1 (RN ) given by the
prescription


ZZ
x+y
(dx)(dy),
T () =
1
1
2
RN RN

N
and
fixed points of T . That is, F (RN ) =
 let F (RN ) denote the set of non-trivial

M1 (R ) \ {0 } : = T . If F (RN ) and 2n denotes the distrin
n
bution of x
2 x under , then = ?2
2n for all n. Hence, by the result in
Exercises 3.2.27, I(RN ), and so F (RN ) I(RN ) for all (0, ). In this
section, I will study the Levy systems associated with elements of F (RN ).
3.3.1. General Results. Knowing that F (RN ) I(RN ), we can phrase the
N
condition = T in terms of the associated Levy systems.
 Namely, N F (R )
1
N

if and only if I(R ) \ {0 } and ` () = 2` 2 for all R . Next,


using this and Exercise 3.2.24, we see that, for F (RN ),

0
if > 2
n
2n
2
n
` () = 2n ` (2 ) = 2n( 1) 2 ` (2 )
1
2 (, C )RN if = 2

as n . Thus, we have already recovered the results in Exercises 2.3.21 and


2.3.23.
I next will examine F (RN ) for (0, 2) in greater detail. For this purpose,
define T M for M M2 (RN ) to be the Borel measure determined by
Z
Z
1

(3.3.1)
(y) T M (dy) = 2
(2 y) M (dy)
RN

RN

for Borel measurable : RN [0, ). It is easy to check that T maps


M2 (RN ) into itself.
Lemma 3.3.2. For any (0, 2),
M = T M , and
Z

1
1

1
)m =
(y) (2 y) y M (dy).
(1 2

( C = 0,

F (R ) {0 }

RN

In addition, if M M2 (RN ) \ {0} satisfies M = T M for some (0, 2), then


M M (RN ) for all > but M
/ M (RN ).
Proof: From the uniqueness of the Levy system associated with an element of
2
I(RN ), it is clear that, for any I(RN ), MT = T M , CT = 21 C ,
and
Z

1
1

1
m +
(y) (2 y) y T M (dy).
mT = 2
RN

140

3 Infinitely Divisible Laws

2
Hence, F (RN ) {0 } if and only if M = T M , C = 21 C , and, for
any satisfying (3.2.19),

1
1

(1 2

)m


1
(y) (2 y) y M (dy).

=
RN

In particular, when (0, 2), C = 0, and so the first assertion follows.


The second assertion turns on the fact that, for all n Z+ ,
n

M = T M = M B(0, 2 ) \ B(0, 2

n+1


1 
) = 2n M B(0, 1) \ B(0, 2 ) .

1 
From this we see that M B(0, 1) \ B(0, 2 ) > 0 unless M = 0 and that
P

the M -integral of |y| over B(0, 1) is bounded below by 21 n=0 2n(1 ) and
P

above by n=0 2n(1 ) . 

Theorem 3.3.3. F2 (RN ) if and only if = 0,C for some non-negative


definite, symmetric C Hom(RN ; RN ) \ {0}. If (1, 2), then F (RN ) if
and only if I(RN ) and ` () equals

12

, y

1
1


RN

M (dy)

2 <|y|1

Z 

1(,y)RN

1 1[0,1] (|y|) , y


RN

M (dy)

RN

for some M

T

>


M (RN ) \ M (RN ) satisfying M = T M . If (0, 1),

then F (RN ) if and only if I(RN ) and ` () equals


Z

1(,y)RN


1 M (dy)

RN

for some M

T


N
M
(R
)
\ M (RN ) satisfying M = T M . Finally,

>

F1 (RN ) if and only if I(RN ) and either = m for some m RN \ {0} or


` () =

1 m,


RN

+
RN

 
e 1(,y)RN 1 1 1[0,1] (|y|) , y RN M (dy)

T

N
N

for some m RN and M


(1,2] M (R ) \ M1 (R ) satisfying M = T1 M
and
Z
y M (dy) = 0.
1
2 <|y|1

3.3 Stable Laws

141

Proof: The first assertion requires no comment. When (0, 2), the if
1
assertions can be proved by checking that, in each case, ` () = 2` (2 ).
When [1, 2), the only if assertion follows immediately from Lemma 3.3.2
with = 1B(0,1) , and when (0, 1), it follows from that lemma combined
with the observation that
Z
Z
1 
1

y M (dy). 
y M (dy) =
M = T M = 1 2
1
{2 <|y|1}

B(0,1)

3.3.2. -Stable Laws. The most studied elements of F (RN ) are the stable laws: those I(RN )\{0 } such that ` (t) = t ` () for all t (0, ),
not just for t = 2. Equivalently, if M1 (RN ) is -stable if and only if
I(RN ) \ {0 } and, for all non-negative, Borel measurable functions ,
Z
Z
1
(y) t (dy) =
(t y) (dy), t (0, ),
RN

RN

where bt () = et` () . Thus, there are no -stable laws if > 2, and is 2-stable
if and only if = 0,C for some C 6= 0. To examine the -stable laws when
(0, 2), I will need the computations contained in the following lemmas.
Lemma 3.3.4. Assume that M M2 (RN ) and that (0, 2), and define the
finite Borel measure on SN 1 by
Z
 2 |y|
1
y
|y| e
M (dy)
|y|
h, i =
(2 ) RN \{0}

for bounded, Borel measurable : SN 1 C. Then, M satisfies


Z
Z
(3.3.5)
(ty) M (dy) = t
(y) M (dy), t (0, )
RN

RN

for all Cc RN \ {0}; R if and only if


Z

(y) M (dy) =
RN

(r)
SN 1

(0,)

dr

r1+

!
(d)


for all Cc RN \ {0}; R .
Proof: The if assertion is obvious. In addition, the only
 if assertion
y
will follow once I prove it for s such that (y) = 1 |y| 2 (|y|), where



1 C SN 1 ; [0, ) and 2 Cc (0, ); R . Given 1 C SN 1 ; [0, ) ,
determine the Borel measure on (0, ) by
Z

y
2 (|y|)|y|2 M (dy)
h2 , i =
1 |y|
RN \{0}

142

3 Infinitely Divisible Laws


for 2 Cc (0, ); R . Then (3.3.5) implies that
Z
e

tr

(dr) = t

(0,)

er (dr) = t2 (2 )h, i

(0,)

for t (0, ). Hence, since


Z
r1 etr dr = (2 )t2 ,

t (0, ),

(0,)

uniqueness of the Laplace transform (cf. Exercise 1.2.12) implies that (dr) =
h1 , ir1 dr, and therefore that
Z
RN \{0}

y
1 |y|

Z
M (dy) =
(0,)

2 (r)
(dr) = h1 , i
r2

Z
1 (r)
(0,)

dr

r1+

. 

Lemma 3.3.6. Let I(RN ). Then is 2-stable if and only if = 0,C for
some symmetric, non-negative definite C 6= 0; is -stable for some (0, 1)
if and only if there is a finite, non-negative Borel measure 6= 0 on SN 1 such
that
!
Z
Z
 dr
 1(,r) N
R
1 1+ (d);
` () =
e
r
SN 1
(0,)

is 1-stable if and only if there exists a finite, non-negative, Borel measure


on SN 1 and an m RN satisfying
Z
N 1
|m| + (S
) > 0 and
(d) = 0
SN 1

such that ` () equals


1 , m RN
Z
Z
+

  dr
 1(,r) N
R
1 11[0,1] (r) , r RN 2
e
r
(0,)

SN 1

!
(d);

and is -stable for some (1, 2) if and only if there is a finite, non-negative,
Borel measure 6= 0 on SN 1 such that ` () equals

1
1

Z
,
SN 1

Z
+
SN 1


RN

(d)

  dr
 1(,r) N
R
1 11[0,1] (r) , r RN 1+
e
r
(0,)

!
(d).

3.3 Stable Laws

143

Proof: The sufficiency part of each case is easy to check directly or as a consequence of Theorem 3.3.3. To prove the necessity, first check that if is -stable
and therefore ` (t) = t ` (), then M must have the scaling property in (3.3.5)
and therefore have the form described in Lemma 3.3.4. Second, when M has
this form, simply check that in each case the result in Theorem 3.3.3 translates
into the result here. 
In the following, C+ denotes the open upper half-space { C : Im() > 0}
in C, and C+ denotes its closure { C : Im() 0}. In addition, given C
, where arg is 0 if = 0 and is the
and (0, 2), we take || e 1arg

unique (, ] such that = ||e 1 if 6= 0.

Lemma 3.3.7. If (0, 1), then

1r

r1+

(0,)

(1
dr =


for C+ .

In particular,

Z
a
(0,)

and

(2)
(1)

cos
2

if (1, 2)

cos r 1
dr =
(1) cos
2

r1+

2

Z
b
(0,)

(1 )
sin r
sin
dr =
2

r1+

if (0, 1)
if = 1

if (0, 1).

Proof: Let f () denote the integral on the left-hand side of the first equation.
Clearly f is continuous on C+ and analytic on C+ . In addition, f () = f (1)
for (0, ), and Re f (1) < 0. Hence, there exist c > 0 and 0, 2 such

that f () = ce 1 for (0, ). Since C+ 7 C is the unique


continuous extension of (0,) 7 (0, ) to C+ that is analytic on
C+ , we know that f () = ce 1 for C+ . In addition,

f ( 1) =

Z
(0,)

1
er 1
dr =
1+

Z
(0,)

r er dr =

(1 )
.

and =
Hence, c = (1)
2 .

When (0, 1), the values of a and b follow immediately from the evaluation of f (1). When (1, 2), one can find the value of a by first observing
that
Z
Z
cos r 1
cos(r) 1

dr for (0, ),
dr
=

1+
r1+
r
(0,)
(0,)

144

3 Infinitely Divisible Laws

and then differentiating this with respect to to get


Z

(0,)

cos r 1
dr =
r1+

Z
(0,)

sin r
dr = b1 .
r

To evaluate a1 , simply note that

(2 ) cos
2
= . 
&1
2
1

a1 = lim a = lim
&1

Theorem 3.3.8. Let I(RN ). If (0, 2) \ {1}, then is -stable if and


only if there exists a finite, non-negative, Borel measure 6= 0 on SN 1 such
that


Z
(, )RN

(d).
` () = (1)1(0,1) ()
1
SN 1

On the other hand, is 1-stable if and only if there exist an m RN and a


finite, non-negative, Borel measure on SN 1 such that |m| + (SN 1 ) > 0,
Z
(d) = 0,
SN 1

and
` () =


1 , m RN 1

where log = log || +

Z
,
SN 1


RN

log ,


RN

(d),

1arg for C.

Proof: When (0, 1), the conclusion is a simple application of the corresponding results in Lemmas 3.3.6 and 3.3.7. When (1, 2), one has to
massage the corresponding expression in Lemma 3.3.6. Specifically, begin with
the observation that

Z
i dr
h

1
+
e 1r 1 11[0,1] (r)r 1+
r
1
(0,)
!

Z
i dr
h

1sgn()
1sgn()r

1 1sgn()1[0,1] (r)r 1+ +
= ||
e
1
r
(0,)

for R. Thus, we can write the expression for ` () as


Z
SN 1




, N g sgn(, )RN (d),
R

3.3 Stable Laws

145

where (cf. Lemma 3.3.7)

i dr
1
1 11[0,1] (r)r 1+
g (1) =
e
1
r
(0,)

 dr
1
.
sin r 1[0,1] (r)r 1+
= a 1
1
r
(0,)
Z

1r

Next use integration by parts over the intervals (0, 1] and [1, ) to check that
Z
sin r 1[0,1] (r)r
(0,)

a1

Hence, since

 dr
1
1
+
=
1+
1
r

Z
(0,)

a1
1
cos r 1
.
+
dr =

1
r

(2)
sin
= (1)
2 ,

g (1) =

(2 )
e 2 ,
( 1)

and therefore


(2 )
sgn(x, )RN , RN =
( 1)

(, )RN


.

1
.
Thus, all that we need to do is replace the in Theorem 3.3.8 by (1)
Turning to the case = 1, note that, because of the mean zero condition on
,

Z
SN 1

1(,)RN r

(0,)

11[0,1] (r)r ,

i dr


RN

r2

!
(d)

!
i dr
h
= lim
e 1(,)RN r 1 1+ (d)
%1 SN 1
r
(0,)


Z
(, )RN
(1 )

(d)
= lim
%1

1
SN 1
Z



 
1
, RN , RN (d)
= 1 lim
%1 1 SN 1
Z



, RN log , RN (d),
= 1
Z

SN 1

where I have used (1 )(1 ) = (2 ) 1. 


I close this section with a couple of examples of particularly important stable
laws.

146

3 Infinitely Divisible Laws

Corollary 3.3.9. For any (0, 2], is a symmetric and -stable law if
and only if there is a finite, non-negative, symmetric, Borel measure 6= 0 on
SN 1 such that
Z


, N (d).
` () =
R
SN 1

Moreover, is a rotationally invariant, -stable law if and only if ` () = t||


for some t (0, ).
Proof: If is 2-stable, then = 0,C for some C 6= 0 and is therefore symmetric. In addition, by defining on SN 1 so that
 
Z
y
1
0,C (d),
|y|2
h, i =
|y|
2 RN

we see that
` () =


1
, C RN =
2

Z
SN 1



, N 2 (d).
R

If (0, 2) \ {1}, then, for every non-zero, symmetric on SN 1 ,


Z

SN 1



(,



Z


(, )RN

1(0,1) ()

(d)
(d) = (1)

csc
RN
2
1
SN 1

is ` () for a symmetric, -stable . Conversely, if is symmetric and -stable


for some (0, 1), then, because ` () = ` (), the associated in Theorem
3.3.8 can be chosen to be symmetric, in which case ` () equals
(1)

1(0,1) ()

SN 1

(, )RN

Z





, N (d).
() = cos

R
2
SN 1

To handle the case when = 1, first suppose that 6= 0 on SN 1 is symmetric.


Then
Z
Z



, N (d) = 2

, RN (d)
R
SN 1

{:(,)RN <0}


RN

log ,


RN

+ ,


RN

log ,


RN

(d)

{:(,)RN <0}

1
= 1

Z
,
SN 1


RN

log ,


RN

(d),

which is ` () for a symmetric, 1-stable . Conversely, if is symmetric and


1-stable, one can use ` () = ` () to see that m = 0 and is symmetric in

Exercises for 3.3

147

the expression for ` () in Theorem 3.3.8. Hence, by the preceding calculation,


` () has the desired form.
Finally, if is a rotationally invariant, -stable law, then ` () is a rotationally
invariant function of and therefore the preceding leads to
Z




|| , 0 (d 0 ) SN 1 (d) = t|| ,

` () =
SN 1

SN 1

where SN 1 is normalized surface measure on SN 1 and

t = (SN 1 )

SN 1

SN 1



e, N SN 1 (d)
R

for any e SN 1 . Conversely, by taking to be an appropriate multiple of


SN 1 , one sees that, for any t (0, ), t|| is ` () for a symmetric, stable . 

Exercises for 3.3


Exercise 3.3.10. Given (0, 2), define S for finite, non-negative, Borel
1
measures on B(0, 1) \ B(0, 2 ) by

S () =

X
mZ

2m

1 (2 y) (dy),
RN

and show that this map is one-to-one and onto the set of M M2 (RN ) satisfying
(cf. (3.3.1)) M = T M . Conclude that, for each (0, 2), F (RN ) contains
lots of elements!
Exercise 3.3.11. Here are a few further properties of elements of F (RN ).

(i) Show that there is F (RN ) such that {y : (e, y)RN < 0} = 0 for some
e SN 1 if and only if (0, 1).
Hint: Reduce to the case when N = 1, and look at Exercise 3.2.24.

N 1
(ii) If F1 (RN ), show that,
, {y : (e, y)RN < 0} >
 for every e S
0 {y : (e, y)RN > 0} > 0.
(iii) If (1, 2), show that for each  > 0 there is a F (R) such that
(, ] = 0.
Exercise 3.3.12. Take N = 1. This exercise is about an important class of
stable laws known as one-sided stable laws: stable laws that are supported
on [0, ).
(i) Show that there exists a one-sided -stable law only if (0, 1).

148

3 Infinitely Divisible Laws

(ii)If
(0, 1), show that is a one-sided -stable law if and only if ` () =

for some t (0, ).


t 1

(iii) Let (0, 1), and use t to denote the one-sided -stable law with `t () =


t 1 . Show that

Z
e

1y

t (dy)

= exp t

[0,)

 
for C with Im() 0.

In particular, use Exercise 1.2.12 to conclude that t is characterized by the


facts that it is supported on [0, ) and its Laplace transform is given by
Z

ey t (dy) = et ,

0.

[0,)

Exercise 3.3.13. Given (0, 2], let


t denote the symmetric -stable law,
described in Corollary 3.3.9, with `t () = t|| . Clearly 2t = 0,2tI . When
(0, 2), show that
Z

t =
0,2 I t2 (d ),
[0,)

where t2 is the one-sided 2 -stable law in part (iii) of the preceding exercise.
This representation is an example of subordination, and, as we will see in
Exercise 3.3.17, can be used to good effect.

Exercise 3.3.14. Because their Fourier transforms are rapidly decreasing,


we know that each of the measures t in part (iii) of Exercise 3.3.11 admits a
smooth density with respect to Lebesgue measure R on R. In this exercise, we
examine these densities.
(i) For (0, 1), set
h
t =

(3.3.15)

and show that

dt
dR

for t (0, ),

t
e h
,
t ( ) d = e

0
1


h1 (t ).
and that h
t ( ) t

[0, ),

Exercises for 3.3

149

(ii) Only when = 12 is an explicit expression for h


1 readily available. To find
this expression, first note that, by the uniqueness of the Laplace transform (cf.
1

Exercise 1.2.12) and (i), h12 is uniquely determined by


Z
1
2
e h12 ( ) d = e , [0, ).
0

Next, show that


Z
1
2
a2
1
2 e2ab
2 e( +b ) d =
b
0

and

2 e(

a2

+b2 )

d =

2 e2ab
a

for all (a, b) (0, ) , and conclude from the second of these that
1

1(0,) ( )e 4
.
h1 ( ) =

3
4 2
1
2

(3.3.16)

Hint: To prove the first identity, try the change of variables x = a 2 b 2 ,


and get the second by differentiating the first with respect to a.

Exercise 3.3.17. In this exercise we will discuss the densities of the symmetric
stable laws
t for (0, 2) (cf. Exercise 3.3.13). Once again, we know that
each
admits
a smooth density with respect to Lebesgue measure RN on RN .
t
Further, it is clear that this density is symmetric and that

1
1 d
d
t
1
(t x)
(x) = t
dRN
dRN

for t (0, ).

(i) Referring to Exercise 3.3.14 and using Exercise 3.3.12, show that
Z
|x|2

1
d
1
N
2 e 4 h 2 ( ) d.

(x) =
(3.3.18)
N
dRN
(4) 2 0
1

(ii) Because we have an explicit expression for h12 , we can use (3.3.18) to get an

explicit expression for

d11
dRN

. In fact, show that

N
2tN
d1t
(t, x) (0, ) RN ,
(x) = tR (x)
N +1 ,
dRN
N (t2 + |x|2 ) 2
1
N +1
is the surface area of SN in RN +1 . The function
where N = 2 2 N2+1
R
1 is the density for what probabilists call the Cauchy distribution. For
N
general N s, (t, x) (0, ) RN 7 tR (x) is what analysts call the Poisson
kernel for the right half-space in RN +1 . That is (cf. Exercise 10.2.22), if f
Cb (RN ; R), then
Z
N
(t, x)
uf (t, x) =
f (x y) tR (y) dy

(3.3.19)

RN

is the unique, bounded harmonic extension of f to the right half-space.

150

3 Infinitely Divisible Laws

(iii) Given (0, 2), show that


kf k2

ZZ

RN RN

|yx|

Z
f (x)f (y) dxdy =
RN

|f()|2
1 (d)

for f L1 (RN ; C). This can be used to prove that k k determines a Hilbert
norm on Cc (RN ; C).

Chapter 4
L
evy Processes

Although analysis was the engine that drove the proofs in Chapter 3, probability
theory can do a lot to explain the meaning of the conclusions drawn there.
Specifically, in this chapter I will develop an intuitively appealing way of thinking
about a random
variable
 X whose distribution is infinitely divisible, an X for
 1 (,X)
P
N
R
equals
which E e


exp

1 , m)

1
2

, C

Z
+
RN


RN


h


 i
1 (,y)RN
1 1 1[0,1] |y| , y RN M (dy)
e

for some m RN , some symmetric, non-negative definite C Hom(RN ; RN ),


and Levy measure M M2 (RN ). In most of this chapter I will deal with the
case when there is no Gaussian component. That is, I will be assuming that
C = 0. Because it is distinctly different, I will treat the Gaussian component
separately in the final section. However, I begin with some general comments
that apply to the considerations in the whole chapter.
The key idea, which seems to have been Levys, is to develop a dynamic
picture of X. To understand the origin of his idea, denote by I(RN ) the
distribution of X, and define ` accordingly, as in Theorem 3.2.7. Then, for
each t [0, ), there is a unique t I(RN ) for which bt = et` , and so
s+t = s ? t for all s, t [0, ). Levys idea was to associate with {t : t 0}
a family of random variables {Z(t) : t 0} that would reflect the structure of
{t : t 0}. Thus, Z(0) = 0 and, for each (s, t) [0, ), Z(s + t) Z(s)
should be independent of {Z( ) : [0, s]} and have distribution t . In other
words, {Z(t) : t 0} should be the continuous parameter analog of the sums
of independent, identically distributed random variables. Indeed, given any >
0, let {Xm : m 0} be a sequence of independent random variables with
distribution . Then {Z(nP
) : n 0} should have the same distribution as
{Sn : n 0}, where Sn = 1mn Xm . This observation suggests that one
should think about t
Z(t) as a evolution that, when one understands its
dynamics, will reveal information about Z(1) and therefore .
151

152

4 Levy Processes

For reasons that should be obvious now, an evolution {Z(t) : t [0, )} of the
sort described above used to be called a process with independent, homogeneous increments, the term process being the common one for continuous
families of random variables and the adjective homogeneous referring to the
fact that the distribution of the increment Z(t) Z(s) for 0 s < t depends
only on the length t s of the time interval over which it is taken. In more
recent times, a process with independent, homogeneous increments is said to be
a L
evy process, and so I will adopt this more modern terminology.
Assuming that the family {Z(t) : t [0, )} exists, notice that we already
know what the joint distribution of {Z(tk ) : k N} must be for any choice of
0 = t0 < < tk < . Indeed, Z(0) = 0 and


P Z(tk ) Z(tk1 ) k , 1 k K =

K
Y

tk tk1 (k )

k+1

for any K Z+ and 1 , . . . , K BRN . Equivalently, P Z(tk ) k , 0 k K


equals
X
 Y
Z
Z Y
K
k
K
10 (0)
1k
yj
tk tk1 (dyk )
(RN )K

k=1

j=1

k=1

for any K Z+ and 0 , . . . , K BRN . My goal is this chapter is to show that


each I(RN ) admits a Levy process {Z (t) : t 0} and that the construction
of the associated Levy process improves our understanding of .
Unfortunately, before I can carry out this program, I need to deal with a few
technical, bookkeeping matters.
4.1 Stochastic Processes, Some Generalities
Given an index A with some nice structure and a family {X() : A} of
random variables on a probability space (, F, P) taking values in some measurable space (E, B), it is often helpful to think about {X() : A} in terms of
the map 7 X( , ) E A . For instance, if A is linearly ordered, then

X( , ) can be thought of as a random evolution. More generally, when


probabilists want to indicate that they are thinking about {X() : A} as
the map
X( , ), they call {X() : A} a stochastic process on A
with state space (E, B).
The distribution of a stochastic process is the probability measure X P
on1 (E A , B A ) obtained by pushing P forward under the map
X( , ).
Hence two stochastic processes {X() : A} and {Y () : A} on (E, B)
have the same distribution if and only if


P X(k ) k , 0 k K = P Y (k ) k , 0 k K
Recall that BA is the -algebra over E A that is generated by all the maps E A 7
() E as runs over A.
1

4.1 Stochastic Processes, Some Generalities

153

for all K Z+ , {0 , . . . , K } A, and 0 , . . . , K B.


As long as A is countable, there are no problems because E A is a reasonably
tame object and B A contains lots of its subsets. However, when A is uncountable,
E A is a ridiculously large space and B A will be too meager to contain many of
the subsets in which one is interested. The point is that for B to be in B A there
must (cf. Exercise 4.1.11) be a countable subset {k : k N} of A such that
one can determine whether or not B by knowing {(k ) : k N}. Thus

[0,)
(cf. Exercise 4.1.11), for instance, C [0, ); R
/ BR
.
Probabilists expended a great deal of effort to overcome the problem raised in
the preceding paragraph. For instance, using a remarkable piece of measure theoretic reasoning, J.L. Doob2 proved that in the important case when A = [0, )
and E = R, one can always make a modification,
what he called the separable

modification, so that sets like C [0, ); R become measurable. However, in
recent times, probabilists have tried to simplify their lives by constructing their
processes in such a way that these unpleasant measurability questions never
arise. That is, if they suspect that the process should have some property that
is not measurable with respect to B A , they avoid constructions based on general principles, like Kolmogorovs Extension Theorem (cf. part (iii) of Exercise
9.1.17), and instead adopt a construction procedure that produces the process
with the desired properties already present.
The rest of this chapter contains important examples of this approach, and
the rest of this section contains a few technical preparations.
4.1.1. The Space D(RN ). Unless its Levy measure M is zero, a Levy process for I(RN ) cannot be constructed so that it has continuous paths. In
fact, if M 6= 0, then t
Z (t) will be almost never continuous. Nonetheless, {Z (t) : t 0} can be constructed so that its paths are reasonably nice.
Specifically, its paths can be made to be right-continuous everywhere and have no
oscillatory discontinuities. For this reason, I introduce the space D(RN ) of paths
: [0, ) RN such that (t) = (t+) lim &t ( ) for each t [0, )
and (t) lim %t ( ) exists in RN for each t (0, ). Equivalently,
(0) = (0+), and, for each t (0, ) and  > 0, there is a (0, t) such that
sup{|(t)( )| : (t, t+)} <  and sup{|(t)( )| : (t, t)} < .
The following lemma presents a few basic properties possessed by elements of
n
D(RN ). In its statement, for n N and (0, ), b c+
: m
n = min{m2
+
n

+
n
n
Z and m 2 } and b cn = b cn 2
= max{m2
: m N and m <
2n }. In addition, for 0 a < b,
(4.1.1)

kk[a,b] sup |(t)|


t[a,b]

See Chapter II of Doobs Stochastic Processes, Wiley (1953).

154

4 Levy Processes

is the uniform norm of  [a, b], and


X
K
var[a,b] () = sup
|(tk ) (tk1 )| : K Z+
(4.1.2)

k=1


and a = t0 < t1 < < tK = b
is the total variation of  [a, b].
Lemma 4.1.3.
r > 0, the set

If D(RN ), then, for each t > 0, kk[0,t] < , and for each
J(t, r, ) { (0, t] : |( ) ( )| r}

is finite subset of (0, t]. In addition, there exists an n(t, r, ) N such that, for
every n n(t, r, ) and m Z+ (0, 2n ],



 
m2n t (m 1)2n t r = m2n = + for some J(t, r, ).
t n

Finally,


kk[0,t] = lim max |(m2n t)| : m N [0, 2n ]
n

and
var[0,t] () = lim




m2n t (m 1)2n t .

mZ+ [0,2n ]

Proof: Begin by noting that it suffices to treat the case when t = 1, since one
can always reduce to this case by replacing with
(t ).
If kk[0,1] were infinite, then we could find a sequence {n : n 1} [0, 1] such
that |(n )| , and clearly, without loss in generality, we could choose this
sequence so that n [0, 1] and {n : n 1} is either strictly decreasing or
strictly increasing. But, in the first case this would contradict right-continuity,
and in the second it would contradict the existence of left limits. Thus, kk[0,1]
must be finite.
Essentially the same reasoning shows that J(1, r, ) is finite. If it were not,
then we could find a sequence {n : n 0} of distinct points in (0, 1] such
that |(n ) (n )| r, and again we could choose them so that they were
either strictly increasing or strictly decreasing. If they were strictly increasing,
then n % for some (0, 1] and, for each n Z+ , there would exist a
n0 (n1 , n ) such that |(n ) (n0 )| 2r , which would contradict the
existence of a left limit at . Similarly, right-continuity would be violated if the
n s were decreasing.
Although it has the same flavor, the proof of the existence of n(1, r, ) is a
bit trickier. Let 0 < 1 < K 1 be the elements of J(1, r, ). If n(1, r, )

4.1 Stochastic Processes, Some Generalities

155

failed to exist, then we could choose a subsequence {(mj , nj ) : j 1} from


Z+ N so that
1} is strictly increasing and tj mj 2nj (0, 1]
 {nj : j n
satisfies tj tj 2 j r for all j Z+ , but tj 6= bk c+
nj for any
j Z+ and 1 k K. If tj = t infinitely often for some t, then we would
have the contradiction that t
/ J(1, r, ) and yet |(t) (t)| r. Hence,
I will assume that the tj s are distinct. Further, without loss in generality, I
assume that {tj : j 1} is a subset of one of the intervals (0, 1 ), (k1 , k )
for some 2 k K, or of (K , 1]. Finally, I may and will assume that either
tj % t (0, 1] or that tj & t [0, 1). But, since |(tj ) (tj 2nj )| r,
tj % t contradicts the existence of (t). Similarly, if tj & t and tj 2nj t
for infinitely many j 0 s, then we get a contradiction with right-continuity at t.
Thus, the only remaining case is when tj & t and tj 2nj < t tj for all but
a finite number of js, in which case we get the contradiction that t
/ J(1, r, )
and yet


|(t) (t)| = lim (tj ) tj 2nj r.
j

To prove the assertion about kk[0,1] , simply observe that, by monotonicity,


the limit exists and that, by right-continuity, for any t [0, 1],




lim max (m2n ) kk[0,1] .
|(t)| = lim btc+
n
n
n

n 0m2

The assertion about var[0,1] () is proved in essentially the same manner, although now the monotonicity comes from the triangle inequality and the first
equality in the preceding must be replaced by |(t)(t)| = limn |(btc+
n )

(btcn )|. 
I next give D(RN ) the topological structure corresponding to uniform convergence on compacts, or, equivalently, the topological structure for which
(, 0 )

2n

n=1

k 0 k[0,n]
1 + k 0 k[0,n]

is a metric. Because it is not separable (cf. Exercise 4.1.10), this topological


structure is less than ideal. Nonetheless, the metric is complete. To see
that it is, first observe that |( )| kk[0,t] for all 0 < t. Thus, if
sup`>k (` , k ) 0 as k , then there exist paths : [0, ) RN and
: (0, ) RN such that
sup |k ( ) ( )| 0
[0,t]

and

)| 0
sup |k ( ) (
(0,t]

for each t > 0. Therefore, if t n & , then

lim |( ) (n )| 2k k k[0,t] + lim |k ( ) k (n )| 2k k k[0,t]

156

4 Levy Processes

for all k Z+ , and so is right-continuous. Essentially the same argument


) for > 0, which means, of course, that D(RN )
shows that ( ) = (
and that sup (0,t] |k ( ) ( )| 0 for each t > 0.
One might think that I would take the measurable structure on D(RN ) to be
the one given by the Borel field BD(RN ) determined by uniform convergence on
compacts. However, this is not the choice I will make. Instead, the measurable
structure I choose for D(RN ) is the one that D(RN ) inherits as a subset of
(RN )[0,) . That is, I take for D(RN ) the
 measurable structure given by the algebra FD(RN ) = {(t) : t [0, )} , the -algebra generated by the maps
D(RN ) 7 (t) RN as t runs over [0, ). The reason for my insisting on
this choice is that I want two D(RN )-valued stochastic processes {X(t) : t 0}
and {Y(t) : t 0} to induce the same measure on D(RN ) if they have the same
distribution. Seeing as (cf. Exercise 4.1.11) FD(RN ) $ BD(RN ) , this would not be
true were I to choose the Borel structure.
Because FD(RN ) 6= BD(RN ) , FD(RN ) -measurability does not follow from topological properties like continuity. Nonetheless, many functions related to the
topology of D(RN ) are FD(RN ) -measurable. For example, the last part of Lemma
4.1.3 proves that both
kk[0,t] , which is continuous, and
var[0,t] (),
which is lower semicontinuous, are both FD(RN ) -measurable for all t [0, ).
In the next subsection, I will examine other important functions on D(RN ) and
will show that they, too, are FD(RN ) -measurable.
4.1.2. Jump Functions. Let M (RN ) be the space of non-negativeBorel
measures M on RN with the properties that M ({0}) = 0 and M B(0, r){ <
for all r > 0. A jump function is a map t [0, ) 7 j(t, ) M (RN )
j(0, ) = 0, t
with the property that, for each BRN with 0
/ ,
j(t, )
is a non-decreasing, piecewise constant element of D(RN ) such that j(t, )
j(t, ) {0, 1} for each t > 0.
Lemma 4.1.4. A map t
j(t, ) is a non-zero jump function if and only if
there exists a set 6= J (0, ) that is finite or countable and a set {y :
J} RN \ {0} such that { J (0, t] : |y | r} is finite for each
(t, r) (0, )2 and
X
(4.1.5)
j(t, ) =
1[,) (t)y .
J

In particular, if t
j(t, ) is a jump function and t > 0, then, either j(t, ) =
j(t, ) or j(t, ) j(t, ) = y for some y RN \ {0}.
Proof: It should be obvious that if J and {y : J} satisfy the stated
conditions, then the t
j(t, ) given by (4.1.5) is a jump function. To go in the
other direction, suppose that t
j(t, ) is a jump function, and, for each r > 0,
set fr (t) = j t, RN \ B(0, r) . Because t
fr (t) is a non-decreasing, piecewise
constant, right-continuous function satisfying fr (0) = 0 and fr (t) fr (t)

4.1 Stochastic Processes, Some Generalities

157

{0, 1} for each t > 0, it has at most a countable number of discontinuities, and
at most fr (t) of them can occur in any interval (0, t]. Furthermore, if fr has a
discontinuity at , then j , B(0, r) j , B(0, r) = 0, and so the measure
= j(, ) j( , ) is a {0, 1}-valued probability measure on RN that assigns
mass 0 to B(0, r). Hence (cf. Exercise 4.1.15) fr ( ) 6= fr ( ) = = y
for some y RN \ B(0, r). From these considerations, it follows easily that if
J(r) = { (0, ) : fr ( ) 6= fr ( )} and if, for each J(r), y RN \B(0, r)
is chosen so that j(, ) j( , ) = y , then J(r) (0, t] is finite for all t > 0
and
X

j t,  B(0, r){ =
1[,) (t)y .
J(r)

Thus, if J = r>0 J(r), then J is at most countable, {(, y ) : J} has the


required finiteness property, and (4.1.5) holds. 
The reason for my introducing jump functions is that every element
D(RN ) determines a jump function t
j(t, , ) by the prescription
j(t, , ) =
(4.1.6)


1 ( ) ( ) ,

J(t,)

where J(t, ) { (0, t] : ( ) 6= ( )},


for RN \ {0}.
S To check that j(t, , ) is well defined and is a jump function,
take J() = t>0 J(t, ) and y = ( ) ( ) when J(), note that,
by Lemma 4.1.3, J() is at most countable and that {(, y ) : J()} has
the finiteness required in Lemma 4.1.4, and observe that (4.1.5) holds when
j(t, ) = j(t, , ) and J = J().
Because it will be important for us to know that the distribution of a D(RN )valued stochastic process determines the distribution of the jump functions for
its paths, we will make frequent use of the following lemma.
Lemma 4.1.7. If : RN R is a BRN -measurable function that vanishes in a
neighborhood of 0, then is j(t, , )-integrable for all (t, ) [0, ) D(RN ),
and
Z
(t, ) [0, ) D(RN ) 7
(y) j(t, dy, ) R
RN

is a B[0,) FD(RN ) -measurable function that, for each , is right-continuous


and piecewise constant as a function of t. Finally,
for all Borel measurable
R
: RN [0, ), (t, ) [0, ) D(RN ) 7 RN (y)j(t, dy, ) [0, ] is
B[0,) FD(RN ) -measurable.
Proof: The final assertion is an immediate consequence of the earlier one plus
the Monotone Convergence Theorem.
Let r > 0 be given. If is a Borel measurable function that vanishes on
B(0, r), then it is immediate from the first part of Lemma 4.1.3 that is

158

4 Levy Processes

j(t, ,R)-integrable for all (t, ) [0, ) D(RN ) and, for each D(RN ),
t
(y) j(t, dy, ) is right-continuous and piecewise constant. Thus, it
RN
suffices to show that, for each t (0, ),
Z
(*)

(y) j(t, dy, ) is FD(RN ) -measurable.


RN

Moreover, it suffices to do this when t = 1 and is continuous, since rescaling


time allows one to replace t by 1 and the set of s for which (*) is true is closed
under pointwise convergence. But, by the second part of Lemma 4.1.3, we know
that
2n

X


m2n (m 1)2n =
m=1





b c+
n b cn

J(1,r,)

for n n(1, r, ), and therefore


Z
RN

2n

X


(y) j(1, dy, ) = lim
m2n (m 1)2n . 
n

m=1

Here are some properties of a path D(RN ) which are determined by


its relationship to its jump function. First, it should be obvious that
C(RN ) C [0, ); RN if and only if j(t, , ) = 0 for all t > 0. At the opposite
extreme, say that a is an absolutely pure
jump path if and only if (cf.
R
3.2.2) j(t, , ) M1 (RN ) and (t) = y j(t, dy, ) for all t > 0. Among
the absolutely pure jump paths are those that are the piecewise constant paths:
those absolutely pure jump s for which j(t, , ) M0 (RN ), t > 0. Because
of Lemma 4.1.7, each of these properties is FD(RN ) -measurable. In particular, if
{Z(t) : t 0} is a D(RN )-valued stochastic process whose paths almost surely
have any one of these properties, then the paths of every D(RN )-valued stochastic
process with the same distribution as {Z(t) : t 0} will almost surely possess
that property.
Finally, I need to address the question of when a jump function is the jump
function for some D(RN ).
Theorem 4.1.8. Let t
j(t, ) be a non-zero jump function, and set j (t, dy)
and if (t) =
=
/
R 1 (y)j(t, dy) for BRN . If BRN with 0
y j(t, dy), then is a piecewise constant element of D(RN ), j(t, , ) =

N
j (t, ), and j(t, , ) = j R \ (t, ) = j(t, )j (t, ) for any D(RN )
whose jump function is t
j(t, ). Finally, suppose that {m : m 0}
N
D(R ) and a non-decreasing Ssequence {m : m 0} BRN satisfy the

conditions that RN \ {0} = m=0 m and, for each m N, 0


/ m and
m
j(t, , m ) = j (t, ), t 0. If m uniformly on compacts, then
j(t, , ) = j(t, ), t 0.

Exercises for 4.1

159

Proof: Throughout the proof I will use the notation introduced in Lemma
4.1.4.
we know that
Assuming that 0
/ ,
X
j (t, ) =
1[,) (t)1 (y )y ,
J

where, for each t > 0, there are only finitely many non-vanishing terms. At the
same time,
X
X
(t) =
1[,) (t)1 (y )y and j(t, , ) =
1[,) (t)1RN \ (y )y
J

if j(t, , ) = j(t, ). Thus, all that remains is to prove the final assertion. To
this end, suppose that j(t, , ) 6= j(t, , ). Since k m k[0,t] 0, there
exists an m such that m (t) 6= m (t) and therefore that j(t, ) j(t, ) = y
for some y m . Since this means that n (t) n (t) = y for all n m, it
follows that (t) (t) = y and therefore that j(t, , ) j(t, , ) = y =
j(t, ) j(t, ). Conversely, suppose that j(t, ) 6= j(t, ) and choose m so
that j(t, ) j(t, ) = y for some y m . Then n (t) n (t) = y for
all n m. Thus, since this means that (t) (t) = y, we again have that
j(t, , ) j(t, , ) = y = j(t, ) j(t, ). After combining these, we see
that j(t, , ) j(t, , ) = j(t, ) j(t, ) for all t > 0, from which it is an
easy step to j(t, ) = j(t, , ) for all t 0. 
Exercises for 4.1
Exercise 4.1.9. When dealing with uncountable collections of random variables, it is important to understand what functions are measurable with respect
to them. To be precise, suppose that {Xi : i I} is a non-empty collection of
functions on some space
 with values in some measurable space (E, B), and let
F = {Xi : i I} be the -algebra over which they generate. Show that
+
A F if and only if there is a sequence {im : m Z+ } I and an B Z
such that



A = : Xi1 (), . . . , Xim (), . . . .
More generally, if f : R, show that f is F-measurable if and only if there
+
+
is a sequence {im : m Z+ } I and a F Z -measurable F : E Z R such
that

f () = F Xi1 (), . . . , Xim (), . . . .
Hint: Make use of Exercise 1.1.12.
Exercise 4.1.10. Let e SN 1 , set t ( ) = 1[t,) ( )e for t [0, 1], and show
that kt s k[0,1] = 1 for all s 6= t from [0, 1]. Conclude from this that D(RN )
is not separable in the topology of uniform convergence on compacts.

160

4 Levy Processes

Exercise 4.1.11. Using Exercise 4.1.9, show that a function : D(RN ) R


is FD(RN ) -measurable if and only if there exists an (RN )N -measurable function
: (RN )N R and a sequence {tk : k N} [0, ) such that

() = (t0 ), . . . , (tk ), . . . , D(RN ).
Next, define t as in Exercise 4.1.10, and use that exercise together with the
preceding to show that the open set { D(RN ) : t [0, 1] k t k[0,1] < 1}
is not FD(RN ) -measurable. Conclude that BD(RN ) % FD(RN ) . Similarly, conclude
that neither D(RN ) nor C(RN ) is a measurable subset of (RN )[0,) . On the
other hand, as we have seen, C(RN ) FD(RN ) .
Exercise 4.1.12. Show that
Z
(4.1.13)
var[0,t] ()
|y| j(t, dy, ),

(t, ) [0, ) D(RN ).

RN

Hint: This is most easily seen from the representation of j(t, , ) in terms of
point masses at the discontinuities of . One can use this representation to show
that, for each r > 0,
Z
X



var[0,t] ()
( ) ( ) =
|y| j(t, dy, ), (t, ) [0, ).
|y|r

J(t,r,)

Exercise
4.1.14. If is an absolutely pure jump path, show that var[0,t] () =
R
|y| j(t, dy, ) and therefore that has locally bounded variation. Conversely,
if C(RN ) has locally bounded variation,
show that is an absolutely pure
R
jump path if and only if var[0,t] () = |y| j(t, dy, ). Finally,
if D(RN )
R
and j(t, , ) M1 (RN ) for all t 0, set c (t) (t) y j(t, dy, ) and
show that c C(RN ) and
Z
var[0,t] () = var[0,t] (c ) + |y| j(t, dy, ).
Exercise 4.1.15. If M1 (RN ), show that () {0, 1} for all BRN if
and only if = y for some y RN .
Hint: Begin by showing that it suffices to handle the case when N = 1. Next,
assuming that N = 1, show that is compactly supported, let m be its mean
value, and show that = m .
4.2 Discontinuous L
evy Processes
In this section I will construct the Levy processes corresponding to those
I(RN ) with no Gaussian component. That is,



1 , m RN

() = exp

(4.2.1)
Z h

 i
1(,y)
1 1 1[0,1] (|y|) , y RN M (dy) .
+
e
RN

4.2 Discontinuous Levy Processes

161

Because they are the building blocks out of which all such processes are made,
I will treat separately the case when is a Poisson measure M for some M
M0 (RN ) and will call the corresponding Levy process the Poisson process
associated with M .
4.2.1. The Simple Poisson Process. I begin with the case when
P N 1= 1
1
and M = 1 , for which M is the
simple
Poisson
measure
e
m=0 m! m



whose Fourier transform is exp e 1 1 .


To construct the Poisson process associated with 1 , start with a sequence
{m : m 1} of independent, unit exponential random variables on a probability space (, F, P). That is,
!
n
X

+
P { : 1 () > t1 , . . . , n () > tn } = exp
tm
m=1

for all n Z+ and (t1 , . . . , tn ) Rn . Without loss in generality, I may and will
assume that m () > 0 for all m Z+ and . InPaddition, by The Strong

Law of Large Numbers, I may and will assume


that m=1 m () = for all
Pn
. Next, set T0 () = 0 and Tn () = m=1 m (), and define
(4.2.2) N (t, ) = max{n N : Tn () t} =

1[Tn (),) (t)

for t [0, ).

n=1

Clearly t
N (t, ) is a non-decreasing, right-continuous, piecewise constant, Nvalued path that starts at 0 and, whenever it jumps, jumps by +1. In particular,
N ( , ) D(RN ), N (t, ) N (t, ) {0, 1} for all t (0, ), and (cf. (4.1.6))
j t, , N ( , ) = N (t, )1 .


Because P N (t) = n = P Tn t < Tn+1 , P N (t) = 0 = P(1 > t) = et ,
and, when n 1 (below || denotes the Lebesgue measure of BRn )
Z
Z
Pn+1

P N (t) = n , = e m=1 m d1 dn+1 = et |B|,
A


Pn
Pn+1
where A = (1 , . . . , n+1 ) (0, )n+1 :

t
<

and
m
m
m=1
m=1

Pn
B = (1 , . . . , n ) (0, )n :

t
.
By
making
the
change
of
m
m=1
Pm
variables sm = j=1 j and remarking that the associated Jacobian is 1, one


sees that |B| = |C|, where C = (s1 , . . . , sn ) Rn : 0 < s1 < < sn t .
n
Since |C| = tn! , we have shown that the P-distribution of N (t) is the Poisson
measure t1 . In particular, 1 is the P-distribution of N (1).
I now want to use the same sort of calculation to show that {N (t) : t [0, )}
is a simple Poisson process, that is, a Levy process for 1 . (See Exercise
4.2.18 for another, perhaps preferable, approach.)


162

4 Levy Processes

Lemma 4.2.3. For any (s, t) [0, ), the P-distribution of the increment
N (s + t) N (s) is t1 . In addition, for any K Z+ and 0 = t0 < t1 < < tK ,
the increments {N (tk ) N (tk1 ) : 1 k K} are independent.
Proof: What I have to show is that, for all K Z+ , 0 = n0 nK , and
0 = t0 < t1 < < tK ,

P N (tk ) N (tk1 ) = nk nk1 , 1 k K
K
Y
e(tk tk1 ) (tk tk1 )nk nk1
,
(nk nk1 )!

k=1

which is equivalent to checking that


K
Y

(tk tk1 )nk nk1
;
P N (tk ) = nk , 1 k K = etK
(nk nk1 )!
k=1

and, since the case when nK = 0 is trivial, I will assume that nK 1. In fact,
because neither side is changed if one removes those nk s for which nk = nk1 ,
I will assume that 0 = n0 < < nK .
Begin by noting that


P N (tk ) = nk , 0 k K = P Tnk tk < Tnk+1 , 1 k K
Z
Z
PnK +1
= e m=1 m d1 dnK +1 = etK |B|,
A

where
(
A=

nK +1

(1 , . . . , nK +1 ) (0, )

nk
X

m tk <

m=1

nX
k +1

)
m , 1 k K

m=1

and
(
B=

(1 , . . . , nK ) (0, )nK : tk1 <

nk
X

)
m tk : 1 k K

m=1

Pm
To compute |B|, make the change of variables sm = j=1 j to see that |B| =
|C|, where


C = (s1 , . . . , snK ) RnK : tk1 < snk1 +1 < < snk tk for 1 k K .
Finally, for 1 k K, set


Ck = (snk1 +1 , . . . , snk ) Rnk nk1 : tk1 < snk1 +1 < < snk tk ,

4.2 Discontinuous Levy Processes

163

and check that


|C| =

|Ck | =

kS

Y (tk tk1 )nk nk1


. 
(nk nk1 )!

kS

The simple Poisson process {N (t) : t 0} is aptly named. It starts at 0,


waits a unit exponential holding time before jumping to 1, sits at 1 for another,
independent, unit exponential holding time before jumping to 2, etc. Thus, since
1 is the distribution of this process at time 1, we now have an appealing picture
of the way in which simple Poisson random variables arise.
Given [0, ), I will say that a D(R)-valued process whose distribution is
the same as that of {N (t) : t 0} is a simple Poisson process run at rate
.
4.2.2. Compound Poisson Processes. I next want to build a Poisson
process associated with a general M M0 (RN ). If M = 0, there is nothing to
do, since the corresponding process will simply sit at 0 for all time. If M 6= 0, I
write it as , where = M (RN ) and = M
. After augmenting the probability
space if necessary, I introduce a sequence {Xn : n 1} of mutually independent,
-distributed, random variables that are independent of the unit exponential
random variables {m : m 1} out of which I built the simple Poisson process
{N (t) : t 0} in the preceding subsection. Further, since M ({0}) = 0, I may
and will assume that none of the Xn s is ever 0. Finally, set

(4.2.4)

ZM (t, ) =

Xn (),

1nN (t,)

with the understanding that a sum over the empty set is 0.


Clearly, the process {ZM (t) : t 0} is nearly as easily understood as is the
simple Poisson process. Like the simple Poisson process, its paths are rightcontinuous, start at 0, and are piecewise constant. Further, its holding times
and jumps are all independent of one another. The difference is that its holding
times are now -exponential random variables (i.e., exponential with mean value
1
) and its jumps are random variables with distribution . In particular,

(4.2.5)


j t, , ZM ( , ) =

X
1nN (t,)

Xn () =

1[Tn (),) (t)Xn () .

n=1

I now want to check that {ZM (t) : t 0} is a Levy process for M and, as
such, deserves to be called a Poisson process associated with M : the one with
. That is, I want to show that, for
rate M (RN ) and jump distribution M M
(RN )
each 0 = t0 < t1 < tK , the random variables ZM (tk ) ZM (tk1 ), 1 k K,

164

4 Levy Processes

are mutually independent and that the kth one has distribution (tk tk1 )M .
Equivalently, I need to check that, for any 1 , . . . , K RN ,
!#
"
K
K
X
Y


P
1
k , ZM (tk ) ZM (tk1 ) RN
=
[
E exp
k M (k ),
k=1

k=1

where k = tk tk1 . But, because of our independence assumptions, the above


expectation is equal to
X

P N (tk ) N (tk1 ) = nk nk1 , 1 k K
nK n1 0

K
X
EP exp 1

k , Xm


RN

k=1 nk1 +1<mnk

K
K
n n
Y
Y
ek k k k1
(k )nk nk1 =
[
k M (k ).
(nk nk1 )!

nK n1 0 k=1

k=1

Any stochastic process {Z(t) : t 0} with right-continuous, piecewise constant paths and the same distribution as the process {ZM (t) : t 0} just
constructed is called a Poisson process associated with M .
Here is a beautiful and important procedure for transforming one Poisson
process into another.
0

Lemma 4.2.6. Suppose that F : RN RN is a Borel measurable function


0
that takes the origin in RN into the origin in RN , and, for M M0 (RN ), define
0
M F M0 (RN ) by


M F () = M F 1 \ {0}
for BRN 0 .
If {Z(t) : t 0} is a Poisson process associated with M and
Z

F
(4.2.7)
Z (t, ) =
F (y) j t, dy, Z( , ) for (t, ) [0, ) ,
RN

then {ZF (t) : t 0} is a Poisson process associated with M F . Moreover, if,


for each i in an index set I, Fi : RN RNi is a Borel measurable satisfying
N
Fi (0) = 0 and, for each
most one i I for which Fi (y) 6= 0,
 Fy R , there is at
i
then the processes {Z (t) : t 0} : i I are mutually independent.
Proof: In proving the first part, I will, without loss in generality, assume that
(cf. (4.2.4)) Z = ZM . But then, by (4.2.5),
X

ZF (t, ) =
F Xn () ,
1nN (t,)

4.2 Discontinuous Levy Processes

165

from which the first assertion follows immediately from the same computation
with which I just showed that {ZM (t) : t 0} is a Poisson process associated
with M .
To prove the second assertion, I begin by observing that it suffices to treat
the case when I = {1, 2}. To see this, suppose that we know the result in that
case, and let n > 2 and a set {i1 , . . . , in } of distinct elements from I be given.
By taking F1 = (Fi1 , . . . , Fin1 ), F2 = Fin , and applying the assumed result, we


would have that {ZFin (t) : t 0} is independent of ZFi1 (t), . . . , ZFin1 (t) :
t 0 . Hence,
 F proceeding by induction, we would be able to show that the
processes {Z im (t) : t 0} : 1 m n are independent.
Now assume that I = {1, 2}. What I have to check is that, for any K Z+ ,
0 = t0 < t1 < < tK , and {(k1 , k2 ) : 1 k K} RN1 RN2 ,
"

K h
X

P
1
k1 , ZF1 (tk ) ZF1 (tk1 RN1
E exp
k=1

i

+ k2 , ZF2 (tk ) ZF2 (tk1 ) RN2
"
P

=E

exp

#

!#

K
X

k1 , ZF1 (tk )

F1


Z (tk1 ) RN1

k=1

"
P

exp

K
X

!#
k2 , ZF2 (tk )

F2


Z (tk1 ) RN2

k=1


For this purpose, take F : RN RN1 +N2 to be given by F (y) = F1 (y), F2 (y) ,
and set k = (k1 , k2 ). Then the first expression in the preceding equals
"
#

K
X

F
F
P
1
k , Z (tk ) Z (tk1 RN1 +N2
E exp
k=1

K
Y

h


i
EP exp 1 k , ZF (tk tk1 ) RN1 +N2
,

k=1

since {ZF (t) : t 0} has independent, homogeneous increments. Hence, it


suffices to observe that, for any t > 0 and = ( 1 , 2 ),

 Z 

h

i

EP exp , ZF (t) RN1 +N2 = exp t
e 1(,F (y))RN1 +N2 1 M (dy)
RN

 Z 

1(1 ,F1 (y))RN1
1 M (dy)
= exp t
e
RN

 Z 

1(2 ,F2 (y))RN2
1 M (dy)
exp t
e
RN
h

i h

i


= EP exp 1 , ZF1 (t) RN1 EP exp 2 , ZF2 (t) RN2 . 

166

4 Levy Processes

As an essentially immediate consequence of Lemma 4.2.6 and Theorem 4.1.8,


we have the following important conclusion.
If {Z(t) : t 0} is aPoisson process associated with M ,
BRN \{0} , j t, , Z( ) : t 0 is a simple Poisson process
Moreover, if

y j t, dy, Z and M () = M ( ) for BRN ,

then {Z (t) : t 0} is the Poisson process associated with M and j t, , Z


= j t, , Z for all (t, ) [0, ) BRN . Finally, if {i : i I} is a family
of
disjoint Borel subsets of RN \ {0}, then boththe Poisson processes
 mutually
i
{Z (t) : t 0} : i I as well as the jump processes {j t, i , Z : t 0} :
i I are mutually independent.
Theorem 4.2.8.
then, for each
run at rate M ().
Z

Z (t) =

The result in Theorem 4.2.8 says that the jumps of a Poisson process can be
decomposed into a family of mutually independent, simple Poisson processes run
at rates determined by the M -measure of the jump sizes. The next result can
be thought of as a re-assembly procedure that complements this decomposition
result.


Theorem 4.2.9. If {Zk (t) : t 0} : 1 k K are mutually independent
Poisson processes associated with {Mk : 1 k K} M0 (RN ), then
(
)
K
K
X
X
Z(t)
Zk (t) : t 0 is a Poisson process associated with M
Mk .
k=1

k=1

Next, suppose that the Mk s are mutually singular in the sense that, for each
k, there exists a k BRN \{0} with the properties that k ` = and

Mk k { = 0 = M` (k ) for ` 6= k. Then, for P-almost every ,
K
 X

j t, , Z( , ) =
j t, , Zk ( , ) ,

t [0, ).

k=1

Equivalently, for P-almost every and all t 0, there is at most one k such
that Zk (t, ) 6= Zk (t, ).
Proof: Clearly, {Z(t) : t 0} starts at 0 and has independent increments. In
addition, for any s, t [0, ) and RN ,
K
i
i Y
h
h
EP e 1(,Zk (s+t)Zk (s))RN
EP e 1(,Z(s+t)Z(s))RN =
k=1

K
Y

 Z
exp t

1(,y)RN



1 Mk (dy)

RN

k=1

 Z
= exp t
RN

1(,y)RN


1 M (dy) .


4.2 Discontinuous Levy Processes

167

Now assume that the Mk s are as in the final part of the statement, and choose
k s accordingly. Without loss in generality, I will assume that RN \ {0} =
SK
k=1 k . Also, because the assertion depends only on the joint distribution of
the processes involved, I may and will assume that
Z

Zk (t) =
y j t, dy, Z for 1 k K,
k

PK
since then Z(t) = k=1 Zk (t), and, by Theorem 4.2.8, the Zk s are independent
and the kth one is a Poisson process associated with Mk . But
 with this choice,

another application of Theorem 4.2.8 shows that j t, , Zk = j t, k , Z ,
and therefore
K
 X

j t, , Z =
j t, , Zk , t [0, ). 
k=1

Because the paths of a Poisson process are piecewise constant, they certainly
have finite variation on each compact time interval. The first part of the next
lemma provides an estimate of that variation. The estimate in the second part
will be used in 4.2.5.
Lemma 4.2.10.
M0 (RN ), then

If {Z(t) : t 0} is a Poisson process associated with M




E var[0,t] (Z) = t

|y| M (dy).
RN

In addition, if

R
RN

|y| M (dy) < and Z(t)


= Z(t)

2

 N 2t

2

[0,t] R N t EP |Z(t)|
= 2
P kZk
R
R2

R
RN

y M (dy), then

|y|2 M (dy).

RN

Proof: Again I will assume that (cf. (4.2.4)) Z = ZM , in which case


X
var[0,t] (Z) =
|Xm |.
1mN (t)

Hence (cf. the notation used in 4.1.1)





 

EP var[0,t] (Z) = EP N (t) EP |X1 | = t

Z
|y| (dy) = t

RN

|y| M (dy).
RN

Turning to the second part, begin by observing that







n

P kZk[0,t] > R = lim P


max Z m2 t > R
n
1m2n




12
n



N lim sup P
max n e, Z m2 t RN > N R .
n eSN 1

1m2

168

4 Levy Processes

Next, given e SN 1 and n 1, write


X


n

n t) Z((`

e, Z(m2
t) RN =
e, Z(`2
1)2n t) RN ,
1`m

and apply Kolmogorovs Inequality to conclude that






2 

n
12



.
P
max n e, Z m2 t RN > N R N R2 EP e, Z(t)
RN
1m2



R
M (t)|2 = t N |y|2 M (dy). To this
Thus, we will be done once I check
that
EP |Z
R


R
2
2 2

end, first note that EP |Z(t)|


= EP |Z(t)|2
t |m|2 , where m = RN y (dy).

m = Xm m, then EP |Z(t)|2 equals
At the same time, if X


2
2




X
X






P
P


E
X m = E
Xm + |m|2 EP N (t)2
1mN (t)

1mN (t)






1 |2 + |m|2 2 t2 + t = tEP |X1 |2 + 2 t2 |m|2 .
= tEP |X

 R
Thus, since EP |X1 |2 = RN |y|2 M (dy), the desired equality follows. 
4.2.3. Poisson Jump Processes. Rather than attempting to construct more
general Levy processes directly, I will first construct their jump processes and
then construct them out of their jumps. With this idea in mind, given a probability space (, F, P), I will say that (t, )
j(t, , ) is a Poisson jump process
associated with M M (RN ) if, for each , t
j(t, , ) is a jump func+
tion, and for each n Z and collection

Sn {1, . . . , n } of mutually disjoint Borel
subsets sets of RN satisfying 0
/ i=1 i , {j(t, i ) : t 0} : 1 i n are
mutually independent, simple Poisson processes, the ith of which is run at rate
M (i ) for each 1 i n. By starting with simple functions and passing to
limits, one can easily check that
Z
(t, ) [0, ) 7 (y) j(t, dy, ) [0, ]

is measurable for every Borel measurable function : RN [0, ]. Therefore,


0
if F : RN RN is a Borel measurable function, and, for T > 0,


Z
(T ) :
|F (y)| j(T, dy, ) < ,
then both the set (T ) and the function
Z
0
(t, ) [0, T ] (T )
F (y) j(t, y, ) RN
are measurable. Note that if |F (y)| vanishes for ys in a neighborhood of 0, then
(T ) = for all T > 0.
My goal in this subsection is to prove the following existence result.

4.2 Discontinuous Levy Processes

169

Theorem 4.2.11. For each M M (RN ) there exists an associated Poisson


jump process. (See 9.2.2 for another approach.)
Proof: Set A0 = RN \B(0, 1) and Ak = B(0, 2k+1 )\B(0, 2k ) for k Z+ , and
define Mk(dy) = 1Ak (y) M (dy). Next, choose mutually independent Poisson
processes {Zk (t) : t 0} : k N  so that the kth one is associated with Mk ,
and set jk (t, , ) = j t, , Zk ( , ) . Without loss in generality, I may and will
assume that jk (t, Ak {, ) = 0 for P
all (t, ) [0, ) and k N. In addition,
m
by Theorem 4.2.9, if Z(m) (t) = k=0 Zk (t), then we know that, for P-almost
every ,
m
 X
j (m) (t, , ) j t, , Z(m) ( , ) =
jk (t, , ),

t 0.

k=0

Hence, I may and will assume that


t

j(t, , )

jk (t, , )

k=1

is a jump function for all . Finally, suppose that {i : 1 i n} BRN


Sn
Sn
are disjoint and that 0
/ i=1 i . Choose m N so that ( 1 i ) B(0, 2m ) =
, and note that, P-almost surely, j(t, i , ) = j (m) (t, i , ) for all t 0 and
1 i n. Hence, the required independence property is a consequence of the
last part of Theorem 4.2.8. 
In preparation for the next section, I prove the following.
0

Lemma 4.2.12. Let F : RN RN be a Borel measurable function satisfying



0
F (0) = 0 and 0
/ F 1 RN \ B(0, r) for any r > 0. For any M M (RN ),
0
M F M (RN ). Moreover, if {j(t, ) : t 0} is a Poisson jump process
associated with M and j F (t, , ) j t, F 1 (\{0}), , then {j F (t, ) : t 0}
is a Poisson jump process associated with (cf. Lemma 4.2.6) M F . Finally, if
0
0
/ F 1 (RN \ {0}) and
Z
Z
ZF (t, ) y j F (t, dy, ) = F (y) j(t, dy, ),
0

then M F M0 (RN ), {ZF (t) : t 0} is a Poisson process associated with M F ,


and j t, , ZF ( , ) = j F (t, , ).
Proof: To prove the first assertion, suppose that {1 , . . . , n } are disjoint
Sn
0
Borel subsets of RN and that 0
/ i=1 i . Then {F 1 (1 ), . . . , F 1 (n )}
satisfy the same conditions
as subsets of RN , and therefore, since j F (t, i , ) =
 F
1
j t, F (i ), ), {j (t, i ) : t 0} : 1 i n has the required properties.

170

4 Levy Processes
0

Turning to the second assertion, first note that M F M0 (RN ) is an immedi


0
ate consequence of 0
/ F 1 (RN \ {0}) and that the equality j t, , ZF ( , ) =
j F (t, , ) is a trivial application of the final part of Theorem 4.1.8. To prove
that {ZF (t) : t 0} is a Poisson process associated with M F , use Theorem
4.2.8 to see that {j F (t, ) : t 0} has the same distribution as the jump
process for a Poisson
process {Z(t) : t 0} associated with M F . Hence,
R
since Z(t) = y j(t, dy, Z), {ZF (t) : t 0} has the same distribution as
{Z(t) : t 0}. 

4.2.4. L
evy Processes with Bounded Variation. Although the contents
of the previous section provide the machinery with which to construct a Levy
process for any with Fourier transform given by (4.2.1), for reasons made clear
in the next lemma, I will treat the special case when M M1 (RN ) here and will
deal with M M2 (RN ) \ M1 (RN ) in the following subsection.
Lemma 4.2.13. Let {j(t, ) : t 0}R be a Poisson jump process associated
with M M2 (RN ), and set V (t, ) = |y| j(t, dy, ). Then V (t) < almost
surely or V (t) = almost surely for all t > 0, depending on whether M is or is
not in M1 (RN ). (See Exercise 4.3.11 to see that the same conclusion holds for
any M M (RN ).)
R
Proof: Since |y|>1 |y| j(t, dy, ) < for all (t, ) [0, ) , the question
R
is entirely about the finiteness of V0 (t, ) B(0,1) |y| j(t, dy, ). To study this
k+1 ) \ B(0, 2k ), F (y) = |y|1
k
Ak (y), and Vk (t, ) =


Rquestion, set Ak = B(0, 2
|y|
j(t,
dy,
)
for
k

1.
Clearly,
the
processes
{V
(t)
: t 0} : k Z+
k
Ak
are mutually independent. In addition, for each k, t
Vk (t) is non-decreasing
and, by the second part of Lemma 4.2.12, {Vk (t) : t 0} is a Poisson process
associated with M Fk . Thus, by Lemma 4.2.10,



ak EP Vk (t) = t


|y| M (dy) and bk Var Vk (t) = t

Ak

|y|2 M (dy).

Ak

From the first of these, it follows that

"Z
P

|y| j(t, dy) =

B(0,1)



E Vk (t) =

|y| M (dy),
B(0,1)

k=1

which finishes the case when M M1 (RN ). When M M2 (RN ) \ M1 (RN ), set
Vk (t) = Vk (t) tak . Then, for each t > 0, {Vk (t) : k Z+ } is a sequence of
mutually independent random variables with mean value 0. Furthermore,

X
k=1

X

Var Vk (t) = t
bk = t
k=1

|y|2 M (dy) < .


B(0,1)

4.2 Discontinuous Levy Processes

171

P
Hence, by Theorem
1.4.2,
k=1 Vk (t) converges P-almost
P
P surely. But, when

M
/ M1 (RN ), k=1 ak = , and so, for each t > 0, k=1 Vk (t) must diverge
P-almost surely. 
Before stating the main result of the subsection, I want to introduce the notion
of a generalized Poisson measure. Namely, if M M1 (RN ) \ M0 (RN ) and
M is the element of I(RN ) whose Fourier transform is given by

Z 

1(,y)RN
1 M (dy) ,
exp
e

R
or, equivalently,
d
M is given by (4.2.1) with m = B(0,1) y M (dy), then I will
call M the generalized Poisson measure for M . Similarly, if {Z(t) : t 0}
is a Levy process for a generalized Poisson measure M , I will say that it is a
generalized Poisson process associated with M .

Theorem 4.2.14. Suppose that M M1 (RN ) and that {j(t, ) : t 0} is


a Poisson jump process associated with M . Set N = { : t > 0 j(t, , )
/
N
M1 (R )}, and define (t, )
ZM (t, ) so that
R
y j(t, dy, ) if
/N
ZM (t, ) =
0
if N .
Then P(N ) = 0 and {ZM (t) : t 0} is a (possibly generalized) Poisson process
associated with M . In particular, t
ZM (t, ) is absolutely pure jump for all
, and {j(t, , ZM ) : t 0} is a Poisson jump process associated with M .
Finally, if I(RN ) has Fourier transform given by (4.2.1), then
!
)
(
Z
y M (dy) + ZM (t) : t 0
t m
B(0,1)

is a Levy process for .


Proof: That P(N ) = 0 follows from Lemma 4.2.13. To prove that {ZM (t) :
t 0} is a Levy process for M , set
Z
Z(r) (t, ) =
y j(t, dy, )
|y|>r
(r)

for r > 0. By Lemma 4.2.12, {Z (t) : t 0} is a Poisson process associated


with M (r) (dy) 1(r,) (y) M (dy). In addition, if
/ N , then Z(r) ( , )
ZM ( , ) uniformly on compacts, from which it is easy to check that {ZM (t) :
t 0} is a Poisson process associated with M and that the process in the last
assertion is a Levy process for the whose Fourier transform is given by (4.2.1)

with this M . Finally, by the last part of Theorem 4.1.8, j t, , ZM ( , ) =
j(t, , ) when
/ N , from which it is clear that {j(t, , ZM ) : t 0} is a
Poisson jump process associated with M . 
4.2.5. General, Non-Gaussian L
evy Processes. In this subsection I will
complete the construction of Levy processes with no Gaussian component.

172

4 Levy Processes

Theorem 4.2.15. For each m RN and M M2 (RN ) there is a Levy process


for the I(RN ) whose Fourier transform is given by (4.2.1). Moreover, if
{Z(t) : t 0} is such a process, then {j(t, , Z) : t 0} is a Poisson jump
process associated with M . Finally, if, for r (0, 1],
Z
Z
Z(r) (t) =
y j(t, dy, Z) t
y M (dy),
|y|>r

r<|y|1

then
!
P



sup Z( ) m Z(r) ( ) 

[0,t]

N 2t
2

|y|2 M (dy).
B(0,r)

Proof: Without loss in generality, I will assume that m = 0.


By Theorem 4.2.11, we know that there is a Poisson jump process {j(t, ) :
t 0} associated with M . Take
j(t, dy, ) = j(t, dy, ) t1
(y)M (dy),
B(0,1)

and define
(r)

y j(t, dy, ),

(t, ) =

(t, ) [0, ) ,

|y|>r

for r (0, 1]. By Theorem 4.2.14, we know that {Z(r) (t) : t 0} is a Levy
process for (r) , where
!
Z
i
h


(r) () = exp
d
e 1(,y)RN 1 1 1[0,1] (y) , y RN M (dy) .
|y|>r

Furthermore, by the second part of Lemma 4.2.10, we know that, for 0 < r <
r0 1,
Z
 N 2t
0
|y|2 M (dy).
(*)
P kZ(r ) Z(r) k[0,t]  2

0
r<|y|r

Hence, if 1 rm & 0 is chosen so that


Z
|y|2 M (dy) 2m ,
B(0,rm )

then

P sup kZ(rn ) Z(rm ) k[0,t]
n>m

1
m

P kZ(rn+1 ) Z(rn ) k[0,t] (m + 1)2

nm

N 2t

(n + 1)4 2n ,

n=m

4.2 Discontinuous Levy Processes

173

and therefore, by the first part of the BorelCantelli Lemma,




1
= 1.
P m n m kZ(rn ) Z(rm ) k[0,t] m+1

We now know that there is a P-null set N such that, for any
/ N , there
exists a Z( , ) D(RN ) to which {Z(rm ) ( , ) : n 0} converges uniformly
on compacts. Thus, if we take Z(t, ) = 0 for (t, ) [0, ) N , then it is an
easy matter to check that {Z(t) : t 0} is a Levy process for the I(RN )
whose Fourier transform is given by (4.2.1) with m = 0. In addition, since, by
Theorem 4.1.8, we know that t
j(t, , ) is the jump function for t
Z(t, )
when
/ N , it is clear that {j(t, , Z) : t 0} is a Poisson jump process
associated with M . Finally, to prove the estimate in the concluding assertion,
observe that, for
/ N , the path t
Z(r) (t, ) used in our construction
coincides with the path described in the statement. Thus, the desired estimate
is an easy consequence of the one in (*) above. 
Corollary 4.2.16. Let I(RN ) with Fourier transform given by (4.2.1),
and suppose that {Z(t) : t 0} is a Levy process for . Then, depending
on whether or not M M1 (RN ), either P-almost all or P-almost none of the
paths t
Z(t) has locally bounded variation. Moreover, if M M1 (RN ), then,
P-almost surely,
!
Z
y M (dy)
is an absolutely pure jump path.
t
Z(t) t m
B(0,1)

Proof: From Theorem 4.2.14, we already know that t


Z(t) tm is almost
surely an absolutely pure jump path if M M1 (RN ), and so t
Z(t) is almost
surely of locally bounded variation. Conversely, if t
Z(t) has locally bounded
variation with positive probability, then, by (4.1.13),
j t, , Z M1 (RN ) with

positive probability. But then, since {j t, , Z : t 0} is a Poisson jump
process associated with M , it follows from Lemma 4.2.13 that M M1 (RN ). 
Corollary 4.2.17. Let and {Z(t) : t 0} be as in Corollary 4.2.16. Given
set
BRN with 0
/ ,
Z
Z

y M (dy).
Z (t) =
y j(t, dy, Z), M (dy) = 1 (y)M (dy), and m =

B(0,1)

Then {Z (t) : t 0} is a Poisson process associated with M , {Z(t) Z (t) :


t 0} is a Levy process for the element of I(RN ) whose Fourier transform is



1 , m m RN
exp

Z
h


 i
1(,y)RN
1 11[0,1] |y| , y RN M (dy) ,
+
e
RN \


and {Z(t) Z (t) : t 0} is independent of {j t, , Z : t 0}, and therefore
of {Z (t) : t 0} as well.

174

4 Levy Processes

Proof: That {Z (t) : t 0} is a Poisson process associated with M is an


immediate consequence of Lemma 4.2.12. Next, define Z(r) (t) as in Theorem
4.2.15. Then, for all r (0, 1],
Z
Z
(r)

Z (t) Z (t) =
1RN \ (y)y j(t, dy) t
y M (dy).
|y|>r

r<|y|1

In particular, this means that {Z(r) (t)Z (t) : t 0} has independent,


 homogeneous increments and (cf. Theorem 4.1.8) is independent of {j t, , Z : t 0}.
Thus, since, as r & 0, Z(r) (t) Z(t) tm in probability, it follows that
{Z(t) Z (t) : t 0} is independent of {j(t, , Z ) : t 0}. In addition,





(r)

e 1t(,mm )RN EP e 1(,Z(t)Z (t))RN = lim EP e 1(,Z (t)Z (t)+tm )RN


r&0

Z
i
h




e 1(,y)RN 1 11[0,1] |y| , y RN M (dy)


= lim exp
r&0

(B(0,r)){

Z
= exp
RN \

1(,y)RN

11[0,1]

!

 i
|y| , y RN M (dy) .

Hence, it follows that {Z(t) Z (t) : t 0} is a Levy process for the specified
element of I(RN ). 
Exercises for 4.2
Exercise 4.2.18. Here is another proof that the process {N (t) : t 0} in
4.2.1 has independent, homogeneous increments. Refer to the notation used
there.
(i) Given n Z+ and measurable functions f : [0, )n+1 7 [0, ) and g :
[0, )n R, show that


EP f (1 , . . . , n+1 ), n+1 > g(1 , . . . , n )


+
= EP eg(1 ,... ,n ) f 1 , . . . , n , n+1 + g(1 , . . . , n )+ .
(ii) Let K Z+ , 0 = n0 n1 nK , and 0 = t0 t1 < < tK = s
be given, and set A = {N (tk ) = nk , 1 k K}. Show that A = B
{nK +1 > s TnK }, where B {1 , . . . , nK } , and apply (i) to see that


P(A) = EP e(sTnK ) , B .
(iii) Let n Z+ and t > 0 be given, and set h() = P(Tn1 > ). Referring to
(ii) and again using (i), show that



P A {N (s + t) N (s) < n} = EP h(t + s TnK +1 ), B {nK +1 > s TnK }



 

= EP e(sTnK ) h(t nK +1 ), B = EP h(t nK +1 ) EP e(sTnK ) , B

= P N (t) < n P(A).

Exercises for 4.2

175

Exercise 4.2.19. Let {N (t) : t 0} be a simple Poisson process, and show


that limt N t(t) = 1 P-almost surely.

Hint: First use The Strong Law of Large Numbers to show that limn
1 P-almost surely. Second, use



2
N (t) N (n)
 P N (1) n 2 2
P
sup
 n
t
ntn+1

to see that

N (n)
n



N (t) N (btc)
= 0 P-almost surely.

lim
t
btc
t

Exercise 4.2.20. Assume that I(R) has its Fourier transform given by
(4.2.1), and let {Z(t) : t 0} be a Levy process for . Using Exercise 3.2.25,

show that t R Z(t) is non-decreasing if and only if M M1 (R), M (, 0) =
0, and m [1,1] y M (dy).
Exercise 4.2.21. Let {j(t, ) : t 0} be a Poisson jump process associated
with some M M (RN ), and suppose that F : RN R is a Borel measurable,
M -integrable function that vanishes at 0.
(i) Let N be the set of for which there is a t > 0 such that F is not
j t, , )-integrable, and show that P(N ) = 0.
(ii) Show that (cf. Lemma 4.2.6) M F M1 (R) and that, in fact,
Z
Z
F
|y| M (dy) = |F (y)| M (dy) < .
Next, define
F

R

Z (t, ) =

F (y) j(t, dy, )

if
/N
if N ,

and show that {Z F (t) : t 0} is a (possibly generalized) Poisson process associated with M F .
(iii) Show that
Z F (t)
=
t
t
lim

Z
F (y) M (dy)

P-almost surely.

Hint: Begin by using Lemma 4.2.10 to show that it suffices to handle F s that
vanish in a neighborhood of 0. When F vanishes in a neighborhood of 0, use
Lemma 4.2.12 to see that {Z F (t) : t 0} is a Poisson process associated with
M F . Finally, use the representation of a Poisson process in terms of a simple
Poisson process and independent random variables, and apply The Strong Law
of Large Numbers together with the result in Exercise 4.2.19.

176

4 Levy Processes

Exercise 4.2.22. Let {Z(t) : t 0} be a Levy process for the I(RN ) with

Fourier transform given by (4.2.1), and set


 Z(t) = Z(t) tm. Show that for all
[0,t] R is dominated by t times
R [1, ) and t (0, ), P kZk
4N
R2

|y|2 M (dy) +

B(0,1)

2
R

1<|y| R

|y| M (dy) + M B(0,

Hint: Write Z(t)


= Z1 (t) + Z2 (t) + Z3 (t), where
Z
Z
y
j(t,
dy,
Z)
and
Z
(t)
=
Z2 (t) =
3

|y|> R

1<|y| R


R){ .

y j(t, dy, Z).

Then,

P kZk[0,t] R P kZ1 k[0,t]

R
2

+ P kZ2 k[0,t]

R
2


+ P kZ3 k[0,t] 6= 0 .

Apply the estimates in Lemma 4.2.10 to control the first two terms on the right,
and use




N
P j t, RN \ B(0, R), Z 6= 0 = 1 etM (R \B(0, R))

to control the third.


Exercise 4.2.23. Let be a locally finite Borel measure on RN . A Poisson
point process with intensity measure is a random, locally finite, purely atomic
measure-valued random variable
P ( , ) with the properties that, for any
bounded BRN , P () is a Poisson random variable with mean value ()
and, for any n 2 and family {1 , . . . , n } of mutually disjoint, bounded, Borel
subsets of RN , {P (1 ), . . . , P (n )} are mutually independent. The purpose of
this exercise is to show how one always construct such a Poisson point process
for any .
y
(i) Define F : RN RN so that F (0) = 0 and F (y) = |y|
2 for y 6= 0. Clearly,
1
F is one-to-one and onto, and both F and F
are Borel measurable. Assuming
that ({0}) = 0, show that M F M (RN ) and that = (F 1 ) M .

(ii) Continue to assume that ({0}) = 0, let {j(t, ) : t 0} be a Poisson jump


process associated with the M in (i), and set P ( , ) = (F 1 ) j(1, , ). Show
that
P ( , ) is a Poisson point process with intensity .
(vi) In order to handle s that charge 0, suppose ({0}) > 0. Choose a point
x RN for which ({x}) = 0, define 0 () = (x + ), note that 0 ({0}) = 0,
and construct a Poisson point process
P 0 ( , ) with intensity measure 0 .
0
Finally, define P (, ) = P ( x, ), and check that
P ( , ) is a Poisson
point process with intensity measure .

4.3 Brownian Motion, the Gaussian Levy Process

177

Exercise 4.2.24. Let M M2 (RN ) be given, and assume that there exists a
decreasing sequence {rn : n 0} (0, 1] with rn & 0 such that
Z
m = lim
y M (dy)
n

rn <|y|1

exists. Let I(RN ) have Fourier transform given by (4.2.1) with this m and
M . If {Z(t) : t 0} is a Levy process for , set
Z

Zn (t, ) =
y j t, dy, Z( , ) ,
|y|>rn


and show that limn P kZ Zn k[0,t]  = 0 for all t 0 and  > 0. Thus,
after passing to a subsequence {nm : m 0} if necessary, one sees that, P-almost
surely,
Z

Z(t, ) = lim
y j t, dy, Z( , ) ,
m

|y|>rnm

where the convergence is uniform on finite time intervals. In particular, one can
say that P-almost all the paths t
Z(t, ) are conditionally pure jump.
4.3 Brownian Motion, the Gaussian L
evy Process
What remains of the program in this chapter is the construction of a Levy
process for the standard, normal distribution 0,I , the infinitely divisible law
||2

whose Fourier transform is e 2 . Indeed, if {Z0,I (t) : t 0} is such a process


and {Z (t) : t 0} is a Levy process for the I(RN ) whose Fourier transform
is given by (4.2.1), and if {Z0,I (t) : t 0} is independent of {Z (t) : t 0},
1
then it is an easy matter to check that C 2 Z0,I (t) + Z (t) will be a Levy process
for 0,C ? , whose Fourier transform is


exp



1 , m RN 12 , C RN

Z h

 i
+
e 1(,y)RN 1 1 1[0,1] (|y|) , y RN M (dy) .
RN

Because one of its earliest applications was as a mathematical model for the
motion of Brownian particles, 1 such a Levy process for 0,1 is called a Brownian motion. In recognition of its provenance, I will adopt this terminology
and will use the notation {B(t) : t 0} instead of {Z0,I (t) : t 0}.
1

R. Brown, an eighteenth century English botanist, observed the motion of pollen particles
in a dilute gas. His observations were interpreted by A. Einstein as evidence for the kinetic
theory of gases. In his famous 1905 paper, Einstein took the first steps in a program, eventually
completed by N. Wiener in 1923, to give a mathematical model of what Brown had seen.

178

4 Levy Processes

Before getting into the details, it may be helpful to think a little about what
sorts of properties we should expect the paths t
B(t) will possess. For this
N
purpose, set Mn = n 12 + 12 , and recall that we have seen already
n
n
that Mn =0,I . Since a Poisson process associated with Mn has nothing but
1
jumps of size n 2 , if one believes that the Levy process for 0,I should be, in
some sense, the limit of such Poisson processes, then it is reasonable to guess
that its paths will have jumps of size 0. That is, they will be continuous.
Although the prediction that the paths of {B(t) : t 0} will be continuous
is correct, it turns out that, because it is based on the Central Limit Theorem,
the heuristic reasoning just given does not lead to the easiest construction. The
problem is that The Central Limit Theorem gives convergence of distributions,
not random variables, and therefore one should not expect the paths, as opposed
to their distributions, of the approximating Poisson processes to converge. For
this reason, it is easier to avoid The Central Limit Theorem and work with
Gaussian random variables from the start, and that is what I will do here. The
Central Limit approach is the content of 9.3.

4.3.1. Deconstructing Brownian Motion. My construction of Brownian


motion is based on an idea of Levys; and in order to explain Levys idea, I will
begin with the following line of reasoning.
Assume that {B(t) : t 0} is a Brownian motion in RN . That is, {B(t) : t
0} starts at 0, has independent increments, any increment B(s + t) B(s) has
distribution 0,tI , and the paths t
B(t) are continuous. Next, given n N, let
t
Bn (t) be the polygonal path obtained from t
B(t) by linear interpolation
during each time interval [m2n , (m + 1)2n ]. Thus,
Bn (t) = B(m2n ) + 2n t m2n




B (m + 1)2n B(m2n )

for m2n t (m + 1)2n . The distribution of {B0 (t) : t 0} is very


easy to understand. Namely, if Xm,0 = B(m) B(m 1) for m 1, then the
N
X
Pm,0 s are independent, standard normal R -valued random variables, B0 (m) =
1mn Xm,0 , and B0 (t) = (m t)B0 (m 1) + (t m + 1)B0 (m) for m 1
t m. To understand the relationship between successive Bn s, observe that
Bn+1 (m2n ) = Bn (m2n ) for all m N and that




n
Xm,n+1 2 2 +1 Bn+1 (2m 1)2n1 Bn (2m 1)2n1

!
 B m2n + B (m 1)2n
n
+1
n1
B (2m 1)2

= 22
2
h



n
= 2 2 B (2m 1)2n1 B (m 1)2n


i
B m2n B (2m 1)2n1
,

4.3 Brownian Motion, the Gaussian Levy Process

179

and therefore {Xm,n+1 : m 1} is again a sequence of independent standard


normal random variables. What is less obvious is that {Xm,n : (m, n) Z+ N}
is also a family of independent random variables. In fact, checking this requires
us to make essential use of the fact that we are dealing with Gaussian random
variables.
In preparation for proving the preceding independence assertion, say that
G L2 (P; R) is a Gaussian family if G is a linear subspace and each element
of G is a centered (i.e., mean value 0), R-valued Gaussian random variable.
My
 interest
 in Gaussian families at this point is that the linear span G(B) of
, B(t) RN : t 0 and RN is one. To see this, simply note that, for any
0 = t0 < t1 < tn and 1 , . . . , n RN ,
!
n
n
n
X
X
X


m , B(tm ) RN =
m , B(t` ) B(t`1 ) RN
,
m=1

`=1

m=`

RN

which, as a linear combination of independent centered Gaussians, is itself a


centered Gaussian.
The crucial fact about Gaussian families is the content of the next lemma.
Lemma 4.3.1. Suppose that G L2 (P; R) is a Gaussian family. Then the
closure of G in L2 (P; R) is again a Gaussian family. Moreover, for any S G,
S is independent of S G, where S is the orthogonal complement of S in
L2 (P; R).
Proof: The first assertion is easy since, as I noted in the introduction to Chapter 3, Gaussian random variables are closed under convergence in probability.
Turning to the second part, what I must show is that if X1 , . . . , Xn S and
X10 , . . . , Xn0 S G, then (cf. part (ii) of Exercise 1.1.13)
#
# " n
#
" n
" n
n

Y
Y
Y
Y
0
0
0
0
1 m
Xm
1 m Xm
P
1 m
Xm
P
1 m Xm
P
E
e
=E
e
e
E
e
m=1

m=1

m=1

m=1

0
for any choice of {m : 1 m n} {m
: 1 m n} R. But the
expectation value on the left is equal to

!2
n
X

1
0
0

m Xm + m
Xm
exp EP
2
m=1

!2
!2
n
n
X
X
1
1
0
0

m
Xm
m Xm EP
= exp EP
2
2
m=1
m=1
#
# " n
" n
Y
Y
0
0
e 1 m Xm ,
= EP
e 1 m Xm EP

m=1

m=1

180

4 Levy Processes

0
0
since EP [Xm Xm

0 ] = 0 for all 1 m, m n.
Armed with Lemma 4.3.1, we can now check that {Xm,n : (m, n) Z+ N}
is independent. Indeed, since, for all (m, n) Z+ N and RN , , Xm,n RN
a member of the Gaussian family G(B), all that we have to do is check that, for
each (m, n) Z+ N, ` N, and (, ) (RN )2 ,


 
EP , Xm,n+1 RN , B(`2n ) RN = 0.

But, since, for s t, B(s) is independent of B(t) B(s),




 


 

EP , B(s) RN , B(t) RN = EP , B(s) RN , B(s) RN = s , RN
and therefore


 
n
2 2 1 EP , Xm,n+1 RN , B(`2n ) RN
h
 i
 
= EP , B (2m 1)2n1 N , B(`2n ) N
R
R
 i

 
1 P h
n
, B m2
+ B (m 1)2n N , B(`2n ) N
E
2
R
R


m

`
+
(m

1)

`
= 0.
= 2n , RN m 12 `
2

4.3.2. L
evys Construction of Brownian Motion. Levys idea was to
invert the reasoning given in the preceding subsection. That is, start with a
family {Xm,n : (m, n) Z+ N} of independent N (0, I)-random variables.
Next, define {Bn (t) : t 0} inductively
Bn (t) is linear on each
P so that t
n
n
interval [(m 1)2 , m2 ], B0 (m) = 1`m X`,0 , m N, Bn+1 (m2n ) =
Bn (m2n ) for m N, and


n
Bn+1 (2m 1)2n = Bn (2m 1)2n1 + 2 2 1 Xm,n+1 for m Z+ .

If Brownian motion exists, then the distribution of {Bn (t) : t 0} is the


distribution of the process obtained by polygonalizing it on each of the intervals
[(m 1)2n , m2n ], and so the limit limn Bn (t) should exist uniformly on
compacts and should be Brownian motion.
To see that this procedure works, one must first verify that the preceding
definition of {Bn (t) : t
That
 0} gives a process
 with the correct distribution.

is, we need to show that Bn (m+1)2n Bn m2n : m N is a sequence of
independent N (0, 2n I)-random variables. But, since this sequence is contained
in the Gaussian family spanned by {Xm,n : (m, n) Z+ N}, Lemma 4.3.1 says
that we need only show that
h


EP , Bn (m + 1)2n Bn m2n N
R


 i

0
0
n
, Bn (m + 1)2
Bn m0 2n
= 2n , 0 RN m,m0
RN

4.3 Brownian Motion, the Gaussian Levy Process

181

for , 0 RN and m, m0 N. When n = 0, this is obvious. Now assume that


it is true for n, and observe that

Bn+1 (m2n ) Bn+1 (2m 1)2n1

Bn (m2n ) Bn (m 1)2n
n
2 2 1 Xm,n+1
=
2

and


Bn+1 (2m 1)2n1 Bn+1 (m 1)2n

Bn (m2n ) Bn (m 1)2n
n
+ 2 2 1 Xm,n+1 .
=
2

Using these expressions and the induction hypothesis, it is easy to check the
required equation.
Second, and more challenging, we must show that, P-almost surely, these
processes are converging uniformly on compact time intervals. For this purpose,
consider the difference t
Bn+1 (t) Bn (t). Since this path is linear on each
interval [m2n1 , (m + 1)2n1 ],




Bn+1 (m2n1 ) Bn (m2n1 )
max Bn+1 (t) Bn (t) =
max
t[0,2L ]

1m2L+n+1

= 2 2 1

max

1m2L+n

L+n
2X

|Xm,n+1 | 2 2 1

14
|Xm,n+1 |4 .

m=1

Thus, by Jensens Inequality,

L+n
2X



n
EP kBn+1 Bn k[0,2L ] 2 2 1

14


nL4
EP |Xm,n+1 |4 = 2 4 CN ,

m=1


1
where CN EP |X1,0 |4 4 < .
Starting from the preceding, it is an easy matter to show that there is a
measurable B : [0, ) RN such that B(0) = 0, B( , ) C [0, ); RN )
for each , and kBn Bk[0,t] 0 both P-almost surely and in L1 (P; R)
n
n
for every t [0, ). Furthermore, since
) P-almost surely
 B(m2 )n= Bn (m2 n

2
for all (m, n) N , it is clear that B (m + 1)2
B(m2 ) : m 0 is a
sequence of independent N (0, 2n I)-random variables for all n N. Hence, by
continuity, it follows that {B(t) : t 0} is a Brownian motion.
We have now completed the task described in the introduction to this section.
However, before moving on, it is only proper to recognize that, clever as his
method is, Levy was not the first to construct a Brownian motion. Instead, it

182

4 Levy Processes

was N. Wiener who was the first. In fact, his famous2 1923 article Differential
Space in J. Math. Phys. #2 contains three different approaches.
4.3.3. L
evys Construction in Context. There are elements of Levys
construction that admit interesting generalizations, perhaps the most important
of which is Kolmogorovs Continuity Criterion.
Theorem 4.3.2. Suppose that {X(t) : t [0, T ]} is a family of random
variables taking values in a Banach space B, and assume that, for some p
[1, ), C < , and r (0, 1],

1
1
EP kX(t) X(s)kpB p C|t s| p +r for all s, t [0, T ].

Then, there exists a family {X(t)


: t [0, T ]} of random variables such that

) B is
X(t) = X(t)
P-almost surely for each t [0, T ] and t [0, T ] 7 X(t,
continuous for all . In fact, for each (0, r),
"
P

sup
0s<tT

X(s)k

kX(t)
B
(t s)

!p # p1

5CT p +r
.

(1 2r )(1 2r )

Proof: First note that, by rescaling time, it suffices to treat the case when
T = 1.


Given n 0, set Mn = max1m2n X(m2n ) X (m 1)2n B , and
observe that

! p1
2n
X
1




p
X(m2n ) X (m 1)2n
C2rn .
EP Mnp p EP
B
m=1

Next, let t
Xn (t) be the polygonal path obtained by linearizing t
each interval [(m 1)2n , m2n ], and check that

X(t) on

max kXn+1 (t) Xn (t)kB






n
n

X
(m

1)2

X(m2
)


= max n X (2m 1)2n1
Mn+1 .
1m2

2

t[0,1]

i 12 p

C2rn , and so there exists a


Hence, EP supt[0,1] kXn+1 (t) Xn (t)kpB
: [0, 1] B such that t
) is continuous for all
measurable X
X(t,
and
"
# p1
C2rn
Xn (t)kp
.

EP sup kX(t)
B
1 2r
t[0,1]
2

Wieners article is remarkable, but I must admit that I have never been convinced that it is
complete. Undoubtedly, my doubts are more a consequence of my own ineptitude than of his.

4.3 Brownian Motion, the Gaussian Levy Process

183

Moreover, because, for each t [0, 1], kX( ) X(t)kB 0 in probability as

t, it is easy to check that, for each t [0, 1], X(t)


= X(t) P-almost surely.
n1
To prove the final estimate, note that for 2
t s 2n one has that
X(s)k

kX(t)
B kX(t) Xn (t)kB + kXn (t) Xn (s)kB + kXn (s) X(s)kB
) Xn ( )kB + 2n (t s)Mn ,
2 sup kX(
[0,1]

and therefore that


X(s)k

kX(t)
B
) Xn ( )kB + 2n 2(1)n Mn .
22(n+1) sup kX(
(t s)
[0,1]

But, by the estimates proved above, this means that


"
P

sup
0s<t1

X(s)k

kX(t)
B
(t s)

!p # p1



X
2(n+1) 2rn
n rn
+
2
2
2
1 2r
n=0

5C
. 
(1 2r )(1 2r )

Corollary 4.3.3.
If {B(t) : t 0} is an RN -valued Brownian motion, then,

B(t) is P-almost surely H
older continuous of order .
for each 0, 21 , t
In fact, for each T (0, ),


P

sup
0s<tT


|B(t) B(s)|
< .
(t s)

Proof: In view of Theorem 4.3.2, all that


 we have to do is note that, for each
n Z+ , there is a Cn < such that EP |B(t) B(s)|2n Cn |t s|n . 
4.3.4. Brownian Paths Are Non-Differentiable. Having shown that
Brownian paths are H
older continuous of every order strictly less than 12 , I
will close this section by showing that they are nowhere H
older continuous of
any order strictly greater than 12 . In particular, this will prove Wieners famous
result that Brownian paths are nowhere differentiable. The proof that follows is
due to A. Devoretzky.

Theorem 4.3.4. Let {B(t) : t 0} be an RN -valued Brownian motion. Then,


for each > 12 ,

|B(t) B(s)|
<
P s [0, ) lim
t&s
(t s)


= 0.

184

4 Levy Processes

Proof: Because {B(T + t) B(T ) : t 0} is a Brownian motion for each


T [0, ), it suffices for us to show that


|B(t) B(s)|
<
P s [0, 1) lim
t&s
(t s)


= 0.

To this end, note that, for every L Z+ ,


|B(t) B(s)|
<

t&s
(t s)

s [0, 1) lim

[
\
[
n L1
[
\ 
B

m+`+1
n

m+`
n

M
n

M =1 =1 n= m=0 `=0

Thus, it enough to show that there is a choice of L such that



lim nP B

`+1
n

`
n


0 ` < L = 0.

M
n ,

But

P B

`+1
n

`
n

 L
M

= 0, n1 I B 0, n

M
n ,

0`<L
N
2

(2)

Z
1
B(0,M n 2 )

|y|2
2

!L
dy

Cn( 2 )N L .

Hence, we need only take L so that ( 12 )N L > 1. 

In spite of their being non-differentiable, differentials of Brownian paths


display remarkable regularity properties. To wit, I make the following simple
observation. In its statement, k kH.S. denotes the HilbertSchmidt norm on
Hom(RN ; RN ).
Theorem 4.3.5. If {B(t) : t 0} is an RN -valued Brownian motion, then,
for each T (0, )


[nt]

X




lim sup

tI
m,n
m,n


n t[0,T ]
m=1

= 0 P-almost surely,

H.S.



m1
. In particular, P-almost no Brownian path
where m,n B B m
n
n B
has locally bounded variation.

4.3 Brownian Motion, the Gaussian Levy Process

185

N
Proof: Let
 (e1 , . . . , eN ) be an orthonormal basis for R , and set Xi (k, n) =
ei , k,n B RN . Then, what we have to show is that


m

X
m


(*)
lim sup
Xi (k, n)Xj (k, n) i,j = 0 P-almost surely.
n 1mnT

n
k=1

To this end, note that, for each n Z+ and 1 i N , {Xi (k, n) : k 1} are
mutually independent N (0, n1 )-random variables. Hence, for each 1 i N ,
{Xi (k, n)2 n1 : k 1} are independent random variables with mean value
0 and variance 3n2 , and therefore, by (1.4.22) and the second inequality in
(1.3.2),


m 

 4
X


E max
Xi (k, n)2 n1

1mnT
k=1

4





12M4 T 2
X
2
1
,
4E
Xi (k, n) n
n2

1knT

where M4 is the fourth moment of X1 (1, 1)2 1, and so the BorelCantelli


Lemma can be used to check (*) when i = j. When i 6= j, the argument is
essentially the same, only, because Xi (k, n)Xj (k, n) has mean value 0, there is
no need to subtract off its mean.

To prove the final assertion, note that if C [0, T ]; R has bounded variation, then
[nT ] 
X
 2

m1
= 0. 

lim
m
n
n
n

m=1

4.3.5. General L
evy Processes. Our original reason for constructing Brownian motion was to complete the program of constructing all the Levy processes.
In this subsection, I will do that.
Throughout this subsection, I(RN ) has Fourier transform




1 , m RN 12 , C RN
exp

(4.3.6)
Z h

 i
1(,y)RN
1 11[0,1] (|y|) , y RN M (dy) ,
+
e

where m RN , C Hom(RN ; RN ) is symmetric and non-negative definite, and


M M2 (RN ). In addition, I will use 0 to denote m,C and 1 to denote the
element of I(RN ) whose Fourier transform is

Z h

 i
exp
e 1(,y)RN 1 11[0,1] (|y|) , y RN M (dy) .

Thus, = 0 ? 1 .

186

4 Levy Processes

Theorem 4.3.7. There is a Levy process {Z(t) : t 0} for each I(RN ).


Furthermore, if 0 and 1 are as in the preceding discussion and if {Z(t) : t 0}
is a Levy process for = 0 ? 1 , then there exist independent Levy processes
{Z0 (t) : t 0} and {Z1 (t) : t 0} for 0 and 1 , respectively, such that
Z(t) = Z0 (t) + Z1 (t), t 0, P-almost surely. In fact, if, for r (0, 1],
Z
Z
Z(r) (t) =
y j(t, dy, Z) t
y M (dy),
|y|>r

r<|y|1

then, for each t (0, ),


 N 2t
P kZ(r) Z1 k[0,t]  2


|y|2 M (dy).
B(0,r)

Proof: Let {B(t) : t 0} be a Brownian motion and {Z1 (t) : t 0} an


1
independent Levy process for 1 , and define Z0 (t) = tm + C 2 B(t) and Z(t) =
Z0 (t)+Z1 (t). As I pointed out in the introduction to this section, {Z0 (t) : t 0}
is a Levy process for 0 and {Z(t) : t 0} is a Levy process for . Furthermore,
because t
Z0 (t) is continuous, j(t, , Z) = j(t, , Z1 ). Hence, by the last part
of Theorem 4.2.15, we know that the last part of the present theorem holds for
this choice of {Z(t) : t 0}. Finally, since every Levy process for will have
the same distribution as this one, there is nothing more to do. 

Corollary 4.3.8. Let {Z(t) : t 0} be a Levy process for . Then t


Z(t)
is P-almost surely continuous if and only if M = 0 and is P-almost surely of
locally bounded variation if and only if C = 0 and M M1 (RN ). Finally,
t
Z(t) is P-almost surely
R an absolutely pure jump path if and only if C = 0,
M M1 (RN ), and m = B(0,1) y M (dy).

Proof: Let Z(t) = Z0 (t) + Z1 (t) be the decomposition described in Theorem


4.3.7, and let {j(t, ) : t 0} be the jump process for {Z(t) : t 0}. If
M = 0, then Z1 (t) = 0, t 0, P-almost surely, and so t
Z(t) = Z0 (t)
is continuous P-almost surely. Conversely, if t
Z(t) is continuous P-almost
surely, then j(t, ) = 0, t 0, P-almost surely. Hence, since {j(t, ) : t 0} is
a Poisson jump process associated with M , we see that M = 0. Next, suppose
that C = 0. Then Z(t) = Z1 (t) + tm, t 0, P-almost surely and therefore,
by Corollary 4.2.16, t
Z(t) has locally bounded variation P-almost surely
N
if and only if M M1 (R ) and is P-almost Rsurely an absolutely pure jump
path if and only if M M1 (RN ) and m = B(0,1) y M (dy). Thus, all that
remains is to show that C = 0 if t
Z(t) P-almost surely has locally bounded
variation. But,
Z(t) has locally bounded variation P-almost surely, then,
R if t
by (4.1.13), |y|j(t, dy) < , t 0, P-almost surely and therefore, by Lemma
4.2.13, M M1 (RN ), which, by Corollary 4.2.16, implies that t
Z1 (t) has
locally bounded variation P-almost surely. Since this means that t
Z0 (t) must

Exercises for 4.3

187

also have locally bounded variation P-almost surely, and, since {Z0 (t) : t 0}
1
has the same distribution as {tm + C 2 B(t) : t 0}, Theorem 4.3.5 shows that
this is possible only if C = 0. 

Remark 4.3.9. Recall the linear functional A introduced in (3.2.10). As I


showed in Lemma 3.2.14, the action of A on decomposes into a local part
and a non-local part, which, with 20-20 hindsight, we can write as, respectively,


m, (0) RN + 12 Trace C2 (0)
Z h
 i
and
(y) (0) 1[0,1] (|y|) y, (0) RN M (dy).

In terms of this decomposition, Corollary 4.3.8 is saying that the local part of
A governs the continuous part of {Z(t) : t 0} and that the non-local part
governs the discontinuous part.
Exercises for 4.3
Exercise 4.3.10. This exercise deals with a few elementary facts about Brownian motion.
(i) Let {X(t) : t 0} be an RN -valued stochastic process satisfying X(0, ) = 0
N
and X( , ) C(RN ) for all , and showthat {X(t)
 : t 0} is an R -valued
Brownian motion if and only if the span of , X(t) RN : t 0 & RN } is a
Gaussian family with the property that, for all t, t0 [0, ) and , 0 RN ,
h

 i
EP , X(t) RN 0 , X(t0 ) RN = t t0 (, 0 )RN .
(ii) Assuming that {B(t) : t 0} is an RN -valued Brownian motion, show that
{OB(t) : t 0} is also an RN -valued Brownian motion for any orthogonal transformation O on RN . That is, the distribution of Brownian motion is invariant
under rotation. (See Theorem 8.3.14 for a significant generalization.)
(iii) Assuming that {B(t) : t 0} is an RN -valued Brownian motion, show that
1
{ 2 B(t) : t 0} is also an RN -Brownian motion for each (0, ). This is
called the Brownian scaling invariance property.

Exercise 4.3.11. This exercise introduces the time inversion invariance property of Brownian motion.
(i) Suppose that
{B(t) : t 0} is an RN -valued Brownian motion, and set

1
X(t) = tB t for t > 0. As an application of (i) in Exercise 4.3.10, show that
{X(t) : t > 0} has the same distribution as {B(t) : t > 0}, and conclude from
) = 0 and, for
this that limt&0 X(t) = 0 P-almost surely. In particular, if B(0,
t (0, ),



tB 1t , when lim 0 B 1 , = 0

B(t, ) =
0
otherwise,
N

show that {B(t)


: t 0} is an R -valued Brownian motion.

188

4 Levy Processes

(ii) As a consequence of part (i), prove the Brownian Strong Law of Large Numbers: limt t1 B(t) = 0.
Exercise 4.3.12. Let {B(t) : t 0} be an RN -valued Brownian motion.
(i) As an application of Theorem 1.4.13, show that, for any e SN 1 and
T (0, ),
!
P



sup e, B(t) R
t[0,T ]




R2
2P e, B(T ) RN R 2e 2T ,

and conclude that



R2
P kBk[0,T ] R 2N e 2N T .

(4.3.13)

(ii) Now assume that N = 1, and set B (t) = max


as in part (i),
 [0,t] B( ). Just

use Theorem 1.4.13 to show that P B (1) a 2P B(1) a for all a > 0. By
examining its proof, one sees that the inequality in Theorem 1.4.13 comes from
not knowing how far over a the partial sums jump when they first exceed level a.
Thus, because we are now dealing with continuous partial sums, one should
suspect that the inequality can be made an equality. To verify this suspicion, let
n () denote the set of such that |B(t, ) B(s, )| <  for all 0 s < t 1
with t s 2n , and show that, for 0 <  < a,
{B(1) a} n ()

2n
1 
[
n
n
n

max B(`2 ) < a  B(m2 ) & B(1) B(m2 ) > 0 ,


m=1

0`<m



and conclude that P {B(1) a} n () 12 P B (1)  a  for all n N.
Now let n and then  & 0 to arrive at P B (1) a 2P B(1) a .
(iii) By combining the preceding with Brownian scaling invariance, arrive at
r
(4.3.14)



P B (t) a = 2P B(t) a =

1
at 2

x2
2

dx.

This beautiful result, which is sometimes called the reflection principle for
Brownian motion, seems to have appeared first in L. Bacheliers now famous
1900 thesis, where he used what is now called Brownian motion to model
price fluctuations on the Paris Bourse. More information about the reflection
principle can be found in 8.6.3.

Exercises for 4.3

189

Exercise 4.3.15. Let {B(t) : t 0} be an R-valued Brownian motion. The


goal of this exercise is to prove the Brownian Law of the Iterated Logarithm:

lim q

B(t)
= 1 = lim q
t&0
2t log(2) t
2t log(2) t1
B(t)

P-almost surely.

Begin by checking that the second equality follows from the first applied to the

time inverted process {B(t)


: t 0} described in (i) of Exercise 4.3.11. Next,
observe that
B(n)
= 1 P-almost surely
lim q
n
2n log(2) n

is just the Law of the Iterated Logarithm for standard normal random variables.
Thus, all that remains is to show that







B(n)
B(t)
= 0 P-almost surely,
q
lim
sup q

n t[n,n+1]
2t log(2) t
2n log(2) n
which can be checked by a combination of the Strong Law for Brownian motion,
the estimate in (4.3.13), and the easy half of the BorelCantelli Lemma.
Exercise 4.3.16. Given a stochastic process {X(t) : t 0}, the stochastic

process {X(t)
: t 0} is said to be a modification of {X(t) : t 0} if, for

each t [0, ), X(t)


= X(t) P-almost surely. Further, given a stochastic process
{X(t) : t 0} with values in a metric space (E, ), one says that {X(t) : t 0}
is stochastically continuous if, as t s, X(t) X(s) in probability for
each s [0, ).
(i) Show that the simple Poisson process {N (t) : t 0} is stochastically continuous. Thus, stochastic continuity does not imply path continuity.
(ii) Let Q denote the set of rational real numbers. Show that an RN -valued,
stochastically continuous stochastic process {X(t) : t 0} admits a continuous
modification if and only if, for each T > 0, t [0, T ] Q 7 X(t) is uniformly
continuous. Conclude that a stochastically continuous process {X(t) : t 0}
admits a continuous modification if and only if there exists a M1 C(RN )
such that the distribution of {X(t) : t 0} under P is the same as the distribution of {(t) : t 0} under . Equivalently, a stochastically continuous
process {X(t) : t 0} admits a continuous modification if and only if there
exists a continuous stochastic process {Y (t) : t 0}, not necessarily on the
same probability space, with the same distribution as {X(t) : t 0}.

190

4 Levy Processes

Exercise 4.3.17. It is important to realize that the insistence in Theorem


4.3.2 that the pth moment of |X(t) X(s)| be dominated by |t s| to a power
strictly greater than p is essential. To see this, recall the simple Poisson process
{N (t) : t 0} in 5.2.1, and set X(t) = N (t) t. The paths of this process are
right-continuous but definitely not continuous. On the other hand, show that

2 
EP N (t) N (s) (t
s)
t s for 0 s < t. More generally, knowing

2
that E |X(t) X(s)| is dominated by |t s| is not enough to conclude that
there is a continuous modification of t
X(t).
Exercise 4.3.18. There is an important extension of Theorem 4.3.2 to processes that have a multidimensional parametrization. Let B be a Banach space
and {X(x) : x [0, T ] } a family of B-valued random variables with the property that

1

EP kX(y) X(x)kpB p C|y x| p +r

for some p [1, ), r > 0, and C < . Show that there exists a family

{X(x)
: x [0, T ] } with the properties that x [0, T ] 7 X(x,
) B

is continuous for all , and, for each x [0, T ] , X(x,


) = X(x, ) P-almost
surely. Further, show that, for each (0, r), there is a universal K(, r, ) <
such that

kX(y)
X(x)k

B
K(, r, )CT p +r .
EP sup

|y x|
x,y[0,T ]
y6=x

Hint: First rescale time to reduce to the case when T = 1. Now assume that
2
T = 1. Given n N, take Sn to be the set of pairs (m, m0 ) {0, . . . , 2n }N
P

such that m0i mi for all 1 i and i=1 (m0i mi ) = 1, note that Sn has
no more than 2(n+1) elements, set


Mn = max kX(m0 2n ) X(m2n )kB : (m, m0 ) Sn ,
1

Xn (x) denote the nth


and show that EP [Mn ] C2 p 2rn . Next, let x
dyadic multiliniarization of x
X(x), the one that is multilinear on each dyadic
QN
cube i=1 [(mi 1)2n , mi 2n ] for (m1 , . . . , mN ) {1, . . . , 2n }N . As in the
proof of Theorem 4.3.2, argue that kXn+1 Xn ku,B Mn+1 , and conclude

that there exists an (x, )


X(x,
) that is continuous in x for each and
is P-almost surely equal to X(x, ) for each x. Finally, to derive the Holder
1
continuity estimate, observe that kXn (y) Xn (x)kB 2n 2 |y x|Mn , and
proceed as in the proof of the corresponding part of Theorem 4.3.2.

Exercise 4.3.19. In this exercise we will examine a couple of the implications


that Theorem 4.3.5 has about any RiemannStieltjes type integration theory

Exercises for 4.3

191

involving Brownian paths. For simplicity, I will restrict my attention to the onedimensional case. Thus, let {B(t) : t 0} be an R-valued Brownian motion.
Because t
B(t) is continuous, one knows that any function : [0, 1] R of
bounded variation is RiemannStieltjes integrable on [0, 1] with respect to B 
[0, 1]. However, as the following shows, almost no Brownian path is Riemann
Stieltjes with respect to itself. Namely, using Theorem 4.3.5, show that P-almost
surely,
lim

n
X

m=1
n
X

lim

whereas
lim

m1
n



m
n



m
n

m1
n



B(1)2 1
,
2

m
n

m1
n



B(1)2 + 1
,
2



= B(1)2 .

m=1

n
X

2m1
2n



m
n

m1
n

m=1

Exercise 4.3.20. Say that a D(RN )-valued process {Z(t) : t 0} is a Levy


process if Z(0) = 0 and it has independent, homogeneous increments. Show that
every Levy process is a Levy process for some I(RN ).
Exercise 4.3.21. Let {j(t, ) : t 0} be a Poisson jump process associated
with some
M M (RN ). In Lemma 4.2.13, we showed that when M M2 (RN ),
R
then |y| j(t, dy) < , t 0, with positive probability only if M M1 (RN ).
In this exercise, weR will show that the same is true for any M M (RN ). That
is, assuming that |y| j(t, dy) < , t 0, with positive probability, it is to be
shown that M M1 (RN ). Here are some steps that you might want to follow.
R
(i) As an application of Kolmogorovs 01 Law, show that |y| j(t, dy) <
with positive probability implies it is finite with probability 1.
R
(ii) Let N be the set of for which there is aRt > 0 such that |y| j(t, dy, )
= . By (i), P(N ) = 0. Define Z(t, ) = y j(t, dy, ) for
/ N and
Z(t, ) = 0 for N , and show that {Z(t) : t 0} is a Levy process with
absolutely pure jump paths.
(iii) Applying Theorem 4.1.8, first show that {Z(t) : t 0} is a Levy process
for a with Levy measure M , and then apply Corollary 4.3.8 to conclude that
M M1 (RN ).
Exercise 4.3.22. Corollary 4.3.3 can be sharpened. In fact, Levy showed that
if {B(t) : t 0} is an R-valued Brownian motion, then

P

lim

sup

&0 0<ts

|B(t) B(s)|
= 2
L()


= 1,

192

4 Levy Processes

p
where L() log 1 . Notice that, on the one hand, this result is in the direction that one should expect: we know (cf. Theorem 4.3.4) that Brownian paths
are almost never H
older continuous of any order greater than 12 . On the other
hand, the Brownian Law of the Iterated Logarithm (cf. Exercise q
4.3.15) might
make one guess that their true modulus of continuity ought to be log(2) 1 ,

not L(). However, that guess is wrong because it fails to take into account the
difference between a question about what is true at a single time as opposed to
what is true simultaneously for all times. The purpose of this exercise is to show
how the considerations in 4.3.3 can be used to get a statement that is related
to but far less refined than Levys. The result to be proved here says only that


|B(t) B(s)|
K =1
(4.3.23)
P lim sup
&0 0<ts
L()

for some K < .


(i) First show that it suffices to prove that there exists a K < such that

P lim

sup

&0 0<ts
s,t[0,1]

|B(t) B(s)|

K = 1
L()

and that this will follow from


(*)

X
n=0

sup
2n1 ts2n

|B(t) B(s)|
>K
L(2n1 )

!
< .

(ii) Define the polygonal approximation {Bn (t) : t 0} as in 4.3.1, set Mn =


max1m2n |B(m2n ) B((m 1)2n )|, and show that

2kB Bn k[0,1]
Mn
|B(t) B(s)|
for 2n1 t s 2n .
+

n1
L(2n )
L(2n1 )
L(2
)
P
P p
(iii) Set C = n=0 (n + 1)2n , show that n=m L(2n )1 CL(2m )1 for
all m 0, and, arguing as in the proof of Theorem 4.3.2, conclude that, for any
R > 0,

X


P kB Bn k[0,1] R
P Mm+1 C 1 RL(2m1 )1 .
m=n

(iv) Show that, for all R > 0 and n N,



1 2
P Mn RL(2n )1 2n(12 R ) ,
and combine this with (ii) and (iii) to prove that (*) holds for some K < .

Chapter 5
Conditioning and Martingales

Up to this point I have been dealing with random variables that are either
themselves mutually independent or are built out of other random variables
that are. For this reason, it has not been necessary for me to make explicit
use of the concept of conditioning, although, as we will see shortly, this concept
has been lurking silently in the background. In this chapter I will first give the
modern formulation of conditional expectations and then provide an example of
the way in which conditional expectations can be used.
Let (, F, P) be a probability space, and suppose that A F is a set having
positive P-measure. For reasons that are most easily understood when is finite
and P is uniform, the ratio
P(B|A)

P(A B)
,
P(A)

B F,

is called the conditional probability of B given A. As one learns in an


elementary course, the introduction of conditional probabilities makes many
calculations much simpler; in particular, conditional probabilities help to clarify
dependence relations between the events represented by A and B. For example,
B is independent of A precisely when P(B|A) = P(B) or, in words, when the
condition that A occurs does not change the probability that B occurs. Thus, it
is unfortunate that the nave definition of conditioning as described above does
not cover many important situations. For example, suppose that X and Y are
random variables and that one wants to talk about the conditional probability
that Y b given that X = a. Unless one is very lucky and P(X = a) > 0,
dividing by P(X = a) is not going to do the job. As this example illustrates,
it is of great importance to generalize the concept of conditional probability to
include situations when the event on which one is conditioning has P-measure 0,
and the next section is devoted to Kolmogorovs elegant solution to the problem
of doing so.
5.1 Conditioning
In order to appreciate the idea behind Kolmogorovs solution, imagine someone
told you the conditional probability that the event B occurs given that the
event A occurs. Obviously, since you have no way of saying anything about the
193

194

5 Conditioning and Martingales

probability of B when A does not occur, she has provided you with incomplete
information about B. Thus, before you are satisfied, you should demand to
know also what is the conditional probability of B given that A does not occur.
Of course, this second piece of information is relevant only if A is not certain,
in which case P(A) < 1 and therefore P B A{ is well defined. More generally,
suppose that P = {A1 , . . . , AN } (N here may be either finite or countably
infinite) is a partition of into elements of F having positive P-measure. Then,
in order to have complete information about the probability
of B F relative to
P, one has to know the entire list of the numbers P B An , 1 n N . Next,
suppose that one attempts to describe this list in a way that does not depend
explicitly on the positivity of the numbers P(An ). For this purpose, consider the
function
N
X

7 f ()
P B An 1An ().
n=1

Clearly, f is not only F-measurable, it is measurable with respect to the algebra (P) over generated by P. In particular (because the only (P)measurable set of P-measure 0 is empty), f is uniquely determined by its Pintegrals EP [f, A] over sets A (P). Moreover, because, for each B (P)
and n, either An B or B An = , we have that
N

 X

EP f, A =
P B An =
n=1



P An B = P A B .

{n:An B}

Hence, the function f is uniquely determined by the properties that it is (P)measurable and that



EP f, A = P A B
for every A (P).
The beauty of this description is that it makes perfectly good sense even if
some of the An s have P-measure 0, except in that case the description does not
determine f pointwise but merely up to a (P)-measurable P-null set (i.e., a
set of P-measure 0), which is the very least one should expect to pay for dividing
by 0.
5.1.1. Kolmogorovs Definition. With the preceding discussion in mind,
one ought to find the following formulation reasonable. Namely, given a sub-algebra F and a (, ]-valued random variable X whose negative
part X (X 0) is P-integrable, I will say that the random variable X
is a conditional expectation of X given if X is (, ]-valued and

is P-integrable, and
-measurable, X




(5.1.1)
EP X , A = EP X, A for every A .
Obviously, having made this definition, my first duty is to show that such an
X always exists and to discover in what sense it is uniquely determined. The
latter problem is dealt with in the following lemma.

5.1 Conditioning

195

Lemma 5.1.2. Let be a sub--algebra of F, and suppose that X and Y


are a pair of (, ]-valued -measurable random variables for which X and
Y are both P-integrable. Then




EP X , A EP Y , A

for every A ,

if and only if X Y (a.s., P).


Proof: Without loss in generality, I may and will assume that = F and
will therefore drop the subscript ; and, since the if implication is completely
trivial, I will discuss only the minimally less trivial only if assertion. Thus,
suppose that P-integrals of Y dominate those of X and yet that X > Y on
a set of positive P-measure. We could then choose an M [1, ) so that
P(A) P (B) > 0, where



1
and B X = and Y M }.
A X M and Y X M

But if P(A) > 0, then








EP X, A EP Y, A EP X, A

1
M P (A),



which, because EP X, A is a finite number, is impossible. At the same time, if
P(B) > 0, then




= EP X, B EP Y, B M < ,
which is also impossible.

Theorem 5.1.3. Let be a sub--algebra of F and X a (, ]-valued


random variable for which X is P-integrable. Then there exists a conditional
expectation value X of X. Moreover, if Y is a second (, ]-valued random
variable and Y X (a.s., P), then Y is P-integrable and Y X (a.s., P) for
any Y that is a conditional expectation value of Y given . In particular, if
X = Y (a.s., P), then {Y 6= X } is a -measurable, P-null set.1
Proof: In view of Lemma 5.1.2, it suffices for me to handle the initial existence
statement. To this end, let G denote the class of X satisfying EP [X ] < for
which an X exists, and let G + denote the non-negative elements
 of G. If
{Xn : n 1} G + is non-decreasing and, for each n Z+ , Xn denotes a


conditional expectation of Xn given , then 0 Xn Xn+1 (a.s., P),


and therefore I can arrange that 0 Xn Xn+1 everywhere. In par
ticular, if X and X are the pointwise limits of the Xn s and Xn s, respectively, then the Monotone Convergence Theorem guarantees that X is a
1 Kolmogorov himself, and most authors ever since, have obtained the existence of conditional
expectation values as a consequence of the RadonNikodym Theorem. Because I find projections more intuitively appealing, I prefer the approach given here.

196

5 Conditioning and Martingales

conditional expectation of X given . Hence, we now know that G + is closed


under non-decreasing, pointwise limits, and therefore we will know that G + contains all non-negative random variables X as soon as we show that G contains all
bounded Xs. But if X is bounded (and is therefore an element of L2 (P; R)) and
L = L2 (, , P; R) is the subspace of L2 (P; R) consisting of its -measurable
elements, then the orthogonal projection X of X onto L is a -measurable
random variable that is P-square integrable and satisfies (5.1.1).
So far I have proved that G + contains all non-negative, F-measurable Xs.
Furthermore, if X is non-negative, then (by Lemma 5.1.2) X 0 (a.s., P) and
so X is P-integrable precisely when X itself is. In particular, I can arrange
and P-integrable.
to make X take its values in [0, ) when X is non-negative

Finally, to see that X G for every X with EP X < , simply consider X +
and X separately, apply the preceding to show that X 0 (a.s., P) and

that X is P-integrable, and check that the random variable





X + X when X 0 and X <
X
0
otherwise
is a conditional expectation of X given . 
Convention. Because it is determined only up to a -measurable P-null set,
one cannot, in general, talk about the conditional expectation of X as a function.
Instead, the best that one can do is say that the conditional expectation of
X is the equivalence class of -measurable X s that satisfy (5.1.1), and I will
adopt the notation EP [X|] to denote this equivalence class. On the other hand,
because one is usually interested only in P-integrals of conditional expectations,
it has become common practice to ignore, for the most part, the distinction
between the equivalence class EP [X|] and the members of that equivalence class.
Thus (just as one would when dealing with the Lebesgue spaces) I will abuse
notation by using EP [X|] to denote a generic element of the equivalence class
EP [X|] and will be more precise only when EP [X|] contains some particularly
distinguished member. For example, recall the random variables Tn entering the
definition of the simple Poisson process {N (t) : t (0, )} in 4.2.1. It is then
clear (cf. part (i) in Exercise 1.1.9) that we can take
h


i
EP 1{n} N (t) T1 , . . . , Tn = 1[0,t] Tn e(tTn ) ,
and one would be foolish to take any other representative. More generally, I
will always take non-negative representatives of EP [X|] when X itself is nonnegative and R-valued representatives when X is P-integrable. Finally, for historical reasons, it is usual to distinguish the case when X is the indicator function
1B of a set B F and to call EP [1B |] the conditional probability of B
given and to write P(B|) instead of EP [1B |]. Of course, representatives of
P(B|) will always be assumed to take their values in [0, 1].

5.1 Conditioning

197

Once one has established the existence and uniqueness of conditional expectations, there is a long list of more or less obvious properties that one can easily
verify. The following theorem contains some of the more important items that
ought to appear on such a list.
Theorem 5.1.4. Let be a sub--algebra of F. If X is a P-integrable random
variable and C is a -system (cf. Exercise 1.1.12) that generates , then
 
Y = EP X (a.s., P)




Y L1 (, , P; R) and EP Y, A = EP X, A for A C {}.
Moreover, if X is any (, ]-valued random variable that satisfies EP [X ]
< , then each of the following relations holds P-almost surely:
(5.1.5)

P  
 
E X EP |X| ;

(5.1.6)

h   i
 

EP X T = EP EP X T

when T is a sub--algebra of ; and, when X is R-valued and P-integrable,




 
EP X = EP X .
Next, let Y be a second (, ]-valued random variable with EP [Y ] < .
Then, P-almost surely,


 
 
EP X + Y = EP X + EP Y

for each , [0, ),

and


 
EP Y X = Y EP X

(5.1.7)

if Y is -measurable and (XY ) is P-integrable. Finally, suppose that {Xn :


n 1} is a sequence of (, ]-valued random variables. Then, P-almost
surely,
 
 
EP Xn % EP X

(5.1.8)

if EP [X1 ] < and Xn % X (a.s., P);

and, more generally,



P

(5.1.9) E



 
lim Xn lim EP Xn

if Xn 0 (a.s., P) for each n Z+ .

198

5 Conditioning and Martingales

Proof: To prove the first assertion, note that the set of A for which
EP [X, A] = EP [Y, A] is (cf. Exercise 1.1.12) a -system that contains C and
therefore . Next, clearly (5.1.5) is just an application of Lemma 5.1.2, while
(5.1.6) and the two equations that follow it are all expressions of uniqueness. As
for the next equation, one can first reduce to the case when X and Y are both
non-negative. Then one can use uniqueness to check it when Y is the indicator
function of an element of , use linearity to extend it to simple -measurable
functions, and complete the job by taking monotone limits. Finally, (5.1.8) is an
immediate application of the Monotone Convergence Theorem, whereas (5.1.9)
comes from the conjunction of


 
inf Xn inf EP Xn
nm
nm


P

E
with (5.1.8).

(a.s., P),

m Z+ ,

It probably will have occurred to most readers that the properties discussed
in Theorem 5.1.4 give strong evidence that, for fixed , X 7 EP [X|]()
behaves like an integral (in the sense of Daniell) and therefore ought to be
expressible in terms of integration with respect to a probability measure P .
Indeed, if one could actually talk about X 7 EP [X|]() for a fixed (as opposed
to P-almost every) , then there is no doubt that such a P would have to
exist. Thus, it is reasonable to ask whether there are circumstances in which one
can gain sufficient control over all the P-null sets involved to really make sense
out of X 7 EP [X|]() for fixed . Of course, when is generated by a
countable partition P, we already know what to do. Namely, when A P,
we can take
(
0
if P(A) = 0
P
E [X|]() =
EP [X, A]
if P(A) > 0.
P(A)

Even when does not arise in this way, one can often find a satisfactory representation of conditional expectations as expectations. A quite general statement
of this sort is the content of Theorem 9.2.1 in Chapter 9.
5.1.2. Some Extensions. For various applications it is convenient to have
two extensions of the basic theory developed in 5.1.1. Specifically, as I will now
show, the theory is not restricted to probability (or even finite) measures and
can be applied to random variables that take their values in a separable Banach
space. Thus, from now
on, will be an arbitrary (non-negative) measure on

(, F) and E, kkE will be a separable Banach space; and I begin by reviewing
a few elementary facts about -integration for E-valued random variables.2
2 The integration that I outline below is what functional analysts call the Bochner integral for
Banach spacevalued functions. There is a more subtle and intricate theory due to Pettis, but
Bochners theory seems adequate for most probabilistic considerations.

5.1 Conditioning

199

A function X : E is said to be -simple


if X is F-measurable, X takes

only finitely many values, and X 6= 0 < , in which case its integral with
respect to is the element of E given by
Z
X

E [X] =
X() (d)
x (X = x).

xE\{0}

Notice that another description of E [X] is as the unique element of E with the
property that





E [X], x = E hX, x i for all x E
(I use E to denote the dual of E and hx, x i to denote the action of x E on
x E), and therefore that the mapping taking -simple X to E [X] is linear.
Next, observe that 7 kX()kE R is F-measurable if X : E is
F-measurable. In particular, for F-measurable X : E, I will set
(

1
if p [1, )
E kXkpE p
kXkLp (;E) =



if p =
inf M : kXkE > M = 0

and will write X Lp (; E) when kXkLp (;E) < . Also, I will say the X :
E is -integrable if X L1 (; E); and I will say that X is locally
-integrable if 1A X is -integrable for every A F with (A) < .
The definition of -integration for an E-valued X is completed in the following
lemma.
Lemma 5.1.10. For
each -integrable
X : E there is a unique element

E [X] E satisfying EP [X], x = EP [hX, x ] for all x E . In particular,
the mapping X L1 (; E) 7 E [X] E is linear and satisfies




E [X] E kXkE .
(5.1.11)
E
Finally, if X Lp (; E), where p [1, ), then there is a sequence {Xn : n 1}
of E-valued, -simple functions with the property that kXn XkLp (;E) 0.
Proof: Clearly uniqueness, linearity, and (5.1.11) all follow immediately from
the given characterization of E [X]. Thus, all that remains is to prove existence
and the final approximation assertion. In fact, once the approximation assertion
is proved, then existence will follow immediately from the observation that, by
(5.1.11), E [X] can be taken equal to limn E [Xn ] if kX Xn kL1 (;E) 0.
To prove the approximation assertion, I begin with the case when is finite
and M = sup kX()kE < . Next, choose a dense sequence {x` : ` 1} in
E, set A0,n = , and let
o
n
for (`, n) Z+ Z+ .
A`,n = : kX() x` kE < n1

200

5 Conditioning and Martingales

Then, for each n Z+ there exists an Ln Z+ with the property that


!
Ln
[
1
\
A`,n < p .
n
`=1

Hence, if Xn : E is defined so that


when 1 ` Ln and A`,n \

Xn () = x`

`1
[

Ak,n

k=0

and Xn () = 0 when
/

SLn
1

A`,n , then Xn is -simple and

M + (E)
.
n
In order to handle the general case, let X Lp (; E) and n Z+ be given.
We can then find an rn (0, 1] with the property that
Z
1
,
kX()kpE (d)
(2n)p
kX Xn kLp (;E)

(rn ){

where

o
n
for r (0, 1].
(r) : r kX()kE 1r

Since, for any r (0, 1], rp (r) kXkpLp (;E) , we can apply the preceding to
the restrictions of and X to (rn ) and thereby find a -simple Xn : (rn )
E with the property
! p1
Z
1
p
.

kX() Xn ()kE (d)


2n
(rn )

Hence, after extending Xn to by taking it to


-simple Xn for which kX Xn kLp (;E) n1 .
Given an F-measurable X : E and a B
I will use, depending on the context, either
Z


E X, B or
X d or
B

be 0 off of (rn ), we arrive at a



F for which 1B X L1 (; E),

Z
X() (d)
B

to denote the quantity E [1B X]. Also, when discussing the spaces Lp (; E), I
will adopt the usual convention of blurring the distinction between a particular
F-measurable X : E belonging to Lp (; E) and the equivalence class of
those F-measurable Y s that differ from X on a -null set. Thus, with this
convention, k kLp (;E) becomes a bona fide norm (not just a seminorm) on
Lp (; E) with respect to which Lp (; E) becomes a normed vector space. Finally,
by the same procedure with which one proves the Lp (; R) spaces are complete,
one can prove that the spaces Lp (; E) are complete for any separable Banach
space E.

5.1 Conditioning

201

Theorem 5.1.12. Let (, F, ) be a -finite measure space and X : E


a locally -integrable function. Then



X 6= 0 = 0 E X, A = 0 for A F with (A) < .
Next, assume that is a sub--algebra for which  is -finite. Then, for
each locally -integrable X : E, there is a -almost everywhere unique
locally -integrable, -measurable X : E such that
(5.1.13)





E X , A = E X, A

for every A with (A) < .

In particular, if Y : E is a second locally -integrable function, then, for


all , R,

X + Y = X + Y (a.e., ).
Finally,


X kXkE
E

(5.1.14)

(a.e., ).

Hence, not only does (5.1.13) continue to hold for any A with 1A X
L1 (; E), but also, for each p [1, ], the mapping X Lp (; E) 7 X
Lp (; E) is a linear contraction.
Proof: Clearly, it is only necessary to prove the = part of the first assertion.
Thus, suppose that (X 6= 0) > 0. Then, because E is separable and therefore
(cf. Exercise 5.1.19) E with the weak* topology
is also separable,
there exists

an  > 0 and a x E with the property that X, x  > 0, from which
it follows (by -finiteness) that there is an A F for which (A) < and
D


 E
i
E X, A , x = E X, x , A 6= 0.

I turn next to the uniqueness and other properties of X . But it is obvious that
uniqueness is an immediate consequence of the first assertion and that linearity
follows from uniqueness. As for (5.1.14), notice that if x E and kx kE 1,
then











E X , x , A = E X, x , A E kXkE , A = E kXkE , A
for every A with (A) < . Hence,
at
least when
 is a probability

measure, Theorem 5.1.3 implies that X , x kXkE (a.e., ) for each


element x from the unit ball in E ; and so, because E with the weak* topology
is separable, (5.1.14) follows in this case. To handle s that are not probability
measures, note that either () = 0, in which case everything is trivial, or
() (0, ), in which case we can renormalize to make it a probability

202

5 Conditioning and Martingales

measure, or () = , in which case we can use the -finiteness of  to


reduce ourselves to the countable, disjoint union of the preceding cases.
Finally, to prove the existence of X , I proceed as in the last part of the
preceding paragraph to reduce myself to the case when is a probability measure
P. Next, suppose that X is simple, let R denote its range, and note that
X

X
xP X = x
xR

has the required properties. In order to handle general X L1 (P; E), I use the
approximation result in Lemma 5.1.10 to find a sequence {Xn : n 1} of simple
functions that tend to X in L1 (P; E). Then, since

(Xn ) (Xm ) = Xn Xm (a.s., P)
and therefore, by (5.1.14),



(Xn ) (Xm ) 1
kXn Xm L1 (P;E) ,
L (P;E)
1
we
exists a -measurable X L (P; E) to which the sequence
 know that there
(Xn ) : n 1 converges; and clearly X has the required properties. 
Referring to the setting in the second part of Theorem 5.1.12, I will extend
the convention introduced following Theorem 5.1.3 and call the -equivalence
class of X s satisfying (5.1.13) the -conditional expectation of X given
, will use E [X|] to denote this -equivalence class, and will, in general,
ignore the distinction between the equivalence class and a generic representative
of that class. In addition, if X : E is locally -integrable, then, just
as in Theorem 5.1.4, the following are essentially immediate consequences of
uniqueness:


 
E Y X = Y E X (a.e., ) for Y L (, , ; R),

and

h   i
 

E X T = E E X T

(a.e., )

whenever T is a sub--algebra of for which  T is -finite.


Exercises for 5.1
Exercise 5.1.15. As the proof of existence in Theorem 5.1.4 makes clear, the
operation X L2 (P; R) 7 EP [X|] is just the operation of orthogonal projection from L2 (P; R) onto the space L2 (, , P; R) of -measurable elements of
L2 (P; R). For this reason, one might be inclined to think that the concept of conditional expectation is basically a Hilbert space notion. However, as this exercise
shows, that inclination should be resisted. The point is that, although conditional expectation is definitely an orthogonal projection, not every orthogonal
projection is a conditional expectation!

Exercises for 5.1

203


(i) Let L be a closed linear subspace of L2 (P; R), and let L = {X : X L}

be the -algebra over generated by X L. Show that L = L2 , L , P; R if
and only if 1 L and X + L whenever X L.
Hint: To prove the if assertion, let X L be given, and show that
h
i
+
Xn n X 1 1 L for every R and n Z+ .
Conclude that Xn % 1(,) X must be an element of L.
(ii) Let be an orthogonal projection operator on L2 (P; R), set L = Range(),
and let = L , where L is defined as in part (i). Show that X = EP [X|]
(a.s., P) for all X L2 (P; R) if and only if 1 = 1 and
(*)


X Y = (X)(Y )

for all

X, Y L (P; R).

Hint: Assume that 1 = 1 and that (*) holds. Given X L (P; R), use
induction to show that
2
kXknL2n (P) kXkn1
L (P) kXkL (P)

and

n

= X(X)n1

n
for all n Z+ . Conclude that kXkL (P) kXkL (P) and that X

L, n Z+ , for every X L (P; R). Next, using the preceding together with
Weierstrasss Approximation Theorem, show that (X)+ L, first for X
L (P; R) and then for all X L2 (P; R). Finally, apply (i) to arrive at L =
L2 , , P; R .
(iii) To emphasize the point being made here, consider once again a closed
linear subspace L of L2 (P; R), and let L be orthogonal projection onto L.
Given X L2 (P; R), recall that L X is characterized as the unique element of
L for which X L X L, and show that EP [X|L ] is the unique element of
L2 (, L , P; R) with the property that
 

X EP X L f Y1 , . . . , Yn

for all n Z+ , f Cb Rn ; R , and Y1 , . . . , Yn L. In particular, L X =
EP [X|L ] if and only if X L X is perpendicular not only to all linear functions
of the Y s in L but even to all nonlinear ones.
Exercise 5.1.16. In spite of the preceding, there is a situation in which orthogonal projection coincides with conditioning. Namely, suppose that G is a
closed Gaussian family in L2 (P; R), and let L be a closed, linear subspace of G.
As an application of Lemma 4.3.1, show that, for any X G, the orthogonal
projection L X of X onto L is a conditional expectation value of X given the
-algebra L generated by the elements of L.

204

5 Conditioning and Martingales

Exercise 5.1.17. Because most projections are not conditional expectations,


it is an unfortunate fact of life that, for the most part, partial sums of Fourier
series cannot be interpreted as conditional expectations. Be that as it may, there
are special cases in which such an interpretation is possible. To see this, take
= [0, 1), F = B[0,1) , and P to be the restriction of Lebesgue measure to [0, 1).
Next, for n N, take Fn to be the -algebra generated by
those f C([0,
1); C)

n
for k Z,
that are periodic with period 2 . Finally, set ek (x) = exp 1k2x


and use elementary Fourier analysis to show that, for each n N, ek2n : k Z
is an orthonormal basis for L2 (, Fn , P; C). In particular, conclude that, for
every f L2 (P; C),
X
 

f, ek2n L2 ([0,1);C) ek2n ,
EP f Fn = EP [f ] +
kZ

where the convergence is in L2 ([0, 1]; C). (Also see Exercise 5.2.45.)
Exercise 5.1.18. Let (, F, ) be a measure space and a sub--algebra of
F with the property that  is -finite. Next, let E be a separable Hilbert
0
space, p [1, ], X Lp (; E), and Y a -measurable element of Lp (; E) (p0
is the Holder conjugate of p). Show that
h
 i 
 
-almost surely.
E Y, X E = Y, E X
E

Hint: First observe that it suffices to check that


h
h
 i
  i
E Y, X E = E Y, E X
.
E

Next, choose an orthonormal basis {en : n 0} for E, and justify the steps in


  X


 
E Y, en E en , X E
E Y, X E =
1

h



 i
 
E Y, en E E en , X E = E Y, E [X|] E .

Exercise 5.1.19. Let E be a separable Banach space, and show that, for each
R > 0, the closed ball BE (0, R) with the weak* topology is a compact metric
space. Conclude from this that the weak* topology on E is second countable
and therefore separable.
Hint: Choose a countable, dense subset {xn : n 1} in the unit ball BE (0, 1),
and define

(x , y ) =

X
n=1



2n hxn , x y i for x , y BE (0, R).

5.2 Discrete Parameter Martingales

205

Show that is a metric for the weak* topology on BE (0, R). Next, choose
{xnm : m 1} so that xn1 = x1 and xnm+1 = xn if n is the first n > nm such
that xn is linearly independent of {x1 , . . . , xn1 }. Given a sequence {x` : ` 1}
in BE (0, R), use a diagonalization argument to find a subsequence {x`k : k 1}
such that am = limk hxnm , x`k i exists for each m 1. Now define f on the
PM
PM
span S of {xnm : m 1} so that f (x) = m=1 m am if x = m=1 m xnm ,
note that f (x) = limk hx, x`k i for x S, and conclude that f is linear on
S and satisfies the estimate |f (x)| RkxkE there. Since S is dense in E,
there is a unique extension of f as a bounded linear functional on E satisfying
the same estimate, and so there exists an x BE (0, R) such that hx, x i =
limk hx, x`k i for all x S. Finally, check that this convergence continues to
hold for all x E, and conclude that x`k x in the weak* topology.

Exercise 5.1.20. The purpose of this exercise is to show that Bochners theory
of integration for Banach space functions relies heavily on the assumption that
the Banach space be separable. In particular, the approximation procedure on
which the proof of Lemma 5.1.10 fails in the absence of separability. To see
this, consider the Banach space ` (; R) of uniformly bounded sequences x =
(x0 , . . . , xn , . . . ) RN with kxk` (N;R) = supn0 |xn |. Next, let {Xn : n 0}
be a sequence of mutually independent, {1, 1}-valued, Bernoulli random with

mean value 0 on some probability space


 (, F, P), and define X : ` (N; R)
by X() = X0 (), . . . , Xn (), . . . . Show that, for any simple function Y :
` (N; R),

P kX Yk` (N;R) < 14 = 0.

Hint: For any R, show that P |Xn | < 14 12 and therefore that
P kX Ak` (N;R) < 14 = 0 for any A ` (N; R).

5.2 Discrete Parameter Martingales


In this section I will introduce an interesting and useful class of stochastic processes that unifies and simplifies several branches of probability theory as well as
other branches of analysis. From the analytic point of view, what I will be doing
is developing an abstract version of differentiation theory (cf. Theorem 6.1.8).
Although I will want to make some extensions in 5.3, I start in the following setting. (, F, P) is a probability space and {Fn : n N} is a nondecreasing sequence of sub--algebras of F. Given a measurable space (E,
 B),
:
n

N}
of
E-valued
random
variables
is
Fn :
say that
the
family
{X
n

n N -progressively measurable if Xn is Fn -measurable for each n N.
random variables
is said
Next, a family {Xn : n N} of (, ]-valued


to be
a P-submartingale with respect to Fn : n N if it is Fn :
n N -progressively measurable, EP [Xn ] < , and, for each n N, X
n
|F
]
(a.s.,
P).
It
is
said
to
be
a
P-martingale
with
respect
to
Fn :
EP [Xn+1
n



n N if {Xn : n N} is an Fn : n N -progressively measurable family of

206

5 Conditioning and Martingales

R-valued, P-integrable random variables satisfying Xn = EP [Xn+1 |Fn ] (a.s., P)


for each n N. In the
 future, I will abbreviate these statements by saying that
the triple Xn , Fn , P is a submartingale or a martingale.
Examples. The most trivial example of a submartingale is provided by a non-
decreasing sequence {an : n 0}. That is, if Xn an , n N, then Xn , Fn , P
is a submartingale
on any probability space , F, P relative to any non-decreas
ing Fn : n N . More interesting examples are those which follow.1
(i) Let {Yn : n 1} be a sequence of mutually independent (, ]-valued ran-
, n N, set F0 = {, }, Fn = {Y1 , . . . , Yn }
dom variables with EP [Yn ] <P
+
for n Z , and define Xn = 1mn Ym , where summation over the empty set
is taken to be 0. Then, because EP [Yn+1 |Fn ] = EP [Yn+1 ] (a.s., P) and therefore




EP Xn+1 Fn = Xn + EP Yn+1 (a.s., P)

for every n N, we see that Xn , Fn , P is a submartingale if and only if
EP [Yn ] 0 for all n Z+ . In fact, if the Yn s are R-valued
and P-integrable,

then the same line of reasoning shows that Xn , Fn , P is a martingale if and
only if EP [Yn ] = 0 for all n Z+ . Finally, if {Yn : n 0} L2 (P; R) and
EP [Yn ] = 0 for each n Z+ , then
 2 
 2 
Fn = Xn2 + EP Yn+1
Fn Xn2 (a.s., P),
EP Xn+1

and so Xn2 , Fn , P is a submartingale.


(ii) If X is an R-valued, P-integrable random variable and Fn : n N is a non-
decreasing sequence of sub--algebras of F, then, by (5.1.6), EP [X|Fn ], Fn , P
is a martingale.


(iii) If Xn , Fn , P is a martingale, then, by (5.1.5), |Xn |, Fn , P is a submartingale.
5.2.1. Doobs Inequality and Marcinkewitzs Theorem. In view of Example (i) above, we see that partial sums of independent random variables with
mean value 0 are a source of martingales and that their squares are a source of
submartingales. Hence, it is reasonable to ask whether some of the important
facts about such partial sums will continue to be true for all martingales, and
perhaps the single most important indication that the answer may be yes is
contained in the following generalization of Kolmogorovs Inequality (cf. Theorem 1.4.5). Like most of the foundational results in martingale theory, this one
is due to J.L. Doob. It is interesting that Doobs proof is essentially the same
as Kolmogorovs, only, if anything easier.
1

For a much more interesting and complete list of examples, the reader might want to consult
J. Neveus Discrete-parameter Martingales, NorthHolland (1975).

5.2 Discrete Parameter Martingales

207


Theorem 5.2.1 (Doobs Inequality). Assume that Xn , Fn , P is a submartingale. Then, for every N Z+ and (0, ),




1 P
(5.2.2)
P
max Xn E XN , max Xn .
0nN
0nN

In particular, if the Xn s are non-negative, then, for each p (1, ),



(5.2.3)

sup Xnp
nN

 p1

 1
p
sup EP Xnp p .
p 1 nN

Proof: To prove (5.2.2), set A0 = {X0 } and




An = Xn but max Xm <
0m<n

for n Z+ .

Then the An s are mutually disjoint and An Fn for each n N. Thus,





 X
N
N
 X
EP Xn , An
P
max Xn =
P An
0nN

n=0
n=0




N
X
EP XN , An
1
= EP XN , max Xn .

0nN

n=0

Now assume that the Xn s are non-negative. Given (5.2.2), (5.2.3) becomes
an easy application of Exercise 1.4.18. 
Doobs inequality is an example of what analysts call a weak-type inequality. To be more precise, it is a weak-type 11 inequality. The terminology derives
from the fact that such an inequality follows immediately from an L1 -norm, or
strong-type 11, inequality between the objects under consideration; but, in general, it is strictly weaker. In order to demonstrate how powerful such a result
can be, I will now apply Doobs Inequality to prove a theorem of Marcinkewitz.
Because it is an argument to which we will return again, the reader would do
well to become comfortable with the line of reasoning that allows one to pass
from a weak-type inequality, like Doobs, to almost sure convergence results.
Corollary 5.2.4. Let X be an R-valued random variable
and p

[1, ). If
X Lp (P; R), then, for any non-decreasing sequence Fn : n N of sub-algebras of F,
"
#
_
 
P
P
(a.s., P) and in Lp (P; R) as n .
E X Fn E X Fn
0

In particular, if X is
Lp (P; R).

W
0

Fn -measurable, then EP [X|Fn ] X (a.s., P) and in

208

5 Conditioning and Martingales

W
Proof: Without loss in generality, assume that F = 0 Fn .
Given X L1 (P; R), set Xn = EP [X|Fn ] for n N. The key to my proof will
be the inequality




1
(5.2.5)
P sup |Xn | EP |X|, sup |Xn | , (0, );

nN
nN

and, since, by (5.1.5), |Xn | EP [|X| |Fn ] (a.s., P), while proving (5.2.5) I may
and will assume that X and all the Xn s are non-negative. But then, by (5.2.2),




1 P
P
sup Xn > E XN , sup Xn >

0nN
0nN


1
= EP X, sup Xn >

0nN

for all N Z+ , and therefore (5.2.5) follows when N and one takes right
limits in .
As my first application of (5.2.5), note that {Xn : n 0} is uniformly Pintegrable. Indeed, because |Xn | EP [|X| |Fn ], we have from (5.2.5) that
h
h
i
i
sup EP |Xn |, |Xn | sup EP |X|, |Xn |
nN
nN


P
E |X|, sup |Xn | 0
nN

as . Thus, we will know that the asserted convergence takes place in


L1 (P; R) as soon as we show that it happens P-almost surely. In addition, if
X 
Lp (P; R) for some
p (1, ), then, by (5.2.5) and Exercise 1.4.18, we see
that |Xn |p : n N is uniformly P-integrable and, therefore, that Xn X
in Lp (; R) as soon as it does (a.s., P). In other words, everything comes down
to checking the P-almost sure convergence for X L1 (P; R).
To prove the P-almost sure convergence, let G be the set of X L1 (P; R) for
which Xn X (a.s., P). Clearly, X G if X L1 (P; R) is Fn -measurable for
some n N, and, therefore, G is dense in L1 (P; R). Thus, all that remains is to
prove that G is closed in L1 (P; R). But if {X (k) : k 1} G and X (k) X
in L1 (P; R), then, by (5.2.5),






P sup Xn X 3
nN


P








sup Xn Xn(k) + P sup Xn(k) X (k)

nN

nN




+ P X (k) X



(k)

2
(k)
(k)

X X
+ P sup Xn X

L1 (P)

nN

5.2 Discrete Parameter Martingales

209

for every N Z+ , (0, ), and k Z+ . Hence, by first letting N and


then k , we see that




lim P sup Xn X 3 = 0 for every (0, );
N

nN

and this proves that X G. 


Before moving on to more sophisticated convergence results, I will spend a
little time showing that Corollary 5.2.4 is already interesting. In order to introduce my main application, recall my preliminary discussion of conditioning
when I was attempting to explain Kolmogorovs idea at the beginning of this
chapter. As I said there, the most easily understood situation occurs when one
conditions with respect to a sub--algebra that is generated by a countable
partition P. Indeed, in that case one can easily verify that


  X EP X, A
P
1A ,
(5.2.6)
E X =
P(A)
AP

where it is understood that




EP X, A
0
P(A)

when P(A) = 0.

Unfortunately, even when F is countably generated, need not be (cf. Exercise


1.1.18). Furthermore, just because is countably generated, it will be seldom
true that its generators can be chosen to form a countable partition. (For example, as soon as contains an uncountable number of atoms, such a partition
cannot exist.) Nonetheless, if is any countably generated -algebra, then we
can find a sequence {Pn : n 0} of finite partitions with the properties that
!

[


Pn
and Pn1 Pn , n Z+ .
=
0

In fact, simply choose a countable generating sequence {An : n 0} for and


take Pn to be the collection of distinct sets of the form B0 Bn , where
Bm {Am , Am {} for each 0 m n.
Theorem 5.2.7. Let be a countably generated sub--algebra of F, and
choose {Pn : n 0} to be a sequence of finite partitions as above. Next, given
p [1, ) and a random variable X Lp (P; R), define Xn for n N by the
right-hand side of (5.2.6) with P = Pn . Then Xn EP [X|] both P-almost
surely and in Lp (P; R). Moreover, even if is not countably generated, for each
separable, closed subspace L of Lp (P; R) there exists a sequence {Pn : n N}
of finite partitions such that


X EP X, A
 
1A EP X (a.s., P) and in Lp (P; R)
P(A)
APn

for every X L.

210

5 Conditioning and Martingales


Proof: To prove the first part, simply set Fn = Pn , identify the Xn in
(5.2.6) as EP [X|Fn ], and finally apply Corollary
5.2.4. As for the second part,

let (L) be the -algebra generated by EP [X|] : X L , note that (L) is
countably generated and that

 

EP X = EP X (L)

(a.s., P)

for each X L,

and apply the first part with replaced by (L). 


Theorem 5.2.7 makes it easy to transfer the usual Jensens Inequality to conditional expectations.
Corollary 5.2.8 (Jensens Inequality). Let C be a closed, convex subset
of RN , X a C-valued, P-integrable random variable, and a sub--algebra of
F. Then there is a C-valued representative X of
P  
E X1
 

..
P

E X
. .

EP XN
In addition, if g : C [0, ) is continuous and concave, then



EP g(X) g X

(a.s., P).

Finally, if f : C R is continuous, convex, and bounded above and if X is a


C-valued, P-integrable random variable, then f (X) is P-integrable and
(5.2.9)




f EP [X|] EP f (X)|] (a.s., P).

(See Exercise 6.1.15 for Banach spacevalued random variables.)


Proof: By the classical Jensens Inequality, Y g(X) is P-integrable. Hence,
by the second part of Theorem 5.2.7, we can find finite partitions Pn , n N, so
that
X EP [X, A]
1A EP [X|]
Xn
P(A)
APn

and



X EP g(X), A


1A EP g(X)
Yn
P(A)
APn

P-almost surely. Furthermore, again by the classical Jensens Inequality,


EP [X, A]
C
P(A)

and




 P
EP g(X), A
E [X, A]
g
P(A)
P(A)

5.2 Discrete Parameter Martingales

211

for all A F with P(A) > 0. Hence, if denotes the set of for which


Xn ()
lim
RN +1
n Yn ()
exists, v is a fixed element of C,

limn Xn () if
X ()
v
if
/ ,
and


Y ()

limn Yn () if
v
if
/ ,

then X is a C-valued representative


of EP [X|], Y is a representative of

P
E [g(X)|], and Y () g X () for every .
Turning to the final assertion, begin by observing that once one knows that
f (X) L1 (P; R), the concluding inequality follows immediately by applying the
first part to the non-negative, concave function M f , where M R is an upper
bound of f . Thus, what remains to be shown is that f (X) L1 (P; R). To
and
this end, set fn = (n) f for n 1. Then fn is bounded and

 convex,

P
P
so, by the preceding with = {, }, we know
that fn E [X] E fn (X) .

Writing fn = f+ fn , this shows that EP fn (X) M + f E P [X] when
n f E P [X] . Finally, note that fn = n f , and conclude that f (X)
L1 (P; R). 
Corollary 5.2.10. Let I be a non-empty, closed interval in R {+} (i.e.,
either I R is bounded on the right or I R is unbounded on the right and
I includes the point +). Then every I-valued random variable X with Pintegrable negative part admits an I-valued representative of EP [X|]. Furthermore, if f : I R {+} is a continuous, convex function and either f is
bounded above and X L1 (P; R) or f is bounded below and to the left (i.e.,
f is bounded on each interval of the form I (, a] with a I R), then
f (X) L1 (P; R) and (5.2.9) holds. In particular, for each p [1, ),
P  
E X p
kXkLp (P;R) .
L (P;R)

Finally, if either Xn , Fn , P is anI-valued martingale and f satisfies the preceding conditions or if Xn , Fn , P is an I-valued submartingale and f is continuous, non-decreasing, convex, and bounded below, then f (Xn ), Fn , P is a
submartingale.
Proof: In view of Corollary 5.2.8, we know that an I-valued representative
of EP [X|] exists when X is P-integrable, and the general case follows after a
trivial truncation procedure.

212

5 Conditioning and Martingales

In the case when X is P-integrable and f is bounded above, f (X) L1 (P; R)


and (5.2.9) are immediate consequences of the last part of Corollary 5.2.8. To
handle the case when f is bounded below and to the left, first observe that either
f is non-increasing everywhere or there is an a I R with the property that
f is non-increasing to the left of a and non-decreasing to the right of a. Next,
let an I-valued X with X L1 (P) be given, and set Xn = X n. Then there
exists an m Z+ such that Xn is I-valued for all n m; and clearly, by the
preceding, we know that
  


(*)
f EP Xn EP f (Xn ) (a.s., P) for all n m.


Moreover, in the case when f is non-increasing, f (X

 n ) : n m is bounded
below and non-increasing; and, in the other case, f (Xn ) : n m a is
bounded below and non-decreasing. Hence, in both cases, the desired conclusion
follows from (*) after an application of the version of the Monotone Convergence
Theorem in (5.1.8).
To complete the proof, simply note that in either of the two cases given, the
results just proved justify
 





P-almost surely. 
EP f (Xn ) Fn1 f EP Xn Fn1 f Xn1
5.2.2. Doobs Stopping Time Theorem. Perhaps the most far-reaching
contribution that Doob made to martingale theory is his observation that one
can stop a martingale without destroying the martingale property. Later, L.
Snell showed that the analogous result is true for submartingales.
In order to state their results here, I need to introduce the notion of a stopping
time in this setting. Namely, I will say that thefunction : N {} is
a stopping time relative to {Fn : n 0} if : () = n Fn for each
n N. In addition, given a stopping time , I use F to denote the -algebra of
A F such that A { = n} Fn , n Z+ . Notice that F1 F2 if 1 2 .
In addition, if {Xn : n N} is Fn : n N -progressively measurable, check
that the random variable X given by X () = X() () is F -measurable on
{ < }.
Doob used stopping times to give a mathematically rigorous formulation of the
W.C. Fields assertion that you cant cheat an honest man. That is, consider
a gambler who is trying to beat the system. Assuming that he is playing a fair
game, it is reasonable to say his gain Xn after n plays will evolve as a martingale.
More precisely, if Fn contains
the history of the game up to and including the

nth play, then Xn , Fn , P will be a martingale. In the context of this model, a
stopping time can be thought of as a feasible (i.e., one that does not require the
gift of prophesy) strategy that the gambler can use to determine when he should
stop playing in order to maximize his gains. When couched in these terms, the
next result predicts that there is no strategy with which the gambler can alter
his expected gain.

5.2 Discrete Parameter Martingales

213

Theorem 5.2.11 (Doobs Stopping


Time Theorem). For any submartin
gale (martingale)
Xn , Fn , P that is P-integrable and any stopping time ,

Xn , Fn , P is again a P-integrable submartingale (martingale).
Proof: Let A Fn1 . Then, since A { > n 1} Fn1 ,






EP Xn , A = EP X , A { n 1} + EP Xn , A { > n 1}






EP X , A { n 1} + EP Xn1 , A { > n 1} = EP X(n1) , A ;
and, in the case of martingales, the inequality in the preceding can be replaced
by an equality. 
Closely related to Doobs Stopping Time Theorem is an important variant
due to G. Hunt. In order to facilitate the proof of Hunts result, I begin with an
easy but seminal observation of Doobs.
Lemma 5.2.12 (Doobs Decomposition). For each n N let Xn be an
Fn -measurable, P-integrable random variable. Then, up to a P-null set, there is
at most one sequence {An : n 0} L1 (P; R) such
 that A0 = 0, An is Fn1 +
measurable for each n Z , and Xn An , Fn , P is a martingale. Moreover, if
(Xn , Fn , P) is an integrable submartingale, then such a sequence {An : n 0}
exists, and An1 An P-almost surely for all n Z+ .
Proof: To prove the uniqueness assertion, suppose that {An : n 0} and
{Bn : n 0} are two such sequences, and set n = Bn An . Then 0 = 0,
n is Fn1 -measurable for each n Z+ , and (n , Fn , P) is a martingale. But
this means that n = EP [n | Fn1 ] = n1 for all n Z+ , and so n = 0 for
all n N.
Now suppose that (Xn , Fn , P) is an integrable submartingale. To prove the
asserted existence result, set A0 0 and



An = An1 + EP Xn Xn1 Fn1 0

for n Z+ .


Theorem 5.2.13 (Hunt). Let Xn , Fn , P be a P-integrable submartingale.
Given bounded stopping times and 0 satisfying 0 ,
(5.2.14)



X EP X 0 F

(a.s., P),


and the inequality can be replaced by equality when Xn , Fn , P is a martingale.
(Cf. Exercise 5.2.39 for unbounded stopping times.)
Proof: Choose {An : n N} for (Xn , Fn , P) as in Lemma 5.2.12, and set
Yn = Xn An for n N. Then, because A A 0 and A is F -measurable,




 
EP X 0 F EP Y 0 + A F = EP Y 0 F + A .

214

5 Conditioning and Martingales


Hence, it suffices to prove that equality holds in (5.2.14) when Xn , Fn , P is a
martingale. To this end, choose N Z+ to be an upper bound for 0 , let F
be given, and note that
N


 X

EP XN , { = n}
EP XN , =
n=0

N
X





EP Xn , { = n} = EP X , .

n=0

Similarly, since F F 0 , EP [XN , ] = EP [X 0 , ]. 


5.2.3. Martingale Convergence Theorem. My next goal is to show that,
even when they are not given in the form covered by Corollary 5.2.4, martingales
want to converge. If for no other reason, such a result has got to be more difficult
because one does not know ahead of time what, if it exists, the limit ought to
be. Thus, the reasoning will have to be more subtle than that used in the proof
of Corollary 5.2.4. I will follow Doob and base my argument on the idea that, in
some sense, a martingale has got to be nearly constant and that a submartingale
is the sum of a martingale and a non-decreasing process. In order to make
mathematics out of this idea, I need to introduce a somewhat novel criterion for
convergence of real numbers. Namely, given a sequence {xn : n 0} R and
a numbers < a < b < , say that {xn : n 0} upcrosses the interval
[a, b] at least N times if there exist integers 0 m1 < n1 < < mN < nN
such that xmi a and xni b for each 1 i N and that it upcrosses [a, b]
precisely N times if it upcrosses [a, b] at least N but does not upcross [a, b] at
least N + 1 times. Notice that limn xn < limn xn if and only if there exist
rational numbers a < b such that {xn : n 0} upcrosses [a, b] at least N times
for every N Z+ . Hence, {xn : n 0} converges in [, ] if and only if it
upcrosses [a, b] at most finitely often for each pair of rational numbers a < b.
2
Theorem 5.2.15
 (Doobs Martingale Convergence Theorem). Suppose
that Xn , Fn , P is a P-integrable submartingale. For < a < b < , let
U[a,b] () denote the precise number of times that {Xn () : n 0} upcrosses
[a, b]. Then




EP (Xn a)+
P
.
(5.2.16)
E U[a,b] sup
ba
nN

In particular, if
(5.2.17)



sup EP Xn+ < ,

nN

2 In the notes to Chapter VII of his Stochastic Processes, Wiley (1953), Doob gives a thorough
account of the relationship between his convergence result and earlier attempts in the same
direction. In particular, he points out that, in 1946, S. Anderson and B. Jessen formulated
and proved a closely related convergence theorem.

5.2 Discrete Parameter Martingales

215

then there exists a P-integrable random variable X to which {Xn : n 0} converges P-almost surely. (See Exercises 5.2.36 and 5.2.38 for other derivations.)

a)+
, and note that (by Corollary 5.2.10) Yn , Fn , P is
Proof: Set Yn = (Xnba
a P-integrable submartingale. Next, let N Z+ be given, set 00 = 0, and, for
k Z+ , define
0
k = inf{n k1
: Xn a} N

and k0 = inf{n k : Xn b} N.

Proceeding by induction, it is an easy matter to check that all the k s and


(N )
0 s are stopping times. Moreover, if U[a,b] () is the precise number of times
k

XnN () : n 0 upcrosses [a, b], then
(N )

U[a,b]

N
X

N
X


0
Yk0 Yk = YN Y0
Yk Yk1

k=1

YN

k=1
N
X


0
Yk Yk1
.

k=1



0
0
0 for all
k and therefore, by (5.2.14), EP Yk Yk1
Hence, since k1
(N )

k Z+ , we see that EP [U[a,b] ] EP [YN ], and clearly (5.2.16) follows from this
after one lets N .
Given (5.2.16), the convergence result is easy. Namely, if (5.2.17) is satisfied,
then (5.2.16) implies that there is a set of full P-measure such that U[a,b] () <
for all rational a < b and ; and so, by the remark preceding the
statement of this theorem, for each , {Xn () : n 0} converges to some
X() [, ]. Hence, we will be done as soon as we know that EP [|X|, ] <
. But



 


 

EP |Xn | = 2EP Xn+ EP Xn 2EP Xn+ EP X0 ,

n N,

and therefore Fatous Lemma plus (5.2.17) shows that X is P-integrable.

The inequality in (5.2.16) is quite famous and is known as Doobs Upcrossing Inequality.
Remark 5.2.18. The argument in the proof of Theorem 5.2.15 is so slick that
it is easy to miss the point that makes it work. Namely, the whole proof turns
0
on the inequality EP [Yk Yk1
] 0. At first sight, this inequality seems to be
0
wrong, since one is inclined to think that Yk < Yk1
. However, Yk need be
0
less than Yk1
only if k < N , which is precisely what, with high probability,
the submartingale property is preventing from happening.

216

5 Conditioning and Martingales


Corollary 5.2.19. Let Xn , Fn , P be a martingale. Then there exists an
X L1 (P; R) such that Xn = EP [X|Fn ] (a.s., P) for each n N if and only if
the sequence {Xn : n 0} is uniformly P-integrable. In addition, if p (1, ],
then there is an X Lp (P; R) such that Xn = EP [X|Fn ] (a.s., P) for each n N
if and only if {Xn : n 0} is a bounded subset of Lp (P; R).
Proof: Because of Corollary 5.2.4 and (5.2.3), I need only check the if statement in the first assertion. But, if {Xn : n 0} is uniformly P-integrable,
then (5.2.17) holds and therefore Xn X (a.s., P) for some P-integrable X.
Moreover, uniform integrability together with almost sure convergence implies
convergence in L1 (P; R), and therefore, by (5.1.5), for each m N,
 
 
Xm = lim EP Xn Fm = EP X Fm (a.s., P). 
n

Just as Corollary 5.2.4 led us to an intuitively appealing way to construct


conditional expectations, so does Doobs Theorem gives us an appealing approximation procedure for RadonNikodym derivatives.
Theorem 5.2.20 (Jessen). Let P and Q be a pair of probability measures on
the measurable space (, F) and Fn : n N a non-decreasing sequence of sub-algebras whose union generates F. For each n N, let Qn,a and Qn,s denote,
respectively, the absolutely continuous and singular parts of Qn Q  Fn
dQ
. Also, let Qa and Qs be the
with respect to Pn P  Fn , and set Xn = dPn,a
n
a
absolutely and singular continuous parts of Q with respect to P, and set Y = dQ
dP .
Then Xn Y (a.s., P). In particular, Q P if and only if Xn 0 (a.s., P).
Moreover, if Qn  Pn for each n N, then Q  P if and only if {Xn : n 0} is
uniformly P-integrable, in which case Xn Y in L1 (P; R) as well as P-almost
surely. Finally,
if Qn Pn (i.e., Pn  Qn as well as Qn  Pn ) for each n N

and G limn Xn (0, ) , then Qa (A) = Q(A G) for all A F, and
therefore Q(G) = 1 Q  P and Q(G) = 0 Q P.

Proof: Without loss in generality, I will assume throughout that all the Xn s
P
P
a
as well as Y dQ
dP take values in [0, ); and clearly, E [Xn ], n N, and E [Y ]
are all dominated by 1.
First note that
n
o
for A Fn .
Qn,s (A) = sup Q(A B) : B Fn and P(B) = 0

Hence, Qn,s  Fn1 Qn1,s for each n Z+ , and so






EP Xn , A = Qn,a (A) Qn1,a (A) = EP Xn1 , A

for all n Z+ and A Fn1 . In other words, Xn , Fn , P is a non-positive
submartingale. Moreover, in the case when Qn  Pn , n N, the same argument

5.2 Discrete Parameter Martingales

217


shows that Xn , Fn , P is a non-negative martingale. Thus, in either case, there
is a non-negative, P-integrable random variable X with the property that Xn
X (a.s., P). In order to identify X as Y , use Fatous Lemma to see that, for any
m N and A Fm ,




EP X, A lim EP Xn , A = lim Qn,a (A) Q(A);
n

S
and therefore EP [X, A] Q(A), first for A 0 Fm and then
 for every A F.
In particular, by choosing B F so that Qs (B) = 0 = P B{ , we have that






EP X, A = EP X, A B Q(A B) = Qa (A) = EP Y, A for all A F,
which means that X Y (a.s., P). On the other hand, if Yn = EP [Y |Fn ] for
n N, then




EP Yn , A = Qa (A) Qn,a (A) = EP Xn , A for all A Fn ,
and therefore Yn Xn (a.s., P) for each n N. Thus, since Yn Y and
Xn X P-almost surely, this means that Y X (a.s., P).

Next, assume that Qn  Pn for each n N and therefore that Xn , Fn , P
is a non-negative martingale. If {Xn : n 0} is uniformly P-integrable, then
Xn Y in L1 (P; R) and therefore Qs () = 1 EP [Y ] = 0. Hence, Q  P
when {Xn : n 0} is uniformly P-integrable. Conversely, if Q  P, then it
is easy to see that Xn = EP [Y |Fn ] for each n N, and therefore, by Corollary
5.2.4, that {Xn : n 0} is uniformly P-integrable.
Finally, assume that Qn Pn for each n N. Then, the Xn s can be chosen
dPn
. Hence, if Pa and Ps are
to take their values in (0, ) and Yn X1n = dQ
n
the absolutely continuous and singular parts of P relative to Q and if Y
Q
a
limn Yn , then Y = dP
dQ and so Pa (A) = E [Y, A] for all A F. Thus, when
1
on G and
B F is chosen so that Ps (B) = 0 = Q(B{), then, since Y = X
P
P
E [X, C G] = E [X, C] for all C F, it is becomes clear that





Q(A G = EQ XY, A G = EPa X, A G




= EP X, A G B = EP X, A B = Qa (A B) = Qa (A)

for all A F. 
5.2.4. Reversed Martingales and De Finettis Theory. For some applications it is important to know what happens if one runs a submartingale or
martingale backwards. Thus,
again let (, F, P) be a probability space, only

this time suppose that Fn : n N is a sequence of sub--algebras that
is non-increasing. Given a sequence {Xn : n 0} of (, ]-valued random variables, I will say that the triple Xn , Fn , P is either a reversed submartingale or a reversed martingale if, for each n N, Xn is Fn -measurable
and either Xn L1 (P; R) and Xn+1 EP [Xn | Fn+1 ] or Xn L1 (P; R) and
Xn+1 = EP [Xn | Fn+1 ].

218

5 Conditioning and Martingales

Theorem 5.2.21. If (Xn , Fn , P) is a reversed submartingale, then



(5.2.22)

P sup Xn R
nN



1 P
E X0 , sup Xn R ,
R
nN

R (0, ).

In particular, if (Xn , Fn , P) is a non-negative reversed submartingale and X0


L1 (P; R), then {Xn : n 0} is uniformly P-integrable and
(5.2.23)





sup Xn

nN

Lp (P;R)


p
X0 p
L (P;R)
p1

when p (1, ).


Moreover, if (Xn , Fn , P) is a reversed martingale, then (|Xn |, Fn , P is a re, Fn , P) is a reversed submartingale and
versed submartingale. Finally, if (XnT

X0 L1 (P; R), then there is a F n=0 Fn -measurable X : [, ]


to which Xn converges P-almost surely. In fact, X will be P-integrable if
supn0 EP [|Xn |] < ; and if (Xn , Fn , P) is either a non-negative reversed submartingale or a reversed martingale with X0 Lp (P; R) for some p [1, ),
then Xn X in Lp (P; R).
Proof: More or less everything here follows immediately from the observation
that (Xn , Fn , P) is a reversed submartingale or a reversed martingale if and only
if, for each N Z+ , (XN nN , FN nN , P) is a submartingale or a martingale.
Indeed, by this observation and (5.2.2) applied to (XN nN , FN nN , P),



P

max Xn > R

0nN



1 P
E X0 , max Xn > R
0nN
R

for every N 1. When N , the left-hand side of the preceding tends to


P (supnN Xn > R) and






EP X0 , max Xn > R = EP X0+ , max Xn > R EP X0 , max Xn > R
0nN
0nN
0nN






+

P
P
P
E X0 , sup Xn > R E X0 , sup Xn > R = E X0 , sup Xn > R ,
nN

nN

nN

since X0+ is non-negative, and therefore the Monotone Convergence Theorems


applies, and X0 is integrable, and therefore Lebesgues Dominated Convergence
Theorem applies. Thus (5.2.22) follows after one takes right limits in R. Starting
from (5.2.22) and applying Exercise 1.4.18, (5.2.23) follows for non-negative,
reversed submartingales. Moreover, because it is obvious that (|Xn |, Fn , P) is a
reversed submartingale when (Xn , Fn , P) is a reversed martingale, (5.2.23) holds
for reversed martingales as well.

5.2 Discrete Parameter Martingales

219

Next, suppose that (Xn , Fn , P) is a non-negative, reversed submartingale or a


reversed martingale. Then






P
P
P
sup E |Xn |, |Xn | R sup E |X0 |, |Xn | R E |X0 |, sup |Xn | R ,
nN

nN

nN

which, by (5.2.22), tends to 0 as R . Thus, {Xn : n 0} is uniformly


P-integrable.
It remains to prove the convergence assertions, and again the key is the same
observation about reversing time to convert reversed submartingales into submartingales. However, before seeing how it applies, first say that {xn : n 0}
downcrosses [a, b] at least N times if there exist 0 m1 < n1 < < mN < nN
such that xmi b and xni a for each 1 i N . Clearly, the same argument
that I used for upcrossings applies to downcrossings and shows that {xn : n 0}
converges in [, ] if and only if it downcrosses [a, b] finitely often for each
rational pair a < b. In addition, {xn : 0 n N } downcrosses [a, b] the same
(N )
number of times as {xN n : 0 n N } upcrosses it. Hence, if D[a,b] () is
the number of times {XnN : n 0} downcrosses [a, b], then this observation
(N )
together with the estimate in the proof of Theorem 5.2.15 for EP [U[a,b] ] show
that


 (N )  EP (X0 a)+
P
.
E D[a,b]
ba
Starting from here, the argument used to prove Theorem 5.2.15 shows that there
exits a F -measurable X : [, ] to which {Xn : n 0} converges
P-almost surely. Once one has this almost sure convergence result, the rest of
the theorem is an easy application of standard measure theory and the uniform
integrability estimates proved above. 
An important application of reversed martingales is provided by De Finettis
theory of exchangeable random variables. To describe his theory, let denote
the group of all finite permutations of Z+ . That is, an element of is an
isomorphism
of Z+ that moves only a finite number of integers. Alternatively,
S
= m=1 m , where m is the group of isomorphisms of Z+ with the property
that n = (n) for all n > m. Next, let (E, B) be a measurable space, and, for
+
+
each , define S : E Z E Z so that


if x = x1 , . . . , xn , . . . .
S x = x(1) , . . . , x(n) , . . .
+

Obviously, each S is a B Z -measurable isomorphism from E Z onto itself. Also,


if


+
for m Z+ ,
Am B BZ : B = S B for all m
+

then the Am s form a non-increasing sequence of sub--algebras of B Z , and

\
m=1



+
Am = A B B Z : B = S B for all .

220

5 Conditioning and Martingales

Now suppose that {Xn : n 1} is a sequence of E-valued random variables on


+
the probability space (, F, P), and set X() = X1 (), . . . , Xn (), . . . ) E Z .
The Xn s are said to be exchangeable random variables if X has the same
P-distribution as S X for every . The central result of De Finettis theory
is De Finettis Strong Law, which states that, for any g : E R satisfying
g X1 L1 (P; R),
n



1X
g Xm ,
EP g X1 X1 (A ) = lim
n n
1

(5.2.24)

where the convergence is P-almost sure and in L1 (P; R).


To prove (5.2.24), observe that, for any 1 m n, EP [g Xm | X1 (An )] =
EP [g X1 | X1 (An )], which immediately leads to
#
"

n
n
X
1
1


1 X
1
P
P
g Xm .
g Xm X (An ) =
E g X1 X (An ) = E
n m=1
n m=1

Hence, (5.2.24) follows as an application of Theorem 5.2.21.


De Finettis Strong Law makes it important to get a handle on the -algebra
X1 (A ). In particular, one would like to know when X1 (A ) is trivial in
the sense that each of its elements has probability 0 or 1, in which case (5.2.24)
self-improves to the statement that
n

(5.2.25)

1X
g Xm = EP [g X1 ]
lim
n n
1

P-almost surely and in L1 (P; R).

The following lemma is the crucial step toward gaining an understanding of


X1 (A ).

T
Lemma 5.2.26. Refer to the preceding, and let T = m=1 {Xn : n m}
be the tail -algebra determined by {Xn : n 1}. Then T X1 (A ) and
X1 (A ) is contained in the completion of T with respect to P. In particular,
for each F L1 (P; R),

 

(a.s., P).
(5.2.27)
EP F X1 (A ) = EP F T
Proof: The inclusion T X1 (A ) is obvious. Thus, what remains to be
proved is that, for any F L1 (P; R), EP [F | X1 (A )] is, up to a P-null set,
T -measurable. To this end, begin by observing that it suffices to check this for
N } -measurable for some N Z+ . Indeed, since
F s that are {Xn : 1 m

X1 (A ) {Xn : n 1} , we know that
i
h 



EP F X1 (A ) = EP EP F {Xn : n 1} X1 (A )
i
h 

= lim EP EP F {Xm : 1 m N } X1 (A ) .
N

5.2 Discrete Parameter Martingales

221


Now suppose that F is {Xm : 1 m N } -measurable. Then there
exists a g : E N R such that F = g X1 , . . . , XN ). If N = 1, then, because
Pn
limn n1 m=1 g Xm is T -measurable, (5.2.24) says that E P [F | X1 (A )]
is T -measurable. To get the same conclusion when N 2, I want to apply the
same reasoning, only now with E replaced by E N . To be precise, define



)
Z+
: B = S B for all (N ) , where
A(N
= B B


(N ) = : (`N + m) = (`N + 1) + m 1 for all ` N and 1 m < N
of length N .
is the group of finite permutations that transform Z+ in blocks
 1 (N ) 
N
P

By (5.2.24) applied with E replacing E, we find that E F X (A ) =
 
(N )
1
1
EP F T P-almost surely. Hence,
 since X (A )1 X (A ), (5.2.27) holds
for every {Xn : 1 n N } -measurable F L (P; R). 
The best known consequence of Lemma 5.2.26 is the HewittSavage 0
1 Law, which says that X1 (A ) is trivial if the Xn s are independent and
identically distributed. Clearly, their result is an immediate consequence of
Lemma 5.2.26 together with Kolmogorovs 01 Law.
Seeing as the Strong Law of Large Numbers follows from (5.2.24) combined
with the HewittSavage 01 Law, one might think that (5.2.24) represents an
extension of the strong law. However, that is not really the case, since it can be
shown that X1 (A ) is trivial only if the Xn s are independent. On the other
hand, the derivation of the strong law via (5.2.24) extends without alteration to
the Banach space setting (cf. part (ii) of Exercise 6.1.16).
5.2.5. An Application to a Tracking Algorithm. In this subsection I will
apply the considerations in 5.2.1 to the analysis of a tracking algorithm. The
origin of this algorithm is an idea which Jan Mycielski introduced as a model
for learning. However, the treatment here derives from a variation, suggested
by Roy O. Davies, of Mycielskis model. Because I do not understand learning
theory, I prefer to think of Mycielskis algorithm as a tracking algorithm.
Let (E, B) be a measurable space for which there exists a nested sequence
{Pk : k 0} of finite or countable partitions such that P0 = {E} and B =
S
be the parent of Q in the sense
( k=0 Pk ). Given k 1 and Q Pk , let Q

that Q is the unique element of Pk1 which contains Q. Also, for each x E
and k 0, use Qk (x) to denote the unique Q Pk such that Q 3 x. Further, let
be a probability measure on (E, B) with the property that, for some (0, 1),
for each Q S Pk
0 < (Q) (1 )(Q)
k=0
Next, let (, F, P) be a probability space on which there exists a sequence
{Xn : n 1} of mutually independent E-valued random variables with distribution . In addition, let {Zn : n 1} be a sequence of E-valued random variables with the property that, for each n 1, Zn is independent of
{Xm : 1 m n} , let n be the distribution of Zn , and assume that

222

5 Conditioning and Martingales




n
< for some r (1, ). Finally, den  with Kr supn1 d
d r
L (;R)

fine {Yn : n 1} by the prescription that Yn () = Xm () if Xm () is the first


element of {X1 (), . . . , Xn ()} which
 is closest to Zn () in the sense that,
Z

/
Q
()
for 1 j < m, Xm () Qk Zn () , and
for some k 0, Xj ()
k
n

/ Qk+1 Zn () for m < j n.
Xj ()
The goal here is to show that the Yn s search out the Zn s in the sense that,
for any B-measurable f : E R,

(5.2.28)
lim P |f (Yn ) f (Zn )|  = 0 for all  > 0.
n

At least in the case when n = , Mycielski has an alternative, in some sense


simpler, derivation of (5.2.28).
The strategy which I will use is the following. For each k 1 and f L1 (; R),
define fk : E R so that
Z
1
f (y) (dy).
fk (x) =
(Qk (x)) Qk (x)



Obviously fk Yn () = fk Zn () if Yn () Qk Zn () . Moreover, as I will
/ Qk (Zn ) = 0 for each k 0. Thus, the key step is
show below, limn P Yn
to show that

lim sup P |f (Yn ) fk (Yn )|  = 0 for all  > 0.
k n1



Notice that, because fk = E f (Pk ) , this would be obvious from Corollary
5.2.4 if the Yn were replaced by Xn . Thus, the problem comes down to showing
that the distributions of Yn s are uniformly sufficiently close to .
For each n 1, define
n (z, ) =
=

n
X
X
k=0 j=1

X
n

k=0

j1

nj
1 Qk (z)
(Qk (z) \ Qk+1 (z)) 1 Qk+1 (z)

 (Qk (z) \ Qk+1 (z))
 ,
Qk+1 (z)
Qk (z) \ Qk+1 (z)

n

n . Then
where n (Q) 1 (Q) 1 (Q)
ZZ

1B (z, y) n (z, dy)n (dz).
(5.2.29)
P (Zn , Yn ) B =
B

In particular, if n is the distribution of Yn , then



Z

\ Q)
X
X
(Q
n
.
n () = n (z, ) n (dz) =
(Q)n (Q)
\ Q)
(Q
k=0 QPk+1

5.2 Discrete Parameter Martingales

223


In addition, because Q` (z) \ Q`+1 (z) Qk (z) = if ` < k and is equal to
Q` (z) \ Q`+1 (z) when ` k,

 X

n Q`+1 (z))
n z, Qk (z) =
`=k

= lim

n
n 
n
1 (QL+1 (z)) 1 (Qk (z)
= 1 1 (Qk (z)) .

Thus, if r0 is the Holder conjugate of r, then



/ Qk (Zn ) =
P Yn

Z
1(Qk (z))

n

Z
n (dz) Kr

 10
r
nr0
,
(dz)
1 (Qk (z))

and so, by Lebesgues Dominated Convergence Theorem,



/ Qk (Zn ) = 0 for all k 0.
(5.2.30)
lim P Yn
n

Given an f L1 (; R) and Q
1
Af (Q) =
(Q)

k=0

Z
f d

Pk , set
(
0

and M f (Q) = sup A|f |(Q ) : Q Q

)
Pk

k=0

Clearly,

x Q = M f (Q) f (x) sup A|f | Qk (x) ,
k0




and, because Af Qk (x) = E f (Pk ) (x), Doobs Inequality (5.2.3) implies
p
kf kLp (;R) for all p (1, ].
that kf kLp (;R) p1

Lemma 5.2.31. For any f L1 (; R),


Z
Z
1
f dn .
(5.2.32)
|f | dn
0

In particular, if q [1, ) and f Lqr (; R), then



(5.2.33)

kf kLq (n ;R)

rKr

 q

kf kLqr0 (;R) .

Proof: Without loss in generality, I will assume throughout that f 0.


To prove (5.2.32), first note that
Z
Z

X
X
1
n
f d
(Q)n (Q)
f dn =
\ Q) Q\Q

(Q
k=0 QPk+1
X X

n (Q)n (Q)M f (Q),


1
k=0 QPk+1

224

5 Conditioning and Martingales

since

(Q \ Q)

1 M f (Q).

f d 1 Af (Q)

Q\Q

Next, for each k 0,


X
QPk+1

n

1 (Q) n (Q)M f (Q)

=
n (Q)n (Q)M f (Q)

QPk+1


n n (Q)M f (Q)

1 (Q)

QPk+1

X
n
n

1 (Q) n (Q)M f (Q)
1 (Q) n (Q)M f (Q)

QPk+1

QPk

X
n
n
1 (Q) n (Q)M f (Q)
1 (Q) n (Q)M f (Q),

X
QPk+1

QPk

and therefore
K  X
X

Z
f dn lim

n
1 (Q) n (Q)M f (Q)

QPk+1

k=0

n
1 (Q) n (Q)M f (Q)

QPk

= lim

n
1 (Q) n (Q)M f (Q)

f dn .

QPK+1

Given (5.2.32), (5.2.33) is an easy application of Holders Inequality and the


estimate coming from (5.2.3) on the Lp (; R)-norm of f in terms of that of f .
Namely,
Z

f dn

(f ) dn Kr

1 Kr

Z

0
q r

(f )

 10
r
d

rKr
r0
kf kqLqr0 (;R) .
kf q kLr0 (;R) =
0

r 1

Theorem 5.2.34. For each B-measurable f : E R, (5.2.28) holds. More0


over, if q (1, ) and f Lqr (; R), then
(5.2.35)



lim EP |f (Yn ) f (Zn )|q = 0 for each p [1, q).

(See Exercise 6.1.19 for a related result.)

5.2 Discrete Parameter Martingales

225

Proof: It is easy to prove


Indeed, given > 0, choose
 (5.2.28) from (5.2.35).
R
R > 0 so that |f | R < , and
 set f = f 1[R,R] (f ). Then, by (5.2.35),
limn P |f R (Yn ) f R (Zn )|  = 0 for all  > 0. Hence,

lim P |f (Yn ) f (Zn )| 3
n


lim n |f f R |  + lim n |f f R |  .
n

By Holders Inequality,

1
1
n |f f R |  Kr |f f R |  r0 < Kr r0 ,

and, by (5.2.33) with q = 1,

1
 rKr
rKr 10
r .
|f f R |  r0 <
n |f f R | 

The proof of (5.2.35) follows the strategy outlined earlier. That is,

1
EP |f (Yn ) f (Zn )|p p


1
kf fk kLp (n ;R) + EP |fk (Yn ) fk (Zn )|p p + kfk f kLp (n ;R) .

By (5.2.33),

kf fk kLp (n ;R)

rKr

 p1

kf fk kLpr0 (;R) ,

and, by Holders Inequality,


1

kf fk kLp (n ;R) Krp kf fk kLpr0 (;R) .

Since, by Corollary 5.2.4, kf fk kLpr0 (;R) 0 as k , all that remains is



1
to show that, for each k 0, EP |fk (Yn ) fk (Zn )|p p 0. But


1

1
/ Qk (Zn ) p
EP |fk (Yn ) fk (Zn )|p p = EP |fk (Yn ) fk (Zn )|p , Yn
11

1
/ Qk (Zn ) p q .
EP |fk (Yn ) fk (Zn )|q q P Yn
By (5.2.30), the final factor tends to 0 as n . Hence, since, by Holders
Inequality and (5.2.33),

1
EP |fk (Yn ) fk (Zn )|q q kfk kLq (n ;R) + kfk kLq (n ;R)
  1
 1
 1
  1
r q
r q
+ 1 Krq kf kLqr0 (;R) ,
+ 1 Krq kfk kLqr0 (;R)

the proof is complete. 

226

5 Conditioning and Martingales


Exercises for 5.2

Exercise 5.2.36. In this exercise I will outline a quite independent derivation


of the convergence assertion in Doobs Martingale Convergence Theorem. The
key observations here are first that, given Doobs Inequality (cf. (5.2.2)), the
result is nearly trivial for martingales having uniformly bounded second moments
and second that everything can be reduced to that case.

(i) Let Mn , Fn , P be a martingale which is L2 -bounded (i.e., supnN EP [Mn2 ] <
). Note that
h


 2 
2 i
EP Mn2 EP Mm1
= EP Mn Mm1

for

1 m n;

and starting from this, show that there is an M L2 (P;


 R) such that Mn  M
2
in L (P; R). Next apply (5.2.5) to the submartingale Mnm Mm , Fn , P to
show that, for every  > 0,

P



sup Mn Mm 

nm

i
1 P h
E M Mm 0


as

m ,

and conclude that Mn M (a.s., P).



(ii) Let Xn , Fn , P be a non-negative submartingale with the property that
supnN EP [Xn2 ] < , define the sequence {An : n N} accordingly,
as in Lemma

5.2.12, and set Mn = Xn An , n N. Then Mn , Fn , P is a martingale, and
clearly both Mn and An are P-square integrable for each n N. In fact, check
that





2
= EP Mn Mn1 Xn + Xn1
EP Mn2 Mn1







2
2
EP An An1 Xn + Xn1 EP Xn2 Xn1
,
= EP Xn2 Xn1
and therefore that


 
EP Mn2 EP Xn2 and

 
 
EP A2n 4EP Xn2

for every n N.


Finally, show that there exist M L2 (P; R) and A L2 P; [0, ) such that
Mn M , An % A, and, therefore, Xn X M + A both P-almost surely
and in L2 (P; R).

(iii) Let Xn , Fn , P be a non-negative martingale, set Yn = eXn , n N, use
Corollary 5.2.10 to see that Yn , Fn , P is a uniformly bounded, non-negative,
submartingale, and apply part (ii) to conclude that {Xn : n 0} converges
P-almost surely to a non-negative X L1 (P; R).

Exercises for 5.2

227


(iv) Let Xn , Fn , P be a martingale for which
(5.2.37)

 
sup EP Xn < .
nN

 

Fm 0 for n N. Show that Y


For each m N, define Yn,m
= EP Xnm

n+1,m


Yn,m (a.s., P), define Ym = limn Yn,m , check that both Ym , Fm , P and





Ym , Fm , P are non-negative martingales with EP Y0+ +Y0 supnN EP |Xn | ,
and note that Xm = Y m+ Ym (a.s., P) for each m N. In other words, every
martingale Xn , Fn , P satisfying (5.2.37 ) admits a Hahn decomposition3 as
the difference of two non-negative martingales whose sum has expectation value
dominated by the left-hand side of (5.2.37). Finally, use this observation together
with (iii) to see that every such martingale converges P-almost surely to some
X L1 (P; R).

(v) By combining the final assertion in (iv) together with Doobs Decomposition
in Lemma 5.2.12, give another proof of the convergence assertion in Theorem
5.2.15.
Exercise 5.2.38. In this exercise we will develop another way to reduce Doobs
Martingale Convergence Theorem to the case of L2 -bounded martingales. The
technique here is due to R. Gundy and derives from the ideas introduced by
Calderon and Zygmund in connection with their famous work on weak-type 11
estimates for singular integrals.


measurable, [0, R]-valued
(i) Let {Zn : n N} be a Fn : n N -progressively

,
F
,
P
is
a
submartingale.
Next, choose
sequence with the property that
Z
n
n

{An : n N} for Zn , Fn , P as in Lemma 5.2.12, note that An s can be chosen
so that 0 An An1 R for all n Z+ , and set Mn = Zn + An , n N.
Check that Mn , Fn , P is a non-negative martingale with Mn (n + 1)R for
each n N. Next, show that





2
= EP Mn Mn1 Zn + Zn1
EP Mn2 Mn1





2
+ EP An An1 Zn + Zn1
= EP Zn2 Zn1




2
+ 2R EP An An1 ,
EP Zn2 Zn1
and conclude that EP [A2n ] EP [Mn2 ] 3REP [Z0 ] for all n N.

(ii) Let Xn , Fn , P be a non-negative martingale. Show that, for each R

(R)
(R)
(R)
(R)
(0, ), Xn = Mn An + n , n N, where Mn , Fn , P is a non-negative

   (R)

(R) 2 
3R EP X0 ; An : n N is a
martingale satisfying supn0 EP Mn
(R)

non-decreasing sequence of random variables with the properties that A0


3

This useful observation was made by Klaus Krickeberg.

0,

228

5 Conditioning and Martingales

 (R) 2 
 
 (R)
(R)
3REP X0 ; and n :
An is Fn1 -measurable, and supn1 EP An



n N is a Fn : n N -progressively measurable sequence with the property
that
 

1
P n N (R)
6= 0 EP X0 .
n
R
(R)

(R)

(R)

Hint: Set Zn = Xn R and n = Xn Zn for n N, apply part (i)


 (R)

to Zn : n N , and use Doobs Inequality to estimate the probability that
(R)
n 6= 0 for some n N.

(iii) Let Xn , Fn , P be any martingale. Using (ii) above and part (iv) of Exer(R)
(R)
(R)
cise 5.2.36, show that, for each R (0, ), Xn = Mn + Vn + n , n N,






(R)
(R) 2
12 REP |Xn | ;
where Mn , Fn , P is a martingale satisfying EP Mn

 (R)
(R)
(R)
Vn : n N is a sequence of random variables satisfying V0 0, Vn is
Fn1 -measurable, and

!2
n
X




Vm(R) V (R) 12REP |Xn |
EP
m1
1

for n Z+ ; and {n : N} is an
sequence satisfying



Fn : n N -progressively measurable





2
P 0 m n (R)
=
6
0
EP |Xn | .
m
R

The preceding representation is called


onZygmund decomposi the Calder
tion of the martingale Xn , Fn , P .

(iv) Let Xn , Fn , P be a martingale that satisfies (5.2.37), and use part (iii)
above together with part (i) of Exercise 5.2.36 to show that, for each R (0, ),
2
times the
{Xn : n 0} converges off of a set whose P-measure is no more than R
P
supremum over n N of E [|Xn |]. In particular, when combined with Lemma
5.2.12, the preceding line of reasoning leads to the advertised alternate proof of
the convergence result in Theorem 5.2.15.

Exercise 5.2.39. In this exercise we will extend Hunts Theorem (cf. Theorem

5.2.13) to allow unbounded stopping times. To this end, let Xn , Fn , P be a
uniformly P-integrable submartingale on the probability space (, F, P), and set
Mn = Xn An , n N, where {An : n
 N} is the sequence produced in Lemma
5.2.12. After checking that Mn , Fn , P is a uniformly P-integrable martingale,
show that, for any stopping time : X = EP [M |F ] + A (a.s., P), where
X , M , and A are, respectively, the P-almost sure limits of {Xn : n 0},
{Mn : n 0}, and {An : n 0}. In particular, if and 0 are a pair of stopping
times and 0 , conclude that X EP [X 0 |F ] (a.s., P).

Exercises for 5.2

229

Exercise 5.2.40. There are times when submartingales converge even though
they are not bounded in L1 (P; R). For example, suppose that (Xn , Fn , P) is a
submartingale for which there exists a non-decreasing function
 : R 7 R with
the properties that (R) R for all R and Xn+1 Xn (a.e., P) for each
n N.


(i) Set R () = inf n N : Xn () R for R (0, ), and note that
sup XnR X0 (R)

(a.e., P).

nN

In particular, if X0 is P-integrable, show that {Xn () : n 0} converges in R


for P-almost every for which the sequence {Xn () : n 0} is bounded above.
+
Hint: After observing that supnN EP [Xn
] < for every R (0, ), conR
clude that, for each R (0, ), {Xn : n 0} converges P-almost everywhere
on {R = }.
(ii) Let {Yn : n 1} be a sequence of mutually independent, P-integrable
random variables, assume that EP [Yn ] 0 for n N and supnN kYn+ kL (P;R) <
Pn
, and set Sn = 1 Ym . Show that {Sn : n 0} is either P-almost surely
unbounded above or P-almost surely convergent in R.


(iii) Let Fn : n N be a non-decreasing sequence of sub--algebras and An
an element of Fn for each n N. Show that the set of for which either

1An () < but

n=0

or



P An Fn1 () =

n=1

1An () = but

n=0



P An Fn1 () <

n=1

has P-measure 0. In particular, note that this gives another derivation of the
second part of the BorelCantelli Lemma (cf. Lemma 1.1.3).
Exercise 5.2.41. For each n N, let (En , Bn ) be a measurable space and
n and n a pair of probability measures on (En , Bn ) with the property that
Theorem, Q
which says that (cf. Exercise 1.1.14)
n 
Qn . Prove Kakutanis
Q
Q
either nN n nN n or nN n  nN n .
Hint: Set
Y
Y
Y
Y
En , F =
Bn , P =
n , and Q =
n .
=
nN

nN

nN

nN

Qn
( 0 Bm ), where n is the natural projection from onto
Next, take Fn =
Q
n
0 Em , set Pn = P  Fn and Qn = Q  Fn , and note that
n1

Xn (x)

Y
dQn
(x) =
fm (xm ),
dPn
0

x ,

230

5 Conditioning and Martingales

dn
. In particular, when n n for each n N, use Kolwhere fn d
n
mogorovs
01 Law (cf. Theorem 1.1.2) to see that Q(G) {0, 1}, where G

limn Xn (0, )}, and combine this with the last part of Theorem 5.2.20
to conclude that Q 6 P = Q  P. Finally, to remove the assumption
that

n n for all ns, define n on (En , Bn ) by n = 1 2n1 n + 2n1 n ,
Q
n , and use the preceding to complete
check that n n and Q  Q
nN
the proof.

Exercise 5.2.42. Let (, F) be a measurable space and a sub--algebra of


F. Given a pair of probability measures P and Q on (, F), let X and Y
be non-negative RadonNikodym derivatives
of, respectively, P P  and

Q Q  with respect to P + Q , and define

P, Q =

X2 Y2 d(P + Q).

(i) Show that if is any -finite measure on


 (, ) with the property that
P  and Q  , then the number P, Q given above is equal to
Z 

dP
d

 12 

dQ
d

 12

d.


Also, check that P Q if and only if P, Q = 0.


(ii) Suppose that Fn : n N is a non-decreasing sequence of sub--algebras
of F, and show that (P, Q)Fn (P, Q)W Fn .
0

(iii) Referring to part (ii), assume that Q  Fn  P  Fn for each n N, let Xn


be a non-negative RadonNikodym
derivative
  Fn ,
 to P
Wof Q  Fn with respect
W
P
Xn 0
and show that Q  0 Fn is singular to P  0 Fn if and only if E
as n .

(iv) Let {n }
0 (0, ), and, for each n N, let n and n be Gaussian
measures on R with variance n2 . If an and bn are the mean values of, respectively,
n and n , show that
Y

nN

depending on whether

Y
nN

P
0

or

Y
nN

nN

n2 (bn an )2 converges or diverges.

Exercise 5.2.43. Let {Xn : n Z+ } be a sequence of identically distributed,


mutually independent, integrable, mean value P
0, R-valued random variables on
n
the probability space (, F, P), and set Sn = 1 Xm for n Z+ . In Exercise

Exercises for 5.2

231

1.4.28 we showed that limn |Sn | < P-almost surely. Here we will show
that

(5.2.44)

lim |Sn | = 0 P-almost surely.


n

As was mentioned before, this result was proved first by K.L. Chung and W.H.
Fuchs. The basic observation behind the present proof is due to A. Perlin, who
noticed that, by the HewittSavage 01 Law, limn |Sn | = L P-almost surely
for some L [0, ). Thus, the problem is to show that L = 0, and we will do
this by an simple argument invented by A. Yushkevich.

(i) Assuming that L > 0, use the HewittSavage 01 Law to show that

P |Sn x| <

L
3


i.o. = 0

for any x R,

where i.o. stands for infinitely often and means here for infinitely many
ns.
Hint: Set = L3 . Begin by observing that, because {Sm+n Sm : n Z+ }
has the same P-distribution as {Sn : n Z+ }, P(|Sm+n Sm | < 2 i.o.) = 0 for
any m Z+ . Thus, since |Sm+n x| |Sm+n Sm | |Sm x|, P(|Sn x| <
i.o.) P(|Sm x| ) for any m Z+ . Moreover, by the HewittSavage
01 Law, P(|Sn x| < i.o.) {0, 1}. Hence, either P(|Sn x| < i.o.) = 0,
or one has the contradiction that P(|Sm x| < ) = 0 for all m Z+ and yet
P(|Sn x| < i.o.) = 1.

(ii) Still assuming that L > 0, argue that


P |Sn L| <

L
3


i.o. P |Sn + L| <

L
3


i.o. = 1,

which, in view of (i), is a contradiction. Conclude that (5.2.44) holds.


(iii) Knowing (5.2.44) and the HewittSavage 01 Law, show that, for each x R
and  > 0, one has the dichotomy

P |Sn x| <  = 0 for all n 1 or


P |Sn x| <  i.o. = 1.

Exercise 5.2.45.
of reversed martingales.
 Here is a rather frivolous application

Let (, F, P), Fn : n N , and {ek : k Z be as in part (v) of Exercise
5.1.17. Next, take
Sm = {(2k + 1)2m : k Z} for each m N, and, for

2
f L [0, 1); C , set
m (f ) =

X
`Sm

f, e`

e,
L2 ([0,1);C) `

232

5 Conditioning and Martingales

where the convergence is in L2 (([0, 1]; C). By Exercise 5.1.17,


n

 X
m (f ).
f EP f Fn+1 =
m=0



After noting that Fn : n N is non-increasing, use the convergence result for
reversed martingales in Theorem 5.2.21 to see that the expansion
f = f, 1

+
L2 ([0,1);C)

m (f )

m=0

converges both almost everywhere as well as in L2 ([0, 1); C).4

When f is a function with the property that (f, e` )L2 ([0,1);C) = 0 for all ` Z\{2m : m N},
the preceding almost everywhere convergence result can be interpreted as saying that the
Fourier series of f converges almost everywhere, a result that was discovered originally by
Kolmogorov. The proof suggested here is based on fading memories of a conversation with
N. Varopolous. Of course, ever since L. Carlesons definitive theorem on the almost every
convergence of the Fourier series of an arbitrary square integrable function, the interest in this
result of Kolmogorov is mostly historical.

Chapter 6
Some Extensions and Applications
of Martingale Theory

Many of the results obtained in 5.2 admit easy extensions to both infinite
measures and Banach spacevalued random variables. Furthermore, in many
applications, these extensions play a useful, and occasionally essential, role. In
the first section of this chapter, I will develop some of these extensions, and in the
second section I will show how these extensions can be used to derive Birkhoffs
Individual Ergodic Theorem. The final section is devoted to Burkholders Inequality for martingales, an estimate that is second in importance only to Doobs
Inequality.
6.1 Some Extensions
Throughout
the

discussion that follows, (, F, ) will be a measure space and
Fn : n N will be a non-decreasing sequence of sub--algebras with the
property that  F0 is -finite. In particular, this means that the conditional
expectation of a locally -integrable random variable given Fn is well defined (cf.
Theorem 5.1.12) even if the random variable takes
values in a separable Banach

space E. Thus, I will say that the sequence Xn ; n N of E-valued random
variables is a -martingale
 with respect to Fn : n N , or,
 more briefly,

that the triple Xn , Fn , is a martingale, if {Xn : n N} is Fn : n N progressively measurable, each Xn is locally -integrable, and


Xn1 = E Xn Fn1 (a.e., ) for each n Z+ .
Furthermore, whenE = R, I will
say that {Xn : n N} is a -submartingale
with respect to Fn : n N 
(equivalently, the triple (Xn , Fn , ) is a submartingale) if {Xn : n N} is Fn : n N -progressively measurable, each
Xn is locally -integrable, and


Xn1 E Xn Fn1 (a.e., ) for each n Z+ .
6.1.1. Martingale Theory for a -Finite Measure Space. Without any
real effort, I can now prove the following variants of each of the basic results in
5.2.
233

234

6 Some Extensions and Applications


Theorem 6.1.1. Let Xn , Fn , be an R-valued -submartingale. Then, for
each N N and A F0 on which XN is -integrable,







1
(6.1.2)

max Xn A E XN ,
max Xn A
0nN
0nN

for all (0, ); and so, when all the Xn s are non-negative, for every p
(1, ) and A F0 ,

 p1
E sup |Xn |p , A


1
p
sup E |Xn |p , A p .
p 1 nN
nN

Furthermore, for each stopping time , Xn
 , Fn , is a submartingale or a
martingale depending on whether Xn , Fn , is a submartingale or a martingale.
In addition, for any pair of bounded stopping times 0 ,


X E X 0 F
(a.e., ),

and the inequality is an equality in the martingale case. Finally, given a < b
and A F0 ,




E (Xn a)+ , A

,
E U[a,b] , A sup
ba
nN

where U[a,b] () denotes the precise number of times that {Xn () : n 1}


upcrosses [a, b] (cf. the discussion preceding Theorem 5.2.15), and therefore


sup E Xn+ , A < for every A F0 with (A) <
nN

= Xn X

(a.e., ),

W
where X is 0 Fn -measurable
and locally -integrable. In fact, in the case of
W
martingales, there is a 0 Fn -measurable, locally -integrable X such that
 
Xn = E X Fn (a.e., ) for all n N
if and only if {Xn : n 0} is uniformly -integrable on each A F0 with
(A) < , in which case X is -integrable if and only if Xn X in L1 (; R).
On the other hand, when p (1, ), X Lp (; R) if and only if {Xn : n 0}
is bounded in Lp (; R), in which case Xn X in Lp (; R).
Proof: Obviously, there is no problem unless () = . However, even then,
each of these results follows immediately from its counterpart in 5.2 once one
makes the following trivial observation. Namely, given 0 F0 with (0 )
(0, ), set
F 0 = F[0 ],

Fn0 = Fn [0 ],

Xn0 = Xn  0 ,

and P =

 F0
.
(0 )

6.1 Some Extensions

235


Then Xn0 , Fn0 , P0 is asubmartingale or a martingale depending on whether
the original Xn , Fn , was a submartingale or a martingale. Hence, when
() = , simply choose aSsequence {k : k 1} of mutually disjoint, -finite

elements of F0 so that = 1 k , work on each k separately, and, at the end,


sum the results. 
I will now spend a little time seeing how Theorem 6.1.1 can be applied to
give a simple proof of the HardyLittlewood Maximal Inequality. To state
their result, define the maximal function Mf for f L1 (RN ; R) by
Z
1
|f (y)| dy, x RN ,
Mf (x) = sup
Q3x |Q| Q

where Q is used to denote a generic cube


(6.1.3)

Q=

N
Y

[aj , aj + r)

with a = (a1 , . . . , aN ) RN and r > 0.

j=1

As is easily checked, Mf : RN [0, ] is lower semicontinuous and therefore


certainly Borel measurable. Furthermore, if we restrict our attention to nicely
meshed families of cubes, then it is easy to relate Mf to martingales. More
precisely, for each n Z, the nth standard dyadic partition of RN is the
partition Pn of RN into the cubes

N 
Y
ki ki + 1
, k ZN .
,
(6.1.4)
Cn (k)
n
n
2
2
i=1

These partitions are nicely meshed in the sense that the (n + 1)st is a refinement
of the nth. Equivalently, if Fn denotes the -algebra over RN generated by the
partition Pn , then Fn Fn+1 . Moreover, if f L1 (RN ; R) and
Z


f
nN
f (y) dy for x Cn (k) and k ZN ,
Xn (x) 2
Cn (k)

then, for each n Z,


 
Xnf = ERN |f | Fn

(a.e., RN ),

where RN denotes Lebesgue measure on RN . In particular, for each m Z,



f
Xm+n
, Fm+n , RN , n N,
is a non-negative martingale; and so, by applying (6.1.2) for each m Z and
then letting m & , we see that
Z
n
o
1


(0)
|f (y)| dy, (0, ),
(6.1.5)
x
:
M
f
(x)

{M(0) f }

236

6 Some Extensions and Applications

where

(
(0)

f (x) = sup

1
|Q|

Z
|f (y)| dy : x Q
Q

Pn

nZ

and I have used || to denote RN (), the Lebesgue measure of .


At first sight, one might hope that it should be possible to pass directly from
(6.1.5) to analogous estimates on the level sets of Mf . However, the passage
from (6.1.5) to control on Mf is not as easy as it might appear at first: the
sup in the definition of Mf involves many more cubes than the one in the
definition of M(0) f . For this reason I will have to introduce additional families
of meshed partitions. Namely, for each {0, 1}N , set


(1)n
N
+ Cn (k) : k Z
,
Pn () =
3 2n

where Cn (k) is the cube described


in (6.1.4).

It is then an easy matter to check
that, for each {0, 1}N , Pn () : n Z is a family of meshed partitions of
RN . Furthermore, if
)
(
Z
[


 () 
1
f (y) dy : x Q
Pn () , x RN ,
M f (x) = sup
|Q| Q
nZ

then exactly the same argument that (when = 0) led us to (6.1.5) can now be
used to get
Z
n
o


1


N
()
f (y) dy
(*)
x R : M f (x)

{M() f }

for each {0, 1}N and (0, ). Finally, if Q is given by (6.1.3) and
r 3 12n , then it is possible to find an {0, 1}N and a C Pn () for which
Q C. (To see this, first reduce to the case when N = 1.) Hence,

max

{0,1}N

M() f Mf 6N

max

{0,1}N

M() f.

After combining this with the estimate in (*), we arrive at the following version
of the HardyLittlewood Maximal Inequality:
n
o (12)N Z


|f (y)| dy.
(6.1.6)
x RN : Mf (x)

RN

At the same time, (*) implies that


max

{0,1}N

()
M f p N
L (R ;R)

p
kf kLp (RN ;R) ,
p1

p (1, ].

6.1 Some Extensions

237

To check this, first note that it suffices to do so when f vanishes outside of


the ball B(0, R) for some R > 0. Second, assuming that f = 0 off of B(0, R),
observe that (*) implies that
Z
n
o


1


f (y) dy.
x B(0, R) : M() f (x)

{M()B(0,R) f }

Next, even though the result in Exercise 1.4.18 was stated for probability measures, it applies equally well to any finite measure. Thus, we now know that
()

kM

! p1

()

f kLp (RN ;R) = lim

(M

f ) (x) dx

B(0,R)

p
kf kLp (RN ;R) ,
p1

and so we can repeat the argument just made to obtain


(6.1.7)

N


Mf p N (12) p kf kLp (RN ;R)
L (R ;R)
p1

for p (1, ].

In this connection, notice that there is no hope of getting this sort of estimate
when p = 1, since it is clear that
lim |x|N Mf (x) > 0
|x|

whenever f does not vanish RN -almost everywhere.


The inequality in (6.1.6) plays the same role in classical analysis as Doobs
Inequality plays in martingale theory. For example, by essentially the same
argument as I used to pass from Doobs Inequality to Corollary 5.2.4, we obtain
the following version of famous Lebesgue Differentiation Theorem.
Theorem 6.1.8. For each f L1 RN ; R),
Z


1
f (y) f (x) dy = 0
lim
B&{x} |B| B
(6.1.9)

for RN -almost every x RN ,


where, for each x RN , the limit is taken over balls B that contain x and tend
to x in the sense that their radii shrink to 0. In particular,
Z
1
f (y) dy for RN -almost every x RN .
f (x) = lim
B&{x} |B| B

Proof: I begin with the observation that, for each f L1 (RN ; R),
Z


1

f (y) dy N Mf (x), x RN ,
Mf (x) sup
B3x |B| B

238

6 Some Extensions and Applications



2N
with N = B(0, 1) . Second, notice that (6.1.9) for every
where n =
N
x RN is trivial when f Cc (RN ; R). Hence, all that remains is to check that
if fn f in L1 (RN ; R) and if (6.1.9) holds for each fn , then it holds for f . To
this end, let  > 0 be given and check that, because of the preceding and (6.1.6),


Z




f (y) f (x) dy 
x : lim 1


B&{x} |B| B
n
 o

x : M(f
fn )(x)

3


Z





1

fn (y) fn (x) dy
+ x : lim
3
B&{x} |B| B
n


 o

+ x : fn (x) f (x)

3

3
1 + (12)N N kf fn kL1 (RN )


for every n Z+ . Hence, after letting n , we get (6.1.9) f . 


Although applications like Lebesgues Differentiation Theorem might make
one think that (6.1.6) is most interesting because of what it says about averages
over small cubes, its implications for large cubes are also significant. In fact, as I
will show in 6.2, it allows one to prove Birkhoffs Individual Ergodic Theorem
(cf. Theorem 6.2.7), which may be viewed as a result about differentiation at
infinity. The link between ergodic theory and the HardyLittlewood Inequality
is provided by the following deterministic
version

of the Maximal Ergodic Lemma
(cf. Lemma 6.2.1). Namely, let ak : k ZN be a summable subset of [0, ),
and set
X
1
aj+k , n N and k ZN ,
S n (k) =
N
(2n)
jQn


where Qn = j ZN : n ji < n for 1 i N . By applying (6.1.6) and
(6.1.7) to the function f given by (cf. (6.1.4)) f (x) = ak when x C0 (k), we
see that


(12)N X
N
ak , (0, )
(6.1.10)
card k Z : sup S n (k)

nZ+
N
kZ

and

! p1
(6.1.11)

X
kZN

sup |S n (k)|p
nZ+

(12)N p

p1

! p1
X

|ak |p

for p (1, ].

kZN

The inequality in (6.1.10) is called Hardys Inequality. Actually, Hardy


worked in one dimension and was drawn to this line of research by his passion

6.1 Some Extensions

239

for the game of cricket. What Hardy wanted to find is the optimal order in
which to arrange batters to maximize the average score per inning. Thus, he
worked with a non-negative sequence {ak : k 0} in which ak represented the
expected number of runs scored by player k, and what he showed is that, for
each (0, ),




k N : sup S n (k)


+
nZ

is maximized when {ak : k 0} is non-increasing, from which it is an easy


application of Markovs Inequality to prove that



X


k N : sup S n (k) 1
ak ,


nZ+
0

(0, ).

Although this sharpened result can also be obtained as a corollary the Sunrise
Lemma,1 Hardys approach remains the most appealing.
6.1.2. Banach SpaceValued Martingales. I turn next to martingales
with values in a separable Banach space. Actually, everything except the easiest
aspects of this topic becomes extremely complicated and technical very quickly,
and, for this reason, I will restrict my attention to those results that do not
involve any deep properties of the geometry of Banach spaces. In fact, the only
general theory with which I will deal is contained in the following.

Theorem 6.1.12. Let E be a separable
Banach
space
and
X
,
F
,

an En
n

valued martingale. Then kXn kE , Fn , is a non-negative submartingale and
therefore, for each N Z+ and all (0, ),

(6.1.13)


sup kXn kE

0nN


1
E kXN kE ,


sup kXn kE .
0nN

In particular, for each p (1, ],


(6.1.14)





sup kXn kE
nN

Lp (;E)

p
sup kXn kLp (;E) .
p 1 nN

Finally, if Xn = E [X | Fn ], where X Lp (; E) for some p [1, ), then


"
#
_

Xn E X Fn both (a.e., ) and in Lp (; E).
0

See Lemma 3.4.5 in my A Concise Introduction to the Theory of Integration, Third Edition,
Birkhauser (1998).

240

6 Some Extensions and Applications


Proof: The fact kXn kE , Fn , is a submartingale is an easy application of
the inequality in (5.1.14); and, given this fact, the inequalities in (6.1.13) and
(6.1.14) follow from the corresponding inequalities in Theorem 6.1.1.
While proving the convergence statement, I may and will assume that F =
W
p

0 Fn . Now let X L (; E) be given, and set Xn = E [X|Fn ], n N.


Because of (6.1.13) and (6.1.14), we know (cf. the proofs of Corollary 5.2.4 and
Theorem 6.1.8) that the set of X for which Xn X (a.e., ) is a closed
subset of Lp (; E). Moreover, if X is -simple, then the -almost everywhere
convergence of Xn to X follows easily from the R-valued result. Hence, we
now know that Xn X (a.s, ) for each X L1 (; E). In addition, because
of (6.1.14), when p (1, ), the convergence in Lp (; E) follows by Lebesgues
Dominated Convergence Theorem. Finally, to prove the convergence in L1 (; E)
when X L1 (; E), note that, by Fatous Lemma,
kXkL1 (;E) lim kXn kL1 (;E) ,
n

whereas (5.1.14) guarantees that


kXkL1 (;E) lim kXn kL1 (;E) .
n

Hence, because




kXn kE kXkE kXn XkE 2kXkE ,
the convergence in L1 (; E) is again an application of Lebesgues Dominated
Convergence Theorem. 
Going beyond the convergence result in Theorem 6.1.12 to get an analog of
Doobs Martingale Convergence Theorem is hard. For one thing, a nave analog
is not even true for general separable Banach spaces, and a rather deep analysis
of the geometry of Banach spaces is required in order to determine exactly when
it is true. (See Exercise 6.1.18 for a case in which it is.)
Exercises for 6.1
Exercise 6.1.15. In this exercise we will develop Jensens Inequality in the
Banach space setting. Thus, (, F, P) will be a probability space, C will be a
closed, convex subset of the separable Banach space E, and X will be a C-valued
element of L1 (P; E).
(i) Show that there exists a sequence {Xn : n 1} of C-valued, simple functions
that tend to X both P-almost surely and in L1 (P; E).
(ii) Show that EP [X] C and that



EP g(X) g EP [X]
for every continuous, concave g : C [0, ).

Exercises for 6.1

241

(iii) Given a sub--algebra of F, follow the argument in Corollary 5.2.8 to


show that there exists a sequence {Pn }
0 of finite, -measurable partitions with
the property that
X EP [X, A]
1A EP [X|]
P(A)

both P-almost surely and in L1 (P; E).

APn

In particular, conclude that there is a representative X of EP [X|] that is


C-valued and satisfies



EP g(X) g X
(a.s., P)
for each continuous, convex g : C [0, ).
Exercise 6.1.16. Again let (, F, P) be a probability space and E be a separable Banach space. Further, suppose that {FTn : n 0} is a non-increasing se
quence of sub--algebras of F, and set F = 0 Fn . Finally, let X L1 (P; E).
(i) Show that
 
EP X Fn EP [X|F ]

both P-almost surely and in Lp (P; E)

for any p [1, ) with X Lp (P; E).


Hint: Use (6.1.13) and the approximation result in Theorem 5.1.10 to reduce to
the case when X is simple. When X is simple, get the result as an application
of the convergence result for R-valued, reversed martingales in Theorem 5.2.21.
(ii) Using part (i) and following the line of reasoning suggested at the end of
5.2.4, give a proof of The Strong Law of Large Numbers for Banach space
valued random variables.2 (See Exercises 6.2.18 and 9.1.18 for entirely different
approaches.)
Exercise 6.1.17.
As we saw in the proof of Theorem 6.1.8, the Hardy
Littlewood maximal function can be used to dominate other quantities of interest. As a further indication of its importance, I will use it in this exercise
to prove the analog of Theorem 6.1.8
for a large class of approximate identities.
R
That is, let L1 (RN ; R) with RN (x) dx = 1 be given, and set

t (x) = tN xt , t (0, ) and x RN .

Then {t : t > 0} forms an approximate identity in the sense that, as


tempered distributions, t 0 as t & 0. In fact, because
kt ? f kLp (RN ;R) kkL1 (RN ;R) kf kLp (RN ;R) ,
2

t (0, ) and p [1, ],

This proof, which seems to have been the first, of the Strong Law for Banach spaces was
given by E. Mourier in El
ements al
eatoires dans un espace de Banach, Ann. Inst. Poincar
e
13, pp. 166244 (1953).

242

6 Some Extensions and Applications

and

Z
(y) f (x ty) dy,

t ? f (x) =
RN

it is easy to see that, for each p [1, ),




lim t ? f f Lp (RN ;R) = 0

t&0

first for f Cc (RN ; R) and then for all f Lp (RN ; R).


The purpose of this exercise is to sharpen the preceding under the assumption
that

(x) = |x| ,


x RN \ {0} for some C 1 (0, ); R with
Z
A
rN |0 (r)| dr < .
(0,)

Notice that when is non-negative and non-increasing, integration by parts


shows that A = N .
(i) Let f Cc (RN ; R) be given, and set
f(r, x) =

1
|B(x, r)|

Z
f (y) dy

for r (0, ) and x RN .

B(x,r)

Using integration by parts and the given hypotheses, show that


t ? f (x) =

N1

rN 0 (r) f(tr, x) dr,

(0,)

and conclude that




t ? f (x)

A
N

(x),
Mf

is the quantity introduced at the beginning of the proof of Theorem


where Mf
6.1.8. In particular, conclude that there is a constant KN (0, ), depending
only on N Z+ , such that


M f (x) sup t ? f (x) KN A Mf (x),

x RN .

t(0,)

(ii) Starting from the conclusion in (i), show that



(12)N KN Akf kL1 (RN )
{x : M f (x) R}
,
R

f L1 (RN ; R),

Exercises for 6.1

243

and that for p (1, ],


N


M f p N (12) KN A p kf kLp (RN ;R) , f Lp (RN ; R).
L (R ;R)
p1
Finally, proceeding as in the proof of Theorem 6.1.8, use the first of these to
prove that, for f L1 (RN ; R) and Lebesgue almost every x RN ,


lim t ? f (x) f (x)
t&0



t (y) f (x y) f (x) dy = 0.

lim

t&0

RN

Two of the most familiar examples to which the preceding applies are the
2
N
Gauss kernel gt (x) = (2t) 2 exp |x|2 and the Poisson kernel (cf. (3.3.19))
N
R
t . In both these cases, A = N .

Exercise 6.1.18. Let E be a separable Hilbert space and (Xn , F, P) an Evalued martingale on some probability space (, F, P) satisfying the condition


sup EP kXn k2E < .
nZ+

W
Proceeding as in (i) of Exercise 5.2.36, first prove that there is a 1 Fn -measurable X L2 (P; E) to which {Xn : n 1} converges in L2 (P; E), next check
that
 
Xn = EP X Fn (a.s., P) for each n Z+ ,
and finally apply the last part of Theorem 6.1.12 to see that Xn X P-almost
surely.
Exercise 6.1.19. This exercise deals with a variation, proposed by Jan Mycielski, on the sort of search algorithm discussed in 5.2.5. Let G be a non-empty,

bounded, open subset of RN with the property that RN B(x, r) G N rd
for some > 0 and all x G and 0 < r diam(G), and define on (G, BG )
(G)
by () = RNN (G) . Next, let (, F, P) be a probability space on which there
R
exists sequences {Xn : n 1} and {Zn : n 1} of G-valued random variables
with the properties that the Xn s are mutually independent and have distribution , Zn is independent of {X 1 , . . . , Xn } and has distribution n  for each
n
< for some r (1, ). Without loss
n 1, and Kr supn1 d
d r
L (;R)

in generality, assume that n 6= n0 = Xn () 6= Xn0 () for all . For each


n 1, let Yn () be the last element of {X1 (), . . . , Xn ()} which is closest to
Zn (). That is, if n is the permutation group on {1, . . . , n} and, for n ,


An () = : |X(m) () Zn ()| < |X(m1) () Zn ()| : for 2 m n ,
then Yn = Z(n) on An (). Show that for all Borel measurable f : G R,
|f (Yn ) f (Zn )| 0 in P-probability. Here are some steps that you might want
to follow.

244

6 Some Extensions and Applications

(i) Given f L1 (; R), show that


)
(
Z
1
|f | d 1 Mf (x)
MG f (x) sup
|B(x, r) G| B(x,r)G
r>0

and therefore that there is a C < such that kMG f kLp (;R)
for all p (1, ].

Cp
p
p1 kf kL (;R)

(ii) Given n 1 and z G, set




An (z) = : |Xm () z| < |Xm1 () z| : for 2 m n ,
and show that


E f (Yn ) = n!
P



EP f (Xn ), An (z) n (dz).

Next, for n 2, set rn () = |Xn1 () z|, and show that


"Z
#



P
P
E f (Xn ), An (z) = E
f d, An1 (z) MG f (z)P An (z) ,
B(z,rn )

and conclude from this that



E f (Yn )
P

Z
MG f dn .

(iii) Given the conclusion drawn at the end of (ii), proceed as in the derivation
of Theorem 5.2.34 from Lemma 5.2.31 to get the desired result.
6.2 Elements of Ergodic Theory
Among the two or three most important general results about dynamical systems
is D. Birkhoffs Individual Ergodic Theorem. In this section, I will present a
generalization, due to N. Wiener, of Birkhoffs basic theorem.
The setting in which I will prove the Ergodic Theorem will be the following.
(,
be a -finite measure space on which there exits a semigroup
 kF, ) will N
: k N
of measurable, -measure preserving transformations.
That is, for each k NN , k is an F-measurable map from into itself, 0 is
the identity map, k+` = k ` for all k, ` NN , and

() = (k )1 () for all k N and F.
Further, E will be a separable Banach space with norm k kE , and, given a
function F : E, I will be considering the averages
1 X
F k (), n Z+ ,
An F () N
n
+
kQn



N
where Q+
: kkk < n and kkk max1jN kj . My
n is the cube k N
goal (cf. Theorem 6.2.7) is to show that, for each p [1, ) and F Lp (; E),
{An F : n 1} converges -almost everywhere. In fact, when either is finite
or p (1, ), I will show that the convergence is also in Lp (; E).

6.2 Elements of Ergodic Theory

245

6.2.1. The Maximal Ergodic Lemma. Because he was thinking in terms


of dynamical systems and therefore did not take full advantage of measure theory, Birkhoffs own proof of his theorem is rather cumbersome. Later, F. Riesz
discovered a proof which has become the model for all later proofs. Specifically,
he introduced what is now called the Maximal Ergodic Inequality, which is an
inequality that plays the same role here that Doobs Inequality played in the
derivation of Corollary 5.2.4. In order to cover Wieners extension of Birkhoffs
theorem, I will derive a multiparameter version of the Maximal Ergodic Inequality, which, as the proof shows, is really just a clever application of Hardys
Inequality.1
Lemma 6.2.1 (Maximal Ergodic Lemma). For each n Z+ and p [1, ],
An is a contraction on Lp (; E). Moreover, for each F Lp (; E),
(6.2.2)



(24)N
kF kL1 (;E) ,
sup kAn F kE

n1

(0, ),

or




sup kAn F kE
n1

(6.2.3)

Lp ()

(24)N p
kF kLp (;E) ,
p1

depending on whether p = 1 or p (1, ).


Proof: First observe that, because kAn F kE An kF kE , it suffices to prove
all of these assertions in the case when E = R and F is non-negative. Thus, I
will restrict myself to this case. Since F k has the same distribution as F
itself, the first assertion is trivial. To prove (6.2.2) and (6.2.3), let n Z+ be
given, apply (6.1.10) and (6.1.11) to

ak ()

F k ()

if k Q+
2n

if k
/ Q+
2n ,

and conclude that


n
o

+
k
Cn () card k Qn : max Am F ()
1mn

(12)

F k ()

kQ+
2n

The idea of using Hardys Inequality was suggested to P. Hartman by J. von Neumann and
appears for the first time in Hartmans On the ergodic theorem, Am. J. Math. 69, pp.
193199 (1947).

246

6 Some Extensions and Applications

and
X
kQ+
n

max

1mn

Am F ()

p

(12)N p
p1

p X 

F k ()

p

kQ+
2n

Hence, by Tonellis Theorem,




max Am F

1mn

kQ+
n

Z
=

Cn () (d)

Z
(12)N X
F k f d

+
kQ2n

and, similarly,
X Z
kQ+
n

max

1mn

Am F

p


d

(12)N p
p1

p X Z 

F k

p

d.

kQ+
2n


Finally, since the distributions of max1mn Am F k and F k do not
depend on k NN , the preceding lead immediately to



max Am F

1mn

and




max Am F
1mn

Lp ()

(24)N
kF kL1 ()

2 p (12)N p
kF kLp ()

p1

for all n Z+ . Thus, (6.2.2) and (6.2.3) follow after one lets n . 
Given (6.2.2) and (6.2.3), I adopt again the strategy used in the proof of
Corollary 5.2.4. That is, I must begin by finding a dense subset of each Lp -space
on that the desired convergence results can be checked by hand, and for this
purpose I will have to introduce the notion of invariance.
A set F is said to be invariant, and I write I if = (k )1 () for
every k NN . As is easily checked, I is a sub--algebra of F. In addition, it
is clear that F is invariant if = (ej )1 () for each 1 j N , where
{ei : 1 i N } is the standard orthonormal basis in RN . Finally, if I is the
-completion of I relative to F in the sense that I if and only if F and
I such that ()
= 0 (AB (A\B)(B \A) is the symmetric
there is
difference between the sets A and B), then an F-measurable F : E is
I-measurable if and only if F = F k (a.e., ) for each k NN . Indeed, one

6.2 Elements of Ergodic Theory

247

need only check this equivalence for indicator functions of sets. But if F
= 0 for some
I, then
and ()



+ ()
= 0,
(k )1 () (k )1 ()
and so I. Conversely, if I, set
[
=

(k )1 (),
kNN

I and ()
= 0.
and check that
Lemma 6.2.4. Let I(E) be the subspace of I-measurable elements of L2 (; E).
Then, I(E) is a closed linear subspace of L2 (; E). Moreover, if I(R) denotes
orthogonal projection from L2 (; R) onto I(R), then there exists a unique linear
contraction I(E) : L2 (; E) I(E) with the property that I(E) (af ) =
aI(R) f for a E and f L2 (; R). Finally, for each F L2 (; E),

(6.2.5)

An F I(E) F

(a.e., ) and in L2 (; E).

Proof: I begin with the case when E = R. The first step is to identify the
orthogonal complement I(R) of I(R). To this end, let N denote the subspace
of L2 (; R) consisting of elements having the form g g ej for some g
L2 (; R) L (; R) and 1 j N . Given f I(R), observe that



f, g g ej L2 (;R) = f, g L2 (;R) f ej , g ej L2 (;R) = 0.
Hence, N I(R) . On the other hand, if f L2 (; R) and f N , then it is
clear that f f f ej for each 1 j N and therefore that


f f ej 2 2
L (;R)

2

2
= kf kL2 (;R) 2 f, f ej L2 (;R) + f ej L2 (;R)




= 2 kf k2L2 (;R) f, f ej L2 (;R) = 2 f, f f ej L2 (;R) = 0.
Thus, for each 1 j N , f = f ej -almost everywhere; and, by induction
on kkk , one concludes that f = f k -almost everywhere for all k NN .
In other words, we have now shown that I(R) = N or, equivalently, that
N = I(R) .
Continuing with E = R, next note that if f I(R), then An f = f (a.e., )
for each n Z+ . Hence, (6.2.5) is completely trivial in this case. On the other
hand, if g L2 (; R) L (; R) and f = g g ej , then
X
X
nN An f =
g k
g k+ej ,
{kQ+
n :kj =0}

{kQ+
n :kj =n1}

248

6 Some Extensions and Applications

and so, with p {2, },




2kgkLp (;R)
An f p
0

L (;R)
n

as n .

Hence, in this case also, (6.2.5) is easy. Finally, to complete the proof for E = R,
simply note that, by (6.2.3) with p = 2 and E = R, the set of f L2 (; R) for
which (6.2.5) holds is a closed linear subspace of L2 (; R) and that we have
already verified (6.2.5) for f I(R) and f from a dense subspace of I(R) .
Turning to general Es, first note that I(E) F is well defined for -simple F s.
P`
Indeed, if F = 1 ai 1i for some {ai : 1 i `} E and {i : 1 i `} of
mutually disjoint elements of F with finite -measure, then
I(E) F =

`
X

ai I(R) 1i

and so


I(E) F 2 2

L (;E)

`
X

!2
kai kE I(R) 1i




= I(R)

`
X
1

! 2


kai kE 1i
2

L (;R)

`
X

kai k2E (i ) = kF k2L2 (;E) .

Thus, since the space of -simple functions is dense in L2 (; E), it is clear that
I(E) not only exists but is also unique.
Finally, to check (6.2.5) for general Es, note that (6.2.5) for E-valued, simple F s is an immediate consequence of (6.2.5) for E = R. Thus, we already
know (6.2.5) for a dense subspace of L2 (; E), and so the rest is another elementary application of (6.2.3). 
6.2.2. Birkhoff s Ergodic Theorem. For any p [1, ), let Ip (E) denote
the subspace of I-measurable elements of Lp (; E). Clearly Ip (E) is closed for
every p [1, ). Moreover, since
 
(6.2.6)
() < = I(E) F = E F I ,

when is finite I(E) extends automatically as a linear contraction from Lp (; E)


onto Ip (E) for each p [1, ), the extension being given by the right-hand side
of (6.2.6). However, when (E) = , there is a problem. Namely, because  I
will seldom be -finite, it will not be possible to condition with respect to I.
Be that as it may, (6.2.5) provides an extension of I(E) . Namely, from (6.2.5)
and Fatous Lemma, it is clear that, for each p [1, ),


I(E) F p
kF kLp (;E) , F Lp (; E) L2 (; E),
L (;E)
and therefore the desired existence of the extension follows by continuity.

6.2 Elements of Ergodic Theory

249

Theorem 6.2.7 (Birkhoff s Individual Ergodic Theorem). For each p


[1, ) and F Lp (; E),
(6.2.8)

An F I(E) F

(a.e., ).

Moreover, if either p (1, ) or p = 1 and () < , then the convergence in


(6.2.8) is also in Lp (; E). Finally, if () ({) = 0 for every I, then
(6.2.8) can be replaced by
E [F ]

()
lim An F =
n

if () (0, )
(a.e., ),
if () =

and the convergence is in Lp (; E) when either p (1, ) or p = 1 and () <


.
Proof: As I said above, the proof is now an easy application of the strategy
used to prove Corollary 5.2.4. Namely, by (6.2.2), the set of F L1 (; E) for
which (6.2.8) holds is closed and, by (6.2.5), it includes L1 (; E) L (; E).
Hence, (6.2.8) is proved for p = 1. On the other hand, when p (1, ),
(6.2.3) applies and shows first that the set of F Lp (; E) for which (6.2.8)
holds is closed in Lp (; E) and second that -almost everywhere convergence
already implies convergence in Lp (; E). Hence, we have proved that (6.2.8)
holds and that the convergence is in Lp (; E) when p (1, ). In addition,
when () ({) = 0 for all I, it is clear that the only elements of Ip (E)
are -almost everywhere constant, which, in the case when () < , means (cf.

[F ]
, and, when () = , means that Ip (E) = {0}
(6.2.6)) that I(E) F = E()
for all p [1, ).
In view of the preceding, all that remains is to discuss the L1 (; E) convergence
in the case when p = 1 and () < . To this end, observe that, because the
An s are all contractions in L1 (; E), it suffices to prove L1 (; E) convergence
for E-valued, -simple F s. But L1 (; E) convergence for such F s reduces
to showing that An f I(R) f in L1 (; R) for non-negative f L (; R).
Finally, if f L1 ; [0, ) , then



An f kL1 () = kf kL1 () = I(R) f kL1 (;R) ,

n Z+ ,

where, in the last equality, I used (6.2.6); and this, together with (6.2.8), implies
(cf. the final step in the proof of Theorem 6.1.12) convergence in L1 (). 


I will say that semigroup k : k NN is ergodic on (, F, ) if, in addition
to being -measure preserving, () ({) = 0 for every invariant I.

250

6 Some Extensions and Applications

Classic Example. In order to get a feeling for what the Ergodic Theorem is
saying, take to be Lebesgue measure on the interval [0, 1) and, for a given
(0, 1), define : [0, 1) [0, 1) so that
() + [ + ] = + mod 1.
If is rational and m is the smallest element of Z+ with the property that
m Z+ , then it is clear that, for any F on [0, 1), F = F if and only if F
1
. Hence, if F L2 [0, 1); C and
has period m
Z

c` (F )
F ()e 1 2` d, ` Z,
[0,1)

then elementary Fourier analysis leads to the conclusion that, in this case,

X
lim An F () =
cm` (F )e 1 2m` for Lebesgue-almost every [0, 1).
n

`Z


On the other hand, if is irrational, then k : k N} is -ergodic on [0, 1).
To see this, suppose that F I(C). Then (cf. the preceding and use Parsevals
Identity)
X

2

c` (F ) c` (F ) 2 .
0 = F F L2 ([0,1);C) =
`Z

But, clearly,

c` (F ) = e

1 2`

c` (F ),

` Z,

and so (because is irrational) c` (F ) = 0 for each ` 6= 0. In other words, the only


elements of I(C) are -almost everywhere constant. Thus, for each irrational
(0, 1), p [1, ), separable Banach space E, and F Lp [0, 1); E ,
Z
lim An F =
F () d Lebesgue-almost everywhere and in Lp (; E).
n

[0,1)

Finally, notice that the situation changes radically when one moves from [0, 1) to
[0, ) and again takes to be Lebesgue measure and (0, 1) to be irrational.
If I extend the definition of by taking () = bc + ( bc) for
[0, ), then it is clear that invariant functions are those that are constant on each
R bc+1
interval [m, m+1) and that, Lebesgue-almost surely, An f () bc f () d.
On the other hand, if one defines () = + , then every invariant set that
has non-zero measure will have infinite measure, and so, now, every choice of
(0, 1) (not just irrational ones) will give rise to an ergodic system. In
particular, one will have, for each p [1, ) and F Lp (; E),
lim An F = 0

Lebesgue-almost everywhere,

and the convergence will be in Lp (; E) when p (1, ).

6.2 Elements of Ergodic Theory

251

6.2.3. Stationary Sequences. For applications to probability theory, it


is useful to reformulate these considerations in terms of stationary families of
random variables. Thus, let (, F, P) be a probability space and (E, B) be a
measurable space (E need not be a Banach space). Given a family F = {Xk :
k NN } of E-valued random variables on (, F, P), I will say that F is Pstationary (or simply stationary) if, for each ` NN , the family


F` Xk+` : k NN
has the same (joint) distribution under P as F itself. Clearly, one can test for
stationarity by checking that the distribution of Fej is the same as that of F for
each 1 j N . In order to apply the considerations of 6.2.1 to stationary
families, note that all questions about the properties of F can be phrased in
N
terms of the following canonical setting . Namely, set E = E N and define
N
N
on E, B N
to be the image measure F P. In other words, for each B N ,
() = P F . Next, for each ` NN , define ` : E E to be the natural
shift transformation on E given by ` (x)k = xk+` for all k NN . Obviously,
stationarity of F is equivalent to the statement that {k : k NN } is -measure
N
preserving. Moreover, if I is the -algebra of shift invariant elements B N

1
(i.e., = k
() for all k NN ), then, by Theorem 6.2.7, for any separable
Banach space B, any p [1, ), and any F Lp (P; B),

h
i
1 X

F Fk = EP F F F1 (I) (a.s., P) and in Lp (P; B).
lim N
n n
+
kQn




N
In particular, when k : k NN is ergodic on E, B N , I will say that the
family F is ergodic and conclude that the preceding can be replaced by


1 X
F Fk = EP F F (a.s., P) and in Lp (P; B).
(6.2.9)
lim N
n n
+
kQn

So far I have discussed one-sided stationary families, that is, families indexed
by NN . However, for various reasons (cf. Theorem 6.2.11) it is useful to know
that one can usually embed a one-sided stationary family into a two-sided one. In
terms of the semigroup
of shifts,
to the trivial observation that
 k
this corresponds
N
NN
the semigroup : k N
on E = E
can be viewed as a sub-semigroup


= E ZN . With these comments in
of the group of shifts k : k ZN on E
mind, I will prove the following.
Lemma 6.2.10. Assume that E is a complete, separable, metric space and that
F = {Xk : k NN } is a stationary family of E-valued random variables on the
and
F,
P)
probability space
 (, F, P).
Then there exists a probability space (,
N
N

a family F = Xk : k Z
with the property that, for each ` Z ,


` X
k+` : k NN
F
as F has under P.
has the same distribution under P

252

6 Some Extensions and Applications

Proof: When formulated correctly, this theorem is an essentially trivial application of Kolmogorovs Extension Theorem (cf. part (iii) of Exercise 9.1.17).
Namely, for n N, set


n = k ZN : kj n for 1 j N ,
and define n : E 0 E n so that
n (x)k = xn+k

for x E 0 and k n , where n (n, . . . , n)

Next, take 0 on E 0 to be the P-distribution of F and, for n 1, n on E n


to be (n ) 0 . Using stationarity, one can easily check that, for each n 0
and k NN , n is invariant under the obvious extension of k to E n . In
particular, if one identifies E n+1 with E n+1 \n E n , then

n+1 E n+1 \n = n ()

for all BE n .

Hence the n s are consistently defined on the spaces E n , and therefore Kolmogorovs Extension Theorem applies and guarantees the existence of a unique
N
Borel probability measure on E Z with the property that
N

EZ

\n


= n ()

for all n 0 and BE n .

Moreover, since each n is k -invariant for all k NN , it is clear that is also.


N
Thus, because k is invertible on E Z and k is its inverse, it follows that
is invariant under k for all k ZN .
= ,
= E ZN , F = B , P
To complete the proof at this point, simply take

k (
and X
) =
k for k ZN . 
As an example of the advantage that Lemma 6.2.10 affords, I present the
following beautiful observation made originally by M. Kac.
Theorem 6.2.11.
Let (E, B) be a measurable space and {Xk : k N}
a stationary sequence of E-valued random variables on the probability space
(, F, P). Given B,
 define the return time ()
 = inf{k 1 : Xk () }.
Then, EP , X0 = P Xk for some k N . In particular, if {Xk : k
N} is ergodic, then



P X0 > 0 = EP , X0 = 1.
Proof: Set Uk = 1 Xk for k N. Then {Uk : k N} is a stationary sequence
of {0, 1}-valued random
 variables. Hence, by Lemma 6.2.10, we can find a prob on which there is a family {U
F,
P
k : k Z} of {0, 1}-valued
ability space ,

6.2 Elements of Ergodic Theory

253


n , . . . , U
n+k , . . .
random variables with the property that, for every n Z, U
as (U0 , . . . , Uk , . . . ) has under P. In particular,
has the same distribution under P


U
0 = 1 and
P 1, X0 = P


U
n = 1, U
n+1 = 0, . . . , U
0 = 0 ,
P n + 1, X0 = P

n Z+ .

Thus, if


(
) inf k N : Uk (
) = 1 ,
then


= n 1 ,
P n, X0 = P

n Z+ ,

and so



< .
EP , X0 = P
Now observe that



> n = P
U
n = 0, . . . , U
0 = 0 = P X0
P
/ , . . . , Xn
/ ,
from which it is clear that


< = P k N Xk .
P
Finally, assume that
P{Xk : k N} is ergodic and that P(X0 ) > 0.
Because, by (6.2.9), 0 1 Xk = P-almost surely, it follows that, P-almost
surely, Xk for some k N. 
It should be noticed that, although there are far more elementary proofs, when
{Xn : n 0} is an irreducible, ergodic Markov chain on a countable state space
E, then Kacs theorem proves that the stationary measure at the state x E is
the reciprocal of the expected time that the chain takes to return to x when it
starts at x.
6.2.4. Continuous Parameter Ergodic Theory. I turn now to the setting of continuously parametrized semigroups
Thus, again
 of transformations.

(, F, ) is a -finite measure space and t : t [0, )N is a measurable
semigroup of -measure preserving transformations on . That is, 0 is the
identity, s+t = s t ,
(t, ) [0, )N 7 t () is B[0,)N F-measurable,

and t = for every t [0, )N . Next, given an F-measurable F with
values in some separable Banach space E, let G(F ) be the set of with the
property that
Z


F t () dt < for all T (0, ).
E
[0,T )N

254

6 Some Extensions and Applications

Clearly,
G(F ) = t () G(F )

for every t [0, )N .

In addition, if F Lp (; E) for some p [1, ), then


!
Z
Z


F t () p dt (d) = T N kF kp p

and so
F

L (;E)

[0,T )N

< ,



Lp (; E) = G(F ){ = 0.

p[1,)

Next, for each T (0, ), define


( N R
T
F t () dt
[0,T )N
AT F () =
0

if G(F )
if
/ G(F ),

and note that, as a consequence of the invariance of G(F ),




AT F t = AT F t
for all t [0, )N .
to denote the -algebra of F with the property that =
Finally, use I


t 1
( ) () for each t [0, )N , and say that t : t [0, )N is ergodic if

() ({) = 0 for every I.


 t
Theorem
6.2.12.
Let
(,
F,
)
be
a
-finite
measure
space
and
: t

[0, )N be a measurable semigroup of -measure preserving transformations
on . Then, for each separable Banach space E, p [1, ), and T (0, ),
AT is a contraction on Lp (; E). Next, set I(E)
= I(E) A1 , where I(E)

 k

N
is defined in terms of : k N
as in Theorem 6.2.7. Then, for each
p [1, ) and F Lp (; E),
lim AT F = I(E)
F

(6.2.13)

(a.e., ).

Moreover, if p (1, ) or p = 1 and () < , then the convergence is also in


Lp (; E). In fact, if () < , then
 
(a.e., ) and in Lp ( : E).
lim AT F = E F I
T



Finally, if t : t [0, )N is ergodic, then (6.2.13) can be replaced by
lim AT F =

E [F ]
()

(a.e., ),

where it is understood that the ratio is 0 when the denominator is infinite.

6.2 Elements of Ergodic Theory


Proof: The first step is the observation that




(24)N
kF kL1 (;E) ,
(6.2.14)
sup AT F E

T >0

255

(0, )

and





sup AT F
T >0
E

(6.2.15)

Lp (;E)

(24)N p
kF kLp (;E)
p1

for p (1, ).

Indeed, because of (AT F ) t = AT (F t ), (6.2.14) is derived from (6.1.6)


in precisely the same way as I derived (6.2.2) from (6.1.10), and (6.2.15) comes
from (6.1.7) just as (6.2.3) came from (6.1.7).
Given (6.2.14) and (6.2.15), we know that it suffices to prove (6.2.13) for
a dense subset of L1 (; E). Thus, let F be a uniformly bounded element of
L1 (; E) and set F = A1 F . Because
Z
N


T AT F () nN An F ()
F t ()kE dt
E
[0,n+1)N \[0,n)N

for n T n + 1,


lim
sup
n

nT n+1




AT F An F
E

=0

for every p [1, ].

Lp (;R)

Hence, for F L1 (; E)L (; E), (6.2.13) follows from (6.2.8). As for
 the case
when () < , all that we have to do is check that I(E)
F = E F
I (a.e., ).

However, from (6.2.13), it is easy to see that I(E)


F is measurable with respect

to the -completion of I, and so it suffices to show that







E F, = E A1 F, for all I.
then
But, if I,



E A1 F, =



E F t , dt

[0,1)N

Z
=


1 
E F t , t
() dt = E [F, ].

[0,1)N



Finally, assume that t : t [0, )N is -ergodic. When () < , the
asserted result follows immediately from the preceding; and when () = , it
follows from the fact that I(E)
F is measurable with respect to the -completion


of I.

256

6 Some Extensions and Applications


Exercises for 6.2

Exercise 6.2.16. Given an irrational (0, 1) and an  (0, 1), let Nn (, )


be the number of 1 m n with the property that




` 
for some ` Z.

m 2m

As an application of the considerations in the Classic Example given at the end


of 6.1, show that
Nn (, )
.
lim
n
n

Hint: Let 0, 2 be given, take f equal to the indicator function of [0, )
Pn
(1 , 1), and observe that Nn (, ) k=1 f k () so long as 0 2 .


Exercise 6.2.17. Assume that () < and that k : k NN is ergodic.
Given a non-negative F-measurable function f , show that

lim An f < on a set of positive -measure = f L1 (; R)

E [f ]
(a.e., ).
n
()


Exercise 6.2.18. Let F = Xk : k NN be a stationary family of random
variables on the probability space (, F, P) with values in the measurable space
NN
(E, B), and let I denote the -algebra of shift invariant BE
.
= lim An f =

(i) Take
T


Xk : kj n for all 1 j N ,

n0



N
the tail -algebra
determined
by
X
:
k

N
. Show that F1 (I) T , and
k


conclude that Xk : k NN is ergodic if T is P-trivial (i.e., P() {0, 1} for
all T ).
(ii) By combining (i), Kolmogorovs 01 Law, and the Individual Ergodic Theorem, give another derivation of The Strong Law of Large Numbers for independent, identically distributed, integrable random variables with values in a
separable Banach space.


Exercise 6.2.19. Let Xk : k N be a stationary, ergodic sequence of Rvalued, integrable random variables on (, F, P). Using the reasoning suggested
in Exercise 1.4.28, prove Guivarchs lemma:


n1

X




Xk < .
EP X1 = 0 = lim

n
k=0

6.3 Burkholders Inequality

257

6.3 Burkholders Inequality



Given a martingale Xn , Fn , P with X0 = 0 and a sequence {n : n 0}
of bounded functions with the property that n is Fn -measurable for n 0,
determine {Yn : n 0} byY0 = 0 and Yn Yn1 = n1 (Xn Xn1 ) for n 1.
It is clear that Yn , Fn , P is again a martingale. In addition, if the absolute
values of all the n s are bounded by some constant < and Xn is square
P-integrable, then one can easily check that
n
n
X
  X




 
EP Yn2 =
EP n2 (Xn Xn1 )2 2
EP (Xn Xn1 )2 = 2 EP Xn2 .
m=1

m=1

On the other hand, it is not at all clear how to compare the size of Yn to that
of Xn in any of the Lp spaces other than p = 2.
The problem of finding such a comparison was given a definitive solution by D.
Burkholder, and I will present his solution in this section. Actually, Burkholder
solved the problem twice. His first solution was a beautiful adaptation of general
ideas and results that had been developed over the years to solve related problems in probability theory and analysis and, as such, did not yield the optimal
solution. His second approach is designed specifically to address the problem
at hand and bears little or no resemblance to familiar techniques. It is entirely
original, remarkably elementary and effective, but somewhat opaque. The approach is the outgrowth of many years of deep thinking that Burkholder devoted
to the topic, and the reader who wants to understand the path that led him to
it should consult the explanation that he wrote.1
6.3.1. Burkholders Comparison Theorem. Burkholders basic result is
the following comparison theorem.


Theorem
6.3.1 (Burkholder). Let , F, P be a probability space, Fn :

n N a non-decreasing sequence of sub--algebras of F, and E and F a pair
of (real or complex)
separable Hilbert spaces. Next, suppose that Xn , Fn , P

and Yn , Fn , P are, respectively, E- and F -valued martingales. If
kY0 kF kX0 kE and kYn Yn1 kF kXn Xn1 kE , n Z+ ,
P-almost surely, then, for each p (1, ) and n N,
(6.3.2)



Yn p
Bp Xn Lp (P;E) ,
L (P;F )

where Bp (p 1)

1
.
p1

As I said before, the derivation of Theorem 6.3.1 is both elementary and


mysterious. I begin with the trivial observation that, without loss in generality,
1

For those who want to know the secret behind this proof, Burkholder has revealed it in his
article Explorations in martingale theory and its applications for the 1989 Saint-Flour Ecole
dEt
e lectures published by Springer-Verlag, LNM 1464 (1991).

258

6 Some Extensions and Applications

I may assume that both E and F are complex Hilbert spaces, since we can always
complexify them, and, in addition, that E = F , since, if that is not already the
case, I can embed them in E F . Thus, I will be making these assumptions
throughout.
The heart of the proof lies in the computations contained in the following two
lemmas.
Lemma 6.3.3. Let p (1, ) be given, set

p =

p2p (p 1)p1
2p

if p [2, )
if p (1, 2],

and define u : E 2 R by (cf. (6.3.2))


u(x, y) = kykE Bp kxkE
Then
kykpE Bp kxkE

p

kykE + kxkE

p u(x, y),

p1

(x, y) E 2 .

Proof: When p = 2, there is nothing to do. Thus, I will assume that p


(1, ) \ {2}.
Observe that it suffices to show that, for all (x, y) E 2 satisfying kxkE +
kykE = 1, depending on whether p (2, ) or p (1, 2),


p p2p (p 1)p1 kykE (p 1)kxkE
p

(*)
kykE (p 1)kxkE
p2p (p 1)p1 kykE (p 1)kxkE .
Indeed, when p (2, ), (*) is precisely the result desired, and, when p (1, 2),
(*) gives the desired result after one divides through by (p 1)p and reverses
the roles of x and y.
I begin the verification of (*) by checking that
2p

(**)

(p 1)

p1

>1

if p (2, )

<1

if p (1, 2).

To this end, set f (p) = (p 1) log(p 1) (p 2) log p for p (1, ). Then


f is strictly convex on (1, 2) and strictly concave on (2, ). Thus, f  (1, 2)
cannot achieve a maximum and, therefore, since limp&1 f (p) = 0 = f (2), f < 0
on (1, 2). Similarly, f  (2, ) cannot achieve a minimum and, therefore, since
f (2) = 0 while limp% f (p) = , we have that f > 0 on (2, ).
Next, observe that proving (*) comes down to checking that, for s [0, 1],
(s) p

2p

(p 1)

p1

p p

(1 ps) (1 s) + (p 1) s

0
0

if p (2, )
if p (1, 2).

6.3 Burkholders Inequality

259

To this end, note that, by (**), (0) > 0 when p (2, ) and (0) < 0 when
p (1, 2). Also, for s (0, 1),
h
i
0 (s) = p (p 1)p sp1 + (1 s)p1 p2p (p 1)p1
and

h
i
00 (s) = p(p 1) (p 1)p sp2 (1 s)p2 .


In particular, we see that p1 = 0 p1 = 0. In addition, depending on whether
p (2, ) or p (1, 2), lims&0 00 (s) is negative or positive, 00 is strictly increasing or decreasing on (0, 1), and lims%1 00 (1) is positive or negative. Hence,
there exists a unique t = tp (0, 1) with the property that


< 0 if p (2, )
> 0 if p (2, )
00  (0, t)
and 00  (t, 1)
> 0 if p (1, 2)
< 0 if p (1, 2.


Moreover, because 00 (t) = 0, it is easy to see that t 0, p1 .


Now suppose that p (2, ) and consider on each of the intervals p1 , 1 ,
 
 1
t, p , and 0, t separately. Because both and 0 vanish at p1 while 00 > 0



on p1 , 1 , it is clear that > 0 on p1 , 1 . Next, because 0 p1 = 0 and


00  t, p1 > 0, we know that is strictly decreasing on t, p1 and therefore that

 
 t, p1 > p1 = 0. Finally, because 00  (0, t) < 0 while (0) (t) 0,
we also know that  (0, t) > 0. The argument when p (1, 2) is similar, only
this time all the signs are reversed. 

Lemma 6.3.4. Again let p (1, ) be given, and define u : E F R as in


Lemma 6.3.3. In addition, define the functions v and w on E 2 \ {0, 0} by
p2

v(x, y) = p kykE + kxkE
kykE + (2 p)kxkE
and
w(x, y) = p(1 p) kykE + kxkE

p2

kxkE .

Then, for (x, y) E 2 and (k, h) E 2 satisfying



min ky + thkE kx + tkkE > 0 and khkE kkkE ,
t[0,1]

one has
u(x + k, y + h) u(x, y) v(x, y) Re

y
kykF




x
,
k
, h + w(x, y) Re kxk
E
F

when p [2, ) and








y
x
,
k
,
h
v(y,
x)
Re
(p1) u(x+k, y+h)u(x, y) w(y, x) Re kyk
kxkE
E
E

when p (1, 2].

260

6 Some Extensions and Applications

Proof: Set

(t) = t; (x, k), (y, h)
ky + thkE (p 1)kx + tkkE

kx + tkkE + ky + thkE

p1

and observe that


(


u x + tk, y + th =


t; (x, k), (y, h)
(p 1)

if p [2, )


t; (y, h), (x, k)

if p (1, 2).

Hence, it suffices for us to check that






y+th
x+tk
,
k
,
h
+
w(x
+
tk,
y
+
th)Re
0 (t) = v(x + tk, y + th)Re ky+thk
kx+tkkE
E
E

and prove that


00 t; (x, k), (y, h)

if p [2, ) and khkE kkkE

if p (1, 2] and khkE kkkE .

To prove the preceding,


 = y + th, (t) = kx(t)kE +
 set x(t) = x + tk, y(t)
ky(t)kE , a(t) =

Re x(t),k

kx(t)kE

, and b(t) =

Re y(t),h

ky(t)kE

. One then has that

h


i
0 (t) = p(t)p2 (1 p)kx(t)kE a(t) + ky(t)kE + (2 p)kx(t)kE b(t)
h
i

= p (1 p)(t)p2 kx(t)kE a(t) + b(t) + (t)p1 b(t) .
In particular, the first expression establishes the required form for 0 (t). In
addition, from the second expression, we see that

2
00 (t)
= (p 1)(p 2) (t)p3 kx(t)kE a(t) + b(t)
p
i
h

2
2
E
+ (p 1)(t)p2 a(t) a(t) + b(t) + kx(t)k
ky(t)kE b (t) + a (t)
i
h

b (t)2
(t)p2 (p 1) a(t) + b(t) b(t) + (t) ky(t)k
E

2
= (p 1)(p 2) (t)p3 kx(t)kE a(t) + b(t)

+ (p 1)(t)p2 kkk2E khk2E + (p 2)(t)p1

b (t)2
ky(t)kE ,

p
p
where a (t) = kkk2E a(t)2 and b (t) = khk2E b(t)2 . Hence the required
properties of 00 (t) have also been established. 

6.3 Burkholders Inequality

261

Proof of Theorem 6.3.1: Set Kn = Xn Xn1 and Hn = Yn Yn1 for


n Z+ . I will assume that there is an  > 0 with the property that


X0 () span{Kn () : n Z+ } 
E
and


Y0 () span{Hn () : n Z+ } 
E
for all . Indeed, if this is not already the case, then I can replace E by
R E (or, when E is complex, C E) and Xn () and Yn (), respectively, by


Xn() () , Xn () and Yn() () , Yn () ,
()

()

for each n N. Clearly, (6.3.2) for each Xn and Yn implies (6.3.2) for Xn
and Yn after one lets  & 0. Finally, because there is nothing to do when the
right-hand side of (6.3.2) is infinite, let p (1, ) be given, and assume that
Xn Lp (P; E) for each n N. In particular, if u is the function defined in
Lemma 6.3.3 and v and w are those defined in Lemma 6.3.4, then
u(Xn , Yn ) L1 (P; R)

and v(Xn , Yn ), w(Xn , Yn ) Lp (P; R)

p
is the Holder conjugate of p.
for all n N, where p0 = p1


Note that, by Lemma 6.3.3, it suffices for us to show that An EP u Xn , Yn
0, n N. Since u X0 , Y0 ) 0 P-almost surely, there is no question that
A0 0. Next, assume that An 0, and, depending on whether p [2, ) or
p (1, 2], use the appropriate part of Lemma 6.3.4 to see that
 i
h

An+1 EP v(Xn , Yn )Re kYYnnkE , Hn+1
E
 i
h

P
Xn
+ E w(Xn , Yn )Re kXn kE , Kn+1

or

 i
h

An+1 EP w(Yn , Xn )Re kYYnnkE , Hn+1
E
 i
h

P
Xn
.
E v(Yn , Xn )Re kXn kE , Kn+1
E

v(Xn , Yn ) kYYnnkE

But, since
(cf. Exercise 5.1.18)

is Fn -measurable, E [Hn+1 |Fn ] = 0, and therefore


P

 i
h

= 0.
EP v(Xn , Yn )Re kYYnnkE , Hn+1
E

Since the same reasoning shows that each of the other terms on the right-hand
side vanishes, we have now proved that An+1 0. 
As an immediate consequence of Theorem (6.3.2), we have the following answer
to the question raised at the beginning of this section.

262

6 Some Extensions and Applications

Corollary 6.3.5. Suppose that (Xn , Fn , P) is a martingale with values in


a separable (real or complex) Hilbert space E. Further, let F be a second
separable, complex Hilbert space, and suppose that {n : n 0} is a sequence
of Hom(E; F )-valued random variables with the properties that 0 is constant,
n is Fn -measurable for n 1, and kn kop < (a.s., P) for some constant
< and all n N. If kY0 kF kX0 kE and Yn Yn1 = n1 (Xn Xn1 )
for n 1, then (Yn , Fn , P) is an F -valued martingale and, for each p (1, ),
(cf. (6.3.2))
kYn kLp (P;F ) Bp kXn kLp (P;E) , n N.
6.3.2. Burkholders Inequality. In many applications, the most useful
form of Burkholders result is as a generalization to p 6= 2 of the obvious equality
" n
#
X


2
P
2
E |Xn X0 | = E
|Xm Xm1 | .
P

m=1

This is the form of his inequality which is best known and, as such, is called
Burkholders Inequality. Notice that his inequality can be viewed as a vast
generalization of Khinchines Inequality (2.3.27), although it applies only when
p (1, ).


Theorem
6.3.6 (Burkholders Inequality). Let , F, P and Fn : n

N be as in Theorem 6.3.1, and let Xn , Fn , P be a martingale with values in
the separable Hilbert space E. Then, for each p (1, ),

(6.3.7)



1
sup Xn X0 Lp (P;E)
Bp nN

! p2 p1

X


Xn Xn1 2

EP
E
1



Bp sup Xn X0 Lp (P;E) ,
nN

with Bp as in (6.3.2).
Proof: Let F = `2 (N; E) be the separable Hilbert space of sequences

y = x0 , . . . , xn , . . . E N
satisfying
kykF

X
0

! 12
kxn k2E

< ,

Exercises for 6.3

263

and define
Yn () = (X0 (), X1 () X0 (), . . . , Xn () Xn1 (), 0, 0, . . . ) F

for and n N. Obviously, Yn , Fn , P is an F -valued martingale. Moreover,
kX0 kE = kY0 kF

and kXn Xn1 kE = kYn Yn1 kF ,

n N,

and therefore the right-hand side of (6.3.7) is implied by (6.3.2) while the lefthand side also follows from (6.3.2) when the roles of the Xn s and Yn s are
reversed. 
Exercises for 6.3
Exercise 6.3.8. Because it arises repeatedly in the theory of stochastic integration, one of the most frequent applications of Burkholders Inequality is to
situations in which E is a separable Hilbert space and (Xn , Fn , P) is an E-valued
martingale for which one has an estimate of the form

h
1
i 2p

P

2p


<
Kp sup E kXn Xn1 kE Fm1

+
nZ

(P;R)

for some p [1, ). To see how such an estimate gets used, let F be a second separable Hilbert space and suppose that {n : n N} is a sequence of
Hom(E; F )-valued random variables with the properties that, for each n N,
1

 2p
< . Set Y0 = 0 and
n is Fn -measurable and an EP kn k2p
op

Yn =

n
X

m1 Xm Xm1

for n Z+ ,

m=1

and show that



1
Yn 2p
(2p 1)n 2 Kp
L (P;F )

n1
1 X 2p
a
n m=0 m

1
! 2p

Exercise 6.3.9. Return to the setting in Exercise 5.2.45, and let [0,1) denote
Lebesgue measure on [0, 1). Given f L2 ([0,1) ; C), show that, for each p
(1, ),

1
f (f, 1)L2 ( ;C) p
(p 1)
[0,1)
L ([0,1);C)
p1

! p p1
Z

X

2 2
m (f )
dt

[0,1)

m=0

(p 1)


1
f (f, 1)L2 ( ;C) p
.
[0,1)
L ([0,1) ;C)
p1

264

6 Some Extensions and Applications

For functions f with (f, e` )L2 ([0,1) ;C) = 0 unless ` = 2m for some m N, this
estimate is a case of a famous theorem proved by Littlewood and Paley in order
to generalize Parsevals Identity to cover p 6= 2. Unfortunately, the argument
here is far too weak to give their inequality for general f s.
Exercise 6.3.10. In connection with the preceding exercise,
it is interesting

to note that there is an orthonormal basis for L2 [0,1) ; R that, as distinguished
from the trigonometric functions, can be nearly completely understood in terms
of martingale analysis. Namely, recall the Rademacher functions {Rn : n Z+ }
introduced in 1.1.2. Next, use F to denote the set of all finite subsets F of Z+ ,
and define the Walsh function WF for F F by

1
if F =
WF = Q
mF Rm if F 6= .
Finally, set A0 = and An = {1, . . . , n} for n Z+ .
(i) For each n N, let Fn be the -algebra generated by the partition

 k k+1 
: 0 k < 2n .
2n , 2n


Show that, for each n Z+ , WF : F An is an orthonormal
 basis for the

subspace L2 [0, 1), Fn , [0,1) ; R , and conclude
from
this
that
WF : F F

forms an orthonormal basis for L2 [0,1) ; R .

(ii) Let f L1 [0, 1); R be given, and set
!
X Z
f
Xn =
f (t) WF (t) dt WF for n N.
F An

[0,1)

Using the result in (i), show that Xnf = E[0,1) [f |Fn ] and therefore that Xnf , Fn ,
[0,1) is a martingale.
In particular, Xnf f both (a.e., [0,1) ) as well as in

1
L [0,1) ; R .

(iii) Show that for each p (1, ) and f L1 [0,1) ; R with mean value 0,
(p 1) (p 1)1 kf kLp ([0,1);R)

X
X

[0,1)

n=1

F An \An1

f (s) WF (s) ds
[0,1)

p1
2 p2

WF (t) dt

(p 1) (p 1)1 kf kLp ([0,1);R) .


Exercise 6.3.11. Although Burkholders Inequality is extremely useful, it
does not give particularly good estimates in the case of martingales with bounded
increments. For such martingales, the following line of reasoning, which was
introduced by J. Azema in his thesis, is useful.

Exercises for 6.3

265

(i) For any a R and x [1, 1], show that


eax

1 + x a 1 x a
e = cosh a + x sinh a.
e +
2
2

(ii) Suppose that {Y1 , . . . , Yn } are [1, 1]-valued random variables on the probability space (, F, P) with the property that, for each 1 m n,


EP Yj1 Yjm = 0

for all 1 j1 < < jm n.

Show that, for any {aj }n1 R,

n
n
n
X
X
Y
1
a2j ,
aj Yj
cosh aj exp
EP exp
2
j=1
j=1
j=1

and conclude that

!
n
2
X
R
,
P
aj Yj R exp Pn
2 j=1 a2j
j=1

R [0, ).

(iii) Suppose that (Xn , Fn , P) is a bounded martingale with X0 0, and set


Dn kXn Xn1 kL (P) . Show that
R2
P (Xn R) exp Pn
2 j=1 Dj2

!
,

R [0, ).

Chapter 7
Continuous Parameter Martingales

It turns out that many of the ideas and results introduced in 5.2 can be easily
transferred to the setting of processes depending on a continuous parameter. In
addition, the resulting theory is intimately connected with Levy processes, and
particularly Brownian. In this chapter, I will give a brief introduction to this
topic and some of the techniques to which it leads.1
7.1 Continuous Parameter Martingales
There is a huge number of annoying technicalities which have to be addressed in
order to give a mathematically correct description of the continuous time theory
of martingales. Fortunately, for the applications which I will give here, I can
keep them to a minimum.
7.1.1. Progressively
Measurable
Functions. Let (, F) be a measurable


space and Ft : t [0, ) a non-decreasing family of sub--algebras. I will say
that a function X on [0, )
 into a measurable
space (E, B) is progressively
measurable with respect to Ft : t [0, ) if X  [0, T ] is B[0,T ] FT measurable for every T [0, ). When E is a metric space, I will say that
X : [0, ) E is right-continuous if X(s, ) = limt&s X(t, ) for every
(s, ) [0, ) and will say that it is continuous if X( , ) is continuous
for all .
Remark 7.1.1. The reader might have been expecting a slightly different definition of progressive measurability
here. Namely,
he might have thought that


one would say that X is Ft : t [0, ) -progressively measurable if it is
B[0,) F-measurable and 7 X(t, ) E is Ft -measurable for each
t [0, ). Indeed, in extrapolating from the discrete parameter setting, this
would be the first definition at which one would arrive. In fact, it was the notion
with which Doob and Ito originally worked;
and such functions were said by
them to be adapted to Ft : t [0, ) . However, it came to be realized
that there are various problems with the notion of adaptedness. For example,
even if X is adapted and f : E R is a bounded, B-measurable function, the
1

A far more thorough treatment can be found in D. Revuz and M. Yors treatise Continuous
Martingales and Brownian Motion, Springer-Verlag, Grundlehren der Mathematishen #293
(1999).

266

7.1 Continuous Parameter Martingales

267


Rt
function (t, )
Y (t, ) 0 f X(s, ) ds R need not be adapted. On the
other hand, if X is progressively measurable, then Y will be also.
The following simple lemma should help to explain the virtue of progressive
measurability and its relationship to adaptedness.
Lemma 7.1.2. Let PM denote the set of A [0, ) with the property
that [0, t] A B[0,t] Ft for every t 0. Then PM is a sub--algebra of
B[0,) F and X is progressively measurable if and only if it is PM-measurable.
Furthermore, if E is a separable metric space and X : [0, ) E is a
right-continuous function, then X is progressively measurable if it is adapted.
Proof: Checking that PM is a -algebra is easy. Furthermore, for any X :
[0, ) E, T [0, ), and B,


(t, ) [0, T ] : X(t, )
 
= [0, T ] (t, ) [0, ) : X(t, ) },


and so X is Ft : t [0, ) -progressively measurable if and only if it is PMmeasurable. Hence, the first assertion has been proved.
Next, suppose that X is a right-continuous, adapted function. To see that X
is progressively measurable, let t [0, ) be given, and define

 n
Xnt (, ) = X [2 2n]+1 t, , for (, ) [0, ) and n N.

Obviously, Xnt is B[0,t] Ft -measurable for every n N and Xnt (, ) X(, )


as n for every (, ) [0, t] . Hence, X  [0, t] is B[0,t] Ft measurable, and so X is progressively measurable. 
7.1.2. Martingales: Definition and Examples. Given
 a probability
space
(, F, P ) and a non-decreasing family of sub--algebras Ft : t [0, ) , I will
say
with respect to
 that X : [0, ) (, ] is a submartingale

Ft : t [0, ) or, equivalently, that X(t), Ft , P is a submartingale if X
is a right-continuous, progressively measurable function with the properties that
X(t) is P-integrable for every t [0, ) and


X(s) EP X(t) Fs (a.s., P) for all 0 s t < .


When both X(t), Ft , P and X(t), Ft , P are
 submartingales,
I will say either
that X is a martingale with respect to Ft : t [0, ) or simply that
X(t), Ft , P is a martingale. Finally, if Z : [0, )
 C is a rightcontinuous, progressively measurable function, then
Z(t), Ft , P is said

 to be a
(complex) martingale if both Re Z(t), Ft , P and Im Z(t), Ft , P are.
The next two results show that Levy processes provide a rich source of continuous parameter martingales.

268

7 Continuous Parameter Martingales

Theorem 7.1.3. Let I(RN ) with


() = e` () , where ` () equals



1 , m RN , C RN +

1(,y)RN

1 1[0,1] (|y|) , y

RN


RN

M (dy).

If (, F, P) is a probability space and Z : [0, ) RN is a B[0,) Fmeasurable map with the properties that Z(0, ) = 0 and Z( , ) D(RN ) for
every , then {Z(t) : t 0} is a Levy process for if and only if, for each
x RN ,


(7.1.4)

exp



1(, Z(t))RN t` () , Ft , P is a martingale,


where Ft = {Z( ) : [0, t]} .
Proof: If {Z(t) : t 0} is a Levy process for , then, because Z(t) Z(s) is
independent of Fs and has characteristic function e(ts)` () ,
h



EP exp 1 , Z(t) RN t` ()

= exp

= exp

1 , Z(s)

1 , Z(s)

RN

RN

i

Fs

h

1
s` () e(st)` () EP e

s` () .

 i

,Z(t)Z(s)

RN

To prove the converse assertion, observe that the defining distributional property
of a Levy process for can be summarized as the statement that Z(0, ) = 0
and, for each 0 s < t, Z(t) Z(s) is independent of {Z( ) : [0, t]} and
has distribution ts , where
c = e ` . Hence, since (7.1.4) implies that
h

  i
EP exp 1 , Z(t) Z(s) RN Fs = e(ts)` () ,

RN ,

there is nothing more to do. 


Another, and often more useful, way to capture the same result is to introduce
the L
evy operator
L (x) =
(7.1.5)



1
Trace C2 (x) + m, (x) RN
2 Z
h
 i
+
(x + y) (x) 1[0,1] (|y|) y, (x) RN M (dy)
RN

for Cb2 (RN ; C).

7.1 Continuous Parameter Martingales

269

Theorem 7.1.6. Assume that I(RN ) and that {Z(t)


: t 0} is a Levy

1,2
N
process for . Then, for every F Cb [0, ) R ; C ,


Z t



F t, Z(t)
+ L F , Z( ) d, Ft , P
0


is a martingale, where Ft = {Z( ) : [0, t]} and L is the operator
described in (7.1.5). Conversely, if Z is a progressively measurable function
satisfying Z(0, ) = 0 and Z( , ) D(RN ) for each , and if


Z t


Z(t)
L Z( ) d, Ft , P
0

is a martingale for each Cc (RN ; R), then {Z(t) : t 0} is a Levy process


for .
Proof: Begin by noting that it suffices to handle the case when F is the
restriction to [0, ) RN of an element of the Schwartz test function space
S (R RN ; C). Indeed, because kL ku CkkCb2 (RN ;C) for some C < ,

the result for F Cb1,2 [0, ) RN ; C follows, via an obvious approximation procedure, from the result for F S (R RN ; C). Next observe that
it suffices to treat F  S (RN ; C). To see this, simply interpret the process
t [0, ) 7 t, Z (t) RN +1 as a Levy process for 1 .
Now let S (RN ; C) be given. The key to proving the required result is the
identity
d
?
t = (L ) ?
t ,
dt

(*)

where
t is the distribution of x under t , the measure determined by bt = et` .
The easiest way to check (*) is to work via Fourier transform and to use (3.2.10)
to verify that
d\
t` ()
()et` () ,
?
t () = ` ()()e

= Ld
dt

which is equivalent to (*). To see how (*) applies, observe that



 

EP Z(t) Fs = ?
ts Z(s) ,
and therefore that, for any A Fs ,
Z t



 

 
EP Z(t) , A] EP Z(s) , A =
EP (L ) ?
s Z(s) , A d
s
Z t

Z t




P

=
E L Z( ) , A d = E
L Z( ) d, A ,
s

270

7 Continuous Parameter Martingales

which, after rearrangement, is the asserted martingale property.


To prove the converse assertion, again begin with the observation that, by
an easy approximation procedure, one can prove the martingale property for all
Cb2 (RN ; C) as soon
as one knows it for Cc (RN ; R). In particular, one

1(,x)RN
, in which case L = ` (), and therefore, for
can take (x) = e
any A Fs , one gets that
Z t
i
h

 
u( ) d.
u(t) EP exp 1(, Z(t) RN N , A = u(s) + ` ()
R

Since this means that u(t) = e(ts)` () u(s), it follows that {Z(t) : t 0}
satisfies (7.1.4) and is therefore a Levy process for . 
As an immediate consequence of the preceding we have the following characterizations of the distribution of a Levy process. In the statement that follows,
Ft is the -algebra over D(RN ) generated by {( ) : [0, t]}.

Theorem 7.1.7. Given I(RN ), let Q M1 D(RN ) be the
 distribution
of a Levy process for . Then Q is the unique P M1 D(RN ) that satisfies
either one of the properties:
i


h

exp 1 , (t) RN + t` () , Ft , P

is a martingale with mean value 1 for each RN ,


or



(t) (0)


L ( ) d, Ft , P

is a martingale with mean value 0 for each Cc (RN ; R).


7.1.3. Basic Results. In this subsection I run through some of the results
from 5.2 that transfer immediately to the continuous parameter setting.
Lemma 7.1.8. Let the interval I and
 the function f : I R {} be as in
Corollary 5.2.10. If either X(t), Ft , P is an I-valued martingale or X(t), Ft , P
is an I-valued submartingale
and f is non-decreasing and bounded below, then

f X(t), Ft , P is a submartingale.
Proof: The fact that the parameter is continuous plays no role here, and so
this result is already covered by the argument in Corollary 5.2.10. 

Theorem 7.1.9 (Doobs Inequality). Let X(t), Ft , P be a submartingale.
Then, for every (0, ) and T [0, ),
"
#
!
1 P
P
sup X(t) E X(T ), sup X(t) .

t[0,T ]
t[0,T ]

7.1 Continuous Parameter Martingales

271

In particular, for non-negative submartingales and T [0, ),


# p1

"
sup X(t)p

EP

t[0,T ]


1
p
EP X(T )p p ,
p1

p (0, ).

Proof: Because of Exercise 1.4.18, I need only prove the first assertion. To
this end, let T (0, ) and
apply Theorem 5.2.1 to the discrete
 n N be given, 
mT
, P , and observe that
parameter submartingale X 2n , F mT
n
2


sup X


mT
2n

: 0m2

% sup X(t)

as n . 

t[0,T ]

Theorem 7.1.10
 (Doobs Martingale Convergence Theorem). Assume
that X(t), Ft , P is a P-integrable submartingale. If


sup EP X(t)+ < ,
t[0,)

W
then there exists an F t0 Ft -measurable X = X() L1 (P; R) to which

X(t) converges P-almost surely as t . Moreover, when X(t), Ft , P is either
a non-negative submartingale or 
a martingale, the convergence takes place in
1
L (P; R) if and only if the family X(t) : t [0, ) is uniformly P-integrable,
in which case X(t) EP [X | Ft ] or X(t) = EP [X | Ft ] (a.s., P) for all t [0, ),
and




1
(7.1.11)
P sup |X(t)| EP |X|, sup |X(t)| .

t0
t0


Finally, again when X(t), Ft , P is either a non-negative submartingale
or a

martingale, for each p (1, ) the family |X(t)|p : t [0, ) is uniformly Pintegrable if and only if supt[0,) kX(t)kLp (P) < , in which case X(t) X
in Lp (P; R).
Proof: To prove the initial convergence assertion, note that, by Theorem
W 5.2.15
applied to the discrete parameter process X(n), Fn , P , there is an nN Fn measurable X L1 (P; R) to which X(n) converges P-almost surely. Hence,
we need only check that limt X(t) exists in [, ] P-almost surely. To
(n)
this end, define U[a,b] () for n N and a < b to be the precise number of



times that the sequence X 2mn , : m N upcrosses the interval [a, b] (cf. the
(n)
paragraph preceding Theorem 5.2.15), observe that U[a,b] () is non-decreasing
(n)

as n increases, and set U[a,b] () = limn U[a,b] (). Note that if U[a,b] () < ,

272

7 Continuous Parameter Martingales

then (by right-continuity), there is an s [0, ) such that either X(t, ) b for
all t s or X(t, ) a for all t s. Hence, we will know that X(t, ) converges

in [, ] for P-almost every as soon as we show that EP U[a,b] <
for every pair a < b. In addition, by (5.2.16), we know that
P

sup E
nN

(n)
U[a,b]



EP (X(t) a)+
< ,
sup
ba
t[0,)

and so the required estimate follows


 from the Monotone Convergence Theorem.
Now assume that X(t), Ft , P is either a non-negative submartingale or a
martingale.
Given the preceding, it is clear that X(t) X in L1 (P; P)

if X(t) : t [0, ) is uniformly P-integrable. Conversely, suppose that
X(t) X in L1 (P; R). Then, for any T [0, ),






|X(T )| lim EP |X(t)| FT = EP |X| FT .

(*)

In particular, from Theorem 7.1.9,


!
P

sup |X(t)|
t[0,T ]

"
#
1 P
E |X|, sup |X(t)|

t[0,T ]

for every T (0, ). Hence, (7.1.11) follows when one lets T . But, again
from (*),






EP |X(T )|, |X(T )| EP |X|, |X(T )| EP |X|, sup |X(t)| ,
t0


and therefore, since, by (7.1.11), P supt0 |X(t)| 0 as , we can
conclude that {X(t) : t 0} is uniformly P-integrable.
Finally, if {X(T ) : T 0} is bounded in Lp (P; R) for some p (1, ), then,
by the last part of Theorem 7.1.9, supt0 |X(t)|p is P-integrable and therefore
X(t) X in Lp (P; R). 
7.1.4. Stopping Times and Stopping Theorems. A stopping time
relative to a non-decreasing family {Ft : t 0} of -algebras is a map :
[0, ] with the property that { t} Ft for every t 0. Given a
stopping time , I will associate with it the -algebra F consisting of those
A such
S that A { t} Ft for every t 0. Note that, because
{ < t} = n=0 { (1 2n )t}, { < t} Ft for all t 0.
Here are a few useful facts about stopping times.
Lemma 7.1.12. Let be a stopping time. Then is F -measurable, and,
for any progressively measurable function X with
values in a measurable space

(E, B), the function
X(, ) X (), is F -measurable on { < } in

7.1 Continuous Parameter Martingales

273



the sense that : () < & X(, ) F for all B. In addition,
f is again a stopping time if f : [0, ] [0, ] is a non-decreasing, rightcontinuous function satisfying f ( ) for all [0, ]. Next, suppose that
1 and 2 are a pair of stopping times. Then 1 + 2 , 1 2 , and 1 2
are all stopping times, and F1 2 F1 F2 . Finally, for any A F1 ,
A {1 2 } F1 2 .
Proof: Since { s} { t} = { s t} Ft , it is clear that is
F -measurable. Next, suppose that X is a progressively measurable
function.


To prove that X() is F -measurable, begin by checking that : (),
A Ft for any A Bt Ft . Indeed, this is obvious when A = [0, s] B for
s [0, t] and B Ft and, since these generate B[0,t] Ft , follows in general.
Now, for any t 0 and B,



A(t, ) (, ) [0, ) : , X(, ) [0, t] B[0,t] Ft ,
and therefore



{X() } { t} = : (), A(t, ) Ft .
As for f when f satisfies the stated conditions, simply note that
{f t} = { f 1 (t)} Ft ,
where f 1 (t) inf{ : f ( ) t} t.
Next suppose that 1 and 2 are two stopping times. It is trivial to see that
1 2 and 1 2 are again stopping times. In addition, if Q denotes the set of
rational numbers, then
[
{1 + 2 > t} = {1 > t}
{1 qt & 2 > (1 q)t} Ft .
qQ[0,1]

Thus, 1 + 2 is a stopping time. To prove the final assertions,


 begin with the
observation that if 1 2 , then A {2 t} = A {1 t} {2 t} Ft
for all A F1 and t 0, and therefore F1 F2 . Next, for any 1 and 2 ,
{1 2 } F2 since
[
{1 > 2 } {2 t} =
{1 > qt} {2 qt} Ft .
qQ[0,1]

Finally, if A F1 , then


A {1 2 } {1 2 t} = A {1 t} {1 t 2 },
and therefore, since A {1 t} Ft and {1 t 2 } Ft2 Ft , we have
that A {1 2 } F1 2 . 
In order to prove the continuous parameter analog of Theorems 5.2.13 and
5.2.11, I will need the following uniform integrability result.

274

7 Continuous Parameter Martingales


Lemma 7.1.13. If X(t), Ft , P is either a martingale or a non-negative, integrable submartingale, then, for each T > 0, the set


X() : is a stopping time dominated by T
is uniformly P-integrable. Furthermore, if, in addition, {X(t) : t 0} is uniformly
7.1.10) X() = limt X(t) (a.s., P),
 P-integrable and (cf. Theorem

then X() : is a stopping time is uniformly P-integrable.

Proof: Throughout, without loss in generality, I will assume that X(t), Ft , P
is a non-negative, integrable submartingale.
n
for n 0. By Lemma 7.1.12,
Given a stopping time T , define n = [2 2]+1
n
n is again a stopping time. Thus, by Theorem
5.2.13
applied to the discrete

parameter submartingale X(m2n ), Fm2n , P ,






X(n ) EP X 2n ([2n T ] + 1) Fn EP X(T + 1) Fn ,

and so




EP X(n ), X(n ) EP X(T + 1), X(n )
"
P

X(T + 1),

sup

X(t) .

t[0,T +1]

Starting from here, noting that n & as n , and applying Fatous Lemma,
we arrive at
"
#


(*)
EP X(), X() > EP X(T + 1), sup X(t) .
t[0,T +1]


Hence, since, by Theorem 7.1.9, P supt[0,T +1] X(t) tends to 0 as ,
this proves the first assertion. When {X(t) : t 0} is uniformly integrable, we
can replace (*) by




P
P
E X( T ), X( T ) > E X(), sup X(t)
t0

for any stopping time and T > 0. Hence, after another application of Fatous
Lemma, we get




P
P
E X(), X() > E X(), sup X(t) .
t0

At the same time, the first inequality in Theorem 7.1.9 can be replaced by




1
1 P
P sup X(t) E X(), sup X(t) EP [X()],

t0
t0

and so the asserted uniform integrability follows. 


It turns out that in the continuous time context, Doobs Stopping Time Theorem is most easily seen as a corollary of Hunts. Thus, I will begin with Hunts.

7.1 Continuous Parameter Martingales

275

Theorem 7.1.14 (Hunt).


Stopping Time Theorem, Hunts continuous pa
rameter Let X(t), Ft , P be either a non-negative, integrable submartingale
or a martingale.
If 1 and 2 are bounded stopping times and 1 2 , then

P
X(1 ) E X(2 ) F1 , and equality holds in the martingale case. Moreover,
when {X(t) : t 0} is uniformly P-integrable and X() limt X(t), then
the same result holds for arbitrary stopping times 1 2 .

Proof: Given 1 2 T , define (i )n = 2n [2n i ] + 1 for n 0, note that
(i )n is a {Fm2n : m 0}-stopping time and that F1 F(1 )n , and apply
Theorem 5.2.13 to the discrete parameter submartingale X(m2n , Fm2n , P
in order to see that
h
h
 i
 i
EP X (1 )n , A EP X (2 )n , A , A F1 ,
with equality in the martingale case. Because of right-continuity and Lemma
7.1.13,
X (i )n X(i ) in L1 (P; R), and so we have now shown that X(1 )

P
E X(2 ) F1 , with equality in the martingale case.
When {X(t) : t 0} is uniformly P-integrable and 1 2 are unbounded,
{X(i T ) : T 0} is uniformly P-integrable for i {1, 2}. Hence, for any
A F1 and 0 t T ,




EP X(T 1 ), A {1 t} EP X(T 2 ), A {1 t} ,
with equality in the martingale case. Letting first T and then t tend to infinity,
one gets the same relationship for X(1 ) and X(2 ), initially with A {1 < }
and then, trivially, with A alone. 

Theorem 7.1.15 (Doobs Stopping Time Theorem). If X(t), Ft , P is
either a non-negative, integrable submartingale
or a martingale, then, for every

stopping time , X(t ), Ft , P is either an integrable submartingale or a
martingale.
Proof: Given 0 s < t and A Fs , note that A { > s} Fs and
therefore, by Hunts Theorem applied to the stopping times s and t , that






EP X(t ), A = EP X(), A { s} + EP X(t ), A { > s}






EP X(), A { s} + EP X(s ), A { > s} = EP X(s ), A ,
where the inequality is an equality in the martingale case. 
To demonstrate just how powerful these results are, I give the following extension of the independent increment property of Levy processes. In its statement, the maps t : D(RN ) D(RN ) for t [0, ) are defined so that
t ( ) = ( + t) (t),
Ft = {( ) : [0, t]} , is a
 [0, ). Also,

stopping time relative to Ft : t [0, ) , and is the map on { : () < }
into D(RN ) given by = () .

276

7 Continuous Parameter Martingales


Theorem 7.1.16. Given I(RN ), let Q M1 D(RN ) be the distribution
of the Levy process for . Then, for each stopping time and FD(RN ) F measurable functions F : D(RN ) D(RN ) [0, ),
Z
ZZ


F , Q (d) =
1[0,) ( 0 ) F (, 0 ) Q (d)Q (d 0 ).
{<}

Proof: By elementary measure theory, all that we


 have to show is that, for
each B F contained in { < }, Q (1 ) B = Q ()Q (B).
Given B F contained in { < } with Q (B) > 0, choose T > 0 so that
Q (BT ) > 0 when BT = B { T }, and define QT M1 D(RN ) so that
QT () =

Q (1 ) BT

Q (BT )

If we show that QT = Q , then we will know that




Q (1 ) B = lim Q (1 ) BT
T

= Q () lim Q (BT ) = Q ()Q (B)


T

and therefore will be done.


By Theorem 7.1.6, checking that QT = Q comes down to showing that, for
any 0 s < t, RN , and A Fs ,





EQT e 1(x,(t))RN t` () , A = EQT e 1(x,(s))RN s` () , A .
To this end, note that, by Theorem 7.1.14 applied to s + T and t + T ,



Q (BT )EQT e 1(x,(t))RN t` () , A
i
h

= EQ e 1(,())RN +` () e 1(,(t+))RN (t+)` () , (1 A) BT


i
h

= EQ e 1(,())RN +` () e 1(,(s+))RN (s+)` () , (1 A) BT




= Q (BT )EQT e 1(x,(s))RN s` () , A ,

since
e 1(,())RN +` () 1A ( )1BT () is Fs+T -measurable. 
7.1.5. An Integration by Parts Formula. In this subsection I will derive
a simple result that has many interesting applications.

Theorem 7.1.17. Suppose V : [0, ) C is a right-continuous, progressively measurable


 function, and let |V |(t, ) [0, ] denote the total variation
var[0,t] V ( , ) of V ( , ) on the interval [0, t]. Then |V | : [0, ) [0, ]
is a non-decreasing, progressively measurable function that is right-continuous

7.1 Continuous Parameter Martingales

277


on each interval [0, t) for which |V |(t, ) < . Next, suppose that X(t), Ft , P
is a C-valued martingale with the property that, for each (t, ) (0, ) ,
the product kX( , )k[0,t] |V |(t, ) < , and define
(R
Y (t, ) =

(0,t]

X(s, ) V (ds, ) if |V |(t, ) <

otherwise,

where, in the case when |V |(t, ) < , the integral is the Lebesgue integral of
X( , ) on [0, t] with respect to the C-valued measure determined by V ( , ).
If
h


i
EP kXk[0,T ] |V |(T ) + V (0) < for all T (0, ),

then X(t)V (t) Y (t), Ft , P is a martingale.
Proof: Without loss in generality,
I will assume
that both X and V are R

valued. To see that |V | is Ft : t [0, ) -progressively measurable, simply
observe that, by right-continuity,
[2n t]

|V |(t, ) = sup
nN

X
V

k+1
2n


t, V

k
2n ,


;

k=0

and to see that |V |( , ) is right-continuous on [0, t) whenever |V |(t, ) < ,


recall that the magnitude of the jumps (from the right and left) of the variation
of a function coincide with those of the function 
itself.

I turn now to the second part. Certainly Y is Ft : t [0, ) -progressively
measurable. In addition, because kX( , )k[0,t] |V |(t, ) < for all (t, )
[0, ) , for any one has that
Z
Y (t, ) = 0

or Y (t, ) =

X(s, ) V (ds, )

for all t [0, );

(0,t]

and so, in either case, Y ( , ) is right-continuous and Y (t, ) Y (s, ) can be


computed as
[2n t]

lim

k+1
2n

t,



k+1
2n


t, V

k
2n

s,



k=[2n s]

In fact, under the stated integrability condition, the convergence in the preceding
takes place in L1 (P; R) for every t [0, ); and therefore, for any 0 s t <

278

7 Continuous Parameter Martingales

and A Fs ,


EP Y (t) Y (s), A
[2n t]

= lim

h
EP X

k+1
2n

t,

h

EP X(t) V

k+1
2n



k+1
2n


t, V

k
2n

s,



,A

k=[2n s]

[2n t]

= lim


t, V

k
2n

s,



,A

k=[2n s]

h
i
 i
= EP X(t) V (t) V (s) , A = EP X(t)V (t) X(s)V (s), A ,
and clearly this is equivalent to the asserted martingale property. 
We will make frequent practical applications of Theorem 7.1.17 later, but
here I will show that it enables us to prove that there is an important dichotomy
between continuous martingales and functions of bounded variation. However,
before doing so, I need to make a small, technical digression.
 A function : [0, ] is an extended stopping time relative to
Ft : t [0, ) if { < t} Ft for every t (0, ). Since { < t} Ft for any
stopping time , it is clear that every stopping time is an extended stopping time.
On the other hand, not every extended stopping time is a stopping time. To wit,
if X : [0, ) R is a right-continuous,

progressively measurable function
relative to
X( ) : [0, t] : t 0 , then = inf{t 0 : X(t) > 1} will
always be an extended stopping time but will seldom be a stopping time.
T
Lemma 7.1.18. For each t 0, set Ft+ = >t F . Then : [0, ]
is an extended stopping time if and only if it is a stopping time relative to
{Ft+ : t 0}. Moreover, if X(t), Ft , P is either a non-negative,
integrable

submartingale or a martingale, then so is X(t), Ft+ , P . In particular, if is
an extended stopping time, then X(t ), Ft+ , P is a non-negative, integrable
submartingale or a martingale.
T
Proof: The first assertion is immediate from { t} = >t { < }. To prove
the second assertion, apply right-continuity and the first uniform integrability
result in Lemma 7.1.13 to see that if 0 s < t and A Fs+ , then






EP X(s), A = lim EP X( ), A EP X(t), A ,
&s

where the inequality is an equality in the martingale case. 



Theorem 7.1.19. Suppose that
X(t), Ft , P is a continuous martingale, and

let |X|(t, ) = var[0,t] X( , ) denote the variation of X( , )  [0, t]. Then

P t > 0 0 < |X|(t, ) < = 0.
Equivalently, for P-almost every and all t > 0, either X(, ) = X(0, ) for
[0, t] or |X|(t, ) = .

7.1 Continuous Parameter Martingales

279

Proof: Without loss in generality, I will assume that X(0, ) 0. Given


R > 0, let R () = sup{t 0 : |X|(t, ) R}, and set XR (t) = X(t R ).
Then R is an extended stopping time, and so, by Lemma 7.1.18, (XR (t), Ft+ , P)
is a bounded martingale. Hence, by Theorem 7.1.17,


XR (t)


XR ( ) XR (d ), Ft+ , P

is also a martingale, and so




EP XR (t)2 = EP

Z


XR ( ) XR (d ) .

On the other hand, since XR ( ) is continuous, and therefore, by Fubinis Theorem,


ZZ
Z t
XR (t)2 =
XR (d1 )XR (d2 ) = 2
XR ( ) XR (d ),
0

[0,t]2

we also know that




E XR (t)2 = 2EP

Z


XR ( ) XR (d ) .



Hence, EP XR (t)2 = 0 for all t > 0, which means that XR ( ) 0 P-almost
surely. 
The preceding result leads immediately to the following analog of the uniqueness statement in Lemma 5.2.12.
Corollary 7.1.20. Let X : R be a right-continuous, progressively
measurable function. Then, up to a P-null set, there is at most one continuous,
progressively measurable A : R such that A(0, ) = 0, A( , ) is of
locally bounded variation for P-almost every , and X(t) A(t), Ft , P is
a martingale.
The role of continuity here seems minor, but it is crucial. Namely, continuity was used in Theorem 7.1.19 only when I wanted to know that XR (t)2 =
Rt
2 0 XR ( ) XR (d ). On the other hand, it is critical.
 Namely, if {N (t)
 : t 0}
is the simple Poisson process in 4.2 and Ft = N ( ) : [0, t] , then it
is easy to check that N (t) t, Ft , P is a martingale, all of whose paths are of
locally bounded variation.

280

7 Continuous Parameter Martingales


Exercises for 7.1

Exercise 7.1.21. The definition of stopping times and their associated algebras that I have adopted is due to E.B. Dynkin. Earlier, less ubiquitous
but more transparent, definitions appear in the work of Doob and Hunt under
the name of optional stopping times. To explain these earlier definitions, let E
be a complete, separable metric space and a non-empty collection of rightcontinuous paths : [0, ) E with the property that for all and
t [0, ), the stopped path t given by t ( ) = (t ) is again in . Similarly,
given a function : [0, ], define so that (t) = t () . Finally,
for each t [0, ), define the -algebras
Ft over to be the one generated
W
by {( ) : [0, t]}, and take F = t0 Ft . In terms of these quantities, an
optional stopping time is an F-measurable map : [0, ] such that

() t = () = ( t ), and the associated -algebra is { (t) : t 0} .
The goal of this exercise is to show that is an optional stopping time if and
only if it is a stopping time and that its associated -algebra is F .
(i) It is an easy matter (cf. Exercise 4.1.9) to check that f : R is F+
+
measurable if and only if there exists a B Z -measurable F : E Z  R and a
sequence {tm : m Z+ } such that f () = F (t1 ), . . . , (tm ), . . . , from which
it is clear that an F-measurable f will be Ft -measurable for some t [0, ) if
and only if f () = f ( t ). Use this to show that every optional stopping time is
a stopping time.


(ii) Show that : [0, ] is a stopping time relative to Ft : t [0, )
if and only if it is F-measurable and, for each t [0, ), { : () t} =
{ : ( t ) t}. In addition, if is a stopping time, show that () < =
() = ( ), and therefore that () t = () = ( t ) for all t [0, ).
Thus, is an optional stopping time if and only if it is a stopping time.
Hint: In proving the second
part, check that { = t} Ft , and conclude that

1{t} () = 1{t} ( t ) for all (t, ) [0, ) .

(iii) If is a stopping time, show that F = { (t) : t 0} . Besides having
intuitive value, this shows that, at least in the situation here, F is countably
generated.
Hint: Using right-continuity, first show that
is F-measurable. Next,
given a B-measurable f : E R and t [0, ), use (ii) to show that


 

1[0,t] () f ( ) = 1[0,t] ( t ) f ( ( t ) , [0, ),

and conclude that { (t) : t 0} F . To prove the opposite inclusion, show that if f :  R is F -measurable, then, for each t [0, ),
1{t} () f () = 1{t} ( t ) f ( t ), and thereby arrive at f () = f ( ). Fi
nally, use this together with Exercise 4.1.9 to show that f is { (t) : t 0} measurable.

Exercises for 7.1

281



Exercise 7.1.22. Let (, F, P) be a probability space and Ft : t [0, )
is non-decreasing family of sub--algebras of F. Denoteby F and Ft the completions of F and Ft with respect to P. If X(t), Ft , P is a submartingale or

martingale, show that X(t), Ft , P is also.

Exercise 7.1.23. Let I(RN ) be given as in Exercise 3.2.23, and extend `


to CN accordingly. If {Z(t) : t 0} is a Levy process for , show that (7.1.4)
continues to hold for all CN .
Exercise 7.1.24. In Exercise 3.3.12, we discussed one-sided
 stable laws, and

in Exercise 4.3.12 we showed that P max [0,t] B( ) a = 2P B(t) a ,
where {B(t) : t 0} is an R-valued Brownian motion. In this exercise, we will
examine the relationship between these two.
(i) Set a () = inf{t 0 : (t) a}, and show that the result in Exercise
4.3.12 can be rewritten as
r
W

(1)


t =

y2
2

dy.

at 2

Now use the results in Exercise 3.3.14 (especially (3.3.16)) to conclude that the
1

W (1) -distribution of a is 21 , the one-sided 12 -stable law at time 2 2 a.


22

(ii) Here is another, more conceptual way to understand the conclusion drawn
in (i) that the W (1) -distribution is a one-sided 12 -stable law. Namely, begin
 by
a
a+b
a
b
a
showing that if (0) = 0 and () < , then
() = () + . As
an application of Theorem 7.1.16, conclude from this that if a denotes the W (1) distribution of a , then a+b = a ? b . In particular, this means that 1
ca = ea` , where ` is the exponent appearing in
is infinitely divisible and that

the LevyKhinchine formula for .

(iii) Next, use Brownian scaling to see that, for all > 0, a has the same W (1) distribution as 2 a , and use this together with part (iii) of Exercise 3.3.12 to
1

see that the distribution of 1 is c2 for some c > 0.


1

(iv) Although we know from (i) that the constant c must be 2 2 , here is an

1 2
easier way to find it. Use Exercise 7.1.23 to see that e(t) 2 t , Ft , W (1)
for every R, and apply Doobs Stopping Time Theorem and the fact that
(1) 
1 2 a
W (1) ( a < ) = 1 to verify the identity EW e 2 = ea for > 0.
1

Hence, the Laplace transform of c2 is e 2 , which, by the calculation in part


1
(iii) of Exercise 3.3.12, means that c = 2 2 . Of course, this calculation makes
the preceding parts of this exercise unnecessary. Nonetheless, it is interesting to
see the Brownian explanation for the properties of the one-sided, 12 -stable laws.

282

7 Continuous Parameter Martingales

Exercise 7.1.25. An important corollary of Theorem 7.1.16 is the following


formula. Working in the setting of that theorem, show that, for any stopping
time and t (0, ) and BRN ,
h
i



Q { : (t) & () t} = EQ t () , t ,
where, as usual, is determined by
c = e ` . As a consequence,
h
i



Q { : (t) & () > t} = t () EQ t () , t ,


which is a quite general, generic statement of what is called Duhamels Formula.
7.2 Brownian Motion and Martingales
In this section we will see that continuous martingales and Brownian motion are
intimately related concepts. In addition, we will find that martingale theory,
and especially Doobs and Hunts Stopping Time Theorems, provides a powerful
tool with which to study Brownian paths.
7.2.1. L
evys Characterization of Brownian Motion. When applied
to = 0,I , Theorem 7.1.6 says that a progressively measurable function B :
[0, ) RN with B(0, ) = 0 and B( , ) D(RN ) is a Brownian motion
if and only if


Z t


1

B(
)
d,
F
,
P
B(t)
t
2
0

is a martingale for all Cc (RN ; R). In this subsection, I, following Levy,1 will
give another martingale characterization of Brownian motion, this time involving
many fewer test functions. On the other hand, we will have to assume ahead of
time that B( , ) C(RN ) for every .
Theorem 7.2.1 (L
evy). Let B : [0, ) RN be a progressively measurable function satisfying
B(0, ) = 0 and B( , ) C(RN ) for every .

Then B(t), Ft , P is a Brownian motion if and only if


, B(t)


RN


2
t||2
, Ft , P
+ , B(t) RN
2

is a martingale for every , RN .


1

L
evys Theorem is Theorem 11.9 in Chapter VII of Doobs Stochastic Processes, Wiley (1953).
Doob uses a clever but somewhat opaque Central Limit argument. The argument given here
is far simpler and is adapted from the one introduced by H. Kunita and S. Watanabe in their
article On square integrable martingales, Nagoya Math. J. 30 (1967).

7.2 Brownian Motion and Martingales

283


Proof: First suppose that B(t), Ft , P is a Brownian motion. Then, because
B(t) B(s) is independent of Fs and has distribution 0,I ,




EP B(t) B(s) Fs = 0 and EP B(t) B(t) B(s) B(s) Fs = (t s)I.
Hence, the necessity is obvious.
To prove the sufficiency, Theorem 7.1.3 says that it is enough to prove that

i 
h

t||2
P
E exp 1 , B(t) RN + 2 , A

(*)
i 
h

s||2
P
= E exp 1 , B(s) RN + 2 , A

for 0 s < t and A Fs . The challenge is to learn how to do this by taking


full advantage of the assumed continuity. To this end, let  (0, 1] be given, set
0 s, and use induction to define
 n
o



n = inf t n1 : B(t) B n1  n1 +  t
for n Z+ . Proceeding by induction, one can easily check that {n : n 0}
is a non-decreasing sequence of [s, t]-valued stopping times. Hence, by Theorem
7.1.14 and our assumption,


h
i
h
i


(**)
EP n Fn1 = 0 = EP 2n n Fn1 ,
where



n () , B n (), B n1 (),
RN

2
n () || n () n1 () .
Moreover, because B( , ) is continuous, we know that, for each , |n ()|
||, n () ||2 , and n () = t for all but a finite number of ns. In
particular, we can write the difference
between the left and the right sides of (*)

+
P
as the sum over n Z of E Dn Mn , A], where

i
h
Dn exp 1 n + 2n 1
i
h

2
Mn exp 1 , B(n1 ) RN + ||2 n1 .

By Taylors Theorem,



1 n +
Dn

n
2

1
2

1 n +

n
2


2
1 ||2 2
6 e 1 n +

3
n
2 .

284

7 Continuous Parameter Martingales

Hence, after rearranging terms, we see that Dn = 1 n 12 2n n + En ,


where, by our estimates on n and n ,



 ||2
||2
3
2
|En | 12 |n n | + 8n + 23 e 2 |n |3 + 8n  1 + ||2 e 2 2n + n ;

and so, after taking (**) into account,we arrive at




X
X









EP Dn Mm , A =
EP En Mn , A




1


2 1 + ||2 e

||2
2




||2
EP n |Mn |, A 2 1 + ||2 (t s)e 2 (1+t) .

In other words, we have now proved that, for every  (0, 1], the difference
||2

between the two sides of (*) is dominated by 2(1 + ||2 )(t s)e 2 (1+t) , and so
the equality in (*) has been established. 
As in Theorem 7.1.19, the subtlety here is in the use of the continuity assumption. Indeed, the same example that demonstrated its importance there,
does so again here. Namely, if {N (t) : t 0} is a simple Poisson
process and

X(t) = N (t) t, then both X(t), Ft , P and X(t)2 t, Ft , P are martingales,
but X(t), Ft , P is certainly not a Brownian motion.
7.2.2. DoobMeyer Decomposition, an Easy Case. The continuous parameter analog of Lemma 5.2.12 is a highly non-trivial result, one that was
proved by P.A. Meyer and led him to his profound analysis of stochastic processes. Nonetheless, there is an important case in which Meyers result is relatively easy to prove, and that is the case proved in this subsection. However,
before getting to that result, there is a rather fussy matter to be dealt with.

Lemma 7.2.2. For each n N, let Xn : [0, ) R be a right-continuous,


progressively measurable function with the property that Xn ( , ) is continuous
for P-almost every . If
lim sup kXn ( , ) Xm ( , )k[0,t] = 0 (a.s., P) for each t (0, ),

m n>m

then there is a right-continuous, progressively measurable X : [0, ) R such


that X( , ) is continuous and Xn ( , ) X( , ) uniformly on compacts for
P-almost every .
Proof: Set A = {(t, ) : limm supn>m kXn ( , ) Xm ( , )k[0,t] = 0}.
Then A is progressively measurable. Next, define () = sup{t 0 : (t, ) A},
note that { < t} Ft for each t (0, ), and set B = {(t, ) : () t}.
Then, B is again progressively measurable. To see this, first note that

Ft
if t s
{(, ) [0, t] : () < s} =
{ < s} Ft if t > s,

7.2 Brownian Motion and Martingales

285

and so (, )
() and therefore also (, )
() are progressively
measurable functions. Hence, since B = {(, ) : () 0}, B is
progressively measurable.
Now define

limn Xn (t, ) if (t, ) A


X(t, ) = 0
if (t, ) B \ A

X (),
if (t, )
/ B.
Clearly X( , ) is right-continuous. Moreover, because = (a.s., P), X( , )
is continuous and Xn ( , ) X( , ) uniformly on compacts for P-almost
every . Thus, it only remains to check that X is progressively measurable.
For this purpose, let BR be given, and set C = {(t, ) : X(t, ) }.
Because A and the Xn s are progressively measurable, it is clear that C A is
progressively measurable. Similarly, because B \ A is progressively measurable
and C (B \ A) equals B \ A or depending on whether 0 or 0
/ ,
C (B \ A), and therefore C B, are progressively measurable. Hence, we
now know that X  B is progressively measurable. Finally, we showed earlier
that (t, )
t () is progressively
measurable, and therefore so is (t,)

[0, ) 7 t (), B. Thus, because X(t, ) = X t (), , we
are done. 

Theorem 7.2.3. Let X(t), Ft , P be an R-valued, square integrable martingale with the property that X( , ) is continuous for P-almost every
. Then there is a P-almost surely unique progressively measurable function
hXi : [0, ) [0, ) such that hXi(0, ) = 0 and hXi( , ) is continuous

and non-decreasing for P-almost every , and X(t)2 hXi(t), Ft , P is a
martingale.
Proof: The uniqueness is an immediate consequence of Corollary 7.1.20.
The proof of existence, which is based on a suggestion I got from K. Ito, is
very much like that of Theorem 7.2.1. Without loss in generality, I will assume
that X(0) 0.
I begin by reducing to the case when X is P-almost surely bounded. To this
end, suppose that we know the result in this case. Given a general X and n N,
define n = inf{t 0 : |X(t)| n} and Xn (t) = X(tn ). Then, |Xn ( , )| n
and, by Doobs Inequality, n ()
 % for P-almost every . Moreover, by
Corollary 7.1.15, Xn (t), Ft , P is a martingale. Thus, by our assumption, for
each n, we know hXn i exists. In addition, by Corollary 7.1.15 and uniqueness,
we know (cf. Exercise 7.2.10) that, P-almost surely, hXm i(t) = hXn i(t m ) for
all m n and t 0. Now define hXi so that hXi(t) = hXn i(t) for n t < n+1 .
Then hXi is progressively measurable and right-continuous, hXi(0) = 0, and,
P-almost surely, hXi is continuous and non-decreasing. Furthermore, X(t

286

7 Continuous Parameter Martingales


n )2 hXi(t n ), Ft , P is a martingale for each n N. Finally, note that, by
Doobs Inequality,






EP khXik[0,t] EP kXk2[0,t] 4EP |X(t)|2 ,
and so, as n , X(t n)2 hXi(t n ) X(t)2 hXi(t) in L1 (P; R).
Hence, X(t)2 hXi(t), Ft , P is a martingale.
I now assume that |X( , )| C < for P-almost every . For each
n N, use induction to define {k,n : k 0} so that 0,n = 0, k,0 = k, and, for
(k, n) (Z+ )2 , k,n is equal to


inf{`,n1 : `,n1 > k1,n }
 

inf t k1,n : (t k1,n ) |X(t) X(k1,n )| n1 .

Working by induction, one sees that, for each n N, {k,n : k 0} is a nondecreasing sequence of bounded stopping times. Moreover, because X( , ) is
P-almost surely continuous, we know that, for each n N, limk k,n () =
P-almost every . Finally, the sequences {k,n : k 0} are nested in the
sense that {k,n1 : k 0} {k,n : k 0} for each n Z+ .

Set Xk,n = X k,n ) and, for k 1, k,n (t) = X t k,n X t k1,n .
Then X(t)2 = 2Mn (t) + hXin (t), where
Mn (t) =

X
k=1

Xk1,n k,n (t)

and hXin (t) =

k,n (t)2 .

k=1

Of course, for P-almost every , all but a finite number of terms in each of
these sums vanish. In addition, one should observe that hXin (s) hXin (t) if
s 0 and t s > n1 .

I now want to show that Mn (t), Ft , P is a P-almost surely continuous martingale for all n N, and the first step is to show for each (k, n) Z+ N,
Xk1,n k,n (t), Ft , P is a P-almost surely continuous martingale. Indeed, if
0 s < t and A Fs , then




EP Xk1,n k,n (t), A = EP Xk1,n k,n (t), A {k1,n s}


+ EP Xk1,n k,n (t), A {k1,n > s} .

Next, check that




EP Xk1,n k,n (t), A {k1,n s}



= EP Xk1,n X(k,n ) X(k1,n ) , A {k,n s}
h

i


+ EP Xk1,n X (t k,n ) s X(k1,n ) , A {k1,n s < k,n }


= EP Xk1,n k,n (s), A {k,n s}



+ EP Xk1,n X(s) X(k1,n ) , A {k1,n s < k,n }


= EP Xk1,n k,n (s), A {k1,n s} ,

7.2 Brownian Motion and Martingales

287

where, in the passage to the second to last equality, I have used the fact that
Xk1,n 1A 1[k1,n ,k,n ) (s) is Fs -measurable and applied Theorem 7.1.14. At the
same time


EP Xk1,n k,n (t), A {k1,n > s}



= EP Xk1,n X(t k,n ) X(t k1,n ) , A {s < k1,n t}



= EP Xk1,n X(t) X(t) , A {s < k1,n t}


= 0 = EP Xk1,n k,n (s), A {k1,n > s} ,
where I have used the fact that Xk1,n 1A 1(s,t] (k1,n ) is Ftk1,n -measurable
and again applied Theorem 7.1.14
 in getting the second

to last line. After

P
combining these, one sees that EP Xk1,n

(t),
A
=
E
Xk1,n k,n (s), A ,
k,n

which means that Xk1,n k,n (t), Ft , P is a P-almost surely continuous martingale.

Given the preceding, it is clear that, for each n and `, Mn (t `,n ), Ft , P
is a P-almost surely continuous, square integrable martingale. In addition, for
k 6= k 0 , Xk1 k,n (t `,n ) is orthogonal to Xk0 1 k0 ,n (t `,n ) in L2 (P; R).
Thus
"
#


P
2
E
sup
Mn ( ) 4EP Mn (t `,n )2
0 t`,n

=4

`
X

`
X
 2



EP Xk1,n
k,n (t `,n )2 4C 2
EP k,n (t `,n )2

k=1

2 P

k=1
2

= 4C E X(t `,n )


4C E X(t) ,
P


from which it is easy to see that Mn (t), Ft , P is a P-almost surely continuous,
square integrable martingale.
I will now show that limm supn>m kMn Mm k[0,t] = 0 P-almost surely
(m)

(m)

and in L2 (P; R) for each t [0, ). To this end, define Yk1,n so that Yk1,n ()
(m)

= Xk1,n () X`1,m () when `1,m () k1,n () < `,m (). Then Yk1,n
P
(m)
(m)
1
(a.s., P), and Mn Mm = k=1 Yk1,n k,n .
is Fk1,n -measurable, |Yk1,n | m
Hence, by the same reasoning as above,

X




 (m)

4
EP kMn Mm k2[0,t] 4
EP (Yk1,n )2 k,n (t)2 2 EP X(t)2 ,
m
k=1

which is more than enough to get the asserted convergence result.


We can now apply Lemma 7.2.2 to produce a right-continuous, progressively
measurable M : [0, ) R which is P-almost surely continuous and to

288

7 Continuous Parameter Martingales

which {Mn : n 1} converges uniformly on


 compacts, both P-almost surely
and in L2 (P; R). In particular, M (t), Ft , P is a square integrable martingale.
Finally, set hXi = (X 2 2M )+ . Obviously, hXi = X 2 2M (a.s., P), and hXi is
right-continuous, progressively measurable, and P-almost surely continuous. In
addition, because, P-almost surely, hXin hXi uniformly on compacts and
hXin (s) hXin (t) when t s > n1 , it follows that hXi( , ) is non-decreasing
for P-almost every . 

Remark 7.2.4. The reader may be wondering why I chose to complicate the
preceding statement and proof by insisting that hXibe progressively measurable

with respect to the original family of -algebras Ft : t [0, ) . Indeed,
Exercise 7.1.22 shows that I could have replaced all the -algebras with their
completions, and, if I had done so, there would have been no reason not to
have taken X( , ) to be continuous and hXi( , ) to be continuous and nondecreasing for every . However, there is a price to be paid for completing
-algebras. In the first place, when one does, all statements become dependent
on the particular P with which one is dealing. Secondly, because completed algebras are nearly never countably generated, certain desirable properties can
be lost by introducing them. See, for example, Theorem 9.2.1.
By combining Theorem 7.2.3 with Theorem 7.2.1, one can show that, up to
time re-parametrization, all continuous martingales are Brownian motions. In
order to avoid technical difficulties, I will prove this result only in the simplest
case.

Corollary 7.2.5. Let X(t), Ft , P be a continuous, square integrable martingale with the properties that, for P-almost every , hXi( , ) is strictly
increasing and
exists a Brownian motion
 limt hXi(t, ) = . Then there

B(t), Ft0 , P such that X(t) = X(0) + B hXi(t) , t [0, ) P-almost surely. In
particular,

X(t)
X(t)
= 1 = lim q
lim q
t
2hXi(t) log(2) hXi(t)
2hXi(t) log(2) hXi(t)

P-almost surely.
Proof: Clearly, given the first part, the last assertion is a trivial application of
Exercise 4.3.15.
After replacing F and the Ft s by their completions and applying Exercise
7.1.22, I may and will assume that X(0, ) = 0, X( , ) is continuous, hXi( , )
is continuous and strictly increasing, and limt hXi(t, ) = for every .
Next, for each (t, ) [0, ), set t () = hXi1 (t, ), where hXi1 ( , ) is the
inverse of hXi( , ). Clearly, for each , t
t () is a continuous, strictly
increasing function that tends to infinity as t . Moreover, because hXi is
progressively measurable, t is a stopping time for each t [0, ). Now set

7.2 Brownian Motion and Martingales

289


B(t) = X(t ). Since it is obvious that X(t) = B hXi(t) , all that I have to
show is that B(t), Ft0 , P is a Brownian motion for some non-decreasing family
{Ft0 : t 0} of sub--algebras.
Trivially, B(0, ) = 0 and B( , ) is continuous for all . In addition, B(t) is Ft -measurable, and so B is progressively measurable with respect
to {Ft : t  0}. Thus, by Theorem
7.2.1, I will be done once I show that

2
B(t), Ft , P and B(t) t, Ft , P are martingales. To this end, first observe
that
"
#
"
#
EP

sup X( )2 = lim EP
[0,t ]

sup

X( )2

[0,T t ]





4 lim EP X(T t )2 4 lim EP hXi(T t ) 4t.
T

Thus, limT X(T t ) B(t) in L2 (P; R). Now let 0 s < t and A Fs
be given. Then, for each T > 0, AT A {s T } FT s , and so, by
Theorem 7.1.14,




EP X(T t ), AT = EP X(T s ), AT
and




EP X(T t )2 hXi(T t ), AT = EP X(T s )2 hXi(T s ), AT .
Now let T , and apply the preceding convergence assertion to get the
desired conclusion. 
7.2.3. Burkholders Inequality Again. In this subsection we will see what
Burkholders Inequality looks like in the continuous parameter setting, a result
whose importance for the theory of stochastic integration is hard to overstate.

Theorem 7.2.6 (Burkholder). Let X(t), Ft , P be a P-almost surely continuous, square integrable martingale. Then, for each p (1, ) and t [0, )
(cf. (6.3.2)),

p1
(7.2.7) Bp1 kX(t) X(0)kLp (P;R) EP hX(t)i 2 p Bp kX(t) X(0)kLp (P;R) .

Proof: After completing the -algebras if necessary, I may (cf. Exercise 7.1.22)
and will assume that X( , ) is continuous and that hXi( , ) is continuous and
non-decreasing for every . In addition, I may and will assume that X(0) =
0. Finally, I will assume that X is bounded. To justify this last assumption, let
n = inf{t 0 : |X(t)| n}, set Xn (t) = X(t n ), and use Exercise 7.2.10 to
see that one can take hXn i = hXi(t n ). Hence, if we know (7.2.7) for bounded
martingales, then

p1
Bp1 kX(t n )kLp (P;R) EP hXi(t n ) 2 p Bp kX(t n )kLp (P;R)

290

7 Continuous Parameter Martingales

for all n 1. Since hXi is non-decreasing, we can apply Fatous Lemma to the
preceding and thereby get


p1
kX(t)kLp (P;R) lim kX(t n )kLp (P;R) Bp EP hXi(t) 2 p ,
n

which is the left-hand side of (7.2.7). To get the right-hand side, note that either
kX(t)kLp (P;R) = , in which case there is nothing to do, or kX(t)kLp (P;R) < ,
in which case, by the second half of Theorem 7.1.9, X(t n ) X(t) in
Lp (P; R) and therefore



p1
p1
EP hXi(t) 2 p = lim EP hXi(t n ) 2 p
n

Bp lim kX(t n )kLp (P;R) = Bp kX(t)kLp (P;R) .


n

Proceeding under the above assumptions and referring to the notation in the
proof of Theorem 7.2.3, begin by observing that, for any t [0, ) and n
N, Theorem 7.1.14 shows that X(t k,n ), Ftk,n , P is a discrete parameter
martingale indexed by k N. In addition, k,n = t for all but a finite number
of ks. Hence, by (6.3.7) applied to X(t k,n ), Ftk,n , P ,

p1
Bp1 kX(t)kLp (P;R) EP hXin (t) 2 p Bp kX(t)kLp (P;R)

for all n N.

In particular, this shows that supn0 khXin (t)kLp (P;R) < for every p (1, ),
and therefore, since hXin (t)  hXi(t) (a.s.,
P), this is more than enough to
p
p
P
P
2
2
verify that E hXin (t) E hXi(t) for every p (1, ). 

Exercises for 7.2



Exercise 7.2.8. Let X(t), Ft , P be a square integrable, continuous martingale. Following the strategy used to prove Theorem 7.2.1, show that

Z

F X(t)
0

t
1 2
2 x F



X( ) hXi(d ), Ft , P

is a martingale for every F Cb2 (R; C).


Hint: Begin by using cutoffs and mollification to reduce to the case when F
Cc (R; R). Next, given s < t and  > 0, introduce the stopping times 0 = s and
n = inf{t n1 : |X(t) X(n1 )| } (n1 + ) (hXi(n1 ) + ) t
for n 1. Now proceed as in the proof of Theorem 7.2.1.

Exercises for 7.2

291


Exercise 7.2.9. Let X(t), Ft , P be a continuous, square integrable martingale with X(0) = 0, and assume that there exists a non-decreasing function
A : [0, ) [0, ) such that hXi(t) A(t) (a.s.,
P) for each t [0, ). The

goal of this exercise is to show that E(t), Ft , P is a martingale when


E(t) = exp X(t) 12 hXi(t) .

(i) Given R (0, ), set R = inf{t 0 : |X(t)| R}, and show that
!
Z
tR

eX(tR )

1
2

eX( ) dhXi, Ft , P

is a martingale.
Hint: Choose F Cc (R; R) so that F (x) = ex for x [2R, 2R], apply
Exercise 7.2.8 to this F , and then use Doobs Stopping Time Theorem.
1

(ii) Apply Theorem 7.1.17


to the martingale in (i) and e 2 hXi(tR ) to show

that E(t R ), Ft , P is a martingale.

(iii) By replacing X and R with 2X and 2R in (ii), show that






EP E(t R )2 eA(t) EP e2X(tR )2hXi(tR ) = eA(t) .
Conclude that {E(t
 R ) : R (0, )} is uniformly P-integrable and therefore
that E(t), Ft , P is a martingale.

Exercise 7.2.10. If X(t), Ft , P is a P-almost surely continuous, square integrable martingale, is a stopping time, and Y (t) = X(t ), show that
hY i(t) = hXi(t ), t 0, P-almost surely.
Exercise 7.2.11. Continuing
 in the setting of Exercise 7.2.9, first show that,
for every R, E (t), Ft , P is a martingale, where

E (t) = exp X(t)

2
2 hXi(t)


.

Next, use Doobs Inequality to see that, for each 0,


!
!
sup X( ) R

[0,t]

sup E ( ) eR

2
2

A(t)

eR+

2
2

A(t)

[0,t]

Starting from this, conclude that


(7.2.12)


R2
P kXk[0,t] R 2e 2A(t) .

Finally, given this estimate, show that the conclusion in Exercise 7.2.8 continues
to hold for any F C 2 (R; C) whose second derivative has at most exponential
growth.

292

7 Continuous Parameter Martingales

Exercise 7.2.13.
Given a pair
continuous martingales

 of square integrable,
hX+Y ihXY i
, and show that
X(t), Ft , P and Y (t), Ft , P , set hX, Y i =
4
X(t)Y (t) hX, Y i(t), Ft , P is a martingale. Further, show that hX, Y i is
uniquely determined up to a P-null set by this property together with the facts
that hX, Y i(0, ) = 0 and hX, Y i( , ) is continuous and has locally bounded
variation for P-almost every .

Exercise 7.2.14. Let B(t), Ft , P be an RN -valued Brownian motion. Given

f, g Cb1,2 [0, ) RN ; R , set
t


X(t) = f t, B(t)



+ 12 f , B( ) d,



+ 12 g , B( ) d,

Y (t) = g t, B(t)
0

and show that


t


f g , B( ) d.

hX, Y i(t) =
0

Hint: First reduce to the case when f = g. Second, write X(t)2 as


f t, B(t)

2



+ 12 f , B( ) d

2X(t)
0

Z



+ 12 f , B( ) d

2
,

and apply Theorem 7.1.17 to the second term.


7.3 The Reflection Principle Revisited
In Exercise 4.3.12 we saw that Levys Reflection Principle (Theorem 1.4.13) has
a sharpened version when applied to Brownian motion. In this section I will give
another, more powerful way of discussing the reflection principle for Brownian
motion.
7.3.1. Reflecting Symmetric L
evy Processes. In this subsection, will
be used to denote a symmetric, infinitely divisible law. Equivalently (cf. Exercise
3.3.11),
= e` () , where
` () =


1
, C RN +
2

Z
RN




cos , y RN 1 M (dy)

for some non-negative definite, symmetric C and symmetric Levy measure M .

7.3 The Reflection Principle Revisited

293

Lemma 7.3.1. Let


 {Z(t) : t 0} be a Levy process for , and set Ft =
{Z( ) : [0, t]} . If is a stopping time relative to Ft : t [0, ) and

Z(t)
if > t

Z(t)
2Z(t ) Z(t) =
2Z() Z(t) if t,
: t 0} is again a Levy process for .
then {Z(t)
Proof: According to Theorem 7.1.3, all that I have to show is that






t`
()
,
F
,
P
exp 1 (, Z(t)

t
RN

is a martingale for all RN . Thus, let 0 s < t and A Fs be given. Then,


by Theorem 7.1.14 and the fact that ` () = ` (),
i
h




t`
()
,
A

s}
EP exp 1 (, Z(t)

RN
i
h




= EP e2 1(,Z(s))RN exp 1 (, Z(t) RN t` () , A { s}
i
h



= EP e2 1(,Z(s))RN exp 1 , Z(s) RN s` () , A { s}
i
h




t`
()
,
A

s}
.
= EP exp 1 , Z(s)

N
R

Similarly,
i
h




t`
()
,
A

{
>
s}
EP exp 1 , Z(t)

RN
i
h




= EP e2 1(,Z(t))RN exp 1 (, Z(t) RN t` () , A { > s}
i
h



= EP exp 1 , Z(t ) RN (t )` () , A { > s}
i
h



= EP exp 1 , Z(s ) RN (s )` () , A { > s}
i
h




s`
()
,
A

{
>
s}
. 
= EP exp 1 , Z(s)

N
R

Obviously, the process {Z(t)


: t 0} in Lemma 7.3.1 is the one obtained
by reflecting (i.e., reversing the direction of {Z(t) : t 0}) at time , and the
lemma says that the distribution of the resulting process is the same as that
of the original one. Most applications of this result are to situations when one
knows more or less precisely where the process is at the time when it is reflected.
For example, suppose N = 1, a (0, ), and a = inf{t 0 : Z(t) a}. Noting
= Z(t) for t a and therefore that a = inf{t 0 : Z(t)

that, because Z(t)
a}, we have that


P Z(t) x & a t = P 2Z(a ) Z(t) x & a t

= P Z(t) 2Z(a ) x & a t .

294

7 Continuous Parameter Martingales

Hence, if x a, and therefore Z(t) 2Z(a ) x = a t when a < ,


then


P Z(t) x & a t = P Z(t) 2Z(a ) x & a < for x a.


Applying this
when
x
=
a
and
using
P

t
=
P
Z(t)

a
&

t
+
a
a



P Z(t) > a , one gets P a t 2P Z(t) a , a conclusion that also could
have been reached via Theorem 1.4.13.
7.3.2. Reflected Brownian Motion. The considerations in the preceding
subsection are most interesting
when applied to R-valued Brownian motion.

Thus, let B(t), Ft , P be an R-valued Brownian motion. To appreciate the
improvements that can be made in the calculations just made, again take a =
inf{t 0 : B(t) a} for some a > 0. Then, because Brownian paths are
continuous, a < = B(a ) = a and so, since P(a < ) = 1, we can say
that


(7.3.2) P B(t) x & a t = P B(t) 2ax for (t, x) [0, )(, a].


In particular, by taking x = a and using P B(t) a = P B(t) a & a t ,
we recover the result in Exercise 4.3.12 that


P a t = 2P B(t) a .
A more interesting application of Lemma 7.3.1 to Brownian motion is to the
case when is the exit time from an interval other than a half-line.
Theorem 7.3.3. Let a1 < 0 < a2 be given, define (a1 ,a2 ) = inf{t 0 : B(t)
/
(a1 ,a2 )
(a1 ,a2 )
(a1 , a2 )}, and set Ai (t) = {
t & B(
) = ai } for i {1, 2}. Then,
for B[a1 ,) ,


0 P {B(t) } A1 (t) P {B(t) 2(a2 a1 ) + } A1 (t)


= P B(t) 2a1 P B(t) 2(a2 a1 ) +
and, for B(,a2 ] ,


0 P {B(t) } A2 (t) P {B(t) 2(a2 a1 ) + } A2 (t)


= P B(t) 2a2 P B(t) 2(a2 a1 ) + .

Hence, for B[a1 ,) , P {B(t) } A1 (t) equals
h
X

i
0,t 2a1 + 2(m 1)(a2 a1 ) 0,t + 2m(a2 a1 )
m=1


and, for B(,a2 ] , P {B(t) } A2 (t) equals
h
X

i
0,t 2a2 2(m 1)(a2 a1 ) 0,t 2m(a2 a1 ) ,
m=1

where in both cases the convergence is uniform with respect t in compacts and
B(a1 ,a2 ) .

7.3 The Reflection Principle Revisited

295

Proof: Suppose B[a1 ,) . Then, by Lemma 7.3.1,




P {B(t) } A1 (t) = P {2a1 B(t) } A1 (t)


= P B(t) 2a1 P {B(t) 2a1 } A2 (t) ,
since B(t) 2a1 = B(t) a1 = (a1 ,a2 ) t. Similarly,


P {B(t) } A2 (t) = P {2a2 B(t) } A1 (t)


= P B(t) 2a2 P {B(t) 2a2 } A1 (t)
when B(,a2 ] . Hence, since 2a1 (, a1 ] (, a2 ] if B[a1 ,) ,



P {B(t) } A1 (t) = P B(t) 2a1 P B(t) 2(a2 a1 ) +

+ P {B(t) 2(a2 a1 ) + } A1 (t)
when B[a1 ,) . Similarly, when B(,a2 ] ,



P {B(t) } A2 (t) = P B(t) 2a2 P B(t) 2(a2 a1 ) +

+ P {B(t) 2(a2 a1 ) + } A2 (t) .
To check that


P {B(t) }A1 (t) P {B(t) 2(a2 a1 )+}A1 (t) 0 when B[a1 ,) ,
first use Theorem 7.1.16 to see that



P {B(t) } A1 (t) = EP 0,t (a1 ,a2 ) ( a1 ), A1 (t) .

Second, observe that, because [a1 , ), 0, 2(a2 a1 ) + 0, () for all
0. The case when B(,a2 ] and A1 (t) is replaced by A2 (t) is handled in
the same way.

Given the preceding, one can use induction to check that P {B(t) }A1 (t)
equals
M h
X


i
P B(t) 2a1 2(m 1)(a2 a1 ) P B(t) 2m(a2 a1 ) +

m=1


+ P {B(t) 2M (a2 a1 ) + } A1 (t)
for all B[a1 ,) . The same line of reasoning applies when B(,a2 ] and
A1 (t) is replaced by A2 (t). 
Perhaps the most useful consequence of the preceding is the following corollary.

296

7 Continuous Parameter Martingales

Corollary 7.3.4. Given a c R and an r (0, ), set I = (c r, c + r) and



P I (t, x, ) = P {x + B(t) } { I > t} , x I and BI .
Then
Z

(7.3.5)

P I (t, z, ) P I (s, x, dz).

P (s + t, x, ) =
I

Next, set
g(t, x) =

g(t, x + 4m),

x2

where g(t, x) = (2t) 2 e 2t

mZ

and
p(1,1) (t, x, y) = g(t, y x) g(t, y + x + 2)

for (t, x, y) (0, ) [1, 1]2 .

Then p(1,1) is a smooth function that is symmetric in (x, y), strictly positive
on (0, ) (0, 1)2 , and vanishes when x {1, 1}. Finally, if

pI (t, x, y) = r1 p(1,1) r2 , r1 (x c), r1 (y c) ,

(t, x, y) (0, ) I 2 ,

then
I

(7.3.6)

p (s + t, x, y) =

pI (s, x, z)pI (t, z, y) dz

and, for (t, x) (0, ) I, P I (t, x, dy) = pI (t, x, y) dy.


Proof: Begin by applying Theorem 7.1.16 to check that P I (s + t, x, ) equals
W (1) {x + (s) + s (t) } {x + (s) + s ( ), [0, t s]}

{x + () I, [0, s]}


(1) 
= EW P I t, x + (s), , {x + () I, [0, s]}
Z
= P I (t, z, ) P I (s, x, dz).
I

Next, set a1 = r1 (c x) 1 and a2 = r1 (x x) + 1. Then



P I (t, x, ) = P {B(t) x} {B( ) (ra1 , ra2 ), [0, t]}

= P {B(r2 t) r1 ( x)} {B(r2 ) (a1 , a2 ), [0, t]}

= P B(r2 t) r1 ( x) & (a1 ,a2 ) > r2 t


= P B(r2 t) r1 ( x) P B(r2 t) r1 ( x) & (a1 ,a2 ) r2 t ,

7.3 The Reflection Principle Revisited

297

where, in the passage to the second line, I have used Brownian scaling. Now,
use the last part of Theorem 7.3.3, the symmetry of 0,r2 t , and elementary
rearrangement of terms to arrive first at
P I (t, x, ) =

Xh


i
r2 t 4m + r1 ( x) r2 t 4m + 2 + r1 ( + x 2c) ,

mZ

and then at P I (t, x, dy) = pI (t, x, y) dy. Given this and (7.3.5), (7.3.6) is obvious.
Turning to the properties of p(1,1) (t, x, y), both its symmetry and smoothness are clear. In addition, as the density for P (1,1) (t, x. ), it is non-negative,
and, because x
g(t, x) is periodic with period 4, it is easy to see that
(1,1)
p
(t, 1, y) = 0. Thus, everything comes down to proving that p(1,1) (t, x, y)
> 0 for (t, x, y) (0, ) (1, 1)2 . To this end, first observe that, after rearranging terms, one can write p(1,1) (t, x, y) as
g(t,y x) g(t, y + x) + g(t, 2 x y)
h
X

+
g(t, y x + 4m) g(t, y + x + 2 + 4m)
m=1

i
+ g(t, y x 4m) g(t, y + x 2 4m) .
Since each of the terms in the sum over m Z+ is positive, we have that



2(1|x|)(1|y|) 
t
1 2e g(t, y x)
p(1,1) (t, x, y) > g(t, y x) 1 2e
if t 2(1 |x|)(1 |y|). Hence, for each (0, 1), p(1,1) (t, x, y) > 0 for all
(t, x, y) [0, 22 ] [1 + , 1 ]2 . Finally, to handle x, y [1 + , 1 ] and
t > 22 , apply (7.3.6) with I = (1, 1) to see that
p

(1,1)

(m + 1) , x, y)

p(1,1) (2 , x, z)p(1,1) (m2 , z, y) dz,

|z|(1)

and use this and induction to see that p(1,1) (m2 , x, y) > 0 for all m 1. Thus,
if n Z+ is chosen so that n2 < t (n + 1)2 , then another application of
(7.3.6) shows that
(1,1)

Z
(t, x, y)
|z|(1)

p(1,1) (t n2 , x, z)p(1,1) (n2 , z, y) dz > 0. 

298

7 Continuous Parameter Martingales


Exercises for 7.3

Exercise 7.3.7. Suppose that G is a non-empty, open subset of RN , define


xG : C(RN ) [0, ] by
xG () = inf{t 0 : x + (t)
/ G},
and set

P G (t, x, ) = W (N ) { : x + (t) & xG () > t}
for (t, x) (0, ) G and BG .
(i) Show that
G

P G (t, z, ) P G (s, x, dy).

P (s + t, x, ) =
G

(ii) As an application of Exercise 7.1.25, show that


P G (t, x, ) = 0,tI ( x) EW

(N )




0,(txG )I x (xG ) , xG .

. This is the probabilistic version of Duhamels Formula, which we will see again
in 10.3.1.
(iii) As a consequence of (ii), show that there is a Borel measurable function
pG : (0, ) G2 [0, ) such that (t, y)
pG (t, x, y) is continuous for
each x G and P G (t, x, dy) = pG (t, x, y) dy for each (t, x) (0, ) G. In
particular, use this in conjunction with (i) to conclude that
Z
G
p (s + t, x, y) =
pG (t, z, y)pG (s, x, z) dz.
G
N

Hint: Keep in mind that (, )


(2 ) 2 e
long as stays away from the origin.

||2
2

is smooth and bounded as

(iv) Given c = (c1 , . . . , cN ) RN and r > 0, let Q(c, r) denote the open cube
QN
i=1 (ci r, ci + r), and show that (cf. Corollary 7.3.4)
pQ(c,r) (t, x, y) =

N
Y

p(ci r,ci +r) (t, xi , yi )

i=1

for x = (x1 , . . . , xN ), y = (y1 , . . . , yN ) Q(c, r). In particular, conclude that


pQ(c,r) (t, x, y) is uniformly positive on compact subsets of (0, ) Q(c, r)2 .
(v) Assume that G is connected, and show that pG (t, x, y) is uniformly positive
on compact subsets of (0, ) G2 .
Hint: If Q(c, r) G, show that pG (t, x, y) pQ(c,r) (t, x, y) on (0, )Q(c, r)2 .

Chapter 8
Gaussian Measures on a Banach Space

As I said at the end of 4.3.2, the distribution of Brownian motion is called


Wiener measure because Wiener was the first to construct it. Wieners own
thinking about his measure had little or nothing in common with the Levy
Khinchine program. Instead, he looked upon his measure as a Gaussian measure
on an infinite dimensional space, and most of what he did with his measure is best
understood from that perspective. Thus, in this chapter, we will look at Wiener
measure from a strictly Gaussian point of view. More generally, we will be
dealing here with measures on a real Banach space E that are centered Gaussian
in the sense that, for each x in the dual space E , x E 7 hx, x i R is
a centered Gaussian random variable. Not surprisingly, such a measure will be
said to be a centered Gaussian measure on E .
Although the ideas that I will use are already implicit in Wieners work, it was
I. Segal and his school, especially L. Gross,1 who gave them the form presented
here.
8.1 The Classical Wiener Space
In order to motivate what follows, it is helpful to first understand Wiener measure from the point of view which I will be adopting here.
8.1.1. Classical Wiener Measure. Up until now I have been rather casual
about the space from which Brownian paths come. Namely, because Brownian
paths are continuous, I have thought
 of their distribution as being a probability
on the space C(RN ) = C [0, ); RN . In general, there is no harm done by choosing C(RN ) as the sample space for Brownian paths. However, for my purposes
here, I need my sample spaces to be separable Banach spaces, and, although it
is a complete, separable metric space, C(RN ) is not a Banach space. With this
in mind, define (RN ) to be the space of continuous paths : [0, ) RN
with the properties that (0) = 0 and limt t1 |(t)| = 0.
1

See I.E. Segals Distributions in Hilbert space and canonical systems of operators, T.A.M.S.,
88 (1958) and L. Grosss Abstract Wiener spaces, Proc. 5th Berkeley Symp. on Prob. &
Stat., 2 (1965), Univ. of California Press. A good exposition of this topic can be found in
H.-H. Kuos Gaussian Measures in Banach Spaces, Springer-Verlag, Math. Lec. Notes., # 463
(1975).

299

300

8 Gaussian Measures on a Banach Space

Lemma 8.1.1. The map


|(t)|
[0, ]
t0 1 + t

is lower semicontinuous, and the pair (RN ), k k(RN ) is a separable Banach
space that is continuously embedded as a Borel measurable
subset of C(RN ). In


N
particular, B(RN ) coincides with BC(RN ) [(R )] = A(RN ) : A BC(RN ) .

Moreover, the dual space (RN ) of (RN ) can be identified with the space
of RN -valued, Borel measures on [0, ) with the properties that ({0}) = 0
and 1
Z

kk(RN )
(1 + t) ||(dt) < ,
C(RN ) 7 kk(RN ) sup

[0,)

when the duality relation is given by


Z
h, i =

(t) (dt).

[0,)

Finally, if (B(t), Ft , P) is an RN -valued Brownian motion, then B (RN )


P-almost surely and


EP kBk2(RN ) 32N.
Proof: It is obvious that the inclusion map taking (RN ) into C(RN ) is continuous. To see that k k(RN ) is lower semicontinuous on C(RN ) and that
(RN ) BC(RN ) , note that, for any s [0, ) and R (0, ),
n
o


A(s, R) C(RN ) : (t) R(1 + t) for t s
is closed in C(RN ). Hence, since kk(RN ) R A(0, R), k k(RN ) is
lower semicontinuous. In addition, since { C(RN ) : (0) = 0} is also closed,
(RN ) =

[
n
o
\

A m, n1 : (0) = 0 BC(RN ) .
n=1 m=1


In order to analyze the space (RN ), k k(RN ) , define




N
N
N
F : (R ) C0 R; R C R; R : lim |(s)| = 0
|s|

by


(es )
,
F () (s) =
1 + es
1

s R.

I use || to denote the variation measure determined by .

8.1 The Classical Wiener Space

301


As is well known, C0 R; RN with the uniform norm is a separable Banach
space,

N
N
and it is obvious that F is an isometry from (R ) onto
C0 R; R . Moreover,

by the Riesz Representation Theorem for C0 R; RN , one knows that the dual
of C0 R; RN is isometric to the space of totally finite, RN -valued measures
on R; BR with the norm given by total variation. Hence, the identification

of (RN ) reduces to the obvious interpretation of the adjoint map F as a


mapping from totally finite RN -valued measures onto the space of RN -valued
measures that do not charge 0 and whose variation measure integrates (1 + t).
Because of the Strong Law in part (ii) of Exercise 4.3.11, it is clear that almost
every Brownian path is in (RN ). In addition, by the Brownian scaling property
and Doobs Inequality (cf. Theorem 7.1.9),
P

kBk2(RN )

X
n=0

4
2

n+1

n+2


P

sup |B(t)|
0t2n


P

sup |B(t)|



32EP |B(1)|2 = 32N.

0t1

n=0

In view of Lemma 8.1.1, we now know that the distribution of RN -valued


Brownian motion induces a Borel measure W (N ) on the separable Banach space
(RN ), and throughout this chapter I will refer to this measure as the classical
Wiener measure.
My next goal is to characterize, in terms of (RN ), exactly which measure
on (RN ) Wieners is, and for this purpose I will use the following simple fact
about Borel probability measures on a separable Banach space.
Lemma 8.1.2. Let E with norm k kE be a separable, real Banach space, and
use
(x, x ) E E 7 hx, x i R
to denote the duality relation between E and its dual space E . Then the Borel
field BE coincides with the -algebra generated by the maps x E 7 hx, x i
as x runs over E . In particular, if, for M1 (E), one defines its Fourier
transform
: E C by
Z
i
h

(x ) =
exp 1 hx, x i (dx), x E ,
E

then
is a continuous function of weak* convergence on , and
uniquely
determines in the sense that if is a second element of M1 () and
= ,
then = .
Proof: Since it is clear that each of the maps x E 7 hx, x i R is
continuous and therefore BE -measurable, the first assertion will follow as soon

302

8 Gaussian Measures on a Banach Space

as we show that the norm x


kxkE can be expressed as a measurable function of
these maps. But, because E is separable, we know (cf. Exercise 5.1.19) that the
closed unit ball BE (0, 1) in E is separable with respect to the weak* topology
and therefore that we can find a sequence {xn :, n 1} BE (0, 1) so that

kxk = sup hx, xn i,

x E.

nZ+

Turning to the properties of


, note that its continuity with respect to weak*
convergence is an immediate consequence of Lebesgues Dominated Convergence
Theorem. Furthermore, in view of the preceding, we will know that
completely

determines as soon as we show that, for each n Z+ and X = x1 , . . . , xn
n
E ,
determines the marginal distribution X M1 (RN ) of


x E 7 hx, x1 i, . . . , hx, xn i Rn
under . But this is clear (cf. Lemma 2.3.3), since
!
n
X
d

m xm
for = (1 , . . . , n ) Rn . 
X () =
m=1

I will now compute the Fourier transform of W (N) . To this end, first recall

that, for an RN -valued Brownian motion, { , B(t) RN : t 0 and RN


spans a Gaussian family G(B) in L2 (P; R). Hence, span , (t) : t

0 and RN
is a Gaussian family in L2 (W (N ) ; R). From this, combined
with an easy limit argument using Riemann sum approximations, one sees that,

for any (RN ) ,


h, i is a centered Gaussian random variable under
W (N ) . Furthermore, because, for 0 s t,

 

 

(N ) 
(N ) 
EW
, (s) RN , (t) RN = EW
, (s) RN , (s) RN = s , RN ,
we can apply Fubinis Theorem to see that
ZZ

(N ) 
EW
h, i2 =
s t (ds) (dt).
[0,)2

Therefore, we now know that W (N ) is characterized by its Fourier transform

ZZ
1

\
(N ) () = exp
s t (ds) (dt) , (RN ) .
(8.1.3)
W

2
[0,)2

Equivalently, we have shown that W (N ) is the centered Gaussian measure on

(RN ) with the property that, for each (RNRR


) ,
h, i is a centered
Gaussian random variable with variance equal to
s t (ds) (dt).
[0,)2

8.1 The Classical Wiener Space

303

8.1.2. The Classical CameronMartin Space. From the Gaussian standpoint, it is extremely unfortunate that the natural home for Wiener measure is
a Banach space rather than a Hilbert space. Indeed, in finite dimensions, every
centered, Gaussian measure with non-degenerate covariance can be thought of
as the canonical, or standard, Gaussian measure on a Hilbert space. Namely, if
0,C is the Gaussian measure on RN with mean 0 and non-degenerate covariance
C, consider RN as a Hilbert space H with inner product (g, h)H = (g, Ch)RN ,
and take H to be the natural Lebesgue measure there: the one that assigns
measure 1 to a unit cube in H or, equivalently, the one obtained by pushing the
1
usual Lebesgue measure RN forward under the linear transformation C 2 . Then
we can write
khk2
1
2H
H (dh)
e
0,C (dh) =
N
(2) 2
and
2

d
0,C (h) = e

khk
H
2

As was already pointed out in Exercise 3.1.11, in infinite dimensions there is


no precise analog of the preceding canonical representation (cf. Exercise 8.1.7
for further corroboration of this point). Nonetheless, a good deal of insight can
be gained by seeing how close one can come. In order to guess on which Hilbert
space it is that W (N ) would like to live, I will give R. Feynmans highly questionable but remarkably powerful way of thinking about such matters. Namely,
n
given n Z+ , 0 = t0 < t1 < < tn , and a set A BRN , we know that



W (N ) assigns : (t1 ), . . . , (tn ) A probability
#
"
Z
n
X
|ym ym1 |2
1
dy1 dyn ,
exp
tm tm1
Z(t1 , . . . , tn ) A
m=1
N
Qn
where y0 0 and Z(t1 , . . . , tn ) = m=1 2(tm tm1 ) 2 . Now rename the
variable ym as (tm ), and rewrite the preceding as Z(t1 , . . . , tn )1 times


!2
Z
n


X
(t
)

(t
)
t

t
m
m1
m
m1
d(t1 ) d(tn ).
exp
tm tm1
2
A
m=1

Obviously, nothing very significant has happened yet, since nothing very exciting has been done yet. However, if we now close our eyes, suspend our disbelief,
and pass to the limit as n tends to infinity and the tk s become dense, we arrive
at Feynmans representation 2 of Wieners measure:
#
"
Z

2
1
1
(N )
dt d,
(t)
(8.1.4)
W
d) = exp
2 [0,)
Z
2

In truth, Feynman himself never dabbled in considerations so mundane as the ones that

follow. He was interested in the Sch


odinger equation, and so he had a factor 1 multiplying
the exponent.

304

8 Gaussian Measures on a Banach Space

where denotes the velocity (i.e., derivative) of . Of course, when we reopen


our eyes and take a look at (8.1.4), we see that it is riddled with flaws. Not even
one of the ingredients on the right-hand side of (8.1.4) makes sense! In the first
place, the constant Z must be 0 (or maybe ). Secondly, since the image of the
measure d under

n
(RN ) 7 (t1 ) . . . , (tn ) RN
is Lebesgue measure for every n Z+ and 0 < t1 < tn , d must be the nonexistent translation invariant measure on the infinite dimensional space (RN ). Finally, the integral in the exponent only makes sense if is differentiable in some
sense, but almost no Brownian path is. Nonetheless, ridiculous as it is, (8.1.4)
is exactly the expression at which one would arrive if one were to make a sufficiently nave interpretation of the notion that Wiener measure is the standard
Gauss measure on the Hilbert space H(RN ) consisting of absolutely continuous
h : [0, ) RN with h(0) = 0 and
L2 ([0,);RN ) < .
khkH1 (RN ) = khk
Of course, the preceding discussion is entirely heuristic. However, now that
we know that H1 (RN ) is the Hilbert space at which to look, it is easy to provide
a mathematically rigorous statement of the connection between (RN ), W (N ) ,
and H1 (RN ). To this end, observe that H(RN ) is continuously embedded in
1
(RN ) as a dense subspace. Indeed, if h H1 (RN ), then |h(t)| t 2 khkH1 (RN ) ,
and so not only is h (RN ) but also khk(RN ) 12 khkH1 (RN ) . In addition,

since Cc (0, ); RN is already dense in (RN ), the density of H1 (RN ) in
(RN ) is clear. Knowing this, abstract reasoning (cf. Lemma 8.2.3) guarantees

that (RN ) can be identified as a subspace of H1 (RN ). That is, for each

(RN ) , there is a h H1 (RN ) with the property that h, h H1 (RN ) = hh, i

for all h H1 (RN ), and in the present setting it is easy to give a concrete

representation of h . In fact, if (RN ) , then, for any h H1 (RN ),


Z
hh, i =

h(t) (dt) =
(0,)

Z
=
(0,)

(0,)

!
) d
h(

(0,t)



) (, ) d = h, h 1 N ,
h(
H (R )

where
Z
h (t) =
(0,t]


(, ) d.

(dt)

8.1 The Classical Wiener Space

305

Moreover,

kh k2H1 (RN ) =



(, ) |2 d =

(0,)

ZZ

(0,)

(ds) (dt) d

(,)2

ZZ
=

s t (ds) (dt).

(0,)2

Hence, by (8.1.3),
\
(N ) () = exp
W

(8.1.5)

kh k2H(RN )

!
,

(RN ) .

Although (8.1.5) is far less intuitively appealing than (8.1.4), it provides a


mathematically rigorous way in which to think of W (N ) as the standard Gaussian
measure on H1 (RN ). Furthermore, there is another way to understand why one
should accept (8.1.5) as evidence for this way of thinking about W (N ) . Indeed,

given (RN ) , write


Z
Z T

h, i = lim
(t) (dt) = lim
(t) d (t, ) ,
T

[0,T ]

where the integral in the last expression is taken in the sense of Riemann
Stieltjes. Next, apply the integration by part formula3 to conclude that t
(t, ) is RiemannStieltjes integrable with respect to t
(t) and that
Z T
Z T




(t) d (t, ) = (T ) (T, ) +


(t, ) d(t).
0

Hence, since
|(T )|
lim |(T )|||(T, ) lim
T
T 1 + T

Z
(8.1.6)

h, i = lim

Z
(1 + t) ||(dt) = 0,
(0,)

h (t) d(t),

where again the integral is in the sense of RiemannStieltjes. Thus, if one

somewhat casually writes d(t) = (t)


 dt, one can believe that h, i provides a
reasonable interpretation of , h H(RN ) for all (RN ), not just those that
are in H1 (RN ).
Because R. Cameron and T. Martin were the first mathematicians to systematically exploit the consequences of this line of reasoning, I will call H1 (RN ) the
CameronMartin space for classical Wiener measure.
3

See, for example, Theorem 1.2.7 in my A Concise Introduction to the Theory of Integration,
Birkh
auser (1999).

306

8 Gaussian Measures on a Banach Space


Exercises for 8.1

Exercise 8.1.7. Let H be a separable Hilbert space, and, for each n Z+


and subset {g1 , . . . , gn } H, let A(g1 , . . . , gn ) denote the -algebra over H
generated by the mapping

h H 7 (h, g1 )H , . . . , (h, gn )H Rn ,
and check that
A=

[

A(g1 , . . . , gn ) : n Z+ and g1 , . . . , gn H

is an algebra that generates BH . Show that there always exists a finitely additive
WH on A that is uniquely determined by the properties that it is -additive on
A(g1 , . . . , gn ) for every n Z+ and {g1 , . . . , gn } H and that


Z
i
h
kgk2H
, g H.
exp 1 (h, g)H WH (dh) = exp
2
H

On the other hand, as we already know, this finitely additive measure admits a
countably additive extension to BH if and only if H is finite dimensional.
8.2 A Structure Theorem for Gaussian Measures
Say that a centered Gaussian
measure
W on a separable Banach space E is


non-degenerate if EW hx, x i2 > 0 unless x = 0. (See Exercise 8.2.11.) In
this section I will show that any non-degenerate, centered Gaussian measure W
on a separable Banach space E shares the same basic structure that W (N ) has
on (RN ). In particular, I will show that there is always a Hilbert space H E
for which W is the standard Gauss measure in the same sense that W (N ) was
shown in 8.1.2 to be the standard Gauss measure for H1 (RN ).
8.2.1. Ferniques Theorem. In order to carry out my program, I need a
basic integrability result about Banach spacevalued, Gaussian random variables. The one that I will use is due to X. Fernique, and his is arguably the
most singularly beautiful result in the theory of Gaussian measures on a Banach
space.
Theorem 8.2.1 (Ferniques Theorem). Let E be a real, separable Banach
space, and suppose that X is an E-valued random variable that is centered and
Gaussian in the sense that, for each x E , hX, x i is a centered, R-valued
Gaussian random variable. If R = inf{r : P(kXkE r) 34 )}, then

(8.2.2)

 2n
h kXk2E i
X
1
e
.
E e 18R2 K e 2 +
3
n=0

(See Corollary 8.4.3 for a sharpened statement.)

8.2 A Structure Theorem for Gaussian Measures

307

Proof: After enlarging the sample space if necessary, I may and will assume
that there is an E-valued random variable X 0 that is independent of X and has
1
1
the same distribution as X. Set Y = 2 2 (X + X 0 ) and Y 0 = 2 2 (X X 0 ).
Then the pair (Y, Y 0 ) has the same distribution as the pair (X, X 0 ). Indeed, by
2
Lemma 8.1.2, this
random
variable
 comes down to showing that the R -valued


hY, x i, hY , x i has the same distribution as hX, x i, hX 0 , x i , and that is


an elementary application of the additivity property of independent Gaussians.
Turning to the main assertion, let 0 < s t be given, and use the preceding
to justify



P kXkE s P kXkE t = P kXkE s & kX 0 kE t
1 
1
= P kX X 0 kE 2 2 s & kX + X 0 kE 2 2 t


1 
1
P kXkE kX 0 kE 2 2 s & kXkE + kX 0 kE 2 2 t
2

1
1
P kXkE kX 0 kE 2 2 (t s) = P kXkE 2 2 (t s) .


Now suppose that P kXk R
1
tn = R + 2 2 tn1 for n 1. Then

3
4,

and define {tn : n 0} by t0 = R and



2
P kXkE R P kXkE tn P kXkE tn1
and therefore

P kXkE tn

P kXkE R

P kXkE tn1

P kXkE R

 !2

for n 1. Working by induction, one gets from this that

 !2n
P kXkE R

P kXkE R


P kXkE tn

P kXkE R

and therefore, since tn = R 2

n+1
2 1
1
2 2 1

32

n+1
2

R, that P kXkE 32

n+1
2


n
R 32 .

Hence,

h kXk2E i

 X
n+1
n
n
1
e2 P 32 2 R kXkE 32 2 R
EP e 18R2 e 2 P kXkE 3R +
n=0
1

e2 +


X
n=0

n
e 2

= K.

8.2.2. The Basic Structure Theorem. I will now abstract the relationship,
proved in 8.1.2, between (RN ), H1 (RN ), and W (N ) , and for this purpose I
will need the following simple lemma.

308

8 Gaussian Measures on a Banach Space

Lemma 8.2.3. Let E be a separable, real Banach space, and suppose that
H E is a real Hilbert space that is continuously embedded as a dense subspace
of E.
(i) For each x E there is a unique hx H with the property that
h, hx H = hh, x i for all h H, and the map x E 7 hx H is
linear, continuous, one-to-one, and onto a dense subspace of H.
(ii) If x E, then x H if and only if there is a K < such that |hx, x i|
Kkhx kH for all x E . Moreover, for each h H, khkH = sup{hh, x i : x
E & kx kE 1}.
(iii) If L is a weak* dense subspace of E , then there exists a sequence {xn :
n 0} L such that {hxn : n 0} is an orthonormal basis for H. Moreover,
P
if x E, then x H if and only if n=0 hx, xn i2 < . Finally,
h, h0


H

hh, xn ihh0 , xn i for all h, h0 H.

n=0

Proof: Because H is continuously embedded in E, there exists a C < such


that khkE CkhkH . Thus, if x E and f (h) = hh, x i, then f is linear and
|f (h)| khkE kx kE Ckx kE khkH , and so, by the Riesz Representation
Theorem
 for Hilbert spaces, there exists a unique hx H such that f (h) =
h, hx H . In fact, khx kH Ckx kE , and uniqueness can be used to check
that x
hx is linear. To see that x
hx is one-to-one, it suffices to show

that x = 0 if hx = 0. But if hx = 0, then hh, x i = 0 for all h H, and


therefore, because H is dense in E, x = 0. Because I will use it later, I will
prove slightly more than the density of just {hx : x E } in H. Namely, for
any weak* dense subset S of E , {hx : x S } is dense in H. Indeed, if this
were not the case,
exist an h H \ {0} with the property that
 then there would

hh, x i = h, hx H = 0 for all x S. But, since S is weak* dense in E , this


would lead to the contradiction that h = 0. Thus, (i) is now proved.
Obviously, if h H, then |hh, x i| = |(h, hx )H | khx kH khkH for x E .
Conversely, if x E and |hx, x i| Kkhx kH for some K < and all x E ,
set f (hx ) = hx, x i for x E . Then, because x
hx is one-to-one, f

is a well-defined, linear functional on {hx : x E }. Moreover, |f (x )|


Kkhx kH , and therefore, since {hx : x E } is dense, f admits a unique
extension as a continuous, linear functional on H. Hence, by Rieszs theorem,
there is an h H such that
hx, x i = f (hx ) = h, hx


H

= hh, x i,

x E ,

which means that x = h H. In addition, if h H, then khkH = sup{hh, x i :


khx kH 1} follows from the density of {hx : x E }, and this completes
the proof of (ii).

8.2 A Structure Theorem for Gaussian Measures

309

Turning to (iii), remember that, by Exercise 5.1.19, the weak* topology on E


is second countable. Hence, the weak* topology on L is also second countable
and therefore separable. Thus, we can find a sequence in L that is weak* dense
in E , and then, proceeding as in the hint given for Exercise 5.1.19, extract
a subsequence of linearly independent elements whose span S is weak* dense
in E . Starting with this subsequence, apply the GrahmSchmidt orthogonalization procedure to produce a sequence {xn : n 0} whose span is S and
for which {hxn : n 0} is orthonormal in H. Moreover, because the span of
{hxn : n 0} equals {hx : x S }, which, by what we proved earlier, is
dense in H, {hxn : n 0} is an orthonormal basis in H. Knowing this, it is
immediate that
0

h, h


H

h, hxn


H

h , hxn

n=0


H

hh, xn ihh0 , xn i.

n=0

P
P
2
2
2
In particular,
n=0 hx, xn i < ,
P khkH = n=0 hh, xn i . Finally, if x E and

set g = m=0 hx, xn ihxn . Then g H and hx g, x i = 0 for all x S .


Hence, since S is weak* dense in E , x = g H. 
Given a separable real Hilbert space H, a separable real Banach space E, and
a W M1 (E), I will say that the triple (H, E, W) is an abstract Wiener
space if H is continuously embedded as a dense subspace of E and W M1 (E)
has Fourier transform
(8.2.4)

c ) = e
W(x

khx k2
H
2

for all x E .

The terminology is justified by the fact, demonstrated at the end of 8.1.2,
that H1 (RN ), (RN ), W (N ) is an abstract Wiener space. The concept of an
abstract Wiener space was introduced by Gross, although his description was
somewhat different from the one just given (cf. Theorem 8.3.9 for a reconciliation
of mine with his definition).
Theorem 8.2.5. Suppose that E is a separable, real Banach space and that
W M1 (E) is a centered Gaussian measure that is non-degenerate. Then there
exists a unique Hilbert space H such that (H, E, W) is an abstract Wiener space.
q


Proof: By Ferniques Theorem, we know that C EW kxk2E < .
To understand the proof of existence, it is best to start with the proof of
uniqueness. Thus, suppose that H is a Hilbert space for which (E, H, W) is an
abstract Wiener space. Then, for all x , y E , hhx , y i = (hx , hy )H =
hhy , x i. In addition,

hhx , x i =

khx k2H

Z
=

hx, x i2 W(dx),

310

8 Gaussian Measures on a Banach Space

and so, by the symmetry just established,


Z
(*)
hhx , y i = khx k2H = hx, x ihx, y i W(dx),
for all x , y E . Next observe that
Z


hx, x i x W(dx) Ckhx kH ,
(**)
E
R
and therefore that the integral xhx, x i W(dx) is a well-defined element of E.
Moreover, by (*),
Z


hhx , y i =
xhx, x i W(dx), y
for all y E ,
and so
Z
(***)

hx =

xhx, x i W(dx).

Finally, given h H, choose {xn : n 1} E so that hxn h in H. Then




lim sup h , xn i h , xm i 2
= lim sup khx hx kH = 0,
m n>m

L (W;R)

m n>m

and so, if denotes the closure of {h , x i : x E } in L2 (W; R) and F :


E is given by
Z
F () = x(x) W(dx), ,
then h = F () for some . Conversely, if and {xn : n 1} is chosen
so that h , xn i in L2 (W; R), then {hxn : n 1} converges in H to some
h H and it converges in E to F (). Hence, F () = h H. In other words,
H = F ().
The proof of existence is now a matter of checking that if and F are defined
as above and if H = F () with kF ()kH = kkL2 (W;R) , then (H, E, W) is an
abstract Wiener space. To this end, observe that
Z


hF (), x i = hx, x i(x) W(dx) = F (), hx H ,


and therefore both (*) and (***) hold for this choice of H. Further, given (*), it
is clear that khx k2H is the variance of h , x i and therefore that (8.2.4) holds.
At the same time, just as in the derivation of (**), kF ()kE CkkL2 (W;R) =
CkF ()kH , and so H is continuously embedded inside E. Finally, by the Hahn
Banach Theorem, to show that H is dense in E it suffices to check that the only
x E such Rthat hF (), x i = 0 for all is x = 0. But when = h , x i,
hF (), x i = hx, x i2 W (dx), and therefore, because W is non-degenerate, such
an x would have to be 0. 
8.2.3. The CameronMarin Space. Given a centered, non-degenerate
Gaussian measure W on E, the Hilbert space H for which (H, E, W) is an abstract Wiener space is called its CameronMartin space. Here are a couple
of important properties of the CameronMartin subspace.

8.2 A Structure Theorem for Gaussian Measures

311

Theorem 8.2.6. If (H, E, W) is an abstract Wiener space, then the map


x E 7 hx H is continuous from the weak* topology on E into the
strong topology on H. In particular, for each R > 0, {hx : x BE (0, R)}
is a compact subset of H, BH (0, R) is a compact subset of E, and so H BE .
Moreover, when E is infinite dimensional, W(H) = 0. Finally, there is a unique
linear, isometric map I : H L2 (W; R) such that I(hx ) = h , x i for all
x E , and {I(h) : h H} is a Gaussian family in L2 (W; R).

c ) is continuous
Proof: To prove the initial assertion, remember that x
W(x

with respect to the weak* topology. Hence, if xk x in the weak* topology,


then
!
khxk hx k2H
c k x ) 1,
= W(x
exp
2

and so hxk hx in H.
Given the first assertion, the compactness of {hx : x BE (0, R)} in H
follows from the compactness (cf. Exercise 5.1.19) of BE (0, R) in the weak*
topology. To see that BH (0, R) is compact in E, again apply Exercise 5.1.19 to
check that BH (0, R) is compact in the weak topology on H. Therefore, all that
we have to show is that the embedding map h H 7 h E is continuous
from the weak topology on H into the
 strong topology on E. Thus, suppose
that hk h weakly in H. Because hx : x BE (0, 1) is compact in H,
for each  > 0 there exist an n Z+ and a {x1 , . . . , xn } BE (0, 1) such that

{hx : x BE (0, 1)}

n
[

BH (hxm , ).

Now choose ` so that max1mn |hhk h, xm i| <  for all k `. Then, for any
x BE (0, 1) and all k `,


|hhk h, x i|  + min hk h, hx hxm H  + 2 sup khk kH .
1mn

k1

Since, by the uniform boundedness principle, supk1 khk kH < , this proves
that khk hkE = sup{hhk h, x i : x BE (0, 1)} 0 as k .
S
Because H = 1 BH (0, n) and BH (0, n) is a compact subset of E for each
n Z+ , it is clear that H BE . To see that W(H) = 0 when E is infinite
dimensional, choose {xn : n 0} as in the final part of Lemma 8.2.3, and
set Xn (x) = hx, xn i. Then the Xn s are an infinite
P sequence of independent,
centered, Gaussians with mean value 1, and so n=0 Xn2 = W-almost surely.
Hence, by Lemma 8.2.3, W-almost no x is in H.
Turning to the map I, define I(hx ) = h , x i. Then, for each x , I(hx ) is
a centered Gaussian with variance khx k2H , and so I is a linear isometry from

312

8 Gaussian Measures on a Banach Space

{hx : x E } into L2 (W; R). Hence, since {hx : x E } is dense in H, I


admits a unique extension as a linear isometry from H into L2 (W; R). Moreover,
as the L2 (W; R)-limit of centered Gaussians, I(h) is a centered Gaussian for each
h H. 
The map I in Theorem 8.2.6 was introduced for the classical Wiener space by
Paley and Wiener, and so I will call it the PaleyWiener map. To appreciate
its importance here, observe that {hx : x E } is the subspace of g H
with the property that h H 7 (h, g)H R admits a continuous extension
to E. Even though, when dim(H) = , no such continuous extension exists for
general g H, I(g) can be thought of as an extension of h
(h, g)H , albeit
one that is defined only up to a W-null set. Of course, one has to be careful
when using this interpretation, since, when H is infinite dimensional, I(g)(x)
for a given x E is not well-defined simultaneously of all g H. Nonetheless,
by adopting it, one gets further evidence for the idea that W wants to be the
standard Gauss measure on H. Namely, because
(8.2.7)

khk2


H
EW e 1 I(h) = e 2 ,

h H,

if W lived on H, then it would certainly be the standard Gauss measure there.


Perhaps the most important application of the PaleyWiener map is the following theorem about the behavior of Gaussian measures under translation.
That is, if y E and y : E E is given by y (x) = x + y, we will be looking
at the measure (y ) W and its relationship to W. Using the reasoning suggested
above, the result is easy to guess. Namely, if W really lived on H and were given
by a Feynman-type representation
W(dh) =

1 khk2H
e 2 H (dh),
Z

then (g ) W should have the Feynman representation


1 khgk2H
2
H (dh),
e
Z

which could be rewritten as







(g ) W (dh) = exp h, g H 12 kgk2H W(dh).

Hence, if we assume that I(g) gives us the correct interpretation of ( , g)H , we


are led to guess that, at least for g H,




(8.2.8) (g ) W(dx) (dh) = Rg (x) W (dx), where Rg = exp I(g) 12 kgk2H .

That (8.2.8) is correct was proved for the classical Wiener space by Cameron
and Martin, and for this reason it is called the CameronMartin formula. In
fact, one has the following result, the second half of which is due to Segal.

Exercises for 8.2

313

Theorem 8.2.9. If (H, E, W) is an abstract Wiener space, then, for each


g H, (g ) W  W and the Rg in (8.2.8) is the corresponding RadonNikodym
derivative. Conversely, if (y ) W is not singular with respect to W, then y H.
Proof: Let g H, and set = (g ) W. Then






(x ) = EW e 1hx+g,x i = exp 1hg, x i 12 khx k2H .

(*)

Now define by the right-hand side of (8.2.8). Clearly M1 (E). Thus, we


will have proved the first part once we show that is given by the right-hand
side of (*). To this end, observe that, for any h1 , h2 H,


 2

 1 I(h1 )+2 I(h2 ) 
22
1
2
2
kh1 kH + 1 2 h1 , h2 H + kh2 kH
E e
= exp
2
2
W

for all 1 , 2 C. Indeed, this is obvious when 1 and 2 are pure imaginary,
and, since both sides are entire functions of (1 , 2 ) C2 , it follows in general
by analytic
continuation. In particular, by taking h1 = g, 1 = 1, h2 = hx , and

2 = 1, it is easy to check that the right-hand side of (*) is equal to (x ).


To prove the second assertion, begin by recalling from Lemma 8.2.3 that if
y E, then y H if and only if there is a K < with the property that
|hy, x i| K for all x E with khx kH = 1. Now suppose that (x ) W 6
W, and let R be the RadonNikodym derivative of its absolutely continuous
part. Given x E with khx kH = 1, let Fx be the -algebra generated by
x
hx, x i, and check that (y ) W  Fx  W  Fx with RadonNikodym
derivative


hy, x i2

.
Y (x) = exp hy, x ihx, x i
2

Hence,

2


 1
Y EW R Fx EW R 2 Fx ,
and so (cf. Exercise 8.2.19)
hy, x i2
exp
8


 1
 1
= EW Y 2 EW R 2 (0, 1].

Since this means that hy, x i2 8 log 1 , the proof is complete.

Exercises for 8.2


Exercise 8.2.10. Let C Hom(RN ; RN be a positive definite and symmetric,
take E = RN to be the standard Euclidean metric, and let H = RN with
 the
Hilbert inner product (x, y)H = (x, C1 y)RN . Show that H, E, 0,C is an
abstract Wiener space.

314

8 Gaussian Measures on a Banach Space

Exercise 8.2.11. Let E be a separable Banach space and W a centered Gaussian measure on E, but do not assume
that

 W is non-degenerate. Denote by N
the set of x E for which EW hx, x i2 = 0, and set


= x E : hx, x i = 0 for all x N .
E
is closed, that W(E)
= 1, and that W  E
is a non-degenerate,
Show that E

centered Gaussian measure on E.



Hint: Since W {x E : hx, x i =
6 0} = 0 for each x N , the only question is
if and only
whether one can choose a countable subset C N such that x E

if hx, x i = 0 for all x C. For this purpose, recall that, by Exercise 5.1.19, E
with the weak* topology is second countable and therefore that N is separable
with respect to the weak* topology.
Exercise 8.2.12. Let {xP
separable Banach space
n : n 0} be a sequence in the P

E with the property that n=0 kxn kE < . Show that n=0 |n |kxP
n k < for

N
0,1
-almost every RN , and define X : RN E so that X() = n=0 n xn
P
if n=0 |n |kxn kE < and X() = 0 otherwise. Show that the distribution
of X is a centered, Gaussian measure on E. In addition, show that is
non-degenerate if and only if the span of {xn : n 0} is dense in E.
Exercise 8.2.13. Here an application of Ferniques Theorem to functional analysis. Let E and F be a pair of separable Banach spaces and a Borel measurable,
linear map from E to F . Given a centered, Gaussian E-valued random variable
X, use Exercise 2.3.21 see that X is an F -valued, a centered Gaussian random variable, and apply Ferniques Theorem to conclude that X is a square
integrable and has mean value 0. Next, suppose that is not continuous, and
choose {xn : n 0} E and {yn : n 0} F so that kxn kE = 1 = kyn kF
and h(xn ), yn i n + 13 . Using Exercise 8.2.12, show that there exist centered, Gaussian F -valued random variables {Xn : n 0},P
{X n : n 0},

N
2
and X under 0,1 such that Xn () = (n + 1) n xn , X() = n=0 Xn (), and
N
X n () = X() Xn () for 0,1
-almost every RN . Show that
Z

N
k
h X(), yn i 0,1
(d)
Z
N
h Xn (), yn i 0,1
(d) (n + 1),

X()k2F

N
0,1
(d)

N
and thereby arrive at the contradiction that X
/ L2 (0,1
; F ). Conclude that
every Borel measurable, linear map from E to F is continuous. Notice that,
as a consequence, we know that the PaleyWiener integral I(h) of an h in the
CameronMartin space is equal W-almost everywhere to a Borel measurable,
linear function if and only if h = hx for some x E .

Exercises for 8.2

315

Exercise 8.2.14. Let W p


bePa centered, Gaussian measure on a separable Ban
2
nach space E, and set =
m=1 am , where a1 , . . . , an R. If X1 , . . . , Xn are
mutually independent, E-valued random variables with distribution
Pn W on some
probability space (, F, P), show that the P-distribution of S m=1 am Xm is
the same as the W-distribution of x
x. In particular,




EP kSkpE = p EW kxkpE
for all p [0, ).

Hint: Using Exercise 8.2.11, reduce to the case when W is non-degenerate. For
this case, let H be the CameronMartin space for W on E, and show that
i
h
2

2
EP e 1hS,x i = e 2 khx kH for all x E .

Exercise 8.2.15. Referring to the setting in Lemma 8.2.3, show that there is a
(n)
sequence {k kE : n 0} of norms on E each of which is commensurate with
(N )
k kE (i.e., Cn1 k k k kE Cn k k for some Cn [1, )) such that, for
each R > 0,
(n)

BH (0, R) = {x E : kxkE R for all n 0}.


Hint: Choose {xm : m 0} E so that {hxm : m 0} is an orthonormal
Pn
basis for H, define Pn : E H by Pn x = m=0 hx, xm ihxm , and set
(n)
kxkE

kPn xk2H + kx Pn xk2E .

Exercise 8.2.16. Referring to the setting in Ferniques


Theorem,
observe that


all powers of kXkE are integrable, and set 2 = E kXk2E . Show that
h kXk2E i
E e 722 K.

In particular, for any n 1, conclude that




E kXk2n
(72)n n!K 2n ,
E
which is remarkably close to the equality that holds when E = R. See Corollary
8.4.3 for a sharper statement.
Exercise 8.2.17. Again let E be a separable, real Banach space. Suppose that
{Xn : n 1} is a sequence for centered, Gaussian E-valued random variables on
some probability space (, F, P) and that Xn X in P-probability. Show that
X is again a centered,
random variable and that there exists a > 0
 Gaussian
2 
for which supn1 EP ekXn kE < . Conclude, in particular, that Xn X in
Lp (P; E) for every p [1, ).

316

8 Gaussian Measures on a Banach Space

Exercise 8.2.18. Given (RN ) , I pointed out at the end of 8.1.2 that the
PaleyWiener integral
 [I(h )]() can be interpreted as the RiemannStieltjes
integral of (s, ) with respect to (s). In this exercise, I will use this observation as the starting point for what is called stochastic integration.

(i) Given (RN ) and t > 0, set t (d ) = 1[0,t) ( )(d ) + t [t, ) , and
show that for all (RN )
h, t i =


(, ) d( ),

where the integral on the right is taken in the sense of RiemannStieltjes. In


particular, conclude that t
h, t i is continuous for each .

(ii) Given f Cc1 [0, ); RN , set f (d ) = f ( ) d , and show that
h, tf i =

f ( ) d( ),
0

where again the integral on the right is RiemannStieltjes. Use this to see that
the process
Z t

f ( ) d( ) : t 0
0

has the same distribution under W (N ) as


(*)

 Z t


2
B
|f ( )| d : t 0 ,
0

where {B(t) : t 0} is an R-valued Brownian motion.



R t
(iii) Given f L2loc [0, ); RN and t > 0, set htf ( ) = 0 f (s) ds. Show that


the W (N ) -distribution of the process I(htf ) : t 0 is the same as that of the
process in (*). In particular, conclude (cf. part (ii) of Exercise 4.3.16) that there
is a continuous modification of the process {I(htf ) : t 0}. For reasons made
clear in (ii), such a continuous modification is denoted by
Z


f ( ) d( ) : t 0 .

Of course, unless f has bounded variation, the integrals in the preceding are
no longer interpretable as RiemannStieltjes integrals. In fact, they not even
defined by but only as a stochastic process. For this reason, they are called
stochastic integrals.

8.3 From Hilbert to Abstract Wiener Space

317

Exercise 8.2.19. Define Rg as in (8.2.8), and show that




 p  p1
(p 1)kgk2H
W
for all p (0, ).
E Rg = exp
2
Exercise 8.2.20. Here is another way to think about Segals half of Theorem
8.2.9. Using Lemma 8.2.3, choose {xn : n 0} E so that {hxn : n 0} is
an orthonormal basis for H. Next, define F : E RN so thatQ
F (x)n = hx, xn i

N
for each n N, and show that F W = 0,1
and (F y ) W = 0 an ,1 , where
Q

N
an = hy, xn i. Conclude from this that (y ) W W if 0,1
0 an ,1 . Finally,
P
use this together with Exercise 5.2.42 to see that (y ) W W if 0 a2m = ,
which, by Lemma 8.2.3, will be the case if y
/ H.

8.3 From Hilbert to Abstract Wiener Space


Up to this point I have been assuming that we already have at hand a nondegenerate, centered Gaussian measure W on a Banach space E, and, on the
basis of this assumption, I produced the associated CameronMartin space H.
In this section, I will show how one can go in the opposite direction. That is,
I will start with a separable, real Hilbert space H and show how to go about
finding a separable, real Banach space E for which there exists a W M1 (E)
such that (H, E, W) is an abstract Wiener space. Although I will not adopt his
approach, the idea of carrying out such a program is Grosss.
Warning: From now on, unless the contrary is explicitly stated, I will be assuming that the spaces with which I am dealing are all infinite dimensional,
separable, and real.
8.3.1. An Isomorphism Theorem. Because, at an abstract level, all infinite
dimensional, separable Hilbert spaces are the same, one should expect that, in a
related sense, the set of all abstract Wiener spaces for which one Hilbert space is
the CameronMartin space is the same as the set of all abstract Wiener spaces
for which any other Hilbert space is the CameronMartin space. The following
simple result verifies this conjecture.
Theorem 8.3.1. Let H and H 0 be a pair of Hilbert spaces, and suppose that
F is a linear isometry from H onto H 0 . Further, suppose that (H, E, W) is
an abstract Wiener space. Then there exists a separable, real Banach space
E 0 H 0 anda linear isometry F from E onto E 0 such that F  H = F and
H 0 , E 0 , F W is an abstract Wiener space.
Proof: Define kh0 kE 0 = kF 1 h0 kE for h0 H 0 , and let E 0 be the Banach space
obtained by completing H 0 with respect to k kE 0 . Trivially, H 0 is continuously
embedded in E 0 as a dense subspace, and F admits a unique extension F as an
isometry from E onto E 0 . Moreover, if (x0 ) (E 0 ) and F > is the adjoint map
from (E 0 ) onto E , then

h0 , h0(x0 ) H 0 = hh0 , (x0 ) i = hF 1 h0 , F > (x0 ) i


= F 1 h0 , hF > (x0 ) H = h0 , F hF > (x0 ) H 0 ,

318

8 Gaussian Measures on a Banach Space

and so h0(x0 ) = F hF > (x0 ) . Hence,


i
i
h
i
h
h
0
0
0
> 0

EF W e 1 hx ,(x ) i = EW e 1 hF x,(x ) i = EW e 1 hx,F (x ) i


1

1 kF 1 h0

k2

1 kh0

k2

(x0 ) H 0 ,
(x0 ) H = e 2
= e 2 khF > (x0 ) kH = e 2

which completes the proof that H 0 , E 0 , F W is an abstract Wiener space. 
Theorem 8.3.1 says that there is a one-to-one correspondence between the abstract Wiener spaces associated with one Hilbert space and the abstract Wiener
spaces associated with any other. In particular, it allows us to prove the theorem
of Gross which states that every Hilbert space is the CameronMartin space for
some abstract Wiener space.

Corollary 8.3.2. Given a separable, real Hilbert space H, there exists a


separable Banach space E and a W M1 (E) such that (H, E, W) is an abstract
Wiener space.
Proof: Let F : H 1 (R) H be an isometric isomorphism, and use Theorem
8.3.1 to construct a separable Banach space E and an isometric, isomorphism
F : (R) E so that (H, E, W) is an abstract Wiener space when W =
F W (1) . 
It is important to recognize that although a non-degenerate, centered Gaussian
measure on a Banach space E determines a unique CameronMartin space H,
a given H will be the CameronMartin space for an uncountable number of
abstract Wiener spaces. For example, in the classical case when H = H1 (RN ),
we could have replaced (RN ) by a subspace which reflected the fact that almost
every Brownian path is locally H
older continuous of any order less than a half.
We will see a definitive, general formulation of this point in Corollary 8.3.10.
8.3.2. Wiener Series. The proof that I gave of Corollary 8.3.2 is too nonconstructive to reveal much about the relationship between H and the abstract
Wiener spaces for which it is the CameronMartin space. Thus, in this subsection I will develop another, entirely different way of constructing abstract
Wiener spaces for a Hilbert space.
The approach here has its origins in one of Wieners own constructions of
Brownian motion and is based on the following line of reasoning. Given H,
choose an orthonormal basis {hn : n 0}. If there were a standard Gauss
measure
 W on H, then the random variables {Xn : n 0} given by Xn (h) =
h, hn H would be independent, standard normal, R-valued random variables,
P
and, for each h H,
0 Xn (h)hn would converge in H to h. Even though
W cannot live on H, this line of reasoning suggests that a way to construct an
abstract Wiener space is to start with a sequence {Xn : n 0} of R-valued,
independent standard normalPrandom variables on some probability space, find

a Banach space E in which 0 Xn hn converges with probability 1, and take


W on E to the distribution of this series.

8.3 From Hilbert to Abstract Wiener Space

319

To convince oneself that this line of reasoning has a chance of leading somewhere, one should observe that Levys construction corresponds to a particular choice of the orthonormal basis {hm : m 0}.1 To see this, determine
{h k,n : (k, n) N2 } by



1
on k21n , (2k + 1)2n



n1
h k,0 = 1[k,k+1) and h k,n = 2 2
1 on (2k + 1)2n , (k + 1)21n

0
elsewhere

for n 1. Clearly, the h k,n s are orthonormal in L2 [0, ); R . In addition, for
each n N, the span of {h k,n : k N} equals that of {1[k2n ,(k+1)2n ) : k N}.
Perhaps the easiest way to check this is to do so by dimension counting. That
is, for a given (`, n) N2 , note that

{h `,0 } {h k,m : `2m1 k < (` + 1)2m1 and 1 m n}


has the same number of elements as {1[k2n ,(k+1)2n ) : `2n k < (` + 1)2n }
and that the first set is contained in the span of the second. As a consequence,

we know that {h k,n : (k, n) N2 } is an orthonormal basis in L2 [0, ); R , and
Rt
so, if hk,n (t) = 0 h k,n ( ) d and (e1 , . . . , eN ) is an orthonormal basis in RN ,
then


hk,n,i hk,n ei : (k, n, i) N2 {1, . . . , N }
is
known as the Haar basis, in H1 (RN ). Finally, if
 an orthonormal basis,
2
Xk,n,i : (k, n, i) N {1, . . . , N } is a family of independent, N (0, 1)-random
PN
variables and Xk,n = i=1 Xk,n,i ei , then
n X
X
N
X

Xk,m,i hk,m,i (t) =

m=0 k=0 i=1

n X

hk,m (t)Xk,m

m=0 k=0

is precisely the polygonalization that I denoted by Bn (t) in Levys construction


(cf. 4.3.2).
The construction by Wiener, alluded to above, was essentially the same, only
he chose a different basis for H1 (RN ). Wiener took h k,0 (t) = 1[k,k+1) (t) for

1
k N and h k,n (t) = 2 2 1[k,k+1) (t) cos n(t k) for (k, n) N Z+ , which
means that he was looking at the series

1

X
X
2 2 sin n(t k)
Xk,n ,
(t k)1[k,k+1) (t)Xk,0 +
1[k,k+1) (t)
n
+
k=0

(k,n)NZ

The observation that L


evys construction (cf. 4.3.2) can be interpreted in terms of a Wiener
series is due to Z. Ciesielski. To be more precise, initially Ciesielski himself was thinking
entirely in terms of orthogonal series and did not realize that he was giving a re-interpretation
of L
evys construction. Only later did the connection become clear.

320

8 Gaussian Measures on a Banach Space

where again {Xk,n : (k, n) N2 } is a family of independent, RN -valued, N (0, I)random variables. The reason why Levys choice is easier to handle than Wieners
is that, in Levys case, for each n Z+ and t [0, ), hk,n (t) 6= 0 for precisely
one k N. Wieners choice has no such property.
With these preliminaries, the following theorem should come as no surprise.
Theorem 8.3.3. Let H be an infinite dimensional, separable, real Hilbert
space and E a Banach space into which H is continuously embedded as a dense
subspace. If for some orthonormal basis {hm : m 0} in H the series

(8.3.4)

m hm converges in E

m=0
N
for 0,1
-almost every = (0 , . . . , m , . . . ) RN

and if S : RN E is given by
 P
m=0 m hm
S() =
0

when the series converges in E


otherwise,


N
then H, E, W with W = S 0,1
is an abstract Wiener space. Conversely, if
(H, E, W) is an abstract Wiener space and {hm : m 0} is an orthogonal
sequence in H such that, for each m N, either hm = 0 or khm kH = 1, then
"
(8.3.5)


p #
n
X



sup
I(hm )hm
< for all p [1, ),


n0
m=0

and, for W-almost every x E,


m=0 [I(hm )](x)hm converges
 in E to the
W-conditional expectation value of x given {I(hm ) : m 0} . Moreover,

[I(hm )](x)hm is W-independent of x

m=0

[I(hm )](x)hm .

m=0

Finally,P
if {hm : m 0} is an orthonormal basis in H, then, for W-almost every

x E, m=0 [I(hm )](x)hm converges in E to x, and the convergence is also in


Lp (W; E) for every p [1, ).
Proof: P
First assume that (8.3.4) holds for some orthonormal basis, and set
n
N
Sn () = m=0 m hm and W = S 0,1
. Then, because Sn () S() in E for
N
N
0,1 -almost every R ,
n
i
h
Y
2
2
N
1
1
c ) = lim E0,1
e 2 (hx ,hm )H = e 2 khx kH ,
W(x
e 1hSn ,i = lim
n

m=0

8.3 From Hilbert to Abstract Wiener Space

321

which proves that (H, E, W) is an abstract Wiener space.


Next suppose that (H, E, W) is an abstract Wiener space and that {hm :
m 0} is an orthogonal sequence with khm kH {0, 1} for each m 0. By
Theorem 8.2.1, x Lp (W; E) for every p [1, ). Next, for each
W n N, set
Fn = {I(hm ) : 0 m n} . Clearly, Fn Fn+1 and F n=0 Fn is the
Pn
-algebra generated by {I(hm ) : m 0}. Moreover, if Sn = m=0 I(hm )hm ,
then, since {I(hm ) : m 0} is a Gaussian family and hx Sn (x), x i is perpendicular in L2 (W; R) to I(hm ) for all x E and 0 m n, x Sn (x) is
W-independent of Fn . Thus Sn = EW [x | Fn ], and so, by Theorem 6.1.12, we
know both that (8.3.5) holds and that Sn EW [x | F] W-almost surely. In
addition, the W-independence of Sn (x) from x Sn (x) implies that the limit
quantities possess the same independence property.
In order to complete the proof at this point, all that I have to do is show that
x = EW [x | F] W-almost surely when {hm : m 0} is an orthonormal basis.
W
Equivalently, I must check that BE is contained P
in the W-completion F of F.
n
To this end, note that, for each h H, because m=0 (h, hm )H hm converges in
H to h,
!
n
n
X
X


h, hm H I(hm ) = I
h, hm H hm I(h) in L2 (W; R).
m=0

m=0

Hence, I(h) is F -measurable for every h H. In particular, this means that


W
x
hx, x i is F -measurable for every x E , and so, since BE is generated
W
by {h , x i : x E }, BE F . 
It is important to acknowledge that the preceding theorem does not give another proof of Wieners theorem that Brownian motion exists. Instead, it simply
says that, knowing it exists, there are lots of ways in which to construct it. See
Exercise 8.3.21 for a more satisfactory proof of the same conclusion in the classical case, one that does not require the a priori existence of W (N ) .
The following result shows that, in some sense, a non-degenerate, centered,
Gaussian measure W on a Banach space does not fit on a smaller space.

Corollary 8.3.6. If W is a non-degenerate, centered Gaussian measure on


a separable Banach space E, then E is the support of W in the sense that W
assigns positive probability to every non-empty open subset of E.
Proof: Let H be the CameronMartin
space for W. Since H is dense in E, it

suffices to show that W BE (g, r) > 0 for every g H and r > 0. Moreover,
since, by the CameronMartin formula (8.2.8) (cf. Exercise 8.2.19)




W BE (0, r) = (g ) W BE (g, r) = EW Rg , BE (g, r)
q
kgk2

H
W BE (g, r) ,
e 2

322

8 Gaussian Measures on a Banach Space


I need only show that W BE (0, r) > 0 for all r > 0. To this end, choose an
Pn
orthonormal basis {hm : m 0} in H, and set Sn = m=0 I(hm )hm . Then, by
Theorem 8.3.3, x
Sn (x) is W-independent of x
x Sn (x) and
 Sn (x) x
in E for W-almost every x E. Hence, W {kx Sn (x)kE < 2r } 12 for some
n N, and therefore


W BE (0, r) 12 W kSn kE < 2r .
Pn
But kSn k2E CkSn k2H = m=0 I(hm )2 for some C < , and so



n+1
r
> 0 for any r > 0. 
BRn+1 0, 2C
W kSn kE < 2r 0,1

8.3.3. Orthogonal Projections. Associated with any closed, linear subspace L of a Hilbert space H, there is an orthogonal projection map L : H
L determined by the property that, for each h H, h L h L. Equivalently,
L h is the element of L that is closest to h. In this subsection I will show that if
(H, E, W) is an abstract Wiener space and L is a finite dimensional subspace of
H, then L admits a W-almost surely unique extension PL to E. In addition,
I will show that PL x x in L2 (W; E) as L % H.
Lemma 8.3.7. Let (H, E, W) be an abstract Wiener space
P and {hm : m
0} an orthonormal basis in H. Then, for each h H,
m=0 (h, hm )H I(hm )
converges to I(h) W-almost surely and in Lp (W; R) for every p [1, ).
Proof: Define the -algebras Fn and F as in the proof P
of Theorem 8.3.3. Then,
n
by the same argument as I used there, one can identify m=0 (h, hm )H I(hm ) as
W

EW [I(h) | Fn ]. Thus, since F BE , the required convergence statement is an


immediate consequence of Corollary 5.2.4. 

Theorem 8.3.8.
Let (H, E, W) be an abstract Wiener space. For each
finite dimensional subspace L of H there is a W-almost surely unique map
PL : E H such that, for every h H and W-almost every x E,
h, PL x H = I(L h)(x), where L denotes orthogonal projection from H onto
L. In fact, if {g1 , . . . , gdim(L) } is an orthonormal basis for L, then PL x =
Pdim(L)
[I(gi )](x)gi , and so PL x L for W-almost every x E. In partic1
ular, the distribution of x E 7 PL x L under W is the same as that
Pdim(L)
dim(L)
of (1 , . . . , dim(L) ) Rdim(L) 7
i gi L under 0,1
. Finally,
1
x
PL x is W-independent of x
x PL x.
Proof: Set ` = dim(L). It suffices to note that
!
`
`
X
X
I(L h) = I
(h, gk )H gk =
(h, gk )H I(gk ) =
k=1

k=1

`
X
k=1

!
I(gk )gk , h
H

for all h H 
We now have the preparations needed to prove a result which shows that
my definition of an abstract Wiener space is the same as Grosss. Specifically,
Grosss own definition was based on the property proved in the following.

8.3 From Hilbert to Abstract Wiener Space

323

Theorem 8.3.9. Let (H, E, W) be an abstract Wiener space and


 {hn : n 0}
an orthonormal basis for H, and set Ln = span {h
,
.
.
.
,
h
}
n . Then, for all
 0 2
W
2
 > 0 there exists an n N such that E kPL xkE  whenever L is a finite
dimensional subspace that is perpendicular to Ln .
Proof: Without loss in generality, I will assume that k kE k kH .
Arguing by contradiction, I will show that if the asserted property does not
hold,P
then there would exist an orthonormal basis {fn : n 0} for H such

that 0 I(fn )fn fails to converge in L2 (W; E). Thus, suppose that there exists
an  > 0 such that
 for all n N there exists a finite dimensional L Ln
with EW kPL xk2E 2 . Under
 this assumption, define
{nm : m 0}
N, {`m : m 0} N, and {f0 , . . . , fnm } : m 0 Lnm inductively
by the following prescription. First, take n0 = 0 = `0 and f0 = h0 . Next,
knowing nm and {f0, . . . , fnm }, choose a finite dimensional subspace L Lnm
so that EW kPL xk2E 2 , set `m = dim(L), and let {gm,1 , . . . , gm,`m } be an
orthonormal basis for L. For any > 0 there exists an n nm + `m such that
`m
X



Ln gm,i , Ln gm,j
i,j .
H
i,j=1

In particular, if (0, 1), then the elements of {Ln gm,i : 1 i `m } are


linearly independent and the orthonormal set {
gm,i : 1 i `m } obtained from
them via the GramSchmidt orthogonalization procedure satisfies (cf. Exercise
8.3.16)
`m
X

`m
X



Ln gm,i , Ln gm,j i,j

k
gm,i gm,i kH K`m

i=1

i,j=1

for some Km < which depends only on `m . Moreover, and because L Lnm ,
gm,i Lnm for all 1 i `m. Hence, we can find an nm+1 nm + `m so that
span {hn : nm < n nm+1 } admits an orthonormal basis {fnm +1 , . . . , fnm+1 }
P`
with the property that 1m kgm,i fnm +i kH 4 .
Clearly {fn : n 0} is an orthonormal basis for H. On the other hand,


2 12
2 12
`m
+`m
X

nmX






I(gm,i )gm,i I(fnm +i )fnm +i
EW
I(fn )fn  EW




n=nm +1

`m
X

2  1

EW I(gm,i )gm,i I(fnm +i )fnm +i H 2 ,

2  1

and so, since EW I(gi,m )gm,i I(fnm +i )fnm +i H 2 is dominated by
2  1

1


EW I(gm,i ) I(fnm +i ) gm,i H 2 + EW I(fnm +i )2 2 kgm,i fnm +i kH

2kgm,i fnm +i kH ,

324

8 Gaussian Measures on a Banach Space

we have that

2 12
+`m
nmX




EW
I(fn )fn


2
n +1
m

for all m 0,

P
and this means that 0 I(fn )fn cannot be converging in L2 (W; E). 
Besides showing that my definition of an abstract Wiener space is the same
as Grosss, Theorem 8.3.9 allows us to prove a very convincing statement, again
due to Gross, of just how non-unique is the Banach space for which a given
Hilbert space is the CameronMartin space.
Corollary 8.3.10. If (H, E, W) is an abstract Wiener space, then there
exists a separable Banach space E0 that is continuously embedded in E as a
measurable subset and has the properties that W(E
 0 ) = 1, bounded subsets of
E0 are relatively compact in E, and (H, E0 , W  E0 is again an abstract Wiener
space.
Proof: Again I will assume that k kE k kH .
Choose {xn : n 0} E so that {hn : n 0} is an orthonormal basis
in H when hn = hxn , and set Ln = span {h0 , . . . , hn } . Next, using Theorem 8.3.9, choose an increasing sequence {nm : m 0} so that n0 = 0 and

1
EW kPL xk2E 2 2m for m 1 and finite dimensional L Lnm , and define
Q` for ` 0 on E into H so that

Q0 x =

hx, x0 ih0

and Q` x =

n`
X

hx, xn ihn

when ` 1.

n=n`1 +1

Finally, set Sm = PLnm =


that
kxkE0 kQ0 xkE +

Pm

`=0

Q` , and define E0 to be the set of x E such


`2 Q` xkE <

and kSm x xkE 0.

`=1

To show that k kE0 is a norm on E0 and that E0 with norm k kE0 is a


Banach space, first note that if x E0 , then
kxkE = lim kSm xkE kQ0 xkE + lim
m

m
X

kQ` xkE kxkE0 ,

m `=1

and therefore k kE0 is certainly a norm on E0 . Next, suppose that the sequence
{xk : k 1} E0 is a Cauchy sequence with respect to k kE0 . By the
preceding, we know that {xk : k 1} is also Cauchy convergent with respect to

8.3 From Hilbert to Abstract Wiener Space

325

k kE , and so there exists an x E such that xk x in E. We need to show


that x E0 and that kxk xkE0 0. Because {xk : k 1} is bounded in E0 ,
it is clear that kxkE0 < . In addition, for any m 0 and k 1,
kx Sm xkE = lim kx` Sm x` kE lim kx` Sm x` kE0
`

= lim

n kQn x` kE

` n>m

n2 kQn xk k + sup kx` xk kE0 .

n>m

`>k

Thus, by choosing k for a given  > 0 so that sup`>k kx` xk kE0 < , we
conclude that limm kx Sm xkE <  and therefore that Sm x x in E.
Hence, x E0 . Finally, to see that xk x in E0 , simply note that

kx xk kE0 = kQ0 (x xk )kE +

m2 kQm (x xk )kE

m=1

kQ0 (x` xk )kE +

lim
`

!
2

m kQm (x` xk )kE

sup kx` xk kE0 ,


`>k

m=1

which tends to 0 as k .
To show that bounded subsets of E0 are relatively compact in E, it suffices
to show that if {x` : ` 1} BE0 (0, R), then there is an x E to which a
subsequence converges in E. For this purpose, observe that, for each m 0,
there is a subsequence {x`k : k 1} along which Sm x`k converges in Lnm .
Hence, by a diagonalization argument, {x`k : k 1} can be chosen so that
{Sm x`k : k 1} converges in Lnm for all m 0. Since, for 1 j < k,
X
kx`k x`j kE kSm x`k Sm x`j kE +
kQn (x`k x`j )kE
n>m

kSm x`k Sm x`j kE + 2R

X 1
,
n2
n>m

it follows that {x`k : k 1} is Cauchy convergent in E and therefore that it


converges in E.
I must still show that E0 BE and that (H, E0 , W0 ) is an abstract Wiener
space when W0 = W  E0 . To see the first of these, observe that x E 7
kxkE0 [0, ] is lower semicontinuous and that {x : kSm x xkE 0} BE .
In addition, because, by Theorem 8.3.3, kSm x xkE 0 for W-almostevery
x E, we will know that W(E0 ) = 1 once I show that W kxkE0 < = 1,
which follows immediately from




 X


EW kxkE0 = EW kQ0 xkE +
m2 EW kQm xkE
1



EW kQ0 xkE +

X
1


1
m2 EW kQm xk2E 2 < .

326

8 Gaussian Measures on a Banach Space

The next step is to check that H is continuously embedded in E0 . Certainly


h H = kSm h hkE kSm h hkH 0. Next suppose that h
H \ {0} and that h Lnm , and let L be the line spanned by h. Then PL x =
khk2
H [I(h)](x)h, and so, because L Lnm ,


1
khkE
1
W
2 2 khkE
.
=

E
I(h)
khkH
khk2H
2m

Hence, we now know that h Lnm = khkE 2m khkH . In particular,


kQm+1 hkE 2m kQm+1 hkH 2m khkH for all m 0 and h H, and so
!

X
X
m2
2
khkH = 25khkH .
khkE0 = kQ0 hkE +
m kQm hkE 1 + 2
m
2
m=1
m=1

To complete the proof, I must show that H is dense in E0 and that, for each
c0 (y ) = e 12 khy k2H , where W0 = W  E0 and hy H is determined
y E0 , W

by h, hy H = hh, y i for h H. Both these facts rely on the observation that
X
kx Sm xkE0 =
n2 kQn xkE 0 for all x E0 .
n>m

Knowing this, the density of H in E0 is obvious. Finally, if y E0 , then, by


the preceding and Lemma 8.3.7,
hx, y i = lim hSm x, y i = lim
m

= lim

nm
X

hy , hn

 
H

nm
X

hx, xn ihhn , y i

n=0




I(hn ) (x) = I(hy ) (x)

n=0

for W0 -almost every x E0 . Hence h , y i under W0 is a centered Gaussian


with variance khy k2H . 
8.3.4. Pinned Brownian Motion. Theorem 8.3.8 has a particularly inter
esting application to the classical abstract Wiener space H1 (RN ), (RN ), W (N ) .
Namely,
suppose that 0 = t0 < t1 < < tn , and let L be the span of

htm e : 1 m n and e SN 1 , where ht ( ) t . In this case,
PL =

n
X

htm htm1
(tm ) (tm1 ) ,
t tm1
m=1 m

and so
(t1 ,... ,tn ) (t) [ PL ](t)
(

ttm1
if t [tm1 , tm ]
(t) (tm1 ) tm
(8.3.11)
tm1 (tm ) (tm1 )
=
(t) (tn )
if t [tn , ).

8.3 From Hilbert to Abstract Wiener Space

327

Thus, if (, ~y) (RN ) (RN )n 7 (t1 ,... ,tn ),~y (RN ) is given by
(t1 ,... ,tn ),~y = (t1 ,... ,tn ) +

n
X
htm htm1
(ym ym1 ),
t tm1
m=1 m

where ~y = (y1 , . . . , yn ) and y0 0, then, for any Borel measurable F : (RN )


(RN )n [0, ),
Z


F , (t1 ), . . . , (tn ) W (N ) (d)
(RN )

(8.3.12)

 Z

F (t1 ,... ,tn ),~y , ~y W

=
(RN )n

(N )


(d) 0,C(t1 ,... ,tn ) (d~y),

(RN )

where C(t1 , . . . , tn )(m,i),(m0 i0 ) = tm tm0 i,i0 for 1 m, m0 n and 1 i, i0 N


is the covariance of
((t1 ), . . . , (tn )) under W (N ) . Equivalently, if
(t1 ,... ,tn ),~y = (t1 ,... ,tn ) +

n
X
htm htm1
ym ,
t tm1
m=1 m

then
Z



F , (t1 ) (t0 ), . . . , (tn ) (tn1 ) W (N ) (d)

(RN )

(8.3.13)

 Z

=
(RN )n


 (N )

~
F (t1 ,... ,tn ),~y , y W
(d) 0,D(t1 ,... ,tn ) (d~y),

(RN )

where D(t1 , . . . , tn )(m,i),(m0 ,i0 ) = (tm tm1 )m,m0 i,i0 for 1 m, m0 n and
1 i, i0 N is the covariance matrix for (t1 ) (t0 ), . . . , (tn ) (tn1 )
under W (N ) .
There are several comments that should be made about these conclusions. In
the first place, it is clear from (8.3.11) that t
(t1 ,... ,tn ) (t) returns to the origin
at each of the times {tm : 1 m n}. In addition, the excursions (t1 ,... ,tn ) 
[tm1 , tm ], 1 m n, are independent of each other and of (t1 ,... ,tn )  [tn , ).
(N )

Secondly, if W(t1 ,... ,tn ),~y denotes the W (N ) -distribution of


(8.3.12) says that
(N )

W(t1 ,... ,tn ),((t1 ),... ,(tn ))

(t1 ,... ,tn ),~y , then

is a regular conditional probability distribution (cf. 9.2) of W (N ) given the algebra generated
by {(t1 ), . . . , (tn )}. Expressed in more colloquial terms, the

process (t1 ,... ,tn ),~y (t) : t 0 is Brownian motion pinned to the points
{ym : 1 m n} at times {tm : 1 m n}.

328

8 Gaussian Measures on a Banach Space

8.3.5. Orthogonal Invariance. Consider the standard Gauss distribution


0,I on RN . Obviously, 0,I is rotation invariant. That is, if O is an orthogonal transformation on RN , then 0,I is invariant under the transformation
TO : RN RN given by TO x = Ox. On the other hand, none of these
transformations can be ergodic, since any radial function on RN is invariant
under TO for every O.
Now think about the analogous situation when RN is replaced by an infinite
dimensional Hilbert space H and (H, E, W) is an associated abstract Wiener
space. As I am about to show, W still enjoys rotation invariance with respect
to orthogonal transformations on H. On the other hand, because kxkH =
for W-almost every x E, there are no non-trivial radial functions now, a
fact that leaves open the possibility that some orthogonal transformation of H
give rise to ergodic transformations for W. The purpose of this subsection is
to investigate these matters, and I begin with the following formulation of the
rotation invariance of W.
Theorem 8.3.14. Let (H, E, W) be an abstract Wiener space and O an orthogonal transformation on H. Then there is a W-almost surely unique, Borel
measurable map TO : E E such that I(h) TO = I(O> h) W-almost surely
for each h H. Moreover, W = (TO ) W.
Proof: To prove uniqueness, note that if T and T 0 both satisfy the defining
property for TO , then, for each x E ,
hT x, x i = I(hx )(T x) = I(O> hx ) = I(hx )(T 0 x) = hT 0 x, x i
for W-almost every x E. Hence, since E is separable in the weak* topology,
T x = T 0 x for W-almost every x E.
To prove existence, choose an orthonormal
basis {hm : m
for H, and let
P
P0}

C be the set of x E for which both m=0 [I(hm )](x)hm and m=0 [I(hm )](x)Ohm
converge in E. By Theorem 8.3.3, we know that W(C) = 1 and that
 P
m=0 [I(hm )](x)Ohm if x C
x
TO x
0
if x
/C
has distribution W. Hence, all that remains is to check that I(h)TO = I(O> h)
W-almost surely for each h H. To this end, let x E , and observe that

[I(hx )](TO x) = hTO x, x i =

hx , Ohm


H

[I(hm )](x)

m=0

O> hx , hm


H

[I(hm )](x)

m=0

for W-almost every x E. Thus, since, by Lemma 8.3.7, the last of these
series convergences W-almost surely to I(O> hx ), we have that I(hx ) TO =

8.3 From Hilbert to Abstract Wiener Space

329

I(O> hx ) W-almost surely. To handle general h H, simply note that both


h H 7 I(h) TO L2 (W; R) and h H 7 I(O> h) L2 (W; R) are
isometric, and remember that {hx : x E } is dense in H. 
I next want to discuss the possibility of TO being ergodic for some orthogonal transformations O. First notice that TO cannot be ergodic if O has a
non-trivial, finite dimensional invariant
Pn subspace L, since if {h1 , . . . , hn } were
an orthonormal basis for L, then m=1 I(hm )2 would be a non-constant, TO invariant function. Thus, the only candidates for ergodicity are Os that have
no non-trivial, finite dimensional, invariant subspaces. In a more general and
highly abstract context, I. Segal2 showed that the existence of a non-trivial, finite dimensional subspace for O is the only obstruction to TO being ergodic.
Here I will show less.
Theorem 8.3.15. Let (H, E, W) be an abstract Wiener space. If O is an
orthogonal transformation
on H with the property that, for every g, h H,

limn On g, h H = 0, then TO is ergodic.

Proof: What I have to show is that any TO -invariant element L2 (W; R)


is W-almost surely constant, and for this purpose it suffices to check that
(*)




lim EW ( TOn ) = 0
n

for all L2 (W; R) with mean value 0. In fact, if {hm : m 1} is an


orthonormal basis for H, then it suffices to check (*) when

(x) = F [I(h1 )](x), . . . , [I(hN )](x)
for some N Z+ and bounded, Borel measurable F : RN R. The reason
why it is sufficient to check it for such s is that, because TO is W-measure
preserving, the set of s for which (*) holds is closed in L2 (W; R). Hence, if
we start with any L2 (W; R) with mean value 0, we can first approximate it
in L2 (W; R) by bounded functions with mean value 0 and then
 condition these
bounded approximates with respect to {I(h1 ), . . . , I(hN )} to give them the
required form.

Now suppose that = F I(h1 ), . . . , I(hN ) for some N and bounded, measurable F . Then
ZZ


EW TOn =
F ()F () 0,Cn (d d),
RN RN
2

See I.E. Segals Ergodic subsgroups of the orthogonal group on a real Hilbert Space, Annals
of Math. 66 # 2, pp. 297303 (1957). For a treatment in the setting here, see my article Some
thoughts about Segals ergodic theorem, Colloq. Math. 118 # 1, pp. 89-105 (2010).

330

8 Gaussian Measures on a Banach Space

where

Cn =

I
B>
n

Bn
I


with Bn =



hk , On h`

 
H

1k,`N

and the block structure corresponds to RN RN . Finally, by our hypothesis


about O, we can find a subsequence {nm : m 0} such that limm Bnm = 0,
from which it is clear that 0,Cnm tends to 0,I 0,I in variation and therefore


lim EW ( TOnm ) = EW []2 = 0. 

Perhaps the best tests for whether an orthogonal transformation satisfies the
hypothesis in Theorem 8.3.15 come from spectral theory. To be more precise, if
Hc and Oc are the space and operator obtained by complexifying H and O, the
Spectral Theorem for normal operators allows one to write
Z
Oc =

dE ,

where {E : [0, 2)} is a resolution of the identity in Hc by orthogonal


projection operators. The spectrum of Oc is said to be absolutely
continuous

if, for each h Hc , the non-decreasing function
E h, h Hc is absolutely

continuous, which, by polarization, means that
E h, h0 Hc is absolutely
continuous for all h, h0 Hc . The reason for introducing this concept here is
that, by combining the RiemannLebesgue Lemma with Theorem 8.3.15, one can
prove that TO is ergodic if the spectrum of Oc is absolutely continuous.3 Indeed,

given h, h0 H, let f be the RadonNikodym derivative of
E h, h0 H ,
c
and apply the RiemannLebesgue Lemma to see that
n

O h, h


H

1n

f () d 0 as n .

See Exercises 8.3.24, 8.3.25, and 8.5.15 for a more concrete examples.
Exercises for 8.3
Exercise 8.3.16. The purpose of this exercise is to provide the linear algebraic
facts that I used in the proof of Theorem 8.3.9. Namely, I want to show that if
a set {h1 , . . . , hn } H is approximately orthonormal, then the vectors hi differ
by very little from their GramSchmidt orthogonalization.
3

This conclusion highlights the poverty of the result here in comparison to Segals result,
which says that TO is ergodic as soon as the spectrum of Oc is continuous.

Exercises for 8.3

331


(i) Suppose that A = aij 1i,jn Rn Rn is a lower triangular matrix whose
diagonal entries are non-negative. Show that there is a Cn < , depending only
on n, such that kIRn Akop Cn kIRn AA> kop .
Hint: Show that it suffices to treat the case when AA> 2IRn , and set =
IRn AA> . Assuming that AA> 2IRn , work by induction on n, at each step
using the lower triangularity of A, to see that

12
`
X
1
a2` j if 1 ` < n
|a` ` an ` | |n ` | + (AA> )n2 n
j=1

n1
X


1 a2n n |n n | +
a2n ` .
`=1



(ii) Let {h1 , . . . , hn } H, set B = (hi , hj )H 1i,jn , and assume that kIRn
Bkop < 1. Show that the hi s are linearly independent.
(iii) Continuing part (ii), let {f1 , . . . , fn } be the orthonormal set obtained from
the hi s by the GramSchmidt orthogonalization procedure, and let A be the
matrix whose (i, j)th entry is (hi , fj )H . Show that A is lower triangular and
that its diagonal entries are non-negative. In addition, show that AA> = B.
(iv) By combining (i) and (iii), show that there is a Kn < , depending only
on n, such that
n
X

khi fi kH Kn

i=1

Hint: Note that hi =


khi

n
X


i,j (hi , hj )H .
i,j=1

Pn

j=1

fi k2H

aij fj and therefore that


=

n
X

IRn A

2
ij

nkIRn Ak2op .

j=1

Exercise 8.3.17. Given a Hilbert space H, the problem of determining for


which Banach spaces H arises as the CameronMartin space is an extremely
delicate one. For example, one might hope that H will be the CameronMartin
space for E if H is dense in E and its closed unit ball BH (0, 1) is compact in E.
However, this is not the case. For example,qtake H = `2 (N; R) and let E be the
P n2
completion of H with respect to kkE
n=0 n+1 . Show that BH (0, 1) is
compact as a subset of E but that there is no W M1 (E) for which (H, E, W)
is an abstract Wiener space.
Hint: The first part is an easy application of the standard diagonalization ar2
P
n
1
kk`2 (N;R) . To
m+1
gument combined with the obvious fact that nm n+1
prove the second part, note that in order for W to exist it would be necessary
P n2
N
to be 0,1
-almost surely convergent.
for n=0 n+1

332

8 Gaussian Measures on a Banach Space

Exercise 8.3.18. Let (H, E, W) be an abstract Wiener space, and assume that
H is infinite dimensional. As was pointed out, {hx : x E } is the subspace of
g H for which there exists a C < with the property that |(h, g)H | CkhkE
for all h H. Show that for each g H there is separable Banach space Eg
that is continuously embedded as a Borel subset of E such that W(Eg ) = 1,
(H, Eg , W  Eg ) is an abstract Wiener space, and |(h, g)H | khkEg for all
h H.
Hint: Refer to the notation used in the proof of Corollary 8.3.10. Choose nm %

1
so that n0 = 0 and, for m 1, kL
gkH 2m and EW kPL k2E 2 2m
nm
for finite dimensional L Lnm . Next, define Eg to be the space of x E with
the properties that PLnm x x in E and

kxkEg

X


 
kQ` xkE + Q` x, g H < ,

`=0

Pn`

where Q0 x = hx, x0 ihx0 and Q` x =


n=n`1 +1 hx, xn ihxn for ` 1. Using
the reasoning in the proof of Corollary 8.3.10, show that Eg has the required
properties.
Exercise 8.3.19. Let N = 1. Using Theorem 8.3.3, take Wieners choice of orthonormal basis and check that there are independent, standard normal random
variables {Xm : m 1} under W (1) such that, for W (1) -almost almost every ,
1

(t) = tX0 () + 2 2

Xm ()

m=1

sin(mt)
,
m

t [0, 1],

where the convergence is uniform. From this, show that, W (1) -almost surely,
1

1 X Xm ()2 + 8X0 ()Xm ()


X0 ()2
,
+ 2
(t) dt =
m2
m=1
3
2

where the convergence of the series is absolute. Using the preceding, conclude
that, for any (0, ),

EW

(1)


Z

# 12
# 12 "
 "Y


X
1
2
.
1 + 4
(t)2 dt =
1+ 2 2
m2 2 + 2
m
m=1
m=1

Finally, recall Eulers product formula



Y
sinh z =
1+
m=1

z2
m2 2


,

z C,

Exercises for 8.3

333

and arrive first at


W (1)



Z
exp


 1

(t) dt
= cosh 2 2 ,
2

and then, after an application of Brownian rescaling, at


"
!#
Z T

 1

W (1)
2
E
exp
(t) dt
= cosh 2 T 2 .
0

This is a famous calculation that can be made using many different methods.
We will return to it in 10.1.3. See, in addition, Exercise 8.4.7.
Hint: Use Eulers product formula to see that

X
1
sinh t
d
= 2t
log
2
2
n + t2
t
dt
n=1

for t R.

Exercise 8.3.20. Related to the preceding exercise, but easier, is finding the
Laplace transform of the variance
!2
Z
Z
1 T
1 T
2
(t) dt
(t) dt
VT ()
T 0
T 0

of a Brownian path over the interval [0, T ]. To do this calculation, first use
Brownian scaling to show that


(1) 
(1) 
EW eVT = EW eT V1 .
Next, use elementary Fourier series to show that (cf. part (iii) of Exercise 8.2.18)
R
2
1
2 X
Z 1

f
(t)
d(t)
X
k
0
,
V1 () = 2
(t) cos(kt) dt =
k2 2
0
k=1

k=1

1
2

where fk (t) = 2 sin(kt) for k 1. Since the fk s are orthonormal as elements


of L2 [0, ); R , this leads to
 12


 Y
2
W (1) V1
.
E
e
=
1+ 2 2
k
k=1

Now apply Eulers formula to arrive at


s
W



E eVT =

2T

.
sinh( 2T )

Finally, using Wieners choice of basis, show that


V1 () has the same dis2
R1
(1)
tribution as
(t) t(1) dt under W , a fact for which I would like
0
but do not have any conceptual explanation.

334

8 Gaussian Measures on a Banach Space

Exercise 8.3.21. The purpose of this exercise is to show that, without knowing ahead of time that W (N ) lives on (RN ), for the Hilbert space H1 (RN ) one
N
-almost surely in (RN ).
can give a proof that any Wiener series converges 0,1
N
Thus, let {hm : m 0} be an orthonormal basis
Pn in H(R ) and, for n N
N
and = (0 , . . . , m , . . . ) R , set Sn (t, ) = m=0 m hm (t). The goal is to
N
show that {Sn ( , ) : n 0} converges in (RN ) for 0,1
-almost every RN .


(i) For RN , set ht, ( ) = t , check that , Sn (t) RN = ht, , Sn (t) H1 (RN ) ,

N
and apply Theorem 1.4.2 to show that limn , Sn (t) RN exists both 0,1
2 N
N
almost surely and in L (0,1 ; R) for each (t, ) [0, ) R . Conclude from
N
this that, for each t [0, ), limn Sn (t) exists both 0,1
-almost surely and
2 N
N
in L (0,1 ; R ).
(ii) On the basis of part (i), show that we will be done once we know that,
N
for 0,1
-almost every x RN , {Sn ( , x) : n 0} is equicontinuous on finite
intervals and that supn0 t1 |Sn (t, x)| 0 as t . Show that both these
will follow from the existence of a C < such that
"

#
Sn (t) Sn (s)
N
3
0,1
CT 8 for all T (0, ).
(*)
E
sup sup
1
(t s) 8
0s<tT n0

(iii) As an application of Theorem 4.3.2, show that (*) will follow once one
checks that


N
0,1
4
E
sup |Sn (t) Sn (s)| B(t s)2 , 0 s < t,
n0

for some B < . Next, apply (6.1.14) to see that



  4

N
N 
4
sup E0,1 |Sn (t) Sn (s)|4 .
E0,1 sup |Sn (t) Sn (s)|4
3 n0
n0

In addition, because Sn (t) Sn (s) is a centered Gaussian, argue that



2
N 
N 
E0,1 |Sn (t) Sn (s)|4 3E0,1 |Sn (t) Sn (s)|2 .
Finally, repeat the sort of reasoning used in (i) to check that

N 
E0,1 |Sn (t) Sn (s)|2 N (t s) for 0 s < t.
Exercise 8.3.22. In this exercise we discuss some properties of pinned Brownian motion. Given T > 0, set T (t) = (t) tT
T (T ). As I pointed out at
the end of 8.3.2, the W (N ) -distribution of T is that of a Brownian motion
conditioned to be back at 0 at time T . Next take T (RN ) to be the space of
(N )
continuous paths : [0, T ] RN satisfying (0) = 0 = (T ), and let WT
(N )
N
N
denote the W
-distribution of (R ) 7 T T (R ).

Exercises for 8.3

335

(i) Show that the W (N ) -distribution of {T (t) : t 0} is the same as that of


1
{T 2 1 (T 1 t) : t 0}.

(ii) Set H1T (RN ) = {h  [0, T ] : h H1 (RN ) & h(T ) = 0}, and define

L2 ([0,T ];RN ) . Show that the triple H1 (RN ), T (RN ), W (N )
khkH1T (RN ) = khk
T
T
(N )

is an abstract Wiener space. In addition, show that WT is invariant under


time reversal. That is, {(t) : t [0, T ]} and {(T t) : t [0, T ]} have the
(N )
same distribution under WT .
Hint: Begin by identifying T (RN ) as the space of finite, RN -valued Borel
measures on [0, T ] such that ({0}) = 0 = ({T }).
Exercise 8.3.23. Say that D E is determining if x = y whenever hx, x i =
hy, x i for all x D. Next, referring to Theorem 8.3.14, suppose that O is an
orthogonal transformation on H and that F : E 7 E has the properties that
F  H = O and that x
hF (x), x i is continuous for all x s from a determining
set D. Show that TO x = F (x) for W-almost every x E.

Exercise 8.3.24.
Consider H1 (RN ), (RN ), W (N ) , the classical Wiener
space. Given (0, ), define O : H1 (RN ) H(RN ) by [O h](t) =
1
2 h(t), show that O is an orthogonal transformation, and apply Exercise 8.3.23 to see that TO is the Brownian scaling map S given by S (t) =
1
2 (t) discussed in part (iii) of Exercise 4.3.10. The main goal of this exercise
is to apply Theorem 8.3.15 to show that TO is ergodic for every (0, )\{1}.

(i) Given an orthogonal transformation O on H1 (RN ), show that On h, h0 H1 (RN )

tends to 0 for all h, h0 H1 (RN ) if limn On h, h0 H1 (RN ) = 0 for all h, h0

h 0 C (0, ); RN .
H(RN ) with h,
c


(ii) Complete the program by showing that On h, h0 H1 (RN ) tends to 0 for all

h 0 C (0, ); RN .
(0, ) \ {1} and h, h0 H1 (RN ) with h,
c
(iii) There is another way to think about the operator O . Namely, let RN
be Lebesgue measure on R, define U : H(RN ) L2 (RN ; RN ) by U h(x) =
x
x ), and show that U is an isometry from H1 (RN ) onto L2 (RN ; RN ). Fure 2 h(e
ther, show that U O = log U , where : L2 (RN ; RN ) L2 (RN ; RN ) is
the translation map f (x) = f (x + ). Conclude from this that

On h, h0


H1 (RN )

= (2)1

Z
R

1n log

d
Uch(), U
h0


CN

d,

and use this, together


 with the RiemannLebesgue Lemma, to give a second
proof that On h, h0 H1 (RN ) tends to 0 as n when 6= 1.

336

8 Gaussian Measures on a Banach Space

(iv) As a consequence of the above and Theorem 6.2.7, show that for each
(0, ) \ {1}, q [1, ), and F Lq (W (N ) ; C),
n1

(N )
1 X
F Sn = EW [F ] W (N ) -almost surely and in Lq (W (N ) ; C).
n n
m=0

lim

Next, replace Theorem 6.2.7 by Theorem 6.2.12 to show that


Z t

(N )
1
1 F S d = EW [F ]
lim
t log t 1

W (N ) -almost surely and in Lq (W (N ) ; C). In particular, use this to show that,


for n N,
( Qn
Z t
2
n
1
m=1 (2m 1) if n is even
2 1 ( )n d =
lim
t log t 1
0
if n is odd.

Exercise 8.3.25. Here is a second reasonably explicit example to which Theorem 8.3.15 applies. Again consider the classical case when H = H1 (RN ), and
assume that N Z+ is even. Choose a skew-symmetric A Hom(RN ; RN )
whose kernel is {0}. That is, A> = A and Ax = 0 = x = 0.
(i) Define OA on H1 (RN ) by
Z

) d,
e A h(

OA h(t) =
0

and show that OA is an orthogonal transformation that satisfies the hypotheses


in Theorem 8.3.15.
Hint: Using elementary spectral theory, show that there exist non-zero, real
numbers 1 , . . . , N and an orthonormal basis (e1 , . . . , eN ) in RN such that
2
Ae2m1 = m e2m and Ae2m = m e2m1 for 1 m N2 . Thus, if Lm is the
space spanned by e2m1 and e2m , then Lm is invariant under A and the action
of e A on Lm in terms of this basis is given by


cos(m ) sin(m )
.
sin(m ) cos(m )
n
Finally, observe that OA
= OnA , and apply the RiemannLebesgue Lemma.

(ii) With the help of Exercise 8.3.23, show that


Z t
TOA (t) =
e A d( ),
0

where the integral is taken in the sense of RiemannStieltjes.

8.4 A Large Deviations Result and Strassens Theorem

337

8.4 A Large Deviations Result and Strassens Theorem


In this section I will prove the analog of Corollary 1.3.13 for non-degenerate,
centered Gaussian measures on a Banach space. Once we have that result, I will
apply it to prove Strassens Theorem, which is the law of the iterated logarithm
for such measures.
8.4.1. Large Deviations for Abstract Wiener Space. The goal of this
subsection is to derive the following result.
Theorem 8.4.1. Let (H, E, W) be an abstract Wiener space, and, for  > 0,
1
denote by W the W-distribution of x
 2 x. Then, for each BE ,

inf
h

(8.4.2)

khk2H
lim  log W ()
2
&0

khk2H
.
2
h

lim  log W () inf


&0

The original version of Theorem 8.4.1 was proved by M. Schilder for the classical Wiener measure using a method that does not extend easily to the general
case. The statement that I have given is due to Donsker and S.R.S. Varadhan,
and my proof derives from an approach (which very much resembles the arguments given in 1.3 to prove Cramers Theorem) that was introduced into this
context by Varadhan.
The lower bound is an easy application of the CameronMartin formula. Indeed, all that I have to do is show that if h H and r > 0, then


khk2H
.
lim  log W BE (h, r)
2
&0

(*)

To this end, note that, for any x E and > 0,



1
1
W BE (hx , ) = W BE ( 2 hx ,  2 )
i
h 1
2

1
1
= EW e 2 hx,x i 2 khx kH , BE (0,  2 )

2
1

1
1
e kx kE 2 khx kH W BE (0,  2 ) ,

which means that


khx k2H
,
BE (hx , ) BE (h, r) = lim  log W BE (hx , r) kx kE
&0
2

and therefore, after letting & 0 and remembering that {hx : x E } is dense
in H, that (*) holds.

338

8 Gaussian Measures on a Banach Space

The proof of the upper bound in (8.4.2) is a little more involved. The first step
is to show that it suffices to treat the case when is relatively compact. To this
end, refer to Corollary 8.3.10, and set CR equal to the closure in E of BE0 (0, R).

2 
By Ferniques Theorem applied to W on E0 , we know that EW ekxkE0 K <
for some > 0. Hence

W E \ CR = W E \ C

 2 R

Ke

R2


and so, for any BE and R > 0,



R2 
W 2W( CR ) Ke  .

Thus, if we can prove the upper bound for relatively compact s, then, because
CR is relatively compact, we will know that, for all R > 0,

khk2H
2
h


lim  log W ()

inf

&0

R2


,

from which the general result is immediate.


To prove the upper bound when is relatively compact, I will show that, for
any y E,
(
kyk2

2 H if y H
lim lim  log W BE (y, r)
(**)
r&0 &0

if y
/ H.

To see that (**) is enough, assume that it is true and let BE \{} be relatively
compact. Given (0, 1), for each y choose r(y) > 0 and (y) > 0 so that

(


W BE (y, r(y))

(1)
2
2 kykH

1


if y H

if y
/H

for all 0 <  (y). Because is relatively compact, we can find N Z+ and
SN
{y1 , . . . , yN } such that 1 BE (yn , rn ), where rn = r(yn ). Then, for
sufficiently small  > 0,


 
1
1
2
,
inf khkH
W () N exp

2 h

and so


lim  log W ()

&0

Now let & 0.

1
inf khk2H
2 h


1
.

8.4 A Large Deviations Result and Strassens Theorem

339

Finally, to prove (**), observe that

i

h 1


1

W BE (y, r) = W BE ( y , r ) = EW e 2 hx,x i e 2 hx,x i , BE ( y , r )



khx k2
 1


1

H
e (hy,x irkx kE ) EW e 2 hx,x i = e hy,x i 2 rkx kE ,

for all x E. Hence,




lim lim  log W BE (y, r) sup hy, x i 12 khx k2H .

r&0 &0

x E

Finally, note that the preceding supremum is the same as half the supremum
kyk2
of hy, x i over x with khx kH = 1, which, by Lemma 8.2.3, is equal to 2 H if
y H and to if y
/ H.
An interesting corollary of Theorem 8.4.1 is the following sharpening, due to
Donsker and Varadhan, of Ferniques Theorem.

Corollary 8.4.3. Let W be a non-degenerate, centered, Gaussian measure on


the separable Banach space E, let H be the associated CameronMartin space,
and determine > 0 by 1 = inf{khkH : khkE = 1}. Then

1
lim R2 log W kxkE R = 2 .
2

 2 2 
In particular, EW e 2 kxkE is finite if < 1 and infinite if 1 .

Proof: Set f (r) = inf{khkH : khkE r}. Clearly f (r) = rf (1) and f (1) =
1 . Thus, by the upper bound in (8.4.2), we know that



2
f (1)2
.
=
lim R2 log W kxkE R = lim R2 log WR2 kxkE 1
R
R
2
2

Similarly, by the lower bound in (8.4.2), for any (0, 1),



lim R2 log W kxkE R lim R2 log W kxkE > R
R


inf

khk2H
: khkE > R
2

1
f (1 + )2
= (1 + )2 2 ,
2
2

and so we have now proved the first assertion.


 2 kxk2E 
is finite when <
Given the first assertion, it is obvious that EW e 2
1 and infinite when > 1 . The case when = 1 is more delicate.
To handle it, I first show that = sup{khx kH : kx kE = 1}. Indeed, if
x E and kx kE = 1, set g = khhxxkE , note that kgkE = 1, and check that

340

8 Gaussian Measures on a Banach Space


1 hg, x i = g, hx H = kgkH khx kH . Hence khx kH kgk1
H . Next,
suppose that h H with khkE = 1. Then, by the HahnBanach Theorem, there
exists a x E with kxkE = 1 and hh, x i = 1. In particular, khkH khx kH
h, hx H = hh, x i = 1, and therefore khk1
H khx kH , which, together with
the preceding, completes the verification.
The next step is to show that there exists an x E with kx kE = 1 such
that khx kH = . To this end, choose {xk : k 1} E with kxk kE = 1 so
that khxk kH . Because BE (0, 1) is compact in the weak* topology and,
by Theorem 8.2.6, x E 7 hx H is continuous from the weak* topology
into the strong topology, we can assume that {xk : k 1} is weak* convergent to
some x BE (0, 1) and that khx kH = , which is possible only if kx kE = 1.
Finally, knowing that this x exists, note that h , x i is a centered Gaussian
under W with variance 2 . Hence, since kxkE |hx, x i|,
h kxk2E i Z
2
e 22 0,2 (d) = . 
EW e 22
R

8.4.2. Strassens Law of the Iterated Logarithm. Just as in 1.5 we were


able to prove a law of the iterated logarithm on the basis of the large deviation
estimates in 1.3, so here the estimates in the preceding subsection will allow
us to prove a law of the iterated for centered Gaussian random variables on a
Banach space. Specifically, I will prove the following theorem, whose statement
is modeled on V. Strassens famous law of the iterated for Brownian motion (cf.
8.6.3).
q
Sn
, where
Recall from 1.5 the notation n = 2n log(2) (n 3) and Sn =
n
Pn
Sn = 1 Xm .

Theorem 8.4.4. Suppose that W is a non-degenerate, centered, Gaussian


measure on the Banach space E, and let H be its CameronMartin space.
If {Xn : n 1} is a sequence of independent, E-valued, W-distributed random variables on some probability space (, F, P), then, P-almost surely, the
sequence {Sn : n 1} is relatively compact in E and the closed unit ball
BH (0, 1) in H coincides with its set of limit points. Equivalently, P-almost surely,
limn kSn BH (0, 1)kE = 0 and, for each h BH (0, 1), limn kSn hkE =
0.

Because, by Theorem 8.2.6, BH (0, 1) is compact in E, the equivalence of the


two formulations is obvious, and so I will concentrate on the second formulation.
I begin by showing that limn kSn BH (0, 1)kE = 0 P-almost surely, and
the fact that underlies my proof is the estimate that, for each open subset G of
E and < inf{khkH : h
/ G}, there is an M (0, ) with the property that





2 2
Sn
for all n Z+ and M n.

/ G exp
(*)
P
2n

8.4 A Large Deviations Result and Strassens Theorem

341

To check (*), first note (cf. Exercise 8.2.14) that the distribution of Sn under

1

/G =
P is the same as that of x
n 2 x under W and therefore that P Sn

W n2 (G{). Hence, (*) is really just an application of the upper bound in (8.4.2).

Given (*), I proceed in very much the same way as I did at the analogous place
in 1.5. Namely, for any (1, 2),

lim kSn BH (0, 1)kE lim

max

m m1 n m

kSn BH (0, 1)kE

kSn BH (0, [ m1 ] )kE


m
n
n m



Sn

BH (0, 1)
lim max m
.
m 1n
m1

lim

max
m1

At this point in 1.5 (cf. the proof of Lemma 1.5.3), I applied Levys reflection
principle to get rid of the max. However, Levys argument works only for
R-valued random variables, and so here I will replace his estimate by one based
on the idea in Exercise 1.4.25.
Lemma 8.4.5. Let {YmP: m 1} be mutually independent, E-valued random
n
variables, and set Sn = m=1 Ym for n 1. Then, for any closed F E and
> 0,


P(kSn F kE )
.
P max kSm F kE 2
1mn
1 max1mn P(kSn Sm kE )

Proof: Set
Am = {kSm F kE 2 and kSk F kE < 2 for 1 k < m}.
Following the hint for Exercise 1.4.25, observe that


P max kSm F kE 2
min P(kSn Sm kE < )
1mn

n
X
m=1

1mn

n
 X

P Am {kSn Sm kE < }
P Am {kSn F kE } ,
m=1


which, because the Am s are disjoint, is dominated by P kSn F kE . 
Applying the preceding to the situation at hand, we see that
!



Sn
BH (0, 1)
P
max
2
1n m [ m1 ]
E




S[m ]
BH (0, 1)
P [
m1 ]
E
.

1 max1n m P kSn kE [ m1 ]

342

8 Gaussian Measures on a Banach Space

After combining this with the estimate in (*), it is an easy matter to show that,
for each > 0, there is a (1, 2) such that
!

X

Sn

BH (0, 1)
P
max
2 < ,

m1 n m [ m1 ]
E
m=1

from which it should be clear why limn kSn BH (0, 1)kE = 0 P-almost surely.
The proof that, P-almost surely, limn kSn hkE = 0 for all h BH (0, 1)
differs in no substantive way from the proof of the analogous assertion in the
second part of Theorem 1.5.9. Namely, because BH (0, 1) is separable, it suffices
to work with one h BH (0, 1) at a time. Furthermore, just as I did there, I can
reduce the problem to showing that, for each k 2,  > 0, and h with khkH < 1,



X

P Skm km1 h E <  = .
m=1

But, if khkH < < 1, then (8.4.2) says that






2
m
m1
)
P Skm km1 h E <  = W km km1 BE (h, ) e log(2) (k k
2
km km1

for all large enough ms.


Exercises for 8.4
Exercise 8.4.6. Let (H, E, W) be an abstract Wiener space, and assume that
dim(H) = . If W is defined for  > 0 as in Theorem 8.4.1, show that
W1 W2 if 2 6= 1 .
Hint: Choose {xm : m 0} E so that {hxm : m 0} is an orthonormal
basis in H, and show that
n1
1 X
hx, xm i2 =  W -almost surely.
n n
m=0

lim

Exercise 8.4.7. Show that the in Corollary 8.4.3 is 12 in the case of the

classical abstract Wiener space H1 (RN ), (RN ), W (N ) and therefore that


lim R2 log W (N ) kk(RN ) R = 2.

Next, show that


!
lim R

log W

(N )

sup |( )| R
[0,t]

1
2t

8.5 Euclidean Free Fields

343

and that
!


2
sup |( )| R (t) = 0 = .
t

lim R2 log W (N )

[0,t]

Finally, show that


lim R

log W

(N )

Z

t
2

|( )| d R
0


=

2
8t2

and that
lim R

log W

(N )

Z
0




2

|( )| d R (t) = 0 = 2 .
2t
2

Hint: In each case after the first, Brownian scaling can be used to reduce the
problem to the case when t = 1, and the challenge is to find the optimal constant
C for which khkE CkhkH , h H for the appropriate
  abstract Wiener space

N
(E, H, W).
In
the
second
case
E
=
C
[0,
1]
:
R
 [0, 1] : (RN )
0


and H =  [0, 1] : H1 (RN ) , in the third (cf. part (ii) of Exercise 8.3.22)
N
E = 1 (RN ) and H = H11 (RN ) , in the fourth E = L2 [0, 1];
{ 
 R ) and H =
1
N
2
N
1 N
[0, 1] : H (R )}, and in the fifth E = L [0, 1]; R
and
H
=
H
(R
).
1

The optimization problems when E = (RN ) or C0 [0, 1]; RN are rather easy
1
consequences of |(t)| t 2 kkH1 (RN ) . When E = 1 (RN ), one should start with
L1 ([0,1];RN ) kkH11 (RN ) .
the observation that if H11 (RN ), then 2kku kk
In the final two cases, one can either use elementary variational calculus or one
can make use of, respectively, the orthonormal bases

2 2 sin n +

1
2



 1
: n 0 and 2 2 sin n : n 1 in L2 [0, 1]; R).


Exercise 8.4.8. Suppose that f C E; R , and show, as a consequence of
Theorem 8.4.4, that



lim f Sn = min{f (h) : khkH 1} and lim f Sn = max{f (h) : khkH 1}
n

W N -almost surely.
8.5 Euclidean Free Fields
In this section I will give a very cursory introduction to a family of abstract
Wiener spaces they played an important role in the attempt to give a mathematically rigorous construction of quantum fields. From the physical standpoint,
the fields treated here are trivial in the sense that they model free (i.e.,
non-interacting) fields. Nonetheless, they are interesting from a mathematical

344

8 Gaussian Measures on a Banach Space

standpoint and, if nothing else, show how profoundly properties of a process are
effected by the dimension of its parameter set.
I begin with the case when the parameter set is one dimensional and the
resulting process can be seen as a minor variant of Brownian motion. As we
will see, the intractability of the higher dimensional analogs increases with the
number of dimensions.
8.5.1. The OrnsteinUhlenbeck Process. Given x RN and (RN ),
consider the integral equation
Z
1 t
U(, x, ) d, t 0.
(8.5.1)
U(t, x, ) = x + (t)
2 0

A completely elementary argument (e.g., via Gronwalls Inequality) shows that,


for each x and , there is at most one solution. Furthermore, integration by
parts allows one to check that if
Z t

2t
e 2 d( ),
U(t, 0, ) = e
0

where the integral is taken in the sense of Riemann-Stieltjes, then


t

U(t, x, ) = e 2 x + U(t, 0, )

is one, and therefore the one and only, solution.


The stochastic process {U(t, x) : t 0} under W (N ) was introduced by
L. Ornstein and G. Uhlenbeck1 and is known as the OrnsteinUhlenbeck
process starting from x. From our immediate point of view, its importance is
that it leads to a completely tractable example of a free field.
Intuitively, U(t, 0, ) is a Brownian motion that has been subjected to a linear
restoring force. Thus, locally it should behave very much like a Brownian motion.
However, over long time intervals it should feel the effect of the restoring force,
which is always pushing it back toward the origin. To see how these intuitive
ideas are reflected in the distribution of {U(t, 0, ) : t 0}, I begin by using
t
Exercise 8.2.18 to identify e, U(t, 0) RN as e 2 I(hte ) for each e SN 1 , where




t
hte ( ) = 2 e 2 1 e. Hence, the span of , U(t, 0) RN : t 0 & RN is
a Gaussian family in L2 (W (N ) ; R), and

EW

(N )



|ts|
s+t 
U(s, 0) U(t, 0) = e 2 e 2 I.

The key to understanding the process {U(t, 0)



 : t t 0} is the observation
that it has the same distribution as the process e 2 B et 1 : t 0 , where
1

In their article On the theory of Brownian motion, Phys. Reviews 36 # 3, pp. 823-841
(1930), L. Ornstein and G. Uhlenbeck introduced this process in an attempt to reconcile some
of the more disturbing properties of Wiener paths with physical reality.

8.5 Euclidean Free Fields

345

{B(t) : t 0} is a Brownian motion, a fact that follows immediately from the


observation that they are Gaussian families with the same covariance structure.
In particular, by combining this with the Law of the Iterated Logarithm proved
in Exercise 4.3.15, we see that, for each e SN 1 ,


e, U(t, x) RN
e, U(t, x) RN

= 1 = lim
lim
(8.5.2)
t
2 log t
2 log t
t

W (N ) -almost surely, which confirms the suspicion that the restoring force dampens the Brownian excursions out toward infinity.
A second indication that U( , x) tends to spend more time than Brownian
paths do near the origin is that its distribution at time t will be e 2t x,(1et )I ,
and so, as distinguished from Brownian motion itself, its distribution as time
t tends to a limit, namely 0,I . This observation suggests that it might be
interesting to look at an ancient OrnsteinUhlenbeck process, one that already
has been running for an infinite amount of time. To be more precise, since the
distribution of an ancient OrnsteinUhlenbeck at time 0 would be 0,I , what
we should look at is the process that we get by making the x in U( , x, )
a standard normal random variable. Thus, I will say that a stochastic process
{UA (t) : t 0} is an ancient OrnsteinUhlenbeck process if its distribution
is that of {U(t, x, ) : t 0} under 0,I W (N ) .
If {U
process, then it is clear
 A (t) : t  0} is an ancient OrnsteinUhlenbeck

that , UA (t) RN : t 0 & RN spans a Gaussian family with covariance



|ts|
EP UA (s) UA (t) = e 2 I.

As

we see that if {B(t) : t 0} is a Brownian motion, then
 at consequence,
e 2 B et : t 0 is an ancient OrnsteinUhlenbeck process. In addition, as
we suspected, the ancient OrnsteinUhlenbeck process is a stationary process
in the sense that, for each T > 0, the distribution of {UA (t + T ) : t 0} is
the same as that of {UA (t) : t 0}, which can be checked either by using the
preceding representation in terms of Brownian motion or by observing that its
covariance is a function of t s.
In fact, even more is true: it is time reversible in the sense that, for each T > 0,
{UA (t) : t [0, T ]} has the same distribution as {UA (T t) : t [0, T ]}. This
observation suggests that we can give the ancient OrnsteinUhlenbeck its past
by running it backwards. That is, define UR : [0, ) RN (RN )2 RN by

U(t, x, + )
if t 0
UR (t, x, + , ) =
U(t, x, ) if t < 0,

and consider the process {UR (t, x, + , ) : t R} under 0,I W (N ) W (N ) .


This process also spans a Gaussian family, and it is still true that

|ts|
(N )
(N ) 
(8.5.3) E0,I W W
UR (s) UR (t) = u(s, t)I, where u(s, t) e 2 ,

346

8 Gaussian Measures on a Banach Space

only now for all s, t R. One advantage of having added the past is that the
statement of reversibility takes a more appealing form. Namely, {UR (t) : t R}
is reversible in the sense that its distribution is the same whether one runs
it forward or backward in time. That is, {UR (t) : t R} has the same
distribution as {UR (t) : t R}. For this reason, I will say that {UR (t) : t 0}
is a reversible OrnsteinUhlenbeck process if its distribution is the same
as that of {UR (t, x, + , ) : t 0} under 0,I W (N ) W (N ) .
An alternative way to realize a reversible OrnsteinUhlenbeck process is to
start with an RN -valued Brownian motion
{B(t) : t 0} and consider the

t
t
process {e 2 B(et ) : t R}. Clearly , e 2 B(et ) RN : (t, ) R RN is
a Gaussian family with covariance given by (8.5.3). It is amusing to observe
that, when one uses this realization, the reversibility of the OrnsteinUhlenbeck
process is equivalent to the time inversion invariance (cf. Exercise 4.3.11) of the
original Brownian motion.
8.5.2. OrnsteinUhlenbeck as an Abstract Wiener Space. So far, my
treatment of the OrnsteinUhlenbeck process has been based on its relationship
to Brownian motion. Here I will look at it as an abstract Wiener space.
Begin with the one-sided process
0) : t 0}. Seeing as this process
 t {U(t,
t
2
has the same distribution as e B e 1 : t 0}, it is reasonably clear
that the Hilbert space associated with this process should be the space HU (RN )
t
of functions hU (t) = e 2 h et 1), h H1 (RN ). Thus, define the map F U :
H1 (RN ) HU (RN ) accordingly, and introduce the Hilbert norm k kHU (RN )
on HU (RN ) that makes F U into an isometry. Equivalently,
Z
h d
i2
1
U 2
ds
(1 + s) 2 hU log(1 + s)
kh kHU (RN ) =
[0,) ds

1
U U 2
khU k2 2
= kh U k2 2
N .
N + h ,h
N +
L ([0,);R )

L ([0,);R )

L ([0,);R )

Note that
h U , hU


L2 ([0,);RN )

1
2

Z
[0,)

d U
|h (t)|2 dt =
dt

1
lim |hU (t)|2
2 t

= 0.
1

To check the final equality, observe that it is equivalent to limt t 2 |h(t)| = 0


1
1
for h H(RN ). Hence, since supt>0 t 2 |h(t)| khkH1 (RN ) and limt t 2 |h(t)|
= 0 if h has compact support, the same result is true for all h H1 (RN ). In
particular,
q
khU kHU (RN ) = kh U k2L2 ([0,);RN ) + 14 khU k2L2 ([0,);RN ) .

If we were to follow the prescription in Theorem 8.3.1, we would next complete


t
HU (RN ) with respect to the norm supt0 e 2 |hU (t)|. However, we already know

8.5 Euclidean Free Fields

347

from (8.5.2) that {U(t, 0) : t 0} lives on U (RN ), the space of (RN )


such that limt (log t)1 |(t)| = 0 with Banach norm
1
kk sup log(e + t)
|(t)|,
t0

and so we will adopt U (RN ) as the Banach space for HU (RN ). Clearly, the
dual space U (RN ) of U (RN ) can be identified with the space of RN -valued
Borel
measures on [0, ) that give 0 mass to {0} and satisfy kkU (RN )
R
log(e
+ t) ||(dt) < .
[0,)

(N )
Theorem 8.5.4. Let U0 M1 U (RN ) be the distribution of {U(t, 0) :
(N ) 
t 0} under W (N ) . Then HU (RN ), U (RN ), U0
is an abstract Wiener
space.

Proof: Since Cc (0, ); RN is contained in HU (RN ) and is dense in U (RN ),
we know that HU (RN ) is dense in U (RN ). In addition, because U (t) =
t
e 2 (et 1), where H1 (RN ), and k U kHU (RN ) = kkH1 (RN ) , k U ku
1
k U kHU (RN ) follows from |(t)| t 2 kkH1 (RN ) . Hence, HU (RN ) is continuously
embedded in U (RN ).
To complete the proof, remember our earlier calculation of the covariance of
{U(t; 0) : t 0}, and use it to check that
(N )

EU0



h, i2 =

ZZ
u0 (s, t) (ds) (dt),

where u0 (s, t) e

|st|
2

s+t
2

[0,)2

U
N
Hence, what I need to show is that if  U (RN ) hU
H (R ) is the
U
U
U
map determined by hh , i = h , h HU (RN ) , then

(8.5.5)

2
khU
kHU (RN ) =

ZZ
u0 (s, t) (ds) (dt).

[0,)2

In order to do this, we must first know how hU


is constructed from . But if
(8.5.5) is going to hold, then, by polarization,

U
e, hU
( ) RN = hh , ei =

ZZ
u0 (s, t) (ds) e, (dt)

[0,)2

Z
=

e,

!
u0 (, t) (dt)

[0,)

.
RN


RN

348

8 Gaussian Measures on a Banach Space

R
Thus, one should guess that hU
( ) = [0,) u0 (, t) (dt) and must check that,
U
U
N
U
U
N
with this choice, h
 H (R ), (8.5.5) holds, and, for all h H (R ),
U
U
U
hh , i = h , h HU (RN ) .
The key to proving all these is the equality
Z
Z
hU ( )u0 (, t) d = hU (t),
(*)
h U ( ) u0 (, t) d + 14
[0,)

[0,)

which is an elementary application of integration by parts. Applying (*) with


N = 1 to hU = u0 ( , s), we see that
Z
u0 (s, ) u0 (t, ) d = u0 (s, t),
[0,)
U
N
from which it follows easily both that hU
H (R ) and that (8.5.5) holds.
U
U
N
U
U
In addition, if h H (R ), then hh , i = h , hU
HU (RN ) follows from (*)
after one integrates both sides of the preceding with respect to (dt). 
I turn next to the reversible case. By the considerations in 8.4.1, we know
(N )
that the distribution UR of {UR (t) : t 0} under 0,1 W (N ) W (N ) is a
Borel measure on the space Banach space U (R; RN ) of continuous : R RN
such that lim|t| (log t)1 |(t)| = 0 with norm

1
kkU (R;RN ) sup log(e + |t|)
|(t)| < .
tR

Furthermore, it should be clear that one can identify U (R; RN ) with the space
of RN -valued Borel measures on R satisfying
Z
kkU (R;RN )
log(e + |t|) ||(dt) < .
R

Theorem 8.5.6. Take H1 (R; RN ) to be the separable Hilbert space of absolutely continuous h : R RN satisfying
khkH1 (R;RN )

2 2 N + 1 khk2 2 N < .
khk
4
L (R:R )
L (R:R )
(N ) 

Then H1 (R; RN ), U (R; RN ), UR

is an abstract Wiener space.

|st|
2

, and let U (R; RN ). By the same reasoning


Proof: Set u(s, t) e
as I used in the preceding proof,
hh, i = h, h


H1 (R;RN )

8.5 Euclidean Free Fields


and
kh k2H1 (R;RN ) =

349

ZZ
u(s, t) (ds) (dt)
RR




u(, t) (dt). Hence, since , (t) RN : t 0 & RN


(N ) 
(N )
spans a Gaussian family in L2 UR ; R and u(s, t)I = EUR (s) (t) , the
proof is complete. 
when h ( ) =

8.5.3. Higher Dimensional Free Fields. Thinking a la Feynman, Theorem


(N )
8.5.6 is saying that UR wants to be the measure on H 1 (R; R) given by
1

( 2)dim(H1 (R;RN ))


Z 
 
1
2
2
1

|h(t)| + 4 |h(t)| dt H1 (R;RN ) (dh),


exp
2 R

where H1 (R;RN ) is the Lebesgue measure on H1 (R; RN ).


I am now going to look at the analogous situation when N = 1 but the
parameter set R is replaced by R for some 2. That is, I want to look at
the measure that Feynman would have written as
1

( 2)dim(H 1 (R ;R))



Z

1
2
2
1
|h(x))| + 4 |h(x)| dx H 1 (R ;R) (dh),
exp
2 R

where H 1 (R ; R) is the separable Hilbert space obtained by completing the


Schwartz test function space S (R ; R) with respect to the Hilbert norm
khkH 1 (R ;R)

khk2L2 (R ;R) + 14 khk2L2 (R ;R) .

When = 1 this is exactly the Hilbert space H 1 (R; R) described in Theorem


8.5.6 for N = 1. When 2, generic elements of H 1 (R ; R) are better than
generic elements of L2 (R ; R) but are not enough better to be continuous. In
fact, they are not even well-defined pointwise, and matters get worse as gets
larger. Thus, although Feynmans representation is already questionable when
= 1, its interpretation when 2 is even more fraught with difficulties. As
we will see, these difficulties are reflected mathematically by the fact that, in
order to construct an abstract Wiener space for H 1 (R ; R) when 2, we will
have to resort to Banach spaces whose elements are generalized functions (i.e.,
distributions in the sense of L. Schwartz).2
2

The need to deal with generalized functions is the primary source of the difficulties that
mathematicians have when they attempt to construct non-trivial quantum fields. Without
going into any details, suffice it to say that in order to construct interacting (i.e., non-Gaussian)
fields, one has to take non-linear functions of a Gaussian field. However, if the Gaussian field
is distribution valued, it is not at all clear how to apply a non-linear function to it.

350

8 Gaussian Measures on a Banach Space

The approach that I will adopt is based on the following subterfuge. The space
H 1 (R ; R) is one of a continuously graded family of spaces known as Sobolev
spaces. Sobolev spaces are graded according to the number of derivatives better or worse than L2 (R ; R) their elements are. To be more precise, for each
s R, define the Bessel operator B s on S (R ; C) so that
 s
s () = 1 + ||2 2 ().
d

B
4

m
When s = 2m, it is clear that B s = 14 , and so, in general, it is reasonable
to think of B s as an operator that, depending on whether s 0 or s 0,
involves taking or restoring derivatives of order |s|. In particular, kkH 1 (R ;R) =
kB 1 kL2 (R ;R) for S (R ; R). More generally, define the Sobolev space
H s (R ; R) to be the separable Hilbert space obtained by completing S (R ; R)
with respect to
s
Z
s
1
s
1
2 d.

+ ||2 |h()|
khkH s (R ;R) kB hkL2 (R ;R) =
4

(2) R

Obviously, H 0 (R ; R) is just L2 (R ; R). When s > 0, H s (R ; R) is a subspace of L2 (R ; R), and the quality of its elements will improve as s gets larger.
However, when s < 0, some elements of H s (R ; R) will be strictly worse than
elements of L2 (R ; R), and their quality will deteriorate as s becomes more negative. Nonetheless, for every s R, H s (R ; R) S 0 (R ; R), where S 0 (R ; R),
whose elements are called real-valued tempered distributions, is the dual
space of S (R ; R). In fact, with a little effort, one can check that an alternative
description of H s (R ; R) is as the subspace of u S 0 (R ; R) with the property that B s u L2 (R ; R). Equivalently, H s (R ; R) is the isometric image in
S (R ; R) of L2 (R ; R) under the map B s , and, more generally, H s2 (R ; R) is
the isometric image of H s1 (R ; R) under B s2 s1 . Thus, by Theorem 8.3.1, once
we understand the abstract Wiener spaces for any one of the spaces H s (R ; R),
understanding the abstract Wiener spaces for any of the others comes down to
understanding the action of the Bessel operators, a task that, depending on what
one wants to know, can be highly non-trivial.
+1

Lemma 8.5.7. The space H 2 (R ; R) is continuously embedded as a dense


subspace of the separable Banach space C0 (R ; R) whose elements are continuous functions that tend to 0 at infinity and whose norm is the uniform norm.
Moreover, given a totally finite, signed Borel measure on R , the function
Z
1
|xy|
2
2
,
(dy), with K
h (x) K
e
+1
R
2

is an element of H

+1
2

kh k

(R ; R),

ZZ

2
H

+1
2

(R ;R)

= K
R R

|xy|
2

(dx)(dy),

8.5 Euclidean Free Fields

351

and
hh, i = h, h


H

+1
2

for each h H

(R ;R)

+1
2

(R ; R).

Proof: To prove the initial assertion, use the Fourier inversion formula to write
Z

d
h(x) = (2)
e 1(x,)R h()
R

for h S (R ; R), and derive from this the estimate


 12
Z
 +1

2 2
1
khk
d
+
||
khku (2) 2
4

Hence, since H
norm k k +1
H

+1
2

+1
2

(R ;R)

(R ; R) is the completion of S (R ; R) with respect to the


+1
, it is clear that H 2 (R ; R) is continuously embedded in

(R ;R)

+1

C0 (R ; R). In addition, since S (R ; R) is dense in C0 (R ; R), H 2 (R ; R) is


also.
To carry out the next step, let be given, and observe that the Fourier
 +1
2
() and therefore that
transform of B +1 is 14 + ||2

e 1(x,)R ()
1
+1
d
B
(x) =
+1
 2
(2) R
1
2
+
||

4
Z
Z
1(yx,)R
e
1

d (dy).
=
 +1
(2) R
2
R 1 + ||2
4

Now use (3.3.19) (with N = and t = 12 ) to see that

Z
|yx|
e 1(yx,)R
1
2
,
d
=
K
e

+1

(2) R 1 + ||2 2

and thereby arrive at h = B


kh k2

+1
2

(R ;R)

4
+1

. In particular, this shows that


Z
2

|()|
1
d < .
=
(2) R 1 + ||2  +1
2
4

Now let h S (R ; R), and use the preceding to justify



+1
+1
hh, i = hB 2 h, B 2 B +1 i = h, h +1
H

(R ;R)

+1

Since both sides are continuous with respect to convergence in H 2 (R ; R), we



+1
for all h H 2 (R ; R). In
have now proved that hh, i = h, h +1
H

(R ;R)

particular,
kh k

ZZ

2
H

+1
2

(R ;R)

= hh , i = K
R R

|yx|
2

(dx)(dy). 

352

8 Gaussian Measures on a Banach Space


+1

Theorem 8.5.8. Let 2 (R ; R) be the space of continuous : R R sat1


+1
isfying lim|x| log(e+|x|)
|(x)| = 0, and turn 2 (R ; R) into a separable
1
= supxRN log(e + |x|)
|(x)|. Then
Banach space with norm kk +1

+1
2

(R ;R)

(R ; R) is continuously embedded as a dense subspace of


H

+1
there is a W +1 M1 2 (R ; R) such that
H

+1
2

(R ; R), and

(R ;R)

+1
2

(R ; R),

+1
2

(R ; R), W


+1
2

(R ;R)


is an abstract Wiener space. Moreover, for each 0, 12 , W

every is H
older continuous of order and, for each > 12 , W

+1
2

(R ;R)

+1
2

(R ;R)

-almost

-almost

no is anywhere H
older continuous of order .
Proof: The initial part of the first assertion follows from the first part of
Lemma 8.5.7 plus the essentially trivial fact that C0 (R ; R) is continuously em+1
bedded as a dense subspace of 2 (R ; R). Further, by the second part of
that same lemma combined with Theorem 8.3.3, we will have proved the second part of the first assertion here once we show that, when {hm : m 0} is
P
+1
an orthonormal basis in H 2 (R ; R), the Wiener series m=0 m hm converges
+1
N
-almost every = (0 , . . . , m , . . . ) RN . Thus, set
in 2 (R ; R) for 0,1
Pn
Sn () = m=0 m hm for n 1. More or less mimicking the steps outlined in
Exercise 8.3.21, I will begin by showing that, for each 0, 12 and R [1, ),

(*)

|Sn (y) Sn (x)|


< ,
|y x|
n0 x,yQ(z,R)

sup E0,1 sup

zR

sup

x6=y

where Q(z, R) = z + [R, R) . Indeed, by the argument given in that exercise combined with the higher dimensional analog of Kolmogorovs continuity
criterion in Exercise 4.3.18, (*) will follow once we show that

N 
E0,1 |Sn (y) Sn (x)|2 C|y x|,

x, y R ,

for some C < . To this end, set = y x , and apply Lemma 8.5.7 to check
E

N
0,1

n

 X
2
2
|Sn (y) Sn (x)| =
hm , h

+1
2

(R ;R)

m=0

kh k2

+1
2

(R ;R)

= 2K 1 e

|yx|
2

Knowing (*), it becomes an easy matter to see that there exists a measurable S : R RN R such that x
S(x, ) is continuous of each and

8.5 Euclidean Free Fields

353

N
Sn ( , ) S( , ) uniformly on compacts for 0,1
-almost every RN . In
N
fact, because of (*), it suffices to check that limn Sn (x) exists 0,1
-almost

surely for each x R , and this follows immediately from Theorem 1.4.2 plus

Var m hm (x) =

m=0

hm , hx

2
H

+1
2

= khx k2

(R ;R)

m=0

+1
2

(R ;R)

= K .

N
-almost every , x
S(x, )
Furthermore, again from (*), we know that, 0,1

1
is -H
older continuous so long as 0, 2 .
N
I must still check that, 0,1
-almost surely, the convergence of Sn ( , ) to
+1
S( , ) is taking place in 2 (R ; R), and, in view of the fact that we already
N
know that, 0,1
-almost surely, it is taking place uniformly on compacts, this
reduces to showing that
1
N
lim log(e + |x|
sup |Sn (x)| 0 0,1
-almost surely.

|x|

n0

For this purpose, observe that (*) says that




N
0,1
sup E
sup kSn ku,Q(z,1) < ,
zR

n0

where k ku,C denotes the uniform norm over a set C R . At this point, I
would like to apply Ferniques
Theorem (Theorem 8.2.1) to the Banach space

` N; Cb (Q(z, 1); R) and thereby conclude that there exists an > 0 such that



N
(**)
B sup E0,1 exp sup kSn k2u,Q(z,1)
< .
zR

n0


However, ` N; Cb (Q(z, 1); R) is not separable. Nonetheless, there are two
ways to get around this technicality. The first is to observe that the only place
separability was used in the proof of Ferniques Theorem was at the beginning,
where I used it to guarantee that BE is generated by the maps x
hx, x i as

x runs over E and therefore that the distribution of X is determined by the


distribution of {hX, x i : x E }. But, even though ` N; Cb (Q(z, 1); R)
is not separable, one can easily check that it nevertheless possesses this property. The second way to deal
 with the problem is to apply his theorem to
` {0, . . . , N }; Cb (Q(z, 1); R) , which is separable, and to note that the resulting estimate can be made uniform in N N. Either way,
p one arrives at (**).
2
Now set (t) = et 1 for t 0. Then 1 (s) = 1 log(1 + s), and



sup kSn ku,Q(0,M ) = max sup kSn ku,Q(m,1) : m Q(0, M ) Z


n0

n0

mQ(0,M )Z


sup kSn ku,Q(m,1) .


n0

354

8 Gaussian Measures on a Banach Space

Thus, because 1 is concave, Jensens Inequality applies and yields





N
E0,1 sup kSn ku,Q(0,M ) 1 (2M ) B ,
n0

and therefore

"

Sn (x)

log(e
+ |x|)
|x|R n0

N
0,1

sup sup

1
m(log R) 4

h
i
N
E0,1 supn0 kSn ku,Q(0,em4 )

log(e + e(m1)4 )

1
m(log R) 4

p
log(1 + 2 e(m+1)4 B)

0
log(e + e(m1)4 )

as R .

To complete the proof, I must show that, for any > 12 , W

+1
2

(R ;R)

-almost

no is anywhere Holder continuous of order , and for this purpose I will proceed
as in the proof of Theorem 4.3.4. Because the {(x + y) : x R } has the same
W +1 -distribution for all y, it suffices for me to show that, W +1 H

(R ;R)

(R ;R)

almost surely, there is no x Q(0, 1) at which is Holder continuous of order


> 12 . Now suppose that 12 , 1 , and observe that, for any L Z+ and
e S1 , the set H() of s that are -Holder continuous at some x Q(0, 1)
is contained in
\

L n
\

:

m+`e
n

m+(`1)e 
n

M
n

o
.

M =1 n=1 mQ(0,n)Z `=1

Hence, again using translation invariance, we see that we need only show that
there is an L Z+ such that, for each M Z+ ,



(`1)e 
M
,
1

: `e
n W +1

n
n
n
H

(R ;R)

tends to 0 as n . To this end, set U (t, ) = K 2 (te), and observe that


the W +1 -distribution of {U (t) : t 0} is that of an R-valued ancient
H

(R ;R)

OrnsteinUhlenbeck process. Thus, what I have to estimate is



 `
`1 
`1
` 
P e 2n B e n e 2n B e n nM , 1 ` L ,


where B(t), Ft , P is an R-valued Brownian motion. But clearly this probability
is dominated by the sum of


`
`1 
` 
P B e n B e n M2ne 2n
, 1 ` L

Exercises for 8.5

355

and

P 1 ` L

`1 
1 
1 e 2n B e n

M e 2n
2n


.

M 2 n2(1)

8
, which, since < 1,
The second of these is easily dominated by 2Le
means that it causes no problems. As for the first, one can use the independence
of Brownian
increments
and Brownian
scaling to dominate it by the Lth power

 of

1 
P B(1)B e n M (2n )1 . Hence, I can take any L such that 12 L >
. 
As a consequence of the preceding and Theorem 8.3.1, we have the following
corollary.

Corollary 8.5.9. Given s R, set




+1
+1
s (R ; R) = B s 2 : 2 (R ; R) ,

kks (R ;R) = kB

+1
2 s

and
WH s (R ;R) = (B s

+1
2

) W

+1
2

(R ;R)

+1
2

(R ;R)
s

Then s (R ; R) is a separable Banach space in which H (R ; R) is continuously



embedded as a dense subspace, and H s (R ; R), s (R ; R), WH s (R ;R) is an
abstract Wiener space.
Exercises for 8.5
Exercise 8.5.10. In this exercise we will show how to use the OrnsteinUhlenbeck process to prove Poincar
es Inequality
Var0,1 () = k h, 0,1 ik2L2 (0,1 ;R) k0 k2L2 (0,1 ;R)

(8.5.11)

for the standard Gaussian distribution on R. I will outline the proof of (8.5.11)
for S (R; R), but the estimate immediately extends to any L2 (0,1 ; R)
whose (distributional) first derivative is again in L2 (0,1 ; R).
(i) For S (R; R), set
u (t, x) = EW

(1)



U (t, x) ,

where {U (t, x) : t 0} is the one-sided, R-valued OrnsteinUhlenbeck process


t
starting at x. Show that u0 (t, x) = e 2 u0 (t, x) and that

lim u (t, ) = and

t&0

lim u (t, ) = h, 0,1 i

in L2 (0,1 ; R).

Show that another expression for u is


!
Z
t
 1
(y e 2 x)2
t 2
dy.
(y) exp
u (t, x) = 2(1 e )
2(1 et )
R

Using this second expression, show that u (t, ) S (R; R) and that t
[0, ) 7 u (t, ) S (R; R) is continuous. In addition, show that u (t, x) =
1
00
0
2 u (t, x) xu (t, x) .

356

8 Gaussian Measures on a Banach Space

(ii) For 1 , 2 C 2 (R; R) whose second derivative are tempered, show that
1 , 002 x2


L2 (0,1 ;R)

= 01 , 02


L2 (0,1 ;R)

and use this together with (i) to show that, for any S (R; R),
hu (t, ), 0,1 i = h, 0,1 i and

d
ku (t, )k2L2 (0,1 ;R) = et ku0 (t, )k2L2 (0,1 ;R) .
dt

Conclude that ku (t, )kL2 (0,1 ;R) kkL2 (0,1 ;R) and
d
ku (t, )k2L2 (0,1 ;R) et k0 k2L2 (0,1 ;R) .
dt

Finally, integrate the preceding inequality to arrive at (8.5.11).


Exercise 8.5.12. In this exercise I will outline how the ideas in Exercise 8.5.10
can be used to give another derivation of the logarithmic Sobolev Inequality
(2.4.42). Again, I restrict my attention to S (R; R), since the general case
can be easily obtained from this by taking limits.
(i) Begin by showing that (2.4.42) for S (R; R) once one knows that
(*)

log


0,1

(0 )2


0,1

for uniformly positive R S (R; R).


(ii) Given a uniformly positive R S (R; R), use the results in Exercise
8.5.10 to show that



et u0 (t, )2
d

.
u (t, ) log u (t, ) 0,1 =
u (t, ) 0,1
2
dt

(iii) Continuing (ii), apply Schwarzs inequality to check that


u0 (t, x)2
u (0 )2 (t, x),
u (t, x)

and combine this with (ii) to get


et
d

u (t, ) log u (t, ) 0,1


2
dt

Finally, integrate this to arrive at (*).

(0 )2


.
0,1

Exercises for 8.5

357

Exercise 8.5.13. Although it should be clear that the arguments given in Exercises 8.5.10 and 8.5.12 work equally well in RN and yield (8.5.11) and (2.4.42)
with 0,1 replaced by 0,I and (0 )2 replaced by ||2 , it is significant that each
of these inequalities for R implies its RN analog. Indeed, show that Fubinis Theorem is all that one needs to pass to the higher dimensional results. The reason
why this remark is significant is that it allows one to prove infinite dimensional
versions of both Poincares Inequality and the logarithmic Sobolev Inequality,
and both of these play a crucial role in infinite dimensional analysis. In fact,
Nelsons interest in hypercontractive estimates sprung from his brilliant insight
that hypercontractive estimates would allow him to construct a non-trivial (i.e.,
non-Gaussian), translation invariant quantum field for R2 .
Exercise 8.5.14. It is interesting to see what happens if one changes the sign
of the second term on the right-hand side of (8.5.1), thereby converting the
centripetal force into a centrifugal one.
(i) Show that, for each (RN ), the unique solution to
V(t, ) = (t) +

1
2

V(, ) d,

t 0,

is

V(t, ) = e 2

e 2 d( ),

where the integral is taken in the sense of RiemannStieltjes.





(ii) Show that , V(t, ) RN : (t, ) [0, ) RN under W (N ) is a Gaussian
family with covariance
v(s, t) = e

s+t
2

|ts|
2

(iii) Let {B(t) : t 0} be an RN -valued Brownian motion, and show that the
distribution of


 t
e 2 B 1 et : t 0

is the W (N ) -distribution of {V(t) : t 0}. Next, let V (RN ) be the space of


continuous : [0, ) RN with the properties that
(0) = 0 = lim et |(t)|,
t


and set kkV (RN ) supt0 et |(t)|. Show that V (RN ); k kV (RN ) is a

separable Banach space and that there exists a unique V (N ) M1 V (RN )
such that the distribution of {(t) : t 0} under V (N ) is the same as the
distribution of {V(t) : t 0} under W (N ) .

358

8 Gaussian Measures on a Banach Space

(iv) Let HV (RN ) be the space of absolutely continuous h : [0, ) RN with


the properties that h(0) = 0 and h 12 h L2 [0, ); RN . Show that HV (RN )
with norm


khkHV (RN ) h 12 h L2 ([0,);RN )
V
N
is a separable Hilbert space that is continuously embedded
 in (R ) as a dense
V
N
V
N
(N )
subspace. Finally, show that H (R ), (R ), V
is an abstract Wiener
space.

(v) There is a subtlety here that is worth mentioning. Namely, show that
HU (RN ) is isometrically embedded in HV (RN ). On the other hand, as distinguished from elements of HU (RN ), it is not true that k 12 k2L2 (R;RN ) =
2L2 (R;RN ) + 41 kk2L2 (R;RN ) , the point being that whereas the elements h of
kk

HV (RN ) with h Cc (0, ); RN are dense in HU (RN ), they are not dense in

HV (RN ).
Exercise 8.5.15. Given x R and a slowly increasing C(R ; R), define
x C(R ; R) so that x (y) = (x + y) for y R . Next, extend x
to S 0 (R ; R) so that h, x ui = hx , ui for S (R ; R), and check that
this is a legitimate extension in the sense that it is consistent with the original
definition when applied to us that are slowly increasing, continuous functions.
Finally, given s R, define Ox : H s (R ; R) H s (R ; R) by Ox h = x h.
(i) Show that B s x = x B s for all s R and x R .
(ii) Given s R, define Ox = x  H s (R ; R), and show that Ox is an orthogonal
transformation.
(iii) Referring to Theorem 8.3.14 and Corollary 8.5.9, show that the measure

preserving transformation TOx that Ox determines on s (R ; R), WH s (R ;R) is
the restriction of x to s (R ; R).

(iv) If x 6= 0, show that TOx is ergodic on s (R ; R), WH s (R ;R) .
8.6 Brownian Motion on a Banach Space
In this concluding section I will discuss Brownian motion on a Banach space.
More precisely, given a non-degenerate, centered, Gaussian measure W on a
separable Banach space E, we will see that there exists an E-valued stochastic
process {B(t) : t 0} with the properties that B(0) = 0, t
B(t) is continuous,

and, for all 0 s < t, B(t) B(s) is independent of {B( ) : [0, s]} and
has distribution (cf. the notation in 8.4) Wts .
8.6.1. Abstract Wiener Formulation. Let W on E be as above, use H
to denote its CameronMartin space, and take H 1 (H) to be the Hilbert space
of absolutely continuous h : [0, ) H such that h(0) = 0 and khkH 1 (H) =
L2 ([0,);H) < . Finally, let (E) be the space of continuous : [0, )
khk

8.6 Brownian Motion on a Banach Space

359

E
= 0, and turn (E) into a Banach space with norm
E satisfying limt k(t)k
t
1
kk(E) = supt0 (1 + t) k(t)kE . By exactly the same line of reasoning as
I used when E = RN , one can show that (E) is a separable Banach space in
which H 1 (E) is continuously embedded as a dense subspace. My goal is to prove
the following statement.

Theorem 8.6.1. With H 1 (H) and (E) as above, there is a unique W (E)
M1 (E) such that H 1 (H), (E), W (E) is an abstract Wiener space.
1
Choose an orthonormal basis {h1m : m 0} in H
(R), and, for n 0, t 0,
P
n
N
1
and x = (x0 , . . . , xm , . . . ) E , set Sn (t, x) =
m=0 hm (t)xm . I will show
N
that, W -almost surely, {Sn ( , x) : n 0} converges in (E), and, for the
most part, the proof follows the same basic line of reasoning as that suggested in
Exercise 8.3.21 when E = RN . However, there is a problem here that we did not
encounter there. Namely, unless E is finite dimensional, bounded subsets will
not necessarily be relatively compact in E. Hence, local uniform equicontinuity
plus local boundedness is not sufficient to guarantee
that a collection of E-valued

paths is relatively compact in C [0, ); E , and that is the reason why we have
to work a little harder here.

Lemma 8.6.2. For W N -almost every x E N , {Sn ( , x) : n 0} is relatively


compact in (E).
Proof: Choose E0 E, as in Corollary 8.3.10, so
 that bounded subsets of E0
are relatively compact in E and H, E0 , W  E0 is again an abstract Wiener
space. Without loss in generality, I will assume
 that
 k kE k kE0 , and, by
Ferniques Theorem, we know that C EW0 kxk4E0 < .

Pn
Since (cf. Exercise 8.2.14) Sn (t, x) Sn (s, x) = m=0 h1t h1s , h1m H 1 (R) xm ,
where h1 = , the W0N -distribution of Sn (t) Sn (s) is Wn , where 2n =



Pn 1
1
1 2
WN
kSn (t) Sn (s)k4E0 C(t s)2 .
0 ht hs , hm H 1 (R) t s. Hence, E
In addition, {kSn (t) Sn (s)kE0 : n 1} is a submartingale, and so, by Doobs
Inequality plus Kolmogorovs Continuity Criterion, there exists a K < such
that, for each T > 0,
(*)

EW


sup

sup

n0 0s<tT

kSn (t) Sn (s)kE0

(t s)

1
8

KT 4 .

From (*) and Sn (0) = 0, we know that, W N -almost surely, {Sn ( , x) : n 0} is


uniformly k kE0 -bounded and uniformly k kE0 -equicontinuous on each interval
[0, T ]. Since this means that, for every T > 0, {Sn (t, x) : n 0 & t [0, T ]}
is relatively compact in E and {Sn ( , x)  [0, T ] : n 0} is uniformly k kE equicontinuous W N -almost surely, the AscoliArzela Theorem guarantees that,
W N -almost surely, {Sn ( , x) : n 0} is relatively compact in C [0, ); E with

360

8 Gaussian Measures on a Banach Space

the topology of uniform convergence on compacts. Thus, in order to complete


the proof, all that I have to show is that, W N -almost surely,
lim sup sup

T n0 tT

kSn (t, x)kE


= 0.
t

But,
sup
t2k

X 7`
X
kSn (t, x)kE
kSn (t, x)kE

2 8

sup
t
t
`
`+1
2 t2
`k

`k

and therefore, by (*),


"
EW

kSn (t, x)kE


sup sup
t
n0 t2k

sup
0t2`+1

kSn (t, x)kE


1

t8

24 K
1
8

2 1

2 8 . 

Now that we have the requisite compactness of {Sn : n 0}, convergence


comes to checking a criterion of the sort given in the following simple lemma.
Lemma 8.6.3. Suppose that {n : n 0} is a relatively compact sequence in
(E). If limn hn (t), x i exists for each t in a dense subset of [0, ) and x
in a weak* dense subset of E , then {n : n 0} converges in (E).
Proof: For a relatively compact sequence to be convergent, it is necessary and
sufficient that every convergent subsequence have the same limit. Thus, suppose
that and 0 are limit points of {n : n 0}. Then, by hypothesis, h(t), x i =
h0 (t), x i for t in a dense subset of [0, ) and x in a weak* dense subset of E .
But this means that the same equality holds for all (t, x ) [0, ) E and
therefore that = 0 . 
Proof of Theorem 8.6.1: In view of Lemmas 8.6.2 and 8.6.3 and the separability of E in the weak* topology, we will know that {Sn ( , x) : n 0}
converges in (E) for W N -almost every x E N once we show that, for each
(t, x ) [0, ) E , {hSn (t, x), x i : n 0} converges
for W N -almost
Pn in R
N

1
every x E . But if x E , then hSn (t, x), x i = 0 hxm , x ihm (t), the random variables x
hxm , x ih1m (t) are P
independent, centered Gaussians under

N
W with variance khx k2H h1m (t)2 , and 0 h1m (t)2 = kht k2H 1 (R) = t. Thus, by
Theorem 1.4.2, we have the required convergence.
Next, define B : [0, ) E N E so that

limn Sn (t, x) if {Sn ( , x) : n 0} converges in (E)
B(t, x) =
0
otherwise.

Given (E) , determine h H 1 (H) by h, h H 1 (H) = hh, i for all h
H 1 (H). I want to show that, under W N , x

hB( , x), i is a centered Gaussian

8.6 Brownian Motion on a Banach Space

361

with variance kh k2H 1 (H) . To this end, define xm E so that1 hx, xm i =


hh1m x, i for x E. Then,
n
X
hB( , x), i = lim hSn ( , x), i = lim
hxm , xm i
n

W N -almost surely.

Hence, hB( , x), i is certainly a centered Gaussian under W N , and, because we


are dealing with Gaussian random variables, almost sure convergence implies L2 convergence. To compute its variance, choose an orthonormal basis {hk : k 0}
for H, and note that, for each m 0,
WN

X


2
2
hxm , xm i = khxm kH =
hh1m hk , i2 .
k=0

Thus, since {h1m hk : (m, k) N2 } is an orthonormal basis in H 1 (H),


WN

X
X


2
2
1
2
hB( ), i =
hhm hk , i =
h1m hk , h H 1 (H) = kh k2H 1 (H) .
m,k=0

m,k=0

Finally, to complete the proof, all that remains is to take W (E) to be the
W N -distribution of x
B( , x). 
8.6.2. Brownian Formulation. Let (H, E, W) be an abstract Wiener space.
Given a probability space (, F, P), a non-decreasing family of sub--algebras
{Ft : t 0},
 and a measurable map B : [0, ) E, say that the triple
B(t), Ft , P is a W-Brownian motion if
(1) B is {Ft : t 0}-progressively measurable,

(2) B(0, ) = 0 and B( , ) C [0, ); E for P-almost every ,
(3) B(1) has distribution W, and, for all 0 s < t, B(t)B(s) is independent
1
of Fs and has the same distribution as (t s) 2 B(1).

Lemma 8.6.4. Suppose


that {B(t) : t 0} satisfies conditions (1) and (2).


Then B(t), Ft , P is a W-Brownian motion if and only if hB(t), x i, Ft , P is
an R-valued Brownian
motion for each x E with khx kH = 1. In addition,

if B(t), Ft , P is a W-Brownian motion, then the span G(B) of {hB(t), x i :
(t, x ) [0, ) E } is a Gaussian family in L2 (P; R) and



(8.6.5)
EP hB(t1 ), x1 ihB(t2 ), x2 i = (t1 t2 ) hx1 , hx2 H .
Conversely, if G(B) is a Gaussian family in L2 (P; R) and (8.6.5) holds,
then

B(t), Ft , P is a W-Brownian motion when Ft = {B( ) : [0, t]} .
1

Given h1 H 1 (R) and x E, I use h1 x to denote the element of (E) determined by


(t) = h1 (t)x.

362

8 Gaussian Measures on a Banach Space


Proof: If B(t), Ft , P is a W-Brownian motion and x E with khx kH = 1,
then hB(t), x i hB(s), x i = hB(t) B(s), x i is independent of Fs and is a
centered Gaussian with variance (t s). Thus, hB(t), x i, Ft , P is an R-valued
Brownian motion.

Next assume that hB(t), x i, Ft , P is an R-valued Brownian motion for every
x with khx kH = 1. Then hB(t) B(s), x i is independent of Fs for every
x E , and so, since BE is generated by {h , x i : x E }, B(t) B(s) is
independent of Fs . In addition, hB(t) B(s), x i is a centered Gaussian with
variance (t s)khx k2H , and therefore B(1) has distribution W and B(t) B(s)

1
has the same distribution as (t s) 2 B(1). Thus, B(t), Ft , P is a W-Brownian
motion.

Again assume that B(t), Ft , P is a W-Brownian motion. To prove that
G(B) is a Gaussian family for which (8.6.5) holds, it suffices to show that, for
all 0 t1 t2 and x1 , x2 E , hB(t1 ), x1 i + hB(t2 ), x2 i is a centered Gaussian
with covariance t1 khx1 + hx2 k2H + (t2 t1 )khx2 k2H . Indeed, we would then
know not only that G(B) is a Gaussian family but also that the variance of
hB(t1 ), x1 i hB(t2 ), x2 i is t1 khx1 hx2 k2H + (t2 t1 )khx2 k2H , from which (8.6.5)
is immediate. But

hB(t1 ), x1 i + hB(t2 ), x2 i = hB(t1 ), x1 + x2 i + hB(t2 ) B(t1 ), x2 i,


and the terms on the right are independent, centered Gaussians, the first with
variance t1 khx1 + hx2 k2H and the second with variance (t2 t1 )khx2 k2H .

Finally, take Ft = {B( ) : [0, t]} , and assume that G(B) is a Gaussian
family satisfying (8.6.5). Given x with khx kH = 1 and 0 s < t, we know
that hB(t) B(s), x i = hB(t), x i hB(s), x i is orthogonal in L2 (P; R) to
hB( ), y i for every [0, s] and y E . Hence, since Fs is generated by
{hB( ), y i : (, y ) [0, s]E }, we know that hB(t)B(s), x i is independent
of Fs . In addition, hB(t) B(s), x i is a centered
Gaussian with variance t s,

and so we have proved that hB(t), x i, Ft , P is an R-valued Brownian
 motion.
Now apply the first part of the lemma to conclude that B(t), Ft , P is a WBrownian motion. 
Theorem 8.6.6. Refer to the notation in
 Theorem 8.6.1. When = (E),
F = BE , and Ft = {( ) :  [0, t]} , (t), Ft , W (E) is a W-Brownian
motion. Conversely, if B(t), Ft , P is any W-Brownian motion, then B( , )
(E) P-almost surely and W (E) is the P-distribution of
B( , ).
Proof: To prove the first assertion, let t1 , t2 [0, ) and x1 , x2 E be given,
and define i (E) so that h, i i = h(ti ), xi i for i {1, 2}. Then (cf. the
notation in the proof of Theorem 8.6.1) hi = h1ti hxi , and so
EW

(E)





h(t1 ), x1 ih(t2 ), x2 i = h1 h2 H 1 (H) = (t1 t2 ) hx1 , hx2 H .

8.6 Brownian Motion on a Banach Space

363

Starting from this, it is an easy matter to check that the span of {h(t), x i :
(t, x ) [0, ) E } is a Gaussian family in L2 (W (E) ; R) that satisfies (8.6.5).
To prove the converse, begin by observing that, because G(B) is a Gaussian
family satisfying (8.6.5), the distribution of 7 B( , ) C [0, ); E

under P is the same as that of (E) 7 ( ) C [0, ); E under W (E) .
Hence




k(t)kE
kB(t)kE
(E)
= 0 = 1,
lim
=0 =W
P lim
t
t
t
t

and so B( , ) (E) P-almost surely and the distribution of


(E) is W (E) . 

B( , ) on

8.6.3. Strassens Theorem Revisited. What I called Strassens Theorem


in 8.4.2 is not the form in which Strassen himself presented it. Instead, his
formulation was in terms of rescaled R-valued Brownian motion, not partial sums
of independent random variables. The true statement of Strassens Theorem is
the following in the present setting.
Theorem 8.6.7 (Strassen). Given (E), define n (t) = (nt)
n for n 1
q
and t [0, ), where n = 2n log(2) (n 3). Then, for W (E) -almost every ,
the sequence {n : n 0} is relatively compact in (E) and BH 1 (H) (0, 1) is its
set of limit points. Equivalently, for W (E) -almost every ,

lim kn BH 1 (H) (0, 1)k(E) = 0

and, for each h BH 1 (H) (0, 1), limn kn hk(E) = 0.

Not surprisingly, the proof differs only slightly from that of Theorem 8.4.4.
In proving the W (E) -almost sure convergence of {n : n 1} to BH 1 (H) (0, 1),
there are two new ingredients here. The first is the use of the Brownian scaling
invariance property (cf. Exercise 8.6.8), which says that the W (E) is invariant
1
under the scaling maps S : (E) (E) given by S = 2 ( ) for
> 0 and is easily proved as a consequence of the fact that these maps are
isometric from H 1 (H) onto itself. The second new ingredient is the observation
that, for any R > 0, r (0, 1], and (E), k(r ) BH 1 (H) (0, R)k(E)
k BH 1 (H) (0, R)k(E) . To see this, let h BH 1 (H) (0, R) be given, and check
that h(r ) is again in BH (0, R) and that k(r ) h(r )k(E) k hk(E) .

364

8 Gaussian Measures on a Banach Space

Taking these into account and applying (8.4.2), one can now justify




W (E) m1max m
n BH 1 (H) (0, 1) (E)

n
!

m

2 (n m )
(E)

BH 1 (H) (0, 1)
=W
max


n
m1 n m
(E)







m1

[
]

m
W (E) m1max m m n BH 1 (H) 0,

m

n
2 [ m1 ]
2
(E)






[ m1 ]

m
W (E) BH 1 (H) 0,

m


2 [ m1 ]
2
(E)

 m


1
B
(0,
1)

= W (E) 2 1
m1
H (H)
[
]
(E)



R2 [ m1 ]
(E)
m1
log(2) [
]
= W m 2
k BH 1 (H) (0, 1)k(E) exp
m
[ m1 ]

for all (1, 2), R < inf{khkH 1 (H) : khk(E) }, and sufficiently large m 1.
Armed with this information, one can simply repeat the argument given at the
analogous place in the proof of Theorem 8.4.4.
The proof that, W (E) -almost surely, n approaches every h C infinitely often
also requires only minor modification. To begin, one remarks that if A (E)
is relatively compact, then
k(t)kE
= 0.
T A t[T
/ 1 ,T ] 1 + t
lim sup

sup

Thus, since, by the preceding, for W (E) -almost every , the union of {n : n 1}
and BH 1 (H) (0, 1) is relatively compact in (E), it suffices to prove that



n (t) n (k 1 ) h(t) h(k 1 ) kE
= 0 W (E) -almost surely
lim sup
1+t
n t[k1 ,k]

for each h BH 1 (H) (0, 1) and k 2. Because, for a fixed k 2, the random

variables k2m k2m (k 1 )  [k 1 , k], m 1, are W (E) -independent random
variables, we can use the BorelCantelli Lemma as in 8.4.2 and thereby reduce
the problem to showing that, if km (t) = km (t + k 1 ) km (k 1 ), then


W (E) kk2m hk(E) =

m=1

for each > 0, k 2, and h BH 1 (H) (0, 1). Finally, since W (E) km 1 is the
k2m
W (E) distribution of
k2m , the rest of the argument is the same as the one
given in 8.4.2.

Exercises for 8.6

365

Exercises for 8.6



Exercise 8.6.8. Let H 1 (H), (E), W (E) be as in Theorem 8.6.1.
1

(i) Given > 0, define S : (E) (E) so that S (t) = 2 (t), t


[0, ), and show that (S ) W (E) = W (E) . Again, this property is called Brownian scaling invariance.

(ii) Define I : (E) C [0, ); E so the I(0) = 0 and I(t) = t(t1 ) for
t > 0. Show that I is an isometry from (E) onto itself and that I  H 1 (H)
is an isometry on H onto itself. Finally, use this to prove the Brownian time
inversion invariance property: I W (E) = W (E) .

Exercise 8.6.9. Let H U (H) be the Hilbert space of absolutely continuous hU :


R H with the property that
q
khkH U (H) = kh U k2L2 (R;H) + 14 khU k2L2 (R;H) < ,

and take U (E) to be the Banach space of continuous U : R E satisfying


U
k U (t)
. If F : (E)
= 0 with norm kU kU (E) = suptR log(e+|t|)
lim|t| klog(t)k
t
t

C(R; E) is given by [F ()](t) = e 2 (et ), show


 that F takes (E) continuously
into U (E) and that H U (H), U (E), U (E) is an abstract Wiener space when
(E)
(E)
UR = F W (E) . Of course, one should recognize the measure UR as the
distribution of an E-valued, reversible, OrnsteinUhlenbeck process.

Exercise 8.6.10. A particularly interesting case of the construction in Exercise


8.6.9 is when H = H 1 (RN ) and E = (RN ). Working
in that setting, define

B : R [0, ) U (E) RN by B (s, t), = [(s)](t), and show that,
(RN ) 
for each s R, B(s, t), F(s,t) , UR
is an RN -valued Brownian motion when
F(s,t) = {B(s, ) : [0, t]} . Next, for each t [0, ), show that the

(E)
UR -distribution of
B( , t) is that of t times a reversible, RN -valued
OrnsteinUhlenbeck process.

Exercise 8.6.11. Continuing in the same setting as in the preceding, set 2 =



(E) 
EW
kk2(E) , and combine the result in Exercise 8.2.16 with Brownian scaling
invariance to show that
!


R2
(E)
,
W
sup k(t)kE R K exp
72 2 t
[0,t]

where K is the constant in Ferniques Theorem. Next, use this together with
Theorem 8.4.4 and the reasoning in Exercise 4.3.16 to show that

k(t)kE
k(t)kE
= L = lim q
lim q
t&0
2t log(2) t
2t log(2)


where L = sup khkE : h BH (0, 1) .
t

W (E) -almost surely,


1
t

366

8 Gaussian Measures on a Banach Space

Exercise 8.6.12. It should be recognized that Theorem 8.4.4 is an immediate


corollary of Theorem 8.6.7. To see this, check that {(n) : n 1} has the same
distribution under W (E) as {Sn : n 1} has under W N and that BH (0, 1) =
{h(1) : h BH 1 (H) }, and use these to show that Theorem 8.4.4 follows from
Theorem 8.6.7.

Exercise 8.6.13. For (E) and n Z+ , define n (E) so that


n (t) =

log(2) (n 3)

 
t
,
n

t [0, ),

and show that, W (E) -almost surely, {n : n 1} is relatively compact in (E)


and that BH 1 (H) (0, 1) is the set of its limit points.

Hint: Referring to (ii) in Exercise 8.6.8, show that it suffices to prove these
properties for the sequence {(I)n : n 1}. Next check that




(I)n Ih
=
n h (E)
(E)

for h H 1 (H),

and use Theorem 8.6.7 and the fact that I is an isometry of H 1 (H) onto itself.

Chapter 9
Convergence of Measures on a Polish Space

In Chapters 2 and 3, I introduced a notion of convergence on M1 (RN ) that is


appropriate when discussing either Central Limit phenomena or the sort of limits
that arose in connection with infinitely divisible laws. In this chapter, I will give
a systematic treatment of this sort of convergence and show how it extends
to probability measures on any Polish space, that is, any complete, separable,
metric space. Unfortunately, this extension will entail an excursion into territory
that borders on abstract nonsense, although I hope to avoid crossing that border.
In any case, just as Banachs great achievement was the ingenious use for infinite
dimensional vector spaces of completeness to replace local compactness, so here
we will have to learn how to substitute compactness by completeness in measure
theoretic arguments.
9.1 ProhorovVaradarajan Theory
The goal in this section is to generalize results like Lemma 2.1.7 and Theorem
3.1.1 to a very abstract setting.
9.1.1. Some Background. When discussing the convergence of probability measures on a measurable space (E, B), one always has at least two senses
in which the convergence may take place, and (depending on additional structure that the space may possess)
one may have more. To be more precise,

let B(E; R) B (E, B); R be the space of bounded, R-valued, B-measurable
functions on E, use M1 (E) M1 (E, B) to denote the space of all probability
measures on (E, B), and define the duality relation
Z
h, i =
d for B(E; R) and M1 (E).
E

Next, again use kku supxE |(x)| to denote the uniform norm of
B(E; R), and consider the neighborhood basis at M1 (E) determined by
the sets




U (, r) = M1 (E) : h, i h, i < r for B(E, R) with kku 1
as r runs over (0, ). For obvious reasons, the topology defined by these neighborhoods U is called the uniform topology on M1 (E). In order to develop
some feeling for the uniform topology, I will begin by examining a few of its
elementary properties.
367

368

9 Convergence of Measures on a Polish Space

Lemma 9.1.1.
M1 (E) by

Define the variation distance between elements and of

n
o

k kvar = sup h, i h, i : B(E; R) with kku 1 .
Then (, ) M1 (E)2 7 k kvar is a metric on M1 (E) that is compatible
with the uniform topology. Moreover, if , M1 (E) are two elements of
M1 (E) and is any element of M1 (E) with respect to which both and are
absolutely continuous (e.g., +
2 ), then

(9.1.2)

k kvar = kg f kL1 (;R) ,

where f =

d
.
and g =

In particular, k kvar 2, and equality holds precisely when (i.e., they


are singular). Finally, the metric (, ) M1 (E)2 7 k kvar is complete.
Proof: The first assertion needing comment is the one in (9.1.2). But, for every
B(E; R) with kku 1,


Z



h, i h, i = (g f ) d kg f kL1 (;R) ,
E

and equality holds when = sgn (g f ). To prove the assertion that follows
(9.1.2), note that
kg f kL1 (;R) kf kL1 (;R) + kgkL1 (;R) = 2
and that the inequality is strict if and only if f g > 0 on a set of strictly positive
-measure or, equivalently, if and only if 6 . Thus, all that remains is to
check the completeness assertion. To this end, let {n : n 1} M1 (E)
satisfying
lim sup kn m kvar = 0
m nm

P
be given, and set = n=1 2n n . Clearly, is an element of M1 (E) with
n
respect to which each n is absolutely continuous. Moreover, if fn = d
d , then,
1
by (9.1.2), {fn : n 1} is a Cauchy convergent sequence in L (; R). Hence,
since L1 (; R) is complete, there is an f L1 (; R) to which the fn s converge in
L1 (; R). Obviously, we may choose f to be non-negative, and certainly it has
-integral 1. Thus, the measure given by d = f d is an element of M1 (E),
and, by (9.1.2), kn kvar 0. 
As a consequence of Lemma 9.1.1, we see that the uniform topology on M1 (E)
admits a complete metric and that convergence in this topology is intimately related to L1 -convergence in the L1 -space of an appropriate element of M1 (E).

9.1 ProhorovVaradarajan Theory

369

In fact, M1 (E) looks in the uniform topology like a galaxy that is broken into
many constellations, each constellation consisting of measures that are all absolutely continuous with respect to some fixed measure. In particular, there will
usually be too many constellations for M1 (E) in the uniform topology to be
separable. To wit, if E is uncountable and {x} B for every x E, then the
point masses x , x E, (i.e., x () = 1 (x)) form an uncountable subset of
M1 (E) and ky x kvar = 2 for y 6= x. Hence, in this case, M1 (E) cannot be
covered by a countable collection of open k kvar -balls of radius 1.
As I said at the beginning of this section, the uniform topology is not the only
one available. Indeed, for many purposes and, in particular, for probability theory, it is too rigid a topology to be useful. For this reason, it is often convenient
to consider a more lenient topology on M1 (E). The first one that comes to mind
is the one that results from eliminating the uniformity in the uniform topology.
That is, given a M1 (E), define
o


 n
(9.1.3) S , ; 1 , . . . , n M1 (E) : max hk , i hk , i <
1kn

for n Z+ , 1 , . . . , n B(E; R), and > 0. Clearly these sets S determine a


Hausdorff topology on M1 (E) in which the net { : A} converges to if
and only if lim h, i = h, i for every B(E; R). For historical reasons,
in spite of the fact that it is obviously weaker than the uniform topology, this
topology on M1 (E) is sometimes called the strong topology, although, in some
of the statistics literature, it is also known as the -topology.
A good understanding of the relationship between the strong and uniform
topologies is most easily gained through functional analytic considerations that
will not be particularly important for what follows. Nonetheless, it will be useful
to recognize that, except in very special circumstances, the strong topology is
strictly weaker than the uniform topology. For example, take E = [0, 1] withits
Borel field, and consider the probability measures n (dt) = 1 + sin(2nt) dt
for n Z+ . Noting that, since | sin(2nt) sin(2mt)| 2 and therefore
Z 1
| sin(2nt) sin(2mt)|
1
dt
2 kn m kvar =
2
0
Z
2
1
1 1
sin(2nt) sin(2mt) dt =

4
4 0

for m 6= n, one sees that {n : n 1} not only fails to converge in the uniform
topology, it does
 1not even have any limit points as n 2 . On the other
hand, because 2 2 sin(2nt) : n 1 is orthonormal in L [0,1] ; R , Bessels
Inequality says that
!2
Z

X
2
(t) sin(2nt) dt
kk2L2 ([0,1] ) kk2u <
n=1

[0,1]

370

9 Convergence of Measures on a Polish Space


and therefore h, n i h, [0,1] i for every B [0, 1]; R . In other words,
{n : n 1} converges to [0,1] in the strong topology, but it converges to nothing
at all in the uniform topology.
9.1.2. The Weak Topology. Although the strong topology is weaker than
the uniform and can be effectively used in various applications, it is still not
weak enough for most probabilistic applications. Indeed, even when E possesses
a good topological structure and B = BE is the Borel field over E, the strong
topology on M1 (E) shows no respect for the topology on E. For example,
suppose that E is a metric space and, for each x E, consider the point mass
x on BE . Then, no matter how close x E \ {x} gets to y in the sense
of the topology on E, x is not getting close to y in the strong topology on
M1 (E). More generally (cf. Exercise 9.1.15), measures cannot be close in the
strong topology unless their sets of small measure are essentially the same. Thus,
for example, the convergence that is occurring in The Central Limit Theorem
(cf. Theorem 2.1.8) cannot, in general, be taking place in the strong topology;
and since The Central Limit Theorem is an archetypal example of the sort of
convergence result at which probabilists look, it is only sensible for us to take a
hint from the result that we got there.
Thus, let E be a metric space, set B = BE , and consider the neighborhood
basis at M1 (E) given by the sets S(, ; 1 , . . . , n ) in (9.1.3) when the
k s are restricted to be elements of Cb (E; R). The topology that results is much
weaker than the strong topology, and is therefore justifiably called the weak
topology on M1 (E). (The reader who is familiar with the language of functional
analysis will, with considerable justice, complain about this terminology. Indeed,
if one thinks of Cb (E; R) as a Banach space and of M1 (E) as a subspace of its
dual space Cb (E; R) , then the topology that I am calling the weak topology
is what a functional analyst would call the weak topology. However, because
it is the most commonly accepted choice of probabilists, I will continue to use
the term weak instead of the more correct term weak .) In particular, the weak
topology respects the topology on E: y tends to x in the weak topology on
M1 (E) if and only if y x in E. Lemma 2.3.3 provides further evidence
that the weak topology is well adapted to the sort of analysis encountered in
probability theory, since, by that lemma, weak convergence of {n : n 1}
M1 (RN ) to is equivalent to pointwise convergence of
cn () to
().
Besides being well adapted to probabilistic analysis, the weak topology turns
out to have many intrinsic virtues that are not shared by either the uniform or
strong topologies. In particular, as we will see shortly, when E is a separable
metric space, the weak topology on M1 (E) is not only a metric topology, which
(cf. Exercise 9.1.15) the strong topology seldom is, but it is even separable,
which, as we have seen, the uniform topology seldom is. In order to check these
properties, we will first have to review some elementary facts about separable
metric spaces.
Given a metric for a topological space E, I will use Ub (E; R) to denote

9.1 ProhorovVaradarajan Theory

371

the space of bounded, -uniformly continuous R-valued functions on E and will


endow Ub (E; R) with the topology determined by the uniform norm. Thus,
Ub (E; R) becomes in this way a closed subspace of Cb (E; R).
Lemma 9.1.4. Let E be a separable metric space. Then E is homeomorphic
+
to a subset of [0, 1]Z . In particular:
(i) If E is compact, then the space C(E; R) is separable with respect to the
uniform metric.
(ii) Even when E is not compact, it nonetheless admits a metric with respect
to which it becomes a totally bounded metric space.
(iii) If is a totally bounded metric on E, then Ub(E; R) is separable.
Proof: Let be any metric on E, and choose {pn : n 1} to be a countable,
+
dense subset of E. Next, define h : E [0, 1]Z to be the mapping whose nth
coordinate is given by
hn (x) =

(x, pn )
,
1 + (x, pn )

x E.

It is then an easy matter to check that h is homeomorphic onto a subset of


+
[0, 1]Z .
+
To prove (i), I will first check it for compact subsets K of E = [0, 1]Z . To this
+
end, denote by P the space of polynomials p : [0, 1]Z R. That is, P consists
+
of finite, R-linear combinations of the monomials [0, 1]Z 7 kn11 kn`` ,
where ` 1, 1 k1 < < k` , and {n1 , . . . , n` } N. Clearly, if P0 is the
subset of P consisting of those ps with rational coefficients, then P0 is countable,
and P0 is dense in P. Thus, it suffices to show that {p  K : p P} is dense
in C(K; R). But P is obviously an algebra. In addition, if and are distinct
+
points in [0, 1]Z , it is an easy (in fact, a one dimensional) matter to see that
there is a p P for which p() 6= p(). Hence, the desired density follows
from the StoneWeierstrass Approximation Theorem. Finally, for an arbitrary
+
compact metric space E, define h : E [0, 1]Z as above, note that K h(E)
is compact, and conclude that the map C(K; R) 7 h C(E; R) is
a homeomorphism between the uniform topologies on these spaces. Since we
already know that C(K; R) is separable, this completes (i).
The proof of (ii) is easy. Namely, define
D(x, ) =

X
|n n |
2n
n=1

for x, [0, 1]Z .

Clearly, D is a metric for [0, 1]Z , and therefore


(x, y) E 2 7 (x, y) D h(x), h(y)

372

9 Convergence of Measures on a Polish Space


+

is a metric for E. At the same time, since [0, 1]Z is compact, and therefore
the restriction of D to any subset is totally bounded, it is clear that is totally
bounded on E.
denote the completion of E with respect to the totally
To prove (iii), let E
E
is both complete and
bounded metric . Then, because E is dense in E,

R 7  E
totally bounded and therefore compact. In addition, C E;
Ub(E; R) is a surjective homeomorphism; and so (iii) now follows from (i). 
One of the main reasons why Lemma 9.1.4 will be important to us is that it
will enable us to show that, for separable metric spaces E, the weak topology
on M1 (E) is also a separable metric topology. However, thus far we do not
even know that the neighborhood bases are countably generated, and so, for a
moment longer, I must continue to consider nets when discussing convergence.
In order to indicate that a net { : A} M1 (E) is converging weakly
(i.e., in the weak topology) to , I will write = .
Theorem 9.1.5. Let E be any metric space and { : A} a net in M1 (E).
Given any M1 (E), the following statements are equivalent:
(i) = .
(ii) If is any metric for E, then h, i h, i for every Ub (E; R).
(iii) For every closed set F E, lim (F ) (F ).

(iv) For every open set G E, lim (G) (G).

(v) For every upper semicontinuous function f : E R that is bounded above,


limhf, i hf, i.

(vi) For every lower semicontinuous function f : E R that is bounded below,


limhf, i hf, i.

(vii) For every f B(E; R) that is continuous at -almost every x E,


hf, i hf, i.
Finally, assume that E is separable, and let be a totally bounded metric
for

E. Then there exists a countable subset {n : n 1} Ub(E; [0, 1] that is
+
dense in Ub(E; R), and therefore the mapping H : M1 (E) [0, 1]Z given by
H() = h1 , i, . . . , hn , i, . . . is a homeomorphism from the weak topology
+
on M1 (E) into [0, 1]Z . In particular, when E is separable, M1 (E) with the
weak topology is itself a separable metric space and, in fact, one can take


X
hn , i hn , i
2
(, ) M1 (E) 7 R(, )
2n
n=1

to be a metric for M1 (E).

9.1 ProhorovVaradarajan Theory

373

Proof: The implications


(iii) (iv),

(vii) = (i) = (ii),

and (v) (vi)

are all trivial. Thus, the first part will be complete once I check that (ii) =
(iii), (iv) = (vi), and that (v) together with (vi) imply (vii). To see the
first of these, let F be a closed subset of E, and set

n (x) = 1

(x, F )
1 + (x, F )

 n1

for n Z+ and x E.

It is then clear that n Ub (E; R) for each n Z+ and that 1 n (x) & 1F (x)
as n for each x E. Thus, The Monotone Convergence Theorem followed
by (ii) imply that
(F ) = lim hn , i = lim limhn , i lim (F ).
n

In proving that (iv) = (vi), I may and will assume that f is a non-negative,
lower semicontinuous function. For n N, define
fn =

X
` 4n
`=0

where


I`,n =

2n

1I`,n

4
1 X
1J`,n f,
f = n
2
`=0

` `+1
,
2n 2n


and J`,n =


`
, .
2n

It is then clear that 0 fn % f and therefore that hfn , i hf, i as n .


At the same time, by lower semicontinuity, the sets {f J`,n } are open, and so
(iv) implies
hfn , i limhfn , i limhf, i

for each n Z . After letting n , one sees that (iv) = (vi).


Turning to the proof that (v) & (vi) = (vii), suppose that f B(E; R) is
continuous at -almost every x E, and define
f (x) = lim f (y)
yx

and f (x) = lim f (y)


yx

for x E.

It is then an easy matter to check that f f f everywhere and that equality holds -almost surely. Furthermore, f is lower semicontinuous, f is upper
semicontinuous, and both are bounded. Hence, by (v) and (vi),

limhf, i limhf , i hf , i = hf , i limhf , i limhf, i;

374

9 Convergence of Measures on a Polish Space

and so I have now completed the proof that conditions (i) through (vii) are
equivalent.
Now assume that E is separable, and let be a totally bounded metric for E.
By (iii) of Lemma 9.1.4, Ub(E; R) is separable. Hence, we can find a countable
set {n : n 1} that is dense in Ub(E; R). In particular, by the equivalence of
(i) and (ii) above, we see that hn , i hn , i for all n Z+ if and only if
+
= , which is to say that the corresponding map H : M1 (E) [0, 1]Z is
+
a homeomorphism. Since [0, 1]Z is a compact metric space and D (cf. the proof
of (ii) in Lemma 9.1.4) is a metric for it, we also see that the R described is a
totally bounded metric for M1 (E). In particular, M1 (E) is separable. Finally,
since, by (ii) in Lemma 9.1.4, it is always possible to find a totally bounded
metric for E, the last assertion needs no further comment. 
The reader would do well to pay close attention to what (iii) and (iv) say
about the nature of weak convergence. Namely, even though = , it is
possible that some or all of the mass that the s assign to the interior of a
set may gravitate to the boundary in the limit. This phenomenon is most easily
understood by taking E = R, to be the unit point mass at  [0, 1),
checking that = 1 , and noting that 1 (0, 1) = 0 < 1 = (0, 1) for each
[0, 1).
Remark 9.1.6. Those who find nets distasteful will be pleased to learn that,
from now on, I will be restricting my attention to separable metric spaces E and
therefore need only discuss sequential convergence when working with the weak
topology on M1 (E). Furthermore, unless the contrary is explicitly stated, I will
always be thinking of the weak topology when working with M1 (E).
Given a separable metric space E, I next want to find conditions that guarantee
that a subset of M1 (E) is compact; and at this point it will be convenient to
have introduced the notation K E to indicate that K is a compact subset
of E. The key to my analysis is the following extension of the sort of Riesz
Representation result in Theorem 3.1.1 combined with a crucial observation
made by S. Ulam.1
Lemma 9.1.7. Let E be a separable metric space, a metric for E, and a
non-negative linear functional on Ub (E; R) (i.e., is a linear map that assigns
a non-negative value to a non-negative Ub (E; R)) with (1) = 1. Then, in
order for there to be a (necessarily unique) M1 (E) satisfying () = h, i
for all Ub (E; R), it is sufficient that, for every  > 0, there exist a K E
1

It is no accident that Ulam was the first to make this observation. Indeed, the term Polish
space was coined by Bourbaki in recognition of the contribution made to this subject by the
Polish school in general and C. Kuratowski in particular (cf. Kuratowskis Topologie, Vol. I,
WarszawaLwow (1933)). Ulam had studied with Kuratowski.

9.1 ProhorovVaradarajan Theory

375

such that
(9.1.8)



() sup |(x)| + kku ,
xK

Ub (E; R).

Conversely, if E is a Polish space and M1 (E), then for every  > 0 there is a
K E such that (K) 1 . In particular, if M1 (E) and () = h, i
for Cb (E; R), then, for each  > 0, (9.1.8) holds for some K E.
Proof: I begin with the trivial observation that, because is non-negative and
(1) = 1, () kku . Next, according to the Daniell theory of integration,
the first statement will be proved as soon as we know that (n ) & 0 whenever

{n : n 1} is a non-increasing sequence of functions from Ub E; [0, ) that
tend pointwise to 0 as n . To this end, let  > 0 be given, and choose
K E so that (9.1.8) holds. One then has that



lim n lim sup |n (x)| + k1 ku = k1 ku ,

n xK

since, by Dinis Lemma, n & 0 uniformly on compact subsets of E.


Turning to the second part, assume that E is Polish, and use B(x, r) to denote
the open ball of radius r > 0 around x E, computed with respect to a complete
metric for E. Next,
let {pk : k 1} be a countable dense subset of E, and set

Bk,n = B pk , n1 for k, n Z+ . Given M1 (E) and  > 0, we can choose,
for each n Z+ , an `n Z+ so that
`n
[

!
Bk,n

k=1


.
2n

Hence, if
Cn

`n
[
k=1

B k,n

and K =

Cn ,

n=1

then (K) 1 . At the same time, it is obvious that, on the one hand,
K is closed (and therefore -complete) and that, on the other hand, K

S`n
2
for every n Z+ . Hence, K is both complete and totally
k=1 B pk , n
bounded with respect to and, as such, is compact. 
As Lemma 9.1.7 makes clear, probability measures on a Polish space like to
be nearly concentrated on a compact set. Following Prohorov and Varadarajan,2
2

See Yu. V. Prohorovs article Convergence of random processes and limit theorems in probability theory, Theory of Prob. & Appl., which appeared in 1956. Independently, V.S.
Varadarajan developed essentially the same theory in Weak convergence of measures on a
separable metric spaces, Sankhy
a, which was published in 1958. Although Prohorov got into
print first, subsequent expositions, including this one, rely heavily on Varadarajan.

376

9 Convergence of Measures on a Polish Space

what we are about to see is that, for a Polish space E, relatively compact subsets
of M1 (E) are those whose elements are nearly concentrated on the same compact
set of E. More precisely, given a separable metric space E, say that M M1 (E)
is tight if, for every  > 0, there exists a K E such that (K) 1  for
all M .
Theorem 9.1.9. Let E be a separable metric space and M M1 (E). Then
M is compact if M is tight. Conversely, when E is Polish, M is tight if M is
compact.3

Proof: Since it is clear, from (iii) in Theorem 9.1.5, that M is tight if and only
if M is, I will assume throughout that M is closed in M1 (E).
To prove the first statement, take
 to be a totally bounded metric on E,

choose {n : n 1} Ub E; [0, 1] accordingly, as in the last part of Theorem


9.1.5, and let 0 = 1. Given a sequence {` : ` 1} M1(E), we can use a
standard diagonalization procedure to extract a subsequence `k : k 1 such
that
(n ) lim hn , `k i
k

exists for each n N. Since () limk h, `k i continues to exist for


every in the uniform closure of the span of {n : n 1}, we now see that
determines a non-negative linear functional on Ub(E; R) and that (1) = 1.
Moreover, because M is tight, we can find, for any  > 0, a K E such that
(K) 1  for every M , and therefore (9.1.8) holds with this choice
of K. Hence, by Lemma 9.1.7, we know that there is a M1 (E) for which
() = h, i, Ub(E; R). Because this means that h, `k i h, i for
every Ub(E; R), the equivalence of (i) and (ii) in Theorem 9.1.5 allows us
to conclude that `k = .
Finally, suppose that E is Polish and that M is compact in M1 (E). To see
that M must be tight, repeat the argument used to prove the second part of
Lemma 9.1.7. Thus, choose Bk,n for k, n Z+ as in the proof there, and set
f`,n () =

`
[

!
Bk,n

for `, n Z+ .

k=1

By (iv) in Theorem 9.1.5, M1 (E) 7 f`,n () [0, 1] is lower semicontinuous. Moreover, for each n Z+ , f`,n % 1 as ` % . Thus, by Dinis Lemma,
we can choose, for each n Z+ , one `n Z+ so that f`n ,n () 1 2n for all
3

For the reader who wishes to investigate just how far these results can be pushed before
they start of break down, a good place to start is Appendix III in P. Billingsleys Convergence
of Probability Measures, Wiley (1968). In particular, although it is reasonably clear that
completeness is more or less essential for the necessity, the havoc that results from dropping
separability may come as a surprise.

9.1 ProhorovVaradarajan Theory

377

M ; and at this point the rest of the argument is precisely the same as the
one given at the end of the proof of Lemma 9.1.7. 
9.1.3. The L
evy Metric and Completeness of M1 (E). We have now seen
that M1 (E) inherits properties from E. To be more specific, if E is a metric
space, then M1 (E) is separable or compact if E itself is. What I want to show
next is that completeness also gets transferred. That is, I will show that M1 (E)
is Polish if E is. In order to do this, I will need a lemma that is of considerable
importance in its own right.
Lemma 9.1.10. Let E be a Polish space and a bounded subset of Cb (E; R)
that is equicontinuous at each x E. (That is, for each x E, sup |(y)
(x)| = 0 as y x.) If {n : n 1} {} M1 (E) and n = , then




lim sup h, n i h, i = 0.

Proof: Let  > 0 be given, and use the second part of Theorem 9.1.9 to choose
K E so that





sup kku
sup n K{ < .
4

nZ+

By (iv) of Theorem 9.1.5, K{ satisfies the same estimate. Next, choose a
metric for E and a countable dense set {pk : k 1} in K. Using equicontinuity

together with compactness, find ` Z+ and 1 , . . . , ` > 0 so that K x :
(x, pk ) < k for some 1 k ` and




sup (x) (pk ) <
4

for 1 k ` and x K with (x, pk ) < 2k .



Because r (0, ) 7 y K : (y, x) r
[0, 1] is non-decreasing

for each x K, we can find,
for each 1 k `, an rk k , 2k such that

(Bk ) = 0 when Bk x K : x, pk < rk . Finally, set A1 = B1 and
Sk
S`
Ak+1 = Bk+1 \ j=1 Bj for 1 k < `. Then, K k=1 Ak , the Ak s are
disjoint, and, for each 1 k `,



sup sup (x) pk <
4
xAk


and Ak = 0.

Hence, by (vii) in Theorem 9.1.5 applied to the 1Aks,


`


X






sup pk n Ak Ak = . 
lim sup h, n i h, i <  + lim

k=1

378

9 Convergence of Measures on a Polish Space

Theorem 9.1.11. Let E be a Polish space and a complete metric for E.


Given (, ) M1 (E)2 , define
n

L(, ) = inf : (F ) F () +
o

and (F ) F () + for all closed F E ,
where F () denotes the set of x E that lie a -distance less than from F .
Then L is a complete metric for M1 (E), and therefore M1 (E) is Polish.
Proof: It is clear that L is symmetric and that it satisfies the triangle inequality. Thus,
 we will know that it is a metric for M1 (E) as soon as we show
that L n , 0 if and only if n = . To this end, first suppose that


L n , 0. Then, for every closed F , F () + limn n (F ) for all
> 0; and therefore, by countable additivity, (F ) limn n (F ) for every
closed F . Hence, by the equivalence of (i) and (iii) in Theorem 9.1.5, n = .
Now suppose that n = , and let > 0 be given. Given a closed F in E,
define

x, F () {

for x E.
F (x) =
x, F () { + (x, F )

It is then an easy matter to check that both


1F F 1F ()


(x, y)
.
and F (x) F (y)

In particular, by Lemma 9.1.10, we can choose m Z+ so that



n
o


sup sup hF , n i hF , i : F closed in E < ,
nm

from which it is an easy matter to see that, for all n m,




(F ) n F () + and n (F ) F () + .

In other words, supnm L n , , and, since > 0 was arbitrary, we have

shown that L n , 0.
In order to finish the proof, I must show that if {n : n 1} M1 (E) is
L-Cauchy convergent, then it is tight. Thus, let  > 0 be given, and choose, for
each ` Z+ , an m` Z+ and a K` E so that




max n K` { `+1 .
sup L n , m` `+1 and
1nm
2
2
`
nm`
( ) 
one then has that supnZ+ n K` ` { ` for each ` Z+ .
T
( )
In particular, if K `=1 K` ` , then n (K) 1  for all n Z+ . Finally,

Setting ` =


,
2`

9.1 ProhorovVaradarajan Theory

379

because each K` is compact, it is easy to see that K is both -complete and


totally bounded and therefore also compact. 
When E = R, P. Levy was the first to construct a complete metric on M1 (E),
and it is for this reason that I will call the metric L described in Theorem 9.1.11
the L
evy metric determined by . Using an abstract argument, Varadarajan
showed that M1 (E) must be Polish whenever E is, and the explicit construction
that I have used is essentially the one first produced by Prohorov.
Before closing this subsection, it seems appropriate to introduce and explain
some of the more classical terminology connected with applications of weak convergence to probability theory. For this purpose, let (, F, P) be a probability
space and E a metric space. Given a sequence {Xn : n 1} of E-valued random variables on (, F, P), one says that the {Xn : n 1} tends in law (or in
L
distribution) to the E-valued random variable X and writes Xn X if (cf.
Exercise 1.1.16) (Xn ) P = X P. The idea here is that, when the measures under consideration are the distributions of random variables, one wants to think
of weak convergence of the distributions as determining a kind of convergence
of the corresponding random variables. Thus, one can add convergence in law
to the list of possible ways in which random variables might converge. In order
to elucidate the relationship between convergence in law, P-almost sure convergence, and convergence in P-measure, it will be convenient to have the following
lemma.
Lemma 9.1.12. Let (, F, P) be a probability space and E a metric space.
Given any E-valued random variables {Xn : n 1} {X} on (, F, P) and any
pair of topologically equivalent metrics and for E, Xn , X 0 in Pmeasure if and only if Xn , X 0 in P-measure. In particular, convergence
in P-measure does not depend on the choice of metric, and so one can write
Xn X in P-measure without specifying a metric. Moreover, if Xn X in
L
P-measure, then Xn X. In fact, if E is a Polish space and L is the Levy
metric on M1 (E) associated with a complete metric for E, then
L X P, Y P) P (X, Y )

for all > 0 and E-valued random variables X and Y .


Proof: To prove the first assertion, suppose that
(Xn , X) 0 in P-measure but that (Xn , X)
6 0 in P-measure.
After passing to a subsequence if necessary,
we could then arrange that (Xn , X)

0 (a.s., P) but P (Xn , X)   for all n Z+ and some  > 0. But this
is impossible, since then we would have that (Xn , X) 0 P-almost surely
but not in P-measure. Hence, we now know that convergence in P-measure does

380

9 Convergence of Measures on a Polish Space

not depend on the choice of metric. To complete the first part, suppose that
(Xn , X) 0 in P-measure. Then, for every Ub (E; R) and > 0,







lim EP Xn EP X) lim EP Xn (X)
n
n



() + kku lim P Xn , X = (),
n

where


() sup |(y) (x)| : (x, y) 0 as

& 0.

Thus, by (ii) in Theorem 9.1.5, (Xn ) P = X P.


Now assume that E is Polish, and take and L accordingly. Then, for any
closed set F and > 0,


X P(F ) = P(X F ) P (Y, F ) < + P (X, Y )


= Y P F () + P (X, Y ) .
Hence, since the same is true when the roles of X and Y are reversed, the
asserted estimate for L X P, Y P) holds. 
As a demonstration of the sort of use to which one can put these ideas, I
present the following version of the Principle of Accompanying Laws.
Theorem 9.1.13. Let E be a Polish space and, for each k Z+ , let {Yk,n :
n 1} be a sequence of E-valued random variables on the probability space
(, F, P). Further, assume that, for each k Z+ , there is a k M1 (E) such

that Yk,n
P = k as n . Finally, let be a complete metric for E, and
suppose that {Xn : n 1} is a sequence of E-valued random variables on
(, F, P) with the property that



(9.1.14)
lim lim P Xn , Yk,n = 0 for every > 0.
k n

Then there is a M1 (E) such that k = as k and (Xn ) P = as



L
n . In particular, if, as n , Yn X and P (Xn , Yn ) 0 for
L

each > 0, then Xn X.


Proof: Let L be the Levy metric associated with a complete metric for E.
By the second part of Lemma 9.1.12,




sup L (Y`,n ) P, (Xn ) P sup lim P (Y`,n , Xn ) ,
`k n

`k

and therefore, by (9.1.14),


(*)


lim lim L (Y`,n ) P, (Xn ) P = 0.

k n

Exercises for 9.1

381

Thus, since for any k Z+ ,




sup L ` , k = sup lim L (Y`,n ) P, (Yk,n ) P ,
`k

`k n

{k : k 1} is an L-Cauchy sequence and, as such, converges to some . Finally,


for every k Z+ ,



L , (Xn ) P L(, k ) + L k , (Yk,n ) + L (Yk,n ) P, (Xn ) P ,
and so



lim L , (Xn ) P L(, k ) + lim L (Yk,n ) P, (Xn ) P .
n

Thus, after letting k and applying (*), one concludes that (Xn ) P =
. 
Exercises for 9.1
Exercise 9.1.15. Let (E, B) be a measurable space with the property that
{x} B for all x E. In this exercise, we will investigate the strong topology
in a little more detail. In particular, in part (iv), we will show that when
M1 (E) is non-atomic (i.e., {x} = 0 for every x E), then there is no
countable neighborhood basis of in the strong topology. Obviously, this means
that the strong topology for M1 (E) admits no metric whenever M1 (E) contains
a non-atomic element.
(i) Show that, in general,


k kvar = 2 max (A) (A) : A B
and that in the case when E is a metric space, B its Borel field, and a metric
for E,


k kvar = sup h, i h, i : Ub (E; R) and kku 1 .
(ii) Show that if {n : n 1} is a P
sequence in M1 (E) that tends in the strong

topology to M1 (E), then  n=1 2n n .


(iii) Given M1 (E), show that admits a countable neighborhood basis
in the strong topology if and only if there exists a countable {k : k 1}
B(E; R) such that, for any net { : A} M1 (E), in the strong
topology as soon as lim hk , i = hk , i for every k Z+ .

382

9 Convergence of Measures on a Polish Space


+

(iv) Referring to Exercises 1.1.14 and 1.1.16, set = E Z and F = B Z . Next,


+
let M1 (E) be given, and define P = Z on (, F). Show
 that, for any
B(E; R), the random variables x 7 Xn (x) xn , n Z+ , are
mutually P-independent and all have distribution . In particular, use the
Strong Law of Large Numbers to conclude that
n

1 X
Xm (x) = h,
n n
m=1

lim

for each x outside of a P-null set.


Now assume that is non-atomic, and suppose that admitted a countable
neighborhood basis in the strong topology. Choose {k : k 1} B(E; R)
accordingly, as in (iii), and (using the preceding) conclude P
that there exists at
n
least one x for which the measures n given by n n1 m=1 xm , n Z+ ,
converge in the strong topology to . Finally, apply (ii) to see that this is
impossible.

Exercise 9.1.16. Throughout this exercise, E is a separable metric space.


(i) We already know that M1 (E) is separable;
however, our proof was non-con
structive. Show that if {pk : k 1 is a dense subset of E, then the set of
Pn
+
all convex combinations
and {k : 1 k n}
k=1 k pk , where n Z
Pn
[0, 1] Q with 1 k = 1, is a countable dense set in M1 (E).
(ii) We have seen that M1 (E) is compact if E is. To see that the converse is
also true, show that x E 7 x M1 (E) is a homeomorphism whose image
is closed.
(iii) Although it is a little off our track, it is amusing to show that E being
compact is equivalent to Cb (E; R) being separable; and, in view of (i) in Lemma
9.1.4, this comes down to checking that E is compact if Cb (E; R) is separable.
to denote the Hint: Let be a totally bounded metric on E, and use E
completion of E. Show that if {xn : n 1} E has the properties that
and limn (xn ) exists for every Cb (E; R), then x
xn x
E
E.
1
,
and
consider
functions
of
the
form
f

for
(Suppose not, set (x) = (x,
x)
f Cb (R; R).) Finally, assuming that Cb (E; R) is separable, and, using a
diagonalization procedure, show that every sequence {xn : n 1} E admits a
and limm xn
subsequence {xnm : m 1} that converges to some x
E
m
exists for every Cb (E; R).

(iv) Let {Mn : n 1} be a sequence of finite, non-negative measures on (E, B).


Assuming that {Mn : n 1} is tight in the sense that {Mn (E) : n 1} is
bounded and that, for each  > 0, there is a K E such that supn Mn K{

Exercises for 9.1

383

, show that there is a subsequence {Mnk : k 1} and a finite measure M such


that
Z
Z
dM = lim
dMnk , for all Cb (E; R).
k

R
Conversely,
if
E
is
Polish
and
there
is
a
finite
measure
M
such
that
dMn
E
R
dM for every Cb (E; R), show that {Mn : n 1} is tight.
E
Exercise
9.1.17. Let {E` : ` 1} be a sequence of Polish spaces, set E =
Q
1 E` , and give E the product topology.
(i) For each ` Z+ , let ` be a complete metric for E` , and define

X
1 ` (x` , y` )
R(x, y) =
2` 1 + ` (x` , y` )

for x, y E.

`=1

Show that R is a complete metric


Q for E, and conclude that E is a Polish space.
In addition, check that BE = 1 BE` .
(ii) For ` Z+ , let ` be the natural projection map from E onto E` , and show
that K E if and only if
\
K=
`1 (K` ), where K` E` for each ` Z+ .
`Z+

Also, show that the span of the functions


`
Y

k k ,

where ` Z+ and k Ubk (Ek ; R), 1 k `,

k=1

is dense in UbR (E; R).


(E) is
 In particular, conclude from these that A M1+
tight if and only if (` ) : A M1 (E` ) is tight for every ` Z and
that n = in M1 (E) if and only if
* `
+
* `
+
Y
Y
k k , n
k k ,
k=1

k=1

for every ` Z+ and choice of k Ubk (Ek ; R), 1 k `.


Q`
(iii) For each ` Z+ , set E` = k=1 Ek , and let ` denote thenatural projection
map from E onto E` . Next, let [1,`] be an element of M1 E` , and assume that
the [1,`] s are consistent in the sense that, for every ` Z+ ,

[1,`+1] E`+1 = [1,`] () for all BE` .
Show that there is a unique M1 (E) such that [1,`] = (` ) for every
` Z+ .

384

9 Convergence of Measures on a Polish Space

Hint: Choose and fix an e E, and define ` : E` E so that




` x1 , . . . , x`



=
n

n`

xn

if

en

otherwise.



Show that (` ) [1,`] : ` Z+ M1 (E) is tight and that any limit must be
the desired .
The conclusion drawn in (iii) is the renowned Kolmogorov Extension (or
Consistency) Theorem. Notice that, at least for Polish spaces, it represents
a vast generalization of the result obtained in Exercise 1.1.14.
Exercise 9.1.18. In this exercise we will use the theory of weak convergence
to develop variations on The Strong Law of Large Numbers (cf. Theorem 1.4.9).
Thus, let E be a Polish space, (, F, P ) a probability space, and {Xn : n 1}
a sequence of mutually independent E-valued random variables on (, F, P )
with common distribution M1 (E). Next, define the empirical distribution
function
n
1 X
X () M1 (E),
7 Ln ()
n m=1 m

and observe that, for any B(E; R),

n


1 X
Xm () ,
, Ln () =
n m=1

n Z+ and .

As a consequence of the Strong Law, show that


(9.1.19)

Ln () = for P -almost every ,

which is The Strong Law of Large Numbers for the empirical distribution.
Now show that (9.1.19) provides another (cf. Exercises 6.1.16 and 6.2.18) proof
of the Strong Law of Large Numbers for Banach spacevalued random variables.
Thus, let EPbe a real, separable, Banach space with dual space E , and set
n
S n () = n1 1 Xm () for n Z+ and .

(i) As a preliminary step, begin with the case when


(*)



BE (0, R){ = 0

for some

R (0, ).


Choose Cb R; R so that (t) = t for t [R, R] and (t) = 0 when
|t|

R + 1, and define x Cb (E; R) for x E by x (x) = hx, x i , x E,

Exercises for 9.1

385

where hx, x i is used here to denote the action of x E on x E. Taking (*)


into account and applying (9.1.19) and Lemma 9.1.10, show that
lim

sup

n kx k 1
E



Z


hx , Ln ()i hx, x i (dx) = 0


E

for P-almost every , and conclude from this that



lim S n () m E = 0

for P-almost every ,

where (cf. Lemma 5.1.10) m = E [x].


(ii) The next step is to replace the boundedness assumption in (*) by the hypothesis that x
kxkE is -integrable. Assuming that it is, define, for R (0, ),
n Z+ , and ,
Xn(R) ()


=

Xn ()



if Xn () E < R

otherwise

Pn (R)
(R)
(R)
() = Xn () Xn (). Next, set S n = n1 1 Xm , n Z+ , and,

 (R)
from (i), note that S n () : n 1 converges in E for P-almost every .
In particular, if  > 0 is given and R (0, ) is chosen so that
(R)

and Yn

Z
kxkE (dx) <


,
8

{kxkE R}

use the preceding and Theorem 1.4.9 to verify the computation


lim P



sup S n S m E 

nm


(R)

(R)


lim P sup S n S m
m
2
nm

n
!

1 X


(R)
Yk
+ 2 lim P sup
m

4
nm n 1
E
!
n

1 X
Y (R)  = 0,
2 lim P sup
k
E
m
4
nm n 1


and from this conclude that S n E [x] P-almost surely.

386

9 Convergence of Measures on a Polish Space

(iii) Finally, repeat the argument


given in
the proof of Theorem 1.4.9 to show

that kxk is -integrable if S n : n 1 converges in E on a set of positive
P-measure.4

9.2 Regular Conditional Probability Distributions


As I mentioned in the discussion following Theorem 5.1.4, there are quite general
situations in which conditional expectation values can be computed as expectation values. The following is a basic result in that direction.
Theorem 9.2.1. Suppose that is a Polish space and that F = B . Then,
for every sub--algebra of F, there is a P-almost surely unique -measurable
map 7 P
M1 () with the property that


P AB =

P
(B) P(d) for all A and B F.

In particular, for each (, ]-valued random variable X that is bounded be


low, 7 EP [X] is a conditional expectation value of X given . Finally,
if is countably generated, then there is a P-null set N with the property
that P
/ N and A .
(A) = 1A () for all
Proof: To prove the uniqueness, suppose 7 Q
M1 () were a
second such mapping. We would then know that, for each B F, Q
(B) =
P
(B)
for
P-almost
every

.
Hence,
since
F
(as
the
Borel
field
over a

second countable topological space) is countably generated, we could find one

-measurable P-null set off of which Q


= P . Similarly, to prove the final
assertion when is countably generated, note (cf. (5.1.7)) that, for each A
, P
(A) = 1A () = (A) for P-almost every . Thus, once again
countability allows us to choose one -measurable P-null set N such that P

=  if
/ N.
I turn next to the question of existence. For this purpose, first choose (cf. (ii)
of Lemma 9.1.4) to be a totally bounded metric for , and let U = Ub (; R) be
the space of bounded, -uniformly continuous, R-valued functions on . Then
(cf. (iii) of Lemma 9.1.4) U is a separable Banach space with respect to the
uniform norm. In particular, we can choose a sequence {fn : n 0} U so
that f0 = 1, the functions f0 , . . . , fn are linearly independent for each n Z+ ,
and the linear span S of {fn : n 0} is dense in U. Set g0 = 1, and, for each
n Z+ , let gn be some fixed representative of EP [fn | ]. Next, set


R = RN : m N n = 0 for all n m
4

The beautiful argument that I have just outlined is due to Ranga Rao. See his 1963 article
The law of large numbers for D[0, 1]-valued random variables, Theory of Prob. & Appl.
VIII #1, where he shows that this method applies even outside the separable context.

9.2 Regular Conditional Probability Distributions


and define
f =

n fn

and g =

n=0

387

n gn

n=0

for R. Because of the linear independence of the fn s, we know that f = f


if and only if = . Hence, for each , we can define the (not necessarily
continuous) linear functional : S R so that

f = g (),

R.

Clearly, (1) = 1 for all . On the other hand, we cannot say that
is always non-negative as a linear functional on S. In fact, the best we can
do is extract a -measurable P-null set N so that is a non-negative linear
functional on S whenever
/ N . To this end, let Q denote the rational reals
and set


Q+ = R QN : f 0 .
Since g 0 (a.s., P) for every Q+ and Q+ is countable,
n
N : Q+

o
g () < 0

is a -measurable, P-null set. In addition, it is obvious that, for every


/ N,
(f ) 0 whenever f is a non-negative element of S. In particular, for
/ N,

kf ku (f ) = kf ku 1 f 0,

f S,

and therefore admits a unique extension as a non-negative, continuous linear


functional on U that takes 1 to 1. Furthermore, it is an easy matter to check
that, for every f U, the function

g() =

(f ) for
/N
EP [f ]

for

is a conditional expectation value of f given .


At this point, all that remains is to show that, for P-almost every
/ N,
is given by integration with respect to a P M1 (). In particular, by the
Riesz Representation Theorem, there is nothing more to do in the case when
is compact. To treat the case when is not compact, I will use Lemma 9.1.7.
Namely, first choose (cf. the last part of Lemma 9.1.7) a non-decreasing
sequence

1
+
of sets Kn , n Z , with the property that P Kn { 2n . Next, define

m,n () =

m (, Kn )
1 + m (, Kn )

for m, n Z+ .

388

9 Convergence of Measures on a Polish Space

Clearly, m,n U for each pair (m, n) and 0 m,n % 1Kn { as m for each
n Z+ . Thus, by The Monotone Convergence Theorem, for each n Z+ ,
Z
Z


sup m,n P(d) = lim
m,n P(d)
m

N { mZ+

N{



1
= lim EP m,n n ;
m
2
and so, by the BorelCantelli Lemma, we can find a -measurable P-null set
N 0 N such that



M () sup n sup m,n
< for every
/ N 0.
nZ+

mZ+

Hence, if
/ N 0 , then, for every f U and n Z+ ,




(f ) (1 m,n ) f + m,n f


M ()
kf ku
(1 m,n ) f u +
n


for all m Z+ . But (1 m,n ) f u kf ku,Kn as m , and so we now see
that the condition in (9.1.8) is satisfied by for every
/ N 0 . In other words,
0

I have shown that, for each


/ N , there is a unique P M1 () such that
P
(f ) = E [f ] for all f U. Finally, if we complete the definition of the map

0
7 P
by taking P = P for N , then this map is -measurable and
Z



EP f, A =
EP [f ] P(d), A ,

first for all f U and thence for all F-measurable f s that are bounded below. 
If P is a probability measure on (, F) and is a sub--algebra of F, then
a conditional probability distribution of P given is a map (, B) 7

P
(B) such that P is a probability measure on (, F) for each and

P (B) a conditional probability of B given for all B F. If, in addition, for outside a -measurable, P-null set and all A , P (A) = 1A (),
then the conditional probability distribution is said to be regular. Notice that,
although they may not always exist, conditional probability distributions are
always unique up to a -measurable, P-null set so long as F is countably generated. Moreover, Theorem 9.2.1 says that they will always exist if is Polish and
F = B . Finally, whenever a conditional probability distribution of P given
exists, the argument leading to the last part of Theorem 9.2.1 when is countably generated is completely general and shows that a regular version can be
found.
9.2.1. Fibering a Measure. When is a product space E1 E2 of two
Polish spaces and is the -algebra generated by the second coordinate, then
the conclusion of Theorem 9.2.1 takes a particularly pleasing form.

9.2 Regular Conditional Probability Distributions

389

Theorem 9.2.2. Let E1 and E2 be a pair of Polish spaces, and take to be


the Polish space E1 E2 . Given M1 (), use 2 to denote the marginal
distribution of on E2 : 2 () = (E1 ) for BE2 . Then there is a Borel
measurable map x2 E2 7 (x2 , ) M1 (E1 ) such that (dx1 dx2 ) =
(x2 , dx1 ) 2 (dx2 ).
Proof: Referring to Theorem 9.2.1, take P = , = {E1 : BE2 }, and
let 7 P
M1 () be the map guaranteed by the result there. Next,
choose and fix a point x01 E1 . Then, because
P
is -measurable, we

know that P(x1 ,x2 ) = P(x0 ,x2 ) . In addition, because is countably generated,
1
the final part of Theorem 9.2.1 guarantees
that there exists a 2 -null set B

BE2 such that P
E

{x
}
=
1
for
all x2
/ B. Hence, if we define
0
1
2
(x ,x2 )
1

x2
(x2 , ) by (x2 , ) = P
( E2 ), then, for any Borel measurable
(x01 ,x2 )
: E1 E2 [0, ), h, i equals


Z Z
Z Z
0
( 0 )P
(d
)
P(d)
=
(x
,
x
)
(x
,
dx
)
2 (dx2 ). 
1
2
2
1

E2

E1

In the older literature, the result in Theorem 9.2.2 would be called a fibering
of . The name derives from the idea that on E1 E2 can be decomposed into
its vertical component 2 and its restrictions (x2 , ) to horizontal fibers
E1 {x2 }. Alternatively, Theorem 9.2.2 can be interpreted as saying that any
M1 (E1 E2 ) can be decomposed into its marginal distribution on E2 and
a transition probability x2 E2 7 (x2 , ) M1 (E1 ). The two extreme cases
are when the coordinates are independent, in which case (x2 , ) is independent
of x2 , and the case when the coordinates are equal, in which case (x2 , ) = x2 .
As an application of Theorem 9.2.2, I present the following important special
case of a more general result that indicates just how remarkably fungible nonatomic measures are.
Corollary 9.2.3. Let [0,1) denote Lebesgue measure on [0, 1). For each
N Z+ and M1 (RN ), there is a Borel measurable map f : [0, 1) RN
such that = f [0,1) .
Proof: I will work by induction on N Z+ . When N = 1, take



f (u) = inf t R : (, t] u , u [0, 1).
Next, assume the result is true for N , take E1 = R and E2 = RN in Theorem
9.2.2, and, given M1 (RN ), define 2 M1 (RN ) and y RN 7 (y, )
M1 (R) accordingly. By the induction hypothesis, 2 = f2 ( ) [0,1) for some
f2 : [0, 1) RN . Thus, if g : [0, 1)2 R RN is given by
 



g(u1 , u2 ) = inf t R : f2 (u2 ), (, t] u1 , f2 (u2 )

390

9 Convergence of Measures on a Polish Space

for (u1 , u2 ) [0, 1)2 , then g is Borel measurable on [0, 1)2 and = g 2[0,1) .
Finally, by Lemma 1.1.6 or part (ii) of Exercise 1.1.11, we know
that there is a

Borel measurable map u [0, 1) 7 U(u) = U1 (u), U2 (u) [0, 1)2 such that
U [0,1) = 2[0,1) , and so we can take f (u) = g U. 
9.2.2. Representing L
evy Measures via the It
o Map. There is another
way of thinking about the construction of the Poisson jump processes, one that
is based on Corollary 9.2.3 and the transformation property described in Lemma
4.2.12. The advantage of this approach is that it provides a method of coupling
Levy processes corresponding to different Levy measures. Indeed, it is this coupling procedure that underlies K. Itos construction of Markov processes modeled
on Levy processes.1
Let M0 (dy) = |y|N 1 dy, which is the Levy measure for a (cf. Corollary 3.3.9)
symmetric 1-stable law. My first goal is to show that every M M (RN ) can
be realized as (cf. the notation in Lemma 4.2.6) M0F for some Borel measurable
F : RN RN satisfying F (0) = 0.2
Theorem 9.2.4. For each M M (RN ) there exists a Borel measurable map
F : RN RN such that F (0) = 0 and

M () = M0F M0 F 1 ( \ {0}) , BRN .
Proof: I begin with the case when N = 1. Given M M (R), define (r, 1)
for r > 0 by



(r, 1) = sup [0, ) : M [, ) r1



(r, 1) = sup [0, ) : M (, ] r1 ,
where I have taken the supremum over the empty set to be 0. Applying Exercise
9.2.6 with (dr)= r2 (0,) (dr), one sees that M = M0F when F (0) = 0 and
y
for y R \ {0}.
F (y) = |y|, |y|
Now assume that N 2, and let M M (RN ). If M = 0, simply take
F 0. If M 6= 0, choose a non-decreasing function h : (0, ) (0, ) so that
Z

h |y| M (dy) = 1,

and define M1 (0, ) SN 1 ) so that


Z

h, i =
h |y| (y)M (dy).
RN
1

See K. It
os On stochastic differential equations, Memoirs of the A.M.S. 4 (1951) or my
Markov Processes from K. It
os Perspective, Princeton Univ. Press, Annals of Math. Studies
155 (2003).
2 There is nothing sacrosanct about the choice of M as my reference measure. For instance, it
0
should be obvious that one can choose any L
evy measure M with the property that M0 = M F
for some Borel measurable F : RN RN that takes 0 to 0.

9.2 Regular Conditional Probability Distributions

391

Using 2 to denote the marginal distribution of on SN 1 , apply Corollary 9.2.3


to find a Borel measurable f : [0, 1) RN so that 2 = f [0,1) . Since 2 lives
on SN 1 , I may and will assume that f (u) SN 1 for all u [0, 1). Next, use
Theorem 9.2.2 to find a measurable map SN 1 7 (, ) M1 (0, )
so that (dr d) = (, dr) 2 (d), and define : (0, ) SN 1 [0, )
by
)
(
Z
N 1
1
.
(, dr)
(r, ) = sup [0, ) :
r
[,) h(r)

Then, again by Exercise 9.2.6, but this time with (dr) = N 1 r2 (0,) (dr),
for any continuous : RN [0, ) that vanishes in a neighborhood of 0,
Z
Z

(r)
(, dr) = N 1
(r, ) r2 dr, SN 1 ,
(0,)
(0,) h(r)

and so
Z

(y) M (dy) = N 1

(r, ) r
SN 1

RN

dr 2 (d)

(0,)

= N 1
[0,1)

!
 2
(r, )f (t) r dr [0,1) (dt).

(0,)


Finally, define g : SN 1 [0, N 1 ) by g() = SN 1 { 0 SN 1 : 10 1 } ,
note that N 1 [0,1) = g SN 1 , and conclude that M = M0F when


y
y
for y RN \ {0}. 
f g |y|
F (0) = 0 and F (y) = |y|, |y|

We can now prove the following theorem, which is the simplest example of
It
os procedure.
Theorem 9.2.5. Let {j0 (t, ) : t 0} be a Poisson jump process associated
with M0 . Then, for each M M (RN ), there is a Borel measurable map
F : RN RN with F (0) = 0 and a Poisson jump process {j(t, ) : t 0}
associated with M such that j(t, ) = j0F (t, ), t 0, P-almost surely.
Proof: Choose F as in Theorem 9.2.4 so that M = M0F . For R > 0, set
FR (y) = 1[R,) (y)F (y). By Lemma 4.2.12, we know that {j0FR (t, ) : t 0} is
a Poisson jump process associated with M FR . In particular, for each r > 0,





EP j0F t, RN \ B(0, r) = lim EP j0FR t, RN \ B(0, r) = M RN \ B(0, r) < .
R&0

Hence, there exists a P-null set N such that t


j0F (t, , ) is a jump function
F
for all
/ N . Finally, if j(t, , ) = j0 (t, , ) when
/ N and j(t, , ) = 0
for N , then {j(t, ) : t 0} is a jump process associated with M and
j(t, ) = j0F (t, ), t 0, for P-almost every . 

392

9 Convergence of Measures on a Polish Space


Exercises for 9.2

Exercise 9.2.6. Let be an infinite non-negative,


non-atomic,
Borel measure


on [0, ) with the property that [r2 , ) < [r1 , ) < for all 0 <
r1 < r2 < . Given any other non-negative
Borel measure on [0, ) with the

properties that ({0}) = 0 and [r, ) < for all r > 0, define



(r) = sup (0, ) : [, ) [r, ) ,

r 0,


where
over the empty set is taken to be 0. Show that [t, ) =
 the supremum

r : (r) t for all t > 0, and therefore that h, i = h , i for all Borel
measurable : [0, ) [0, ) that vanish at 0.


Hint: Determine g : (0, )
(0,
)
so
that

g(r),

= r, and check that


 
{r : (r) t} = g ([t, )) , for all t > 0.
9.3 Donskers Invariance Principle
The content of this section is my main justification for presenting the material in
9.1. Namely, as we saw in Chapter 8, there is good reason to think that Wiener
measure is the infinite dimensional version of the standard Gauss measure in
RN , and as such one might suspect that there is a version of The Central Limit
Theorem that applies to it. In this section I will prove such a Central Limit
Theorem for Wiener measure. The result is due to M. Donsker and is known as
Donskers Invariance Principle (cf. Theorem 9.3.1).
Before getting started, I need to make a couple of simple preparatory remarks. In the first place, I will be thinking of Wiener
measure W (N ) as a Borel

N
N
probability measure on C(R ) = C [0, ); R with the topology of uniform
convergence on compact intervals. Equivalently, C(RN ) is given the topology for
which

X
1 k 0 k[0,n]
(, 0 ) =
2n 1 + k 0 k[0,n]
n=1

is a metric, which, just as in the case of D(RN ) (cf. 4.1.1), is complete on


C(RN ) and, as distinguished from D(RN ), is separable there. One way to check
separability is to note that the set of paths that, for some n N, are linear on
[(m 1)2n , m2n ] and satisfy (m2n ) QN for all m Z+ is a countable,
dense subset. In particular, this means that C(RN ) is a Polish space, and so
the theory developed in 9.1 applies to it. In addition, the Borel field BC(RN )

coincides with {(t) : t 0} , the -algebra that C(RN ) inherits as a subset
of (RN )[0,) (cf. 4.1). Indeed, since

(t) is continuous for every t 0, it



is obvious that {(t) : t 0} BC(RN ) . At the same time, since kk[0,t] =
sup{|(
 ) : [0, t] Q}, it is easy to check that open balls are {(t) :
t 0} -measurable. Hence, since every
open set is the countable union of open

balls, BC(RN ) {(t) : t 0} . Knowing that these -algebras coincide,

9.3 Donskers Invariance Principle

393


we know that two probability measures , M1 C(RN ) are equal if they
determine the same distribution on (RN )[0,) , that is, if, for each n  Z+ and
0 = t0 < t1 < tn , the distribution of C(RN ) 7 (t0 , . . . , (tn ) (RN )n
is the same under and .
9.3.1. Donskers Theorem. Let (, F, P) be a probability space, and suppose that {Xn : n 1} is a sequence of independent,
P-uniformly square

integrable random variables (i.e., as R , EP |Xn |2 , |Xn | R 0
uniformly in n) with mean value 0 and covariance I.  Given n 1, define
Pm
m
12
=
n
7 Sn ( , ) C(RN ) so that
S
(0)
=
0,
S
n
n
k=1 Xk , and
n
 m1 m 
+
Sn ( , ) is linear on each interval n , n for all m Z . Donskers theorem
is the following.

Theorem 9.3.1 (Donskers Invariance Principle). If n = (Sn ) P


M1 C(RN ) is the distribution of 7 Sn ( , ) C(RN ) under P, then
n =W (N ) . Equivalently, for any bounded, continuous : C(RN ) C,


lim EP Sn = h, W (N ) i.

Proving this result comes down to showing that {n : n 1} is tight and


that every limit point is W (N ) . The second of these is a rather elementary
application of the Central Limit Theorem, and, at least when the Xn s have
uniformly bounded fourth moments, the first is an application of Kolmogorovs
Continuity Criterion. Finally, to remove the fourth moment assumption, I will
use the Principle of Accompanying Laws. It should be noticed that, at no point
in the proof, do I make use of the a priori existence of Wiener measure. Thus,
Theorem 9.3.1 provides another derivation of its existence, a derivation that
includes an an extremely ubiquitous approximation procedure.
Lemma 9.3.2. Any limit point of {n : n 1} is W (N ) .
Proof: Since a probability on C(RN ) is uniquely determined by its finite dimensional time marginals, and because (0) = 0 with probability 1 under
all the n s as well as W (N ) , it suffices to show that, for each ` Z+ and
0 = t0 < t1 < < t` ,

Sn (t1 ), Sn (t2 ) Sn (t1 ), . . . , Sn (t` ) Sn (t`1 ) P = 0,1 I 0,` I ,
where k = tk tk1 , 1 k `. To this end, for 1 k ` and n >
bntk c
1

n (k) = n 2

X
j=bntk1 c+1

Xj ,

1
k ,

set

394

9 Convergence of Measures on a Polish Space

where, as usual, I use the notation btc to denote the integer part of t. Noting
that






Sn tk Sn tk1 n (k)








bntk1 c
bntk c

Sn tk Sn

+ Sn tk1 Sn
n
n



Xbnt c+1 + Xbnt c+1
k
k1
,

1
n2

one sees that, for any  > 0,


`
2
X




P
Sn tk Sn tk1 n (k) 2
k=1

`
2
X
n2


P
Xbntk c+1
4

k=0

`
2 i 4(` + 1)N
4 X P h
=
0
E
X
bnt
c+1
k
n2
n2
k=0

as n . Hence, by the Principle of Accompanying Laws (cf. Theorem 9.1.13),


we need only check that

n (1), . . . , n (`) P = N
N
.
1
`
Moreover, since



n (1), . . . , n (`) P = n (1) P n (`) P

for all sufficiently large ns, this reduces to checking n (k) P = 0,k I for
each 1 k `. Finally, given 1 k `, set Mn (k) = bntk c bntk1 c, and use
Theorem 2.3.8 to see that, as n ,



Mn (k)
X

||2
1
P

, Xbntk c+j RN
exp
E exp
1
2
Mn (k) 2 j=1

uniformly for in compact subsets of RN . Hence, since


see that, for any fixed RN ,

Mn (k)
n

k , we now




 i
k ||2
= \
exp
E exp 1 , n (k) RN
0,k I (),
2
P


and therefore n (k) P = 0,k I . 

9.3 Donskers Invariance Principle

395

I turn next to the problem of showing that {n : n 1} is tight. By the


AscoliArzela
a Theorem, any subset K C(RN ) of the form


\
|(t) (s)|
R`
: |(0)| sup
(t s)
0s<t`
`=1

iscompact for any > 0 and {R` : ` 1} [0, ). Thus, since n (0) =
0 = 1, all that we have to do is show that, for each T > 0,


|Sn (t) Sn (s)|
P
< ,
sup E
sup
1
(t s) 8
n1
1s<tT

and, by Theorem 4.3.2, this would follow if we knew that




(*)
sup EP |Sn (t) Sn (s)|4 C(t s)2 , s, t [0, ),
n1

for some C < .


I will prove
(*) under the assumption that, for some M < and all n 1,

EP |Xn |4 M . To do this, note that when k 1 ns < nt k,
h
h i
4 i
4
EP Sn (t) Sn (s) = n2 (t s)4 EP Xk M (t s)2 .
On the other hand, when k 1 ns k ` nt ` + 1,
h
4 i
EP Sn (t) Sn (s)
h
h
 4 i

 4 i
27EP Sn (t) Sn n` + 27EP Sn n` Sn nk
h
4 i

+ 27EP Sn nk Sn (s)

4
X

4

4

`k

k
27 P
`
2
2

s
Xk+j + 27M n
+ 2 E
27M n t
n
n
n
j=1

81N 2 M (` k)2
135N 2 M (t s)2 ,
n2
where, in the passage to the final line, I have taken {ei : 1 i N } to be an
orthonormal basis for RN and used the estimate

2 2
X

`k
N
`k



X X

EP
Xk+j = EP
ei , Xk+j RN
j=1

i=1
j=1
54M (t s)2 +

N
X
i=1

4
`k

X

EP
ei , Xk+j RN 3N 2 M (` k)2
j=1

396

9 Convergence of Measures on a Polish Space

coming from the second inequality in (1.3.2).


In order to remove assumption on the fourth, I will apply the Principle
of Accompanying Laws. Namely, because the Xn s are uniformly
 square Pintegrable, one can use a truncation
procedure to find functions fn, : n

Z+ and > 0 Cb RN , RN with the properties that, for each > 0,
supnZ+ fn, u < ,
h
2 i
sup EP Xn fn, Xn < ,

nZ+

and, for every n Z+ , the random variable Xn, fn, Xn has mean value 0
and covariance I. Next, for each > 0, define the maps  7 Sn, ( , )
C(RN ) relative to {Xn, : n 1}, and set n, = Sn, P. Then, by the
preceding, we know that n, = W (N ) for each > 0. Hence, by Theorem
9.1.13, we will have proved that n = W (N ) as soon as we show that






lim sup P sup Sn (t) Sn, (t)  = 0
&0 nZ+

0tT

for every T Z+ and  > 0. To this end, first observe that, because Sn ( ) and
Sn, ( ) are linear on each interval [(m 1)2n , m2n ],


m
X



1


Y
sup Sn (t) Sn, (t) = max
k, ,
1
1mnT


2
n k=1
t[0,T ]

where Yk, Xk Xk, . Next, note that




!
m

1 X

Yk, 
P
max
1
1mnT n 2

k=1


!
m
X
 n 12 

N max P
max
e, Yk, RN 1 .
1mnT

eSN 1
N2
k=1

Finally, by Kolmogorovs Inequality,


m

!
X
 n 12 
NT

P
max
e, Yk, RN 1 2
1mnT


N2
k=1

for every e SN 1 .
9.3.2. Rayleighs Random Flights Model. Here is a more picturesque
scheme for approximating Brownian motion. Imagine the path t
R(t) of a
bird that starts at the origin, flies in a randomly chosen direction at unit speed

9.3 Donskers Invariance Principle

397

for a unit exponential random time, then switches to a new randomly chosen
direction for a second unit exponential time, etc. Next, given  > 0, rescale time
1
and space so that the path becomes t
R (t), where R (t)  2 R(1 t). I
will show that, as  & 0, the distribution of {R (t) : t 0} becomes Brownian
motion. This model was introduced by Rayleigh and is called his random flights
model.
In the following, {m : m 1} is a sequence of mutually independent, unit
exponential random variables from which their partial sums {Tn : n 0} and
the associated simple Poisson process {N (t) : t 0} are defined as in 4.2.1.
Finally, given  > 0, N (t) = N (1 t).

Lemma 9.3.3. Let {Xn : n 1} a sequence of mutually independent RN valued, uniformly square P-integrable random variables with mean value 0 and
covariance I, and define {Sn (t) : t 0} accordingly, as in Theorem 9.3.1. (Note
that the Xn s are not assumed to be independent of the n s.) Next, define
X (t, ) =

N (t,)

Xm ,

(t, ) [0, ) .

m=1

Then, for all r (0, ) and T [0, ),


!


sup X (t) Sn (t) r

lim P

&0

= 0,

where n [1 ].

t[0,T ]

Proof: Note that




N (t, )
,
X (t, ) Sn (t, ) = ( n 1) Sn
n




N (t, )
, Sn (t, ) .
+ Sn
n

Hence, for every (0, 1],


!
P



sup X (t) Sn (t) r
t[0,T ]

!



N (t)
t
+ P sup
n
t[0,T ]
!

r
.
+ P sup
sup Sn (t) Sn (s)
2
s[0,T ] |ts|



r
P
sup Sn (t)
2
t[0,T +]

But, by Theorem 9.3.1 and the converse statement in Theorem 9.1.9, we know
that the first term tends to 0 as  & 0 uniformly in (0, 1] and that the third

398

9 Convergence of Measures on a Polish Space

term tends to 0 as & 0 uniformly in  (0, 1]. Thus, all that remains is to
note that, by Exercise 4.2.19,
!




(9.3.4)
lim P sup N (t) t = 0. 
&0

t[0,T ]

Now suppose that {n : n 1} is a sequence of mutually independent RN valued random variables that satisfy the conditions that
h
i
M sup EP |n n |4 < ,
nZ+
h
i


EP n n = 0, and EP (n n ) (n n ) = I, n Z+ .
Finally, define 7 R( , ) C(RN ) by
N (t,)
X

R(t, ) = t TN (t,) () N (t,)+1 () +
m ()m ().
m=1

The process {R(t) : t 0} is my interpretation of Rayleighs random flights


model. A typical choice of the n s would be to make them independent of the
holding times (i.e.,
then s) and to choose them to be uniformly distributed over
N 1
N .
the sphere S

Theorem 9.3.5. Referring to the preceding, set




R (t, ) =  R t , , (t, ) [0, ) .



Then R P = W (N ) as  & 0.

Proof: Set Xn = n n , and, using the same notation as in Lemma 9.3.3,


observe that



R (t) X (t)  XN (t)+1 .


Hence, by Lemma 9.3.3 and Theorems 9.3.1 and 9.1.13, all that we have to do
is check that
!


lim P sup  XN (t)+1 r = 0
&0

t[0,T ]

for every r (0, ) and T [0, ). To this end, set T = 1+T


 . Then, by
(9.3.4), we have that
!




r
lim P sup  XN (t)+1 r = lim P max |Xn+1 |
0nT
&0
&0

t[0,T ]

14
1

M (2 + T ) 4
 P X
4
= 0. 
E
|Xn+1 |
lim
lim
&0 r
&0
r
0nT

Exercise for 9.3

399

Exercise for 9.3



Exercise 9.3.6. Let {n : n 1} M1 C(RN ) , and, for each T (0, ),
let Tn M1 C [0, T ]; E) denote the distribution of

C(RN ) 7  [0, T ] C [0, T ]; RN under n .

Show that there is a M1 C(RN ) to which {n : n 1} converges in
M1 C(RN ) if and only if, for each T (0, ), there is a T M1 C([0, T ]; RN )
with the property that

Tn = T in M1 C([0, T ]; RN ) ,
in which case T is the distribution of
C(RN ) 7  [0, T ] C([0, T ]; RN ) under .
In particular, weak convergence of measures on C(RN ) is really a local property.
Exercise 9.3.7. Donskers own proof of Theorem 9.3.1 was entirely different
from the one given here. Instead it was based on a special case of his result, a
case that had been proved already (with a very difficult argument) by P. Erdos
and M. Kac. The result of Erdos and Kac was that if {Xn : n 1} is a sequence
of independent, uniformly square integrable random variables with mean value
0 and variance 1, then, for all a 0,
! r Z
m
X
2 x2
21
e 2 dx.
Xk a =
lim P max n
n
1mn
a
k=1

Prove their result as an application of Donskers Theorem and part (iii) of Exercise 4.3.11. According to Kac, it was G. Uhlenbeck who first suggested that
their result might be a consequence of a more general invariance principle.
Exercise 9.3.8. Here is another version
of Rayleighs random
flights model.


Again let {k : k 1}, Tm : m 0 , and N (t) : t 0 be as in 4.2.2, and
set
Z t


R(t) =
(1)N (s) ds and R (t) =  R t .
0

Show that R P = W (1) as  & 0.


Hint: Set k = 0 or 1 according to whether k N is even or odd, and note that
n
X
k=1

(1)k k =

n
X
k=1

X

k k+1 k n n =


2k 2k1 n n+1 .

1k n
2

Now proceed as in the derivations of Lemma 9.3.3 and Theorem 9.3.5.

Chapter 10
Wiener Measure and
Partial Differential Equations

In this chapter I will give a somewhat sketchy survey of the bridge between
Brownian motion and partial differential equations. Like all good bridges, it
is valuable when crossed starting at either end. For those starting from the
probability side, it provides a computational tool with which the evaluation of
many otherwise intractable Wiener integrals is reduced to finding the solution to
a partial differential equation. For aficionados of partial differential equations,
it provides a representation of solutions that often reveals properties that are
not at all apparent in more conventional, purely analytic, representations.
10.1 Martingales and Partial Differential Equations
The origin of all the connections between Brownian motion and partial differential equations is the observation that the Gauss kernel
(10.1.1)

g (N ) (t, x) = (2t) 2 e

|x|2
2t

(t, x) (0, ) RN ,

is simultaneously the density for the Gaussian distribution 0,tI and the solution
to the heat equation t u = 12 u in (0, ) R with initial condition 0 . More
precisely, if Cb (RN ; R), then

Z
u (t, x) =

g (N ) (t, y x)(y) dy

RN


is the one and only bounded u C 1,2 (0, ) RN ; R that solves the Cauchy
initial value problem
t u = 21 u in (0, ) RN with lim u(t, ) = uniformly on compacts.
t&0

Checking that u solves this problem is an elementary computation. Showing


that it is the only solution is less straightforward. Purely analytic proofs can
be based on the weak minimum principle. If one assumes more about u, then a
probabilistic proof can be based on Theorem 7.1.6. Indeed, if one assumes that
400

10.1 Martingales and Partial Differential Equations

401



u Cb1,2 [0, ) RN ; C , then that theorem shows that, when  B(t), Ft , P is a
Brownian motion, for each T > 0, u(T tT, x+B(tT )Ft , P is a martingale.
Thus,
Z


u(T, x) = EP B(T ) =
(x + y) 0,tI (dy) = u (T, x).
RN

In Theorem 10.1.2, I will prove a refinement of Theorem 7.1.6 that will enable
me (cf. the discussion following Corollary 10.1.3) to remove the assumption that
the derivatives of u are bounded.
As the preceding line of reasoning indicates, the advantage that probability
theory provides comes from lifting questions about a partial differential equation to a pathspace setting, and martingales provide one of the most powerful
machines with which to do the requisite lifting. In this section I will refine and
exploit that machine.
10.1.1. Localizing and Extending Martingale Representations. The
purpose of this subsection is to combine Theorems 7.1.6 and 7.1.17 with Corollary
7.1.15 to obtain a quite general method for representing solutions to partial
differential equations as Wiener integrals.
For the purposes of this chapter, it is best to think of Wiener measure
W (N )

N
N
as a Borel measure on the Polish space C(R )  C [0, ); R
and to take
{Ft : t 0} with Ft = {( ) : [0, t]} as the standard choice of a
non-decreasing family of -algebras. The reason for using C(RN ) instead of (cf.
(N )
8.1.3) (RN ) is that we will want to consider the translates Wx of W (N ) by
(N
)
x RN . That is, Wx is the distribution of
x + under W (N ) . Since it

(N )
N
is clear that the map x R 7 Wx M1 C(RN ) is continuous, there is
no doubt that it is Borel measurable.
Theorem 10.1.2. Let G be a non-empty, open subset of R RN , and, for
s R, define sG : C(RN ) [0, ] by



sG () = inf t 0 : s + t, (t)
/G .
Further, suppose that V : G R is a Borel measurable function that is
bounded above on the whole of G and bounded below on each compact subset
of G, and set
!
Z tsG

EsV (t, ) = exp
V s + , ( ) d .
0


If w C 1,2 (G; R) Cb (G; R) satisfies t + 12 + V w f on G, where f :
G R is a bounded, Borel measurable function, then


EsV (t, )w s + t sG (), (t sG )
Z

tsG ()


E (, )f s + , ( ) , Ft , Wx(N )

402

10 Wiener Measure and P.D.E.s


is a submartingale for every (s, x) G. In particular, if t + 12 + V w = f on
G, then the preceding triple is a martingale.
Proof: Without loss in generality, I may and will assume that s = 0.
Choose a sequence {Gn : n 0} of open sets such that (0, x)S G0 , Gn

Gn+1 , Gn is a compact subset of G for each n N , and G = n=0 Gn . At


the same time, for each n N, choose n C R RN ; [0, 1] so that n = 1
on Gn and n vanishes off a compact subset of G, and define wn and Vn so
that wn = n w and V
 n = n V on G and wn and Vn vanish off of G. Clearly,
1,2
N
wn Cb R R ; R and Vn is bounded and measurable.
(N ) 
By Theorem 7.1.6, we know that Mn (t), Ft , Wx
is a martingale, where

Mn (t, ) = wn


t, (t)

with gn = t wn + 12 wn .


gn , ( ) d

Thus, if
Z
En (t, ) = exp

Vn , ( ) d


,

then, by Theorem 7.1.17,




Z t
(N )
En (t, )Mn (t, )
En (, )Mn (, )Vn (, ) d, Ft , Wx
0

is also a martingale. In addition,


Z

Z t

En (, )Vn (, )
gn , () d d
0
0
Z t

Z t


=
gn , ()
En (, )Vn , ( ) d d
0

Z
= En (t, )

gn


, () d


En (, )gn , () d,

and therefore
Z

En (t, )Mn (t, )

En (, )Mn (, )Vn (, ) d
Z t


= En (t, )wn t, (t)
En (, )fn , ( ) d,
0

where fn = gn + Vn wn . Hence, we now know that




Z t


(N )
En (t, )wn t, (t)
En (, )fn , ( ) d, Ft , Wx
0

10.1 Martingales and Partial Differential Equations

403

is a martingale.
Finally, define 0Gn for Gn in the same way as 0G was defined for G. Since
fn f on Gn , an application of Theorem 7.1.15 gives the desired result with
0Gn in place of 0G , and, because 0Gn % 0G , this completes the proof. 
Perhaps the most famous application of Theorem 10.1.2 is the FeynmanKac
formula,1 a version of which is the content of the following corollary.
Corollary 10.1.3. Let V : [0, T ] RN R be a Borel measurable function
that is uniformly bounded above everywhere
and bounded below uniformly on

compacts. If u C 1,2 (0, T ) RN ; R is bounded and satisfies the Cauchy initial
value problem
t u = 12 u+V u+f in (0, T )RN

with lim u(t, ) = uniformly on compacts


t&0

for some bounded, Borel measurable f : [0, T ] RN R and Cb (RN ; R),


then
 RT


(N )
V (,( )) d
u(T, x) = EWx
e 0
(T )
"Z
#
T Rt

(N )
V (,( )) d
Wx
+E
e 0
f t, (t) dt .
0

Proof: Given Theorem 10.1.2, there is hardly anything to do. Indeed, here
G = (0, T ) RN and so 0G = T . Thus, by Theorem 10.1.2 applied to w(t, ) =
u(T t, ), we know that
 R tT

V (,( )) d
e 0
u T t T, (t)
Z

tT

R
0

V (,()) d


f , ( ) d, Ft , Wx(N )

is a martingale. Hence,
W (N )

u(T, x) = lim E
t%T

 Rt

V (,( )) d
e 0
u T t, (t)
Z
+

R
0

V (,()) d


f , ( ) d ,


0
1

In the same spirit as he wrote down (8.1.4), Feynman expressed solutions to Schr
odingers
equation in terms of path-integrals. After hearing Feynman lecture on his method, Kac realized
that one could transfer Feynmans ideas from the Schr
odinger to the heat context and thereby
arrive at a mathematically rigorous but far less exciting theory.

404

10 Wiener Measure and P.D.E.s

from which the asserted equality follows immediately.

As a special case of the preceding, we obtain the missing uniqueness statement



in the introduction to this section. Namely, if u C 1,2 (0, ) RN ; C is a
bounded solution to the heat equation with initial value , then, by considering
the real and imaginary parts of u separately, Corollary 10.1.3 implies that
(N )

u(t, x) = EWx


(t) =

Z
(y)g(t, y x) dy.
RN

10.1.2. Minimum Principles. In this subsection I will show how Theorem


10.1.2 leads to an elegant derivation of the basic minimum principle for solutions
to equations like the heat equation. Actually, there are two such minimum
principles, one of which says that solutions achieve their minimum value at the
boundary of the region in which they are defined and the other of which says that
only solutions that are constant can achieve a minimum value on the interior.
The first of these principles is called the weak minimum principle, and the
second is called the strong minimum principle.
Theorem 10.1.4. Let G be a non-empty open subset of R RN , and let V
be a function of the sort described in Theorem 10.1.2. Further, suppose that
(s, x) G is a point at which
(10.1.5)




Wx(N ) t (0, ) s t, (t)
/ G = 1.

If u C 1,2 (G; R) is bounded below and satisfies t u 12 u V u 0 in G and if


lim(t,y)(t0 ,y0 ) u(t, y) 0 for every (t0 , y0 ) G with t0 < s, then u(s, x) 0.

Proof: Without loss in generality, I will assume that s = 0.


= {(t, y) : (t, y) G} and define w on G
by w(t, y) = u(t, y). Next,
Set G
choose an exhaustion {Gn : n 0} of G as in the proof of Theorem 10.1.2, and
fn = {(t, y) : (t, y) Gn }. By Theorem 10.1.2, we know that
set G
w(0, x) E

(N )

Wx

h R n ()
i
V (,( )) d
e 0
w n (), (n ) ,


fn } n. Moreover, by (10.1.5), for
where n () = inf{t 0 : t, (t)
/ G

(N )
Wx -almost every , n (), (n ) tends to a point in {(t, x) G : t < 0}
as n , and therefore



lim w n (), (n ) = lim u n (), (n ) 0
n

Wx(N ) -almost surely.

Hence, by Fatous Lemma, we see that u(0, x) = w(0, x) 0. 

10.1 Martingales and Partial Differential Equations

405

Theorem 10.1.6. In the same setting as the preceding, suppose that u


C 1,2 (G; R) satisfies t u 12 u V u 0 in G. If (s, x)  G and 0 = u(s, x)
u(t, y) for all (t, y) G with t s, then u s t, (t)
= 0 for all (t, )

[0, ) C(RN ) such that (0) = x and s , ( ) G for all [0, t]. In
particular, if G is a connected,
open subset of RN , V is independent of time,

2
and u C G; [0, ) satisfies 12 u + V u 0, then either u 0 or u > 0
everywhere on G.

Proof: Again, without loss in generality, I assume that s = 0. In addition,I


may and will assume that x = 0, V is uniformly bounded, and u Cb G; [0, ) .
To see that these latter assumptions cause no loss in generality, one can use an
exhaustion argument of the same sort as was used in the proof of Theorem 10.1.2.
N
Given (t, ) (0, )C(R
) with (0) = 0 and , ( ) G for [0, t],

suppose that u t, (t) > 0. In order to get a contradiction,
choose r > 0 so

that u(t, y) r if |y (t)| r and so that , 0 ( ) G if [0, t] and
= {(t, y) : (t, y) G}, then, just as in the proof of
k 0 k[0,t] r. If G
Theorem 10.1.2,

Z
0 = u(0, 0)

R tG (0 )
0

retkV ku k W (N )

V (, 0 ( )) d


u t 0G ( 0 ), 0 (t 0G ) W (N ) (d 0 )

{ 0 : k 0 k[0,t] r} .


Since, by Corollary 8.3.6, W (N ) { 0 : k 0 k[0,t] r} > 0, we have the
required contradiction.
Turning to the final assertion, take G = R G, and observe that for all
(x, y) G2 there is a such that (0) = x, (1) = y, and ( ) G for all
[0, 1]. 
At first glance, one might think that the strong minimum principle overshadows the weak minimum principle and makes it obsolete. However, that is not
entirely true. Specifically, before one can apply the strong minimum principle,
one has to know that a minimum is actually achieved. In many situations,
continuity plus compactness provide the necessary existence. However, when
compactness is absent, special considerations have to be brought to bear. The
weak minimum principle does not suffer from this problem. On the other hand,
it suffers from a related problem. Namely, one has to know ahead of time that
(10.1.5) holds. As we will see below, this is usually not too serious a problem,
but it should be kept in mind.
10.1.3. The Hermite Heat Equation. In the preceding subsection I gave
an example of how probability theory can give information about solutions to
partial differential equations. In this subsection, it will be a differential equation
that gives us information about probability theory. To be precise, I, following M.
Kac, will give in this subsection his derivation of the formulas that we derived

406

10 Wiener Measure and P.D.E.s

by purely Gaussian techniques in Exercise 8.2.16, and in the next section I will
give his treatment of a closely related problem.2
Closed form solutions to the Cauchy initial value problem are available for
very few V s, but there is a famous one for which they are. Namely, when
V = 12 |x|2 , a great deal is known. Indeed, already in the nineteenth century,
Hermite knew how to analyze the operator 12 12 |x|2 . As a result, this operator
is often called the Hermite operator by mathematicians, although physicists
call it the harmonic oscillator because it arises in quantum mechanics as minus
the Hamiltonian for an oscillator that satisfies Hooks law. Be that as it may,
set (cf. (10.1.1))

(10.1.7)

h(t, x, y) = e

N t+|x|2
2

(N )


|y|2
1 e2t
t
,y e x e 2
2

for (t, x, y) (0, ) RN RN . By using the fact that g (N ) solves the heat
equation and tends to 0 as t & 0, one can apply elementary calculus to check
that

t h(t, , y) = 12 12 |x|2 h(t, , y) in (0, ) RN
for each y RN .
and lim h(t, x, y) = yx
t&0

Now let Cb (RN ; R) be given, and set


Z
u (t, x) =

(y)h(t, x, y) dy.
RN

Then, u is a bounded solution to t u = 12 u 12 |x|2 u that tends to as t & 0.


Hence, as an immediate consequence of Corollary 10.1.3, we see that
(N )

u (t, x) = EWx

h 1 Rt
i
|( )|2 d

(t) .
e 2 0

By taking = 1 and performing a tedious, but completely standard, Gaussian


computation, one can use this to derive
(N )

Wx





Rt
 N2
|x|2
|( )|2 d
12
0
tanh t ,
exp
= cosh t
e
2

which, together with Brownian scaling, vastly generalizes the result in Exercise
8.2.16.
2

See Kacs On some connections between probability theory and differential and integral
equations, Proc. 2nd Berkeley Symp. on Prob. & Stat. Univ. of California Press (1951),
where he gives several additional, intriguing applications of Corollary 10.1.3.

10.1 Martingales and Partial Differential Equations

407

10.1.4. The Arcsine Law. As I said at the beginning of the last subsection,
there are very few V s for which one can write down explicit solutions to equations of the form t u = 12 u + V u. On the other hand, when V is independent
of time one can often, particularly whenRN = 1, write down a closed form ex
pression for the Laplace transform U = 0 et u(t, ) dt of u. Indeed, if u is a
bounded solution to t u = 12 u + V u, then it is an elementary exercise to check
that

12 V U = f,

and when N = 1 this is an ordinary differential equation. Moreover, when


U C 2 (RN ; R) is bounded, one can apply Corollary 10.1.3 to see that
h RT
i
V (( )) d
e 0
U (T )
"Z
#
T Rt

(N )
V (( )) d
Wx
+E
e 0
f (t) dt
(N )

U (x) =EWx

for T > 0,

where V = V . Hence, if V is uniformly negative and one lets T , one


gets
Z R t


(N )
V (( )) d
Wx
0
U (x) = E
e
f (t) dt .
0

The preceding remark is the origin of Kacs derivation of Levys Arcsine Law
for Wiener measure.
Theorem 10.1.8. For every T (0, ) and [0, 1],
(
W

(1)

1
C(R) :
T

)!

1[0,) (t) dt


2
arcsin .

Proof: First note that, by Brownian scaling, it suffices to prove the result when
T = 1. Next, set
1


Z
F () = W
C(R) :

1[0,)


(s) ds


,

[0, ),


and let denote the element of M1 [0, ) for which F is the distribution
function. We are going to compute F () by looking at the double Laplace
transform
Z
G()
et g(t) dt, (0, ),
(0,)

where

Z
g(t)
[0,)

et (d),

t (0, );

408

10 Wiener Measure and P.D.E.s

and, by another application of the Brownian scaling property, we see that

 Z t



(1)
G() =
exp
+ 1[0,) (s) ds W (d) dt
0
0
Z R t

(1)
V (( )) d
= EW
e 0
dt
where V 1[0,) .
Z

Z

At this point, the strategy is to calculate G() with the help of the idea
explained above. For this purpose, I begin by seeking as good a solution x
R 7 u (x) R as I can find to the equation 12 u00 + V u = 1. By considering
this equation separately on the left and right half-lines and then matching, in so
far as possible, at 0, one finds that the best choice of bounded u will be to take

i
h p
1

if x [0, )
A exp 2(1 + ) x + 1+
u (x) =
i
h

B exp 2 x + 1
if x (, 0),

where

A =

1
(1 + )

 12

1+


and B =

1
(1 + )

 12

1
.

(The choice of sign in the exponent is dictated by my desire to have u bounded.)


If u were twice continuously differentiable, I could apply the reasoning above
directly and thereby arrive at G() = u (0). However, because the second
derivative of u is discontinuous at 0, I have to work a little harder.
Notice that, although the second derivative of u has a discontinuity at 0,
u0 is nonetheless uniformly Lipschitz continuous everywhere. Hence, by taking
Cc R; [0, ) with Lebesgue integral 1 and setting
Z
u (x y)(ny) dy,

u,n (x) = n

n Z+ ,

we see that u,n


Cb (R; R) for each n Z+ , u,n u uniformly on R as

n , supnZ+ u,n kCb2 (R;R) < , and, as n ,
fn


1 00
u,n + 1[0,) u,n 1
2

on R \ {0}.

Thus, since the argument that I attempted to apply to u works for u,n , we
know that
Z R t


V (( )) d
W (1)
u,n (0) = E
e 0
fn (t) d dt .
0

10.1 Martingales and Partial Differential Equations

409

In addition, because
W (1)

Z

1{0}

0
W (1)

 Z

(t) dt =


0,t {0} dt = 0,

Z

Rt
0

V (( )) d

fn



(t) dt G().

Hence, the conclusion u (0) = G() has now been rigorously verified.
 1
Knowing that G() = (1) 2 , the rest of the calculation is easy. Indeed,
since
r
Z

12 t
,
t e
dt =

the multiplication rule for Laplace transforms tells us that


1
g(t) =

es

1
p
ds =

s(t s)

Z
0

et
p
d;
(1 )

and so we now find that


Z


2
1
1 1
p
d = arcsin 1 . 
F () =

0
(1 )

Just as Donskers Invariance Principle enabled us in Exercise 9.3.7 to derive


the ErdosKac Theorem from the reflection principle for Brownian motion, it
now allows us to transfer the Arcsine Law for Wiener measure to the Arcsine
Law for sums of independent random variables.


Corollary 10.1.9. If Xn : n 1 is a sequence of independent, uniformly
square P-integrable random variables with mean value 0 and variance 1 on some
probability space (, F, P), then, for every [0, 1],



2
Nn ()

= arcsin ,
lim P
:
n

n
Pm
where Nn () is the number of m Z+ [0, n] for which Sm () `=1 X` ()
is non-negative.

Proof: Thinking of
9.2.1)

Nn ()
n

as a Riemann approximation to (cf. the notation in


Z


1[0,) Sn (t, ) dt,

one should guess that, in view of Theorem 9.3.1 and Theorem 9.1.13, there
should be very little left to be done. However, once again there are continuity

410

10 Wiener Measure and P.D.E.s


issues that have to be dealt with. Thus, for each f C R; [0, 1] and n Z+ ,
introduce the functions F f and Fnf on C(R) given by
F f () =


f (t) dt

and Fnf () =

n
1 X
f
n m=1

m
n




for any f C R; [0, 1] . Since Fnf F f uniformly on compacts, Theorem
9.3.1 plus Lemma 9.1.10 show that the distribution of
7 Afn ()



n
Sm ()
1 X
f
1
n m=1
n2

under P tends weakly to that of C(R) 7 F f () under W (1) . Next, for


each (0, ), choose continuous functions f so that 1(,) f+ 1[0,)
and 1[0,) f 1[,) , and conclude that




 +
Nn
(1)
f
W
F
lim P
n
n
and





Nn
(1)
f
< W
F
<
lim P
n
n
for every > 0. Passing to the limit as & 0, we arrive at


lim P

Nn

Nn
<
n

(1)


Z
:



1(0,) (t) dt

and


lim P
n

(1)


Z
:

1[0,) (t) dt <


.

Finally, since
Z Z

1{0}


Z

(1)
(t) dt W (d) =

and [0, 1] 7 arcsin


W (1) (t) = 0 dt = 0,


is continuous, the asserted result follows.

10.1 Martingales and Partial Differential Equations

411

Remark 10.1.10. The renown of the Arcsine Law stems, in large part, from the
following counterintuitive deduction that can be drawn from it. Namely, given

0, 12 , guess which maximizes limn P Nnn ( , + ) mod1 for
a fixed . Because of The Law of Large Numbers (in more common parlance,
The Law of Averages), most people are inclined to guess that the maximum
should occur at = 12 . Thus, it is surprising that, since

1
[0, ]
[0, 1] 7 p
(1 )

is convex and has its minimum at 12 , the Arcsine Law makes the exact opposite
prediction! The point is, of course, that the sequence of partial sums {Sn () :
n 1} is most likely to make long excursions above and below 0 but tends to
spend relatively little time in a neighborhood of 0. In other words, although
one may be correct to feel that my luck has got to change, one had better be
prepared to wait a long time.
A more technical point is one raised by S. Sternberg. The arcsine distribution
is familiar to people who study iterated maps and is important to them because
(cf. Exercise 10.1.15) it is the one and only absolutely continuous probability
distribution on [0, 1] that is invariant under x [0, 1] 7 4x(1 x) [0, 1].
Sternberg asked whether a derivation
R 1 of Theorem 10.1.8 can be
R 1based on this
invariance property. Taking T+ = 0 1[0,) (s) ds and S = 0 sgn (s) ds,
and noting that 4T+ (1 T+ ) = 1 S 2 , one way to phrase Sternbergs question
is to ask is whether there is a pure thought way to check that T+ and 1 S 2
have the same distribution under W (1) and that that distribution is absolutely
continuous. I have posed this problem to several experts but, as yet, none of
them has come up with a satisfactory solution.

10.1.5. Recurrence and Transience of Brownian Motion. In this subsection I will use solutions to partial differential equations to examine the long
time behavior of Brownian motion.
Theorem 10.1.11. For r [0, ), define


r () = inf t [0, ) : |(t)| = r ,

C(RN ).

Then
  r2 |x|2
r =
N


(N ) 
(N
+ 4)r2 N |x|2 2
r |x|2
EWx r2 =
2
N (N + 2)
(N )

EWx

for |x| < r.

412

10 Wiener Measure and P.D.E.s

In addition, if 0 < r < |x| < R < , then

Wx(N )

R |x|

Rr

 log R log |x|


r < R =
log R log r

 N 2 N 2

R
|x|N 2
r

N
2
R
rN 2
|x|

if

N =1

if

N =2

if

N 3.

In particular,

Wx(2)


Wx(1) 0 < = 1 for all x R,


0 < = 0, x 6= 0, but Wx(2) r < = 1, x R2 and r > 0,

and
Wx(N )


r < =

r
|x|

N 2
, 0 < r < |x|,

when N 3.

Proof: To prove the first two equalities, set f (t, x) = |x|2 N t, use Theorem
10.1.2 to show that



f t r , (t r ) , Ft , Wx(N )
and
2
f t r , (t r ) 4

tr

|(s)|2 ds, Ft , Wx(N )

are continuous martingales, and conclude that


(N )

N EWx

h


 2 i
(N )
t r = EWx (t r |x|2 ,

t [0, ),

and
2

(N )

Wx

N E



(N )
(t r )2 =|x|4 + 4EWx

"Z

tr
2

|(s)| ds
0

(N )

+ 2N EWx

h

2 i
4 i
(N )
(t r ) (t r ) EWx (t r )

10.1 Martingales and Partial Differential Equations

413
(N )

for all t [0, ). Now assume that |x| r, and use the first of these N EWx [r ]
(N )
(N )
r2 . Thus Wx (r < ) = 1, and so N EWx [r ] = r2 |x|2 follows when t .
To get the second equality, use Theorem 10.1.2 to show that
!
Z tr

4

2
(t r ) (4 + 2N )
(s) ds, Ft , Wx(N )
0

is a continuous martingale and therefore that, for x B(0, r),


"Z
#
tr
h

i
(N )
(N )
2
(s) ds = EWx (t r ) 4 |x|4 ,
(4 + 2N )EWx
0

plug this into the above, and pass to the limit as t .


Turning to the second part of the theorem, for each N Z+ choose fN

Cb RN ; R) so that, on an open neighborhood G of the annulus B(0, R)\B(0, r),


fN is equal to the corresponding expression on the right-hand side of the equality
under consideration, note that fN = 0 on G, and conclude (via Theorem
10.1.2) that



fN t r R , Ft , Wx(N )

is a bounded, continuous martingale. In particular, after one lets t and


(N )
notes that, by the first part of this theorem, Wx (R < ) = 1 for |x| < R,
this leads to
h

i
(N )
Wx(N ) r < R = EWx fN r R = fN (x), 0 < r < |x| < R,
as required. Given this, the rest of the theorem follows easily when one lets
R % and, in the case when N = {1, 2}, r & 0. 
The second part of Theorem 10.1.11 says something significant about the
global behavior of Brownian paths and the dependence of that behavior on
dimension. Namely, when N {1, 2}, it says that, no matter where it is started,
a Brownian path will hit any non-empty open set with probability 1. As will be
shown in Theorem 10.2.3, this property implies the seemly stronger statement
that, with probability 1, a Brownian path will visit every non-empty open set
infinitely often and will spend infinite time in each. For this reason, Brownian
motion in one and two dimensions is said to be recurrent. By contrast, when
N 3, Theorem 10.1.11 says that, with positive probability, a Brownian path
will never visit a closed ball in which it was not started. Moreover, if it is started
outside of a ball, then the probability of its ever hitting that ball goes to 0 as
the diameter of the set goes to 0. As I am about to show, this latter property
leads to the conclusion that, with probability 1, a Brownian path in three or
more dimensions tends to infinity.

414

10 Wiener Measure and P.D.E.s

Corollary 10.1.12. If N 3, then


Wx(N )




lim (t) = = 1,

x RN .

Proof: Given r > 0, apply Theorem 10.1.2 to see that (cf. the notation in
Theorem 10.1.11)



(t r ) N +2 , Ft , Wx(N )
is a bounded, non-negative martingale for every |x| > r > 0. Hence, by Theorem
7.1.14, for any 0 s t < and A Fs ,
h


i
(s) N +2 , A r () > s
h
 N +2

i
(N )
= EWx t r
, A r () > s ;
(N )

|x|N +2 EWx

(N ) 
and, because N 3 and therefore r % a.s., Wx
as r & 0, an application
of the Monotone Convergence Theorem and Fatous Lemma leads to
(N )

|x|N +2 EWx

h
i
h
i


(N )
(s) N +2 , A EWx (t) N +2 , A

for all 0 s t < , A Fs , and x 6= 0. In particular, this proves that




N +2
(t)
, Ft , Wx(N )
is a non-positive submartingale
for every x 6= 0 and therefore, by Theorem


(N )


7.1.10, that limt (t) exists in [0, ] for Wx -almost every C(RN ).
At the same time,





Wx(N ) (t) R = 0,tI y : |y x| R 0
as t for every R (0, ) and x RN ; and so we now know that, at least
(N )
when x 6= 0, |(t)| for Wx -almost every C(RN ). Finally, since
(N )
W0



inf (t) R

tT +1

Z
=

Wx(N )



inf (t) R

tT


0,I (dx),

RN \{0}

the same result also holds when x = 0. 


The conclusion drawn in the preceding is sometimes summarized as the statement that Brownian motion in three or more dimensions is transient.

Exercises for 10.1

415

Exercises for 10.1


Exercise 10.1.13. Referring to 8.4.1, define U(t, x, ) by (8.5.1), and let

(N )
Ux M1 C(RN ) denote the W (N ) -distribution of
U (N ) ( , x, ). Given
N
G
a non-empty open set G R R , define s () as in Theorem 10.1.2, and
show that for each w C 1,2 (G; R) Cb (G; R) and f Cb (G; R) satisfying

1
1
t w(t, y) w(t, y) + y, w(t, y) RN f in G,
2
2
!
Z tsG ()


G
G
(N )
w s + t s (), (t s )
f s + , ( ) d, Ft , Ux
0

is a submartingale for all (s, x) G.


Exercise 10.1.14. Let h be the function described in (10.1.7), and show that


h 1 RT
i
(N )
h t, x, (T )
|( )|2 d
2
Wx
0
.
E
e
{(T )} = (N )
g
T, (T ) x

Next, referring to Exercise 8.3.21, set `T,x,y (t) = TTt x + Tt y for t [0, T ], let

(N )
WT,x,y M1 C([0, T ]; RN ) denote the W (N ) -distribution of
`T,x,y + T 
[0, T ], and show that
i
h 1 RT
(N )
h(t, x, y)
|( )|2 d

.
= (N )
EWT ,x,y e 2 0
g (T, y x)
Exercise 10.1.15.
The purpose of this exercise is to examine the assertion made in Remark 10.1.10 about the characterization of the arcsine distribution (i.e., the Borel probability
measure on [0, 1] with distribution function
x [0, 1] 7 F (x) = 2 arcsin x [0, 1]). Specifically, the goal is to show that
the arcsine distribution is the one and only Borel probability measure on [0, 1]
that is absolutely continuous with respect to Lebesgue measure and invariant
under x [0, 1] 7 4x(1 x) [0, 1].
2
[0, 1], and show that a Borel
(i) Define x [0, 1] 7 (x) = sin x
2
probability measure on [0, 1] is invariant under x
4x(1x) if and only if
is invariant under x
2x mod 1. Conclude that the desired characterization of
the arcsine distribution is equivalent to showing that Lebesgue measure [0,1] on
[0, 1] is the one and only Borel probability measure on [0, 1] that is absolutely
continuous with respect to Lebesgue measure and invariant under x
2x mod 1.

(ii) Suppose that is a Borel probability measure on [0, 1] that is invariant



under x
2x mod 1 and assigns probability 0 to {0}. Set F (x) = [0, x] , the
distribution function for , and use induction on n 0 to show that
n
2X
1


F (x) =
F m2n + x2n F m2n
m=0

for x [0, 1].

416

10 Wiener Measure and P.D.E.s

(iii) Now add the assumption that  [0,1] , let f be the corresponding Radon
Nikodym derivative, and extend f to R by taking f = 0 off of [0, 1]. Given
0 x < x + y 1, conclude that
Z





F (x + y) F (x) F (y) f t + x2n f (t) dt 0
R

as n . In other words, F (x + y) = F (x) + F (y) whenever 0 x <


x + y 1. Finally, after combining this with the facts that F (0) = 0, F (1) = 1,
and F is continuous, conclude that F (x) = x for x [0, 1]. In view of part
(i), this completes the proof that the arcsine distribution admits the asserted
characterization.
(vi) To see that absolute continuity is absolutely essential in the preceding con+
siderations, consider any Borel probability measure M on {0, 1}Z that is stationary in the sense that the M -distribution of
+

{0, 1}Z 7 (2 , . . . , n+1 , . . . ) {0, 1}Z


is again M . Show that the M -distribution of

X
+
{0, 1}Z 7
2n n [0, 1]
n=1

is invariant under x
2x mod 1. In particular, this means that, for each
p (0, 1) \ { 12 }, the p described in Exercise 1.4.29 is a non-atomic, Borel
probability measure on [0, 1] that is invariant under x
2x mod 1 but singular
to Lebesgue measure.

10.2 The Markov Property and Potential Theory


In this section I will discuss the Markov property for Wiener measure and show
how it can be used as a tool for connecting Brownian motion to partial differential
equations.
10.2.1. The Markov Property for Wiener Measure. The introduction
(N )
of the translates Wx s facilitates the statement of the following important interpretation of Theorem 7.1.16. In its statement, and elsewhere, t : C(RN )
C(RN ) is the time-shift map determined by t ( ) = (t + ), [0, ),
and when is a stopping time,  is the map on { : () < } C(RN )
given by ( ) = () + .
Theorem 10.2.1. If is a stopping time and F : C(RN ) C(RN ) [0, )
is a F FC(RN ) -measurable function, then
Z

F , Wx(N ) (d)
{:()<}

(10.2.2)

F (,
{:()<}

C(RN )

(N )
) W() (d 0 )

Wx(N ) (d).

10.2 The Markov Property and Potential Theory

417

Proof: Given Theorem 7.1.16, the proof is mostly a matter of notation. In the
first place, by replacing F (, 0 ) with F (x + , 0 ), one can reduce to the case
when x = 0. Thus, I will assume that x = 0. Secondly, = () + if
() < . Hence,
Z
Z
 (N )

F , W (d) =
F , () + W (N ) (d).
{:()<}

{:()<}



Now define F (, 0 ) = 1[0,) () F , () + 0 , note that F is again
F BC(RN ) -measurable, and apply Theorem 7.1.16 to reach the desired conclusion. 
Theorem 10.2.1 is a statement of the Markov property for Wiener measure.
More precisely, because it involves stopping times, and not just fixed times, it is
often called the strong Markov property.
10.2.2. Recurrence in One and Two Dimensions. As my first application
of the Markov property, I will prove the statement made following Theorem
10.1.11 about the recurrence of Brownian motion when N {1, 2}.
Theorem 10.2.3. If N {1, 2}, then, for all x RN ,
Z


Wx(N )
1B(c,r) (t) dt = for all c RN and r (0, ) = 1.
0

Proof: Because RN is separable, it is easy to use countable additivity to see


that the asserted result will be proved once we show that
Z


(N )
(*) Wx
1B(0,r) (t) dt = = 1 for all x RN and r (0, ).
0

Define { n : n 0} so that 0 () = inf{t 0 : |(t)| 2r } and





inf t n1 () : |(t)| r
for odd n 1
n


() =
for even n 2,
inf t n1 () : |(t)| 2r

with the understanding that n1 () = = n () = . Clearly, all


the n s are stopping times. In addition, 2n () < = 2n+1 () =
2n () + B(0,r) 2n () for all n 0, and 2n1 () < = 2n () =
2n1 () + r2 2n1 () for n 1, where B(0,r) () = inf{t 0 : |(t)| r}


and, as in Theorem 10.1.11, r2 () = inf t 0 : |(t)| = 2r . Hence, by
Theorem 10.2.1,



(N )
Wx(N ) 2n+1 t F 2n = W( 2n ) B(0,r) t if 2n () < ,
(**)



(N )
if 2n1 () < .
Wx(N ) 2n t F 2n1 = W( 2n1 ) r2 t

418

10 Wiener Measure and P.D.E.s

In particular, because N {1, 2}, Theorem 10.1.11 says that both B(0,r) and
(N )
r2 are Wy -almost surely finite for all y RN . Thus, by induction, n <
(N )

Wx -almost surely for all n 0.


Next set
 2n+1

() 2n () if 2n () <
Xn ()
0
if 2n () = .
(N )

By the preceding, we know that, for each n 0, Xn > 0 Wx -almost surely.


In addition, it is obvious that
Z

1B(0,r) (t) dt
0

Xn ().

n=0

Hence, if we show that the Xn s are mutually independent and identically dis(N )
tributed under Wx , then (*) will follow from The Strong Law of Large Numbers. But, by (**), we will know that the Xn s have both these properties once
(N )
we show that Wy ( B(0,r) t) is the same for all y RN with |y| = 2r . To
this end, let yi , i {1, 2} with |yi | = 2r be given, and choose an orthogonal
(N )
transformation O of RN so that y2 = Oy1 . Then, Wy2 is the distribution of
(N )
(N )
(N )

O under Wy1 , and so Wy2 ( B(0,r) t) = Wy1 ( B(0,r) t). 


10.2.3. The Dirichlet Problem. There are many ways in which the Markov
property can be used to relate Brownian motion to partial differential equations,
but among the most compelling is the one that was discovered by S. Kakutani
and developed by Doob.1 What Kakutani discovered is that the capacitory
potential (cf. 11.4.1) of a set K R2 at a point x R2 \ K is equal to the
probability that a Brownian motion started at x ever hits K. What Doob did
is extend Kakutanis result to RN and show that it is a very special case of a
result that identifies the distribution of the place where a Brownian motion hits
the boundary of a set as the harmonic measure (cf. 11.1.4) for that set. In this
subsection, I will give a brief introduction to these ideas. A much more thorough
account is given in Chapter 11.
Let G be a non-empty, connected open subset of RN . Given an f Cb (G; R),
one says that u C 2 (G; R) solves the Dirichlet problem for f in G if u is
1

Kakutanis 1944 article, Two dimensional Brownian motion and harmonic functions, Proc.
Imp. Acad. Tokyo, 20, together with his 1949 article, Markoff process and the Dirichlet problem, Proc. Imp. Acad. Tokyo, 21, are generally accepted as the first place in which a definitive
connection between harmonic functions and Brownian motion was established. However, it
was not until with Doobs Semimartingales and subharmonic functions, T.A.M.S., 77, in
1954 that the connection was completed. It is ironic that this connection was not made by
Wiener himself. Indeed, Wieners early fame as an analyst was based on his contributions to
potential theory. However, in spite of his claims to the contrary, I know of no evidence that
he discovered the connection between his measure and potential theory.

10.2 The Markov Property and Potential Theory

419

harmonic (i.e., u = 0) in G and, for each a G, u(x) f (a) if as x G


tends to a. Assuming that (10.1.5) holds with G = R G, the weak minimum
principle shows that there is at most one solution to the Dirichlet problem for
each f Cb (G; R). However, the corresponding question about the existence
of solutions is not so easily resolved. The following preliminary result will get
us started. In its statement, and elsewhere, a harmonic function on a nonempty, open subset G of R is a u C 2 (G; R) that satisfies u = 0. Also, if
is a non-zero, finite measure on E and f : E R is a -integrable function, I
will write
Z
Z
1
f d
f d
(E)

to denote the average value of f with respect to . Finally, G : C(RN ) [0, ]


given by G () = inf{t 0 : (t)
/ G} is the first exit time from G.
R),
Theorem 10.2.4. Let G be a non-empty, open subset of RN . If u Cb (G;
u  G is harmonic, and x is an element of G for which

(10.2.5)
Wx(N ) G < = 1,
then
(10.2.6)

(N )

u(x) = EWx




u ( G ) , G () < .

In particular, if u is harmonic on G, then2


(10.2.7)

Z
B(x, r) G = u(x) =
SN 1

u(x + r) SN 1 (d).

Conversely, if u : G R is a locally bounded (i.e., bounded on compacts),


Borel measurable function that satisfies (10.2.7), then u C (G; R) and u is
harmonic. Finally, if f : G R is a bounded, Borel measurable function,
then the function u : G R given by
h
i

(N )
u(x) = EWx f ( G ) , G () < for x G
u is a bounded, harmonic function on G.
R) is harmonic on G. By Theorem 10.1.2,
Proof: Suppose that u Cb (G;



u (t G ) , Ft , Wx(N )
is a martingale.

(N ) 
Hence, u(x) = EWx u (t G ) , and so, after letting t and taking
(10.2.5) into account, one gets (10.2.6).
2

Remember that I use the notation G to mean that is a compact subset of G.

420

10 Wiener Measure and P.D.E.s

Now assume that u is harmonic in G and that B(x, r) G. By applying


(10.2.6) to u  B(x, r) and noting that (cf. the first part of Theorem 10.1.11)
(N )
Wx ( B(x,r) < ) = 1, one has
(N )

u(x) = EWx




u ( B(x,r) ) , B(x,r) () < .

Hence, the proof of (10.2.7) reduces to the observation that the distribution of
(N )
{ B(x,r) < } 7 ( B(x,r) ) B(x, r) under Wx is same as that of
{ B(0,r) < } 7 x+( B(0,r) ) under W (N ) and that (cf. Exercise 4.3.10)
the distribution of { B(0,r) < } 7 ( B(0,r) ) under W (N ) is rotation
invariant.
Turning to the converse assertion, suppose that u : G R is a locally
bounded, Borel measurable function for which (10.2.7) holds. To see that u 
C (G; R), extend u to RN so that it is 0 off of G, and choose a Cc R; [0, )
with support in (0, 1) and total integral 1. Using (10.2.7) together with Fubinis
Theorem, one sees that, as long as B(x, r) G,
1

Z

u(x) =
(t)
u(x + tr) S N 1 (d) dt
N 1
0
Z S

1
|y x|1N r1 |y x| u(y) dy,
=
N 1 r RN
Z

from which it is clear that u C (G; R). Further, knowing that u is smooth
and satisfies (10.2.7), it is easy to see that it is harmonic. Indeed, by Taylors
Theorem, we know that
Z

SN 1

Z
u(x + r) SN 1 (d) u(x) =

SN 1

r2
2 u(x)

u(x + r) u(x) SN 1 (d)

+ o(r2 ),

since, for any orthonormal basis (e1 , . . . , eN ) in RN and 1 i N ,


Z

ei ,

2

SN 1

RN

Z
2
1
SN 1 (d)

SN 1 (d) =
N SN 1

and, when 1 i 6= j N ,
Z
ei ,
SN 1


RN

SN 1 (d) =

ei ,
SN 1


RN

ej ,


RN

SN 1 (d) = 0.

Hence, after dividing through by r2 and letting r & 0, we see that (10.2.7)
implies u(x) = 0.

10.2 The Markov Property and Potential Theory

421

To complete the proof, let u be as in the final assertion. Because G =

+ G B(x,r) if B(x, r) G and B(x,r) < , Theorem 10.2.1 implies


that
B(x,r)

(N )

B(x, r) G = u(x) = EWx




u ( B(x,r) ) , B(x,r) () < ;

and, as we have seen earlier, this implies that (10.2.7) holds. 


An easy, but important, corollary of the preceding is that if G is connected,
then (10.2.5) for one x G implies (10.2.5) for all x G. Indeed, take u(x) =
(N )
1Wx ( G < ), apply Theorem 10.2.4 to see that u is a [0, 1]-valued harmonic
function in G, and apply the strong minimum principle to see that u(x) = 0
for some x G implies u = 0 throughout G. A second easy corollary is that if
(10.2.5) holds and if u solves the Dirichlet problem in G for some f Cb (G; R),
then
(10.2.8)

u(x) = EW

(N )




f (t) , G () < .

Thus, if (10.2.5) holds for all x G and we are going to solve the Dirichlet
problem for f , then we have no choice but to show that the u given by (10.2.8)
is a solution. Furthermore, because of the last part of Theorem 10.2.4, we already
know that this u is harmonic in G. Thus, all that remains is to find conditions
under which the u in (10.2.8) will take the correct boundary values.
It should be reasonably clear, and will be verified shortly (cf. Theorem 10.2.14),
that if f is continuous at a G and if
(10.2.9)

lim Wx(N ) ( G ) = 0

xa
xG

for all > 0,

then the function u in (10.2.8) tends to f (a) as x a through G. For this


reason, I will say that a G is a regular point if (10.2.9) holds, in which case
I write a reg G.
In order to give a probabilistic criterion with which to test the regularity of
a point, I need to introduce some notation. Given s [0, ), set sG () =
inf{t s : (t)
/ G}, the first time exits from G after time s, and define
G
0+
() = lims&0 sG (), the first positive time exits from G. Notice that, for
s (0, ),
(10.2.10)

sG = s + G s

G
G
and 0+
s = 0+
= sG .

Lemma 10.2.11. Regularity is a local property in the


 sense that, for each
r (0, ), a reg G if and only if a reg G B(a, r) . Furthermore,
(10.2.12),


G
a reg G a G and Wa(N ) 0+
> 0 = 0,

422

10 Wiener Measure and P.D.E.s

and so reg G is Borel measurable. Finally, if a reg G, then, for each > 0,



(N )
G
G
(10.2.13)
lim
W

,
(
)

(0,
)

B(a,
)
= 1.
x
xa
xG

Proof: Set G(a, r) = G BRN (a, r). Since it is obvious that G(a,r) is dominated by G , there is no question that a reg G = a reg G(a, r). On the
other hand, if a reg G(a, r) and  > 0, then, for all 0 < < ,

lim Wx(N ) ( G )
lim Wx(N ) ( G ) xa

xa
xG

xG

lim
xa

Wx(N )



lim Wx(N ) BRN (a,r)
G(a,r) + xa
xG

xG(a,r)

W (N )

sup |(t)|
t[0,]

r
2

!
0

as & 0.

Hence, we have now also proved that a reg G(a, r) = a reg G.


Next, let a G. To check the equivalence in (10.2.12), use the first part of
(10.2.10) and the Markov property to see that
Z

x RN 7 Wx(N ) sG ) =
Wy(N ) G s g (N ) (s, y x) dy [0, 1]
RN

is a continuous function for every s (0, ), and therefore that




G
x RN 7 Wx(N ) 0+
= lim Wx(N ) sG
s&0

(N )

is upper semicontinuous for all 0. In particular, if Wa


G
because G () = 0+
() when (0) G, it follows that


G
0+
> 0 = 0, then,



G
lim Wx(N ) 0+
=0
lim Wx(N ) G = xa

xa
xG

xG

for every > 0. To prove the converse, suppose that a reg G, let positive 
and be given, and choose r > 0 so that

Wx(N ) G  for x G B(a, r).
Then, by the second part of (10.2.10), the Markov property, and (4.3.13), for
each s (0, ) one has
h
i


(N )
(N )
G
Wa(N ) 0+
2 EWa W(s) G , (s) G

r2
 + Wa(N ) (s)
/ B(a, r)  + 2N e 2N s ,

10.2 The Markov Property and Potential Theory

423


(N ) G
from which Wa 0+
> 0 = 0 follows when first s & 0 and then  & 0.
Now, assume that a reg G, and observe that, for each 0 <  < ,



Wx(N ) G
/ B(a, ) or G
!

Wx(N ) G  + Wx(N ) sup |(t) a| .
t[0,]

Hence, (10.2.9) and (4.3.13) together imply that

lim Wx(N )
xa
xG

/ B(x, ) or



2
,
2N exp
2N 


from which (10.2.13) follows after one lets  & 0. 


In view of the last part of Theorem 10.2.4 and (10.2.13), the following statement is obvious.
Theorem 10.2.14. Let G be a non-empty open subset of RN and f : G R
a bound, Borel measurable function. If u is given by (10.2.8), then u is a bounded
harmonic function in G, and, for every a reg G at which f is continuous,
u(x) f (a) as x a through G.
Before closing this brief introduction to one of the most successful applications
of probability theory to partial differential equations, it seems only appropriate
to check that the conclusion in Theorem 10.2.14 is equivalent to the classical one
at which analysts arrived. To be precise, recall the famous program, initiated
by O. Perron and completed by Wiener, M. Brelot, and others, for solving the
Dirichlet problem. Namely, given a bounded, non-empty open set G in RN
and an f C(G; R), consider the set U(f ) of lower semicontinuous functions
w : G R that are bounded below and satisfy the super-mean value property
Z
B(x, r) G = w(x)
w(x + r) SN 1 (d),
SN 1

and the boundary condition


lim w(x) f (a)

for all a G.

xa
xG

At the same time, define L(f ) to be the set of v : G R such that v U(f ).
Finally, given a G, say that a admits
a barrier if, for some r > 0, there

exists an C 2 G B(a, r); (0, ) such that
lim

xa
xGB(a,r)

(x) = 0

and

 for some  > 0.

424

10 Wiener Measure and P.D.E.s

A famous theorem3 proved by Wiener states that


inf{w(x) : w U(f )} = sup{v(x) : v L(f )}
and that if Hf (x) denotes this common value, then x
harmonic function on G with the property that
lim Hf (x) = f (a)

xa
xG

for all x G
Hf (x) is a bounded

for a G that admit a barrier.

Theorem 10.2.15. Referring to the preceding paragraph, the function Hf


described there coincides with the function u in (10.2.8). In addition, a boundary
point a G is regular (i.e., (10.2.9) holds) if and only if it admits a barrier.
Proof: To prove the first part, all that I have to do is check that v u w for
all v L(f ) and w U(f ). For this purpose, set r(x) = 12 |x G{|, and define
{ n : n 0} so that 0 = 0 and


n+1 () = inf t n () : |(t) ( n )| r ( n )
for n 0,

with the usual understanding that n () = = n+1 () = . An easy


inductive argument shows that all the n s are stopping times. In addition, it is
clear that n n+1 G . I now want to show that G () < = n () %
G (). To this end, suppose that supn0 n () < G () < , in which case

there exists an  > 0 such that r ( n )  for all n 0. But this would
mean that { n () : n 0} is a bounded sequence for which inf n0 |( n+1 )
( n )| , which contradicts the continuity of . Finally, choose a reference
point y G, and set Xn () equal to ( n ) or y according to whether n () <
or not, Rn () = r Xn () , and Bn () = B Xn (), Rn () , the ball around
Xn () of radius Rn (), and observe that
n () < = n+1 () = n () + Bn () n ().
With these preparations at hand, let w U(f ) and x G be given. By
Theorem 10.2.1 and the preceding,


(N ) 
EWx w ( n+1 ) , n+1 () <

Z
Z
 (N )

=
w 0 ( Bn () ) WXn () (d 0 ) Wx(N ) ()

{: n ()<}

Z
(N )
Wx
=E

SN 1

(N )

EWx
3

{ 0 : Bn () ( 0 )<}

w Xn () + Rn () SN 1 (d), () <




w ( n ) , n () < ,

See O.D. Kelloggs Foundations of Potential Theory, Dover Publ. (1963).

10.2 The Markov Property and Potential Theory

425

where, in the passage to the second to last line, I have used the fact, established
earlier, that the exit place from a ball of a Brownian path started at its center
is uniformly distributed. Hence, by Fatous Lemma and the boundary condition
satisfied by w,



(N ) 
w(x) lim EWx w ( n ) , n () <
n


(N ) 
EWx f ( G ) , G () < = u(x).
Thus, we have now shown that w u for all w U(f ). Of course, if v L(f ),
then, because v U(f ), we also know that v u and therefore that
v u.
(N )
I turn next to the second part of the theorem. Set m(x) = EWx [ G ], x G.
Clearly m is positive. Moreover, if m(x) 0 as x a through G, then a is
regular. Conversely, suppose a is regular. Since
(N )

m(x) + EWx

1

(N ) 
1
G , G + Wx(N ) ( G ) 2 EWx ( G )2 2 ,

it will follow that m(x) 0 as x a through G once we check that x



(N ) 
EWx ( G )2 is bounded on G. But, if R is the diameter of G and c RN
is chosen so that G B(c, R), then G B(c,R) , and, by the first part of

(N ) 
Theorem 10.1.11 and translation invariance, x
EWx ( B(c,R) )2 is bounded
on B(c, R). Hence, we now know that a G is regular if and only if m(x)
0 as x a through G. To complete the proof at this point, set m(x)

=
(N )
EWx [ B(c,R) ], and observe that, since B(c,R) = G + B(c,R) G when G <
,


(N ) 
m(x)

m(x) = EWx m
( G ) , G () < .
Thus mm

is harmonic on G and so, by the first part of Theorem 10.1.11, m =


m
= 2 on G. Hence, if a is regular, then m is a barrier at a. Conversely,
suppose that a admits a barrier C 2 G B(a, r); (0, ) . Because of the
locality property proved in Lemma 10.2.11, I will, without loss in generality,
assume that B(a, r) G. Choose a sequence {Gn : n 1} of open sets so that
n Gn+1 for each n and Gn % G. Then, by Theorem 10.1.2, for x Gn
G
and t 0,

(N ) 
(x) (x) EWx (t Gn )
#
"Z
t Gn ()


(N )
 Wx(N ) 
1
t Gn .
= EWx
2 ( ) d 2 E
0

Hence, after letting first t and then n tend to infinity, we see that m(x) 2 (x)
for all x G; and, since (x) 0 as x tends to a through G, it follows that
a reg G. 

426

10 Wiener Measure and P.D.E.s

The argument used to prove the first part of Theorem 10.2.15 is a probabilistic
implementation of what analysts call the balayage procedure for solving the
Dirichlet problem.
Exercises for 10.2
Exercise 10.2.16. Suppose that G is a non-empty, open subset of RM RN and
that (x, y) G 7 u(x, y) R is a Borel measurable function that is harmonic
with respect to x and y separately (i.e., u( , y) is harmonic on {x : (x, y) G}
for each y G and u(x, ) is harmonic on {y : (x, y) G} for each x G).
Assuming that u is bounded below on compact subsets of G, show that u is
harmonic on G.
Hint: Clearly, all that one has to show is that u is smooth on G. In addition,
without loss in generality, one can assume that u can be extended to RM RN
as a non-negative, Borel measurable function. Making this assumption,
 proceed
as in the proof of Theorem 10.2.4 to show that if Cc (0, 1); R has total
integral 1 and BRM (x, r) BRN (y, r) G, then u(x, y) equals
ZZ
 1

1
1M
1N
1
|x|
|y|

r
|x|

r
|y|
u(, ) dxd.
M 1 N 1 r2
RM RN

Thus, all that remains is to justify differentiating under the integral.



Exercise 10.2.17. Show that the only functions u C 2 R2 ; [0, ) satisfying
u 0 are constant. This result is a manifestation of recurrence. Indeed, show
that it is completely false when R2 is replaced by either the half-space R(0, )
or R3 .
Hint: Using
 the sort
 of reasoning in the proof of Corollary 10.1.12, show that
u((t) , Ft , W (2) is a non-positive submartingale and, as a consequence,

that limt u (t) exists W (2) -almost surely. Now, using Theorem 10.2.3,
show that this is possible only if u is constant. To handle the last part, let
f Cc R; [0, ) , and consider the function on R3 given by

Z |x| Z
1
u(x) = |x|
f () d d.
0

Exercise 10.2.18. The goal of this exercise is to prove Blumenthals 01


T
(N )
Law, which states that if A F0+ t>0 Ft , then Wx (A) {0, 1} for each
x RN .
(i) If F : C(RN ) R is a bounded, continuous function, show that, for any
A F0+ and x RN ,


(N ) 
(N ) 
EWx F, A = lim EWx F t , A
t&0
Z
(N )
(N )
W
= lim
E (t) [F ] Wx(N ) (d) = EWx [F ]Wx(N ) (A).
t&0

Exercises for 10.2

427


(N ) 
(N )
(N )
(ii) For any A BC(RN ) , show that EWx F, A = EWx [F ]Wx (A) for all
bounded, Borel measurable F : C(RN ) R if it holds for all bounded, continuous ones.
(N )

(N )

(iii) By combining (i) and (ii), show that Wx (A) = Wx (A)2 for all A F0+
and x RN .
Exercise 10.2.19. Let G be a non-empty, open subset of RN . In this exercise,
we will develop a criterion for checking the regularity of boundary points.
(1)

(i) As an application of Exercise 4.3.15, show that, for Wx -almost every


C(R), t > 0 (0, t) ( ) = x. Next use this to see that when N = 1 every
boundary point of every open set is regular.
(N )

G
(ii) As an application of Blumenthals 01 Law, show that Wx (0+
> 0)
{0, 1} for all x RN . Next, using this together with (10.2.12), show that a is

(N )
regular if and only if Wa 0+ = 0 > 0.

(iii) Assume that a G has positive, upper Lebesgue density in G{. That is,

lim

r&0

|B(a, r) G{|
> 0,
|B(a, r)|

where || denotes the Lebesgue measure of BRN . Show that a is regular. In


particular, because, for any Borel set , the set of x with upper Lebesgue
density less than 1 has Lebesgue measure 0, this proves that G \ reg G has
Lebesgue measure 0. (See the Lemma 11.1.9 for another proof of this fact.)
Hint: Show that, for all t > 0,
Wa(N )

0+ t

Wa(N )

 N e 12 |B(a, t 12 ) G{|
,
(t)
/G
1
N
|B(a, t 2 )|
(2) 2

where N |B(0, 1)|.


(iv) Use (ii) to prove the exterior cone condition for regularity. That is,
show that a G is regular if there is an SN 1 and an (0, 1] such that
the cone
)
(

y a, RN
N
<
y R : 0 < |y a| < &
|y a|

is contained in G{
(v) If F is a closed subset of RN , r > 0, and G = {x RN : |x F | > r},
show that every boundary point of G satisfies the exterior cone condition and is
therefore regular.

428

10 Wiener Measure and P.D.E.s

Exercise 10.2.20. Let G be a non-empty, open subset of RN . In this exercise


we will give a probabilistic justification of the famous CourantFriedrichsLewy4
finite difference scheme for solving the Dirichlet problem. To this end, let {Xn :
n 1} be a sequence of independent, identically distributed RN -valued random

variables with mean 0 and covarianceI, define {Sn (t) : t 0} : n 0 as
in 9.2.1, and let Pn,x M1 C(RN ) be the distribution of x + Sn ( ). By
Donskers Invariance Principle, we know that, as n , {Pn,xn : n 0} tends
(N )
weakly to Wx if xn x. Thus, one might hope that, for f Cb (G; R),
(10.2.21)






(N ) 
EPn,x f ( G ) , G () < EWx f ( G ) , G () <

uniformly on compacts. On the other hand, in order to justify (10.2.21), one


has to confront the problem that
G () is, in general, only a lower semicontinuous function, not a continuous one. Thus, we must find conditions under
(N )
which G is Wx -almost surely continuous.

(i) Let G () = inf{t 0 : (t)


/ G}.
Obviously, G G . Show that

G () is upper semicontinuous and that G is continuous at any for

which G () = G () < .


(N )
(ii) Say that a G is strongly regular if Wa G = 0 = 1. If every a G
is strongly regular and if x G is a point at which (10.2.5) holds, show that


(N )
Wx G = G < = 1. Thus, (10.2.21) holds in this situation.

Hint: Use G < = G = G + G G .


(iii) Using Blumenthals 01 Law and the technique described in the Hint for


(N )
part (iii) of Exercise 10.2.19, show that Wa G = 0 = 1 if a G has
that is, if
positive, upper Lebesgue density in RN \ G,

|B(a, r) (RN \ G)|


> 0.
r&0
|B(a, r)|
lim

Thus, if (10.2.5) holds for all x G and every a G has positive, upper
then (10.2.21) holds uniformly for x in compact
Lebesgue density in RN \ G,
subsets of G.
4

This type of approximation was carried out originally by H. Phillips and N. Wiener in Nets
and Dirichlet problem, J. Math. Phys. 2 in 1923. Ironically, the authors do not appear to have
made the connection between their procedure and probability theory. In 1928, a more complete

analysis was carried out in the famous article Uber


die partiellen Differenzengleichungen der
Phsik, Ann. Math. 5 # 2, of R. Courant, K. Friedrichs, and H. Lewy. Interestingly, these
authors do allude to a possible probabilistic interpretation, although their method (based on
energy considerations) makes no use of probability theory.

10.3 Other Heat Kernels

429

Exercise 10.2.22. Although, as the preceding exercise shows, probability


theory provides approximation schemes with which to solve the Dirichlet problem, it is less successful when it comes to writing down explicit expressions for
solutions. Nonetheless, there is a situation in which probability theory does lead
to an explicit answer. To wit, consider the upper half-space H = RN (0, )
in RN +1 . Given y (0, ), show that, for x RN and BRN ,
(N +1)

W(x,y)



(1)  (N )
( H ) {0} = EW x, y I () ,

where y () = inf{t 0 : (t) y}. Next, recall from Exercise 7.1.24 that the
1

W (1) -distribution of y is the one-sided, 12 -stable law 21 . Thus


22 y

(N +1)

W(x,y)


( H ) {0} =

yR (y x) dy,

where
N

yR (y) =

0,tI (y) 21 (dt).


(0,)

22 y

Finally, referring to Exercise 3.3.17, conclude that yR is the Cauchy distribution


in (3.3.19). This, of course, explains the reason, alluded to in (ii) of Exercise
N
3.3.17, why analysts call yR the Poisson kernel for the upper half-space.
10.3 Other Heat Kernels
As we saw in 10.1, from the perspective of someone studying partial differential
equations, the function (t, x, y) (0, ) RN RN 7 g (N ) (t, y x) (0, )
is the heat kernel, or, equivalently, the fundamental solution, to the classical
heat equation t u = 12 u in (0, ) RN . That is, if Cb (RN ; R), then
Z
u(t, x) =
(y)g (N ) (t, y x) dy
RN

is the unique bounded solution to the classical heat equation that tends to
as t & 0. Of course, from a probabilistic perspective, g (N ) (t, y x) is the
probability (in the sense of densities) of a Brownian path going from x to y
during a time interval of length t.
In this section I will construct other functions that, on the one hand, are
the fundamental solution to a heat equation and, at the same time, the density for the probability of a Brownian motion making transitions under various
conditions.
10.3.1. A General Construction. For each t > 0, let Et : C(RN ) [0, )
be a Ft -measurable function with the property that
(10.3.1)

Es+t () = Es ()Et s )

for s, t (0, ) and C(RN ),

430

10 Wiener Measure and P.D.E.s

and define
(10.3.2)

q(t, x, y) = EW

(N )

h
i
Et x(1 `t ) + t + y`t g (N ) (t, y x),

for (t, x, y) (0, ) RN RN ,


N
where `t ( ) = t
t , [0, ), and t = (t)`t , (R ). Clearly (x, y)
(RN )2 7 q(t, x, y) [0, ) is Borel measurable for each t > 0.
My goal in this subsection is to prove the following theorem.

Theorem 10.3.3. For each t (0, ) and Borel measurable : RN R


that is bounded below,
Z
h
i
(N )
(y)q(t, x, y) dy = EWx Et () (t) .
RN

Moreover, for all s, t (0, ) and x, y RN , q satisfies the ChapmanKolmogorov equation


Z
q(s + t, x, y) =
q(s, x, z)q(t, z, y) dz.
RN

Finally, if, for each t > 0, Et is reversible in the sense that


Et () = Et (t ),

C(RN ),

where t ( ) = (t t), [0, ), then q(t, x, y) = q(t, y, x) for all (t, x, y)


(0, ) (RN )2 .
Proof: The first assertion is an easy application of (8.3.12) with n = 1. Namely,
by that result,


(N ) 
(N ) 
EWx Et () (t) = EW
Et (x + ) x + (t)
Z
Z

(N ) 
=
EW
Et (x + t + y`t ) x + y g (N ) (t, y) dy =
(y)q(t, x, y) dy.
RN

RN

To prove the ChapmanKolmogorov equation, set q(t, x, y) = g(Nq(t,x,y)


) (t,yx) .
Then, another application of (8.3.12), this time with n = 2, t1 = s, and t2 = s+t,
shows that q(s + t, x, y) equals
ZZ
h
i
(N )
EW
Es+t x+(s,s+t,(,)) +(yx)`s+t g (N ) (s, )g (N ) (t, ) dd.
RN RN


Next note that, by (10.3.1), Es+t x + (s,s+t,(,)) + (y x )`s+t equals

 
s
(y x ) `s
Es x + s + + s+t

s
(y x ) + (s )t
Et x + + s+t


s
(y x ) `t ( s) .
+ y x + + s+t

10.3 Other Heat Kernels

431

Therefore, since Es is Fs -measurable and s  [0, s] is W (N ) -independent of


(s )t ,
h
i
(N )
EW
Es+t x + (s,s+t,(,)) + (y x )`s+t


s
s
(y x ), y .
(y x ) q t, x + + s+t
= q s, x, x + + s+t

Plugging this into the expression for q(s + t, x, y) and making the change of
s
(y x ), one finds that q(s + t, x, y) equals
variable x + + s+t
ZZ
q(s, x, )
q (t, , y)g (N ) (s, + c)g (N ) (t, + c) dd,
RN RN

where =
Z

s
s+t ,

t
s+t ,

and c =

tx+sy
s+t .

At the same time, by Exercise 10.3.34,

g (N ) (s, + c)g (N ) (t, + c) d = g (N )

RN

st
s+t ,

g (N ) (s, x)g (N ) (t, )


,
g (N ) (s + t, y x)

and so we are done.


To prove the last assertion, simply note that when Et is reversible, q(t, x, y)
equals
h
h
i
i
(N )
(N )
EW
Et x(1`t )+t +y`t = EW
Et y(1`t )+(t )t +y`t = q(t, y, x),
since, by part (ii) of Exercise 8.3.22,
(t )t  [0, t] has the same distribution
(N )
under W
as
t  [0, t]. 
10.3.2. The Dirichlet Heat Kernel.
 Let G be a non-empty, open subset
of RN , and set EtG () = 1(t,) G () . Obviously Et is Ft -measurable and
(10.3.1) holds. In addition, if pG (t, x, y) is used to denote the associated q given
in (10.3.2), then, pG (t, x, y) = 0 unless x, y G, and, by Theorem 10.3.3,
Z
h
i

(N )
(y)pG (t, x, y) dy = EWx (t) , G () > t ,
(10.3.4)
G
for (t, x) (0, ) G,
G

(10.3.5) p (s+t, x, y) =

pG (s, x, z)pG (t, z, y) dz,

(s, x), (t, y) (0, )G,

and
(10.3.6).

pG (t, x, y) = pG (t, y, x)

for (t, x, y) (0, ) G2 .

In order to show that pG is smooth on (0, ) G2 , I will use the Duhamel


formula contained in the following.

432

10 Wiener Measure and P.D.E.s

Theorem 10.3.7. For all (t, x, y) (0, ) G2 ,


h
i

(N )
(10.3.8) pG (t, x, y) = g (N ) (t, y x) EWx g (N ) t, y ( G ) , G () < t .
Proof: Given (0, 1), set


q (t, x, y) = W (N ) x + t ( ) + (y x)`t ( ) G, [0, t] g (N ) (t, y x).
Clearly q (t, x, y) & pG (t, x, y) as % 1 for each (t, x, y) (0, ) G2 . Thus,
it suffices for us to know that, for (0, 1) and (t, x, y) (0, ) G2 ,
h
i

(N )
(*) q (t, x, y) = g (N ) (t, y x) EWx g (N ) t, y ( G ) , G () t .
Further, by the same argument as was used to prove the first assertion in Theorem 10.3.3, for any Cc (G; R),
Z
h
i

(N )
(y)q (t, x, y) dy = EWx ( G ) , G () > t
G
Z
=
(y)g (N ) (y x) dy
G
hZ
i

(N )
EWx
(y)g (N ) t G (), y ( G ) , G () t ,
G

where, in the passage to the second line, I have applied the same reasoning as was
suggested in part (i) of Exercise 7.3.7. Hence, (*) will follow once y
q (t, x, y)
q (t,x,y)
g(N ) (t,yx) is shown to be continuous on G. To this end, argue as in the last
part of Theorem (10.3.1) and apply the Markov property to show that q (t, x, y)
equals

W (N ) x + t (t ) + (y x)`t (t ) G, [(1 )t, t]

= W (N ) y + t ( ) + (x y)`t ( ) G, [(1 )t, t]
Z




=
g (N ) (1 )t, z y Wz(N ) ( ) + y (t) `t ( ) G dz,
RN

which is certainly continuous with respect to y. 


The importance of (10.3.8) is that it provides vital information used in the
proof of the next theorem.
Theorem 10.3.9. For each (t, x, y) (0, )G2 , pG (t, x, y) > 0 or pG (t, x, y)
= 0 according to whether y is or is not in the same connected component of G
as x. Furthermore, for each K G,


Z




lim sup 1
pG (t, x, y) dy = 0 for all r > 0,

t&0 xK
GB(x,r)

10.3 Other Heat Kernels

433

and
sup pG (t, x, y) = 0 for (s, a) (0, ) reg G.

lim

(t,x)(s,a) yK
xG

Finally, pG is smooth on (0, ) G2 , and, for each m 1, tm pG (t, x, y) =


G
m m G
2m m
y p (t, x, y) on (0, ) G G.
x p (t, x, y) = 2
Proof: Obviously, pG (t, x, y) = 0 unless x and y lie in the same connected
component of G. On the other hand, if x and y lie in the same connected
component of G, then there is a smooth f : [0, t]
 G such that f (0) = x and
f (t) = y. Thus, if h( ) = f ( t) x 1 `t ( t) y`t ( t) for [0, ),
then h (RN ) and there is an r > 0 such that x 1 `t ( ) + t ( ) + y`t ( )
G, [0, t], for B(RN ) (h, r). Hence, by Corollary 8.3.6,

pG (t, x, y) W (N ) B(RN ) (h, r) > 0.
Next, let K G be given. Because, by (10.3.8), pG (t, x, y) g (N ) (y x),
Z

pG (t, x, y) dy

pG (t, x, y) dy
G
Z

(N ) G
= Wx ( > t

GB(x,r)

g (N ) (y x) dy

RN \B(x,r)

g (N ) (y) dy,

RN \B(0,r)

and therefore



Z




lim sup 1
pG (t, x, y) dy lim sup Wx(N ) ( G t),
t&0 xK
t&0 xK
GB(x,r)

which, by (4.3.13), is 0. Also, again by (10.3.8),


(N )

pG (t, x, y) =EWx

h
i

g (N ) (t, y x) g (N ) t G (), y ( G ) , G < t

+ g (N ) (t, y x)Wx(N ) ( G t).


Thus, as an application of (10.2.13), it is an easy matter to see that pG (t, x, y)
tends to 0 uniformly in y K as (t, x) (s, a) (0, ) reg G.
To prove the asserted smoothness properties, begin with the observation that,
for any multi-index NN 1
x g (N ) (t, x) = t
1

N +kk
2

 |x|2
1
P t 2 x e 2t ,

I use the conventional multi-index notation for partial derivatives. Thus, if = (1 , . . . , N )

NN , then y = y11 yNN and kk =

PN
i=1

i .

434

10 Wiener Measure and P.D.E.s

where P is an kkth order polynomial. Hence, by considering the cases t |x|2


and t |x|2 separately, one finds that


max x g (N ) (t, x)

(10.3.10)

kk=n

Cn

(t + |x|2 )

N +n e

|x|2
4t

for some Cn < . Hence, if kk = n, then, by (10.3.8),


(N )

EWx

 (N )


y g
t G (), y ( G ) , G () < t
i
h |y(G )|2
(N )
Cn
G
Wx

4t
,

()
<
t
.
E
e

|y G|N +n

At the same time,


(N )

Wx



i
h |y(G )|2
|yx|2
|y x|
(N
)
G

4t
,
kk[0,t]
, () < t e 16t + W
e
2

and so we now see (cf. (4.3.13)) that, for some other choice of Cn < ,


(10.3.11) y pG (t, x, y) Cn

(t + |y x|2 )

N +n
2

1
+
|y G|(N +n)

!
e

|yx|2
16N t

when kk = n.
Combining (10.3.11) with the symmetry of pG , we have


(10.3.12) x pG (t, x, y) Cn

(t + |y x|2 )

N +n
2

1
+
|x G|(N +n)

!
e

|yx|2
16N t

In addition, from (10.3.5),


pG (t, x, y) =

Z
G

pG

t
2 , x, z

pG

t
2 , z, y

dz,

and so, by (10.3.12) and (10.3.11), we see that (x, y)


pG (t, x, y) is smooth for
each t (0, ).
To check the assertions about the time derivatives, first observe that for any
Cb2 (G; R) and (x, y) G2 ,

Z

1
pG (h, x, y)(y) dy (x) = 12 (x)
h&0 h
ZG

1
G
p (h, x, y)(x) dx (y) = 12 (y).
lim
h&0 h
G
lim

10.3 Other Heat Kernels

435

To see this, use the symmetry of pG to show that the second of these follows
from the first one. To prove the first one, use pG (h, x, y) g (N ) (h, y x) and
(10.3.8) to show that, for any Cc2 (RN ; R) that equals in a neighborhood
of x,
Z

Z


G
(N
)
p (h, x, y)(y) dy (x)
g (h, y x)(y)

dy

G

tends to 0 faster than any power of h. Thus, since


Z

1
(N )
g (h, y x)(y)

dy (x) 12 (x),
h
G

the assertion is proved. Given the preceding, we know that



i 1 Z
1h G
G
G
G
G
p (h, x, z)p (t, z, y) dz p (t, x, y)
p (t + h, x, y) p (t, x, y) =
h G
h

tends to 12 x pG (t, x, y). Thus, t pG (t, x, y) = 12 x pG (t, x, y). Similarly, using


Z
Z
pG (t + h, x, y) =
pG (t, x, z)pG (h, z, y) dz =
pG (h, y, z)pG (t, x, z) dz,
G
G

one gets t p (t, x, y) =


(10.3.11) to justify

tm pG (t

G
1
G
2 y p (t, x, y).

+ h, x, y) = 2

Finally, assume the result for m, use

G
pG (h, x, z)m
y p (t, z, y) dz,

differentiate this with respect to h, and let h & 0 to arrive at


G
tm+1 pG (t, x, y) = 2m1 x m
y p (t, x, y)
 m+1 G
x p (t, x, y)
= 2m1
G
m+1 G
m
p (t, x, y). 
y x p (t, x, y) = y

The following result provides the justification for my calling pG the Dirichlet
heat kernel on G.
Corollary 10.3.13. For each Cb (G; R), the function
Z
 G

(N ) 
Wx
u(t, x) = E
(t) , () > t =
(y)pG (t, x, y) dy
G

is a smooth solution to the boundary value problem


t u(t, x) = 12 u(t, x) in (0, ) G,

lim u(t, ) =

t&0

lim
(t,x)(s,a)
xG

u(t, x) = 0

uniformly on compacts,
for (s, a) (0, ) reg G.

Moreover, if G = reg G, then u is the only bounded solution to this boundary


value problem.

436

10 Wiener Measure and P.D.E.s

Proof: That the u in the first part is a bounded, smooth solution follows easily
from (10.3.12) and the last part of Theorem 10.3.9. To prove the uniqueness
assertion when G = reg G, choose {Gn : n 1} to be a non-decreasing
S
sequence of open sets so that Gn G and G = n1 Gn . Given a bounded
solution u, apply Theorem 10.1.2 to see that, for each n 1, u(t, x) equals
(N )





(N ) 
(t) , Gn () > t + EWx u t Gn (), ( Gn ) , Gn () t




(N ) 
(N ) 
= EWx ((t) , G () > t + EWx u t Gn (), ( Gn ) , G () < t



(N ) 
+ EWx u t Gn (), ( Gn ) (t) , Gn () t < G () .

EWx

Since Gn % G , the second and third terms on the right tend to 0 as n . 


Remark 10.3.14. The uniqueness part of Corollary 10.3.13 continues to hold
even if G 6= reg G. Indeed, the only place at which I used the assumption
that

G = reg G was where I wanted to know that u t Gn (), ( Gn () 0 on
{ G < t}, and for this I needed to be sure that G () < = ( G ) reg G.
(N )
However, it would have been enough to know that ( G ) reg G for Wx almost every { G < }, and this is always the case. Because the proof

(N )
that, for x G, Wx ( G )
/ reg G = 0 is not simple, and since Corollary
10.3.13 covers most applications, I have chosen to settle for the weaker statement
here and postpone the proof of the general case until the next chapter (cf. 11.1).
10.3.3. FeynmanKac Heat Kernels. In this subsection I will put some
of the considerations in 10.1.3 and 10.1.4 into a more general framework.
Let V : RN R be a Borel measurable function that is bounded above, and
define
q V (t, x, y)
(10.3.15)

= EW

(N )


Z t

exp
V x + t + (y x)`t ( ) d
g (N ) (t, y x).
0

By applying Theorem 10.3.3 with


t

Z


V ( ) d

Et () exp


,

we see that
Z
(10.3.16)

(y)q (t, x, y) dy = E
RN

(N )

Wx


Z t




exp
V ( ) d (t)
0

for (t, x) (0, ) RN and Borel measurable s that are bounded below,
Z
V
(10.3.17)
q (t, x, y) =
q V (s, x, z)q V (t, z, y) dz
RN

10.3 Other Heat Kernels

437

for (s, x), (t, y) (0, ) RN , and


(10.3.18)

q V (t, x, y) = q V (t, y, x) for (t, x, y) (0, ) (RN )2 .

As a consequence of (10.3.16) and Theorem 10.1.2, we know that q V (t, x, )


is intimately related to the operator 12 + V . Indeed, by that theorem, we know

that if u Cb1,2 (0, ) RN ; R satisfies the Cauchy initial value problem

(10.3.19)

t u = 12 u + V u

with lim u(t, ) = uniformly on compacts


t&0

for some Cb (RN ; R), then


Z
(10.3.20)
u(t, x) =
(y) q V (t, x, y) dy

(t, x) (0, ) RN .

RN

I now want to make an analysis of q V (t, x, ) which, among other things, will
enable me to show (cf. Corollary 10.3.22) that, under suitable conditions on V ,
the right-hand side of (10.3.20) is necessarily a solution to (10.3.19). For this
reason, I will call q V the FeynmanKac heat kernel with potential V .
Assume that V C n (RN ; R) is bounded above and that,

Theorem 10.3.21.
for some Cn < ,




max x V (x) Cn 1 + V (x) ,

kkn

x RN .

Then q V (t, , y) C n (RN ; R) for every (t, y) (0, ) R, x q V (t, x, )


C n (RN ; R) for each NN with kk n, and there exists a C < such that,
when kk kk n,
V

|yx|2
N +kk+kk 
+
x y q (t, x, y) C 1 + (t + |y x|2 )
2
etkV ku 4t .

Finally, if n 2 and m

tm q V (t, x, y) =

1
2 x

n
2,

then

m
+ V (x) q V (t, x, y) =

1
2 y

m
+ V (y) q V (t, x, y).

Proof: To prove the differentiability properties of q V (t, x, y) with respect to


x, let kk n be given, use (10.3.15) to see that x q V (t, x, y) is a finite linear
combination of terms of the form
#
" ` Z
Y t
(k)


(k)
(N )
k
k
( V ) t,x,y ( ) E V (t, x, y, ) d
EW
1 t
k=1

(0)

x g (N ) (t, y x),

438

10 Wiener Measure and P.D.E.s

Rt
V (t,x,y ( )) d
, and
where t,x,y ( ) = x + t ( ) + t (y x), E V (t, x, y, ) = e 0
P`
(k)
= . Since, by our hypotheses, each of the integrands in these terms
k=0
+
is bounded by a constant times etkV ku , the asserted estimate for x q V (t, x, y)
follows from this and (10.3.10).
The rest of the proof is similar to, but easier than, that of Theorem 10.3.9.
Specifically, one uses q V (t, x, y) = q V (t, y, x) and
Z

q (t, x, y) =

qV

RN

t
2 , x, z

qV

t
2 , z, y

dz

to prove the existence of and estimate for x y q V (t, x, y). Also, knowing these
results about the spacial derivatives, one deals with the time derivatives in the
same way as I did at the end of that theorem. The details are left to the
reader. 
Corollary 10.3.22. Let V be as in Theorem 10.3.21, and assume that n 2.
Then, for each Cb (RN ; R), the function
(N )

Wx

u(t, x) = E

Z
h Rt
i
V (( )) d
0
e
t) =

(y)q V (t, x, y) dy

RN


is the unique u C 1,2 (0, ) RN ; R that is bounded on (0, T ) RN for each
T > 0 and satisfies (10.3.19).
Proof: The only assertion that has not already been proved is that the u
described takes on the correct initial value. However, because q V (t, x, y)
+
ekV ku g (N ) (t, y x), it is clear that, for each r > 0,
Z

q V (t, x, y) dy = 0.

lim sup

t&0 xRN

B(x,r){

Hence, all that remains is to check that, for each R > 0,




Z


lim sup 1
q V (t, x, y) dy = 0.
t&0 |x|R

RN

But if K(R) = sup|y|2R |V (y)|, then



Z

sup 1




Rt


V (( )) d
W (N ) x

q (t, x, y) dy E
1 e 0

|x|R
RN


+
tK(R)etK(R) + 1 + etkV ku W (N ) kk[0,t] R ,
V

which, by (4.3.13), gives the desired conclusion.

10.3 Other Heat Kernels

439

10.3.4. Ground States and Associated Measures on Pathspace. From


a probabilistic standpoint, the heat kernel q V (t, x, y) is flawed by the fact that
it is not a probability density. However, in many cases this flaw can be removed
by what physicists call switching to the ground state representation.
This terminology and the ideas underlying it are best understood when expressed in terms of operators. Thus, let V C(RN ; R) be bounded above, refer
to the preceding subsection, and define the operator
Z
V
Qt (x) =
(y)q V (t, x, y) dy for t 0 and Cb (RN ; R).
RN

QVt

We know that
is a bounded map from Cb (RN ; R) into itself. In addition, by
V
(10.3.17), {Qt : t 0} is a semigroup. That is, QVs+t = QVt QVs . Also, by
Corollary 10.3.22, we know that if

(10.3.23)
V C 2 (RN ; R) and max | V | C 1 + V ,
kk2

then (t, x)
QVt (x) is a solution to (10.3.19).
I will say that : RN R is a ground state for V if is a (strictly) positive,
continuous function that satisfies the equation et = QVt for some R and
all t 0, in which case will be called the eigenvalue associated with .

Lemma 10.3.24. Let V be as above, and assume that C RN ; [0, ) does
not vanish identically. If et = QVt for all t 0, then
 is a ground state
with associated eigenvalue . In fact, Cb2 RN ; (0, ) if is bounded and
V C 2 (RN ; R) satisfies (10.3.23). Next, if is a twice continuously differentiable
ground state with associated eigenvalue , then 12 + V = . Conversely,
if is a twice continuously differentiable, bounded solution to 12 + V = ,
then is a ground state with associated eigenvalue .

Proof: Since I can always replace V by V , I may and will assume that = 0
throughout. Also, observe that if C RN ; [0, ) satisfies = QV1 , then,
because q V (1, x, y) > 0 everywhere, > 0 everywhere unless 0. Hence, the
first assertion is proved.
Next suppose that is a twice continuously differentiable ground state with
eigenvalue 0. To see that 12 + V = 0, it suffices to show that

N
1
2 + V , L2 (RN ;R) = 0 for all Cc (R ; R).

To this end, let Cc (RN ; R) be given, and apply symmetry, Theorem 10.1.2,
and Fubinis Theorem to justify


0 = , QV1 L2 (RN ;R) = QV1 , L2 (RN ;R)
Z 1
 
=
QV 21 + V , 2 N d
L (R ;R)

Z 1
0

1
2

+ V , QV


L2 (RN ;R)

d =

1
2


+ V , L2 (RN ;R) .

440

10 Wiener Measure and P.D.E.s

Finally, suppose that is a bounded, twice continuously differentiable solution


to 12 + V = 0. Then, by Corollary 10.1.3 applied to the time-independent
function u(t, ) = , we know that = QVt for all t 0. Thus, by the initial
observation, is a ground state with associated eigenvalue 0. 

Theorem 10.3.25. Let V C(RN ; R) be bounded above, assume that is a


ground state for V with associated eigenvalue , and set
p (t, x, y) = et (x)1 q V (t, x, y)(y) for (t, x, y).
Then p is a strictly positive, continuous function, p (t, x, ) has total integral
1 for all (t, x) (0, ) RN ,
Z
lim sup
p (t, x, y) dy = 0 for all r, R (0, ),
t&0 |x|R

B(0,r)

and
p (s + t, x, y) =

p (t, z, y)p (t, x, z) dz.

RN

Finally, if V C 2 (RN ; R) satisfies (10.3.23), then x


p (t, x, y) is twice conN
tinuously differentiable for each (t, y) (0, ) R , y
x p (t, x, y) is twice
continuously differentiable for each with kk 2 and (t, x) (0, ) RN ,
and

t p (t, x, y) = 12 x p (t, x, y) + x (log ), x p (t, x, y) RN

= 12 y p (t, x, y) divy p (t, x, y) log (y)

for all (t, x, y) (0, ) RN RN . In particular, for each Cb (RN ; R), the
function
Z
u(t, x) =
(y)p (t, x, y) dy
N
R

is the one and only bounded u C 1,2 (0, ) RN ; R that satisfies

t u(t, x) = 12 u(t, x) + log (x), u(t, x) RN in (0, ) RN

lim u(t, x) = (x) uniformly on compacts.

t&0

Proof: The only assertion that is not an immediate consequence of Theorem


10.3.21, Corollary 10.3.22, and the preceding lemma is the uniqueness in the final
part, which is an easy consequence of the corresponding uniqueness statement in
Corollary 10.3.22. Indeed, if u is a bounded solution to the given Cauchy initial
value problem and w(t, ) = u(t, ), then w is a bounded solution to t w =
1
2 w + (V )w with initial condition . Hence, by
R the uniqueness result in
Corollary 10.3.22, w(t, ) = QVt (), and so u(t, ) = RN (y)p (t, x, y) dy. 

The advantage that p (t, x, y) has over q V (t, x, y) is that we can construct
measures on C(RN ) that bear the same relationship to it as the Wiener measures
(N )
Wx bear to the classical heat kernel g (N ) (t, y x).

10.3 Other Heat Kernels

441

Theorem 10.3.26. Let V C(RN ; R) be bounded above, and assume that is


a ground state for V with associated
eigenvalue . Then, for each x RN , there

is a unique Px M1 C(RN ) such that, for each n 1, 0 = t0 t1 < < tm ,
and , , n BRN ,
Px


(tm ) m , 1 m n =

Z Y
n

p tm tm1 , ym1 , ym ) dy1 dyn ,

1 n m=1

where y0 = x. In fact, if

R (t, ) = e

1 V

(0)
E (t, ) (t)

then

(N )

Px (A) = EWx
Finally, x

Rt
V ((( )) d
where E (t, ) = e 0
,
V



R (t), A for all t 0 and A Ft .

Px is continuous, and, for any stopping time ,

Z
F ,

Px (d)

{()<}

Z

Z
=

F (,

) P() (d 0 )

Px (d)

{()<}

whenever F : C(RN ) C(RN ) R is a F BC(RN ) -measurable function that


is bounded below.
(N ) 
Proof: I begin by showing the R (t), Ft , Wx
is a martingale. Indeed,

(N ) 
Wx

N
E
R (t) = 1 for all (t, x) [0, ) R . In addition, R (s + t, ) =

R (s, )R t, s , and so, by (10.2.2),


(N )

EWx



R (s+t), A =

R (s, )E

(N )



(N ) 
R (t) Wx(N ) (d) = EWx R (s), A

W(s) 

for A Fs .

(N )
Determine t,x M1 C(RN ) by t,x (d) = R(t, )Wx (d). By the preceding, t1 ,x  Ft1 = t2 ,x  Ft1 for
 all 0 t1 t2 , and so (cf. Exercise 9.3.6)
there is a unique Px M1 C(RN ) whose restriction to Ft is the same as that
of t,x for all t 0.
To see that x
Px is continuous, it suffices to check that
lim R (t, y + ) = R (t, x + )

yx

in L1 (W (N ) ; R).

But clearly this convergence is taking place pointwise for each C(RN ). In
addition, R (t, ) 0 and, for each z RN , R (t, z + ) has W (N ) -integral 1.
Hence, the convergence is also taking place in L1 (W (N ) ; R).

442

10 Wiener Measure and P.D.E.s

Now suppose that is a stopping time and that T for some T (0, ).
Then, for any F FT -measurable F : C(RN )2 R that is bounded below,
Z

F , Px (d)
Z



= R (), R (2T (), F , Wx(N ) (d)
Z

Z


(N )

0
0
0
= R (),
R 2T (), F (, )W() (d ) Wx(N ) (d)
Z

Z


0
0
= R (),
F (, ) P() (d ) Wx(N ) (d)

Z Z

0
0
=
F (, ) P() (d ) Px (d),
where I have again used (10.2.2) and, in the final step, Hunts Theorem (cf.
Theorem 7.1.14) to replace R (), ) by R (T, ). Starting from this, one
can easily remove the condition that is bounded and extend the result to all
F s that are F BC(RN ) -measurable and bounded below.
To complete the proof, observe that, as a special case of the preceding,
Z

 


EPx (s + t) , A = EPx
p t, (s), y) dy, A
RN

for all s, t [0, ), A Fs , and bounded Borel measurable : RN R.


Hence, proceeding by induction on n and applying the preceding at each stage,
one can readily show that Px is related to p (t, x, y) in the way described in the
initial assertion. 
Corollary 10.3.27. Let everything be as in Theorem 10.3.26, only this
time assume that is twice
continuously differentiable. Then, for
 any bounded

1
1,2
N
C
[0, ) R ; R such that f t + 2 + log , RN is bounded,


Z t



t, (t)
f , ( ) d, Ft , Px
0
N

is a martingale for all x R .


Proof: By replacing V with V , I can reduce to the case when = 0. Hence,
I will assume that = 0.
Rt
V ((( )) d
To prove the asserted martingale property, set E V (t, ) = e 0
, and
1
1
remember that, by Lemma 10.3.24, 2 +V = 0. Thus, 2 ()+V () = f ,
and so, by Theorem 10.1.2,


E V (t B(x,R) (), )() t B(x,R) (), (t B(x,R) )

t B(x,R) ()
V

E (, )(f ) , ( )
0

d, Ft , Wx(N )

10.3 Other Heat Kernels

443

is a martingale for every R > 0. Equivalently,






R t B(x,R) (), (t B(x,R) )
t B(x,R) ()

R (, )f ( ) d, Ft , Wx(N )

0
(N ) 

is a martingale. Hence, since R (t), Ft , Wx


Theorem 7.1.14 to see that
(t B(x,R) )


is a martingale, one can apply

t B(x,R) ()

!

f ( ) d, Ft , Px

is a martingale. Finally, since and f are bounded, we can let R and


thereby get the required conclusion. 
In order to understand the relationship between Brownian paths and the paths
as seen by the measure Px , I will need the following general lemma.
Lemma 10.3.28. Let b : RN RN be a continuous function, and set
Z
B(t, ) = (t) x


b ( ) d.



If P M1 C(RN ) has the properties that P (0) = x = 1 and



Z t
 


1
( ) d, Ft , P
(t)
2 + b, RN
0


is a martingale for all Cc (RN ; R), then B(t), Ft , P is a Brownian motion.
Proof: Without loss in generality, I will assume that x = 0.
Given RN and R > 0, set e (y) = e 1(,y)RN ,


||2
1 , b(y) RN , and ER (t, ) = exp
f (y) =
2

t B(0,R) ()


f ( ) d

!
.

By choosing Cc (RN ; C) so that = e on B(0, 2R) and applying Doobs


Stopping Time Theorem, we know that MR (t), Ft , P is a martingale, where
MR (t, )

= e (t

B(0,R)

) +
0

t B(0,R) ()



f ( ) e ( ) d.

444

10 Wiener Measure and P.D.E.s

Thus, by Theorem 7.1.17,


ER (t)MR (t)

t B(0,R) ()

!
MR (, )f ( ) ER (, ) d, Ft , P


is also a martingale. At the same time, after performing elementary calculus


operations, one sees that



||2
t B(0,R) ()
exp 1 , B(t B(0,R) () RN +
2
Z t B(0,R) ()

= ER (t)MR (t)
MR (, )f ( ) ER (, ) d.
0

Hence

exp

1 , B(t

B(0,R)




||2
B(0,R)
t
() , Ft , P
() RN +
2

is a martingale for every R > 0,


 and so, after letting R , we know, by
Theorem 7.1.7, that B(t), Ft , P is a Brownian motion. 
It is important to be clear about what Lemma 10.3.28 says and what it does not
say. It says that there is a progressively measurable B : [0, ) C(RN ) RN
such that B(t), Ft , P is a Brownian motion and
Z t

(*)
(t) = x + B(t, ) +
b ( ) d, (t, ) [0, ) C(RN ).
0

In the probabilistic literature, this would be summarized by saying that P is


the distribution of a Brownian motion with drift b. What Lemma 10.3.28
does not say is that one can always use (*) to reconstruct from B( , ).
More precisely, is not necessarily a measurable function of B( , ). Indeed,
without additional assumptions on b, it will not be a measurable function of B.
Nonetheless, if b is locally
Lipschitz continuous, then it will be. To see this,

N
take Cc R
;
[0,
1]
so
that
= 1 on B(0, 2) and 0 off of B(0, 3), and set

y
b(y). Then bR is uniformly Lipschitz continuous, and so, by
bR (y) = R
completely standard methods (e.g., Picard iteration), one can show that there
is a continuous map C(RN ) 7 XR ( , ) C(RN ) such that, for each
C(RN ),
Z t

R
X (t, ) = (t) +
bR XR (, ) d, t 0.
0

Moreover, if C(RN ) and


Z
(t) = (t) +
0


bR ( ) d,

t [0, T ],

10.3 Other Heat Kernels

445

then  [0, T ] = XR ( , )  [0, T ]. Hence, if




A(b) = C(RN ) : t 0 R > 0 kXR ( , )k[0,t] R ,
then A(b) BC(RN ) , and I can define the Borel measurable map C(RN )
7 Xb ( , ) C(RN ) given by

Xb (t, ) =

XR (t, )

if A(b) and kXR ( , )k[0,t] R

(t)

if
/ A(b).

In particular, when b is locally Lipschitz continuous,


Lemma 10.3.28 says that

x+B( , ) A(b) and (t) = Xb t, x+B( , ) for all (t, ) [0, )C(RN ).
Corollary 10.3.29. Let everything be as in Corollary 10.3.27, b = log ,
and define the set A(b ) and the map Xb accordingly, as in the preceding

(N )
(N )
discussion. Then Wx A(b ) = 1 and Px = (Xb ) Wx for all x RN .
Proof: Define
B( , ) in terms of b as in Corollary 10.3.27. Then, by
(N )
that corollary, we know that Wx is the distribution of
x + B( , ) under

Px . Therefore, since x + B( , ) A(b ) and (t) = Xb t, x + B( , ) for
all (t, ) [0, ) C(RN ), the desired conclusions follow immediately. 
10.3.5. Producing Ground States. As yet I have not addressed the problem of producing ground states. In this subsection I will provide two approaches.
The first of these gives a criterion that guarantees the existence of a ground state
for a given V . The second goes in the opposite direction.
It is the essentially

trivial remark that there are many C 2 RN ; (0, ) such that is the ground
state of some V .
The first approach is an application of elementary spectral theory and is based
on the observation that, because q V (t, x, y) = q V (t, y, x), QVt is symmetric on
L2 (RN ; R) in the sense that
(10.3.30)

1 , QVt 2


L2 (RN ;R)

= 2 , QVt 1


L2 (RN ;R)

for all 1 , 2 Cc (RN ; R).


The fact that QVt is symmetric on L2 (RN ; R) has profound implications, a few
of which are contained in the following lemma.
Lemma 10.3.31. For each q [1, ) and t (0, ), QVt  Cc (RN ; R) admits
a unique extension (which I again denote by QVt ) as a bounded linear operator on
+
Lq (RN ; R) into itself with norm at most etkV ku . Moreover, for each t > 0, QVt
is non-negative definite and self-adjoint on L2 (RN ; R), and, for each q [1, ),
QVt takes Lq (RN ; R) into Cb (RN ; R) for each q [1, ) and
N

kQVt (x)ku (2t) 2q etkV

ku

kkLq (RN ;R) .

446

10 Wiener Measure and P.D.E.s

Finally,
ZZ

q (2t, x, x) dx (4t)

q (t, x, y) dx dy =

N
2

RN RN

e2tV (x) dx.

RN

RN

Proof: Given q [1, ) and a Borel measurable : RN [0, ), we have,


by Jensens Inequality, that
h Rt
q
i q
(N )
V (( )) d
QVt (x) = EWx e 0
(t)
Z
h Rt
q i
(N )
+
q
V (( )) d
EWx e 0
(t)
eqtkV ku
(y)q g (N ) (t, y x) dy.
RN

Hence, since g (N ) (t, ) has L1 (RN ; R) norm 1,


kQVt kLq (RN ;R) etkV

ku

kkLq (RN ;R) ,

and so we have proved the first assertion. In addition, if q 0 is that Holder


conjugate of q, then
kQVt

ku e

tkV + ku

kg

(N )

(t, )kLq0 (RN ;R) kkLq (RN ;R)

etkV

ku
N

(2t) 2q

kkLq (RN ;R) .

Thus, since QVt maps Cc (RN ; R) into Cb (RN ; R), it also takes Lq (RN ; R) there.
Because (10.3.30) holds for elements of Cc (RN ; R), the preceding estimates
make it clear that it continues to hold for elements of L2 (RN ; R). That is, QVt is
self-adjoint on L2 (RN ; R). To see that it is non-negative definite, simply observe
that


, QVt L2 (RN ;R) = QVt , QVt L2 (RN ;R) 0.
2

Turning to the final estimate, note that (cf. (10.3.17))


Z
Z
q V (t, x, y)2 dy =
q V (t, x, y)q V (t, y, x) dy = q V (2t, x, x).
RN

RN

At the same time, by Jensens Inequality,


h R 2t
i
(N )
V (x+2t ( )) d
q V (2t, x, x) = EW
e 0
g (N ) (2t, 0)
Z
1 2t W (N )  2tV (x+2t ( )) 
N
2
E
e
d,
(4t)
2t 0

and, by Tonellis Theorem,


Z
Z

(N ) 
EW
e2tV (x+( )) dx =
RN

e2tV (x) dx. 

RN

In the language of functional analysis, the last part of Lemma 10.3.31 says
that QVT is HilbertSchmidt and therefore compact if e2T V L1 (RN ; R). As
a consequence, the elementary theory of compact, self-adjoint operators allows
us to make the conclusions drawn in the following theorem.

10.3 Other Heat Kernels

447

Theorem 10.3.32. Assume that eT V L2 (RN ; R) for some T (0, ). Then
there is a unique Cb RN ; (0, ) L2 (RN ; R) such that
kkL2 (RN ;R) = 1 and et = QVt for some R and all t (0, ).
Moreover, if V C 2 (RN ; R) satisfies (10.3.23), then p (t, , y) C 2 (RN ; R) and


t p (t, x, y) = 12 x p (t, x, y)+ log (x), x p (t, x, y) RN in (0, )RN RN .
Proof: The spectral theory of compact, self-adjoint operators guarantees that
the operator QVT has a completely discrete spectrum and that its largest eigenvalue is



(T ) = sup , QVT L2 (RN ;R) : kkL2 (RN ;R) = 1 .
Now let be an L2 (RN ; R)-normalized eigenvector for QVT with eigenvalue
(T ). Because (T ) = QVT , we know that can be taken to be continuous.
In addition, by the preceding paragraph,
ZZ

(x)q V (T, x, y)(y) dx dy = (T )

RN RN

ZZ

|(x)|q V (T, x, y)|(y)| dx dy,

RN RN

which, because q V (T, x, y) > 0 for all (x, y), is possible only if (T ) > 0 and
never changes sign. Therefore we can be take to be non-negative. But,
if 0, then, since p (T, x, y) > 0 everywhere and (T ) = QVT , > 0
everywhere. Thus, we have now shown that every normalized eigenvector for
QVT with eigenvalue (T ) is a bounded, continuous function that, after a change
of sign, can be taken to be strictly positive. In particular, if 1 and 2 were
linearly independent, normalized eigenvectors of QVT with eigenvalue (T ), then
g=

2 (1 , 2 )L2 (RN ;R) 1


k2 (1 , 2 )L2 (RN ;R) 1 kL2 (RN ;R)

would also be such an eigenvector, and this one would be orthogonal


to 1 .

On the other hand, since neither 1 nor g changes sign, 1 , g L2 (RN ;R) 6= 0. In
summary, we now know that there is, up to sign, a unique L2 (RN ; R)-normalized
eigenvector for QVT with eigenvalue (T ) and that can be taken to be strictly
positive, bounded, and continuous.
To complete the proof, I must show that QVt = et , where
= T1 log (T ).

To this end, set t = QVt for t > 0. Then t Cb RN ; (0, ) for each t > 0 and
t
t (x) is continuous for each x RN . Moreover, QVT t = QVt QVT = (T )t .
Hence, by the uniqueness proved above, t = (t) for some (t) R. In

448

10 Wiener Measure and P.D.E.s

addition, because t
Finally,

t (x) is continuous and strictly positive, so is t

(s + t) = , QVs+t


L2 (RN ;R)

= (s) , QVt


L2 (RN ;R)

(t).

= (s)(t),

which means that (t) = et for some R, and, because (T ) = eT , this completes the proof of everything except the final statement, which is an immediate
consequence of Theorem 10.3.21. 
If nothing else, Theorem 10.3.32 helps to explain the terminology that I have
been using. In Schr
odinger mechanics, the function in Theorem 10.3.32 is
called the ground state because it is the wave function corresponding to the
lowest energy level of the quantum mechanical Hamiltonian 12 V . From
our standpoint, its importance is that it shows that lots of V s admit a ground
state.
I turn now to the second method
for producing ground states. Namely, sup
pose that C 2 RN ; (0, ) . Then, it is obvious that 12 + V = 0, where

V =

log + | log |2

.
=
2
2

Theorem 10.3.33.
Let U C 2 (RN ; R), and assume that both U and V U

1
2
N
2 U + |U |  are bounded above. Then,
 for each x R , there is a unique
U
N
U
Px M1 C(R ) such that Px (0) = x = 1 and



(t)

1
2

Z t
0

 

+ U, RN ( ) d, Ft , PU
x

is a martingale for all Cc (RN ; R). Moreover, for each x RN ,




Z
(t) x


U ( ) d, Ft , PU
x

is a Brownian motion and


PU
x (A)

=e

U (x)

(N )

Wx

Rt U
h
i
U (((t))+
V (( )) d
0
e
,A

for all t 0 and A Ft .

Finally, x
PU
x is continuous and, for any stopping time and any F BC(RN ) measurable F : C(RN ) C(RN ) that is bounded below,
Z
{()<}

F (, ) PU
x (d)

Z
=

Z
F (,

{()<}

0
) PU
() (d )

PU
x (d).

Exercises for 10.3

449

Proof: By Lemma 10.3.24, eU is a ground state for V U with associated


eigenvalue 0. Thus, the existence of PU
x follows immediately from Theorem
(N )
U
10.3.26 with = e , as does the relation between this choice of PU
x and Wx
as well as the Markov property in the final statement. Moreover, by Lemma
10.3.28, we know that any P satisfying the initial condition and the stated martingale property is related to Brownian motion in the stated way. Therefore,
all that remains is to show that this relationship to Brownian motion determines P. But, by Corollary 10.3.29 with = eU , we know that such a P equals
(N )
(XU ) Wx , where XU is the mapping described in the paragraph preceding
that corollary. 
Exercises for 10.3
Exercise 10.3.34. Given , (0, ) and a, b RN , show that
Z

g (N ) (s, + a)g (N ) (t, + b) d = g (N ) 2 s + 2 t, b a .
RN

Hint: Note that


g (N ) (s, + a)g (N ) (t, + b) =

1 (N )
g

s
2 ,

g (N )

t
2 ,

Exercise 10.3.35. When N = 1, the considerations in 7.2.2 can be used to


give a reasonably explicit formula for pG (t, x, y). Namely, show that
p(0,) (t, x, y) = g (1) (t, y x) g (1) (t, x + y)
1

for (t, x, y) (0, ) (0, )2 ,

where g (1) (, ) = (2 ) 2 e 2 . In addition, referring to Corollary 7.3.4, show


that, for c R, r > 0, and (x, y) (c r, c + r),



p(cr,c+r) (t, x, y) = r1 g(1) r2 t, r1 (yx) r1 g(1) r2 t, r1 (x+y+22c)) ,
g (1) (t, x + 4m).
QN
Exercise 10.3.36. Set Q(a, R) = i=1 [ai R, ai + R] for a RN and R > 0.
Show that
where g(, ) =

mZ

pQ(a,R) (t, x, y)

Q(a,R)

N
Y
i=1

sin

N
N 2 Y
(xi ai + R)
(yi ai + R)
sin
dy = e 8R2 t
2R
2R
i=1

for (t, x, y) (0, ) Q(a, R)2 . Conclude that

N 2
1
log Wx(N ) ( Q(a,R) > t) =
t t
8R2
lim

for x Q(a, R).

450

10 Wiener Measure and P.D.E.s

Hint: First observe that it suffices to handle a = 0, R = 1, and N = 1. To prove


2
(1) 
, and show that u(t, (t)), Ft , Wx
the first part, set u(t, x) = e 4 t sin (x+1)
2
2
(1)
is a martingale. Given the first part, limt 1t log Wx ( (1,1) > t) 8 is
clear. To get the inequality in the opposite direction, note that p(1,1) (t, x, y)
p(R,R) (t, x, y) if R > 1, and use this to see that, for R > 1 and (t, x)
(0, ) (1, 1),
Z
2
(x + R)
(y + R)
.
dy e 8R2 t sin
p(1,1) (t, x, y) sin
2R
2R
(1,1)

Exercise 10.3.37. Let G be a non-empty, bounded, connected, open subset


(N )
of RN , and set w(t) = supxG Wx ( G > t) for t > 0. The purpose of this
exercise is to show that G limt 1t log w(t) exists and is an element of
(0, ).

(i) Show that w is sub-multiplicative in the sense that w(s + t) w(s)w(t), and
conclude from this that limt 1t log w(t) = supT >0 T1 f (T ) [, 0].
Hint: Set f (t) = log w(t). Because w takes values in (0, 1] and is non-increasing,
f is non-positive and bounded on compacts.
f (s+t)
 t  Further, f is sub-additive:
1
f (s)+f (t). Thus, given, T > 0, f (t) T f (T ), and so limt t f (t) T1 f (T )
for every T > 0. Conclude from this that limt 1t f (t) = supT >0 T1 f (T )
[, 0].

(ii) Refer to the notation in Exercise 10.3.36, set R1 = sup{r 0 : Q(a, r)


2
. In particG for some a G}, and show that G limt 1t log w(t) N
8R2
1

ular, G < .

(ii) Let R2 be the diameter of G, choose a RN so that G B(a, R2 ), and use


(N )
R2
the first part of Theorem 10.1.11 to show that EWx [ G ] N2 for all x G. In

log 2
> 0.
particular, conclude that w 2N 1 R22 12 and therefore that G N2R
2
2

Exercise 10.3.38. Again let G be a bounded, connected, open subset of


RN . Using spectral theory, the conclusions drawn in Exercise 10.3.37 can be
sharpened. Namely, this exercise outlines a proof that

X
G
(10.3.39)
p (t, x, y) =
etn n (x)n (y),
n=0

where {n : n 0} (0, ) is a non-decreasing sequence that tends to , {n :


n 0} Cb (G, R) is an orthonormal basis in L2 (G; R) of smooth functions,
0 < 1 , 0 > 0, and the convergence is uniform on [, ) G2 for each  > 0.
Finally, from (10.3.39), it will follow that
t

e 0 p(t, x, y) 0 (x)0 (y) 1 et , (t, x, y) [1, ) G2 ,
for some > 0. In particular, this means that 0 here is equal to G in Exercise
10.3.37.

Exercises for 10.3

451

G
(i) Let PG
t be the operator on Cb (G; R) whose kernel is p (t, x, y), and show
G
2
that Pt admits a unique extension to L (G; R) as a self-adjoint contraction.
Further, show that {PG
t : t > 0} is a continuous semigroup of non-negative
definite, self-adjoint contractions on L2 (G; R). Finally, show that

ZZ

p (t, x, y) dxdy =

pG (2t, x, x) dx

GG

|G|
N

(4t) 2

and therefore that each PG


t is HilbertSchmidt.
(ii) Knowing that the operators PG
t form a continuous semigroup of self-adjoint,
HilbertSchmidt (and therefore compact), non-negative definite contractions,
standard spectral theory2 guarantees that there exists a non-decreasing sequence
{n : n 0} [0, ) tending to and an orthonormal basis {n : n 0}
in L2 (G; R) such that etn n = PG
t n for all t (0, ) and n 0. Conclude
from this that n can be taken to be smooth and bounded. In addition, show
that PG
t 0 0 uniformly, and therefore that 0 > 0.
(iii) Show that
0
, PG
t

(*)


L2 (G;R)

etn , n


L2 (G;R)

0 , n


L2 (G;R)

n=0

for , 0 L2 (G; R), and conclude that





e0 = sup , PG
1 L2 (G;R) : kkL2 (G;R) = 1 .
Use (cf. the proof of Theorem 10.3.32) this to show that if n = 0 , then n
never changes sign and can therefore be taken to be non-negative. In particular,
show that this means that 1 > 0 and that 0 > 0.
(iv) Starting from (*), show that

X
n=0

tn

, n

2
L2 (G;R)

Z
=

(x)pG (t, x, y)(y) dxdy (2t) 2 kk2L1 (G;R)

GG

What is needed here is the variant of Stones Theorem that applies to semigroups. The
technical question which his theorem addresses is that of finding a simultaneous diagonalization
of the operators PG
t . Because we are dealing here with compact operators, this question can
be reduced to one about operators in finite dimensions, where it is quite easy to handle. For
a general statement, see, for example, K. Yoshidas Functional Analysis and its Applications,
Springer-Verlag (1971).

452

10 Wiener Measure and P.D.E.s

for any L2 (G; R), and use this to show that, for any M N and , 0
L2 (G; R),

tn

, n

n=M

0
e 2M

0



1
1

n L2 (G;R)
N kkL (G;R) k kL (G;R) .
L2 (G;R)
(t) 2

Next, given x, y G, set R = |x G| |y G|, and, for 0 < r R, apply


the preceding to see that
Z
Z



e t2M




tn
e
n (z) dz
n (z) dz

N .
B(x,r)
B(y,r)

(t) 2
n=M

Finally, by combining this with (*), reach the conclusion that


t

M
1
X
G
2e 2M
p (t, x, y)
etn n (x)n (y)
N ,
(t) 2
n=0

which, because M , certainly implies the asserted convergence result.


(v) To complete program, set = 1

t

e 0 p(t, x, y) 0 (x)0 (y)

0
1

(0, 1). Show that


! 12

etn n (x)2

t1
2

pG

t
2 , x, x

 12

pG

t
2 , y, y

! 12
etn n (y)2

n=1

n=1

 12

t1
2

(t) 2

Exercise 10.3.40. M. Kac3 made an interesting application of (10.3.39) to


a problem raised originally by the physicist H. Lorentz and solved, remarkably
quickly, by H. Weyl. What Lorentz noticed is that, if one takes Plancks theory of
black body radiation seriously, then the distribution of high frequencies emitted
should depend only on the volume of the radiator. In order to state Lorentzs
question in mathematical terms, let G be a non-empty, bounded, connected, open
subset of RN , let {n : n 0} be the eigenvalues, arranged in non-decreasing
order, of 12 with zero boundary conditions, and use N() to denote the
number of n 0 such that n . What Lorentz predicted was that the rate
at which N() grows as depends only on the volume |G| of G and on
nothing else about G. Thus, the original interest in the result was that the
asymptotic distribution of high frequencies is so insensitive to the shape of the
3

See Kacs wonderful article Can one hear the shape of drum?, Am. Math. Monthly 73 # 4,
pp. 123 (1966), or, better yet, borrow the movie from the A.M.S.

Exercises for 10.3

453

radiator. When Kac took up the problem, he turned it around. Namely, he asked
what geometric information, besides the volume, is encoded in the eigenvalues.
When he explained his program to L. Bers, Bers rephrased the problem in the
terms that Kac adopted for his title. Audiophiles will be disappointed to learn
that, according to C. Gordon, D. Webb, and S. Wolperts,4 one cannot hear the
shape of a drum, even a two dimensional one.
This exercise outlines Kacs argument for proving Weyls asymptotic formula
N
|G| 2
,
N ()
N
(2) 2 ( N2+1 )

in the sense that the ratio of the two sides tends to 1 as .


(i) Refer to Exercise 10.3.38, and show that, for each n 0,
1
2 n

= n n and lim n (x) = 0 for a reg G.


xG
xa

Thus, I will interpret the n s in Exercise 10.3.38 as the frequencies referred to


in Lorentzs problem.
(ii) Using (10.3.39), show that
Z
e

N (d) =

(0,)

X
n=0

tn

Z
=

pG (t, x, x) dx,

where N (d) denotes integration with respect the purely atomic measure on
(0, ) determined by the non-decreasing function
N ().
(iii) Using (10.3.8), show that
N

1 (2t) 2 pG (t, x, x) 1 E(t, x),

where E(t, x) 0 and, as t & 0, E(t, x) 0 uniformly on compact subsets of


G. Conclude that
Z
N
et N (d) = |G|.
lim (2t) 2
t&0

(0,)

At this point, Kac invoked Karamatas Tauberian Theorem,5 which relates the
asymptotics at infinity of an increasing function to the asymptotics at zero of
4

See their 1992 announcement in B.A.M.S., new series 27 (2), One cannot hear the shape
of a drum.
5 See, for example, Theorem 1.7.6 in N. Bingham, C. Goldie, and J. Teugels Regularly Varying
Functions, Cambridge U. Press (1987).

454

10 Wiener Measure and P.D.E.s

its Laplace transform. Given the preceding, Karamatas theorem yields Weyls
asymptotic formula. It should be pointed out that the weakness of Kacs method
is its reliance on the Laplace transform and Tauberian theory, which gives only
the principal term in the asymptotics. Further information can be obtained
using Fourier methods, which, in terms of partial differential equations, means
that one is replacing the heat equation by the wave equation, an equation about
which probability theory has embarrassingly little to say.
Exercise 10.3.41. It will have occurred to most readers that the relation between the Hermite heat kernel in (10.1.7) and the OrnsteinUhlenbeck process
in 8.4.1 is the archetypal example of what we have been doing in this section.
This exercise gives substance to this remark.
(i) Set (x) = e

|x|2
2

, and show that


2

1
2

12 |x|2 = N2 . By Lemma

10.3.24, is a ground state for |x|2 with associated eigenvalue N2 , a fact


that also can be verified by direct computation using (10.1.7). Show that the
1
1

measure Px is the distribution under W (N ) of {2 2 U(2t, 2 2 x, ) : t 0},


where U(t, x, ) is the OrnsteinUhlenbeck process described in (8.5.1).

(ii) Although it does not follow from Lemma 10.3.24, use (10.1.7) to show that
2
+ is also a ground state for |x|2 with associated N2 . (See Exercise 10.3.43.)
1

Also, show that Px+ is the W (N ) -distribution of { et x+2 2 V(2t, ) : t 0},


where {V(t, ) : t 0} is the process discussed in Exercise 8.5.14.
x2

x2

d
2
Exercise 10.3.42. Recall the Hermite polynomials Hn (x) = (1)n e 2 dx
ne
in 2.4.1. Show that the Hermite functions (although these are not precisely the
ones introduced in 2.4, they are obtained from those by rescaling)
1

2
4
n (x) = 2 1 e x2 Hn (2 12 x), n 0,
h
(n!) 2
form an orthonormal basis in L2 (R; R) and that
Z
n (x), n 0 and (t, x) (0, ) R,
n (y)h(t, x, y) dy = e(n+ 12 )t h
h

where h(t, x, y) is the function in (10.1.7) when N = 1. As a consequence, if


n (x) =
h

N
Y

hni (xi )

for n NN and x RN ,

i=1

n : n NN } is an orthonormal basis in L2 (RN ; R) and


show that {h
Z
n (x), n NN and (t, x) (0, ) RN .
n (y)h(t, x, y) dy = e(knk+ N2 )t h
h
R

Hint: Remember that

X
2
n
Hn (x) = ex 2 .
n!
n=0

Exercises for 10.3

455

Exercise 10.3.43. Part (ii) of Exercise 10.3.41 might lead one to question the
necessity of the boundedness assumption made in Lemma 10.3.24. However, that
would be a mistake because, in general, a positive solution to 12 + V =
need not be a ground state. For example, in this exercise we will show that

x4
although (x) = e 4 satisfies 12 x2 + V = 0 when V = 12 x6 + 3x2 , this is
not a ground state for V . The proof is based on the following idea. If were a
ground state, then Theorems 10.3.26 and its corollaries would apply, and so we
would know that the equation

Z
(*)

X(t, ) = (t) +

X(, )3 d

0
(1)

would have a solution on [0, ) for Wx -almost every C(R) for every x R.
The following steps show that this is impossible.
(i) Suppose that 1 , 2 C(R) and that 0 1 (t) 2 (t) for t [0, 1]. If
X( , 2 ) exists on [0, 1], show that X( , 1 ) exists on [0, 1].
Rt
Hint: Define X0 (t, ) = (t) and Xn+1 (t, ) = (t) + 0 Xn (, )3 d . First
show that if 0 1 (t) 2 (t), then 0 Xn ( , 1 ) Xn ( , 2 ). Second, if
supn0 kXn ( , )k[0,T ] < , show that Xn ( , ) converges uniformly on [0, T ]
to the unique solution to (*) on [0, T ].


1
(ii) Show that if (t) 1 for t [0, 1], then X(t, ) (1 2t) 2 for t 0, 12
and therefore X( , ) fails to exist after time 12 .

(1)
(iii) Show that W2 (t) 1 for t [0, 1] > 0, and conclude from this that
cannot be a ground state for V .

Chapter 11
Some Classical Potential Theory

In this concluding chapter I will discuss a few refinements and extensions of the
material in 10.2 and 10.3. Even so, I will be barely scratching the surface. The
interested reader should consult J.L. Doobs thorough account in Classical Potential Theory and Its Probabilistic Counterpart, published by SpringerVerlag
in 1984, or S. Port and C. Stoness Brownian Motion and Classical Potential
Theory, published by Academic Press in 1978.
11.1 Uniqueness Refined
In this section I will refine some of the uniqueness statements made in 10.2.
The improved statements result from the removal of the defect mentioned in
Remark 10.3.14. To be precise, recall that if G is an open subset of RN , then
G
sG () = inf{t s : (t)
/ G}, 0+
= lims&0 sG , and (cf. Lemma 10.2.11)
(N ) G
reg G is the set of x G such that Wx (0+
= 0) = 1. The main result
proved in this section is Theorem 11.1.15, which states that, for any x G and
(N )
Wx -almost all C(RN ), G () < = ( G ) reg G. However, I
will begin by amending the treatment that I gave in 10.3 of the Dirichlet heat
kernel pG (t, x, y).
11.1.1. The Dirichlet Heat Kernel Again. In 10.3, I introduced the
Dirichlet heat kernel pG (t, x, y). At the time, I was concerned with it only when
(x, y) G G, and so I defined it in such a way that it was 0 outside G G.
When G is regular in the sense that G = reg G, this choice is the obvious one,
since (cf. Theorem 10.3.9) it is the one that makes pG (t, , y) continuous on R
for each (t, y) (0, ) RN . However, when G is not regular, it is too crude
for the analysis here. Instead, from now on I will take
pG (t, x, y) =
(11.1.1)



W (N ) x 1 `t ( ) + t ( ) + y`t ( ) G, (0, t) g (N ) (t, y x),

and t ( ) = ( ) (t)`t ( ). Notice that the difference


where `t ( ) = t
t
between this definition and the one in 10.3.2 results from the replacement of
the closed interval [0, t] there by the open interval (0, t) here. That is, in 10.3.2,
pG (t, x, y) was given by



W (N ) x 1 `t ( ) + t ( ) + y`t ( ) G, [0, t] g (N ) (t, y x).
456

11.1 Uniqueness Refined

457

Of course, unless x, y G, the difference between these two disappears. On


the other hand, when either x or y is an element of G, there is a subtle, but
crucial, difference.
To relate the preceding definition to the considerations in 10.3.1, set Et () =
G
1[t,) 0+
() . Then (11.1.1) is equivalent to saying that pG (t, x, ) = q (t, x, y)

when q (t, x, y) is defined in terms of Et via (10.3.2). Hence, just as in the proof
of Theorem 10.3.3, one can use the results in 8.3.3 to check that pG (t, x, y) =
pG (t, y, x) is again true but that (10.3.4) has to be replaced by
Z

(N )

(y)pG (t, x, y) dy = EWx

(11.1.2)
RN


 G

((t) , 0+
() t .

However, the analog here of the ChapmanKolmogorov equation (10.3.5) presents something of challenge. To understand this challenge, note that t
Et
fails to satisfy (10.3.1). Indeed,


Es+t
() = 1G (s) Es ()Et ().

(11.1.3)

Thus, repeating the argument used in the proof of Theorem 10.3.3 to derive
(10.3.5), one finds that
(11.1.4)

p (s + t, x, y) =

pG (s, x, z)pG (t, z, y) dz,

which, because the integral is over G and not RN , is a flawed version of the
ChapmanKolmogorov equation. In order to remove this flaw, I will need the
following lemma.
Lemma 11.1.5. For each (t, x) (0, ) RN ,
G
Wx(N ) ( G = t) = 0 = Wx(N ) (0+
= t),

and therefore
Z
(11.1.6)
RN

h
i
 G
(y)pG (t, x, y) = Wx(N ) (t) , 0+
() > t

for all Borel measurable : RN R that are bounded below. In particular,


pG (t, x, y) = 0 for Lebesgue-almost every y
/ G.
Proof: Set
Z
() =
RN


Wy(N ) G > 0,I (dy),

(0, ).

458

11 Some Classical Potential Theory

Obviously, is a right-continuous, non-increasing, [0, 1]-valued function, and, as


such, it has only countably many discontinuities. Hence, there is a countable set
(0, ) such that

/ = Wy(N ) ( G = ) = 0

for Lebesgue-almost every y RN .

Now let (t, x) (0, ) RN be given, and choose s (0, t) so that t s


/ .
Then, by the Markov property and (10.2.10),



G
G
Wx(N ) 0+
= t = Wx(N ) 0+
> s & G s = t s Wx(N ) G s = t s
Z

(N )
=
Wx+y G = t s 0,sI (dy) = 0.
RN

G
In addition, because G () = t = 0+
() = t when t > 0, it follows that
(N ) G
Wx ( = t) = 0 also.
Given the preceding, it is clear how to pass from (11.1.2) to (11.1.6). Finally,
by applying (11.1.6) with = 1G{ , we see that
Z

G
pG (t, x, y) dy = Wx(N ) (t)
/ G & 0+
() > t = 0,
G{

which says that pG (t, x, ) vanishes Lebesgue-almost everywhere on G{. 


Because of the final part of Lemma 11.1.5, we can now replace the preceding
flawed version of the ChapmanKolmogorov equation by
Z
G
(11.1.7) p (s+t, x, y) =
pG (s, x, z)pG (t, z, y) dz, (t, x, y) (0, )(RN )2 .
RN

Before completing this discussion, I want to develop a Duhamel formula for


pG . That is, I want to show that
(11.1.8)

pG (t, x, y) =g (N ) (t, y x)
h
i
 G
(N )
G
G
EWx g (N ) t 0+
(), y (0+
) , 0+
() < t

for all (t, x, y) (0, ) (RN )2 , and the idea is very much the same as the one
used to prove (10.3.8). Thus, for (0, 1), set



q (t, x, y) = W (N ) x 1 `t ( ) + t ( ) + y`t ( ) G, (0, t) g (N ) (t, y x).
Obviously, q (t, x, y) & pG (t, x, y) as % 1. In addition, proceeding as in the
proof of Theorem 10.3.3, one finds that q (t, x, ) is continuous and that
Z
h
i
 G
(N )
(*)
(y)q (t, x, y) dy = EWx (t) , 0+
() t .
RN

11.1 Uniqueness Refined

459

Now use the Markov property to justify


Z
(y)g (N ) (t, y x) dy
RN
h
i
h
i


(N )
(N )
= EWx (t) , sG () t + EWx (t) , sG () < t
h
i

(N )
= EWx (t) , sG () t
Z


(N )
Wx
(N )
G
G
G
+E
(y)g
t s (), y (s ) dy, s () < t .
RN

for all (0, 1), t (0, ) and s (0, t). Thus, by (*), after letting s & 0,
we see that
Z
(y)q (t, x, y) dy
RN
Z
=
(y)g (N ) (t, y x) dy
RN
Z


(N )
G
G
G
EWx
(y)g t 0+
(), y (0+
) dy, 0+
() < t .
RN

Because q (t, x, ) is continuous, this means that


(N )

q (t, x, y) = g (N ) (t, y x) EWx

 (N )
 G

G
G
g
t 0+
(), y (0+
) , 0+
() < t ,

and so (11.1.8) follows when one lets % 1.


11.1.2. Exiting Through reg G. The purpose of this subsection is to prove
that when Brownian motion exits from a region, it does so through regular
points. My proof of this fact follows the reasoning in the book, cited above, by
Port and Stone.
Lemma 11.1.9. Let G be a non-empty, connected open subset of RN , and define
pG by (11.1.1). Then, for each (t, x, a) (0, )RN reg(G) reg G(RN \ G,
pG (t, x, a) = 0. On the other hand, if (t, x) (0, ) G, then pG (t, x, a) > 0
for all a G \ reg G. In particular, G \ reg G has Lebesgue measure 0.
Next, suppose that a
Proof: Obviously, pG (t, x, a) = 0 if a RN \ G.
G
reg G. Then, by (11.1.8), p (t, a, x) = 0 for all (t, x) (0, ) RN , and so, by
symmetry, the same is true of pG (t, x, a).
To go in the other direction when (t, x) (0, ) G, let a G be given,
and begin with the observation
that (t, x) (0, ) G 7 pG (t, x, a) is in

1,2
C
(0, ) G; [0, ) and satisfies t pG (t, x, a) = 12 x pG (t, x, a). To check
this, use (11.1.4) to write
Z
G
p (t, x, a) =
pG (t s, x, z)pG (s, z, a) dz
G

460

11 Some Classical Potential Theory

for any 0 < s < t, and note that pG (s, , a) is bounded. Hence, the desired
conclusions follow from (10.3.12) and the argument used to prove the last part of
Theorem 10.3.9. Next, suppose that pG (t0 , x0 , a) = 0 for some (t0 , x0 ) (0, )
G. Then, by the strong minimum principle (cf. Theorem 10.1.6), pG (t, x, a) = 0
for all (t, x) (0, t0 ) G. But this, by (11.1.2) and symmetry, means that, for
t (0, t0 ),
Z
Z
(N ) G
G
Wa (0+ t) =
p (t, a, y) dy =
pG (t, x, a) dx = 0,
RN

where I have used the final part of Lemma 11.1.5 to get the second equality.
Hence, pG (t0 , x0 , a) = 0 = a reg G.
Finally, because, by the preceding and symmetry, for any x G, G \ reg G
is contained in {y
/ G : p(1, x, y) > 0}, and, by Lemma 11.1.5, the latter set
has Lebesgue measure 0, it is clear the G \ reg G has Lebesgue measure 0. 
I next introduce the function
(N )

v G (x) EWx

(11.1.10)

 G 
e 0+ ,

x RN .

Since, by the Markov property,


Z
(N ) 
(N ) 
G
G
s
e
g (N ) (s, y x)EWy e dy = EWx es % v G (x)
RN

as s & 0, it is clear that v G is lower semicontinuous. In addition, it is obvious


that v G 1 everywhere and that



x RN : v G (x) = 1 = reg(G) = reg G RN \ G .

Lemma 11.1.11. Define the Borel measure G on RN by 1


Z
h G
i
(N )
G
G
() =
EWx e0+ () , (0+
) dx.
RN

Then

is supported on G{, and if


Z
r(x) =
et g (N ) (t, x) dt,

x RN ,

(0,)

then
(11.1.12)

r(y x) G (dy),

v (x) =

x RN .

RN

In particular, G is always locally finite and is therefore finite in the case when
G{ is compact. Finally, for any non-empty, open set H RN ,
h H
i
(N )
H
(11.1.13) G{ reg(H) = v G (x) = EWx e0+ v G (0+
) , x RN ,

where reg(H) = reg H (RN \ H).


1

G
0+
()

Below I use the convention that e

G () = . Thus, the problem of


= 0 when 0+
G
0+
()

G ) meaning when G () = does not arise in integrals having e


giving (0+
0+
factor in their integrands.

as a

11.1 Uniqueness Refined

461

Proof: Clearly G is supported on G{. To prove (11.1.12), note that the


symmetry of pG (t, x, y) together with (11.1.8) imply that
h
i
 G
(N )
G
G
EWx g (N ) t 0+
(), y (0+
) , 0+
() < t
h
i
 G
(N )
G
G
= EWy g (N ) t 0+
(), x (0+
) , 0+
() < t
for all (t, x, y) (0, ) RN RN . Hence, after multiplying by et and integrating with respect to t (0, ), one arrives at
h G
h G
i
i
(N )
(N )
G
G
EWx e0+ () r (0+
) y = EWy e0+ () r (0+
)x .
But

Z
r(x y) dy = 1,

x RN ,

RN

and so (11.1.12) follows after one integrates the preceding over y RN and
applies Tonellis Theorem.
Given (11.1.12) and the fact that r is uniformly positive on compacts, it
becomes obvious that G must be always locally finite and finite when G{ is
compact. Thus, all that remains is to check (11.1.13). But clearly, after multiplying (11.1.8) with G = H throughout by et and integrating with respect to
t (0, ), one gets
Z
h G
i
(N )
G
r(x y) =
et pH (t, x, y) dt + EWx e0+ r (0+
)y .
(0,)

Hence, since, by the first part of Lemma 11.1.9 with G = H, pH (t, x, ) vanishes
on reg(H), (11.1.13) follows after one integrates the preceding with respect to
G (dy) and uses (11.1.12). 
Lemma 11.1.14. If G{ is compact and, for some [0, 1), v G  G{ , then

(N ) G
Wx 0+
< = 0 for every x RN .
G
Proof:
by checking that
that
 I begin
v everywhere. Thus, suppose
N
G
H = x R : v (x) > +  6= for some  > 0. Because v G is lower
semicontinuous, H is open. I will derive a contradiction by first showing that
G{ reg(H) and then applying (11.1.13). To carry out the first step, use
(11.1.12) to see that, for any s (0, ),
Z

Z
G
t
(N )
G
v (x)
e
g (t, y x) (dy) dt
s
RN
Z

Z
s
t
(N )
G
=e
e
g (s, y x)v (y) dy dt
(0,)

( +

)Wx(N )

RN



H
(s) H es ( + )Wx(N ) 0+
>s ,

462

11 Some Classical Potential Theory


(N )

H
and so, after letting s & 0, we have that v G (x) ( + )Wx (0+
> 0).
(N ) H
In particular, if x
/ G, then ( + )Wx (0+ > 0), which means that
(N ) H
x
/ G = Wx (0+ > 0) < 1. Hence, because (cf. part (ii) of Exercise
(N ) H
10.2.19) Wx (0+
> 0) {0, 1}, this means that x
/ G = x reg(H) and
therefore that (11.1.13) applies. But if x H, (11.1.13) yields the contradiction
(N )

+  < v G (x) = EWx

h H
i
H
e0+ v G (0+
) < + ,

H
H
since 0+
() < = (0+
)
/ H. That is, I have shown that H must be
empty.
Knowing that v G everywhere, I now want to argue that G (RN )
G (RN ). Since G (RN ) < , this will show that G = 0 and therefore, by
(N ) G
(11.1.12), that v G 0, which is the same as saying that Wx (0+
< ) = 0
everywhere. Thus, let K = G{, and set Kn = {x : dist(x, K) n1 } and
Gn = Kn { for n 1. Clearly, K RN \ Gn reg(Gn ), and so, by (11.1.12) and
Tonellis Theorem,
Z
Z
G N
Gn
G
(R ) =
v (x) (dx) =
v G (y) Gn (dy) Gn (RN ).

RN

RN

Thus, all that we have to do is check that Gn (RN ) & G (RN ) when n .
But
Z
Gn
N
(R ) =
v Gn (x) dx
RN

and G1 (RN ) < . Hence, by the Monotone Convergence Theorem, it is enough


for us to know that v Gn (x) & v G (x) for Lebesgue-almost every x RN . Because
(N )
Gn
G
x Gn implies 0+
= Gn % G = 0+
Wx -almost surely, 1 v Gn & v G on
Gn
G
G. At the same time, 1 v
v = 1 on reg(G), and, by the last part of
Lemma 11.1.9, G{ \ reg(G) = G \ reg G has Lebesgue measure 0. 
Theorem 11.1.15. For every open G RN ,


G
G
Wx(N ) 0+
() < & (0+
)
/ reg G = 0 for all x G.
(N )

G
Proof: Suppose not. Because Wy (0+
> 0) {0, 1} for all y RN , we could
then find an x G and a > 0 for which


G
G
Wx(N ) 0+
() < & (0+
) > 0,

where

o
n

G
= y G : Wy(N ) 0+
12 .

11.1 Uniqueness Refined

463

But then there would exist a compact K for which





K{
G
G
Wx(N ) 0+
< Wx(N ) 0+
() < & (0+
) K > 0.
On the other hand, because K{ G, v K{ v G everywhere, and therefore,
because v G (y) 12 1 + e < 1 for y K, Lemma 11.1.14 would say that

(N ) K{
Wx 0+
< = 0, which is obviously a contradiction. 
11.1.3. Applications to Questions of Uniqueness. My main reason for
wanting the result in Theorem 11.1.15 is that it allows me to improve on the
uniqueness results that were proved in 10.2.3 and 10.3.1. For example, by
the comment in Remark 10.3.14, we can now remove the assumption that G =
reg G from the uniqueness assertion in Corollary 10.3.13.

Theorem 11.1.16. Let G be an open subset of RN and Cb (G; R). Then


Z
 G

(N ) 
Wx
(t, x) (0, ) G 7 E
(t) , () > t =
(y)pG (t, x, y) dy R
G

is the one and only bounded, smooth solution to the boundary value problem
described in Corollary 10.3.13.
More interesting are the improvements that Theorem 11.1.15 allows me to
make to the results in 10.2.3.
Theorem 11.1.17.
f : G R, set
(11.1.18)

Given an open G RN and a bounded Borel measurable


(N )

uf (x) = EWx




f ( G ) , G () < ,

for x G.

Then uf is a bounded harmonic function on G and limxa uf (x) = f (a) whenxG


ever a regG is a point at which f is continuous.
Furthermore, if f

2
Cb G; [0, ) and u is an element of C G; [0, ) that satisfies
u 0

in G and

lim u(x) f (a) for a reg G,

xa
xG


then uf u. In particular, if f Cb G; R , then uf is the one and only
harmonic function u on G with the properties that



u(x) CWx(N ) G < for all x G,
for some C < and
lim u(x) = f (a) for each a reg G.

xa
xG

464

11 Some Classical Potential Theory

Proof: The initial assertions are covered already by Theorem 10.2.14. Next,
let f Cb (G; R) be given, and suppose that u is an element of C 2 G; [0, )
which satisfies the conditions
 in the second assertion. To prove that uf u, set
Ft = {( ) : [0, t]} , and choose a sequence of bounded, open subsets Gn
(N ) 
so that Gn G and Gn % G. Then, for each n 1, u (t Gn ), Ft , Wx
is a submartingale, and so we know that, for each x G, u(x) dominates

lim

(N )

lim EWx


u (T Gn ) lim




u ( Gn ) , G T

T % n

T % n
(N )

Wx

(N )

lim EWx

h
i

f ( G ) , G < = uf (x),

where, in the passage to the last line, I have used Fatous Lemma and Theorem
11.1.15.
Finally, let f Cb (G; R) be given. What I still have to show is that if u
is a harmonic function on G which tends to f at points in reg G and satisfies
(N )
|u(x)| CWx ( G < ) for some C < , then u = uf . Thus, suppose u is
such a function, and set M = C + kf ku . Then, by the preceding, we have both
that


M Wx(N ) G < + u(x) uM 1+f (x) = M Wx(N ) G < + uf (x)
and that


M Wx(N ) G < u(x) uM 1f (x) = M Wx(N ) G < uf (x),
which means, of course, that u = uf . 
As an immediate consequence of Theorem 11.1.17, we have the following.
Corollary 11.1.19. Assume that
(11.1.20)

Wx(N ) ( G < ) = 1 for all x G.

Then, for each f Cb (G; R) the function uf in (11.1.18) is the one and only
bounded, harmonic function u on G which satisfies limxa u(x) = f (a) for every
xG
a reg G. In particular, this will be the case if G is contained in a half-space.
In order to go further, it will be helpful to have the following lemma.
Lemma 11.1.21. Let G be a non-empty, connected, open set in RN . Then

reg G = Wx(N ) G < = 0 for all x G.
On the other hand, if reg G 6= and b G, then


/ BRN (b, r) & G < > 0.
b
/ reg G lim lim Wx(N ) ( G )
r&0 xb
xG

11.1 Uniqueness Refined

465

Proof: The equivalence


reg G = Wx(N ) ( G < ) = 0,

x G,

follows immediately from Theorems 11.1.15 and 11.1.17.


Now assume that reg G 6= , and let b G. If b reg G, then


lim lim Wx(N ) ( G )
/ BRN (b, r) & G < = 0
r&0 xb
xG

follows from (10.2.13). Thus, suppose that b


/ reg G. Choose a reg G,
1
and set B = BRN (b, r), where 0 < r 2 |a b|. One can then construct an
f C G; [0, 1] with the properties that f = 0 on B G and f (a) = 1. In
particular,


0 uf (x) Wx(N ) ( G )
/ B & G < 1 for all x G,

and so we need only check that limxb uf (x) > 0. To this end, first note that,
xG

since

lim uf (x) = f (a) = 1,

xa
xG

the Strong Minimum Principle (cf. Theorem 10.1.6) says that uf > 0 everywhere
in G. Next, because b is not regular, we can find a > 0 and a sequence
{xn : n 1} G such that xn b and

) G
 inf+ Wx(N
> > 0.
n
nZ

Moreover, by the Markov property, we know that


i Z

(N ) 
Wx n
G
G
uf (xn ) E
f ( ) , < < =
uf (y) pG (, xn , y) dy.
G

At the same time, we know that pG (, xn , y) g (N ) (, y xn ), and therefore


that
Z

sup
pG (, xn , y) dy
2
+
nZ
G\K

for some compact subset K of G. Hence,

lim uf (x) lim uf (xn )

xb
xG


inf uf (y) > 0. 
2 yK

As a consequence of Lemma 11.1.21, I will now show that solutions to the


Dirichlet problem will not, in general, approach the correct value at points outside of reg G.

466

11 Some Classical Potential Theory

Theorem 11.1.22. Let G be a connected open set in RN , and assume


that

reg G 6= . If b G \ reg G, then there exists an f C G; [0, 1] which has
the property that
lim uf (x) 6= f (b).
xb
xG

Proof: Given b, use Lemma 11.1.21 to find an r (0, ) so that




lim Wx(N ) ( G )
/ B(b, r) & G < > 0,
xb
xG

and construct f so that f 1 on G B(b, r){ and f (b) = 0. Then f (b) <
limxb uf (x). 
xG

I next take a closer look at the conditions under which we can assert the
uniqueness of solutions to the Dirichlet problem. To begin, observe that, by
Corollary 11.1.19, the situation is quite satisfactory when (11.1.20) holds. In
fact, the same line of reasoning which I used there shows that the same conclusion

(N )
holds as soon as one knows that Wx G < is bounded below by a positive
(N )
constant; and therefore, because x G 7 Wx ( G < ) is a bounded
harmonic function which tends to 1 at reg G, Theorem 11.1.17 tells us that


(11.1.23)
inf Wx(N ) G < > 0 = inf Wx(N ) G < = 1.
xG

xG

I will close this discussion of the Dirichlet problem with two results which
reflect the transience of Brownian paths in three and higher dimensions and
their recurrence in one and two dimensions.
Theorem 11.1.24. Assume that N 3, and let G be a nonempty, connected,
open subset of RN . If f Cc (G; R), then uf is the one and only bounded
harmonic function u on G which tends to f at reg G and satisfies
(11.1.25)

lim u(x) = 0.

|x|
xG

Proof: We already know that uf is a bounded harmonic function which tends


to f at reg G, but we must still show that it satisfies (11.1.25). For this purpose,
choose r (0, ) so that f is supported in B(0, r). Then (cf. the last part of
Theorem 10.1.11), because N 3,



uf (x) kf ku Wx(N ) r < 0 as |x| .
To prove that uf is the only such function u, select bounded open sets Gn % G
with Gn G, and note that, for each T (0, ),
h
i
(N )
u(x) = lim EWx u (T Gn )
n
h
i
h
i


(N )
(N )
= EWx f ( G ) , G T + EWx u (T ) , T < G <
h
i

(N )
+ EWx u (T ) , G = .

11.1 Uniqueness Refined

467

Clearly,
(N )

uf (x) = lim EWx


T %

and

(N )

lim EWx

T %

h
i

f ( G ) , G T

i

u (T ) , T < G < = 0.



Finally, because N 3 and, therefore, by Corollary 10.1.12, (T ) as
(N )
T % for Wx -almost every C(RN ), (11.1.25) guarantees that
(N )

lim EWx

T %

h
i

u (T ) , G = = 0,

which completes the proof that u = uf . 


The situation when N {1, 2} is more complicated.
Theorem 11.1.26.
RN ,

If N {1, 2}, then for every non-empty, open set G in


Wx(N ) G < ) = 1 for all x G or Wx(N ) G < = 0 for all x G,
depending on whether reg G 6=  or reg G = . Moreover, if reg G = , then the
only functions u C 2 G; [0, ) satisfying u 0 are constant. In particular,
either reg G = , and there are no non-constant, nonnegative harmonic functions
on G, or reg G 6= , and, for each f Cb (G; R), uf is the unique bounded
harmonic function on G which tends to f at reg G.

(N )
Proof: Suppose that Wx0 G < < 1 for some x0 G, and choose open
sets Gn % G
so that x0 G1 and Gn G for all n Z+ . Given u

C 2 G; [0, ) with u 0, set



Xn (t, ) = 1(t,] Gn () u (t)

for (t, ) [0, ) C(RN ).

(N ) 
(N )
Then Xn (t), Ft , Wx0 is a non-positive,
right-continuous, Wx0 -submartin
gale when Ft = {( ) : [0, t]} . Hence, since

Xn (t, ) % X(t, ) 1(t,] ( G ) u (t)

pointwise as n ,

an application of The Monotone Convergence Theorem allows us to conclude


(N ) 
that X(t), Ft , Wx0
is also a non-positive, continuous, submartingale. In
particular, by Theorem 7.1.10, this means that

)
lim u (t) exists for Wx(N
-almost every { G = }.
0

468

11 Some Classical Potential Theory


(N )

At the same time, by Theorem 10.2.3, we know that, for Wx0 -almost every
C(RN ),
Z

1U (t) dt = for all open U 6= .
0


(N )
Hence, since Wx0 G = > 0, there exists a 0 C(RN ) with the properties
that (0) = x0 , G (0 ) = ,
Z


1U 0 (t) dt = for all open U 6= , and lim u 0 (t) exists,
t

which is possible only if u is constant. In other words, we have now proved that

(N )
when Wx0 ( G < ) < 1 for some x0 G, then the only u C 2 G; [0, )
with u 0 are constant.
Given the preceding paragraph, the rest is easy. Indeed, if reg G = , then
(N )
Theorem 11.1.15 already implies that Wx ( G < ) = 0 for all x G. On the

(N )
other hand, if a reg G but Wx0 G < < 1 for some x0 G, then the
(N )
(N )
preceding paragraph applied to x
Wx ( G < ) says that Wx ( G < )
is constant, which leads to the contradiction
) G
1 > Wx(N
( < ) = xa
lim Wx(N ) ( G < ) = 1. 
0
xG

11.1.4. Harmonic Measure. We now have a rather complete abstract analysis of when the Dirichlet problem can be solved. Indeed, we know that, at least
when f Cc (G; R), one cannot do better than take ones solution to be the
function uf given by (11.1.18). For this reason, I will call

(11.1.27)
G (x, ) Wx(N ) ( G ) , G () <
the harmonic measure for G based at x G of the set BG . Obviously,
Theorem 11.1.15 says that G (x, G \ reg G) = 0, and
Z
uf (x) =
f () G (x, d).
G

This connection between harmonic measure and Wieners measure is due to


Doob,2 and it is the starting point for what, in the hands of G. Hunt,3 became
an isomorphism between potential theory and the theory of Markov processes.
2

Actually, S. Kakutanis 1944 article, Two dimensional Brownian motion and harmonic functions, Proc. Imp. Acad. Tokyo 20, together with his 1949 article, Markoff process and the
Dirichlet problem, Proc. Imp. Acad. Tokyo 21, are generally accepted as the first place in
which a definitive connection between the harmonic functions and Wieners measure was established. However, it was not until with Doobs Semimartingales and subharmonic functions,
T.A.M.S. 77, in 1954 that the connection was completed.
3 In 1957, Hunt published a series of three articles: Markov processes and potentials, parts
I, II, & III, Ill. J. Math. 1 & 2. In these articles, he literally created the modern theory of
Markov processes and established their relationship to potential theory. To see just how far
Hunts ideas can be elaborated, see M. Sharpes General Theory of Markov Processes, Acad.
Press Series in Pure & Appl. Math. 133 (1988).

11.1 Uniqueness Refined

469

Although (11.1.27) provides an intuitively appealing formula for the harmonic


measure G (x, ), it hardly can be considered explicit. Thus, in this subsection
I will write down two important examples in which explicit formulas for the
harmonic measure are readily available. The first example is the one discussed
in Exercise 10.2.22, namely, when G is a half-space. To be precise, if N = 1
and G = (0, ), then, because one-dimensional Wiener paths hit points, it is
clear that (0,) (x, ) is nothing but the point mass 0 for all x (0, ). On
N 1
the other hand, if N 2 and G = RN
(0, ), then we know from
+ R
Exercise 10.2.22 and (3.3.19) that, for y (0, ),

N
R+ (0, y), d =

y
2
N 1 (d),
N 1 y 2 + ||2  N2 R

y (0, ),

N 1
where N 1 is the surface area of SN 1 and I have identified RN
+ with R
and used RN 1 to denote Lebesgue measure on RN 1 . Hence, after a trivial
translation,


N
R+ (x, y), d =

y
2
N 1 (d)
N 1 y 2 + |x |2  N2 R

for

(x, y) RN 1 (0, ).

Moreover, by using further translation plus Wiener rotation invariance (cf. (ii) in
Exercise 4.3.10), one can pass easily from the preceding to an explicit expression
of the harmonic measure for an arbitrary half-space.
In the preceding, we were able to derive an expression giving the harmonic
measure for half-spaces directly from probabilistic considerations. Unfortunately, half-spaces are essentially the only regions for which probabilistic reasoning yields such explicit expressions. Indeed, embarrassing as it is to admit,
it must recognized that, when it comes to explicit expressions, the time-honored
techniques of clever changes of variables followed by separation of variables are
more powerful than anything which comes out of (11.1.27). To wit, I have been
unable to give a truly probabilistic derivation of the classical formula given in
the following.
Theorem 11.1.28 (Poisson Formula). Use SN 1 to denote the surface
measure on the unit sphere SN 1 in RN , and define
(N ) (x, ) =

N 1

1 |x|2
|x |N

for (x, ) B(0, 1) SN 1 .

Then:
B(0,1) (x, d) = (N ) (x, ) SN 1 (d),

for x B(0, 1).

470

11 Some Classical Potential Theory

More generally, if c RN , r (0, ), and SN 1 (c,r) denotes the surface measure


on the sphere SN 1 (c, r) B(c, r), then
B(c,r) (x, d) =

r2 |x c|2
SN 1 (c,r) (d),
N 1 r |x |N
1

x B(c, r).

Equivalently, for each open G in RN , harmonic function u on G, B(c, r) G,


and x B(c, r),
Z

u(x) =
u(c + r) (N ) xc
r , SN 1 (d).
SN 1

In particular, if {un : n 1} is a sequence of harmonic functions on the open


set G and if un u boundedly and pointwise on compact subsets of G, then
u is harmonic on G and un u uniformly on compact subsets. (See Exercise
11.2.22 for another approach.)
Proof: Set B = B(0, 1). Clearly, everything except the final assertion follows
by scaling and translation once we identify (N ) as the density for B . To make
this identification, first check, by direct calculation, that (N ) ( , ) is harmonic
in B for each SN 1 . Hence, in order to complete the proof, all that we have
to do is check that Z
f () (N ) (x, ) SN 1 (d) = f (a)

lim

xa
xB

SN 1

for every f C SN 1 ; R) and a SN 1 . Since, for each > 0, it is clear that


Z
lim
(N ) (x, ) SN 1 (d) = 0,
xa
xB

SN 1 B(a,){

we will be done Zas soon as we show that


(N ) (x, ) SN 1 (d) = 1

for all x B.

SN 1

But, because, for each SN 1 , (N ) ( , ) is harmonic in B and, by (10.2.7),


SN 1 (0,r)
for each r (0, ),
B(0,r) (0, ) =
N 1 rN 1

we have that, for r [0, 1) and SN 1 ,


Z
(N )
1 = N 1 (0, ) =
(N ) (r, ) SN 1 (d)
SN 1

Z
=

(N ) (r, ) SN 1 (d),

SN 1

where, in the final step, I have used the easily verified identity
(N ) (r, ) = (N ) (r, )

2
for all r [0, 1) and (, ) SN 1 .

Thus, by writing x = r, we obtain the desired identity. 


When N = 2, one gets the following dividend from Theorem 11.1.28.

11.1 Uniqueness Refined

471

Corollary 11.1.29. Set D(r) = B(0, r) in R2 for r (0, ). Then

|x|2 r2
r|x|2

S1 (0,r) (d)
2 |x|2 r2 x 2

for each x
/ D(r). In particular, if u Cb R2 \ D(r); R is harmonic on
R2 \ D(r), then
Z
|x|2 r2
|x|2
u(x) =

u(r)S1 (d),
2 S1 |x|2 rx 2
(11.1.30)

\D(r)

(x, d) =

and so
(11.1.31)

1
lim u(x) =
2
|x|

Z
S1

u(r) S1 (d).

Proof: After an easy scaling argument, I may and will assume that r = 1.
Thus, set D = D(1), and
that u Cb R2 \ D; R is harmonic in R2 \

 assume

x
for x D \ {0}. Obviously, v is bounded and
D. Next, set v(x) = u |x|
2
continuous. In addition, by using polar coordinates, one can easily check that v
is harmonic in D \ {0}. In particular, if (0, 1) and G() B \ B(0, ), then
h
i
h
i


(N )
(N )
v(x) = EWx v (1 ) , 1 < + EWx v ( ) , < 1 , x G(),

where the notation is that in Theorem 10.1.11. Hence, because, by that theorem,
(N )
% (a.s., Wx ) as & 0, this leads to
Z
h
i

(N )
1
1 |x|2
Wx
v(x) = E
v (1 ) , 1 < =

u() S1 (d)
2 S1 x 2

for all x D \{0}. Finally, given the preceding, the rest comes down to a simple
matter of bookkeeping. 
As a second application of Poissons formula, I make the following famous observation, which can be viewed as a quantitative version of the Strong Minimum
Principle (cf. Theorem 10.1.6) for harmonic functions.
Corollary 11.1.32 (Harnacks Principle).
(0, ),

rN 2 r |x c| B(c,r)
(c, )
N 1
r + |x c|

For any c RN and r

(11.1.33)

B(c,r)


rN 2 r + |x c| B(c,r)
(c, ).
(x, )
N 1
r |x c|

472

11 Some Classical Potential Theory

for all x B(c, r). Hence, if u is a non-negative, harmonic function on B(c, r),
then


rN 2 r + |x c|
rN 2 r |x c|
(11.1.34)
N 1 u(c).
N 1 u(c) u(x)
r |x c|
r + |x c|

In particular, if G is a connected region in RN and {un : n 1} is a nondecreasing sequence of harmonic functions on G, then either limn u(x) =
for every x G or there is a harmonic function u on G to which {un : n 1}
converges uniformly on compact subsets of G.
Proof: The inequalities in (11.1.33) are immediate consequences of Poissons
formula and the triangle inequality; and, given (11.1.33), the inequalities in
(11.1.34) comes from integrating the inequalities in (11.1.33). Finally, let a
connected, open set G and a nondecreasing sequence {un : n 1} of harmonic functions be given. By replacing un with un u0 if necessary, I may
and will assume that all the un s are nonnegative. Next, for each x G, set
u(x) = limn un (x) [0, ]. Because (11.1.34) holds for each of the un s and
B(c, r) G, the Monotone Convergence Theorem allows us to conclude that
it also holds for u itself. Hence, we know that both {x G : u(x) = } and
{x G : u(x) < } are open subsets of G, and so one of them must be empty.
Finally, assume that u < everywhere on G, and suppose that B(c, 2r) G.
Then, by the right-hand side of (11.1.34), the un s are uniformly bounded on

B c, 3r
2 , and so, by the last part of Theorem 11.1.28, we know that u is harmonic and that un u uniformly on B(c, r). 

Notice that, by taking c = 0 and letting r % in (11.1.34), one gets an


easy derivation of the following general statement, of which we already know a
sharper version (cf. Theorem 11.1.26) when N {1, 2}.
Corollary 11.1.35 (Liouville Theorem). The only nonnegative harmonic
functions on RN are constant.
Exercises for 11.1
Exercise 11.1.36. As a consequence of (11.1.31), note that if u is a bounded
harmonic function in the exterior of a compact subset of R2 , then u has a limit
as |x| . Show (by counterexample) that the analogous result is false in
dimensions greater than two.
Exercise 11.1.37. Once I reduced the problem to that of studying v on D\{0},
the rest of the argument which I used in the proof of (11.1.31) was based on a
general principle. Namely, given an open G, a K G, and a harmonic function
on G \ K, one says that K is a removable singularity for u in G if u admits
a unique harmonic extension to the whole of G.

Exercises for 11.1

473

(ii) Let K RN , and take K () = inf{t > 0 : (t) K} to be the first


positive entrance time of C(RN ) into K. Given an open G K, show
that

(11.1.38)
Wx(N ) K < G = 0 for all x G \ K
if and only if K reg (G \ K) = , and use the locality proved in Lemma 10.2.11
to conclude that (11.1.38) for some G K is equivalent to K reg (G \ K) =
for all G K. In particular, conclude that (11.1.38) holds for some G K
if and only if


(11.1.39)
Wx(N ) t [0, ) (t) K = 0 for all x
/ K.
(iii) Let K RN be given, and assume that (11.1.39) holds. Given G K
and a u C(G; R) which is harmonic on G \ K, show that K is a removable
singularity for u in G.
Hint: Begin by choosing a bounded open set H K so that H G. Next,
set
n
o

1
dist K, H{ ,
n () = inf t > 0 : dist (t), K 2n

and define un on H by
(N )

un (x) = EWx

i

u ( H ) , H < n .

Show that, on the one hand, un u on H \ K, while, on the other hand,


h
i

(N )
lim un (x) = EWx u ( H ) , H <
n

for all x H.
(iii) Let K be a compact subset of RN and a connected G K be given.
Assuming either that N 3 or that reg G 6= , show that (11.1.39) holds if K
is a removable singularity in G for every bounded, harmonic function on G \ K.

(N )
Hint: Consider the function x G \ K 7 Wx K < G [0, 1], and use
the Strong Minimum Principle.
(iv) Let G be a non-empty, open subset of RN , where N 2, and set D =
{(x, x) : x G}, the diagonal in G2 . Given a u C(G2 ; R) which is harmonic
on G \ D, show that u is harmonic on G2 .
Hint: Show that

(2N )
Wx,y
t [0, ) (t) D
Z


Wy(N ) t (0, ) (t) = 1 (t) Wx(N ) (d) = 0


C(RN )

for (x, y) G2 \ D.

474

11 Some Classical Potential Theory

Exercise 11.1.40. For each r (0, ), let S(r) denote the open vertical strip
(r, r) R in R2 . Clearly,


S(r) () = r(1) () inf t 0 : |1 (t)| r ,
and so the harmonic measure for S(r), based at any point in S(r), will be
supported on {(x, y) : x = r and y R}. In particular, if u Cb S(r); R is
bounded and harmonic on S(r), then

(11.1.41)

kuku sup |u(1, y)| |u(1, y)|.


yR

The estimate in (11.1.41) is a primitive version of the PhragmenLindelof maximum principle. To get a sharper version, one has to relax the global boundedness
condition on S(r). To see what can be expected, consider the function


y 
(x + r) 
for z = (x, y) R2 .
cosh
ur (z) sin
2r
2r

Obviously, ur is harmonic everywhere but (11.1.41) fails dramatically. Hence,


even if boundedness is not necessary for (11.1.41), something is: the function
cannot be allowed to grow, as |y| , as fast as ur does. What follows is the
outline of a proof that those harmonic functions which grow strictly slower than
ur do satisfy (11.1.41). More precisely, it will be shown that, for u C S(r); R
which are harmonic on S(r),



|y|
u(x, y) < for some [0, 1)
sup exp
2r
(x,y)S(r)

= u satisfies (11.1.41),
which is the true Phragm
enLindel
of principle
(i)

(i) Given R (0, ), set R () = inf{t 0 : |i (t)| R}, and show that, for

any u C S(r); R which is harmonic on S(r),
h 
i
h 

i

(2)
(2)
(2)
(2) 
(2)
u(z) = EWz u r(1) , r(1) R + EWz u R
, R < r(1)

for z S(r, R) (r, r) (R, R). Conclude that (11.1.41) holds as long as




(2)
lim sup u(x, R) u(x, R) Wz(2) R < r(1) = 0, z S(r).
R |x|1

Thus, the desired conclusion comes down to showing that, for each (r, ),



R
(2)
Wz(2) R < r(1) = 0, z S(r).
(*)
lim exp
R
2

11.2 The Poisson Problem and Green Functions

475

(ii) To prove (*), let (r, ) be given. Show that, for R (0, ) and
z S(r, R),
 
i

h

(2)
(2)
1 R +
(2)
(1)
Wz
R
,

<

sin
u (z) = cosh 2 E
r
R
2




(2)
Wz(2) R < r(1) ,
cos r
cosh R
2
2

and from this get (*).


11.2 The Poisson Problem and Green Functions
Let G be an open subset of RN and f a smooth function on G. The basic problem
which motivates the contents of this section is that of analyzing solutions u to
the Poisson problem
(11.2.1)

1
2 u

= f in G and

lim u(x) = 0 for a reg G.

xa

Notice that, at least when G is bounded, or, more generally, whenever (11.1.20)
holds, there is at most one bounded u C 2 (G; R) which satisfies (11.2.1). Indeed, if there were two, then their difference would be a bounded harmonic function on G satisfying boundary condition 0 at reg G, which, because of (11.1.20)
and Corollary 11.1.19, means that this difference vanishes. Moreover, when
N 3, even if (11.1.20) fails, one can (cf. Theorem 11.1.24) recover uniqueness
by adding to (11.2.1) the condition that
(11.2.2)

lim u(x) = 0.

|x|
xG

In view of the preceding discussion, the problem in Poissons problem is that


of proving that solutions exist. In order to get a feeling for what is involved,
given f Cc (G; R), define
# Z Z
"Z

T
T

(N )
Wx
f (y)pG (t, x, y) y dt
1[0, G ) (t)f (t) dt =
uT (x) = E
1
T

1
T

for T (1, ) and x G. Then, by Corollary 10.3.13,


1
2 uT

Z
=


f (y) pG (T, x, y) pG (T 1 , x, y) dy

and xa
lim uT (x) = 0 for a reg G.
xG

R
Hence, at least when (11.1.20) holds and therefore G pG (T, x, y)f (y) dy 0
as T % , it is reasonable to hope that u = limT uT exists and will be the

476

11 Some Classical Potential Theory

desired solution to (11.2.1). On the other hand, it is neither obvious that the
limit will exist nor, even if it does exist, in what sense either the smoothness
properties or (11.2.2) will survive the limit procedure.
Motivated by these considerations, I now define the Green function to be
the function g G given by
Z
G
(11.2.3)
g (x, y) =
pG (t, x, y) dt, (x, y) G2 .
(0,)

My goal in this section is to show that, in great


generality, g G is the fundamental
R
solution to (11.2.1) in the sense that x
f (y)g G (x, y) dy solves (11.2.1).
G
11.2.1. Green Functions when N 3. The transience of Brownian motion
in RN for N 3 greatly simplifies the analysis of g G there. The basic reason
why is that
Z
Z
|yx|2
N
RN
(N )
N
2
t 2 e 2t dt
g (x, y)
g (t, y x) dt = (2)
0
0

N
2 1
,
=
N
2 2 |y x|N 2

and therefore (cf. part (i) in Exercise 2.1.13)


(11.2.4)

g R (x, y) =

2|y x|2N
,
(N 2)N 1
N

where N 1 is the area of SN 1 . In particular, when N 3, g R (x, ) is smooth


and has bounded derivatives of all orders in RN \ B(x, r) for each r > 0. Next,
by integrating both sides of (10.3.8) with respect to t (0, ), we obtain, for
any G, the Duhamel formula
h N
i
 G
(N )
N
G
(11.2.5)
g G (x, y) = g R (x, y) EWx g R (0+
), y , 0+
< ,
which means that g G (x, ) is bounded on compact subsets of G\{x}. In addition,
for each y G, the second term on the right
R of (11.2.5) is a harmonic function
of x G, and so, for any f Cc (G; R), G f (y)g G ( , y) dy makes sense and
R
N
differs from RN f (y)g R ( , y) dy by a function that is harmonic on G.
Now define
Z
(11.2.6)
GG f (x) =
f (y)g G (x, y) dy for f Cc (G; R) and x G.
G

What we still have to find are conditions under which GG f solves (11.2.1) and
satisfies (11.2.2). From (11.2.5) and Theorem 10.2.14, it is clear that GG f (x)

11.2 The Poisson Problem and Green Functions


N

477
N

tends to 0 as x tends to reg G. In addition, since |GG f | GR |f | and GR |f |


tends to 0 at infinity, GG f satisfies (11.2.2). Hence, the remaining question is
whether 12 GG f = f on G. As an initial step, suppose that GG f Cb2 (G; R),
and note that, for each x G,

Z Z
1 s
G
G
1
(y)p
(t,
x,
y)
dy
dt
G
f
(x)
=
lim
2
s&0 s 0
G
Z

1
G
G
G
p (s, x, y)G f (y) dy G f (x)
= lim
s&0 s
G

Z Z
1 s
G
f (y)p (t, x, y) dy dt = f (x).
= lim
s&0 s 0
G

Thus, what we need to know is whether GG f Cb2 (G; R). By the considerations
N
above, we already know that GG f Cb2 (G; R) if and only if GR f is. Moreover,
N
N
if f Cc2 (G; R), then GR f = GR f for any with kk 2. In addition,
N

GR xi xj f (x) =

N 1

Z
RN

(yi xi )yj f (y)


dy.
|x y|N

Hence, by starting with f s that are in Cc2 (G; R) and applying an obvious approximation argument, we see that GG f Cb2 (G; R) whenever f Cc1 (G; R).1
Theorem 11.2.7. Assume that N 3 and that G is a non-empty, open subset
of RN . Then, for each f Cc1 (G; R), the function GG f in (11.2.6) is the unique
bounded, twice differentiable solution to (11.2.1) which satisfies (11.2.2).
Remark 11.2.8. Notice that the Duhamel formula in (11.2.5) could have been
N
guessed. To be precise, g R is a fundamental solution for 12 in RN in the
N
sense that 12 GR f = f all test functions f Cc1 (RN ; R), and g G is to be a
fundamental solution for 12 in G with 0 boundary data in the sense that it
should be the kernel for the solution operator which solves the Poisson problem in
(11.2.1). Based on these remarks, one should guess that a reasonable approach
N
to the construction of g G would be to correct g R ( , y) for each y G by
N
subtracting off a harmonic function which has g R ( , y) as its boundary value,
and this is, of course, precisely what is being done in (11.2.5).

11.2.2. Green Functions when N {1, 2}. Because (cf. Theorem 10.2.3)
Brownian paths in one and two dimensions spend infinite time in every nonempty open set, the reasoning 11.2.1 is too crude to handle the Poisson problem
1

It turns out that if f is H


older continuous of some order, then GG f will be twice continuously
differentiable and its second derivatives will be H
older continuous of the same order as f . Such
results are called Schauder estimates. See, for example, N.V. Krylovs Lectures on Elliptic and
Parabolic Equations in H
older Spaces, A.M.S. Graduate Studies in Math. 12 (1996).

478

11 Some Classical Potential Theory


N

in these dimensions. In particular, when N {1, 2}, g R will be identically


infinite, and so (11.2.5) does us no good. To overcome this difficulty, I will use a
generalization of (11.2.5). Namely, let H be an open set that contains G. Then,
by the Markov property, it is easy to check that
h
i

(N )
P H (t, x, ) = P G (t, x, ) + EWx P H t G (), ( G ), , G () <
for (t, x) (0, ) G and BG . As a consequence, we have that
h
i

(N )
pH (t, x, y) = pG (t, x, y) + EWx pH t H (), ( G ), y , G () <
for (t, x, y) (0, ) G2 . Hence, after integrating with respect to t (0, ),
one obtains
h
i

(N )
(11.2.9)
g H (x, y) = g G (x, y) + EWx g H t G (), y , G () <
for all (x, y) G2 .
Of course, (11.2.9) is useful only when g H is finite and calculable, and so
we have to find a suitable class of Hs for which it is. Thus, I will start by
calculating g (0,) (x, y) for x, y (0, ). To this end, recall from Exercise
10.3.35 that p(0,) (t, x, y) = g (1) (t, y x) g (1) (t, x + y). Next, check that
Z
Z
2


1
1
a2 3 a2
a2t
21
t 2 e 2t dt = 2 2 |a| 12 = (2) 2 |a|
1 dt =
e
t
2 0
0

for any a R. Thus,


Z

(0,)
g
(x, y) =
g(t, y x) g(t, y + x) dt
Z
Z 0 

(yx)2
1
12
2t
12
2
1 dt (2)
= (2)
e
t


 (y+x)2
1
t 2 e 2t 1 dt

= |x + y| |x y| = 2(x y).
More generally, by translation and reflection, we see that, for any c R,
(11.2.10)

g (c,) (x, y) = 2(x c) (y c)

for x, y (c, )

and that g (,c) (x, y) = g (c,) (x, y).


Now suppose that G 6= R is a non-empty, connected, open subset of R. Then,
either G = (c, ) for some c R or G = (a, b) for some < a < b < .
Since we already know an exact expression in the first of these cases, assume
that G = (a, b). By taking H = (a, ) in (11.2.9), we know that
h
i

(1)
g (a,b) (x, y) = g (a,) (x, y) EWx g (a,) ( (a,b) ), y , (a,b) () < .

11.2 The Poisson Problem and Green Functions

479

(1)

Since Wx ( (a,b) < ) = 1 for all x R and the boundary of (a, b) is regular,
Corollary 11.1.19 together with (11.2.10) say that, as a function of x (a, b),
the second term on the right equals u, where u00 = 0, limx&0 u(x) = 0, and
, and so
limx%b u(x) = 2(y a). Hence, u(x) = 2(xa)(ya)
ba

(11.2.11)

g (a,b) (x, y) =


2
x y a (b x y).
ba

Starting from these, it is an easy matter to check by hand that if G 6= R is any


open interval and f Cc (G; R), GG f is bounded and solves (11.2.1). Moreover,
because (11.1.20) holds, GG f is the only such solution.
When N = 2, matters are significantly more complicated but much more
interesting. I will begin by considering the R2 analog of (0, ), which is the
upper half-space R2+ = {(x1 , x2 ) : x2 > 0}. It should be clear that, for x =
(x1 , x2 ) and y = (y1 , y2 ),


|
yx|2
|yx|2
1
2t
2t
R2+
(1)
(0,)
,
e
e
p (t, x, y) = g (t, y1 x1 )p
(t, y1 , y2 ) =
2t

= (y1 , y2 ). Therefore,
where y
Z
2
2
pR+ (t, x, y) dt
(0,)

Z
= lim

T %

1
t





|
y x|2
|y x|2
dt
exp
exp
2t
2t

2
|yx|
Z

= lim

T %

1 1
e 2tT dt,
t

|
yx|2
2

which means that g R+ (x, y) = 1 log


2

|yx|
|
yx| .

h
(2)

g G (x, y) = g R+ (x, y) EWx


if G R2+ . Furthermore, because x
from the preceding to
(11.2.12) g G (x, y) =

Hence, by (11.2.9), we know that


i
 G
2
G
g R+ (0+
), y , 0+
<

log |
y x| is harmonic in G, one can pass

h
i
(2)
1
1
G
G
)|, 0+
() < ,
log |y x| + EWx log |y (0+

first for G R2+ and then, after translation and rotation, for G contained in
any half-space. In addition, by the same argument as I used in 11.2.1, one can
use (11.2.12) to check that if G is contained in a half-space, then GG f solves
Poissons problem for every f Cc (G; R).
To handle regions that are not contained in a half-space, one needs to work
harder.

480

11 Some Classical Potential Theory


If G is an open subset of R2 for which reg G 6= , then, for

Lemma 11.2.13.
each K G,

g G (x, y) dx = sup

sup
yG

xG

and

(2)

sup
(x,y)K 2

EWx

g G (x, y) dy <

h
i

log |y ( G )| , G () < < .

In addition, for each c G and r > 0,


h
i
(2)
(x, y)
ur (x, y) EWx log |y ( G )|, G () < B(c,r) ()
2
is harmonic on G B(c, r) , and, as r , {ur : r > 0} tends uniformly on
compact subsets of G2 to the function
h
i
(2)
(x, y) G2 7 u(x, y) EWx log |y ( G )|, G () < R.
In particular, u is harmonic on G2 .
Proof: Since g G is symmetric, the first equality is obvious. While proving the
associated finiteness assertion, I may and will assume that G is connected. In
addition, it suffices for me to prove
"Z G
#
()

(2)
sup EWx
1B(c,r) (t) dt <
xG

for all c G and r > 0 with B(c, 2r) G. Given such a ball, set B = B(c, r)
and 2B = B(c, 2r), and define {n : n 0} inductively by 0 = 0 and, for n 1,
/ 2B}.
2n1 = inf{t 2(n1) : (t) B} and 2n = inf{t 2n1 : (t)

(2)
G
If u(x) = Wx 1 < , then u is a [0, 1]-valued, harmonic function on G \ B
that tends to 0 as x tends to reg G and to 1 as x tends to B. Thus, since
reg G 6= , the Minimum Principle says that u(x) (0, 1) for all x G \ B. In
particular, this means that max{u(x) : |x c| = 2r} (0, 1). At the same
time, by the Markov property,




(2) 
Wx(2) 2n+1 < G = EWx u (2n ) , 2n () < G () Wx(2) 2n1 < G ,


(2) 
(2)
and so Wx 2n1 < G n1 for n Z+ . Hence, if f (y) = EWy 2B ,
then
"Z G
#
"Z
#

2n
X


(2)
(2)
Wx
Wx
G
E
1B (t) dt =
E
1B (t) dt, 2n1 () <
0

X
n=1

n=1
(2)

EWx

2n1




kf ku
.
f (2n1 ) , 2n1 () < G ()
1

11.2 The Poisson Problem and Green Functions

481

Since, by Theorem 10.1.11, f is bounded, this completes the proof.


Turning to the second part, begin by observing that, for each r > 0 and
x G(r) G B(c, r), ur (x, ) is a harmonic function on G(r). Next, given
y G(r), define f on G(r) so that f () = log |y | or 0 according to whether
is or is not an element of G(r) \ B(c, r). Then

(2) 
ur (x, y) = EWx f ( G(r) ), G(r) () < ,
and so ur ( , y) is also harmonic on G(r). Hence, since ur is locally bounded
on G(r)2 , Exercise 10.2.16 applies and says that ur is harmonic on G(r)2 . To
complete the proof, let B be an open ball whose closure is contained in G, set
G{), and choose R > 0 so that B
G(R). Then, for each r > R,
D = dist(B,

vr (x, y) ur (x, y) log DWx(2) G < B(c,r)


(2)
|y ( G )| G
, () < B(c,r) ()
= EWx log
D

is a non-negative, harmonic function on B 2 , and, for each (x, y) B 2 , vr (x, y) is


non-decreasing as a function of r > R. Thus, by Harnacks Principle (cf. Corollary 11.1.32), either limr vr = on B 2 or vr tends uniformly on compact
subsets of B 2 to a harmonic function v. Since





lim sup Wx(2) G < B(c,r) Wx(2) ( G < ) = 0,
r xB

it is clear that the latter case implies that


h
i

(2)
sup EWx log |y G ()| , G () <
(x,y)K 2

lim

sup

r (x,y)K 2

vr (x, y) + | log D| <

for K B and that ur tends to u uniformly on compact subsets of B. Hence,


all that remains is for me to rule out the possibility that limr vr = on B.
Equivalently, I must show that limr ur (x, y) < for some (x, y) B 2 .
For this purpose, note that, because G(r) is bounded, and therefore contained
in some half-space, (11.2.12) applies and says that
h
i

(2)
g G(r) (x, y) + log |y x| = EWx log y ( G(r) ) , G(r) () <
h
i

(2)
= ur (x, y) + EWx log y ( B(c,r) ) , B(c,r) () < G () < .

Hence, for sufficiently large rs and all (x, y) B 2 ,


ur (x, y) g G(r) (x, y) +

1
1
log |y x| g G (x, y) + log |y x|,

which, by the first part of this lemma, means that limr ur cannot be infinite
everywhere on B 2 . 

482

11 Some Classical Potential Theory

Theorem 11.2.14.
Let G be a non-empty, open subset of R2 for which
reg G 6= . Then, (11.1.20) holds,
(2)

sup EWx
x,yK

h
i



log y ( G ) , G < < for K G,

and

(2)

(x, y) G2 7 EWx

h
i

log y ( G ) , G < R

is a harmonic function. In addition, for each c G, the limit


log r (2)  B(c,r)
Wx
G ,
r

hG (x) lim

(11.2.15)

x G,

exists, is uniform with respect to x in compact subsets of G and independent of


c G, and determines a harmonic function of x G. Finally,
(11.2.16) g G (x, y) =

i
h

(2)
1
1
log |yx|+ EWx log y( G ) , G < +hG (x)

for all distinct xs and ys from G, and so either hG 0 or G is unbounded and


(11.2.17)

g G ( , y) hG

uniformly on compacts as |y| through G.

Proof: Note that, because N = 2, Theorem 11.1.26 guarantees that (11.1.20)


follows from reg G 6= , and the rest of the initial assertion is covered by Lemma
11.2.13.
To prove the remaining assertions, let c G be given, set G(r) = G B(c, r),
and set gr (x, y) = g G(r) (x, y) for (x, y) G(r)2 . By (11.2.12),
gr (x, y) =

h
i
(2)
1
1
log |y x| + EWx log |y ( G(r) )|, G(r) () < .

In particular, for each (x, y) G(r)2 , gr ( , y) is harmonic on G(r) \ {y} and


gr (x, ) is harmonic on G(r) \ {x}. Hence, by Exercise 10.2.16, gr is a non\2 {(x, y) G(r)2 : x 6= y}. At the same
negative, harmonic function on G(r)
time, because pG(r) (t, x, y) is non-decreasing in r for each (t, x, y) (0, )
G(r)2 , we know that gr is non-decreasing in r. Hence, by Harnacks Principle
c2 {(x, y)
(cf. Corollary 11.1.32), either limr% gr is everywhere infinite on G
2
c
2
G : x 6= y} or gr converges uniformly on compact subsets of G to a harmonic
function. Because
Z
Z
g G (x, y) =
pG (t, x, y) dt = lim
pG(r) (t, x, y) dt = lim gr (x, y),
(0,)

r%

(0,)

r%

11.2 The Poisson Problem and Green Functions

483

we conclude from the first part of Lemma 11.2.13 that only the second alternative
c2 and that
is possible. Thus, we now know that g G is harmonic on G
(*)

gr (x, y) % g G (x, y)

c2 .
uniformly on compact subsets of G

To go further, first notice that the expression in (11.2.12) for gr can be rewritten as
(**)

gr (x, y) = log |y x| + ur (x, y)


h
i

(2)
+ EWx log y ( B(c,r) ) , B(c,r) () G () < ,

where
(2)

ur (x, y) = EWx

i
h

log y ( G ) , G () < B(c,r) ()

for (x, y) G(r)2 .

By the second part of Lemma 11.2.13, we know that each ur is harmonic on


G(r)2 and that, as r , {ur : r > 0} tends uniformly on compact subsets of
G2 to the harmonic function
h
i
(2)
(x, y)
u(x, y) EWx log |y ( G )|, G () < .
Moreover, by combining this with (*) and (**), we also know that the third term
c2 to a harmonic
on the right of (**) converges uniformly on compact subsets of G
c2 . At the same time, as r ,
function on G
h
i


(2)
EWx log y ( B(c,r) ) , B(c,r) () G () <


log rWx(2) B(c,r) G () <
#
"

!
y ( B(c,r) )
(2)
, B(c,r) G () < 0
= EWx log
r

uniformly for (x, y) in compact subsets of G2 . Thus, the asserted limit in


(11.2.15) exists, the function hG is harmonic on G, and (11.2.16) holds.
Finally, to complete the proof, note that if G is bounded, then (11.2.12) holds
and therefore hG must be identically 0. Now, assume that G is unbounded. To
prove (11.2.17), use (11.2.16) to write
#
"

!
y ( G )
1 Wx(2)
G
G
G
, () < ,
log
h (x) = g (x, y) + E
|y x|

and apply Lebesgues Dominated Convergence Theorem together with the integrability estimate in the second part of Lemma 11.2.13 to see that, as |y|
through G, the second term tends to 0 uniformly for x in compact subsets of
G. 

484

11 Some Classical Potential Theory

Remark 11.2.18. The appearance of the extra term hG in (11.2.16) is, of


course, a reflection of the fact that, for unbounded regions in R2 , we do not
know a priori which harmonic function (cf. Remark 11.2.8) should be used to
correct 1 log |yx|. When N 3, the obvious choice was the one that behaved
N
the same way at as g R itself (i.e., the one that tends to 0 at ). Actually,
as (11.2.17) makes explicit, the same principle applies to the case when N = 2,
G
although now 0 may not be that limiting behavior. To
 see that, in general, h
is not identically 0, consider the open disk D(R) = x : |x| < R , and take
G = R2 \ D(R). Then it is an easy matter to check that, for R < |x| < r,

 log |x|
R
.
Wx(2) D(r) < G =
log Rr

Hence, by (11.2.15), we see that


2

hR

\D(R)

(x) =

|x|
1
,
log
R

x
/ D(R).

As we are about to see, for Gs whose complements are compact, the conclusion
drawn about hG at the end of Remark 11.2.18 is typical, at least as |x| .
Corollary 11.2.19. Let everything be as in Theorem 11.2.14, and assume
that K R2 \ G is compact. Then, for each R (0, ) with the property that
K D(R), one has that
Z
|x|2 R2
|x|2
|x|
1
G
=
hG (x) log

h (R) S1 (d)
2 S1 |x|2 Rx 2
R

Z
1
hG (R) S1 (d)

2 S1

as |x| .
Proof: Define : C(RN ) [0, ] to be the first entrance time into D(R),
and note (cf. the preceding discussion) that, for each r > R and R < |x| < r,

Wx(2) D(r) < G
h
i


(2)
(2)
= Wx(2) D(r) < + EWx W() D(r) < G , < D(r)

h
i

(2)
log |x|
(2)
Wx
R
W() D(r) < G , < D(r) .
r +E
log R

Hence, after multiplying the preceding through by log r , using (11.2.15), and
letting r , we arrive at
h
i

(2)
1
|x|
1
+ EWx hG () , < , x R2 \ D(R),
hG (x) = log

11.2 The Poisson Problem and Green Functions

485

which certainly implies that


x R2 7 hG (x)

|x|
1
log
R

is a bounded function that is harmonic off of D(R). Thus, the desired result
now follows from the first part of Theorem 11.1.29. 
Notice that, as a by-product, one knows that the number
Z
1
1
hG (R) S1 () log R

2 S1

does not depend on R as long as G{ B(0, R). This number plays an important role in classical two-dimensional potential theory, where it is known as
Robins constant for G.
Corollary 11.2.20. Again let everything be as in Theorem 11.2.14. Then,
for each K G and r > 0,
n
o
sup g G (x, y) : |x y| r and y K <
and
lim sup g G (x, y) = 0 for each a reg G.

xa
xG yK

Moreover, for each f Cc1 (G; R), GG f is the unique bounded solution to
(11.2.1).
Proof: To prove the initial statements, let c G and r > 0 satisfying B(c, 2r)
G be given, set B = B(c, r), and define
the first entrance time () of

B
.
By
the Markov property, we see that,
0
:
(t)

into B by () = inf t

for any f Cc B; [0, ) ,
"Z G
#
Z


(2)
g G (x, y)f (y) dy = EWx
f (t) dt, < G
G

(2)

Wx

=E

Z
g

(), y f (y) dy, <


.

Hence, if x
/ 2B B(c, 2r) and therefore g G (x, )  B is continuous, we find
that
h
i
(2)
g G (x, y) = EWx g G (), y), < G
for all y B.

But, because g G  (2B) B is bounded, we now see that

(*)
sup g G (x, y) CWx(2) < G , x
/ 2B,
yB

486

11 Some Classical Potential Theory

for some C (0, ). In particular, this, combined with the obvious Heine
Borel argument, proves the first estimate. In addition, if a reg G, then, for
each > 0,



lim Wx(2) G >
lim Wx(2) + xa
lim Wx(2) < G xa
xa
xG

xG

xG

lim Wx(2)
xa
xG

Thus, since the last expression obviously tends to 0 as & 0, this, together with
(*), implies that
lim sup g G (x, y) = 0,
xa
xG yB

which (again after the obvious HeineBorel argument) means that we have also
proved the second assertion.
Turning to the last part of the statement, let f Cc1 (G, R) be given. By the
preceding, we know that GG f is bounded and tends to 0 at reg G. In addition,
using Theorem 11.2.14, especially (11.2.16), and arguing as I did in the case
when N 3, it is easy to check that GG f C 2 (G; R) and 12 GG = f . Thus,
GG f is a bounded solution to (11.2.1), and, because (11.1.20) holds, it can be
the only such solution. 

Exercises for 11.2


Exercises 11.2.21. Give an explicit expression for the Green function g B(c,R)
when N 2. To this end, first use translation and scaling to see that


xc yc
,
g B(c,R) (x, y) = R2N g B(0,1)
R
R

for distinct x, y from B(c, R). Thus, assume that c = 0 and R = 1. Next,
observe that



y
|x y| = |y|x |y|
for x SN 1 and y BRN (0, 1) \ {0},

and use this observation together with (11.2.12) and (11.2.5) to conclude that


(

y

|y|x
log
1
1
if y 6= 0

B(0,1)
|y|
g
(x, y) = log |y x| +
0

if y = 0

when N = 2 and
N

g B(0,1) (x, y) = g R (x, y)

N

y
gR

|y|x
if y 6= 0
|y|

when N 3.

2
(N 2)N 1

if y = 0

11.3 Excessive Functions, Potentials, and Riesz Decompositions

487

Exercise 11.2.22. The derivation that I gave of Poissons formula (cf. Theorem 11.1.28) required me to already know the answer and simply verify that it is
correct. Here I outline another approach, which is the basis for a quite general
procedure. To begin with, recall the classical Greens Identity
Z 
Z


u
v
dG
v n
uv vu dx =
u n
G

for bounded, smooth regions G in R and functions u and v that are smooth
in a neighborhood of G. (In the preceding, w
n (x) is used to denote the normal
derivative w(x), n(x) RN , where n(x) is the outer unit normal at x G
and G is the standard surface measure for G.) Next, let c be an element of
B(0, 1), suppose r > 0 satisfies B(c, r) B(0, 1), and let u be a function that
is harmonic in a neighborhood of BRN (0, 1). By applying Greens Identity with
G = BRN (0, 1) \ B(c, r) and v = 12 g B(0,1) (c, ), use Exercise 11.2.21 to verify
Z

N 1
u(c) = lim r
, v(c + r) RN u c + r) SN 1 (d)
r&0
SN 1
Z
Z

=
, v() RN u ) SN 1 (d) =
u ) (N ) (c, ) SN 1 (d),
SN 1

SN 1

where (N ) is the Poisson kernel given in Theorem 11.1.28. Finally, given f


C(G; R), extend f to BRN (0, 1){ so that it is constant along rays, take
(N )

uR (x) = EWx




f ( B(0,R) ) , B(0,R) < for R 1 and x B(0, R),

check that, as R & 1, uR u1 uniformly on B(0, 1), and use the preceding to
conclude that
Z
u1 (c) =
f () (N ) (c, ) SN 1 (d),
SN 1

which is, of course, the result that was proved in Theorem 11.1.28.
11.3 Excessive Functions, Potentials, and Riesz Decompositions
The origin of the Green function lies in the theory of electricity and magnetism.
Namely, if G is a region in RN whose boundary is grounded and y G, then
g G ( , y) should be the electrical potential in G that results from placing a unit
point charge at y. More generally, if is any distribution of charge in G (i.e.,
a non-negative, locally finite, Borel measure on G), then one can consider the
potential GG given by
Z
(11.3.1)
GG (x) =
g G (x, y) (dy), x G,
G

where I have implicitly assumed that either N 3 or (11.1.20) holds. In this


section I will characterize functions that arise in this way (i.e., are potentials).

488

11 Some Classical Potential Theory

11.3.1. Excessive Functions. Throughout this subsection, G will be a nonempty, connected, open region in RN , and I will be assuming either that N 3
or that (11.1.20) holds. Thus, by the results obtained in 8.2.1 and 8.2.2, the
Green function (cf. (11.2.3)) g G satisfies (depending on whether N = 1, N = 2,
or N 3) either (11.2.10), (11.2.11), (11.2.16), or (11.2.5), and, in order to have
g G defined everywhere on G2 , I will take g G (x, x) = , x G, when N 2.
I will say that u is an excessive function on G and will write u E(G) if
u is a lower semicontinuous, [0, ]-valued function that satisfies the super mean
value property:
u(x)

N 1

SN 1

u(x + r) SN 1 (d)
whenever BRN (x, r) G.

As the next lemma shows, there are lots of excessive functions.


Lemma 11.3.2. E(G) is closed under non-negative linear combinations and
non-decreasing limits,
and u, v E(G) = u v E(G). Moreover, if

u C 2 G; [0, ) , then u E(G) u 0. Finally, for each non-negative,
locally finite, Borel measure on G and each non-negative harmonic function h
on G, GG + h is an excessive function on G.
Proof: The initial
 assertions are obvious. To prove the next part, suppose that
u C 2 G; [0, ) is given. If u E(G), then
1
2 u(x)

= lim

r&0

N 1


u(x + r) u(x) S N 1 (d) 0

SN 1

for each x G. Conversely, if u 0 and B(x, r) G, then


(N )
Wx

u(x) = E




(N )
u ( B(x,r) ) , B(x,r) < EWx

"Z

B(x,r)

N 1

#
1
2 u

( ) d

Z
SN 1

u(x + r) SN 1 (d).

Clearly the third assertion comes down to showing that GG is excessive.


Moreover, by Fatous Lemma and Tonellis Theorem, we will know that GG is
excessive as soon as we show
that, for each y G, g G ( , y) is excessive. To this

1
end, set fn = pG n , , y and (cf. (11.2.6)) un = GG fn . Because

pG (t, , y) dt % un
1
n

as T ,

11.3 Excessive Functions, Potentials, and Riesz Decompositions

489

un is lower semicontinuous. In addition, by the Markov property and rotation


invariance, B(x, r) G implies
"Z G
#

h
i


(N )
(N )
Wx
un (x) E
fn (t) dt = EWx un (r ) , r <
r

N 1

Z
un (x + rx) SN 1 (dx),
SN 1

where I have introduced the notation


n
o

(11.3.3)
r () = inf t : (t) (0) r
and used the rotation invariance of Brownian motion. Hence, each un is excessive, and therefore, since
Z
pG (t, x, y) dt % g G (x, y) as n ,
un (x) =
1
n

we are done. 
11.3.2. Potentials and Riesz Decomposition. My next goal is to prove
that, apart from the trivial case when u , every excessive function on G
admits a unique representation in the form GG + h for an appropriate choice
of and h. The proof requires me to make some preparations.
Lemma 11.3.4. If u E(G), then either u or u is locally integrable on G.
Next, given a u E(G) that is not identically infinite, there exists a sequence
{un : n 1} Cc (G; R) and a non-decreasing sequence {Gn : n 1} of
open subsets of G with the properties that Gn G, Gn % G, un u,
un 0 on Gn for each n 1, and un u pointwise as n . Moreover,
if n (dy) = 12 1Gn (y)un (y) dy, then there is a non-negative, locally finite,
Borel measure on G such that
Z
Z
(11.3.5)
lim
dn =
d for all Cc (G; R).
n

In fact, is uniquely determined by the fact that = 12 u in the sense that


Z
Z
1
d for all Cc (G; R).
(11.3.6).
2 (y)u(y) dy =
G

Proof: To prove the first assertion, let U denote the set of all x G with the
property that
Z
u(y) dy < for some r > 0 with B(x, r) G.
B(x,r)

490

11 Some Classical Potential Theory

Obviously, U is an open subset of G. At the same time, if x G \ U and r > 0


is chosen so that BRN (x, 2r) G, then, for each y B(x, r) and s (0, r),
Z
1
u(y + s) SN 1 (d),
u(y)
N 1 SN 1

and so, after integrating this with respect to N sN 1 ds over (0, r), we get
Z
Z
1
1
u(z) dz = ,
u(z) dz
u(y)
N 1 rN B(x,)
N 1 rN B(y,r)

where r |y x|. Hence, we now see that G \ U is also open, and therefore
that either U = G or U = and u .
Now assume that u E(G) is not identically infinite. To construct the required
Gn s and un s, choose a reference point c G, set R = 12 |c G{|, and take

Cc B(0, R4 ); [0, ) to be a rotationally invariant function with total integral
1. Next, for each n Z+ , set


and
Gn = x G B(c, n) : |x G{| > R
n
Z
(11.3.7)
un (x) =
n (x y)u(y) dy, x RN ,
G4n


where n () = nN (n). Clearly, {un : n 1} Cc G; [0, ) . In addition, if
x Gn , then, by taking advantage of the rotation invariance of , one can check
that

Z
Z

N 1
t
t
(t)
u x + n SN 1 (d) dt
un (x) =
(0, R
4 )

SN 1

tN 1 (t) dt = u(x),

u(x) N 1
(0, R
4 )


where : R [0, ) is taken so that (x) = |x| . Similarly, if B(x, r)
Gn , then
Z
un (x + r) SN 1 (d)
SN 1

Z

(z)

=
B(0, R
4 )

Z
N 1
B(0, R
4 )

u x+
SN 1

1
nz

+ r SN 1 (d)


dz


(z)u x + n1 z dz = N 1 un (x).

Hence, un  Gn is a smooth element of E(Gn ), and therefore, by the second part


of Lemma 11.3.2, we know that un 0 on Gn . To see that un u pointwise,

11.3 Excessive Functions, Potentials, and Riesz Decompositions

491

observe that we already know that u(x) limn un (x). On the other hand,
because u is lower semicontinuous, an application of Fatous Lemma yields
Z

(y) u x + n1 y dy = lim un (x).
u(x) lim
n

To complete the proof, let n be the measure described, and note that
#
"Z
t Gn
h
i


(N )
(N )
1
un (x) = EWx un (t Gn ) EWx
2 un (s) ds
0

(N )

Wx

"Z

t Gn

#
1
2 un


(s) ds =

Z t Z
p
0

Gn


(s, x, y) n (dy)

ds

Gn

for all n Z+ and (t, x) (0, ) Gn . Hence, after letting t % , we see that
Z
u(x) un (x)
g Gn (x, y) n (dy), n Z+ and x Gn .
Gn

In particular, because u(x) < for Lebesgue-almost every x G, this proves


that, for each K G, supnZ+ n (K) < , and therefore (cf. part (iv) of
Exercise 9.1.16 and apply a diagonalization procedure) {n : n 1} is relatively compact in the sense that every subsequence {nm : m 1} admits a
subsequence {nmk : k 1} and a locally finite, non-negative, Borel measure
on G with the property that
Z
Z
lim
dnmk =
d for all Cc (G; R).
k

At the same time, using integration by parts followed by Lebesgues Dominated


Theorem, we see that
Z
Z
Z
1
1
Cc2 (G; R),

u
dx
=

lim
dn = lim
n
2 u dx,
2
n

and therefore any limit of {n : n 1} must satisfy (11.3.6), which proves


not only that there is such a but also that (11.3.5) is satisfied. 
Lemma 11.3.8. For any lower semicontinuous u : G [0, ], u E(G) if
and only if
h
i
h
i


(N )
(N )
(11.3.9) EWx u ( ) , () < G () EWx u () , () < G ()


for every pair and of Bt : t [0, ) -stopping times with . In
particular, if u E(G)
 and B(x, r) G, then, for any rotationally symmetric
Cc B(0, r); [0, ) with total integral 1,
Z
t (0, 1) 7
(y) u(x + ty) dy [0, ]
B(0,r)

is a non-increasing function.

492

11 Some Classical Potential Theory

Proof: Let u E(G) be given. Clearly (11.3.9) is trivial in the case when
u . Thus, assume that u 6 , and define Gn and un for n Z+ as in
(11.3.7). Because un  Gn 0, we know that
h
i

(N )
EWx un ( Gm T ) , () T < Gm ()
h
i

(N )
EWx un ( T ) , () T < Gm ()
for all 1 m n, x Gm , and T [0, ). Next, after noting that Gm <
(N )
Wx -almost surely, let T % in the preceding, and arrive at
h
i
h
i


(N )
(N )
EWx un ( Gm ) , () < Gm () EWx un () , () < Gm () .
But, because and u un 0, this means that
h
i
h
i


(N )
(N )
EWx un ( ) , () < Gm () EWx u () , () < Gm () ,
which, because 0 un u pointwise, leads, via Fatous Lemma, first to
h
i
h
i


(N )
(N )
EWx u ( ) , () < Gm () EWx u () , () < Gm ()
and thence, by the Monotone Convergence Theorem, to (11.3.9) when m .
From here, the rest is easy. Given a lower semicontinuous u : G [0, ]
and B(x, r) G, we have (cf. (11.3.3))
Z
h
i

(N )
1
u(x + r) SN 1 (d) = EWx u (r ) , r () < G () .
N 1 SN 1

Thus, if, in addition, (11.3.9) holds, then


Z
1
u(x + tr) SN 1 (d) [0, ]
t [0, 1] 7
N 1 SN 1

is non-increasing; and, therefore, not only is u excessive but also (after passing
to polar coordinates and integrating) one finds that the monotonicity described
in the final assertion is true. 
Theorem 11.3.10 (Riesz Decomposition). Let G be a non-empty, connected open subset of RN , and assume either that N 3 or that (11.1.20)
holds. If u E(G) is not identically infinite, then there exists a unique locally finite, non-negative Borel measure and a unique non-negative harmonic function
h on G with the property that
(11.3.11)

u(x) = GG (x) + h(x) for all x G.

In fact, is uniquely determined by (11.3.6), and h is the unique harmonic


function on G that is dominated by u and has the property that h w for every
non-negative harmonic w that is dominated by u. (Cf. Exercise 11.3.14 as well.)

11.3 Excessive Functions, Potentials, and Riesz Decompositions

493

Proof: Take Gn and un as in (11.3.7), and define n accordingly, as in Lemma


11.3.4. Then, for each 1 m n, Lemma 11.3.4 and the final part of Lemma
11.3.8 say that um un % u pointwise on Gm . In addition, for m n and
x Gm ,
Z

g Gm (x, y) n (dy) + wm,n (x),

un (x) =
Gm

(N )

where wm,n = EWx




un ( Gm ) , Gm < .

Hence, by the Monotone Convergence Theorem, for any locally finite, nonnegative, Borel measure on G,
Z
ZZ
Z
Gm
(*)
u(x) (dx) = lim
g (x, y) (dx)dn (y) +
wm (x) (dx),
Gm

Gm

G2m



(N ) 
where wm (x) = EWx u ( Gm ) , Gm < .
Notice (cf. Harnacks Principle) that, as the non-decreasing limit of nonnegative harmonic functions {wm,n : n m}, wm is either identically infinite
or is itself a non-negative harmonic function on G; and so, since u(x) <
Lebesgue-almost everywhere, (*) shows that the latter must be the case. Now
let a be a fixed element of Gm , take n as in (11.3.7), and, for n m, define
(R
(x a)g Gm (x, y) dx if y Gm
Gm n
n (y) =
0
otherwise.
By taking (dx) = 1Gm (x)n (x a) dx in (*), we see that, for n m,
Z
Z
n (x a) u(x) dx = lim
n (y) k (dy)
k G
Gm
Z
+
n (x a) wm (x) dx.
Gm

But, since Gm is the intersection of two sets, both of which (cf. part (iv) in
Exercise 10.2.19) are regular, and is therefore regular as well, there is an n(a)
m for which n is continuous whenever n n(a). In particular, by (11.3.5), we
can now say that
Z
Z
Z
n (x a) u(x) dx =
n (x) (dx) +
n (x a) wm (x) dx
Gm

Gm

for all n n(a). In addition, as n , the reasoning with which we showed


the un u in Lemma 11.3.4 shows that the term on the left tends to u(a). At

494

11 Some Classical Potential Theory

the same
 time, it is clear that the second term on the right goes to wm (a) and
that n (y) : n n(a) tends non-decreasingly to g Gm (a, y). Thus, we have
now proved that
(**)

u = GGm + wm

on Gm for every m Z+ .

Starting from (**), the rest of the proof is quite easy. Namely, fix x G,
choose m so that x Gm , note that, g Gn (x, ) is non-decreasing as n m
increases, and conclude that GGnm (x) % GG (x). Hence, by (**) (alternatively, by (11.3.9)), we know that wmn (x) tends non-increasingly to a limit
h(x), which Harnacks Principle guarantees to be harmonic as a function of
x G. Thus, after passing to the limit as m in (**), we conclude that
(11.3.11) holds with the satisfying (11.3.6) and h = limm H Gm u.
To prove that these quantities are unique, note that if is any locally finite,
non-negative, Borel measure on G for which u GG is a non-negative harmonic
function, then, for every Cc (G; R), simple integration by parts plus the
symmetry of g G shows that
Z
Z
Z
G
1
1
G d =
d.
u dx = 2
2
G

That is, must satisfy (11.3.6); and so we have now derived the required uniqueness result.
Finally, to check the asserted characterization of h, suppose that v is a nonnegative harmonic function that is dominated by u on G. We then have


(N ) 
v(x) = EWx v ( Gm ) , Gm () < wm (x) for m Z+ and x Gm ,
and therefore the desired conclusion follows from the fact that wm tends to h. 
By combining Lemma 11.3.2 with Theorem 11.3.10, we arrive at the following
characterization of potentials.
Corollary 11.3.12. Let everything be as in Theorem 11.3.10, and suppose
that u : G [0, ] is not identically infinite. Then a necessary and sufficient
condition for u to be the potential GG of some locally finite, non-negative,
Borel measure on G is that u be excessive on G and have the property that
the constant function 0 is the only non-negative harmonic function on G that is
dominated by u.
Let u be an excessive function on G that is not identically infinite. In keeping
with the electrostatic metaphor, I will call the measure entering the Riesz decomposition (11.3.11) of u the charge determined by u. A more mathematical
interpretation is provided by Schwartzs theory of distributions. Namely, when
u E(G) is not identically infinite, it is (cf. Lemma 11.3.4) locally integrable on
G, and, as such, it determines a distribution there. Moreover, in the language
of distribution theory, (11.3.6) says that = 12 u. However, the following
theorem provides a better way of thinking about .

11.3 Excessive Functions, Potentials, and Riesz Decompositions

495

Theorem 11.3.13. Let G be as in Theorem 11.3.10 and u : G [0, ] a


lower semicontinuous function. Then u E(G) if and only if
Z
u(x) us (x)

u(y)pG (s, x, y) dy

for all (s, x) (0, ) G.

Moreover, if u E(G) is not identically infinite and, for s (0, ), s (dx) =


s (x)
, then, as s & 0, {s : s > 0} tends to the
fs (x) dx, where fs (x) = u(x)u
s
charge of u in the sense that

Z
(x) (dx) = lim

(x) s (dx)

s&0

for all Cc (G; R).

Proof: If u E(G), then, by the first part of Lemma 11.3.8 with = s and
= 0, one sees that u us . Conversely, suppose that u : G [0, ] is lower
semicontinuous, not identically infinite, and satisfies u us for all s > 0. Then,
since pG (s, x, ) > 0, u is locally integrable on G. Thus, if B(c, r) G and

u(y)pB(c,r) (s, x, y) dy,

ws (x) =
B(c,r)

then ws is bounded on B(c, r) and therefore, because pB(c,r) is smooth on


(0, ) B(c, r)2 and satisfies the ChapmanKolmogorov equation, it follows
that ws is smooth on B(c, r). In addition, because pB(c,r) pG and ut u,
another application of the ChapmanKolmogorov equation leads to
Z

u(y)pB(c,r) (s + t, x, y) dy

ws+t (x) =
B(c,r)

pB(c,r) (s, x, y)ut (y) dy ws (x)

B(c,r)


for (s, t) (0, )2 and x B(c, r). Hence, if Cc2 B(c, r); [0, ) , then
Z
B(c,r)

1
t&0 s

12 ws (x)(x) dx = lim


ws (x) ws+t (x) (x) dx 0,

B(c,r)


which proves that ws 0 on B(c, r). Since this means that ws E B(c, r)
for each s > 0 and because
ws is non-increasing as a function of s, we will know

that u E B(c, r) once we show that ws u pointwise on B(c, r). But,
since ws u, this comes down to checking u(x) lims&0 ws (x), which follows
from lower semicontinuity.

496

11 Some Classical Potential Theory

Turning to the second assertion, begin with the observation that, because
u us and u is lower semicontinuous, us u pointwise as s & 0. Next, note
that for (s, x) (0, ) G,

"Z
#
Z T +s
s
1
ut (x) dt
ut (x) dt
g (x, y)fs (y) dy = lim
T s
0
T
G
Z
1 s
ut (x) dt u(x).

s 0

Hence, since u < Lebesgue-almost everywhere on G, sups>o s (K) <


for all K G, and so {s : s > 0} is (cf. part (iv) of Exercise 9.1.16)
relatively sequentially compact in the sense that every subsequence admits a
subsequence that converges when tested
against Cc (G; R). At the same
R
time, if Cc2 (G; R) and s (x) = G (y)pG (s, x, y) dy, then
s

Z

s =
0

G
1
2 (y)p (,


, y) dy

d,

and so, by Fubinis Theorem and the symmetry of pG (, x, y), one can justify
Z
ds =
G

1
2s
Z

Z Z


u (y) d

(y) dy
Z
1
d.
2 (y)u(y) dy =
G

Hence, every limit of {s : s > 0} is . 


Exercises for 11.3
Exercise 11.3.14. Let G be a connected open set in RN , and assume that
N {1, 2}. If (11.1.20) fails, show that every excessive function on G is constant.
Hence, the only cases not already covered by Rieszs Decomposition Theorem
are trivial anyhow.
Hint: Using the reasoning employed to prove the first part of Lemma 11.3.4,
reduce to the case when u is smooth and satisfies u 0, and in this case apply
the result in Theorem 11.1.26.
Exercise 11.3.15. Let G be an open subset of R, and assume that either N 3
or (11.1.20) holds. If u is an excessive function on G that is not identically
infinite and has charge , show that u is harmonic on any open H G for
which (H) = 0. In addition, show that u is a potential if it is bounded and
u(x) 0 as x G tends to reg G {}.

11.4 Capacity

497

Exercise 11.3.16. Let G be a connected, open subset of RN , and again assume


that either N 3 or (11.1.20) holds. If u E(G) is not identically infinite but u


(N )
is infinite on the compact set K, show that Wx t 0, G () (t) K = 0
for all x G \ K. Finally, apply part (ii) of Exercise 11.1.37 to conclude that
(N )
Wx (t > 0 (t) K) = 0 for all x
/ K.
11.4 Capacity
In the classical theory of electricity, a question of interest is that of determining
the largest charge that can be placed on a body so that the resulting electric
field nowhere exceeds 1. From a mathematical standpoint this question is the
following. Let M(G) denote the space of non-negative, finite Borel measures
on an open set G. Then, given =
6 K R3 , what we want to know is the
3
total mass of the K M(R ) that is supported on K and solves the extremal
problem
3

GR K (x) = max{GR (x) : M(R3 ) with (R3 \ K) = 0 and GR 1}


for all x R3 . Of course, it is not at all obvious that such a K exists. Indeed,
the proof that it always does was one of Wieners significant contributions to
classical potential theory. As we are about to see, probability provides a simple
proof of Wieners result.1
11.4.1. The Capacitory Potential. Here I will show that the extremal
problem described above has a solution.
Theorem 11.4.1. Assume that G is a connected, open subset of RN and that
either N 3 or (11.1.20) holds. Given K G, set



(N )
(11.4.2)
pG
t 0, G () (t) K , x G.
K (x) = Wx
G
Then pG
K is a potential whose charge K is supported on K. Moreover, if
M(G) is supported on K and GG 1, then GG pG
K.

Proof: I begin by checking that pG


K is excessive. For this purpose, note that,
for any s > 0, the Markov property says that
Z



G
(N )
pG
t s, G () (t) K pG
K (y)p (s, x, y) dy = Wx
K (x).
G

In addition, because pG
K is bounded, the left-hand side is continuous with respect
to x G, and clearly the middle expression tends non-decreasingly to pG
K (x) as
s & 0. Thus, by the first part of Theorem 11.3.13, we now know that pG

E(G).
K
1

It is interesting to note that, although Wieners 1924 article, Certain notions in potential
theory, J. Math. Phys. M.I.T. 4, contains the first proof that an arbitrary compact set is
capacitable, it contains no reference to his own measure.

498

11 Some Classical Potential Theory

The next step is to prove that pG


K is a potential whose charge is supported
on K. But, because N 3 or (11.1.20) holds, it is clear that pG
K (x) tends to 0
as x G tends to either reg G or . Hence, if u is a non-negative harmonic
function on G that is dominated by pG
K , then u must be a bounded harmonic
function that tends to 0 at reg G {}, and so, because N 3 or (11.1.20)
holds, u 0. Therefore, pG
K is a potential. By Exercise 11.3.15, to check that
G
(G
\
K)
=
0,
it
suffices
to show that pG
K
K is harmonic on G \ K. For this
purpose, assume that B(x, r) (G \ K), and use the Markov property to
justify

N 1

 B(x,r)

(N ) 
Wx
B(x,r)
pG
pG
,
() <
K () SN 1 (d) = E
K (
SN 1



= Wx(N ) t B(x,r) (), G () (t) K = pG
K (x).

That is, pG
K satisfies the mean value property in G \ K and is therefore harmonic
there.
To complete the proof I must still show that if M(G) is supported on
G
K and u GG 1, then u pG
K , and I will start by showing that u pK
on G \ K. To this end, observe that u is harmonic on G \ K and that it tends
to 0 at reg G {}. Thus, if () = inf{t 0 : (t) K()}, where
K() = {x : |x K| }, then, for 0, dist(K, G{) and x G \ K(), u(x)
is dominated by
(N )

EWx







u ( ) , () < G () Wx(N ) t 0, G () (t) K() .

But, as & 0, the last expression tends to pG


K (x) plus

Wx(N ) > 0 < G and lim = = G ,
&0

and, because N 3 or (11.1.20) holds, this additional term is 0.


We now know that u pG
K on G \ K. To prove that the same inequality holds
on K, first observe that, by part (i) of Exercise 10.2.19, pG
K  K = 1 u  K
when N = 1. Thus, assume that N 2. In this case, g G (x, x) = for x G,
and so, since u 1, must be non-atomic. In particular, this means that
Z
u(x) = lim ur (x),
r&0

where ur

g G ( , y) (dy).

G\B(x,r)

But, by the preceding applied with K \ B(x, r) replacing K, ur (x) pG


K\B(x,r) ,
G
G
and obviously pK\B(x,r) pK . 

11.4 Capacity

499

G
The function pG
K and the measure K are, for the reasons explained above,
known as, respectively, the capacitory potential and the capacitory distribution for K in G, and the total mass

Cap(K; G) G
K (K)

(11.4.3)

is called the capacity of K in G. As a dividend from Theorem 11.4.1, we get


the following important connection between properties of Brownian paths and
classical potential theory.
Corollary 11.4.4. Let everything be as in the statement of Theorem 11.4.1.
Then the following are equivalent:
(i) For every x G,



Wx(N ) t 0, G () (t) K > 0.
(ii) There is an x G for which



Wx(N ) t 0, G () (t) K > 0.
(iii) There exists a non-zero, bounded potential on G whose charge is supported
in K.
(iv) Cap(K; G) > 0.
Moreover, Cap(K; G) = 0 for, when N 3, some G K or, when N {1, 2},

(N )
some G K satisfying (11.1.20), if and only if Wx t (0, ) (t) K =
0 for all x
/ K.
Proof: The only implications in the equivalence assertion that are not completely trivial are (iii) = (iv) and (iv) = (i). But, by Theorem 11.4.1, (iii)
G
implies that pG
K 6 0 and therefore that K 6= 0. Similarly, (iv) implies that
G
G
K 6= 0, and therefore, since g > 0 throughout G2 , that pG
K > 0 throughoutG.
(N )
To prove the final assertion, first suppose that Wx0 t (0, ) (t) K >
0 for some x0
/ K. Then we can choose R (0, ) so that K B(0, R) and
B(0,R)
B(0,R)
pK
(x0 ) > 0. In particular, K
6= 0 and
B(0,R)

GGB(0,R) K

B(0,R)

GG K

1.

At the same time, because


(N )

g G (x, y) g GB(0,R) (x, y) + EWx

i

g G ( GB(0,R) ) , GB(0,R) () < ,

there exists (cf. Corollary 11.2.20 when N = 2) a C < such that g G (x, y)
B(0,R)
g GB(0,R) (x, y) + C for all x
/ B(0, R) and y K. Hence, GG K

500

11 Some Classical Potential Theory


B(0,R)
1 + CCap K, B(0, R) , and so we have shown that GG K
is a non-zero,
bounded potential on G whose charge is supported in K, which, by the preceding
equivalences, means that Cap(K; G) > 0. Conversely, if Cap(K; G) > 0, then,
again by the preceding equivalences, we know that pG
K > 0everywhere on G,
(N )
which, of course, means that Wx t (0, ) (t) K > 0, first for all
x G and then for all x RN . 
The last part of the preceding allows us to use capacity to determine whether
Brownian paths will hit a K RN . Indeed, we now know that they will if
and only if Cap(K; G) > 0 for some G K satisfying our hypotheses. Thus,
the ability of Brownian paths in RN to hit a set is completely determined by
the singularity in the Green function. Namely, they will hit K with positive
probability if and only if there is a non-zero supported on K for which GG
is bounded. When N = 1, there is no singularity, and so even points can be hit.
When N 2, there is a singularity, and so, in order to be hit, K has to be large
enough to support a measure that is sufficiently smooth to mollify the singularity
in the Green function. Non-trivial (i.e., Ks for which K{ is the interior of its
closure) examples of Ks that cannot be hit are hard to come by. Lebesgues
spine provides one in R3 and can be adapted to RN for N 3. When N = 2
one has too work much harder. The most famous example is a devilishly clever
construction, known as Littlewoods crocodile, due to J.E. Littlewood. See M.
ements de la Theorie Classique du Potenial published
Brelots lecture notes El
in 1965 by Centre de Documentation Universitaire, Sorbonne, Paris V.
11.4.2. The Capacitory Distribution. In this subsection I will give a probabilistic representation, discovered by K.L. Chung, of the capacitory distribution
N
G
K . Again I assume that G is a connected open subset of R and that either
N 3 or (11.1.20) holds.
N
The function `G
K : C(R ) [0, ] given by
(11.4.5)




G
`G
K () = sup t 0, () : (t) K




0 if t 0, G () : (t) K = .

is called a quitting time. Clearly, `G


K is not a stopping time. On the other
hand, it transforms nicely under the time-shift maps t . Specifically,
+
G
`G
K t = `K t

for t [0, G ).

Theorem 11.4.6 (Chung).2 Let G be a connected open subset of RN , assume


that either N 3 or that (11.1.20) holds, and suppose that K G with
2

This result appeared originally in K.L. Chungs Probabilistic approach in potential theory
to the equilibrium problem, Ann. Inst. Fourier Gren. 23 # 3, pp. 313322 (1973). It gives
the first direct probabilistic interpretation of the capacitory measure.

11.4 Capacity

501

Cap(K; G) > 0. Then, for all Borel measurable : G R that are bounded
below and every c G,
#
"

Z
(N )
(`G
K)
G
G
Wc
 , `K (0, ) .
(11.4.7)
dK = E
g G c, (`G
G
K)

Proof: Take u = pG
f and s for s > 0 as in Theorem 11.3.13.
K , and define
 s
(N )
G
Then sfs (x) = Wx 0 < `K s , and so, for any Cb (G; R),
"Z G
#
Z



(N )
G
Wc
g (c, y)(y) s (dy) = E
(t) fs (t) dt
G

1
s

1
s

(N )

i
 (N )
 G
(t) W(t) 0 < `G
K s , > t dt

(N )

i

(t) , t < `G

s
+
t
dt
K

EWc

EWc

#
" Z G

1 `K
G
(t) dt, `K (0, )
=E
s (`G
s)+
K
h
i
 G
(N )
EWc (`G
as s & 0,
K ) , `K (0, )
(N )

Wc

where, in the passage to the third line, I have applied the Markov property and
used the time-shift property of `G
K . Next, let Cc (G; R) be given, note that

is
again
an
element
of
Cc (G; R), and conclude from Theorem 11.3.13
= gG (c,
)
and the preceding that (11.4.7) holds first for s in Cc (G; R) and then for all
bounded, measurable s on G. 
Aside from its intrinsic beauty, (11.4.7) has the virtue that it simplifies the
proofs of various important facts about capacity. For instance, it allows one to
prove a basic monotone convergence result for capacity. However, before doing
so, I will need to introduce the the energy E G (, ), which is defined for locally
finite, non-negative Borel measures and on G by
ZZ
E G (, ) =
g G (x, y) (dx)(dy).
G2

Clearly E G (, ) is some sort of inner product, and so it is not surprising that


there is a Schwarz inequality for it.
Lemma 11.4.8.
and on G,

For any pair of locally finite, non-negative, Borel measures


E G (, )

E G (, )

E G (, );

and, when the factors on the right are both finite, equality holds if and only if
a b = 0 for some pair (a, b) [0, )2 \ (0, 0).

502

11 Some Classical Potential Theory

Proof: For each (t, x) (0, ) G, set


Z

pG (t, x, y) (dy)

f (t, x) =

g G (t, x, y) (dy),

and g(t, x) =

and note that, by the ChapmanKolmogorov equation, Tonellis Theorem, and


Schwarzs Inequality:

E G (, ) =

ZZ

pG (t, x, y) (dx)(dy) dt

Z
(0,)

G2

ZZ
=

t
2, x

t
2, x

dtdx

(0,)G

ZZ

2
t
2, x

ZZ

2
t
2, x

dtdx

(0,)G

12

12

ZZ

ZZ

g(t, x) dtdx


f (t, x) dtdx
(0,)G

(0,)G


dtdx

(0,)G

12

12

q
E G (, ) E G (, ).

Furthermore, when f and g are square integrable, then equality holds if and only
if they are linearly dependent in the sense that af bg = 0 Lebesgue-almost
everywhere for some non-trivial choice of a, b [0, ). But this means that
Z

a
a
d = lim
T
&0
T
G

a
= lim
T &0 T

Z


(x)p (t, x, y) (dx) dt
G

ZZ

b
(x) f (t, x) dtdx = lim
T &0 T

b
T &0 T

Z

= lim

(x) g(t, x) dtdx

(0,T ]G

(0,T ]G

ZZ

(x)pG (t, x, y) (dx)

Z
dt = b

d
G

for every Cc (G; R), and so a b = 0. 


With this lemma, I can now give the application of Theorem (11.4.7) mentioned above.

11.4 Capacity

503

Theorem 11.4.9. Let G be as in Theorem (11.4.7) and


T{Kn : n 1} a nonincreasing sequence of compact subsets of G. If K = 1 Kn , then, for every
Borel measurable : G R that is continuous in a neighborhood of K1 ,
Z
Z
G
lim
dKn =
dG
K,
n

and so
Cap(K; G) = lim Cap Kn ; G).
n

Finally, if is any non-negative Borel measure on G satisfying (G \ K) = 0


and GG 1, then

E G , Cap(K; G) and equality holds = G
K.
Proof: Let c G \ K1 be given. In view of (11.4.7), checking the first assertion
(N )
comes down to showing that, for Wc -almost every C(RN ),

G
G
`G
if either
Kn () `K () 0, ()
 G



G
`Kn () : n 1 0, G () or `G
K () 0, () .


To this end, let C(RN ) with (0) = c be given. If `G
() : n 1
K
n

0, G () , then it is clear that

G
`G
where T 0, G () .
Kn () & T `K (),
In addition, by continuity, (T ) K, which
means first that T `G
K () and

G
G
G
then that `Kn () `K () 0, () . Next, observe that
 G

G
G
G
0 < `G
for all n Z+ .
K () < () < = `Kn () `K (), ()
Hence, we are done if (11.1.20) holds. On the other hand, if N 3, then,
(N )
because limt |(t)| = for Wc -almost all C(RN ), we know that, for
(N )
Wc -almost every C(RN ),
 G


G () = and `G
`Kn () : n 1 0, G () ;
K () (0, ) =
and so we have now completed the proof of the first part.
To prove the final assertion, first choose compact Kn s in G so that K
(Kn ) for each n Z+ and Kn & K as n . Because pG
Kn  K 1 and
G
pKn 1, we have that
Z

G
G
G
Cap(K; G) =
pG
G
Kn (x) K (dx) = E
K , Kn
G

= EG

 12
G
E G G
Kn , Kn
Z
 12
1
G
G
G
G 2
pKn (x) Kn (dx)
K , K

EG

1
1
1
1
G 2
G 2
Cap K; G 2
Cap Kn ; G 2 E G G
G
K , K
K , K

G
E G G
K , K

 21

504

11 Some Classical Potential Theory


G
as n . Hence, Cap(K; G) E G G
K , K . On the other hand, if (G \ K) =
0 and GG 1, then, by Theorem 11.4.1, GG pG
K 1,
Z
Z

G
E G (, ) =
GG d
pG
G
K d = E
K,
G
G

G
G
G
K , K

 12

E (, ) 2
E
Z
 12
q
p
1
G
G
2
E G (, ),
Cap(K;
G)
E
(,
)
=
pG
d
K
K
G

and equality can hold only if aG


K b = 0 for some non-trivial pair (a, b)
[0, )2 . When one takes = G
,
K this, in conjunction with the preceding, proves
G
that Cap(K; G) = E G G
,

K
K . In addition, for any with (G \ K) = 0 and
G
G
G 1, it shows that E (, ) Cap(K; G) and that equality can hold only
if and G
in which case = G
K are related by a non-trivial linear equation,
K

G
G
G
G
follows immediately from the equality E K , K = E (, ). 
The result in Theorem 11.4.9, which was known to Wiener, played an important role in his analysis of classical potential theory. To be more precise, when
3
3
N = 3 and K{ is regular, pR
K is the continuous function on R that is harmonic
off K, is 1 on K, and tends to 0 at infinity. Thus, it is a relatively simple problem to define the capacitory distribution for such Ks in R3 . The importance
to Wiener of results like that in Theorem 11.4.9 is that they enabled him (cf.
Exercise 11.4.20) to make a consistent assignment of capacity to Ks for which
K{ is not necessarily regular.
11.4.3. Wieners Test. This subsection is devoted to another of Wieners
famous contributions to classical potential theory.
As was pointed out following Corollary 11.4.4, capacity can be used to test
whether Brownian paths will hit a compact set K. By Lemma 11.1.21, an
equivalent statement is that capacity can be used to test whether reg (K{) is
empty or not. The result of Wiener that will be proved here can be viewed as a
sharpening of this remark.
Assume that N 2, and let an open subset G of RN and an a G be given.
For n Z+ , set
n
o
Kn = y
/ G : 2n1 |y a| 2n ,
and define

(11.4.10)

Wn (a, G) =

nCap Kn ; B(a, 1)

2n(N 2) Cap Kn ; B(a, 1)

if N = 2


Then Wieners test says that


(11.4.11)

a reg G

X
n=1

Wn (a, G) = .

if N 3.

11.4 Capacity

505

Notice that, at least qualitatively, (11.4.11) is what one should expect in that
the divergence of the series is some sort of statement that G{ is robust at a.
The key to my proof of Wieners test is the trivial observation that because
Z
B(a,1)
B(a,1)
pn (x) pKn (x) =
g B(a,1) (x, y) Kn (dy),
Kn

and, depending on whether N = 2 or N 3, there exists (cf. Exercise 11.2.21) an


1
N (0, 1) such that N n g B(a,1) (a, y) N
n or N 2n(N 2) g B(a,1) (a, y)
1 n(N 2)
n
n1
N 2
for y B(a, 2 ) \ B(a, 2
), we know that
N Wn (a, G) pn (a) Wn (a, G),

n Z+ .

Hence, in probabilistic terms, Wieners test comes down to the assertion that
Wa(N )

G
0+

X


= 0 = 1
Wa(N ) An = ,
1

where An is the set of C(RN ) that visit Kn before leaving B(a, 1). Actually,
although the preceding equivalence is not obvious, the closely related statement



G
(11.4.12)
Wa(N ) 0+
= 0 = 1 Wa(N ) lim An > 0
n

G
is essentially immediate. Indeed, if (0) = a and 0+
() = 0, then there
exists a sequence of times tm & 0 with the property that (tm ) B(a, 1)
G{ for all m, from which it is clear that visits infinitely many Kn s before
leaving B(a, 1). Hence, the = in (11.4.12) is trivial. As for the opposite
N
B(a,1)
implication,
suppose
() < ,

 B(a,1)
 that C(R ) has the properties that
t 0,
() : (t) = a} = {0}, and that visits infinitely many Kn s
before leaving B(a, 1). We can then find a subsequence {nm : m 1} and
a convergent sequence of times tm > 0 such that (tm ) Knm for each m.
Clearly, limm 
(tm) = a, and therefore
limm

tm = 0. In other words, if
B(a,1) () < , t 0, B(a,1) () : (t) = a = {0}, and limn An ,
G
then 0+
() = 0. Hence, since N 2 and therefore

Wa(N )

: B(a,1) () < and t > 0 (t) 6= a

= 1,

we have shown that





G
Wa(N ) 0+
= 0 Wa(N ) lim An ;
n

(N )

and therefore, because Wa


in (11.4.12).


G
0+
= 0 {0, 1}, we have proved the equivalence

506

11 Some Classical Potential Theory

In view of the preceding paragraph, the proof of Wieners test reduces to the
problem of showing that
Wa(N )

(11.4.13)

lim An > 0


Wa(N ) An = .

By the trivial part of the BorelCantelli Lemma, the = implication in


(11.4.13) is easy. On the other hand, because the events {An : n 1} are not
mutually independent, the non-trivial part of that lemma does not apply and
therefore cannot be used to go in the opposite direction. Nonetheless, as we will
see, the following interesting variation on the BorelCantelli theme does apply
and gives us the = implication in (11.4.13).
Lemma 11.4.14. Let (, F, P) be a probability space and {An : n 1} a
sequence of F-measurable sets with the property that



P Am An CP Am P An , m Z+ and n m + d,
for some C [1, ) and d Z+ . Then

X
1




1
.
P An = = P lim An
n
4C

Proof: Because

X


P An = =
P And+k = for some 0 k < d,

n=1

n=1

whereas


P




lim An P lim And+k
n

for each 0 k < d,

I may and will assume that d = 1. Further, since





P lim An lim P An ,
n

1
for all n Z+ . In particular, these assumptions
I will assume that P(An ) 4C
mean that, for each m Z+ , we can find an nm > m such that

sm

nm
X

  3 1
,C .
P A` 4C

`=m


Pn
Indeed, simply take nm to be the largest n > m for which `=m P A`
At the same time, by an easy induction argument on n > m, one has that
!
n
n
[
X

 1 X
P Ak A`
P
A`
P A`
2
`=m

`=m

mk6=`n

1
C.

11.4 Capacity
for all n > m 1, and therefore
!

[
P
A` P
`=m

n
m
[

!
A`

sm

`=m

507

1
Cs2m

4C
2

for all m Z+ . 
Proof of Wieners Test: All that remains is to check that the sets An
(N )
appearing in (11.4.13) satisfy the hypothesis in Lemma 11.4.14 when P = Wa .
To this end, set
n
o
n () = inf t (0, ) : (t) Kn .


Clearly, An = n < B(a,1) , and so



Wa(N ) Am An Wa(N ) m < n < B(a,1) + Wa(N ) n < m < B(a,1)
for all m Z+ and n 6= m. But, by the Markov property,



(N ) 
Wa(N ) m < n < B(a,1) EWa pn (m ) , m () < B(a,1) ()
(m, n)pm (a),
where I have introduced the notation (m, n) maxxKm pn (x). Finally, beB(a,1)

cause pn (x) = GB(a,1) Kn (x) and there is a CN < such that


S
CN for x |mn|2 Km and y Kn ,

g B(a,1) (x,y)
g B(a,1) (a,y)

(m, n) CN pn (a) for all |m n| 2.



(N )
Hence, since pn (a) = Wa An , we have now shown that



Wa(N ) Am An 2CN Wa(N ) Am Wa(N ) An
for all |m n| 2,
which means that Lemma 11.4.14 applies with C = 2CN and d = 2. 
11.4.4. Some Asymptotic Expressions Involving Capacity. Assume
K{
that K RN and that N 2. Given K RN , define K () = 0+
() =
inf{t > 0 : (t) K} to be the first positive entrance time into K. In this
subsection I will make some computations in which K and capacity play a
critical role.
I begin with a result of F. Spitzers3 about the rate of heat transfers from
the outside to the inside of a compact set. To be precise, let K RN , where
N 3, and think of
Z

(11.4.15)
EK (t)
Wx(N ) K t dx
K{

as the amount of heat that flows into K during [0, t] from outside.
3

See Electrostatic capacity, heat flow, and Brownian motion, in Z. Wahrsh. Gebiete. 3. Recently, M. Van den Burg has written several papers in which he greatly refines Spitzers result.

508

11 Some Classical Potential Theory

Theorem 11.4.16 (Spitzer). Assume that N 3, and, for K RN , define


t
EK (t) as in (11.4.15). Then
EK (t)
= Cap(K; RN ).
t
t
lim

Proof: Because, by the second part of Lemma 11.1.5,



Wx(N ) K = t = 0

for all (t, x) (0, ) RN ,

we know that t
EK (t) is a bounded, non-negative, continuous, non-decreasing
function.
I next observe that, for any 0 h < t,
Z

EK (t) EK (t h) =
Wx(N ) t h < K t dx.
RN

To see this, notice that there would be nothing to do if the integral were over
(N )
K{. On the other hand, by part (ii) of Exercise 10.2.19, Wx (K > 0) = 0
Lebesgue-almost everywhere on K, and so the integral over K does not contribute anything.
I now want to replace the preceding by
Z

h
(*)
EK (t) EK (t h) =
Wy(N ) K h and K
> t dy,
RN

where


h
K
() inf s (h, ) : (s) K
is the first entrance time into K after time h. To prove (*), set
(x,y)

(s) =

s
ts
x + t (s) + y,
t
t

s [0, t],

where t (s) = (s) st


t (t). Then, by (8.3.12) and the reversibility property
discussed in Exercise 8.3.22,


Wx(N ) t h < K t
Z


(x,y) 
=
W (N ) t h < K t
t g (N ) (t, y x) dy
N
ZR


(y,x) 
(y,x) 
h
=
W (N ) K t
h and K
t
> t g (N ) (t, y x) dy,
RN

and now integrate with respect to x to arrive at (*) after an application of


Tonellis Theorem and another application of (8.3.12).

11.4 Capacity

509

Starting from (*), one has that, for each h [0, ),



K (h) lim EK (t + h) EK (t)
t
Z


h
=
Wy(N ) K h and K
= dy,
RN

the convergence being uniform for h in compacts. Thus, K is non-negative and


continuous, and, from its definition, it is clear that it is additive in the sense
that K (h1 + h2 ) = K (h1 ) + K (h2 ). Therefore, by standard results about
additive functions, we now know that K (h) = hK (1).
The problem which remains is that of evaluating K (1). First observe that,
by (4.3.13),





|y K|2
(N )
h
(N )
,
Wy
K h and K = Wy K h 2N exp
2N h

and therefore that

1
K (h)
= lim
h&0 h
h&0
h

K (1) = lim


h
Wy(N ) K h & K
= dy

B(0,R)

for any R > 0 satisfying K B(0, R). Second, note that





h
h
Wy(N ) K h and K
= = Wy(N ) K
= Wy(N ) K =
Z


N
N
h
= Wy(N ) K < Wy(N ) K
< = pR
(y)

g (N ) (h, y )pR
K
K () d.
RN


Finally, combine these with Theorem 11.3.13 to arrive at K (1) = Cap K; RN .
To complete the proof, set ]t[= t btc and write
[t]

EK (t) = EK

 X


]t[ +
EK ]t[ +n EK ]t[ +n 1 .
n=1

Using this together with K (h) = hCap(K; G), one obtains the desired result. 
The next two computations provide asymptotic formulas as t % for the

(N )
quantity Wx K (t, ) .
Theorem 11.4.17.4 If N 3 and K RN , then, as t % ,
pK (t, x)

Wx(N )


N
 2Cap(K; RN ) 1 pR
K (x) 1 N
t 2
(t, )
N
(2) 2 (N 2)

uniformly for x in compacts.


4

This result was conjectured by Kac and first proved by his student A. Joffe. However, I will
follow the argument given by F. Spitzer in the article cited above.

510

11 Some Classical Potential Theory

Proof: Without loss in generality (cf. Corollary 11.4.4), I will assume that
N
K{
Cap(K; RN ) > 0. Next, set pK (x) = pR
(t, x, y), and
K (x) and pK (t, x, y) = p
note that, by the Markov property,
Z
pK (t, x) =

pK (y) pK (t, x, y) dy.


K{
N

Thus, since pK (t, x, y) (2t) 2 , we know that

lim sup t

N
2

t xRN



Z




p
(t,
x)

p
(y)
p
(t,
x,
y)
dy
K
=0
K
K


|y|R

for every R > 0 with K B(0, R). At the same time, because
Z

g R (x, y) R
K (dx),

pK (y) =
K

it is clear that
lim |y|N 2 pK (y) =

|y|

2Cap(K; RN )
.
(N 2)N 1

Hence, we have now shown that





N Z
p
(t,
x,
y)
2Cap(K;
R
)


K
dy
lim sup t 1 pK (t, x)
=0
N
2
t xRN


(N 2)N 1 |y|R |y|
N
2

for each R (0, ) with K B(0, R), and what we must still prove is that

(*)




N Z
N 1 (N )
pK (t, x, y)

2 1
W
(
=
)
dy

lim sup t
=0
K
N
x
N
2
t |x|r

|y|
(2) 2
|y|R

for all positive r and R with K B(0, R).


To prove (*), let r and R be given, and use (10.3.8) to see that
Z
|y|R

h
i

(N )
pK (t, x, y)
Wx
dy
=
q(t,
x)

E
q
t

,
(
)
,

<
t
,
K
K
K
|y|N 2

where
Z
q(t, x)
|y|R

g (N ) (t, y x)
dy
|y|N 2

for (t, x) (0, ) RN .

11.4 Capacity

511

After changing to polar coordinates and making a change of variables, one can
easily check that, for each T [0, ),

N
N 1

lim sup t 2 1 q(t s, x)
N
t 0<sT
(2) 2
|x|r




= 0.

Thus, if, for T (0, t), we write

N 1 (N )
pK (t, x, y)
(K = )
dy
N Wx
N 2
|y|
(2) 2
|y|R
!
i
h N

(N )
N
N 1
N 1
Wx
1
2 1 q t
,

T
,
(
)

E
t
= t 2 q(t, x)
K
K
K
N
N
(2) 2
(2) 2
i
h N


(N )
N 1 (N )
K (T, ) ,
EWx t 2 1 q t K , (K ) , K (T, t) +
N Wx
(2) 2

N
2

then it becomes clear that (*) will follow once we check that

(**)

lim



lim sup Wx(N ) K (T, ) = 0 and
T xRN
h
i

(N )
N
sup t 2 1 EWx q t K , (K ) , K (T, t) = 0.

T t>T
xRN

To check the first part of (**), note that, by the Markov property,

Wx(N ) K (T, T + 1] =


pK (T, x, y)Wy(N ) K 1 dy

K{
N
2

(2T )

RN


N
Wy(N ) K 1 dy CT 2 ,

where C = C(N, R) (0, ). Hence, after writing


Wx(N )

 X


(T, )
Wx(N ) K (T + n, T + n + 1] ,
n=0


(N )
we see that, as T , Wx K (T, ) 0 uniformly with respect to
x RN .
To handle the second part of (**), note that there is a constant A (0, )
for which
N
q(t, x) A (t 1)1 2 , (t, x) (0, ) K,

512

11 Some Classical Potential Theory

and therefore
N

(N )

t 2 1 EWx




q t K , (K ) , K (T, t)


N
1
2
Wx(N ) K [t] 1, t
At
[t]1


N
(t `)1 2 Wx(N ) K (` 1, `]

`=[T ]

[t]1

ACt

N
2

([t] 1)

N
2

+ ACt

N
2

(t `)1 2 (` 1) 2 ,

`=[T ]

where the C is the same as the one that appeared in the derivation of the first
part of (**). Thus, everything comes down to verifying that
N

lim sup n 2 1

m n>m

n1
X

(n `)1 2 ` 2 = 0.

`=m

But, by taking m = m N 1 and considering

(n `)1 2 ` 2

and

(n `)1 2 ` 2

(1m )n`n

m`(1m )n

separately, one finds that there is a B (0, ) such that


N

n 2 1

n1
X

(n `)1 2 ` 2 Bm . 

`=m

As one might guess, on the basis of (11.2.15), the analogous situation in R2 is


somewhat more delicate in that it involves logarithms.
Theorem 11.4.18 (Hunt).5 Let K be a compact subset of R2 , define K as

(2)
above, assume that Wx K < = 1 for all x R2 , and use hK to denote
the function hG given in (11.2.15) when G = R2 \ K. Then, as t % ,
 2hK (x)
Wx(2) K > t
log t
5

for each x R2 \ K.

This theorem is taken from G. Hunts article Some theorems concerning Brownian motion,
T.A.M.S. 81, pp. 294319 (1956). With breathtaking rapidity, it was followed by the articles
referred to in 11.1.4.

11.4 Capacity

513

Proof: The strategy of Hunts proof is to deal with the Laplace transform
Z



(2) 
et W (2) K > t dt = 1 1 EWx eK ,
0

show that


(2) 
log 1 
1 EWx eK = hK (x),
&0 2

(*)

lim

and apply Karamatas Tauberian Theorem to conclude first that


Z

log t t (2)
Wx K > d = hK (x)
lim
t 2t 0

and then, because t
W (2) K > t is non-increasing, that the asserted result
holds. Thus, everything comes down to proving (*).
Set G = R2 \ K. By assumption, G satisfies the hypotheses of Theorem
11.2.14. Now let x G be given, and choose y G \ {x} from the same
connected component of G as x. Then pG (t, x, y) > 0 for all t (0, ). In
addition, by (10.3.8), for each (0, ),
Z
et pG (t, x, y) dt
0


Z
Z

(N )
t (2)
Wx
K
t (2)
=
e g
t, y (K ) dt E
e
e g (t, y x) dt .
0

Next observe that




Z
|z|2
t (2)
e g (t, z) dt = f
2
0
Z


1
t1 exp t t1 dt for > 0.
where f ()
2 0

Writing
Z
2f () =
0

1
1



t exp t t1 dt +
Z
+
t1 et dt,





t1 et exp t1 1 dt

integrating by parts, and performing elementary manipulations, we find that


f () =

log

+ + o(1)

as & 0,

514

11 Some Classical Potential Theory

where
=

et log t dt.

At the same time, we have that


Z
et pG (t, x, y) dt g G (x, y)

as & 0.

Hence, when we plug these into the preceding, we get


g G (x, y) =


(2) 
1
1
log |y x| + EWx log |y (K )|, K <


(2) 
log 1 
1 EWx eK + o(1)
+
2

as & 0. Finally, after comparing this to (11.2.16), we arrive at (*). 


Let \(K\subset\subset\mathbb R^2\) be as in the preceding theorem, and choose some \(c\in K^\complement\). By comparing the result just obtained to (11.2.15), we see that
\[
\lim_{t\to\infty}\frac{W_x^{(2)}\bigl(\sigma_K>t\bigr)}{W_x^{(2)}\bigl(\sigma_K>\sigma_{B_{\mathbb R^2}(c,t)^\complement}\bigr)}=2
\quad\text{for each }x\in K^\complement.
\]
It would be interesting to know if there is a more direct route to this conclusion, in particular, one that avoids a Tauberian argument.
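Here is where the factor \(2\) comes from, granting (as the comparison with (11.2.15) suggests) that \(W_x^{(2)}\bigl(\sigma_K>\sigma_{B_{\mathbb R^2}(c,r)^\complement}\bigr)\sim\frac{h_K(x)}{\log r}\) as \(r\to\infty\): by Hunt's theorem the numerator is asymptotic to \(\frac{2h_K(x)}{\log t}\), while taking \(r=t\) in the preceding asymptotic makes the denominator \(\sim\frac{h_K(x)}{\log t}\), so the ratio tends to \(2\). Heuristically, by time \(t\) the path has wandered only to distance of order \(\sqrt t\) rather than \(t\), and \(\log t=2\log\sqrt t\).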
Exercises for 11.4

Exercise 11.4.19. Assume that \(N\ge2\). Given a \(\mu\in M(\mathbb R^N)\), say that \(\mu\) is tame if
\[
\sup_{x\in\mathbb R^2}\int_{\mathbb R^2}\log\frac1{|y-x|}\,\mu(dy)<\infty\quad\text{when }N=2
\]
and
\[
\sup_{x\in\mathbb R^N}\int_{\mathbb R^N}|y-x|^{2-N}\,\mu(dy)<\infty\quad\text{when }N\ge3.
\]
Further, say that \(\Gamma\in\mathcal B_{\mathbb R^N}\) has capacity zero if there is no tame \(\mu\in M(\mathbb R^N)\) for which \(\mu(\Gamma)>0\).

(i) If \(K\subset\subset\mathbb R^N\), show that \(K\) has capacity 0 if and only if \(\mathrm{Cap}\bigl(K;B(0,R)\bigr)=0\) for some \(R>0\) with \(K\subseteq B(0,R)\). Further, show that if \(K\) has capacity 0, \(G\) is open with \(K\subseteq G\), and either \(N\ge3\) or (11.1.20) holds, then \(\mathrm{Cap}(K;G)=0\).

(ii) If \(\Gamma\in\mathcal B_{\mathbb R^N}\), show that \(\Gamma\) has capacity 0 if and only if every compact \(K\subseteq\Gamma\) has capacity 0.

(iii) For any open \(G\subseteq\mathbb R^N\), show that \(\partial G\setminus\partial_{\mathrm{reg}}G\) has capacity 0.

(iv) Let \(G\) be an open subset of \(\mathbb R^N\), and assume that either \(N\ge3\) or (11.1.20) holds. If \(u\in E(G)\) is not identically infinite, show that \(\{x\in G:u(x)=\infty\}\) has capacity 0.

(v) Suppose that \(G\) is an open subset of \(\mathbb R^N\) and that either \(N\ge3\) or (11.1.20) holds. If \(K\subset\subset G\), show that \(\{x\in K:p^G_K(x)<1\}\) has capacity 0. Conclude that if \(\mu\in M(\mathbb R^N)\) is tame and \(\mu(\mathbb R^N\setminus K)=0\), then
\[
\mu(K)=\int p^G_K\,d\mu=\mathcal E^G\bigl(\mu,\mu^G_K\bigr)=\int G^G\mu\,d\mu^G_K.
\]
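As a concrete illustration of these definitions (our example, not part of the exercise): when \(N\ge3\), every singleton \(\{a\}\) has capacity zero, since any \(\mu\in M(\mathbb R^N)\) with \(\mu(\{a\})>0\) satisfies
\[
\sup_{x\in\mathbb R^N}\int_{\mathbb R^N}|y-x|^{2-N}\,\mu(dy)\ge\int_{\{a\}}|y-a|^{2-N}\,\mu(dy)=\infty
\]
(take \(x=a\)), and so cannot be tame. Combined with (ii), the same reasoning shows that every countable subset of \(\mathbb R^N\) has capacity zero.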

Exercise 11.4.20. Let \(G\) be an open subset of \(\mathbb R^N\) for some \(N\ge2\), and assume that either \(N\ge3\) or that (11.1.20) holds. We know how to define \(\mathrm{Cap}(K;G)\) for \(K\subset\subset G\). However, the map \(K\rightsquigarrow\mathrm{Cap}(K;G)\) is somewhat mysterious. In this exercise we will discuss a few of its important properties, properties that enabled G. Choquet¹ to prove that \(\mathrm{Cap}(\,\cdot\,,G)\) admits a well-defined extension to all of \(\mathcal B_G\).

(i) If \(\mu,\nu\in M(G)\) and \(G^G\mu\le G^G\nu\), show that \(\mathcal E^G(\mu,\mu)\le\mathcal E^G(\mu,\nu)\). In particular, conclude that \(\mathrm{Cap}(K_1;G)\le\mathrm{Cap}(K_2;G)\) for all compacts \(K_1\subseteq K_2\subset\subset G\). Thus the convergence in Theorem 11.4.9 is non-increasing convergence.

(ii) If \(K_1,K_2\subset\subset G\), show that
\[
p^G_{K_1\cup K_2}(x)-p^G_{K_1}(x)=W_x^{(N)}\bigl(\sigma_{K_2}<\zeta^G\le\sigma_{K_1}\bigr)
\le W_x^{(N)}\bigl(\sigma_{K_2}<\zeta^G\le\sigma_{K_1\cap K_2}\bigr)=p^G_{K_2}(x)-p^G_{K_1\cap K_2}(x),
\]
where \(\zeta^G\) denotes the first exit time from \(G\), and therefore that \(p^G_{K_1\cup K_2}+p^G_{K_1\cap K_2}\le p^G_{K_1}+p^G_{K_2}\).

(iii) By combining (i) and (ii), arrive at
\[
\mathcal E^G\bigl(\mu^G_{K_1\cup K_2}+\mu^G_{K_1\cap K_2},\,\mu^G_{K_1\cup K_2}+\mu^G_{K_1\cap K_2}\bigr)
\le\mathcal E^G\bigl(\mu^G_{K_1\cup K_2}+\mu^G_{K_1\cap K_2},\,\mu^G_{K_1}+\mu^G_{K_2}\bigr).
\]
Next, apply (v) of the preceding exercise to see that
\[
\mathcal E^G\bigl(\mu^G_{K_1\cup K_2}+\mu^G_{K_1\cap K_2},\,\mu^G_{K_1\cup K_2}+\mu^G_{K_1\cap K_2}\bigr)
=\mathrm{Cap}(K_1\cup K_2;G)+3\,\mathrm{Cap}(K_1\cap K_2;G)
\]
and
\[
\mathcal E^G\bigl(\mu^G_{K_1\cup K_2}+\mu^G_{K_1\cap K_2},\,\mu^G_{K_1}+\mu^G_{K_2}\bigr)
=\mathrm{Cap}(K_1;G)+\mathrm{Cap}(K_2;G)+2\,\mathrm{Cap}(K_1\cap K_2;G),
\]
and conclude that \(\mathrm{Cap}(\,\cdot\,;G)\) satisfies the strong subadditivity property
\[
\mathrm{Cap}(K_1\cup K_2;G)+\mathrm{Cap}(K_1\cap K_2;G)\le\mathrm{Cap}(K_1;G)+\mathrm{Cap}(K_2;G).
\]
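The two energy evaluations in (iii) are just symmetry and bilinearity of \(\mathcal E^G\) combined with (v) of the preceding exercise. For instance, abbreviating \(A=K_1\cup K_2\) and \(B=K_1\cap K_2\) (our notation),
\[
\mathcal E^G\bigl(\mu^G_A+\mu^G_B,\,\mu^G_A+\mu^G_B\bigr)
=\mathcal E^G\bigl(\mu^G_A,\mu^G_A\bigr)+2\,\mathcal E^G\bigl(\mu^G_A,\mu^G_B\bigr)+\mathcal E^G\bigl(\mu^G_B,\mu^G_B\bigr),
\]
and, because \(B\subseteq A\) forces \(p^G_A=1\) \(\mu^G_B\)-almost everywhere,
\[
\mathcal E^G\bigl(\mu^G_A,\mu^G_B\bigr)=\int p^G_A\,d\mu^G_B=\mu^G_B(B)=\mathrm{Cap}(K_1\cap K_2;G),
\]
which is the source of the coefficient \(3\).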
What Choquet showed is that a non-negative set function defined for compact subsets of \(G\) and satisfying the monotonicity property in (i), the monotone convergence property in (ii), and the strong subadditivity property in (iii) admits a unique extension to \(\mathcal B_G\) in such a way that these properties persist. In the articles alluded to earlier, Hunt used Choquet's result to show that the first positive entrance time into a Borel set is measurable.

¹ See Choquet's Lectures on Analysis, Vol. I, W.A. Benjamin (1965).

Notation

General

a ∧ b & a ∨ b — The minimum and the maximum of a and b
a⁺ & a⁻ — The non-negative part, a ∨ 0, and non-positive part, −(a ∧ 0), of a ∈ R
f ↾ S — The restriction of the function f to the set S
‖·‖_u — The uniform (supremum) norm
‖ψ‖_[a,b] — The uniform norm of the path ψ restricted to the interval [a,b]; see (4.1.1)
var_[a,b](ψ) — The variation norm of the path ψ ↾ [a,b]; see (4.1.2)
Γ(t) — Euler's Gamma function; see (1.3.20)
ω_{N−1} — The surface area of the sphere S^{N−1} in R^N; see (2.1.13)
Ω_{N−1} — The volume, N^{−1}ω_{N−1}, of the unit ball B(0,1) in R^N
[t] — The integer part of t ∈ R

Sets and Spaces

A∁ — The complement of the set A
A^{(δ)} — The δ-hull around the set A; see § 3.1
1_A — The indicator function of the set A; see § 1.1
B_E(a, r) — The ball of radius r around a in E; when E is omitted, it is assumed to be R^N for some N ∈ Z⁺
B(E; R) — The space of bounded, Borel measurable functions from E into R
K ⊂⊂ E — To be read: K is a compact subset of E
C — The complex numbers
N — The non-negative integers: N = {0} ∪ Z⁺
Q — The set of rational numbers
S^{N−1} — The unit sphere in R^N
Z & Z⁺ — The set of all integers and the subset of positive integers
C(R^N) — The space C([0,∞); R^N) of continuous paths ψ : [0,∞) → R^N; see § 9.3
C_b(E; R) — The space of bounded continuous functions from E into R
C_c(G; R) — The space of continuous, R-valued functions having compact support in the open set G
C^{1,2}(R × R^N; R) — The space of functions (t,x) ∈ R × R^N ↦ R that are continuously differentiable once in t and twice in x
D(R^N) — The space of right-continuous paths ψ : [0,∞) → R^N with left limits on (0,∞); see § 4.1.1
H(R^N) — The Cameron–Martin subspace for Wiener measure on Θ(R^N); see § 8.1.2
L^p(μ; E) — The Lebesgue space of E-valued functions f for which ‖f‖^p_E is μ-integrable
M₁(E) — The space of Borel probability measures on E; see § 9.1.2
M(E) — The space of non-negative, finite Borel measures on E
S(R^N; R) or S(R^N; C) — The real- or complex-valued Schwartz test function space on R^N; see § 3.2.3

Measure Theoretic

B_E — The Borel σ-algebra over E
B(E; R) — The space of bounded, measurable functions on E
E^μ[X, A] — To be read: the expectation value of X with respect to μ on A; equivalent to ∫_A X dμ. When A is unspecified, it is taken to be the whole space
δ_a — The unit point mass at a
λ_A — Lebesgue measure on the set A; usually A = R^N or some interval
E^μ[X | F] — To be read: the conditional expectation value of X given the σ-algebra F; see § 5.1.1
f̂ — The Fourier transform of the function f; see § 2.3.1
f ⋆ g — The convolution of f with g
⟨φ, μ⟩ — An alternative notation for E^μ[φ]; see § 2.1
γ_{m,C} — The Gaussian or normal distribution with mean m and covariance C; see § 2.3.1
N(m, C) — Normal distributions with mean m and covariance C
med(Y) — The set of medians of the random variable Y; see § 1.4
μ̂ — The Fourier transform of the measure μ; see § 2.3.1
μ ⋆ ν — The convolution of the measure μ with ν
μ ≪ ν — The measure μ is absolutely continuous with respect to ν
μ ⊥ ν — The measure μ is singular to ν
μ_n ⟹ μ — The sequence {μ_n : n ≥ 1} tends weakly to μ; see § 9.1.2
Φ∗μ — The pushforward (image) of μ under Φ; see (1.1.16)
σ({X_i : i ∈ I}) — The σ-algebra generated by the set of random variables {X_i : i ∈ I}
⋁_{i∈I} F_i — The σ-algebra generated by ⋃_{i∈I} F_i

Wiener Measure

g^{(N)}(t, x) — The density of the Gauss distribution in R^N; see § 10.1
δ_s — The differential time-shift map on C(R^N); see § 7.1.4
Σ_s — The time-shift map on C(R^N); see § 10.2.1
W^{(N)} — Wiener measure on Θ(R^N) or C(R^N); see § 8.1.1
W_x^{(N)} — The distribution of x + ψ under W^{(N)}; see § 10.1.1
(H, E, W_H) — The abstract Wiener space with Cameron–Martin space H; see § 8.2.2

Potential Theoretic

E(G) — The set of excessive functions on G; see § 11.3.1
g^G(x, y) — The Dirichlet Green function for G; see § 11.2
G^G μ — The Green potential with charge μ in G; see (11.3.1)
p^G(t, x, y) — The Dirichlet heat kernel for G; see § 10.3.1

Index

absolutely monotone, 19
absolutely pure jump path, 158
abstract Wiener space, 309
  orthogonal invariance, 328
  ergodicity, 329
adapted, 266
σ-algebra
  atom in, 13
  tail, 2
  trivial, 2
approximate identity, 16
  a.e. convergence of, 241
Arcsine Law, 407
  a characterization of, 415
  for random variables, 409
asymptotic, 32
atom, 13
Azéma's Inequality, 264
B
Bachelier, 188
barrier function, 423
Beckner's inequality, 108
Bernoulli multiplier, 101
Bernoulli random variables, 5
Bernstein polynomial, 17
Berry–Esseen Theorem, 77
Bessel operator, 350
Beta function, 138
Blumenthal's 0–1 Law, 426
Bochner's Theorem, 119
Borel measurable linear maps are continuous, 314
Borel–Cantelli Lemma
  extended version of, 506
  martingale extension of, 229
  original version, 3
Brownian motion, 177
  Erdős–Kac Theorem, 399
  Hölder continuity, 183
  in a Banach space, 359
  iterated logarithm, 189, 366
  Lévy's martingale characterization, 282
  Lévy's modulus of continuity, 191
  non-differentiability, 183
  on a Banach space, 361
  pinned, 327, 334
  recurrence in one and two dimensions, 413
  reflection principle, 188, 294
  rotational invariance, 187
  scaling invariance, 187, 335
    for Banach space, 365
  strong law, 188
  time inversion, 187
    for Banach space, 365
  transience for N ≥ 3, 414
  transition function for killed, 298
  variance of paths, 333
  with drift, 444
Burkholder's Inequality, 262
  application to Fourier series, 263
  application to Walsh series, 264
  for continuous martingales, 289
  martingale comparison, 257
  for martingale square function, 262

C
Calderón–Zygmund Decomposition
  Gundy's for martingales, 227
Cameron–Martin formula, 312
Cameron–Martin space, 305
  classical, 305
  in general, 310
capacitory distribution, 499
  Chung's representation of, 500
capacitory potential, 497, 499
  capacitory distribution, 499
capacity, 499
  monotone continuity, 502
capacity zero, 514
Cauchy distribution, 149
Cauchy initial value problem, 400
centered Gaussian measure, 299
  non-degenerate, 306


centered random variable, 179
Central Limit phenomenon, 60
Central Limit Theorem
  basic case, 64
  Berry–Esseen, 77
  higher moments, 87
  Lindeberg, 61
  sub-Gaussian random variables, 89
characteristic function, 82
Chebychev polynomial, 34
Chebyshev's inequality, 15
Chernoff's Inequality, 30
Chung–Fuchs Theorem, 231
conditional expectation, 194
  application to Fourier series, 204
  basic properties, 197
  existence and uniqueness, 195
  infinite measure, 200
    Banach space–valued case, 200
  Jensen's Inequality for, 210
  properties, 197
  regular, 386
  versus orthogonal projection, 202
conditional probability, 196
  as limit of naïve case, 209
  naïve case, 193
  regular version, 388
conditional probability distribution, 388
continuous martingale, 267
  Burkholder's Inequality for, 289
  Doob–Meyer Theorem, 285
  exponential estimate, 291
  exponential martingale, 291
continuous singular functions, 47
convergence
  in law or distribution, 379
  weak, 116
convolution, 63
  measure with measure, 115
  of function with measure, 83
  of functions, 63
countably generated σ-algebra, 13
covariance, 84
Cramér's Theorem, 27
D
De Finetti, 219
  strong law, 220
difference operator, 18
Dirichlet problem, 418
  balayage procedure, 426
  Courant–Friedrichs–Lewy scheme, 428
  finite difference scheme, 428
  Perron–Wiener solution, 423
  regular point, 421
  uniqueness, 463
  uniqueness criterion
    N ≥ 3, 466
    N ∈ {1, 2}, 467
distribution, 12
  function, 7
  Gaussian or normal, 85
  uniform, 6
distribution of a stochastic process, 152
Donsker's Invariance Principle, 393
Doob's Decomposition, 213
  continuous case, see Doob–Meyer
Doob's Inequality
  Banach-valued case, 239
  continuous parameter, 270
  discrete parameter, 207
Doob's Stopping Time Theorem
  continuous parameter, 275
  discrete parameter, 213
Doob–Meyer Decomposition, 285
drift, 444
Duhamel's Formula, 282
  for Green function when N = 2, 482
  for Green function when N ≥ 3, 476
  for killed Brownian motion, 298
E
eigenvalues for Dirichlet Laplacian, 450
  principal eigenvalue, 450
  Weyl's asymptotic formula, 453
empirical distribution, 384
energy of a charge, 501
equicontinuous family, 377
Erdős–Kac Theorem, 399
ergodic hypothesis
  continuous case, 254
  discrete case, 249
ergodic theory
  Individual Ergodic Theorem
    continuous parameter, 254
    discrete parameter, 248
  stationary family, 251
error function, 72
Euler's Gamma function, 32
excessive function, 488

  charge determined by, 494
  Riesz Decomposition of, 492
exchangeable random variables, 220
  Strong Law for, 220
exponential random variable, 161
extended stopping time, 278
F
Fernique's Theorem, 306
  application to functional analysis, 314
Feynman's representation, 303
Feynman–Kac
  formula, 403
  heat kernel, 437
fibering a measure, 389
first entrance time, asymptotics of distribution
  N = 2, 512
  N ≥ 3, 509
first exit time, 419
fixed points of T, 92
Fourier transform, 82
  Beckner's inequality for, 108
  diagonalized by Hermite functions, 100
  for measure on Banach space, 301
  inversion formula, 98, 112
  of a function, 82
  of a measure, 82
  operator, 100
  Parseval's Identity for, 112
free fields
  Gaussian, 343
    ergodicity, 358
    existence of, 352
function
  characteristic, 82
  distribution, 7
  error, 72
  Euler's Beta, 138
  Euler's Gamma, 32
  excessive, 488
  Fourier transform of, 82
  Hermite, 100
  indicator, 4
  moment generating, 23
    logarithmic, 25
  normalized Hermite, 112
  probability generating, 19
  progressively measurable, 266
  Rademacher, 5
  rapidly decreasing, 82
  tempered, 97
G
Gamma distribution, 138
Gamma function, 32
Gauss kernel, 23
Gaussian family, 179
  conditioning, 203
Gaussian measure
  on a Banach space, 299
  support of, 321
Gaussian random variable, independence vs. orthogonality, 94
generalized Poisson process, 171
Green function, 476
  for balls, 486
  Duhamel's Formula for N = 2, 482
  Duhamel's Formula for N ≥ 3, 476
  properties when N = 2, 485
Green's Identity, 487
ground state, 439, 448
  associated eigenvalue, 439
ground state representation, 439
Guivarc'h recurrence lemma, 45, 256
H
Haar basis, 319
Hardy's Inequality, 238
Hardy–Littlewood Maximal Inequality, 235
harmonic function, 419
  Harnack's Inequality and Principle, 471
  Liouville Theorem, 472
  removable singularities for, 472
harmonic measure, 468
  for balls, 469
  for R^N_+, 469
harmonic oscillator, 406
Harnack's Inequality, 471
Harnack's Principle, 471
heat equation, 400
  Cauchy initial value problem, 400
heat kernel, 429
  Dirichlet, 435
  Feynman–Kac, 437
  Hermite, 406, 454
heat transfer, Spitzer's asymptotic rate, 507
Hermite functions, 100
  eigenfunctions for Hermite operator, 454

  Fourier eigenvectors, 100
  normalized, 112
Hermite heat kernel, 406
Hermite multiplier, 98
Hermite operator, 406
Hermite polynomials, 97
  L^p-estimate, 114
Hewitt–Savage 0–1 Law, 221
Hölder conjugate, 100
hypercontractive, 105
I
independent
  events or sets, 1
  random variables, 4
    existence in general, 12
    existence of R-valued sequences, 7
  σ-algebras, 1
indicator function, 4
inequality
  Azéma's, 264
  Burkholder's, 262, 289
  Gross's logarithmic Sobolev, 114
  Harnack's, 471
  Jensen's, 210, 240
  Khinchine's, 94
  Kolmogorov's, 36
  Lévy's, 40
  Nelson's Hypercontractive, 106
infinitely divisible, 115
  measure or law, 115
inner product for measures, 230
integer part, 5
invariant set, 246
J
Jensen's Inequality, 210
  Banach-valued case, 240
jump function, 156
K
Kac's Theorem, 252
Kakutani's Theorem, 229
kernel
  Gauss, 23
  Mehler's, 98
Khinchine's Inequality, 94
Kolmogorov's
  continuity criterion, 182
  Extension or Consistency Theorem, 384
  Inequality, 36
  Strong Law, 38
  0–1 Law, 2
Kronecker's Lemma, 37
L
Λ-system, 8
Laplace transform inversion formula, 21
large deviations estimates, 28
Law of Large Numbers
  Strong
    in Banach space, 241, 256, 384
    for empirical distribution, 384
    for exchangeable random variables, 220
    Kolmogorov's, 38
  Weak, 16
    refinement, 20, 44, 45
Law of the Iterated Logarithm
  converse, 56
  proof of, 54
  statement, 49
  Strassen's Version, 340, 366
Lebesgue's Differentiation Theorem, 237
Lévy measure, 128
  Itô map for, 390
Lévy operator, 268
Lévy process, 152
  reflection, 292
Lévy system, 134
Lévy's Continuity Theorem, 118
  second version, 120
Lévy–Cramér Theorem, 66
Lévy–Khinchine formula, 136
limit superior of sets, 2
Lindeberg's Theorem, 61
Lindeberg–Feller Theorem, 62
  Feller's part, 90
Liouville Theorem, 472
locally μ-integrable, 199
Logarithmic Sobolev Inequality, 113
  for Bernoulli, 113
  for Gaussian, 114, 356
lowering operator, 97

M
marginal distribution, 83
Markov property, 417
martingale, 205
  application to Fourier series, 263
  continuous parameter, 267
    complex, 267
  Gundy's decomposition of, 227
  Hahn decomposition of, 227
  reversed, 217
    Banach-valued case, 241
  on σ-finite measure space, 233
martingale convergence
  continuous parameter, 271
  Hilbert-valued case, 243
  Marcinkiewicz's Theorem, 207
  preliminary version for Banach space, 239
  second proof, 226
  third proof, 227
  via upcrossing inequality, 214
maximal function
  Hardy–Littlewood, 235
  Hardy–Littlewood inequality, 236
maximum principle of Phragmén–Lindelöf, 474
Maxwell distribution for ideal gas, 70
mean value
  Banach space case, 199
  vector-valued case, 84
measure
  invariant, 112
  locally finite, 63
  non-atomic, 381
  product, 10
  pushforward of μ under Φ, 12
measure preserving, 244
measures
  consistent family, 383
  tight, 376, 382
median, 39
  variational characterization, 43
Mehler kernel, 98
minimum principle, 130
  strong, 405
  weak, 404
moment estimate for sums of independent random variables, 94
moment generating function, 23
  logarithmic, 25
multiplier
  Bernoulli, 101

  Hermite, 98
N
Nelson's Inequality, 106
non-degenerate, 306
non-negative definite function, 119
non-negative linear functional, 374
normal law, 23
  fixed point characterization, 91
  Lévy–Cramér Theorem, 66
  standard, 23
null set, see P-null set
O
operator
  Fourier, 100
  hypercontractive, 105
  lowering, 97
  raising, 96
optional stopping time, 280
Ornstein–Uhlenbeck process, 344
  ancient, 345
  associated martingales, 415
  Gaussian description, 344
  Hermite heat kernel, 454
  reversible, 346
    in Banach space, 365
P
Paley–Littlewood Inequality for Walsh series, 264
Paley–Wiener map, 312
  as a stochastic integral, 316
Parseval's Identity, 112
path properties, 158
  absolutely pure jump, 158
  piecewise constant, 158
Phragmén–Lindelöf, 474
pinned Brownian motion, 327
π-system, 8
P-null set, 194
Poincaré's Inequality for Gaussian, 355
Poisson jump process, 168
  Itô's construction of, 390
Poisson kernel, 149
  for upper half-space, 429
  for ball via Green's Identity, 487
Poisson measure, 122
  generalized, 171
  simple, 161


Poisson point process, 176
Poisson problem, 475
Poisson process, 161, 163
  associated with M, 164
  generalized, 171
  jump distribution, 163
  rate, 163
  simple, 161
Poisson random variable, N-valued, 21
Poisson's formula, 469
Polish space, 367
potential, 487
  charge determined by, 494
  in terms of excessive functions, 494
principle of accompanying laws, 380
probability space, 1
process
  Brownian motion, 177
    with drift, 444
  Ornstein–Uhlenbeck, 344
  stationary, 345
process with independent, homogeneous increments, 152
product measure, 10
progressively measurable, 205, 266
  versus adapted, 267
pushforward measure Φ∗μ, 12
Q
quitting time, 500
R
Rademacher functions, 5
Radon–Nikodym derivatives, martingale interpretation, 216
raising operator, 96
random variable, 4
  N-valued Poisson, 21
  Bernoulli, 5
  characteristic function, 82
  convergence in law, 379
  Gaussian or normal, 23
    vector-valued case, 85
  median of, 39
  sub-Gaussian, 88
  symmetric, 44
  uniformly integrable, 15
  variance of, 15
rapidly decreasing, 9, 82
Rayleigh's Random Flights Model, 396, 399
recurrence of Brownian motion, 413
reflection principle
  Brownian motion, 188, 294
  for independent random variables, 40
regular point, 421, 427
  exterior cone condition, 427
  probabilistic criterion, 421
  Wiener's test for, 504
removable singularity, 472
return time, Kac's Theorem for, 252
Riemann–Lebesgue Lemma, 121
Riesz Decomposition Theorem, 492
Robin's constant, 485
S
semigroup, hypercontractive estimate, 105
shift invariant, 251
σ-algebra, countably generated, 13
simple Poisson process, 163
  run at rate λ, 163
Sobolev space, 350
square function, Burkholder's Inequality for, 262
stable laws, 141
  order 1/2 one-sided
    Brownian motion, 281
    density, 149
  characterization, 144
  one-sided, 147
    density, 148
  symmetric, 146
    densities, 149
state space, 152
stationary, 251
stationary family
  canonical setting for, 251
  Kac's Theorem for, 252
stationary process, 345
statistical mechanics, derivation of Maxwell distribution, 70
Stein's method, 72
Stirling's formula, 32, 70
stochastic integral, 316
stochastic process, 152
  adapted, 266
  continuous, 266
  distribution of, 152
  independent increments, 152
  modification, 189
  reversible, 346

  right-continuous, 266
  state of, 152
  stochastic continuity, 189
stopping time, 212
  continuous parameter, 272
  discrete case, 212
  extended, 278
  old definition, 280
  optional, 280
Stopping Time Theorem
  Doob's, continuous parameter, 275
  Doob's, discrete parameter, 213
  Hunt's, continuous parameter, 275
  Hunt's, discrete parameter, 213
Strassen's Theorem, 340
  Brownian formulation of, 363
Strong Law of Large Numbers, 23
  for Brownian motion, 188
  for empirical distribution, 384
  in Banach space, 241, 256, 384
  Kolmogorov's, 38
strong Markov property, 417
Strong Minimum Principle, 405
strong topology on M₁(E), 369
  not metrizable, 381
sub-Gaussian random variables, moment estimates, 93
submartingale, 205
  continuous parameter, 267
  Doob's Decomposition, 213
  Doob's Inequality
    continuous parameter, 270
    discrete parameter, 206
  Doob's Upcrossing Inequality, 214
  reversed, 217
  σ-finite measure space, 233
  stopping time theorem
    Doob's, discrete parameter, 212
    Doob's, continuous parameter, 275
    Hunt's, discrete parameter, 213
    Hunt's, continuous parameter, 275
subordination, 148
symmetric difference of sets, 246
symmetric random variable, 44
  moment relations, 45
T
tail σ-algebra, 2
  and exchangeability, 220
  ergodicity of, 256


tempered, 97
tempered distribution, 350
tight, 376, 382
  for finite measures, 382
time reversal, 335
time-shift map, 416
Tonelli's Theorem, 4
transform
  Fourier, see Fourier transform
  Laplace, 21
  Legendre, 26
transformation, measure preserving, 244
transient, 414
transition probability, 112
U
uniform norm ‖·‖_u, 17
uniform topology on M₁(E), 367
uniformly distributed, 6
uniformly integrable, 15
unit exponential random variable, 161
V
variance, 15
variation norm, 368
W
Walsh functions, 264
weak convergence, 116
  equivalent formulations, 372
  principle of accompanying laws, 380
Weak Law of Large Numbers, 16
Weak Minimum Principle, 404
weak topology on M₁(E), 370
  completeness, 377
  Prohorov metric for, 379
  separable, 376, 382
weak-type inequality, 207
Weierstrass's Approximation Theorem, 17
Wiener measure, 301
  Arcsine law, 407
  Feynman's representation, 303
  Markov property, 417
  translation by x, 401
Wiener series, 318
  classical case, 334
Wiener's test for regularity, 504
